CN1262991C - Method and apparatus for tracking the phase of a quasi-periodic signal - Google Patents
Method and apparatus for tracking the phase of a quasi-periodic signal
- Publication number
- CN1262991C CNB008192006A CN00819200A
- Authority
- CN
- China
- Prior art keywords
- phase
- signal
- speech
- frame
- periodic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
Description
发明背景Background of the invention
发明领域field of invention
本发明一般与语音处理领域有关,尤其,与用于跟踪准周期性信号的相位的方法和设备有关。The present invention relates generally to the field of speech processing and, more particularly, to a method and apparatus for tracking the phase of a quasi-periodic signal.
背景background
通过数字技术的语音传输已经变得普及,特别是在远距离和数字无线电话应用中。而这又使得人们对确定可以通过信道发送最小信息量而同时保持再现语音的察觉质量方面产生兴趣。如果通过简单的采样和数字化发送语音,则需要大约为每秒64千比特(kbps)的数据速率,才能达到传统模拟电话的语音质量。然而,通过使用语音分析,接着通过合适的编码、发送和在接收机处再合成,可以使数据速率大大地降低。Voice transmission via digital technology has become popular, especially in long-distance and digital wireless telephony applications. This in turn has led to interest in determining the minimum amount of information that can be sent over the channel while maintaining the perceived quality of the reproduced speech. If voice is sent by simple sampling and digitization, a data rate of approximately 64 kilobits per second (kbps) is required to achieve the voice quality of traditional analog telephony. However, the data rate can be reduced considerably by using speech analysis followed by appropriate encoding, transmission and resynthesis at the receiver.
把使用通过析取与人类语音生成的模型有关的参数来压缩语音的技术的装置称为语音编码器。语音编码器把来话语音信号分成时间块或分析帧。语音编码器一般包括编码器和解码器。编码器分析来话语音帧，以析取某些有关参数，然后把参数量化成二进制表示，即，位组或二进制数据分组。通过通信信道把数据分组发送到接收机和解码器。解码器处理数据分组，使它们去量化以产生参数，并使用经去量化的参数重新合成语音帧。A device that employs techniques to compress speech by extracting parameters related to a model of human speech generation is called a speech coder. The speech coder divides the incoming speech signal into blocks of time, or analysis frames. Speech coders typically comprise an encoder and a decoder. The encoder analyzes the incoming speech frame to extract certain relevant parameters and then quantizes the parameters into a binary representation, i.e., a set of bits or a binary data packet. The data packets are transmitted over a communication channel to a receiver and a decoder. The decoder processes the data packets, dequantizes them to produce the parameters, and resynthesizes the speech frames using the dequantized parameters.
语音编码器的功能是通过除去在语音中固有的所有自然冗余把数字化语音信号压缩成低位速率信号。通过用一组参数来表示输入语音帧，以及使用量化，以一组位来表示参数来实现数字压缩。如果输入语音帧具有位数Ni，而语音编码器产生的数据分组具有位数No，则通过语音编码器得到的压缩率是Cr=Ni/No。而富有挑战性的是在保持经解码语音的高话音质量的情况下同时实现目标压缩率。语音编码器的性能取决于(1)语音模型、或上述分析和合成处理结合执行得好坏，以及(2)在每帧No位的目标位速率处，参数量化处理执行得好坏。因此，语音模型的目标是针对每帧用较小的参数组来捕获语音信号或目标话音质量的要素。The function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing all of the natural redundancies inherent in speech. The digital compression is achieved by representing the input speech frame with a set of parameters and employing quantization to represent the parameters with a set of bits. If the input speech frame has a number of bits Ni and the data packet produced by the speech coder has a number of bits No, the compression factor achieved by the speech coder is Cr = Ni/No. The challenge is to retain high voice quality of the decoded speech while achieving the target compression factor. The performance of a speech coder depends on (1) how well the speech model, or the combination of the analysis and synthesis process described above, performs, and (2) how well the parameter quantization process is performed at the target bit rate of No bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
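As a concrete illustration of the compression factor Cr = Ni/No (the bit counts below are hypothetical, chosen only for arithmetic; they are not taken from the patent):

```python
# Hypothetical illustration of the compression factor Cr = Ni / No.
samples_per_frame = 160        # 20 ms frame at 8 kHz sampling
bits_per_sample = 16           # 16-bit PCM input (illustrative)
Ni = samples_per_frame * bits_per_sample   # 2560 bits in the input frame
No = 80                        # e.g. an 80-bit packet from a 4 kbps coder
Cr = Ni / No
print(Cr)                      # 32.0
```

With these numbers the coder represents each frame with 1/32 of the original bits, which is why parameter quality at the target bit rate dominates performance.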
语音编码器可以作为时域编码器实现，所述时域编码器试图使用高时间分辨率处理来捕获时域语音波形，以每次对语音小段(一般是5毫秒(ms)的子帧)进行编码。对于每个子帧，借助在技术领域中众知的各种搜索算法，从代码簿空间可找到高精确度表示。另一方面，语音编码器可以作为频域编码器实现，频域编码器试图用一组参数捕获输入语音帧的短期语音频谱(分析)，并使用相应的合成处理从频谱参数重新创建语音波形。根据A.Gersho和R.M.Gray的"矢量量化和信号压缩(1992)"中描述的已知量化技术，参数量化器通过用存储的码矢量表示来表示参数，从而保留了参数。Speech coders may be implemented as time-domain coders, which attempt to capture the time-domain speech waveform by employing high time-resolution processing to encode small segments of speech (typically 5 millisecond (ms) subframes) at a time. For each subframe, a high-precision representative is found from a codebook space by means of various search algorithms known in the art. Alternatively, speech coders may be implemented as frequency-domain coders, which attempt to capture the short-term speech spectrum of the input speech frame with a set of parameters (analysis) and employ a corresponding synthesis process to recreate the speech waveform from the spectral parameters. The parameter quantizer preserves the parameters by representing them with stored representations of code vectors in accordance with known quantization techniques described in A. Gersho & R.M. Gray, Vector Quantization and Signal Compression (1992).
众知的时域语音编码器是代码激励的线性预测(CELP)编码器，在L.B.Rabiner和R.W.Schafer的"语音信号的数字处理"396-453(1978)中描述所述代码激励的线性预测编码器，在此全部引用作为参考。在CELP编码器中，通过线性预测(LP)分析除去语音信号中的短期相关，或冗余，所述线性预测分析发现短期共振峰滤波系数。把短期预测滤波施加到来话语音帧产生一个LP残余信号，用长期预测滤波参数和后续随机代码簿进一步使该剩余信号模型化和量化。这样，CELP编码使编码时域语音波形的任务分成对LP短期滤波系数编码和对LP剩余编码的单独编码任务。可以按固定速率(即，对于每个帧使用相同的位数No)执行时域编码，或按可变速率(其中，对于不同类型的帧内容使用不同的位速率)执行时域编码。可变速率编码器试图只使用需要的位数量，使对编码器的参数编码达到足够得到目标质量的水平。在美国专利第5,414,796号中描述一种示例可变速率CELP编码器，该专利已转让给本发明的受让人，并在此全部引用作为参考。A well-known time-domain speech coder is the Code Excited Linear Predictive (CELP) coder described in L.B. Rabiner & R.W. Schafer, Digital Processing of Speech Signals 396-453 (1978), which is fully incorporated herein by reference. In a CELP coder, the short-term correlations, or redundancies, in the speech signal are removed by a linear prediction (LP) analysis, which finds the coefficients of a short-term formant filter. Applying the short-term prediction filter to the incoming speech frame generates an LP residual signal, which is further modeled and quantized with long-term prediction filter parameters and a subsequent stochastic codebook. Thus, CELP coding divides the task of encoding the time-domain speech waveform into the separate tasks of encoding the LP short-term filter coefficients and encoding the LP residue. Time-domain coding can be performed at a fixed rate (i.e., using the same number of bits No for each frame) or at a variable rate (in which different bit rates are used for different types of frame contents). Variable-rate coders attempt to use only the number of bits needed to encode the coder parameters to a level adequate to obtain the target quality. An example variable-rate CELP coder is described in U.S. Patent No. 5,414,796, which is assigned to the assignee of the present invention and fully incorporated herein by reference.
诸如CELP编码器之类的时域编码器一般依赖于每帧的较高的位数No以保持时域语音波形的精确度。如果每帧的位数No相对较大(例如，8kbps或以上)，则这种编码器一般传送优良的话音质量。然而，在低位速率处(4kbps或以下)，由于有限的可用位数，时域编码器就不能保持高质量和稳固的性能。在低位速率处，有限的代码簿空间限制传统时域编码器的波形匹配能力，而传统时域编码器已在较高速率的商业应用中得到成功使用。Time-domain coders such as the CELP coder typically rely upon a high number of bits No per frame to preserve the accuracy of the time-domain speech waveform. Such coders typically deliver excellent voice quality provided the number of bits No per frame is relatively large (e.g., 8 kbps or above). However, at low bit rates (4 kbps and below), time-domain coders fail to retain high quality and robust performance due to the limited number of available bits. At low bit rates, the limited codebook space clips the waveform-matching capability of conventional time-domain coders, which are so successfully deployed in higher-rate commercial applications.
当前，对于开发在中到低位速率(即，在2.4到4kbps和以下的范围中)工作的高质量语音编码器存在强烈研究兴趣以及商业需求。应用范围包括无线电话、卫星通信、互联网电话、各种多媒体以及语音流应用、语音邮件以及其它语音存储系统。其驱动力是在数据分组丢失情况下对高性能需求和对稳固性的要求。各种近来语音编码标准化努力是推进低速率语音编码算法的研究和开发的另一个直接驱动力。低速率语音编码器能在每个允许应用的带宽上创建更多信道或用户，并且与合适信道编码的附加层耦合的低速率语音编码器可以适合于编码器规格的总位预算，以及在信道差错情况下传送稳固性能。There is currently strong research interest and commercial need to develop high-quality speech coders operating at medium to low bit rates (i.e., in the range of 2.4 to 4 kbps and below). The application areas include wireless telephony, satellite communications, Internet telephony, various multimedia and voice-streaming applications, voice mail, and other voice storage systems. The driving forces are the demands for high performance and for robustness in the face of packet loss. Various recent speech-coding standardization efforts are another direct driving force propelling the research and development of low-rate speech coding algorithms. A low-rate speech coder creates more channels, or users, per allowable application bandwidth, and a low-rate speech coder coupled with an additional layer of suitable channel coding can fit the overall bit budget of the coder specifications and deliver robust performance under channel error conditions.
对于按较低位速率的编码，已经开发了各种频谱或频域语音编码的方法，其中，将语音信号作为频谱的时间—变化演变来分析。例如，见R.J.McAulay和T.F.Quatieri的在"语音编码和合成中的正弦编码"，第4章(W.B.Kleijn和K.K.Paliwal编辑，1995)。在频谱编码器中，目标是用一组频谱参数来模仿或预测每个语音输入帧的短期语音频谱，而不是精确地模拟时间—变化语音波形。然后，对频谱参数编码，并用经解码的参数建立语音的输出帧。所产生的合成语音与原始输入语音波形不匹配，但是提供了相似的察觉质量。本技术领域中众知的频域编码器的例子包括多频带激励编码器(MBE)、正弦变换编码器(STC)以及谐波编码器(HC)。这种频域编码器提供具有简洁参数组的高质量参数模型，可以用在低位速率处用较少的可用位数进行精确地量化。For coding at lower bit rates, various methods of spectral, or frequency-domain, speech coding have been developed, in which the speech signal is analyzed as a time-varying evolution of spectra. See, e.g., R.J. McAulay & T.F. Quatieri, Sinusoidal Coding, in Speech Coding and Synthesis ch. 4 (W.B. Kleijn & K.K. Paliwal eds., 1995). In spectral coders, the objective is to model, or predict, the short-term speech spectrum of each input frame of speech with a set of spectral parameters, rather than to precisely mimic the time-varying speech waveform. The spectral parameters are then encoded, and an output frame of speech is created with the decoded parameters. The resulting synthesized speech does not match the original input speech waveform, but offers similar perceptual quality. Examples of frequency-domain coders well known in the art include multiband excitation coders (MBEs), sinusoidal transform coders (STCs), and harmonic coders (HCs). Such frequency-domain coders offer a high-quality parametric model having a compact set of parameters that can be accurately quantized with the low number of bits available at low bit rates.
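To make the harmonic-coding idea concrete, the sketch below synthesizes a frame as a sum of sinusoids at multiples of a pitch frequency, with per-harmonic amplitudes and phases. The function name and its arguments are illustrative only; they are not the interface of any coder described in this patent.

```python
import math

def synthesize_harmonics(f0_hz, amplitudes, phases, n_samples, fs_hz=8000):
    """Toy harmonic synthesis: sum of sinusoids at multiples of the pitch
    frequency f0 with per-harmonic amplitudes and phases (illustrative)."""
    out = []
    for n in range(n_samples):
        t = n / fs_hz
        out.append(sum(a * math.cos(2 * math.pi * f0_hz * (k + 1) * t + p)
                       for k, (a, p) in enumerate(zip(amplitudes, phases))))
    return out

# One 20 ms frame (160 samples at 8 kHz) of a 200 Hz fundamental plus one
# overtone, both with zero initial phase.
frame = synthesize_harmonics(200.0, [1.0, 0.5], [0.0, 0.0], 160)
print(len(frame), round(frame[0], 3))   # 160 1.5
```

Note that the initial phases must come from somewhere: as the next paragraph explains, low-rate coders that do not transmit them must generate them artificially, which is the problem this invention addresses.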
但是，低位速率编码对有限编码分辨率或有限代码簿空间强加了苛刻的限制，这就限制了单个编码机构的有效性，使得编码器在各种背景条件下不能用同等精度来表示各种类型的语音分段。例如，传统低位速率频域编码器不发送语音帧的相位信息。而是，通过使用随机的、人工产生的初始相位值和线性内插技术来重构相位信息。例如，见H.Yang等人在29 Electronic Letters 856-57(1993年5月)中的"在MBE模型中用于有声语音合成的二次相位内插法"。因为人工地产生相位信息，所以即使量化—去除量化处理完善地保留正弦波的振幅，但是频域编码器所产生的输出语音也不能与原始输入语音对准(即，主要脉冲不同步)。因此证明了在频域编码器中采用任何闭环性能测量法(例如，诸如信噪比(SNR)或感知SNR)是困难的。However, low-bit-rate coding imposes the severe constraint of a limited coding resolution, or a limited codebook space, which limits the effectiveness of a single coding mechanism, rendering the coder unable to represent the various types of speech segments under various background conditions with equal accuracy. For example, conventional low-bit-rate, frequency-domain coders do not transmit phase information for speech frames. Instead, the phase information is reconstructed by using a random, artificially generated initial phase value and linear interpolation techniques. See, e.g., H. Yang et al., Quadratic Phase Interpolation for Voiced Speech Synthesis in the MBE Model, in 29 Electronic Letters 856-57 (May 1993). Because the phase information is artificially generated, even if the amplitudes of the sinusoids are perfectly preserved by the quantization-dequantization process, the output speech produced by the frequency-domain coder will not be aligned with the original input speech (i.e., the major pulses will not be in sync). It has therefore proved difficult to adopt any closed-loop performance measure, such as, e.g., signal-to-noise ratio (SNR) or perceptual SNR, in frequency-domain coders.
已经使用多模式编码技术结合开环模式判定处理来执行低位速率语音编码。在Amitava Das等人的Multimode and Variable-Rate Coding of Speech, in Speech Coding and Synthesis ch.7(W.B.Kleijn和K.K.Paliwal编辑，1995)中描述了一种这样的多模式编码技术。传统的多模式编码器把不同的模式，或编码—解码算法，应用于不同类型的输入语音帧。定制每种模式或编码—解码处理以最有效的方式表示某种类型的语音段，例如，有声语音、无声语音或背景噪声(非语音)。外部的开环模式判定机构检查输入语音帧，并作出有关把哪个模式施加到该帧的判定。一般，通过从输入帧析取许多参数，按照某些临时的和频谱的特征对参数进行估计，以及根据估计以一种模式判定为基础来执行开环模式判定。因此在事先不知道输出语音的确切情况(即，在语音质量或其它性能测量方面，输出语音将和输入语音接近到什么程度)时作出模式判定。Multimode coding techniques have been employed to perform low-bit-rate speech coding in conjunction with an open-loop mode decision process. One such multimode coding technique is described in Amitava Das et al., Multimode and Variable-Rate Coding of Speech, in Speech Coding and Synthesis ch. 7 (W.B. Kleijn & K.K. Paliwal eds., 1995). Conventional multimode coders apply different modes, or encoding-decoding algorithms, to different types of input speech frames. Each mode, or encoding-decoding process, is customized to represent a certain type of speech segment, such as, e.g., voiced speech, unvoiced speech, or background noise (non-speech), in the most efficient manner. An external, open-loop mode decision mechanism examines the input speech frame and makes a decision regarding which mode to apply to the frame. Typically, the open-loop mode decision is performed by extracting a number of parameters from the input frame, evaluating the parameters as to certain temporal and spectral characteristics, and basing the mode decision upon the evaluation. The mode decision is thus made without knowing in advance the exact condition of the output speech, i.e., how close the output speech will be to the input speech in terms of voice quality or other performance measures.
根据上述，希望提供一种能更精确地估计相位信息的低位速率频域编码器。进一步最好是提供一种多模式、混合域编码器，根据帧的语音内容，对某些语音帧进行时域编码，而对其它语音帧进行频域编码。可以进一步希望提供一种混合域编码器，它可以根据闭环编码模式判定机构，对某些语音帧进行时域编码，而对其它语音帧进行频域编码。再又最好是提供一种闭环、多模式、混合域语音编码器，保证编码器产生的输出语音和输入到编码器的原始语音之间的时间同步。在此提出的有关申请中描述了这种语音编码器，所述申请题为"闭环多模式混合域线性预测(MDLP)语音编码器"，其已转让给本发明的受让人，并在此全部引用作为参考。In light of the above, it would be desirable to provide a low-bit-rate, frequency-domain coder that estimates phase information more accurately. It would further be desirable to provide a multimode, mixed-domain coder that encodes some speech frames in the time domain and other speech frames in the frequency domain in accordance with the speech content of the frames. It would still further be desirable to provide a mixed-domain coder that encodes some speech frames in the time domain and other speech frames in the frequency domain in accordance with a closed-loop coding mode decision mechanism. It would yet further be desirable to provide a closed-loop, multimode, mixed-domain speech coder that ensures time synchrony between the output speech produced by the coder and the original speech input to the coder. Such a speech coder is described in a related application filed herewith, entitled "Closed-Loop Multimode Mixed-Domain Linear Prediction (MDLP) Speech Coder," assigned to the assignee of the present invention and fully incorporated herein by reference.
还希望提供一种方法,确保编码器产生的输出语音和输入到编码器的原始语音之间的时间同步。因此需要一种精确地跟踪准周期性信号的相位的方法。It is also desirable to provide a method to ensure time synchronization between the output speech produced by the encoder and the original speech input to the encoder. Therefore, there is a need for a method of accurately tracking the phase of a quasi-periodic signal.
发明内容Contents of the invention
本发明针对一种精确地跟踪准周期性信号的相位的方法。因此，在本发明的一个方面，一种对信号相位跟踪的方法包括下列步骤：对于在信号是周期性期间的帧估计其信号相位；用闭环性能测量监测所估计相位的性能；对于在信号是周期性期间的帧，测量其信号的相位；当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；而当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位；以及对于在信号是非周期性期间的帧，测量其信号的相位的步骤。The present invention is directed to a method of accurately tracking the phase of a quasi-periodic signal. Accordingly, in one aspect of the invention, a method of tracking the phase of a signal includes the steps of: estimating the phase of the signal for frames during which the signal is periodic; monitoring the performance of the estimated phase with a closed-loop performance measure; measuring the phase of the signal for frames during which the signal is periodic; providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level; and measuring the phase of the signal for frames during which the signal is aperiodic.
在本发明的另一个方面，一种对信号是周期性期间的帧进行准周期相位跟踪的方法包括下列步骤：对于在信号是周期性期间的帧，估计其信号相位；用闭环性能测量监测所估计相位的性能；对于在信号是周期性期间的帧，测量其信号的相位；当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；以及当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位。In another aspect of the invention, a method of quasi-periodic phase tracking for frames during which the signal is periodic includes the steps of: estimating the phase of the signal for frames during which the signal is periodic; monitoring the performance of the estimated phase with a closed-loop performance measure; measuring the phase of the signal for frames during which the signal is periodic; providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; and providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level.
在本发明的又一个方面，一种对信号相位跟踪的装置包括：一估计装置，用于对在信号是周期性期间的帧，估计其信号的相位；一监测装置，用于用闭环性能测量来监测所估计相位的性能；一用于对在信号是周期性期间的帧，测量其信号的相位的测量装置；一第一执行装置，用于当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；一第二执行装置，当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位；以及一用于对在信号是非周期性期间的帧，测量其信号的相位的测量装置。In yet another aspect of the invention, an apparatus for tracking the phase of a signal includes: means for estimating the phase of the signal for frames during which the signal is periodic; means for monitoring the performance of the estimated phase with a closed-loop performance measure; means for measuring the phase of the signal for frames during which the signal is periodic; first means for providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; second means for providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level; and means for measuring the phase of the signal for frames during which the signal is aperiodic.
在本发明的再一个方面，一种对信号是周期性期间的帧进行准周期相位跟踪的装置包括：一估计装置，用于估计对于在信号是周期性期间帧的信号相位；一监测装置，用于用闭环性能测量来监测所估计相位的性能；一用于对在信号是周期性期间的帧，测量其信号的相位的测量装置；一第一执行装置，用于当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；以及一第二执行装置，当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位。In still another aspect of the invention, an apparatus for quasi-periodic phase tracking of frames during which the signal is periodic includes: means for estimating the phase of the signal for frames during which the signal is periodic; means for monitoring the performance of the estimated phase with a closed-loop performance measure; means for measuring the phase of the signal for frames during which the signal is periodic; first means for providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; and second means for providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level.
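The selection logic common to the four aspects above can be sketched as follows. The sketch reads the "performance" of the estimated phase as a closed-loop error measure, so that a value below the threshold means the estimate is acceptable; all names and the scalar representation of a phase are illustrative assumptions, not the patent's interface.

```python
def select_output_phase(frame_is_periodic, estimated_phase, measured_phase,
                        estimation_error, threshold):
    """Sketch of the claimed phase-tracking selection (illustrative names).

    Periodic frames: use the estimated phase while its closed-loop error
    stays below the predetermined threshold, otherwise fall back to the
    measured phase.  Aperiodic frames: always use the measured phase.
    """
    if not frame_is_periodic:
        return measured_phase
    if estimation_error < threshold:
        return estimated_phase
    return measured_phase

print(select_output_phase(True, 0.1, 0.3, 0.01, 0.05))   # 0.1 (estimate kept)
print(select_output_phase(True, 0.1, 0.3, 0.20, 0.05))   # 0.3 (fallback)
print(select_output_phase(False, 0.1, 0.3, 0.01, 0.05))  # 0.3 (aperiodic)
```

The point of this arrangement is that the cheap estimated phase is used whenever it tracks the signal well enough, and the measured phase is transmitted only when the closed-loop check says the estimate has drifted.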
附图说明Description of drawings
图1是在每个终端处通过语音编码器终止的通信信道的方框图。Figure 1 is a block diagram of a communication channel terminated by a speech coder at each terminal.
图2是可以在多模式、混合域线性预测(MDLP)语音编码器中使用的编码器的方框图。Figure 2 is a block diagram of an encoder that may be used in a multi-mode, mixed-domain linear prediction (MDLP) speech encoder.
图3是可以在多模式、混合域线性预测(MDLP)语音编码器中使用的解码器的方框图。Figure 3 is a block diagram of a decoder that may be used in a multi-mode, mixed-domain linear prediction (MDLP) speech coder.
图4是流程图,示出可以在图2的编码器中使用的MDLP编码器所执行的MDLP编码步骤。FIG. 4 is a flowchart showing the steps of MDLP encoding performed by an MDLP encoder that may be used in the encoder of FIG. 2 .
图5是流程图,示出语音编码判定过程。Fig. 5 is a flow chart showing a speech encoding decision process.
图6是闭环多模式MDLP语音编码器。Figure 6 is a closed-loop multi-mode MDLP speech coder.
图7是可以在图6的编码器或图2的编码器中使用的频谱编码器的方框图。FIG. 7 is a block diagram of a spectral encoder that may be used in the encoder of FIG. 6 or the encoder of FIG. 2 .
图8是振幅对频率的曲线图,示出在谐波编码器中的正弦波的振幅。Figure 8 is a graph of amplitude versus frequency showing the amplitude of a sine wave in a harmonic encoder.
图9是流程图,示出在多模式MDLP语音编码器中的模式判定处理。Fig. 9 is a flowchart showing mode decision processing in a multi-mode MDLP speech coder.
图10A是语音信号振幅对时间的图例,而图10B是线性预测(LP)残余振幅对时间的图例。FIG. 10A is a graph of speech signal amplitude versus time, and FIG. 10B is a graph of linear prediction (LP) residual amplitude versus time.
图11A是在闭环编码判定中的速率/模式对帧索引的曲线图;图11B是在闭环编码判定中的感知信噪比(PSNR)对帧索引的曲线图;以及图11C是不存在闭环编码判定时的速率/模式和PSNR两者对帧索引的曲线图。Figure 11A is a graph of rate/mode versus frame index in a closed-loop encoding decision; Figure 11B is a graph of perceptual signal-to-noise ratio (PSNR) versus frame index in a closed-loop encoding decision; and Figure 11C is a graph without closed-loop encoding Graph of both rate/mode and PSNR at decision time versus frame index.
图12是用于跟踪准周期性信号的相位的一种装置的方框图。Figure 12 is a block diagram of an apparatus for tracking the phase of a quasi-periodic signal.
具体实施方式 Detailed Description
在图1中，第一编码器10接收数字化语音采样s(n)，并对采样s(n)进行编码，用于在发送媒体12或通信信道12上发送到第一解码器14。解码器14对经编码语音采样进行解码，并合成输出语音信号sSYNTH(n)。对于在相反方向的发送，第二编码器16对数字化语音采样s(n)进行编码，该采样是在通信信道18上发送。第二解码器接收经编码语音采样，并对它进行解码，产生合成输出语音信号sSYNTH(n)。In FIG. 1, a first encoder 10 receives digitized speech samples s(n) and encodes the samples s(n) for transmission on a transmission medium 12, or communication channel 12, to a first decoder 14. The decoder 14 decodes the encoded speech samples and synthesizes an output speech signal sSYNTH(n). For transmission in the opposite direction, a second encoder 16 encodes digitized speech samples s(n), which are transmitted on a communication channel 18. A second decoder receives and decodes the encoded speech samples, generating a synthesized output speech signal sSYNTH(n).
语音采样s(n)表示已经根据本技术领域中众知的各种方法(例如，包括脉冲编码调制(PCM)、压缩扩展μ-律或A-律)中的任何一种进行数字化和量化的语音信号。如在本技术领域中众知，把语音采样s(n)组织成输入数据帧，其中，每个帧包括预定数目的数字语音采样s(n)。在示例实施例中，使用8kHz的采样率，每个20ms帧包括160个采样。在下面描述的实施例中，可以有利地以逐帧为基础改变数据传输率，从8kbps(全速率)到4kbps(半速率)到2kbps(四分之一速率)到1kbps(八分之一速率)。另一方面，可以使用其它数据速率。如这里所使用，术语"全速率"或"高速率"一般是指大于或等于8kbps的数据速率，而术语"半速率"或"低速率"一般是指小于或等于4kbps的数据速率。改变数据传输率是有利的，因为对于包括相对较少语音信息的帧，可以有选择地使用较低位速率。熟悉本技术领域的人员会理解，可以使用其它采样率、帧大小以及数据传输率。The speech samples s(n) represent a speech signal that has been digitized and quantized in accordance with any of various methods known in the art including, e.g., pulse code modulation (PCM), companded μ-law, or A-law. As known in the art, the speech samples s(n) are organized into frames of input data, wherein each frame comprises a predetermined number of digitized speech samples s(n). In an exemplary embodiment, a sampling rate of 8 kHz is employed, with each 20 ms frame comprising 160 samples. In the embodiments described below, the rate of data transmission may advantageously be varied on a frame-to-frame basis from 8 kbps (full rate) to 4 kbps (half rate) to 2 kbps (quarter rate) to 1 kbps (eighth rate). Alternatively, other data rates may be used. As used herein, the terms "full rate" or "high rate" generally refer to data rates that are greater than or equal to 8 kbps, and the terms "half rate" or "low rate" generally refer to data rates that are less than or equal to 4 kbps. Varying the data transmission rate is beneficial because lower bit rates may be selectively employed for frames containing relatively less speech information. Those skilled in the art will appreciate that other sampling rates, frame sizes, and data transmission rates may be used.
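The frame and rate figures above imply the following per-frame sample and bit counts; this is simple arithmetic, shown only for concreteness:

```python
# Samples and bits per frame for the rates mentioned above
# (8 kHz sampling, 20 ms frames).
sample_rate_hz = 8000
frame_ms = 20
samples_per_frame = sample_rate_hz * frame_ms // 1000   # 160 samples

rates_bps = {"full": 8000, "half": 4000, "quarter": 2000, "eighth": 1000}
bits_per_frame = {name: bps * frame_ms // 1000 for name, bps in rates_bps.items()}
print(samples_per_frame, bits_per_frame)
# 160 {'full': 160, 'half': 80, 'quarter': 40, 'eighth': 20}
```

So a full-rate packet carries 160 bits for a 160-sample frame, while an eighth-rate background-noise packet carries only 20 bits, which is why rate selection per frame saves so much bandwidth.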
第一编码器10和第二解码器20一起构成第一语音编码器或语音编码器。相似地，第二编码器16和第一解码器14一起构成第二语音编码器。熟悉本技术领域的人员会理解，可以用数字信号处理器(DSP)、专用集成电路(ASIC)、分立门逻辑、固件或任何传统的可编程软件模块以及微处理器来实现语音编码器。软件模块可驻留在RAM存储器、闪存储器、寄存器或在本技术领域中众知的任何其它形式的可写入媒体中。另一方面，任何传统的处理器、控制器或状态机可以取代微处理器。在美国专利第5,727,123号中(该专利已转让给本发明的受让人，并在此全部引用作为参考)，以及在1994年2月16日提出的题为"声码器ASIC"的美国专利申请第08/197,417号中(该专利已转让给本发明的受让人，并在此全部引用作为参考)描述特别为语音编码而设计的示例ASIC。The first encoder 10 and the second decoder 20 together constitute a first speech coder. Similarly, the second encoder 16 and the first decoder 14 together constitute a second speech coder. Those skilled in the art will understand that speech coders may be implemented with a digital signal processor (DSP), an application-specific integrated circuit (ASIC), discrete gate logic, firmware, or any conventional programmable software module and a microprocessor. The software module could reside in RAM memory, flash memory, registers, or any other form of writable storage medium known in the art. Alternatively, any conventional processor, controller, or state machine could be substituted for the microprocessor. Example ASICs designed specifically for speech coding are described in U.S. Patent No. 5,727,123, assigned to the assignee of the present invention and fully incorporated herein by reference, and in U.S. Patent Application No. 08/197,417, entitled "Vocoder ASIC," filed February 16, 1994, assigned to the assignee of the present invention and fully incorporated herein by reference.
如在图2中描绘，根据一个实施例，可以在语音编码器中使用的多模式混合域线性预测(MDLP)编码器100包括模式判定模块102、间距估计模块104、线性预测(LP)分析模块106、LP分析滤波器108、LP量化模块110以及MDLP剩余编码器112。把输入语音帧s(n)提供给模式判定模块102、间距估计模块104、LP分析模块106、以及LP分析滤波器108。模式判定模块102根据每个输入语音帧s(n)的周期性和诸如能量、频谱倾角、过零速率等其它析取参数产生模式索引IM和模式M。在1997年3月11日提出的，题为"用于执行降低速率的可变速率语音编码的方法和设备"的美国申请序列号08/815,354中描述根据周期性对语音帧进行分类的各种方法，该申请已转让给本发明的受让人，并在此全部引用作为参考。这些方法包括在电信工业协会临时标准TIA/EIA IS-127和TIA/EIA IS-733中。As depicted in FIG. 2, according to one embodiment, a multimode mixed-domain linear prediction (MDLP) encoder 100 that may be used in a speech coder includes a mode decision module 102, a pitch estimation module 104, a linear prediction (LP) analysis module 106, an LP analysis filter 108, an LP quantization module 110, and an MDLP residual encoder 112. An input speech frame s(n) is provided to the mode decision module 102, the pitch estimation module 104, the LP analysis module 106, and the LP analysis filter 108. The mode decision module 102 produces a mode index IM and a mode M based upon the periodicity of each input speech frame s(n) and other extracted parameters such as energy, spectral tilt, and zero-crossing rate. Various methods of classifying speech frames according to periodicity are described in U.S. Application Serial No. 08/815,354, entitled "Method and Apparatus for Performing Reduced Rate Variable Rate Speech Coding," filed March 11, 1997, assigned to the assignee of the present invention and fully incorporated herein by reference. Such methods are also incorporated into the Telecommunications Industry Association Interim Standards TIA/EIA IS-127 and TIA/EIA IS-733.
间距估计模块104根据每个输入语音帧s(n)产生间距索引Ip和滞后值Po。LP分析模块106在每个输入语音帧s(n)上执行线性预测分析，以产生LP参数a。把LP参数a提供给LP量化模块110。LP量化模块110还接收模式M，从而以与模式有关的方式执行量化处理。LP量化模块110产生LP索引ILP和量化的LP参数。LP分析滤波器108除了接收输入语音帧s(n)之外还接收量化的LP参数。LP分析滤波器108产生LP剩余信号R[n]，它表示输入语音帧s(n)和根据量化的线性预测LP参数重构的语音之间的误差。把LP剩余信号R[n]、模式M、以及量化的LP参数提供给MDLP剩余编码器112。根据下面参考图4的流程图描述的步骤，MDLP剩余编码器112依据这些值产生剩余索引IR和量化的剩余信号R̂[n]。The pitch estimation module 104 produces a pitch index Ip and a lag value Po based upon each input speech frame s(n). The LP analysis module 106 performs linear predictive analysis on each input speech frame s(n) to generate LP parameters a. The LP parameters a are provided to the LP quantization module 110. The LP quantization module 110 also receives the mode M, thereby performing the quantization process in a mode-dependent manner. The LP quantization module 110 produces an LP index ILP and the quantized LP parameters. In addition to the input speech frame s(n), the LP analysis filter 108 receives the quantized LP parameters. The LP analysis filter 108 generates an LP residual signal R[n], which represents the error between the input speech frame s(n) and the speech reconstructed from the quantized LP parameters. The LP residual signal R[n], the mode M, and the quantized LP parameters are provided to the MDLP residual encoder 112. Based upon these values, the MDLP residual encoder 112 produces a residual index IR and a quantized residual signal R̂[n] in accordance with the steps described below with reference to the flowchart of FIG. 4.

在图3中，在语音编码器中使用的解码器200包括LP参数解码模块202、剩余解码模块204、模式解码模块206以及LP合成滤波器208。模式解码模块206对模式索引IM进行接收和解码，从其产生模式M。LP参数解码模块202接收模式M以及LP索引ILP。LP参数解码模块202对所接收的值进行解码，以产生量化的LP参数。剩余解码模块204接收剩余索引IR、间距索引Ip和模式索引IM。剩余解码模块204对所接收的值进行解码，以产生量化的剩余信号R̂[n]。把量化的剩余信号R̂[n]和LP参数提供给LP合成滤波器208，它从中合成经解码的输出语音信号ŝ[n]。In FIG. 3, a decoder 200 that may be used in a speech coder includes an LP parameter decoding module 202, a residual decoding module 204, a mode decoding module 206, and an LP synthesis filter 208. The mode decoding module 206 receives and decodes the mode index IM, generating therefrom the mode M. The LP parameter decoding module 202 receives the mode M and the LP index ILP. The LP parameter decoding module 202 decodes the received values to produce the quantized LP parameters. The residual decoding module 204 receives the residual index IR, the pitch index Ip, and the mode index IM. The residual decoding module 204 decodes the received values to generate the quantized residual signal R̂[n]. The quantized residual signal R̂[n] and the LP parameters are then provided to the LP synthesis filter 208, which synthesizes the decoded output speech signal ŝ[n] from them.
除了MDLP剩余编码器112之外，在本技术领域中众知图2的编码器100和图3的解码器200的各种模块的操作和实施，并在上述美国专利第5,414,796号和L.B.Rabiner和R.W.Schafer的"语音信号的数字处理"396-453(1978)中有描述。With the exception of the MDLP residual encoder 112, the operation and implementation of the various modules of the encoder 100 of FIG. 2 and the decoder 200 of FIG. 3 are well known in the art and are described in the aforementioned U.S. Patent No. 5,414,796 and in L.B. Rabiner & R.W. Schafer, Digital Processing of Speech Signals 396-453 (1978).
根据一个实施例,MDLP编码器(未示出)执行在图4的流程图中示出的步骤。MDLP编码器可以是图2的MDLP剩余编码器112。在步骤300中,MDLP编码器检查模式M是全速率(FR)、还是四分之一速率(QR)或八分之一速率(ER)。如果模式M是FR、QR或ER,则MDLP编码器转到步骤302。在步骤302中,MDLP编码器把相应的速率(FR、QR或ER——根据M的值)施加于剩余索引IR。把对于FR模式是高精度、高速率编码,并且可能有利地是CELP编码的时域编码施加于LP剩余帧,或另一方面施加于语音帧。然后发送帧(在包括数-模转换和调制的进一步信号处理之后)。在一个实施例中,帧是表示预测误差的LP剩余帧。在另一个实施例中,帧是表示语音采样的语音帧。According to one embodiment, an MDLP encoder (not shown) performs the steps shown in the flowchart of FIG. 4 . The MDLP encoder may be the MDLP
另一方面,如果在步骤300中,模式M不是FR、QR或ER,(即,如果模式M是半速率(HR)),则MDLP编码器转到步骤304。在步骤304中,把频谱编码(较有利的是谐波编码)以半速率施加于LP剩余,或施加于语音信号。然后MDLP编码器转到步骤306。在步骤306中,通过对经编码语音进行解码并将其与原始输入帧进行比较来得到失真测量值D。然后MDLP编码器转到步骤308。在步骤308中,失真测量值D与预定阈值T进行比较。如果失真测量值D大于阈值T,则把半速率的、频谱编码的帧的相应量化参数进行调制并发送。另一方面,如果失真测量值D不大于阈值T,则MDLP编码器转到步骤310。在步骤310中,在时域中以全速率对经解码的帧进行重新编码。可以使用任何传统高速率高精度编码算法,诸如,最好使用CELP编码。然后调制和发送与该帧相关联的FR模式量化的参数。On the other hand, if in step 300 mode M is not FR, QR or ER, (ie, if mode M is half rate (HR)), then the MDLP encoder goes to step 304 . In step 304, spectral coding (preferably harmonic coding) is applied to the LP residue at half rate, or to the speech signal. The MDLP encoder then goes to step 306. In step 306, a distortion measure D is obtained by decoding the encoded speech and comparing it with the original input frame. The MDLP encoder then goes to step 308. In step 308 the distortion measure D is compared with a predetermined threshold T. If the distortion measure D is greater than the threshold T, the corresponding quantization parameter of the half-rate, spectrally encoded frame is modulated and transmitted. On the other hand, if the distortion measure D is not greater than the threshold T, the MDLP encoder goes to step 310 . In step 310, the decoded frames are re-encoded in the time domain at full rate. Any conventional high rate high precision coding algorithm may be used, such as preferably CELP coding. The parameters quantized for the FR mode associated with that frame are then modulated and transmitted.
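The decision flow of steps 300-310 can be sketched as follows. The callables passed in are illustrative stand-ins for the actual encoders, and the scalar closed-loop measure D is assumed to be a quality score (e.g., a perceptual SNR) so that the sketch reproduces the text's branch structure: D greater than T transmits the half-rate spectral parameters, otherwise the frame is re-encoded at full rate.

```python
def mdlp_encode_residual(mode, time_encode, spectral_encode,
                         closed_loop_measure, T):
    """Sketch of the FIG. 4 decision flow; all callables are illustrative.

    FR/QR/ER frames are time-domain encoded at the corresponding rate.
    HR frames are spectrally encoded at half rate, decoded, and compared
    with the original frame to obtain a closed-loop measure D; D > T
    keeps the half-rate spectral packet, otherwise the frame is
    re-encoded in the time domain at full rate (e.g. with CELP).
    """
    if mode in ("FR", "QR", "ER"):
        return time_encode(mode)
    packet = spectral_encode("HR")
    D = closed_loop_measure(packet)
    return packet if D > T else time_encode("FR")

# Toy stubs to exercise the control flow.
time_encode = lambda rate: ("time", rate)
spectral_encode = lambda rate: ("spectral", rate)
print(mdlp_encode_residual("QR", time_encode, spectral_encode, lambda p: 10.0, 5.0))
# ('time', 'QR')
print(mdlp_encode_residual("HR", time_encode, spectral_encode, lambda p: 10.0, 5.0))
# ('spectral', 'HR')
print(mdlp_encode_residual("HR", time_encode, spectral_encode, lambda p: 1.0, 5.0))
# ('time', 'FR')
```

The design point is that the cheap half-rate spectral model is attempted first, and the expensive full-rate time-domain encoder is used only when the closed-loop check says the spectral model failed on that frame.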
如在图5的流程图中示出,根据一个实施例,闭环多模式MDLP语音编码器在处理用于发送的语音采样中遵循一组步骤。在步骤400中,语音编码器接收在连续帧中的语音信号的数字采样。语音编码器根据所接收的给定帧而转到步骤402。在步骤402中,语音编码器检测帧的能量。该能量是帧的语音活动的量度。通过对数字语音采样的振幅平方求和并把结果能量值与阈值进行比较来执行语音检测。在一个实施例中,阈值根据背景噪声的变化电平进行自适应。在上述美国专利第5,414,796号中描述一种示范可变阈值语音活动检测器。某些无声语音声音可以是极低能量的采样,这就可能会将其错误地作为背景噪声进行编码。为了防止这种情况发生,可以使用低能量采样的频谱倾角,以从背景噪声中区分无声语音,如在上述美国专利第5,414,796号中描述。As shown in the flowchart of FIG. 5, a closed-loop multi-mode MDLP vocoder follows a set of steps in processing speech samples for transmission, according to one embodiment. In
在检测帧的能量之后，语音编码器转到步骤404。在步骤404中，语音编码器判定所检测的帧能量是否足以把帧分类作为包含语音信息。如果所检测的帧能量降低到预定阈电平之下，则语音编码器转到步骤406。在步骤406中，语音编码器将帧作为背景噪声(即，无声或静音)编码。在一个实施例中，以1/8速率，或1kbps对背景噪声进行时域编码。如果在步骤404中，所检测的帧能量符合或超过预定阈电平，则把帧分类为语音帧，并且语音编码器转到步骤408。After detecting the energy of the frame, the speech coder proceeds to step 404. In step 404, the speech coder determines whether the detected frame energy is sufficient to classify the frame as containing speech information. If the detected frame energy falls below a predetermined threshold level, the speech coder proceeds to step 406. In step 406, the speech coder encodes the frame as background noise (i.e., non-speech, or silence). In one embodiment, background noise is time-domain encoded at 1/8 rate, or 1 kbps. If in step 404 the detected frame energy meets or exceeds the predetermined threshold level, the frame is classified as a speech frame and the speech coder proceeds to step 408.
在步骤408中，语音编码器确定该帧是否为周期性的。周期性判定的各种已知方法包括，例如，使用过零点和使用归一化的自相关函数(NACF)。尤其，在1997年3月11日提出的，题为"用于执行降低速率的可变速率话音编码的方法和设备"的美国专利申请第08/815,354号中描述使用过零点和NACF来检测周期性，该申请已转让给本发明的受让人，并在此全部引用作为参考。此外，把用于区分有声语音和无声语音的上述方法包括在电信工业协会工业临时标准TIA/EIA IS-127和TIA/EIA IS-733中。如果在步骤408中没有判定帧是周期性的，则语音编码器转到步骤410。在步骤410中，语音编码器把帧作为无声语音编码。在一个实施例中，以1/4速率，或2kbps，对无声语音帧进行时域编码。如果在步骤408中确定该帧是周期性的，则语音编码器转到步骤412。In step 408, the speech coder determines whether the frame is periodic. Various known methods of periodicity determination include, e.g., the use of zero crossings and the use of normalized autocorrelation functions (NACFs). In particular, using zero crossings and NACFs to detect periodicity is described in U.S. Patent Application No. 08/815,354, entitled "Method and Apparatus for Performing Reduced Rate Variable Rate Vocoding," filed March 11, 1997, assigned to the assignee of the present invention and fully incorporated herein by reference. In addition, the above methods for distinguishing voiced speech from unvoiced speech are incorporated into the Telecommunications Industry Association Interim Standards TIA/EIA IS-127 and TIA/EIA IS-733. If the frame is not determined to be periodic in step 408, the speech coder proceeds to step 410. In step 410, the speech coder encodes the frame as unvoiced speech. In one embodiment, unvoiced speech frames are time-domain encoded at 1/4 rate, or 2 kbps. If the frame is determined to be periodic in step 408, the speech coder proceeds to step 412.
在步骤412中，语音编码器使用本技术领域中众知的周期性检测方法(如在上述美国申请序列号08/815,354中所描述)来确定帧是否有足够的周期性。如果没有判定该帧有足够的周期性，则语音编码器转到步骤414。在步骤414中，将帧作为过渡语音(即，从无声语音到有声语音的过渡)进行时域编码。在一个实施例中，以全速率，或8kbps，对过渡语音帧进行时域编码。In step 412, the speech coder determines whether the frame exhibits sufficient periodicity using periodicity detection methods known in the art (as described in the aforementioned U.S. Application Serial No. 08/815,354). If the frame is not determined to exhibit sufficient periodicity, the speech coder proceeds to step 414. In step 414, the frame is time-domain encoded as transition speech (i.e., a transition from unvoiced speech to voiced speech). In one embodiment, transition speech frames are time-domain encoded at full rate, or 8 kbps.
如果在步骤412中，语音编码器确定帧有足够的周期性，则语音编码器转到步骤416。在步骤416中，语音编码器把帧作为有声语音编码。在一个实施例中，以半速率，或4kbps，对有声语音帧进行频谱编码。较有利地，用谐波编码器对有声语音帧进行频谱编码，如下参考图7所述。另一方面，可以使用其它频谱编码器，例如，正弦变换编码器或多频带激励编码器，如在本技术领域中所众知。然后，语音编码器转到步骤418。在步骤418中，语音编码器对经编码的有声语音帧进行解码。然后，语音编码器转到步骤420。在步骤420中，把经解码的有声语音帧与对应于该帧的输入语音采样进行比较，以得到合成语音失真的测量值，并判定半速率有声语音频谱编码模型是否在可接受的限度范围内工作。然后，语音编码器转到步骤422。If, in step 412, the speech coder determines that the frame exhibits sufficient periodicity, the speech coder proceeds to step 416. In step 416, the speech coder encodes the frame as voiced speech. In one embodiment, voiced speech frames are spectrally encoded at half rate, or 4 kbps. Advantageously, the voiced speech frames are spectrally encoded with a harmonic coder, as described below with reference to FIG. 7. Alternatively, other spectral coders could be used, such as, e.g., sinusoidal transform coders or multiband excitation coders, as known in the art. The speech coder then proceeds to step 418. In step 418, the speech coder decodes the encoded voiced speech frame. The speech coder then proceeds to step 420. In step 420, the decoded voiced speech frame is compared with the input speech samples corresponding to that frame to obtain a measure of synthesized speech distortion and to determine whether the half-rate, voiced-speech, spectral-coding model is operating within acceptable limits. The speech coder then proceeds to step 422.
在步骤422中,语音编码器判定经解码的有声语音帧和对应于该帧的输入语音采样之间的误差是否降低到预定阈值之下。根据一个实施例,以下面参考图6描述的方式来作出这个判定。如果编码失真降低到预定阈值之下,则语音编码器转到步骤426。在步骤426中,语音编码器使用步骤416的参数将该帧作为有声语音发送。如果在步骤422中,编码失真符合或超过预定阈值,则语音编码器转到步骤414,对在步骤400中接收到的数字语音采样的帧作为过渡语音以全速率进行时域编码。In
应该指出,步骤400-410包括开环编码判定模式。另一方面,步骤412-426包括闭环编码判定模式。It should be noted that steps 400-410 comprise an open-loop encoding decision mode. Steps 412-426, on the other hand, include a closed-loop encoding decision mode.
In one embodiment, shown in FIG. 6, a closed-loop multimode MDLP speech coder includes an analog-to-digital converter (A/D) 500 coupled to a frame buffer 502, which in turn is coupled to a control processor 504. An energy calculator 506, a voiced speech detector 508, a background noise encoder 510, a high-rate time-domain encoder 512, and a low-rate spectral encoder 514 are coupled to the control processor 504. A spectral decoder 516 is coupled to the spectral encoder 514, and an error calculator 518 is coupled to the spectral decoder 516 and to the control processor 504. A threshold comparator 520 is coupled to the error calculator 518 and to the control processor 504. A buffer 522 is coupled to the spectral encoder 514, the spectral decoder 516, and the threshold comparator 520.
In the embodiment of FIG. 6, the speech coder components are preferably implemented as firmware or other software-driven modules in the speech coder, which itself advantageously resides in a DSP or an ASIC. Those skilled in the art would understand that the speech coder components could equally well be implemented in a number of other known ways. The control processor 504 may advantageously be a microprocessor, but could alternatively be implemented with a controller, a state machine, or discrete logic.
In the multimode coder of FIG. 6, a speech signal is provided to the A/D 500. The A/D 500 converts the analog signal to frames of digital speech samples, S(n). The digital speech samples are provided to the frame buffer 502. The control processor 504 takes the digital speech samples from the frame buffer 502 and provides them to the energy calculator 506. The energy calculator 506 computes the energy, E, of the speech samples according to the following equation:

E = Σ S²(n),  n = 1, 2, …, 160,
where the frames are 20 ms long and the sampling rate is 8 kHz, so that each frame holds 160 samples. The computed energy, E, is sent back to the control processor 504.
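The energy computation and the voice activity decision of steps 402-406 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the fixed threshold are ours (a real detector adapts the threshold to the background-noise level, as noted above).

```python
import numpy as np

def frame_energy(samples):
    """Frame energy as the sum of squared sample amplitudes.

    For 20 ms frames at an 8 kHz sampling rate, each frame holds
    160 samples. Names here are illustrative, not from the patent.
    """
    samples = np.asarray(samples, dtype=float)
    return float(np.sum(samples ** 2))

# A voiced-like frame vs. a near-silent one, against an assumed threshold.
rng = np.random.default_rng(0)
speech = 0.5 * np.sin(2 * np.pi * 200 / 8000 * np.arange(160))
noise = 0.001 * rng.standard_normal(160)

threshold = 1.0  # placeholder; a real detector adapts this to the noise floor
print(frame_energy(speech) >= threshold)  # True  -> classify as speech
print(frame_energy(noise) >= threshold)   # False -> encode as background noise
```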
The control processor 504 compares the computed speech energy with a voice activity threshold. If the computed energy is below the voice activity threshold, the control processor 504 directs the digital speech samples from the frame buffer 502 to the background noise encoder 510. The background noise encoder 510 encodes the frame using the minimum number of bits necessary to preserve an estimate of the background noise.
If the computed energy is greater than or equal to the voice activity threshold, the control processor 504 directs the digital speech samples from the frame buffer 502 to the voiced speech detector 508. The voiced speech detector 508 determines whether the periodicity of the speech frame would allow efficient coding with low-bit-rate spectral encoding. Methods of determining the level of periodicity in a speech frame are well known in the art and include, e.g., the use of normalized autocorrelation functions (NACFs) and zero crossings. These methods and others are described in the aforementioned U.S. Application Serial No. 08/815,354.
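A textbook form of the NACF periodicity check can be sketched as below. This is a generic formulation for illustration only, not the exact detector of the referenced application; a voicing decision compares the peak NACF over candidate pitch lags against a threshold.

```python
import numpy as np

def nacf(x, lag):
    """Normalized autocorrelation of frame x at a given lag.

    Values near 1 indicate strong periodicity at that lag.
    """
    x = np.asarray(x, dtype=float)
    a, b = x[:-lag], x[lag:]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

n = np.arange(320)
voiced = np.sin(2 * np.pi * n / 40)                      # period of 40 samples
unvoiced = np.random.default_rng(1).standard_normal(320)  # noise-like frame

print(nacf(voiced, 40) > 0.9)    # True: highly periodic at the pitch lag
print(nacf(unvoiced, 40) > 0.9)  # False for a noise-like frame
```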
The voiced speech detector 508 provides a signal to the control processor 504 indicating whether the speech frame contains speech of sufficient periodicity to be efficiently encoded by the spectral encoder 514. If the voiced speech detector 508 determines that the speech frame lacks sufficient periodicity, the control processor 504 directs the digital speech samples to the high-rate encoder 512, which time-domain encodes the speech at a predetermined maximum data rate. In one embodiment, the predetermined maximum data rate is 8 kbps, and the high-rate encoder 512 is a CELP encoder.
If the voiced speech detector 508 initially determines that the speech signal is sufficiently periodic to be efficiently encoded by the spectral encoder 514, the control processor 504 directs the digital speech samples from the frame buffer 502 to the spectral encoder 514. An exemplary spectral encoder is described in detail below with reference to FIG. 7.
The spectral encoder 514 extracts the estimated pitch frequency, F0, the amplitudes, AI, of the harmonics of the pitch frequency, and voicing information, Vc. The spectral encoder 514 provides these parameters to the buffer 522 and to the spectral decoder 516. The spectral decoder 516 may advantageously be analogous to the encoder's internal decoder in conventional CELP encoders. The spectral decoder 516 generates synthesized speech samples in accordance with a spectral decoding format (described below with reference to FIG. 7),
and provides the synthesized speech samples to the error calculator 518. The control processor 504 sends the speech samples, S(n), to the error calculator 518.
The error calculator 518 computes the mean squared error (MSE) between each speech sample, S(n), and each corresponding synthesized speech sample, SSYNTH(n), according to the following equation:

MSE = Σ (S(n) − SSYNTH(n))²,  n = 1, 2, …, N.
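The error measure above, and the acceptance check that follows, can be sketched in a few lines. The function name, plain-Python form, and the placeholder threshold are illustrative, not from the patent.

```python
def mse(original, synthesized):
    """Sum of squared differences between input and synthesized
    samples over one frame, matching the error measure above."""
    return sum((s - ss) ** 2 for s, ss in zip(original, synthesized))

frame = [0.1, 0.4, -0.2, 0.3]
good = [0.1, 0.39, -0.21, 0.3]   # close reconstruction
bad = [0.0, 0.0, 0.0, 0.0]       # poor reconstruction

threshold = 0.01  # placeholder acceptance threshold
print(mse(frame, good) < threshold)  # True  -> keep spectral coding
print(mse(frame, bad) < threshold)   # False -> fall back to full-rate CELP
```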
The computed MSE is provided to the threshold comparator 520, which determines whether the level of distortion is within acceptable bounds, i.e., whether the level of distortion falls below a predetermined threshold.
If the computed MSE is within acceptable bounds, the threshold comparator 520 provides a signal to the buffer 522, and the spectrally encoded data is output from the speech coder. If, on the other hand, the MSE is not within acceptable bounds, the threshold comparator 520 provides a signal to the control processor 504, which in turn directs the digital samples from the frame buffer 502 to the high-rate time-domain encoder 512. The time-domain encoder 512 encodes the frame at the predetermined maximum rate, and the contents of the buffer 522 are discarded.
In the embodiment of FIG. 6, the type of spectral coding employed is harmonic coding, described below with reference to FIG. 7, but could alternatively be any type of spectral coding, such as, e.g., sinusoidal transform coding or multiband excitation coding. The use of multiband excitation coding is described in, e.g., U.S. Patent No. 5,195,166, and the use of sinusoidal transform coding is described in, e.g., U.S. Patent No. 4,865,068.
For transition frames, and for voiced frames for which the phase distortion threshold is at or below the periodicity parameter, the multimode coder of FIG. 6 advantageously employs CELP coding at full rate, or 8 kbps, by means of the high-rate time-domain encoder 512. Alternatively, any other known form of high-rate time-domain coding could be used for such frames. Hence, transition frames (and voiced frames that are not sufficiently periodic) are encoded with high precision so that the waveforms at the input and the output match well, because the phase information is well preserved. In one embodiment, after a predetermined number of consecutive voiced frames for which the threshold exceeds the periodicity measure has been processed, the multimode coder switches from half-rate spectral coding to full-rate CELP coding for one frame, regardless of the decision of the threshold comparator 520.
It should be noted that the energy calculator 506 and the voiced speech detector 508, together with the control processor 504, constitute an open-loop encoding decision. In contrast, the spectral encoder 514, the spectral decoder 516, the error calculator 518, the threshold comparator 520, and the buffer 522, together with the control processor 504, constitute a closed-loop encoding decision.
In one embodiment, described with reference to FIG. 7, sufficiently periodic voiced frames are encoded at a low bit rate using spectral coding, preferably harmonic coding. Spectral coders are generally defined as algorithms that attempt to preserve the time evolution of the spectral characteristics of speech in a perceptually meaningful way by modeling and encoding each speech frame in the frequency domain. The essential parts of such algorithms are: (1) spectral analysis, or parameter estimation; (2) parameter quantization; and (3) synthesis of the output speech waveform with the decoded parameters. The objective is thus to preserve the important characteristics of the short-term speech spectrum with a set of spectral parameters, encode the parameters, and then synthesize the output speech using the decoded spectral parameters. Typically, the output speech is synthesized as a weighted sum of sinusoids. The amplitudes, frequencies, and phases of the sinusoids are the spectral parameters estimated during the analysis.
"Analysis by synthesis" is a well-known technique in CELP coding, but the technique is not exploited in spectral coding. The primary reason analysis by synthesis is not applied to spectral coders is that, owing to the loss of the initial phase information, the mean squared error (MSE) of the synthesized speech may be high even though the speech model is performing well from a perceptual standpoint. Hence, a further advantage of correctly generating the initial phase is the resulting ability to compare the speech samples directly with the reconstructed speech, allowing a determination of whether the speech model is encoding the speech frames accurately.
In spectral coding, an output speech frame is synthesized as follows:
S[n] = Sv[n] + Suv[n],  n = 1, 2, …, N,
where N is the number of samples per frame, and Sv and Suv are the voiced and unvoiced components, respectively. A sum-of-sinusoids synthesis process creates the voiced component as follows:

Sv[n] = Σ A(k, n)·cos(2πfk·n + θ(k, n)),  k = 1, 2, …, L,
where L is the total number of sinusoids, fk are the frequencies of interest in the short-term spectrum, A(k, n) are the sinusoid amplitudes, and θ(k, n) are the sinusoid phases. The amplitude, frequency, and phase parameters are estimated from the short-term spectrum of the input frame by a spectral analysis process. The unvoiced component can be created together with the voiced part in a single sum-of-sinusoids synthesis, or it can be computed separately by a dedicated unvoiced-synthesis process and then added back to Sv.
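The sum-of-sinusoids synthesis of the voiced component can be sketched as follows. For simplicity, this sketch holds A(k, n) and θ(k, n) constant over the frame, whereas the coder interpolates them across the frame as described below; the function and parameter names are ours, and the harmonic frequencies fk = k·F0 correspond to the voiced-frame case.

```python
import numpy as np

def synthesize_voiced(amps, phases, f0, n_samples, fs=8000):
    """Sum-of-sinusoids synthesis of the voiced component Sv[n].

    Harmonic k has frequency k*f0; amplitude and phase are held
    constant per harmonic here (a simplification of A(k,n), theta(k,n)).
    """
    n = np.arange(n_samples)
    sv = np.zeros(n_samples)
    for k, (a, th) in enumerate(zip(amps, phases), start=1):
        sv += a * np.cos(2 * np.pi * k * f0 / fs * n + th)
    return sv

# Two harmonics of a 200 Hz pitch over one 20 ms frame (160 samples).
out = synthesize_voiced([1.0, 0.5], [0.0, 0.0], f0=200, n_samples=160)
print(out.shape)  # (160,)
```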
In the embodiment of FIG. 7, a particular type of spectral coder called a harmonic coder is used to spectrally encode sufficiently periodic voiced frames at a low bit rate. Harmonic coders characterize a frame as a sum of sinusoids, analyzing small segments of the frame. Each sinusoid in the sum of sinusoids has a frequency that is an integer multiple of the pitch, F0, of the frame. In alternative embodiments in which the particular type of spectral coder used is not a harmonic coder, the sinusoid frequencies for each frame are taken from a set of real numbers between 0 and 2π. In the embodiment of FIG. 7, the amplitude and phase of each sinusoid in the sum are advantageously chosen so that the sum best matches the signal over one period, as illustrated in the graph of FIG. 8. Harmonic coders typically employ an external classification, labeling each input speech frame as voiced or unvoiced. For a voiced frame, the sinusoid frequencies are restricted to harmonics of the estimated pitch (F0), i.e., fk = kF0. For unvoiced speech, the peaks of the short-term spectrum are used to determine the sinusoids. The amplitudes and phases are interpolated to mimic their evolution over the frame, as follows:
A(k, n) = C1(k)*n + C2(k)
θ(k, n) = B1(k)*n² + B2(k)*n + B3(k)
where the coefficients [Ci(k), Bi(k)] are estimated from the instantaneous values of the amplitudes, frequencies, and phases at the particular frequency locations fk (= kF0) of the short-term Fourier transform (STFT) of the windowed input speech frame. The parameters to be transmitted for each sinusoid are the amplitude and the frequency. The phase is not transmitted; instead, it is modeled in accordance with any of several known techniques, including, e.g., the quadratic phase model or any conventional polynomial representation of phase.
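The two interpolation tracks above can be written directly as small helper functions. The names are illustrative; note that at n = 0 the quadratic phase track reduces to the initial-phase coefficient B3(k), which is the quantity the phase estimator described later must supply.

```python
def amplitude(n, c1, c2):
    """Linear amplitude track: A(k,n) = C1(k)*n + C2(k)."""
    return c1 * n + c2

def phase(n, b1, b2, b3):
    """Quadratic phase track: theta(k,n) = B1(k)*n^2 + B2(k)*n + B3(k).

    B3(k) is the initial phase of the current frame.
    """
    return b1 * n * n + b2 * n + b3

# At n = 0 the phase reduces to the initial-phase coefficient B3.
print(phase(0, b1=1e-6, b2=0.05, b3=0.7))  # 0.7
print(amplitude(10, c1=0.01, c2=1.0))      # 1.1
```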
As illustrated in FIG. 7, the harmonic coder includes a pitch extractor 600 coupled to windowing logic 602 and to discrete Fourier transform (DFT) and harmonic analysis logic 604. The pitch extractor 600, which receives the speech samples, S(n), as input, is also coupled to the DFT and harmonic analysis logic 604. The DFT and harmonic analysis logic 604 is coupled to a residual encoder 606. The pitch extractor 600, the DFT and harmonic analysis logic 604, and the residual encoder 606 are each coupled to a parameter quantizer 608. The parameter quantizer 608 is coupled to a channel encoder 610, which in turn is coupled to a transmitter 612. The transmitter 612 is coupled by a standard radio-frequency (RF) interface, such as, e.g., a code division multiple access (CDMA) over-the-air interface, to a receiver 614. The receiver 614 is coupled to a channel decoder 616, which in turn is coupled to a dequantizer 618. The dequantizer 618 is coupled to a sum-of-sinusoids speech synthesizer 620. Also coupled to the sum-of-sinusoids speech synthesizer 620 is a phase estimator 622, which receives previous-frame information as input. The sum-of-sinusoids speech synthesizer 620 is configured to produce a synthesized speech output, SSYNTH(n).
The pitch extractor 600, the windowing logic 602, the DFT and harmonic analysis logic 604, the residual encoder 606, the parameter quantizer 608, the channel encoder 610, the channel decoder 616, the dequantizer 618, the sum-of-sinusoids speech synthesizer 620, and the phase estimator 622 can be implemented in a variety of ways known to those of skill in the art, including, e.g., as firmware or software modules. The transmitter 612 and the receiver 614 can be implemented with any equivalent standard RF components known to those of skill in the art.
In the harmonic coder of FIG. 7, the input samples, S(n), are received by the pitch extractor 600, which extracts the pitch frequency information, F0. The samples are then multiplied by a suitable windowing function by the windowing logic 602 to allow analysis of small segments of a speech frame. Using the pitch information provided by the pitch extractor 600, the DFT and harmonic analysis logic 604 computes the DFT of the samples to generate complex spectral points from which the harmonic amplitudes, AI, are extracted, as illustrated in the graph of FIG. 8, in which L denotes the total number of harmonics. The DFT is provided to the residual encoder 606, which extracts the voicing information, Vc.
It should be noted that, as shown in FIG. 8, the Vc parameter denotes the point on the frequency axis above which the spectrum is characteristic of an unvoiced speech signal and is no longer harmonic. Below the point Vc, in contrast, the spectrum is harmonic and characteristic of voiced speech.
The AI, F0, and Vc components are provided to the parameter quantizer 608, which quantizes the information. The quantized information is provided in the form of packets to the channel encoder 610, which encodes the packets at a low bit rate, such as, e.g., half rate, or 4 kbps. The packets are provided to the transmitter 612, which modulates the packets and transmits the resulting signal over the air to the receiver 614. The receiver 614 receives and demodulates the signal, passing the encoded packets to the channel decoder 616. The channel decoder 616 decodes the packets and provides the decoded packets to the dequantizer 618. The dequantizer 618 dequantizes the information. The information is provided to the sum-of-sinusoids speech synthesizer 620.
The sum-of-sinusoids speech synthesizer 620 is configured to synthesize a plurality of sinusoids modeling the short-term speech spectrum in accordance with the equation for S[n] above. The frequencies of the sinusoids, fk, are multiples, or harmonics, of the fundamental frequency, F0, which is the frequency of the pitch periodicity of the quasi-periodic (i.e., transitory) voiced speech segment.
The sum-of-sinusoids speech synthesizer 620 also receives phase information from the phase estimator 622. The phase estimator 622 receives previous-frame information, i.e., the AI, F0, and Vc parameters of the immediately preceding frame. The phase estimator 622 also receives the N reconstructed samples of the previous frame, where N is the frame length (i.e., N is the number of samples per frame). The phase estimator 622 determines the initial phase for the frame based on the information of the previous frame. The initial phase determination is provided to the sum-of-sinusoids speech synthesizer 620. Based on the current-frame information, and on the initial-phase calculation that the phase estimator 622 performs from the past-frame information, the sum-of-sinusoids speech synthesizer 620 produces a synthesized speech frame, as described above.
As described above, a harmonic coder synthesizes, or reconstructs, a speech frame using previous-frame information and predicting that the phase varies linearly from frame to frame. In the synthesis model described above, commonly referred to as the quadratic phase model, the coefficient B3(k) represents the initial phase of the current voiced frame being synthesized. In determining the phase, conventional harmonic coders set the initial phase to zero, or generate an initial phase value randomly or with some pseudo-random generation method. To predict the phase more accurately, the phase estimator 622 uses one of two possible methods of determining the initial phase, depending on whether the immediately preceding frame was determined to be a voiced speech frame (i.e., a sufficiently periodic frame) or a transition speech frame. If the previous frame was a voiced speech frame, the final estimated phase value of that frame is used as the initial phase value of the current frame. If, on the other hand, the previous frame was classified as a transition frame, the initial phase value of the current frame is obtained from the spectrum of the previous frame, which is obtained by performing a DFT of the decoder output for the previous frame. Hence, the phase estimator 622 makes use of accurate phase information that is already available (because the previous frame, being a transition frame, was processed at full rate).
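The two-way initial-phase choice can be sketched as follows. This is an illustrative sketch under our own naming: `'V'` stands for a previous spectrally coded voiced frame and `'T'` for a previous full-rate transition frame; the pitch-synchronous DFT variant mentioned later is omitted, and a whole-frame DFT with the pitch falling exactly on bin `f0_bin` is assumed.

```python
import numpy as np

def initial_phases(prev_mode, prev_final_phases, prev_decoded=None,
                   n_harmonics=None, f0_bin=None):
    """Per-harmonic initial phases for the current voiced frame.

    prev_mode 'V': reuse the final estimated phases of the previous
    spectrally coded frame. prev_mode 'T': measure the actual harmonic
    phases from a DFT of the previous frame's decoded output.
    """
    if prev_mode == 'V':
        return list(prev_final_phases)
    spectrum = np.fft.rfft(prev_decoded)
    return [float(np.angle(spectrum[k * f0_bin]))
            for k in range(1, n_harmonics + 1)]

# Previous frame was full rate (T mode): derive the phase from its DFT.
n = np.arange(160)
prev = np.cos(2 * np.pi * 8 * n / 160 + 0.3)  # harmonic at bin 8, phase 0.3
ph = initial_phases('T', None, prev_decoded=prev, n_harmonics=1, f0_bin=8)
print(abs(ph[0] - 0.3) < 1e-6)  # True: the measured phase is recovered
```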
In one embodiment, a closed-loop multimode MDLP speech coder follows the speech-processing steps depicted in the flowchart of FIG. 9. The speech coder encodes the LP residue of each input speech frame by choosing the most appropriate encoding mode. Some modes encode the LP residue, or the speech residue, in the time domain, while others represent the LP residue, or the speech residue, in the frequency domain. The set of modes is: full-rate time domain for transition frames (T mode); half-rate frequency domain for voiced frames (V mode); quarter-rate time domain for unvoiced frames (U mode); and eighth-rate time domain for noise frames (N mode).
Those skilled in the art would appreciate that either the speech signal or the corresponding LP residue may be encoded by following the steps shown in FIG. 9. The waveform characteristics of noise, unvoiced, transition, and voiced speech can be seen as a function of time in the graph of FIG. 10A. The waveform characteristics of noise, unvoiced, transition, and voiced LP residue can be seen as a function of time in the graph of FIG. 10B.
In step 700, an open-loop mode decision is made as to which of the four modes (T, V, U, or N) to apply to the input speech residue, S(n). If T mode is to be applied, the speech residue is processed in T mode, i.e., at full rate in the time domain, in step 702. If U mode is to be applied, the speech residue is processed in U mode, i.e., at quarter rate in the time domain, in step 704. If N mode is to be applied, the speech residue is processed in N mode, i.e., at eighth rate in the time domain, in step 706. If V mode is to be applied, the speech residue is processed in V mode, i.e., at half rate in the frequency domain, in step 708.
In step 710, the speech encoded in step 708 is decoded and compared with the input speech residue, S(n), and a performance measure, D, is computed. In step 712, the performance measure, D, is compared with a predefined threshold, T. If the performance measure, D, is greater than or equal to the threshold, T, the spectrally encoded speech residue of step 708 is approved for transmission in step 714. If, on the other hand, the performance measure, D, is less than the threshold, T, the input speech residue, S(n), is processed in T mode in step 716. In an alternative embodiment, no performance measure is computed, and no threshold is defined; instead, after a predefined number of speech residue frames has been processed in V mode, the next frame is processed in T mode.
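The closed-loop check of steps 710-716 can be sketched as a small decision function. The function and variable names are ours, and the numeric values are placeholders; the point is only the comparison logic: keep V mode when the performance measure D meets the threshold T, otherwise upgrade the frame to full-rate time-domain coding.

```python
def choose_mode(D, T, open_loop_mode='V'):
    """Closed-loop check after an open-loop mode decision.

    Only V-mode (half-rate spectral) frames are re-examined: the
    frame stays in V mode if the performance measure D meets the
    threshold T, and is upgraded to T mode (full rate) otherwise.
    """
    if open_loop_mode != 'V':
        return open_loop_mode
    return 'V' if D >= T else 'T'

print(choose_mode(D=35.0, T=30.0))  # 'V': spectral coding performs well enough
print(choose_mode(D=22.0, T=30.0))  # 'T': upgrade the frame to full rate
```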
Advantageously, the decision steps shown in FIG. 9 allow the high-bit-rate T mode to be used only when needed, exploiting the periodicity of voiced speech segments with the lower-bit-rate V mode, while preventing any degradation of quality by switching to full rate when the V mode does not perform adequately. Accordingly, an extremely high voice quality approaching that of full rate may be generated at an average rate significantly lower than full rate. Moreover, the target voice quality can be controlled by the performance measure selected and the threshold chosen.
The "updates" to T mode also improve the performance of subsequent applications of V mode by keeping the model phase track close to the phase track of the input speech. When V-mode performance is inadequate, the closed-loop performance checks of steps 710 and 712 switch to T mode, which improves the performance of subsequent V-mode processing by "refreshing" the initial phase value, allowing the model phase track to become close to the original input speech phase track once again. By way of example, as shown in the graphs of FIGS. 11A-C, the fifth frame from the beginning does not perform adequately in V mode, as evidenced by the PSNR distortion measure used. Consequently, without the closed-loop decision and update, the modeled phase track deviates significantly from the original input speech phase track, resulting in a severe degradation of PSNR, as shown in FIG. 11C. Moreover, the performance of subsequent frames processed in V mode degrades. With the closed-loop decision, however, the fifth frame is switched to T-mode processing, as shown in FIG. 11A. The performance of the fifth frame is significantly improved by the update, as is evident from the improvement in PSNR shown in FIG. 11B. The performance of the subsequent frames processed in V mode is also improved.
The decision steps shown in FIG. 9 also enhance the quality of the V-mode representation by providing a highly accurate initial phase estimate, which ensures that the resulting V-mode synthesized speech residue signal is accurately time-aligned with the original input speech residue, S(n). The initial phase for the first V-mode-processed speech residue segment is derived from the immediately preceding decoded frame in the following manner. For each harmonic, if the previous frame was processed in V mode, the initial phase is set equal to the final estimated phase of the previous frame. For each harmonic, if the previous frame was processed in T mode, the initial phase is set equal to the actual harmonic phase of the previous frame. The actual harmonic phases of the previous frame may be derived by taking a DFT of the past decoded residue using the entire previous frame. Alternatively, the actual harmonic phases of the previous frame may be derived by taking a DFT of the past decoded frame in a pitch-synchronous manner, processing various pitch periods of the previous frame.
In one embodiment, described with reference to FIG. 12, successive frames of a quasi-periodic signal, S, are input to analysis logic 800. The quasi-periodic signal, S, may be, e.g., a speech signal. Some frames of the signal are periodic, while other frames are nonperiodic, or aperiodic. The analysis logic 800 measures the amplitude of the signal and outputs the measured amplitude, A. The analysis logic 800 also measures the phase of the signal and outputs the measured phase, P. The amplitude, A, is provided to synthesis logic 802. A phase value, POUT, is also provided to the synthesis logic 802. The phase value, POUT, may be the measured phase value, P, or it may be an estimated phase value, PEST, as described below. The synthesis logic 802 synthesizes the signal and outputs a synthesized signal, SSYNTH.
The quasi-periodic signal, S, is also provided to classification logic 804, which classifies the signal as either nonperiodic or periodic. For nonperiodic frames of the signal, the phase, POUT, provided to the synthesis logic 802 is set equal to the measured phase, P. Periodic frames of the signal are provided to closed-loop phase estimation logic 806. The quasi-periodic signal, S, is also provided to the closed-loop phase estimation logic 806. The closed-loop phase estimation logic 806 estimates the phase and outputs an estimated phase, PEST. The phase estimate is based upon an initial phase value, PINIT, which is input to the closed-loop phase estimation logic 806. If the previous frame was classified as periodic by the classification logic 804, the initial phase value is the final estimated phase value for the previous frame of the signal. If the previous frame was classified as nonperiodic by the classification logic 804, the initial phase value is the measured phase value, P, for the previous frame.
The estimated phase, PEST, is provided to error computation logic 808. The quasi-periodic signal, S, is also provided to the error computation logic 808, as is the measured phase, P. The error computation logic 808 additionally receives a synthesized signal, SSYNTH', that has been synthesized by the synthesis logic 802. The synthesized signal, SSYNTH', is the signal, SSYNTH, that the synthesis logic 802 synthesizes when the phase, POUT, input to the synthesis logic 802 equals the estimated phase, PEST. The error computation logic 808 computes a distortion measure, or error measure, E, by comparing the measured phase values with the estimated phase values. In an alternative embodiment, the error computation logic 808 computes the distortion measure, or error measure, E, by comparing the input frame of the quasi-periodic signal with the synthesized frame of the quasi-periodic signal.
The distortion measure, E, is provided to comparison logic 810, which compares the distortion measure, E, with a predefined threshold, T. If the distortion measure, E, is greater than the predefined threshold, T, the measured phase, P, is set equal to the phase value, POUT, provided to the synthesis logic 802. If, on the other hand, the distortion measure, E, is not greater than the predefined threshold, T, the estimated phase, PEST, is set equal to the phase value, POUT, provided to the synthesis logic 802.
Thus, a novel method and apparatus for tracking the phase of a quasi-periodic signal have been described. Those of skill in the art would understand that the various illustrative logical blocks and algorithm steps described in connection with the embodiments disclosed herein may be implemented or performed with a digital signal processor (DSP), an application-specific integrated circuit (ASIC), discrete gate or transistor logic, discrete hardware components such as, e.g., registers and a FIFO, a processor executing a set of firmware instructions, or any conventional programmable software module and a processor. The processor may advantageously be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. The software module could reside in RAM memory, flash memory, registers, or any other form of writable storage medium known in the art. Those of skill would further appreciate that the data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description are advantageously represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Preferred embodiments of the present invention have thus been shown and described. It would be apparent to one of ordinary skill in the art, however, that numerous alterations may be made to the embodiments herein disclosed without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited except in accordance with the following claims.
Claims (18)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2000/005141 WO2002003381A1 (en) | 2000-02-29 | 2000-02-29 | Method and apparatus for tracking the phase of a quasi-periodic signal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1437746A CN1437746A (en) | 2003-08-20 |
| CN1262991C true CN1262991C (en) | 2006-07-05 |
Family
ID=21741099
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB008192006A Expired - Lifetime CN1262991C (en) | 2000-02-29 | 2000-02-29 | Method and apparatus for tracking the phase of a quasi-periodic signal |
Country Status (7)
| Country | Link |
|---|---|
| EP (1) | EP1259955B1 (en) |
| JP (1) | JP4567289B2 (en) |
| KR (1) | KR100711040B1 (en) |
| CN (1) | CN1262991C (en) |
| AU (1) | AU2000233852A1 (en) |
| DE (1) | DE60025471T2 (en) |
| WO (1) | WO2002003381A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103811011A (en) * | 2012-11-02 | 2014-05-21 | 富士通株式会社 | Audio sine wave detection method and device |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104347082B (en) * | 2013-07-24 | 2017-10-24 | 富士通株式会社 | String ripple frame detection method and equipment and audio coding method and equipment |
| EP2963646A1 (en) | 2014-07-01 | 2016-01-06 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Decoder and method for decoding an audio signal, encoder and method for encoding an audio signal |
| CN108776319B (en) * | 2018-04-25 | 2022-11-08 | 中国电力科学研究院有限公司 | Optical fiber current transformer data accuracy self-diagnosis method and system |
| CN109917360A (en) * | 2019-03-01 | 2019-06-21 | 吉林大学 | A Staggered PRI Estimation Method for Aliased Pulses |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2759646B2 (en) * | 1985-03-18 | 1998-05-28 | マサチユ−セツツ インステイテユ−ト オブ テクノロジ− | Sound waveform processing |
| CA1332982C (en) * | 1987-04-02 | 1994-11-08 | Robert J. Mcauley | Coding of acoustic waveforms |
| US5023910A (en) * | 1988-04-08 | 1991-06-11 | At&T Bell Laboratories | Vector quantization in a harmonic speech coding arrangement |
| JPH02288739A (en) * | 1989-04-28 | 1990-11-28 | Fujitsu Ltd | Voice coding and decoding transmission system |
| US5765127A (en) * | 1992-03-18 | 1998-06-09 | Sony Corp | High efficiency encoding method |
| US5787387A (en) * | 1994-07-11 | 1998-07-28 | Voxware, Inc. | Harmonic adaptive speech coding method and system |
| JP3680374B2 (en) * | 1995-09-28 | 2005-08-10 | ソニー株式会社 | Speech synthesis method |
| JPH10214100A (en) * | 1997-01-31 | 1998-08-11 | Sony Corp | Voice synthesizing method |
| US6233550B1 (en) * | 1997-08-29 | 2001-05-15 | The Regents Of The University Of California | Method and apparatus for hybrid coding of speech at 4kbps |
| JPH11224099A (en) * | 1998-02-06 | 1999-08-17 | Sony Corp | Phase quantization apparatus and method |
- 2000-02-29 DE DE60025471T patent/DE60025471T2/en not_active Expired - Lifetime
- 2000-02-29 WO PCT/US2000/005141 patent/WO2002003381A1/en not_active Ceased
- 2000-02-29 KR KR1020027011075A patent/KR100711040B1/en not_active Expired - Lifetime
- 2000-02-29 AU AU2000233852A patent/AU2000233852A1/en not_active Abandoned
- 2000-02-29 JP JP2002507369A patent/JP4567289B2/en not_active Expired - Lifetime
- 2000-02-29 EP EP00912054A patent/EP1259955B1/en not_active Expired - Lifetime
- 2000-02-29 CN CNB008192006A patent/CN1262991C/en not_active Expired - Lifetime
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103811011A (en) * | 2012-11-02 | 2014-05-21 | 富士通株式会社 | Audio sine wave detection method and device |
Also Published As
| Publication number | Publication date |
|---|---|
| JP4567289B2 (en) | 2010-10-20 |
| KR20020081352A (en) | 2002-10-26 |
| EP1259955B1 (en) | 2006-01-11 |
| CN1437746A (en) | 2003-08-20 |
| EP1259955A1 (en) | 2002-11-27 |
| JP2004502203A (en) | 2004-01-22 |
| DE60025471D1 (en) | 2006-04-06 |
| AU2000233852A1 (en) | 2002-01-14 |
| DE60025471T2 (en) | 2006-08-24 |
| HK1055834A1 (en) | 2004-01-21 |
| WO2002003381A1 (en) | 2002-01-10 |
| KR100711040B1 (en) | 2007-04-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN1266674C (en) | Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder | |
| CN100350453C (en) | Robust speech classification method and device | |
| US6640209B1 (en) | Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder | |
| CN1302459C (en) | A low-bit-rate coding method and apparatus for unvoiced speed | |
| CN1188832C (en) | Multipulse interpolative coding of transition speech frames | |
| US6449592B1 (en) | Method and apparatus for tracking the phase of a quasi-periodic signal | |
| CN1262991C (en) | Method and apparatus for tracking the phase of a quasi-periodic signal | |
| HK1055834B (en) | Method and apparatus for tracking the phase of a quasi-periodic signal | |
| JP2011090311A (en) | Linear prediction voice coder in mixed domain of multimode of closed loop | |
| HK1055833B (en) | Closed-loop multimode mixed-domain linear prediction speech coder and method of processing frames | |
| HK1067444B (en) | Method and apparatus for robust speech classification | |
| HK1055174A1 (en) | Frame erasure compensation method in a variable rate speech coder and apparautus using the same |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CX01 | Expiry of patent term |
Granted publication date: 20060705 |
|
| CX01 | Expiry of patent term |