CN1262991C - Method and apparatus for tracking the phase of a quasi-periodic signal - Google Patents
Method and apparatus for tracking the phase of a quasi-periodic signal
- Publication number
- CN1262991C CNB008192006A CN00819200A
- Authority
- CN
- China
- Prior art keywords
- phase
- signal
- speech
- frame
- periodic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
Description
发明背景Background of the invention
发明领域field of invention
本发明一般与语音处理领域有关,尤其,与用于跟踪准周期性信号的相位的方法和设备有关。The present invention relates generally to the field of speech processing and, more particularly, to a method and apparatus for tracking the phase of a quasi-periodic signal.
背景background
通过数字技术的语音传输已经变得普及,特别是在远距离和数字无线电话应用中。而这又使得人们对确定可以通过信道发送最小信息量而同时保持再现语音的察觉质量方面产生兴趣。如果通过简单的采样和数字化发送语音,则需要大约为每秒64千比特(kbps)的数据速率,才能达到传统模拟电话的语音质量。然而,通过使用语音分析,接着通过合适的编码、发送和在接收机处再合成,可以使数据速率大大地降低。Voice transmission via digital technology has become popular, especially in long-distance and digital wireless telephony applications. This in turn has led to interest in determining the minimum amount of information that can be sent over the channel while maintaining the perceived quality of the reproduced speech. If voice is sent by simple sampling and digitization, a data rate of approximately 64 kilobits per second (kbps) is required to achieve the voice quality of traditional analog telephony. However, the data rate can be reduced considerably by using speech analysis followed by appropriate encoding, transmission and resynthesis at the receiver.
把使用通过析取与人类语音生成的模型有关的参数来压缩语音的技术的装置称为语音编码器。语音编码器把来话语音信号分成时间块或分析帧。语音编码器一般包括编码器和解码器。编码器分析来话语音帧，以析取某些有关参数，然后把参数量化成二进制表示，即，位组或二进制数据分组。通过通信信道把数据分组发送到接收机和解码器。解码器处理数据分组，使它们去量化以产生参数，并使用经去量化的参数重新合成语音帧。A device that employs techniques to compress speech by extracting parameters related to a model of human speech generation is called a speech coder. The speech coder divides the incoming speech signal into blocks of time, or analysis frames. Speech coders typically comprise an encoder and a decoder. The encoder analyzes the incoming speech frame to extract certain relevant parameters and then quantizes the parameters into a binary representation, i.e., a set of bits or a binary data packet. The data packets are transmitted over a communication channel to a receiver and a decoder. The decoder processes the data packets, dequantizes them to produce the parameters, and resynthesizes the speech frames using the dequantized parameters.
语音编码器的功能是通过除去在语音中固有的所有自然冗余把数字化语音信号压缩成低位速率信号。通过用一组参数来表示输入语音帧，以及使用量化，以一组位来表示参数来实现数字压缩。如果输入语音帧具有位数Ni，而语音编码器产生的数据分组具有位数No，则通过语音编码器得到的压缩率是Cr=Ni/No。而富有挑战性的是在保持经解码语音的高话音质量的情况下同时实现目标压缩率。语音编码器的性能取决于(1)语音模型、或上述分析和合成处理结合执行得好坏，以及(2)在每帧No位的目标位速率处，参数量化处理执行得好坏。因此，语音模型的目标是针对每帧用较小的参数组来捕获语音信号或目标话音质量的要素。The function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing all of the natural redundancies inherent in speech. The digital compression is achieved by representing the input speech frame with a set of parameters and employing quantization to represent the parameters with a set of bits. If the input speech frame has a number of bits Ni and the data packet produced by the speech coder has a number of bits No, the compression factor achieved by the speech coder is Cr = Ni/No. The challenge is to retain high voice quality of the decoded speech while achieving the target compression factor. The performance of a speech coder depends on (1) how well the speech model, or the combination of the analysis and synthesis process described above, performs, and (2) how well the parameter quantization process is performed at the target bit rate of No bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
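As a concrete illustration of the compression factor Cr = Ni/No (the bit counts below are hypothetical, chosen only for arithmetic; they are not taken from the patent):

```python
# Hypothetical illustration of the compression factor Cr = Ni / No.
samples_per_frame = 160        # 20 ms frame at 8 kHz sampling
bits_per_sample = 16           # 16-bit PCM input (illustrative)
Ni = samples_per_frame * bits_per_sample   # 2560 bits in the input frame
No = 80                        # e.g. an 80-bit packet from a 4 kbps coder
Cr = Ni / No
print(Cr)                      # 32.0
```

With these numbers the coder represents each frame with 1/32 of the original bits, which is why parameter quality at the target bit rate dominates performance.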
语音编码器可以作为时域编码器实现，所述时域编码器试图使用高时间分辨率处理来捕获时域语音波形，以每次对语音小段(一般是5毫秒(ms)的子帧)进行编码。对于每个子帧，借助在技术领域中众知的各种搜索算法，从代码簿空间可找到高精确度表示。另一方面，语音编码器可以作为频域编码器实现，频域编码器试图用一组参数捕获输入语音帧的短期语音频谱(分析)，并使用相应的合成处理从频谱参数重新创建语音波形。根据A.Gersho和R.M.Gray的"矢量量化和信号压缩(1992)"中描述的已知量化技术，参数量化器通过用存储的码矢量表示来表示参数，从而保留了参数。Speech coders may be implemented as time-domain coders, which attempt to capture the time-domain speech waveform by employing high time-resolution processing to encode small segments of speech (typically 5 millisecond (ms) subframes) at a time. For each subframe, a high-precision representative is found from a codebook space by means of various search algorithms known in the art. Alternatively, speech coders may be implemented as frequency-domain coders, which attempt to capture the short-term speech spectrum of the input speech frame with a set of parameters (analysis) and employ a corresponding synthesis process to recreate the speech waveform from the spectral parameters. The parameter quantizer preserves the parameters by representing them with stored representations of code vectors in accordance with known quantization techniques described in A. Gersho & R.M. Gray, Vector Quantization and Signal Compression (1992).
众知的时域语音编码器是代码激励的线性预测(CELP)编码器，在L.B.Rabiner和R.W.Schafer的"语音信号的数字处理"396-453(1978)中描述所述代码激励的线性预测编码器，在此全部引用作为参考。在CELP编码器中，通过线性预测(LP)分析除去语音信号中的短期相关，或冗余，所述线性预测分析发现短期共振峰滤波系数。把短期预测滤波施加到来话语音帧产生一个LP残余信号，用长期预测滤波参数和后续随机代码簿进一步使该剩余信号模型化和量化。这样，CELP编码使编码时域语音波形的任务分成对LP短期滤波系数编码和对LP剩余编码的单独编码任务。可以按固定速率(即，对于每个帧使用相同的位数No)执行时域编码，或按可变速率(其中，对于不同类型的帧内容使用不同的位速率)执行时域编码。可变速率编码器试图只使用需要的位数量，使对编码器的参数编码达到足够得到目标质量的水平。在美国专利第5,414,796号中描述一种示例可变速率CELP编码器，该专利已转让给本发明的受让人，并在此全部引用作为参考。A well-known time-domain speech coder is the Code Excited Linear Predictive (CELP) coder described in L.B. Rabiner & R.W. Schafer, Digital Processing of Speech Signals 396-453 (1978), which is fully incorporated herein by reference. In a CELP coder, the short-term correlations, or redundancies, in the speech signal are removed by a linear prediction (LP) analysis, which finds the coefficients of a short-term formant filter. Applying the short-term prediction filter to the incoming speech frame generates an LP residual signal, which is further modeled and quantized with long-term prediction filter parameters and a subsequent stochastic codebook. Thus, CELP coding divides the task of encoding the time-domain speech waveform into the separate tasks of encoding the LP short-term filter coefficients and encoding the LP residue. Time-domain coding can be performed at a fixed rate (i.e., using the same number of bits No for each frame) or at a variable rate (in which different bit rates are used for different types of frame contents). Variable-rate coders attempt to use only the number of bits needed to encode the coder parameters to a level adequate to obtain the target quality. An example variable-rate CELP coder is described in U.S. Patent No. 5,414,796, which is assigned to the assignee of the present invention and fully incorporated herein by reference.
诸如CELP编码器之类的时域编码器一般依赖于每帧的较高的位数No以保持时域语音波形的精确度。如果每帧的位数No相对较大(例如，8kbps或以上)，则这种编码器一般传送优良的话音质量。然而，在低位速率处(4kbps或以下)，由于有限的可用位数，时域编码器就不能保持高质量和稳固的性能。在低位速率处，有限的代码簿空间限制传统时域编码器的波形匹配能力，而传统时域编码器已在较高速率的商业应用中得到成功使用。Time-domain coders such as the CELP coder typically rely upon a high number of bits No per frame to preserve the accuracy of the time-domain speech waveform. Such coders typically deliver excellent voice quality provided the number of bits No per frame is relatively large (e.g., 8 kbps or above). However, at low bit rates (4 kbps and below), time-domain coders fail to retain high quality and robust performance due to the limited number of available bits. At low bit rates, the limited codebook space clips the waveform-matching capability of conventional time-domain coders, which are so successfully deployed in higher-rate commercial applications.
当前，对于开发在中到低位速率(即，在2.4到4kbps和以下的范围中)工作的高质量语音编码器存在强烈研究兴趣以及商业需求。应用范围包括无线电话、卫星通信、互联网电话、各种多媒体以及语音流应用、语音邮件以及其它语音存储系统。其驱动力是在数据分组丢失情况下对高性能需求和对稳固性的要求。各种近来语音编码标准化努力是推进低速率语音编码算法的研究和开发的另一个直接驱动力。低速率语音编码器能在每个允许应用的带宽上创建更多信道或用户，并且与合适信道编码的附加层耦合的低速率语音编码器可以适合于编码器规格的总位预算，以及在信道差错情况下传送稳固性能。There is currently strong research interest and commercial need to develop high-quality speech coders operating at medium to low bit rates (i.e., in the range of 2.4 to 4 kbps and below). The application areas include wireless telephony, satellite communications, Internet telephony, various multimedia and voice-streaming applications, voice mail, and other voice storage systems. The driving forces are the demands for high performance and for robustness in the face of packet loss. Various recent speech-coding standardization efforts are another direct driving force propelling the research and development of low-rate speech coding algorithms. A low-rate speech coder creates more channels, or users, per allowable application bandwidth, and a low-rate speech coder coupled with an additional layer of suitable channel coding can fit the overall bit budget of the coder specifications and deliver robust performance under channel error conditions.
对于按较低位速率的编码，已经开发了各种频谱或频域语音编码的方法，其中，将语音信号作为频谱的时间—变化演变来分析。例如，见R.J.McAulay和T.F.Quatieri的在"语音编码和合成中的正弦编码"，第4章(W.B.Kleijn和K.K.Paliwal编辑，1995)。在频谱编码器中，目标是用一组频谱参数来模仿或预测每个语音输入帧的短期语音频谱，而不是精确地模拟时间—变化语音波形。然后，对频谱参数编码，并用经解码的参数建立语音的输出帧。所产生的合成语音与原始输入语音波形不匹配，但是提供了相似的察觉质量。本技术领域中众知的频域编码器的例子包括多频带激励编码器(MBE)、正弦变换编码器(STC)以及谐波编码器(HC)。这种频域编码器提供具有简洁参数组的高质量参数模型，可以用在低位速率处用较少的可用位数进行精确地量化。For coding at lower bit rates, various methods of spectral, or frequency-domain, speech coding have been developed, in which the speech signal is analyzed as a time-varying evolution of spectra. See, e.g., R.J. McAulay & T.F. Quatieri, Sinusoidal Coding, in Speech Coding and Synthesis ch. 4 (W.B. Kleijn & K.K. Paliwal eds., 1995). In spectral coders, the objective is to model, or predict, the short-term speech spectrum of each input frame of speech with a set of spectral parameters, rather than to precisely mimic the time-varying speech waveform. The spectral parameters are then encoded, and an output frame of speech is created with the decoded parameters. The resulting synthesized speech does not match the original input speech waveform, but offers similar perceptual quality. Examples of frequency-domain coders well known in the art include multiband excitation coders (MBEs), sinusoidal transform coders (STCs), and harmonic coders (HCs). Such frequency-domain coders offer a high-quality parametric model having a compact set of parameters that can be accurately quantized with the low number of bits available at low bit rates.
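To make the harmonic-coding idea concrete, the sketch below synthesizes a frame as a sum of sinusoids at multiples of a pitch frequency, with per-harmonic amplitudes and phases. The function name and its arguments are illustrative only; they are not the interface of any coder described in this patent.

```python
import math

def synthesize_harmonics(f0_hz, amplitudes, phases, n_samples, fs_hz=8000):
    """Toy harmonic synthesis: sum of sinusoids at multiples of the pitch
    frequency f0 with per-harmonic amplitudes and phases (illustrative)."""
    out = []
    for n in range(n_samples):
        t = n / fs_hz
        out.append(sum(a * math.cos(2 * math.pi * f0_hz * (k + 1) * t + p)
                       for k, (a, p) in enumerate(zip(amplitudes, phases))))
    return out

# One 20 ms frame (160 samples at 8 kHz) of a 200 Hz fundamental plus one
# overtone, both with zero initial phase.
frame = synthesize_harmonics(200.0, [1.0, 0.5], [0.0, 0.0], 160)
print(len(frame), round(frame[0], 3))   # 160 1.5
```

Note that the initial phases must come from somewhere: as the next paragraph explains, low-rate coders that do not transmit them must generate them artificially, which is the problem this invention addresses.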
但是，低位速率编码对有限编码分辨率或有限代码簿空间强加了苛刻的限制，这就限制了单个编码机构的有效性，使得编码器在各种背景条件下不能用同等精度来表示各种类型的语音分段。例如，传统低位速率频域编码器不发送语音帧的相位信息。而是，通过使用随机的、人工产生的初始相位值和线性内插技术来重构相位信息。例如，见H.Yang等人在29 Electronic Letters 856-57(1993年5月)中的"在MBE模型中用于有声语音合成的二次相位内插法"。因为人工地产生相位信息，所以即使量化—去除量化处理完善地保留正弦波的振幅，但是频域编码器所产生的输出语音也不能与原始输入语音对准(即，主要脉冲不同步)。因此证明了在频域编码器中采用任何闭环性能测量法(例如，诸如信噪比(SNR)或感知SNR)是困难的。However, low-bit-rate coding imposes the severe constraint of a limited coding resolution, or a limited codebook space, which limits the effectiveness of a single coding mechanism, rendering the coder unable to represent the various types of speech segments under various background conditions with equal accuracy. For example, conventional low-bit-rate, frequency-domain coders do not transmit phase information for speech frames. Instead, the phase information is reconstructed by using a random, artificially generated initial phase value and linear interpolation techniques. See, e.g., H. Yang et al., Quadratic Phase Interpolation for Voiced Speech Synthesis in the MBE Model, in 29 Electronic Letters 856-57 (May 1993). Because the phase information is artificially generated, even if the amplitudes of the sinusoids are perfectly preserved by the quantization-dequantization process, the output speech produced by the frequency-domain coder will not be aligned with the original input speech (i.e., the major pulses will not be in sync). It has therefore proved difficult to adopt any closed-loop performance measure, such as, e.g., signal-to-noise ratio (SNR) or perceptual SNR, in frequency-domain coders.
已经使用多模式编码技术结合开环模式判定处理来执行低位速率语音编码。在Amitava Das等人的Multimode and Variable-Rate Coding of Speech, in Speech Coding and Synthesis ch.7(W.B.Kleijn和K.K.Paliwal编辑，1995)中描述了一种这样的多模式编码技术。传统的多模式编码器把不同的模式，或编码—解码算法，应用于不同类型的输入语音帧。定制每种模式或编码—解码处理以最有效的方式表示某种类型的语音段，例如，有声语音、无声语音或背景噪声(非语音)。外部的开环模式判定机构检查输入语音帧，并作出有关把哪个模式施加到该帧的判定。一般，通过从输入帧析取许多参数，按照某些临时的和频谱的特征对参数进行估计，以及根据估计以一种模式判定为基础来执行开环模式判定。因此在事先不知道输出语音的确切情况(即，在语音质量或其它性能测量方面，输出语音将和输入语音接近到什么程度)时作出模式判定。Multimode coding techniques have been employed to perform low-bit-rate speech coding in conjunction with an open-loop mode decision process. One such multimode coding technique is described in Amitava Das et al., Multimode and Variable-Rate Coding of Speech, in Speech Coding and Synthesis ch. 7 (W.B. Kleijn & K.K. Paliwal eds., 1995). Conventional multimode coders apply different modes, or encoding-decoding algorithms, to different types of input speech frames. Each mode, or encoding-decoding process, is customized to represent a certain type of speech segment, such as, e.g., voiced speech, unvoiced speech, or background noise (non-speech), in the most efficient manner. An external, open-loop mode decision mechanism examines the input speech frame and makes a decision regarding which mode to apply to the frame. Typically, the open-loop mode decision is performed by extracting a number of parameters from the input frame, evaluating the parameters as to certain temporal and spectral characteristics, and basing the mode decision upon the evaluation. The mode decision is thus made without knowing in advance the exact condition of the output speech, i.e., how close the output speech will be to the input speech in terms of voice quality or other performance measures.
根据上述，希望提供一种能更精确地估计相位信息的低位速率频域编码器。进一步最好是提供一种多模式、混合域编码器，根据帧的语音内容，对某些语音帧进行时域编码，而对其它语音帧进行频域编码。可以进一步希望提供一种混合域编码器，它可以根据闭环编码模式判定机构，对某些语音帧进行时域编码，而对其它语音帧进行频域编码。再又最好是提供一种闭环、多模式、混合域语音编码器，保证编码器产生的输出语音和输入到编码器的原始语音之间的时间同步。在此提出的有关申请中描述了这种语音编码器，所述申请题为"闭环多模式混合域线性预测(MDLP)语音编码器"，其已转让给本发明的受让人，并在此全部引用作为参考。In light of the above, it would be desirable to provide a low-bit-rate, frequency-domain coder that estimates phase information more accurately. It would further be desirable to provide a multimode, mixed-domain coder that encodes some speech frames in the time domain and other speech frames in the frequency domain in accordance with the speech content of the frames. It would still further be desirable to provide a mixed-domain coder that encodes some speech frames in the time domain and other speech frames in the frequency domain in accordance with a closed-loop coding mode decision mechanism. It would yet further be desirable to provide a closed-loop, multimode, mixed-domain speech coder that ensures time synchrony between the output speech produced by the coder and the original speech input to the coder. Such a speech coder is described in a related application filed herewith, entitled "Closed-Loop Multimode Mixed-Domain Linear Prediction (MDLP) Speech Coder," assigned to the assignee of the present invention and fully incorporated herein by reference.
还希望提供一种方法,确保编码器产生的输出语音和输入到编码器的原始语音之间的时间同步。因此需要一种精确地跟踪准周期性信号的相位的方法。It is also desirable to provide a method to ensure time synchronization between the output speech produced by the encoder and the original speech input to the encoder. Therefore, there is a need for a method of accurately tracking the phase of a quasi-periodic signal.
发明内容Contents of the invention
本发明针对一种精确地跟踪准周期性信号的相位的方法。因此，在本发明的一个方面，一种对信号相位跟踪的方法包括下列步骤：对于在信号是周期性期间的帧估计其信号相位；用闭环性能测量监测所估计相位的性能；对于在信号是周期性期间的帧，测量其信号的相位；当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；而当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位；以及对于在信号是非周期性期间的帧，测量其信号的相位的步骤。The present invention is directed to a method of accurately tracking the phase of a quasi-periodic signal. Accordingly, in one aspect of the invention, a method of tracking the phase of a signal includes the steps of: estimating the phase of the signal for frames during which the signal is periodic; monitoring the performance of the estimated phase with a closed-loop performance measure; measuring the phase of the signal for frames during which the signal is periodic; providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level; and measuring the phase of the signal for frames during which the signal is aperiodic.
在本发明的另一个方面，一种对信号是周期性期间的帧进行准周期相位跟踪的方法包括下列步骤：对于在信号是周期性期间的帧，估计其信号相位；用闭环性能测量监测所估计相位的性能；对于在信号是周期性期间的帧，测量其信号的相位；当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；以及当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位。In another aspect of the invention, a method of quasi-periodic phase tracking for frames during which the signal is periodic includes the steps of: estimating the phase of the signal for frames during which the signal is periodic; monitoring the performance of the estimated phase with a closed-loop performance measure; measuring the phase of the signal for frames during which the signal is periodic; providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; and providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level.
在本发明的又一个方面，一种对信号相位跟踪的装置包括：一估计装置，用于对在信号是周期性期间的帧，估计其信号的相位；一监测装置，用于用闭环性能测量来监测所估计相位的性能；一用于对在信号是周期性期间的帧，测量其信号的相位的测量装置；一第一执行装置，用于当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；一第二执行装置，当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位；以及一用于对在信号是非周期性期间的帧，测量其信号的相位的测量装置。In yet another aspect of the invention, an apparatus for tracking the phase of a signal includes: means for estimating the phase of the signal for frames during which the signal is periodic; means for monitoring the performance of the estimated phase with a closed-loop performance measure; means for measuring the phase of the signal for frames during which the signal is periodic; first means for providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; second means for providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level; and means for measuring the phase of the signal for frames during which the signal is aperiodic.
在本发明的再一个方面，一种对信号是周期性期间的帧进行准周期相位跟踪的装置包括：一估计装置，用于估计对于在信号是周期性期间帧的信号相位；一监测装置，用于用闭环性能测量来监测所估计相位的性能；一用于对在信号是周期性期间的帧，测量其信号的相位的测量装置；一第一执行装置，用于当所估计相位的性能落在预定阈电平以下时，提供所估计的相位作为输出相位；以及一第二执行装置，当所估计相位的性能落在预定阈电平之上时，提供所测量的相位作为输出相位。In still another aspect of the invention, an apparatus for quasi-periodic phase tracking of frames during which the signal is periodic includes: means for estimating the phase of the signal for frames during which the signal is periodic; means for monitoring the performance of the estimated phase with a closed-loop performance measure; means for measuring the phase of the signal for frames during which the signal is periodic; first means for providing the estimated phase as the output phase when the performance of the estimated phase falls below a predetermined threshold level; and second means for providing the measured phase as the output phase when the performance of the estimated phase falls above the predetermined threshold level.
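The selection logic common to the four aspects above can be sketched as follows. The sketch reads the "performance" of the estimated phase as a closed-loop error measure, so that a value below the threshold means the estimate is acceptable; all names and the scalar representation of a phase are illustrative assumptions, not the patent's interface.

```python
def select_output_phase(frame_is_periodic, estimated_phase, measured_phase,
                        estimation_error, threshold):
    """Sketch of the claimed phase-tracking selection (illustrative names).

    Periodic frames: use the estimated phase while its closed-loop error
    stays below the predetermined threshold, otherwise fall back to the
    measured phase.  Aperiodic frames: always use the measured phase.
    """
    if not frame_is_periodic:
        return measured_phase
    if estimation_error < threshold:
        return estimated_phase
    return measured_phase

print(select_output_phase(True, 0.1, 0.3, 0.01, 0.05))   # 0.1 (estimate kept)
print(select_output_phase(True, 0.1, 0.3, 0.20, 0.05))   # 0.3 (fallback)
print(select_output_phase(False, 0.1, 0.3, 0.01, 0.05))  # 0.3 (aperiodic)
```

The point of this arrangement is that the cheap estimated phase is used whenever it tracks the signal well enough, and the measured phase is transmitted only when the closed-loop check says the estimate has drifted.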
附图说明Description of drawings
图1是在每个终端处通过语音编码器终止的通信信道的方框图。Figure 1 is a block diagram of a communication channel terminated by a speech coder at each terminal.
图2是可以在多模式、混合域线性预测(MDLP)语音编码器中使用的编码器的方框图。Figure 2 is a block diagram of an encoder that may be used in a multi-mode, mixed-domain linear prediction (MDLP) speech encoder.
图3是可以在多模式、混合域线性预测(MDLP)语音编码器中使用的解码器的方框图。Figure 3 is a block diagram of a decoder that may be used in a multi-mode, mixed-domain linear prediction (MDLP) speech coder.
图4是流程图,示出可以在图2的编码器中使用的MDLP编码器所执行的MDLP编码步骤。FIG. 4 is a flowchart showing the steps of MDLP encoding performed by an MDLP encoder that may be used in the encoder of FIG. 2 .
图5是流程图,示出语音编码判定过程。Fig. 5 is a flow chart showing a speech encoding decision process.
图6是闭环多模式MDLP语音编码器。Figure 6 is a closed-loop multi-mode MDLP speech coder.
图7是可以在图6的编码器或图2的编码器中使用的频谱编码器的方框图。FIG. 7 is a block diagram of a spectral encoder that may be used in the encoder of FIG. 6 or the encoder of FIG. 2 .
图8是振幅对频率的曲线图,示出在谐波编码器中的正弦波的振幅。Figure 8 is a graph of amplitude versus frequency showing the amplitude of a sine wave in a harmonic encoder.
图9是流程图,示出在多模式MDLP语音编码器中的模式判定处理。Fig. 9 is a flowchart showing mode decision processing in a multi-mode MDLP speech coder.
图10A是语音信号振幅对时间的图例,而图10B是线性预测(LP)残余振幅对时间的图例。FIG. 10A is a graph of speech signal amplitude versus time, and FIG. 10B is a graph of linear prediction (LP) residual amplitude versus time.
图11A是在闭环编码判定中的速率/模式对帧索引的曲线图;图11B是在闭环编码判定中的感知信噪比(PSNR)对帧索引的曲线图;以及图11C是不存在闭环编码判定时的速率/模式和PSNR两者对帧索引的曲线图。Figure 11A is a graph of rate/mode versus frame index in a closed-loop encoding decision; Figure 11B is a graph of perceptual signal-to-noise ratio (PSNR) versus frame index in a closed-loop encoding decision; and Figure 11C is a graph without closed-loop encoding Graph of both rate/mode and PSNR at decision time versus frame index.
图12是用于跟踪准周期性信号的相位的一种装置的方框图。Figure 12 is a block diagram of an apparatus for tracking the phase of a quasi-periodic signal.
具体实施方式 Detailed Description
在图1中，第一编码器10接收数字化语音采样s(n)，并对采样s(n)进行编码，用于在发送媒体12或通信信道12上发送到第一解码器14。解码器14对经编码语音采样进行解码，并合成输出语音信号sSYNTH(n)。对于在相反方向的发送，第二编码器16对数字化语音采样s(n)进行编码，该采样是在通信信道18上发送。第二解码器接收经编码语音采样，并对它进行解码，产生合成输出语音信号sSYNTH(n)。In FIG. 1, a first encoder 10 receives digitized speech samples s(n) and encodes the samples s(n) for transmission on a transmission medium 12, or communication channel 12, to a first decoder 14. The decoder 14 decodes the encoded speech samples and synthesizes an output speech signal sSYNTH(n). For transmission in the opposite direction, a second encoder 16 encodes digitized speech samples s(n), which are transmitted on a communication channel 18. A second decoder receives and decodes the encoded speech samples, generating a synthesized output speech signal sSYNTH(n).
语音采样s(n)表示已经根据本技术领域中众知的各种方法(例如，包括脉冲编码调制(PCM)、压缩扩展μ-律或A-律)中的任何一种进行数字化和量化的语音信号。如在本技术领域中众知，把语音采样s(n)组织成输入数据帧，其中，每个帧包括预定数目的数字语音采样s(n)。在示例实施例中，使用8kHz的采样率，每个20ms帧包括160个采样。在下面描述的实施例中，可以有利地以逐帧为基础改变数据传输率，从8kbps(全速率)到4kbps(半速率)到2kbps(四分之一速率)到1kbps(八分之一速率)。另一方面，可以使用其它数据速率。如这里所使用，术语"全速率"或"高速率"一般是指大于或等于8kbps的数据速率，而术语"半速率"或"低速率"一般是指小于或等于4kbps的数据速率。改变数据传输率是有利的，因为对于包括相对较少语音信息的帧，可以有选择地使用较低位速率。熟悉本技术领域的人员会理解，可以使用其它采样率、帧大小以及数据传输率。The speech samples s(n) represent a speech signal that has been digitized and quantized in accordance with any of various methods known in the art including, e.g., pulse code modulation (PCM), companded μ-law, or A-law. As known in the art, the speech samples s(n) are organized into frames of input data, wherein each frame comprises a predetermined number of digitized speech samples s(n). In an exemplary embodiment, a sampling rate of 8 kHz is employed, with each 20 ms frame comprising 160 samples. In the embodiments described below, the rate of data transmission may advantageously be varied on a frame-to-frame basis from 8 kbps (full rate) to 4 kbps (half rate) to 2 kbps (quarter rate) to 1 kbps (eighth rate). Alternatively, other data rates may be used. As used herein, the terms "full rate" or "high rate" generally refer to data rates that are greater than or equal to 8 kbps, and the terms "half rate" or "low rate" generally refer to data rates that are less than or equal to 4 kbps. Varying the data transmission rate is beneficial because lower bit rates may be selectively employed for frames containing relatively less speech information. Those skilled in the art will appreciate that other sampling rates, frame sizes, and data transmission rates may be used.
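The frame and rate figures above imply the following per-frame sample and bit counts; this is simple arithmetic, shown only for concreteness:

```python
# Samples and bits per frame for the rates mentioned above
# (8 kHz sampling, 20 ms frames).
sample_rate_hz = 8000
frame_ms = 20
samples_per_frame = sample_rate_hz * frame_ms // 1000   # 160 samples

rates_bps = {"full": 8000, "half": 4000, "quarter": 2000, "eighth": 1000}
bits_per_frame = {name: bps * frame_ms // 1000 for name, bps in rates_bps.items()}
print(samples_per_frame, bits_per_frame)
# 160 {'full': 160, 'half': 80, 'quarter': 40, 'eighth': 20}
```

So a full-rate packet carries 160 bits for a 160-sample frame, while an eighth-rate background-noise packet carries only 20 bits, which is why rate selection per frame saves so much bandwidth.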
第一编码器10和第二解码器20一起构成第一语音编码器或语音编码器。相似地，第二编码器16和第一解码器14一起构成第二语音编码器。熟悉本技术领域的人员会理解，可以用数字信号处理器(DSP)、专用集成电路(ASIC)、分立门逻辑、固件或任何传统的可编程软件模块以及微处理器来实现语音编码器。软件模块可驻留在RAM存储器、闪存储器、寄存器或在本技术领域中众知的任何其它形式的可写入媒体中。另一方面，任何传统的处理器、控制器或状态机可以取代微处理器。在美国专利第5,727,123号中(该专利已转让给本发明的受让人，并在此全部引用作为参考)，以及在1994年2月16日提出的题为"声码器ASIC"的美国专利申请第08/197,417号中(该专利已转让给本发明的受让人，并在此全部引用作为参考)描述特别为语音编码而设计的示例ASIC。The first encoder 10 and the second decoder 20 together constitute a first speech coder. Similarly, the second encoder 16 and the first decoder 14 together constitute a second speech coder. Those skilled in the art will understand that speech coders may be implemented with a digital signal processor (DSP), an application-specific integrated circuit (ASIC), discrete gate logic, firmware, or any conventional programmable software module and a microprocessor. The software module could reside in RAM memory, flash memory, registers, or any other form of writable storage medium known in the art. Alternatively, any conventional processor, controller, or state machine could be substituted for the microprocessor. Example ASICs designed specifically for speech coding are described in U.S. Patent No. 5,727,123, assigned to the assignee of the present invention and fully incorporated herein by reference, and in U.S. Patent Application No. 08/197,417, entitled "Vocoder ASIC," filed February 16, 1994, assigned to the assignee of the present invention and fully incorporated herein by reference.
如在图2中描绘，根据一个实施例，可以在语音编码器中使用的多模式混合域线性预测(MDLP)编码器100包括模式判定模块102、间距估计模块104、线性预测(LP)分析模块106、LP分析滤波器108、LP量化模块110以及MDLP剩余编码器112。把输入语音帧s(n)提供给模式判定模块102、间距估计模块104、LP分析模块106、以及LP分析滤波器108。模式判定模块102根据每个输入语音帧s(n)的周期性和诸如能量、频谱倾角、过零速率等其它析取参数产生模式索引IM和模式M。在1997年3月11日提出的，题为"用于执行降低速率的可变速率语音编码的方法和设备"的美国申请序列号08/815,354中描述根据周期性对语音帧进行分类的各种方法，该申请已转让给本发明的受让人，并在此全部引用作为参考。这些方法包括在电信工业协会临时标准TIA/EIA IS-127和TIA/EIA IS-733中。As depicted in FIG. 2, according to one embodiment, a multimode mixed-domain linear prediction (MDLP) encoder 100 that may be used in a speech coder includes a mode decision module 102, a pitch estimation module 104, a linear prediction (LP) analysis module 106, an LP analysis filter 108, an LP quantization module 110, and an MDLP residual encoder 112. An input speech frame s(n) is provided to the mode decision module 102, the pitch estimation module 104, the LP analysis module 106, and the LP analysis filter 108. The mode decision module 102 produces a mode index IM and a mode M based upon the periodicity of each input speech frame s(n) and other extracted parameters such as energy, spectral tilt, and zero-crossing rate. Various methods of classifying speech frames according to periodicity are described in U.S. Application Serial No. 08/815,354, entitled "Method and Apparatus for Performing Reduced Rate Variable Rate Speech Coding," filed March 11, 1997, assigned to the assignee of the present invention and fully incorporated herein by reference. Such methods are also incorporated into the Telecommunications Industry Association Interim Standards TIA/EIA IS-127 and TIA/EIA IS-733.
间距估计模块104根据每个输入语音帧s(n)产生间距索引Ip和滞后值Po。LP分析模块106在每个输入语音帧s(n)上执行线性预测分析，以产生LP参数a。把LP参数a提供给LP量化模块110。LP量化模块110还接收模式M，从而以与模式有关的方式执行量化处理。LP量化模块110产生LP索引ILP和量化的LP参数。LP分析滤波器108除了接收输入语音帧s(n)之外还接收量化的LP参数。LP分析滤波器108产生LP剩余信号R[n]，它表示输入语音帧s(n)和根据量化的线性预测LP参数重构的语音之间的误差。把LP剩余信号R[n]、模式M、以及量化的LP参数提供给MDLP剩余编码器112。根据下面参考图4的流程图描述的步骤，MDLP剩余编码器112依据这些值产生剩余索引IR和量化的剩余信号R̂[n]。The pitch estimation module 104 produces a pitch index Ip and a lag value Po based upon each input speech frame s(n). The LP analysis module 106 performs linear predictive analysis on each input speech frame s(n) to generate LP parameters a. The LP parameters a are provided to the LP quantization module 110. The LP quantization module 110 also receives the mode M, thereby performing the quantization process in a mode-dependent manner. The LP quantization module 110 produces an LP index ILP and the quantized LP parameters. In addition to the input speech frame s(n), the LP analysis filter 108 receives the quantized LP parameters. The LP analysis filter 108 generates an LP residual signal R[n], which represents the error between the input speech frame s(n) and the speech reconstructed from the quantized LP parameters. The LP residual signal R[n], the mode M, and the quantized LP parameters are provided to the MDLP residual encoder 112. Based upon these values, the MDLP residual encoder 112 produces a residual index IR and a quantized residual signal R̂[n] in accordance with the steps described below with reference to the flowchart of FIG. 4.

在图3中，在语音编码器中使用的解码器200包括LP参数解码模块202、剩余解码模块204、模式解码模块206以及LP合成滤波器208。模式解码模块206对模式索引IM进行接收和解码，从其产生模式M。LP参数解码模块202接收模式M以及LP索引ILP。LP参数解码模块202对所接收的值进行解码，以产生量化的LP参数。剩余解码模块204接收剩余索引IR、间距索引Ip和模式索引IM。剩余解码模块204对所接收的值进行解码，以产生量化的剩余信号R̂[n]。把量化的剩余信号R̂[n]和LP参数提供给LP合成滤波器208，它从中合成经解码的输出语音信号ŝ[n]。In FIG. 3, a decoder 200 that may be used in a speech coder includes an LP parameter decoding module 202, a residual decoding module 204, a mode decoding module 206, and an LP synthesis filter 208. The mode decoding module 206 receives and decodes the mode index IM, generating therefrom the mode M. The LP parameter decoding module 202 receives the mode M and the LP index ILP. The LP parameter decoding module 202 decodes the received values to produce the quantized LP parameters. The residual decoding module 204 receives the residual index IR, the pitch index Ip, and the mode index IM. The residual decoding module 204 decodes the received values to generate the quantized residual signal R̂[n]. The quantized residual signal R̂[n] and the LP parameters are then provided to the LP synthesis filter 208, which synthesizes the decoded output speech signal ŝ[n] from them.
除了MDLP剩余编码器112之外，在本技术领域中众知图2的编码器100和图3的解码器200的各种模块的操作和实施，并在上述美国专利第5,414,796号和L.B.Rabiner和R.W.Schafer的"语音信号的数字处理"396-453(1978)中有描述。With the exception of the MDLP residual encoder 112, the operation and implementation of the various modules of the encoder 100 of FIG. 2 and the decoder 200 of FIG. 3 are well known in the art and are described in the aforementioned U.S. Patent No. 5,414,796 and in L.B. Rabiner & R.W. Schafer, Digital Processing of Speech Signals 396-453 (1978).
根据一个实施例,MDLP编码器(未示出)执行在图4的流程图中示出的步骤。MDLP编码器可以是图2的MDLP剩余编码器112。在步骤300中,MDLP编码器检查模式M是全速率(FR)、还是四分之一速率(QR)或八分之一速率(ER)。如果模式M是FR、QR或ER,则MDLP编码器转到步骤302。在步骤302中,MDLP编码器把相应的速率(FR、QR或ER——根据M的值)施加于剩余索引IR。把对于FR模式是高精度、高速率编码,并且可能有利地是CELP编码的时域编码施加于LP剩余帧,或另一方面施加于语音帧。然后发送帧(在包括数-模转换和调制的进一步信号处理之后)。在一个实施例中,帧是表示预测误差的LP剩余帧。在另一个实施例中,帧是表示语音采样的语音帧。According to one embodiment, an MDLP encoder (not shown) performs the steps shown in the flowchart of FIG. 4 . The MDLP encoder may be the MDLP
另一方面,如果在步骤300中,模式M不是FR、QR或ER,(即,如果模式M是半速率(HR)),则MDLP编码器转到步骤304。在步骤304中,把频谱编码(较有利的是谐波编码)以半速率施加于LP剩余,或施加于语音信号。然后MDLP编码器转到步骤306。在步骤306中,通过对经编码语音进行解码并将其与原始输入帧进行比较来得到失真测量值D。然后MDLP编码器转到步骤308。在步骤308中,失真测量值D与预定阈值T进行比较。如果失真测量值D大于阈值T,则把半速率的、频谱编码的帧的相应量化参数进行调制并发送。另一方面,如果失真测量值D不大于阈值T,则MDLP编码器转到步骤310。在步骤310中,在时域中以全速率对经解码的帧进行重新编码。可以使用任何传统高速率高精度编码算法,诸如,最好使用CELP编码。然后调制和发送与该帧相关联的FR模式量化的参数。On the other hand, if in step 300 mode M is not FR, QR or ER, (ie, if mode M is half rate (HR)), then the MDLP encoder goes to step 304 . In step 304, spectral coding (preferably harmonic coding) is applied to the LP residue at half rate, or to the speech signal. The MDLP encoder then goes to step 306. In step 306, a distortion measure D is obtained by decoding the encoded speech and comparing it with the original input frame. The MDLP encoder then goes to step 308. In step 308 the distortion measure D is compared with a predetermined threshold T. If the distortion measure D is greater than the threshold T, the corresponding quantization parameter of the half-rate, spectrally encoded frame is modulated and transmitted. On the other hand, if the distortion measure D is not greater than the threshold T, the MDLP encoder goes to step 310 . In step 310, the decoded frames are re-encoded in the time domain at full rate. Any conventional high rate high precision coding algorithm may be used, such as preferably CELP coding. The parameters quantized for the FR mode associated with that frame are then modulated and transmitted.
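The decision flow of steps 300-310 can be sketched as follows. The callables passed in are illustrative stand-ins for the actual encoders, and the scalar closed-loop measure D is assumed to be a quality score (e.g., a perceptual SNR) so that the sketch reproduces the text's branch structure: D greater than T transmits the half-rate spectral parameters, otherwise the frame is re-encoded at full rate.

```python
def mdlp_encode_residual(mode, time_encode, spectral_encode,
                         closed_loop_measure, T):
    """Sketch of the FIG. 4 decision flow; all callables are illustrative.

    FR/QR/ER frames are time-domain encoded at the corresponding rate.
    HR frames are spectrally encoded at half rate, decoded, and compared
    with the original frame to obtain a closed-loop measure D; D > T
    keeps the half-rate spectral packet, otherwise the frame is
    re-encoded in the time domain at full rate (e.g. with CELP).
    """
    if mode in ("FR", "QR", "ER"):
        return time_encode(mode)
    packet = spectral_encode("HR")
    D = closed_loop_measure(packet)
    return packet if D > T else time_encode("FR")

# Toy stubs to exercise the control flow.
time_encode = lambda rate: ("time", rate)
spectral_encode = lambda rate: ("spectral", rate)
print(mdlp_encode_residual("QR", time_encode, spectral_encode, lambda p: 10.0, 5.0))
# ('time', 'QR')
print(mdlp_encode_residual("HR", time_encode, spectral_encode, lambda p: 10.0, 5.0))
# ('spectral', 'HR')
print(mdlp_encode_residual("HR", time_encode, spectral_encode, lambda p: 1.0, 5.0))
# ('time', 'FR')
```

The design point is that the cheap half-rate spectral model is attempted first, and the expensive full-rate time-domain encoder is used only when the closed-loop check says the spectral model failed on that frame.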
如在图5的流程图中示出,根据一个实施例,闭环多模式MDLP语音编码器在处理用于发送的语音采样中遵循一组步骤。在步骤400中,语音编码器接收在连续帧中的语音信号的数字采样。语音编码器根据所接收的给定帧而转到步骤402。在步骤402中,语音编码器检测帧的能量。该能量是帧的语音活动的量度。通过对数字语音采样的振幅平方求和并把结果能量值与阈值进行比较来执行语音检测。在一个实施例中,阈值根据背景噪声的变化电平进行自适应。在上述美国专利第5,414,796号中描述一种示范可变阈值语音活动检测器。某些无声语音声音可以是极低能量的采样,这就可能会将其错误地作为背景噪声进行编码。为了防止这种情况发生,可以使用低能量采样的频谱倾角,以从背景噪声中区分无声语音,如在上述美国专利第5,414,796号中描述。As shown in the flowchart of FIG. 5, a closed-loop multi-mode MDLP vocoder follows a set of steps in processing speech samples for transmission, according to one embodiment. In
在检测帧的能量之后，语音编码器转到步骤404。在步骤404中，语音编码器判定所检测的帧能量是否足以把帧分类作为包含语音信息。如果所检测的帧能量降低到预定阈电平之下，则语音编码器转到步骤406。在步骤406中，语音编码器将帧作为背景噪声(即，无声或静音)编码。在一个实施例中，以1/8速率，或1kbps对背景噪声进行时域编码。如果在步骤404中，所检测的帧能量符合或超过预定阈电平，则把帧分类为语音帧，并且语音编码器转到步骤408。After detecting the energy of the frame, the speech coder proceeds to step 404. In step 404, the speech coder determines whether the detected frame energy is sufficient to classify the frame as containing speech information. If the detected frame energy falls below a predetermined threshold level, the speech coder proceeds to step 406. In step 406, the speech coder encodes the frame as background noise (i.e., non-speech, or silence). In one embodiment, background noise is time-domain encoded at 1/8 rate, or 1 kbps. If in step 404 the detected frame energy meets or exceeds the predetermined threshold level, the frame is classified as a speech frame and the speech coder proceeds to step 408.
在步骤408中，语音编码器确定该帧是否为周期性的。周期性判定的各种已知方法包括，例如，使用过零点和使用归一化的自相关函数(NACF)。尤其，在1997年3月11日提出的，题为"用于执行降低速率的可变速率话音编码的方法和设备"的美国专利申请第08/815,354号中描述使用过零点和NACF来检测周期性，该申请已转让给本发明的受让人，并在此全部引用作为参考。此外，把用于区分有声语音和无声语音的上述方法包括在电信工业协会工业临时标准TIA/EIA IS-127和TIA/EIA IS-733中。如果在步骤408中没有判定帧是周期性的，则语音编码器转到步骤410。在步骤410中，语音编码器把帧作为无声语音编码。在一个实施例中，以1/4速率，或2kbps，对无声语音帧进行时域编码。如果在步骤408中确定该帧是周期性的，则语音编码器转到步骤412。In step 408, the speech coder determines whether the frame is periodic. Various known methods of periodicity determination include, e.g., the use of zero crossings and the use of normalized autocorrelation functions (NACFs). In particular, using zero crossings and NACFs to detect periodicity is described in U.S. Patent Application No. 08/815,354, entitled "Method and Apparatus for Performing Reduced Rate Variable Rate Vocoding," filed March 11, 1997, assigned to the assignee of the present invention and fully incorporated herein by reference. In addition, the above methods for distinguishing voiced speech from unvoiced speech are incorporated into the Telecommunications Industry Association Interim Standards TIA/EIA IS-127 and TIA/EIA IS-733. If the frame is not determined to be periodic in step 408, the speech coder proceeds to step 410. In step 410, the speech coder encodes the frame as unvoiced speech. In one embodiment, unvoiced speech frames are time-domain encoded at 1/4 rate, or 2 kbps. If the frame is determined to be periodic in step 408, the speech coder proceeds to step 412.
在步骤412中，语音编码器使用本技术领域中众知的周期性检测方法(如在上述美国申请序列号08/815,354中所描述)来确定帧是否有足够的周期性。如果没有判定该帧有足够的周期性，则语音编码器转到步骤414。在步骤414中，将帧作为过渡语音(即，从无声语音到有声语音的过渡)进行时域编码。在一个实施例中，以全速率，或8kbps，对过渡语音帧进行时域编码。In step 412, the speech coder determines whether the frame exhibits sufficient periodicity using periodicity detection methods known in the art (as described in the aforementioned U.S. Application Serial No. 08/815,354). If the frame is not determined to exhibit sufficient periodicity, the speech coder proceeds to step 414. In step 414, the frame is time-domain encoded as transition speech (i.e., a transition from unvoiced speech to voiced speech). In one embodiment, transition speech frames are time-domain encoded at full rate, or 8 kbps.
如果在步骤412中，语音编码器确定帧有足够的周期性，则语音编码器转到步骤416。在步骤416中，语音编码器把帧作为有声语音编码。在一个实施例中，以半速率，或4kbps，对有声语音帧进行频谱编码。较有利地，用谐波编码器对有声语音帧进行频谱编码，如下参考图7所述。另一方面，可以使用其它频谱编码器，例如，正弦变换编码器或多频带激励编码器，如在本技术领域中所众知。然后，语音编码器转到步骤418。在步骤418中，语音编码器对经编码的有声语音帧进行解码。然后，语音编码器转到步骤420。在步骤420中，把经解码的有声语音帧与对应于该帧的输入语音采样进行比较，以得到合成语音失真的测量值，并判定半速率有声语音频谱编码模型是否在可接受的限度范围内工作。然后，语音编码器转到步骤422。If, in step 412, the speech coder determines that the frame exhibits sufficient periodicity, the speech coder proceeds to step 416. In step 416, the speech coder encodes the frame as voiced speech. In one embodiment, voiced speech frames are spectrally encoded at half rate, or 4 kbps. Advantageously, the voiced speech frames are spectrally encoded with a harmonic coder, as described below with reference to FIG. 7. Alternatively, other spectral coders could be used, such as, e.g., sinusoidal transform coders or multiband excitation coders, as known in the art. The speech coder then proceeds to step 418. In step 418, the speech coder decodes the encoded voiced speech frame. The speech coder then proceeds to step 420. In step 420, the decoded voiced speech frame is compared with the input speech samples corresponding to that frame to obtain a measure of synthesized speech distortion and to determine whether the half-rate, voiced-speech, spectral-coding model is operating within acceptable limits. The speech coder then proceeds to step 422.
在步骤422中,语音编码器判定经解码的有声语音帧和对应于该帧的输入语音采样之间的误差是否降低到预定阈值之下。根据一个实施例,以下面参考图6描述的方式来作出这个判定。如果编码失真降低到预定阈值之下,则语音编码器转到步骤426。在步骤426中,语音编码器使用步骤416的参数将该帧作为有声语音发送。如果在步骤422中,编码失真符合或超过预定阈值,则语音编码器转到步骤414,对在步骤400中接收到的数字语音采样的帧作为过渡语音以全速率进行时域编码。In
应该指出,步骤400-410包括开环编码判定模式。另一方面,步骤412-426包括闭环编码判定模式。It should be noted that steps 400-410 comprise an open-loop encoding decision mode. Steps 412-426, on the other hand, include a closed-loop encoding decision mode.
In one embodiment, shown in FIG. 6, a closed-loop multimode MDLP speech coder includes an analog-to-digital converter (A/D) 500 coupled to a frame buffer 502, which in turn is coupled to a control processor 504. An energy calculator 506, a voiced speech detector 508, a background noise encoder 510, a high-rate time-domain encoder 512, and a low-rate spectral encoder 514 are coupled to the control processor 504. A spectral decoder 516 is coupled to the spectral encoder 514, and an error calculator 518 is coupled to the spectral decoder 516 and to the control processor 504. A threshold comparator 520 is coupled to the error calculator 518 and to the control processor 504. A buffer 522 is coupled to the spectral encoder 514, the spectral decoder 516, and the threshold comparator 520.
In the embodiment of FIG. 6, the speech coder components are preferably implemented as firmware or other software-driven modules in the speech coder, which itself advantageously resides in a DSP or an ASIC. Those skilled in the art would understand that the speech coder components could equally well be implemented in a number of other known ways. The control processor 504 may advantageously be a microprocessor, but could alternatively be implemented with a controller, a state machine, or discrete logic.
In the multimode coder of FIG. 6, a speech signal is provided to the A/D 500. The A/D 500 converts the analog signal to frames of digital speech samples, S(n). The digital speech samples are provided to the frame buffer 502. The control processor 504 takes the digital speech samples from the frame buffer 502 and provides them to the energy calculator 506. The energy calculator 506 computes the energy, E, of the speech samples according to the following equation:

E = Σ S²(n),  n = 1, 2, …, 160,
where the frames are 20 ms long and the sampling rate is 8 kHz, so that each frame holds 160 samples. The computed energy, E, is sent back to the control processor 504.
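The energy computation and the voice activity decision of steps 402-406 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the fixed threshold are ours (a real detector adapts the threshold to the background-noise level, as noted above).

```python
import numpy as np

def frame_energy(samples):
    """Frame energy as the sum of squared sample amplitudes.

    For 20 ms frames at an 8 kHz sampling rate, each frame holds
    160 samples. Names here are illustrative, not from the patent.
    """
    samples = np.asarray(samples, dtype=float)
    return float(np.sum(samples ** 2))

# A voiced-like frame vs. a near-silent one, against an assumed threshold.
rng = np.random.default_rng(0)
speech = 0.5 * np.sin(2 * np.pi * 200 / 8000 * np.arange(160))
noise = 0.001 * rng.standard_normal(160)

threshold = 1.0  # placeholder; a real detector adapts this to the noise floor
print(frame_energy(speech) >= threshold)  # True  -> classify as speech
print(frame_energy(noise) >= threshold)   # False -> encode as background noise
```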
The control processor 504 compares the computed speech energy with a voice activity threshold. If the computed energy is below the voice activity threshold, the control processor 504 directs the digital speech samples from the frame buffer 502 to the background noise encoder 510. The background noise encoder 510 encodes the frame using the minimum number of bits necessary to preserve an estimate of the background noise.
If the computed energy is greater than or equal to the voice activity threshold, the control processor 504 directs the digital speech samples from the frame buffer 502 to the voiced speech detector 508. The voiced speech detector 508 determines whether the periodicity of the speech frame would allow efficient coding with low-bit-rate spectral encoding. Methods of determining the level of periodicity in a speech frame are well known in the art and include, e.g., the use of normalized autocorrelation functions (NACFs) and zero crossings. These methods and others are described in the aforementioned U.S. Application Serial No. 08/815,354.
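A textbook form of the NACF periodicity check can be sketched as below. This is a generic formulation for illustration only, not the exact detector of the referenced application; a voicing decision compares the peak NACF over candidate pitch lags against a threshold.

```python
import numpy as np

def nacf(x, lag):
    """Normalized autocorrelation of frame x at a given lag.

    Values near 1 indicate strong periodicity at that lag.
    """
    x = np.asarray(x, dtype=float)
    a, b = x[:-lag], x[lag:]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

n = np.arange(320)
voiced = np.sin(2 * np.pi * n / 40)                      # period of 40 samples
unvoiced = np.random.default_rng(1).standard_normal(320)  # noise-like frame

print(nacf(voiced, 40) > 0.9)    # True: highly periodic at the pitch lag
print(nacf(unvoiced, 40) > 0.9)  # False for a noise-like frame
```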
The voiced speech detector 508 provides a signal to the control processor 504 indicating whether the speech frame contains speech of sufficient periodicity to be efficiently encoded by the spectral encoder 514. If the voiced speech detector 508 determines that the speech frame lacks sufficient periodicity, the control processor 504 directs the digital speech samples to the high-rate encoder 512, which time-domain encodes the speech at a predetermined maximum data rate. In one embodiment, the predetermined maximum data rate is 8 kbps, and the high-rate encoder 512 is a CELP encoder.
If the voiced speech detector 508 initially determines that the speech signal is sufficiently periodic to be efficiently encoded by the spectral encoder 514, the control processor 504 directs the digital speech samples from the frame buffer 502 to the spectral encoder 514. An exemplary spectral encoder is described in detail below with reference to FIG. 7.
The spectral encoder 514 extracts the estimated pitch frequency, F0, the amplitudes, AI, of the harmonics of the pitch frequency, and voicing information, Vc. The spectral encoder 514 provides these parameters to the buffer 522 and to the spectral decoder 516. The spectral decoder 516 may advantageously be analogous to the encoder's internal decoder in conventional CELP encoders. The spectral decoder 516 generates synthesized speech samples in accordance with a spectral decoding format (described below with reference to FIG. 7),
and provides the synthesized speech samples to the error calculator 518. The control processor 504 sends the speech samples, S(n), to the error calculator 518.
The error calculator 518 computes the mean squared error (MSE) between each speech sample, S(n), and each corresponding synthesized speech sample, SSYNTH(n), according to the following equation:

MSE = Σ (S(n) − SSYNTH(n))²,  n = 1, 2, …, N.
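The error measure above, and the acceptance check that follows, can be sketched in a few lines. The function name, plain-Python form, and the placeholder threshold are illustrative, not from the patent.

```python
def mse(original, synthesized):
    """Sum of squared differences between input and synthesized
    samples over one frame, matching the error measure above."""
    return sum((s - ss) ** 2 for s, ss in zip(original, synthesized))

frame = [0.1, 0.4, -0.2, 0.3]
good = [0.1, 0.39, -0.21, 0.3]   # close reconstruction
bad = [0.0, 0.0, 0.0, 0.0]       # poor reconstruction

threshold = 0.01  # placeholder acceptance threshold
print(mse(frame, good) < threshold)  # True  -> keep spectral coding
print(mse(frame, bad) < threshold)   # False -> fall back to full-rate CELP
```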
The computed MSE is provided to the threshold comparator 520, which determines whether the level of distortion is within acceptable bounds, i.e., whether the level of distortion falls below a predetermined threshold.
If the computed MSE is within acceptable bounds, the threshold comparator 520 provides a signal to the buffer 522, and the spectrally encoded data is output from the speech coder. If, on the other hand, the MSE is not within acceptable bounds, the threshold comparator 520 provides a signal to the control processor 504, which in turn directs the digital samples from the frame buffer 502 to the high-rate time-domain encoder 512. The time-domain encoder 512 encodes the frame at the predetermined maximum rate, and the contents of the buffer 522 are discarded.
In the embodiment of FIG. 6, the type of spectral coding employed is harmonic coding, described below with reference to FIG. 7, but could alternatively be any type of spectral coding, such as, e.g., sinusoidal transform coding or multiband excitation coding. The use of multiband excitation coding is described in, e.g., U.S. Patent No. 5,195,166, and the use of sinusoidal transform coding is described in, e.g., U.S. Patent No. 4,865,068.
For transition frames, and for voiced frames for which the phase distortion threshold is at or below the periodicity parameter, the multimode coder of FIG. 6 advantageously employs CELP coding at full rate, or 8 kbps, by means of the high-rate time-domain encoder 512. Alternatively, any other known form of high-rate time-domain coding could be used for such frames. Hence, transition frames (and voiced frames that are not sufficiently periodic) are encoded with high precision so that the waveforms at the input and the output match well, because the phase information is well preserved. In one embodiment, after a predetermined number of consecutive voiced frames for which the threshold exceeds the periodicity measure has been processed, the multimode coder switches from half-rate spectral coding to full-rate CELP coding for one frame, regardless of the decision of the threshold comparator 520.
It should be noted that the energy calculator 506 and the voiced speech detector 508, together with the control processor 504, constitute an open-loop encoding decision. In contrast, the spectral encoder 514, the spectral decoder 516, the error calculator 518, the threshold comparator 520, and the buffer 522, together with the control processor 504, constitute a closed-loop encoding decision.
In one embodiment, described with reference to FIG. 7, sufficiently periodic voiced frames are encoded at a low bit rate using spectral coding, preferably harmonic coding. Spectral coders are generally defined as algorithms that attempt to preserve the time evolution of the spectral characteristics of speech in a perceptually meaningful way by modeling and encoding each speech frame in the frequency domain. The essential parts of such algorithms are: (1) spectral analysis, or parameter estimation; (2) parameter quantization; and (3) synthesis of the output speech waveform with the decoded parameters. The objective is thus to preserve the important characteristics of the short-term speech spectrum with a set of spectral parameters, encode the parameters, and then synthesize the output speech using the decoded spectral parameters. Typically, the output speech is synthesized as a weighted sum of sinusoids. The amplitudes, frequencies, and phases of the sinusoids are the spectral parameters estimated during the analysis.
"Analysis by synthesis" is a well-known technique in CELP coding, but the technique is not exploited in spectral coding. The primary reason analysis by synthesis is not applied to spectral coders is that, owing to the loss of the initial phase information, the mean squared error (MSE) of the synthesized speech may be high even though the speech model is performing well from a perceptual standpoint. Hence, a further advantage of correctly generating the initial phase is the resulting ability to compare the speech samples directly with the reconstructed speech, allowing a determination of whether the speech model is encoding the speech frames accurately.
In spectral coding, an output speech frame is synthesized as follows:
S[n] = Sv[n] + Suv[n],  n = 1, 2, …, N,
where N is the number of samples per frame, and Sv and Suv are the voiced and unvoiced components, respectively. A sum-of-sinusoids synthesis process creates the voiced component as follows:

Sv[n] = Σ A(k, n)·cos(2πfk·n + θ(k, n)),  k = 1, 2, …, L,
where L is the total number of sinusoids, fk are the frequencies of interest in the short-term spectrum, A(k, n) are the sinusoid amplitudes, and θ(k, n) are the sinusoid phases. The amplitude, frequency, and phase parameters are estimated from the short-term spectrum of the input frame by a spectral analysis process. The unvoiced component can be created together with the voiced part in a single sum-of-sinusoids synthesis, or it can be computed separately by a dedicated unvoiced-synthesis process and then added back to Sv.
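The sum-of-sinusoids synthesis of the voiced component can be sketched as follows. For simplicity, this sketch holds A(k, n) and θ(k, n) constant over the frame, whereas the coder interpolates them across the frame as described below; the function and parameter names are ours, and the harmonic frequencies fk = k·F0 correspond to the voiced-frame case.

```python
import numpy as np

def synthesize_voiced(amps, phases, f0, n_samples, fs=8000):
    """Sum-of-sinusoids synthesis of the voiced component Sv[n].

    Harmonic k has frequency k*f0; amplitude and phase are held
    constant per harmonic here (a simplification of A(k,n), theta(k,n)).
    """
    n = np.arange(n_samples)
    sv = np.zeros(n_samples)
    for k, (a, th) in enumerate(zip(amps, phases), start=1):
        sv += a * np.cos(2 * np.pi * k * f0 / fs * n + th)
    return sv

# Two harmonics of a 200 Hz pitch over one 20 ms frame (160 samples).
out = synthesize_voiced([1.0, 0.5], [0.0, 0.0], f0=200, n_samples=160)
print(out.shape)  # (160,)
```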
In the embodiment of FIG. 7, a particular type of spectral coder called a harmonic coder is used to spectrally encode sufficiently periodic voiced frames at a low bit rate. Harmonic coders characterize a frame as a sum of sinusoids, analyzing small segments of the frame. Each sinusoid in the sum of sinusoids has a frequency that is an integer multiple of the pitch, F0, of the frame. In alternative embodiments in which the particular type of spectral coder used is not a harmonic coder, the sinusoid frequencies for each frame are taken from a set of real numbers between 0 and 2π. In the embodiment of FIG. 7, the amplitude and phase of each sinusoid in the sum are advantageously chosen so that the sum best matches the signal over one period, as illustrated in the graph of FIG. 8. Harmonic coders typically employ an external classification, labeling each input speech frame as voiced or unvoiced. For a voiced frame, the sinusoid frequencies are restricted to harmonics of the estimated pitch (F0), i.e., fk = kF0. For unvoiced speech, the peaks of the short-term spectrum are used to determine the sinusoids. The amplitudes and phases are interpolated to mimic their evolution over the frame, as follows:
A(k, n) = C1(k)*n + C2(k)
θ(k, n) = B1(k)*n² + B2(k)*n + B3(k)
where the coefficients [Ci(k), Bi(k)] are estimated from the instantaneous values of the amplitudes, frequencies, and phases at the particular frequency locations fk (= kF0) of the short-term Fourier transform (STFT) of the windowed input speech frame. The parameters to be transmitted for each sinusoid are the amplitude and the frequency. The phase is not transmitted; instead, it is modeled in accordance with any of several known techniques, including, e.g., the quadratic phase model or any conventional polynomial representation of phase.
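The two interpolation tracks above can be written directly as small helper functions. The names are illustrative; note that at n = 0 the quadratic phase track reduces to the initial-phase coefficient B3(k), which is the quantity the phase estimator described later must supply.

```python
def amplitude(n, c1, c2):
    """Linear amplitude track: A(k,n) = C1(k)*n + C2(k)."""
    return c1 * n + c2

def phase(n, b1, b2, b3):
    """Quadratic phase track: theta(k,n) = B1(k)*n^2 + B2(k)*n + B3(k).

    B3(k) is the initial phase of the current frame.
    """
    return b1 * n * n + b2 * n + b3

# At n = 0 the phase reduces to the initial-phase coefficient B3.
print(phase(0, b1=1e-6, b2=0.05, b3=0.7))  # 0.7
print(amplitude(10, c1=0.01, c2=1.0))      # 1.1
```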
As illustrated in FIG. 7, the harmonic coder includes a pitch extractor 600 coupled to windowing logic 602 and to discrete Fourier transform (DFT) and harmonic analysis logic 604. The pitch extractor 600, which receives the speech samples, S(n), as input, is also coupled to the DFT and harmonic analysis logic 604. The DFT and harmonic analysis logic 604 is coupled to a residual encoder 606. The pitch extractor 600, the DFT and harmonic analysis logic 604, and the residual encoder 606 are each coupled to a parameter quantizer 608. The parameter quantizer 608 is coupled to a channel encoder 610, which in turn is coupled to a transmitter 612. The transmitter 612 is coupled by a standard radio-frequency (RF) interface, such as, e.g., a code division multiple access (CDMA) over-the-air interface, to a receiver 614. The receiver 614 is coupled to a channel decoder 616, which in turn is coupled to a dequantizer 618. The dequantizer 618 is coupled to a sum-of-sinusoids speech synthesizer 620. Also coupled to the sum-of-sinusoids speech synthesizer 620 is a phase estimator 622, which receives previous-frame information as input. The sum-of-sinusoids speech synthesizer 620 is configured to produce a synthesized speech output, SSYNTH(n).
The pitch extractor 600, the windowing logic 602, the DFT and harmonic analysis logic 604, the residual encoder 606, the parameter quantizer 608, the channel encoder 610, the channel decoder 616, the dequantizer 618, the sum-of-sinusoids speech synthesizer 620, and the phase estimator 622 can be implemented in a variety of ways known to those of skill in the art, including, e.g., as firmware or software modules. The transmitter 612 and the receiver 614 can be implemented with any equivalent standard RF components known to those of skill in the art.
In the harmonic coder of FIG. 7, the input samples, S(n), are received by the pitch extractor 600, which extracts the pitch frequency information, F0. The samples are then multiplied by a suitable windowing function by the windowing logic 602 to allow analysis of small segments of a speech frame. Using the pitch information provided by the pitch extractor 600, the DFT and harmonic analysis logic 604 computes the DFT of the samples to generate complex spectral points from which the harmonic amplitudes, AI, are extracted, as illustrated in the graph of FIG. 8, in which L denotes the total number of harmonics. The DFT is provided to the residual encoder 606, which extracts the voicing information, Vc.
It should be noted that, as shown in FIG. 8, the Vc parameter denotes the point on the frequency axis above which the spectrum is characteristic of an unvoiced speech signal and is no longer harmonic. Below the point Vc, in contrast, the spectrum is harmonic and characteristic of voiced speech.
The AI, F0, and Vc components are provided to the parameter quantizer 608, which quantizes the information. The quantized information is provided in the form of packets to the channel encoder 610, which encodes the packets at a low bit rate, such as, e.g., half rate, or 4 kbps. The packets are provided to the transmitter 612, which modulates the packets and transmits the resulting signal over the air to the receiver 614. The receiver 614 receives and demodulates the signal, passing the encoded packets to the channel decoder 616. The channel decoder 616 decodes the packets and provides the decoded packets to the dequantizer 618. The dequantizer 618 dequantizes the information. The information is provided to the sum-of-sinusoids speech synthesizer 620.
The sum-of-sinusoids speech synthesizer 620 is configured to synthesize a plurality of sinusoids modeling the short-term speech spectrum in accordance with the equation for S[n] above. The frequencies of the sinusoids, fk, are multiples, or harmonics, of the fundamental frequency, F0, which is the frequency of the pitch periodicity of the quasi-periodic (i.e., transitory) voiced speech segment.
The sum-of-sinusoids speech synthesizer 620 also receives phase information from the phase estimator 622. The phase estimator 622 receives previous-frame information, i.e., the AI, F0, and Vc parameters of the immediately preceding frame. The phase estimator 622 also receives the N reconstructed samples of the previous frame, where N is the frame length (i.e., N is the number of samples per frame). The phase estimator 622 determines the initial phase for the frame based on the information of the previous frame. The initial phase determination is provided to the sum-of-sinusoids speech synthesizer 620. Based on the current-frame information, and on the initial-phase calculation that the phase estimator 622 performs from the past-frame information, the sum-of-sinusoids speech synthesizer 620 produces a synthesized speech frame, as described above.
As described above, a harmonic coder synthesizes, or reconstructs, a speech frame using previous-frame information and predicting that the phase varies linearly from frame to frame. In the synthesis model described above, commonly referred to as the quadratic phase model, the coefficient B3(k) represents the initial phase of the current voiced frame being synthesized. In determining the phase, conventional harmonic coders set the initial phase to zero, or generate an initial phase value randomly or with some pseudo-random generation method. To predict the phase more accurately, the phase estimator 622 uses one of two possible methods of determining the initial phase, depending on whether the immediately preceding frame was determined to be a voiced speech frame (i.e., a sufficiently periodic frame) or a transition speech frame. If the previous frame was a voiced speech frame, the final estimated phase value of that frame is used as the initial phase value of the current frame. If, on the other hand, the previous frame was classified as a transition frame, the initial phase value of the current frame is obtained from the spectrum of the previous frame, which is obtained by performing a DFT of the decoder output for the previous frame. Hence, the phase estimator 622 makes use of accurate phase information that is already available (because the previous frame, being a transition frame, was processed at full rate).
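The two-way initial-phase choice can be sketched as follows. This is an illustrative sketch under our own naming: `'V'` stands for a previous spectrally coded voiced frame and `'T'` for a previous full-rate transition frame; the pitch-synchronous DFT variant mentioned later is omitted, and a whole-frame DFT with the pitch falling exactly on bin `f0_bin` is assumed.

```python
import numpy as np

def initial_phases(prev_mode, prev_final_phases, prev_decoded=None,
                   n_harmonics=None, f0_bin=None):
    """Per-harmonic initial phases for the current voiced frame.

    prev_mode 'V': reuse the final estimated phases of the previous
    spectrally coded frame. prev_mode 'T': measure the actual harmonic
    phases from a DFT of the previous frame's decoded output.
    """
    if prev_mode == 'V':
        return list(prev_final_phases)
    spectrum = np.fft.rfft(prev_decoded)
    return [float(np.angle(spectrum[k * f0_bin]))
            for k in range(1, n_harmonics + 1)]

# Previous frame was full rate (T mode): derive the phase from its DFT.
n = np.arange(160)
prev = np.cos(2 * np.pi * 8 * n / 160 + 0.3)  # harmonic at bin 8, phase 0.3
ph = initial_phases('T', None, prev_decoded=prev, n_harmonics=1, f0_bin=8)
print(abs(ph[0] - 0.3) < 1e-6)  # True: the measured phase is recovered
```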
In one embodiment, a closed-loop multimode MDLP speech coder follows the speech-processing steps depicted in the flowchart of FIG. 9. The speech coder encodes the LP residue of each input speech frame by choosing the most appropriate encoding mode. Some modes encode the LP residue, or the speech residue, in the time domain, while others represent the LP residue, or the speech residue, in the frequency domain. The set of modes is: full-rate time domain for transition frames (T mode); half-rate frequency domain for voiced frames (V mode); quarter-rate time domain for unvoiced frames (U mode); and eighth-rate time domain for noise frames (N mode).
Those skilled in the art would appreciate that either the speech signal or the corresponding LP residue may be encoded by following the steps shown in FIG. 9. The waveform characteristics of noise, unvoiced, transition, and voiced speech can be seen as a function of time in the graph of FIG. 10A. The waveform characteristics of noise, unvoiced, transition, and voiced LP residue can be seen as a function of time in the graph of FIG. 10B.
In step 700, an open-loop mode decision is made as to which of the four modes (T, V, U, or N) to apply to the input speech residue, S(n). If T mode is to be applied, the speech residue is processed in T mode, i.e., at full rate in the time domain, in step 702. If U mode is to be applied, the speech residue is processed in U mode, i.e., at quarter rate in the time domain, in step 704. If N mode is to be applied, the speech residue is processed in N mode, i.e., at eighth rate in the time domain, in step 706. If V mode is to be applied, the speech residue is processed in V mode, i.e., at half rate in the frequency domain, in step 708.
In step 710, the speech encoded in step 708 is decoded and compared with the input speech residue, S(n), and a performance measure, D, is computed. In step 712, the performance measure, D, is compared with a predefined threshold, T. If the performance measure, D, is greater than or equal to the threshold, T, the spectrally encoded speech residue of step 708 is approved for transmission in step 714. If, on the other hand, the performance measure, D, is less than the threshold, T, the input speech residue, S(n), is processed in T mode in step 716. In an alternative embodiment, no performance measure is computed, and no threshold is defined; instead, after a predefined number of speech residue frames has been processed in V mode, the next frame is processed in T mode.
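The closed-loop check of steps 710-716 can be sketched as a small decision function. The function and variable names are ours, and the numeric values are placeholders; the point is only the comparison logic: keep V mode when the performance measure D meets the threshold T, otherwise upgrade the frame to full-rate time-domain coding.

```python
def choose_mode(D, T, open_loop_mode='V'):
    """Closed-loop check after an open-loop mode decision.

    Only V-mode (half-rate spectral) frames are re-examined: the
    frame stays in V mode if the performance measure D meets the
    threshold T, and is upgraded to T mode (full rate) otherwise.
    """
    if open_loop_mode != 'V':
        return open_loop_mode
    return 'V' if D >= T else 'T'

print(choose_mode(D=35.0, T=30.0))  # 'V': spectral coding performs well enough
print(choose_mode(D=22.0, T=30.0))  # 'T': upgrade the frame to full rate
```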
Advantageously, the decision steps shown in FIG. 9 allow the high-bit-rate T mode to be used only when needed, exploiting the periodicity of voiced speech segments with the lower-bit-rate V mode, while preventing any degradation of quality by switching to full rate when the V mode does not perform adequately. Accordingly, an extremely high voice quality approaching that of full rate may be generated at an average rate significantly lower than full rate. Moreover, the target voice quality can be controlled by the performance measure selected and the threshold chosen.
The "updates" to T mode also improve the performance of subsequent applications of V mode by keeping the model phase track close to the phase track of the input speech. When V-mode performance is inadequate, the closed-loop performance checks of steps 710 and 712 switch to T mode, which improves the performance of subsequent V-mode processing by "refreshing" the initial phase value, allowing the model phase track to become close to the original input speech phase track once again. By way of example, as shown in the graphs of FIGS. 11A-C, the fifth frame from the beginning does not perform adequately in V mode, as evidenced by the PSNR distortion measure used. Consequently, without the closed-loop decision and update, the modeled phase track deviates significantly from the original input speech phase track, resulting in a severe degradation of PSNR, as shown in FIG. 11C. Moreover, the performance of subsequent frames processed in V mode degrades. With the closed-loop decision, however, the fifth frame is switched to T-mode processing, as shown in FIG. 11A. The performance of the fifth frame is significantly improved by the update, as is evident from the improvement in PSNR shown in FIG. 11B. The performance of the subsequent frames processed in V mode is also improved.
The decision steps shown in FIG. 9 also enhance the quality of the V-mode representation by providing a highly accurate initial phase estimate, which ensures that the resulting V-mode synthesized speech residue signal is accurately time-aligned with the original input speech residue, S(n). The initial phase for the first V-mode-processed speech residue segment is derived from the immediately preceding decoded frame in the following manner. For each harmonic, if the previous frame was processed in V mode, the initial phase is set equal to the final estimated phase of the previous frame. For each harmonic, if the previous frame was processed in T mode, the initial phase is set equal to the actual harmonic phase of the previous frame. The actual harmonic phases of the previous frame may be derived by taking a DFT of the past decoded residue using the entire previous frame. Alternatively, the actual harmonic phases of the previous frame may be derived by taking a DFT of the past decoded frame in a pitch-synchronous manner, processing various pitch periods of the previous frame.
In one embodiment, described with reference to FIG. 12, successive frames of a quasi-periodic signal, S, are input to analysis logic 800. The quasi-periodic signal, S, may be, e.g., a speech signal. Some frames of the signal are periodic, while other frames are nonperiodic, or aperiodic. The analysis logic 800 measures the amplitude of the signal and outputs the measured amplitude, A. The analysis logic 800 also measures the phase of the signal and outputs the measured phase, P. The amplitude, A, is provided to synthesis logic 802. A phase value, POUT, is also provided to the synthesis logic 802. The phase value, POUT, may be the measured phase value, P, or it may be an estimated phase value, PEST, as described below. The synthesis logic 802 synthesizes the signal and outputs a synthesized signal, SSYNTH.
The quasi-periodic signal, S, is also provided to classification logic 804, which classifies the signal as either nonperiodic or periodic. For nonperiodic frames of the signal, the phase, POUT, provided to the synthesis logic 802 is set equal to the measured phase, P. Periodic frames of the signal are provided to closed-loop phase estimation logic 806. The quasi-periodic signal, S, is also provided to the closed-loop phase estimation logic 806. The closed-loop phase estimation logic 806 estimates the phase and outputs an estimated phase, PEST. The phase estimate is based upon an initial phase value, PINIT, which is input to the closed-loop phase estimation logic 806. If the previous frame was classified as periodic by the classification logic 804, the initial phase value is the final estimated phase value for the previous frame of the signal. If the previous frame was classified as nonperiodic by the classification logic 804, the initial phase value is the measured phase value, P, for the previous frame.
The estimated phase, PEST, is provided to error computation logic 808. The quasi-periodic signal, S, is also provided to the error computation logic 808, as is the measured phase, P. The error computation logic 808 additionally receives a synthesized signal, SSYNTH', that has been synthesized by the synthesis logic 802. The synthesized signal, SSYNTH', is the signal, SSYNTH, that the synthesis logic 802 synthesizes when the phase, POUT, input to the synthesis logic 802 equals the estimated phase, PEST. The error computation logic 808 computes a distortion measure, or error measure, E, by comparing the measured phase values with the estimated phase values. In an alternative embodiment, the error computation logic 808 computes the distortion measure, or error measure, E, by comparing the input frame of the quasi-periodic signal with the synthesized frame of the quasi-periodic signal.
The distortion measure, E, is provided to comparison logic 810, which compares the distortion measure, E, with a predefined threshold, T. If the distortion measure, E, is greater than the predefined threshold, T, the measured phase, P, is set equal to the phase value, POUT, provided to the synthesis logic 802. If, on the other hand, the distortion measure, E, is not greater than the predefined threshold, T, the estimated phase, PEST, is set equal to the phase value, POUT, provided to the synthesis logic 802.
Thus, a novel method and apparatus for tracking the phase of a quasi-periodic signal have been described. Those of skill in the art would understand that the various illustrative logical blocks and algorithm steps described in connection with the embodiments disclosed herein may be implemented or performed with a digital signal processor (DSP), an application-specific integrated circuit (ASIC), discrete gate or transistor logic, discrete hardware components such as, e.g., registers and a FIFO, a processor executing a set of firmware instructions, or any conventional programmable software module and a processor. The processor may advantageously be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. The software module could reside in RAM memory, flash memory, registers, or any other form of writable storage medium known in the art. Those of skill would further appreciate that the data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description are advantageously represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Preferred embodiments of the present invention have thus been shown and described. It would be apparent to one of ordinary skill in the art, however, that numerous alterations may be made to the embodiments herein disclosed without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited except in accordance with the following claims.
Claims (18)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2000/005141 WO2002003381A1 (en) | 2000-02-29 | 2000-02-29 | Method and apparatus for tracking the phase of a quasi-periodic signal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1437746A CN1437746A (en) | 2003-08-20 |
| CN1262991C true CN1262991C (en) | 2006-07-05 |
Family
ID=21741099
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB008192006A Expired - Lifetime CN1262991C (en) | 2000-02-29 | 2000-02-29 | Method and apparatus for tracking the phase of a quasi-periodic signal |
Country Status (7)
| Country | Link |
|---|---|
| EP (1) | EP1259955B1 (en) |
| JP (1) | JP4567289B2 (en) |
| KR (1) | KR100711040B1 (en) |
| CN (1) | CN1262991C (en) |
| AU (1) | AU2000233852A1 (en) |
| DE (1) | DE60025471T2 (en) |
| WO (1) | WO2002003381A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103811011A (en) * | 2012-11-02 | 2014-05-21 | 富士通株式会社 | Audio sine wave detection method and device |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104347082B (en) * | 2013-07-24 | 2017-10-24 | 富士通株式会社 | String ripple frame detection method and equipment and audio coding method and equipment |
| EP2963646A1 (en) | 2014-07-01 | 2016-01-06 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Decoder and method for decoding an audio signal, encoder and method for encoding an audio signal |
| CN108776319B (en) * | 2018-04-25 | 2022-11-08 | 中国电力科学研究院有限公司 | Optical fiber current transformer data accuracy self-diagnosis method and system |
| CN109917360A (en) * | 2019-03-01 | 2019-06-21 | 吉林大学 | A Staggered PRI Estimation Method for Aliased Pulses |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2759646B2 (en) * | 1985-03-18 | 1998-05-28 | マサチユ−セツツ インステイテユ−ト オブ テクノロジ− | Sound waveform processing |
| CA1332982C (en) * | 1987-04-02 | 1994-11-08 | Robert J. Mcauley | Coding of acoustic waveforms |
| US5023910A (en) * | 1988-04-08 | 1991-06-11 | At&T Bell Laboratories | Vector quantization in a harmonic speech coding arrangement |
| JPH02288739A (en) * | 1989-04-28 | 1990-11-28 | Fujitsu Ltd | Voice coding and decoding transmission system |
| US5765127A (en) * | 1992-03-18 | 1998-06-09 | Sony Corp | High efficiency encoding method |
| US5787387A (en) * | 1994-07-11 | 1998-07-28 | Voxware, Inc. | Harmonic adaptive speech coding method and system |
| JP3680374B2 (en) * | 1995-09-28 | 2005-08-10 | ソニー株式会社 | Speech synthesis method |
| JPH10214100A (en) * | 1997-01-31 | 1998-08-11 | Sony Corp | Voice synthesizing method |
| US6233550B1 (en) * | 1997-08-29 | 2001-05-15 | The Regents Of The University Of California | Method and apparatus for hybrid coding of speech at 4kbps |
| JPH11224099A (en) * | 1998-02-06 | 1999-08-17 | Sony Corp | Phase quantization apparatus and method |
- 2000-02-29 DE DE60025471T patent/DE60025471T2/en not_active Expired - Lifetime
- 2000-02-29 WO PCT/US2000/005141 patent/WO2002003381A1/en not_active Ceased
- 2000-02-29 KR KR1020027011075A patent/KR100711040B1/en not_active Expired - Lifetime
- 2000-02-29 AU AU2000233852A patent/AU2000233852A1/en not_active Abandoned
- 2000-02-29 JP JP2002507369A patent/JP4567289B2/en not_active Expired - Lifetime
- 2000-02-29 EP EP00912054A patent/EP1259955B1/en not_active Expired - Lifetime
- 2000-02-29 CN CNB008192006A patent/CN1262991C/en not_active Expired - Lifetime
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103811011A (en) * | 2012-11-02 | 2014-05-21 | 富士通株式会社 | Audio sine wave detection method and device |
Also Published As
| Publication number | Publication date |
|---|---|
| JP4567289B2 (en) | 2010-10-20 |
| KR20020081352A (en) | 2002-10-26 |
| EP1259955B1 (en) | 2006-01-11 |
| CN1437746A (en) | 2003-08-20 |
| EP1259955A1 (en) | 2002-11-27 |
| JP2004502203A (en) | 2004-01-22 |
| DE60025471D1 (en) | 2006-04-06 |
| AU2000233852A1 (en) | 2002-01-14 |
| DE60025471T2 (en) | 2006-08-24 |
| HK1055834A1 (en) | 2004-01-21 |
| WO2002003381A1 (en) | 2002-01-10 |
| KR100711040B1 (en) | 2007-04-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN1266674C (en) | Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder | |
| CN100350453C (en) | Robust speech classification method and device | |
| US6640209B1 (en) | Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder | |
| CN1302459C (en) | A low-bit-rate coding method and apparatus for unvoiced speed | |
| CN1188832C (en) | Multipulse interpolative coding of transition speech frames | |
| US6449592B1 (en) | Method and apparatus for tracking the phase of a quasi-periodic signal | |
| CN1262991C (en) | Method and apparatus for tracking the phase of a quasi-periodic signal | |
| HK1055834B (en) | Method and apparatus for tracking the phase of a quasi-periodic signal | |
| JP2011090311A (en) | Linear prediction voice coder in mixed domain of multimode of closed loop | |
| HK1055833B (en) | Closed-loop multimode mixed-domain linear prediction speech coder and method of processing frames | |
| HK1067444B (en) | Method and apparatus for robust speech classification | |
| HK1055174A1 (en) | Frame erasure compensation method in a variable rate speech coder and apparautus using the same |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CX01 | Expiry of patent term |
Granted publication date: 20060705 |
|
| CX01 | Expiry of patent term |