WO1994010682A1

WO1994010682A1 - Method of encoding speech

Info

Publication number: WO1994010682A1
Application number: PCT/DE1993/000999
Authority: WO
Inventors: Bertram Wächter
Original assignee: ANT Nachrichtentechnik GmbH
Current assignee: Bosch Telecom GmbH
Priority date: 1992-10-28
Filing date: 1993-10-20
Publication date: 1994-05-11
Anticipated expiration: 1995-04-28
Also published as: DE4236315C1; AU5174293A

Abstract

The invention concerns a speech-encoding method using analysis-by-synthesis techniques. The speech signal is sampled, a frame is formed from a predetermined number of samples, and the coefficients of a speech-synthesis filter of order P are determined from the samples in each frame. Using these coefficients, a number P of so-called line-spectrum parameters (LSPs) are determined and quantized for transmission over a channel with limited transmission capacity. The method is characterized in that every other line-spectrum parameter (LSP) is scalar (absolutely) quantized and that the line-spectrum parameters (LSPs) lying between them are transformed (normalized) before quantization. The invention is intended particularly for speech-encoding equipment in portable radio equipment.

Description

Method for speech coding

The invention is based on a method for speech coding using the analysis-by-synthesis method according to the preamble of claim 1 or 2. Such speech coding methods are known, for example from German Patent 38 34 871.

Common to these speech coding methods is a prediction analysis of the input signal (linear prediction coder, LPC). The speech signal at the input of the encoder is divided into frames of a fixed duration, e.g. 20-30 ms. Each speech frame is subjected in the encoder to a linear prediction analysis, which removes linear dependencies in the speech signal. The linear prediction is carried out with FIR (finite impulse response) filters. The coefficients of these linear filters are redetermined in every frame, i.e. the filters are adaptive.

Today's speech coders operating at bit rates between 4 and 16 kbit/s generally use the analysis-by-synthesis method, in which the filter coefficients mentioned above and an associated excitation are determined in the transmitter such that the energy of the weighted error e(n) between the original speech and the synthesized speech becomes as small as possible.

Parameters describing the excitation, and the coefficients of the linear filter already mentioned above, must be transmitted to the receiver. The determination of the coefficients of the linear filter will not be discussed in detail here. The result is a non-recursive filter of order P with the transfer function A(z) [equation not reproduced].

The inverse transfer function H(z) = 1/A(z) converts the error signal (the excitation) into the (synthesized) speech signal [equation not reproduced].

The filter H(z) calculated by this method is always stable as long as the filter coefficients a_i are not quantized.

However, the filter coefficients a_i have a large dynamic range and are therefore poorly suited for quantization and transmission. Moreover, there is no simple way to check the stability of the recursive filter in the receiver.

It is known that the so-called line spectrum parameters (LSPs) are well suited for quantization and transmission, i.e. for describing the predictor filter H(z). These parameters are obtained as the zeros of a symmetric polynomial

F_1(z) = A(z) + z^{-(P+1)} A(z^{-1})

and of an antisymmetric polynomial

F_2(z) = A(z) - z^{-(P+1)} A(z^{-1})

The zeros z_{0i} of F_1 and F_2 have the following properties:

- all zeros lie on the unit circle and are therefore fully described by specifying a phase ω_i
- all zeros are simple
- on the unit circle, zeros of F_1 and F_2 alternate.

Figure 2 shows the zeros of F_1(z) and F_2(z) for the cases P = 6 and P = 5. All zeros z_i can be represented by their arguments ω_i or by the frequency value derived from them [equation not reproduced].

Since the zeros occur in complex-conjugate pairs and zeros at ±1 are present in any case, the polynomials F_1 and F_2 are completely determined by specifying P values ω_i.

According to the properties described above, the following must hold:

ω_1 < ω_2 < ... < ω_P

This monotonicity property is mandatory for the recursive filter H(z) to be stable. It therefore provides a criterion for checking the stability of the filter.
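The relationship between A(z), the polynomials F_1 and F_2, and the ordering criterion can be illustrated with a short Python sketch. It is not taken from the patent. It assumes the convention A(z) = 1 + a_1 z^{-1} + ... + a_P z^{-P} (the source text does not reproduce its own sign convention), and it locates the zeros by scanning the unit circle for sign changes, which is only one of several possible root-finding approaches. The function names and the grid size are hypothetical.

```python
import math

def _bisect(g, lo, hi, iters=60):
    """Refine a sign change of g inside [lo, hi] by bisection."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

def lsp_from_lpc(a, grid=512):
    """Return the LSPs (radians, ascending) of A(z), given a = [1, a_1, ..., a_P].

    Assumption: A(z) = 1 + a_1 z^-1 + ... + a_P z^-P.
    """
    p = len(a) - 1
    ext = list(a) + [0.0]
    # Coefficients of F1(z) = A(z) + z^-(P+1) A(1/z) and F2(z) = A(z) - z^-(P+1) A(1/z)
    f1 = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]
    f2 = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]
    half = (p + 1) / 2.0

    # On the unit circle, F1 and F2 reduce to real functions of omega
    # after removing the common linear-phase factor.
    def g1(w):
        return sum(c * math.cos(w * (half - k)) for k, c in enumerate(f1))

    def g2(w):
        return sum(c * math.sin(w * (half - k)) for k, c in enumerate(f2))

    roots = []
    for g in (g1, g2):
        lo = math.pi / grid              # open interval: skips the fixed zeros at +/-1
        for k in range(2, grid):
            hi = math.pi * k / grid
            if g(lo) * g(hi) < 0.0:
                roots.append(_bisect(g, lo, hi))
            lo = hi
    return sorted(roots)

def is_strictly_increasing(omegas):
    """Ordering criterion applied to a (decoded) LSP vector."""
    return all(x < y for x, y in zip(omegas, omegas[1:]))

# Hypothetical second-order example: A(z) = 1 - 0.9 z^-1 + 0.5 z^-2
lsp = lsp_from_lpc([1.0, -0.9, 0.5])
```

For this example F_1 contributes the zero at arccos(0.7) and F_2 the zero at arccos(0.2), so the two values alternate between the polynomials as described. In the receiver, `is_strictly_increasing` would be applied to the decoded values, since the sorted output of `lsp_from_lpc` is ordered by construction.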

When the spectral characteristic of the input signal changes, the distribution of individual LSPs changes considerably. As an example, Fig. 1 shows the distribution of the LSPs for filter order P = 10. In the upper plot, Fig. 1a, the input speech is only low-pass filtered; in the lower plot, Fig. 1b, it is IRS filtered (band-limited) according to CCITT P.48.

A common method is scalar quantization of each individual LSP. For example, in the 4.8 kbit/s CELP speech codec according to Federal Standard 1016 of the US Department of Defense, the line spectrum parameters are scalar quantized with a total of 34 bits.

During quantization it must be ensured that the monotonicity property is preserved after quantization as well, so that the recursive filter remains stable; i.e. the ordering ω_1 < ω_2 < ... < ω_P must also hold for the quantized values.

Since the value ranges of the quantizers for ω_i and ω_{i+1} overlap, all quantization levels of ω_{i+1} that would violate this strict monotonicity are excluded once ω_i has been quantized (see Figure 3). Conversely, once ω_{i+1} has been quantized, some values of the quantizer for ω_i are no longer admissible. This means that part of the bits available for quantizing the LSPs is not fully used. In Figure 3, only 5 of the 8 possible levels for ω_{i+1} are actually used.
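The loss described here can be made concrete with a small Python sketch. The quantizer table and the decoded value below are hypothetical and were chosen only so that 5 of 8 levels remain admissible, as in the Figure 3 example; they are not values from the patent.

```python
import math

def admissible_levels(levels_next, omega_i_hat):
    """Levels of the omega_{i+1} quantizer that keep the ordering once omega_i is fixed."""
    return [v for v in levels_next if v > omega_i_hat]

# Hypothetical 3-bit quantizer table for omega_{i+1} (8 levels)
levels_next = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
# Hypothetical quantized value of omega_i that falls inside the overlapping range
omega_i_hat = 0.42

usable = admissible_levels(levels_next, omega_i_hat)
spent_bits = math.log2(len(levels_next))   # bits transmitted for omega_{i+1}
useful_bits = math.log2(len(usable))       # information those bits can still carry
```

With these numbers 3 bits are transmitted, but only about 2.3 bits of resolution are effectively available for ω_{i+1}.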

A further disadvantage of this method is that it cannot adapt to different input spectra of the speech signal. If the quantizer is to be usable for such spectra, the value range of individual line spectrum parameters has to be enlarged, which increases the bit rate.

References [5] and [6] propose reducing the bit rate for transmitting the line spectrum parameters by quantizing their differences. The first LSP is scalar quantized as above. For all further LSPs, the difference from the preceding value is calculated and then quantized.

This method adapts well to different input spectra of the speech signal, since only the value range of the first LSP has to be chosen sufficiently large.

A disadvantage of this method is the propagation of errors. If an error occurs in the transmission of ω_x, all ω_i for i = x to P are decoded incorrectly.
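The error propagation of difference quantization can be illustrated with the following Python sketch. It is a simplified model, not the scheme of references [5] and [6]: it assumes a single uniform step size for all differences, treats the first value as a difference from zero, and forms each difference against the previously decoded value. All names and numbers are hypothetical.

```python
def diff_encode(omegas, step):
    """Quantize each LSP as a difference from the previously decoded value."""
    indices, prev_hat = [], 0.0
    for w in omegas:
        idx = round((w - prev_hat) / step)
        indices.append(idx)
        prev_hat += idx * step
    return indices

def diff_decode(indices, step):
    """Rebuild the LSPs by accumulating the decoded differences."""
    out, acc = [], 0.0
    for idx in indices:
        acc += idx * step
        out.append(acc)
    return out

# Hypothetical frame with P = 6 and a uniform step of 0.05 rad
omegas = [0.30, 0.60, 1.00, 1.50, 2.10, 2.60]
step = 0.05

clean = diff_encode(omegas, step)
corrupted = list(clean)
corrupted[2] += 1                      # transmission error in omega_3 (list position 2)

ok = diff_decode(clean, step)
bad = diff_decode(corrupted, step)
wrong = [i for i, (u, v) in enumerate(zip(ok, bad)) if abs(u - v) > 1e-9]
```

In this model a single corrupted index at position x shifts every decoded value from x up to P, which matches the behaviour described above.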

The object of the present invention was to specify a method of the type mentioned at the outset which is able to achieve an improvement in speech quality at the same bit rate, or a reduction in bit rate at the same speech quality. In addition, the sensitivity of the speech codec to speech signals with different input characteristics is to be reduced. The circuit complexity required should not be too high.

This object is achieved by claims 1 and 2. Advantageous refinements result from the dependent claims.

The method according to the invention achieves an improvement in speech quality at the same bit rate, or a reduction in bit rate at the same speech quality. In addition, it makes the speech codec less sensitive to speech signals with very different input spectra. A further advantage is that a transmission error in one LSP affects at most two further LSP values.

The invention is based on the idea of neither scalar quantizing all LSPs nor scalar quantizing only a single one of the P parameters in total, but rather of scalar quantizing only every n-th of the P parameters and transforming (mapping) the parameters lying in between before quantizing them.

The method is described in more detail below with reference to an exemplary embodiment, in which P is assumed to be an even number.

In a first step, every second LSP is scalar quantized [equation not reproduced].

Because of the strict monotonicity, each parameter lying in between must then satisfy ω_{i-1} < ω_i < ω_{i+1} with respect to its two quantized neighbours, where the fictitious value ω_{P+1} is set to the maximum possible value of ω_P. This value range for ω_i changes from frame to frame with the quantized values of ω_{i-1} and ω_{i+1}. Ideally, a separate quantizer for ω_i would be used for every combination of these two quantized values, which is not feasible because of the implementation effort. Instead, the value range is mapped onto the interval ]0,1[ by the following transformation [equation not reproduced].

Each value X_i can now be quantized with a single quantizer and transmitted. The inverse transformation is carried out according to [equation not reproduced].
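The first embodiment can be sketched in Python as follows. Because the transformation equations are not reproduced in this text, the sketch assumes the simplest mapping of the admissible range onto ]0,1[, namely the linear normalization X_i = (ω_i - lo) / (hi - lo), where lo and hi are the quantized neighbours; the patent's actual function may differ. The uniform quantizers, the bit allocations (5 and 3 bits), the choice of π as the maximum value of ω_P and all function names are likewise hypothetical.

```python
import math

def quantize(x, lo, hi, bits):
    """Index of the uniform quantizer cell on [lo, hi] containing x (clamped)."""
    levels = 1 << bits
    idx = int((x - lo) / (hi - lo) * levels)
    return min(levels - 1, max(0, idx))

def dequantize(idx, lo, hi, bits):
    """Cell midpoint, so the reconstruction lies strictly inside ]lo, hi[."""
    return lo + (idx + 0.5) * (hi - lo) / (1 << bits)

def encode(omegas, abs_bits=5, rel_bits=3, omega_max=math.pi):
    """omegas: P ascending LSPs, P even; list position 0 holds omega_1."""
    # Step 1: absolute scalar quantization of omega_1, omega_3, ...
    abs_idx = [quantize(w, 0.0, omega_max, abs_bits) for w in omegas[0::2]]
    abs_hat = [dequantize(i, 0.0, omega_max, abs_bits) for i in abs_idx]
    if any(a >= b for a, b in zip(abs_hat, abs_hat[1:])):
        raise ValueError("absolutely quantized LSPs must stay strictly increasing")
    bounds = abs_hat + [omega_max]     # fictitious omega_{P+1} closes the last range
    # Step 2: normalize omega_2, omega_4, ... between their quantized neighbours
    rel_idx = []
    for j, w in enumerate(omegas[1::2]):
        lo, hi = bounds[j], bounds[j + 1]
        x = (w - lo) / (hi - lo)       # assumed linear normalization onto ]0,1[
        rel_idx.append(quantize(x, 0.0, 1.0, rel_bits))
    return abs_idx, rel_idx

def decode(abs_idx, rel_idx, abs_bits=5, rel_bits=3, omega_max=math.pi):
    abs_hat = [dequantize(i, 0.0, omega_max, abs_bits) for i in abs_idx]
    bounds = abs_hat + [omega_max]
    out = []
    for j, idx in enumerate(rel_idx):
        lo, hi = bounds[j], bounds[j + 1]
        x_hat = dequantize(idx, 0.0, 1.0, rel_bits)
        out += [lo, lo + x_hat * (hi - lo)]   # inverse of the assumed normalization
    return out

# Hypothetical frame with P = 10
omegas = [0.25, 0.45, 0.80, 1.10, 1.45, 1.80, 2.10, 2.40, 2.70, 2.95]
abs_idx, rel_idx = encode(omegas)
decoded = decode(abs_idx, rel_idx)

# Simulated transmission error in the absolute index of omega_5 (list position 4)
hit = list(abs_idx)
hit[2] += 1
damaged = decode(hit, rel_idx)
changed = [i for i, (u, v) in enumerate(zip(decoded, damaged)) if u != v]
```

Because every normalized value is reconstructed strictly inside ]0,1[, the decoded vector is strictly increasing by construction, and every level of the second quantizer is admissible in every frame. In this sketch the simulated error changes only ω_5 and its two neighbours ω_4 and ω_6, which is consistent with the statement that a transmission error in one LSP affects at most two further values.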

The method works in the same way if the parameters that are quantized absolutely are swapped with those that are quantized after normalization, i.e.

quantize absolutely: ω_i, i = 2, (2), P
quantize after transformation: ω_i, i = 1, (2), P - 1

Instead of transforming the LSPs into the image domain, it is also possible to map the quantizer from the image domain into the ω domain according to (13).

Similarly, in the second exemplary embodiment every third LSP is scalar quantized [equation not reproduced] for i = 1, (3), ...

The mapping functions for the parameters lying in between are, for example, [equation not reproduced] or [equation not reproduced], since ω_i is now known.

This solution yields a further reduction of the bit rate at the same quality, or a higher quality at the same bit rate; however, a transmission error here affects at most three further LSP values.

In a corresponding manner, it is also possible to scalar quantize only every fourth LSP and to transform the LSPs lying in between accordingly before transmitting them in quantized form.

Literature

[1] Markel, J.D.; Gray, A.H.: Linear Prediction of Speech. Berlin, Heidelberg, New York: Springer Verlag, 1976

[2] Müller, J.M.; Scheuermann, H.; Wächter, B.: "Ein Beitrag zur Sprachcodierung für Bitraten unter 8 kbit/s" (A contribution to speech coding for bit rates below 8 kbit/s). Frequenz, vol. 43, 9/89, pp. 242-252

[3] N. Sugamura, F. Itakura: "Speech Analysis and Synthesis Methods Developed at ECL in NTT: From LPC to LSP". Speech Communication, vol. 5, 1986, pp. 199-215

[4] J.P. Campbell, V.C. Welch, T.E. Tremain: "The DOD 4.8 kbps Standard", in "Advances in Speech Coding", Kluwer, 1991

[5] F.K. Soong, B.H. Juang: "LSP and Speech Data Compression", Proc. ICASSP-84, March 1984

[6] F.K. Soong, B.H. Juang: "Optimal Quantization of LSP Parameters", Proc. ICASSP-88, April 1988

Claims

1. A method for speech coding using the analysis-by-synthesis method, wherein the speech signal is sampled, a frame is formed from a fixed number of samples and, frame by frame, the coefficients of a speech synthesis filter of order P are determined from the samples, a number P of so-called line spectrum parameters LSP being determined by means of these coefficients and quantized for transmission over a channel with limited transmission capacity, characterized in that every second line spectrum parameter LSP is scalar (absolutely) quantized [equation not reproduced] for i = 1, (2), P-1 or i = 2, (2), P, and that the line spectrum parameters LSP ω_i lying in between, for i = 2, (2), P or i = 1, (2), P-1, are transformed (normalized) before quantization.

2. A method for speech coding using the analysis-by-synthesis method, wherein the speech signal is sampled, a frame is formed from a fixed number of samples and, frame by frame, the coefficients of a speech synthesis filter of order P are determined from the samples, a number P of so-called line spectrum parameters LSP being determined by means of these coefficients and quantized for transmission over a channel with limited transmission capacity, characterized in that every third line spectrum parameter LSP is scalar (absolutely) quantized [equation not reproduced] for i = 1, (3), ... or i = 2, (3), ... or i = 3, (3), ..., and that the line spectrum parameters LSP ω_i lying in between, for i = 2, 3, 5, 6 ... or i = 1, 3, 4, 6, 7 ... or i = 1, 2, 4, 5, 7, 8 ..., are transformed [equation not reproduced] and then quantized.

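3. A method for speech coding using the analysis-by-synthesis method, wherein the speech signal is sampled, a frame is formed from a fixed number of samples and, frame by frame, the coefficients of a speech synthesis filter of order P are determined from the samples, a number P of so-called line spectrum parameters LSP being determined by means of these coefficients and quantized for transmission over a channel with limited transmission capacity, characterized in that every n-th line spectrum parameter LSP is scalar (absolutely) quantized [equation not reproduced] for i = m, (n), P; 1 ≤ m ≤ n, and that the line spectrum parameters LSP ω_i lying in between, for i = 1, ..., P and i ≠ m, (n), P, are transformed (normalized) and then quantized.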

4. The method according to claim 1, characterized in that the transformation is carried out according to the function [equation not reproduced].

5. The method according to claim 2, characterized in that the transformation is carried out according to the mapping functions [equation not reproduced] or [equation not reproduced].