JP2002358090A

JP2002358090A - Speech synthesis method, speech synthesis device, and recording medium

Info

Publication number: JP2002358090A
Application number: JP2002077096A
Authority: JP
Inventors: Takehiko Kagoshima; 岳彦籠嶋; Masami Akamine; 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-03-26
Filing date: 2002-03-19
Publication date: 2002-12-13
Anticipated expiration: 2022-03-19
Also published as: DE60205421D1; EP1246163B1; JP3732793B2; DE60205421T2; EP1246163A2; CN1378199A; CN1185619C; KR20020076144A; EP1246163A3; KR100457414B1

Abstract

(57) [Summary] [Problem] To provide a speech synthesis method, a speech synthesis device, and a recording medium that achieve good sound quality while allowing voice quality and the like to be changed flexibly. [Solution] A speech synthesis method, speech synthesis device, and recording medium that generate a speech signal by superimposing pitch waveforms in accordance with pitch period information, wherein each pitch waveform is generated as the sum of a plurality of formant waveforms, and each formant waveform is generated by applying a window function to a sine wave at the formant frequency.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] [Technical Field of the Invention] The present invention relates to text-to-speech synthesis, and more particularly to speech synthesis that generates a speech signal from information such as a phoneme symbol string, pitch, and phoneme durations.

[0002] [Prior Art] Artificially creating a speech signal from arbitrary text is called text-to-speech synthesis. A text-to-speech synthesis system usually consists of three stages: a language processing unit, a phoneme processing unit, and a speech signal generation unit.

[0003] The input text first undergoes morphological analysis, syntactic analysis, and the like in the language processing unit. The phoneme processing unit then handles accent and intonation and outputs information such as a phoneme symbol string, a pitch pattern (the pattern of change in voice pitch), and phoneme durations. Finally, the speech signal generation unit, that is, the speech synthesizer, synthesizes a speech signal from the phoneme symbol string, pitch pattern, phoneme durations, and related information.

[0004] A synthesizer that can synthesize such an arbitrary phoneme symbol string works as follows. Writing V for a vowel and C for a consonant, it stores feature parameters (speech units) for small basic units such as CV, CVC, and VCV, and synthesizes speech by concatenating these units while controlling their pitch and duration.

[0005] In such a speech synthesizer, the PSOLA (Pitch-Synchronous Overlap-Add) method is well known as a way to generate a speech signal with a desired pitch pattern and duration from speech unit information. For example, Japanese Patent Laid-Open No. 8-202395, "Pitch conversion method and device," discloses a method that uses PSOLA to convert the pitch period of a speech waveform stored as a speech unit into a desired pitch period.

[0006] FIG. 18 shows the principle of changing the pitch period of an input speech signal 101 with the PSOLA method to generate an output speech signal 104. First, pitch analysis is performed on the input speech signal 101 to obtain the pitch period. Pitch waveforms 103 are then generated by applying a window function, whose length is about twice the pitch period, to the input speech signal 101 at positions synchronized with the pitch. Finally, the pitch waveforms 103 are overlapped and added at intervals of the desired pitch period, which yields the output speech signal 104 with the changed pitch period.
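The PSOLA procedure described above can be sketched in a few lines. This is a minimal illustration rather than code from the cited publication: it assumes the pitch marks of the input are already known and the pitch period is constant, it uses a Hann window (the text does not name a window shape), and the function names are hypothetical.

```python
import math

def hann(length):
    """Symmetric Hann window (assumed window shape)."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def extract_pitch_waveforms(signal, marks, period):
    """Cut one windowed pitch waveform, about two periods long, around each mark."""
    win = hann(2 * period + 1)
    waveforms = []
    for m in marks:
        segment = []
        for k in range(-period, period + 1):
            idx = m + k
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            segment.append(sample * win[k + period])
        waveforms.append(segment)
    return waveforms

def overlap_add(waveforms, new_period, out_len):
    """Place the pitch waveforms at the desired pitch period and sum the overlaps."""
    out = [0.0] * out_len
    for i, waveform in enumerate(waveforms):
        start = i * new_period  # waveform centre lands at start + len(waveform) // 2
        for k, value in enumerate(waveform):
            if 0 <= start + k < out_len:
                out[start + k] += value
    return out
```

Choosing `new_period` smaller than `period` raises the pitch and choosing it larger lowers the pitch; the windowed segments themselves are left unchanged, which is why the spectral envelope does not follow the pitch change.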

[0007] When the PSOLA method is applied to a speech synthesizer, the input speech signal 101 corresponds to a speech unit stored in advance, and the output speech signal 104 corresponds to the synthesized speech signal. Speech synthesized with PSOLA is known to have good sound quality when the pitch period is changed only slightly, because the degradation caused by the change is then small.

[0008] Another type of speech synthesizer uses formant synthesis. Formant synthesis is a model that simulates the human vocal mechanism: it generates a speech signal by driving a filter that models the vocal tract characteristics with a source signal that models the signal produced by the vocal cords. As an example, Japanese Patent Laid-Open No. 7-152396, "Speech synthesizer," discloses a speech synthesizer that uses formant synthesis.

[0009] FIG. 19 shows the principle of generating a speech signal by formant synthesis. A vocal tract filter formed by cascading resonators 21, 22, and 23 is driven by a pulse train 207, whose pulses are placed at intervals of the desired pitch period, to generate synthesized speech 208. The frequency characteristic 204 of resonator 21 is determined by formant frequency F1 and formant bandwidth B1. Similarly, the frequency characteristic 205 of resonator 22 is determined by formant frequency F2 and formant bandwidth B2, and the frequency characteristic 206 of resonator 23 is determined by formant frequency F3 and formant bandwidth B3.
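As a rough sketch of the cascade described above, the following code drives three two-pole resonators with a pulse train. The coefficient formulas are the textbook digital resonator design (unity gain at DC) and are an assumption here, not taken from the cited publication; the sample rate and the formant values used in any call are likewise hypothetical.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c

def resonate(x, freq_hz, bw_hz, fs):
    """Filter the sequence x through a single resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def pulse_train(length, period):
    """Unit pulses spaced one pitch period apart."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def formant_synthesize(length, period, formants, fs):
    """Drive cascaded resonators, one per (frequency, bandwidth) pair, with a pulse train."""
    signal = pulse_train(length, period)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs)
    return signal
```

For example, `formant_synthesize(400, 80, [(700.0, 60.0), (1200.0, 90.0), (2500.0, 120.0)], 8000.0)` produces 400 samples at a 100 Hz pitch. Only the frequency and bandwidth pairs shape the spectrum, which illustrates both the flexibility and the limited detail of this model.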

[0010] Thus, in formant synthesis, the combination of formant frequencies and bandwidths determines the phoneme (/a/, /i/, /u/, etc.) and the voice quality (male voice, female voice, etc.) of the synthesized speech. The speech unit information is therefore not a waveform but a combination of formant frequency and bandwidth values. Because formant synthesis can control parameters that relate directly to phonemes and voice quality, it has the advantage of allowing flexible control, such as changing the voice quality.

[0011] [Problems to Be Solved by the Invention] As described above, the PSOLA method gives relatively good sound quality while the change in pitch period is small, but sound quality deteriorates as the change becomes larger.

[0012] In human speech, the spectral envelope of the same phoneme changes when the pitch period changes. The PSOLA method cannot model this change, and that is the cause of the deterioration. In addition, when a spectral discontinuity occurs at a junction between speech units, the smoothing applied there distorts the spectrum and degrades sound quality. Furthermore, because the waveform itself serves as the speech unit, changing the voice quality is difficult and the method lacks flexibility.

[0013] Formant synthesis, on the other hand, is flexible but suffers from poor model accuracy. Formant frequencies and bandwidths alone cannot express the fine structure of the spectrum of a real speech signal, so the sound quality is poor and the result lacks the feel of a natural human voice.

[0014] The present invention has been made in view of the above circumstances, and its object is to provide a speech synthesizer that has good sound quality and can also change voice quality and the like flexibly.

[0015] [Means for Solving the Problems] To solve the above problems, the speech synthesis method of the present invention generates a speech signal by superimposing pitch waveforms in accordance with pitch period information, and is characterized in that a plurality of formant waveforms are generated by applying window functions to sine waves at the formant frequencies, and the pitch waveform is generated as the sum of these formant waveforms.

[0016] The speech synthesis device of the present invention receives a pitch pattern, phoneme durations, and a phoneme symbol string, and generates a speech signal by superimposing pitch waveforms, formed by a pitch waveform generation unit, on pitch marks generated in accordance with pitch period information. The pitch waveform generation unit is characterized by comprising: a storage unit that stores formant parameters for each speech unit; a parameter selection unit that, referring to the pitch pattern, the phoneme durations, and the phoneme symbol string, selects and reads from the storage unit one frame of formant parameters corresponding to a pitch mark; a sine wave generation unit that generates a sine wave at the formant frequency that was read; a multiplier that generates a formant waveform by multiplying the generated sine wave by the selected window function; and an adder that adds these formant waveforms together.

[0017] The recording medium of the present invention stores a program that implements a speech synthesis method for generating a speech signal by superimposing pitch waveforms in accordance with pitch period information, and is characterized in that the program implements a speech synthesis method that generates a plurality of formant waveforms by applying window functions to sine waves at the formant frequencies and generates the pitch waveform as the sum of these formant waveforms.

[0018] [Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a speech synthesis device that implements a speech synthesis method according to one embodiment of the present invention. The device receives a pitch pattern 306, phoneme durations 307, and a phoneme symbol string 308, and outputs a synthesized speech signal 305. The device of this embodiment consists of an unvoiced sound synthesis unit 32 and a voiced sound synthesis unit 31, and generates the synthesized speech signal 305 by adding the unvoiced speech signal 304 and the voiced speech signal 303 that they respectively output.

[0019] The unvoiced sound synthesis unit 32 refers to the phoneme durations 307 and the phoneme symbol string 308 and generates the unvoiced speech signal 304, mainly when the phoneme in question is an unvoiced consonant or a voiced fricative. The unvoiced sound synthesis unit 32 can be realized with known techniques, such as driving an LPC synthesis filter with white noise.
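The known technique mentioned above, driving an LPC synthesis filter with white noise, can be sketched as follows. This is only an illustration: the patent does not give filter details, so the sign convention of the coefficients, the Gaussian noise source, and the function name `lpc_synthesize` are assumptions.

```python
import random

def lpc_synthesize(coeffs, length, gain=1.0, seed=0):
    """All-pole filter s[n] = gain*e[n] - sum_k a_k*s[n-k], with e[n] white noise.

    coeffs holds a_1..a_p under the sign convention assumed here.
    """
    rng = random.Random(seed)
    history = [0.0] * len(coeffs)  # most recent output first
    out = []
    for _ in range(length):
        excitation = rng.gauss(0.0, 1.0)
        sample = gain * excitation - sum(a * h for a, h in zip(coeffs, history))
        out.append(sample)
        history = [sample] + history[:-1]
    return out
```

With an empty coefficient list the output is just the noise itself; a real unvoiced synthesizer would switch coefficient sets per phoneme, which is outside the scope of this sketch.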

[0020] The voiced sound synthesis unit 31 consists of a pitch mark generation unit 33, a pitch waveform generation unit 34, and a waveform superimposition unit 35. The pitch mark generation unit 33 refers to the pitch pattern 306 and the phoneme durations 307 and generates pitch marks 302 as shown in FIG. 2. The pitch marks 302 indicate the positions at which pitch waveforms 301 are superimposed, and the spacing between pitch marks corresponds to the pitch period. The pitch waveform generation unit 34 refers to the pitch pattern 306, the phoneme durations 307, and the phoneme symbol string 308 and generates a pitch waveform 301 for each pitch mark 302, as shown in FIG. 2. The waveform superimposition unit 35 generates the voiced speech signal 303 by superimposing each pitch waveform 301 at the position indicated by its pitch mark 302.
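The interplay of pitch mark generation and waveform superimposition can be sketched as below. The sketch assumes positions are sample indices, that the pitch pattern is available as a function returning the pitch period at a given position, and that each pitch waveform starts at its mark; the patent does not specify the alignment, and all names are hypothetical.

```python
def generate_pitch_marks(pitch_period_at, total_len):
    """Place marks so that each gap equals the pitch period at the current mark."""
    marks = []
    pos = 0
    while pos < total_len:
        marks.append(pos)
        pos += max(1, int(round(pitch_period_at(pos))))
    return marks

def superimpose(marks, pitch_waveforms, total_len):
    """Add each pitch waveform into the output, starting at its pitch mark."""
    out = [0.0] * total_len
    for mark, waveform in zip(marks, pitch_waveforms):
        for k, value in enumerate(waveform):
            if mark + k < total_len:
                out[mark + k] += value
    return out
```

Where a pitch waveform is longer than the local pitch period, neighbouring waveforms overlap and their samples simply add.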

[0021] Next, the configuration of the pitch waveform generation unit of FIG. 1 is described in detail.

[0022] FIG. 3 is a block diagram showing the configuration of one embodiment of the pitch waveform generation unit 34. The pitch waveform generation unit 34 consists of a formant parameter storage unit 41, a parameter selection unit 42, and sine wave generation units (43, 44, 45). The formant parameter storage unit 41 stores formant parameters for each speech unit.

[0023] FIG. 4 shows an example of the formant parameters of a speech unit for the phoneme /a/. In this example, the /a/ unit consists of three frames, and each frame consists of three formants. A formant frequency, a formant phase, and a window function are stored as the parameters that characterize each formant.

[0024] The formant parameter selection unit 42 refers to the pitch pattern 306, the phoneme durations 307, and the phoneme symbol string 308 input to the pitch waveform generation unit 34, and selects and reads from the formant parameter storage unit 41 one frame of formant parameters 401 corresponding to a pitch mark 302.
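One way to picture the stored formant parameters and the per-pitch-mark frame selection is the data layout below. It is a sketch under assumptions: the class names, the sampled-list representation of the window function, and the rule that maps a relative position inside the unit to a frame index are all hypothetical, since the patent does not state how the selection is computed.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Formant:
    frequency: float      # formant frequency (unit assumed: radians per sample)
    phase: float          # formant phase in radians
    window: List[float]   # sampled window function

@dataclass
class Frame:
    formants: List[Formant]

@dataclass
class SpeechUnit:
    phoneme: str
    frames: List[Frame]

def select_frame(unit, position):
    """Return the frame for a relative position in [0, 1] within the unit."""
    index = min(int(position * len(unit.frames)), len(unit.frames) - 1)
    return unit.frames[index]
```

A unit for /a/ as in the example above would hold three `Frame` objects with three `Formant` entries each.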

[0025] From the formant parameters 401, the parameters for formant number 1 are output as formant frequency 402, formant phase 403, and window function 411. Similarly, the parameters for formant number 2 are output as formant frequency 404, formant phase 405, and window function 412, and the parameters for formant number 3 are output as formant frequency 406, formant phase 407, and window function 413.

[0026] The sine wave generation unit 43 outputs a sine wave 408 according to formant frequency 402 and formant phase 403. The sine wave 408 is windowed with window function 411 to generate formant waveform 414. Writing ω for formant frequency 402, φ for formant phase 403, and w(t) for window function 411, the formant waveform y(t) is expressed by the following equation.

[0027]

$$y(t) = w(t) \cdot \sin(\omega t + \phi)$$

Similarly, the sine wave generation unit 44 outputs a sine wave 409 according to formant frequency 404 and formant phase 405, and formant waveform 415 is generated by windowing it with window function 412. The sine wave generation unit 45 outputs a sine wave 410 according to formant frequency 406 and formant phase 407, and formant waveform 416 is generated by windowing it with window function 413.

[0028] The pitch waveform 301 is then generated by adding together the formant waveforms (414, 415, 416).
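The generation of formant waveforms and their sum can be sketched directly from the equation y(t) = w(t)·sin(ωt + φ). The sketch assumes discrete time with t as the sample index counted from the start of the window, ω in radians per sample, and windows given as sampled lists; the function names are hypothetical.

```python
import math

def formant_waveform(omega, phi, window):
    """y(t) = w(t) * sin(omega * t + phi), with t as the sample index."""
    return [w * math.sin(omega * t + phi) for t, w in enumerate(window)]

def pitch_waveform(formants):
    """Sum the formant waveforms of one frame; each formant is (omega, phi, window)."""
    length = max(len(window) for _, _, window in formants)
    out = [0.0] * length
    for omega, phi, window in formants:
        for t, value in enumerate(formant_waveform(omega, phi, window)):
            out[t] += value
    return out
```

Because each formant carries its own window, the formants of one frame may have different lengths and shapes; shorter ones simply contribute nothing past their end.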

[0029] FIG. 6 shows examples of the sine wave, window function, formant waveform, and pitch waveform, and FIG. 7 shows the power spectra of these waveforms. In FIG. 6 the horizontal axis is time and the vertical axis is amplitude; in FIG. 7 the horizontal axis is frequency and the vertical axis is amplitude.

[0030] The sine wave has a line spectrum with a sharp peak, and the window function has a spectrum concentrated in the low-frequency range. Because windowing (multiplication) in the time domain corresponds to convolution in the frequency domain, the spectrum of the formant waveform has the shape of the window function's spectrum translated to the frequency of the sine wave. Consequently, controlling the frequency and phase of the sine wave changes the center frequency and phase of a formant of the pitch waveform, and controlling the shape of the window function changes the spectral shape of that formant.
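The spectral shift described above follows from writing the sine as two complex exponentials. As a short derivation, with $W(\Omega)$ denoting the Fourier transform of $w(t)$ (a symbol introduced here for illustration only):

```latex
\begin{aligned}
y(t) &= w(t)\,\sin(\omega t + \phi)
      = \frac{1}{2j}\left[ w(t)\,e^{j(\omega t + \phi)} - w(t)\,e^{-j(\omega t + \phi)} \right], \\
Y(\Omega) &= \frac{1}{2j}\left[ e^{j\phi}\,W(\Omega - \omega) - e^{-j\phi}\,W(\Omega + \omega) \right].
\end{aligned}
```

The window spectrum therefore appears centered at $\pm\omega$, the phase $\phi$ only rotates each copy, and the shape of each copy is set entirely by $w(t)$.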

[0031] Because the center frequency, phase, and spectral shape can be controlled independently for each formant in this way, the model is highly flexible. At the same time, the shape of the window function can express the fine structure of the spectrum, so the spectral structure of a natural voice can be approximated with high accuracy and speech with a natural feel can be synthesized.

[0032] Next, a second embodiment of the pitch waveform generation unit 34 of the present invention is described with reference to FIG. 8. Parts corresponding to those in FIG. 3 carry the same reference numerals, and the description focuses on the differences. In this embodiment the window function is expanded in basis functions, and a set of weight coefficients is stored as a formant parameter in place of the window function itself. A newly added window function generation unit 56 then generates the window function from the set of weight coefficients.

[0033] FIG. 5 shows an example of the formant parameters stored in the formant parameter storage unit 51. In this example each window function is expanded as a weighted sum of three basis functions, and a set of three coefficients is stored as the window function weight coefficient set. From the selected formant parameters 501 (formant frequency, formant phase, and window function weight coefficients), the parameter selection unit 42 outputs the formant frequencies (402, 404, 406) and formant phases (403, 405, 407) to the sine wave generation units (43, 44, 45), and the window function weight coefficient sets (517, 518, 519) to the window function generation unit 56.

[0034] The window function generation unit 56 generates the window functions (511, 512, 513) from the weight coefficient sets (517, 518, 519), respectively. Writing a1, a2, and a3 for the weight coefficients in a set and b1(t), b2(t), and b3(t) for the basis functions, the window function w(t) is expressed by the following equation.

[0035]

$$w(t) = a_1 \cdot b_1(t) + a_2 \cdot b_2(t) + a_3 \cdot b_3(t)$$

DCT bases or the like may be used as the basis functions, or basis functions obtained by applying a KL expansion to the window functions may be used. The order of the basis is 3 in this embodiment, but any order may be used. Expanding the window function in basis functions has the advantage of reducing the storage capacity required of the formant parameter storage unit.
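The reconstruction of a window from stored weight coefficients can be sketched as a weighted sum of cosine basis functions. The DCT-II style basis below, and the choice to index it from 0 so that the first basis function is constant, are assumptions for illustration; the patent only says that DCT or KL-derived bases may be used.

```python
import math

def dct_basis(k, length):
    """k-th DCT-II style cosine basis function sampled at `length` points."""
    return [math.cos(math.pi * k * (t + 0.5) / length) for t in range(length)]

def window_from_weights(weights, length):
    """w(t) = sum_k a_k * b_k(t) for the stored weight coefficients a_k."""
    window = [0.0] * length
    for k, a in enumerate(weights):
        basis = dct_basis(k, length)
        for t in range(length):
            window[t] += a * basis[t]
    return window
```

For instance, the three weights `[0.5, 0.0, -0.5]` yield a symmetric bump that is low at both ends and higher in the middle, so three numbers stand in for a whole sampled window.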

[0036] Next, a third embodiment of the pitch waveform generation unit 34 of the present invention is described with reference to FIG. 9. Parts corresponding to those in FIG. 3 carry the same reference numerals, and the description focuses on the differences. This embodiment differs in that a parameter modification unit 67 is newly added and the formant parameters change according to the pitch pattern 306.

[0037] The parameter modification unit 67 changes formant frequency 402, formant phase 403, window function 411, formant frequency 404, formant phase 405, window function 412, formant frequency 406, formant phase 407, and window function 413 according to the pitch pattern 306, and outputs formant frequency 720, formant phase 721, window function 717, formant frequency 722, formant phase 723, window function 718, formant frequency 724, formant phase 725, and window function 719, respectively. All of the parameters may be changed, or only some of them.

[0038] FIG. 10 shows an example of a control function used to control a formant frequency according to the pitch period. Such a control function may be set for each phoneme, or separate functions may be set and used for each frame and each formant number.

[0039] Instead of controlling the formant frequency itself, a control function that controls the difference or the ratio between the input formant frequency and the output formant frequency may be used.

[0040] FIG. 11 shows a control function for controlling formant power by multiplying the window function by a gain that depends on the pitch period. Changing the parameters according to the pitch period in this way makes it possible to model the change in the speech spectrum caused by a change in pitch period, so that high-quality synthesized speech can be generated regardless of voice pitch.
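The pitch-dependent control functions described above can be sketched as piecewise-linear maps from pitch period to a parameter value. The curve shapes of FIG. 10 and FIG. 11 are not reproduced in the text, so the breakpoint representation, the clamping outside the covered range, and every name below are hypothetical.

```python
def make_control_function(breakpoints):
    """Piecewise-linear map from pitch period to a parameter value.

    `breakpoints` is a list of (pitch_period, value) pairs sorted by pitch
    period; inputs outside the covered range are clamped to the end values.
    """
    def control(pitch_period):
        if pitch_period <= breakpoints[0][0]:
            return breakpoints[0][1]
        for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
            if pitch_period <= x1:
                return y0 + (y1 - y0) * (pitch_period - x0) / (x1 - x0)
        return breakpoints[-1][1]
    return control

def transform_formant_frequency(freq, pitch_period, ratio_control):
    """Ratio variant: the control function yields output/input frequency."""
    return freq * ratio_control(pitch_period)

def apply_gain(window, gain):
    """Scale a sampled window function to control formant power."""
    return [gain * w for w in window]
```

A separate control function can be built for each phoneme, frame, or formant number by supplying different breakpoint lists; the same helper serves for the gain curve by passing its result to `apply_gain`.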

[0041] The phoneme symbol string 308 may also be input to the parameter modification unit 67 so that the formant parameters are changed according to the type of the preceding or following phoneme. This makes it possible to model changes in the speech spectrum caused by the phonemic environment, and thus to improve sound quality.

[0042] Furthermore, the parameters may be changed according to voice quality information 309 input to the parameter modification unit 67 from outside. This makes it possible to generate synthesized speech with a variety of voice qualities.

[0043] FIG. 12 shows examples of control functions for changing the thickness of the voice by changing the formant frequencies. If all formant frequencies are converted with control function (a), the formants shift toward higher frequencies and a thin voice is produced; with (b) the voice becomes slightly thin. Conversely, with control function (d) the formant frequencies shift toward lower frequencies and a thick voice is produced; with (c) the voice becomes slightly thick.
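As a toy illustration of the four voice-thickness settings, the sketch below replaces the curves of FIG. 12 with plain scale factors. The factors are invented placeholders, not values from the patent; the actual control functions may well be nonlinear.

```python
# Hypothetical scale factors standing in for curves (a) to (d) of FIG. 12.
VOICE_SCALES = {"a": 1.2, "b": 1.1, "c": 0.9, "d": 0.8}

def shift_formants(frequencies, curve):
    """Map every formant frequency through the chosen curve (here a simple scaling)."""
    scale = VOICE_SCALES[curve]
    return [f * scale for f in frequencies]
```

Factors above 1 move the formants upward (thinner voice) and factors below 1 move them downward (thicker voice).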

[0044] Next, a fourth embodiment of the pitch waveform generation unit 34 of the present invention is described with reference to FIG. 13. Parts corresponding to those in FIG. 3 carry the same reference numerals, and the description focuses on the differences. This embodiment differs in that a parameter smoothing unit 77 is newly added, which smooths the parameters so that each formant parameter changes smoothly over time.

[0045] The parameter smoothing unit 77 smooths formant frequency 402, formant phase 403, window function 411, formant frequency 404, formant phase 405, window function 412, formant frequency 406, formant phase 407, and window function 413, and outputs formant frequency 820, formant phase 821, window function 817, formant frequency 822, formant phase 823, window function 818, formant frequency 824, formant phase 825, and window function 819, respectively. All of the parameters may be smoothed, or only some of them.

[0046] FIG. 14 shows an example of formant smoothing. The × marks show the formant frequencies 402, 404, and 406 before smoothing. They are smoothed so that the change from the corresponding formant frequencies of the preceding and following frames becomes smooth, which yields the smoothed formant frequencies 820, 822, and 824 shown by the ○ marks.
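Smoothing one formant frequency track across frames can be sketched with a moving average. The patent does not name a smoothing method, so the moving average, the `radius` parameter, and the function name are assumptions.

```python
def smooth_track(values, radius=1):
    """Moving-average smoothing of one parameter track (one value per frame)."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

Applied to a track that jumps at a unit boundary, for example `[500, 500, 800, 800]`, the jump is spread over the neighbouring frames instead of occurring in a single step.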

[0047] At a junction between speech units, when the formants cannot be matched up, the formant corresponding to formant frequency 404 may vanish, as shown by the × marks in FIG. 15(a). Because this causes a large spectral discontinuity and degrades sound quality, a formant is added to generate formant frequency 822, as shown by the ○ marks. In doing so, the power of window function 818, which corresponds to formant frequency 822, is attenuated as shown in FIG. 15(b) so that no discontinuity in formant power arises.

[0048] FIG. 16 shows an example of smoothing the window function position. The position of window function 411 is smoothed so that its peak position changes smoothly from frame to frame, which yields window function 817. The shape of the window function and the power of the window function may also be smoothed.

[0049] The embodiments described above use three formants, but any number of formants may be used, and the number of formants may vary from frame to frame.

[0050] The sine wave generation units in the embodiments have been described as outputting sine waves, but the output need not be a perfect sine wave as long as its power spectrum is close to a line spectrum. For example, if the computational precision is lowered to reduce the amount of computation, or if a lookup table is used, the resulting error may keep the output from being a perfect sine wave.

[0051] The spectrum of an individual formant waveform does not necessarily represent a peak of the speech spectrum. It is the spectrum of the pitch waveform, the sum of the formant waveforms, that represents the speech spectrum.

[0052] A synthesizer for text-to-speech synthesis has been described as an embodiment of the present invention. Another embodiment of the present invention is a decoder for speech coding. That is, the encoder analyzes the speech signal to obtain formant parameters such as the formant frequency, formant phase, and window function, together with the pitch period and the like, and encodes them for transmission or storage. The decoder decodes the formant parameters and the pitch period and can reproduce the speech signal in the same manner as the synthesizer described above.

[0053] The speech synthesis described above can be carried out by controlling a computer in accordance with a program stored on a recording medium. This program control will be described with reference to FIG. 17.

[0054] FIG. 17(a) shows a flowchart of the speech synthesis process, FIG. 17(b) shows a flowchart of the voiced speech generation process within the speech synthesis process, and FIG. 17(c) shows a flowchart of the pitch waveform generation process within the voiced speech generation process of FIG. 17(b).

[0055] In the speech synthesis process of FIG. 17(a), a pitch pattern 306, a phoneme duration 307, and a phoneme symbol string 308 are input (S11). A voiced speech signal 303 is generated based on the pitch pattern 306, the phoneme duration 307, and the phoneme symbol string 308 (S12). An unvoiced speech signal 304 is generated with reference to the phoneme duration 307 and the phoneme symbol string 308 (S13). The voiced speech signal and the unvoiced speech signal are added to generate a synthesized speech signal 305 (S14).

[0056] In the voiced speech generation process of FIG. 17(b), pitch marks 302 are generated with reference to the pitch pattern 306 and the phoneme duration 307 (S21). Pitch waveforms 301, each corresponding to one of the pitch marks 302, are generated with reference to the pitch pattern 306, the phoneme duration 307, and the phoneme symbol string 308 (S22). The pitch waveforms 301 are superimposed at the positions indicated by the corresponding pitch marks 302 to generate voiced speech (S23).
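
Steps S21 and S23 can be sketched as follows. The representation of the pitch pattern as a list of pitch periods in samples, the centering of each pitch waveform on its pitch mark, and the names `generate_pitch_marks` and `overlap_add` are assumptions made for illustration. The short triangular pulse is only a placeholder for a real pitch waveform.

```python
def generate_pitch_marks(pitch_periods, total_samples):
    """S21 (sketch): place pitch marks one pitch period apart.

    pitch_periods[i] is the assumed distance in samples from mark i to mark i+1;
    the last value is reused when the list runs out. Periods must be positive.
    """
    marks, pos, i = [], 0, 0
    while pos < total_samples:
        marks.append(pos)
        pos += pitch_periods[min(i, len(pitch_periods) - 1)]
        i += 1
    return marks

def overlap_add(pitch_waveforms, marks, total_samples):
    """S23 (sketch): add each pitch waveform into the output, centered on its mark."""
    out = [0.0] * total_samples
    for wave, mark in zip(pitch_waveforms, marks):
        start = mark - len(wave) // 2
        for k, sample in enumerate(wave):
            idx = start + k
            if 0 <= idx < total_samples:  # drop samples outside the output buffer
                out[idx] += sample
    return out

# Hypothetical input: the pitch period changes from 80 to 100 samples.
marks = generate_pitch_marks([80, 100], 400)
pulse = [0.0, 0.5, 1.0, 0.5, 0.0]  # placeholder pitch waveform
voiced = overlap_add([pulse] * len(marks), marks, 400)
```

Changing the spacing of the pitch marks changes the pitch of the output without altering the pitch waveforms themselves, which is why the two steps are kept separate in this sketch.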

[0057] In the pitch waveform generation process of FIG. 17(c), formant parameters 401 for one frame corresponding to the pitch mark 302 are selected from the formant parameter storage unit 41 with reference to the pitch pattern 306, the phoneme duration 307, and the phoneme symbol string 308 (S31). A plurality of sine waves are generated according to the formant frequency and formant phase corresponding to each formant number of the selected formant parameters 401 (S32). The sine waves are multiplied by window functions to generate formant waveforms 414, 415, and 416 (S33). These formant waveforms are added to generate the pitch waveform (S34).
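
Steps S32 to S34 can be sketched as follows, assuming that the formant parameters of one frame have already been selected (S31). The dictionary layout, the Hann window used in place of a stored window function, the sampling rate, and all function names are assumptions made for illustration. All windows in a frame are assumed to have the same length.

```python
import math

def hann_window(length):
    """Hann window, used here only as a stand-in for a stored window function."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def formant_waveform(freq_hz, phase, window, sample_rate):
    """S32 and S33: a sine wave at the formant frequency multiplied by its window."""
    return [w * math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
            for n, w in enumerate(window)]

def pitch_waveform(formants, sample_rate):
    """S34: sum the formant waveforms of one frame sample by sample."""
    length = len(formants[0]["window"])
    total = [0.0] * length
    for f in formants:
        wave = formant_waveform(f["freq_hz"], f["phase"], f["window"], sample_rate)
        total = [t + s for t, s in zip(total, wave)]
    return total

SAMPLE_RATE = 16000  # assumed sampling rate in Hz
# Hypothetical formant parameters for one frame (three formants).
frame = [
    {"freq_hz": 500.0, "phase": 0.0, "window": hann_window(161)},
    {"freq_hz": 1500.0, "phase": 0.0, "window": [0.5 * w for w in hann_window(161)]},
    {"freq_hz": 2500.0, "phase": 0.0, "window": [0.25 * w for w in hann_window(161)]},
]
pw = pitch_waveform(frame, SAMPLE_RATE)
```

In this sketch the power of each formant is carried by the amplitude of its window, so scaling a window changes the corresponding formant's contribution to the pitch waveform without affecting the other formants.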

[0058]

[Effects of the Invention] As described above, according to the present invention, the formant frequency and the formant shape are controlled independently for each formant. This makes it possible to express changes in the speech spectrum caused by differences in pitch period or voice quality, and thus to achieve high flexibility. Alternatively, since the fine structure of the formant spectrum is expressed by the shape of the window function, high-quality synthesized speech with a natural, human-like voice can be generated.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech synthesizer according to an embodiment of the present invention.

FIG. 2 is a schematic diagram showing generation of voiced speech by superimposing pitch waveforms.

FIG. 3 is a block diagram of a pitch waveform generation unit according to an embodiment of the present invention.

FIG. 4 is a schematic diagram showing an example of formant parameters.

FIG. 5 is a schematic diagram showing an example of formant parameters.

FIG. 6 is a schematic diagram showing examples of sine waves, window functions, formant waveforms, and a pitch waveform.

FIG. 7 is a schematic diagram showing examples of the power spectra of sine waves, window functions, formant waveforms, and a pitch waveform.

FIG. 8 is a block diagram of a pitch waveform generation unit according to an embodiment of the present invention.

FIG. 9 is a block diagram of a pitch waveform generation unit according to an embodiment of the present invention.

FIG. 10 is a schematic diagram showing an example of a control function for the formant frequency.

FIG. 11 is a schematic diagram showing an example of a control function for the formant gain.

FIG. 12 is a schematic diagram showing an example of a formant frequency mapping function for voice quality conversion.

FIG. 13 is a block diagram of a pitch waveform generation unit according to an embodiment of the present invention.

FIG. 14 is a schematic diagram showing an example of smoothing the formant frequency.

FIG. 15 is a schematic diagram showing an example of smoothing the formant frequency.

FIG. 16 is a schematic diagram showing an example of smoothing the window function position.

FIG. 17 is a flowchart showing the processing of the speech synthesizer of the present invention.

FIG. 18 is a schematic diagram showing speech synthesis by the conventional PSOLA method.

FIG. 19 is a block diagram of a conventional formant synthesizer.

[Explanation of symbols]

31: voiced sound synthesis unit
32: unvoiced sound synthesis unit
33: pitch mark generation unit
34: pitch waveform generation unit
35: waveform superimposing unit
41, 51: formant parameter storage unit
42: parameter selection unit
43, 44, 45: sine wave generation unit
56: window function generation unit
67: parameter deforming unit
77: parameter smoothing unit

Claims

[Claims]

1. A speech synthesis method for generating a speech signal by superimposing pitch waveforms in accordance with pitch period information, wherein a plurality of formant waveforms are generated by applying a window function to sine waves of formant frequencies, and the pitch waveform is generated as the sum of the formant waveforms.

2. The speech synthesis method according to claim 1, wherein said window function is generated by weighted addition of a plurality of basis functions.

3. The speech synthesis method according to claim 1, wherein at least one of the power of the formant waveform, the shape of the window function, the position of the window function, and the formant frequency changes in accordance with the pitch period.

4. The speech synthesis method according to claim 1, wherein at least one of the power of the formant waveform, the shape of the window function, the position of the window function, and the formant frequency changes in accordance with the type of at least a preceding or succeeding phoneme.

5. The speech synthesis method according to claim 1, wherein at least one of the power of the formant waveform, the shape of the window function, the position of the window function, and the formant frequency changes in accordance with given voice quality information.

6. The speech synthesis method according to claim 1, wherein at least one of the formant frequency, the power of the formant waveform, the shape of the window function, the phase of the sine wave, and the position of the window function changes in accordance with at least one of the formant frequency, the power of the formant waveform, the shape of the window function, the phase of the sine wave, and the position of the window function of the corresponding formant of at least a preceding or succeeding pitch waveform.

7. The speech synthesis method according to claim 1, wherein at least one of the formant frequency, the power of the formant waveform, the shape of the window function, the phase of the sine wave, and the position of the window function changes in accordance with the presence or absence of a corresponding formant in at least a preceding or succeeding pitch waveform.

8. A speech synthesizer that receives a pitch pattern, a phoneme duration, and a phoneme symbol string and generates a speech signal by superimposing pitch waveforms formed by a pitch waveform generation unit on pitch marks generated according to pitch period information, wherein the pitch waveform generation unit comprises: a storage unit in which formant parameters are stored for each speech unit; a parameter selection unit that, with reference to the pitch pattern, the phoneme duration, and the phoneme symbol string, selects and reads from the storage unit the formant parameters for one frame corresponding to the pitch mark; a sine wave generation unit that generates a sine wave of the read formant frequency; a multiplier that generates a formant waveform by multiplying the sine wave by the selected window function; and an adder that adds the respective formant waveforms.

9. The speech synthesizer according to claim 8, wherein the window function is stored in the storage unit.

10. The speech synthesizer according to claim 8, wherein weighting coefficients of the window function are stored in the storage unit, the speech synthesizer further comprising a window function generation unit that receives the weighting coefficients and generates the window function by weighted addition of basis functions.

11. The speech synthesizer according to claim 8, further comprising a parameter deforming unit that changes the selected formant parameters in accordance with the pitch period.

12. The speech synthesizer according to claim 8, further comprising a parameter deforming unit that changes the selected formant parameters in accordance with preceding or succeeding phoneme information.

13. The speech synthesizer according to claim 8, further comprising a parameter deforming unit that changes the selected formant parameters in accordance with a given voice quality.

14. The speech synthesizer according to claim 8, further comprising a parameter smoothing unit that smooths temporal changes in the selected formant parameters.

15. A recording medium on which is recorded a program for realizing a speech synthesis method that generates a speech signal by superimposing pitch waveforms in accordance with pitch period information, wherein the program generates a plurality of formant waveforms by applying a window function to sine waves of formant frequencies and generates the pitch waveform as the sum of the plurality of formant waveforms.

16. A speech synthesis program comprising: an instruction to store, in a storage device, a plurality of formant parameters representing formant frequencies, formant phases, and window functions; an instruction to select predetermined formant parameters from the stored formant parameters in accordance with a pitch pattern, a phoneme duration, and a phoneme symbol string; an instruction to generate a plurality of sine waves based on the formant frequencies and formant phases corresponding to the selected formant parameters; an instruction to multiply the sine waves by the window functions corresponding to the selected formant parameters to generate a plurality of formant waveforms; an instruction to add the formant waveforms to generate a plurality of pitch waveforms; and an instruction to superimpose the pitch waveforms according to a pitch period to generate a speech signal.

17. The speech synthesis program according to claim 16, further comprising an instruction to add a prescribed function weighted by a weight coefficient to generate the window function.