JP2002358091A

JP2002358091A - Speech synthesis method and speech synthesis device

Info

Publication number: JP2002358091A
Application number: JP2001166484A
Authority: JP
Inventors: Toshiyuki Isono; 敏幸礒野; Hirofumi Nishimura; 洋文西村
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-06-01
Filing date: 2001-06-01
Publication date: 2002-12-13

Abstract

(57) [Abstract] [Problem] To provide a speech synthesis method and a speech synthesis device that can convert text into natural synthesized speech even when the input buffer size is limited. [Solution] The input text data is searched and examined for punctuation marks, closing-parenthesis positions, special symbols indicating emotion, and dependency information that lie within the input buffer size in characters (steps S1 to S6). Based on the results, a character string no longer than the input buffer size is separated from the input text data to generate a synthesis unit character string (step S7). Synthesized speech is then generated from the synthesis unit character string, and this series of steps is repeated until the entire input text has been converted to speech (steps S8 and S9).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech synthesis method and a speech synthesis device that output speech based on an input character string.

[0002]

[Prior art] Methods of synthesizing speech from an input character string include those that convert the character string into emotional or emotionless speech. One example is the method described in JP-A-7-72900.

[0003] In that method, the character codes are parsed, and the accent of each word and phrase is determined by referring to a dictionary built internally in advance. From the internally stored spectral feature parameters, which can represent various phonemes, the parameters best suited to composing the word in question are selected. The intonation and power contour of the whole sentence are then determined by rule, and speech is output on that basis.

[0004] In that technique, no particular limit is placed on the input buffer size for speech synthesis. In practice, an oversized buffer was allocated so that the input character string would never exceed the buffer size.

[0005]

[Problems to be solved by the invention] However, in such a conventional speech synthesis method, when the input buffer size is limited by the available memory, a character string whose character count is equal to or greater than the input buffer size cannot be synthesized.

[0006] The present invention was made to solve this conventional problem, and provides a speech synthesis method and a speech synthesis device that can convert text into natural synthesized speech even when the input buffer size is limited.

[0007]

[Means for solving the problems] The speech synthesis method of the present invention detects, in an input character string, punctuation marks, closing-parenthesis positions, positions of special symbols indicating emotion, and dependency information that lie within the input buffer size in characters; separates from the character string a synthesis unit character string whose length is equal to or less than the input buffer size; and generates synthesized speech for each separated synthesis unit character string.

[0008] With this method, even when a character string whose character count is equal to or greater than the input buffer size is input, the text can be split at natural positions, and synthesized speech that follows the meaning of the text can be generated.
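The splitting idea in paragraphs [0007] and [0008] can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the patent does not enumerate the boundary characters, does not state a priority among boundary types, and does not specify how dependency information is used. The name `cut_position` and the character sets are hypothetical.

```python
# Hypothetical boundary characters; the patent does not list them.
PERIODS = "。．."
COMMAS = "、，,"
CLOSE_PARENS = "）)」』"
EMOTION_SYMBOLS = "！？!?♪"


def cut_position(text: str, buf_size: int) -> int:
    """Return how many leading characters of `text` form the next synthesis unit.

    The result never exceeds `buf_size`.
    """
    if len(text) <= buf_size:
        return len(text)  # the remaining text already fits in the buffer
    window = text[:buf_size]
    # Assumed priority: period, comma, closing parenthesis, emotion symbol.
    for chars in (PERIODS, COMMAS, CLOSE_PARENS, EMOTION_SYMBOLS):
        last = max(window.rfind(c) for c in chars)
        if last >= 0:
            return last + 1  # cut just after the boundary character
    # No boundary character in the window. The patent would consult
    # dependency information here; this sketch simply cuts at the limit.
    return buf_size


text = "今日は晴れです。明日は雨が降るでしょう、たぶん。"
n = cut_position(text, 10)
print(text[:n])  # 今日は晴れです。
```

Cutting just after the boundary character keeps the punctuation with the clause it closes, which is one plausible reading of "natural positions".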

[0009] Further, in the speech synthesis method of the present invention, for a special symbol indicating emotion in a separated synthesis unit character string, the emotion-added speech unit database is selected from between an emotion-added speech unit database and a reading-style speech unit database, and synthesized speech is generated.

[0010] With this method, even when a character string whose character count is equal to or greater than the input buffer size is input, the text can be split naturally and synthesized speech that follows the meaning of the text can be generated. In addition, synthesized speech carrying the emotion that corresponds to the special symbol can be generated.

[0011] Further, in the speech synthesis method of the present invention, for a character string enclosed in parentheses in a separated synthesis unit character string, colloquial speech unit data is selected from between a colloquial speech unit database and a reading-style speech unit database, and synthesized speech is generated.

[0012] With this method, even when a character string whose character count is equal to or greater than the input buffer size is input, the text can be split naturally and synthesized speech that follows the meaning of the text can be generated. In addition, a character string enclosed in parentheses can be rendered as colloquial synthesized speech.

[0013] Further, the speech synthesis device of the present invention comprises: text input means for inputting text; period search means for searching for a period that lies within the input buffer size in characters; comma search means for searching for a comma that lies within the input buffer size in characters; parenthesis end position search means for searching for a closing-parenthesis position that lies within the input buffer size in characters; emotion special symbol search means for searching for a special symbol indicating emotion that lies within the input buffer size in characters; dependency information examination means for examining dependency information of a character string no longer than the input buffer size; synthesis unit character string generation means for separating a character string no longer than the input buffer size from the input text data character string on the basis of the period search means, the comma search means, the parenthesis end position search means, the emotion special symbol search means, and the dependency information examination means, and generating a synthesis unit character string; a speech unit database for generating synthesized speech; synthesized speech generation means for converting the character string generated by the synthesis unit character string generation means into synthesized speech data; and speech output means for outputting, as sound, the speech signal generated by the synthesized speech generation means.

[0014] With this configuration, it is possible to provide a speech synthesis device that, even when a character string whose character count is equal to or greater than the input buffer size is input, can split the text naturally and generate synthesized speech that follows the meaning of the text.

[0015] Further, the speech synthesis device of the present invention comprises: an emotion-added speech unit database; a reading-style speech unit database; and speech unit database selection means that searches the character string generated by the synthesis unit character string generation means, selects the emotion-added speech unit database when the character string contains a special symbol indicating emotion, and selects the reading-style speech unit database when it does not.

[0016] With this configuration, it is possible to provide a speech synthesis device that, even when a character string whose character count is equal to or greater than the input buffer size is input, can split the text naturally and generate synthesized speech that follows the meaning of the text, and that can also generate synthesized speech carrying the emotion that corresponds to the special symbol.

[0017] Further, the speech synthesis device of the present invention comprises: a colloquial speech unit database; a reading-style speech unit database; and speech unit database selection means that searches the character string generated by the synthesis unit character string generation means, selects the colloquial speech unit database when the character string contains a character string enclosed in parentheses, and selects the reading-style speech unit database when it does not.

[0018] With this configuration, it is possible to provide a speech synthesis device that, even when a character string whose character count is equal to or greater than the input buffer size is input, can split the text naturally and generate synthesized speech that follows the meaning of the text, and that can also render a character string enclosed in parentheses as colloquial synthesized speech.

[0019]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the drawings.

[0020] FIGS. 1 and 2 show a first embodiment of the speech synthesis device and the speech synthesis method according to the present invention.

[0021] First, the configuration is described. In FIG. 1, the speech synthesis device of this embodiment comprises text input means 1, period search means 2, comma search means 3, parenthesis end position search means 4, emotion special symbol search means 5, dependency information examination means 6, synthesis unit character string generation means 7, synthesized speech generation means 8, a speech unit database 9, and speech output means 10.

[0022] A text document is input to the text input means 1. The period search means 2 searches the input text document for a period that lies within the input buffer size in characters. The comma search means 3 searches for a comma that lies within the input buffer size in characters, and the parenthesis end position search means 4 searches for a closing-parenthesis position that lies within the input buffer size in characters.

[0023] The special symbol search means 5 searches for a special symbol indicating emotion that lies within the input buffer size in characters, and the dependency information examination means 6 examines the dependency information of a character string no longer than the input buffer size.

[0024] Based on the information obtained by the period search means 2, the comma search means 3, the parenthesis end position search means 4, the emotion special symbol search means 5, and the dependency information examination means 6, the synthesis unit character string generation means 7 separates a character string no longer than the input buffer size from the text data character string and generates a synthesis unit character string. The synthesized speech generation means 8 uses the speech unit database 9 to generate synthesized speech for the character string generated by the synthesis unit character string generation means 7.

[0025] The speech output means 10 outputs the synthesized speech generated by the synthesized speech generation means 8.

[0026] Next, the speech synthesis method is described with reference to the flowchart shown in FIG. 2.

[0027] First, when a text document is input (step S1), the period search means 2 searches for a period that lies within the input buffer size in characters (step S2).

[0028] Next, the comma search means 3 searches for a comma that lies within the input buffer size in characters (step S3), and the parenthesis end position search means 4 then searches for a closing-parenthesis position that lies within the input buffer size in characters (step S4). Next, the special symbol search means 5 searches for a special symbol indicating emotion that lies within the input buffer size in characters (step S5), and the dependency information examination means 6 then examines the dependency information of a character string no longer than the input buffer size (step S6).

[0029] Next, the synthesis unit character string generation means 7 generates a synthesis unit character string (step S7), and the synthesized speech generation means 8 then generates synthesized speech (step S8). It is then determined whether the entire input character string has been converted to speech (step S9). If it has, the current process ends. If it has not, the process returns to step S2 and the processing from step S2 onward is executed again. This series of steps is repeated until the entire input text has been converted, so that all character strings are rendered as speech.
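The control flow of steps S1 to S9 can be sketched as a loop that repeatedly takes one synthesis unit off the front of the remaining text. This is a minimal sketch under stated assumptions, not the patented implementation: `speak_all`, `period_splitter`, and the callback signatures are hypothetical names, and the stand-in splitter looks only at "。" where the patent also considers commas, closing parentheses, emotion symbols, and dependency information.

```python
from typing import Callable, List, TypeVar

Audio = TypeVar("Audio")


def speak_all(
    text: str,
    buf_size: int,
    split_unit: Callable[[str, int], int],
    synthesize: Callable[[str], Audio],
) -> List[Audio]:
    """Convert all of `text` to speech, one synthesis unit at a time."""
    audio: List[Audio] = []
    remaining = text  # S1: text input
    while remaining:  # S9: repeat until everything has been converted
        n = split_unit(remaining, buf_size)  # S2-S7: length of the next unit
        if not 0 < n <= min(buf_size, len(remaining)):
            raise ValueError("splitter must return a length within the buffer")
        unit, remaining = remaining[:n], remaining[n:]
        audio.append(synthesize(unit))  # S8: synthesize this unit
    return audio


def period_splitter(text: str, buf_size: int) -> int:
    """Stand-in splitter: cut after the last '。' in the window, else hard cut."""
    window = text[:buf_size]
    idx = window.rfind("。")
    return idx + 1 if idx >= 0 else len(window)


# The identity function stands in for a real synthesizer here.
units = speak_all("おはよう。今日はいい天気ですね。散歩に行きましょう。", 12,
                  period_splitter, lambda unit: unit)
print(units)  # ['おはよう。', '今日はいい天気ですね。', '散歩に行きましょう。']
```

Passing the splitter and synthesizer as callbacks keeps the loop itself independent of how boundaries are chosen, which mirrors the separation between means 2 to 7 and means 8 in FIG. 1.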

[0030] Thus, in this embodiment, even when a character string whose character count is equal to or greater than the input buffer size is input, the text can be split naturally and synthesized speech that follows the meaning of the text can be generated.

[0031] FIGS. 3 and 4 show a second embodiment of the speech synthesis method and the speech synthesis device according to the present invention. In this embodiment, components identical to those of the first embodiment are given the same reference numerals and their description is omitted.

[0032] This embodiment differs from the first embodiment in that, as shown in FIG. 3, it has speech unit database selection means 11, a reading-style speech unit database 12, and an emotion-added speech unit database 13.

[0033] The speech unit database selection means 11 determines whether the synthesis unit character string generated by the synthesis unit character string generation means 7 contains a special symbol indicating emotion. If it does, the emotion-added speech unit database 13 is selected; otherwise, the reading-style speech unit database 12 is selected.

[0034] The emotion-added speech unit database 13 stores speech units to which emotion has been added, and the reading-style speech unit database 12 stores reading-style speech units to which no emotion has been added.
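The selection rule of the speech unit database selection means 11 amounts to a membership test on each synthesis unit. The sketch below is illustrative only: the patent does not say which characters count as special symbols indicating emotion, so `EMOTION_SYMBOLS` is an assumed set, and `select_database_by_emotion` and the returned labels are hypothetical names.

```python
# Assumed set of emotion symbols; the patent does not enumerate them.
EMOTION_SYMBOLS = frozenset("！？!?♪")


def select_database_by_emotion(unit: str) -> str:
    """Pick a speech unit database label for one synthesis unit character string."""
    if any(ch in EMOTION_SYMBOLS for ch in unit):
        return "emotion_added"  # corresponds to database 13
    return "reading_style"      # corresponds to database 12


print(select_database_by_emotion("やった！"))          # emotion_added
print(select_database_by_emotion("今日は晴れです。"))  # reading_style
```

Because the choice is made per synthesis unit, a single emotion symbol switches the database for the whole unit that contains it, not only for the symbol itself.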

[0035] Next, the speech synthesis method is described with reference to FIG. 4.

[0036] When a text document is input (step S11), the period search means 2 searches for a period that lies within the input buffer size in characters (step S12).

[0037] Next, the comma search means 3 searches for a comma that lies within the input buffer size in characters (step S13), and the parenthesis end position search means 4 then searches for a closing-parenthesis position that lies within the input buffer size in characters (step S14). Next, the special symbol search means 5 searches for a special symbol indicating emotion that lies within the input buffer size in characters (step S15), and the dependency information examination means 6 then examines the dependency information of a character string no longer than the input buffer size (step S16).

[0038] Next, the synthesis unit character string generation means 7 generates a synthesis unit character string (step S17). The speech unit database selection means 11 then determines whether the synthesis unit character string generated by the synthesis unit character string generation means 7 contains a special symbol indicating emotion. If it does, the emotion-added speech unit database 13 is selected; otherwise, the reading-style speech unit database 12 is selected (step S18).

[0039] Next, the synthesized speech generation means 8 generates synthesized speech using the selected database (step S19). It is then determined whether the entire input character string has been converted to speech (step S20). If it has, the current process ends. If it has not, the process returns to step S12 and the processing from step S12 onward is executed again. This series of steps is repeated until the entire input text has been converted, so that all character strings are rendered as speech.

[0040] Thus, in this embodiment, even when a character string whose character count is equal to or greater than the input buffer size is input, the text can be split naturally and synthesized speech that follows the meaning of the text can be generated. In addition, synthesized speech carrying the emotion that corresponds to the special symbol can be generated.

[0041] FIGS. 5 and 6 show a third embodiment of the speech synthesis method and the speech synthesis device according to the present invention. In this embodiment, components identical to those of the first embodiment are given the same reference numerals and their description is omitted.

[0042] This embodiment differs from the first embodiment in that, as shown in FIG. 5, it has speech unit database selection means 21, a reading-style speech unit database 22, and a colloquial speech unit database 23.

[0043] The speech unit database selection means 21 determines whether the synthesis unit character string generated by the synthesis unit character string generation means 7 contains a character string enclosed in parentheses. If it does, the colloquial speech unit database 23 is selected; if it does not, the reading-style speech unit database 22 is selected.

[0044] The reading-style speech unit database 22 stores reading-style speech units to which no emotion has been added, and the colloquial speech unit database 23 stores colloquial speech units.
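The third embodiment's selection rule can be sketched as a search for a bracketed span inside the synthesis unit. The bracket pairs recognized here (full-width and ASCII parentheses, and Japanese corner brackets) are an assumption, since the patent refers only to "parentheses" without listing them. `BRACKETED` and `select_database_by_brackets` are hypothetical names.

```python
import re

# Assumed bracket pairs; each alternative matches one non-nested,
# non-empty span enclosed by a matching open and close bracket.
BRACKETED = re.compile(r"（[^（）]+）|\([^()]+\)|「[^「」]+」")


def select_database_by_brackets(unit: str) -> str:
    """Pick a speech unit database label for one synthesis unit character string."""
    if BRACKETED.search(unit):
        return "colloquial"    # corresponds to database 23
    return "reading_style"     # corresponds to database 22


print(select_database_by_brackets("彼は「行こう」と言った。"))  # colloquial
print(select_database_by_brackets("今日は晴れです。"))          # reading_style
```

Requiring both the opening and the closing bracket inside one unit is a design choice of this sketch. It relies on the splitter cutting at closing-parenthesis positions so that a bracketed span is not divided across units.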

[0045] Next, the speech synthesis method is described with reference to FIG. 6.

[0046] When a text document is input (step S21), the period search means 2 searches for a period that lies within the input buffer size in characters (step S22).

[0047] Next, the comma search means 3 searches for a comma that lies within the input buffer size in characters (step S23), and the parenthesis end position search means 4 then searches for a closing-parenthesis position that lies within the input buffer size in characters (step S24). Next, the special symbol search means 5 searches for a special symbol indicating emotion that lies within the input buffer size in characters (step S25), and the dependency information examination means 6 then examines the dependency information of a character string no longer than the input buffer size (step S26).

[0048] Next, the synthesis unit character string generation means 7 generates a synthesis unit character string (step S27). The speech unit database selection means 21 then determines whether the synthesis unit character string generated by the synthesis unit character string generation means 7 contains a character string enclosed in parentheses. If it does, the colloquial speech unit database 23 is selected; otherwise, the reading-style speech unit database 22 is selected (step S28).

[0049] Next, the synthesized speech generation means 8 generates synthesized speech using the selected database (step S29). It is then determined whether the entire input character string has been converted to speech (step S30). If it has, the current process ends. If it has not, the process returns to step S22 and the processing from step S22 onward is executed again. This series of steps is repeated until the entire input text has been converted, so that all character strings are rendered as speech.

[0050] Thus, in this embodiment, even when a character string whose character count is equal to or greater than the input buffer size is input, the text can be split naturally and synthesized speech that follows the meaning of the text can be generated. In addition, a character string enclosed in parentheses can be rendered as colloquial synthesized speech.

[0051]

[Effects of the invention] According to the present invention, an input character string is separated into synthesis unit character strings, each no longer than the input buffer size in characters, and each synthesis unit character string is converted to synthesized speech. This has the effect that text can be converted into natural synthesized speech even when the input buffer size is limited.

[Brief description of the drawings]

[FIG. 1] A block diagram of the speech synthesis device, showing the first embodiment of the speech synthesis method and the speech synthesis device according to the present invention.

[FIG. 2] A flowchart of the speech synthesis method of the first embodiment.

[FIG. 3] A block diagram of the speech synthesis device, showing the second embodiment of the speech synthesis method and the speech synthesis device according to the present invention.

[FIG. 4] A flowchart of the speech synthesis method of the second embodiment.

[FIG. 5] A block diagram of the speech synthesis device, showing the third embodiment of the speech synthesis method and the speech synthesis device according to the present invention.

[FIG. 6] A flowchart of the speech synthesis method of the third embodiment.

[Explanation of symbols]

1 Text input means
2 Period search means
3 Comma search means
4 Parenthesis end position search means
5 Emotion special symbol search means
6 Dependency information examination means
7 Synthesis unit character string generation means
8 Synthesized speech generation means
9 Speech unit database
10 Speech output means
11 Database selection means
12 Reading-style speech unit database
13 Emotion-added speech unit database
21 Database selection means
22 Reading-style speech unit database
23 Colloquial speech unit database

Claims

[Claims]

1. A speech synthesis method characterized by: detecting, in an input character string, punctuation marks, closing-parenthesis positions, positions of special symbols indicating emotion, and dependency information that lie within an input buffer size in characters; separating from the character string a synthesis unit character string whose length is equal to or less than the input buffer size; and generating synthesized speech for each separated synthesis unit character string.

2. The speech synthesis method according to claim 1, wherein, for a special symbol indicating emotion in the separated synthesis unit character string, the emotion-added speech unit database is selected from between an emotion-added speech unit database and a reading-style speech unit database, and synthesized speech is generated.

3. The speech synthesis method according to claim 1, wherein, for a character string enclosed in parentheses in the separated synthesis unit character string, colloquial speech unit data is selected from between a colloquial speech unit database and a reading-style speech unit database, and synthesized speech is generated.

4. A speech synthesis device comprising: text input means for inputting text; period search means for searching for a period that lies within an input buffer size in characters; comma search means for searching for a comma that lies within the input buffer size in characters; parenthesis end position search means for searching for a closing-parenthesis position that lies within the input buffer size in characters; emotion special symbol search means for searching for a special symbol indicating emotion that lies within the input buffer size in characters; dependency information examination means for examining dependency information of a character string no longer than the input buffer size; synthesis unit character string generation means for separating a character string no longer than the input buffer size from the input text data character string on the basis of the period search means, the comma search means, the parenthesis end position search means, the emotion special symbol search means, and the dependency information examination means, and generating a synthesis unit character string; a speech unit database for generating synthesized speech; synthesized speech generation means for converting the character string generated by the synthesis unit character string generation means into synthesized speech data; and speech output means for outputting, as sound, the speech signal generated by the synthesized speech generation means.

5. The speech synthesis device according to claim 4, further comprising: an emotion-added speech unit database; a reading-style speech unit database; and speech unit database selection means that searches the character string generated by the synthesis unit character string generation means, selects the emotion-added speech unit database when the character string contains a special symbol indicating emotion, and selects the reading-style speech unit database when the character string contains no special symbol indicating emotion.

6. The speech synthesis device according to claim 4 or 5, further comprising: a colloquial speech unit database; a reading-style speech unit database; and speech unit database selection means that searches the character string generated by the synthesis unit character string generation means, selects the colloquial speech unit database when the character string contains a character string enclosed in parentheses, and selects the reading-style speech unit database when the character string contains no character string enclosed in parentheses.