JP5370138B2 - Input auxiliary device, input auxiliary program, speech synthesizer, and speech synthesis program

Info

Publication number: JP5370138B2
Application number: JP2009295267A
Authority: JP
Inventors: 勉兼安
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2009-12-25
Filing date: 2009-12-25
Publication date: 2013-12-18
Anticipated expiration: 2029-12-25
Also published as: JP2011133803A

Abstract

PROBLEM TO BE SOLVED: To improve the quality of the speech generated when synthesizing speech that reads out text data. SOLUTION: The invention relates to a speech synthesizer that outputs speech reading out text data, and to an input auxiliary device with which a user inputs the text data. The speech synthesizer has a first database used for speech synthesis and a second database in which speech data of predetermined target words is registered. The input auxiliary device includes an input auxiliary unit that generates, in accordance with the user's operation, text data in which target words registered in the second database are marked so as to be distinguished from ranges other than target words. The speech synthesizer includes means for generating synthesized speech by using the speech data registered in the second database for ranges marked as target words in the text data, and by using the data of the first database for ranges not marked as target words. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an input auxiliary device, an input auxiliary program, a speech synthesizer, and a speech synthesis program, and can be applied, for example, to speech synthesis that reads out text data.

Conventionally, the method described in Patent Document 1 is known as a speech synthesizer that synthesizes, on a corpus basis, speech reading out character data (text data) input by a user (that is, a device that synthesizes speech by concatenating pre-stored speech waveforms in phoneme units (synthesis units)).

Patent Document 1: JP 2003-208188 A

However, with a conventional corpus-based speech synthesizer such as the technique described in Patent Document 1, proper nouns such as personal names and place names, dialect expressions, and the like are sometimes synthesized with unnatural pronunciation, making the read-out speech hard to listen to.

Therefore, there is a demand for an input auxiliary device, an input auxiliary program, a speech synthesizer, and a speech synthesis program that can improve the quality of the generated speech when synthesizing speech that reads out text data.

According to a first aspect of the present invention, there is provided an input auxiliary device that generates, in accordance with a user's operation, text data to be supplied to a speech synthesizer that generates speech reading out the contents of the text data, wherein (1) the speech synthesizer has a first database used for speech synthesis and a second database in which speech data of predetermined target words is registered, and the input auxiliary device has an input auxiliary unit that generates, in accordance with the user's operation, text data written so as to distinguish target words registered in the second database from ranges other than target words, and (3) the input auxiliary unit includes: (3-1) target word holding means for holding information on the target words registered in the second database; (3-2) real-time input means for sequentially accepting character input from the user; (3-3) a display unit for displaying information provided to the user by the input auxiliary unit; (3-4) extraction display means for extracting, from the information held by the target word holding means, target words related to the characters the user is entering into the real-time input means, and displaying them on the display unit; (3-5) selection accepting means for letting the user select one of the target words displayed by the extraction display means, and notifying the real-time input means of the selected target word as if it had been input by the user; and (3-6) text data generating means for generating, for the character string input to the real-time input means, text data in which the target word selected by the user through the selection accepting means is distinguished from the other ranges.

An input auxiliary program according to a second aspect of the present invention causes (1) a computer mounted on an input auxiliary device, which generates, in accordance with a user's operation, text data to be supplied to a speech synthesizer that generates speech reading out the contents of the text data, (2) to function as an input auxiliary unit that generates, in accordance with the user's operation, text data written so as to distinguish target words registered in a second database from ranges other than target words, the speech synthesizer having a first database used for speech synthesis and the second database, in which speech data of predetermined target words is registered, wherein (3) the input auxiliary unit includes: (3-1) target word holding means for holding information on the target words registered in the second database; (3-2) real-time input means for sequentially accepting character input from the user; (3-3) a display unit for displaying information provided to the user by the input auxiliary unit; (3-4) extraction display means for extracting, from the information held by the target word holding means, target words related to the characters the user is entering into the real-time input means, and displaying them on the display unit; (3-5) selection accepting means for letting the user select one of the target words displayed by the extraction display means, and notifying the real-time input means of the selected target word as if it had been input by the user; and (3-6) text data generating means for generating, for the character string input to the real-time input means, text data in which the target word selected by the user through the selection accepting means is distinguished from the other ranges.

A speech synthesizer according to a third aspect of the present invention includes: (1) a first database used for speech synthesis processing; (2) a second database in which speech data of predetermined target terms is registered; (3) speech generating means for generating speech that reads out text data written so as to distinguish target terms registered in the second database from ranges other than target terms, by using the speech data registered in the second database for ranges distinguished as target terms and the data of the first database for ranges not distinguished as target terms; and (4) an input auxiliary unit that generates, in accordance with a user's operation, text data written so as to distinguish target words registered in the second database, out of the first database and the second database, from ranges other than target words, wherein (5) the input auxiliary unit includes: (5-1) target word holding means for holding information on the target words registered in the second database; (5-2) real-time input means for sequentially accepting character input from the user; (5-3) a display unit for displaying information provided to the user by the input auxiliary unit; (5-4) extraction display means for extracting, from the information held by the target word holding means, target words related to the characters the user is entering into the real-time input means, and displaying them on the display unit; (5-5) selection accepting means for letting the user select one of the target words displayed by the extraction display means, and notifying the real-time input means of the selected target word as if it had been input by the user; and (5-6) text data generating means for generating, for the character string input to the real-time input means, text data in which the target word selected by the user through the selection accepting means is distinguished from the other ranges.

A speech synthesis program according to a fourth aspect of the present invention causes (1) a computer mounted on a speech synthesizer that generates speech reading out the contents of text data to function as: (2) a first database used for speech synthesis processing; (3) a second database in which speech data of predetermined target words is registered; (4) speech generating means for generating speech that reads out text data written so as to distinguish target words registered in the second database from ranges other than target words, by using the speech data registered in the second database for ranges distinguished as target words and the data of the first database for ranges not distinguished as target words; and (5) an input auxiliary unit that generates, in accordance with a user's operation, text data written so as to distinguish target words registered in the second database, out of the first database and the second database, from ranges other than target words, wherein (6) the input auxiliary unit includes: (6-1) target word holding means for holding information on the target words registered in the second database; (6-2) real-time input means for sequentially accepting character input from the user; (6-3) a display unit for displaying information provided to the user by the input auxiliary unit; (6-4) extraction display means for extracting, from the information held by the target word holding means, target words related to the characters the user is entering into the real-time input means, and displaying them on the display unit; (6-5) selection accepting means for letting the user select one of the target words displayed by the extraction display means, and notifying the real-time input means of the selected target word as if it had been input by the user; and (6-6) text data generating means for generating, for the character string input to the real-time input means, text data in which the target word selected by the user through the selection accepting means is distinguished from the other ranges.

According to the present invention, the quality of the generated speech can be improved when synthesizing speech that reads out text data.

A block diagram showing the functional configuration of the speech synthesizer according to the first embodiment.
An explanatory diagram showing an example of the contents registered in the user database according to the first embodiment.
An explanatory diagram showing an example of the contents of the screen displayed by the display unit according to the first embodiment.
A flowchart showing an example of the operation of the input auxiliary unit in the real-time input mode according to the first embodiment.
An example of the screen transitions output by the display unit in the real-time input mode of the input auxiliary unit according to the first embodiment.
A flowchart showing an example of the operation of the input auxiliary unit in the batch input mode according to the first embodiment.
An explanatory diagram showing an example of the screen transitions in the batch input mode of the input auxiliary unit according to the first embodiment.
An explanatory diagram showing an example of the contents of a text file input to the input auxiliary unit according to the first embodiment.
An explanatory diagram showing the operation-mode switching performed by the display switching unit according to the first embodiment.
An explanatory diagram showing the operation of the speech synthesis unit according to the first embodiment.
A block diagram showing the functional configuration of the speech synthesizer according to the second embodiment.
An explanatory diagram showing the transition of the display screen when the operation mode is switched in the input auxiliary device according to the second embodiment.
An explanatory diagram showing the operation of the speech synthesis unit according to the third embodiment.

(A) First Embodiment
Hereinafter, a first embodiment of the input auxiliary device, input auxiliary program, speech synthesizer, and speech synthesis program according to the present invention will be described in detail with reference to the drawings. Note that the input auxiliary device of the first embodiment is the input auxiliary unit.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the overall configuration of the speech synthesizer 10 of this embodiment. In FIG. 1, the reference numerals in parentheses are those used in the third embodiment described later.

The speech synthesizer 10 includes an input auxiliary unit 20 and a speech synthesis unit 30.

The input auxiliary unit 20 generates the text data to be speech-synthesized in accordance with user operations and the like. The speech synthesis unit 30 generates and outputs speech that reads out the text data given from the input auxiliary unit 20.

In FIG. 1, the input auxiliary unit 20 and the speech synthesis unit 30 are configured as a single device, but they may instead be configured as separate devices (an input auxiliary device and a speech synthesizer).

The input auxiliary unit 20 may be built by installing the input auxiliary program of the embodiment on an information processing apparatus having a processor (not limited to a single apparatus; a plurality of apparatuses configured for distributed processing may also be used); even in that case, it can be represented functionally as shown in FIG. 1. Likewise, the speech synthesis unit 30 may be configured by installing the speech synthesis program of the embodiment on a similar information processing apparatus.

The following description takes as an example the case where the input auxiliary program (input auxiliary unit 20) and the speech synthesis program (speech synthesis unit 30) are installed on a single information processing apparatus such as a personal computer. The information processing apparatus is assumed to be equipped with a display for visual output to the user, a speaker for audio output, and a keyboard (optionally with a mouse) as input means with which the user enters characters and operation signals. The input means is not limited to these.

Next, the speech synthesis unit 30 will be described in detail.

The speech synthesis unit 30 includes a text division unit 31, a speech synthesis processing unit 32, a speech combining unit 33, a synthesized speech DB 34, and a user database 35.

The speech synthesis unit 30 has two databases used for speech synthesis: the synthesized speech DB 34 and the user database 35.

The synthesized speech DB 34 is a database storing data such as phoneme segments used for speech synthesis; for example, a database used for existing corpus-based speech synthesis, such as the technique described in Patent Document 1, can be used.

The user database 35, on the other hand, registers, for each predetermined word, not phoneme segments but the data of the word's continuous speech as a whole (hereinafter, "real voice data"), in association with information indicating the content of that real voice data. The real voice data may be a recording of speech actually uttered by a person, or synthesized speech prepared in advance so as to sound natural.

For example, it is desirable to register as real voice data words that the user uses often but that existing speech synthesis processing has difficulty pronouncing naturally, such as proper nouns like personal names and place names. In the following description, a word whose real voice data is registered in the user database 35 is called an "important word".

FIG. 2 is an explanatory diagram showing an example of the contents registered in the user database 35.

In FIG. 2, the "voice file" item gives the file name of the data file storing the real voice data, and the "notation" item gives the content of the important word spoken in the corresponding real voice data.

The user database 35 also stores the data of the real voice file corresponding to each "voice file" entry (for example, "A001"), and each file is identified by its file name.

FIG. 2 illustrates an example in which the "voice file" item is used to indicate the real voice data. However, the data format is not limited as long as the information can indicate the corresponding real voice data; for example, it may be a link destination URL, an identifier in a database, or the real voice data itself.

During speech synthesis, the speech synthesis unit 30 outputs speech based on the real voice data for important words, and outputs speech synthesized using the data in the synthesized speech DB 34 for all other ranges.

In FIG. 2, for example, the voice file corresponding to "abcさん" ("Mr. abc") is "A001", and the file named "A001" stores real voice data in which the important word "abcさん" is read aloud.

The user database 35 may register, as important words, entries that are written with the same characters but differ in emotional expression or the like. For example, as shown in FIG. 2, the important words "はい（笑）" ("yes (laughing)") and "はい（泣）" ("yes (crying)") are registered: the data file A010 corresponding to "はい（笑）" stores real voice data of "はい" spoken while laughing, and the data file A011 corresponding to "はい（泣）" stores real voice data of "はい" spoken while crying. The content in parentheses in the notation is not limited to emotional expressions; it may be anything that describes the corresponding real voice data, such as whether the word is a personal name or a place name. Also, although FIG. 2 uses parentheses to set off the description of the real voice data within the notation, other symbols may be used, or the database may be structured with a field separate from the notation.
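As a rough illustration only, the registration contents described for FIG. 2 could be modeled as follows. This is a minimal sketch, not the format used by the embodiment: the names `ImportantWord`, `USER_DB`, and `voice_file_for` are hypothetical, and the `note` field is an assumed instance of the "field separate from the notation" variant mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImportantWord:
    notation: str               # text shown to the user, e.g. "はい（笑）"
    voice_file: str             # name of the file holding the real voice data
    note: Optional[str] = None  # hypothetical separate field for the annotation


# Entries mirror the examples given for FIG. 2; the structure itself is assumed.
USER_DB = {
    word.notation: word
    for word in (
        ImportantWord("abcさん", "A001"),
        ImportantWord("はい（笑）", "A010", note="笑"),
        ImportantWord("はい（泣）", "A011", note="泣"),
    )
}


def voice_file_for(notation: str) -> Optional[str]:
    """Return the voice file registered for an important word, or None."""
    entry = USER_DB.get(notation)
    return entry.voice_file if entry else None
```

Keying the table by the full notation keeps "はい（笑）" and "はい（泣）" as distinct entries even though both are read as "はい".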

In the text data given from the input auxiliary unit 20 to the speech synthesis unit 30, an important word such as "xyz株式会社" ("xyz Corporation") is given enclosed in the control character "#", as in "#xyz株式会社#", and the speech synthesis unit 30 treats any word enclosed in control characters as an important word. In the following, the control character "#" is used to distinguish important words, but other symbols (possibly multiple characters) may be used as control characters, or a tag format such as that of XML may be used; the method of distinction is not limited.

When text data is given from the input auxiliary unit 20, the text division unit 31 uses the control character (#) as a delimiter to divide the text into characters to be speech-synthesized and characters of important words, and gives the result to the speech synthesis processing unit 32.

For example, when the text data "合格#おめでとう#だね。#おおさか##xyz株式会社#にくる？" (roughly, "Congratulations on passing. Are you coming to xyz Corporation in Osaka?") is given to the text division unit 31, it is divided into the pieces "合格", "おめでとう", "だね。", "おおさか", "xyz株式会社", and "にくる？", which are given to the speech synthesis processing unit 32. For each important word enclosed in the control character #, the speech synthesis processing unit 32 is also notified that the piece is an important word.
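The division described above can be sketched as follows. This is an illustrative assumption about how such a split might be written, not the embodiment's implementation: the function name `split_text` is hypothetical, and the sketch assumes the control characters are balanced (every opening # has a closing #).

```python
from typing import List, Tuple

CONTROL_CHAR = "#"


def split_text(text: str, control: str = CONTROL_CHAR) -> List[Tuple[str, bool]]:
    """Split text into (segment, is_important) pairs at each control character."""
    segments = []
    # Pieces at odd indices lie between an opening and a closing control character.
    for index, piece in enumerate(text.split(control)):
        if piece:  # skip the empty piece between two adjacent important words
            segments.append((piece, index % 2 == 1))
    return segments
```

Applied to the example text, the sketch yields the six pieces in their original order, with "おめでとう", "おおさか", and "xyz株式会社" flagged as important words.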

The speech synthesis processing unit 32 then synthesizes speech data and so on in units of the divided text data produced by the text division unit 31. For important words, it reads the real voice data from the user database 35; for ranges that are not important words, it performs speech synthesis using the data in the synthesized speech DB 34. It then gives the real voice data read from the user database 35 and the speech data synthesized from the data in the synthesized speech DB 34 to the speech combining unit 33.

When the speech synthesis processing unit 32 performs analysis such as morphological analysis on the text data, it may do so for each piece of divided text data produced by the text division unit 31, or on the whole text in its recombined, pre-division state. In addition, if the speech synthesis processing unit 32 can determine that an important word registered in the user database 35 appears outside the characters enclosed in the control character #, it may read and hold the real voice data for that portion from the user database 35; in this embodiment, however, such processing is assumed not to be performed.

The speech combining unit 33 joins the real voice data read from the user database 35 and the speech data synthesized from the data in the synthesized speech DB 34, both given from the speech synthesis processing unit 32, in the order matching the text data given from the input auxiliary unit 20, and outputs the result.
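The flow through the speech synthesis processing unit 32 and the speech combining unit 33 can be sketched as below. This is a simplified illustration under stated assumptions: audio is represented as raw `bytes`, `corpus_synth` is a hypothetical stand-in for corpus-based synthesis with the synthesized speech DB 34, and `user_db` is a hypothetical mapping from an important word to its real voice data. The text does not say what happens when a marked word is missing from the user database; this sketch simply raises `KeyError`.

```python
from typing import Callable, Dict, Iterable, Tuple


def synthesize(
    segments: Iterable[Tuple[str, bool]],
    user_db: Dict[str, bytes],
    corpus_synth: Callable[[str], bytes],
) -> bytes:
    """Build audio per segment and join it in the order of the input text."""
    parts = []
    for text, is_important in segments:
        if is_important:
            # Marked range: use the registered real voice data as-is.
            parts.append(user_db[text])
        else:
            # Unmarked range: fall back to ordinary corpus-based synthesis.
            parts.append(corpus_synth(text))
    return b"".join(parts)
```

Real audio would need matching sample formats and probably smoothing at the joins; plain byte concatenation only illustrates that the order of the pieces follows the text data.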

The speech combining unit 33 is not limited to outputting the speech data audibly through the speaker; for example, it may store the data in a storage device such as a disk device or output it to another device via communication. The output method is not limited.

Next, the detailed configuration of the input auxiliary unit 20 will be described.

The input auxiliary unit 20 includes a real-time input unit 21, a batch input unit 22, a complementing unit 23, a display switching unit 24, and a display unit 25.

The input auxiliary unit 20 has the real-time input unit 21 and the batch input unit 22 as the components that receive text data input from the user.

The real-time input unit 21 accepts text data typed by the user in real time on an input device such as a keyboard, generates text data in accordance with the user's operations, and gives the generated text data to the speech synthesis unit 30 at a timing determined by the user's operation.

The batch input unit 22 accepts from the user a file containing text data (hereinafter, "text file"); the input method is not limited and may be, for example, via a recording medium or over a network. In the following description, the text file is assumed to contain multiple lines of text data. The batch input unit 22 gives the speech synthesis unit 30 the line of text data selected by the user's operation, at a timing determined by the user's operation.
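A minimal sketch of the batch input behavior described above is given below. The class name `BatchInput`, the method `speak_line`, and the `send` callback (standing in for handing text to the speech synthesis unit 30) are all hypothetical, and how the user picks a line is left outside the sketch.

```python
from typing import Callable, List


class BatchInput:
    """Holds the lines of a text file and sends one line at a time on request."""

    def __init__(self, file_text: str, send: Callable[[str], None]) -> None:
        self.lines: List[str] = file_text.splitlines()
        self.send = send  # stand-in for the speech synthesis unit's input

    def speak_line(self, index: int) -> None:
        """Send the line chosen by the user's operation to the synthesizer."""
        self.send(self.lines[index])
```

For example, `BatchInput("a\nb\nc", print).speak_line(1)` would pass only the second line, "b", to the callback.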

The display switching unit 24 is responsible for switching the operation mode of the input auxiliary unit 20 in accordance with the user's operation. The input auxiliary unit 20 has two operation modes: a "real-time input mode", in which the real-time input unit 21 accepts text input from the user, and a "batch input mode", in which the batch input unit 22 handles the text data of a text file as the processing target. The display switching unit 24 switches between them.

The trigger for the display switching unit 24 to switch the operation mode is not limited. For example, the mode may be switched when the user makes a predetermined key input (such as pressing the "Tab" key) on the keyboard of the input auxiliary unit 20, or when the user clicks an on-screen button (not shown) with the mouse.

The display unit 25 shows the user the operation status of the input auxiliary unit 20 on a display device such as a monitor.

FIG. 3 is an explanatory diagram showing an example of the contents of the screen displayed by the display unit 25.

As shown in FIG. 3, an input field FI and a display field F0 are arranged on the screen displayed by the display unit 25. The display screen shown in FIG. 3 may be displayed, for example, as a single window occupying part of the display of the input auxiliary unit 20.

The input field FI is used while the input auxiliary unit 20 is operating in the real-time input mode, and displays the text data that the user enters with the keyboard.

The display field F0 is used in both the real-time input mode and the batch input mode. Its contents switch according to the operation mode of the input auxiliary unit 20, and are described in detail later.

The complementing unit 23 operates in the real-time input mode. According to what is entered in the input field FI, it displays in the display field F0 candidate important words (hereinafter, "narrowing candidates") drawn from the important words registered in the user database 35. The real-time input unit 21 then accepts the important word that the user selects from the narrowing candidates and inserts it, enclosed in the control character "＃", into the text data being generated.

(A-2) Operation of the First Embodiment
Next, the operation of the speech synthesizer 10 of the first embodiment, configured as described above, is explained.

The operation of the input auxiliary unit 20 is described first, followed by the operation of the speech synthesis unit 30.

(A-2-1) Operation of the Input Auxiliary Unit in the Real-Time Input Mode
FIG. 4 is a flowchart showing an example of the operation of the input auxiliary unit 20 in the real-time input mode.

FIG. 5 shows an example of the screen transitions output by the display unit 25 when the input auxiliary unit 20 operates according to the flowchart of FIG. 4.

The description assumes that the input auxiliary unit 20 (real-time input unit 21) supports kana-kanji conversion: when the user types hiragana on the keyboard (including by romaji input), the hiragana is converted into kanji according to the user's operation. The kana-kanji conversion function may be the same as that of an existing personal computer or the like.

The flowchart of FIG. 4 is described for an example in which the user inputs the important word "おめでとう" ("congratulations").

First, assume that the user inputs the character "お" to the real-time input unit 21; that is, as shown in FIG. 5(a), the user types "お" into the input field FI with the keyboard (S101). In FIG. 5, text that the user is still entering and has not yet confirmed is underlined, and text whose input has been confirmed is not underlined.

Next, the real-time input unit 21 notifies the complementing unit 23 of the unconfirmed character "お". The complementing unit 23 reads the contents (the notation field) of the user database 35 and extracts, as narrowing candidates, the important words that begin with the character being entered, "お" (S102). If the user database 35 holds the contents shown in FIG. 2, step S102 extracts "おおさか" (Osaka), "おおきに" (a dialect word for "thank you"), "おめでとう" ("congratulations"), and "おおさか（人名）" (Osaka as a personal name). If the user has entered several unconfirmed characters, such as "おめ", the word that completes that prefix, "おめでとう", may be extracted as the narrowing candidate.

The complementing unit 23 may read the contents of the user database 35 each time, or may extract only the notation field of the user database 35 in advance and hold it; the method is not limited.
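The extraction in step S102 can be sketched as a prefix match over the notation field. This is a minimal illustration, not the patented implementation: the function name `narrow_candidates` and the list `USER_DB_NOTATIONS` are hypothetical, and the list holds only the notations named in this description (the full contents of FIG. 2 are not reproduced here).

```python
# Hypothetical stand-in for the notation field of the user database 35,
# held in memory in advance (one of the two options the text allows).
USER_DB_NOTATIONS = [
    "おおさか",
    "おおきに",
    "おめでとう",
    "おおさか（人名）",
    "ｘｙｚ株式会社",
]


def narrow_candidates(unconfirmed: str, notations: list[str]) -> list[str]:
    """Return the registered important words that begin with the unconfirmed input."""
    if not unconfirmed:
        # Assumption: nothing typed yet means no candidates are shown.
        return []
    return [word for word in notations if word.startswith(unconfirmed)]


if __name__ == "__main__":
    print(narrow_candidates("お", USER_DB_NOTATIONS))
    print(narrow_candidates("おめ", USER_DB_NOTATIONS))
```

Typing more characters simply shortens the list, which matches the "おめ" case described above.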

Next, the complementing unit 23 sends the narrowing candidates extracted in step S102 to the display unit 25, which displays them in the display field F0 as shown in FIG. 5(b) (S103).

Next, when the user selects one of the narrowing candidates displayed in step S103 (that is, when the corresponding operation signal is input to the real-time input unit 21) (S104), the real-time input unit 21 recognizes the selected word as an important word input by the user. In step S104, the user may select a narrowing candidate with, for example, the keyboard arrow keys or a mouse operation.

FIG. 5(b) shows an example in which the user selects an important word with a cursor that moves up and down in response to the keyboard arrow keys. With the cursor placed on one of the important words (in FIG. 5(b), the rectangular cursor is on "おめでとう"), an operation that confirms the selection (for example, pressing the Enter key) causes the real-time input unit 21 to treat that important word as input by the user.

In step S104, as shown in FIG. 5(b), the user is assumed to have selected "おめでとう" from the narrowing candidates.

When the user selects one of the narrowing candidates, "＃おめでとう＃" (that is, "おめでとう" enclosed in the control character "＃") is displayed in the input field FI as the user's input, as shown in FIG. 5(c) (S105), and the real-time input unit 21 becomes ready to accept the next character.

Next, consider the case where the input auxiliary unit 20 is in the real-time input mode and the text data the user wants to input is "合格おめでとうだね。おおさかｘｙｚ株式会社に来る？" ("Congratulations on passing. Will you come to Osaka xyz Corporation?").

First, suppose the user enters "ご", the first character of the reading "ごうかく" of "合格" ("passing"), into the real-time input unit 21 (input field FI). As FIG. 2 shows, no important word beginning with "ご" is registered in the user database 35, so there are no narrowing candidates. The user therefore enters "合格" in the real-time input unit 21 (input field FI) in the ordinary way.

Next, when "お", the first character of "おめでとう", is input to the real-time input unit 21, the narrowing candidates beginning with "お" are displayed in the display field F0, as in the example of FIG. 5. When the user selects "おめでとう" from the narrowing candidates, "合格＃おめでとう＃" is displayed in the real-time input unit 21 (input field FI), and the next character can be entered.

Continuing in the same way, the text data finally displayed in the real-time input unit 21 (input field FI) is "合格＃おめでとう＃だね。＃おおさか＃＃ｘｙｚ株式会社＃に来る？".

When input is complete and the user's operation sends the real-time input unit 21 a signal requesting synthesized speech for the text data displayed in the input field FI (for example, a predetermined keyboard operation, or a predetermined button pressed with the mouse), the real-time input unit 21 supplies that text data to the speech synthesis unit 30 (text dividing unit 31).

When important words occur in succession, the control characters become adjacent and form "＃＃", as in "＃おおさか＃＃ｘｙｚ株式会社＃". This doubled symbol may be replaced with another symbol.
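The marking convention can be illustrated with a short sketch. The helper names `mark_important` and `build_text` are hypothetical, and the input is modelled as a list of (text, is_important) pairs purely for illustration; the description does not specify how the real-time input unit 21 stores the text internally.

```python
CONTROL_CHAR = "＃"  # control character used in this description


def mark_important(word: str) -> str:
    """Enclose a selected important word in the control character."""
    return f"{CONTROL_CHAR}{word}{CONTROL_CHAR}"


def build_text(parts: list[tuple[str, bool]]) -> str:
    """Join ordinary text and marked important words in input order."""
    return "".join(mark_important(text) if important else text
                   for text, important in parts)


if __name__ == "__main__":
    parts = [
        ("合格", False),
        ("おめでとう", True),
        ("だね。", False),
        ("おおさか", True),
        ("ｘｙｚ株式会社", True),
        ("に来る？", False),
    ]
    print(build_text(parts))
```

Two consecutive important words naturally yield the doubled "＃＃" noted above, because each word carries its own opening and closing control character.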

(A-2-2) Operation of the Input Auxiliary Unit in the Batch Input Mode
FIG. 6 is a flowchart showing an example of the operation of the input auxiliary unit 20 in the batch input mode.

FIG. 7 shows an example of the screen transitions output by the display unit 25 when the input auxiliary unit 20 operates according to the flowchart of FIG. 6.

FIG. 8 is an explanatory diagram showing an example of the contents of a text file input in the batch input mode.

When the input auxiliary unit 20 is operating in the batch input mode and the user inputs the text file shown in FIG. 8 to the batch input unit 22 (S201), the contents of the file are sent to the display unit 25, which displays them line by line in the display field F0 as shown in FIG. 7(a) (S202).

When an operation signal by which the user selects one of the lines of text data displayed in the display field F0 is input to the batch input unit 22 (S203), the batch input unit 22 supplies the text data of the selected line to the speech synthesis unit 30 (text dividing unit 31), and speech synthesis starts (S204).

In step S203, the user may select one of the lines of text data displayed in the display field F0 with, for example, the keyboard arrow keys or a mouse operation.

FIG. 7 shows an example in which the user selects a line of text data with a cursor that moves up and down in response to the keyboard arrow keys. With the cursor placed on one of the lines (in FIG. 7(b), the rectangular cursor is on the line "あっという間に過ぎましたね", "It went by in no time, didn't it"), an operation that confirms the selection (for example, pressing the Enter key) causes the batch input unit 22 to treat that line as selected and confirmed.

(A-2-3) Operation of the Display Switching Unit of the Input Auxiliary Unit
Next, switching of the operation mode of the input auxiliary unit 20 by the display switching unit 24 is described.

As described above, the display switching unit 24 switches the operation mode of the input auxiliary unit 20 between the real-time input mode and the batch input mode in response to a user operation.

FIG. 9 is an explanatory diagram showing the operation mode switching performed by the display switching unit 24.

FIG. 9(a) shows an example of the display screen of the display unit 25 when the input auxiliary unit 20 is in the real-time input mode. FIG. 9(b) shows an example of the display screen of the display unit 25 when the input auxiliary unit 20 is in the batch input mode.

For example, when the display switching unit 24, triggered by a user operation, switches the input auxiliary unit 20 from the batch input mode to the real-time input mode, it disables the function of the batch input unit 22, enables the function of the real-time input unit 21, and switches the content shown on the display unit 25 to the information from the real-time input unit 21, as shown in FIG. 9(a).

Conversely, when the display switching unit 24, triggered by a user operation, switches the input auxiliary unit 20 from the real-time input unit 21 to the batch input unit 22 (that is, from the real-time input mode to the batch input mode), it disables the function of the real-time input unit 21, enables the function of the batch input unit 22, and switches the content shown on the display unit 25 to the information from the batch input unit 22, as shown in FIG. 9(b).
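The mode switching described above amounts to a two-state toggle in which exactly one input unit is enabled at a time. The following is a rough sketch only: the class and attribute names are hypothetical, and starting in the real-time input mode is an assumption, since the description does not state the initial mode.

```python
from enum import Enum


class Mode(Enum):
    REALTIME = "real-time input mode"
    BATCH = "batch input mode"


class InputAssistant:
    """Hypothetical model of the input auxiliary unit 20 and display switching unit 24."""

    def __init__(self) -> None:
        # Assumption: the initial mode is not specified in the description.
        self.mode = Mode.REALTIME

    def toggle(self) -> Mode:
        """Switch modes, e.g. on a Tab key press or an on-screen button."""
        self.mode = Mode.BATCH if self.mode is Mode.REALTIME else Mode.REALTIME
        return self.mode

    @property
    def realtime_enabled(self) -> bool:
        return self.mode is Mode.REALTIME

    @property
    def batch_enabled(self) -> bool:
        return self.mode is Mode.BATCH

    @property
    def display_source(self) -> str:
        """Which unit feeds the display unit 25 in the current mode."""
        return "real-time input unit 21" if self.realtime_enabled else "batch input unit 22"
```

Deriving both "enabled" flags from a single `mode` value keeps the two input units from ever being active at once.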

(A-2-4) Operation of the Speech Synthesis Unit
Next, the operation of the speech synthesis unit 30 is described.

FIG. 10 is an explanatory diagram showing how the speech synthesis unit 30 processes the text data supplied from the input auxiliary unit 20.

FIG. 10 illustrates the processing for the case where the text data supplied from the input auxiliary unit 20 to the speech synthesis unit 30 is "合格＃おめでとう＃だね。＃おおさか＃＃ｘｙｚ株式会社＃に来る？".

When the input auxiliary unit 20 supplies the text data "合格＃おめでとう＃だね。＃おおさか＃＃ｘｙｚ株式会社＃に来る？" to the speech synthesis unit 30, the text dividing unit 31 first divides the text data using the control character "＃" as a delimiter. As shown in FIG. 10, the text dividing unit 31 divides this text data into "合格", "おめでとう", "だね。", "おおさか", "ｘｙｚ株式会社", and "に来る？", and supplies these pieces to the speech synthesis processing unit 32.
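The division step can be sketched as a plain split on the control character. The function name `split_text` is hypothetical, and two details are assumptions rather than statements from the description: segments at odd positions after the split are taken to be the ranges that were enclosed in "＃", and text with an unmatched control character is rejected.

```python
CONTROL_CHAR = "＃"


def split_text(text: str) -> list[tuple[str, bool]]:
    """Divide text on the control character.

    Returns (segment, is_important) pairs. After splitting, segments at odd
    indices lie between an opening and a closing control character.
    """
    pieces = text.split(CONTROL_CHAR)
    if len(pieces) % 2 == 0:
        # An odd number of control characters leaves a range unclosed.
        raise ValueError("unbalanced control characters")
    # Drop empty pieces, such as the one produced by the doubled "＃＃".
    return [(piece, index % 2 == 1) for index, piece in enumerate(pieces) if piece]


if __name__ == "__main__":
    for segment in split_text("合格＃おめでとう＃だね。＃おおさか＃＃ｘｙｚ株式会社＃に来る？"):
        print(segment)
```

The empty piece between the two adjacent control characters falls at an even index, so discarding it does not disturb the odd/even classification of the remaining segments.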

For each piece of divided text data produced by the text dividing unit 31, the speech synthesis processing unit 32 selects and reads the corresponding real voice data from the user database 35 if the piece is an important word, and performs speech synthesis on the piece if it falls in a range to be synthesized. It then supplies the real voice data and the synthesized speech data to the speech combining unit 33.

For an important word specific to the user (a range that was enclosed in the control character "＃"), the speech synthesis processing unit 32 may determine whether it exactly matches the notation of an important word for which real voice data exists in the user database 35 and, if it matches, read the corresponding real voice data from the user database 35. The speech synthesis processing unit 32 performs this second check of whether the marked word exactly matches a registered notation because the characters between the control characters "＃" may have been changed in the real-time input unit 21.

For example, the piece "合格" in the divided text data above is not an important word, so the speech synthesis processing unit 32 performs speech synthesis on it and creates synthesized speech. The piece "おめでとう", on the other hand, is an important word, so the speech synthesis processing unit 32 selects the real voice data corresponding to "おめでとう" (voice file "A006").

Processing the remaining pieces in the same way, the speech synthesis processing unit 32 performs speech synthesis on "だね。" and "に来る？", and selects the corresponding real voice data (voice files A003 and A004) for "おおさか" and "ｘｙｚ株式会社".

The speech synthesis processing unit 32 may also search the user database 35 for text in the input from the input auxiliary unit 20 that was not enclosed in the control character "＃" and, if a match is found, use the real voice data instead of performing speech synthesis.
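The per-segment decision can be sketched as a dictionary lookup. The voice file identifiers come from the description; everything else is illustrative: the names `USER_DB` and `resolve_segments` are hypothetical, falling back to synthesis when a marked word has no exact match is an assumption, and the `match_unmarked` flag models the optional lookup of unmarked text.

```python
# Notation -> voice file, using the identifiers given in this description.
USER_DB = {
    "おめでとう": "A006",
    "おおさか": "A003",
    "ｘｙｚ株式会社": "A004",
}


def resolve_segments(segments, user_db, match_unmarked=False):
    """Decide, per segment, between recorded voice data and synthesis.

    segments: (text, is_important) pairs from the text dividing step.
    Returns ("recorded", file_id) or ("synthesized", text) per segment.
    """
    plan = []
    for text, is_important in segments:
        eligible = is_important or match_unmarked
        # Exact-match check: the marked text may have been edited after selection.
        if eligible and text in user_db:
            plan.append(("recorded", user_db[text]))
        else:
            # Assumption: anything without an exact match is synthesized.
            plan.append(("synthesized", text))
    return plan


if __name__ == "__main__":
    segments = [("合格", False), ("おめでとう", True), ("だね。", False),
                ("おおさか", True), ("ｘｙｚ株式会社", True), ("に来る？", False)]
    for step in resolve_segments(segments, USER_DB):
        print(step)
```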

The speech combining unit 33 joins the synthesized speech data and the real voice data supplied from the speech synthesis processing unit 32 in an order that matches the content of the text data supplied from the input auxiliary unit 20.

For example, the speech combining unit 33 joins "合格" (synthesized speech), "おめでとう" (voice file A006), "だね。" (synthesized speech), "おおさか" (voice file A003), "ｘｙｚ株式会社" (voice file A004), and "に来る？" (synthesized speech), in that order, to generate a single piece of speech data.

The real voice data registered in the user database 35 desirably has several tens of milliseconds of silence added before and after the utterance section, so that the effects of degradation at the joins are avoided even when the speech combining unit 33 simply concatenates the audio. The speech combining unit 33 may also insert silence of a preset length at each full stop "。".
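Padding and simple concatenation can be sketched with audio modelled as plain lists of samples. This is only an illustration under stated assumptions: the 16 kHz sample rate and the 30 ms padding are hypothetical values (the description says only "several tens of ms"), and the function names are invented for the example.

```python
SAMPLE_RATE = 16_000  # assumption: samples per second


def silence(seconds: float) -> list[float]:
    """Return a run of zero-valued samples of the given duration."""
    return [0.0] * round(seconds * SAMPLE_RATE)


def pad_recorded(samples: list[float], pad_seconds: float = 0.03) -> list[float]:
    """Add silence before and after a recorded utterance (done at registration time)."""
    return silence(pad_seconds) + samples + silence(pad_seconds)


def concatenate(clips: list[list[float]]) -> list[float]:
    """Join clips end to end in text order, with no cross-fading."""
    joined: list[float] = []
    for clip in clips:
        joined.extend(clip)
    return joined


if __name__ == "__main__":
    synthesized = [0.1] * 800               # stand-in for synthesized speech
    recorded = pad_recorded([0.2] * 1600)   # stand-in for a registered voice file
    print(len(concatenate([synthesized, recorded])))
```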

(A-3) Effects of the First Embodiment
The first embodiment provides the following effects.

Because the speech synthesizer 10 includes the user database 35 and outputs real voice data for important words, it can produce natural-sounding speech even for text, such as dialect words and proper nouns, that conventional synthesized speech alone reproduces poorly or unnaturally. The quality of the generated speech is thereby improved.

In addition, because the input auxiliary unit 20 displays the important words that the user registered in the user database 35 as narrowing candidates, the registered important words are easy to grasp. If the narrowing candidates were not displayed, a user might forget which important words had been registered; with the display, important words remain easy to input even when the user who registered them differs from the user entering the text. In other words, the input auxiliary unit 20 assists the input of user-specific important words as a natural extension of the act of typing arbitrary characters.

Furthermore, the input auxiliary unit 20 lets the user explicitly input an important word containing the typed characters while entering text data. In the output speech, the quality of the real voice corresponding to the important word, and the emotion it carries, can be conveyed effectively, while arbitrary text can still be handled through synthesized speech.

(B) Second Embodiment
A second embodiment of the input auxiliary device, input auxiliary program, speech synthesizer, and speech synthesis program according to the present invention is described in detail below with reference to the drawings. The input auxiliary device of the second embodiment is the input auxiliary unit.

(B-1) Configuration of the Second Embodiment
FIG. 11 is a block diagram showing the overall configuration of the speech synthesizer 10A of the second embodiment. Parts identical or corresponding to those in FIG. 1 are given identical or corresponding reference numerals.

The differences between the speech synthesizer 10A of the second embodiment and the first embodiment are described below.

The speech synthesizer 10A has an input auxiliary unit 20A and the speech synthesis unit 30. The speech synthesis unit 30 is the same as in the first embodiment, so its detailed description is omitted.

The input auxiliary unit 20A has a batch input unit 22A, the complementing unit 23, the display switching unit 24, the display unit 25, and a selection position storage unit 26. The complementing unit 23, the display switching unit 24, and the display unit 25 are the same as in the first embodiment, so their detailed description is omitted.

Like the batch input unit 22, the batch input unit 22A supplies the speech synthesis unit 30 with the line of text data selected by the user's operation, at the timing of that operation. In addition, it stores information on the line most recently selected by the user (for example, its line number; hereinafter, "selected position information") in the selection position storage unit 26.

When the mode changes from the batch input mode to the real-time input mode and then back to the batch input mode, the batch input unit 22A reads the selected position information stored in the selection position storage unit 26 and controls the display unit 25 so that the display field F0 shows the corresponding line of text data in the selected state (for example, emphasized by highlighting it or enclosing it in a rectangle).

(B-2) Operation of the Second Embodiment
Next, the operation of the speech synthesizer 10A of the second embodiment, configured as described above, is explained.

Only the operations relating to the batch input unit 22A and the selection position storage unit 26, which are the differences from the first embodiment, are described below.

FIG. 12 is an explanatory diagram showing the transitions of the display screen of the display unit 25 when the operation mode of the input auxiliary unit 20A is switched.

In FIG. 12, with the input auxiliary unit 20A in the batch input mode, the line of text data most recently selected by the user is emphasized in the display field F0 by enclosing it in a rectangular cursor.

First, assume that the input auxiliary unit 20A is operating in the batch input mode and that the display screen of the display unit 25 is in the state of FIG. 12(a). In that state, the line "あっという間にすぎましたね" ("It went by in no time, didn't it") is displayed as the line most recently selected by the user. At this point, the batch input unit 22A has stored the position of that line in the selection position storage unit 26 as the selected position information.

When the operation mode of the input auxiliary unit 20A switches to the real-time input mode while the display screen is in the state of FIG. 12(a), the contents of the input text file disappear from the display field F0 and the screen transitions to the state of FIG. 12(b).

After that, when the operation mode of the input auxiliary unit 20A switches back to the batch input mode, the batch input unit 22A uses the selected position information stored in the selection position storage unit 26 to restore the display screen of the display unit 25 to its state immediately before the previous switch from the batch input mode to the real-time input mode, as shown in FIG. 12(c). The screen therefore shows the line "あっという間にすぎましたね" as selected, rather than the first line.
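The remembered selection can be sketched as a stored line index that survives a round trip through the real-time input mode. The class and method names below are hypothetical, the example lines other than "あっという間にすぎましたね" are invented (the contents of FIG. 8 are not reproduced in this description), and defaulting to the first line before any selection is an assumption.

```python
class BatchInput:
    """Hypothetical model of batch input unit 22A with selection position storage unit 26."""

    def __init__(self, lines: list[str]) -> None:
        self.lines = list(lines)
        self.stored_position = 0   # assumption: first line until the user selects one
        self.highlighted = None    # None while display field F0 shows no file contents

    def select(self, index: int) -> str:
        """Select a line, remember its position, and return its text for synthesis."""
        if not 0 <= index < len(self.lines):
            raise IndexError("no such line")
        self.stored_position = index
        self.highlighted = index
        return self.lines[index]

    def leave_batch_mode(self) -> None:
        """Switch to real-time input mode: the file contents disappear from F0."""
        self.highlighted = None

    def enter_batch_mode(self) -> int:
        """Switch back to batch input mode and restore the remembered selection."""
        self.highlighted = self.stored_position
        return self.highlighted


if __name__ == "__main__":
    batch = BatchInput(["おはようございます", "あっという間にすぎましたね", "またね"])
    batch.select(1)
    batch.leave_batch_mode()
    print(batch.lines[batch.enter_batch_mode()])
```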

(B-3) Effects of the Second Embodiment
The second embodiment provides the following effects.

The input auxiliary unit 20A includes the selection position storage unit 26, which stores the selected position information, and the batch input unit 22A controls what is shown on the display unit 25 based on the stored selected position information. As a result, when the mode switches to the real-time input mode and then returns to the batch input mode, the user is spared the trouble of finding the line that was selected before the switch, even when the text file is large, and operation becomes easier.

(C) Third Embodiment
A third embodiment of the input auxiliary device, input auxiliary program, speech synthesizer, and speech synthesis program according to the present invention is described in detail below with reference to the drawings. The input auxiliary device of the third embodiment is the input auxiliary unit.

The overall configuration of the speech synthesizer 10B of the third embodiment can also be shown using FIG. 1. In FIG. 1, the reference numerals in parentheses are those used only in the third embodiment.

The differences between the speech synthesizer 10B of the third embodiment and the first embodiment are described below.

The speech synthesizer 10B has the input auxiliary unit 20 and a speech synthesis unit 30B. The input auxiliary unit 20 is the same as in the first embodiment, so its detailed description is omitted.

The speech synthesis unit 30B differs from the speech synthesis unit 30 of the first embodiment only in that the speech combining unit 33 is replaced with a speech combining unit 33B, so description of the other components is omitted.

The speech synthesis unit 30B supports control characters (hereinafter, "reading control characters") that define pauses and the like for reading aloud the text data supplied from the input auxiliary unit 20 to the speech synthesis unit 30B, and the speech combining unit 33B performs processing according to each reading control character.

For example, in the speech synthesizer 10B, text data may use, in addition to the ordinary comma "、", the new commas "、、" and "、、、", which denote pauses of different lengths. The pause lengths are set in the speech combining unit 33B as 0.5 seconds for "、", 1.0 second for "、、", and 3.0 seconds for "、、、". The reading control characters are not limited to these, and the pause length for each reading control character can be adjusted by the user.

FIG. 13 illustrates the processing of the speech synthesis unit 30B when the text data supplied from the input auxiliary unit 20 to the speech combining unit 33B is "合格＃、、おめでとう＃だね。＃おおさか＃、＃ｘｙｚ株式会社＃、、、に来る？" (roughly, "You passed, congratulations. Will you come to Osaka, xyz Corporation?").

In FIG. 13, the processing of the text dividing unit 31 and the speech synthesis processing unit 32 is the same as in the first embodiment, so its description is omitted.

When combining the speech data synthesized by the speech synthesis processing unit 32 with the real speech data, the speech combining unit 33B inserts, at the position of each reading control character in the text data, a pause (silent interval) whose length depends on that reading control character.

Here, as shown in FIG. 13, when combining the speech, the speech combining unit 33B inserts 1.0 second of silence, corresponding to the reading control character "、、", between "合格" ("passed") and "おめでとう" ("congratulations"). It likewise inserts 0.5 seconds of silence between "おおさか" ("Osaka") and "ｘｙｚ株式会社" ("xyz Corporation"), and 3.0 seconds of silence between "ｘｙｚ株式会社" and "に来る？" ("come to?"). The speech combining unit 33B may also insert silence of a preset length for the period "。".
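The combining step can be sketched as follows, under stated assumptions: speech is represented as a plain list of integer samples, the sampling rate is 16000 Hz, and the names SAMPLE_RATE, silence and join_segments are hypothetical. The sketch only shows how a pause length in seconds becomes a run of silent samples between speech segments; it is not the disclosed implementation of the speech combining unit 33B.

```python
SAMPLE_RATE = 16000  # assumed sampling rate in Hz (not specified in the text)


def silence(seconds, sample_rate=SAMPLE_RATE):
    """Return a silent interval of the given length as zero-valued samples."""
    return [0] * round(seconds * sample_rate)


def join_segments(items, sample_rate=SAMPLE_RATE):
    """Concatenate speech segments, inserting silence where a pause is requested.

    items is an ordered list of ("audio", samples) or ("pause", seconds).
    Synthesized speech and real speech are both passed as "audio" items.
    """
    out = []
    for kind, value in items:
        if kind == "audio":
            out.extend(value)
        elif kind == "pause":
            out.extend(silence(value, sample_rate))
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return out
```

For the example of FIG. 13, the items for "おおさか" and "ｘｙｚ株式会社" would be two "audio" items separated by ("pause", 0.5), which at the assumed rate yields 8000 silent samples between them.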

Because the length of pauses is important in speech output, applying reading control characters in the speech synthesis unit 30B as described above lets the user set these pauses freely and, together with the combination of synthesized speech and real speech, conveys the intent of an utterance effectively.

(D) Other Embodiments
The present invention is not limited to the embodiments described above; modified embodiments such as the following are also possible.

(D-1) In each of the above embodiments, the speech synthesizer uses the control character # to mark important words in the text data (or text file). Conversely, a control character that explicitly marks a range that is not an important word (hereinafter, a "non-important word control character") may be applied.

For example, if % is used as the non-important word control character, the speech synthesis processing unit 32 does not treat a range enclosed in % as an important word and generates synthesized speech for it using the data in the synthesized speech DB 34.

This is useful, for example, when the speech synthesis processing unit 32 also refers to the contents of the user database 35 for ranges not enclosed in the control character #, extracts important words from them, and applies real speech data. In that case, explicitly enclosing a range in the non-important word control character % prevents real speech data from being applied to it. For example, if the user database 35 contains real speech data pronounced in a dialect and the user does not want dialect speech to be output, the non-important word control character may be used.
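The handling of the two kinds of control characters can be sketched as below. This is an illustration under assumptions, not the disclosed processing of the speech synthesis processing unit 32: the function name plan_sources, the labels "real" and "synth", and the use of a plain set of registered words in place of the user database 35 are all hypothetical, and the markers are assumed to be the full-width ＃ and ％ that appear in the Japanese text.

```python
import re

# Assumed markers: ＃...＃ encloses an important word,
# ％...％ encloses a range that must not be treated as one.
MARKED = re.compile(r"＃(.*?)＃|％(.*?)％")


def plan_sources(text, registered, auto_lookup=True):
    """Return a list of (source, chunk); source is "real" or "synth".

    registered stands in for the words held in the user database.
    With auto_lookup=True, unmarked ranges are also searched for them.
    """
    plan = []
    # Longest registered words first, so longer entries win over their prefixes.
    words = sorted(registered, key=len, reverse=True)
    lookup = re.compile("(" + "|".join(map(re.escape, words)) + ")") if words else None

    def add_unmarked(chunk):
        if not chunk:
            return
        if auto_lookup and lookup is not None:
            # split() keeps the matched words because the pattern has a group.
            for part in lookup.split(chunk):
                if part:
                    plan.append(("real" if part in registered else "synth", part))
        else:
            plan.append(("synth", chunk))

    pos = 0
    for m in MARKED.finditer(text):
        add_unmarked(text[pos:m.start()])
        if m.group(1) is not None:
            plan.append(("real", m.group(1)))   # enclosed in ＃: use real speech
        else:
            plan.append(("synth", m.group(2)))  # enclosed in ％: force synthesis
        pos = m.end()
    add_unmarked(text[pos:])
    return plan
```

With "おおさか" registered, the unmarked text "おおさかに行く" would be planned as real speech followed by synthesized speech, whereas "％おおさか％に行く" would be planned as synthesized speech throughout, which is the effect the non-important word control character is meant to have.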

Also, for example, when the user performs an operation to cancel the important-word treatment of a part that was once input as an important word, the control character # may be replaced with the non-important word control character %. For example, if the user has selected an important word and input "＃おおさか＃", and then performs an operation to cancel its treatment as an important word (for example, a predetermined key operation), the display in the input field FI may be changed from "＃おおさか＃" to "％おおさか％". In this way, the user may be allowed to make input using non-important word control characters.

The display switching unit 24 may also provide an operation mode that switches whether the non-important word control character % is displayed, and may switch that mode in response to a user operation.

This allows the user to freely customize the combination of synthesized speech and real speech data, and to keep track of the history of operations.
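The cancel operation and the display mode described in (D-1) can be sketched as two small string operations. The function names cancel_important and render are hypothetical, the markers are assumed to be the full-width ＃ and ％ of the Japanese text, and the contents of the input field FI are modeled as a plain string.

```python
def cancel_important(field_text, word):
    """Replace the first ＃word＃ in the field with ％word％.

    Models the user cancelling the important-word treatment of one word.
    The text is returned unchanged if ＃word＃ does not occur.
    """
    return field_text.replace("＃" + word + "＃", "％" + word + "％", 1)


def render(field_text, show_non_important=True):
    """Return the text to display, optionally hiding the ％ markers."""
    if show_non_important:
        return field_text
    return field_text.replace("％", "")
```

Keeping the ％ markers in the stored text and hiding them only at display time is one way to let the user see, or ignore, which words were once selected as important words.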

(D-2) In each of the above embodiments, the real-time input unit of the input auxiliary unit has been described as generating the text data to be supplied to the speech synthesis unit, but it may instead be used as an editing tool that generates, line by line, the text data to be input to the batch input unit.

(D-3) In each of the above embodiments, the display unit provides a display field F0 as the field for outputting information to the user, and this field is shared between the real-time input mode and the batch input mode. A separate display field may instead be provided for each operation mode.

However, providing a separate display field for each mode enlarges the area that the display unit occupies on the display of the speech synthesizer. Sharing the display field between operation modes, as in the above embodiments, reduces that area. Sharing the field also means that the user does not have to switch the field being operated on and checked each time the operation mode changes, which makes operation easier.

(D-4) In each of the above embodiments, the input auxiliary unit has been shown as including both the real-time input unit (including the complement unit) and the batch input unit, but it may include only one of them. In that case, switching of the operation mode is unnecessary, so the display switching unit may be omitted.

(D-5) In each of the above embodiments, the text data handled by the speech synthesizer of the present invention has been described as Japanese, but the language is not limited. The invention can of course also be applied to other languages such as English, Chinese, French, and German.

DESCRIPTION OF SYMBOLS 10…speech synthesizer, 20…input auxiliary unit, 21…real-time input unit, 22…batch input unit, 23…complement unit, 24…display switching unit, 25…display unit, 30…speech synthesis unit, 31…text dividing unit, 32…speech synthesis processing unit, 33…speech combining unit, 34…synthesized speech DB, 35…user database.

Claims

In an input auxiliary device that generates text data to be supplied to a speech synthesizer that generates speech that reads out the content of text data in response to a user operation,
the device has an input auxiliary unit that generates, in response to the operation of the user, text data written so as to distinguish target words registered in a second database, out of a first database used for speech synthesis and the second database in which speech data of predetermined target words is registered, from ranges other than the target words,
The input auxiliary unit has:
Target word holding means for holding information on the target words registered in the second database;
Real-time input means for sequentially receiving character input from the user;
A display unit for displaying information provided from the input auxiliary unit to the user;
An extraction display means for extracting the target word related to the character being input to the real-time input means by the user from the information held by the target word holding means, and displaying it on the display unit;
Selection accepting means for causing the user to select one of the target words displayed by the extraction display means, and notifying the real-time input means that the selected target word is input from the user;
Text data generating means for generating, for the character string input to the real-time input means, text data whose content distinguishes the target word selected by the user through the selection accepting means from the other ranges. The input auxiliary device is characterized by having the above means.

The input auxiliary unit further includes text file input means for receiving input of a text file in which a plurality of lines of text data is stored,
The input auxiliary unit operates in one of two operation modes: a real-time input mode, in which character input from the user is received by the real-time input means, and a batch input mode, in which operations on a text file received by the text file input means are received from the user,
The input auxiliary unit has:
An operation mode switching means for switching and applying one of the operation modes to the input auxiliary unit in accordance with the user's operation;
A text file display means for displaying the contents of the text file input to the text file input means on the display section line by line when the input auxiliary section operates in the batch input mode;
Text data selection means for allowing the user to select text data of any line of the contents of the text file displayed by the text file display means when the input auxiliary unit operates in the batch input mode;
The input auxiliary unit further includes text data supply means for supplying the text data of the line selected by the text data selection means to the speech synthesizer when operating in the batch input mode,
The extraction display means, the selection accepting means, and the text data generating means function when the input auxiliary unit operates in the real-time input mode,
The input auxiliary device according to claim 1, wherein the text data generating means supplies the generated text data to the speech synthesizer.

The text data selection means further comprises position information storage means for storing position information of the position of the text data last selected by the user's operation,
When the input auxiliary unit is switched from the real-time input mode to the batch input mode, the text file display means controls the display unit to display the content with the text data indicated by the position information stored in the position information storage means in the selected state. The input auxiliary device according to claim 2, characterized by the above.

A computer mounted on an input auxiliary device that generates text data to be supplied to a speech synthesizer that generates speech that reads out the content of text data in response to a user operation,
is caused to function as an input auxiliary unit that generates text data written so as to distinguish target words registered in a second database, out of a first database used for speech synthesis and the second database in which speech data of predetermined target words is registered, from ranges other than the target words,
The input auxiliary unit has:
Target word holding means for holding information on the target words registered in the second database;
Real-time input means for sequentially receiving character input from the user;
A display unit for displaying information provided from the input auxiliary unit to the user;
An extraction display means for extracting the target word related to the character being input to the real-time input means by the user from the information held by the target word holding means, and displaying it on the display unit;
Selection accepting means for causing the user to select one of the target words displayed by the extraction display means, and notifying the real-time input means that the selected target word is input from the user;
Text data generating means for generating, for the character string input to the real-time input means, text data whose content distinguishes the target word selected by the user through the selection accepting means from the other ranges. The input auxiliary program is characterized by the above.

A first database used for speech synthesis processing;
A second database in which audio data of a predetermined target word is registered;
Speech generating means that, for text data written so as to distinguish the target words registered in the second database from ranges other than the target words, generates speech reading out the text data by using the speech data registered in the second database for ranges distinguished as target words and using the data of the first database for ranges not distinguished as target words;
An input auxiliary unit that generates, in accordance with a user operation, text data written so as to distinguish the target words registered in the second database, out of the first database and the second database, from ranges other than the target words,
The input auxiliary unit has:
Target word holding means for holding information on the target words registered in the second database;
Real-time input means for sequentially receiving character input from the user;
A display unit for displaying information provided from the input auxiliary unit to the user;
An extraction display means for extracting the target word related to the character being input to the real-time input means by the user from the information held by the target word holding means, and displaying it on the display unit;
Selection accepting means for causing the user to select one of the target words displayed by the extraction display means, and notifying the real-time input means that the selected target word is input from the user;
Text data generating means for generating, for the character string input to the real-time input means, text data whose content distinguishes the target word selected by the user through the selection accepting means from the other ranges. The speech synthesizer is characterized by having the above.

The speech synthesizer according to claim 5, wherein, when the text data contains a predetermined control character, the speech generating means inserts silence of a length corresponding to the control character at the position of the control character.

A computer installed in a speech synthesizer that generates speech reading out the content of text data,
A first database used for speech synthesis processing;
A second database in which audio data of a predetermined target word is registered;
Speech generating means that, for text data written so as to distinguish the target words registered in the second database from ranges other than the target words, generates speech reading out the text data by using the speech data registered in the second database for ranges distinguished as target words and using the data of the first database for ranges not distinguished as target words,
An input auxiliary unit that generates, in accordance with a user operation, text data written so as to distinguish the target words registered in the second database, out of the first database and the second database, from ranges other than the target words; the computer being caused to function as each of the above,
The input auxiliary unit has:
Target word holding means for holding information on the target words registered in the second database;
Real-time input means for sequentially receiving character input from the user;
A display unit for displaying information provided from the input auxiliary unit to the user;
An extraction display means for extracting the target word related to the character being input to the real-time input means by the user from the information held by the target word holding means, and displaying it on the display unit;
Selection accepting means for causing the user to select one of the target words displayed by the extraction display means, and notifying the real-time input means that the selected target word is input from the user;
Text data generating means for generating, for the character string input to the real-time input means, text data whose content distinguishes the target word selected by the user through the selection accepting means from the other ranges. The speech synthesis program is characterized by the above.