JP2002229580A

JP2002229580A - Device and method for contents composition, and program

Info

Publication number: JP2002229580A
Application number: JP2001030889A
Authority: JP
Inventors: Yasushi Sato; 寧佐藤
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2001-02-07
Filing date: 2001-02-07
Publication date: 2002-08-16

Abstract

PROBLEM TO BE SOLVED: To provide a contents composing device, etc., which can easily secure a sufficient amount of data for contents composition and adequately protect the rights to those data. SOLUTION: A language processing part 1 performs word analysis and dependency analysis on a sentence by referring to the contents of a word dictionary storage part 2, and generates a phonetic character string showing its reading. An acoustic processing part 3 looks up, in a phoneme dictionary storage part 4, the coded phoneme spectrum data showing the phonemes included in the phonetic character string and has them supplied to decoding parts 5-1 to 5-n (n: integer). The decoding parts 5-1 to 5-n decode the coded phoneme spectrum data band by band with decoding keys and supply the obtained phoneme spectrum data to the acoustic processing part 3. The acoustic processing part 3 determines the durations of the phonemes and the fundamental frequency pattern to determine the waveform of the voice for the whole sentence. The voice having the determined waveform is output by a voice output part 6. Here, the spectrum components of phonemes in a band whose decoding key is unavailable are regarded as absent.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a content synthesizing apparatus and a content synthesizing method for synthesizing content, and more particularly to a content synthesizing apparatus and a content synthesizing method in which the quality of the synthesized content is variable.

[0002]

[Prior Art] Speech synthesis techniques that convert text data and the like into speech have come into use in recent years. In speech synthesis, for example, the words contained in a sentence of mixed kanji and kana represented by the text data are identified, and then the phrases and the dependency relations among the phrases are identified. The reading of the sentence is then determined from the identified words, phrases, and dependency relations. Next, based on a phonetic character string representing that reading, the waveforms, durations, and pitch (fundamental frequency) patterns of the phonemes making up the speech are determined; from these results the waveform of the speech representing the whole sentence is determined, and speech having that waveform is output.

[0003]

[Problems to Be Solved by the Invention] To determine the waveform of a phoneme, a phoneme dictionary that accumulates phoneme waveform data must be searched. However, to synthesize natural-sounding speech, or to be able to synthesize the voices of multiple speakers, the phoneme dictionary must accumulate an enormous amount of data, and it has been difficult to secure a storage device with enough capacity to hold a phoneme dictionary containing a sufficient amount of data. Moreover, such a storage device is large and heavy, and as a result the speech synthesizing apparatus as a whole has also been large and heavy.

[0004] In addition, legal rights such as copyright arise in the synthesized speech and in the phoneme waveforms used as material for synthesis, and it is desirable that such rights be appropriately protected. Specifically, if, for example, these rights are protected by an all-or-nothing approach in which phoneme waveforms or synthesized speech are either provided in their entirety or not provided at all, problems arise, such as difficulty in providing an evaluation version of the synthesized speech or a trial version of the phoneme waveforms, and the rights cannot be exercised appropriately. A technique that avoids these problems and allows the rights to be exercised appropriately is therefore desired.

[0005] The present invention has been made in view of the above circumstances, and one object is to provide a content synthesizing apparatus and a content synthesizing method that can easily secure a sufficient amount of the data used for synthesizing content. Another object is to provide a content synthesizing apparatus and a content synthesizing method that can appropriately protect the rights arising in synthesized content and in the data used for synthesizing it.

[0006]

[Means for Solving the Problems] To achieve the above objects, a content synthesizing apparatus according to a first aspect of the present invention comprises: storage means for storing a plurality of items of material data, which are data for synthesizing content and at least one of which is encrypted; decryption means which is detachably connected to the storage means, acquires the material data from the storage means, acquires a decryption key associated with at least one of the items of material data when that key is supplied, and uses the key to decrypt those items of the acquired material data that are associated with it; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.

[0007] With such a content synthesizing apparatus, a sufficient amount of material data can easily be secured by attaching and detaching various storage means that store the material data used for synthesizing content. In addition, by selectively encrypting in advance those items of material data that require protection, the rights arising in the material data and in content synthesized from it are appropriately protected.

[0008] The storage means may include means for storing the decryption key in encrypted form and supplying the encrypted decryption key to the decryption means. The decryption means may then, when a protection decryption key for decrypting the encrypted decryption key is supplied from outside, acquire the protection decryption key, acquire the encrypted decryption key supplied from the storage means, and decrypt the encrypted decryption key using the protection decryption key. With this configuration, the rights arising in the material data and in content synthesized from it are appropriately protected without the decryption key itself having to be obtained from outside.
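
The key handling in paragraph [0008] can be sketched as follows. This is only an illustration under stated assumptions: the patent names no cipher for protecting the decryption key, so `toy_cipher` (an XOR keystream derived with SHA-256) is a hypothetical stand-in that is not secure, and the names `unwrap_band_key`, `band_key`, and `protection_key` are invented for the example.

```python
import hashlib

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher (NOT secure): XOR with a SHA-256 keystream.
    Applying it twice with the same key restores the original data."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def unwrap_band_key(encrypted_key: bytes, protection_key: bytes) -> bytes:
    """Decryption means: recover the decryption key stored in encrypted form."""
    return toy_cipher(protection_key, encrypted_key)

# The storage means holds the decryption key only in encrypted form.
band_key = b"band-3-secret"                  # assumed example key
protection_key = b"device-protection-key"    # supplied from outside
stored_encrypted_key = toy_cipher(protection_key, band_key)

recovered = unwrap_band_key(stored_encrypted_key, protection_key)
```

Only the protection decryption key has to reach the device from outside; the decryption key travels with the material data on the detachable storage.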

[0009] A content synthesizing apparatus according to a second aspect of the present invention comprises: decryption means which acquires from outside a plurality of items of material data, which are data for synthesizing content and at least one of which is encrypted, acquires a decryption key associated with at least one of the items of material data when that key is supplied, and uses the key to decrypt those items of the acquired material data that are associated with it; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.

[0010] With such a content synthesizing apparatus, a sufficient amount of material data can easily be secured by acquiring from outside, as needed, the material data used for synthesizing content. In addition, by selectively encrypting in advance those items of material data that require protection, the rights arising in the material data and in content synthesized from it are appropriately protected.

[0011] Each item of material data may, for example, represent the spectrum of a constituent part of the content for one frequency band. In that case, the content synthesizing means may, for example, generate data representing content that has the spectrum represented by the decrypted material data and substantially lacks the spectrum represented by the material data that was not decrypted.
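
As a minimal sketch of the behaviour in paragraph [0011], assume each band is a list of spectral intensities and a band that could not be decrypted is marked `None`. The function name `merge_bands` and the sample values are hypothetical.

```python
def merge_bands(band_spectra, band_width):
    """Build the full spectrum from per-band data.

    band_spectra[k] is the list of intensities for band k, or None when
    that band was not decrypted; such bands are filled with zeros so the
    synthesized content lacks them.
    """
    full_spectrum = []
    for band in band_spectra:
        if band is None:
            full_spectrum.extend([0.0] * band_width)
        else:
            full_spectrum.extend(band)
    return full_spectrum

# Two low bands decrypted, two high bands unavailable.
bands = [[0.9, 0.7], [0.5, 0.4], None, None]
spectrum = merge_bands(bands, band_width=2)
```

The result keeps the low-frequency components and has zero energy in the bands whose material data stayed encrypted.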

[0012] The content synthesizing apparatus may include decryption key supply means which acquires from outside charge data indicating an amount charged to a user of the content, determines the material data to be decrypted based on the amount indicated by the acquired charge data, and supplies the decryption key associated with the determined material data to the decryption means. In that case, the decryption means need only acquire the decryption key supplied by the decryption key supply means. With this configuration, content whose quality corresponds to the amount charged is synthesized.
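
One way the decryption key supply means of paragraph [0012] might choose keys is sketched below. The price table, the amounts, and the names `bands_for_charge` and `supply_keys` are assumptions made for illustration; the patent does not specify how the charged amount maps to material data.

```python
def bands_for_charge(charge, price_tiers):
    """Return the band numbers (1-based) unlocked by the charged amount.

    price_tiers is a list of (minimum_charge, highest_band) pairs in
    ascending order of minimum_charge.
    """
    highest = 0
    for minimum_charge, highest_band in price_tiers:
        if charge >= minimum_charge:
            highest = highest_band
    return list(range(1, highest + 1))

def supply_keys(charge, price_tiers, key_store):
    """Decryption key supply means: hand over only the keys that were paid for."""
    return {k: key_store[k] for k in bands_for_charge(charge, price_tiers)}

# Hypothetical tiers: free users get bands 1-3, higher charges unlock more.
PRICE_TIERS = [(0, 3), (500, 6), (1000, 10)]
KEY_STORE = {k: f"key-{k}".encode() for k in range(1, 11)}

keys_for_mid_tier = supply_keys(700, PRICE_TIERS, KEY_STORE)
```

A larger charge unlocks more bands, so the synthesized content has a wider spectrum and higher quality.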

[0013] A content synthesizing method according to a third aspect of the present invention comprises: storing, in a storage device, a plurality of items of material data, which are data for synthesizing content and at least one of which is encrypted; acquiring the material data from the storage device via an access device detachably connected to the storage device; acquiring a decryption key associated with at least one of the items of material data when that key is supplied; decrypting, using the key, those items of the acquired material data that are associated with it; and synthesizing the content based on the decrypted material data and/or the unencrypted material data.

[0014] With such a content synthesizing method, a sufficient amount of material data can easily be secured by attaching various storage devices that store the material data used for synthesizing content to the access device and detaching them from it. In addition, by selectively encrypting in advance those items of material data that require protection, the rights arising in the material data and in content synthesized from it are appropriately protected.

[0015] A program according to a fourth aspect of the present invention causes a computer to function as: decryption means which acquires from outside a plurality of items of material data, which are data for synthesizing content and at least one of which is encrypted, acquires a decryption key associated with at least one of the items of material data when that key is supplied, and uses the key to decrypt those items of the acquired material data that are associated with it; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.

[0016] With a computer that executes such a program, a sufficient amount of material data can easily be secured by attaching and detaching various storage means that store the material data used for synthesizing content. In addition, by selectively encrypting in advance those items of material data that require protection, the rights arising in the material data and in content synthesized from it are appropriately protected.

[0017] A program according to a fifth aspect of the present invention causes a computer that is detachably connected to a storage device storing a plurality of items of material data, which are data for synthesizing content and at least one of which is encrypted, to function as: decryption means which acquires the material data from the storage device, acquires a decryption key associated with at least one of the items of material data when that key is supplied, and uses the key to decrypt those items of the acquired material data that are associated with it; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.

[0018] With a computer that executes such a program, a sufficient amount of material data can easily be secured by acquiring from outside, as needed, the material data used for synthesizing content. In addition, by selectively encrypting in advance those items of material data that require protection, the rights arising in the material data and in content synthesized from it are appropriately protected.

[0019]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. (First Embodiment) FIG. 1 shows the configuration of a speech synthesis system according to a first embodiment of the present invention. As shown, the speech synthesis system comprises a language processing unit 1, a word dictionary storage unit 2, an acoustic processing unit 3, a phoneme dictionary storage unit 4, decryption units 5-1 to 5-n (n is a positive integer), and an audio output unit 6.

[0020] The language processing unit 1, the word dictionary storage unit 2, the acoustic processing unit 3, the decryption units 5-1 to 5-n, and the audio output unit 6 are formed as a single body, and the phoneme dictionary storage unit 4 is detachably connected to the acoustic processing unit 3 and the decryption units 5-1 to 5-n via an interface or the like (not shown).

[0021] The language processing unit 1 comprises a processing section, consisting of a CPU (Central Processing Unit) or the like, and a document input section, consisting of a keyboard, a recording medium driver (such as a floppy disk drive or MO drive) that reads data recorded on a recording medium (such as a floppy (registered trademark) disk or MO (Magneto-Optical disk)), or the like. The processing section of the language processing unit 1 is connected to the word dictionary storage unit 2.

[0022] The document input section of the language processing unit 1 receives document data representing a sentence of mixed kanji and kana and supplies it to the processing section of the language processing unit 1. The processing section performs word analysis on the sentence represented by the document data supplied from the document input section.

[0023] As a concrete word analysis procedure, the processing section of the language processing unit 1, for example, accesses the word dictionary storage unit 2 and searches the word data stored there (described later) to extract candidates for the words that the sentence represented by the document data may contain. It then estimates the correct words from among the extracted candidates based on, for example, how consistently the words connect with one another and how frequently they occur.
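
A simplified sketch of this kind of dictionary-driven word analysis follows. It is not the patent's procedure: it scores candidates by occurrence frequency only and ignores connection consistency, and the dictionary `WORD_FREQ`, its counts, and the function `segment` are hypothetical.

```python
import math

def segment(text, word_freq):
    """Split text into dictionary words, preferring frequent words.

    Dynamic programming over end positions: best[i] holds the highest-scoring
    (score, words) segmentation of text[:i], where the score is the sum of
    log relative frequencies. Returns None if no segmentation exists.
    """
    total = sum(word_freq.values())
    max_len = max(len(word) for word in word_freq)
    best = [(0.0, [])] + [None] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_freq and best[start] is not None:
                score = best[start][0] + math.log(word_freq[word] / total)
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[len(text)][1] if best[len(text)] else None

# Hypothetical word dictionary with occurrence counts.
WORD_FREQ = {"きょう": 50, "は": 100, "いい": 40, "てんき": 30,
             "きょ": 1, "う": 5, "い": 10}

words = segment("きょうはいいてんき", WORD_FREQ)
```

Every dictionary entry that matches a substring is a candidate, and the frequency-based score selects among competing candidates such as "いい" versus "い" followed by "い".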

[0024] After the word analysis, the processing section of the language processing unit 1 further performs dependency analysis on the sentence represented by the document data supplied from the document input section. As a concrete dependency analysis procedure, the processing section, for example, estimates the phrases contained in the sentence from the words estimated by the word analysis. It then estimates the dependency relations among the estimated phrases, and the accent positions within those phrases, based on, for example, the grammatical consistency of the estimated phrases.

[0025] After the word analysis and dependency analysis, the processing section of the language processing unit 1 generates, from their results, a phonetic character string that represents the reading of the sentence written in predetermined phonetic characters (for example, hiragana or katakana), and supplies the generated phonetic character string to the acoustic processing unit 3.

[0026] The word dictionary storage unit 2 includes a storage device consisting of a ROM (Read Only Memory) or the like, and stores a word dictionary. The word dictionary holds word data representing words written in mixed kanji and kana. In response to access by the language processing unit 1, the word dictionary storage unit 2 reads the word data held in the word dictionary and supplies it to the language processing unit 1.

[0027] The acoustic processing unit 3 consists of a CPU, a DSP (Digital Signal Processor), or the like. When supplied with a phonetic character string from the language processing unit 1, the acoustic processing unit 3 first extracts the phonemes contained in the speech that the string represents. It then determines how long each extracted phoneme lasts, based on which phonemes are adjacent to one another and on the speaking rate of the speech that the string represents (that is, it performs phoneme duration generation). The speaking rate may be, for example, a predetermined value, or a value indicated by data supplied to the acoustic processing unit 3 from outside.

[0028] Next, the acoustic processing unit 3 determines the pattern of change over time of the fundamental frequency of the speech represented by the phonetic character string (the fundamental frequency pattern), based on the phrases contained in the string supplied from the language processing unit 1 and on the accent positions of the speech it represents (fundamental frequency pattern generation).

[0029] Next, the acoustic processing unit 3 accesses the phoneme dictionary storage unit 4 and instructs it to supply the encrypted phoneme spectrum data (described later) representing the extracted phonemes to the decryption units 5-1 to 5-n. When the decryption units 5-1 to 5-n then supply phoneme spectrum data (described later), the acoustic processing unit 3 generates waveform data representing a waveform having the spectrum that the phoneme spectrum data represents, using a technique such as a polyphase filter, FFT (Fast Fourier Transform), DCT (Discrete Cosine Transform), LOT (Lapped Orthogonal Transform), MLT (Modulated Lapped Transform), or ELT (Extended Lapped Transform).
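
The spectrum-to-waveform step can be illustrated with a direct sum of cosines, used here as a simple stand-in for the FFT-family transforms the text lists. The sketch assumes that the spectrum data gives one amplitude per frequency bin and that all phases are zero; the function name `spectrum_to_waveform` is hypothetical.

```python
import math

def spectrum_to_waveform(amplitudes, num_samples):
    """Return num_samples time-domain samples for the given spectrum.

    amplitudes[k] is the assumed amplitude of the component that completes
    k cycles over the frame; phases are assumed to be zero.
    """
    return [
        sum(a * math.cos(2 * math.pi * k * t / num_samples)
            for k, a in enumerate(amplitudes))
        for t in range(num_samples)
    ]

# A single component at bin 1 over a 4-sample frame yields one cosine cycle.
waveform = spectrum_to_waveform([0.0, 1.0], num_samples=4)
```

A bin whose amplitude is zero, for instance one in a band that was never decrypted, adds nothing to the waveform.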

[0030] Next, the acoustic processing unit 3 concatenates the phonemes represented by the waveform data it has generated, in the order in which they appear in the phonetic character string supplied to it, and deforms the resulting waveform so that its fundamental frequency component matches the pattern determined by the fundamental frequency pattern generation. It thereby generates a signal representing the waveform of the whole speech that the phonetic character string represents, and supplies the signal to the audio output unit 6. The signal supplied to the audio output unit 6 may be, for example, a signal representing the waveform of the whole speech in PCM (Pulse Code Modulation) format.
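
The concatenation and PCM output can be sketched as below. The fundamental frequency deformation is deliberately left out, 16-bit little-endian PCM is an assumed choice of format, and the names `concatenate`, `to_pcm16`, and the sample waveforms are hypothetical.

```python
import struct

def concatenate(phoneme_order, waveforms):
    """Join per-phoneme sample lists in the order the phonemes appear."""
    samples = []
    for phoneme in phoneme_order:
        samples.extend(waveforms[phoneme])
    return samples

def to_pcm16(samples):
    """Clip samples to [-1.0, 1.0] and pack them as 16-bit little-endian PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    values = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(values), *values)

# Hypothetical two-sample waveforms for two phonemes.
WAVEFORMS = {"k": [0.0, 0.25], "a": [1.0, -1.0]}

samples = concatenate(["k", "a", "k"], WAVEFORMS)
pcm = to_pcm16(samples)
```

Each sample occupies two bytes, so the six samples above produce twelve bytes of PCM data.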

[0031] The phoneme dictionary storage unit 4 consists of a nonvolatile storage device such as a flash memory or hard disk device, and stores a phoneme dictionary. The phoneme dictionary holds encrypted phoneme spectrum data, which is obtained by encrypting data representing the spectrum of the waveform of a phoneme (phoneme spectrum data). In response to access by the acoustic processing unit 3, the phoneme dictionary storage unit 4 reads, from the encrypted phoneme spectrum data held in the phoneme dictionary, the data representing the phoneme designated by the acoustic processing unit 3 (specifically, the data tagged with the identification data, described later, that identifies that phoneme) and supplies it to the decryption units 5-1 to 5-n.

[0032] The phoneme spectrum data representing each phoneme specifically consists of n items of data. Each of the n items contains a bit string, with a predetermined number of bits per spectral component, representing the intensity of each spectral component of the phoneme that falls within one of n mutually distinct, equal-width bands obtained by dividing the spectral distribution of the phoneme into n equal parts. The encrypted phoneme spectrum data contains the n items of data obtained by encrypting these n items individually.
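
The band-wise layout and individual encryption can be sketched as follows, assuming 8 bits per spectral component. The cipher is a hypothetical, insecure stand-in (an XOR keystream from SHA-256) rather than any scheme named in the patent, and `split_into_bands`, `encrypt_phoneme_spectrum`, and the sample data are invented for the example.

```python
import hashlib

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher (NOT secure): XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def split_into_bands(components, n):
    """Divide the spectral components into n equal-width bands, low to high."""
    if len(components) % n != 0:
        raise ValueError("component count must be a multiple of n")
    width = len(components) // n
    return [components[i * width:(i + 1) * width] for i in range(n)]

def encrypt_phoneme_spectrum(components, band_keys):
    """Encrypt each band individually, one key per band."""
    bands = split_into_bands(components, len(band_keys))
    return [toy_cipher(key, bytes(band)) for band, key in zip(bands, band_keys)]

# Six 8-bit intensities split into n = 3 bands, each with its own key.
spectrum = [200, 180, 90, 60, 20, 10]
band_keys = [b"k1", b"k2", b"k3"]
encrypted_bands = encrypt_phoneme_spectrum(spectrum, band_keys)
```

Because each band is encrypted separately, holding the key for one band reveals nothing about the others.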

[0033] As shown in FIG. 2, the n items of data making up the encrypted phoneme spectrum data are stored in the phoneme dictionary tagged with identification data that identifies the phoneme and the item's position in the order from 1 to n. Of these n items, the part representing the spectral components in the k-th band from the low-frequency side (k is any integer from 1 to n) is supplied to the decryption unit 5-k by the phoneme dictionary storage unit 4 in response to access by the acoustic processing unit 3.
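
A rough model of the dictionary layout and routing, assuming the identification data is simply a (phoneme, band number) pair, is sketched below. The class `PhonemeDictionary` and its methods are hypothetical, and the byte strings are placeholders for encrypted band data.

```python
class PhonemeDictionary:
    """Holds encrypted band data keyed by (phoneme, band number)."""

    def __init__(self):
        self._records = {}

    def store(self, phoneme, band_number, encrypted_band):
        """Tag the item with its phoneme and its position from 1 to n."""
        self._records[(phoneme, band_number)] = encrypted_band

    def supply(self, phoneme, n):
        """Return {k: item}, where item k is destined for decryption unit 5-k."""
        return {k: self._records[(phoneme, k)] for k in range(1, n + 1)}

dictionary = PhonemeDictionary()
for k, blob in enumerate([b"\x01\x02", b"\x03\x04", b"\x05\x06"], start=1):
    dictionary.store("a", k, blob)

routed = dictionary.supply("a", n=3)
```

The band number doubles as the index of the decryption unit, so the low-frequency item always reaches unit 5-1 and the highest band reaches unit 5-n.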

[0034] Any method may be used to encrypt the phoneme spectrum data and generate the n items of data making up the encrypted phoneme spectrum data; it may be, for example, a symmetric-key cipher such as DES (Data Encryption Standard) or a public-key cipher. Further, the n items of data making up the phoneme spectrum data to be encrypted are not necessarily all encrypted with the same encryption key.

[0035] The decryption units 5-1 to 5-n (n is the number of bands mentioned above) have substantially the same configuration as one another, and each consists of a DSP, a CPU, or the like. The decryption unit 5-k acquires, from the encrypted phoneme spectrum data stored in the phoneme dictionary storage unit 4, the part representing the spectral components in the k-th band from the low-frequency side, and also acquires a decryption key from outside. It then decrypts that part using the acquired decryption key and supplies the resulting data to the acoustic processing unit 3.

[0036] When the encrypted phoneme spectrum data consists of phoneme spectrum data encrypted with a symmetric-key cipher, the decryption key acquired by the decryption unit 5-k may simply be the encryption key that was used to encrypt that phoneme spectrum data. When the encrypted phoneme spectrum data consists of phoneme spectrum data encrypted with a public-key cipher, then of a private key and public key pair related such that data encrypted with one can be decrypted with the other, the public key may be used as the encryption key for encrypting the phoneme spectrum data, and the private key as the decryption key for decrypting the resulting encrypted phoneme spectrum data.
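
The public-key case can be illustrated with textbook RSA on very small primes. This is a toy for showing the key relationship only: the patent names no particular public-key cipher, the parameters below are far too small for real use, and the names `encrypt_band` and `decrypt_band` are hypothetical. The modular inverse via `pow(E, -1, PHI)` needs Python 3.8 or later.

```python
# Toy textbook RSA with tiny primes; for illustration only, never for real use.
P, Q = 61, 53
N = P * Q                    # modulus shared by both keys
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent: the encryption key
D = pow(E, -1, PHI)          # private exponent: the decryption key

def encrypt_band(components, public_key):
    """Encrypt each spectral intensity with the public key (e, n)."""
    e, n = public_key
    return [pow(c, e, n) for c in components]

def decrypt_band(ciphertext, private_key):
    """Recover the intensities with the matching private key (d, n)."""
    d, n = private_key
    return [pow(c, d, n) for c in ciphertext]

band = [200, 180, 90]        # assumed 8-bit intensities, all smaller than N
encrypted = encrypt_band(band, (E, N))
decrypted = decrypt_band(encrypted, (D, N))
```

Whoever prepares the dictionary needs only the public key, while the decryption unit needs the private key to recover the band.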

[0037] When the decryption key supplied to the decryption unit 5-k is the correct key, that is, one that can be used to decrypt the encrypted phoneme spectrum data the unit has acquired, the data that the decryption unit 5-k generates by decryption is substantially identical to the part of the original phoneme spectrum data (the data used to generate the encrypted phoneme spectrum data) that represents the spectral components in the k-th band from the low-frequency side.

[0038] On the other hand, when the decryption unit 5-k has not been supplied with a correct decryption key, it treats the spectral components indicated by the part of the encrypted phoneme spectrum data supplied to it as substantially nonexistent, regardless of what that part contains. (For example, it does not supply data obtained by decrypting those components to the acoustic processing unit 3.) Any method may be used to judge whether a decryption key is correct; for example, the judgment may be based on whether the key has a predetermined data format.
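
A decryption unit with a format-based key check might look like the sketch below. The key format (a `KEY1` prefix and a fixed length) is invented for illustration, since the patent only says that a predetermined data format may be checked, and `toy_cipher` is again a hypothetical, insecure stand-in cipher.

```python
import hashlib

KEY_PREFIX = b"KEY1"   # hypothetical marker of a well-formed key
KEY_LENGTH = 20        # hypothetical total key length in bytes

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher (NOT secure): XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def key_is_well_formed(key) -> bool:
    """Judge the key by its data format only."""
    return key is not None and len(key) == KEY_LENGTH and key.startswith(KEY_PREFIX)

def decryption_unit(encrypted_band: bytes, key):
    """Return the decrypted intensities, or None so the band is treated as absent."""
    if not key_is_well_formed(key):
        return None
    return list(toy_cipher(key, encrypted_band))

good_key = b"KEY1" + b"0123456789abcdef"
encrypted_band = toy_cipher(good_key, bytes([90, 60]))

with_key = decryption_unit(encrypted_band, good_key)
without_key = decryption_unit(encrypted_band, None)
```

Returning `None` lets the caller skip the band entirely, which matches the example of not supplying any data for it to the acoustic processing unit 3.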

【００３９】従って、音響処理部３が音声出力部６に供
給する信号のスペクトル分布は、暗号化音素スペクトル
データの生成に用いられた元の音素スペクトルデータが
表すスペクトルのうち、正しい復号化キーを取得しなか
った復号化部に供給された部分が表すスペクトル成分を
除いたスペクトル成分の分布に実質的に等しいものとな
る。Therefore, the spectral distribution of the signal that the acoustic processing unit 3 supplies to the audio output unit 6 is substantially equal to the spectrum represented by the original phoneme spectrum data used to generate the encrypted phoneme spectrum data, minus the spectral components represented by the portions supplied to decryption units that did not acquire a correct decryption key.
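The band-by-band behavior described in the preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: the XOR keystream cipher, the 16-byte key format, and every function name are hypothetical and do not come from the specification. A format check of this kind only detects malformed keys; a well-formed but wrong key would still yield garbage.

```python
import hashlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream (not secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def key_is_well_formed(key) -> bool:
    """Hypothetical format check: a correct key is exactly 16 bytes."""
    return isinstance(key, bytes) and len(key) == 16


def assemble_spectrum(encrypted_bands, keys):
    """Decrypt each band whose unit holds a valid key; zero the others."""
    spectrum = []
    for k, blob in enumerate(encrypted_bands, start=1):
        key = keys.get(k)
        if key_is_well_formed(key):
            spectrum.append(list(keystream_xor(blob, key)))
        else:
            spectrum.append([0] * len(blob))   # band treated as absent
    return spectrum


# Three bands, one intensity byte per spectral bin (made-up values).
plain_bands = [bytes([90, 80]), bytes([60, 50]), bytes([30, 20])]
all_keys = {k: bytes([k]) * 16 for k in (1, 2, 3)}
encrypted = [keystream_xor(b, all_keys[k]) for k, b in enumerate(plain_bands, 1)]

partial = assemble_spectrum(encrypted, {1: all_keys[1], 2: all_keys[2]})
full = assemble_spectrum(encrypted, all_keys)
```

With keys for bands 1 and 2 only, the third band comes back as zeros, which corresponds to band-limited output.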

【００４０】そして、例えば、復号化部の数が１０個で
ある（上述のｎの値が１０である）として、復号化部５
−１〜５−３に供給すべき復号化キーがこの音声合成シ
ステムの所持者等に無償で提供される一方、復号化部５
−４〜５−１０に供給すべき復号化キーは有償で提供さ
れるものとする。この場合、この音声合成システムの所
持者等は、無償では、復号化部５−１〜５−３に供給さ
れた暗号化音素スペクトルデータの部分が表すスペクト
ル分布を有する音声（音素の帯域が制限された、品質の
劣る音声）を合成させることができる一方、復号化部５
−１〜５−１０に供給された暗号化音素スペクトルデー
タの部分すべてが表すスペクトル分布を有する音声（音
素の帯域が制限されていない、品質のよい音声）を合成
させることはできない。しかし、復号化部５−４〜５−
１０に供給すべき復号化キーを購入すれば、帯域が制限
されていない、品質のよい音声を合成させることができ
る。更に、提供される復号化キーの数を、代金の金額等
に応じて３段階以上にわたって変えるものとすれば、こ
の音声合成システムを用いて合成される音声の品質も、
４段階以上にわたって可変とすることができる。For example, suppose that there are ten decryption units (n = 10), that the decryption keys to be supplied to the decryption units 5-1 to 5-3 are provided free of charge to the owner of this speech synthesis system, and that the decryption keys to be supplied to the decryption units 5-4 to 5-10 are provided for a fee. In this case, without paying, the owner can synthesize speech having the spectral distribution represented by the portions of the encrypted phoneme spectrum data supplied to the decryption units 5-1 to 5-3 (band-limited, lower-quality speech), but cannot synthesize speech having the spectral distribution represented by all of the portions supplied to the decryption units 5-1 to 5-10 (high-quality speech whose phoneme band is not limited). By purchasing the decryption keys to be supplied to the decryption units 5-4 to 5-10, however, the owner can synthesize high-quality speech whose band is not limited. Furthermore, if the number of decryption keys provided is varied over three or more levels according to the amount paid, the quality of the speech synthesized by this system can be made variable over four or more levels.

【００４１】なお、復号化部５−１〜５−ｎが復号化キ
ーを取得する手法は任意であり、たとえば、復号化部５
−１〜５−ｎは、外部の記憶装置や記録媒体から復号化
キーを読み出すようにしてもよい。また、外部の装置か
ら通信回線を介して復号化部５−１〜５−ｎに復号化キ
ーを送信し、復号化部５−１〜５−ｎがこの復号化キー
を受信してもよい。また、復号化部５−１〜５−ｎが、
復号化キーを書き換え可能に記憶する記憶装置（たとえ
ば、ＲＡＭ（Random Access Memory））を備え、各自が
復号化に先立って記憶している当該復号化キーを復号化
に用いるようにしてもよい。The method by which the decryption units 5-1 to 5-n acquire decryption keys is arbitrary. For example, the decryption units 5-1 to 5-n may read the decryption keys from an external storage device or recording medium. Alternatively, an external device may transmit the decryption keys to the decryption units 5-1 to 5-n over a communication line, and the decryption units 5-1 to 5-n may receive them. The decryption units 5-1 to 5-n may also each include a storage device that stores a decryption key rewritably (for example, a RAM (Random Access Memory)) and use for decryption the key that it has stored in advance.

【００４２】また、復号化キーは、保護用の暗号鍵を用
いて暗号化された状態で音素辞書記憶部４に記憶されて
いてもよい。一方、復号化部５−１〜５−ｎは、通信回
線等を介して外部の装置より送信された保護用の暗号鍵
を受信したとき、音素辞書記憶部４にアクセスして暗号
化されている復号化キーを読み出し、自己が受信した保
護用の暗号鍵を用いて復号化することにより、復号化キ
ーを取得するようにしてもよい。なお、復号化部５−１
〜５−ｎは、保護用の暗号鍵を送信する外部の装置をパ
スワード認証等の手法により認証し、認証に失敗したと
きは、保護用の暗号鍵の受信を拒絶するようにしてもよ
い。The decryption keys may also be stored in the phoneme dictionary storage unit 4 in a state encrypted with a protection encryption key. In that case, when the decryption units 5-1 to 5-n receive the protection encryption key transmitted from an external device over a communication line or the like, they may access the phoneme dictionary storage unit 4, read out the encrypted decryption keys, and decrypt them with the received protection encryption key, thereby acquiring the decryption keys. The decryption units 5-1 to 5-n may authenticate the external device that transmits the protection encryption key, by password authentication or a similar method, and refuse to accept the protection encryption key when authentication fails.
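The key-protection arrangement in this paragraph (a decryption key stored in encrypted form, unwrapped with a protection key that is accepted only from an authenticated sender) might look like the sketch below. The XOR keystream cipher, the password hash comparison, and all names and values are assumptions made for illustration; the specification leaves the cipher and the authentication method open.

```python
import hashlib
import hmac


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 keystream; applying it twice restores data."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical credential the decryption unit expects from the external device.
EXPECTED_PASSWORD_HASH = hashlib.sha256(b"example-password").hexdigest()


def sender_is_authentic(password: bytes) -> bool:
    """Password check for the external device (constant-time comparison)."""
    offered = hashlib.sha256(password).hexdigest()
    return hmac.compare_digest(offered, EXPECTED_PASSWORD_HASH)


def obtain_decryption_key(wrapped_key, protection_key, password):
    """Return the unwrapped decryption key, or None if the sender is rejected."""
    if not sender_is_authentic(password):
        return None                      # refuse the protection key
    return xor_cipher(wrapped_key, protection_key)


band_key = b"band-4-key-bytes"
protection_key = b"protection-key"
wrapped = xor_cipher(band_key, protection_key)   # as stored in the phoneme dictionary

accepted = obtain_decryption_key(wrapped, protection_key, b"example-password")
rejected = obtain_decryption_key(wrapped, protection_key, b"wrong-password")
```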

【００４３】音声出力部６は、Ｄ／Ａ（Digital-to-Ana
log）コンバータ、ＡＦ（Audio Frequency）増幅器及び
スピーカ等より構成されており、音響処理部３より供給
されたＰＣＭ信号を取得し、取得したデータが表す音声
を出力する。The audio output unit 6 comprises a D/A (Digital-to-Analog) converter, an AF (Audio Frequency) amplifier, a speaker, and the like; it acquires the PCM signal supplied from the acoustic processing unit 3 and outputs the sound represented by the acquired data.

【００４４】なお、この音声合成システムの構成は上述
のものに限られない。たとえば、言語処理部１、単語辞
書記憶部２、音響処理部３及び復号化部５−１〜５−ｎ
の機能を、単一のＤＳＰやＣＰＵが行ってもよい。The configuration of this speech synthesis system is not limited to the one described above. For example, the functions of the language processing unit 1, the word dictionary storage unit 2, the acoustic processing unit 3, and the decryption units 5-1 to 5-n may be performed by a single DSP or CPU.

【００４５】また、音響処理部３が音声出力部５に供給
する信号は、ＰＣＭ信号である必要はない。また、暗号
化音素スペクトルデータをなすｎ個のデータが表すスペ
クトル成分が分布する帯域は、音素のスペクトル分布を
不等分に分割して得られるものであってもよい。The signal that the acoustic processing unit 3 supplies to the audio output unit 6 need not be a PCM signal. The bands over which the spectral components represented by the n pieces of data making up the encrypted phoneme spectrum data are distributed may be obtained by dividing the spectral distribution of the phoneme into unequal parts.

【００４６】また、音素辞書記憶部４は、復号化部５−
１〜５−ｎに着脱可能に接続される代わりに、電話回
線、専用回線、衛星回線等の通信回線を介して復号化部
５−１〜５−ｎに接続されてもよい。この場合、音素辞
書記憶部４と、復号化部５−１〜５−ｎとは、それぞ
れ、例えばモデムやＤＳＵ（Data Service Unit）等か
らなる通信制御部を備えていればよい。Instead of being detachably connected to the decryption units 5-1 to 5-n, the phoneme dictionary storage unit 4 may be connected to them via a communication line such as a telephone line, a dedicated line, or a satellite link. In this case, the phoneme dictionary storage unit 4 and the decryption units 5-1 to 5-n need only each include a communication control unit comprising, for example, a modem or a DSU (Data Service Unit).

【００４７】また、言語処理部１の文書入力部は、通信
回線を介して文書データを入力するようにしてもよい。
この場合、文書入力部も、例えばモデムやＤＳＵ等から
なる通信制御部を備えていればよい。また、言語処理部
１の文書入力部は、外部より放送された文書データを受
信することによりこの文書データを入力するようにして
もよい。この場合、文書入力部も、例えば無線受信機を
備えていればよい。The document input unit of the language processing unit 1 may receive document data over a communication line. In this case, the document input unit likewise need only include a communication control unit comprising, for example, a modem or a DSU. The document input unit of the language processing unit 1 may also obtain document data by receiving document data broadcast from outside. In this case, the document input unit need only include, for example, a radio receiver.

【００４８】また、音素辞書記憶部４は、暗号化音素ス
ペクトルデータをなすｎ個のデータのうちの一部に代え
て、当該一部を暗号化されていない状態で記憶していて
もよい。この場合、復号化部５−１〜５−ｎのうち当該
一部を取得したものは、特に復号化の処理を行うことな
く、当該一部を音響処理部３に供給するようにしてもよ
い。In place of some of the n pieces of data making up the encrypted phoneme spectrum data, the phoneme dictionary storage unit 4 may store those pieces in unencrypted form. In this case, whichever of the decryption units 5-1 to 5-n acquires such a piece may supply it to the acoustic processing unit 3 without performing any decryption.

【００４９】また、暗号化音素スペクトルデータをなす
ｎ個のデータのうち少なくとも一部は、スペクトル成分
のスペクトル強度を表すビット列のうち、ＭＳＢ（Most
Significant Bit）を含む上位の所定桁数の各ビット
（上位ビット）を暗号化されていない状態で含んでいて
もよい。また、上位ビットとその他の各ビット（下位ビ
ット）とが互いに異なる暗号化キーで暗号化されていて
もよい。そして、復号化部５−１〜５−ｎのうち、複数
の暗号化キーを用いて暗号化されたデータを供給される
復号化部は、複数の復号化キーの供給を受け、これらの
復号化キーを用いてこのデータを復号化するようにして
もよい。ただし、この場合、当該復号化部は、自己に供
給されたいずれの復号化キーを用いても復号化できない
部分が示す値を所定値（例えば、０）であるものとして
扱うものとする。At least some of the n pieces of data making up the encrypted phoneme spectrum data may contain, in unencrypted form, the upper bits of the bit string representing the spectral intensity of a spectral component, that is, a predetermined number of high-order bits including the MSB (Most Significant Bit). Alternatively, the upper bits and the remaining bits (the lower bits) may be encrypted with mutually different encryption keys. A decryption unit among 5-1 to 5-n that is supplied with data encrypted with a plurality of encryption keys may be supplied with a plurality of decryption keys and decrypt the data with them. In this case, however, the decryption unit treats the value indicated by any part that cannot be decrypted with any of the decryption keys supplied to it as a predetermined value (for example, 0).

【００５０】そして、例えば、上位ビットの復号化に用
いるべき復号化キーがこの音声合成システムの所持者等
に無償で提供される一方、下位ビットの復号化に用いる
べき復号化キーは有償で提供されるものとする。この場
合、この音声合成システムの所持者は、無償では、各周
波数成分のスペクトル強度の精度が劣る音声を合成させ
ることができるに過ぎない。一方、下位ビットを復号化
するための復号化キーを購入すれば、スペクトル強度の
精度が制限されていない、品質のよい音声を合成させる
ことができる。For example, suppose that the decryption key for the upper bits is provided free of charge to the owner of this speech synthesis system, while the decryption key for the lower bits is provided for a fee. In this case, without paying, the owner can synthesize only speech in which the spectral intensity of each frequency component has reduced precision. By purchasing the decryption key for the lower bits, however, the owner can synthesize high-quality speech in which the precision of the spectral intensity is not limited.
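The upper-bit and lower-bit scheme can be illustrated with a short sketch. It assumes 16-bit intensity values whose lower 8 bits are encrypted by a simple XOR while the upper 8 bits are stored in the clear; the bit widths, the XOR cipher, and the function names are illustrative assumptions, not values given in the specification.

```python
LOW_BITS = 8                      # assumed number of lower bits sold separately
LOW_MASK = (1 << LOW_BITS) - 1


def split(intensity: int):
    """Split a 16-bit intensity into (upper bits, lower bits)."""
    return intensity >> LOW_BITS, intensity & LOW_MASK


def encrypt_low(low: int, key: int) -> int:
    """Toy encryption of the lower bits: XOR with a key of the same width."""
    return low ^ (key & LOW_MASK)


def reconstruct(upper: int, enc_low: int, key=None) -> int:
    """Rebuild the intensity; lower bits that cannot be decrypted count as 0."""
    low = encrypt_low(enc_low, key) if key is not None else 0
    return (upper << LOW_BITS) | low


original = 0xABCD
upper, low = split(original)
stored_low = encrypt_low(low, key=0x5A)

coarse = reconstruct(upper, stored_low)            # no key for the lower bits
exact = reconstruct(upper, stored_low, key=0x5A)   # key for the lower bits held
```

Without the key the value is rounded down to a multiple of 256, so each frequency component keeps its rough magnitude but loses precision.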

【００５１】なお、暗号化音素スペクトルデータをなす
ｎ個のデータの各々は、スペクトル強度を表すビット列
を、当該ビット列３個以上に分割して得られる各部分が
互いに異なる暗号化キーで暗号化された状態で含んでい
てもよいし、各部分のうちＭＳＢを含む部分が暗号化さ
れていない状態で含んでいてもよい。Each of the n pieces of data making up the encrypted phoneme spectrum data may contain the bit string representing the spectral intensity in a state in which the parts obtained by dividing that bit string into three or more parts are encrypted with mutually different encryption keys, or in a state in which, among those parts, the part containing the MSB is left unencrypted.

【００５２】また、この音声合成システムは、この音声
合成システムの所持者等に課金された代金の金額を表す
課金額データを外部より取得し、暗号化音素スペクトル
データのうち復号化する部分を、取得した課金額データ
に基づいて決定し、決定した部分を復号化する復号化キ
ーを、当該部分を復号化する復号化部へと供給する装置
を備えていてもよい。この装置から復号化キーを供給さ
れた復号化部は、供給された復号化キーを取得して復号
化に用いればよい。なお、この装置は、復号化部５−１
〜５−ｎに供給する対象の復号化キーを予め記憶してい
てもよいし、外部から取得してもよい。この音声合成シ
ステムがこの装置を備えることにより、合成される音声
等の品質が、外部から供給されるデータに従って変化す
るようになる。This speech synthesis system may also include a device that acquires from outside charge amount data representing the amount charged to the owner of the system, determines on the basis of the acquired charge amount data which portions of the encrypted phoneme spectrum data are to be decrypted, and supplies the decryption keys for the determined portions to the decryption units that decrypt those portions. A decryption unit supplied with a decryption key by this device need only acquire the key and use it for decryption. The device may store in advance the decryption keys to be supplied to the decryption units 5-1 to 5-n, or may acquire them from outside. With this device, the quality of the synthesized speech changes according to data supplied from outside.
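A key-supplying device of this kind could be sketched as below. The price thresholds, the band counts they unlock, and the function and variable names are all hypothetical; the specification says only that the portions to decrypt are determined from the charge amount data.

```python
# Hypothetical price thresholds (in yen) and the highest band each one unlocks.
PRICE_TIERS = [(0, 3), (500, 6), (1000, 10)]


def bands_for_charge(charge: int) -> range:
    """Return the band numbers to decrypt for a given charged amount."""
    top_band = 0
    for threshold, band in PRICE_TIERS:
        if charge >= threshold:
            top_band = band
    return range(1, top_band + 1)


def supply_keys(charge: int, key_store: dict) -> dict:
    """Hand each selected decryption unit 5-k the key stored for band k."""
    return {k: key_store[k] for k in bands_for_charge(charge)}


key_store = {k: f"key-{k}" for k in range(1, 11)}   # placeholder keys
free_keys = supply_keys(0, key_store)
mid_keys = supply_keys(700, key_store)
full_keys = supply_keys(1000, key_store)
```

Here the keys are held in advance by the device; fetching them from outside would replace the `key_store` lookup.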

【００５３】（第２の実施の形態）次に、この発明の第
２の実施の形態に係る音声合成システムを説明する。図
３は、この発明の音声合成システムの構成を示す図であ
る。図示するように、この音声合成システムの物理的構
成は、復号化部５−１〜５−ｎに代えて単一の復号化部
５を備える点を除き、第１の実施の形態における構成と
実質的に同一である。なお、音素辞書記憶部４は、音響
処理部３及び復号化部５に着脱可能に接続されるものと
する。(Second Embodiment) Next, a speech synthesis system according to a second embodiment of the present invention will be described. FIG. 3 is a diagram showing the configuration of the speech synthesis system of the present invention. As shown, the physical configuration of this speech synthesis system is the same as that of the first embodiment except that a single decoding unit 5 is provided instead of the decoding units 5-1 to 5-n. Substantially the same. Note that the phoneme dictionary storage unit 4 is detachably connected to the sound processing unit 3 and the decoding unit 5.

【００５４】この音声合成システムの音素辞書記憶部４
が記憶する音素辞書は、図４に示すように、音素を識別
する識別データを格納し、また、この音素の波形を表す
音素データ、又は、この音素の波形を表す音素データを
暗号化したものを含む暗号化音素データを、この識別デ
ータに対応付けて格納しているものとする。そして、音
素辞書記憶部４は、音響処理部３のアクセスに応答し
て、音素辞書に格納されている音素データ及び／又は暗
号化音素データを読み出し、復号化部５に供給する。な
お、音素データを暗号化して暗号化音素データを生成す
る手法は任意であり、対称鍵暗号の手法でもよいし、公
開鍵暗号の手法でもよい。As shown in FIG. 4, the phoneme dictionary stored in the phoneme dictionary storage unit 4 of this speech synthesis system stores identification data identifying each phoneme and, in association with that identification data, either phoneme data representing the waveform of the phoneme or encrypted phoneme data, that is, data containing an encrypted version of such phoneme data. In response to access by the acoustic processing unit 3, the phoneme dictionary storage unit 4 reads out the phoneme data and/or encrypted phoneme data stored in the phoneme dictionary and supplies it to the decryption unit 5. The method of encrypting phoneme data to generate encrypted phoneme data is arbitrary; it may be symmetric-key encryption or public-key encryption.

【００５５】一方、音響処理部３は、音素継続時間長生
成の処理及び基本周波数パターン生成の処理を行った
後、音素辞書記憶部４にアクセスし、言語処理部１より
供給された表音文字列に含まれる音素をキーとして音素
辞書記憶部４が記憶する音素データ及び／又は暗号化音
素データを検索して、検索の結果索出された音素データ
及び／又は暗号化音素データを、復号化部５に供給す
る。そして、復号化部５から、この音素データ、及び／
又はこの暗号化音素データの復号化により得られる音素
データを供給されると、これらの音素データが表す音素
が、自己に供給された表音文字列内で並んでいる順序に
連結し、基本周波数パターン生成の処理で決定されたパ
ターンに合致するような基本周波数成分を有するよう変
形することにより、この表音文字列が表す音声全体の波
形を表す信号（ＰＣＭ等の変調形式で当該波形を表す信
号）を生成して、音声出力部６に供給するものとする。Meanwhile, after performing the phoneme duration generation processing and the fundamental frequency pattern generation processing, the acoustic processing unit 3 accesses the phoneme dictionary storage unit 4, searches the stored phoneme data and/or encrypted phoneme data using the phonemes contained in the phonetic character string supplied from the language processing unit 1 as keys, and supplies the phoneme data and/or encrypted phoneme data found by the search to the decryption unit 5. When the decryption unit 5 supplies it with this phoneme data and/or the phoneme data obtained by decrypting the encrypted phoneme data, the acoustic processing unit 3 concatenates the phonemes represented by the phoneme data in the order in which they appear in the phonetic character string supplied to it, and deforms them so that they have fundamental frequency components matching the pattern determined in the fundamental frequency pattern generation processing. It thereby generates a signal representing the waveform of the entire speech represented by the phonetic character string (a signal representing the waveform in a modulation format such as PCM) and supplies it to the audio output unit 6.

【００５６】復号化部５は、ＤＳＰやＣＰＵ等より構成
されており、音素辞書記憶部４が記憶する音素データ及
び／又は暗号化音素データを取得し、また、外部より復
号化キーを取得する。そして、取得した暗号化音素デー
タをこの復号化キーを用いて復号化し、自己が取得した
音素データ、及び、暗号化音素データの復号化により得
られた音素データを、音響処理部３に供給する。The decryption unit 5 comprises a DSP, a CPU, or the like. It acquires the phoneme data and/or encrypted phoneme data stored in the phoneme dictionary storage unit 4 and acquires a decryption key from outside. It then decrypts the acquired encrypted phoneme data with this decryption key and supplies both the phoneme data it acquired and the phoneme data obtained by decrypting the encrypted phoneme data to the acoustic processing unit 3.

【００５７】なお、暗号化音素データが音素データを対
称鍵暗号の手法により暗号化したものからなる場合、復
号化部５が取得する復号化キーは、当該音素データの暗
号化のために用いた暗号化キーそのものであればよい。
また、暗号化音素データが音素データを公開鍵暗号の手
法により暗号化したものからなる場合、一対の秘密鍵及
び公開鍵のうち、公開鍵の方を当該音素データを暗号化
するための暗号化キーとして用い、秘密鍵の方を、当該
音素データの暗号化により得られた暗号化音素データを
復号化するための復号化キーとして用いればよい。復号
化部５に供給された復号化キーが、復号化部５が取得し
た暗号化音素データを復号化するために用い得る正しい
復号化キーである場合、復号化部５が復号化により生成
するデータは、当該暗号化音素データの生成に用いられ
た音素データと同一のデータとなる。When the encrypted phoneme data has been produced by encrypting phoneme data with a symmetric-key cipher, the decryption key acquired by the decryption unit 5 need only be the very encryption key that was used to encrypt the phoneme data. When the encrypted phoneme data has been produced with a public-key cipher, then, of a paired private key and public key, the public key may be used as the encryption key for encrypting the phoneme data, and the private key may be used as the decryption key for decrypting the resulting encrypted phoneme data. If the decryption key supplied to the decryption unit 5 is the correct key, that is, one that can be used to decrypt the encrypted phoneme data acquired by the decryption unit 5, the data that the decryption unit 5 produces by decryption is identical to the phoneme data used to generate the encrypted phoneme data.

【００５８】一方、復号化部５は、自己に正しい復号化
キーが供給されなかった場合、自己に供給された暗号化
音素データが表す音素は実質的に無音であるものとして
扱う。（例えば、当該暗号化音素データを復号化して得
られるデータを音響処理部３に供給しないものとす
る。）従って、この場合、音響処理部３が音声出力部６
に供給するＰＣＭ信号が表す波形は、音響処理部３に供
給された表音文字列が表す音声の波形のうち、暗号化音
素データが表す音素にあたる波形が欠落した波形に実質
的に等しいものとなる。On the other hand, when the correct decryption key has not been supplied to it, the decryption unit 5 treats the phonemes represented by the encrypted phoneme data supplied to it as substantially silent. (For example, it does not supply data obtained by decrypting the encrypted phoneme data to the acoustic processing unit 3.) In this case, therefore, the waveform represented by the PCM signal that the acoustic processing unit 3 supplies to the audio output unit 6 is substantially equal to the waveform of the speech represented by the phonetic character string supplied to the acoustic processing unit 3, with the waveforms of the phonemes represented by the encrypted phoneme data missing.
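The behavior of the second embodiment, in which whole phonemes drop out when their key is missing, can be sketched as follows. The dictionary layout, the `key_id` field, and the subtraction-based toy cipher are assumptions made for this example; duration and fundamental frequency processing are omitted.

```python
def synthesize(phonetic_string, dictionary, held_keys, decrypt):
    """Concatenate phoneme waveforms; locked phonemes contribute nothing."""
    waveform = []
    for phoneme in phonetic_string:
        entry = dictionary[phoneme]
        if not entry["encrypted"]:
            waveform.extend(entry["samples"])
        elif entry["key_id"] in held_keys:
            waveform.extend(decrypt(entry["samples"], held_keys[entry["key_id"]]))
        # otherwise: no correct key, so the phoneme is dropped
    return waveform


def toy_decrypt(samples, key):
    """Toy cipher for the sketch: subtract the key from every sample."""
    return [s - key for s in samples]


dictionary = {
    "k": {"encrypted": False, "samples": [1, 2]},
    # Plain samples [3, 4] stored as [13, 14] under the hypothetical key 10.
    "a": {"encrypted": True, "key_id": "vowels", "samples": [13, 14]},
}

without_key = synthesize("kak", dictionary, {}, toy_decrypt)
with_key = synthesize("kak", dictionary, {"vowels": 10}, toy_decrypt)
```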

【００５９】そして、例えば、復号化キーが有償で提供
されるものとすると、この音声合成システムの所持者等
は、無償では、暗号化音素データが表す音素が欠落した
音声を合成させることができる一方、暗号化音素データ
が表す音素を含んだ音声を合成させることはできない。
しかし、復号化キーを購入すれば、暗号化音素データが
表す音素も含んだ、品質のよい音声を合成させることが
できる。For example, if the decryption key is provided for a fee, then without paying, the owner of this speech synthesis system can synthesize speech from which the phonemes represented by the encrypted phoneme data are missing, but cannot synthesize speech that includes them. By purchasing the decryption key, however, the owner can synthesize high-quality speech that also includes the phonemes represented by the encrypted phoneme data.

【００６０】なお、この音声合成システムの構成も上述
のものに限られない。例えば、互いに異なる暗号化音素
データが、互いに異なる暗号化キーを用いて生成された
ものであってもよい。この場合、復号化部５に提供され
る復号化キーの数を、代金の金額等に応じて２段階以上
にわたって変えるものとすれば、この音声合成システム
を用いて合成される音声の品質も、３段階以上にわたっ
て可変とすることができる。The configuration of this speech synthesis system is likewise not limited to the one described above. For example, different pieces of encrypted phoneme data may have been generated with different encryption keys. In this case, if the number of decryption keys provided to the decryption unit 5 is varied over two or more levels according to the amount paid, the quality of the speech synthesized by this system can be made variable over three or more levels.

【００６１】また、音素辞書記憶部４は、復号化部５に
着脱可能に接続される代わりに、電話回線、専用回線、
衛星回線等の通信回線を介して復号化部５に接続されて
もよい。この場合、音素辞書記憶部４と、復号化部５と
は、それぞれ、例えばモデムやＤＳＵ等からなる通信制
御部を備えていればよい。Instead of being detachably connected to the decryption unit 5, the phoneme dictionary storage unit 4 may be connected to it via a communication line such as a telephone line, a dedicated line, or a satellite link. In this case, the phoneme dictionary storage unit 4 and the decryption unit 5 need only each include a communication control unit comprising, for example, a modem or a DSU.

【００６２】なお、復号化部５が復号化キーを取得する
手法は任意であり、たとえば、復号化部５は、外部の記
憶装置や記録媒体から復号化キーを読み出すようにして
もよい。また、外部の装置から通信回線を介して復号化
部５に復号化キーを送信し、復号化部５がこの復号化キ
ーを受信してもよい。また、復号化部５が、復号化キー
を書き換え可能に記憶する記憶装置を備え、当該復号化
キーを復号化に用いるようにしてもよい。The method by which the decryption unit 5 acquires the decryption key is arbitrary. For example, the decryption unit 5 may read the decryption key from an external storage device or a recording medium. Alternatively, a decryption key may be transmitted from an external device to the decryption unit 5 via a communication line, and the decryption unit 5 may receive the decryption key. Further, the decryption unit 5 may include a storage device that stores the decryption key in a rewritable manner, and the decryption key may be used for decryption.

【００６３】また、復号化キーは、保護用の暗号鍵を用
いて暗号化された状態で音素辞書記憶部４に記憶されて
いてもよい。一方、復号化部５は、通信回線等を介して
外部の装置より送信された保護用の暗号鍵を受信したと
き、音素辞書記憶部４にアクセスして暗号化されている
復号化キーを読み出し、自己が受信した保護用の暗号鍵
を用いて復号化することにより、復号化キーを取得する
ようにしてもよい。なお、復号化部５は、保護用の暗号
鍵を送信する外部の装置を認証し、認証に失敗したとき
は、保護用の暗号鍵の受信を拒絶するようにしてもよ
い。Further, the decryption key may be stored in the phoneme dictionary storage unit 4 in a state where the decryption key is encrypted by using the encryption key for protection. On the other hand, when receiving the encryption key for protection transmitted from an external device via a communication line or the like, the decryption unit 5 accesses the phoneme dictionary storage unit 4 and reads the encrypted decryption key. Alternatively, the decryption key may be obtained by decrypting using the protection encryption key received by itself. Note that the decryption unit 5 may authenticate an external device that transmits the protection encryption key, and may reject the reception of the protection encryption key if the authentication fails.

【００６４】以上、この発明の実施の形態を説明した
が、この発明にかかるコンテンツ合成装置は、専用のシ
ステムによらず、通常のコンピュータシステムを用いて
実現可能である。例えば、Ｄ／Ａコンバータやスピーカ
等を備え、フラッシュメモリ等の不揮発性記憶装置を着
脱可能に接続するコンピュータに上述の言語処理部１、
単語辞書記憶部２、音響処理部３、復号化部５−１〜５
−ｎ及び音声出力部６の動作を実行させ、当該コンピュ
ータに着脱可能に接続される不揮発性記憶装置に音素辞
書記憶部４の動作を実行させるためのプログラムを格納
した媒体（ＣＤ−ＲＯＭ、ＭＯ、フロッピーディスク
等）から該プログラムをインストールすることにより、
上述の処理を実行する音声合成システムを構成すること
ができる。An embodiment of the present invention has been described above; however, the content synthesizing apparatus according to the present invention can be realized with an ordinary computer system rather than a dedicated system. For example, a speech synthesis system that executes the processing described above can be configured by installing a program from a medium (a CD-ROM, an MO, a floppy disk, or the like) storing that program, where the program causes a computer that has a D/A converter, a speaker, and the like, and to which a nonvolatile storage device such as a flash memory is detachably connected, to perform the operations of the language processing unit 1, the word dictionary storage unit 2, the acoustic processing unit 3, the decryption units 5-1 to 5-n, and the audio output unit 6, and causes the nonvolatile storage device detachably connected to the computer to perform the operations of the phoneme dictionary storage unit 4.

【００６５】また、例えば、通信回線の掲示板（ＢＢ
Ｓ）に当該プログラムを掲示し、これを通信回線を介し
て配信してもよく、また、当該プログラムを表す信号に
より搬送波を変調し、得られた変調波を伝送し、この変
調波を受信した装置が変調波を復調して当該プログラム
を復元するようにしてもよい。そして、当該プログラム
を起動し、ＯＳの制御下に、他のアプリケーションプロ
グラムと同様に実行することにより、上述の処理を実行
することができる。The program may also, for example, be posted on a bulletin board system (BBS) on a communication line and distributed over that line. Alternatively, a carrier wave may be modulated with a signal representing the program and the resulting modulated wave transmitted, with the device that receives the modulated wave demodulating it to restore the program. The processing described above can then be executed by starting the program and running it under the control of the OS in the same way as other application programs.

【００６６】なお、ＯＳが処理の一部を分担する場合、
あるいは、ＯＳが本願発明の１つの構成要素の一部を構
成するような場合には、記録媒体には、その部分を除い
たプログラムを格納してもよい。この場合も、この発明
では、その記録媒体には、コンピュータが実行する各機
能又はステップを実行するためのプログラムが格納され
ているものとする。When the OS takes on part of the processing, or when the OS constitutes part of one component of the present invention, the recording medium may store the program with that part removed. In this case as well, for the purposes of the present invention, the recording medium is regarded as storing a program for executing each function or step that the computer executes.

【００６７】[0067]

【発明の効果】以上の説明のように、この発明によれ
ば、コンテンツの合成のために用いるデータを容易に十
分な量確保できるコンテンツ合成装置及びコンテンツ合
成方法が実現される。また、この発明によれば、合成さ
れたコンテンツやコンテンツの合成のために用いるデー
タについて生じる権利を適切に保護できるコンテンツ合
成装置及びコンテンツ合成方法が実現される。As described above, according to the present invention, a content synthesizing apparatus and a content synthesizing method capable of easily securing a sufficient amount of data used for synthesizing content are realized. Further, according to the present invention, a content synthesizing apparatus and a content synthesizing method capable of appropriately protecting a right generated for synthesized content and data used for synthesizing the content are realized.

[Brief description of the drawings]

【図１】この発明の第１の実施の形態に係る音声合成シ
ステムの構成を示す図である。FIG. 1 is a diagram showing a configuration of a speech synthesis system according to a first embodiment of the present invention.

【図２】図１の音素辞書記憶部が記憶する音素辞書のデ
ータ構造を模式的に示す図である。FIG. 2 is a diagram schematically illustrating the data structure of the phoneme dictionary stored in the phoneme dictionary storage unit of FIG. 1.

【図３】この発明の第２の実施の形態に係る音声合成シ
ステムの構成を示す図である。FIG. 3 is a diagram showing a configuration of a speech synthesis system according to a second embodiment of the present invention.

【図４】図３の音素辞書記憶部が記憶する音素辞書のデ
ータ構造を模式的に示す図である。FIG. 4 is a diagram schematically illustrating the data structure of the phoneme dictionary stored in the phoneme dictionary storage unit of FIG. 3.

[Explanation of symbols]

１ 言語処理部、２ 単語辞書記憶部、３ 音響処理部、４ 音素辞書記憶部、５、５−１〜５−ｎ 復号化部、６ 音声出力部
1: language processing unit
2: word dictionary storage unit
3: acoustic processing unit
4: phoneme dictionary storage unit
5, 5-1 to 5-n: decryption units
6: audio output unit

Continued from the front page: (51) Int.Cl.⁷, identification symbol, FI, theme code (reference): H04N 7/16; H04L 9/00 601A; 7/167; H04N 7/167 Z. F-terms (reference): 5B017 AA07 BA07 CA15; 5C064 BA07 BB01 BB02 BB10 BC01 BC06 BC18 BC22 BD01 BD09 CA14 CA16 CC04; 5D045 AA20; 5J104 AA15 AA16 EA06 EA17 NA02 NA27 NA37 PA11

Claims

[Claims]

1. A content synthesizing apparatus comprising: storage means for storing a plurality of pieces of material data that are data for synthesizing content, at least one of which is encrypted; decryption means, detachably connected to the storage means, for acquiring the material data from the storage means, acquiring a decryption key associated with at least one piece of the material data when that decryption key is supplied, and decrypting, using the decryption key, the piece of the acquired material data that is associated with the acquired decryption key; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.

2. The content synthesizing apparatus according to claim 1, wherein the storage means comprises means for storing the decryption key in encrypted form and supplying the encrypted decryption key to the decryption means, and the decryption means comprises means for acquiring, when a protection decryption key for decrypting the encrypted decryption key is supplied from an external device, the protection decryption key, acquiring the encrypted decryption key supplied from the storage means, and decrypting the encrypted decryption key using the protection decryption key.

3. A content synthesizing apparatus comprising: decryption means for acquiring from outside a plurality of pieces of material data that are data for synthesizing content, at least one of which is encrypted, acquiring a decryption key associated with at least one piece of the material data when that decryption key is supplied, and decrypting, using the decryption key, the piece of the acquired material data that is associated with the acquired decryption key; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.

4. The content synthesizing apparatus according to claim 1, wherein each piece of the material data represents, band by band, the spectrum of a component of the content, and the content synthesizing means comprises means for generating data representing the content such that it has the spectrum represented by the decrypted material data and substantially lacks the spectrum represented by the material data that was not decrypted.

5. The content synthesizing apparatus according to any one of claims 1 to 4, further comprising decryption key supply means for acquiring from outside charge amount data indicating an amount charged to a user of the content, determining the material data to be decrypted based on the amount indicated by the acquired charge amount data, and supplying the decryption key associated with the determined material data to the decryption means, wherein the decryption means acquires the decryption key supplied by the decryption key supply means.

6. A content synthesizing method comprising: storing, in a storage device, a plurality of pieces of material data that are data for synthesizing content, at least one of which is encrypted; acquiring the material data from the storage device via an access device detachably connected to the storage device; acquiring a decryption key associated with at least one piece of the material data when that decryption key is supplied; decrypting, using the decryption key, the piece of the acquired material data that is associated with the acquired decryption key; and synthesizing the content based on the decrypted material data and/or the unencrypted material data.

7. A program for causing a computer to function as: decryption means for acquiring from outside a plurality of pieces of material data that are data for synthesizing content, at least one of which is encrypted, acquiring a decryption key associated with at least one piece of the material data when that decryption key is supplied, and decrypting, using the decryption key, the piece of the acquired material data that is associated with the acquired decryption key; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.

8. A program for causing a computer that is detachably connected to a storage device storing a plurality of pieces of material data that are data for synthesizing content, at least one of which is encrypted, to function as: decryption means for acquiring the material data from the storage device, acquiring a decryption key associated with at least one piece of the material data when that decryption key is supplied, and decrypting, using the decryption key, the piece of the acquired material data that is associated with the acquired decryption key; and content synthesizing means for synthesizing the content based on the decrypted material data and/or the unencrypted material data.