JP3340581B2 - Text-to-speech device and window system

Info

Publication number: JP3340581B2
Application number: JP06030895A
Authority: JP
Inventors: 俊之在塚; 信夫畑岡; 義人禰寝; 英明菊池; 寿一 ▲高▼橋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-03-20
Filing date: 1995-03-20
Publication date: 2002-11-05
Anticipated expiration: 2017-11-05
Also published as: JPH08263260A

Description

DETAILED DESCRIPTION OF THE INVENTION

FIELD OF INDUSTRIAL APPLICATION

[0001] The present invention relates to a text-to-speech method for reading text information aloud, and more particularly to a text-to-speech method that conveys the structure of text having multiple character types and layouts, and the position of text on a display screen.

PRIOR ART

[0002] With the development of information devices such as computers and communication terminals, information has increasingly been digitized. Digitized information is easy to process, store, transmit, and present. Recent advances in speech synthesis technology have also made it possible to convert digitized text into speech and read it aloud. Against this background, reading text aloud by voice has recently come into use for proofreading and read-through checking of documents, for information access by visually impaired persons, and in situations where reading characters is difficult. It is particularly effective on portable information terminals, where the display screen is small and hard to see and where it is difficult to look at the display while moving.

[0003] Meanwhile, multi-window systems have come into wide use in recent years as a terminal control method for personal computers and workstations. In such a system, a pointing device such as a mouse is used to select windows and icons, select commands through menu operations, search for files, and so on. Reading out window names, icon names, command types, file names, and the like by voice at that time makes the results of window-system operations easier to understand. In particular, for visually impaired persons who cannot obtain visual information or can obtain it only to a limited degree, such voice reading is indispensable for operating a window system.

[0004] Previously proposed text-to-speech methods include, for example, the method described in J. Allen, M. S. Hunnicutt and D. Klatt, "From text to speech: The MITalk system" (Cambridge University Press, 1987), which analyzes text and converts it into a speech signal using rules. Methods have also been devised that change the sound quality of the voice according to the type of sentence, so that the type of sentence in the text can be distinguished. Examples include the method described in "Sentence Reading Device" (JP-A-62-219066), which changes the sound quality of the reading voice when a symbol such as a parenthesis or quotation mark is detected in the text so that quoted passages and the like can be identified, and the method described in "Speech Output Device" (JP-A-56-101247), which assigns a reading attribute to each character in advance and synthesizes speech by rule according to that attribute.

PROBLEMS TO BE SOLVED BY THE INVENTION

[0005] The use of bitmap displays and the like has made it possible to attach additional information such as character type, size, and decoration to text information, and to structure the layout of an entire document so that information is presented in a visually more readable form. However, when the display screen is small or the display cannot be viewed, it is difficult with conventional text-to-speech methods to grasp, from monotonous voice reading alone, the structure of digitized text that was prepared with such a display in mind: text that carries character attributes such as multiple character types, sizes, and decorations, and that has a layout with paragraphs, columns, titles, and so on. Furthermore, window names, menu command names, and the like in a multi-window system are displayed at various positions on the screen. When such names are read aloud by voice, conventional reading methods give the listener no way to determine where they are displayed.

[0006] One object of the present invention is to provide a text-to-speech method that conveys the structure of text having such a layout by means of voice and sound. Another object of the present invention is to provide a text-to-speech method that conveys, by means of voice and sound, the on-screen position of the item being read out during operation of a window system or the like.

MEANS FOR SOLVING THE PROBLEMS

[0007] The invention provides means for classifying character attributes such as the font, underlining, and size of characters in a text, and a method of reading the text aloud using synthesized speech whose sound quality differs for each class.

[0008] It also provides sound generation means that outputs sounds of varying type, sound quality, and loudness, together with means for classifying character attributes such as the font, underlining, and size of characters in a text, and a method of outputting, simultaneously with the reading voice, a sound whose sound quality differs for each class.

[0009] It further provides speech synthesis means for synthesizing speech from text, sound generation means for generating sounds associated with the characters of the text, and acoustic output means for outputting the speech and sounds, and a method in which, depending on the position and state of an object, the character attributes of the text assigned to the object, and so on, the object is read aloud by voice, a sound is output, or the object is read aloud by voice and a sound is output as well.

[0010] In addition, it provides sound image localization means capable of presenting sound so that it is heard from an arbitrary direction, and means for mapping onto an acoustic space the coordinates of the character currently being read relative to the whole text in its visual display, and a method of localizing the voice signal or sound signal assigned to that character at the position in the acoustic space that corresponds to the character's position relative to the whole text.

OPERATION

[0011] Because the invention provides means for classifying character attributes such as the font, underlining, and size of characters in a text, and a method of reading the text aloud using synthesized speech whose sound quality differs for each class, character attributes can be conveyed by the type of voice.

[0012] Because it provides sound output means that outputs sounds of varying sound quality and loudness, means for classifying character attributes such as the font, underlining, and size of characters in a text, and a method of outputting a sound whose sound quality differs for each class simultaneously with the reading voice, character attributes can be conveyed by sound.

[0013] Because it provides speech synthesis means for synthesizing speech from text, sound generation means for generating sounds associated with the characters of the text, acoustic output means for outputting the speech and sounds, and a method in which, depending on the position and state of an object, the character attributes of the text assigned to the object, and so on, the object is read aloud by voice, a sound is output, or both, differences in the position and state of objects and in the character attributes of the text assigned to them can be conveyed.

[0014] Furthermore, because it provides means for mapping onto an acoustic space the position of a character in the text being read relative to the whole text, and for presenting the voice or sound so that it is heard from the direction corresponding to that relative position, the position of the character currently being read relative to the whole text can be conveyed.

EMBODIMENTS

[0015] Embodiments of the present invention are described below with reference to the drawings.

[0016] FIG. 1 shows one embodiment of a hardware configuration for carrying out the method of the present invention. It has the configuration of a typical computer system equipped with an acoustic device for voice and sound output. That is, it has a CPU 101 that performs computation; a RAM 102 that temporarily holds programs and data for computation and is rewritten as needed; a ROM 103 that stores the system boot program and the like; an I/O controller 104 that controls system input and output; and a disk device 105, such as a magnetic disk, that holds system programs, application programs, data, and the like. These exchange instructions and data via a system bus 106. The I/O controller 104 controls, via a bus 112, a display device 107 such as a CRT display, a pointing device 108 such as a mouse or touch panel, an input device 109 such as a keyboard, a communication port 110 for sending and receiving data, and an acoustic device 111 such as a microphone and speaker.

[0017] Standard control of such hardware is performed by software generally called an operating system. The algorithm of the text-to-speech method of the present invention can be implemented as software that runs on this hardware and operating system.

[0018] The software that executes the algorithm of this method is stored in the ROM 103 or the disk device 105, loaded into the RAM 102 as needed after system start-up, and executed by the CPU 101. The voice and sound that this text-to-speech method generates for digitized text stored in advance in the disk device 105, or for text input via the input device 109 or the communication port 110, are output through the acoustic device 111. The pointing device 108, the input device 109, and the communication port 110 may be used for user control of the system and for text input, and the digitized text and other data may be displayed on the display device 107 such as a CRT display. These devices, however, are not essential to the text-to-speech method of the present invention.

[0019] This embodiment can be realized using, for example, a personal computer or workstation equipped with an acoustic output device.

[0020] FIG. 2 shows one embodiment of a hardware configuration for carrying out the method of the present invention when stereophonic sound output is used. In FIG. 2, the voice and sound rendered in stereophonic sound by the present invention are output from a stereophonic acoustic device 201. The stereophonic acoustic device can be built from, for example, a commercially available high-speed processing board for creating three-dimensional sound fields and a plurality of speakers.

[0021] The function of each processing block of the algorithm of the text-to-speech method of the present invention is described below.

[0022] First, the processing portion that converts text to speech is described. FIG. 3 shows the configuration of a rule-based speech synthesis processing block that changes sound quality according to character attributes. In FIG. 3, a reading position detection processing block 301 detects the character to be read in the input text, a language processing block 302 performs language processing, and a prosody generation processing block 303 determines prosody such as intonation and accent. A phoneme generation processing block 304 generates a phoneme sequence from the language processing result. Meanwhile, a character attribute extraction processing block 305 extracts the character attributes assigned to the character at the reading position in the text data, and a sound quality control processing block 306 creates sound quality control data using a timbre table 307 that associates character attributes with voice timbres in advance, a speaker table 308 that associates them with speakers, and a level table 309 that associates character sizes with levels. In digitized text that carries character attributes for display or printing, the attributes are already assigned to each character, so they can be extracted easily from this attribute data. An acoustic parameter generation processing block 310 uses the segment data stored in a segment dictionary 311 to generate the actual acoustic parameters from the prosody data created by the prosody generation processing block 303, the phoneme data created by the phoneme generation processing block 304, and the sound quality data created by the sound quality control processing block 306. A signal generation processing block 312 then generates and outputs the speech signal.
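The table lookups performed by the sound quality control processing block 306 can be pictured with a minimal Python sketch. The table contents, the attribute names, and the `CharAttributes` and `VoiceControl` types are illustrative assumptions; the actual entries of the timbre, speaker, and level tables are those of the patent figures, which are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical table contents, chosen only for illustration.
TIMBRE_TABLE = {"underline": "raise_f0", "bold": "low_pass"}  # cf. timbre table 307
SPEAKER_TABLE = {"mincho": 0, "gothic": 1}                    # cf. speaker table 308
LEVEL_TABLE = {10: 0.5, 12: 0.7, 24: 1.0}                     # cf. level table 309


@dataclass(frozen=True)
class CharAttributes:
    typeface: str
    size_pt: int
    decorations: tuple = ()


@dataclass(frozen=True)
class VoiceControl:
    speaker_index: int
    level: float
    processing: tuple


def voice_control(attrs: CharAttributes) -> VoiceControl:
    """Build sound quality control data for one character from its attributes."""
    # Keep only the decorations that have a processing method in the table.
    processing = tuple(
        TIMBRE_TABLE[d] for d in attrs.decorations if d in TIMBRE_TABLE
    )
    return VoiceControl(
        speaker_index=SPEAKER_TABLE.get(attrs.typeface, 0),  # assumed default speaker
        level=LEVEL_TABLE.get(attrs.size_pt, 0.7),           # assumed default level
        processing=processing,
    )
```

In this sketch the returned record plays the role of the sound quality control data handed to the acoustic parameter generation stage; the fallback values for unknown typefaces and sizes are a design choice of the sketch, not something the description specifies.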

[0023] The procedure for synthesizing speech information from text and obtaining a speech signal is described in detail in, for example, J. Allen, M. S. Hunnicutt and D. Klatt, "From text to speech: The MITalk system" (Cambridge University Press, 1987).

[0024] As a way of obtaining speech from text, however, it is also possible to use the so-called record-and-playback method, in which words, sentences, and the like in the text are associated in advance with speech signals of those words or sentences being spoken, and the corresponding speech signal is output for each word that appears.

[0025] FIG. 4 is an example of the timbre table in FIG. 3. In FIG. 4, a different processing method, such as changing the fundamental frequency of the voice or applying low-pass filtering, is associated with each character attribute such as underline or bold, so that the timbre of the voice changes with the character attribute.
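As one illustration of a processing method that such a table entry might name, the following Python sketch applies a first-order low-pass filter to a list of samples. The description only names low-pass filtering as a category; the particular filter, the function name, and the cutoff and sample-rate parameters are assumptions made for this sketch.

```python
import math


def one_pole_low_pass(samples, cutoff_hz, sample_rate_hz):
    """Smooth `samples` with a first-order (one-pole) low-pass filter."""
    # Standard RC-filter coefficient: alpha = dt / (RC + dt).
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)

    filtered = []
    previous = 0.0
    for sample in samples:
        # Move the output a fraction alpha of the way toward the input.
        previous += alpha * (sample - previous)
        filtered.append(previous)
    return filtered
```

A lower cutoff gives a duller voice, so different attributes could in principle be given different cutoffs; that pairing would be a table design decision.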

[0026] FIG. 5 is an example of the speaker table in FIG. 3. In FIG. 5, a different speaker is associated with each typeface, and a change of typeface is conveyed by changing the speaker. The speaker index in FIG. 5 is the number of the speaker data, as shown in FIG. 6, contained in the segment dictionary of FIG. 3; the sound quality control processing block of FIG. 3 selects the speaker corresponding to the extracted typeface from the segment dictionary. Alternatively, as shown in FIG. 7, different speakers may be associated with character attributes so that differences in character attributes are conveyed by differences in speaker.

[0027] FIG. 8 is an example of the level table in FIG. 3. In FIG. 8, character sizes are associated with output levels so that sound is output at a level that depends on the character size, thereby conveying differences in character size. The output level need not be in constant ratio to the difference in character size; levels may instead be chosen to match the perceived difference in loudness.
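One way to choose levels that are not simply proportional to character size is to work on a decibel scale. The Python sketch below is an assumption-laden illustration: the reference size of 12 points and the step of 3 dB per doubling of size are hypothetical values, not taken from FIG. 8.

```python
import math


def size_to_gain(size_pt, reference_pt=12.0, db_per_doubling=3.0):
    """Return a linear gain for a character size.

    Each doubling of the size relative to `reference_pt` adds
    `db_per_doubling` decibels (both values are assumptions).
    """
    gain_db = db_per_doubling * math.log2(size_pt / reference_pt)
    return 10.0 ** (gain_db / 20.0)
```

With these defaults a 24-point character is played about 1.41 times louder than a 12-point one rather than twice as loud, which shows how the level steps can be compressed relative to the size steps.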

[0028] Next, the processing portion that generates sound from text is described. FIG. 9 shows the configuration of a sound generation processing block that outputs a sound with a different timbre for each character attribute. In FIG. 9, a character attribute extraction processing block 901 extracts from the input text, in the same way as in FIG. 3, the character attributes of the character at the reading position detected by a reading position detection processing block 902. A sound quality control processing block 903 creates sound quality control data from the extracted attributes using a sound table 904 that associates character attributes with sounds in advance and, as in FIG. 3, a level table 905 that associates character sizes with levels as shown in FIG. 8. An acoustic parameter generation processing block 906 selects from sound data 907 the sound that the sound table 904 associates with the character attribute, and generates acoustic parameters with the level adjusted. A signal generation processing block 908 generates a sound signal from the acoustic parameters.

[0029] FIG. 10 is an example of the sound table in FIG. 9. A different sound is associated with each character attribute, and differences in character attributes are conveyed by outputting a different sound for each attribute.

[0030] FIG. 11 is another example of the sound table in FIG. 9. In this example, pure tones, musical instrument sounds, band noise, and the like are associated as the sounds, and the timbre can be changed by changing the fundamental frequency or the center frequency. In this case, if a known input/output interface that lets the user change the timbre is provided, the timbre of each sound can be changed to suit the user's preference.
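A sound table keyed by pure-tone frequency, together with a user override, might look like the following Python sketch. The table entries, function names, sample rate, and amplitude are hypothetical and are not the contents of FIG. 11.

```python
import math

# Hypothetical sound table: character attribute -> pure-tone frequency in Hz.
SOUND_TABLE = {"underline": 440.0, "bold": 880.0}


def pure_tone(freq_hz, duration_s=0.1, sample_rate_hz=8000, amplitude=0.5):
    """Generate the samples of a sine tone at `freq_hz`."""
    count = round(duration_s * sample_rate_hz)
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(count)
    ]


def set_user_frequency(table, attribute, freq_hz):
    """Return a copy of `table` in which `attribute` uses the user's frequency."""
    updated = dict(table)
    updated[attribute] = freq_hz
    return updated
```

Returning a modified copy rather than editing the table in place keeps the default assignments available, which is convenient if the user later wants to reset them.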

[0031] The sound quality processing methods of FIG. 4, the speakers of FIGS. 5 and 7, the levels of FIG. 8, and the sounds shown in FIGS. 10 and 11 may likewise be made selectable according to the user's preference, and it may be made possible to register new ones.

[0032] The processing that generates and outputs voice and sound from input text, implementing the algorithm of the text-to-speech method of the present invention, is now described. FIG. 12 is an example of voice and sound output control processing. For the input text, the voice and sound generated by a rule-based speech synthesis processing block 1201 and a sound generation processing block 1202, such as those shown in FIGS. 3 and 9, are synchronized in a synchronization control processing block 1203 and output simultaneously for each character. The rule-based speech synthesis processing block 1201 may, however, consist of a procedure based on a conventional rule-based speech synthesis method that does not take character attributes into account, such as that described in J. Allen, M. S. Hunnicutt and D. Klatt, "From text to speech: The MITalk system" (Cambridge University Press, 1987).
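The per-character synchronization can be sketched in Python as follows, under the assumption that the speech and the attribute sound have already been cut into one list of samples per character. The function name and this segment representation are illustrative; the description does not say how block 1203 is implemented.

```python
def mix_per_character(voice_segments, cue_segments):
    """Overlay each character's cue sound on that character's speech.

    Both arguments are lists with one list of samples per character.
    The shorter of the two segments is padded with silence so that the
    voice and the cue for a character always start together.
    """
    if len(voice_segments) != len(cue_segments):
        raise ValueError("need one cue segment per voice segment")

    output = []
    for voice, cue in zip(voice_segments, cue_segments):
        length = max(len(voice), len(cue))
        for i in range(length):
            v = voice[i] if i < len(voice) else 0.0
            c = cue[i] if i < len(cue) else 0.0
            output.append(v + c)
    return output
```

Padding to the longer segment means a long cue delays the next character slightly; truncating the cue instead would be an equally plausible choice.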

[0033] FIG. 13 is an example of stereophonic sound generation processing that implements the text-to-speech method of the present invention using stereophonic sound generation means. When digitized text, like a structured document, expresses visual variation by varying the display position of characters through columns, text alignment, line spacing, indentation, and so on, such changes in display position can be conveyed by using stereophonic sound generation means when the characters are read aloud. In FIG. 13, a character coordinate extraction processing block 1301 extracts, for the character at the reading position detected in the input text by a reading position detection processing block 1302, the relative coordinates of that character within the display area of the input digitized text. Digitized text whose character positions have been set for display or printing already carries information such as left and right margins, column widths, and line spacing, so the character coordinates can be extracted easily from this position data. A sound image control processing block 1303 performs control for mapping the sound image to the corresponding position in the stereophonic acoustic space on the basis of the extracted character coordinates. A stereophonic sound generation processing block 1304 generates stereophonic sound so that the voice and sound generated by the voice/sound output control processing block 1305 shown in FIG. 12 are localized at the position in the acoustic space corresponding to the position of the character being read. A signal generation processing block 1306 generates and outputs the stereophonic voice and sound signals, which are heard from a position in three-dimensional space. Sound image control and stereophonic sound generation can be implemented using, for example, the spatial sound image localization technique described in Komiyama, "Sound Image Control Technology" (Journal of the Institute of Television Engineers of Japan, Vol. 46, No. 9, 1076-1079, 1992).
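As a much simpler stand-in for the spatial sound image localization cited above, the following Python sketch places a mono signal between two speakers according to the horizontal character coordinate, using constant-power panning. It covers only the left-right axis, and the function names and the choice of panning law are assumptions of this sketch, not the method of the cited reference.

```python
import math


def pan_gains(x, width):
    """Return (left, right) gains for horizontal position x in [0, width]."""
    position = min(max(x / width, 0.0), 1.0)  # 0.0 = far left, 1.0 = far right
    angle = position * math.pi / 2.0
    # cos^2 + sin^2 = 1, so the total power is the same at every position.
    return math.cos(angle), math.sin(angle)


def localize(samples, x, width):
    """Turn mono samples into (left, right) pairs panned to position x."""
    left, right = pan_gains(x, width)
    return [(s * left, s * right) for s in samples]
```

For example, `localize(voice, x1, display_width)` would make a character near the left margin sound mostly from the left speaker. Conveying the vertical coordinate as well requires a true three-dimensional technique such as the one the description cites.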

[0034] As the text input means for executing the processing algorithm that implements the text-to-speech method of the present invention, a known method can be used for loading digitized text stored in advance in the disk device 105, or text input via the input device 109 or the communication port 110, as described for FIG. 1 or FIG. 2. The voice and sound generated by this text-to-speech method, or the stereophonically rendered voice and sound, can be output through the acoustic device 111 or the stereophonic acoustic device 201 using a known method.

[0035] FIG. 14 is an example of a digitized text document having character attributes, columns, and the like. Character attributes such as character size, typeface, italics, and underlining differ between the title, chapter headings, and body text. The position of text is determined by the column layout and paragraph divisions. When tables or figures are included, body text is normally placed so as to avoid them, so the placement of text also varies with these conditions. The figure shows that the position (x1, y1) in the display coordinate system is currently being read. By localizing the sound output at the position in the acoustic space corresponding to the reading position (x1, y1), using the method described for FIG. 13, the structure of the table and the position of each item can be distinguished.

[0036] FIG. 15 shows the correspondence between the characters of the document shown in FIG. 14 and their character attributes. On the basis of this correspondence, the character attribute extraction processing blocks in FIGS. 3 and 9 extract the character attributes of the character to be read.
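A correspondence of this kind could be stored as runs of consecutive characters that share the same attributes. The Python sketch below assumes such a run-based layout; the actual form of the correspondence in FIG. 15 is not reproduced here, and `RUNS` and `attributes_at` are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical attribute runs: (index of first character, attributes).
# Each run extends up to the start of the next one.
RUNS = [
    (0, {"typeface": "gothic", "size_pt": 24, "underline": False}),
    (12, {"typeface": "mincho", "size_pt": 12, "underline": False}),
    (40, {"typeface": "mincho", "size_pt": 12, "underline": True}),
]


def attributes_at(runs, char_index):
    """Return the attributes of the run that contains `char_index`."""
    if char_index < 0:
        raise IndexError("character index must be non-negative")
    starts = [start for start, _ in runs]
    run = bisect_right(starts, char_index) - 1
    if run < 0:
        raise IndexError("no attribute run covers this character")
    return runs[run][1]
```

Given the reading position as a character index, `attributes_at(RUNS, index)` returns the attributes that the extraction block would pass on to sound quality control.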

[0037] FIG. 16 is an example of associating a display position with an acoustic space. The figure shows a display space 1601, such as that of a CRT display device, and a user 1602 in an overhead view from diagonally behind. The coordinates (x1, y1) of the character 1603 to be read in the display space 1601 are associated with the coordinates (u, v) 1605 in an acoustic space 1604, and the voice or sound assigned to that character is localized in the acoustic space.
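The simplest association between the two spaces is a linear scaling of each axis. The Python sketch below assumes both spaces are rectangles with a common corner at the origin; the function name and the size tuples are illustrative.

```python
def display_to_acoustic(x, y, display_size, acoustic_size):
    """Map display coordinates (x, y) to acoustic coordinates (u, v).

    `display_size` and `acoustic_size` are (width, height) tuples; each
    axis is scaled independently (a linear mapping is an assumption).
    """
    display_width, display_height = display_size
    acoustic_width, acoustic_height = acoustic_size
    u = x / display_width * acoustic_width
    v = y / display_height * acoustic_height
    return (u, v)
```

For instance, the centre of a 640 by 480 display maps to the centre of a 2.0 by 1.0 acoustic space.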

[0038] If the acoustic space is arranged so that it coincides exactly with the display space, the position of an object in the display space and the position of its sound image in the acoustic space coincide exactly. If the acoustic space is made larger than the display space, the resolution of sound image localization is improved in relative terms. The display space and the acoustic space can be associated, for example as shown in FIG. 17, by taking the midpoint between the user's ears as an origin 1701 and assigning the coordinates of the sound image 1705 corresponding to the object in the acoustic space 1704 to the position, at a constant ratio, on the extension of the straight line that joins the origin and the coordinates of the reading position 1703 in the display space 1702. The mapping onto the acoustic space may be deformed to suit the characteristics of hearing. Although FIG. 17 was described using a two-dimensional space as an example, this association can of course be extended easily to three dimensions.
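The association of FIG. 17 amounts to scaling the vector from the origin to the reading position by a constant ratio. A minimal Python sketch follows, with `ratio` as a hypothetical parameter name; it works for two or three dimensions because it treats the points as coordinate tuples of any length.

```python
def project_to_acoustic_space(origin, display_point, ratio):
    """Return the point on the ray from `origin` through `display_point`
    whose distance from the origin is `ratio` times that of `display_point`.
    """
    return tuple(o + ratio * (p - o) for o, p in zip(origin, display_point))
```

A ratio of 1 makes the acoustic space coincide with the display space, and a ratio greater than 1 spreads the sound images over a larger area, which corresponds to the case of an acoustic space larger than the display space.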

[0039] FIG. 18 is an example of a table in digitized text. For a table, the characters in the table, the number of rows, the number of columns, and so on are read out in a predetermined order, and here too the sound quality or speaker of the voice is changed, or a sound is output, according to the character attributes. By localizing the sound output at the position in the acoustic space corresponding to the reading position (x2, y2), using the method described for FIG. 13, the structure of the table and the position of each item can be distinguished. The display position can be associated with the acoustic space in the same way as for the document of FIG. 14.

[0040] FIG. 19 shows the correspondence between the characters of the table shown in FIG. 18 and their character attributes. Based on this correspondence, the character attribute extraction processing blocks in FIGS. 3 and 9 extract the character attributes of the characters to be read out.

[0041] FIG. 20 shows an example of a figure in digitized text. For a figure as well, the characters in the figure, and speech or the like associated in advance with the shapes in the figure, are read out in a predetermined order; here too, the voice quality or the speaker is changed, or a sound is output, according to the character attributes. In addition, by localizing the sound output at the position in the acoustic space corresponding to the reading position (x3, y3), using the method described with reference to FIG. 13, the listener can discern the structure of the figure and the position of the text within it. The display position can be mapped to the acoustic space by the same method as for the document of FIG. 14.

[0042] FIG. 21 shows the correspondence between the characters of the figure shown in FIG. 20 and their character attributes. Based on this correspondence, the character attribute extraction processing blocks in FIGS. 3 and 9 extract the character attributes of the characters to be read out.

[0043] During reading, points where the reading position changes abruptly, such as a line break, a column break, or a page break in the middle of the text, may be detected and a specific sound output at that moment, so that changes of line, column, page, and so on can also be distinguished by the type of sound. Likewise, when a table is being read and reading crosses a ruled line into the next cell, outputting a specific sound lets the move between cells be recognized by the type of sound.
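
The boundary detection described above can be sketched as a comparison of consecutive reading positions. The `ReadingPosition` fields, the function name, and the cue names are all hypothetical; the patent only says that a specific sound is output at each kind of boundary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadingPosition:
    page: int
    column: int
    line: int


def boundary_cue(prev: ReadingPosition, cur: ReadingPosition) -> Optional[str]:
    """Return a cue name for the coarsest boundary crossed, or None.

    Checking page first, then column, then line means a page change
    produces only the page cue, even though the line also changed.
    """
    if cur.page != prev.page:
        return "page_cue"
    if cur.column != prev.column:
        return "column_cue"
    if cur.line != prev.line:
        return "line_cue"
    return None
```

Reporting only the coarsest boundary is a design choice made for this sketch; playing several cues in sequence would be equally consistent with the text.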

[0044] FIG. 22 shows an example in which differences between the pages of a document are expressed by stereophonic sound. As shown in FIG. 22, the layout within one page is mapped to the X-Y coordinates and the page number to the Z coordinate; by moving the position from which the sound is heard forward or backward according to the page containing the character 2201 being read, the page currently being read can be identified. In the illustrated example, the sound image is localized so that page 2 is farther from the user than page 1, and page 3 farther than page 2.
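
As a rough illustration of the page-to-depth mapping in FIG. 22, the sketch below keeps the in-page position as X-Y and derives Z from the page number. The linear spacing, the `page_spacing` parameter, and the convention that larger Z means farther from the listener are assumptions made for the example.

```python
from typing import Tuple


def page_sound_image(x: float, y: float, page: int,
                     page_spacing: float = 0.5) -> Tuple[float, float, float]:
    """Return (x, y, z) for a character at (x, y) on the given page.

    Page 1 is placed at z = 0 and each later page is page_spacing
    farther from the listener (larger z = farther away).
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    z = (page - 1) * page_spacing
    return (x, y, z)
```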

[0045] As shown in FIG. 23, the position of the user 2301 may be detected by a position sensor 2302, and the sound image moved in accordance with the movement of the user relative to the position of the character 2303 being read, so that the position from which the sound image is heard can be adjusted. In doing so, the distance moved by the sound image may deliberately be made different from the distance moved by the user. For example, if the sound image is moved farther than the user moves, then when the user leans forward or makes a similar movement, the sound is heard louder and from closer, which draws the user's attention.
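
One way to picture the exaggerated movement in FIG. 23 is to scale the listener's approach by a gain before computing the distance to the sound image. The one-dimensional model, the parameter names, and the lower bound `min_distance` are assumptions for this sketch; the patent does not specify how the two distances are related.

```python
def perceived_distance(base_distance: float, user_approach: float,
                       gain: float = 2.0, min_distance: float = 0.1) -> float:
    """Distance from the listener to the sound image after the listener moves.

    user_approach > 0 means the listener moved toward the text by that amount.
    gain = 1.0 reproduces the physical geometry; gain > 1.0 makes the sound
    image approach faster than the listener actually moved.
    """
    # Clamp so the sound image never passes through the listener.
    return max(min_distance, base_distance - gain * user_approach)
```

A renderer could then derive loudness from this distance, so that leaning in produces a noticeably louder and closer voice.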

[0046] FIG. 24 shows an example of reading out a window system. By outputting voice or sound in response to system operations of the window system, such as menu operations and the selection or movement of commands, files, windows, and icons, the results of those operations can be made easier to understand. Here, 2401, 2402, 2403, 2404, and 2405 denote a window, an icon, a menu, a clock, and a cursor, respectively. 2406 denotes a file manager, which performs file searches, application launches, and the like when a folder, file, or similar item is selected with the cursor. 2407 denotes a window name. Suppose the "save" command in the menu 2403 is now selected. At this time, the voice or sound corresponding to the text assigned to the command is localized at the position in the acoustic space corresponding to the command's coordinates (x4, y4) in the display space, so that the position and content of the command can be discerned.

[0047] FIG. 25 shows an example of how the components of the window system are associated with voice and sound. For a window, for example, the melody associated with the window is output whenever the window is displayed, and when the window is selected its name is read out together with the melody. For a menu, a click sound is output while the cursor is moving over the menu, and when the cursor stops, the command name at the stopping position is read out. For the clock, the sound of a wall clock is output when the cursor stops on the clock, and the time is read out when the clock is selected. In the file manager, a file produces a percussion sound when the cursor passes over it, and its file name is read out when the cursor stops on it. For a folder, the sound is varied, for example, between when the cursor passes over it and when it is selected. These associations make it possible to discern the content, position, and state of the window system's components.
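
The associations listed for FIG. 25 lend themselves to a lookup table keyed by component and event. The sketch below is one possible encoding; the key strings, the `("sound" | "speech", content)` tuple shape, and the `{name}` / `{time}` placeholders are hypothetical. Folders are omitted because the text only says their sound differs between passing and selection.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (kind, content), where kind is "sound" or "speech"

AUDIO_MAP: Dict[Tuple[str, str], List[Action]] = {
    ("window", "displayed"): [("sound", "melody")],
    ("window", "selected"): [("sound", "melody"), ("speech", "{name}")],
    ("menu", "cursor_moving"): [("sound", "click")],
    ("menu", "cursor_stopped"): [("speech", "{name}")],
    ("clock", "cursor_stopped"): [("sound", "wall_clock")],
    ("clock", "selected"): [("speech", "{time}")],
    ("file", "cursor_passing"): [("sound", "percussion")],
    ("file", "cursor_stopped"): [("speech", "{name}")],
}


def audio_actions(component: str, event: str, **fields: str) -> List[Action]:
    """Look up the actions for an event and fill in any placeholders."""
    actions = AUDIO_MAP.get((component, event), [])
    return [(kind, content.format(**fields)) for kind, content in actions]
```

For example, `audio_actions("menu", "cursor_stopped", name="save")` yields a single speech action carrying the command name, matching the "save" case of FIG. 24.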

[0048] FIG. 26 shows an example in which hypertext whose text has a hierarchical structure, as in the World-Wide-Web (an information service using hypertext technology, developed at CERN, the European particle physics laboratory) or Gopher (a menu-driven system for searching computer network resources, developed at the University of Minnesota), is expressed by stereophonic sound. In hypertext, call points called anchors are provided in the text; when an anchor is selected, for example with a mouse, another text, figure, video, sound, or the like associated with the anchor in advance is retrieved and presented. The association is not limited to the same computer and can be made by linking to various access points over a network. In FIG. 26, 2601 denotes the top-level hypertext and 2602 an anchor in that hypertext. When the user selects the anchor 2602, a new text, figure, video, sound, or the like is normally presented on the screen. 2603 denotes a second hypertext newly displayed in this way. 2604 denotes an anchor in the second hypertext 2603, and 2605 the third hypertext opened by selecting that anchor. When such digitized text is read out, applying the present invention to vary the manner of reading or the accompanying sound between body text and anchors lets the listener recognize the presence of an anchor. Furthermore, the hypertext hierarchy is mapped to the depth direction of the acoustic space. When the user selects an anchor, the reading voice of the newly opened hypertext, the sound attached to its text, or the sound linked to the anchor is made to be heard from nearer the listener along the Z axis, which corresponds to depth, with, for example, the X-Y coordinates of the anchor in the originating hypertext taken as the center of the new hypertext or sound in the X-Y plane. This allows the hierarchical structure of the hypertext to be discerned.
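
The depth mapping for hypertext layers can be sketched as follows: the newly opened layer is centered on the anchor's X-Y position and moved one step nearer along Z. The step size, the clamp at `nearest_z`, and the convention that smaller Z means nearer the listener are assumptions introduced for this example.

```python
from typing import Tuple


def child_sound_image(anchor_xy: Tuple[float, float], parent_z: float,
                      depth_step: float = 0.5,
                      nearest_z: float = 0.0) -> Tuple[float, float, float]:
    """Position for the audio of the hypertext opened from an anchor.

    The new layer keeps the anchor's X-Y coordinates as its center and
    is placed depth_step nearer the listener than its parent layer,
    but never nearer than nearest_z.
    """
    x, y = anchor_xy
    z = max(nearest_z, parent_z - depth_step)
    return (x, y, z)
```

Applying the function repeatedly, once per selected anchor, gives each deeper layer of the hierarchy a progressively nearer sound image.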

【００４９】[0049]

[Effects of the Invention] By reading out text using synthesized speech whose voice quality differs for each class of character attribute, such as the font, underlining, or size of the characters in the text, character attributes can be conveyed by the type of voice.

[0050] In addition, by outputting, simultaneously with the reading voice, a sound whose quality differs for each class of character attribute, such as the font, underlining, or size of the characters in the text, character attributes can be conveyed by sound.

[0051] Furthermore, depending on the position or state of an object, the character attributes of the text assigned to it, and so on, the object is read out by voice, a sound is output, or both are done; this makes it possible to convey differences in the position or state of the object and in the character attributes of its assigned text.

[0052] Moreover, means are provided for mapping the position of the character currently being read, relative to the whole text, into the acoustic space and presenting the voice or sound so that it is heard from the direction corresponding to that relative position; the relative position of the character currently being read within the whole text can thereby be conveyed.

【００５３】[0053]

[Brief description of the drawings]

FIG. 1 is a diagram showing an embodiment of a hardware configuration for implementing the method of the present invention.

FIG. 2 is a diagram showing an embodiment of a hardware configuration for stereophonic sound output.

FIG. 3 is a diagram showing an example of rule-based speech synthesis processing.

FIG. 4 is a diagram showing an example of a tone color table.

FIG. 5 is a diagram showing an example of a speaker table.

FIG. 6 is a diagram showing an example of speaker data.

FIG. 7 is a diagram showing an example of a speaker table in which a different speaker is associated with each character attribute.

FIG. 8 is a diagram showing an example of a level table.

FIG. 9 is a diagram showing an example of sound generation processing.

FIG. 10 is a diagram showing an example of a sound table.

FIG. 11 is a diagram showing another example of a sound table.

FIG. 12 is a diagram showing an example of voice and sound output control processing.

FIG. 13 is a diagram showing an example of stereophonic sound generation processing.

FIG. 14 is a diagram showing an example of a digitized text document having character attributes, multiple columns, and the like.

FIG. 15 is a diagram showing an example of the correspondence between the characters of a document and character attributes.

FIG. 16 is a diagram showing an example in which the sound image for a character position in the display space is localized in the acoustic space.

FIG. 17 is a diagram showing an example of mapping the display space to the acoustic space.

FIG. 18 is a diagram showing an example of a table in digitized text.

FIG. 19 is a diagram showing an example of the correspondence between the characters of a table and character attributes.

FIG. 20 is a diagram showing an example of a figure in digitized text.

FIG. 21 is a diagram showing an example of the correspondence between the characters of a figure and character attributes.

FIG. 22 is a diagram showing an example in which differences between the pages of a document are expressed by stereophonic sound.

FIG. 23 is a diagram showing an example in which differences in the user's position are expressed by stereophonic sound.

FIG. 24 is a diagram showing an example of reading out a window system.

FIG. 25 is a diagram showing an example of the voice and sound associated with window system components.

FIG. 26 is a diagram showing an example in which hypertext is expressed by stereophonic sound.

[Explanation of symbols]

106: system bus, 112: bus, 1601: display space, 1602: user, 1603: character to be read, 1604: acoustic space, 160: coordinates, 1701: origin, 1702: display space, 1703: reading position, 1704: acoustic space, 1705: sound image, 2301: user, 2302: position sensor, 2303: character to be read, 2401: window, 2402: icon, 2403: menu, 2404: clock, 2405: cursor, 2406: file manager, 2407: window name, 2601, 2603, 2605: hypertext, 2602, 2604: anchor.

Continued from the front page

(72) Inventor: 菊池英明 (Hideaki Kikuchi), c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: ▲高▼橋寿一, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(56) References: JP-A-7-72875 (JP, A); JP-A-5-197517 (JP, A); JP-A-5-80958 (JP, A); JP-A-5-81294 (JP, A); JP-A-6-259216 (JP, A); JP-A-7-129356 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G06F 3/16; G06F 3/00; G10L 11/00; G10L 13/06

Claims

(57) [Claims]

1. A text-to-speech apparatus comprising: reading means for reading out, by voice, a text composed of a plurality of characters; means for extracting, from layout information associated with the text to be read out, the position of the character being read out relative to the whole text; sound image localization means for mapping the sound to a position in an acoustic space corresponding to said position; and means for generating stereophonic sound such that the voice of the reading means is localized at the sound image localization position.

2. A text-to-speech apparatus comprising: means for reading out, by voice, hypertext that has a hierarchical structure and includes, in its body text, call points associated with other text, figures, video, or sound contained in another layer; sound image localization means capable of outputting the voice so that it is heard from a plurality of positions; and means for controlling the sound image localization means so that the manner of reading the body text differs from the manner of reading the call points.

3. A text-to-speech apparatus according to claim 2, wherein said sound image localization means has means for outputting an acoustic signal having depth, and wherein the hierarchy of said hypertext is made to correspond to said depth.

4. A window system capable of displaying a plurality of windows on a screen and displaying information in each window, the window system comprising: text-to-speech means for reading out text by voice; sound output means for outputting sound; and a cursor that can be moved on the screen and can select an object displayed on the screen, wherein each window has an area in which a command name is displayed, the sound output means outputs a sound when the cursor passes through the area in which the command name is displayed, and the text-to-speech means reads out the command name when the cursor stops moving within the area in which the command name is displayed.