JP2000330584A

JP2000330584A - Voice synthesis device, voice synthesis method, and voice communication device

Info

Publication number: JP2000330584A
Application number: JP11138525A
Authority: JP
Inventors: Toshio Nakajima; 利男中島
Original assignee: Toppan Printing Co Ltd
Current assignee: Toppan Inc
Priority date: 1999-05-19
Filing date: 1999-05-19
Publication date: 2000-11-30

Abstract

(57) [Summary] [Problem] To provide a voice communication device that shortens data transmission time by reducing the amount of data transmitted, thereby cutting communication costs and shortening the time until reproduction; and to provide a voice synthesis device and a voice synthesis method that can synthesize voice from attribute data alone, which carries little information. [Solution] The device comprises voice waveform pattern storage means 24, which stores a voice waveform pattern (the waveform pattern of a voice) for each voice type, and voice synthesis means 23, which retrieves from the storage means 24 the voice waveform pattern corresponding to pronunciation data f specifying the pattern type, assigns to the retrieved pattern the waveform length specified by scale data e (which specifies the length of one waveform of the pattern), and outputs the result.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to a voice synthesis device and a voice synthesis method for reproducing voice data sent over a communication link, as in a communication karaoke device, and to a voice communication device to which they are applied.

【０００２】[0002]

[Prior Art] A conventional voice communication device of this kind is described with reference to FIGS. 11 to 13.

[0003] FIG. 11 is a block diagram showing an example of the overall configuration of such a conventional voice communication device.

[0004] The conventional voice communication device 100 comprises a transmitting device 101, which transmits voice data A as a digital signal, and a receiving/reproducing device 110, which receives the voice data A, performs the necessary data conversion on it, and outputs it as sound.

[0005] The voice data A transmitted from the transmitting device 101 is sent to the receiving/reproducing device 110 via a communication line 102 such as a telephone line.

[0006] FIG. 12 is a block diagram showing an example of the functional configuration of the receiving/reproducing device 110.

[0007] The voice data A is received by the modem 111 of the receiving/reproducing device 110, converted into an analog signal by the D/A converter (digital-to-analog converter) 112, amplified by the amplifier 113, and then output as sound from the output device 114. A voice communication device 100 configured in this way is also used as a communication karaoke device, as follows.

[0008] When the voice communication device 100 is used as a communication karaoke device, the voice data A transmitted from the transmitting device 101 serves as the backing chorus. In addition, music data B, corresponding to the accompaniment of the song, and image data C, corresponding to the lyrics shown on the screen, must also be transmitted.

[0009] FIG. 13 is a block diagram showing an example of the configuration of the receiving/reproducing device 120 when such a conventional voice communication device is applied to a communication karaoke device.

[0010] When the voice communication device shown in FIG. 11 is used as a communication karaoke device, its receiving/reproducing device 120 has, in addition to the configuration of the receiving/reproducing device 110 shown in FIG. 12, a selector 123 that separates the transmission data received by the modem 111 into voice data A, music data B, and image data C. The functional configuration that outputs the voice data A separated by the selector 123 is the same as in FIG. 12.

[0011] The music reproducing unit 126 and the image display unit 127, both connected to the selector 123, output the music data B and the image data C, respectively, after the selector 123 has separated them.

[0012] The music reproducing unit 126 comprises a decoder 124a, which decompresses the compressed music data B separated by the selector 123 into digital data; a sequencer 125, which reproduces the song from the data decompressed by the decoder 124a; an amplifier 113b, which amplifies the reproduced song; and an output device 114b, which outputs the amplified song.

[0013] The image display unit 127 comprises a decoder 124b, which expands the image data C separated by the selector 123, and a display device 128, which displays the data expanded by the decoder 124b.

[0014] As described above, when the voice communication device shown in FIG. 11 is used as a communication karaoke device, voice data A, music data B, and image data C are sent from the transmitting side to the receiving side. These data are normally transmitted as a single package and received together by the modem 111 of the receiving/reproducing device 120. The selector 123 is a separating device that extracts the voice data A, the music data B, and the image data C individually from this package data containing the three kinds of signal.

[0015] The voice data A separated by the selector 123 is converted to analog by the D/A converter 112, combined and amplified by the amplifier 113a, and output from the output device 114a as backing-chorus sound. The voice data A is not compressed; the transmitting device 101 sends it as original sound data.

[0016] The music data B, by contrast, is transmitted as compressed note data in a format defined by the MIDI (Musical Instrument Digital Interface) standard. The music data B separated by the selector 123 is decompressed by the decoder 124a and sent to the sequencer 125. The sequencer 125 reproduces the song from the decompressed data, and the reproduced song is amplified by the amplifier 113b and output from the output device 114b.

[0017] The image data C is transmitted from the transmitting side as moving-image data in MPEG format. After being received by the modem 111 and separated by the selector 123, the image data C is sent to the decoder 124b, expanded, and displayed on the display device 128.

【００１８】[0018]

[Problems to Be Solved by the Invention] Such conventional voice communication devices, however, have the following problems.

[0019] In the conventional voice communication device, the voice data A is sent from the transmitting side to the receiving side as uncompressed original sound data. When such a device is used as a karaoke device and image data C for displaying lyrics or a background picture is sent along with it, that image data C is transmitted as MPEG moving-image data.

[0020] Because the original sound data is uncompressed and MPEG moving-image data is inherently large, the volume of data transmitted becomes enormous. This not only lengthens communication time but also delays reproduction on the receiving side, so a high-speed line such as ISDN is required, which is a disadvantage in communication cost as well.

[0021] Meanwhile, demand has grown in recent years for communication karaoke that is inexpensive and can spread widely to ordinary households. A specification that requires a high-speed line such as ISDN, however, is difficult to popularize.

[0022] The present invention was made in view of these circumstances. Its first object is to provide a voice communication device that shortens data transmission time by reducing the amount of data transmitted, thereby cutting communication costs and shortening the time until reproduction.

[0023] Its second object is to provide a voice synthesis device and a voice synthesis method that can synthesize voice from attribute data alone, which carries little information.

【００２４】[0024]

[Means for Solving the Problems] To achieve the above objects, the present invention adopts the following means.

[0025] The invention of claim 1 comprises voice waveform pattern storage means, which stores a voice waveform pattern (the waveform pattern of a voice) for each voice type, and voice synthesis means, which retrieves from the storage means the voice waveform pattern corresponding to pronunciation data specifying the pattern type, assigns to the retrieved pattern the waveform length specified by scale data (which specifies the length of one waveform of the pattern), and outputs the result.

[0026] The voice synthesis device of claim 1 can therefore reproduce a voice from its attributes simply by receiving as input the scale data and pronunciation data needed to reproduce it. When voice data is to be transmitted, for example, the amount of communication data is smaller than when the voice data itself is sent. Applying the invention to a voice communication device, for instance, thus makes it possible to shorten communication time and cut communication costs.

[0027] In the invention of claim 2, the voice synthesis means of the voice synthesis device of claim 1 outputs the voice waveform pattern, with the specified waveform length assigned, for a sounding duration and at a volume based on sounding time data, which specifies how long the pattern is output repeatedly, and volume data, which specifies the magnitude of the output.

[0028] By using sounding time data and volume data in addition to scale data and pronunciation data, the voice synthesis device of claim 2 can therefore reproduce not only the voice itself but also its length and volume, without the voice data being transmitted.

[0029] The invention of claim 3 is the voice synthesis device of claim 1 in which the voice waveform pattern storage means is a storage medium that can be attached to and detached from the main body of the voice synthesis device.

[0030] With such a removable storage medium, the voice synthesis device of claim 3 can be supplied, as needed, with additional voice waveform data required for voice reproduction, which strengthens the voice synthesis function.

[0031] In the invention of claim 4, the voice waveform pattern corresponding to pronunciation data specifying the pattern type is retrieved from voice waveform pattern storage means, which stores a voice waveform pattern (the waveform pattern of a voice) for each voice type; the waveform length specified by scale data, which specifies the length of one waveform of the pattern, is assigned to the retrieved pattern; and the result is output.

[0032] The voice synthesis method of claim 4 can therefore receive the scale data and pronunciation data needed to reproduce a voice and reproduce the voice from those attributes. As a result, the amount of communication data is smaller than when the voice data itself is transmitted, which shortens communication time and cuts communication costs.

[0033] The invention of claim 5 comprises: receiving means for receiving sound attribute data, music data specifying the notes that make up a song, and image data making up an image, where the sound attribute data consists of pronunciation data specifying the type of voice waveform pattern (the waveform pattern of a voice), scale data specifying the length of one waveform of the pattern, sounding time data specifying how long the pattern is output repeatedly, and volume data specifying the magnitude of the output; the voice synthesis device of claim 2, which takes the received sound attribute data as input and outputs the corresponding voice; music output means for outputting a song from the received music data; and image output means for outputting an image from the received image data.

[0034] The voice communication device of claim 5 therefore not only reduces the amount of communication data compared with transmitting the voice data itself, shortening communication time and cutting communication costs, but can also output music and images together with the voice. It can thus be applied, for example, as a communication karaoke device or an English conversation learning device.

[0035] In the invention of claim 6, the voice communication device of claim 5 further comprises received data storage means capable of storing sound attribute data, music data, and image data. The sound attribute data, music data, and image data received by the receiving means are first stored in the received data storage means, and, when so instructed, the device outputs the corresponding voice, song, and image from the stored data.

[0036] The voice communication device of claim 6 can therefore obtain multiple sets of received data at once and store them in the received data storage means. Because the stored data can be played back at any time the user chooses, the communication line needs to be connected only while data is being received, which cuts communication costs.

[0037] In the invention of claim 7, the image data in the voice communication device of claim 5 is still-image data.

[0038] By communicating still-image data, which is smaller than moving-image data, the voice communication device of claim 7 can shorten communication time and cut communication costs even further.

【００３９】[0039]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0040] (First Embodiment) A first embodiment of the present invention is described with reference to FIGS. 1 to 6.

[0041] FIG. 1 is a block diagram showing an example of the overall configuration of a voice communication device to which the voice synthesis device and method according to an embodiment of the present invention are applied. This voice communication device is used, for example, as a communication karaoke device.

[0042] As FIG. 1 shows, the voice communication device 1 of this embodiment comprises a transmitting device 10, which transmits as a digital signal sound attribute data a defining the minimum attributes needed to reproduce a sound, and a receiving/reproducing device 20, which receives the sound attribute data, reproduces the sound from those attributes, and outputs it.

[0043] The sound attribute data a transmitted from the transmitting device 10 is received by the receiving/reproducing device 20 via a communication line 11 such as a telephone line.

[0044] FIG. 2 is a data structure diagram showing an example of the structure of the sound attribute data a.

[0045] As the example in FIG. 2(a) shows, the sound attribute data a consists of 8 bytes: scale data e (1 byte), pronunciation data f (2 bytes), sounding time data g (4 bytes), and volume data h (1 byte).

[0046] As the example in FIG. 2(b) shows, the scale data e (1 byte) in turn consists of gender (1 bit), pronunciation pattern (1 bit), and scale (6 bits). A gender bit of "1" denotes male and "0" denotes female. Two pronunciation patterns, A and B, are defined for each of male and female. Because the scale field has 6 bits, 64 scale steps can be defined. One octave consists of the seven notes do, re, mi, fa, so, la, ti, so this corresponds to about 9 octaves.

[0047] If only about 4 to 5 octaves are needed, the scale may instead be defined with 5 bits and the pronunciation pattern with 2 bits, as shown in FIG. 2(c). In that case, four pronunciation patterns are defined for each of male and female.
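The two layouts of the scale data e described for FIG. 2(b) and FIG. 2(c) can be sketched as a bit-field decoder. This is a minimal illustration, not the patent's implementation: the bit order (gender in the most significant bit, then the pattern bits, then the scale) and the numbering of patterns from 0 are assumptions, and the names `ScaleField` and `decode_scale_byte` are hypothetical.

```python
from typing import NamedTuple


class ScaleField(NamedTuple):
    gender: str   # "male" (bit = 1) or "female" (bit = 0), as stated in the text
    pattern: int  # pronunciation pattern index; 0 = pattern A is an assumption
    scale: int    # scale step: 0..63 for layout (b), 0..31 for layout (c)


def decode_scale_byte(e: int, pattern_bits: int = 1) -> ScaleField:
    """Split the 1-byte scale data e into gender, pattern, and scale.

    pattern_bits=1 models the FIG. 2(b) layout (1 + 1 + 6 bits);
    pattern_bits=2 models the FIG. 2(c) layout (1 + 2 + 5 bits).
    The MSB-first bit order is assumed, not given in the source.
    """
    if not 0 <= e <= 0xFF:
        raise ValueError("scale data e must fit in one byte")
    if pattern_bits not in (1, 2):
        raise ValueError("pattern_bits must be 1 or 2")
    scale_bits = 7 - pattern_bits
    gender = "male" if (e >> 7) & 1 else "female"
    pattern = (e >> scale_bits) & ((1 << pattern_bits) - 1)
    scale = e & ((1 << scale_bits) - 1)
    return ScaleField(gender, pattern, scale)
```

Under these assumptions, layout (b) yields 64 scale steps and two patterns per gender, while layout (c) yields 32 scale steps and four patterns per gender, matching the counts given in the text.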

[0048] FIG. 3 is a block diagram showing an example of the configuration of the receiving/reproducing device 20 of the voice communication device 1 to which the voice synthesis device and method of this embodiment are applied.

[0049] The receiving/reproducing device 20 of the voice communication device 1 comprises: a modem 21, which receives the sound attribute data a transmitted from the transmitting device 10; a selector 22, which separates the sound attribute data a into scale data e, pronunciation data f, sounding time data g, and volume data h; a decoder 23, which retrieves the necessary data from the scale/pronunciation database 24 on the basis of the data separated by the selector 22 and reconstructs it as data that can be output as sound; a D/A converter 25, which converts the data reconstructed by the decoder 23 to analog; an amplifier 26, which combines and amplifies the converted analog signal; and an output device 27, which outputs the sound amplified by the amplifier 26.

[0050] The voice synthesis device referred to in the claims consists, for example, of the decoder 23 and the scale/pronunciation database 24.

[0051] The operation of the voice communication device 1 of this embodiment, configured as above, is described next.

[0052] The sound attribute data a transmitted from the transmitting device 10 over the communication line 11 is received by the modem 21 of the receiving/reproducing device 20 and input to the selector 22.

[0053] As shown in FIG. 2(a), the sound attribute data a is 8-byte digital data with a fixed layout: byte 1 is the scale data e, bytes 2 to 3 are the pronunciation data f, bytes 4 to 7 are the sounding time data g, and byte 8 is the volume data h. The selector 22 therefore separates the sound attribute data a into scale data e, pronunciation data f, sounding time data g, and volume data h according to this layout.
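The fixed 8-byte layout means the separation performed by the selector 22 amounts to slicing each record at known offsets. The sketch below illustrates this with Python's `struct` module. The big-endian byte order of the multi-byte fields and the idea that several records arrive back to back are assumptions; the source specifies only the field sizes and their order. `SoundAttribute` and `split_records` are hypothetical names.

```python
import struct
from typing import List, NamedTuple


class SoundAttribute(NamedTuple):
    scale: int          # e, byte 1
    pronunciation: int  # f, bytes 2 to 3
    sounding_time: int  # g, bytes 4 to 7 (unit not stated in the source)
    volume: int         # h, byte 8


# ">" selects big-endian with no padding (byte order is an assumption):
# B = 1 byte, H = 2 bytes, I = 4 bytes, B = 1 byte, 8 bytes in total.
RECORD = struct.Struct(">BHIB")


def split_records(payload: bytes) -> List[SoundAttribute]:
    """Split a received byte string into 8-byte sound attribute records."""
    if len(payload) % RECORD.size != 0:
        raise ValueError("payload length must be a multiple of 8 bytes")
    return [
        SoundAttribute(*RECORD.unpack_from(payload, offset))
        for offset in range(0, len(payload), RECORD.size)
    ]
```

For example, the eight bytes `81 00 01 00 00 01 F4 40` (hexadecimal) would decode under these assumptions to scale byte 0x81, pronunciation code 1, sounding time 500, and volume 64.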

[0054] The separated data is input to the decoder 23. The decoder 23 extracts from the scale/pronunciation database 24 the scale pronunciation data determined by the scale data e and the pronunciation data f, and then creates voice data by assigning to each item of scale pronunciation data a sounding time and a volume based on the sounding time data g and the volume data h. The way the scale pronunciation data and the voice data are created is described below.

[0055] FIG. 4 is a data structure diagram showing an example of the scale/pronunciation table stored in the scale/pronunciation database 24.

[0056] FIGS. 5 and 6 are voice waveform diagrams showing examples of the voice waveform patterns defined for each value of the pronunciation data f.

[0057] The scale/pronunciation database 24 stores a scale/pronunciation table such as that shown in FIG. 4 and the digital data of voice waveforms such as those shown in FIGS. 5 and 6.

[0058] The scale/pronunciation table in FIG. 4 is an example for the layout of FIG. 2(b), in which four pronunciation patterns (female A, female B, male A, and male B) can be specified and the scale data e allows up to 64 notes.

[0059] The scale/pronunciation database 24 stores, for each of female A, female B, male A, and male B, a scale/pronunciation table defined as a matrix of scale data e against pronunciation data f. As shown in FIG. 2(b), female A, female B, male A, and male B are each identified by the first and second bits of the scale data e. Female A, for example, is recognized as "00", and the corresponding scale/pronunciation table is loaded into the decoder 23.

[0060] In the scale/pronunciation table, the scale data e and the pronunciation data f are defined as a two-digit number and a three-digit number, respectively, so that combining the pronunciation data f with the scale data e yields a five-digit number. For example, when the sound "a" (あ) is voiced at the note do of the first octave, this number is recognized as the identification number "00101".
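The five-digit identification number can be illustrated with a small helper that joins and splits the two codes. It assumes that the three-digit pronunciation code comes first and the two-digit scale code second, which is how the description later reads the upper three and lower two digits; `make_id` and `split_id` are hypothetical names.

```python
from typing import Tuple


def make_id(pronunciation: int, scale: int) -> str:
    """Join a 3-digit pronunciation code and a 2-digit scale code."""
    if not 0 <= pronunciation <= 999:
        raise ValueError("pronunciation code must have at most three digits")
    if not 0 <= scale <= 99:
        raise ValueError("scale code must have at most two digits")
    return f"{pronunciation:03d}{scale:02d}"


def split_id(identifier: str) -> Tuple[int, int]:
    """Return (pronunciation code, scale code) from a 5-digit identifier."""
    if len(identifier) != 5 or not identifier.isdigit():
        raise ValueError("identifier must be exactly five digits")
    return int(identifier[:3]), int(identifier[3:])
```

With the example from the text, `make_id(1, 1)` gives "00101", which corresponds to pronunciation code 001 ("a") and scale code 01 (do of the first octave).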

[0061] The voice waveforms shown in FIGS. 5 and 6 have a pattern specific to each value of the pronunciation data f, and the pitch rises as the period t becomes shorter. The patterns in FIG. 5 and FIG. 6 are example waveforms for a first pronunciation (for example, "a") and a second pronunciation (for example, "i"), respectively. FIGS. 5(a) and 6(a), FIGS. 5(b) and 6(b), and FIGS. 5(c) and 6(c) each have the same waveform period and therefore the same pitch.

[0062] FIG. 5(b) has a shorter waveform period than FIG. 5(a) (t > t') and is therefore a higher sound. In FIG. 5(c), the waveform period t is 1/2 that of FIG. 5(a), so the pitch is one octave higher than in FIG. 5(a). The same applies to FIG. 6.
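The relation between waveform period and pitch stated above follows from frequency being the reciprocal of the period: halving the period doubles the frequency, which is one octave. A short sketch makes the arithmetic explicit; the function name `octaves_higher` is hypothetical, and the periods may be in any unit as long as both use the same one.

```python
import math


def octaves_higher(reference_period: float, period: float) -> float:
    """Return how many octaves higher a waveform with `period` sounds
    than one with `reference_period` (negative means lower)."""
    if reference_period <= 0 or period <= 0:
        raise ValueError("periods must be positive")
    # Frequency is 1/period, and each doubling of frequency is one octave.
    return math.log2(reference_period / period)
```

For the case of FIG. 5(c), `octaves_higher(t, t / 2)` evaluates to 1.0 for any positive `t`.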

[0063] The scale/pronunciation database 24 stores voice waveform patterns such as those in FIGS. 5 and 6 as digital data, one for each value of the pronunciation data f, separately for female A, female B, male A, and male B. The waveform period t is determined by the scale data e.

[0064] For each of female A, female B, male A, and male B, the upper three digits of the five-digit identification number defined by the pronunciation data f and the scale data e select the digital voice waveform data for the pronunciation data f, which is extracted from the scale/pronunciation database 24 and loaded into the decoder 23. The lower two digits select the waveform period t for the scale data e, which is likewise loaded into the decoder 23. The decoder 23 creates voice pronunciation data from the voice waveform data and the waveform period obtained in this way.

[0065] The decoder 23 then creates voice reproduction data by assigning the sounding time data g and the volume data h to this voice pronunciation data. In other words, voice reproduction data is digital data that specifies that the sound defined by the voice waveform data is to be output at the pitch given by the waveform period t, for the length given by the sounding time data g, and at the volume given by the volume data h.
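The steps attributed to the decoder 23 (stretch a stored one-cycle pattern to the waveform period, repeat it for the sounding time, and scale it by the volume) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: the pattern is taken to be one cycle of samples, the period and duration are expressed in output samples, linear interpolation is used for resampling, and the volume is scaled against a hypothetical `max_volume` of 255 (the largest 1-byte value).

```python
from typing import List, Sequence


def synthesize(pattern: Sequence[float], period_samples: int,
               duration_samples: int, volume: float,
               max_volume: float = 255.0) -> List[float]:
    """Build output samples from a one-cycle waveform pattern.

    pattern          : one cycle of the stored voice waveform (assumed form)
    period_samples   : length assigned to one cycle; shorter means higher pitch
    duration_samples : total output length (stands in for sounding time g)
    volume           : volume value h, scaled against max_volume (assumed)
    """
    if not pattern or period_samples <= 0 or duration_samples < 0:
        raise ValueError("invalid pattern, period, or duration")
    n = len(pattern)
    # Step 1: resample the stored cycle to the requested period length.
    one_cycle = []
    for i in range(period_samples):
        pos = i * n / period_samples
        lo = int(pos)
        hi = (lo + 1) % n          # wrap around: the pattern is periodic
        frac = pos - lo
        one_cycle.append(pattern[lo] * (1 - frac) + pattern[hi] * frac)
    # Step 2: repeat the cycle for the sounding time, scaled by the volume.
    gain = volume / max_volume
    return [gain * one_cycle[i % period_samples]
            for i in range(duration_samples)]
```

Calling `synthesize` with half the `period_samples` for the same pattern produces a waveform one octave higher, in line with the description of FIG. 5(c).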

[0066] The sound reproduction data created by the decoder 23 in this way is converted into an analog signal by the D/A converter 25, amplified by the amplifier 26, and output from the output device 27.

[0067] As described above, in the voice communication device to which the voice synthesizing device and method of the present embodiment are applied, the sound attribute data a, which defines the scale, pronunciation, sounding time, and volume (the minimum attributes required to reproduce a voice), is transmitted from the transmitting device 10 to the receiving/reproducing device 20, and the receiving/reproducing device 20 can reproduce the voice from the sound attribute data a.

[0068] That is, compared with transmitting the voice data as is, the amount of data to be transmitted is reduced, which shortens the transmission time and reduces the communication cost.
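The size difference can be made concrete with a small calculation. The sketch below is purely illustrative: the specification gives no byte counts, so the sample rate, the bytes per sample, and the bytes per sound attribute record are all assumed values chosen for the example.

```python
def pcm_bytes(duration_s: float, sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Size of uncompressed sampled voice data of the given duration."""
    return round(duration_s * sample_rate_hz) * bytes_per_sample


def attribute_bytes(note_count: int, bytes_per_note: int) -> int:
    """Size of sound attribute data when each sound uses a fixed-size record."""
    return note_count * bytes_per_note


# Assumed values: 2 seconds of 8 kHz, 1-byte samples versus 4 sounds
# described by 5-byte records (scale, pronunciation, time, volume).
raw_size = pcm_bytes(duration_s=2.0, sample_rate_hz=8000, bytes_per_sample=1)
attribute_size = attribute_bytes(note_count=4, bytes_per_note=5)
print(raw_size, attribute_size)
```

Under these assumptions the attribute data is far smaller than the sampled data; the actual ratio depends on the record layout and the audio format, neither of which is fixed by the text.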

[0069] (Second Embodiment) A second embodiment of the present invention will be described with reference to FIG. 7.

[0070] The voice communication device 1 according to the second embodiment of the present invention is the same as that shown in FIG. 1 except for the points described below.

[0071] That is, the voice communication device 1 uses the receiving/reproducing device 30 of FIG. 7 in place of the receiving/reproducing device 20 of FIG. 1. The receiving/reproducing device 30 can reproduce not only the sound attribute data a but also the music data b and the image data c.

[0072] These functions are realized by adding to the receiving/reproducing device 30 a means for reproducing a piece of music from the music data b and a means for reproducing an image from the image data c.

[0073] FIG. 7 is a block diagram showing an example of the configuration of the receiving/reproducing device 30 of the voice communication device 1 according to the present embodiment. Parts identical to those in FIG. 3 are given the same reference numerals and their description is omitted; only the differences are described here.

[0074] That is, the receiving/reproducing device 30 of the voice communication device 1 according to the present embodiment adds a music reproducing unit 31 and an image display unit 32 to the configuration of the receiving/reproducing device 20 shown in FIG. 3, each connected to the selector 22.

[0075] In the present embodiment, the transmitting device 10 has been improved so that it also transmits the music data b and the image data c, and the transmission data sent from the transmitting device 10 includes the sound attribute data a, the music data b, and the image data c.

[0076] The selector 22 in the receiving/reproducing device 30 has the function of separating such transmission data into the sound attribute data a, the music data b, and the image data c, and inputting them to the decoders 23a, 23b, and 23c, respectively.
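The separating role of the selector 22 can be sketched as follows. The specification does not say how the three kinds of data are distinguished inside the transmission data, so this sketch assumes each record carries a one-letter tag ("a", "b", or "c"); the tag scheme, the function name, and the route names are hypothetical.

```python
from collections import defaultdict

# Assumed routing: tag of each record -> decoder that receives it.
ROUTES = {"a": "decoder_23a", "b": "decoder_23b", "c": "decoder_23c"}


def separate_transmission_data(records):
    """Group (tag, payload) records by the decoder they are routed to."""
    separated = defaultdict(list)
    for tag, payload in records:
        if tag not in ROUTES:
            raise ValueError(f"unknown data kind: {tag!r}")
        separated[ROUTES[tag]].append(payload)
    return dict(separated)


received = [("a", "01234"), ("b", "midi-chunk"), ("c", "image-1"), ("a", "01334")]
print(separate_transmission_data(received))
```

Records keep their arrival order within each group, which matters if the decoders are expected to play them back in sequence.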

[0077] The configuration, connections, and functions of the decoder 23a, the scale/pronunciation database 24, the D/A converter 25, the amplifier 26a, and the output device 27a are the same as those of the decoder 23, the scale/pronunciation database 24, the D/A converter 25, the amplifier 26, and the output device 27 shown in FIG. 3 and described in the first embodiment.

[0078] The music reproducing unit 31 includes a decoder 23b that decompresses the MIDI-compressed music data b separated by the selector 22 into digital data, a sequencer 33 that reproduces a piece of music from the data decompressed by the decoder 23b, an amplifier 26b that amplifies the reproduced music, and an output device 27b that outputs the music amplified by the amplifier 26b.

[0079] The image display unit 32 includes a decoder 23c that decompresses the image data c separated by the selector 22, and a display device 34 that displays the data decompressed by the decoder 23c.

[0080] Next, the operation of the voice communication device 1 of the present embodiment configured as described above will be described.

[0081] The transmission data sent from the transmitting device 10 to the receiving/reproducing device 30 via the communication line 11 is received by the modem 21 of the receiving/reproducing device 30 and then input to the selector 22. This transmission data includes the sound attribute data a, the music data b, and the image data c, which are separated by the selector 22.

[0082] The operation by which the sound attribute data a separated by the selector 22 passes through the decoder 23a, the D/A converter 25, and the amplifier 26a until a voice is output from the output device 27a is the same as that described in the first embodiment with reference to FIG. 2, so its description is omitted here.

[0083] The music data b included in the transmission data, on the other hand, is compressed data conforming to the MIDI standard, an international standard for hardware and communication protocols established for exchanging performance information and the like among electronic musical instruments.

[0084] Such music data b, once separated by the selector 22, is decompressed by the decoder 23b and then sent to the sequencer 33. The sequencer 33 reproduces a piece of music from the data decompressed by the decoder 23b, and the reproduced music is amplified by the amplifier 26b and then output from the output device 27b.

[0085] The image data c is data in which a plurality of still images compressed in the JPEG format are stored together with display time information. After being separated from the transmission data by the selector 22, it is sent to the decoder 23c, decompressed there, and then displayed in succession on the display device 34 according to the specified display times. Still image data in the JPEG format is far smaller than moving image data in a format such as MPEG.
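The way display time information turns a set of still images into a timed sequence can be sketched as follows. This is a minimal illustration, assuming each image is stored with a display duration in seconds; the actual layout of the display time information is not given in the specification, and the file names are placeholders.

```python
def display_schedule(images):
    """Return (start_time_s, image_name) pairs from (image_name, seconds) pairs."""
    schedule = []
    start_s = 0.0
    for name, seconds in images:
        schedule.append((start_s, name))
        start_s += seconds  # the next image starts when this one ends
    return schedule


# Placeholder image names and assumed display durations.
slides = [("background-1.jpg", 5.0), ("background-2.jpg", 3.0), ("background-3.jpg", 4.0)]
for start_s, name in display_schedule(slides):
    print(start_s, name)
```

A display device following this schedule would show each image from its start time until the next entry begins.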

[0086] As described above, the voice communication device of the present embodiment outputs voice, music, and images together.

[0087] That is, because the voice communication device of the present embodiment can perform music reproduction, voice reproduction, and image display at the same time, it can be applied to a communication karaoke device: the music data b is used to play the piece of music, the sound attribute data a is used to add a backing chorus, and the image data c is used to display a still background image or the lyrics.

[0088] As described in the first embodiment, the sound attribute data a included in the transmission data consists of the scale data e, the pronunciation data f, the sounding time data g, and the volume data h, which are the minimum data required to reproduce a sound. The image data c, which is also included in the transmission data, is still image data in the JPEG format and is far smaller than moving image data in a format such as MPEG.

[0089] Furthermore, because the voice communication device of the present embodiment allows a display time to be set for each image, the background image can be changed as the song progresses. The lyrics can also be displayed as character codes and set so that their color changes with the pronunciation data f as the song progresses, which makes it possible to produce a display that resembles a moving image.

[0090] As a result, the voice communication device of the present embodiment can be used as a communication karaoke device that shortens the data transmission time and reduces the communication cost.

[0091] Although this communication karaoke device does not provide true moving images, it allows a home communication karaoke device to be realized simply and inexpensively without using a high-speed line such as ISDN, which makes it easy for such devices to spread to ordinary households.

[0092] (Third Embodiment) A third embodiment of the present invention will be described with reference to FIG. 8.

[0093] FIG. 8 is a block diagram showing an example of the configuration of the receiving/reproducing device 40 of the voice communication device 1 to which the voice synthesizing device and method of the present embodiment are applied. Parts identical to those in FIG. 3 are given the same reference numerals and their description is omitted; only the differences are described here.

[0094] The voice communication device 1 to which the voice communication method of the third embodiment of the present invention is applied has the same configuration as the voice communication device 1 of the first and second embodiments, except that a detachable scale/pronunciation data ROM 41 is added to the receiving/reproducing device 40.

[0095] That is, as shown in FIG. 8, the receiving/reproducing device 40 of the voice communication device 1 to which the voice synthesizing device and method of the present embodiment are applied adds a scale/pronunciation data ROM 41, connectable to the decoder 23, to the configuration of the receiving/reproducing device 20 shown in FIG. 3.

[0096] Like the scale/pronunciation database 24, the scale/pronunciation data ROM 41 stores a scale/pronunciation table such as that shown in FIG. 4 and digital data of voice waveforms such as those shown in FIGS. 5 and 6. The scale/pronunciation data ROM 41 is detachable from the receiving/reproducing device 40 and can be connected to the decoder 23 when necessary by inserting it into a ROM slot (not shown) of the receiving/reproducing device 40.

[0097] Next, the operation of the voice communication device 1 to which the voice synthesizing device and method of the present embodiment, configured as described above, are applied will be described.

[0098] When the transmitting device 10 sends sound attribute data a that cannot be reproduced from the scale/pronunciation table and voice waveform digital data stored in the scale/pronunciation database 24, a scale/pronunciation data ROM 41 that stores the scale/pronunciation table and voice waveform digital data needed for reproduction is inserted into the ROM slot of the receiving/reproducing device 40. The data held in the scale/pronunciation data ROM 41 is thereby made available to the decoder 23.

[0099] For example, even if sound attribute data a for English pronunciation is transmitted, it is not reproduced if the scale/pronunciation database 24 does not store an English scale/pronunciation table and the corresponding voice waveform digital data. However, when a scale/pronunciation data ROM 41 that stores an English scale/pronunciation table and voice waveform digital data is attached to the receiving/reproducing device 40, the decoder 23 reproduces the voice using the English scale/pronunciation data stored in the scale/pronunciation data ROM 41.
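The supplementing role of the detachable ROM can be sketched as a lookup with a fallback. This is a minimal illustration, assuming both stores behave like mappings from a pronunciation code to waveform data and that the built-in database is consulted first; the lookup order, the function name, and the example codes are assumptions rather than details from the specification.

```python
def find_waveform(pronunciation_code, database_24, rom_41=None):
    """Look up waveform data, falling back to the attached ROM if present."""
    if pronunciation_code in database_24:
        return database_24[pronunciation_code]
    if rom_41 is not None and pronunciation_code in rom_41:
        return rom_41[pronunciation_code]
    return None  # the sound cannot be reproduced


# Hypothetical contents: the built-in database lacks code 900, the ROM has it.
built_in = {12: [0.0, 1.0, 0.0, -1.0]}
english_rom = {900: [0.0, 0.5, 1.0, 0.5]}

print(find_waveform(900, built_in))               # no ROM attached
print(find_waveform(900, built_in, english_rom))  # ROM attached
```

Without the ROM the lookup yields nothing, matching the case where the sound attribute data a is not reproduced; with the ROM attached the same request succeeds.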

[0100] As described above, in the voice communication device 1 to which the voice synthesizing device and method of the present embodiment are applied, the detachable scale/pronunciation data ROM 41 can be used as an option. This makes it possible to supplement the data in the scale/pronunciation database 24 and to strengthen the voice synthesizing function. Such a voice communication device 1 can also be used, for example, as a device for learning English conversation.

[0101] Of course, the receiving/reproducing device 30 shown in FIG. 7 may also be configured so that such a scale/pronunciation data ROM 41 can be connected to the decoder 23a. In that case, it can be applied as a communication karaoke device with a strengthened voice synthesizing function.

[0102] (Fourth Embodiment) A fourth embodiment of the present invention will be described with reference to FIGS. 9 and 10.

[0103] FIG. 9 is a block diagram showing an example of the configuration of the receiving/reproducing device 50 of the voice communication device 1 according to the present embodiment. Parts identical to those in FIG. 7 are given the same reference numerals and their description is omitted; only the differences are described here.

[0104] The receiving/reproducing device 50 of the voice communication device 1 according to the fourth embodiment of the present invention has the same configuration as the receiving/reproducing device 30 shown in FIG. 7, except that a storage device 51, a control unit 52, and an input unit 53 are added.

[0105] That is, as shown in FIG. 9, the receiving/reproducing device 50 of the voice communication device 1 according to the present embodiment adds the following to the configuration of the receiving/reproducing device 30 shown in FIG. 7: a storage device 51 that can be connected between the modem 21 and the selector 22, a control unit 52 that controls the receiving/reproducing device 50, and an input unit 53 that accepts user input and passes its content to the control unit 52.

[0106] The storage device 51 temporarily stores the transmission data received by the modem 21 and passes it to the selector 22 when necessary. The input unit 53 accepts requests from the user and passes their content to the control unit 52. The control unit 52 controls the receiving/reproducing device 50 according to the user instructions passed from the input unit 53.

[0107] The operation of the voice communication device 1 of the present embodiment configured in this way will be described with reference to the flowchart of FIG. 10.

[0108] Using the input unit 53 of the receiving/reproducing device 50, the user enters the data to be received (for example, a voice or a karaoke song) as a predetermined code number or the like, which is registered in the control unit 52 (S1). The input unit has a buffer memory (not shown) so that a plurality of code numbers can be entered at one time.

[0109] When a code number or the like is registered in the control unit 52, the control unit 52 starts the modem 21, which connects the receiving/reproducing device 50 and the transmitting device 10 via the communication line 11 (S2), and the data corresponding to the entered code number or the like is sent from the transmitting device 10. The data is received by the modem 21 of the receiving/reproducing device 50 (S3) and then stored in the storage device 51 (S4).

[0110] Once data has been stored in the storage device 51 in this way, it can be reproduced at any time at the user's request by the method described later.

[0111] If more data is to be added after the data has been stored in the storage device 51 (S5: Yes), the process returns to S1. If no data is to be added (S5: No), the control unit 52 disconnects the communication line 11 (S6).

[0112] If more data is to be added even after the line has been disconnected (S7: Yes), the process again returns to S1. If no data is to be added (S7: No), the data acquisition routine ends.
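Steps S1 to S7 of the data acquisition routine can be sketched as two nested loops. This is a simplified illustration, assuming the user's choices are known in advance as a list of sessions, each holding the code numbers requested before the line is disconnected, and that reconnecting while already connected does nothing. The names `run_acquisition`, `sessions`, and `catalog` are hypothetical.

```python
def run_acquisition(sessions, catalog):
    """Simulate the routine: fetch requested items and log line usage."""
    storage_51 = []  # stands in for the storage device 51
    line_log = []    # records when the communication line 11 is in use
    for code_numbers in sessions:             # S7: another round after disconnecting?
        line_log.append("connect")            # S2: connect via the modem 21
        for code in code_numbers:             # S1, repeated while S5 is "Yes"
            storage_51.append(catalog[code])  # S3: receive, S4: store
        line_log.append("disconnect")         # S6: disconnect the line
    return storage_51, line_log


# Hypothetical code numbers and data held by the transmitting device 10.
catalog = {"1001": "song A", "1002": "song B", "2001": "voice C"}
stored, log = run_acquisition([["1001", "1002"], ["2001"]], catalog)
print(stored)
print(log)
```

The log makes visible that the line is held only while items are being fetched, which is the basis of the cost saving discussed for this embodiment.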

[0113] When at least one item of data is stored in the storage device 51, a list of the data stored in the storage device 51 is displayed on the input unit 53 at the user's request. This is done by the control unit 52 accessing the storage device 51, searching the stored data, and passing the result to the input unit 53.

[0114] When the user selects an item from the list of stored data displayed on the input unit 53, the control unit 52 passes that data from the storage device 51 to the selector 22. The data is then reproduced and output as described in the second embodiment.

[0115] As described above, in the voice communication device 1 of the present embodiment, a plurality of specified items of transmission data can be acquired together via the communication line 11 and stored in the storage device 51.

[0116] Furthermore, because the transmission data stored in the storage device 51 can be reproduced at any time the user specifies, the communication line 11 needs to be connected only while the transmission data is being acquired, which shortens the time the communication line is in use.

[0117] As a result, a voice communication device 1 capable of reducing communication costs can be realized. The reduction in running costs is particularly large when the voice communication device 1 of the present embodiment is used as a communication karaoke device.

[0118] In each embodiment, the voice communication device 1 shown in FIG. 1 has been described as the voice communication device. However, the voice communication device referred to in the present invention includes not only this voice communication device 1 but also the receiving/reproducing devices 20, 30, 40, and 50 on their own, as well as communication karaoke devices, English conversation learning devices, and the like.

[0119]

[Effects of the Invention] As described above, according to the present invention, reducing the amount of data to be transmitted shortens the data transmission time, which makes it possible to realize a voice communication device that reduces the communication cost and shortens the time until reproduction.

[0120] Furthermore, according to the present invention, it is possible to realize a voice synthesizing device and a voice synthesizing method that can synthesize a voice using only attribute data with a small amount of information as input.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of the overall configuration of a voice communication device to which the voice synthesizing device and method according to the first embodiment are applied.

FIG. 2 is a data structure diagram showing an example of the structure of sound attribute data.

FIG. 3 is a block diagram showing an example of the configuration of the receiving/reproducing device of the voice communication device to which the voice synthesizing device and method according to the first embodiment are applied.

FIG. 4 is a data structure diagram showing an example of a scale/pronunciation table stored in the scale/pronunciation database.

FIG. 5 is a voice waveform diagram showing an example of the voice waveform patterns defined for each item of pronunciation data.

FIG. 6 is a voice waveform diagram showing an example of the voice waveform patterns defined for each item of pronunciation data.

FIG. 7 is a block diagram showing an example of the configuration of the receiving/reproducing device of the voice communication device according to the second embodiment.

FIG. 8 is a block diagram showing an example of the configuration of the receiving/reproducing device of the voice communication device to which the voice synthesizing device and method according to the third embodiment are applied.

FIG. 9 is a block diagram showing an example of the configuration of the receiving/reproducing device of the voice communication device according to the fourth embodiment.

FIG. 10 is a flowchart showing the operation of the voice communication device according to the fourth embodiment.

FIG. 11 is a block diagram showing an example of the overall configuration of a conventional voice communication device.

FIG. 12 is a block diagram showing an example of the configuration of the receiving/reproducing device of a conventional voice communication device.

FIG. 13 is a block diagram showing an example of the configuration of the receiving/reproducing device when a conventional voice communication device is applied to a communication karaoke device.

[Explanation of symbols]

1 ... voice communication device, 10 ... transmitting device, 11 ... communication line, 20, 30, 40, 50 ... receiving/reproducing device, 21 ... modem, 22 ... selector, 23 ... decoder, 24 ... scale/pronunciation database, 25 ... D/A converter, 26 ... amplifier, 27 ... output device, 31 ... music reproducing unit, 32 ... image display unit, 33 ... sequencer, 34 ... display device, 41 ... scale/pronunciation data ROM, 51 ... storage device, 52 ... control unit, 53 ... input unit.

Claims

[Claims]

1. A voice synthesizing device comprising: voice waveform pattern storage means for storing, for each voice type, a voice waveform pattern, which is the waveform pattern of a voice; and voice synthesizing means for retrieving from the voice waveform pattern storage means the voice waveform pattern corresponding to pronunciation data that designates the type of the voice waveform pattern, assigning to the retrieved voice waveform pattern the waveform length designated by scale data that designates the length of one waveform of the voice waveform pattern, and outputting the result.

2. The voice synthesizing device according to claim 1, wherein the voice synthesizing means outputs the voice waveform pattern to which the designated waveform length has been assigned, for a sounding duration based on sounding time data that designates the repeated output time of the voice waveform pattern and at a volume based on volume data that designates the magnitude of the output of the voice waveform pattern.

3. The voice synthesizing device according to claim 1, wherein the voice waveform pattern storage means comprises a storage medium detachable from the main body of the voice synthesizing device.

4. A voice synthesizing method comprising: retrieving, from voice waveform pattern storage means that stores for each voice type a voice waveform pattern, which is the waveform pattern of a voice, the voice waveform pattern corresponding to pronunciation data that designates the type of the voice waveform pattern; assigning to the retrieved voice waveform pattern the waveform length designated by scale data that designates the length of one waveform of the voice waveform pattern; and outputting the result.

5. A voice communication device comprising: receiving means for receiving sound attribute data, music data designating the notes that make up a piece of music, and image data constituting an image, the sound attribute data consisting of pronunciation data designating the type of a voice waveform pattern, which is the waveform pattern of a voice, scale data designating the length of one waveform of the voice waveform pattern, sounding time data designating the repeated output time of the voice waveform pattern, and volume data designating the magnitude of the output of the voice waveform pattern; the voice synthesizing device according to claim 2, which takes the received sound attribute data as input and outputs the voice corresponding to the sound attribute data; music output means for outputting the piece of music from the received music data; and image output means for outputting the image from the received image data.

6. The voice communication device according to claim 5, further comprising reception data storage means capable of storing the sound attribute data, the music data, and the image data, wherein the sound attribute data, music data, and image data received by the receiving means are temporarily stored in the reception data storage means and, when instructed, the corresponding voice, music, and image are output from the sound attribute data, music data, and image data stored in the reception data storage means.

7. The voice communication device according to claim 5, wherein the image data is still image data.