JP2007256297A

JP2007256297A - Speech processing method and communication system, and communication terminal and server and program

Info

Publication number: JP2007256297A
Application number: JP2004079081A
Authority: JP
Inventors: Minako Miyamoto; 美奈子宮本
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2004-03-18
Filing date: 2004-03-18
Publication date: 2007-10-04
Also published as: WO2005091274A1

Abstract

PROBLEM TO BE SOLVED: To provide a system in which programs and data incorporated in a communication terminal beforehand operate in conjunction with speech processing.

SOLUTION: The communication terminal 100 includes: a first program and data storage means 101; a speech processing means 102; a speech processing language information creating means 103 for creating language information for the speech processing means; a transmitting and receiving means 104 for acquiring a program/data from a server 200; a second program and data storage means 105 for storing the acquired program/data; and a control means 106 for interlocking and controlling the speech processing means 102 and the program/data of the first program and data storage means 101, based on the program/data stored in the second program and data storage means 105. The server 200 includes: a transmitting and receiving means 201; and a speech processing language information creating means 202 for creating language information based on the data transmitted from the communication terminal.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a communication system, and more particularly to a system, a method, and a computer program for realizing cooperative processing between a terminal and a server communicatively connected to the terminal.

Examples of conventional communication terminals and systems are described in Patent Document 1 and Patent Document 2, listed below. The communication terminal described in Patent Document 1 includes a speech recognition unit, a speech synthesis unit, a control unit, a transmission/reception unit, and a data conversion unit. It converts input speech into text by speech recognition and transmits the text, and it reads out data received by the transmission/reception unit by speech synthesis.

The communication terminal described in Patent Document 2 includes a speech recognition unit, a speech synthesis unit, and a control unit (mail processing unit). It creates mail by voice input and reads mail aloud by speech synthesis.

Patent Document 1: Japanese Unexamined Patent Publication No. 2003-188948 (page 4, FIG. 1)
Patent Document 2: Japanese Unexamined Patent Publication No. 2002-077315 (pages 2 and 3, FIG. 1)

In the conventional terminals described above, a program received from outside the terminal cannot make the speech processing function operate in cooperation with the programs built into the terminal or with the data those programs manage, in particular user-specific data and data that changes dynamically according to the state of the system, for example in a desired manner specific to the terminal.

Accordingly, an object of the present invention is to provide a communication system, a server, and a communication terminal that can realize a desired function by combining a program and/or data built into the communication terminal with processing such as speech processing.

To achieve the above object, the invention disclosed in the present application is generally configured as follows.

A communication terminal according to one aspect of the present invention includes: a speech processing unit that performs speech recognition and/or speech synthesis; a first storage unit that stores at least a program and/or data for realizing a predetermined function on the communication terminal; a second storage unit that stores at least a program and/or data that is input to the communication terminal from outside the communication terminal and that defines how the program and/or data stored in the first storage unit cooperates with the speech processing performed by the speech processing unit; and a control unit that uses the program and/or data input from outside the communication terminal and the program and/or data stored in the first storage unit to perform control so that the speech processing by the speech processing unit and the function provided by the program and/or data stored in the first storage unit operate in cooperation.

In the present invention, the control unit starts the program stored in the second storage unit, and the started program calls a program stored in the first storage unit, or uses data stored there, so that the speech processing by the speech processing unit and the program and/or data stored in the first storage unit operate in cooperation.

A communication terminal according to one aspect of the present invention includes: a speech processing unit that performs speech recognition and/or speech synthesis; a first storage unit that stores at least information held by the communication terminal; a second storage unit that stores at least a program that is input to the communication terminal from outside the communication terminal and that defines a procedure for creating language information for speech processing; and a control unit that controls the start of that program stored in the second storage unit. The program stored in the second storage unit creates the language information used for speech processing in the speech processing unit by using at least the information stored in the first storage unit, and the speech processing unit performs the speech processing using the created language information.

A system according to another aspect of the present invention includes a communication terminal and a server communicatively connected to the communication terminal. The communication terminal includes: means for creating language information used by speech processing means, based on a program and/or data stored in advance in the communication terminal; and means for using the language information to make the speech processing operate in cooperation, based on the program and/or data stored in advance in the communication terminal and a program and/or data downloaded from the server.

A system according to another aspect of the present invention includes a communication terminal and a server communicatively connected to the communication terminal. The communication terminal includes: a first storage unit that stores a program and/or data held in advance by the communication terminal; speech processing means that performs at least one of speech recognition and speech synthesis; speech processing language information creating means that creates language information used by the speech processing means (for example, a dictionary, a grammar, or a language model) in accordance with the program and/or data stored in the first storage unit; means for acquiring a program and/or data from the server; a second storage unit that stores the program and/or data acquired from the server; and control means that performs cooperative control of the speech processing means and the program and/or data in the first storage unit, based on the program and/or data stored in the second storage unit. The server includes: means for receiving information transmitted from the communication terminal and transmitting a program and/or data generated by the server to the communication terminal; and speech processing language information creating means that creates language information used by the speech processing means from data stored on the server side, based on the data transmitted from the communication terminal.

In a system according to still another aspect of the present invention, the speech processing language information creating means creates the language information used by the speech processing means in accordance with the programs and/or data stored in the first storage unit and the second storage unit, and the control means uses the program and/or data stored in the second storage unit to perform control that links the speech processing means with the program and/or data in the first storage unit.

A system according to still another aspect of the present invention includes a communication terminal and one or more servers. The server includes: means for receiving information transmitted from the communication terminal and transmitting a program and/or data generated on the server side to the communication terminal; and speech processing language information creating means that creates a dictionary for speech processing from the data transmitted from the communication terminal and data stored on the server side.

The communication terminal includes: a first storage unit that stores a program and/or data held in advance in the communication terminal; speech processing means that performs at least one of speech recognition and speech synthesis; means for acquiring a program and/or data from the server; a second storage unit that stores the program and/or data acquired from the server; speech processing language information creating means that creates language information used by the speech processing means in accordance with the programs and/or data stored in both the first storage unit and the second storage unit; control means that uses the program and/or data stored in the second storage unit to link the speech processing means with the program and/or data stored in the first storage unit; and speech processing language information integration means that combines the spoken language information created by the terminal's speech processing language information creating means with the spoken language information created by the server's speech processing language information creating means.
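The integration means can be pictured as a merge of two sets of language information. The following is a minimal sketch only: the dictionary format (surface form mapped to a reading string), the function name `integrate_language_info`, and the rule that terminal-side entries take precedence on conflict are all assumptions made for illustration and are not specified in the source.

```python
from typing import Dict


def integrate_language_info(terminal: Dict[str, str],
                            server: Dict[str, str]) -> Dict[str, str]:
    """Combine terminal-side and server-side dictionaries into one.

    Assumed policy: when both sides define the same word, the
    terminal-side entry wins, since it may reflect user-specific data.
    """
    merged = dict(server)
    merged.update(terminal)
    return merged


# Hypothetical entries; the reading strings are placeholders.
terminal_dict = {"Sato": "sa to", "home": "ho o mu"}
server_dict = {"home": "ho mu", "Tokyo": "to o kyo o"}
merged = integrate_language_info(terminal_dict, server_dict)
```

A different conflict policy (server wins, or keep both readings) would fit the description equally well; the source only states that the two results are combined.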

A method according to one aspect of the present invention includes:
(A) a step in which a communication terminal generates language information used in speech processing, based on a program and/or data downloaded to the communication terminal from outside the communication terminal and a program and/or data stored in advance in the communication terminal; and
(B) a step in which the communication terminal uses the spoken language information to execute processing that makes the program and/or data stored in advance in the communication terminal and the speech processing operate in cooperation.

A method according to another aspect of the present invention includes:
(A) a step in which a communication terminal acquires a program and/or data from outside the communication terminal;
(B) a step in which the communication terminal generates spoken language information used in speech processing, based on the acquired program and/or data and a program and/or data stored in advance in the communication terminal; and
(C) a step in which the communication terminal uses the spoken language information to make a program stored in advance in the communication terminal, the program and/or data downloaded from outside the communication terminal, and the speech processing operate in cooperation.

A method according to still another aspect of the present invention includes:
(A) a step in which a communication terminal integrates spoken language information for speech processing that has been generated, in the communication terminal and/or in at least one of a plurality of servers, from a program and/or data downloaded from one or more servers and a program and/or data stored in advance in the communication terminal; and
(B) a step in which the communication terminal uses the spoken language information to make the program and/or data stored in advance in the communication terminal, the program and/or data downloaded from the server, and the speech processing operate in cooperation.

A computer program according to one aspect of the present invention causes a computer constituting a communication terminal to execute:
(A) processing for storing a program and/or data stored in advance in the communication terminal;
(B) processing for receiving a program and/or data generated outside the communication terminal;
(C) processing for storing the received program and/or data;
(D) processing for executing at least one of speech recognition and speech synthesis;
(E) processing for generating spoken language information for speech processing, based on the program and/or data stored in advance in the communication terminal; and
(F) processing for linking, by means of the received program and/or data, the program and/or data stored in advance in the communication terminal with the speech processing.

A computer program according to another aspect of the present invention causes a computer constituting a communication terminal to execute:
(A) processing for acquiring a program and/or data from outside the communication terminal;
(B) processing for executing at least one of speech recognition and speech synthesis;
(C) processing for generating spoken language information used in speech processing, based on the downloaded program and/or data and a program and/or data stored in advance in the communication terminal; and
(D) processing for using the spoken language information to make a program stored in advance in the communication terminal, the program and/or data downloaded from outside the communication terminal, and the speech processing operate in cooperation.

A computer program according to still another aspect of the present invention causes a computer constituting a communication terminal to execute:
(A) processing for storing, in a first storage unit, a program and/or data held in advance in the communication terminal;
(B) processing for receiving a program and/or data from one or more servers outside the communication terminal;
(C) processing for storing the received program and/or data in a second storage unit;
(D) processing for performing at least one of speech recognition and speech synthesis;
(E) processing for generating, within the communication terminal, spoken language information for speech processing, based on the program and/or data stored in the second storage unit and the program and/or data stored in advance in the first storage unit;
(F) processing for integrating the spoken language information generated in the communication terminal or in the server; and
(G) processing for linking, by means of the program and/or data in the second storage unit, the program and/or data stored in advance in the first storage unit with the speech processing.

According to the present invention, even when a program built into a communication terminal such as a portable terminal does not support speech processing, the program can be made to operate in cooperation with the speech processing function by downloading a program from a server.

Further, according to the present invention, programs with different cooperation methods can be freely swapped and executed according to the user's preference or the like.

Next, the best mode for carrying out the present invention will be described in detail with reference to the drawings.

Referring to FIG. 1, the first embodiment of the present invention includes a communication terminal 100 and a server 200. The communication terminal 100 includes first program and data storage means 101, speech processing means 102, speech processing language information creating means 103, transmission/reception means 104, second program and data storage means 105, and control means 106. The server 200 includes transmission/reception means 201 and speech processing language information creating means 202. Each of these means operates roughly as follows.

The first program and data storage means 101 stores programs built into the communication terminal 100 in advance and the data those programs manage. The data stored in the first program and data storage means 101 includes data that changes dynamically according to the state of the communication terminal 100 and personal data of the user of the communication terminal 100.

The speech processing means 102 performs at least one of speech recognition and speech synthesis.

The speech processing language information creating means 103 creates a dictionary, a grammar, a language model, and the like for the speech processing means 102, based on the first program and data stored in the first program and data storage means 101.
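As a rough illustration of what the language information creating means 103 might do, the sketch below derives a recognition dictionary and a small command grammar from data held in the first storage. The address-book records, the names `LanguageInfo` and `build_language_info`, and the "call <name>" grammar template are assumptions made for illustration; the source does not specify the data format or the form of the grammar.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageInfo:
    """Hypothetical container for speech-processing language information."""
    dictionary: Dict[str, str] = field(default_factory=dict)  # word -> reading
    grammar: List[str] = field(default_factory=list)          # accepted utterances


def build_language_info(contacts: List[Dict[str, str]]) -> LanguageInfo:
    """Derive a dictionary and a grammar from (assumed) address-book records."""
    info = LanguageInfo()
    for entry in contacts:
        name, reading = entry["name"], entry["reading"]
        info.dictionary[name] = reading
        # One command per contact; the template is illustrative only.
        info.grammar.append(f"call {name}")
    return info


# Placeholder user data standing in for the first storage contents.
contacts = [
    {"name": "Sato", "reading": "sa to"},
    {"name": "Suzuki", "reading": "su zu ki"},
]
info = build_language_info(contacts)
```

Because the dictionary is rebuilt from the stored data, user-specific or dynamically changing entries would be reflected each time the function is called.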

The transmission/reception means 104 transmits information held by the communication terminal 100 to the server 200 and receives programs and data from outside the communication terminal 100.

The second program and data storage means 105 stores the programs and data that the transmission/reception means 104 receives from outside the communication terminal 100.

The control means 106 calls programs and data from the second program and data acquired by the transmission/reception means 104 (by a subroutine call or the like) and links the speech processing means 102 with the first program and data.

The transmission/reception means 201 of the server 200 receives information from the communication terminal 100 and transmits programs and data from the server 200 to the communication terminal 100.

The speech processing language information creating means 202 creates a dictionary for speech processing from data stored on the server 200 side, based on the data transmitted from the communication terminal 100.

FIG. 2 is a flowchart for explaining the operation of one embodiment of the present invention. The overall operation of this embodiment will be described in detail with reference to FIG. 1 and FIG. 2.

The spoken language information used by the speech processing means 102 of the communication terminal 100 is generated in one of two ways: on the communication terminal 100 side or on the server 200 side.

When it is generated on the communication terminal 100 side, the transmission/reception means 104 receives a program and data from outside the communication terminal 100 (step Sa1) and stores them in the second program and data storage means 105 (step Sa2).

Next, the control means 106 calls and starts the program and data stored in the second program and data storage means 105 (step Sa3).

Language information is generated from the data in the first program and data storage means 101, following the language information creation procedure for speech processing described in the started program (step Sa4).

The control means 106 reads the generated language information for speech processing and starts the speech processing means 102 (step Sa5).

In accordance with the program started in step Sa3, a program in the first program and data storage means 101 is called (step Sa6) and made to operate in cooperation with the speech processing means 102 (step Sa7).
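Steps Sa1 to Sa7 can be sketched as follows. This is a minimal sketch under stated assumptions: the built-in `dial` function, the phonebook data, the `downloaded_program` body, and the stub recognizer are all hypothetical stand-ins, and reception over a network is reduced to storing a function in a dictionary.

```python
from typing import Callable, Dict, List


def dial(number: str) -> str:
    """Hypothetical built-in program held in the first storage."""
    return f"dialing {number}"


# First storage: built-in programs and user data (contents are illustrative).
first_storage = {
    "programs": {"dial": dial},
    "data": {"phonebook": {"Sato": "0312345678"}},
}
# Second storage: programs received from outside the terminal.
second_storage: Dict[str, Callable] = {}


def downloaded_program(first: dict, recognize: Callable[[List[str]], str]) -> str:
    """Stands in for the program received from outside the terminal."""
    phonebook = first["data"]["phonebook"]
    vocabulary = sorted(phonebook)           # Sa4: create language information
    spoken_name = recognize(vocabulary)      # Sa5: speech processing uses it
    dial_fn = first["programs"]["dial"]      # Sa6: call the built-in program
    return dial_fn(phonebook[spoken_name])   # Sa7: cooperative operation


def fake_recognizer(utterance: str) -> Callable[[List[str]], str]:
    """Stub recognizer: accepts the utterance only if it is in the vocabulary."""
    def recognize(vocabulary: List[str]) -> str:
        if utterance not in vocabulary:
            raise ValueError(f"out of vocabulary: {utterance}")
        return utterance
    return recognize


second_storage["voice_dial"] = downloaded_program   # Sa1, Sa2: receive and store
program = second_storage["voice_dial"]              # Sa3: control means starts it
result = program(first_storage, fake_recognizer("Sato"))
```

The point of the arrangement is that `dial` itself knows nothing about speech; the downloaded program supplies the link between the recognizer and the built-in function.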

When it is generated on the server 200 side, the transmission/reception means 104 receives a program and data from outside the communication terminal 100 (step Sb1) and stores them in the second program and data storage means 105 (step Sb2).

Next, the control means 106 starts that program and data from the second program and data storage means 105 (step Sb3).

The language information creation procedure for speech processing described in the started program, together with the data needed to create the language information, is transmitted to the server 200 (step Sb4). On the server 200 side, the speech processing language information creating means 202 generates spoken language information using the creation procedure and data transmitted from the communication terminal 100 and the data stored on the server 200 side (step Sb5), and transmits it to the communication terminal 100 (step Sb6).

On receiving it, the communication terminal 100 reads the generated language information for speech processing and starts the speech processing means 102 (step Sb7).

Further, in accordance with the program started in step Sb3, a program in the first program and data storage means 101 is called (step Sb8) and made to operate in cooperation with the speech processing means 102 (step Sb9).
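The server-side path (steps Sb4 to Sb6) might look like the sketch below. The procedure name `station_names`, the server-side word list, and both function names are hypothetical; the network exchange is collapsed into a direct function call, and the language information is reduced to a plain vocabulary list.

```python
from typing import Dict, List

# Hypothetical server-side data, keyed by the name of a creation procedure.
SERVER_DATA: Dict[str, List[str]] = {
    "station_names": ["Shinagawa", "Tokyo", "Ueno"],
}


def server_create_language_info(procedure: str,
                                terminal_data: List[str]) -> List[str]:
    """Sb5: combine the terminal's data with server-side data."""
    server_words = SERVER_DATA.get(procedure, [])
    # Keep first-seen order and drop duplicates.
    return list(dict.fromkeys(terminal_data + server_words))


def terminal_request(procedure: str, terminal_data: List[str]) -> List[str]:
    """Sb4 and Sb6 collapsed into one call; a real system would use the network."""
    return server_create_language_info(procedure, terminal_data)


vocabulary = terminal_request("station_names", ["Home", "Tokyo"])
```

Generating on the server lets the vocabulary include data the terminal does not hold, at the cost of sending terminal data off the device.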

Next, the effects of this embodiment will be described.

In this embodiment, the speech processing language information creating means 103 and 202 run on the communication terminal 100 and the server 200, respectively. Therefore, even when a program built into the communication terminal 100 in advance, or the data that program manages, does not support speech processing such as speech recognition or speech synthesis, the communication terminal 100 can use the speech processing function by downloading, from outside the communication terminal 100, a program that cooperates with speech processing.

Further, this embodiment has the transmission/reception means 104, means for storing a program acquired by reception, and the control means 106 that performs control for calling and executing that program. Programs with different cooperation methods can therefore be swapped according to the user's preference.
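The swapping of cooperation programs can be illustrated with a small sketch. The built-in `read_mail` function and the two downloadable styles are hypothetical examples, and speech output is represented by a tagged string rather than actual synthesis.

```python
from typing import Callable, Dict


def read_mail(subject: str) -> str:
    """Hypothetical built-in function held in the first storage."""
    return f"mail: {subject}"


# Two downloadable programs that link speech output to the same built-in
# function in different ways (both are illustrative).
def brief_style(builtin: Callable[[str], str], subject: str) -> str:
    return f"[speak] {builtin(subject)}"


def polite_style(builtin: Callable[[str], str], subject: str) -> str:
    return f"[speak] You have new {builtin(subject)}"


# Second storage holds whichever cooperation program was last downloaded.
second_storage: Dict[str, Callable] = {"cooperation": brief_style}


def run(subject: str) -> str:
    """Control means: invoke whichever cooperation program is currently stored."""
    return second_storage["cooperation"](read_mail, subject)


before = run("Meeting")
second_storage["cooperation"] = polite_style  # user swaps in another program
after = run("Meeting")
```

Only the entry in the second storage changes; the built-in function and the control logic stay the same.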

Next, a second embodiment of the present invention will be described in detail with reference to the drawings.

Referring to FIG. 20, the second embodiment of the present invention includes a communication terminal 1000 and a server 200. The communication terminal 1000 includes first program and data storage means 1101, speech processing means 1102, speech processing language information creating means 1103, transmission/reception means 1104, second program and data storage means 1105, and control means 1106. The server 200 includes transmission/reception means 201 and speech processing language information creating means 202. Each of these means operates roughly as follows.

The first program and data storage means 1101 stores programs built into the communication terminal 1000 in advance and the data those programs manage. The data stored in the first program and data storage means 1101 includes data that changes dynamically according to the state of the communication terminal 1000, personal data of the terminal user, and the like. The speech processing means 1102 performs speech recognition and/or speech synthesis.

The speech processing language information creating means 1103 creates a dictionary, a grammar, a language model, and the like for the speech processing means 1102, in accordance with the contents of the first program and data and the second program and data.

The transmission/reception means 1104 transmits information held by the communication terminal 1000 and receives programs and data from outside the terminal.

The second program and data storage means 1105 stores the programs and data that the transmission/reception means 1104 receives from outside the communication terminal 1000.

The control means 1106 calls programs and data from the second program and data acquired by the transmission/reception means 1104 and links the speech processing means 1102 with the first program and data.

The transmission/reception means 201 receives information from the communication terminal 1000 and transmits programs and data generated by the server 200 to the terminal.

The speech processing language information creating means 202 creates a dictionary for speech processing from data stored on the server 200 side, based on the data transmitted from the communication terminal 1000.

次に、図２０及び図２１のフローチャートを参照して、本実施の形態の全体の動作について詳細に説明する。 Next, the overall operation of the present embodiment will be described in detail with reference to the flowcharts of FIGS.

First, the transmission/reception means 1104 receives a program and data from outside the communication terminal 1000 (step S2101) and stores them in the second program and data storage means 1105 (step S2102).

Next, the control means 1106 starts the above program and data from the second program and data storage means 1105 (step S2103) and invokes, by a subroutine call or the like, the procedure described in the program for creating language information for speech processing (step S2104).

In the creation procedure invoked in step S2104, the programs and data from which the language information is to be created are read from both the first program and data storage means 1101 and the second program and data storage means 1105 (step S2105).

When the language information generation procedure invoked in step S2104 specifies generation by the speech processing language information creation means 1103 of the communication terminal 1000 (step S2106), the speech processing language information creation means 1103 generates the language information based on the programs and data read in step S2105 (step S2107).

If it is determined in step S2106 that the speech processing language information creation means 1103 of the communication terminal 1000 does not create the language information, the process proceeds to step S2108.

When the language information generation procedure invoked in step S2104 specifies generation by the speech processing language information creation means 202 of the server 200 (step S2108), the transmission/reception means 1104 transmits the data read in step S2105 to the server 200, and the transmission/reception means 201 receives this data (step S2109). The speech processing language information creation means 202 then generates language information for speech processing from this data and the data stored in the server (step S2110).

The transmission/reception means 201 of the server 200 transmits the generated speech processing dictionary to the communication terminal 1000, and the transmission/reception means 1104 receives it (step S2111).

If it is determined in step S2108 that the speech processing language information creation means 202 of the server 200 does not create language information for speech processing, the process proceeds to step S2112.

In response, the control means 1106 reads the generated language information for speech processing and starts the speech processing means 1102 (step S2112).

Further, in accordance with the program started in step S2103, a program in the first program and data storage means 1101 is invoked (step S2113) and operated in cooperation with the speech processing means 1102 (step S2114).

In this embodiment, the speech processing language information creation means 1103 reads programs and data from both the first program and data storage means 1101 and the second program and data storage means 1105 to create the speech language information. This makes it possible to perform speech processing such as speech recognition and speech synthesis by linking the programs built into the communication terminal 1000 in advance with the received programs.
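The branching in steps S2105 to S2112 can be sketched as follows. This is only an illustrative sketch: `ReceivedProgram`, `run_flow`, the two boolean flags, and the representation of language information as a mapping from notation to reading are all hypothetical names and assumptions, not part of the described system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Assumption: language information is modeled as notation -> reading.
LanguageInfo = Dict[str, str]


@dataclass
class ReceivedProgram:
    """Hypothetical stand-in for the program held in the second storage means."""
    create_on_terminal: bool  # outcome of the decision in step S2106
    create_on_server: bool    # outcome of the decision in step S2108
    data: Dict[str, str] = field(default_factory=dict)


def run_flow(
    program: ReceivedProgram,
    first_data: Dict[str, str],
    server_create: Optional[Callable[[Dict[str, str]], LanguageInfo]] = None,
) -> LanguageInfo:
    """Return the language information handed to speech processing (S2112)."""
    # S2105: read the source data from both storage means.
    source = {**first_data, **program.data}
    language_info: LanguageInfo = {}
    if program.create_on_terminal:
        # S2107: create the language information on the terminal.
        language_info.update(source)
    if program.create_on_server and server_create is not None:
        # S2109 to S2111: send the data, let the server create the
        # language information, and receive the result.
        language_info.update(server_create(source))
    return language_info
```

Here `server_create` stands in for the round trip to the server 200; in a real terminal it would be a network call rather than a local function.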

Next, a third embodiment of the present invention is described in detail with reference to the drawings.

Referring to FIG. 23, the third embodiment of the present invention comprises a communication terminal 2000 and a plurality of servers, from the server 200 to the server n00. The communication terminal 2000 comprises a first program and data storage means 2101, a speech processing means 2102, a speech processing language information creation means 2103, a transmission/reception means 2104, a second program and data storage means 2105, a control means 2106, and a speech processing language information integration means 2107.

The server 200 comprises a transmission/reception means 201 and a speech processing language information creation means 202. The server n00 comprises a transmission/reception means n01 and a speech processing language information creation means n02. These means operate roughly as follows.

The first program and data storage means 2101 stores programs built into the communication terminal 2000 in advance and the data managed by those programs. The data stored in the first program and data storage means 2101 includes data that changes dynamically according to the state of the communication terminal 2000 and personal data of the terminal user. The speech processing means 2102 performs speech recognition and/or speech synthesis.

The speech processing language information creation means 2103 creates a dictionary, a grammar, a language model, and the like for the speech processing means 2102 according to the contents stored as the first program and data and the second program and data. The transmission/reception means 2104 transmits information held by the communication terminal 2000 and receives programs and data from outside the terminal. The second program and data storage means 2105 stores the programs and data that the transmission/reception means 2104 receives from outside the terminal.

The control means 2106 invokes programs and data from the second program and data acquired by the transmission/reception means 2104, and makes the speech processing means 2102 and the first program and data operate in cooperation.

The speech processing language information integration means 2107 combines the speech language information generated by the speech processing language information creation means 2103 of the communication terminal 2000 with the speech language information generated by the speech processing language information creation means 202 of the server 200 and the speech processing language information creation means n02 of the server n00, and thereby produces a single set of speech language information.

The transmission/reception means 201 of the server 200 receives information from the communication terminal 2000 and transmits programs and data generated by the server 200 to the communication terminal 2000. Based on the data transmitted from the communication terminal 2000, the speech processing language information creation means 202 creates speech language information for speech processing from the data stored on the server 200.

Likewise, in the server n00, the transmission/reception means n01 receives information from the communication terminal 2000 and transmits programs and data generated by the server n00 to the communication terminal 2000. Based on the data transmitted from the communication terminal 2000, the speech processing language information creation means n02 creates speech language information for speech processing from the data stored on the server n00.

Next, the overall operation of this embodiment is described in detail with reference to the flowcharts of FIGS. 23 and 24.

First, the transmission/reception means 2104 receives a program and data from outside the communication terminal 2000 (step S2401) and stores them in the second program and data storage means 2105 (step S2402).

Next, the control means 2106 starts the above program and data from the second program and data storage means 2105 (step S2403) and invokes the procedure described in the program for creating language information for speech processing (step S2404). In the language information creation procedure invoked in step S2404, the programs and data from which the language information is to be created are read from both the first program and data storage means 2101 and the second program and data storage means 2105 (step S2405).

When the language information generation procedure invoked in step S2404 specifies generation by the speech processing language information creation means 2103 of the communication terminal 2000 (step S2406), the speech processing language information creation means 2103 generates the language information based on the programs and data read in step S2405 (step S2407).

If it is determined in step S2406 that the speech processing language information creation means 2103 of the communication terminal 2000 does not create the language information, the process proceeds to step S2408.

When the language information generation procedure invoked in step S2404 specifies generation by the speech processing language information creation means 202 of the server 200 (step S2408), the transmission/reception means 2104 transmits the data read in step S2405 to the server 200, and the transmission/reception means 201 receives this data (step S2409).

The speech processing language information creation means 202 generates language information for speech processing from this data and the data stored in the server (step S2410).

The transmission/reception means 201 of the server 200 transmits the generated speech processing dictionary to the communication terminal 2000, and the transmission/reception means 2104 receives it (step S2411).

If it is determined in step S2408 that the speech processing language information creation means 202 of the server 200 does not create language information for speech processing, the process proceeds to step S2412.

The speech processing language information integration means 2107 combines the speech language information generated by each of the speech processing language information creation means 2103 of the communication terminal 2000, the speech processing language information creation means 202 of the server 200, and the speech processing language information creation means n02 of the server n00 into a single set of speech language information (step S2412).

Following step S2412, the control means 2106 reads the generated language information for speech processing and starts the speech processing means 2102 (step S2413).

Further, in accordance with the program started in step S2403, a program in the first program and data storage means 2101 is invoked (step S2414) and operated in cooperation with the speech processing means 2102 (step S2415).

Next, the effects of this embodiment are described. This embodiment has the speech processing language information integration means 2107 and is configured to combine the speech language information generated on the communication terminal 2000 with the speech language information generated on the plurality of servers 200 to n00. This makes it possible to perform speech processing by linking a plurality of programs and data.
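The description does not specify how the integration means 2107 combines the individual sets of language information. As one possible reading, the following sketch models each set as a lexicon that maps a notation to a set of readings and combines them by taking the union of readings per notation. The `Lexicon` type, the `integrate` function, and the union strategy are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Set

# Assumption: each source of language information is a lexicon that maps
# a notation (surface form) to the set of readings registered for it.
Lexicon = Dict[str, Set[str]]


def integrate(lexicons: Iterable[Lexicon]) -> Lexicon:
    """Combine several lexicons into one by uniting readings per notation."""
    merged: Lexicon = {}
    for lexicon in lexicons:
        for notation, readings in lexicon.items():
            merged.setdefault(notation, set()).update(readings)
    return merged


# Hypothetical inputs: one lexicon created on the terminal, two on servers.
terminal = {"電源": {"でんげん"}}
server_200 = {"電源": {"ぱわー"}}
server_n00 = {"電池": {"でんち", "ばってりー"}}
combined = integrate([terminal, server_200, server_n00])
```

A union keeps every reading that any source contributed; other strategies, such as letting terminal entries override server entries, would fit the description equally well.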

Next, a first example of the present invention is described with reference to the drawings. The first example has the configuration shown in FIG. 1. FIGS. 3 to 8 illustrate examples of the programs and data stored in the first program and data storage means 101 of this example shown in FIG. 1.

FIG. 3 is a diagram for explaining the correspondence between the programs stored in the first program and data storage means 101 and the data managed by those programs. In FIG. 3, five programs are stored: terminal management, address book, outgoing/incoming call history, GPS (Global Positioning System), and infrared. As the corresponding data, terminal management data, address book data, outgoing/incoming call history data, GPS data, and infrared data are stored.

FIGS. 4 to 8 show examples of the data managed by the programs stored in the first program and data storage means 101 shown in FIG. 3. They assume information specific to the user of the communication terminal 100 and data of the communication terminal that changes dynamically.

FIG. 4 is an example of the terminal management data managed by the terminal management program of FIG. 3. As shown in FIG. 4, the terminal management data consists of items and their values (parameters).

The items are power, remaining battery level, radio wave condition, and lid open/closed state, with the values power = ON, remaining battery level = 45%, radio wave condition = good, and lid = closed.

FIG. 5 is an example of the address book data managed by the address book program of FIG. 3. The address book data consists of an ID for numbering the records, a name, a reading of the name, a mail address, and a telephone number. In the example shown in FIG. 5, ID = 001, name = 田中一郎 (Tanaka Ichiro), reading = いっちゃん (Icchan), mail address = ichiro@xxx.com, and telephone number = 090-2222-3333.

FIG. 6 is an example of the outgoing/incoming call history data managed by the outgoing/incoming call history program shown in FIG. 3. The outgoing/incoming call history data consists of an ID for numbering the records, the direction of the call (outgoing or incoming), the date and time of the call, and the telephone number of the other party.

In the example shown in FIG. 6, there are three records:
ID = 001: direction = outgoing, date and time of call = December 18, 2003, 1:02:34, other party = 090-2222-3333;
ID = 002: direction = incoming, date and time of call = December 18, 2003, 1:04:34, other party = 090-2222-3333;
ID = 003: direction = incoming, date and time of call = December 18, 2003, 2:04:34, other party = 090-2222-3333.

FIG. 7 is an example of the GPS data managed by the GPS program of FIG. 3. The GPS data holds the data acquired the last time the GPS program ran. In the example shown in FIG. 7, the GPS data consists of item names and their values. The items are the measurement date and time, latitude, longitude, and address of the current location, with the values measurement date and time = December 18, 2003, 1:00:34, latitude = N35°51.475, longitude = E139°51.475, and address of the current location = 1-1-1 Shiba, Minato-ku, Tokyo.

FIG. 8 is an example of the infrared data managed by the infrared program of FIG. 3. The infrared data holds the history of communication with infrared ports outside the terminal. In the example shown in FIG. 8, the infrared data consists of an ID for numbering the records, a communication date and time, a communication destination ID, a communication command, and parameters accompanying the command. There are four records:
ID = 001: communication date and time = December 18, 2003, 1:00:34 and 10 ms, communication destination = IDxxxxxx, communication command = connection request;
ID = 002: communication date and time = December 18, 2003, 1:00:34 and 30 ms, communication destination = IDxxxxxx, communication command = connection request response;
ID = 003: communication date and time = December 18, 2003, 1:00:34 and 50 ms, communication destination = IDxxxxxx, communication command = data read request, parameters = "data name = ファイル1.txt, file type = txt";
ID = 004: communication date and time = December 18, 2003, 1:00:34 and 0 ms, communication destination = Idxxxxxx, communication command = data read response, parameters = "data name = ファイル1.txt, file type = txt".

The example shown in FIG. 8 means the following:
at the time indicated by the record with ID = 001, the communication terminal sent a connection request to the communication destination indicated by that record;
at the time indicated by the record with ID = 002, the connection with the communication destination was established;
at the time indicated by the record with ID = 003, a request was sent to read ファイル1.txt, a file in text format;
at the time indicated by the record with ID = 004, the file ファイル1.txt was sent to the communication terminal.

Next, the operation of the speech processing language information creation means 103 of the communication terminal 100 in one example of the present invention is described with reference to FIGS. 9 to 11.

The speech processing language information creation means 103 creates the speech language information used by the speech processing means 102. The speech processing means 102 performs speech recognition and speech synthesis.

The speech language information for speech recognition consists of a word dictionary made up of word strings and their readings, a grammar based on a finite language network, and a language model based on a probabilistic statistical model.

The speech language information for speech synthesis is a dictionary made up of word strings and their readings.

As one example of creating the word dictionary for speech recognition, morphological analysis is performed using the format of the target program or data, readings are assigned to words of specific parts of speech, and those words are registered in the word dictionary.

For this purpose, a reading assignment rule is defined for each part of speech, and readings are assigned according to that rule.

For stored data whose structure is known, structural analysis is performed in advance, the result is used to classify the data by type, and readings are assigned for each resulting class.

Data for which the former approach, morphological analysis, is effective includes Web page data managed by a Web browser and mail data managed by a mailer.

Data for which the latter approach, structural analysis, is effective includes address book data and terminal state data.

For an item analyzed as a person's name, if the same data contains corresponding reading information, that information takes priority; if there is no reading, one is generated by the kana-kanji conversion function.

For items analyzed as telephone numbers or mail addresses, readings that include the name, such as 「田中さんの電話」 ("Tanaka-san's telephone") and 「田中さんのメールアドレス」 ("Tanaka-san's mail address"), are registered in addition to the digit-by-digit or letter-by-letter readings.

For nouns that denote terminal functions or components, such as 「電池」 (battery) and 「電源」 (power), and for nouns, adjectives, and verbs that denote their states, readings of synonyms, pronunciation variants, and abbreviations are stored in a database in advance, and the matching entries are registered in the dictionary from that database.

As one example of creating the grammar for speech recognition, a grammar template is prepared in advance for each target program. The grammar template defines the structure of the grammar, that is, the network of word strings, and the method of generating the word strings.

The speech processing language information creation means 103 inserts data according to the grammar template to create the grammar.

As one example of generating the dictionary for speech synthesis, morphological analysis is performed using the format of the target program or data, as for the speech recognition word dictionary; a reading assignment rule is defined for each part of speech; and readings are registered according to that rule.

For stored data whose structure is known, structural analysis is performed in advance, the result is used to classify the data by type, a reading assignment rule is prepared for each resulting class, and readings are assigned using that rule.

Examples of reading assignment rules are as follows.

For an item analyzed as a person's name, the rule is that if the same data contains corresponding reading information, that information takes priority; if there is no reading, one is generated by the kana-kanji conversion function.

For an item analyzed as a mail address, the rule uses the morphological analysis result. If the owner can be determined from the context surrounding the mail address, a reading that includes the owner's name is assigned, such as 「田中さんのメールアドレス」 ("Tanaka-san's mail address"); if the owner cannot be determined, a reading such as 「このメールアドレス」 ("this mail address") is assigned.

Whether to use the ordinary letter and digit readings, which require no dictionary registration, or the above reading assignment rule is defined in the program.

For an item analyzed as a telephone number, the rule uses the morphological analysis result. If the owner can be determined from the context surrounding the telephone number, a reading that includes the owner's name is assigned, for example 「田中さんの電話番号」 ("Tanaka-san's telephone number"). If the owner cannot be determined, a reading such as 「この電話番号」 ("this telephone number") is assigned. In addition, the type of telephone, such as fixed-line telephone, mobile phone, IP phone, or toll-free number, is determined from the number prefix; for example, a number beginning with 0120 is read as 「このフリーダイアル」 ("this toll-free number").

Other methods include using the readings of digits and symbols defined by the speech synthesizer, and replacing the hyphens, minus signs, and parentheses that separate the groups of a telephone number with 「の」 (no). With the latter method, for example, 「０４４−９９９−１２３４」 is read as 「ぜろよんよんのきゅーきゅーきゅーのいちにーさんしー」. Which reading assignment rule to adopt is defined in the program.
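The separator replacement rule can be sketched as below. This is a minimal sketch under stated assumptions: the digit readings in `DIGIT_READINGS`, the set of separator characters, and the function name `read_phone_number` are chosen for illustration. Note that the sketch uses a single reading per digit (4 is always 「よん」), so its output for the final digit differs from the example above, which reads the last 4 as 「しー」.

```python
import re
import unicodedata

# Assumption: one fixed kana reading per digit.
DIGIT_READINGS = {
    "0": "ぜろ", "1": "いち", "2": "にー", "3": "さん", "4": "よん",
    "5": "ごー", "6": "ろく", "7": "なな", "8": "はち", "9": "きゅー",
}

# Hyphen-minus, minus sign (U+2212), hyphen (U+2010), and parentheses.
_SEPARATORS = re.compile(r"[-\u2212\u2010()]+")


def read_phone_number(number: str) -> str:
    """Read each digit group aloud and join the groups with 'の'."""
    # NFKC folds full-width digits and parentheses to their ASCII forms.
    normalized = unicodedata.normalize("NFKC", number)
    groups = [g for g in _SEPARATORS.split(normalized) if g]
    return "の".join(
        "".join(DIGIT_READINGS.get(ch, ch) for ch in group)
        for group in groups
    )
```

Characters without an entry in `DIGIT_READINGS` are passed through unchanged, which keeps the sketch total for unexpected input.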

For nouns that denote terminal functions or components, such as 「電池」 (battery) and 「電源」 (power), the readings are stored in a database in advance, and the matching entries are registered in the dictionary from that database.

FIG. 9 is a diagram for explaining the generation of the word dictionary for speech recognition. As an example, FIG. 9 shows a dictionary generated from the terminal information data shown in FIG. 4. Because the terminal information data consists of nouns that denote terminal functions or components and nouns that denote their states, the matching entries are registered in the dictionary from the database prepared in advance.

The figure shows a dictionary generated with the item names of the terminal information data, 「電源」 (power) and 「電池の残量」 (remaining battery level), as the notations.

For the notation 「電源」, two readings are registered for speech recognition: 「でんげん」 (dengen), the kana reading of the word, and 「ぱわー」 (pawaa), a synonym of 電源.

For the notation 「電池の残量」, in addition to 「でんちのざんりょう」, the kana reading of the phrase, the following readings are registered by combining 「ばってりー」 (a synonym of 電池), 「ばってり」 (a pronunciation variant of 「ばってりー」), and 「のこり」 (a synonym of 残量): 「でんちののこり」, 「ばってりーのざんりょう」, 「ばってりのざんりょう」, 「ばってりーののこり」, and 「ばってりののこり」.

The abbreviated forms of 「電池の残量」, namely 「でんち」 and 「ざんりょう」, and their synonyms and pronunciation variants 「ばってりー」, 「ばってり」, and 「のこり」 are registered as well.
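The readings listed for 「電池の残量」 are the combinations of the variants of each component plus the variants on their own. A minimal sketch of that enumeration follows; the `VARIANTS` table stands in for the synonym and pronunciation variant database mentioned above, and its layout and the function name are assumptions.

```python
from itertools import product
from typing import Dict, List

# Assumption: the database is modeled as component -> list of readings,
# covering the kana reading, synonyms, and pronunciation variants.
VARIANTS: Dict[str, List[str]] = {
    "電池": ["でんち", "ばってりー", "ばってり"],
    "残量": ["ざんりょう", "のこり"],
}


def readings_for_compound(head: str, tail: str, particle: str = "の") -> List[str]:
    """List recognition readings for a notation of the form head + の + tail."""
    # Full forms: every head variant combined with every tail variant.
    full = [h + particle + t for h, t in product(VARIANTS[head], VARIANTS[tail])]
    # Abbreviated forms: each component variant on its own.
    short = VARIANTS[head] + VARIANTS[tail]
    return full + short


readings = readings_for_compound("電池", "残量")
```

For this table the result has eleven entries: the six full forms and the five abbreviated forms named in the text.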

FIG. 10 is a diagram for explaining the generation of the dictionary for speech synthesis. As an example, FIG. 10 shows a dictionary generated from the address book data shown in FIG. 5, in which speech synthesis readings are generated for three notations: 「田中一郎」, 「ichiro@xxx.com」, and 「090-2222-3333」.

For the notation 「田中一郎」, because it is a person's name, the reading 「いっちゃん」 given in the address book data is registered.

For the notation 「ichiro@xxx.com」, because it is a mail address, the reading 「いっちゃんのめーるあどれす」 ("Icchan's mail address"), which includes the reading of the owner, is registered.

For the notation 「090-2222-3333」, because it is a telephone number, the reading 「いっちゃんのけいたいでんわ」 ("Icchan's mobile phone"), which includes the reading of the owner and the result of analyzing the number prefix, is registered.
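The three readings of FIG. 10 can be reproduced with a small rule table. The sketch below is illustrative only: `AddressEntry`, `PHONE_TYPES`, and `synthesis_readings` are hypothetical names, the hiragana reading for the 0120 prefix and the fallback 「でんわばんごう」 are assumptions, and kana-kanji conversion for names without a reading is not implemented (the name itself is used as a placeholder).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AddressEntry:
    """One address book record as in FIG. 5 (field names are assumptions)."""
    name: str
    reading: Optional[str]
    mail: str
    phone: str


# Prefix -> spoken phone type. 090 as mobile follows the FIG. 10 example;
# the reading chosen for 0120 (toll-free) is an assumption.
PHONE_TYPES: List[Tuple[str, str]] = [
    ("0120", "ふりーだいある"),
    ("090", "けいたいでんわ"),
]


def phone_type(number: str) -> str:
    """Return the spoken phone type for a number, judged by its prefix."""
    for prefix, spoken in PHONE_TYPES:
        if number.startswith(prefix):
            return spoken
    return "でんわばんごう"  # assumed fallback: plain "telephone number"


def synthesis_readings(entry: AddressEntry) -> Dict[str, str]:
    """Map each notation in the entry to its speech synthesis reading."""
    # With a known owner the reading is prefixed with "<owner>の",
    # otherwise with "この" ("this").
    owner = entry.reading + "の" if entry.reading else "この"
    return {
        # Placeholder: a real system would apply kana-kanji conversion here.
        entry.name: entry.reading or entry.name,
        entry.mail: owner + "めーるあどれす",
        entry.phone: owner + phone_type(entry.phone),
    }


entry = AddressEntry("田中一郎", "いっちゃん", "ichiro@xxx.com", "090-2222-3333")
result = synthesis_readings(entry)
```

Keeping the prefix table as ordered pairs lets longer prefixes such as 0120 be checked before shorter ones if the table grows.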

図１１は、音声認識用文法の生成を説明するための図である。図１１には、一例として図５で示したアドレス帳データを用いてアドレス帳に対して操作を行うための文法が示されている。 FIG. 11 is a diagram for explaining generation of a speech recognition grammar. FIG. 11 shows a grammar for operating the address book using the address book data shown in FIG. 5 as an example.

図１１（ａ）では、文法生成のための文法テンプレートの一例を示している。文法テンプレートでは、文法の構造や登録する単語列の定義の方法を定義する。この例では、アドレス帳の［読み］、［助詞］、アドレス帳の［項目］、［助詞］、［操作］の各項目を順々に発声するような言い回しを定義している。また、このうち、［読み］と、［項目］に登録する単語列はアドレス帳から参照する。また、それ以外の［助詞］と、［操作］は、登録する単語列を予め定義しておく。 FIG. 11A shows an example of a grammar template used to generate a grammar. The grammar template defines the structure of the grammar and how the word strings to be registered are defined. In this example, it defines a phrasing in which the items [reading] from the address book, [particle], [item] from the address book, [particle], and [operation] are uttered in that order. Of these, the word strings registered in [reading] and [item] are taken from the address book. For the remaining items, [particle] and [operation], the word strings to be registered are defined in advance.

図１１（ｂ）では、図１１（ａ）の文法テンプレートに従って登録される単語列の例を示したものである。各単語列は、表記と読みで構成されている。アドレス帳から参照された［読み］と［項目名］、予め登録された［助詞］と［操作］の４項目の単語列を定義している。［読み］には、“いっちゃん”の１単語が、［項目名］には、“名前”、“読み”、“メールアドレス”、“電話番号”の４単語が、［助詞］には、“の”、“を”の２単語が、［操作］には、“消去する”、“消す”、“編集する”、“編集します”、“編集したい”の５単語が登録されている。また、各単語には読みと表記が与えられている。 FIG. 11B shows an example of the word strings registered according to the grammar template of FIG. 11A. Each word string consists of a notation and a reading. Word strings are defined for four items: [reading] and [item name], which are taken from the address book, and [particle] and [operation], which are registered in advance. [Reading] contains one word, "いっちゃん" (Icchan); [item name] contains four words, "名前" (name), "読み" (reading), "メールアドレス" (mail address), and "電話番号" (telephone number); [particle] contains two words, "の" and "を"; and [operation] contains five words, "消去する", "消す", "編集する", "編集します", and "編集したい" (variants of "erase" and "edit"). Each word is given a reading and a notation.

図１１（ｃ）では、図１１（ａ）および図１１（ｂ）により、テンプレートに単語列を挿入した結果、認識可能となる発声の一例とテンプレートとの対応を示したものである。 FIG. 11C shows the correspondence between the template and an example of an utterance that becomes recognizable once the word strings of FIG. 11B are inserted into the template of FIG. 11A.

以上のように定義することにより、「いっちゃんの電話番号を編集したい」という発声を、認識するための文法が生成される。 With these definitions, a grammar is generated that recognizes the utterance "いっちゃんの電話番号を編集したい" ("I want to edit Icchan's telephone number").
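
Filling the template of FIG. 11 with word strings can be sketched as follows. The data structures and the function names `build_slots` and `sentences` are illustrative assumptions; only the slot names, the word lists, and the address book entry come from the description above.

```python
from itertools import product

# Slot order defined by the grammar template (FIG. 11A).
TEMPLATE = ["読み", "助詞", "項目", "助詞", "操作"]

# Address book entry of FIG. 5, as quoted in this description.
address_book = [{
    "名前": "田中一郎",
    "読み": "いっちゃん",
    "メールアドレス": "ichiro@xxx.com",
    "電話番号": "090-2222-3333",
}]

def build_slots(book):
    """Fill [読み] and [項目] from the address book; the rest is predefined."""
    return {
        "読み": [entry["読み"] for entry in book],
        "項目": list(book[0].keys()),
        "助詞": ["の", "を"],
        "操作": ["消去する", "消す", "編集する", "編集します", "編集したい"],
    }

def sentences(template, slots):
    """Enumerate every utterance the filled template accepts."""
    return {"".join(words) for words in product(*(slots[s] for s in template))}

accepted = sentences(TEMPLATE, build_slots(address_book))
```

Note that such a simple slot grammar also accepts unnatural orderings of particles (for example "を" before the item name); the sketch makes no attempt to filter them.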

次に、図１２乃至図１４を用いて、図１に示した実施例のサーバ２００の音声処理言語情報作成手段２０２の動作を説明する。 Next, the operation of the speech processing language information creating unit 202 of the server 200 according to the embodiment shown in FIG. 1 will be described with reference to FIGS.

音声処理言語情報作成手段１０３では、通信端末１００側からデータの生成方法と、通信端末１００で管理しているデータを、サーバ２００に送信する。これらを用いて、サーバ２００内に格納しているデータにより、音声言語情報を生成する。 The voice processing language information creation unit 103 transmits a data generation method and data managed by the communication terminal 100 to the server 200 from the communication terminal 100 side. Using these, the speech language information is generated from the data stored in the server 200.

サーバ２００の音声処理言語情報作成手段２０２で生成する音声言語情報は、通信端末１００の音声合成情報作成手段１０３で生成されるものと同様に、音声認識用としては単語列とその読みから構成される単語辞書、有限言語ネットワークによる文法、確率統計モデルに基づく言語モデルを想定している。 The speech language information generated by the speech processing language information creation unit 202 of the server 200 is composed of a word string and its reading for speech recognition, similar to that generated by the speech synthesis information creation unit 103 of the communication terminal 100. A language model based on a word dictionary, a grammar by a finite language network, and a probability statistical model is assumed.

音声合成用の音声言語情報も、単語列と、その読みから構成される辞書を想定している。また、音声認識用の単語辞書および文法および言語モデルの作成方法は、サーバ２００内に格納しているデータを用いる他は同様とする。音声合成用辞書に関しても同様とする。 The speech language information for speech synthesis is also assumed to be a dictionary composed of word strings and their readings. The method for creating a word dictionary and grammar and language model for speech recognition is the same except that data stored in the server 200 is used. The same applies to the speech synthesis dictionary.

以下、図１２乃至図１４を参照して、音声言語情報の生成について説明する。 Hereinafter, the generation of the speech language information will be described with reference to FIGS.

図１２と図１３では、サーバ２００での音声言語情報生成の一例として、サーバ２００に格納されたデータを用いて、音声認識用の単語辞書を生成する例を示している。 FIGS. 12 and 13 show an example of generating a speech recognition word dictionary using data stored in the server 200 as an example of generating the speech language information in the server 200.

ここでは、サーバ２００側から、通信端末１００の電話番号を送信し、サーバ２００で管理している利用履歴を基に、サービスを利用するための単語辞書を作成する方法を説明する。 Here, a method of transmitting a telephone number of the communication terminal 100 from the server 200 side and creating a word dictionary for using the service based on the usage history managed by the server 200 will be described.

図１２は、サーバ２００で管理しているデータの一例である利用履歴データを示す図である。このデータは、データを番号付けするためのID、利用者の電話番号、サービス名、利用回数、最終利用日、利用金額合計、サービス主体者の利用希望の度合いを示すキャンペーンより構成されている。図１２では、通信端末１００よりサーバ２００に送付された通信端末の電話番号が、０９０−XXXXYYYYの場合、該当するものが、ID=001〜003の3件があることを示している。 FIG. 12 is a diagram illustrating usage history data that is an example of data managed by the server 200. This data is composed of a campaign indicating the ID for numbering the data, the telephone number of the user, the service name, the number of times of use, the date of last use, the total amount of use, and the degree of use desired by the service subject. In FIG. 12, when the telephone number of the communication terminal sent from the communication terminal 100 to the server 200 is 090-XXXXYYYY, it indicates that there are three cases of ID = 001 to 003.

また、それぞれの利用については、
ID=001では、AA美術館のチケット購入が5回利用があり、最終利用日時が2003年03月12日19時30分25秒で利用金額が5500円でサービス主体者の利用希望の度合いは低であり、
ID=002では、中華料理店Bが8回利用があり、最終利用日時が2003年03月12日22時30分25秒で利用金額が25800円でサービス主体者の利用希望の度合いは中であり、
ID=003では、エステサロンCは利用履歴がなくサービス主体者の利用希望の度合いは高である、
ことを示している。
The individual records show the following:
ID=001: ticket purchase for AA Museum has been used 5 times, the last use was at 19:30:25 on March 12, 2003, the amount spent is 5,500 yen, and the degree to which the service provider wants the service to be used is low;
ID=002: Chinese restaurant B has been used 8 times, the last use was at 22:30:25 on March 12, 2003, the amount spent is 25,800 yen, and the degree to which the service provider wants the service to be used is medium;
ID=003: beauty salon C has no usage history, and the degree to which the service provider wants the service to be used is high.

図１３は、図１２を用いて作成された音声認識用の単語辞書である。図１３に示す例では、利用履歴から利用可能なサービスを列挙し、登録されたサービス名やサービスを利用するための単語を辞書に登録する。サービスを利用するための単語列は、予めサーバ２００内に定義しておく。 FIG. 13 is a word dictionary for speech recognition created using FIG. In the example shown in FIG. 13, services that can be used are listed from the usage history, and registered service names and words for using the services are registered in the dictionary. A word string for using the service is defined in the server 200 in advance.

登録する単語列に登録可能な数や登録順などの制約がある場合には、利用回数、最終の利用日、金額合計やサービス主体者の利用希望の度合いにより制約を行う。 When there are restrictions on the number of words that can be registered and the order of registration in the word string to be registered, restrictions are made according to the number of uses, the last use date, the total amount, and the degree of use desired by the service subject.

例えば、図１２のID＝001に登録されているAA美術館チケット購入のサービスについての場合、利用するための言い回しとして、AA美術館、美術館、チケット購入などを音声認識用の単語辞書に登録する。 For example, for the AA Museum ticket purchase service registered under ID=001 in FIG. 12, phrases for using the service, such as "AA美術館" (AA Museum), "美術館" (museum), and "チケット購入" (ticket purchase), are registered in the word dictionary for speech recognition.
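
The server-side dictionary generation from the usage history, including a limit on the number of registered services, might look like the sketch below. The ranking key (campaign level first, then number of uses), the field names, and the phrase lists for IDs 002 and 003 are assumptions made for illustration; the description only says that such attributes are used to apply the constraint.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    phone: str
    service: str
    count: int
    campaign: str  # degree to which the provider wants the service used

CAMPAIGN_RANK = {"low": 0, "mid": 1, "high": 2}

# Records corresponding to ID=001..003 of FIG. 12.
HISTORY = [
    Usage("090-XXXXYYYY", "AA美術館チケット購入", 5, "low"),
    Usage("090-XXXXYYYY", "中華料理店B", 8, "mid"),
    Usage("090-XXXXYYYY", "エステサロンC", 0, "high"),
]

# Phrases predefined on the server; the last two lists are hypothetical.
PHRASES = {
    "AA美術館チケット購入": ["AA美術館", "美術館", "チケット購入"],
    "中華料理店B": ["中華料理店B"],
    "エステサロンC": ["エステサロンC"],
}

def build_dictionary(phone, max_services=None):
    """Collect recognition words for one user's services, best-ranked first."""
    records = [u for u in HISTORY if u.phone == phone]
    records.sort(key=lambda u: (CAMPAIGN_RANK[u.campaign], u.count), reverse=True)
    words = []
    for usage in records[:max_services]:
        words.extend(PHRASES[usage.service])
    return words
```

With `max_services=2`, this ranking keeps beauty salon C and Chinese restaurant B and drops the museum phrases; a different weighting of the attributes would be equally consistent with the description.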

サーバ２００の音声処理言語情報作成手段２０２が生成する音声言語情報としては、複数のプログラムを連携動作させるための音声言語情報を作成することも想定している。 As the speech language information generated by the speech processing language information creation unit 202 of the server 200, it is assumed that speech language information for operating a plurality of programs in cooperation is created.

図１４を参照して、複数のプログラムを連携動作させるための音声言語情報の生成について説明する。図１４に示す例では、複合施設の情報案内プログラムと、複合施設内の店舗のサービス予約プログラムと、を連携させて動作させるための音声認識用文法の生成について説明する。 With reference to FIG. 14, generation of spoken language information for operating a plurality of programs in cooperation with each other will be described. In the example shown in FIG. 14, generation of a speech recognition grammar for operating a complex facility information guide program and a store service reservation program in the complex facility in cooperation with each other will be described.

図１４（ａ）および図１４（ｂ）は、すでに生成された音声認識用文法を示している。 FIG. 14A and FIG. 14B show already generated speech recognition grammars.

図１４（ａ）は、複合施設の情報案内プログラムを音声で制御させるための文法である。この文法は、複合施設内の店舗のカテゴリーを特定するための［カテゴリー］と、［店舗名］と、案内の内容を指定するための［項目］と、［コマンド］の４つの単語列より構成される。 FIG. 14A is a grammar for controlling the information guidance program of the complex facility by voice. This grammar consists of four word strings: [Category] for specifying the category of the store in the complex, [Store name], [Item] for specifying the content of the guidance, and [Command]. Is done.

単語列［カテゴリー］には“レストラン”、“エステティック”、“美術館”の３つの単語が、
［店舗施設名］には、“店Ａ”、“美術館Ａ”、“エステティックサロンＡ”の４つの単語が、
［項目］には、“場所”、“営業時間”、“定休日”、“予算”の４つの単語が、
［コマンド］には、“どこ”、“いくら”、“いつ”の３つの単語が登録されている。
In the word string [category], three words are registered: "レストラン" (restaurant), "エステティック" (esthetic), and "美術館" (museum).
In [store/facility name], four words are registered: "店Ａ" (store A), "美術館Ａ" (museum A), "エステティックサロンＡ" (esthetic salon A).
In [item], four words are registered: "場所" (location), "営業時間" (business hours), "定休日" (regular holiday), and "予算" (budget).
In [command], three words are registered: "どこ" (where), "いくら" (how much), and "いつ" (when).

なお、各単語列に登録された単語はそれぞれ表記と読みをもつ。 Each word registered in each word string has a notation and a reading.

この文法では、「エステティックのエステサロンＡの定休日はいつ」が認識可能である。 With this grammar, the utterance "エステティックのエステサロンＡの定休日はいつ" ("When is the regular holiday of esthetic salon A?") can be recognized.

図１４（ｂ）は、複合施設内の店舗のサービス予約プログラムで、
サービスを特定するための［サービス名］と、
サービスを受けたい時間を指定するための［時間］と、
予約に関する操作を指定するための［予約コマンド］
より構成されている。
FIG. 14B shows the grammar for the service reservation program of a store in the complex facility. It consists of:
[service name], for identifying the service;
[time], for specifying the time at which the service is desired; and
[reservation command], for specifying the reservation operation.

単語列［サービス名］には“サービスＡを”、“サービスＢを”の２つの単語が、
［時間］には“１０：００に”、“１１：００に”の２つの単語が、
［予約コマンド］には“予約”、“取り消し”、“確認”の３つの単語が登録されている。なお、各単語列に登録された単語は、それぞれ表記と読みをもつ。
In the word string [service name], two words are registered: "サービスＡを" (service A) and "サービスＢを" (service B).
In [time], two words are registered: "１０：００に" (at 10:00) and "１１：００に" (at 11:00).
In [reservation command], three words are registered: "予約" (reserve), "取り消し" (cancel), and "確認" (confirm). Each word registered in each word string has a notation and a reading.

この文法では、「サービスＡを１０：００に予約」が認識可能である。 In this grammar, “reservation of service A at 10:00” can be recognized.

図１４（ｃ）は、図１４（ａ）および図１４（ｂ）の文法を融合して、新たな文法を生成するための文法テンプレートである。 FIG. 14 (c) shows a grammar template for generating a new grammar by merging the grammars of FIG. 14 (a) and FIG. 14 (b).

このテンプレートでは、
［カテゴリー］と、［店舗施設名］と、［項目］と、［コマンド］を順々に指定する文法と、
［カテゴリー］と、［店舗施設名］と、［サービス名］と、［時間］と、［予約コマンド］と、を順々に指定する文法の２つを定義している。
This template defines two grammars:
one that specifies [category], [store/facility name], [item], and [command] in that order; and
one that specifies [category], [store/facility name], [service name], [time], and [reservation command] in that order.

［カテゴリー］と、［店舗施設名］と、［項目］と、［コマンド］の項目に定義する単語列は、図１４（ａ）の文法より挿入する。 The word strings defined in the items [category], [store facility name], [item], and [command] are inserted from the grammar of FIG.

また、［サービス名］と、［時間］と、［予約コマンド］の各項目に定義する単語列は、図１４（ｂ）より挿入する。 Also, the word strings defined in the items of [service name], [time], and [reservation command] are inserted from FIG.

図１４（ｄ）は、図１４（ｃ）の文法テンプレートを用いて作成された文法を示す図である。これにより、
「エステティックのエステティックサロンＡの定休日はいつ」と、
「エステティックのエステティックサロンＡのサービスＡを１０：００に予約」
の双方の文法が認識可能となる。
FIG. 14D shows the grammar created using the grammar template of FIG. 14C. As a result, both of the following utterances can be recognized:
"エステティックのエステティックサロンＡの定休日はいつ" ("When is the regular holiday of esthetic salon A in the esthetic category?"), and
"エステティックのエステティックサロンＡのサービスＡを１０：００に予約" ("Reserve service A at esthetic salon A in the esthetic category at 10:00").
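
The fusion of the two grammars of FIG. 14 through a template can be sketched as follows. Utterances are modeled as tuples of slot words, and connecting particles are left out because the description does not say how they are represented in the grammar; `merge_slots` and `accepts` are hypothetical helper names.

```python
# Word lists of the information guidance grammar (FIG. 14A).
GUIDE = {
    "カテゴリー": ["レストラン", "エステティック", "美術館"],
    "店舗施設名": ["店Ａ", "美術館Ａ", "エステティックサロンＡ"],
    "項目": ["場所", "営業時間", "定休日", "予算"],
    "コマンド": ["どこ", "いくら", "いつ"],
}

# Word lists of the service reservation grammar (FIG. 14B).
RESERVE = {
    "サービス名": ["サービスＡを", "サービスＢを"],
    "時間": ["１０：００に", "１１：００に"],
    "予約コマンド": ["予約", "取り消し", "確認"],
}

# The two slot sequences defined by the template of FIG. 14C.
TEMPLATES = [
    ["カテゴリー", "店舗施設名", "項目", "コマンド"],
    ["カテゴリー", "店舗施設名", "サービス名", "時間", "予約コマンド"],
]

def merge_slots(*grammars):
    """Combine the word lists of several grammars, dropping duplicates."""
    merged = {}
    for grammar in grammars:
        for slot, words in grammar.items():
            for word in words:
                if word not in merged.setdefault(slot, []):
                    merged[slot].append(word)
    return merged

def accepts(words, templates, slots):
    """True if the word sequence matches any template slot by slot."""
    return any(
        len(words) == len(t) and all(w in slots[s] for w, s in zip(words, t))
        for t in templates
    )

SLOTS = merge_slots(GUIDE, RESERVE)
```

In this sketch the reservation phrase is accepted only when it is preceded by a category and a store name, which mirrors how the merged template ties the reservation program to the guidance program.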

次に、図１５乃至図１９を参照して、図１の第２プログラムおよびデータ格納手段１０５に格納されるプログラムについて説明する。 Next, the program stored in the second program and data storage means 105 in FIG. 1 will be described with reference to FIGS.

第２プログラムおよびデータ格納手段１０５に格納されるプログラムでは、第１プログラムおよびデータ格納手段１０１に格納されたプログラムおよびデータや、音声処理手段１０２や、サーバ２００との連携動作の方法を定義している。 In the program stored in the second program and data storage means 105, the program and data stored in the first program and data storage means 101, the voice processing means 102, and the method of cooperative operation with the server 200 are defined. Yes.

この場合、第１プログラムおよびデータ格納手段１０１に格納されるプログラム（通信端末１００に予め格納されるプログラム）としては、発信や着信を管理する発信着信プログラム、現在地を割り出すためのGPSプログラムや、赤外線通信を行うための赤外線通信プログラムなどがある。 In this case, the programs stored in the first program and data storage means 101 (programs stored in advance in the communication terminal 100) include an outgoing/incoming call program for managing outgoing and incoming calls, a GPS program for determining the current location, and an infrared communication program for performing infrared communication.

以下では、図１５乃至図１９を参照して、具体的なプログラムについて説明する。 Hereinafter, a specific program will be described with reference to FIGS. 15 to 19.

図１５、図１６を参照して、第２プログラムおよびデータ格納手段１０５に格納されるプログラムの一例として、第１プログラムおよびデータ格納手段１０１に格納された発信着信プログラムと、音声処理手段１０２とを連携して動作させるプログラムの動作を説明する。具体的には前述のプログラムでは、電話を着信すると、発信着信履歴や、端末状態に応じたメッセージを合成音で出力する。 Referring to FIGS. 15 and 16, as an example of a program stored in second program and data storage means 105, an outgoing / incoming program stored in first program and data storage means 101, and voice processing means 102, The operation of the program that operates in cooperation will be described. Specifically, in the above-described program, when a call is received, a call reception history and a message corresponding to the terminal state are output with synthesized sound.

図１５は、第２プログラムおよびデータ格納手段１０５に格納されるプログラムの処理手順を説明するためのフローチャートである。図１５に示すように、制御手段１０６は、電話を着信すると、まずアドレス帳より着信した電話番号に該当するデータを検索する（ステップＳ１３０１）。着信番号が登録されている場合（ステップＳ１３０２の「ある」分岐）、名前を一時的保存する（ステップＳ１３０３）。次に、発信着信履歴を検索し、該当する電話番号がある場合（ステップＳ１３０４の「ある」分岐）、発信回数と受信回数を一時的に保存する（ステップＳ１３０５）。次に、端末状態を検索し、バッテリー状態を一時的に保存する（ステップＳ１３０６）。 FIG. 15 is a flowchart for explaining the processing procedure of the program stored in the second program and data storage means 105. As shown in FIG. 15, when a call is received, the control means 106 first searches for data corresponding to the received telephone number from the address book (step S1301). If the incoming call number is registered ("Yes" branch in step S1302), the name is temporarily saved (step S1303). Next, the outgoing call / incoming history is searched, and if there is a corresponding telephone number ("Yes" branch in step S1304), the number of outgoing calls and the number of received times are temporarily stored (step S1305). Next, the terminal state is searched, and the battery state is temporarily saved (step S1306).

さらに、ステップＳ１３０３、ステップＳ１３０５、およびステップＳ１３０６での保存状態を受けて、出力する文章を作成し（ステップＳ１３０７）、音声合成で出力する（ステップＳ１３０８）。 Further, in response to the storage state in step S1303, step S1305, and step S1306, a sentence to be output is created (step S1307) and output by speech synthesis (step S1308).

図１６は、ステップＳ１３０７における発声文を作成するためのルールの一例を説明するための図である。まず、ステップＳ１３０２において、アドレス帳に該当するデータがない場合、「でんわだよ」とする。アドレス帳に該当するデータがある場合、アドレス帳に登録されている読みを用いて「田中一郎さんから電話だよ」のように作成する。 FIG. 16 is a diagram for explaining an example of the rules for creating the utterance text in step S1307. First, if no matching data is found in the address book in step S1302, the text is "でんわだよ" ("You have a call"). If matching data is found, the text is created using the reading registered in the address book, as in "田中一郎さんから電話だよ" ("It's a call from Ichiro Tanaka").

次に、ステップＳ１３０４において、発信回数および着信回数がともに０〜４回の場合、該当する文章は作成しない。 Next, in step S1304, if the number of outgoing calls and the number of incoming calls are both 0 to 4, no corresponding sentence is created.

発信回数が５回以上で、着信回数が０〜４回のときは、「お待ちどう様、やっと、かかってきてよかったね」とする。さらに、着信回数が５回以上のときは、「今日は、よくかかってくるね」とする。 When the number of outgoing calls is 5 or more and the number of incoming calls is 0 to 4, it is said that "I'm glad you've finally received it". Furthermore, when the number of incoming calls is 5 or more, it is assumed that “Today often comes”.

さらに、ステップＳ１３０６において、電池の残量が、４０％以上の場合は、該当する文章はない。４０％未満の場合は、「どうでもいいけど、電源につないでくれよー。電池の残量がきれるよ。」とする。 In step S1306, if the remaining battery level is 40% or more, there is no corresponding sentence. If it's less than 40%, say "I don't care, but please connect it to the power source. The battery will run out."

例えば、ステップＳ１３０２において該当するデータがあり、着信回数が５回以上で、電池の残量が４０％未満の場合、「田中さんから電話だよ。今日は、よくかかってくるね。どうでもいいけど、電源につないでくれよー。電池の残量がきれるよ。」となり、ステップＳ１３０２において該当するデータがなく、着信回数および発信回数がなく、電池の残量が４０％以上の場合には、「でんわだよ」となる。 For example, if there is applicable data in step S1302, the number of incoming calls is 5 times or more, and the remaining battery level is less than 40%, “You are calling from Mr. Tanaka. But please connect it to the power supply. The remaining battery power is exhausted. ”In step S1302, there is no corresponding data, there are no incoming and outgoing calls, and the remaining battery capacity is 40% or more. “It ’s a phone.”
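
The rules of FIG. 16 can be written down as a short function. This is a sketch: the function name, its parameters, and the choice to return a list of sentences rather than one joined string are assumptions; the thresholds and message texts are those given above.

```python
def build_message(caller, outgoing, incoming, battery_pct):
    """Assemble the sentences to be synthesized when a call arrives (step S1307)."""
    # Rule for step S1302: is the caller in the address book?
    parts = [f"{caller}さんから電話だよ" if caller else "でんわだよ"]
    # Rule for step S1304: outgoing / incoming call counts.
    if incoming >= 5:
        parts.append("今日は、よくかかってくるね")
    elif outgoing >= 5:
        parts.append("お待ちどう様、やっと、かかってきてよかったね")
    # Rule for step S1306: remaining battery level.
    if battery_pct < 40:
        parts.append("どうでもいいけど、電源につないでくれよー。電池の残量がきれるよ。")
    return parts

busy_day = build_message("田中", outgoing=0, incoming=5, battery_pct=30)
unknown = build_message(None, outgoing=0, incoming=0, battery_pct=80)
```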

次に、図１７には、第２プログラムおよびデータ格納手段１０５に格納されたプログラムの一例が示されている。図１７を参照して、第１プログラムおよびデータ格納手段１０１に格納されたＧＰＳプログラムと音声処理とを連携動作させるプログラムの動作について説明する。図１７は、ＧＰＳプログラムとサーバと音声処理とを連携して動作させる例を示す図であり、具体的には通信端末の現在の位置から複合施設を割り出し複合施設サービスを音声検索するためのプログラムのフローチャートである。 Next, FIG. 17 shows an example of a program stored in the second program and data storage means 105. With reference to FIG. 17, the operation of a program that makes the GPS program stored in the first program and data storage means 101 cooperate with voice processing will be described. FIG. 17 shows an example in which the GPS program, the server, and voice processing operate in cooperation; specifically, it is a flowchart of a program that identifies a complex facility from the current position of the communication terminal and searches the services of that facility by voice.

まず、通信端末１００内の第１プログラムおよびデータ格納手段１０１に格納されているＧＰＳプログラムを起動し（ステップＳ１５０１）、現在地を計測する（ステップＳ１５０２）。送受信手段１０４より現在地のデータをサーバに送信し、サーバ２００の送受信手段２０１ではこれを受信する（ステップＳ１５０３）。 First, the GPS program stored in the first program and data storage means 101 in the communication terminal 100 is activated (step S1501), and the current location is measured (step S1502). The transmission/reception means 104 transmits the current location data to the server, and the transmission/reception means 201 of the server 200 receives it (step S1503).

サーバ２００の音声処理言語情報作成手段２０２では、ステップＳ１５０３で受信した現在地データとサーバ内で管理している複合施設のサービスリストとにより辞書を作成する（ステップＳ１５０４）。 The voice processing language information creation unit 202 of the server 200 creates a dictionary from the current location data received in step S1503 and the service list of the complex facility managed in the server (step S1504).

辞書を、サーバ２００の送受信手段２０１により通信端末１００に送信し、通信端末１００の送受信手段１０４で、辞書を受信する（ステップＳ１５０５）。 The dictionary is transmitted to the communication terminal 100 by the transmission / reception means 201 of the server 200, and the transmission / reception means 104 of the communication terminal 100 receives the dictionary (step S1505).

次に、ステップＳ１５０５において受信した音声処理辞書を用いて、音声処理手段１０２の音声認識を起動する（ステップＳ１５０６）。 Next, using the speech processing dictionary received in step S1505, speech recognition of the speech processing means 102 is activated (step S1506).

さらに、通信端末１００内の第１プログラムおよびデータ格納手段１０１に格納されているブラウザを起動し（ステップＳ１５０７）、ステップＳ１５０２において取得した現在地の住所から複合施設サービスページを表示する（ステップＳ１５０８）。 Further, the browser stored in the first program and data storage means 101 in the communication terminal 100 is activated (step S1507), and the complex facility service page is displayed from the address of the current location acquired in step S1502 (step S1508).

ブラウザの表示中にユーザの発声があった場合（ステップＳ１５０９）、音声処理手段１０２で音声認識され（ステップＳ１５１０）、ページ中のリンクへのジャンプや文字入力などのコマンド処理が行われる（ステップＳ１５１１）。コマンドが終了コマンドであった場合、終了する。 When a user utters during the display of the browser (step S1509), the voice processing means 102 recognizes the voice (step S1510), and command processing such as jumping to a link in the page and inputting characters is performed (step S1511). ). If the command is an end command, exit.

次に、図１８および図１９を参照して、図１の第２プログラムおよびデータ格納手段１０５に格納されるプログラムの一例として、第１プログラムおよびデータ格納手段１０１に格納された赤外線プログラムと、音声処理と、サーバ２００とを連携動作させるプログラムの動作について説明する。 Next, referring to FIG. 18 and FIG. 19, as an example of the program stored in the second program and data storage means 105 in FIG. 1, the infrared program stored in the first program and data storage means 101 and the sound The process and the operation of the program that causes the server 200 to cooperate with each other will be described.

図１８および図１９は、赤外線プログラムと、音声処理手段１０２とを連動させて使用するプログラムの一例を示す図であり、具体的には映画館やショッピングモール等の複合型施設でのサービスを音声認識や合成により受けるためのプログラムの一例である。 FIG. 18 and FIG. 19 are diagrams showing an example of a program that uses the infrared program and the voice processing means 102 in conjunction with each other. Specifically, the service at a complex facility such as a movie theater or a shopping mall is voiced. It is an example of the program for receiving by recognition or composition.

このプログラムでは、複合施設の壁や柱や家具などの什器に、赤外線ポートを埋め込んだり、赤外線ポート専用の端末などのインフラを設けておくことが前提とされている。 In this program, it is assumed that infrared ports will be embedded in the fixtures such as walls, pillars, and furniture of the complex, and infrastructure such as terminals dedicated to infrared ports will be provided.

図１８は、サービスの形態を説明するための図であり、複合施設内の赤外線ポート毎のサービス一覧を表している。この例では、各赤外線固有のポートIDと、各ポートの店舗名や設置フロアなど固定の位置情報と、什器や端末を特定するための情報と、ポートより受けられるサービスにより表している。 FIG. 18 is a diagram for explaining a form of service, and represents a service list for each infrared port in the complex. In this example, it is represented by a port ID unique to each infrared ray, fixed position information such as a store name and an installation floor of each port, information for specifying a fixture or a terminal, and a service received from the port.

例えば、ポートID=001の赤外線ポートは、美術館１階のＡ２柱に設置の展示Ａに設置されている。このポートからは、展示品情報提供サービスの呼び出しが可動である。具体的には、展示Ａに関連した情報を音声合成により読み上げを行う。 For example, the infrared port with port ID = 001 is installed in the exhibition A installed on the A2 pillar on the first floor of the museum. From this port, it is possible to call an exhibit information providing service. Specifically, information related to exhibition A is read out by voice synthesis.

また、ポートID=100の赤外線ポートは、西館６Ｆ中華料理店ＢにあるＴ１００テーブルに設置されている。このポートからは、メニュー説明注文のサービスが可動である。具体的には、メニューの紹介を音声合成により読み上げたり、音声認識により注文を行う。 The infrared port with port ID=100 is installed on table T100 in Chinese restaurant B on the 6th floor of the West Building. From this port, the menu explanation and ordering service is available. Specifically, the menu is read out by speech synthesis, and orders are placed by speech recognition.

図１９は、赤外線プログラムと、音声処理とを連動させるプログラムの動作手順を示す図である。この例では、動作に必要なプログラムや音声処理に必要な辞書は、予め生成し、第１プログラムおよびデータ格納手段１０１に格納しておく。 FIG. 19 is a diagram illustrating an operation procedure of a program that links an infrared program and audio processing. In this example, a program necessary for operation and a dictionary necessary for voice processing are generated in advance and stored in the first program and data storage means 101.

まず、第１プログラムおよびデータ格納手段１０１に格納されている赤外線プログラムを起動し（ステップＳ１７０１）、現在地の赤外線ポートＩＤを取得する（ステップＳ１７０２）。取得したポートＩＤに従ったサービスプログラムに切り替える（ステップＳ１７０３）。 First, the infrared program stored in the first program and data storage means 101 is activated (step S1701), and the infrared port ID of the current location is acquired (step S1702). The service program is switched according to the acquired port ID (step S1703).

サービスプログラムでは、音声認識または音声合成が設定されているので、音声処理機能を起動する（ステップＳ１７０４）。 In the service program, since speech recognition or speech synthesis is set, the speech processing function is activated (step S1704).

起動されたプログラムに対して、ボタン入力や発声などのユーザ入力ある場合（ステップＳ１７０５）は、音声処理を実行する（ステップＳ１７０６）。具体的に、この処理では、音声合成による出力や、音声認識の結果をコマンドに変換して実行する。さらに、音声処理の結果が終了の場合には（ステップＳ１７０７）、終了する。 If there is a user input such as button input or utterance for the activated program (step S1705), voice processing is executed (step S1706). Specifically, in this process, an output by speech synthesis or a speech recognition result is converted into a command and executed. Further, when the result of the voice processing is finished (step S1707), the process is finished.
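
The flow of FIG. 19 (steps S1703 to S1707) can be sketched as a dispatch on the port ID followed by an input loop. The service table reuses the two ports described for FIG. 18; the function name, the log format, and the end command "終了" are hypothetical.

```python
# Port ID -> service, following the two examples given for FIG. 18.
SERVICES = {
    "001": "展示品情報提供",
    "100": "メニュー説明注文",
}

def run_service(port_id, user_inputs, end_command="終了"):
    """Switch to the service for the port, then handle inputs until the end command."""
    service = SERVICES[port_id]            # S1703: switch service program
    log = [f"start:{service}"]             # S1704: start voice processing
    for text in user_inputs:               # S1705: user input arrives
        if text == end_command:            # S1707: end requested
            log.append("end")
            break
        log.append(f"{service}:{text}")    # S1706: execute voice processing
    return log

session = run_service("100", ["メニュー", "終了", "注文"])
```

Inputs that arrive after the end command are ignored, which is one plausible reading of the termination check in step S1707.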

次に、本発明の第２の実施例を図面を参照して説明する。図２０は、この実施例の構成を示す図である。本実施例では、音声処理言語情報作成手段１１０３が、第１プログラムおよびデータ格納手段１１０１と第２プログラムおよびデータ格納手段１１０５の双方を参照して、音声言語情報を生成する点が、前記第１の実施例とは相違している。 Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 20 is a diagram showing the configuration of this embodiment. In the present embodiment, the speech processing language information creating unit 1103 refers to both the first program and data storage unit 1101 and the second program and data storage unit 1105 to generate speech language information. This embodiment is different from the first embodiment.

図２２は、音声処理言語情報作成手段１１０３の動作を説明するための図である。 FIG. 22 is a diagram for explaining the operation of the voice processing language information creation unit 1103.

図２２（ａ）は、音声処理言語情報作成手段１１０３が第１プログラムおよびデータ格納手段１１０１より読み込んだプログラムおよびデータを説明するための図であり、この例では、メーラとメーラの管理するデータであるアドレス帳データとメールデータとを用いるものとする。 FIG. 22A is a diagram for explaining the program and data that the speech processing language information creation means 1103 reads from the first program and data storage means 1101. In this example, a mailer and the data it manages, namely address book data and mail data, are used.

図２２（ｂ）は、音声処理言語情報作成手段１１０３が、第２プログラムおよびデータ格納手段１１０５より読み込んだプログラムおよびデータを説明するための図である。この例では、商品検索プログラムと商品データを用いる。 FIG. 22B is a diagram for explaining the program and data read by the voice processing language information creation unit 1103 from the second program and data storage unit 1105. In this example, a product search program and product data are used.

図２２（ｃ）は、前記商品データの一例を説明するための図であり、商品ＩＤと商品名と商品データファイルとにより構成されている。 FIG. 22C is a diagram for explaining an example of the product data, and includes a product ID, a product name, and a product data file.

図２２（ｄ）は、音声処理言語情報作成手段１１０３で生成する文法のテンプレートである。このテンプレートを用いると、読みと助詞と商品名とメールコマンドより構成される文法が生成される。また、メールコマンドは、メールとコマンド（メール）より構成される。読みはアドレス帳より参照する。また、商品名は商品データより参照する。 FIG. 22D is a grammar template generated by the speech processing language information creation unit 1103. When this template is used, a grammar composed of a reading, a particle, a product name, and a mail command is generated. The mail command is composed of a mail and a command (mail). Refer to the address book for reading. The product name is referenced from the product data.

それぞれのプログラムが管理するデータより参照できない、助詞、メール、コマンド（メール）に登録する単語列は予め与えておく。 Word strings to be registered in particles, mails, and commands (mails) that cannot be referenced from data managed by each program are given in advance.

図２２（ｅ）は、文法テンプレートに、データを参照して文法を生成することを説明するための図で、参照後、「いっちゃんに商品００１をメールで送る」という発声を音声認識するための文法が生成されたことを意味している。 FIG. 22E is a diagram for explaining how a grammar is generated by filling the grammar template with the referenced data. After the data is referenced, a grammar has been generated for recognizing the utterance "いっちゃんに商品００１をメールで送る" ("Send product 001 to Icchan by mail").

以上のように、音声処理言語情報作成手段１１０３で、第１プログラムおよびデータ格納手段１１０１より読み込んだプログラムおよびデータと、第２プログラムおよびデータ格納手段１１０５より読み込んだプログラムおよびデータとを連携させて音声言語情報を生成することにより、端末に固有のプログラムや機種やユーザに固有の情報や端末の状態によってダイナミックに変化する情報と機種に依存することなく作られた汎用的なプログラムやそのデータとを連携させ、音声認識や音声合成といった音声処理で制御することが可能になる。 As described above, the speech processing language information creation means 1103 generates speech language information by linking the program and data read from the first program and data storage means 1101 with the program and data read from the second program and data storage means 1105. This makes it possible to link terminal-specific programs, information specific to the model or user, and information that changes dynamically with the terminal state to general-purpose programs and data created independently of the model, and to control them through voice processing such as speech recognition and speech synthesis.

次に、本発明の第３の実施例を図面を参照して説明する。図２３は、本発明の第３の実施例の構成を示す図である。 Next, a third embodiment of the present invention will be described with reference to the drawings. FIG. 23 is a diagram showing the configuration of the third exemplary embodiment of the present invention.

本実施例では、複数のサーバ２００〜ｎ００より構成されている点と、通信端末２０００や複数のサーバ２００〜ｎ００で生成される音声言語情報を統合するための通信端末２０００が音声処理言語情報統合手段２１０７を備えている点が前記第１の実施例と相違している。 This embodiment differs from the first embodiment in that it comprises a plurality of servers 200 to n00, and in that the communication terminal 2000 includes speech processing language information integration means 2107 for integrating the speech language information generated by the communication terminal 2000 and by the plurality of servers 200 to n00.

図２４は、音声処理言語情報統合手段２１０７の動作を説明するための図である。音声処理言語情報統合手段２１０７は、サーバ２００とサーバｎ００と通信端末２０００のそれぞれで生成された音声言語情報を読み込んで、音声言語情報を生成する。 FIG. 24 is a diagram for explaining the operation of the speech processing language information integration unit 2107. The speech processing language information integration unit 2107 reads speech language information generated by each of the server 200, the server n00, and the communication terminal 2000, and generates speech language information.

図２５（ａ）は、サーバ２００で生成された音声言語情報の一例を示す図であり、映画に関する情報検索を行うための音声認識用文法の構成図と構成図に対応する文法の一例を示している。 FIG. 25A is a diagram illustrating an example of speech language information generated by the server 200, and illustrates a configuration diagram of a speech recognition grammar for searching for information about a movie, and an example of a grammar corresponding to the configuration diagram. ing.

この文法は、映画名に関する項目である[映画]と、映画に関する項目である[項目(200)]と、問い合わせのための項目である[コマンド(200)]より構成されており、それぞれの項目に登録する単語列が定義されている。このように定義することにより、「映画Ａの開始時間を教えて」がこの文法で認識可能となる。 This grammar is composed of [movie] which is an item related to the movie name, [item (200)] which is an item related to the movie, and [command (200)] which is an item for inquiry. A word string to be registered in is defined. By defining in this way, “Tell me the start time of movie A” can be recognized by this grammar.

図２５（ｂ）は、サーバｎ００で生成された音声言語情報の一例で、店舗および施設に関する情報検索を行うための音声認識用文法の構成と対応する文法の一例を示している。 FIG. 25B is an example of speech language information generated by the server n00, and shows an example of a grammar corresponding to the structure of the speech recognition grammar for performing information search regarding stores and facilities.

この文法は、店舗や施設を特定するための項目である［店舗および施設名］と、店舗および施設に関する項目である［項目（ｎ００）］と、問い合わせのための項目である［コマンド（ｎ００）］より構成されており、［店舗および施設名］の［カテゴリ］と［名前］のそれぞれの項目に登録する単語列が定義されている。このように定義することにより、「レストランの店Ａの予算はいくら」がこの文法で認識可能となる。 This grammar is an item for specifying a store or facility [store and facility name], an item related to the store and facility [item (n00)], and an item for inquiries [command (n00). ], And a word string to be registered in each item of [Category] and [Name] of [Store and facility name] is defined. With this definition, “how much is the budget for restaurant A” can be recognized with this grammar.

図２５（ｃ）は、通信端末２０００で生成された音声言語情報の一例で、メール操作を行うための音声認識用文法の構成と対応する文法の一例を示している。 FIG. 25C is an example of speech language information generated by the communication terminal 2000, and shows an example of the grammar corresponding to the configuration of the speech recognition grammar for performing the mail operation.

この文法は、メールのあて先を指定するための項目である［読み］と、メールに関する項目である［メール］と、問い合わせのための項目である［コマンド（メール）］より構成されており、それぞれの項目に登録する単語列が定義されている。このように定義することにより、「いっちゃんにメールを送る」がこの文法で認識可能となる。 This grammar consists of [Read], which is an item for specifying the mail address, [Mail], which is an item related to mail, and [Command (Mail)], which is an item for inquiries. A word string to be registered in the item is defined. By defining in this way, it is possible to recognize "send an email to Icchan" with this grammar.

図２５（ｄ）は、音声処理言語情報統合手段２１０７で音声言語情報を合成するための一例であり、図２５（ａ）から図２５（ｃ）での音声言語情報を合成するための文法テンプレートの構成を示している。この文法テンプレートでは、映画や施設の情報をメールで送信するための文法を想定している。文法は、メールのあて先を指定するための［読み］と、［助詞］と、映画や施設の名前や項目を指定するための［映画および施設情報］と、メールを送信するための［メールコマンド］より構成される。項目［読み］は、通信端末２０００で生成された文法より参照する。また、項目［映画および施設情報］は、［映画情報］と、［助詞］と、［施設情報］と［助詞］より構成される。 FIG. 25D shows an example of how the speech processing language information integration means 2107 combines speech language information; it shows the structure of a grammar template for combining the speech language information of FIGS. 25A to 25C. This grammar template assumes a grammar for sending movie and facility information by mail. The grammar consists of [reading], for specifying the mail destination; [particle]; [movie and facility information], for specifying the names and items of movies and facilities; and [mail command], for sending the mail. The item [reading] is taken from the grammar generated by the communication terminal 2000. The item [movie and facility information] consists of [movie information], [particle], [facility information], and [particle].

Furthermore, [movie information] consists of [movie] and [item (200)], both of which are referenced from the grammar generated by the server 200.

[Facility information] consists of [store and facility name] and [item (n00)], both of which are referenced from the grammar generated by the server n00.

FIG. 25(e) illustrates how a grammar is generated by filling the grammar template with reference to data. By referring to the data, a speech recognition grammar is generated that accepts the utterance "Send Icchan the start time of movie A and the budget of shop A, the restaurant, by mail".
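
The template expansion described for FIGS. 25(d) and 25(e) can be sketched as follows. This is an illustration under stated assumptions rather than the integration means 2107 itself: the dictionary layout, the identifiers such as `terminal_2000` and `expand`, the placement of the particles, and every word string are hypothetical. Only the nesting of the items and the source each item is referenced from follow the description above.

```python
from itertools import product
from typing import Dict, List

# Template structure: each composite item lists its sub-items in order.
TEMPLATE: Dict[str, List[str]] = {
    "sentence": ["reading", "particle_ni", "movie_and_facility_info", "mail_command"],
    "movie_and_facility_info": ["movie_info", "particle_to", "facility_info", "particle_wo"],
    "movie_info": ["movie", "item_200"],
    "facility_info": ["store_and_facility_name", "item_n00"],
}

# Which generated grammar each leaf item is referenced from.
REFERENCES: Dict[str, str] = {
    "reading": "terminal_2000",
    "movie": "server_200",
    "item_200": "server_200",
    "store_and_facility_name": "server_n00",
    "item_n00": "server_n00",
}

# Hypothetical word strings held by each grammar (Japanese, matching the utterance order).
GRAMMARS: Dict[str, Dict[str, List[str]]] = {
    "terminal_2000": {"reading": ["いっちゃん"]},
    "server_200": {"movie": ["映画Aの"], "item_200": ["開始時間"]},
    "server_n00": {"store_and_facility_name": ["レストランの店Aの"], "item_n00": ["予算"]},
    # Items that the template defines itself (assumed here, not stated in the source).
    "template": {
        "particle_ni": ["に"],
        "particle_to": ["と"],
        "particle_wo": ["を"],
        "mail_command": ["メールで送る"],
    },
}


def expand(item: str) -> List[str]:
    """Return every phrase the given template item can produce."""
    if item in TEMPLATE:
        # Composite item: concatenate one phrase from each sub-item, in order.
        parts = [expand(sub) for sub in TEMPLATE[item]]
        return ["".join(combo) for combo in product(*parts)]
    # Leaf item: look up its word strings in the grammar it is referenced from.
    source = REFERENCES.get(item, "template")
    return GRAMMARS[source][item]


phrases = expand("sentence")
```

With one word per item, `phrases` holds the single sentence quoted for FIG. 25(e). Adding a second movie title to the `server_200` grammar would double the number of accepted sentences without any change to the template, which is the point of keeping the references separate from the data.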

As described above, the speech processing language information creating means generates speech language information by combining the speech language information generated by a plurality of servers with the speech language information generated within the communication terminal. This links programs specific to the terminal, information specific to the model or user, and information that changes dynamically with the state of the terminal with general-purpose programs and data created independently of the model, and allows them to be controlled through speech processing such as speech recognition and speech synthesis.

The present invention has been described above with reference to the embodiments, but it is not limited to the configurations of those embodiments and naturally includes the various variations and modifications that a person skilled in the art could make within the scope of the invention.

The present invention can be applied to uses in which speech processing such as speech recognition and speech synthesis on a mobile phone or portable terminal is linked with programs built into the terminal and the data those programs manage. In particular, it can be applied to uses in which data that changes dynamically with the state of the system managed by the terminal, or personal information, is linked with speech processing.

The present invention can also be applied to uses in which speech processing, programs built into the terminal and the data they manage, and a server are linked. Specifically, it can be applied to an information search and guidance service for a complex facility using a mobile phone. Although the above embodiments were described using a portable communication terminal as an example, the invention can of course be applied to any electronic device equipped with a speech processing function.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the first embodiment of the present invention.
FIG. 3 is a diagram showing a specific example of a program and its data stored in the communication terminal of the first embodiment of the present invention.
FIG. 4 is a diagram showing a specific example of a program and its data stored in advance in the communication terminal of the first embodiment of the present invention.
FIG. 5 is a diagram showing a specific example of a program and its data stored in advance in the communication terminal of the first embodiment of the present invention.
FIG. 6 is a diagram showing a specific example of a program and its data stored in advance in the communication terminal of the first embodiment of the present invention.
FIG. 7 is a diagram showing a specific example of a program and its data stored in advance in the communication terminal of the first embodiment of the present invention.
FIG. 8 is a diagram showing a specific example of a program and its data stored in advance in the communication terminal of the first embodiment of the present invention.
FIG. 9 is a diagram showing a specific example of creating speech language information for speech processing in the communication terminal of the first embodiment of the present invention.
FIG. 10 is a diagram showing a specific example of creating speech language information for speech processing in the communication terminal of the first embodiment of the present invention.
FIG. 11 is a diagram showing a specific example of creating speech language information for speech processing in the communication terminal of the first embodiment of the present invention.
FIG. 12 is a diagram showing a specific example of data stored in the server of the first embodiment of the present invention.
FIG. 13 is a diagram showing a specific example of creating speech language information for speech processing in the server of the first embodiment of the present invention.
FIG. 14 is a diagram showing a specific example of creating speech language information for speech processing in the server of the first embodiment of the present invention.
FIG. 15 is a diagram showing an example of the operation of a program downloaded from outside the terminal in the first embodiment of the present invention.
FIG. 16 is a diagram showing an example of the operation of a program downloaded from outside the terminal in the first embodiment of the present invention.
FIG. 17 is a diagram showing an example of the operation of a program downloaded from outside the terminal in the first embodiment of the present invention.
FIG. 18 is a diagram showing an example of the operation of a program downloaded from outside the terminal in the first embodiment of the present invention.
FIG. 19 is a diagram showing an example of the operation of a program downloaded from outside the terminal in the first embodiment of the present invention.
FIG. 20 is a diagram showing the configuration of the second embodiment of the present invention.
FIG. 21 is a flowchart showing the operation of the second embodiment of the present invention.
FIG. 22 is a diagram showing a specific example of creating speech language information for speech processing in the communication terminal of the second embodiment of the present invention.
FIG. 23 is a diagram showing the configuration of the third embodiment of the present invention.
FIG. 24 is a flowchart showing the operation of the third embodiment of the present invention.
FIG. 25 is a diagram showing a specific example of speech processing language information integration in the communication terminal of the third embodiment of the present invention.

Explanation of symbols

100, 1000, 2000: Communication terminal
101, 1101, 2101: First program and data storage means
102, 1102, 2102: Speech processing means
103, 1103, 2103: Speech processing language information creating means
104, 1104, 2104: Transmitting and receiving means
105, 1105, 2105: Second program and data storage means
106, 1106, 2106: Control means
2107: Speech processing language information integration means
200, n00: Server
201, n01: Transmitting and receiving means
202, n02: Speech processing language information creating means

Claims

A speech processing unit that performs speech recognition and / or speech synthesis processing;
A first storage unit that stores at least a program and / or data for realizing a predetermined function on a communication terminal;
A second storage unit that stores at least a program and/or data that is input to the communication terminal from outside the communication terminal and that defines how the program and/or data stored in the first storage unit is linked with the speech processing by the speech processing unit;
A control unit that, using the program and/or data input from outside the communication terminal and the program and/or data stored in the first storage unit, performs control so that the speech processing by the speech processing unit and the function based on the program and/or data stored in the first storage unit operate in cooperation;
A communication terminal characterized by comprising:

The communication terminal according to claim 1, wherein the control unit starts the program stored in the second storage unit, and the started program calls the program or uses the data stored in the first storage unit, so that the speech processing by the speech processing unit and the program and/or data stored in the first storage unit operate in cooperation.

A speech processing unit that performs speech recognition and / or speech synthesis processing;
A first storage unit that stores at least information held by the communication terminal;
A second storage unit that stores at least a program that is input to the communication terminal from the outside of the communication terminal and that defines a procedure for creating language information for speech processing;
A control unit that controls activation of a program that defines a procedure for creating language information for voice processing, which is stored in the second storage unit;
A communication terminal comprising the above units, wherein,
when activated, the program that defines the procedure for creating language information for speech processing creates, using at least the information stored in the first storage unit, the language information used for speech processing in the speech processing unit, and
the speech processing unit performs the speech processing using the created language information.

The communication terminal according to claim 3, wherein
the second storage unit stores a program that is input to the communication terminal from outside the communication terminal and that defines how the program and/or data stored in the first storage unit is linked with the speech processing by the speech processing unit,
the first storage unit stores a program, or the program and its data, that is executed on the communication terminal and realizes a predetermined function, and
the control unit starts the program stored in the second storage unit and causes the speech processing by the speech processing unit and the program and/or data stored in the first storage unit to operate in cooperation.

The program and / or data input to the communication terminal from outside the communication terminal and stored in the second storage unit are transferred from a server to which the communication terminal communicates. The communication terminal as described in any one of thru | or 4.

The communication terminal transmits information necessary for creating language information for voice processing to the server,
6. The communication terminal receives the language information for voice processing created on the server side that has received the language information for voice processing, and performs voice processing in the voice processing unit. The communication terminal described in 1.

The communication terminal according to any one of claims 1 to 4, wherein
the first storage unit stores a program and/or data for generating language information for speech processing including at least one of a dictionary, a grammar, and a language model used in the speech processing unit, and
the communication terminal includes a speech processing language information creation unit that creates language information for speech processing using at least the program and/or data stored in the first storage unit.

The communication terminal according to claim 7, wherein
the second storage unit stores a program and/or data that is input from outside the communication terminal and that generates language information including at least one of a dictionary, a grammar, and a language model used in the speech processing unit, and
the speech processing language information creation unit creates language information for speech processing using the programs and/or data stored in the first storage unit and the second storage unit.

The communication terminal according to claim 7, further comprising a speech processing language information integration unit that creates language information by combining the speech processing language information created by the speech processing language information creation unit of the communication terminal with one or more pieces of speech processing language information input from outside the communication terminal.

The language information for speech processing input from the outside of the communication terminal to the communication terminal is created by a server to which the communication terminal is connected for communication and transferred from the server to the communication terminal. The communication terminal according to claim 9.

A server apparatus communicatively connected to the communication terminal according to any one of claims 1 to 10.

A speech processing method for a communication terminal that has a first storage unit storing at least a program and/or data that realizes a predetermined function on the communication terminal, and that performs speech processing of speech recognition and/or speech synthesis, the method comprising:
a step of inputting, from outside the communication terminal, at least a program that defines how the program and/or data stored in the first storage unit is linked with the speech processing, and storing the input program in a second storage unit; and
a step of performing, using the program and/or data input from outside the communication terminal and the program and/or data stored in the first storage unit, control so that the speech processing by the speech processing unit and the function based on the program and/or data stored in the first storage unit operate in cooperation.

The speech processing method according to claim 12, further comprising a step of activating the program stored in the second storage unit, the activated program calling the program or using the data stored in the first storage unit, so that the speech processing by the speech processing unit and the program and/or data stored in the first storage unit operate in cooperation.

A speech processing method for a communication terminal that has a first storage unit storing at least information held by the communication terminal and that performs speech processing of speech recognition and/or speech synthesis, the method comprising:
a step of inputting, from outside the communication terminal, at least a program that defines a procedure for creating language information for speech processing, and storing the input program in a second storage unit;
a step of starting the program, stored in the second storage unit, that defines the procedure for creating language information for speech processing;
a step in which the started program creates, using at least the information stored in the first storage unit, the language information used for speech processing in the speech processing unit; and
a step of performing the speech processing using the created language information.

The speech processing method according to claim 14, wherein
the second storage unit stores a program that is input to the communication terminal from outside the communication terminal and that defines how the program and/or data stored in the first storage unit is linked with the speech processing,
the first storage unit stores a program, or the program and its data, that is executed on the communication terminal and realizes a predetermined function, and
the method includes a step of activating the program stored in the second storage unit and causing the speech processing and the program and/or data stored in the first storage unit to operate in cooperation.

13. The program and / or data input to the communication terminal from outside the communication terminal and stored in the second storage unit are transferred from a server to which the communication terminal communicates. The voice processing method according to any one of 1 to 15.

The communication terminal transmits information necessary for creating language information for voice processing to the server;
The server receives the information and creates language information for the voice processing;
The communication terminal receives the language information for voice processing created on the server side, and performs voice processing in the voice processing unit;
The voice processing method according to claim 16, further comprising:

The first storage unit stores a program and / or data for generating language information for speech processing including at least one of a dictionary, a grammar, and a language model used in the speech processing unit. ,
16. The voice processing method according to claim 12, further comprising a step of creating the language information using at least a program and / or data stored in the first storage unit. .

A program for generating language information for speech processing including at least one of a dictionary, a grammar, and a language model, which is input from the outside of the communication terminal and used in the speech processing unit, in the second storage unit, and Data is stored,
16. The method according to claim 12, further comprising a step of creating the language information using a program and / or data stored in the first storage unit and the second storage unit. The voice processing method described in 1.

The speech processing method according to claim 19, further comprising a step of creating language information by combining the speech processing language information created by the speech processing language information creation unit of the communication terminal with one or more pieces of speech processing language information input from outside the communication terminal.

The speech processing language information input to the communication terminal from the outside of the communication terminal is created by a server to which the communication terminal communicates and is transferred from the server to the communication terminal. The voice processing method according to claim 20.

A communication terminal, and a server for communication connection with the communication terminal,
The communication terminal includes voice processing means for performing recognition and / or synthesis voice processing;
Means for creating language information used by the voice processing means based on programs and / or data stored in advance in the communication terminal;
Means for operating the program and / or data stored in advance in the communication terminal and the voice processing using the language information in cooperation with the program and / or data downloaded from the server;
A communication system characterized by comprising:

A communication terminal, and a server for communication connection with the communication terminal;
With
The communication terminal is
A first storage unit for storing a program and / or data stored and held in advance by the communication terminal;
A speech processing unit that performs speech processing of at least one of speech recognition and speech synthesis;
Voice processing language information creating means for creating language information used in the voice processing unit according to the program and / or data stored in the first storage unit;
Means for obtaining a program and / or data from the server;
A second storage unit for storing the program and / or data acquired from the server;
Control means for controlling the speech processing unit and the program and/or data in the first storage unit in cooperation with each other, based on the program and/or data stored in the second storage unit;
Including
The server is
Means for receiving information transmitted from the communication terminal and transmitting the program and / or data generated by the server to the communication terminal;
From the data stored on the server side based on the data transmitted from the communication terminal, voice processing language information creating means for creating language information used in the voice processing unit;
A communication system comprising:

A first storage unit that stores programs and / or data stored in advance in the communication terminal;
A speech processing unit that performs speech processing of at least one of speech recognition and speech synthesis;
Speech processing language information creating means for creating language information used in the speech processing unit according to the program and / or data stored in the first storage unit;
Means for receiving programs and / or data from outside the communication terminal;
A second storage unit for storing the program and/or data received from outside the communication terminal;
Control means for linking the speech processing unit and the program and/or data in the first storage unit, based on the program and/or data stored in the second storage unit;
A communication terminal comprising:

A server communicatively connected to the communication terminal according to claim 24,
Means for receiving information transmitted from the communication terminal and transmitting a program and / or data generated by a server to the communication terminal;
Voice processing language information creating means for creating language information used in the voice processing unit from data stored on the server side based on data transmitted from the communication terminal;
A server characterized by including:

The communication terminal according to claim 24, wherein
the first storage unit stores at least a state of the communication terminal, and
the speech processing language information creating means creates the language information based on data that varies according to the state of the communication terminal.

26. The server of claim 25, wherein
The voice processing language information creation means receives from the communication terminal data that varies according to the state of the communication terminal stored in the first storage unit of the communication terminal, and the received data; A server characterized in that language information is created based on data stored and managed on the server side, and the created language information is transmitted to the communication terminal.

A speech processing method comprising:
a process in which a communication terminal generates language information used in speech processing, based on a program and/or data transmitted to the communication terminal from outside the communication terminal and a program and/or data stored in advance in the communication terminal; and
a process in which the communication terminal, using the language information for the speech processing, causes the program and/or data stored in advance in the communication terminal and the speech processing to operate in cooperation.

In the computer that constitutes the communication terminal,
A process of storing a program and / or data stored in advance in the communication terminal;
A process of receiving a program and / or data generated outside the communication terminal;
Processing to store the received program and / or data;
Processing for performing at least one of speech recognition and speech synthesis;
Processing for generating speech processing language information for performing speech processing based on a program and / or data stored in advance in the communication terminal;
A process of linking audio processing with a program and / or data stored in advance in the communication terminal by the received program and / or data;
A program for causing the computer to execute the above processes.

A communication terminal and a server;
The communication terminal is
A first storage unit for storing programs and / or data stored and held in advance in the communication terminal;
A speech processing unit that performs speech processing of at least one of speech recognition and speech synthesis;
Means for obtaining a program and / or data from the server;
A second storage unit for storing a program and / or data acquired from the server;
Voice processing language information creating means for creating language information used in the voice processing unit according to the program and / or data stored in the first storage unit and the second storage unit;
Control means for linking the audio processing unit and the program and / or data in the first storage unit using the program and / or data stored in the second storage unit,
Including
Means for the server to receive information from the communication terminal side and transmit the program and / or data generated by the server to the communication terminal side;
Based on the data transmitted from the communication terminal, voice processing language information creating means for creating a dictionary for voice processing from data stored on the server side;
A communication system comprising:

A first storage unit for storing programs and / or data stored and held in advance in the communication terminal;
A speech processing unit that performs speech processing of at least one of speech recognition and speech synthesis;
Means for acquiring a program and / or data from outside the communication terminal;
A second storage unit for storing a program and / or data acquired from the outside of the communication terminal;
Voice processing language information creating means for creating language information used in the voice processing unit according to the program and / or data stored in the first storage unit and the second storage unit;
Control means for linking the sound processing unit and the program and / or data stored in the first storage unit using the program and / or data stored in the second storage unit ,
A communication terminal comprising:

Means for receiving the information transmitted from the communication terminal according to claim 29 and transmitting the program and / or data generated on the server side to the communication terminal;
Voice processing language information creating means for creating a dictionary used in the voice processing unit from data transmitted from the communication terminal and data stored on the server side;
A server characterized by including:

A speech processing method comprising:
a process in which a communication terminal obtains a program and/or data downloaded from outside the communication terminal;
a process in which the communication terminal generates language information for speech processing, used in speech processing, based on the downloaded program and/or data and the program and/or data stored in advance in the communication terminal; and
a process in which the communication terminal, using the language information for speech processing, causes the program stored in advance in the communication terminal, the program and/or data downloaded from outside the communication terminal, and the speech processing to operate in cooperation.

In the computer that constitutes the communication terminal,
Processing for acquiring programs and / or data from outside the communication terminal;
Processing for generating language information for speech processing used in speech processing based on the acquired program and / or data and the program and / or data stored in advance in the communication terminal;
Processing for performing at least one of speech recognition and speech synthesis;
Using the language information for voice processing, a program stored in advance in the communication terminal, a program and / or data downloaded from the outside of the communication terminal, and a process for operating the voice processing in cooperation with each other;
A program for causing the computer to execute the above processes.

Including a communication terminal and one or more servers,
The server receives information transmitted from the communication terminal, and transmits a program and / or data generated on the server side to the communication terminal;
Voice processing language information creating means for creating a dictionary for voice processing from data transmitted from the communication terminal and data stored on the server side;
Including
The communication terminal is
A first storage unit for storing programs and / or data stored and held in advance in the communication terminal;
A speech processing unit that performs speech processing of at least one of speech recognition and speech synthesis;
Means for obtaining a program and / or data from the server;
A second storage unit for storing a program and / or data acquired from the server;
Speech processing language information creating means for creating language information used in the speech processing unit in accordance with programs and / or data stored in both the first storage unit and the second storage unit;
Control means for linking the speech processing unit and the program and/or data stored in the first storage unit, using the program and/or data stored in the second storage unit;
Speech processing language information integrating means for combining speech processing language information created by the speech processing language information creating means and speech processing language information created by the speech processing language information creating means of the server;
A communication system comprising:

A speech processing language for creating a speech processing dictionary from transmission / reception means for transmitting a program and / or data generated on the server side to the communication terminal, and data transmitted from the communication terminal and data stored on the server side A communication terminal that is communicatively connected to a server including the information creating means;
A first storage unit for storing programs and / or data stored and held in advance in the communication terminal;
A speech processing unit that performs at least one of speech recognition and speech synthesis;
Means for obtaining a program and / or data from the server;
A second storage unit for storing a program and / or data acquired from the server;
Speech processing language information creating means for creating language information used in the speech processing unit in accordance with a program and / or data stored in both the first storage unit and the second storage unit;
Control means for linking the speech processing unit with the program and / or data of the first storage unit, using the acquired program and / or data; and
Speech processing language information integration means for combining speech processing language information created by the speech processing language information creation means and speech processing language information created by the speech processing language information creation means of the server;
A communication terminal comprising:

A step in which the communication terminal integrates speech processing language information generated, in the communication terminal and / or in the one or more servers, using at least one of the program and / or data downloaded from one or more servers and the program and / or data stored in advance in the communication terminal; and
A step in which the communication terminal uses the speech processing language information to link the program and / or data stored in advance in the communication terminal, the program and / or data downloaded from the server, and the speech processing;
A speech processing method characterized by comprising:
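
The two steps of the method can be sketched as follows, under several simplifying assumptions: language information is a plain list of recognizable phrases, the downloaded data is a mapping from a spoken command to the name of a function stored in advance, and the speech recognizer is omitted so that the recognized phrase is passed in as text. The names `create_language_info`, `integrate`, and `link` are hypothetical and do not appear in the source.

```python
from typing import Callable, Dict, List


def create_language_info(names: List[str], commands: Dict[str, str]) -> List[str]:
    """Create recognizable phrases from pre-stored names and downloaded commands."""
    return [f"{command} {name}" for command in commands for name in names]


def integrate(terminal_phrases: List[str], server_phrases: List[str]) -> List[str]:
    """Integrate both phrase lists, dropping duplicates and keeping order."""
    return list(dict.fromkeys(terminal_phrases + server_phrases))


def link(phrase: str, vocabulary: List[str], commands: Dict[str, str],
         functions: Dict[str, Callable[[str], str]]) -> str:
    """Run the pre-stored function that the downloaded data ties to the phrase."""
    if phrase not in vocabulary:
        raise ValueError(f"phrase not in language information: {phrase!r}")
    command, _, argument = phrase.partition(" ")
    return functions[commands[command]](argument)


# Stored in advance in the terminal (hypothetical address book and function).
names = ["Suzuki", "Tanaka"]
functions = {"dial": lambda name: f"dialing {name}"}
# Downloaded from a server (hypothetical): spoken command -> function name.
commands = {"call": "dial"}

terminal_phrases = create_language_info(names, commands)
server_phrases = ["call Suzuki", "call operator"]  # assumed server-side result
vocabulary = integrate(terminal_phrases, server_phrases)
result = link("call Suzuki", vocabulary, commands, functions)
```

In this sketch the downloaded mapping is what ties the recognized phrase to a function that already exists on the terminal, which is the linkage the second step describes.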

In the computer that constitutes the communication terminal,
Processing for storing a program and / or data stored in advance in the communication terminal in the first storage unit;
Processing to receive programs and / or data from one or more servers outside the communication terminal;
A process of storing the received program and / or data in a second storage unit;
Processing to perform at least one of speech recognition and speech synthesis;
Processing to generate, in the communication terminal, speech processing language information for performing speech processing based on the program and / or data stored in the second storage unit and the program and / or data stored in advance in the first storage unit;
Processing for integrating speech processing language information generated in the communication terminal or in the server;
A process of linking the speech processing with the program and / or data stored in advance in the first storage unit, using the program and / or data of the second storage unit;
A program for causing the computer to execute the above processing.

A first processing unit that performs predetermined processing,
A first storage unit that stores at least a program and / or data that is executed on the communication terminal to realize a predetermined function;
A second storage unit that stores at least a program and / or data that is input to the communication terminal from outside the communication terminal and that defines how the program and / or data stored in the first storage unit is linked with the processing by the first processing unit;
A control unit that, using the program and / or data stored in the second storage unit and further using the program and / or data stored in the first storage unit, performs control for causing the function based on the program and / or data stored in the first storage unit and the processing performed by the first processing unit to operate in cooperation with each other;
A communication terminal characterized by comprising:
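
One possible reading of the control unit is sketched below. It assumes that the first storage unit holds named functions, that the second storage unit holds an externally supplied table mapping a processing result to a function name, and that the first processing unit is any callable that turns raw input into a result. The class name `ControlUnit` and all data in the example are hypothetical.

```python
from typing import Callable, Dict, Optional


class ControlUnit:
    """Links the first processing unit with functions in the first storage unit."""

    def __init__(self, first_storage: Dict[str, Callable[[], str]],
                 second_storage: Dict[str, str],
                 first_processing_unit: Callable[[str], str]) -> None:
        self.first_storage = first_storage            # functions stored in advance
        self.second_storage = second_storage          # linkage table supplied from outside
        self.first_processing_unit = first_processing_unit

    def handle(self, raw_input: str) -> Optional[str]:
        """Process the input, then run the function the linkage table names."""
        processed = self.first_processing_unit(raw_input)
        function_name = self.second_storage.get(processed)
        if function_name is None:
            return None                               # no linkage defined for this result
        return self.first_storage[function_name]()


control = ControlUnit(
    first_storage={"open_mailer": lambda: "mailer opened"},
    second_storage={"mail": "open_mailer"},
    first_processing_unit=lambda text: text.strip().lower(),  # stand-in for recognition
)
opened = control.handle("  MAIL ")
ignored = control.handle("music")
```

Because the linkage table lives in the second storage unit, replacing it changes which pre-stored function responds to a given result without modifying the function itself.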

40. The communication terminal according to claim 39, wherein the first processing unit performs speech recognition and / or speech synthesis processing.

A server apparatus that is communicatively connected to the communication terminal according to claim 39 or 40 and transfers, to the communication terminal, the program and / or data to be stored in the second storage unit of the communication terminal.

A first processing unit that performs predetermined processing,
A first storage unit that stores at least a program and / or data for realizing a predetermined function on the electronic device;
A second storage unit that stores at least a program and / or data that is input to the electronic device from outside the electronic device and that defines how the program and / or data stored in the first storage unit is linked with the processing by the first processing unit;
A control unit that, using the program and / or data stored in the second storage unit and further using the program and / or data stored in the first storage unit, performs control for causing the function based on the program and / or data stored in the first storage unit and the processing performed by the first processing unit to operate in cooperation with each other;
An electronic device comprising:

43. The electronic device according to claim 42, wherein the first processing unit performs speech recognition and / or speech synthesis processing.