JP2020088818A

JP2020088818A - Call control system

Info

Publication number: JP2020088818A
Application number: JP2018225618A
Authority: JP
Inventors: Kazue Mikami; Yuma Igarashi; Atsushi Sato
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-04
Anticipated expiration: 2038-11-30
Also published as: JP7112949B2

Abstract

Problem: To make the text of a call's content match between the calling side and the called side. Solution: A call control system according to one embodiment can execute a speech-to-text service. The call control system includes a control unit that, when both the caller and the callee are users of the speech-to-text service, causes one of the calling-side media processing device corresponding to the calling terminal and the called-side media processing device corresponding to the called terminal to function as a common media processing device. The common media processing device connects to a speech recognition engine. It obtains calling-side text by inputting the calling-side voice to the speech recognition engine and transmits the calling-side text toward both the calling terminal and the called terminal. It likewise obtains called-side text by inputting the called-side voice to the speech recognition engine and transmits the called-side text toward both the calling terminal and the called terminal. Selected drawing: FIG. 2

Description

One aspect of the present disclosure relates to a call control system.

Techniques are known that convert the content of a call carried between terminals into text and display that text on at least one of the terminals. For example, Patent Document 1 describes a telephone system that performs speech recognition on a voice signal input from a first terminal, generates reading information for the recognition result, and displays at least the reading information on a second terminal, which is the call partner of the first terminal.

Patent Document 1: JP 2008-66866 A

The telephone system described above converts one speaker's utterance into text and transmits that text to the other speaker's telephone, so it performs text conversion in one direction only. One conceivable way to let both speakers see one speaker's utterance is to install a speech recognition server on both the calling side and the called side. However, if the connection to the speech recognition engine differs between the calling side and the called side, the recognition results may differ, and the text representing a single utterance may therefore differ between the two sides. It is therefore desirable to make the text of the call content match between the calling side and the called side.

A call control system according to one aspect of the present disclosure can execute a speech-to-text service that converts a conversation carried between a calling terminal and a called terminal into text. The call control system includes a control unit that, when both the caller using the calling terminal and the callee using the called terminal are users of the speech-to-text service, causes one of the calling-side media processing device corresponding to the calling terminal and the called-side media processing device corresponding to the called terminal to function as a common media processing device. The common media processing device connects to a speech recognition engine that converts the voice of the caller or the callee into text. The common media processing device obtains calling-side text by inputting the caller's calling-side voice, transmitted from the calling terminal, to the speech recognition engine, and transmits the calling-side text toward both the calling terminal and the called terminal. The common media processing device likewise obtains called-side text by inputting the callee's called-side voice, transmitted from the called terminal, to the speech recognition engine, and transmits the called-side text toward both the calling terminal and the called terminal.

In this aspect, when both the caller and the callee are users of the speech recognition service, the voices of both the caller and the callee are converted into text through the common media processing device, and that text is transmitted to both the calling terminal and the called terminal. Because a common media processing device is used for both the calling side and the called side, the text of the call content can be made to match between the two sides.
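The idea in the paragraph above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `Terminal` and `CommonMediaProcessor`, the method `on_voice`, and the stand-in recognizer `fake_engine` are all hypothetical. The point it shows is that each utterance is recognized once and the single result is delivered to both terminals, so the two transcripts cannot diverge.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Terminal:
    """A calling or called terminal that displays received text (hypothetical)."""
    name: str
    transcript: List[str] = field(default_factory=list)

    def show(self, speaker: str, text: str) -> None:
        self.transcript.append(f"{speaker}: {text}")


class CommonMediaProcessor:
    """One media processing device that handles the voices of both parties."""

    def __init__(self, recognize: Callable[[bytes], str],
                 caller: Terminal, callee: Terminal) -> None:
        self._recognize = recognize          # stand-in for the speech recognition engine
        self._terminals = (caller, callee)

    def on_voice(self, speaker: str, audio: bytes) -> str:
        text = self._recognize(audio)        # one recognition per utterance
        for terminal in self._terminals:     # the same text goes to both terminals
            terminal.show(speaker, text)
        return text


def fake_engine(audio: bytes) -> str:
    """Placeholder recognizer; a real engine would process audio samples."""
    return audio.decode("utf-8").upper()


caller = Terminal("calling terminal")
callee = Terminal("called terminal")
mce = CommonMediaProcessor(fake_engine, caller, callee)
mce.on_voice("caller", b"hello")
mce.on_voice("callee", b"hi there")
```

After the two calls to `on_voice`, `caller.transcript` and `callee.transcript` hold identical entries.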

According to one aspect of the present disclosure, the text of the call content can be made to match between the calling side and the called side.

FIG. 1 is a diagram showing an example of the overall configuration of the call control system according to the embodiment.
FIG. 2 is a diagram showing an example of the functional configuration of some of the communication control devices according to the embodiment.
FIG. 3 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 4 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 5 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 6 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 7 is a diagram showing an example of the hardware configuration of a computer used for the communication control devices according to the embodiment.

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are given the same reference numerals, and redundant description is omitted.

The call control system is a computer system that controls calls and conversations between a calling terminal and a called terminal. A call is a communication path that is temporarily occupied between the calling terminal and the called terminal. The calling terminal is the communication terminal that first requests a call connection, and the called terminal is the communication terminal that responds to that request. Once a call is established between these two terminals, the caller (the user of the calling terminal) and the callee (the user of the called terminal) can talk to each other. The term "conversation" refers both to the voice exchanged between the calling terminal and the called terminal and to the act of exchanging that voice.

In this embodiment, the call control system executes a speech-to-text service (also called a speech recognition service) that converts the conversation between the calling terminal and the called terminal into text and displays the converted text on at least one of the calling terminal and the called terminal. In the present disclosure, the converted text is also called speech text.

FIG. 1 shows the overall configuration of call control system 1 according to the embodiment. Call control system 1 includes a calling-side network 21 in which calling terminal 31 is located, a called-side network 22 in which called terminal 32 is located, and a core network 10 that connects calling-side network 21 and called-side network 22. In call control system 1, a call (communication path) is established by exchanging control signals among multiple devices and terminals, and conversation becomes possible when data signals representing voice are carried over that call.

Calling terminal 31 and called terminal 32 are both communication terminals with a calling function. Each may be a fixed terminal or a mobile terminal. Examples include mobile phones, smartphones, tablet terminals, wearable terminals, and personal computers, but the type of terminal is not limited to these. Calling terminal 31 and called terminal 32 may be of the same type or of different types.

Calling-side network 21 and called-side network 22 are both access networks to which terminals connect directly. The configuration of the access network is not limited; for example, it may be any wireless or wired network. The type (protocol) of access network may be the same or different between calling-side network 21 and called-side network 22.

Core network 10 is the network at the heart of call control system 1 and includes various communication control devices. In this embodiment, core network 10 is an IMS network. An IMS network uses SIP as its communication protocol and can provide multimedia services that realize real-time voice or video communication in addition to data communication. In an IMS network, calls are processed by multiple communication control devices such as the call session control function (CSCF: Call Session Control Function), application servers (AS: Application Server), gateways, and the subscriber management function (HSS: Home Subscriber Server). The CSCF is a call control device that sets up calls or sessions and activates predetermined services. An application server is a device that executes a predetermined supplementary service (for example, the speech-to-text service) and determines whether that supplementary service may be executed. A gateway is a device that connects an access network to the core network. The HSS is a device (database) that stores user profiles (subscriber information).

In this embodiment, core network 10 further includes two types of communication control devices: an MCE (Media Composition Enabler) and an SMS-GW (SMS gateway). The MCE is a media processing device that provides additional functions for a conversation. The SMS-GW is a kind of gateway that connects the core network to other networks and is a device that provides the short message service (SMS).

FIG. 1 shows the communication control devices that are particularly relevant to controlling a call with a supplementary service: calling-side CSCF 11, called-side CSCF 12, calling-side AS 13, called-side AS 14, calling-side MCE 15, called-side MCE 16, calling-side SMS-GW 17, and called-side SMS-GW 18.

Calling-side CSCF 11 and called-side CSCF 12 both execute call control for connecting calling terminal 31 and called terminal 32. The calling side and the called side are connected to each other by exchanging control signals and data signals (for example, voice data) between calling-side CSCF 11 and called-side CSCF 12. Calling-side AS 13 is the application server on the calling side, and called-side AS 14 is the application server on the called side. Calling-side MCE 15 is the media processing device on the calling side, and called-side MCE 16 is the media processing device on the called side. Calling-side SMS-GW 17 is the SMS gateway on the calling side, and called-side SMS-GW 18 is the SMS gateway on the called side.

FIG. 1 also shows calling-side Web server 41, called-side Web server 42, and speech recognition engine 43. Calling-side Web server 41 and speech recognition engine 43 make up the calling-side service platform, which provides the speech-to-text service to calling terminal 31. Called-side Web server 42 and speech recognition engine 43 make up the called-side service platform, which provides the speech-to-text service to called terminal 32. Speech recognition engine 43 is a common computer used by both the calling side and the called side; it converts voice into text using speech recognition. Both service platforms are provided in a communication network separate from core network 10. Calling-side Web server 41 can perform data communication with each of calling terminal 31, calling-side AS 13, and calling-side MCE 15. Called-side Web server 42 can perform data communication with each of called terminal 32, called-side AS 14, and called-side MCE 16. Speech recognition engine 43 can perform data communication with each of calling-side MCE 15 and called-side MCE 16. By connecting to calling-side Web server 41, calling terminal 31 can provide the speech-to-text service to the caller. By connecting to called-side Web server 42, called terminal 32 can provide the speech-to-text service to the callee.

In this embodiment, core network 10 further includes a session database (session DB) 19. Session database 19 is a device (storage unit) that stores session information about calls (sessions) that involve the speech-to-text service, and it is a common database used by both the calling side and the called side. Session database 19 can be accessed by calling-side AS 13 and called-side AS 14.

For example, the session information for one call may include the following data items: a session ID, a calling-side auxiliary session ID, a called-side auxiliary session ID, the subscriber number of calling terminal 31, the subscriber number of called terminal 32, a calling-side endpoint, a called-side endpoint, and a recognition direction. The session ID is an identifier that uniquely identifies the call (session). An auxiliary session ID is an identifier provided so that a Web server located outside core network 10 can also uniquely identify the call. The calling-side auxiliary session ID is used for calling-side Web server 41, and the called-side auxiliary session ID is used for called-side Web server 42. An endpoint is an identifier that uniquely identifies a Web server. The calling-side endpoint uniquely identifies calling-side Web server 41, and the called-side endpoint uniquely identifies called-side Web server 42. The recognition direction is information indicating which communication terminal the speech text is to be sent to.

The data structure of the session information is not limited and may be designed according to any policy. For example, the session information may be represented by associating a calling-side record and a called-side record with each other. Alternatively, it may be represented by consolidating the data items of both the calling side and the called side into a single record.
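As a rough illustration of the two layouts mentioned above, the following Python sketch models the session information as a single consolidated record and derives the alternative pair of linked per-side records from it. The field names, the helper `split_by_side`, and all sample values (including `"both"` as a recognition direction) are assumptions made for this example; the source lists the data items but does not define names, types, or encodings.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SessionInfo:
    """Single-record layout: data items of both sides held in one record."""
    session_id: str
    caller_number: str
    callee_number: str
    calling_aux_session_id: Optional[str] = None
    called_aux_session_id: Optional[str] = None
    calling_endpoint: Optional[str] = None
    called_endpoint: Optional[str] = None
    recognition_direction: Optional[str] = None


Record = Dict[str, Optional[str]]


def split_by_side(info: SessionInfo) -> Tuple[Record, Record]:
    """Alternative layout: two records linked to each other by the session ID."""
    shared: Record = {
        "session_id": info.session_id,
        "recognition_direction": info.recognition_direction,
    }
    calling = dict(shared, side="calling",
                   subscriber_number=info.caller_number,
                   aux_session_id=info.calling_aux_session_id,
                   endpoint=info.calling_endpoint)
    called = dict(shared, side="called",
                  subscriber_number=info.callee_number,
                  aux_session_id=info.called_aux_session_id,
                  endpoint=info.called_endpoint)
    return calling, called


# All values below are made-up sample data.
info = SessionInfo(
    session_id="sess-001",
    caller_number="09000000001",
    callee_number="09000000002",
    calling_aux_session_id="aux-a",
    called_aux_session_id="aux-b",
    calling_endpoint="web-server-41",
    called_endpoint="web-server-42",
    recognition_direction="both",
)
calling_record, called_record = split_by_side(info)
```

Either layout carries the same information; the single record is simpler to update atomically, while the linked pair keeps each side's items separate.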

Each device shown in FIG. 1 is configured using at least one computer. When multiple computers are used, they are connected to one another through a communication network so that logically a single device is constructed.

One feature of call control system 1 is that, when both the caller and the callee use the speech-to-text service, only one of the calling side and the called side converts the voices of both the caller and the callee into text. Even if speech recognition engine 43 is common to the calling side and the called side as shown in FIG. 1, the recognition results may differ if the connection to speech recognition engine 43 differs between the two sides. For example, the speech text may differ between the case where a given utterance is input to speech recognition engine 43 from calling-side MCE 15 and the case where the same utterance is input from called-side MCE 16. To make the text of the call content match between the calling side and the called side, only one of calling-side MCE 15 and called-side MCE 16 functions as the common media processing device in call control system 1. This common media processing device transmits the voices of both the caller and the callee to speech recognition engine 43 and transmits the speech text to both calling-side Web server 41 and called-side Web server 42. FIG. 1 also shows connections 51 and 52, which relate to this mechanism. Connection 51 is used when calling-side MCE 15 functions as the common media processing device for a call (session), and connection 52 is used when called-side MCE 16 functions as the common media processing device for a call (session).

FIG. 2 shows an example of the functional configuration of the application servers. Calling-side AS 13 includes a service control unit 131, a session control unit 132, and a service scenario unit 133 as functional elements. Service control unit 131 exchanges data with calling-side CSCF 11. Session control unit 132 exchanges data with calling-side MCE 15. Service scenario unit 133 exchanges data with each of calling-side SMS-GW 17 and calling-side Web server 41. When calling-side MCE 15 processes the voices of both the calling side and the called side, service scenario unit 133 may also exchange data with called-side Web server 42; connection 61 in FIG. 2 represents that communication.

Called-side AS 14 includes a service control unit 141, a session control unit 142, and a service scenario unit 143 as functional elements. Service control unit 141 exchanges data with called-side CSCF 12. Session control unit 142 exchanges data with called-side MCE 16. Service scenario unit 143 exchanges data with each of called-side SMS-GW 18 and called-side Web server 42. When called-side MCE 16 processes the voices of both the calling side and the called side, service scenario unit 143 may also exchange data with calling-side Web server 41; connection 62 in FIG. 2 represents that communication.

Calling-side AS 13 and called-side AS 14 each include a control unit that, when both the caller and the callee use the speech-to-text service, causes one of calling-side MCE 15 and called-side MCE 16 to function as the common media processing device. In calling-side AS 13, at least one of service control unit 131, session control unit 132, and service scenario unit 133 corresponds to that control unit. In called-side AS 14, at least one of service control unit 141, session control unit 142, and service scenario unit 143 corresponds to that control unit.

This embodiment describes an example in which calling-side MCE 15 processes the voices of both parties. Accordingly, connection 51 shown in FIG. 1 and connection 61 shown in FIG. 2 are used. The present disclosure is not limited to this example, however, and called-side MCE 16 may process both voices instead.
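The relationship between the designated common media processing device and the connections of FIGS. 1 and 2 can be summarized in a small lookup, sketched here in Python. The function `designate_common_mce` and the types around it are hypothetical. The sketch only covers the case the text describes (both parties are users of the service); it returns `None` otherwise, which is a placeholder of this example and not a statement about what the system does in that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Side(Enum):
    CALLING = "calling"
    CALLED = "called"


@dataclass(frozen=True)
class CommonMceChoice:
    common_mce: str       # device that processes both voices
    fig1_connection: int  # reference numeral of the connection in FIG. 1
    fig2_connection: int  # reference numeral of the connection in FIG. 2


# Reference numerals as given in the description.
_CHOICES = {
    Side.CALLING: CommonMceChoice("calling-side MCE 15", 51, 61),
    Side.CALLED: CommonMceChoice("called-side MCE 16", 52, 62),
}


def designate_common_mce(caller_is_user: bool, callee_is_user: bool,
                         side: Side = Side.CALLING) -> Optional[CommonMceChoice]:
    """Pick one MCE as the common device when both parties use the service."""
    if not (caller_is_user and callee_is_user):
        return None  # case outside the scope of this sketch
    return _CHOICES[side]


choice = designate_common_mce(True, True)
```

With the default `side`, the result matches the example of this embodiment: calling-side MCE 15 with connections 51 and 61.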

An example of the operation of call control system 1 according to this embodiment is described with reference to FIGS. 3 to 6, each of which is a sequence diagram showing an example of that operation. FIG. 3 shows an example of the process of establishing a call. FIGS. 4 and 5 show an example of the process of starting the speech-to-text service. FIG. 6 shows an example of the process of displaying speech text on a communication terminal. For ease of understanding, FIGS. 3 to 6 show only the components, processes, and data signals that are particularly relevant to controlling the conversation and the speech-to-text service.

First, an example of the process of establishing a call is described as process flow S1 with reference to FIG. 3.

In step S101, calling terminal 31 transmits an INVITE message in response to the caller's dialing operation, and calling-side AS 13 receives that INVITE message. The INVITE message is a control signal (call establishment request signal) transmitted to establish a call (session) between calling terminal 31 and called terminal 32. The INVITE message enters core network 10 via calling-side network 21. In core network 10, calling-side CSCF 11 forwards the INVITE message to calling-side AS 13.

In step S102, service control unit 131 responds to the INVITE message by starting the speech-to-text service for calling terminal 31 (the caller). Service control unit 131 accesses the subscriber management function, refers to the caller's subscriber information, and determines whether the caller has subscribed to the speech-to-text service. If the caller has subscribed, service control unit 131 starts the service. This embodiment assumes that the caller is a subscriber to the speech-to-text service. In connection with starting the service, service control unit 131, session control unit 132, and service scenario unit 133 cooperate to store, in session database 19, session information that includes the session ID of the call about to be established, the calling-side auxiliary session ID, the subscriber number of calling terminal 31, and the subscriber number of called terminal 32.
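Step S102 can be sketched in Python as a subscription check followed by a write to the session database. This is an illustrative sketch only: the in-memory dictionaries standing in for the subscriber management function and session database 19, the profile key `speech_to_text`, the function name, and the use of `uuid` to generate identifiers are all assumptions, since the source does not say how IDs are generated or how the data is stored.

```python
import uuid
from typing import Dict, Optional

SERVICE = "speech_to_text"  # hypothetical key in the subscriber profile


def start_service_for_caller(
    subscriber_db: Dict[str, Dict[str, bool]],
    session_db: Dict[str, Dict[str, str]],
    caller_number: str,
    callee_number: str,
) -> Optional[Dict[str, str]]:
    """Check the caller's subscription, then store the initial session information."""
    profile = subscriber_db.get(caller_number, {})
    if not profile.get(SERVICE, False):
        return None  # caller has not subscribed: the service is not started
    record = {
        "session_id": uuid.uuid4().hex,              # ID generation is an assumption
        "calling_aux_session_id": uuid.uuid4().hex,  # ID generation is an assumption
        "caller_number": caller_number,
        "callee_number": callee_number,
    }
    session_db[record["session_id"]] = record
    return record


# Made-up sample data.
subscriber_db = {
    "09000000001": {SERVICE: True},
    "09000000003": {SERVICE: False},
}
session_db: Dict[str, Dict[str, str]] = {}
record = start_service_for_caller(subscriber_db, session_db, "09000000001", "09000000002")
rejected = start_service_for_caller(subscriber_db, session_db, "09000000003", "09000000002")
```

Only the subscribed caller produces a stored record; the second call returns `None` and leaves the database unchanged.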

In step S103, service scenario unit 133 transmits a push notification to calling-side SMS-GW 17, and in step S104, calling-side SMS-GW 17 responds to that push notification by transmitting a push request to calling terminal 31. In response to an instruction from service control unit 131, service scenario unit 133 accesses the user profile, refers to the caller's user information, and determines the contract status of the speech-to-text service. If the speech-to-text service can be provided to the caller, service scenario unit 133 transmits the push notification. This embodiment assumes that the caller is entitled to use the speech-to-text service. The push request includes the information that calling terminal 31 needs in order to receive the speech-to-text service from calling-side Web server 41 (for example, the device token of calling terminal 31 and the calling-side auxiliary session ID), and the push notification includes at least part of the information that makes up the push request.
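The two-stage hand-off in steps S103 and S104 can be sketched as follows in Python. The message layouts, the function names `build_push_notification` and `to_push_request`, and the assumption that the device token is held in the user profile are all hypothetical; the source states only which items the push request contains and that the push notification carries at least part of them.

```python
from typing import Dict, Optional


def build_push_notification(user_profiles: Dict[str, dict],
                            subscriber_number: str,
                            aux_session_id: str) -> Optional[Dict[str, str]]:
    """Service scenario unit: check the contract status, then build the notification."""
    profile = user_profiles.get(subscriber_number)
    if profile is None or not profile.get("service_available", False):
        return None  # service cannot be provided: no push notification is sent
    return {
        "destination": subscriber_number,
        "device_token": profile["device_token"],  # storing the token here is an assumption
        "aux_session_id": aux_session_id,
    }


def to_push_request(notification: Dict[str, str]) -> Dict[str, object]:
    """SMS-GW: turn the push notification into a push request for the terminal."""
    return {
        "to": notification["destination"],
        "body": {
            "device_token": notification["device_token"],
            "aux_session_id": notification["aux_session_id"],
        },
    }


# Made-up sample data.
user_profiles = {
    "09000000001": {"service_available": True, "device_token": "token-31"},
}
notification = build_push_notification(user_profiles, "09000000001", "aux-a")
push_request = to_push_request(notification) if notification else None
no_service = build_push_notification(user_profiles, "09000000009", "aux-x")
```

The terminal would then use the token and auxiliary session ID from the push request when it connects to its Web server.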

In step S105, session control unit 132 transmits an INVITE message to calling-side MCE 15 in order to connect to it. Calling-side MCE 15 responds to that INVITE message by executing processing for the speech-to-text service and then transmits a 200_OK message in step S106. The 200_OK message is a response signal indicating that the processing corresponding to the INVITE message was executed normally; that is, it is the success response to the INVITE message.

In step S107, service control unit 131 transmits an INVITE message toward called-side AS 14. Service control unit 131 adds two items to the header information of the INVITE message: a calling-side media device ID, which is an identifier that uniquely identifies calling-side MCE 15, and calling-side service information, which indicates that the speech-to-text service is executed on the calling side. Service control unit 131 then transmits the INVITE message containing the calling-side media device ID and the calling-side service information. This INVITE message reaches called-side AS 14 via calling-side CSCF 11 and called-side CSCF 12.
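The header additions of step S107 can be illustrated with a short Python sketch. The source does not name the header fields or their value formats, so `X-Calling-Media-Device-ID`, `X-Calling-Service-Info`, and the value `speech-to-text` are placeholders invented for this example. The rendered message is also deliberately incomplete: a real SIP INVITE needs further mandatory headers (Via, Call-ID, CSeq, and so on) that are omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical header names; the source does not specify them.
MEDIA_DEVICE_HEADER = "X-Calling-Media-Device-ID"
SERVICE_INFO_HEADER = "X-Calling-Service-Info"

Headers = List[Tuple[str, str]]


def add_calling_side_headers(headers: Headers, media_device_id: str) -> Headers:
    """Return a new header list with the two calling-side items appended."""
    return headers + [
        (MEDIA_DEVICE_HEADER, media_device_id),
        (SERVICE_INFO_HEADER, "speech-to-text"),
    ]


def render_invite(request_uri: str, headers: Headers) -> str:
    """Render a simplified INVITE: request line, headers, empty body."""
    lines = [f"INVITE {request_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_headers(message: str) -> Dict[str, str]:
    """Called side: read the headers back out of the simplified message."""
    header_lines = message.split("\r\n\r\n", 1)[0].split("\r\n")[1:]
    return dict(line.split(": ", 1) for line in header_lines)


base: Headers = [
    ("From", "<sip:caller@example.com>"),
    ("To", "<sip:callee@example.com>"),
]
invite = render_invite("sip:callee@example.com",
                       add_calling_side_headers(base, "mce-15"))
received = parse_headers(invite)
```

On the called side, `received[MEDIA_DEVICE_HEADER]` tells the application server which calling-side media processing device is involved, and `received[SERVICE_INFO_HEADER]` tells it that the service is running on the calling side.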

In step S108, the service control unit 141 activates the speech-to-text service for the called terminal 32 (the callee) in response to the INVITE message from the calling-side AS 13. The service control unit 141 accesses the subscriber management function, refers to the callee's subscriber information, and determines whether the callee has subscribed to the speech-to-text service. If the callee has subscribed, the service control unit 141 activates the service. This embodiment assumes that the callee is a subscriber to the speech-to-text service. In connection with activating the service, the service control unit 141, the session control unit 142, and the service scenario unit 143 cooperate to write the called-side auxiliary session ID of the call about to be established into the corresponding session information in the session database 19.

In step S109, the service scenario unit 143 sends a push notification to the called-side SMS-GW 18, and in step S110, the called-side SMS-GW 18 sends a push request to the called terminal 32 in response to that push notification. In response to an instruction from the service control unit 141, the service scenario unit 143 accesses the user profile, refers to the callee's user information, and determines the contract status of the speech-to-text service. The service scenario unit 143 sends the push notification if the speech-to-text service can be provided to the callee. This embodiment assumes that the callee is entitled to use the speech-to-text service. The push request contains the information that the called terminal 32 needs in order to receive the speech-to-text service from the called-side Web server 42 (for example, the device token of the called terminal 32 and the called-side auxiliary session ID), and the push notification contains at least part of the information that makes up the push request.

In step S111, the session control unit 142 sends an INVITE message to the called-side MCE 16 in order to connect to it. The called-side MCE 16 executes processing for the speech-to-text service in response to the INVITE message. By referring to the calling-side media device ID and the calling-side service information in the INVITE message, the called-side MCE 16 recognizes that the speech-to-text service is executed on the calling side and that the calling-side MCE 15 executes that service. Based on this recognition, the called-side MCE 16 does not provide voice data to the speech recognition engine 43. However, the connection between the called-side MCE 16 and the called-side AS 14 is maintained until the call is disconnected. In step S112, the called-side MCE 16 sends a 200_OK message to the called-side AS 14.

In step S113, the service control unit 141 sends the INVITE message toward the called terminal 32. The INVITE message is sent from the called-side AS 14 to the called-side CSCF 12, and from the called-side CSCF 12 to the called terminal 32 via the called-side network 22. The process of calling the called terminal 32 is complete when the called terminal 32 receives the INVITE message.

In step S114, the called terminal 32 sends a 200_OK message in response to the callee answering the call, and this 200_OK message reaches the called-side AS 14 via the called-side network 22 and the called-side CSCF 12.

In step S115, the service control unit 141, the session control unit 142, and the service scenario unit 143 of the called-side AS 14 each process that message, and finally the service control unit 141 sends a 200_OK message toward the calling-side AS 13. The service control unit 141 adds two items to the header information of the 200_OK message: a called-side media device ID, which is an identifier that uniquely identifies the called-side MCE 16, and called-side service information, which indicates that the speech-to-text service is executed on the called side. The service control unit 141 then sends the 200_OK message containing the called-side media device ID and the called-side service information. This 200_OK message reaches the calling-side AS 13 via the called-side CSCF 12 and the calling-side CSCF 11.

In step S116, the session control unit 132 sends that 200_OK message to the calling-side MCE 15. By referring to the called-side media device ID and the called-side service information in the 200_OK message, the calling-side MCE 15 recognizes that the speech-to-text service is also executed on the called side. Based on this recognition, the calling-side MCE 15 provides both the voice data from the calling terminal 31 and the voice data from the called terminal 32 to the speech recognition engine 43. In this way, the calling-side AS 13 causes the calling-side MCE 15 to function as the common media processing device. In step S117, the calling-side MCE 15 returns a 200_OK message to the calling-side AS 13, and in step S118, the calling-side AS 13 sends that 200_OK message toward the calling terminal 31. The 200_OK message reaches the calling terminal 31 via the calling-side CSCF 11 and the calling-side network 21.
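The decisions made by the two MCEs in steps S111 and S116 can be summarized in a small sketch. It assumes that each MCE sees the peer's media device ID and service information as plain values; the `PeerInfo` type and the function name are hypothetical. The branches for a peer that does not use the service are also assumptions, since this passage only describes the case where both sides use it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeerInfo:
    """Values the peer side placed in the SIP header (hypothetical representation)."""
    media_device_id: Optional[str]
    runs_speech_to_text: bool


def streams_to_recognize(side: str, peer: PeerInfo) -> set:
    """Return the voice streams that the MCE on `side` feeds to the recognition engine."""
    peer_active = peer.runs_speech_to_text and peer.media_device_id is not None
    if side == "called":
        # S111: the calling-side MCE runs the service, so the called-side MCE sends nothing.
        # Assumption: without an active peer it would recognize its own side only.
        return set() if peer_active else {"called"}
    # S116: the called side also uses the service, so the calling-side MCE
    # becomes the common media processing device and feeds both streams.
    # Assumption: without an active peer it would recognize its own side only.
    return {"calling", "called"} if peer_active else {"calling"}


called_mce_streams = streams_to_recognize("called", PeerInfo("mce-15", True))
calling_mce_streams = streams_to_recognize("calling", PeerInfo("mce-16", True))
```

With both sides subscribed, exactly one MCE talks to the speech recognition engine, which is what lets both terminals receive identical text.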

In step S119, when the calling terminal 31 receives the 200_OK message, a U-Plane (user plane) path for carrying data signals is established between the calling terminal 31 and the called terminal 32. That is, a call is established between the calling terminal 31 and the called terminal 32, and the two terminals can now talk to each other.

Next, an example of the process for activating the speech-to-text service is described as process flow S2, with reference to FIG. 4. This example covers the case where the speech-to-text service starts at the same or nearly the same time on the calling terminal 31 and the called terminal 32.

In step S201, the calling terminal 31 sends a connection request to the calling-side Web server 41 in order to launch the application program for the speech-to-text service. The connection request is a data signal for establishing a communication connection between the calling terminal 31 and the calling-side Web server 41, and it contains at least part of the information provided by the push request (for example, the device token of the calling terminal 31 and the calling-side auxiliary session ID).

In step S202, processing for authenticating the caller is executed between the calling-side Web server 41 and the service scenario unit 133 of the calling-side AS 13. The calling-side Web server 41 sends the calling-side AS 13 an authentication request containing at least part of the information provided by the connection request (for example, the device token of the calling terminal 31). The service scenario unit 133 executes authentication processing in response to the request; for example, it checks whether the device token is valid. The service scenario unit 133 sends the result to the calling-side Web server 41. This embodiment assumes that the caller is authenticated.

In step S203, the calling terminal 31 launches the application program for the speech-to-text service and sends an activation signal to the calling-side Web server 41. The activation signal is a data signal for executing that application program.

In step S204, the calling-side Web server 41 sends an event notification to the calling-side AS 13 in response to the activation signal. This event notification contains the calling-side endpoint and the calling-side auxiliary session ID.

In step S205, the service scenario unit 133 of the calling-side AS 13 registers the calling-side endpoint in the session database 19. The service scenario unit 133 writes the calling-side endpoint into the session information corresponding to the calling-side auxiliary session ID. This registration makes it possible to send the speech text of the currently established call (session) to the calling terminal 31 via the calling-side Web server 41.
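The endpoint registration keyed by auxiliary session ID might look like the sketch below. It is an in-memory stand-in for the session database 19, not its actual schema: the class, the field names, the example IDs, and the endpoint URLs are hypothetical, and it assumes that both auxiliary session IDs of a call resolve to the same session record.

```python
class SessionDatabase:
    """In-memory stand-in for the session database 19 (hypothetical schema)."""

    def __init__(self) -> None:
        # Maps an auxiliary session ID to the session record of its call.
        self._by_aux_id = {}

    def create_session(self, calling_aux_id: str, called_aux_id: str) -> dict:
        """Create one record per call, reachable from either auxiliary session ID."""
        session = {"endpoints": {}, "recognition_direction": None}
        self._by_aux_id[calling_aux_id] = session
        self._by_aux_id[called_aux_id] = session
        return session

    def lookup(self, aux_id: str) -> dict:
        """Return the session record for an auxiliary session ID."""
        return self._by_aux_id[aux_id]

    def register_endpoint(self, aux_id: str, side: str, endpoint: str) -> None:
        """Write the endpoint reported in an event notification into the session."""
        self.lookup(aux_id)["endpoints"][side] = endpoint


db = SessionDatabase()
db.create_session("aux-calling-1", "aux-called-1")
db.register_endpoint("aux-calling-1", "calling", "https://web41.example.com/sessions/1")
db.register_endpoint("aux-called-1", "called", "https://web42.example.com/sessions/1")
```

Sharing one record between the two IDs means a later lookup by either side's ID finds both endpoints.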

The called side executes the same processing as steps S201 to S205. This processing is shown as steps S211 to S215.

In step S211, the called terminal 32 sends a connection request to the called-side Web server 42 in order to launch the application program for the speech-to-text service. The connection request contains at least part of the information provided by the push request (for example, the device token of the called terminal 32 and the called-side auxiliary session ID).

In step S212, processing for authenticating the callee is executed between the called-side Web server 42 and the service scenario unit 143 of the called-side AS 14. This embodiment assumes that the callee is also authenticated.

In step S213, the called terminal 32 launches the application program for the speech-to-text service and sends an activation signal to the called-side Web server 42.

In step S214, the called-side Web server 42 sends an event notification to the called-side AS 14 in response to the activation signal. This event notification contains the called-side endpoint and the called-side auxiliary session ID.

In step S215, the service scenario unit 143 of the called-side AS 14 registers the called-side endpoint in the session database 19. The service scenario unit 143 writes the called-side endpoint into the record corresponding to the called-side auxiliary session ID. This registration makes it possible to send the speech text of the currently established call (session) to the called terminal 32 via the called-side Web server 42.

On the calling side, steps S206 and S207 are executed after step S205. In step S206, the calling terminal 31 sends the calling-side Web server 41 a consent signal indicating that the caller agrees to use the speech-to-text service. In step S207, the calling-side Web server 41 sends an event notification to the calling-side AS 13 in response to the consent signal. This event notification indicates the caller's consent. Both the consent signal and the event notification contain the calling-side auxiliary session ID.

On the called side, steps S216 and S217 are executed after step S215. In step S216, the called terminal 32 sends the called-side Web server 42 a consent signal indicating that the callee agrees to use the speech-to-text service. In step S217, the called-side Web server 42 sends an event notification toward the calling-side AS 13 in response to the consent signal. This event notification indicates the callee's consent. Both the consent signal and the event notification contain the called-side auxiliary session ID.

In step S208, the service scenario unit 133 sets the recognition direction of the session information corresponding to the established call to "bidirectional", based on the two event notifications of steps S207 and S217. Specifically, the service scenario unit 133 accesses the session database 19, identifies the session information corresponding to the calling-side or called-side auxiliary session ID, and sets the recognition direction of that session information to "bidirectional". In this way, the service scenario unit 133 sets the recognition direction to "bidirectional" in response to consent signals having been sent from both the calling terminal 31 and the called terminal 32. As a result, as shown in step S220, the speech-to-text service is executed on both the calling and called sides.

Next, another example of the process for activating the speech-to-text service is described as process flow S2A, with reference to FIG. 5. This example covers the case where the speech-to-text service starts at different times on the calling terminal 31 and the called terminal 32; more specifically, the called terminal 32 starts the service later than the calling terminal 31.

In process flow S2A, as in process flow S2, steps S201 to S207 are executed on the calling side. If the timing of the processing for launching the speech-to-text application program differs substantially between the calling side and the called side, the calling side executes step S208A after step S207. In step S208A, the service scenario unit 133 sets the recognition direction of the session information corresponding to the established call (the session information corresponding to the calling-side auxiliary session ID) to "calling side", based on the event notification of step S207. As a result, as shown in step S221, the speech-to-text service is executed only on the calling terminal 31.

When steps S211 to S217 are executed on the called side after step S221, step S208B is executed on the calling side. In step S208B, the service scenario unit 133 updates the recognition direction of the session information corresponding to the established call (the session information corresponding to the called-side auxiliary session ID) from "calling side" to "bidirectional", based on the event notification of step S217. In this way, the service scenario unit 133 sets the recognition direction to "bidirectional" in response to consent signals having been sent from both the calling terminal 31 and the called terminal 32. As a result, as shown in step S222, the speech-to-text service becomes executable on both the calling and called sides. Step S222 is the same as step S220 in process flow S2.
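The way the recognition direction follows from the consent events in process flows S2 and S2A can be sketched as a pure mapping. The function name and the side labels `"calling"` and `"called"` are hypothetical. The source does not describe a case where only the callee has consented, nor how long the service scenario unit waits before settling for "calling side", so the sketch returns `None` for the former and leaves the timing out.

```python
from typing import Optional


def recognition_direction(consented: set) -> Optional[str]:
    """Map the set of sides that have sent a consent signal to a recognition direction."""
    if {"calling", "called"} <= consented:
        return "bidirectional"  # S208, or S208B when the callee consents later
    if "calling" in consented:
        return "calling side"   # S208A: only the caller has consented so far
    return None                 # not covered by the source; no direction is set


# Process flow S2A: the caller's consent arrives first, the callee's later.
consents = set()
history = []
for side in ("calling", "called"):
    consents.add(side)
    history.append(recognition_direction(consents))
```

In this run `history` records the transition from "calling side" to "bidirectional" that steps S208A and S208B describe.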

Next, an example of the process for displaying speech text on the communication terminals is described as process flow S3, with reference to FIG. 6. Process flow S3 assumes that the speech-to-text service has become executable on both the calling and called sides (that is, step S220 or S222 has been reached).

Steps S301 to S309 show the process of converting the callee's voice (called-side voice) into text and displaying that speech text on both the calling terminal 31 and the called terminal 32.

In step S301, voice data (called-side voice) sent from the called terminal 32 is delivered to the core network 10 via the called-side network 22 and then sent to the calling-side MCE 15 via communication control devices such as the called-side CSCF 12, the calling-side CSCF 11, and the calling-side AS 13. In step S302, the calling-side MCE 15 sends the voice data to the speech recognition engine 43. In step S303, the speech recognition engine 43 converts the called-side voice into text by performing speech recognition on the voice data, and sends the speech text to the calling-side MCE 15. This speech text corresponds to the called-side text.

In step S304, the calling-side MCE 15 sends the calling-side Web server 41 a recognition result that contains the speech text and an utterance type indicating who the speaker is. Because the speech text represents the called-side voice, the utterance type in the recognition result sent in this step indicates the called side. In step S305, the calling-side MCE 15 also sends the recognition result to the called-side Web server 42. The calling-side MCE 15 obtains the session information corresponding to the current call from the session database 19 via the calling-side AS 13. In response to the recognition direction of the session information being "bidirectional", the calling-side MCE 15 obtains the calling-side endpoint and the called-side endpoint from the session information. These endpoints give the calling-side MCE 15 the destinations of the recognition result (that is, the calling-side Web server 41 and the called-side Web server 42). In this way, the calling-side MCE 15 sends the called-side text to both the calling-side Web server 41 and the called-side Web server 42 in response to the recognition direction being "bidirectional".

In step S306, the calling-side Web server 41 sends the recognition result to the calling terminal 31. Because the utterance type in the recognition result indicates the called side, the calling-side Web server 41 generates data containing the speech text such that the speech text is displayed on the calling terminal 31 as the call partner's.

In step S307, the calling terminal 31 displays the speech text on its screen as the callee's (the call partner's), based on that data. This lets the caller see what the other party said.

In step S308, the called-side Web server 42 sends the recognition result to the called terminal 32. Because the utterance type in the recognition result indicates the called side, the called-side Web server 42 generates data containing the speech text such that the speech text is displayed on the called terminal 32 as the callee's own.

In step S309, the called terminal 32 displays the speech text on its screen as the callee's own, based on that data. This lets the callee see his or her own utterance.

Steps S310 to S318 show the process of converting the caller's voice (calling-side voice) into text and displaying that speech text on both the calling terminal 31 and the called terminal 32.

In step S310, voice data (calling-side voice) sent from the calling terminal 31 is delivered to the core network 10 via the calling-side network 21 and then sent to the calling-side MCE 15 via the calling-side CSCF 11 and the calling-side AS 13. In step S311, the calling-side MCE 15 sends the voice data to the speech recognition engine 43. In step S312, the speech recognition engine 43 converts the calling-side voice into text by performing speech recognition on the voice data, and sends the speech text to the calling-side MCE 15. This speech text corresponds to the calling-side text.

In step S313, the calling-side MCE 15 sends the calling-side Web server 41 a recognition result that contains the speech text and an utterance type indicating who the speaker is. Because the speech text represents the calling-side voice, the utterance type in the recognition result sent in this step indicates the calling side. In step S314, the calling-side MCE 15 also sends the recognition result to the called-side Web server 42. The calling-side MCE 15 obtains the session information corresponding to the current call from the session database 19 via the calling-side AS 13. In response to the recognition direction of the session information being "bidirectional", the calling-side MCE 15 obtains the calling-side endpoint and the called-side endpoint from the session information, and can thereby identify the calling-side Web server 41 and the called-side Web server 42. In this way, the calling-side MCE 15 sends the calling-side text to both the calling-side Web server 41 and the called-side Web server 42 in response to the recognition direction being "bidirectional".
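The fan-out performed by the common MCE in steps S304 and S305 and again in steps S313 and S314 can be sketched as below. The dictionary layout of the session information and of the recognition result, as well as the endpoint URLs, are hypothetical. Sending only to the calling-side endpoint when the direction is "calling side" is an assumption based on step S221; the source does not spell out that path.

```python
def fan_out(session: dict, text: str, utterance_type: str) -> list:
    """Return (endpoint, recognition_result) pairs for one recognized utterance."""
    result = {"text": text, "utterance_type": utterance_type}
    endpoints = session["endpoints"]
    if session["recognition_direction"] == "bidirectional":
        # Both Web servers receive the same recognition result.
        targets = [endpoints["calling"], endpoints["called"]]
    else:
        # Assumption: with "calling side", only the calling-side Web server is served.
        targets = [endpoints["calling"]]
    return [(target, result) for target in targets]


session_info = {
    "recognition_direction": "bidirectional",
    "endpoints": {
        "calling": "https://web41.example.com/sessions/1",
        "called": "https://web42.example.com/sessions/1",
    },
}
deliveries = fan_out(session_info, "hello", "calling")
```

Because a single result object goes to both destinations, the text shown on the two terminals cannot diverge, which is the stated aim of the system.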

In step S315, the calling-side Web server 41 sends the recognition result to the calling terminal 31. Because the utterance type in the recognition result indicates the calling side, the calling-side Web server 41 generates data containing the speech text such that the speech text is displayed on the calling terminal 31 as the caller's own.

In step S316, the calling terminal 31 displays the speech text on its screen as the caller's own, based on that data. This lets the caller see his or her own utterance.

In step S317, the called-side Web server 42 sends the recognition result to the called terminal 32. Because the utterance type in the recognition result indicates the calling side, the called-side Web server 42 generates data containing the speech text such that the speech text is displayed on the called terminal 32 as the call partner's.

In step S318, the called terminal 32 displays the speech text on its screen as the caller's (the call partner's), based on that data. This lets the callee see what the other party said.

In this way, both Web servers set the display mode of the speech text based on the utterance type. The method of displaying speech text as the speaker's own or as the call partner's is not limited, and any method may be adopted. A Web server may change the display position of the speech text (for example, the position of its speech balloon) according to the utterance type. For example, a Web server may control the display mode so that the speaker's own speech text appears on the right (an example of one side) and the call partner's speech text appears on the left (an example of the other side). Alternatively, a Web server may change the font of the speech text, or the shape or background color of the balloon, according to the utterance type.
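The right/left placement given as an example above reduces to comparing the utterance type with the side of the viewing terminal. The sketch below illustrates that one option; the function names, the `align` field, and the dictionary layout are hypothetical, and other display modes (font, balloon shape, background color) would follow the same pattern.

```python
def display_side(utterance_type: str, viewer_side: str) -> str:
    """Place the viewer's own speech on the right and the call partner's on the left."""
    return "right" if utterance_type == viewer_side else "left"


def render(result: dict, viewer_side: str) -> dict:
    """Build display data for one terminal from a recognition result."""
    return {
        "text": result["text"],
        "align": display_side(result["utterance_type"], viewer_side),
    }


recognition_result = {"text": "hello", "utterance_type": "called"}
for_calling_terminal = render(recognition_result, "calling")  # partner's speech
for_called_terminal = render(recognition_result, "called")    # viewer's own speech
```

The same recognition result therefore yields mirrored layouts on the two terminals while the text itself stays identical.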

The display mode of the speech text may instead be set, based on the utterance type, by the calling terminal 31 and the called terminal 32. Specifically, the calling-side Web server 41 and the called-side Web server 42 may each send the utterance type together with the speech text to the corresponding communication terminal, so that the terminal sets the display mode of the speech text based on that utterance type. With this mechanism as well, the calling terminal 31 and the called terminal 32 can each set display attributes such as the display position, the font, and the shape or background color of the balloon.

In this embodiment the core network 10 is an IMS network, but the call control system according to the present disclosure may be applied to any type of core network. Accordingly, the call control system according to the present disclosure may use a communication protocol other than SIP.

At least some of the functional elements implemented in the calling-side AS 13 may be implemented in a communication control device other than the calling-side AS 13. Similarly, at least some of the functional elements implemented in the called-side AS 14 may be implemented in a communication control device other than the called-side AS 14.

The block diagrams used to describe the above embodiment show blocks in functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited. That is, each functional block may be realized by one physically or logically coupled device, or by two or more physically or logically separate devices that are connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining software with the one device or the plurality of devices.

Functions include, but are not limited to, judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In either case, as described above, the method of realization is not particularly limited.

For example, the communication control device according to an embodiment of the present disclosure may function as a computer that performs the processing of the present disclosure. FIG. 7 shows an example of the hardware configuration of a computer 100 that functions as the communication control device. Physically, the computer 100 may include a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.

In the following description, the word "device" can be read as a circuit, a device, a unit, or the like. The hardware configuration of the communication control device may include one or more of each device shown in the figure, or may omit some of the devices.

Each function of the communication control device is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs computations and controls communication by the communication device 1004, or controls at least one of reading and writing of data in the memory 1002 and the storage 1003.

The processor 1001 controls the entire computer by, for example, running an operating system. The processor 1001 may be configured as a central processing unit (CPU) that includes an interface with peripheral devices, a control unit, an arithmetic unit, registers, and the like.

The processor 1001 also reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002, and executes various processes in accordance with them. The program used is one that causes a computer to execute at least part of the operations described in the above embodiment. For example, each functional element of the communication control device may be realized by a control program that is stored in the memory 1002 and runs on the processor 1001. Although the various processes above have been described as being executed by a single processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented with one or more chips. The program may be transmitted from a network via a telecommunication line.

The memory 1002 is a computer-readable recording medium and may be configured with at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory 1002 may be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store executable programs (program code), software modules, and the like for carrying out the method according to an embodiment of the present disclosure.

The storage 1003 is a computer-readable recording medium and may be configured with at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disc (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may be called an auxiliary storage device. The storage medium described above may be, for example, a database, a server, or another suitable medium that includes at least one of the memory 1002 and the storage 1003.

The communication device 1004 is hardware (a transmitting and receiving device) for communication between computers over at least one of a wired network and a wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module. The communication device 1004 may include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize at least one of, for example, frequency division duplex (FDD) and time division duplex (TDD).

The input device 1005 is an input device that accepts input from outside (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor). The output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, or an LED lamp). The input device 1005 and the output device 1006 may be integrated into a single unit (for example, a touch panel).

The devices such as the processor 1001 and the memory 1002 are connected by the bus 1007 for communicating information. The bus 1007 may be a single bus, or different buses may be used between different pairs of devices.

The computer 100 may also include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by that hardware. For example, the processor 1001 may be implemented using at least one of these types of hardware.

As described above, the call control system according to one aspect of the present disclosure can execute a speech-to-text service that converts a call carried between a calling terminal and a called terminal into text. When both the caller using the calling terminal and the callee using the called terminal are users of the speech-to-text service, the call control system includes a control unit that causes one of the calling-side media processing device corresponding to the calling terminal and the called-side media processing device corresponding to the called terminal to function as a common media processing device. The common media processing device connects to a speech recognition engine that converts the speech of the caller or the callee into text. The common media processing device obtains calling-side text by inputting the caller's calling-side speech, sent from the calling terminal, to the speech recognition engine, and sends the calling-side text toward both the calling terminal and the called terminal. Likewise, the common media processing device obtains called-side text by inputting the callee's called-side speech, sent from the called terminal, to the speech recognition engine, and sends the called-side text toward both the calling terminal and the called terminal.

In addition, because only one of the calling-side and called-side media processing devices is used rather than both, at least one of the hardware resources and the number of usage licenses needed to run the speech-to-text service can be reduced. It also becomes possible to send messages related to the speech-to-text service (for example, guidance) from the common media processing device to both the calling terminal and the called terminal.

In a call control system according to another aspect, the control unit may cause the calling-side media processing device to function as the common media processing device. A given type of processing is executed earlier on the calling side than on the called side. Using the calling-side media processing device as the common media processing device therefore allows processing related to the speech-to-text service to start sooner, so the service can be provided to the users correspondingly earlier.

In a call control system according to another aspect, the control unit may send a calling-side media device ID, which uniquely identifies the calling-side media processing device, toward the called-side media processing device; receive, from the called-side media processing device that received the calling-side media device ID, a called-side media device ID that uniquely identifies the called-side media processing device; and, in response to receiving the called-side media device ID, cause the calling-side media processing device to function as the common media processing device. Obtaining the identifiers of both the calling-side and called-side media processing devices makes it possible to reliably bring the common media processing device into operation.
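The ID exchange described above can be sketched as follows. This is a minimal illustration only: the class `MediaDevice`, the functions `called_side_reply` and `designate_common`, and the ID strings are hypothetical names introduced here, and the source does not specify message formats or transport.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaDevice:
    """Hypothetical stand-in for a media processing device."""
    device_id: str
    peer_id: Optional[str] = None  # ID learned from the other side
    is_common: bool = False        # True once designated as the common device


def called_side_reply(called: MediaDevice, calling_id: str) -> str:
    """Called side stores the calling-side ID and answers with its own ID."""
    called.peer_id = calling_id
    return called.device_id


def designate_common(calling: MediaDevice, called: MediaDevice) -> MediaDevice:
    """Control-unit logic: send the calling-side ID, wait for the called-side ID,
    then make the calling-side device the common device."""
    called_id = called_side_reply(called, calling.device_id)
    calling.peer_id = called_id
    calling.is_common = True
    return calling
```

In this sketch the designation happens only after the called-side ID has come back, which mirrors the "in response to receiving" condition in the text.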

In a call control system according to another aspect, the calling-side media processing device may connect to a calling-side Web server that sends the calling-side text or the called-side text to the calling terminal, and the called-side media processing device may connect to a called-side Web server that sends the calling-side text or the called-side text to the called terminal. The call control system may further include a database that stores session information including a calling-side endpoint that uniquely identifies the calling-side Web server and a called-side endpoint that uniquely identifies the called-side Web server. The common media processing device may obtain the calling-side endpoint and the called-side endpoint from the session information; send the calling-side text or the called-side text to the calling-side Web server based on the calling-side endpoint, thereby sending that text toward the calling terminal; and send the calling-side text or the called-side text to the called-side Web server based on the called-side endpoint, thereby sending that text toward the called terminal. Referring to the endpoints makes it possible to identify the Web servers to which the text should be sent.
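As a rough sketch of the endpoint lookup described above, the following assumes an in-memory dictionary in place of the database and a caller-supplied `post` function in place of the real transport. `SessionInfo`, `SESSION_DB`, `deliver_text`, and the example URLs are hypothetical names that do not appear in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SessionInfo:
    """Session record; field names are illustrative, not from the source."""
    calling_endpoint: str  # identifies the calling-side Web server
    called_endpoint: str   # identifies the called-side Web server


# Hypothetical in-memory stand-in for the session database.
SESSION_DB: Dict[str, SessionInfo] = {
    "session-1": SessionInfo("https://calling.example/text",
                             "https://called.example/text"),
}


def deliver_text(session_id: str, text: str,
                 post: Callable[[str, str], None]) -> None:
    """Send one recognized text to both Web servers named in the session."""
    session = SESSION_DB[session_id]
    for endpoint in (session.calling_endpoint, session.called_endpoint):
        post(endpoint, text)
```

Passing `post` as a parameter keeps the sketch independent of any particular HTTP library; a real system would substitute its own sender.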

In a call control system according to another aspect, the control unit may, in response to consent signals indicating that the user agrees to use the speech-to-text service having been sent from both the calling terminal and the called terminal, set to bidirectional the recognition direction, which indicates to which communication terminal the speech text is sent; and the common media processing device may, in response to the recognition direction being bidirectional, send the calling-side text or the called-side text toward both the calling-side Web server and the called-side Web server. Setting the recognition direction according to user consent makes it possible to send text to both the caller and the callee only when both want the speech-to-text service.
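The consent rule above can be illustrated with a small sketch. Only the bidirectional case is described in this passage, so the `UNSET` value and the empty destination list below are placeholders assumed for illustration; the names `RecognitionDirection`, `set_direction`, and `destinations` are likewise hypothetical.

```python
from enum import Enum
from typing import List


class RecognitionDirection(Enum):
    UNSET = "unset"                  # placeholder; not described in this passage
    BIDIRECTIONAL = "bidirectional"  # text goes to both sides


def set_direction(caller_consented: bool,
                  callee_consented: bool) -> RecognitionDirection:
    """Set the direction to bidirectional only when both terminals consented."""
    if caller_consented and callee_consented:
        return RecognitionDirection.BIDIRECTIONAL
    return RecognitionDirection.UNSET


def destinations(direction: RecognitionDirection,
                 calling_endpoint: str,
                 called_endpoint: str) -> List[str]:
    """Return the Web servers that should receive each recognized text."""
    if direction is RecognitionDirection.BIDIRECTIONAL:
        return [calling_endpoint, called_endpoint]
    return []  # handling of other directions is outside this sketch
```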

In a call control system according to another aspect, the common media processing device may further send, for each of the calling-side text and the called-side text, an utterance type indicating whether the speaker is the caller or the callee to both the calling-side Web server and the called-side Web server. Providing this utterance type to the Web servers allows each Web server to process the text according to who the speaker is.

In a call control system according to another aspect, when the utterance type indicates the caller, the calling-side Web server may set the display mode of the calling-side text so that the calling-side text is displayed on the calling terminal as the speaker's own speech text, and when the utterance type indicates the callee, it may set the display mode of the called-side text so that the called-side text is displayed on the calling terminal as the other party's speech text. When the utterance type indicates the caller, the called-side Web server may set the display mode of the calling-side text so that the calling-side text is displayed on the called terminal as the other party's speech text, and when the utterance type indicates the callee, it may set the display mode of the called-side text so that the called-side text is displayed on the called terminal as the speaker's own speech text.

By setting the display mode of the text according to the utterance type in this way on each of the calling side and the called side, the text can be displayed according to the relationship between the user of the communication terminal and the speaker. Each communication terminal displays the speech text of its own user and the speech text of the other party in different display modes, which can help improve the user interface of the speech-to-text service.
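The four cases above reduce to a single comparison between the utterance type and the user served by the Web server. The sketch below assumes this reading; `Party`, `DisplayMode`, and `display_mode` are hypothetical names, and the concrete display modes (color, alignment, and so on) are not specified here.

```python
from enum import Enum


class Party(Enum):
    CALLER = "caller"
    CALLEE = "callee"


class DisplayMode(Enum):
    OWN_SPEECH = "own"          # shown as the terminal user's own speech text
    PARTNER_SPEECH = "partner"  # shown as the other party's speech text


def display_mode(utterance_type: Party, terminal_user: Party) -> DisplayMode:
    """Pick the display mode a Web server sets for the terminal it serves.

    terminal_user is CALLER for the calling-side Web server and CALLEE for
    the called-side Web server.
    """
    if utterance_type is terminal_user:
        return DisplayMode.OWN_SPEECH
    return DisplayMode.PARTNER_SPEECH
```

The same text therefore gets opposite display modes on the two terminals, which is what keeps the two transcripts consistent in content while differing in presentation.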

In a call control system according to another aspect, when the utterance type indicates the caller, the calling-side Web server may cause the calling-side text to be displayed on a first side of the calling terminal so that it is displayed as the speaker's own speech text, and when the utterance type indicates the callee, it may cause the called-side text to be displayed on a second side of the calling terminal so that it is displayed as the other party's speech text. When the utterance type indicates the caller, the called-side Web server may cause the calling-side text to be displayed on a first side of the called terminal so that it is displayed as the other party's speech text, and when the utterance type indicates the callee, it may cause the called-side text to be displayed on a second side of the called terminal so that it is displayed as the speaker's own speech text.

By setting the display position of the text according to the utterance type in this way on each of the calling side and the called side, the text can be displayed according to the relationship between the user of the communication terminal and the speaker. Because each communication terminal displays the speech text of its own user and the speech text of the other party on different sides, the caller and the callee can each easily tell their own utterances from the other party's.

In a call control system according to another aspect, the calling-side Web server may send the utterance type to the calling terminal together with the calling-side text or the called-side text, thereby causing the calling terminal to set the display mode of that text based on the utterance type, and the called-side Web server may send the utterance type to the called terminal together with the calling-side text or the called-side text, thereby causing the called terminal to set the display mode of that text based on the utterance type.

Although the present disclosure has been described in detail above, it is clear to those skilled in the art that the present disclosure is not limited to the embodiments described herein. The present disclosure can be carried out with modifications and changes without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description in the present disclosure is for illustrative purposes and has no restrictive meaning with respect to the present disclosure.

Notification of information is not limited to the aspects and embodiments described in the present disclosure and may be performed by other methods. For example, notification of information may be carried out by physical layer signaling (for example, DCI (Downlink Control Information) or UCI (Uplink Control Information)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, or broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination of these. RRC signaling may be called an RRC message and may be, for example, an RRC Connection Setup message or an RRC Connection Reconfiguration message.

Each aspect and embodiment described in the present disclosure may be applied to at least one of systems that use LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other suitable systems, and next-generation systems extended on the basis of these. A plurality of systems may also be combined and applied (for example, a combination of at least one of LTE and LTE-A with 5G).

The order of the processing procedures, sequences, flowcharts, and the like of each aspect and embodiment described in the present disclosure may be changed as long as no contradiction arises. For example, the methods described in the present disclosure present the elements of the various steps in an exemplary order and are not limited to the specific order presented.

A specific operation described in the present disclosure as being performed by a base station may in some cases be performed by its upper node. In a network consisting of one or more network nodes that include a base station, it is clear that the various operations performed for communication with a terminal can be performed by at least one of the base station and network nodes other than the base station (for example, an MME or an S-GW, although these are not the only possibilities). Although the case of a single network node other than the base station has been illustrated above, a combination of a plurality of other network nodes (for example, an MME and an S-GW) may be used.

Information and the like can be output from a higher layer (or a lower layer) to a lower layer (or a higher layer). Information and the like may be input and output via a plurality of network nodes.

Input and output information and the like may be stored in a specific location (for example, a memory) or may be managed using a management table. Input and output information and the like can be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.

A determination may be made using a value represented by one bit (0 or 1), using a Boolean value (true or false), or by comparing numerical values (for example, comparison with a predetermined value).
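The three forms of determination listed above can be written side by side as a short sketch. The function names are hypothetical, and the use of "greater than or equal to" for the numerical comparison is an assumption made for illustration; the text only says that a value is compared with a predetermined value.

```python
def judge_by_bit(flag: int) -> bool:
    """Determination from a one-bit value: 1 means true, 0 means false."""
    return (flag & 1) == 1


def judge_by_boolean(flag: bool) -> bool:
    """Determination from a Boolean value."""
    return flag


def judge_by_comparison(value: float, threshold: float) -> bool:
    """Determination by comparing a number with a predetermined value.

    The comparison operator (>=) is an assumption of this sketch.
    """
    return value >= threshold
```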

Each aspect and embodiment described in the present disclosure may be used alone, in combination, or switched during execution. Notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not notifying the predetermined information).

Software, whether referred to as software, firmware, middleware, microcode, or a hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.

Software, instructions, information, and the like may be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technologies (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), and the like) and wireless technologies (infrared, microwave, and the like), at least one of these wired and wireless technologies is included within the definition of a transmission medium.

The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, the data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.

The terms described in the present disclosure and the terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings. For example, at least one of a channel and a symbol may be a signal (signaling). A signal may be a message. A component carrier (CC) may be called a carrier frequency, a cell, a frequency carrier, or the like.

The terms "system" and "network" are used interchangeably in the present disclosure.

The information, parameters, and the like described in the present disclosure may be expressed as absolute values, as values relative to a predetermined value, or by other corresponding information. For example, a radio resource may be indicated by an index.

The names used for the parameters described above are not limiting in any respect. Furthermore, formulas and the like that use these parameters may differ from those explicitly disclosed in the present disclosure. Because the various channels (for example, PUCCH and PDCCH) and information elements can be identified by any suitable names, the various names assigned to these channels and information elements are not limiting in any respect.

In the present disclosure, terms such as "base station (BS)", "radio base station", "fixed station", "NodeB", "eNodeB (eNB)", "gNodeB (gNB)", "access point", "transmission point", "reception point", "transmission/reception point", "cell", "sector", "cell group", "carrier", and "component carrier" may be used interchangeably. A base station may also be referred to by terms such as macro cell, small cell, femto cell, and pico cell.

A base station can accommodate one or more (for example, three) cells. When a base station accommodates a plurality of cells, the entire coverage area of the base station can be divided into a plurality of smaller areas, and each smaller area can also be provided with communication services by a base station subsystem (for example, an indoor small base station (RRH: Remote Radio Head)). The term "cell" or "sector" refers to part or all of the coverage area of at least one of a base station and a base station subsystem that provides communication services in that coverage.

In the present disclosure, terms such as "mobile station (MS)", "user terminal", "user equipment (UE)", and "terminal" may be used interchangeably.

A mobile station may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term.

At least one of the base station and the mobile station may be called a transmitting device, a receiving device, a communication device, or the like. At least one of the base station and the mobile station may be a device mounted on a moving body, the moving body itself, or the like. The moving body may be a vehicle (e.g., a car or an airplane), an unmanned moving body (e.g., a drone or a self-driving car), or a robot (manned or unmanned). At least one of the base station and the mobile station also includes a device that does not necessarily move during a communication operation. For example, at least one of the base station and the mobile station may be an IoT (Internet of Things) device such as a sensor.

Further, the base station in the present disclosure may be read as a user terminal. For example, each aspect or embodiment of the present disclosure may be applied to a configuration in which communication between a base station and a user terminal is replaced with communication between a plurality of user terminals (which may be called, for example, D2D (Device-to-Device) or V2X (Vehicle-to-Everything)). In this case, the user terminal may have the functions of the base station. In addition, words such as "uplink" and "downlink" may be read as words corresponding to terminal-to-terminal communication (e.g., "side"). For example, an uplink channel, a downlink channel, and the like may be read as a side channel.

Similarly, the user terminal in the present disclosure may be read as a base station. In this case, the base station may have the functions of the user terminal.

The terms "judging" and "deciding" (both "determining") as used in the present disclosure may encompass a wide variety of operations. "Judging" and "deciding" may include, for example, regarding an act of judging, calculating, computing, processing, deriving, investigating, looking up (searching, inquiring; e.g., looking up in a table, a database, or another data structure), or ascertaining as an act of "judging" or "deciding". "Judging" and "deciding" may also include regarding an act of receiving (e.g., receiving information), transmitting (e.g., transmitting information), input, output, or accessing (e.g., accessing data in a memory) as an act of "judging" or "deciding". "Judging" and "deciding" may further include regarding an act of resolving, selecting, choosing, establishing, or comparing as an act of "judging" or "deciding". In other words, "judging" and "deciding" may include regarding some operation as an act of "judging" or "deciding". In addition, "judging (deciding)" may be read as "assuming", "expecting", "considering", or the like.

The terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. For example, "connection" may be read as "access". As used in the present disclosure, two elements can be considered to be "connected" or "coupled" to each other using at least one of one or more wires, cables, and printed electrical connections, and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the light (both visible and invisible) region.

As used in the present disclosure, the phrase "based on" does not mean "based only on" unless expressly specified otherwise. In other words, the phrase "based on" means both "based only on" and "based at least on".

Any reference to elements using designations such as "first" and "second" as used in the present disclosure does not generally limit the quantity or order of those elements. These designations may be used in the present disclosure as a convenient way of distinguishing between two or more elements. Thus, a reference to first and second elements does not mean that only two elements may be employed or that the first element must precede the second element in some way.

Where the terms "include" and "including" and variations thereof are used in the present disclosure, these terms are intended to be inclusive in the same way as the term "comprising". Furthermore, the term "or" as used in the present disclosure is intended not to be an exclusive or.

In the present disclosure, where articles such as "a", "an", and "the" in English are added by translation, the present disclosure may include the case where the nouns following these articles are plural.

In the present disclosure, the phrase "A and B are different" may mean "A and B are different from each other". The phrase may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same way as "different".

1...call control system, 10...core network, 11...calling-side CSCF, 12...called-side CSCF, 13...calling-side AS, 14...called-side AS, 15...calling-side MCE (calling-side media processing device), 16...called-side MCE (called-side media processing device), 17...calling-side SMS-GW, 18...called-side SMS-GW, 19...session database, 21...calling-side network, 22...called-side network, 31...calling terminal, 32...called terminal, 41...calling-side Web server, 42...called-side Web server, 43...speech recognition engine, 131, 141...service control unit, 132, 142...session control unit, 133, 143...service scenario unit.

Claims

A call control system capable of executing a speech-to-text service that converts a call conducted between a calling terminal and a called terminal into text, the call control system comprising:
a control unit that, when both a caller who uses the calling terminal and a callee who uses the called terminal are users of the speech-to-text service, causes one of a calling-side media processing device corresponding to the calling terminal and a called-side media processing device corresponding to the called terminal to function as a common media processing device, wherein
the common media processing device is connected to a speech recognition engine that converts speech of the caller or the callee into text, and
the common media processing device:
acquires calling-side text by inputting, into the speech recognition engine, calling-side speech of the caller transmitted from the calling terminal;
transmits the calling-side text toward both the calling terminal and the called terminal;
acquires called-side text by inputting, into the speech recognition engine, called-side speech of the callee transmitted from the called terminal; and
transmits the called-side text toward both the calling terminal and the called terminal.
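
By way of illustration only, the behavior of the common media processing device recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class names `Terminal` and `CommonMediaProcessor`, the method `on_voice`, and the placeholder recognizer `fake_engine` are all hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Terminal:
    """Hypothetical stand-in for the calling terminal or the called terminal."""
    name: str
    received: List[Tuple[str, str]] = field(default_factory=list)

    def deliver(self, speaker: str, text: str) -> None:
        # Record each piece of text in the order it arrives.
        self.received.append((speaker, text))


class CommonMediaProcessor:
    """Illustrative common media processing device (all names are assumptions)."""

    def __init__(self, recognize: Callable[[bytes], str],
                 calling: Terminal, called: Terminal) -> None:
        self.recognize = recognize  # the single, shared recognition engine
        self.calling = calling
        self.called = called

    def on_voice(self, speaker: str, audio: bytes) -> str:
        # Recognize the speech once, then send the same text toward both terminals.
        text = self.recognize(audio)
        for terminal in (self.calling, self.called):
            terminal.deliver(speaker, text)
        return text


def fake_engine(audio: bytes) -> str:
    # Placeholder recognizer: decodes bytes instead of performing real recognition.
    return audio.decode("utf-8")


caller_terminal = Terminal("calling terminal")
callee_terminal = Terminal("called terminal")
mce = CommonMediaProcessor(fake_engine, caller_terminal, callee_terminal)
mce.on_voice("caller", b"hello")
mce.on_voice("callee", b"hi there")
```

Because each utterance passes through one recognition call and the result is fanned out, the two terminals in this sketch necessarily hold identical text.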

The call control system according to claim 1, wherein the control unit causes the calling-side media processing device to function as the common media processing device.

The call control system according to claim 2, wherein the control unit:
transmits, toward the called-side media processing device, a calling-side media device ID that uniquely identifies the calling-side media processing device;
receives, from the called-side media processing device that has received the calling-side media device ID, a called-side media device ID that uniquely identifies the called-side media processing device; and
causes the calling-side media processing device to function as the common media processing device in response to receiving the called-side media device ID.
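
The exchange of media device IDs recited above may, as one possible illustration, be sketched as follows. The names `MediaProcessor`, `receive_peer_id`, and `negotiate_common` are hypothetical, and the function call stands in for whatever signaling the control unit actually uses, which this sketch does not attempt to model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaProcessor:
    """Hypothetical media processing device identified by a unique ID."""
    device_id: str
    peer_id: Optional[str] = None
    is_common: bool = False

    def receive_peer_id(self, peer_id: str) -> str:
        # Remember the peer's ID and answer with this device's own ID.
        self.peer_id = peer_id
        return self.device_id


def negotiate_common(calling: MediaProcessor, called: MediaProcessor) -> MediaProcessor:
    # Step 1: send the calling-side ID toward the called-side device.
    reply = called.receive_peer_id(calling.device_id)
    # Step 2: the reply carries the called-side ID.
    calling.peer_id = reply
    # Step 3: receipt of that ID triggers the calling side acting as the common device.
    calling.is_common = True
    return calling


calling_mce = MediaProcessor("mce-calling")
called_mce = MediaProcessor("mce-called")
common = negotiate_common(calling_mce, called_mce)
```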

The call control system according to claim 2 or 3, wherein
the calling-side media processing device is connected to a calling-side Web server for transmitting the calling-side text or the called-side text to the calling terminal,
the called-side media processing device is connected to a called-side Web server for transmitting the calling-side text or the called-side text to the called terminal,
the call control system further comprises a database that stores session information including a calling-side endpoint that uniquely identifies the calling-side Web server and a called-side endpoint that uniquely identifies the called-side Web server, and
the common media processing device:
acquires the calling-side endpoint and the called-side endpoint from the session information;
transmits the calling-side text or the called-side text toward the calling terminal by transmitting that text to the calling-side Web server based on the calling-side endpoint; and
transmits the calling-side text or the called-side text toward the called terminal by transmitting that text to the called-side Web server based on the called-side endpoint.
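
As a rough illustration of the endpoint lookup recited above, the following sketch resolves both endpoints from stored session information and plans one delivery per Web server. The dictionary layout, the key names, the session ID, and the example URLs are all assumptions made for this sketch; the disclosure does not specify a storage format, and no network request is actually made here.

```python
from typing import Dict, List, Tuple

# Hypothetical session database: session ID -> endpoints of the two Web servers.
session_db: Dict[str, Dict[str, str]] = {
    "session-1": {
        "calling_endpoint": "https://calling-web.example/text",
        "called_endpoint": "https://called-web.example/text",
    }
}


def plan_deliveries(session_id: str, text: str) -> List[Tuple[str, str]]:
    """Return (endpoint, text) pairs, one for each Web server in the session."""
    info = session_db[session_id]
    return [
        (info["calling_endpoint"], text),  # reaches the calling terminal
        (info["called_endpoint"], text),   # reaches the called terminal
    ]


deliveries = plan_deliveries("session-1", "hello")
```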

The call control system according to claim 4, wherein
the control unit sets a recognition direction, which indicates to which communication terminal the speech text is to be transmitted, in response to consent signals that are transmitted from both the calling terminal and the called terminal and that indicate consent to use of the speech-to-text service, and
the common media processing device transmits the calling-side text or the called-side text to both the calling-side Web server and the called-side Web server in response to the recognition direction being bidirectional.
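
One way to picture the recognition direction is sketched below. Only the bidirectional case (consent from both terminals) is recited above; the single-sided and no-consent values in this sketch are assumptions added so that the function is total, and the names `RecognitionDirection`, `set_recognition_direction`, and `target_servers` are hypothetical.

```python
from enum import Enum
from typing import List


class RecognitionDirection(Enum):
    NONE = "none"                    # assumption: nobody consented
    TO_CALLING = "to_calling"        # assumption: only the calling side consented
    TO_CALLED = "to_called"          # assumption: only the called side consented
    BIDIRECTIONAL = "bidirectional"  # both sides consented (the recited case)


def set_recognition_direction(calling_consents: bool,
                              called_consents: bool) -> RecognitionDirection:
    if calling_consents and called_consents:
        return RecognitionDirection.BIDIRECTIONAL
    if calling_consents:
        return RecognitionDirection.TO_CALLING
    if called_consents:
        return RecognitionDirection.TO_CALLED
    return RecognitionDirection.NONE


def target_servers(direction: RecognitionDirection) -> List[str]:
    # Bidirectional means the text goes to both Web servers.
    if direction is RecognitionDirection.BIDIRECTIONAL:
        return ["calling-side Web server", "called-side Web server"]
    if direction is RecognitionDirection.TO_CALLING:
        return ["calling-side Web server"]
    if direction is RecognitionDirection.TO_CALLED:
        return ["called-side Web server"]
    return []


direction = set_recognition_direction(True, True)
```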

The call control system according to claim 4 or 5, wherein the common media processing device further transmits, for each of the calling-side text and the called-side text, an utterance type indicating whether the speaker is the caller or the callee to both the calling-side Web server and the called-side Web server.
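
The utterance type might, for example, travel alongside the text in a single message, as in the following sketch. The JSON encoding, the field names `utterance_type` and `text`, and the functions `build_text_message` and `fan_out` are assumptions chosen for illustration; the disclosure does not prescribe a message format.

```python
import json
from typing import Dict, Iterable


def build_text_message(utterance_type: str, text: str) -> str:
    """Serialize one piece of recognized text together with who spoke it."""
    if utterance_type not in ("caller", "callee"):
        raise ValueError("utterance_type must be 'caller' or 'callee'")
    return json.dumps({"utterance_type": utterance_type, "text": text})


def fan_out(message: str, servers: Iterable[str]) -> Dict[str, str]:
    # The same message, including the utterance type, goes to every Web server.
    return {server: message for server in servers}


message = build_text_message("caller", "hello")
sent = fan_out(message, ["calling-side Web server", "called-side Web server"])
```

With the speaker carried in each message, a terminal receiving both streams can attribute every line of text without inferring it from which server delivered it.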