WO2011029862A1

WO2011029862A1 - Method and system for converting text messages into voice over IP calls from a Web interface

Info

Publication number: WO2011029862A1
Application number: PCT/EP2010/063218
Authority: WO
Inventors: Santiago Prieto Martin; David ARTUÑEDO GUILLÉN
Original assignee: Telefonica SA
Current assignee: Telefonica SA
Priority date: 2009-09-09
Filing date: 2010-09-09
Publication date: 2011-03-17
Anticipated expiration: 2012-03-09
Also published as: UY32881A; ES2372142B1; AR078277A1; ES2372142A1

Abstract

The invention relates to a method for converting text messages into voice over IP calls which comprises providing a text message contained in an HTTP request (200) entered by a source user (1, 100) through a Web interface, with an identifier; sending the request with an identifier (201) to a text-to-speech converter (104); converting the text contained in the request with an identifier (201) into a voice file (202) by means of a text-to-speech converter (104); making a voice over IP call to the destination (2, 3, 103) indicated in the HTTP request (200); playing the voice file (202), when the destination (3, 103) answers the voice call, by means of the first subsystem (101); ending the voice call, generating a notification (205) about the result of the voice call, and transmitting the notification (205) to the source user (1, 100).

Description

METHOD AND SYSTEM FOR CONVERTING TEXT MESSAGES INTO VOICE OVER IP CALLS FROM A WEB INTERFACE

Technical Field of the Invention

The present invention falls within the technical field of telecommunications, and more specifically within next generation telecommunications networks (for both fixed and mobile communications) that are deployed over IP networks (for example, the Internet) and that also provide Web access. More particularly, the invention belongs to the field of systems for converting text messages into speech that can be used in such telecommunications networks.

Background of the Invention

Text-to-speech converters are specialized systems that generate synthetic speech from a given text. They are used in a number of fields, generally to generate synthetic recorded messages that are subsequently played in interactive voice response systems, for example in a call center or an automatic answering service. An example of this type of application is found in patent US-6549749-B1.

On the other hand, current deployments of IP-based communications networks by telephone operators are mostly based on the Session Initiation Protocol (SIP), described in IETF RFC 3261, "SIP: Session Initiation Protocol", June 2002, which allows the setup and release of voice over IP calls. SIP calls can be generated from any terminal, including terminals located in the operator's network. Furthermore, SIP is designed in a manner similar to the HTTP protocol, so it is easy to integrate services that have a Web part with a voice over IP part.
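To illustrate the HTTP-like structure of SIP, the following Python sketch assembles a minimal INVITE request as RFC 3261 text. Like an HTTP message, it consists of a request line, header fields, an empty line, and an optional body. The function name and all host names, tags, branch and Call-ID values are hypothetical placeholders. A real user agent would generate these values as RFC 3261 requires and send the message over UDP or TCP; this sketch only builds the text.

```python
def build_invite(target_uri: str, from_uri: str, contact_uri: str,
                 via_sent_by: str, call_id: str, tag: str, branch: str,
                 sdp: str = "") -> str:
    """Assemble a minimal SIP INVITE request (RFC 3261) as text; nothing is sent."""
    body = sdp.encode("utf-8")
    lines = [
        f"INVITE {target_uri} SIP/2.0",                     # request line, like "POST /path HTTP/1.1"
        f"Via: SIP/2.0/UDP {via_sent_by};branch={branch}",  # RFC 3261 branch values start with z9hG4bK
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{from_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{contact_uri}>",
    ]
    if body:
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(body)}")           # length of the body in bytes
    # As in HTTP, an empty line separates the header fields from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + sdp
```

With the destination from the example URI, `"sip:user_destination@domain"`, the first line of the result is `INVITE sip:user_destination@domain SIP/2.0`, which a Web developer will recognize as the same shape as an HTTP request line.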

However, the inventors are not aware of any system capable of converting a given text entered from a Web page into a voice over IP call that terminates at a destination user previously identified by his Uniform Resource Identifier (URI). Given the many practical applications such a system would have, this gap limits current applications and constitutes a drawback.

Description of the Invention

The object of the present invention is to overcome the drawback of the state of the art detailed above by means of a method and a system for converting text messages into voice over IP calls from a Web interface.

According to the invention, the method for converting text messages into voice over IP calls from a Web interface comprises sending to a destination a voice message generated, by means of a text-to-speech converter, from a text sent by a source user, and is characterized in that it comprises:

a first step in which a text message contained in an HTTP request entered by a source user through a Web interface of a first subsystem comprising an SIP user agent is provided with at least one identifier for generating a request with an identifier comprising a file identifier and a text corresponding to the text message present in the HTTP request;

a second step in which the request with an identifier is sent to a text-to-speech converter by means of a second subsystem;

a third step in which the text message contained in the request with an identifier is converted into a voice file by means of the text-to-speech converter;

a fourth step in which the text-to-speech converter sends the voice file to the first subsystem, placing it in a voice message folder identified by the identifier indicated in the request with an identifier;

a fifth step in which, by means of the first subsystem, a voice over IP call is made to the destination indicated in the HTTP request;

a sixth step in which, when the destination answers the voice call, a voice session with the destination is initiated in which the voice file is played with synthetic speech by the first subsystem and transmitted to the destination;

a seventh step in which the first subsystem ends the voice call and generates a notification to inform the source user about the result of the voice call, the notification being a delivery notification when the voice file has been communicated to the destination, and an error notification when the voice message could not be transmitted to the destination; and

an eighth step in which the first subsystem transmits the notification to the source user.
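The eight steps can be summarized as a single orchestration routine. The sketch below is a hypothetical Python rendering and not the patented implementation. It makes the following assumptions:

- The text-to-speech converter and the SIP call are passed in as plain callables. `tts` and `place_call` are assumed names.
- The voice message folder is a dictionary keyed by file identifier.
- `place_call` returns `None` on success or a failure reason string.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Notification:
    kind: str                      # "delivery" or "error"
    destination: str
    reason: Optional[str] = None   # failure cause, if any


def convert_text_to_call(http_request: Dict[str, str],
                         tts: Callable[[str], bytes],
                         place_call: Callable[[str, bytes], Optional[str]],
                         folder: Dict[str, bytes]) -> Notification:
    # Step 1: attach a file identifier to the text of the HTTP request.
    request_with_id = {"file_id": uuid.uuid4().hex, "text": http_request["text"]}
    # Steps 2-3: hand the text to the converter and obtain a voice file.
    voice_file = tts(request_with_id["text"])
    # Step 4: store the voice file in the folder named by the identifier.
    folder[request_with_id["file_id"]] = voice_file
    # Steps 5-6: call the destination and play the stored file if it answers.
    failure = place_call(http_request["destination"], folder[request_with_id["file_id"]])
    # Steps 7-8: build the notification returned to the source user.
    if failure is None:
        return Notification("delivery", http_request["destination"])
    return Notification("error", http_request["destination"], failure)
```

Injecting the converter and the call as callables reflects the document's point that any commercial or free text-to-speech converter can be plugged in.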

The system according to the invention is a system for converting text messages into voice over IP calls from a Web interface to a destination. It comprises means for receiving text messages from a plurality of users, a text-to-speech converter for converting the text messages sent by each source user into respective voice messages, and means for delivering each voice message to its destination, for example a mobile or fixed user. It is characterized in that it further comprises a first subsystem and a second subsystem, the first subsystem comprising:

a Web interface for entering HTTP message requests comprising text messages,

an SIP user agent for generating respective requests with an identifier, each comprising a file identifier and a text corresponding to the text message present in the HTTP request, and for sending the requests with an identifier to the second subsystem, the second subsystem comprising means for sending the text of each request with an identifier to the text-to-speech converter, which converts the text contained in each message into a voice file with synthetic speech, and for receiving each voice file and sending it to a message folder in the first subsystem identified by the identifier;

means for making voice over IP calls to each destination indicated in an HTTP request;

means for, when a destination answers the voice call, initiating a voice session in which the voice file is played with synthetic speech by the first subsystem and transmitted to the destination;

means for ending the voice call and generating a notification to inform the source user about the result of the voice call, the notification being a delivery notification when the voice file has been communicated to the destination, and an error notification when the voice message could not be transmitted to the destination; and

means for transmitting the notification to the source user.
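One way to picture the hand-off between the two subsystems is a shared message folder keyed by the file identifier. In this hypothetical sketch:

- `MessageFolder` stands in for the folder in the first subsystem.
- `ConverterInterface` stands in for the second subsystem.
- The converter itself is any callable that turns text into audio bytes.

The class names and the `.ulaw` file naming are assumptions for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Callable


class MessageFolder:
    """Voice message folder in the first subsystem: one file per identifier."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def path_for(self, file_id: str) -> Path:
        return self.root / f"{file_id}.ulaw"       # hypothetical naming scheme

    def store(self, file_id: str, audio: bytes) -> Path:
        path = self.path_for(file_id)
        path.write_bytes(audio)
        return path

    def load(self, file_id: str) -> bytes:
        return self.path_for(file_id).read_bytes()


class ConverterInterface:
    """Second subsystem: sends text to the converter and files the result."""

    def __init__(self, tts: Callable[[str], bytes], folder: MessageFolder) -> None:
        self.tts = tts
        self.folder = folder

    def handle(self, file_id: str, text: str) -> Path:
        audio = self.tts(text)                      # synthetic speech from the converter
        return self.folder.store(file_id, audio)    # deliver under the request's identifier


def demo() -> bytes:
    """Round trip with a stand-in converter that simply encodes the text."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = MessageFolder(Path(tmp) / "voice")
        ConverterInterface(lambda text: text.encode("utf-8"), folder).handle("abc123", "hello")
        return folder.load("abc123")
```

Keying the folder by identifier lets the first subsystem retrieve the right file when it places the call, without the two subsystems sharing any other state.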

According to the invention, the first subsystem can comprise an SIP HTTP SERVLET system, and the second subsystem can comprise a CONVERTER INTERFACE system.

The Web interface can be housed in a server and can be an HTML Web page, in which case the user is usually a Web browser. The Web interface can also be a Web service interface using a technology that allows machine-to-machine communication. This technology can be based on the Representational State Transfer (REST) architectural style described by Roy Thomas Fielding, which can be consulted at:

http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm

In this case the user is usually a third-party service.
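In the machine-to-machine case, a third-party service would submit the same two inputs, the text and the destination, over HTTP. The document does not define the REST resource layout, so the following are hypothetical:

- the endpoint path `/conversions`;
- the JSON field names `text` and `destination`;
- the base URL.

The sketch only builds the request with Python's standard `urllib.request` and does not send it.

```python
import json
import urllib.request


def build_conversion_request(base_url: str, text: str, destination: str) -> urllib.request.Request:
    """Build (but do not send) a hypothetical REST call for a text-to-call conversion."""
    payload = json.dumps({"text": text, "destination": destination}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/conversions",   # hypothetical resource path
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen` to actually submit it. The HTTP response could then carry the notification described in the method.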

The voice file is preferably coded with an audio codec, for example a G.711 codec or a GSM codec, and encapsulated in RTP format.
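As an illustration of the codec step, the following sketch implements the G.711 μ-law compander for 16-bit signed PCM. G.711 also defines an A-law variant, which is the one commonly used in Europe. The code follows the widely used reference algorithm, with a bias of 0x84 and a clip level of 32635. It assumes the text-to-speech converter outputs little-endian 16-bit mono PCM at 8 kHz. The document states neither the G.711 law nor the converter's output format.

```python
import struct

BIAS = 0x84      # 132, added to the magnitude before the segment search
CLIP = 32635     # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit signed PCM sample to an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit among bits 7..14.
    exponent = max(0, ((magnitude >> 7) & 0xFF).bit_length() - 1)
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign | exponent | mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit PCM to mu-law, one output byte per sample."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[:count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```

Silence (sample 0) encodes as 0xFF and full-scale positive as 0x80, which matches the usual μ-law conventions. The 2:1 size reduction (16-bit samples to 8-bit codes) gives G.711's 64 kbit/s rate at 8 kHz.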

The Web access interface can be housed in any Web server, for example a public Web server accessible from the Internet. To convert the text into a voice over IP call, the system uses a text-to-speech conversion element and an SIP call agent. The call agent is capable of initiating voice over IP calls from the network to the destination user and of sending the audio stream corresponding to the speech generated by the text-to-speech converter. Any available commercial or free text-to-speech converter can be used for the conversion, so the invention can work with any of them. The SIP call agent, in turn, requires an external element capable of routing the calls it generates, for example an SIP proxy server or a generic IP network for voice over IP communications, such as an IP network with IMS architecture or an NGN network.

The invention has various applications, for example in voice communication services that a telecommunications operator can offer from its own network, such as a voice over IP network or a network with an IMS (IP Multimedia Subsystem) core.

The user of the system can be an end user, for example a user with Internet connectivity who is given access to convert text into voice over IP calls. Charging for this service is outside the scope of this invention. If the service is offered by a voice over IP operator, the operator will define the charging conditions. A simple option is to provide the system with a prior authentication interface, which then allows each conversion to be associated with the authenticated user.

Another user can be a third party, i.e., a third-party system that uses the REST interface to offer text-to-call conversion to other users or third parties.

Finally, another user can be an additional service of the operator that hosts the system described in this patent. Such a service invokes the text-to-call conversion through the REST interface. This can be used, for example, to include advertisements in the form of voice over IP calls in any other voice service of the operator.

Brief Description of the Drawings

Aspects and embodiments of the invention are described below with reference to the drawings, in which:

Figure 1 shows a general view of the environment in which this invention can be applied.

Figure 2 is a diagram of the operation of an embodiment of the system of the present invention, showing the exchange of messages between the source user, the server and the destination users.

The reference numerals in these figures identify the following elements:

1 user

2 mobile destination user

3 fixed destination user

4 server housing the Web interface of the service

5 generic IP network

6 IP network for voice over IP communications

100 source user

101 SIP_HTTP_SERVLET

102 CONVERTER_INTERFACE subsystem

103 destination

104 text-to-speech converter

200 HTTP text conversion request

201 request with an identifier

202 voice message file

203 voice session with RTP packets

204 synthetic speech

205 notification of the result of the call

Embodiments of the Invention

According to the embodiment shown in Figure 1, the system of the present invention is formed by different elements that communicate through two networks: a generic IP network -5-, which allows users to access the system, and an IP network for voice over IP communications -6-. The IP network for voice over IP communications -6- can consist of a single SIP server capable of routing the voice over IP calls. Alternatively, it can be a complete IMS network of an operator, in which case the system is housed in the operator's own network.

A user -1- who wishes to convert a given text into a voice over IP call terminating at a mobile destination user -2- or a fixed destination user -3- accesses a Web interface offered by the server -4- that houses the Web interface of the service. The Web interface can be either an HTML Web page, in which case the user -1- is a conventional Web browser, or a Web service interface based on any technology that allows machine-to-machine communication, such as REST (described, for example, in Wikipedia, Representational State Transfer (REST):

http://en.wikipedia.org/wiki/Representational_State_Transfer). In the second case the user -1- will typically be a third-party service.

The interface takes two mandatory input parameters: the text to be converted into speech and the recipient of the call, in the form of an SIP URI (e.g. "sip:user_destination@domain"). It also offers a button or command that launches the text-to-call conversion.
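Because the text and the SIP URI are both mandatory, the Web interface would reject a request that lacks either one. The following sketch is a deliberately simplified check. Its regular expression is an assumption: it accepts only the plain `sip:user@host` form shown in the example above (optionally `sips:` and a port), which is far narrower than the full SIP-URI grammar of RFC 3261.

```python
import re
from typing import Dict, List

# Simplified pattern: "sip:" or "sips:", a user part, "@", a host, optional ":port".
SIP_URI = re.compile(r"^sips?:[^@\s:;]+@[A-Za-z0-9.-]+(:\d{1,5})?$")


def validate_conversion_request(params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    errors = []
    if not params.get("text", "").strip():
        errors.append("missing text to convert")
    destination = params.get("destination", "")
    if not destination:
        errors.append("missing destination SIP URI")
    elif not SIP_URI.match(destination):
        errors.append(f"not a SIP URI: {destination!r}")
    return errors
```

Returning a list of problems, rather than stopping at the first one, lets the HTML page or the REST response report every missing field at once.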

Once the user -1- has launched the conversion request, the system converts the text into a synthetic voice file by means of a standard text-to-speech converter housed in the server -4-. The file -202- thus generated is coded with an audio codec (G.711, GSM or another one) and encapsulated in RTP (Real-time Transport Protocol) format (described, for example, in IETF RFC 1889, "RTP: A Transport Protocol for Real-Time Applications", 1996).
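RTP encapsulation prefixes each chunk of coded audio with a 12-byte header. This header layout from RFC 1889 is unchanged in its successor, RFC 3550. The sketch below makes several assumptions:

- G.711 μ-law input, carried with static payload type 0 (PCMU, RFC 3551).
- 20 ms packets, i.e. 160 one-byte samples at 8 kHz.
- Caller-chosen starting sequence number, timestamp and SSRC. Real stacks choose these randomly.

The document does not specify the packetization interval.

```python
import struct
from typing import List

PCMU_PAYLOAD_TYPE = 0   # static RTP payload type for G.711 mu-law (RFC 3551)


def packetize(ulaw: bytes, ssrc: int, seq: int = 0, timestamp: int = 0,
              samples_per_packet: int = 160) -> List[bytes]:
    """Split a mu-law byte stream into RTP packets (12-byte header + payload).

    160 samples = 20 ms at 8 kHz; mu-law uses one byte per sample.
    """
    packets = []
    for i, start in enumerate(range(0, len(ulaw), samples_per_packet)):
        payload = ulaw[start:start + samples_per_packet]
        marker = 1 if i == 0 else 0                   # marks the start of the talkspurt
        header = struct.pack(
            "!BBHII",                                 # network byte order
            0x80,                                     # version 2, no padding/extension, CC = 0
            (marker << 7) | PCMU_PAYLOAD_TYPE,
            (seq + i) & 0xFFFF,                       # 16-bit sequence number wraps around
            (timestamp + i * samples_per_packet) & 0xFFFFFFFF,  # advances in sample units
            ssrc & 0xFFFFFFFF,
        )
        packets.append(header + payload)
    return packets
```

Because the timestamp counts samples rather than packets, the receiver can reconstruct playout timing even if packets are lost or reordered.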

Once the encapsulation has been performed, the server -4-, using an SIP user agent, generates a call to the destination user -3-. The SIP user agent is a multi-line agent, i.e., it can generate as many voice over IP calls as there are requests, limited only by the capacity of the equipment on which the agent runs and by the available multimedia bandwidth.
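The bandwidth limit can be made concrete with a back-of-the-envelope calculation. Assume G.711 with 20 ms packets (160 payload bytes, 50 packets per second) over IPv4 without header compression. Each packet then carries 12 bytes of RTP, 8 of UDP and 20 of IPv4 header, giving (160 + 40) × 8 × 50 = 80 kbit/s per stream. This figure excludes link-layer framing, RTCP and signaling. Since the agent only sends media, one stream per call is counted. These parameters are illustrative assumptions, not values from the document.

```python
def per_call_bps(payload_bytes: int = 160, ptime_ms: int = 20,
                 header_bytes: int = 12 + 8 + 20) -> int:
    """IP-level bit rate of one outgoing RTP stream (RTP + UDP + IPv4 headers).

    Link-layer framing, RTCP and SIP signaling are ignored.
    """
    bits_per_packet = (payload_bytes + header_bytes) * 8
    return bits_per_packet * 1000 // ptime_ms      # packets per second = 1000 / ptime_ms


def max_concurrent_calls(link_bps: int, per_call: int = per_call_bps()) -> int:
    """Upper bound on simultaneous one-way streams that fit in link_bps."""
    return link_bps // per_call
```

Under these assumptions, a 10 Mbit/s link supports at most 125 simultaneous calls before CPU or other equipment limits are even considered.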

The SIP user agent uses the file previously encapsulated in RTP format as its multimedia traffic and does not need to receive or store any multimedia traffic from the called destination user -3-.
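In SDP terms, an agent that only transmits the pre-encoded file and never consumes the callee's audio could mark its media stream `a=sendonly`. This uses the RFC 3264 offer/answer model, with SDP syntax as defined in RFC 4566. The sketch builds such an offer for a single G.711 μ-law stream. The address, port and session identifier are placeholders, and the document does not state that this attribute is used.

```python
def sendonly_sdp_offer(ip: str, rtp_port: int, session_id: int) -> str:
    """Minimal SDP offer for one PCMU audio stream that the agent only sends."""
    lines = [
        "v=0",
        f"o=- {session_id} {session_id} IN IP4 {ip}",  # origin: no username, version = id
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",                                        # session not bounded in time
        f"m=audio {rtp_port} RTP/AVP 0",                # payload type 0 = PCMU
        "a=rtpmap:0 PCMU/8000",
        "a=sendonly",                                   # this agent transmits audio only
    ]
    return "\r\n".join(lines) + "\r\n"
```

A compliant callee would answer with `a=recvonly`, so no audio flows back toward the agent. This matches the statement that the agent neither receives nor stores the called user's media.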

Finally, once the call has ended, whether completely, incompletely or unsuccessfully, the result of the call is stored, including its success or failure status as derived from the SIP signaling information. This information can be shown to the user -1- who invoked the conversion, either on the Web interface itself or in a response message sent through it.

According to the embodiment of the method shown in Figure 2, the system is implemented by two elements, referred to as the SIP_HTTP_SERVLET subsystem -101- and the CONVERTER_INTERFACE subsystem -102-. Subsystem -101- offers the Web interface and places the call to the destination -103-. Subsystem -102- sends the text to be converted to the text-to-speech converter -104- and collects the result in the form of a file generated by the converter.

A user -100- sends to the service, through the Web interface housed in the SIP_HTTP_SERVLET subsystem -101-, an HTTP request -200- containing the text to be converted and the destination of the voice over IP call. In response, the CONVERTER_INTERFACE subsystem -102- creates a file identifier and makes a text-to-speech conversion request -201- through its interface with the converter -104-. This request indicates the text to be converted and the file identifier under which the converter must write the synthetic voice file.

Once the text-to-speech converter -104- has performed the conversion, the interface -102- makes the voice message file -202- available to the SIP_HTTP_SERVLET subsystem -101- so that the voice over IP call can start. Subsystem -101- then makes an SIP call to the user identified as the destination -103-, using standard SIP signaling. The diagram in Figure 2 assumes that the destination -103- answers the call.

Once it detects that the call has been accepted, subsystem -101- starts playing the voice file -202-. The file is sent, duly coded and encapsulated, to the destination -103- in a voice session -203- of RTP (Real-time Transport Protocol) packets, so that the voice file -202- is played as synthetic speech -204-.
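Playing the voice file amounts to transmitting its RTP packets at the rate at which they would be heard. The sketch below paces packets every 20 ms, matching the 160-sample G.711 packets assumed earlier, over a UDP socket. In practice, the destination address would come from the callee's SDP answer. RTCP, jitter handling and call teardown are omitted, and the function and parameter names are illustrative.

```python
import socket
import time
from typing import Iterable, Tuple


def stream_packets(packets: Iterable[bytes], sock: socket.socket,
                   addr: Tuple[str, int], ptime_s: float = 0.020) -> int:
    """Send RTP packets at a fixed real-time pace and return the number sent.

    Each deadline is computed from a single monotonic start time, so a late
    wake-up in one iteration does not accumulate into drift.
    """
    start = time.monotonic()
    sent = 0
    for i, packet in enumerate(packets):
        delay = start + i * ptime_s - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        sock.sendto(packet, addr)
        sent += 1
    return sent
```

Scheduling against absolute deadlines, rather than sleeping a fixed 20 ms after each send, keeps the stream's average rate equal to the audio sampling rate. This prevents the callee's jitter buffer from under- or over-running.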

Finally, once playback has ended, the service terminates the call, and the result of the call is communicated to the source user -100- by means of a notification -205-.

Notwithstanding the signaling described above, the call may fail for various reasons contemplated in the SIP protocol. These causes can also be reported in the notification -205- so that the source user -100- has information about the result of the invoked service.
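The notification -205- can carry the SIP cause directly. The following hypothetical mapping turns the final response to the INVITE into a delivery or error result. The handful of status codes and reason phrases are taken from RFC 3261, but the selection is illustrative. Treating an answered call whose playback was cut short as an error is an assumption, since the document does not say how incomplete calls are reported.

```python
from typing import Dict, Tuple

# Some final responses defined in RFC 3261 that an outgoing INVITE may receive.
REASONS: Dict[int, str] = {
    404: "Not Found",
    408: "Request Timeout",
    480: "Temporarily Unavailable",
    486: "Busy Here",
    487: "Request Terminated",
    503: "Service Unavailable",
    603: "Decline",
}


def classify_call_result(final_status: int, played_to_end: bool = True) -> Tuple[str, str]:
    """Map the INVITE's final SIP status to ("delivery" | "error", detail)."""
    if 200 <= final_status < 300:
        if played_to_end:
            return "delivery", "message played to the destination"
        # Assumption: a hang-up before the end of playback counts as an error.
        return "error", "destination hung up before the end of the message"
    reason = REASONS.get(final_status, "call failed")
    return "error", f"{final_status} {reason}"
```

Keeping the numeric status in the error text gives the source user, or a third-party service, a machine-readable cause alongside the human-readable phrase.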

Claims

1. Method for converting text messages into voice over IP calls from a Web interface, which comprises sending to a destination (2, 3, 103) a voice message (204) generated from a text sent by a source user (1, 100) by means of a text-to-speech converter (104), characterized in that it comprises:

a first step in which a text message contained in an HTTP request (200) entered by a source user (1, 100) through a Web interface of a first subsystem (101) comprising an SIP user agent is provided with at least one identifier for generating a request with an identifier (201) comprising a file identifier and a text corresponding to the text message present in the HTTP request (200);

a second step in which the request with an identifier (201) is sent to a text-to-speech converter (104) by means of a second subsystem (102);

a third step in which the text message contained in the request with an identifier (201) is converted into a voice file (202) by means of the text-to-synthetic speech converter (104);

a fourth step in which the text-to-synthetic speech converter (104) sends to the first subsystem (101) the voice file (202) to a voice message folder identified by the identifier indicated in the request with an identifier (201);

a fifth step in which, by means of the first subsystem (101), a voice over IP call is made to the destination (2, 3, 103) indicated in the HTTP request (200);

a sixth step in which, when the destination (3, 103) answers the voice call, a voice session (203) with the destination (2, 3, 103) is initiated in which the voice file (202) is played with synthetic speech (204) by the first subsystem (101) and transmitted to the destination (2, 3, 103);

a seventh step in which the first subsystem ends the voice call and generates a notification (205) to inform the source user (1, 100) about the result of the voice call, the notification (205) being a delivery notification when the voice file has been communicated to the destination (2, 3, 103), and an error notification when the voice message has not been able to be transmitted to the destination (2, 3, 103); and

an eighth step in which the first subsystem (101) transmits the notification (205) to the source user (1, 100).

2. Method according to claim 1, characterized in that the first subsystem (101) comprises an SIP HTTP SERVLET system.

3. Method according to claim 1 or 2, characterized in that the second subsystem (102) comprises a CONVERTER INTERFACE system.

4. Method according to one of the previous claims, characterized in that the destination (2, 3, 103) is a mobile user (2).

5. Method according to one of the previous claims, characterized in that the destination (2, 3, 103) is a fixed user (3).

6. Method according to one of the previous claims, characterized in that the Web interface is housed in a server (4).

7. Method according to one of the previous claims, characterized in that the Web interface is an HTML Web page and the user (1, 100) is a Web browser.

8. Method according to one of the previous claims, characterized in that the Web interface is a Web service interface with technology allowing a machine-to-machine communication.

9. Method according to claim 8, characterized in that the technology allowing a machine-to-machine communication is based on Representational State Transfer technology.

10. Method according to claim 8 or 9, characterized in that the user (1, 100) is a service of a third party.

11. Method according to one of the previous claims, characterized in that the voice file (202) is coded in an audio codec and encapsulated in RTP format.

12. Method according to claim 11, characterized in that the codec is a G.711 codec or a GSM codec.

13. System for converting text messages into voice over IP calls from a Web interface to a destination (2, 3, 103), comprising means for receiving text messages from a plurality of users (1, 100), a text-to-speech converter (104) for converting text messages sent by each of the source users (1, 100) into respective voice messages (204) and means for delivering each voice message (204) to its destination (2, 3, 103), characterized in that it further comprises a first subsystem (101) and a second subsystem (102), the first subsystem (101) comprising

a Web interface for entering HTTP message requests (200) comprising text messages,

an SIP user agent for generating respective requests with an identifier (201), each of which comprises an identifier and a text corresponding to the text message present in the HTTP request (200), for sending the requests with an identifier (201) to the second subsystem (102), the second subsystem (102) comprising means for sending the text of each request with an identifier (201) to the text-to-speech converter (104), in which the text contained in each message is converted into a voice file (202) with synthetic speech, receiving each voice file (202) and sending each voice file (202) to a message folder in the first subsystem (101) identified by the identifier,

means for making voice over IP calls to each destination (2, 3, 103) indicated in an HTTP request (200);

means for, when a destination (2, 3, 103) answers the voice call, initiating a voice session (203) in which the voice file (202) is played with synthetic speech (204) by the first subsystem (101) and transmitted to the destination (2, 3, 103);

means for ending the voice call and generating a notification (205) to inform the source user (1, 100) about the result of the voice call, the notification (205) being a delivery notification when the voice file has been communicated to the destination (2, 3, 103), and an error notification when the voice message has not been able to be transmitted to the destination (2, 3, 103); and means for transmitting the notification to the source user (1, 100).

14. System according to claim 13, characterized in that the first subsystem (101) comprises an SIP HTTP SERVLET system.

15. System according to claim 13 or 14, characterized in that the second subsystem (102) comprises a CONVERTER INTERFACE system.

16. System according to one of claims 13 to 15, characterized in that the destination (2, 3, 103) is a mobile user (2).

17. System according to one of claims 13 to 16, characterized in that the destination (2, 3, 103) is a fixed user (3).

18. System according to one of claims 13 to 17, characterized in that the Web interface is housed in a server (4).

19. System according to one of claims 13 to 18, characterized in that the Web interface is an HTML Web page and the user (1, 100) is a Web browser.

20. System according to one of claims 13 to 19, characterized in that the Web interface is a Web service interface with technology allowing a machine-to-machine communication.

21. System according to claim 20, characterized in that the technology allowing a machine-to-machine communication is based on Representational State Transfer technology.

22. System according to claim 20 or 21, characterized in that the user (1, 100) is a service of a third party.

23. System according to one of claims 13 to 22, characterized in that the voice file (202) is coded in an audio codec and encapsulated in RTP format.

24. System according to claim 23, characterized in that the codec is a G.711 codec or a GSM codec.