
WO2010059120A1 - Method, a media server, computer program and computer program product for combining a speech related to a voice over IP voice communication session between user equipments, in combination with web based applications - Google Patents

Method, a media server, computer program and computer program product for combining a speech related to a voice over IP voice communication session between user equipments, in combination with web based applications

Info

Publication number
WO2010059120A1
WO2010059120A1 (application PCT/SE2009/051313)
Authority
WO
WIPO (PCT)
Prior art keywords
media server
text
speech
unit
contextual data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SE2009/051313
Other languages
English (en)
Inventor
Catherine Mulligan
Magnus Olsson
Ulf Olsson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Priority to EP09827832.8A (patent EP2351022A4)
Priority to US13/129,828 (patent US20110224969A1)
Priority to CN2009801464301A (patent CN102224543A)
Publication of WO2010059120A1
Anticipated expiration (legal status: Critical)
Ceased (current legal status: Critical)

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 7/00 Arrangements for interconnection between switching centres
    • H04M 7/0024 Services and arrangements where telephone services are combined with data services
    • H04M 7/0027 Collaboration services where a computer is used for data transfer and the telephone is used for telephonic communication
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/1066 Session management
    • H04L 65/1083 In-session procedures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 Network streaming of media packets
    • H04L 65/75 Media network packet handling
    • H04L 65/765 Media network packet handling intermediate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/56 Provisioning of proxy services
    • H04L 67/561 Adding application-functional data or data for application control, e.g. adding metadata
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/56 Provisioning of proxy services
    • H04L 67/564 Enhancement of application control based on intercepted application data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M 2201/40 Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition

Definitions

  • the invention relates to the field of telecommunications, and more particularly to a media server, method, computer program and computer program product for combining speech related to a voice over IP (VoIP) voice communication session between user equipments with web based applications.
  • VoIP: voice over IP
  • IMS: IP Multimedia Subsystem
  • the IMS network can be used to set up and control multimedia sessions for "IMS enabled" terminals connected to various access networks, regardless of the access technology used.
  • the IMS concept can be used for fixed and mobile IP terminals.
  • Multimedia sessions are handled by specific session control nodes in the IMS network, e.g. the nodes P-CSCF (Proxy Call Session Control Function), S-CSCF (Serving Call Session Control Function), and I-CSCF (Interrogating Call Session Control Function).
  • a database node HSS (Home Subscriber Server).
  • the Media Resource Function provides media related functions such as media manipulation (e.g. voice stream mixing) and playing of tones and announcements.
  • Each MRF is further divided into a Media Resource Function Controller (MRFC) and a Media Resource Function Processor (MRFP).
  • MRFC is a signalling plane node that acts as a SIP (Session Initiation Protocol) User Agent to the S-CSCF, and which controls the MRFP.
  • the MRFP is a media plane node that implements all media-related functions.
  • a Back-to-Back User Agent acts as a user agent to both ends of a SIP call.
  • the B2BUA is responsible for handling all SIP signalling between both ends of the call, from call establishment to termination. Each call is tracked from beginning to end, allowing the operators of the B2BUA to offer value-added features to the call.
  • the B2BUA acts as a User Agent server on one side and as a User Agent client on the other (back-to-back) side.
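To make the B2BUA role concrete, the following is a minimal conceptual sketch, not a real SIP stack and not the patent's implementation: a CallLeg and B2BUA class (both invented names) showing how the agent terminates two independent call legs and relays media between them, which is what lets value-added features hook into the call.

```python
# Conceptual sketch only: these classes are hypothetical stand-ins,
# not a real SIP stack.

class CallLeg:
    """One independent SIP dialog terminated by the B2BUA."""

    def __init__(self, peer: str):
        self.peer = peer
        self.established = False

    def invite(self) -> None:
        # A real stack would send a SIP INVITE and await the 200 OK.
        print(f"INVITE -> {self.peer}")
        self.established = True

    def terminate(self) -> None:
        print(f"BYE -> {self.peer}")
        self.established = False


class B2BUA:
    """User Agent server toward the caller, User Agent client toward
    the callee; tracks the call from establishment to termination."""

    def __init__(self, caller: str, callee: str):
        self.leg_a = CallLeg(caller)  # UAS side, toward UE-A
        self.leg_b = CallLeg(callee)  # UAC side, toward UE-B

    def establish(self) -> None:
        self.leg_a.invite()
        self.leg_b.invite()

    def relay_media(self, payload: bytes) -> bytes:
        # Because all media flows through the B2BUA, value-added
        # features (subtitles, translation, ads) can hook in here.
        return payload

    def teardown(self) -> None:
        self.leg_a.terminate()
        self.leg_b.terminate()
```

The point of the back-to-back arrangement is visible in relay_media: the operator of the B2BUA sits on the media path for the whole lifetime of the call, which is exactly where the value-added features described below attach.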
  • the IMS network may also include various application servers and/or be connected to external ones. These servers can host different multimedia services or IP services.
  • Voice: this service has some problems today. One example is that the users must speak the same language. It is also not possible to combine or integrate the voice service with other services in a convenient way.
  • the IMS network is a platform designed to be used in conjunction with other Internet services using mobile VoIP (voice over IP).
  • the objective of the invention is to provide a translation application, e.g. for translations and subtitles of the ongoing voice conversation and/or IPTV broadcast, to the end-users so that they can manage storage, maintenance, searching and processing of voice based content. This is achieved by the different aspects of the invention described below.
  • A method is provided in a media server for combining speech related to a voice over IP (VoIP) voice communication session between a user equipment A (UE-A) and a user equipment B (UE-B) with web based applications, the method comprising the media server performing the following steps: capturing the speech related to the VoIP voice communication session; converting the speech to text; and creating contextual data by adding a service from the web based applications using the text.
  • In one embodiment the contextual data is a subtitle, and the method further comprises the step of sending the subtitle to the UE-B.
  • In another embodiment the contextual data is a translation, and the method further comprises the step of sending the translation to the UE-B.
  • the method further comprises the steps of
  • the step of creating a contextual data comprises the sub-steps of
  • the UE-A is a set top box.
  • In one embodiment the method further comprises storing the contextual data and/or the web page links as an Internet text based corpora/web viewing format, wherein the step of storing may be done in a web technology application server and/or a storage unit and/or a media server storage unit.
  • A media server is provided for combining speech related to the voice over IP (VoIP) voice communication session between the user equipment A (UE-A) and the user equipment B (UE-B) with the web based applications, the media server comprising: a capturing unit for capturing the speech of the VoIP voice communication session; a converting unit for converting the speech to text; and a creating unit for creating contextual data by adding the service from the web based applications using said text (a sketch of these units follows after the embodiments below).
  • In embodiments the media server comprises: a subtitle unit for converting the text to subtitles; a translation unit for converting the text to a translation; and an output unit for sending the translation to the UE-B.
  • the media server may comprise a speech unit for converting the translation into the translated speech.
  • the UE-A may be the set top box.
  • the media server may provide the contextual data in realtime to the UE-A and/or UE-B.
  • the media server may provide a real-time output of the subtitles in parallel with an IMS voice session.
  • the media server may provide a real-time output of the translation in parallel with an IMS voice session.
  • the media server may provide a real-time output of the translated speech to the UE-B.
  • the media server may in one embodiment comprise: a location based unit for sending the text to a location based services application server; an input unit for receiving the contextual text in the form of location information; and an output unit for sending the location information to the UE-B and/or UE-A.
  • the media server may comprise the output unit for sending the contextual data for storage on a web technology application server and/or storage unit and/or a media server storage unit.
  • the media server may comprise the output unit for returning to the UE-A and/or UE-B the list of the web page links from the search.
  • the media server may in one embodiment comprise the output unit for sending the contextual data and/or the list of web page links as an internet based corpora/web viewing format for storage on the web technology application server.
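As a concrete, purely illustrative reading of the unit structure above, the capturing, converting and creating units could be modeled as pluggable callables in a small pipeline. The class and function names below are assumptions for the sketch, not interfaces defined by the patent.

```python
from typing import Callable

# Hypothetical pipeline mirroring the claimed units: a capturing unit,
# a converting unit and a creating unit, modeled as injected callables.

class MediaServer:
    def __init__(
        self,
        capture: Callable[[], bytes],            # capturing unit
        speech_to_text: Callable[[bytes], str],  # converting unit
        add_web_service: Callable[[str], str],   # creating unit
    ):
        self.capture = capture
        self.speech_to_text = speech_to_text
        self.add_web_service = add_web_service

    def process_utterance(self) -> str:
        audio = self.capture()             # speech from the VoIP leg
        text = self.speech_to_text(audio)  # the extracted text
        return self.add_web_service(text)  # the contextual data


# Toy usage with stub units; a real deployment would plug in an ASR
# engine and a web based application such as a translator.
server = MediaServer(
    capture=lambda: b"<pcm audio frames>",
    speech_to_text=lambda audio: "hello world",
    add_web_service=lambda text: f"[subtitle] {text}",
)
print(server.process_utterance())  # -> [subtitle] hello world
```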
  • A computer program is provided comprising computer readable code means which when run on the media server causes the media server to: capture speech related to a voice over IP (VoIP) voice communication session; convert the speech to text; and create contextual data by adding a service from the web based applications using the text.
  • the computer readable code means which when run on the media server causes the media server to perform the step of converting the text to a subtitle.
  • the computer readable code means which when run on the media server causes the media server to perform the step of converting the text to a translation.
  • the computer readable code means which when run on the media server causes the media server to perform the step of converting the subtitles and the translation into speech.
  • computer readable code means which when run on the media server causes the media server to perform the step of converting the text to an advertisement for a UE-A and/or UE-B.
  • computer readable code means which when run on the media server causes the media server to perform the step of outputting a location based information for a UE-A and/or a UE-B.
  • A computer program product is provided for the media server connected to the voice over IP (VoIP) voice communication session, the media server having a processing unit; the computer program product comprises the computer program above and a memory, wherein the computer program is stored in the memory.
  • Examples of services run in the Internet domain (a non-exhaustive list): real-time translation, inserting subtitles into an ongoing video stream, voice-based search engines, context-based advertising, etc.
  • FIG. 1 illustrates a flow diagram of call sessions according to an embodiment of the invention.
  • FIG. 1a illustrates a flow diagram for an IPTV based embodiment.
  • FIG. 2 illustrates a flow diagram for a second embodiment.
  • FIG. 3 illustrates a flow diagram for a third embodiment.
  • FIG. 4 illustrates a detailed flow diagram for the embodiment in Figure 3.
  • FIG. 4a illustrates a media server 600 according to an embodiment of the invention.
  • FIG. 4b illustrates a creating unit 640 of the media server 600.
  • FIG. 4c illustrates a voice based internet service comprising the media server 600 and the web based applications 170.
  • FIG. 5 illustrates a flow diagram for a fourth embodiment.
  • FIG. 6 illustrates another aspect of the media server 600 with computer program product and computer program.
  • The number of web based applications is continuously growing. Examples are web based communities and hosted services, such as social-networking sites, wikis and blogs, which aim to facilitate creativity, collaboration, and sharing between users.
  • a Web 2.0 technology is an example of such web based applications 170 (see Fig 4c).
  • a media server 600 is provided for combining speech related to a voice over IP (VoIP) voice communication session between users with the web based applications 170, thereby improving the voice service in a voice over IP (VoIP) session such as a Skype technology or a network architecture called IMS (IP Multimedia Subsystems) developed by the 3rd Generation Partnership Project (3GPP), e.g. the IMS core 120.
  • a method is provided in the media server 600 for combining the speech related to the VoIP voice communication session between users, with the web based applications 170.
  • a computer program for the media server 600 is provided.
  • a computer program product for the media server 600 is provided.
  • a concept of the invention is to capture the voice content, i.e. the speech of the VoIP session, i.e. in a Skype or an IMS session, and "mash up"/combine the content with the web based applications 170.
  • An end-user that wishes to use one of the services that adds value to the ongoing voice call does this by establishing a call and indicating that they wish to e.g. use subtitles for the ongoing conversation. This could be done by clicking on a web link, either from a PC, or a mobile terminal.
  • a subtitling application would then establish a call via the IMS core 120 between a user equipment A (UE-A) 110 and a user equipment B (UE-B) 140, linking in the media server 600 e.g. a Media Resource Function Proxy/Processor (MRFP) into the voice session.
  • the UE-A may also be a Set Top Box (STB) 110a, e.g. receiving an IPTV broadcast, that establishes the TV session.
  • the speech between end users A and B is captured/intercepted by the media server 600, converted to text, converted into contextual data, and this contextual data is passed on to the receiving user, e.g. via the UE-B 140.
  • the speech to text transformation and conversion, e.g. into the contextual data form, could be created by services run in the Internet domain and "mashed up"/combined with the traffic, e.g. voice from an IMS network. This is described in more detail in the later sections of the detailed description.
  • the service can be invoked by one of several methods, e.g. through provisioning Initial Filter Criteria in an HSS that links in the translation service during the call establishment to an end-user.
  • the service can be invoked using mechanisms such as the Parlay-X.
  • the media server 600 could analyse the call case by e.g. matching the caller-callee pair to assess which conversations need to invoke a mash-up service, e.g. translation into another language or subtitling; if the call needs translation, the IMS core 120 links in the correct media server 600, rather than forwarding the call directly to the B-party.
  • it is also possible for the callee party to invoke the inverse of the calling party; for example, the callee gets Swedish to Mandarin translations, while the calling party gets Mandarin to Swedish.
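For illustration, this caller-callee matching could be modeled as a provisioned lookup that returns inverse translation directions for the two parties. The table and the SIP URIs below are invented for the sketch; a real deployment would drive this from Initial Filter Criteria in the HSS or mechanisms such as Parlay-X, as described above.

```python
# Hypothetical provisioning table mapping a caller-callee pair to the
# translation directions each party receives (inverse of each other).

PROVISIONED_PAIRS = {
    ("sip:alice@example.se", "sip:bob@example.cn"): ("sv", "zh"),
}

def invoke_mashup(caller: str, callee: str):
    """Return per-party translation directions, or None if the call
    should simply be forwarded directly to the B-party."""
    langs = PROVISIONED_PAIRS.get((caller, callee))
    if langs is None:
        return None  # no mash-up service needed; forward directly
    src, dst = langs
    return {
        "callee_direction": (src, dst),  # e.g. Swedish -> Mandarin
        "caller_direction": (dst, src),  # the inverse for the caller
    }

print(invoke_mashup("sip:alice@example.se", "sip:bob@example.cn"))
```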
  • Figure 1 illustrates a possible call flow 100 for subtitling during an IMS voice session. Other call flows are possible, based on how a service is invoked, as described in the paragraph above.
  • Figure 1 comprises the following elements: there are two user equipments, the UE-A 110 and the UE-B 140;
  • the Translation application unit 130, comprising the media server 600 and the web based applications 170;
  • the Voice-to-text converter application 132: a voice/speech to text translator application;
  • the Translate text converter application 133: an application to translate the text to another language.
  • the UE-A 110 places a call to the UE-B 140 using the Translation application unit 130 comprised in the media server 600, requesting the subtitles to be provided between e.g. Swedish and Mandarin.
  • the Translation application unit 130 contains the media server 600 functionality that performs as a Back to Back User Agent (B2BUA).
  • the media server 600 functions establish two call legs; one to the UE-A 110 and one to the UE-B 140 by sending an INVITE message to the IMS core 120.
  • the IMS Core 120 sends an INVITE message to the UE-A 110 with the IP address and port number of the media server B2BUA.
  • the IMS Core 120 sends the INVITE message to the UE-B 140 with the IP address and port number of the media server B2BUA.
  • the UE-A 110 responds with a 200 OK message.
  • the UE-B 140 responds with the 200 OK message.
  • Voice media now flows via the media server 600 functions of the B2BUA.
  • the end user A speaks Swedish as per normal.
  • the media server 600 captures the speech from the UE-A 110's call leg.
  • the media server 600 converts it to text using the voice-to-text converter application 132. This is the extracted text that can be mashed up with Internet technologies in the web based applications 170.
  • the media server 600 functions as a gateway toward the web based applications 170 as shown in Figure 4c.
  • the text thus extracted from the speech can now be converted into the contextual data by sending it to the translate text converter application 133 on the web based applications 170, thereby outputting a translation.
  • An example of such a translation service is AltaVista's Babel Fish; the translation is returned in text form in the UE-B 140's language.
  • the text thus extracted from the speech can now be converted into the contextual data by feeding the extracted text into e.g. Google's APIs to provide advertising that is contextual to the ongoing conversation .
  • the contextual data e.g. the subtitles are sent back to the media server 600 for transmission along with the speech/voice session.
  • the media server B2BUA sends the speech and the subtitles as a multimedia session.
  • the media server 600 captures the voice part of the video stream.
  • the media server 600 converts the speech to text and allows the end-user to select the language of the subtitles for that program. The following steps are performed:
  • Figure 1a illustrates a call flow 100a for subtitling during the IPTV session. Other call flows are possible, based on how the service is invoked, as described in the paragraph above.
  • Figure 1a comprises the following elements:
  • There is one user equipment, the STB 110a, e.g. receiving an IPTV broadcast;
  • the IMS core 120: the IPTV session is going through the IMS network;
  • the Translation application unit 130, comprising the media server 600 and the web based applications 170;
  • the Voice-to-text converter application 132: a voice/speech to text translator application;
  • the Translate text converter application 133: an application to translate the text to another language;
  • the Subtitle application 130a: comprising both the voice-to-text converter application 132 and the translate text converter application 133.
  • the STB 110a places a TV channel request to the IPTV provider using the Translation application unit 130, i.e. comprising the media server 600, requesting the subtitles to be provided in e.g. Swedish or Mandarin.
  • the IMS core 120 establishes two sessions, one to the subtitle application 130a and one to the media server 600, by sending an INVITE from the IMS core 120.
  • Both the subtitle application 130a and the media server 600 return the 200 OK message to the IMS core 120.
  • the IMS core 120 sends the 200 OK message to the STB 110a with a combined session description protocol (SDP) with two media flows, e.g. one media stream for a channel X and one media stream for the subtitles.
  • the media server 600 sends the media e.g. channel X to the STB 110a and to the subtitle application 130a.
  • the subtitle application 130a converts the media to text and translates to a target language.
  • the subtitle application 130a sends the subtitles to the STB 110a.
  • the STB 110a has a co-ordination mechanism based on time tags in the incoming subtitle stream (see the sketch below).
  • the above solution is also suitable for use in conjunction with e.g. news broadcasts to provide subtitles on an IPTV service. This provides better configurability for the end users than traditional subtitling on a TV program: the end users would be able to choose exactly the language in which they want to see the subtitles.
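The time-tag co-ordination mentioned above could, for illustration, look like the following sketch: subtitle cues carry time tags, and the STB shows the cue matching the current media playback position. The cue format, sample data and lookup are assumptions for the sketch, not the patent's specification.

```python
import bisect

# Hypothetical STB-side coordination: subtitle cues carry time tags
# and the STB picks the most recent cue at or before the current
# playback position (Swedish sample text, diacritics omitted).

subtitle_cues = [
    (0.0, "Hej och valkommen"),
    (2.5, "till kvallens nyheter."),
    (5.1, "I dag rapporterar vi fran ..."),
]  # (time tag in seconds, subtitle text), sorted by time tag

def cue_for_position(position_s: float) -> str:
    """Return the cue whose time tag most recently passed."""
    times = [t for t, _ in subtitle_cues]
    i = bisect.bisect_right(times, position_s) - 1
    return subtitle_cues[i][1] if i >= 0 else ""

print(cue_for_position(3.0))  # -> till kvallens nyheter.
```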
  • Figure 2 illustrates a call flow 200 for translation of voice during a voice session.
  • Figure 2 comprises the following elements:
  • There are two user equipments, the UE-A 110 and the UE-B 140;
  • the IMS core 120: the voice session is going through the IMS network;
  • the Translation application unit 130, comprising the media server 600 and the web based applications 170 functions;
  • the Voice-to-text converter application 132: a voice to text translator application;
  • the Translate text converter application 133: an application to translate the text to another language;
  • the Text-to-voice converter application 134: a text to voice translator application.
  • the UE-A 110 places a call to the UE-B 140 using the Translation application unit 130 comprising the media server 600, requesting the translation to be provided between e.g. Swedish and Mandarin.
  • the Translation application unit 130 contains the media server 600 functionality that performs as the B2BUA.
  • the media server 600 functions establish two call legs; one to the UE-A 110 and one to the UE-B 140 by sending the INVITE message to the IMS core 120.
  • the IMS Core 120 sends the INVITE message to the UE-A 110 with the IP address and port number of the media server B2BUA.
  • the IMS Core 120 sends the INVITE message to the UE-B 140 with the IP address and port number of the media server B2BUA.
  • the UE-A 110 responds with the 200 OK.
  • the UE-B 140 responds with the 200 OK.
  • Voice media now flows via the media server 600 functions of the B2BUA.
  • h) End user A speaks Swedish as per normal.
  • the media server 600 captures the speech from the UE-A 110's call leg.
  • the media server 600 converts it to text using the voice-to-text converter application 132. This is the "data" that can be mashed up with Internet technologies in the web based applications 170 to form the contextual data.
  • the media server 600 works as the gateway toward the web based applications 170 as shown in Figure 4c.
  • the text thus extracted from the speech can now be converted into the contextual data by sending it to the translate text converter application 133 on the web based applications 170.
  • One example is AltaVista's Babel Fish for language translation; the contextual data, i.e. the translation, is returned in text format in the UE-B 140's language.
  • the contextual data is thus a language translation .
  • the contextual data i.e. the translation thus retrieved from the mash-up/combining is converted back to a translated speech in the selected language using the text-to-speech converter application 134.
  • the media server B2BUA sends the translated speech to the UE-B 140.
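To tie the Figure 2 flow together, here is a hedged Python sketch of the capture, speech-to-text, translation and text-to-speech chain (roughly steps h through m). Every function is a local stub standing in for the applications 132, 133 and 134, which in the described system are network services; the names and return values are illustrative assumptions.

```python
# Illustrative end-to-end pipeline for the Figure 2 flow. All three
# converters are stubs; real deployments would call the networked
# applications 132, 133 and 134.

def voice_to_text(audio: bytes) -> str:                    # application 132
    return "god morgon"                                    # stubbed ASR result

def translate_text(text: str, src: str, dst: str) -> str:  # application 133
    return {"god morgon": "zao shang hao"}[text]           # stubbed translation

def text_to_voice(text: str) -> bytes:                     # application 134
    return f"<synthesized: {text}>".encode()               # stubbed TTS

def handle_leg_a_speech(audio: bytes) -> bytes:
    """Process one captured utterance from the UE-A call leg and
    return translated speech to send on the UE-B call leg."""
    text = voice_to_text(audio)                     # convert to text
    translation = translate_text(text, "sv", "zh")  # contextual data
    return text_to_voice(translation)               # translated speech

print(handle_leg_a_speech(b"<pcm>"))
```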
  • Figure 3 describes procedural steps 300 performed by the media server 600 for combining the speech related to the VoIP voice communication session, such as an IMS based voice communication session between the UE-A 110 and the UE-B 140, with the web based applications 170.
  • the media server 600 performs the following steps for the combining of the IMS voice communication session with the web based applications 170.
  • In a first step 310, the media server 600 captures the speech related to the IMS voice communication session.
  • the initialization procedure is initiated by the UE-A 110/UE-B 140 as described earlier in steps 1-7, with the capturing process in step 8 in Figure 1, and similarly by steps a-g in Figure 2.
  • In a second step 320, the media server 600 converts the speech to text.
  • In a third step 330, the media server 600 creates the contextual data by adding a service from the web based applications 170 using the text.
  • the creation of the contextual data and subsequent transfer of the contextual data to the UE-A 110 and/or the UE-B 140 is performed in steps 10-12 in Figure 1 and steps j-m in Figure 2.
  • the invention allows greater value to be derived from an IMS connectivity by retrieving the voice data from the ongoing voice session.
  • This conversational data i.e. the extracted text is then used to provide greater value to the end-users of the IMS core 120 by mashing up this data with the web based applications 170, e.g. the web 2.0 technologies.
  • Figure 4 describes schematically a flow 400 of the different forms in which the extracted text is converted to the contextual data, e.g. in steps 320 and 330 of Figure 3, among others.
  • In step 410, the media server 600 in combination with the web based applications 170 may convert the text to subtitles.
  • In step 420, the media server 600 in combination with the web based applications 170 may convert the text to the translation, e.g. into a different language.
  • In step 430, the media server 600 in combination with the web based applications 170 may convert the subtitles and the translation into speech.
  • In step 440, the text may be sent to an advertising application server 160 which converts the text to meaningful advertisements, i.e. contextual text, for the user.
  • In step 450, the text may be sent to a location based application server 150 to output e.g. location based information for the user. Further, in step 460, the outputs from steps 410-450 are sent to the user. The steps 410-450 may be performed individually or in combination as an output to the user. A sketch of this fan-out follows below.
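As referenced above, the fan-out of steps 410-460 could be modeled as a dispatch over optional converters. This is a minimal sketch under assumed interfaces; the converter registry, its keys and the placeholder outputs are invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical dispatch of the extracted text to the converters of
# steps 410-450; each converter yields one form of contextual data.

converters: Dict[str, Callable[[str], str]] = {
    "subtitles":   lambda t: f"[sub] {t}",            # step 410
    "translation": lambda t: f"[zh] {t}",             # step 420
    "speech":      lambda t: f"<tts:{t}>",            # step 430
    "advertising": lambda t: f"[ad matching '{t}']",  # step 440
    "location":    lambda t: f"[nearby '{t}']",       # step 450
}

def contextual_outputs(text: str, selected: List[str]) -> List[str]:
    """Run the selected converters individually or in combination and
    collect everything to be sent to the user (step 460)."""
    return [converters[name](text) for name in selected]

print(contextual_outputs("coffee near the station",
                         ["translation", "advertising"]))
```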
  • Figure 4a shows schematically an embodiment of the media server 600.
  • the media server 600 has a capturing unit 620, a converting unit 630 and a creating unit 640.
  • the creating unit 640 has a subtitle unit 641, a translation unit 642, a speech unit 643, an advertisement unit 644 and a location based unit 645 (see Figure 4b).
  • Figure 4c describes schematically another embodiment of the invention.
  • the Figure 4c shows the functional relationship between the media server 600 and the web based applications 170 to create a voice based internet service.
  • the location based application server 150 and the advertising application server 160 may either be connected to the web based applications 170 or the media server 600.
  • the process of such voice based internet service is described later on in Figure 5.
  • the web based applications 170 may include some components similar to those of the media server 600 shown in Figures 4a and 4b.
  • the web based applications 170 may comprise a search unit 172 and a storage unit 173.
  • a call would be established via the IMS core 120 that links in the "voice-based Internet Service".
  • This service would provide the following functionality:
  • This service may be used as the basis of several different types of application, for example:
  • End-users may submit voice-based "web pages" to be stored in the multimedia corpora for others to be able to use. For example, someone records a voice web page about "Drip Irrigation for use in drought affected areas"; instead of typing the content, they speak the content into their phone or other IMS terminal. The end-user indicates that they are finished recording their message, and the service then prompts the end-user to submit keywords to describe the piece. In this example, the keywords could be "drought", "irrigation", "minimise use of water", "minimise use of fertiliser", etc. This is then captured by the service and stored in an appropriate format.
  • Voice can be saved either in a server accessible for the public on the "public" Internet or in a "private" network.
  • the private storage area could be based within the Operator's network.
  • If the end-user wishes, they can also indicate that they wish for the voice-based web page to be converted to text and stored on the Internet in text-based format for those that may wish to read it, rather than listen to it.
  • Voice or other multimedia corpora can then be searched using several different mechanisms: XML, or other Natural Language Processing (NLP) mechanisms.
  • the end-users may utilise the service to search text-based corpora and have the text converted to speech.
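A minimal sketch of how the voice web page corpus and keyword search described above might be modeled; the in-memory store, field names and naive keyword matching are assumptions, standing in for public or operator-private storage and the XML/NLP search mechanisms the text mentions.

```python
from typing import Dict, List, Optional

# In-memory stand-in for the public or operator-private storage; a
# real service would persist pages and index them for XML/NLP search.
corpus: List[Dict] = []

def submit_voice_page(audio: bytes, keywords: List[str],
                      as_text: Optional[str] = None) -> None:
    """Store a recorded voice web page with end-user keywords and,
    optionally, a text rendering for those who prefer reading."""
    corpus.append({"audio": audio,
                   "keywords": set(keywords),
                   "text": as_text})

def search(query_terms: List[str]) -> List[Dict]:
    """Deliberately naive keyword match; the description mentions XML
    or other NLP mechanisms for the real search."""
    wanted = set(query_terms)
    return [page for page in corpus if page["keywords"] & wanted]

submit_voice_page(b"<recording>", ["drought", "irrigation"],
                  as_text="Drip irrigation for drought affected areas ...")
print(len(search(["irrigation"])))  # -> 1
```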
  • Figure 5 describes schematically a procedure flow 500, with numerous further embodiments relating to storing, retrieving and converting the contextual data.
  • the contextual data may be stored in a web technology application server 171 e.g. Internet or IP-based application server.
  • stored content of the contextual data may be searched on the web e.g. by the search unit 172 in assistance with the web technology application server 171.
  • the media server 600 in combination with the web based applications 170 may output and return to the UE-A 110 and/or UE-B 140 a list of web page links from searching the content of the contextual data.
  • the search results and the contextual data may be stored on the web, e.g. in the web technology application server 171.
  • the contextual data may be retrieved and converted by the media server 600 to the translated speech which subsequently may be stored e.g. on the web technology application server 171 for later viewing and access.
  • the translated speech may be output to the user for playback.
  • the storage unit 173 may be utilized for steps 510 and 540 described earlier.
  • the storage unit 173 may utilize cloud computing for storage optimization.
  • a media server storage unit 614 may be utilized for steps 510 and 540 described earlier, as shown in Figure 6.
  • the search unit 172 has access to both stored user data in the media server storage unit 614 and the storage unit 173.
  • FIG. 6 shows schematically an embodiment of the media server 600.
  • a processing unit 613, e.g. with a DSP (Digital Signal Processor) and encoding and decoding modules.
  • the processing unit 613 can be a single unit or a plurality of units performing different steps of the procedures 300, 400 and 500.
  • the media server 600 also comprises the input unit 660 and the output unit 670 for communication with the IMS core 120, the web based applications 170, the location based application server 150 and the advertising application server 160.
  • the input unit 660 and output unit 670 may be arranged as one port/in one connector in the hardware of the media server 600.
  • the media server 600 comprises at least one computer program product 610 in the form of a non-volatile memory, e.g. an EEPROM and a flash memory or a disk drive.
  • the computer program product 610 comprises a computer program 611, which comprises computer readable code means which when run on the media server 600 causes the media server 600 to perform the steps of the procedures 300, 400 and 500 described earlier.
  • the computer readable code means in the computer program 611 of the media server 600 comprises a capturing module 611a for capturing the speech of the IMS voice session; a converting module 611b for converting the speech to text; and a creating module 611c for adding the service from web based applications 170 using the text, in the form of computer program code structured in computer program modules.
  • the modules 611a-c essentially perform the steps of flow 300 to emulate the device described in Figure 4a. In other words, when the different modules 611a-c are run on the processing unit 613, they correspond to the units 620, 630, 640 of Figure 4a.
  • the creating module 611c may comprise a subtitle module 611c-1 for converting the text to subtitles; a translation module 611c-2 for converting the text to the translation, e.g. into different languages; a speech module 611c-3 for converting the subtitles and the translation into speech; an advertisement module 611c-4 for converting the text to meaningful advertisements for the user; and a location based module 611c-5 for outputting location based information for the user, in the form of computer program code structured in computer program modules.
  • the modules 611c-1 to 611c-5 essentially perform the steps of flow 400 to emulate the device described in Figure 4b. In other words, when the different modules 611c-1 to 611c-5 are run on the processing unit 613, they correspond to the units 641-645 of Figure 4b.
  • the computer readable code means in the embodiments disclosed above in conjunction with Figure 6 are implemented as computer program modules which when run on the media server 600 cause the media server 600 to perform the steps described earlier in conjunction with the figures mentioned above. At least one of the corresponding functions of the computer readable code means may be implemented at least partly as hardware circuits in the alternative embodiments described earlier.
  • the computer readable code means may be implemented within the media server database.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a media server, a method, a computer program and a computer program product for the media server, for combining speech related to a voice over IP (VoIP) voice communication session between a user equipment A and a user equipment B with web based applications. The method comprises the media server performing the following steps: capturing the speech related to the VoIP voice communication session; converting the speech to text; and creating contextual data by adding a service from the web based applications using the text. The media server comprises a capturing unit for capturing the speech of the VoIP voice communication session; a converting unit for converting the speech to text; and a creating unit for creating contextual data by adding services from the web based applications using said text. Furthermore, a computer program and a computer program product are provided for the media server.
PCT/SE2009/051313 2008-11-21 2009-11-20 Method, a media server, computer program and computer program product for combining a speech related to a voice over IP voice communication session between user equipments, in combination with web based applications Ceased WO2010059120A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP09827832.8A EP2351022A4 (fr) 2008-11-21 2009-11-20 Method, a media server, computer program and computer program product for combining a speech related to a voice over IP voice communication session between user equipments, in combination with web based applications
US13/129,828 US20110224969A1 (en) 2008-11-21 2009-11-20 Method, a Media Server, Computer Program and Computer Program Product For Combining a Speech Related to a Voice Over IP Voice Communication Session Between User Equipments, in Combination With Web Based Applications
CN2009801464301A CN102224543A (zh) 2008-11-21 2009-11-20 Method, media server, computer program and computer program product for combining speech related to a voice over IP voice communication session between user equipments with web based applications

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11676108P 2008-11-21 2008-11-21
US61/116,761 2008-11-21

Publications (1)

Publication Number Publication Date
WO2010059120A1 (fr)

Family

ID=42198365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2009/051313 Ceased WO2010059120A1 (fr) 2008-11-21 2009-11-20 Method, a media server, computer program and computer program product for combining a speech related to a voice over IP voice communication session between user equipments, in combination with web based applications

Country Status (3)

Country Link
EP (1) EP2351022A4 (fr)
CN (1) CN102224543A (fr)
WO (1) WO2010059120A1 (fr)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103685985A (zh) * 2012-09-17 2014-03-26 联想(北京)有限公司 通话方法、发送装置、接收装置、语音处理和终端设备
CN114127735A (zh) * 2019-07-23 2022-03-01 瑞典爱立信有限公司 通信网络中的用户设备、网络节点和方法


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6654722B1 (en) * 2000-06-19 2003-11-25 International Business Machines Corporation Voice over IP protocol based speech system
US20080240379A1 (en) * 2006-08-03 2008-10-02 Pudding Ltd. Automatic retrieval and presentation of information relevant to the context of a user's conversation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052069A1 (en) * 2000-10-24 2008-02-28 Global Translation, Inc. Integrated speech recognition, closed captioning, and translation system and method
US20020176404A1 (en) * 2001-04-13 2002-11-28 Girard Gregory D. Distributed edge switching system for voice-over-packet multiservice network
WO2007078200A1 (fr) * 2005-12-30 2007-07-12 Tandberg Telecom As Searchable multimedia stream
US20080034056A1 (en) * 2006-07-21 2008-02-07 At&T Corp. System and method of collecting, correlating, and aggregating structured edited content and non-edited content
WO2008066836A1 (fr) 2006-11-28 2008-06-05 Treyex Llc Method and apparatus for translation of speech during a call
WO2008130842A1 (fr) 2007-04-20 2008-10-30 Utbk, Inc. Methods and systems for connecting people via virtual reality for real-time communications
WO2009011549A2 (fr) 2007-07-19 2009-01-22 Seo-O Telecom Co., Ltd Real-time translation system and method for mobile phone content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2351022A4

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013013284A1 (fr) * 2011-07-28 2013-01-31 Research In Motion Limited System and method for broadcasting captions
US9591032B2 (en) 2011-07-28 2017-03-07 Blackberry Limited System and method for broadcasting captions
CN103685196A (zh) * 2012-09-19 2014-03-26 上海港联电信股份有限公司 Precise data analysis call system based on cloud computing and method therefor
US11509696B2 (en) * 2018-08-01 2022-11-22 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatuses for enhancement to IP multimedia subsystem
US20230071920A1 (en) * 2018-08-01 2023-03-09 Telefonaktiebolaget Lm Ericsson (Publ) Methods and Apparatuses for Enhancement to IP Multimedia Subsystem
US11909775B2 (en) 2018-08-01 2024-02-20 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatuses for enhancement to IP multimedia subsystem
CN113473238A (zh) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN113473238B (zh) 2020-04-29 2022-10-18 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call

Also Published As

Publication number Publication date
CN102224543A (zh) 2011-10-19
EP2351022A4 (fr) 2017-05-10
EP2351022A1 (fr) 2011-08-03

Similar Documents

Publication Publication Date Title
US20110224969A1 (en) Method, a Media Server, Computer Program and Computer Program Product For Combining a Speech Related to a Voice Over IP Voice Communication Session Between User Equipments, in Combination With Web Based Applications
TWI440346B Domain dependent real-time multi-lingual communication service based on an open architecture
WO2008036651A2 (fr) Method and system for network communications
US20120259924A1 (en) Method and apparatus for providing summary information in a live media session
EP2351022A1 (fr) Method, a media server, computer program and computer program product for combining a speech related to a voice over IP voice communication session between user equipments, in combination with web based applications
CN103067188A (zh) Network telephone conference system and implementation method thereof
US8908853B2 (en) Method and device for displaying information
US9148518B2 (en) System for and method of providing video ring-back tones
Fowdur et al. Performance analysis of webrtc and sip-based audio and video communication systems
KR20180005575A (ko) Apparatus and method for securing group calls based on a communication terminal
EP4037349B1 (fr) Method for providing end-user voice assistance functionality by means of a voice connection established over an IP-based telecommunication system
US8971515B2 (en) Method to stream compressed digital audio over circuit switched, voice networks
CN102394991B (zh) Method and system for realizing conference site audio playback in a multimedia conference service
KR20120025364A (ko) Interactive automatic response system and method using a multimodal interface
EP1858218A1 Method and entities for providing call enrichment of voice calls and semantic combination of several service sessions into a virtually combined service session
Yi et al. Automatic voice relay with open source Kiara
Podhradský et al. Subsystem for m/e-learning and Virtual Training based on IMS NGN Architecture
EP1917793A1 Service for personalizing communications by processing audio and/or video media streams
CN121397176A (en) Video interaction method and device, storage medium and electronic equipment
KR101334478B1 (ko) Method and apparatus for providing multimedia services in a communication system
Akihiro Fujii et al. Trends in the Commercialization and R&D of New Information Network Infrastructure
Watanabe et al. A General Purpose Connection CTI Server Based on the SIP Protocol and Its Implementation
KR20100031413A (ko) Terminal information communication apparatus and method for a terminal-adaptive multimedia streaming service

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200980146430.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09827832

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 13129828

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2009827832

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE