
WO2019000515A1 - Voice call method and device - Google Patents

Voice call method and device

Info

Publication number
WO2019000515A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
server
voice
information
voice information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/093741
Other languages
French (fr)
Chinese (zh)
Inventor
蒋壮
王文琪
王广新
陈杰
温平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Water World Co Ltd
Original Assignee
Shenzhen Water World Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Water World Co Ltd
Publication of WO2019000515A1
Legal status: Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M7/00 Arrangements for interconnection between switching centres
    • H04M7/006 Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a voice call method and apparatus.
  • a primary object of the present invention is to provide a voice call method and apparatus for solving the technical problem that a user using a different language cannot perform remote voice communication through a communication terminal.
  • an embodiment of the present invention provides a voice call method, where the method includes the following steps: collecting voice information of an original first language; sending the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language; receiving the voice information of the final second language returned by the server; and sending the voice information of the final second language to the peer end.
  • an embodiment of the present invention further provides a voice call method, where the method includes the following steps: receiving voice information of an original second language sent by the peer end; sending the voice information of the original second language to a server for translation processing, so that the server translates the voice information of the original second language into voice information of a final first language; receiving the voice information of the final first language returned by the server; and outputting the voice information of the final first language.
  • an embodiment of the present invention further provides a voice call device, where the device includes an information collection module, a first translation processing module, a first information receiving module, and an information sending module. The information collection module is configured to collect voice information of an original first language; the first translation processing module is configured to send the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language; the first information receiving module is configured to receive the voice information of the final second language returned by the server; and the information sending module is configured to send the voice information of the final second language to the peer end.
  • a voice call method provided by an embodiment of the present invention sends the collected voice information of the local user to a server for translation processing, where it is translated into voice information that the peer user can recognize, and then sends the translated voice information to the peer end, so that the peer user can understand the speech of the local user.
  • the translation function is added to the communication terminal, so that users who use different languages realize remote voice communication, which solves the technical problem that the users who use different languages cannot communicate remotely through the communication terminal, reduces the communication cost, and improves the user experience.
  • FIG. 1 is a system block diagram of an application scenario of a voice call method according to an embodiment of the present invention
  • FIG. 2 is a system block diagram of still another application scenario of a voice call method according to an embodiment of the present invention.
  • FIG. 3 is a system block diagram of still another application scenario of a voice call method according to an embodiment of the present invention.
  • FIG. 4 is a system block diagram of still another application scenario of a voice call method according to an embodiment of the present invention.
  • FIG. 5 is a block diagram showing a first embodiment of a voice communication device according to the present invention.
  • FIG. 6 is a block diagram of a first translation processing module of FIG. 5;
  • FIG. 7 is a block diagram showing a second embodiment of a voice communication device according to the present invention;
  • FIG. 8 is a block diagram of a second translation processing module of FIG. 7;
  • FIG. 9 is a block diagram showing a third embodiment of a voice communication device of the present invention.
  • VOLTE stands for Voice over LTE.
  • VoLTE is an IP data transmission technology that does not require a 2G/3G network. All services are carried on a 4G network, which enables data and voice services to be unified under the same network.
  • it can also be applied to a communication terminal based on other IP data transmission technologies, as long as it can unify data and voice services in the same network, which is not limited by the present invention.
  • the first embodiment of the voice call method of the present invention includes the following steps:
  • the language used by the VOLTE terminal user is defined as the first language, and the language used by the peer user is the second language.
  • when the VOLTE terminal acts as a transmitting terminal, the voice information of the user's first language is collected through the microphone.
  • S12 Send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the original first language into the voice information of the final second language.
  • the VOLTE terminal may directly transmit the voice information of the original first language to the server as a voice data stream.
  • the VOLTE terminal sends the voice information of the original first language to the server in the form of a data packet.
  • the VOLTE terminal first records the voice information of the original first language, records it as a voice file and caches it, and then sends each cached voice file to the server in the form of a data packet.
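  • the record, cache, and send behaviour described above can be sketched as follows. This is only an illustrative sketch: the names `VoicePacket`, `record_voice_files`, and `packetize`, the fixed file size, and the sequence number are assumptions made for the example and are not specified by the description.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class VoicePacket:
    seq: int        # position of the voice file within the utterance (assumed field)
    payload: bytes  # audio bytes of one cached voice file


def record_voice_files(audio: bytes, file_size: int) -> List[bytes]:
    """Cut the recorded audio into consecutive voice files and cache them."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return [audio[i:i + file_size] for i in range(0, len(audio), file_size)]


def packetize(cache: List[bytes]) -> Iterator[VoicePacket]:
    """Wrap each cached voice file in a numbered packet, in recording order."""
    for seq, payload in enumerate(cache):
        yield VoicePacket(seq=seq, payload=payload)


# Ten bytes of placeholder "audio" split into files of at most four bytes.
cache = record_voice_files(b"0123456789", file_size=4)
packets = list(packetize(cache))
```

  • numbering the packets is one simple way to let the receiving server restore the recording order; the description itself only requires that each cached voice file be sent as a data packet.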
  • Translation processing mainly includes three processes of identification, translation and synthesis. These three processes can be completed by one server or by two or three servers.
  • the server includes a voice recognition server, a translation server, and a voice synthesis server.
  • the VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the identification information, that is, the language type to be recognized, which includes the local language type (first language) and may further include the language type of the peer end (second language); establishes an IP-based connection with the translation server and sets the translation information, that is, the languages to be translated, which includes the local-to-peer mapping and may further include the peer-to-local mapping; and establishes an IP-based connection with the voice synthesis server and sets the synthesis information, that is, the type of speech synthesis, such as male or female voice, speech rate, and the like.
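  • the three groups of settings described above could be represented as plain data records. The sketch below is an assumption-laden illustration: the class names, field names, the language codes "zh" and "en", and the default voice and speech rate are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IdentificationInfo:
    local_language: str                  # first language (always set)
    peer_language: Optional[str] = None  # second language (optional)


@dataclass(frozen=True)
class TranslationInfo:
    local_to_peer: Tuple[str, str]                   # (source, target)
    peer_to_local: Optional[Tuple[str, str]] = None  # optional reverse mapping


@dataclass(frozen=True)
class SynthesisInfo:
    voice: str = "female"     # "male" or "female" (assumed default)
    speech_rate: float = 1.0  # 1.0 meaning normal speed (assumed convention)


def build_settings(first: str, second: str, both_directions: bool):
    """Build the settings sent to the three servers before a call."""
    ident = IdentificationInfo(first, second if both_directions else None)
    trans = TranslationInfo(
        (first, second), (second, first) if both_directions else None
    )
    return ident, trans, SynthesisInfo()


ident, trans, synth = build_settings("zh", "en", both_directions=True)
```

  • the optional fields mirror the wording "may further include": a terminal that only translates outgoing speech leaves the peer language and the reverse mapping unset.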
  • in step S12, the specific process by which the VOLTE terminal sends the voice information of the original first language to the server for translation processing is as follows:
  • S121 Send the voice information of the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the first language.
  • the VOLTE terminal first records the voice information of the original first language as individual voice files and caches them, and then sends each cached voice file to the voice recognition server in the form of a data packet. After receiving a voice file, the voice recognition server recognizes it according to the preset identification information, converts it into a character string of the first language, and returns the character string of the first language to the VOLTE terminal.
  • S122 Receive a character string of the first language returned by the voice recognition server.
  • S123 Send a character string of the first language to the translation server, so that the translation server translates the character string of the first language into the character string of the second language.
  • after receiving the character string of the first language, the VOLTE terminal sends it to the translation server. After receiving the character string of the first language, the translation server translates it into a character string of the second language according to the preset translation information, and returns the character string of the second language to the VOLTE terminal.
  • S124 Receive a character string of a second language returned by the translation server.
  • S125 Send a character string of the second language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the second language into the speech information of the final second language.
  • after receiving the character string of the second language, the VOLTE terminal sends it to the voice synthesis server. After receiving the character string of the second language, the voice synthesis server synthesizes it into the voice information of the final second language according to the preset synthesis information, and returns the voice information of the final second language to the VOLTE terminal in the form of a voice stream.
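  • steps S121 to S125 amount to chaining three remote calls, with the terminal relaying each intermediate result. A minimal sketch is given below, assuming each server can be modelled as a plain function; `translate_voice` and the three toy stand-ins are hypothetical names, and the "audio" is simply UTF-8 text so that the example is runnable without real speech services.

```python
from typing import Callable

Recognize = Callable[[bytes], str]   # voice -> first-language string
Translate = Callable[[str], str]     # first-language -> second-language string
Synthesize = Callable[[str], bytes]  # second-language string -> voice


def translate_voice(voice: bytes, recognize: Recognize,
                    translate: Translate, synthesize: Synthesize) -> bytes:
    text_first = recognize(voice)        # S121 send, S122 receive
    text_second = translate(text_first)  # S123 send, S124 receive
    return synthesize(text_second)       # S125 send, then voice is returned


# Toy stand-ins (assumptions): real servers would be reached over IP.
def toy_recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def toy_translate(text: str) -> str:
    # One-entry phrase table; unknown text is passed through unchanged.
    return {"ni hao": "hello"}.get(text, text)


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = translate_voice(b"ni hao", toy_recognize, toy_translate, toy_synthesize)
```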
  • the recognition, translation, and synthesis processing of the voice information of the original first language may also be performed by one server.
  • the VOLTE terminal transmits the voice information of the original first language to the server, and the server identifies, translates, and synthesizes the voice information, and returns the voice information to the VOLTE terminal.
  • the identification, translation, and synthesis processing of the speech information of the original first language may also be performed by two servers.
  • the VOLTE terminal sends the voice information of the original first language to the first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized and translated information to the second server, and the second server synthesizes it into voice information and returns the voice information to the VOLTE terminal.
  • alternatively, the VOLTE terminal sends the voice information of the original first language to the first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized information to the second server, and the second server translates and synthesizes it and returns the resulting voice information to the VOLTE terminal.
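  • the one-, two-, and three-server arrangements described above differ only in how the three processes are grouped. The following sketch illustrates this under simplifying assumptions: `make_server` and `run_from_terminal` are hypothetical helpers, each server is modelled as a local function, and the stages are toy stand-ins (decoding, upper-casing, and encoding text) rather than real recognition, translation, or synthesis.

```python
from typing import Any, Callable, Sequence, Tuple

Stage = Callable[[Any], Any]


def make_server(stages: Sequence[Stage]) -> Stage:
    """A server applies the stages it hosts, in order, to one request."""
    def server(payload: Any) -> Any:
        for stage in stages:
            payload = stage(payload)
        return payload
    return server


def run_from_terminal(voice: bytes, servers: Sequence[Stage]) -> Tuple[Any, int]:
    """The terminal relays each intermediate result to the next server."""
    payload: Any = voice
    for server in servers:
        payload = server(payload)
    return payload, len(servers)  # final result and number of round trips


# Toy stages (assumptions for the example only).
def recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def translate(text: str) -> str:
    return text.upper()


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")


layouts = {
    "one server": [make_server([recognize, translate, synthesize])],
    "two servers (a)": [make_server([recognize, translate]),
                        make_server([synthesize])],
    "two servers (b)": [make_server([recognize]),
                        make_server([translate, synthesize])],
    "three servers": [make_server([s]) for s in (recognize, translate, synthesize)],
}
results = {name: run_from_terminal(b"hello", servers)
           for name, servers in layouts.items()}
```

  • every layout yields the same output; what changes is the number of round trips between the terminal and the servers, which is one consideration when choosing between the arrangements.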
  • after receiving the voice information of the final second language returned by the server, the VOLTE terminal sends it to the peer end through the voice channel. After receiving the voice information of the final second language, the peer end processes it through the audio channel and finally outputs it through the sounding device (handset, speaker, etc.), so that the peer user, who uses the second language, can understand what the local user said.
  • the voice call method of the embodiment of the present invention sends the collected voice information of the local user to the server for translation processing, where it is translated into voice information that the peer user can recognize, and then sends the translated voice information to the peer end, so that the peer user can understand the speech of the local user.
  • the translation function is added to the communication terminal, so that users who use different languages realize remote voice communication, which solves the technical problem that the users who use different languages cannot communicate remotely through the communication terminal, reduces the communication cost, and improves the user experience.
  • after step S14, the following steps are further included:
  • S15 Receive voice information of the original second language sent by the opposite end.
  • the VOLTE terminal may directly transmit the voice information of the original second language to the server as a voice data stream.
  • the VOLTE terminal sends the voice information of the original second language to the server in the form of a data packet.
  • the VOLTE terminal first records the voice information of the original second language, records it as a voice file and caches it, and then sends each cached voice file to the server in the form of a data packet.
  • the server includes a voice recognition server, a translation server, and a voice synthesis server.
  • in step S16, the specific process by which the VOLTE terminal sends the voice information of the original second language to the server for translation processing is as follows:
  • the VOLTE terminal first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file to the voice recognition server in the form of a data packet.
  • after receiving a voice file, the voice recognition server recognizes it according to the preset identification information, converts it into a character string of the second language, and returns the character string of the second language to the VOLTE terminal.
  • S162. Receive a character string of a second language returned by the voice recognition server.
  • after receiving the character string of the second language, the VOLTE terminal sends it to the translation server. After receiving the character string of the second language, the translation server translates it into a character string of the first language according to the preset translation information, and returns the character string of the first language to the VOLTE terminal.
  • S164 Receive a character string of the first language returned by the translation server.
  • S165 Send the character string of the first language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the first language into the speech information of the final first language.
  • after receiving the character string of the first language, the VOLTE terminal sends it to the voice synthesis server. After receiving the character string of the first language, the voice synthesis server synthesizes it into the voice information of the final first language according to the preset synthesis information, and returns the voice information of the final first language to the VOLTE terminal in the form of a voice stream.
  • the identification, translation, and synthesis processing of the voice information of the original second language may also be performed by one server.
  • the VOLTE terminal transmits the voice information of the original second language to the server, and the server identifies, translates, and synthesizes the voice information, and returns the voice information to the VOLTE terminal.
  • the recognition, translation, and synthesis processing of the voice information of the original second language may also be performed by two servers.
  • the VOLTE terminal sends the voice information of the original second language to the first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized and translated information to the second server, and the second server synthesizes it into voice information and returns the voice information to the VOLTE terminal.
  • alternatively, the VOLTE terminal sends the voice information of the original second language to the first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized information to the second server, and the second server translates and synthesizes it and returns the resulting voice information to the VOLTE terminal.
  • S17 Receive voice information of the final first language returned by the server.
  • the VOLTE terminal processes the voice information of the final first language through the audio path and finally outputs it through the sounding device (handset, speaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.
  • the received voice information of the peer user is further sent to the server for translation processing, where it is translated into voice information that the local user can recognize, and the translated voice information is output, so that the local user can understand the speech of the peer user. Therefore, even if the peer end is an ordinary terminal, remote voice communication can be realized for users using different languages, which greatly expands the application range and further reduces the communication cost.
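  • because one terminal handles both directions in this embodiment, the peer end needs no translation capability. A minimal sketch of that idea follows, assuming the whole translation processing is available as a single function; `TranslatingTerminal`, `toy_pipeline`, the phrase table, and the language codes are hypothetical and serve only to make the example runnable.

```python
from typing import Callable, Dict, Tuple

# (voice, source_language, target_language) -> translated voice
Pipeline = Callable[[bytes, str, str], bytes]


class TranslatingTerminal:
    """Terminal that translates both outgoing and incoming speech."""

    def __init__(self, first_language: str, second_language: str,
                 pipeline: Pipeline) -> None:
        self.first_language = first_language
        self.second_language = second_language
        self.pipeline = pipeline

    def on_microphone(self, voice: bytes) -> bytes:
        """S11 to S14: returns the voice to send to the peer end."""
        return self.pipeline(voice, self.first_language, self.second_language)

    def on_peer_voice(self, voice: bytes) -> bytes:
        """S15 to S18: returns the voice to play on the local sounding device."""
        return self.pipeline(voice, self.second_language, self.first_language)


# Toy pipeline (assumption): "voice" is UTF-8 text, two known phrases.
PHRASES: Dict[Tuple[str, str, bytes], bytes] = {
    ("zh", "en", b"ni hao"): b"hello",
    ("en", "zh", b"thanks"): b"xie xie",
}


def toy_pipeline(voice: bytes, source: str, target: str) -> bytes:
    return PHRASES.get((source, target, voice), voice)


terminal = TranslatingTerminal("zh", "en", toy_pipeline)
sent = terminal.on_microphone(b"ni hao")     # what the peer end receives
played = terminal.on_peer_voice(b"thanks")   # what the local user hears
```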
  • a third embodiment of the voice call method of the present invention is proposed, and the method includes the following steps:
  • S22 Send the voice information of the original second language to the server for translation processing, so that the server translates the voice information of the original second language into the voice information of the final first language.
  • S23 Receive voice information of the final first language returned by the server.
  • steps S21 to S24 are the same as the steps S15-S18 in the second embodiment, and details are not described herein again.
  • the voice call method of the embodiment of the present invention sends the received voice information of the peer user to the server for translation processing, where it is translated into voice information that the local user can recognize, and then outputs the translated voice information, so that the local user can understand the speech of the peer user.
  • the translation function is added to the communication terminal, so that users who use different languages realize remote voice communication, which solves the technical problem that the users who use different languages cannot communicate remotely through the communication terminal, reduces the communication cost, and improves the user experience.
  • after step S24, the following steps are further included:
  • S26 Send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the first language into the voice information of the final second language.
  • S27 Receive voice information of the final second language returned by the server.
  • S28 Send the voice information of the final second language to the peer end.
  • steps S25 to S28 are the same as steps S11 to S14 in the first embodiment, respectively, and details are not described herein again.
  • the collected voice information of the local user is further sent to the server for translation processing, where it is translated into voice information that the peer user can recognize, and the translated voice information is then sent to the peer end, so that the peer user can understand the speech of the local user. Therefore, even if the peer end is an ordinary terminal, remote voice communication can be realized for users using different languages, which greatly expands the application range and further reduces the communication cost.
  • the first embodiment and the third embodiment may be applied to the application scenario shown in FIG. 1, where the VOLTE terminal A and the VOLTE terminal B establish a connection through the IP Multimedia Subsystem (IMS) network, and the VOLTE terminal A and the VOLTE terminal B are each connected to the voice recognition server, the translation server, and the voice synthesis server. The VOLTE terminal A and the VOLTE terminal B both use the voice call method of the first embodiment or the second embodiment to perform a voice call, so that users of different languages can implement remote voice communication.
  • the second embodiment and the fourth embodiment can be applied to the application scenarios as shown in FIGS. 2 to 4.
  • the VOLTE terminal A and the voice terminal B establish a connection through the IMS network, and the VOLTE terminal A is connected to the voice recognition server, the translation server, and the voice synthesis server. The VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to make a voice call with the voice terminal B, so that users of different languages can realize remote voice communication.
  • the VOLTE terminal A connects, through the IMS network, to the gateway between the IMS network and the 2G/3G network; the voice terminal B connects to that gateway through the 2G/3G network; and the VOLTE terminal A is connected to the voice recognition server, the translation server, and the voice synthesis server. The VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to make a voice call with the voice terminal B, so that users of different languages can realize remote voice communication.
  • the VOLTE terminal A connects, through the IMS network, to the gateway between the IMS network and the public switched telephone network (PSTN); the voice terminal B connects to that gateway through the PSTN; and the VOLTE terminal A is connected to the voice recognition server, the translation server, and the voice synthesis server. The VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to make a voice call with the voice terminal B, so that users of different languages can implement remote voice communication.
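  • the application scenarios above differ in how the voice terminal B is reached, while the translation servers are attached only to the VOLTE terminal A. The sketch below lists the hops of each call path; `PeerAccess` and `call_path` are hypothetical names introduced for illustration, and the hop labels are informal descriptions rather than defined network elements.

```python
from enum import Enum
from typing import List


class PeerAccess(Enum):
    IMS = "IMS network"      # terminal B is reached directly over the IMS network
    CS_2G_3G = "2G/3G network"
    PSTN = "PSTN"


def call_path(peer_access: PeerAccess) -> List[str]:
    """List the hops between VOLTE terminal A and terminal B."""
    path = ["VOLTE terminal A", "IMS network"]
    if peer_access is not PeerAccess.IMS:
        # A gateway bridges the IMS network and the peer's own network.
        path.append(f"gateway between IMS network and {peer_access.value}")
        path.append(peer_access.value)
    path.append("terminal B")
    return path


paths = {access.name: call_path(access) for access in PeerAccess}
```

  • in every path the recognition, translation, and synthesis servers sit beside terminal A only, which is why terminal B can be an ordinary voice terminal.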
  • the processing delay of the speech recognition server is generally less than 3 seconds
  • the processing delay of the translation server is generally less than 200 milliseconds
  • the processing delay of the speech synthesis server is generally less than 200 milliseconds
  • the transmission delay of the IMS network is generally on the order of seconds. Therefore, by using the high-rate and low-latency characteristics of LTE communication, the multi-language real-time translation function during a voice call is implemented on the VOLTE terminal; the voice translation processing is fast, the delay is small, and the user's call is not affected, thereby enabling barrier-free remote voice communication for users of different languages.
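  • the delay figures above can be combined into a simple upper bound. The sketch below adds the three stated processing bounds (3 seconds, 200 milliseconds, and 200 milliseconds), giving roughly 3.4 seconds of server processing per utterance; the network delay is left as a parameter because no precise value is given, and the function names are hypothetical.

```python
# Upper bounds on per-stage processing delay, in seconds, as stated in the text.
RECOGNITION_BOUND_S = 3.0
TRANSLATION_BOUND_S = 0.2
SYNTHESIS_BOUND_S = 0.2


def processing_delay_bound() -> float:
    """Worst-case server processing time for one utterance."""
    return RECOGNITION_BOUND_S + TRANSLATION_BOUND_S + SYNTHESIS_BOUND_S


def end_to_end_bound(network_delay_s: float) -> float:
    """Processing bound plus an assumed total network delay."""
    if network_delay_s < 0:
        raise ValueError("network_delay_s must be non-negative")
    return processing_delay_bound() + network_delay_s


processing = processing_delay_bound()
with_network = end_to_end_bound(1.0)  # 1.0 s is an illustrative value only
```

  • speech recognition dominates the bound, so any effort to shorten the overall delay would most plausibly start with that stage.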
  • the device includes an information collection module 10, a first translation processing module 20, a first information receiving module 30, and an information sending module 40, where:
  • the information collection module 10 is configured to collect voice information of the original first language.
  • the first translation processing module 20 is configured to send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the original first language into the voice information of the final second language.
  • the first information receiving module 30 is configured to receive the voice information of the final second language returned by the server.
  • the information sending module 40 is configured to send the voice information of the final second language to the opposite end.
  • the language used by the VOLTE terminal user is the first language
  • the language used by the peer user is the second language.
  • the information collecting module 10 collects the voice information of the original first language of the user through the microphone.
  • the first translation processing module 20 may send the voice information of the original first language to the server directly as a voice data stream.
  • the first translation processing module 20 sends the voice information of the original first language to the server in segments, in the form of data packets. For example, the first translation processing module 20 first records the voice information of the original first language as individual voice files and caches them, and then sequentially sends each cached voice file to the server in the form of a data packet.
  • the translation process mainly includes three processes of identification, translation and synthesis.
  • the three processes can be completed by one server or by two or three servers.
  • the server includes a voice recognition server, a translation server, and a voice synthesis server.
  • the VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the identification information through the first setting module, that is, the language type to be recognized, which includes the local language type (first language) and may further include the language type of the peer end (second language); establishes an IP-based connection with the translation server and sets the translation information through the second setting module, that is, the languages to be translated, which includes the local-to-peer mapping and may further include the peer-to-local mapping; and establishes an IP-based connection with the voice synthesis server and sets the synthesis information through the third setting module, that is, the type of speech synthesis, such as male or female voice, speech rate, and the like.
  • the first translation processing module 20 includes a first transmitting unit 21, a first receiving unit 22, a second transmitting unit 23, a second receiving unit 24, and a third transmitting unit 25, where:
  • the first transmitting unit 21 is configured to transmit the voice information of the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the first language.
  • the first transmitting unit 21 first records the voice information of the original first language as individual voice files and caches them, and then sequentially sends each cached voice file to the voice recognition server in the form of a data packet.
  • after receiving a voice file, the voice recognition server recognizes it according to the preset identification information, converts it into a character string of the first language, and returns the character string of the first language to the VOLTE terminal.
  • the first receiving unit 22 is arranged to receive a character string of the first language returned by the voice recognition server.
  • the second transmitting unit 23 is arranged to send the character string of the first language to the translation server, so that the translation server translates the character string of the first language into a character string of the second language.
  • the second transmitting unit 23 transmits the character string of the first language to the translation server.
  • the translation server translates the character string of the first language into a character string of the second language according to the preset translation information, and returns the character string of the second language to the VOLTE terminal.
  • the second receiving unit 24 is arranged to receive a character string of the second language returned by the translation server.
  • the third transmitting unit 25 is arranged to send the character string of the second language to the voice synthesis server, so that the voice synthesis server synthesizes the character string of the second language into the voice information of the final second language. After the character string of the second language is received, the third transmitting unit 25 sends it to the voice synthesis server. After receiving the character string of the second language, the voice synthesis server synthesizes it into the voice information of the final second language according to the preset synthesis information, and returns the voice information of the final second language to the VOLTE terminal in the form of a voice stream.
  • the recognition, translation, and synthesis processing of the voice information of the original first language may also be performed by one server.
  • the first translation processing module 20 transmits the voice information of the original first language to the server, and the server performs recognition, translation, and synthesis processing on the voice information and returns the result to the VOLTE terminal.
  • the recognition, translation, and synthesis processing of the voice information of the original first language may also be performed by two servers.
  • the first translation processing module 20 sends the voice information of the original first language to the first server, and the first server performs recognition and translation processing on the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized and translated information to the second server, and the second server performs synthesis processing on it and returns the result to the VOLTE terminal.
  • alternatively, the first translation processing module 20 sends the voice information of the original first language to the first server, and the first server performs recognition processing on the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized information to the second server, and the second server performs translation and synthesis processing on it and returns the result to the VOLTE terminal.
  • the voice call device of the embodiment of the present invention transmits the collected voice information of the local user to the server for translation processing, where it is translated into voice information that the peer user can recognize, and then sends the translated voice information to the peer end, so that the peer user can understand the speech of the local user.
  • a translation function is thereby added to the communication terminal, so that users of different languages can carry out remote voice communication; this solves the technical problem that users of different languages cannot carry out remote voice communication through communication terminals, reduces communication costs, and improves the user experience.
  • the device includes a second information receiving module 50, a second translation processing module 60, a third information receiving module 70, and an information output module 80, wherein
  • the second information receiving module 50 is configured to receive the voice information of the original second language sent by the opposite end.
  • the second information receiving module 50 receives, through the voice channel, the voice information of the original second language sent by the peer end acting as the transmitting end.
  • the second translation processing module 60 is arranged to transmit the voice information of the original second language to the server for translation processing, so that the server translates the voice information of the original second language into the voice information of the final first language.
  • the second translation processing module 60 may transmit the voice information of the original second language to the server directly as a voice data stream. Preferably, the second translation processing module 60 sends the voice information of the original second language to the server in segments, in the form of data packets. For example, the second translation processing module 60 first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file to the server in turn in the form of a data packet.
  • the server includes a voice recognition server, a translation server, and a voice synthesis server.
  • the VOLTE terminal establishes an IP-based communication connection with the voice recognition server and sets the recognition information through the first setting module, that is, the language types to be recognized, which include the language type of the peer end (second language) and may further include the language type of the local end (first language); it establishes an IP-based communication connection with the translation server and sets the translation information through the second setting module, that is, the languages to be translated, which include the mapping from the peer end to the local end and may further include the mapping from the local end to the peer end; and it establishes an IP-based communication connection with the speech synthesis server and sets the synthesis information through the third setting module, that is, the type of speech synthesis, such as male or female voice, speech rate, and the like.
  • the second translation processing module 60 includes a fourth transmitting unit 61, a third receiving unit 62, a fifth transmitting unit 63, a fourth receiving unit 64, and a sixth transmitting unit 65, where:
  • the fourth transmitting unit 61 is arranged to transmit the voice information of the original second language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the second language.
  • the fourth transmitting unit 61 first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file to the voice recognition server in turn in the form of a data packet.
  • after receiving a voice file, the voice recognition server performs recognition processing on it according to the preset recognition information, recognizes it as a character string of the second language, and returns the character string of the second language to the VOLTE terminal.
  • the third receiving unit 62 is arranged to receive a character string of the second language returned by the voice recognition server.
  • the fifth transmitting unit 63 is arranged to transmit the character string of the second language to the translation server to cause the translation server to translate the character string of the second language into a character string of the first language. After receiving the character string of the second language, the fifth transmitting unit 63 transmits the character string of the second language to the translation server.
  • after receiving the character string of the second language, the translation server translates it according to the preset translation information into a character string of the first language, and returns the character string of the first language to the VOLTE terminal.
  • the fourth receiving unit 64 is arranged to receive a character string of the first language returned by the translation server.
  • the sixth transmitting unit 65 is arranged to transmit the character string of the first language to the speech synthesis server such that the speech synthesis server synthesizes the character string of the first language into the speech information of the final first language. After receiving the character string of the first language, the sixth transmitting unit 65 transmits the character string of the first language to the speech synthesis server.
  • after receiving the character string of the first language, the speech synthesis server synthesizes it according to the preset synthesis information into the voice information of the final first language, and returns that voice information to the VOLTE terminal in the form of a voice code stream.
  • the recognition, translation, and synthesis processing of the voice information of the original second language may also be performed by one server.
  • the second translation processing module 60 transmits the voice information of the original second language to the server, and the server performs recognition, translation, and synthesis processing on the voice information and returns the result to the VOLTE terminal.
  • the recognition, translation, and synthesis processing of the voice information of the original second language may also be performed by two servers.
  • the second translation processing module 60 sends the voice information of the original second language to the first server, and the first server performs recognition and translation processing on the voice information and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the recognized and translated information to the second server, and the second server performs synthesis processing on it and returns the result to the VOLTE terminal.
  • alternatively, the second translation processing module 60 sends the voice information of the original second language to the first server, and the first server performs recognition processing on the voice information and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the recognized information to the second server, and the second server performs translation and synthesis processing on it and returns the result to the VOLTE terminal.
  • the third information receiving module 70 is configured to receive the voice information of the final first language returned by the server.
  • the information output module 80 is configured to output the voice information of the final first language. After the voice information of the final first language returned by the server is received, the information output module 80 processes it through the audio path and finally outputs it through a sound-emitting device (earpiece, speaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.
  • the voice call device of the embodiment of the present invention transmits the received voice information of the peer user to the server for translation processing, where it is translated into voice information that the local user can recognize, and then outputs the translated voice information, so that the local user can understand the speech of the peer user.
  • the translation function is added to the communication terminal, so that users who use different languages realize remote voice communication, which solves the technical problem that the users who use different languages cannot communicate remotely through the communication terminal, reduces the communication cost, and improves the user experience.
  • the voice call devices of the foregoing first and second embodiments may be combined to form the voice call device of the third embodiment.
  • the voice call device can not only translate the voice information collected at the local end before sending it to the peer end, but can also translate the voice information sent by the peer end before outputting it, so that even if the peer end is an ordinary voice terminal, remote voice communication between users of different languages can still be achieved, which greatly expands the scope of application and further reduces communication costs.
  • the voice call device of this embodiment can be applied to the application scenario as shown in FIG. 2 to FIG. 4.
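The combined device of the third embodiment can be pictured with a small sketch. It is only an illustration under stated assumptions: the class name `TranslatingCallDevice`, its method names, the three callbacks, and the language codes used in the test are hypothetical and do not appear in the source; the `translate_speech` callback stands in for the whole server-side recognition, translation, and synthesis processing.

```python
from typing import Callable

# (audio, source language, target language) -> translated audio.
# Hypothetical stand-in for the server-side translation processing.
TranslateSpeech = Callable[[bytes, str, str], bytes]


class TranslatingCallDevice:
    """Illustrative terminal that translates in both call directions."""

    def __init__(self, local_language: str, peer_language: str,
                 translate_speech: TranslateSpeech,
                 send_to_peer: Callable[[bytes], None],
                 play_locally: Callable[[bytes], None]) -> None:
        self.local_language = local_language
        self.peer_language = peer_language
        self._translate_speech = translate_speech
        self._send_to_peer = send_to_peer
        self._play_locally = play_locally

    def on_local_speech(self, audio: bytes) -> None:
        # Outgoing path: first language -> second language, then send to the peer.
        translated = self._translate_speech(
            audio, self.local_language, self.peer_language)
        self._send_to_peer(translated)

    def on_peer_speech(self, audio: bytes) -> None:
        # Incoming path: second language -> first language, then output locally.
        translated = self._translate_speech(
            audio, self.peer_language, self.local_language)
        self._play_locally(translated)
```

Because both directions are handled on one terminal, the peer in this sketch needs no translation capability of its own, which matches the remark that the peer end may be an ordinary voice terminal.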

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Transfer Between Computers (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention discloses a voice call method and device. The method comprises the following steps: collecting voice information of an original first language; sending the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language; receiving the voice information of the final second language returned by the server; and sending the voice information of the final second language to the peer end.

Description

Title of invention: Voice call method and device

Technical field

[0001] The present invention relates to the field of communication technologies, and in particular to a voice call method and device.

Background

[0002] With the increasingly widespread use of communication terminals, people can use them for a variety of functions, such as listening to music, watching videos, and making voice calls. Voice calling is a basic and commonly used function of a communication terminal: even people thousands of miles apart can communicate by voice remotely through communication terminals, which effectively shortens the distance between people.

[0003] At the same time, with economic globalization and internationalization, contact between people in different countries is becoming increasingly close. People in different countries usually use different languages. When at least one of two users cannot understand the other's language, and the other user cannot speak the first user's language either, the two users cannot communicate by voice remotely through communication terminals. They must talk face to face, with translation by a human interpreter or a translation machine, which reduces the available communication channels and raises the cost of communication.

Technical problem

[0004] Therefore, how to enable remote voice communication through communication terminals for users of different languages is a technical problem that urgently needs to be solved.

Solution to the problem

Technical solution

[0005] The main object of the present invention is to provide a voice call method and device, aiming to solve the technical problem that users of different languages cannot carry out remote voice communication through communication terminals.

[0006] To achieve the above object, an embodiment of the present invention provides a voice call method, the method comprising the following steps: collecting voice information of an original first language; sending the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the first language into voice information of a final second language; receiving the voice information of the final second language returned by the server; and sending the voice information of the final second language to the peer end.

[0007] Based on the same inventive concept, an embodiment of the present invention further provides a voice call method, the method comprising the following steps: receiving voice information of an original second language sent by the peer end; sending the voice information of the original second language to a server for translation processing, so that the server translates the voice information of the second language into voice information of a final first language; receiving the voice information of the final first language returned by the server; and outputting the voice information of the final first language.

[0008] An embodiment of the present invention also provides a voice call device, the device comprising an information collection module, a first translation processing module, a first information receiving module, and an information sending module. The information collection module is configured to collect voice information of an original first language. The first translation processing module is configured to send the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language. The first information receiving module is configured to receive the voice information of the final second language returned by the server. The information sending module is configured to send the voice information of the final second language to the peer end.

Advantageous effects of the invention

Beneficial effects

[0009] In the voice call method provided by the embodiments of the present invention, the collected voice information of the local user is sent to a server for translation processing, where it is translated into voice information that the peer user can recognize, and the translated voice information is then sent to the peer end, so that the peer user can understand the speech of the local user. A translation function is thereby added to the communication terminal, so that users of different languages can carry out remote voice communication. This solves the technical problem that users of different languages cannot carry out remote voice communication through communication terminals, reduces communication costs, and improves the user experience.

Brief description of the drawings

Description of the drawings

[0010] FIG. 1 is a system block diagram of one application scenario implementing the voice call method of an embodiment of the present invention;

[0011] FIG. 2 is a system block diagram of another application scenario implementing the voice call method of an embodiment of the present invention;

[0012] FIG. 3 is a system block diagram of another application scenario implementing the voice call method of an embodiment of the present invention;

[0013] FIG. 4 is a system block diagram of another application scenario implementing the voice call method of an embodiment of the present invention;

[0014] FIG. 5 is a schematic module diagram of a first embodiment of the voice call device of the present invention;

[0015] FIG. 6 is a schematic module diagram of the first translation processing module in FIG. 5;

[0016] FIG. 7 is a schematic module diagram of a second embodiment of the voice call device of the present invention;

[0017] FIG. 8 is a schematic module diagram of the second translation processing module in FIG. 7;

[0018] FIG. 9 is a schematic module diagram of a third embodiment of the voice call device of the present invention.

Best mode for carrying out the invention

Best mode of the invention

[0019] It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.

[0020] The voice call method and device of the embodiments of the present invention are mainly applied to a VOLTE terminal, that is, a communication terminal based on VOLTE (Voice over LTE) technology. VoLTE is an IP data transmission technology that needs no 2G/3G network: all services are carried on the 4G network, so that data and voice services are unified on the same network. Of course, the method and device may also be applied to communication terminals based on other IP data transmission technologies, as long as they can unify data and voice services on the same network; the present invention places no limitation on this.

[0021] In a first embodiment of the voice call method of the present invention, the method comprises the following steps:

[0022] S11. Collect voice information of the original first language.

[0023] In the embodiments of the present invention, the language used by the user of the VOLTE terminal is defined as the first language, and the language used by the peer user is defined as the second language. When the VOLTE terminal acts as the transmitting end, it collects the user's voice information in the first language through a microphone.

[0024] S12. Send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the original first language into the voice information of the final second language.

[0025] The VOLTE terminal may send the voice information of the original first language to the server directly as a voice data stream. Preferably, the VOLTE terminal sends the voice information of the original first language to the server in segments, in the form of data packets. For example, the VOLTE terminal first records the voice information of the original first language as individual voice files and caches them, and then sends each cached voice file to the server in turn in the form of a data packet.
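The segmented sending described in paragraph [0025] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `split_into_segments` and `send_in_order`, the fixed segment size, and the `send` callback are assumptions introduced here. The source states only that the recording is cached as individual voice files and sent in sequence as data packets.

```python
from typing import Callable, Iterator, List


def split_into_segments(recording: bytes, segment_bytes: int) -> Iterator[bytes]:
    """Yield consecutive segments of the recording; the last may be shorter."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    for start in range(0, len(recording), segment_bytes):
        yield recording[start:start + segment_bytes]


def send_in_order(recording: bytes, segment_bytes: int,
                  send: Callable[[int, bytes], None]) -> int:
    """Cache every segment first, then send each one with its sequence number."""
    cached: List[bytes] = list(split_into_segments(recording, segment_bytes))
    for sequence_number, segment in enumerate(cached):
        send(sequence_number, segment)  # hypothetical transport callback
    return len(cached)
```

The sequence number is an addition of this sketch; it lets a receiver restore the order if packets arrive out of sequence.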

[0026] Translation processing mainly comprises three stages: recognition, translation, and synthesis. These three stages may be completed by one server, or by two or three servers.

[0027] In the embodiments of the present invention, the server comprises a voice recognition server, a translation server, and a speech synthesis server. The VOLTE terminal establishes an IP-based communication connection with the voice recognition server and sets the recognition information, that is, the language types to be recognized, which include the language type of the local end (first language) and may further include the language type of the peer end (second language). It establishes an IP-based communication connection with the translation server and sets the translation information, that is, the languages to be translated, which include the mapping from the local end to the peer end and may further include the mapping from the peer end to the local end. It establishes an IP-based communication connection with the speech synthesis server and sets the synthesis information, that is, the type of speech synthesis, such as male or female voice and speech rate.
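The three groups of settings in paragraph [0027] could be represented as plain data structures, as in the sketch below. All class, field, and function names, the default voice and rate values, and the language codes in the usage are hypothetical; the source names only the kinds of information that are set (languages to recognize, translation mappings, synthesis type).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognitionSettings:
    languages: List[str]              # local language first; peer language optional


@dataclass
class TranslationSettings:
    mappings: List[Tuple[str, str]]   # (source language, target language)


@dataclass
class SynthesisSettings:
    voice: str = "female"             # assumed default, e.g. "male" or "female"
    rate: float = 1.0                 # assumed unit: relative speech rate


def build_settings(local: str, peer: str, bidirectional: bool = False,
                   voice: str = "female", rate: float = 1.0):
    """Build the settings sent to the three servers before a call."""
    languages = [local] + ([peer] if bidirectional else [])
    mappings = [(local, peer)] + ([(peer, local)] if bidirectional else [])
    return (RecognitionSettings(languages),
            TranslationSettings(mappings),
            SynthesisSettings(voice, rate))
```

With `bidirectional=True` the sketch adds the optional peer language and the reverse mapping, which corresponds to a terminal that also translates incoming speech.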

[0028] In step S12, the specific process by which the VOLTE terminal sends the voice information of the original first language to the server for translation processing is as follows:

[0029] S121. Send the voice information of the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the first language.

[0030] The VOLTE terminal first records the voice information of the original first language as individual voice files and caches them, and then sends each cached voice file to the voice recognition server in turn in the form of a data packet. After receiving a voice file, the voice recognition server performs recognition processing on it according to the preset recognition information, recognizes it as a character string of the first language, and returns the character string of the first language to the VOLTE terminal.

[0031] S122. Receive the character string of the first language returned by the voice recognition server.

[0032] S123. Send the character string of the first language to the translation server, so that the translation server translates the character string of the first language into a character string of the second language.

[0033] After receiving the character string of the first language, the VOLTE terminal sends it to the translation server. After receiving the character string of the first language, the translation server translates it according to the preset translation information into a character string of the second language, and returns the character string of the second language to the VOLTE terminal.

[0034] S124. Receive the character string of the second language returned by the translation server.

[0035] S125. Send the character string of the second language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the second language into the voice information of the final second language.

[0036] After receiving the character string of the second language, the VOLTE terminal sends it to the speech synthesis server. After receiving the character string of the second language, the speech synthesis server synthesizes it according to the preset synthesis information into the voice information of the final second language, and returns that voice information to the VOLTE terminal in the form of a voice code stream.
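Steps S121 to S125 amount to three request/response exchanges driven by the terminal. The sketch below shows that control flow only. The three callables are hypothetical stand-ins for the voice recognition, translation, and speech synthesis servers, and their signatures are assumptions of this sketch, not interfaces given in the source.

```python
from typing import Callable

Recognizer = Callable[[bytes, str], str]      # (audio, language) -> text
Translator = Callable[[str, str, str], str]   # (text, source, target) -> text
Synthesizer = Callable[[str, str], bytes]     # (text, language) -> audio


def translate_speech(audio: bytes, source: str, target: str,
                     recognize: Recognizer, translate: Translator,
                     synthesize: Synthesizer) -> bytes:
    """Run recognition, translation, and synthesis in order on the terminal side."""
    source_text = recognize(audio, source)                 # S121, S122
    target_text = translate(source_text, source, target)   # S123, S124
    return synthesize(target_text, target)                 # S125, result returned
```

Each intermediate result passes back through the terminal before the next request is made, which mirrors the description that every server returns its output to the VOLTE terminal.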

[0037] In other embodiments, the recognition, translation, and synthesis processing of the voice information of the original first language may also be completed by one server. For example, the VOLTE terminal sends the voice information of the original first language to the server, and the server performs recognition, translation, and synthesis processing on the voice information and returns the result to the VOLTE terminal. In still other embodiments, the recognition, translation, and synthesis processing of the voice information of the original first language may also be completed by two servers. For example, the VOLTE terminal sends the voice information of the original first language to the first server, and the first server performs recognition and translation processing on the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized and translated information to the second server, and the second server performs synthesis processing on it and returns the result to the VOLTE terminal. As another example, the VOLTE terminal sends the voice information of the original first language to the first server, and the first server performs recognition processing on the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized information to the second server, and the second server performs translation and synthesis processing on it and returns the result to the VOLTE terminal.
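The one-server, two-server, and three-server variants differ only in how the three stages are grouped. A rough way to see this, assuming each stage is a simple function (the helper names `make_server` and `run_from_terminal` are hypothetical and introduced only for this sketch):

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[object], object]


def make_server(stages: Sequence[Stage]) -> Stage:
    """Bundle one or more consecutive stages into a single server call."""
    def server(payload: object) -> object:
        for stage in stages:
            payload = stage(payload)
        return payload
    return server


def run_from_terminal(payload: object, servers: List[Stage]) -> Tuple[object, int]:
    """Send the payload to each server in turn; each result returns to the terminal."""
    round_trips = 0
    for server in servers:
        payload = server(payload)
        round_trips += 1
    return payload, round_trips
```

Under this view, grouping the stages as [recognize, translate, synthesize], as [recognize, translate] plus [synthesize], or as [recognize] plus [translate, synthesize] yields the same final output; only the number of exchanges between terminal and servers changes.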

[0038] S13. Receive the voice information of the final second language returned by the server.

[0039] S14. Send the voice information of the final second language to the peer end.

[0040] After receiving the voice information of the final second language returned by the server, the VOLTE terminal sends it to the peer end through the voice channel. After receiving the voice information of the final second language, the peer end processes it through the audio path and finally outputs it through a sound-emitting device (earpiece, speaker, etc.), so that the peer user, who uses the second language, can understand what the local user said.

[0041] In the voice call method of the embodiment of the present invention, the collected voice information of the local user is sent to the server for translation processing, where it is translated into voice information that the peer user can recognize, and the translated voice information is then sent to the peer end, so that the peer user can understand the speech of the local user. A translation function is thereby added to the communication terminal, so that users of different languages can carry out remote voice communication. This solves the technical problem that users of different languages cannot carry out remote voice communication through communication terminals, reduces communication costs, and improves the user experience.

[0042] 进一步地, 在本发明的语音通话方法的第二实施例中, 步骤 S 14之后还包括以 下步骤:  [0042] Further, in the second embodiment of the voice call method of the present invention, after step S14, the following steps are further included:

[0043] S15、 接收对端发送的原始第二语言的语音信息。  [0043] S15. Receive voice information of the original second language sent by the opposite end.

[0044] 当 VOLTE终端作为接收端吋, 通过语音通道接收作为发送端的对端发送的原 始第二语言的语音信息。 [0045] S16、 将原始第二语言的语音信息发送给服务器进行翻译处理, 以使服务器将 原始第二语言的语音信息翻译处理为最终第一语言的语音信息。 [0044] When the VOLTE terminal is used as the receiving end, the voice information of the original second language sent by the opposite end of the transmitting end is received through the voice channel. [0045] S16: Send the voice information of the original second language to the server for translation processing, so that the server translates the voice information of the original second language into the voice information of the final first language.

[0046] The VOLTE terminal may send the original second-language voice information to the server directly as a voice data stream. Preferably, the VOLTE terminal sends the original second-language voice information to the server in packets. For example, the VOLTE terminal first records the original second-language voice information as a series of voice files and caches them, and then sends each cached voice file to the server in turn as a data packet.
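The packetized sending described in this paragraph can be sketched as follows. This is only an illustrative sketch: the fixed chunk size, the sequence number, and the names `split_into_packets`, `send_in_order`, and `send` are assumptions made for the example and are not specified by the method.

```python
from typing import Callable, Iterator, List


def split_into_packets(audio: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive slices of the recorded audio, preserving order."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


def send_in_order(audio: bytes, chunk_bytes: int,
                  send: Callable[[int, bytes], None]) -> int:
    """Cache all chunks first, then pass each one to `send` in sequence.

    `send` is a hypothetical transport callback (sequence number, payload).
    Returns the number of packets sent.
    """
    cached: List[bytes] = list(split_into_packets(audio, chunk_bytes))
    for seq, chunk in enumerate(cached):
        send(seq, chunk)
    return len(cached)
```

Caching before sending mirrors the record, cache, then send order in the text; a streaming variant would call `send` as each chunk is produced.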

[0047] In this embodiment of the present invention, the server comprises a speech recognition server, a translation server, and a speech synthesis server. In step S16, the VOLTE terminal sends the original second-language voice information to the server for translation processing as follows:

[0048] S161. Send the original second-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a second-language character string.

[0049] The VOLTE terminal first records the original second-language voice information as a series of voice files and caches them, and then sends each cached voice file to the speech recognition server in turn as a data packet. After receiving a voice file, the speech recognition server recognizes it according to the preset recognition information, producing a second-language character string, and returns the second-language character string to the VOLTE terminal.

[0050] S162. Receive the second-language character string returned by the speech recognition server.

[0051] S163. Send the second-language character string to the translation server, so that the translation server translates the second-language character string into a first-language character string.

[0052] After receiving the second-language character string, the VOLTE terminal sends it to the translation server. After receiving the second-language character string, the translation server translates it into a first-language character string according to the preset translation information and returns the first-language character string to the VOLTE terminal.

[0053] S164. Receive the first-language character string returned by the translation server.

[0054] S165. Send the first-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the first-language character string into final first-language voice information.

[0055] After receiving the first-language character string, the VOLTE terminal sends it to the speech synthesis server. After receiving the first-language character string, the speech synthesis server synthesizes it into final first-language voice information according to the preset synthesis information and returns the final first-language voice information to the VOLTE terminal as a voice code stream.
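Steps S161 to S165 form a three-stage relay in which the terminal forwards each intermediate result to the next server. A minimal sketch is given below, assuming each server is reachable through a plain callable; the function names, signatures, and the stand-in servers are hypothetical and serve only to make the example runnable.

```python
from typing import Callable

# Hypothetical call shapes for the three servers.
Recognizer = Callable[[bytes, str], str]        # (voice, language) -> text
Translator = Callable[[str, str, str], str]     # (text, source, target) -> text
Synthesizer = Callable[[str, str], bytes]       # (text, language) -> voice


def translate_incoming_voice(voice: bytes,
                             recognize: Recognizer,
                             translate: Translator,
                             synthesize: Synthesizer,
                             peer_lang: str,
                             local_lang: str) -> bytes:
    """Relay the peer's voice through recognition, translation, synthesis."""
    peer_text = recognize(voice, peer_lang)                     # S161, S162
    local_text = translate(peer_text, peer_lang, local_lang)    # S163, S164
    return synthesize(local_text, local_lang)                   # S165


# Stand-in servers used only to exercise the flow; not real services.
def fake_recognize(voice: bytes, lang: str) -> str:
    return voice.decode("utf-8")


def fake_translate(text: str, source: str, target: str) -> str:
    return {"hola": "hello"}.get(text, text)


def fake_synthesize(text: str, lang: str) -> bytes:
    return f"[{lang}]{text}".encode("utf-8")
```

Passing the servers in as callables keeps the terminal-side flow the same whether the stages run on one, two, or three servers.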

[0056] In other embodiments, the recognition, translation, and synthesis of the original second-language voice information may also be performed by a single server. For example, the VOLTE terminal sends the original second-language voice information to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, the recognition, translation, and synthesis of the original second-language voice information may be performed by two servers. For example, the VOLTE terminal sends the original second-language voice information to a first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized and translated information to a second server; and the second server synthesizes it and returns the result to the VOLTE terminal. As another example, the VOLTE terminal sends the original second-language voice information to a first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized information to a second server; and the second server translates and synthesizes it and returns the result to the VOLTE terminal.
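The server arrangements described in the embodiments can be summarized as different groupings of the same three stages. The sketch below is an illustration only; the deployment names and the `round_trips` helper are assumptions, and the count simply reflects that the terminal makes one request per server.

```python
from typing import Dict, List, Tuple

STAGES: Tuple[str, ...] = ("recognize", "translate", "synthesize")

# Each deployment lists, per server, the stages that server performs.
DEPLOYMENTS: Dict[str, List[Tuple[str, ...]]] = {
    "one_server": [("recognize", "translate", "synthesize")],
    "two_servers_split_after_translation": [("recognize", "translate"), ("synthesize",)],
    "two_servers_split_after_recognition": [("recognize",), ("translate", "synthesize")],
    "three_servers": [("recognize",), ("translate",), ("synthesize",)],
}


def round_trips(groups: List[Tuple[str, ...]]) -> int:
    """Check that the groups cover the three stages in order and return
    the number of terminal-to-server exchanges (one per server)."""
    flattened = tuple(stage for group in groups for stage in group)
    if flattened != STAGES:
        raise ValueError(f"stages out of order or missing: {flattened}")
    return len(groups)
```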

[0057] S17. Receive the final first-language voice information returned by the server.

[0058] S18. Output the final first-language voice information.

[0059] After receiving the final first-language voice information returned by the server, the VOLTE terminal processes it through the audio path and then outputs it through a sound-producing device (earpiece, loudspeaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.

[0060] In this embodiment, the received voice information of the peer user is additionally sent to the server for translation processing, where it is translated into voice information that the local user can understand, and the translated voice information is then output so that the local user can understand the peer user's speech. Thus, even if the peer end is an ordinary terminal, users who speak different languages can communicate by voice remotely, which greatly widens the range of application and further reduces communication costs.
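Because a terminal in this embodiment translates in both directions, the source and target languages depend only on the direction of the voice information. A minimal sketch, assuming hypothetical direction labels "outgoing" and "incoming":

```python
from typing import Tuple


def language_pair(direction: str, local_lang: str, peer_lang: str) -> Tuple[str, str]:
    """Return (source, target) languages for one direction of the call.

    "outgoing": voice collected locally, translated for the peer end.
    "incoming": voice received from the peer end, translated for local output.
    """
    if direction == "outgoing":
        return local_lang, peer_lang
    if direction == "incoming":
        return peer_lang, local_lang
    raise ValueError(f"unknown direction: {direction!r}")
```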

[0061] A third embodiment of the voice call method of the present invention is provided. The method comprises the following steps:

[0062] S21. Receive the original second-language voice information sent by the peer end.

[0063] S22. Send the original second-language voice information to the server for translation processing, so that the server translates the second-language voice information into final first-language voice information.

[0064] S23. Receive the final first-language voice information returned by the server.

[0065] S24. Output the final first-language voice information.

[0066] In this embodiment, steps S21 to S24 are the same as steps S15 to S18 in the second embodiment, respectively, and are not described again here.

[0067] In the voice call method of this embodiment of the present invention, the received voice information of the peer user is sent to the server for translation processing, where it is translated into voice information that the local user can understand, and the translated voice information is then output so that the local user can understand the peer user's speech. A translation function is thereby added to the communication terminal, enabling users who speak different languages to communicate by voice remotely. This solves the technical problem that users who speak different languages cannot communicate by voice remotely through communication terminals, reduces communication costs, and improves the user experience.

[0068] Further, in a fourth embodiment of the voice call method of the present invention, the following steps are performed after step S24:

[0069] S25. Collect original first-language voice information.

[0070] S26. Send the original first-language voice information to the server for translation processing, so that the server translates the first-language voice information into final second-language voice information.

[0071] S27. Receive the final second-language voice information returned by the server.

[0072] S28. Send the final second-language voice information to the peer end.

[0073] In this embodiment, steps S25 to S28 are the same as steps S11 to S14 in the first embodiment, respectively, and are not described again here.

[0074] In this embodiment, the collected voice information of the local user is additionally sent to the server for translation processing, where it is translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer end so that the peer user can understand the local user's speech. Thus, even if the peer end is an ordinary terminal, users who speak different languages can communicate by voice remotely, which greatly widens the range of application and further reduces communication costs.

[0075] In the embodiments of the present invention, the first embodiment and the third embodiment may be applied to the application scenario shown in FIG. 1, in which VOLTE terminal A and VOLTE terminal B are connected through an IP Multimedia Subsystem (IMS) network, VOLTE terminal A and VOLTE terminal B are each connected to the speech recognition server, the translation server, and the speech synthesis server, and VOLTE terminal A and VOLTE terminal B both use the voice call method of the first embodiment or the second embodiment to conduct the voice call, so that users who speak different languages can communicate by voice remotely.

[0076] The second embodiment and the fourth embodiment may be applied to the application scenarios shown in FIGS. 2 to 4. In FIG. 2, VOLTE terminal A and voice terminal B are connected through the IMS network, VOLTE terminal A is connected to the speech recognition server, the translation server, and the speech synthesis server, and VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users who speak different languages can communicate by voice remotely. In FIG. 3, VOLTE terminal A connects through the IMS network to the gateway between the IMS network and the 2G/3G network, voice terminal B connects through the 2G/3G network to that gateway, VOLTE terminal A is connected to the speech recognition server, the translation server, and the speech synthesis server, and VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users who speak different languages can communicate by voice remotely. In FIG. 4, VOLTE terminal A connects through the IMS network to the gateway between the IMS network and the Public Switched Telephone Network (PSTN), voice terminal B connects through the PSTN to that gateway, VOLTE terminal A is connected to the speech recognition server, the translation server, and the speech synthesis server, and VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users who speak different languages can communicate by voice remotely.

[0077] The processing delay of the speech recognition server is generally less than 3 seconds, the processing delay of the translation server is generally less than 200 milliseconds, the processing delay of the speech synthesis server is generally less than 200 milliseconds, and the transmission delay of the IMS network is generally on the order of seconds. Therefore, by exploiting the high data rate and low latency of LTE communication, multilingual real-time translation during a voice call is implemented on the VOLTE terminal. Speech translation is fast and introduces little delay, so it does not affect the user's call, and users who speak different languages can communicate by voice remotely without barriers.
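The per-stage figures in this paragraph can be added up to give a rough upper bound on server-side processing delay. The sketch below uses only the three stated bounds; the constant and function names are illustrative, and network transmission delay is left out because the text gives only its order of magnitude.

```python
# Upper bounds stated in the description, in milliseconds.
RECOGNITION_MAX_MS = 3000   # speech recognition: generally < 3 s
TRANSLATION_MAX_MS = 200    # translation: generally < 200 ms
SYNTHESIS_MAX_MS = 200      # speech synthesis: generally < 200 ms


def processing_upper_bound_ms() -> int:
    """Sum of the three per-stage bounds; excludes network transmission."""
    return RECOGNITION_MAX_MS + TRANSLATION_MAX_MS + SYNTHESIS_MAX_MS
```

Under these figures the processing bound is 3400 ms, and speech recognition accounts for most of it.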

[0078] Referring to FIG. 5, a first embodiment of the voice call apparatus of the present invention is provided. The apparatus comprises an information collection module 10, a first translation processing module 20, a first information receiving module 30, and an information sending module 40, wherein:

[0079] The information collection module 10 is configured to collect original first-language voice information. The first translation processing module 20 is configured to send the original first-language voice information to the server for translation processing, so that the server translates the original first-language voice information into final second-language voice information. The first information receiving module 30 is configured to receive the final second-language voice information returned by the server. The information sending module 40 is configured to send the final second-language voice information to the peer end. In this embodiment of the present invention, the language used by the user of the VOLTE terminal is the first language, and the language used by the peer user is the second language. When the VOLTE terminal acts as the sending end, the information collection module 10 collects the user's original first-language voice information through a microphone. The first translation processing module 20 may send the original first-language voice information to the server directly as a voice data stream. Preferably, the first translation processing module 20 sends the original first-language voice information to the server in packets. For example, the first translation processing module 20 first records the original first-language voice information as a series of voice files and caches them, and then sends each cached voice file to the server in turn as a data packet.

[0080] Translation processing mainly comprises three stages: recognition, translation, and synthesis. These three stages may be performed by a single server, or by two or three servers.

[0081] In this embodiment of the present invention, the server comprises a speech recognition server, a translation server, and a speech synthesis server. The VOLTE terminal establishes an IP-based connection with the speech recognition server and sets recognition information through a first setting module, that is, the language types to be recognized, which include the language type of the local end (the first language) and may further include the language type of the peer end (the second language). It establishes an IP-based connection with the translation server and sets translation information through a second setting module, that is, the languages to translate between, which include the local-to-peer mapping and may further include the peer-to-local mapping. It establishes an IP-based connection with the speech synthesis server and sets synthesis information through a third setting module, that is, the type of speech synthesis, such as male or female voice and speaking rate.
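The three kinds of preset information described above can be pictured as simple configuration records. The following is a sketch under stated assumptions: the class names, field names, language codes, and default voice and rate values are hypothetical and are not defined by the apparatus.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RecognitionInfo:
    languages: Tuple[str, ...]              # language types to recognize


@dataclass(frozen=True)
class TranslationInfo:
    mappings: Tuple[Tuple[str, str], ...]   # (source, target) language pairs


@dataclass(frozen=True)
class SynthesisInfo:
    voice: str = "female"   # hypothetical default: male or female voice
    rate: float = 1.0       # hypothetical default: relative speaking rate


def sending_side_settings(local: str, peer: str, both_directions: bool = False):
    """Build the settings for a terminal that translates outgoing speech.

    The local language and the local-to-peer mapping are always included;
    the peer language and the reverse mapping are added only on request.
    """
    languages = (local, peer) if both_directions else (local,)
    mappings = ((local, peer), (peer, local)) if both_directions else ((local, peer),)
    return RecognitionInfo(languages), TranslationInfo(mappings), SynthesisInfo()
```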

[0082] As shown in FIG. 6, the first translation processing module 20 comprises a first sending unit 21, a first receiving unit 22, a second sending unit 23, a second receiving unit 24, and a third sending unit 25, wherein:

[0083] The first sending unit 21 is configured to send the original first-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a first-language character string. The first sending unit 21 first records the original first-language voice information as a series of voice files and caches them, and then sends each cached voice file to the speech recognition server in turn as a data packet. After receiving a voice file, the speech recognition server recognizes it according to the preset recognition information, producing a first-language character string, and returns the first-language character string to the VOLTE terminal. The first receiving unit 22 is configured to receive the first-language character string returned by the speech recognition server. The second sending unit 23 is configured to send the first-language character string to the translation server, so that the translation server translates the first-language character string into a second-language character string. Once the first-language character string has been received, the second sending unit 23 sends it to the translation server. After receiving the first-language character string, the translation server translates it into a second-language character string according to the preset translation information and returns the second-language character string to the VOLTE terminal. The second receiving unit 24 is configured to receive the second-language character string returned by the translation server. The third sending unit 25 is configured to send the second-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the second-language character string into final second-language voice information. Once the second-language character string has been received, the third sending unit 25 sends it to the speech synthesis server. After receiving the second-language character string, the speech synthesis server synthesizes it into final second-language voice information according to the preset synthesis information and returns the final second-language voice information to the VOLTE terminal as a voice code stream.

[0084] In other embodiments, the recognition, translation, and synthesis of the original first-language voice information may also be performed by a single server. For example, the first translation processing module 20 sends the original first-language voice information to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, the recognition, translation, and synthesis of the original first-language voice information may be performed by two servers. For example, the first translation processing module 20 sends the original first-language voice information to a first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized and translated information to a second server; and the second server synthesizes it and returns the result to the VOLTE terminal. As another example, the first translation processing module 20 sends the original first-language voice information to a first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized information to a second server; and the second server translates and synthesizes it and returns the result to the VOLTE terminal.

[0085] In the voice call apparatus of this embodiment of the present invention, the collected voice information of the local user is sent to the server for translation processing, where it is translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer end so that the peer user can understand the local user's speech. A translation function is thereby added to the communication terminal, enabling users who speak different languages to communicate by voice remotely. This solves the technical problem that users who speak different languages cannot communicate by voice remotely through communication terminals, reduces communication costs, and improves the user experience.

[0086] Referring to FIG. 7, a second embodiment of the voice call apparatus of the present invention is provided. The apparatus comprises a second information receiving module 50, a second translation processing module 60, a third information receiving module 70, and an information output module 80, wherein: the second information receiving module 50 is configured to receive the original second-language voice information sent by the peer end. When the VOLTE terminal acts as the receiving end, the second information receiving module 50 receives, through the voice channel, the original second-language voice information sent by the peer end acting as the sending end. The second translation processing module 60 is configured to send the original second-language voice information to the server for translation processing, so that the server translates the original second-language voice information into final first-language voice information. The second translation processing module 60 may send the original second-language voice information to the server directly as a voice data stream. Preferably, the second translation processing module 60 sends the original second-language voice information to the server in packets. For example, the second translation processing module 60 first records the original second-language voice information as a series of voice files and caches them, and then sends each cached voice file to the server in turn as a data packet.

[0087] In this embodiment of the present invention, the server comprises a speech recognition server, a translation server, and a speech synthesis server. The VOLTE terminal establishes an IP-based connection with the speech recognition server and sets recognition information through the first setting module, that is, the language types to be recognized, which include the language type of the peer end (the second language) and may further include the language type of the local end (the first language). It establishes an IP-based connection with the translation server and sets translation information through the second setting module, that is, the languages to translate between, which include the peer-to-local mapping and may further include the local-to-peer mapping. It establishes an IP-based connection with the speech synthesis server and sets synthesis information through the third setting module, that is, the type of speech synthesis, such as male or female voice and speaking rate.

[0088] As shown in FIG. 8, the second translation processing module 60 comprises a fourth sending unit 61, a third receiving unit 62, a fifth sending unit 63, a fourth receiving unit 64, and a sixth sending unit 65, wherein: the fourth sending unit 61 is configured to send the original second-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a second-language character string. The fourth sending unit 61 first records the original second-language voice information as a series of voice files and caches them, and then sends each cached voice file to the speech recognition server in turn as a data packet. After receiving a voice file, the speech recognition server recognizes it according to the preset recognition information, producing a second-language character string, and returns the second-language character string to the VOLTE terminal. The third receiving unit 62 is configured to receive the second-language character string returned by the speech recognition server. The fifth sending unit 63 is configured to send the second-language character string to the translation server, so that the translation server translates the second-language character string into a first-language character string. Once the second-language character string has been received, the fifth sending unit 63 sends it to the translation server. After receiving the second-language character string, the translation server translates it into a first-language character string according to the preset translation information and returns the first-language character string to the VOLTE terminal. The fourth receiving unit 64 is configured to receive the first-language character string returned by the translation server. The sixth sending unit 65 is configured to send the first-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the first-language character string into final first-language voice information. Once the first-language character string has been received, the sixth sending unit 65 sends it to the speech synthesis server. After receiving the first-language character string, the speech synthesis server synthesizes it into final first-language voice information according to the preset synthesis information and returns the final first-language voice information to the VOLTE terminal as a voice code stream.

In other embodiments, recognition, translation, and synthesis of the original second-language voice information may also be performed by a single server. For example, the second translation processing module 60 sends the original second-language voice information to the server, and the server recognizes, translates, and synthesizes it and returns the result to the VOLTE terminal. In still other embodiments, recognition, translation, and synthesis may be performed by two servers. For example, the second translation processing module 60 sends the original second-language voice information to a first server; the first server recognizes and translates it and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the recognized and translated information to a second server, which synthesizes it and returns the result to the VOLTE terminal. As another example, the second translation processing module 60 sends the original second-language voice information to a first server; the first server recognizes it and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the recognized information to a second server, which translates and synthesizes it and returns the result to the VOLTE terminal.

The third information receiving module 70 is configured to receive the final first-language voice information returned by the server. The information output module 80 is configured to output the final first-language voice information. After the final first-language voice information returned by the server has been received, the information output module 80 processes it through the audio path and finally outputs it through a sound-producing device (earpiece, speaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.
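The one-server and two-server variants described above differ only in how the three stages are grouped. The sketch below illustrates that grouping; the helper names `make_server` and `run_through_servers` and the toy `recognize`, `translate`, and `synthesize` stages are assumptions made for illustration, not anything the source specifies.

```python
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]


def make_server(*stages: Stage) -> Stage:
    """Bundle one or more processing stages into a single hypothetical server."""
    def server(payload: Any) -> Any:
        for stage in stages:
            payload = stage(payload)
        return payload
    return server


def run_through_servers(voice: bytes, servers: Sequence[Stage]) -> Any:
    """The terminal sends data to each server in turn and gets the result back."""
    payload: Any = voice
    for server in servers:
        payload = server(payload)  # one send/receive round trip per server
    return payload


# Toy stages standing in for real recognition, translation, and synthesis.
def recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def translate(text: str) -> str:
    return text.upper()


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")


one_server = [make_server(recognize, translate, synthesize)]
split_after_translation = [make_server(recognize, translate), make_server(synthesize)]
split_after_recognition = [make_server(recognize), make_server(translate, synthesize)]
```

All three layouts produce the same final voice information; what changes is the number of round trips between the terminal and the servers.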

[0090] The voice call devices of the foregoing first and second embodiments can be applied to the application scenario shown in FIG. 1.

[0091] The voice call device of this embodiment of the present invention sends the received voice information of the peer user to a server for translation processing, so that it is translated into voice information the local user can understand, and then outputs the translated voice information, enabling the local user to understand the peer user's speech. This adds a translation function to the communication terminal and enables remote voice communication between users of different languages. It solves the technical problem that users of different languages cannot communicate by voice remotely through a communication terminal, reduces communication costs, and improves the user experience.

[0092] Further, as shown in FIG. 9, the voice call devices of the foregoing first and second embodiments may be combined to form the voice call device of a third embodiment. The resulting device can both translate locally collected voice information before sending it to the peer and translate voice information sent by the peer before outputting it. Remote voice communication between users of different languages is therefore possible even when the peer is an ordinary voice terminal, which greatly broadens the range of application and further reduces communication costs.
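The combined device can be pictured as one object that handles both directions of the call. The sketch below is an illustration under assumptions: the class name `VoiceCallDevice`, its fields, and the `translate_voice` callable (standing in for the whole server-side translation round trip) are hypothetical, and the two lists merely record what would be sent to the peer or played through the local sound-producing device.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical server round trip: (audio, source language, target language) -> translated audio.
TranslateVoice = Callable[[bytes, str, str], bytes]


@dataclass
class VoiceCallDevice:
    local_lang: str
    peer_lang: str
    translate_voice: TranslateVoice
    sent_to_peer: List[bytes] = field(default_factory=list)
    played_locally: List[bytes] = field(default_factory=list)

    def on_local_voice(self, audio: bytes) -> None:
        """First-embodiment path: translate collected voice, then send it to the peer."""
        translated = self.translate_voice(audio, self.local_lang, self.peer_lang)
        self.sent_to_peer.append(translated)

    def on_peer_voice(self, audio: bytes) -> None:
        """Second-embodiment path: translate received voice, then output it locally."""
        translated = self.translate_voice(audio, self.peer_lang, self.local_lang)
        self.played_locally.append(translated)
```

Because both translations happen on the local side, the peer only ever sends and receives ordinary voice, which is consistent with the statement that the peer may be an ordinary voice terminal.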

[0093] The voice call device of this embodiment can be applied to the application scenarios shown in FIG. 2 to FIG. 4.

Claims

[Claim 1] A voice call method, comprising the following steps:
collecting original first-language voice information;
sending the original first-language voice information to a server for translation processing, so that the server translates the original first-language voice information into final second-language voice information;
receiving the final second-language voice information returned by the server; and
sending the final second-language voice information to a peer.
[Claim 2] The voice call method according to claim 1, wherein the server comprises a speech recognition server, a translation server, and a speech synthesis server, and the step of sending the first-language voice information to the server for translation processing comprises:
sending the original first-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a first-language character string;
receiving the first-language character string returned by the speech recognition server;
sending the first-language character string to the translation server, so that the translation server translates the first-language character string into a second-language character string;
receiving the second-language character string returned by the translation server; and
sending the second-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the second-language character string into the final second-language voice information.
[Claim 3] The voice call method according to claim 1, wherein the method further comprises:
receiving original second-language voice information sent by the peer;
sending the original second-language voice information to a server for translation processing, so that the server translates the original second-language voice information into final first-language voice information;
receiving the final first-language voice information returned by the server; and
outputting the final first-language voice information.
[Claim 4] The voice call method according to claim 3, wherein the server comprises a speech recognition server, a translation server, and a speech synthesis server, and the step of sending the original second-language voice information to the server for translation processing comprises:
sending the original second-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a second-language character string;
receiving the second-language character string returned by the speech recognition server;
sending the second-language character string to the translation server, so that the translation server translates the second-language character string into a first-language character string;
receiving the first-language character string returned by the translation server; and
sending the first-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the first-language character string into the final first-language voice information.
[Claim 5] The voice call method according to claim 1, wherein the method is applied to a VOLTE terminal.
[Claim 6] A voice call method, comprising the following steps:
receiving original second-language voice information sent by a peer;
sending the original second-language voice information to a server for translation processing, so that the server translates the original second-language voice information into final first-language voice information;
receiving the final first-language voice information returned by the server; and
outputting the final first-language voice information.
[Claim 7] The voice call method according to claim 6, wherein the server comprises a speech recognition server, a translation server, and a speech synthesis server, and the step of sending the original second-language voice information to the server for translation processing comprises:
sending the original second-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a second-language character string;
receiving the second-language character string returned by the speech recognition server;
sending the second-language character string to the translation server, so that the translation server translates the second-language character string into a first-language character string;
receiving the first-language character string returned by the translation server; and
sending the first-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the first-language character string into the final first-language voice information.
[Claim 8] The voice call method according to claim 6, wherein the method further comprises:
collecting original first-language voice information;
sending the original first-language voice information to a server for translation processing, so that the server translates the first-language voice information into final second-language voice information;
receiving the final second-language voice information returned by the server; and
sending the final second-language voice information to the peer.
[Claim 9] The voice call method according to claim 6, wherein the method is applied to a VOLTE terminal.
[Claim 10] A voice call device, comprising an information collection module, a first translation processing module, a first information receiving module, and an information sending module, wherein:
the information collection module is configured to collect original first-language voice information;
the first translation processing module is configured to send the original first-language voice information to a server for translation processing, so that the server translates the original first-language voice information into final second-language voice information;
the first information receiving module is configured to receive the final second-language voice information returned by the server; and
the information sending module is configured to send the final second-language voice information to a peer.
[Claim 11] The voice call device according to claim 10, wherein the server comprises a speech recognition server, a translation server, and a speech synthesis server, and the first translation processing module comprises:
a first sending unit, configured to send the original first-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a first-language character string;
a first receiving unit, configured to receive the first-language character string returned by the speech recognition server;
a second sending unit, configured to send the first-language character string to the translation server, so that the translation server translates the first-language character string into a second-language character string;
a second receiving unit, configured to receive the second-language character string returned by the translation server; and
a third sending unit, configured to send the second-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the second-language character string into the final second-language voice information.
[Claim 12] The voice call device according to claim 10, wherein the device further comprises:
a second information receiving module, configured to receive original second-language voice information sent by the peer;
a second translation processing module, configured to send the original second-language voice information to a server for translation processing, so that the server translates the second-language voice information into final first-language voice information;
a third information receiving module, configured to receive the final first-language voice information returned by the server; and
an information output module, configured to output the final first-language voice information.
[Claim 13] The voice call device according to claim 12, wherein the server comprises a speech recognition server, a translation server, and a speech synthesis server, and the second translation processing module comprises:
a fourth sending unit, configured to send the original second-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a second-language character string;
a third receiving unit, configured to receive the second-language character string returned by the speech recognition server;
a fifth sending unit, configured to send the second-language character string to the translation server, so that the translation server translates the second-language character string into a first-language character string;
a fourth receiving unit, configured to receive the first-language character string returned by the translation server; and
a sixth sending unit, configured to send the first-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the first-language character string into the final first-language voice information.
[Claim 14] The voice call device according to claim 10, wherein the device is applied to a VOLTE terminal.
[Claim 15] A voice call device, comprising:
a second information receiving module, configured to receive original second-language voice information sent by the peer;
a second translation processing module, configured to send the original second-language voice information to a server for translation processing, so that the server translates the original second-language voice information into final first-language voice information;
a third information receiving module, configured to receive the final first-language voice information returned by the server; and
an information output module, configured to output the final first-language voice information.
[Claim 16] The voice call device according to claim 15, wherein the server comprises a speech recognition server, a translation server, and a speech synthesis server, and the second translation processing module comprises:
a fourth sending unit, configured to send the original second-language voice information to the speech recognition server, so that the speech recognition server recognizes the voice information as a second-language character string;
a third receiving unit, configured to receive the second-language character string returned by the speech recognition server;
a fifth sending unit, configured to send the second-language character string to the translation server, so that the translation server translates the second-language character string into a first-language character string;
a fourth receiving unit, configured to receive the first-language character string returned by the translation server; and
a sixth sending unit, configured to send the first-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the first-language character string into the final first-language voice information.
[Claim 17] The voice call device according to claim 15, wherein the device further comprises:
an information collection module, configured to collect original first-language voice information;
a first translation processing module, configured to send the original first-language voice information to a server for translation processing, so that the server translates the first-language voice information into final second-language voice information;
a first information receiving module, configured to receive the final second-language voice information returned by the server; and
an information sending module, configured to send the final second-language voice information to the peer.
[Claim 18] The voice call device according to claim 15, wherein the device is applied to a VOLTE terminal.
PCT/CN2017/093741 2017-06-26 2017-07-20 Voice call method and device Ceased WO2019000515A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710495264.2A CN107343113A (en) 2017-06-26 2017-06-26 Audio communication method and device
CN201710495264.2 2017-06-26

Publications (1)

Publication Number Publication Date
WO2019000515A1 true WO2019000515A1 (en) 2019-01-03

Family

ID=60220070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/093741 Ceased WO2019000515A1 (en) 2017-06-26 2017-07-20 Voice call method and device

Country Status (2)

Country Link
CN (1) CN107343113A (en)
WO (1) WO2019000515A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228575A (en) * 2017-12-20 2018-06-29 科大讯飞股份有限公司 Voiced translation exchange method and system
WO2019134107A1 (en) * 2018-01-05 2019-07-11 深圳市沃特沃德股份有限公司 Method and device for speech-to-speech translation, and translation device
CN109446533B (en) * 2018-09-17 2020-12-22 深圳市沃特沃德股份有限公司 Bluetooth translation machine, interactive mode of Bluetooth translation and device thereof
CN109286725B (en) 2018-10-15 2021-10-19 华为技术有限公司 Translation method and terminal
CN114999535A (en) * 2018-10-15 2022-09-02 华为技术有限公司 Voice data processing method and device in online translation process
CN109582976A (en) * 2018-10-15 2019-04-05 华为技术有限公司 A kind of interpretation method and electronic equipment based on voice communication
CN109327613B (en) * 2018-10-15 2020-09-29 华为技术有限公司 Negotiation method based on voice call translation capability and electronic equipment
CN110111770A (en) * 2019-05-10 2019-08-09 濮阳市顶峰网络科技有限公司 A kind of multilingual social interpretation method of network, system, equipment and medium
CN110267309B (en) * 2019-06-26 2022-09-23 广州三星通信技术研究有限公司 Method and equipment for translating call voice in real time
CN110442881A (en) * 2019-08-06 2019-11-12 上海祥久智能科技有限公司 A kind of information processing method and device of voice conversion
CN113660375B (en) * 2021-08-11 2023-02-03 维沃移动通信有限公司 Call method and device and electronic equipment
CN114625336A (en) * 2022-03-10 2022-06-14 北京小米移动软件有限公司 Call method, device, terminal device and storage medium
CN115767484B (en) * 2022-11-07 2024-07-09 中国联合网络通信集团有限公司 Call processing method, device, server, system and medium in customer service scene

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360347A (en) * 2011-09-30 2012-02-22 宇龙计算机通信科技(深圳)有限公司 Voice translation method and system and voice translation server
CN104394265A (en) * 2014-10-31 2015-03-04 小米科技有限责任公司 Automatic session method and device based on mobile intelligent terminal
CN104754536A (en) * 2013-12-27 2015-07-01 中国移动通信集团公司 Method and system for realizing communication between different languages
CN105430208A (en) * 2015-10-23 2016-03-23 小米科技有限责任公司 Voice conversation method and apparatus, and terminal equipment
US20160170970A1 (en) * 2014-12-12 2016-06-16 Microsoft Technology Licensing, Llc Translation Control
CN106453043A (en) * 2016-09-29 2017-02-22 安徽声讯信息技术有限公司 Multi-language conversion-based instant communication system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101867632A (en) * 2009-06-12 2010-10-20 刘越 Mobile phone speech instant translation system and method

Also Published As

Publication number Publication date
CN107343113A (en) 2017-11-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17916375

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17916375

Country of ref document: EP

Kind code of ref document: A1