WO2019000515A1

WO2019000515A1 - Voice call method and device

Info

Publication number: WO2019000515A1
Application number: PCT/CN2017/093741
Authority: WO
Inventors: 蒋壮; 王文琪; 王广新; 陈杰; 温平
Original assignee: Shenzhen Water World Co Ltd
Current assignee: Shenzhen Water World Co Ltd
Priority date: 2017-06-26
Filing date: 2017-07-20
Publication date: 2019-01-03
Anticipated expiration: 2019-12-26
Also published as: CN107343113A

Abstract

The present invention discloses a voice call method and apparatus. The method comprises the following steps: collecting voice information in an original first language; sending the voice information in the original first language to a server for translation processing, so that the server translates it into voice information in a final second language; receiving the voice information in the final second language returned by the server; and sending the voice information in the final second language to the peer end.

Description

Title of the invention: Voice call method and apparatus

Technical Field

[0001] The present invention relates to the field of communications technologies, and in particular to a voice call method and apparatus.

Background Art

[0002] As communication terminals become ever more widely used, people use them for many purposes, such as listening to music, watching videos, and making voice calls. The voice call is a basic and commonly used function of a communication terminal: even people thousands of miles apart can talk to each other remotely through their terminals, which effectively shortens the distance between people.

[0003] At the same time, with economic globalization and internationalization, contact between people in different countries is becoming increasingly close. People in different countries usually speak different languages. When at least one of two users cannot understand the other's language, and the other user cannot speak the first user's language, the two users cannot communicate by voice remotely through communication terminals. They must talk face to face with the help of a human interpreter or a translation device, which reduces the available communication channels and raises the cost of communication.

Technical Problem

[0004] How to enable remote voice communication through communication terminals between users who speak different languages is therefore a technical problem that urgently needs to be solved.

Solution to the Problem

Technical Solution

[0005] The main object of the present invention is to provide a voice call method and apparatus, intended to solve the technical problem that users who speak different languages cannot communicate by voice remotely through communication terminals.

[0006] To achieve the above object, an embodiment of the present invention provides a voice call method comprising the following steps: collecting voice information in an original first language; sending the voice information in the original first language to a server for translation processing, so that the server translates the voice information in the first language into voice information in a final second language; receiving the voice information in the final second language returned by the server; and sending the voice information in the final second language to the peer end.

[0007] Based on the same inventive concept, an embodiment of the present invention further provides a voice call method comprising the following steps: receiving voice information in an original second language sent by the peer end; sending the voice information in the original second language to a server for translation processing, so that the server translates the voice information in the second language into voice information in a final first language; receiving the voice information in the final first language returned by the server; and outputting the voice information in the final first language.

[0008] An embodiment of the present invention also provides a voice call apparatus comprising an information collection module, a first translation processing module, a first information receiving module, and an information sending module. The information collection module is configured to collect voice information in an original first language. The first translation processing module is configured to send the voice information in the original first language to a server for translation processing, so that the server translates it into voice information in a final second language. The first information receiving module is configured to receive the voice information in the final second language returned by the server. The information sending module is configured to send the voice information in the final second language to the peer end.

Advantageous Effects of the Invention

Advantageous Effects

[0009] In the voice call method provided by the embodiments of the present invention, the collected voice information of the local user is sent to a server for translation processing and translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer end, so that the peer user can understand what the local user says. This adds a translation function to the communication terminal, enables users who speak different languages to communicate by voice remotely, solves the technical problem that such users cannot communicate by voice remotely through communication terminals, lowers the cost of communication, and improves the user experience.

Brief Description of the Drawings

Description of the Drawings

[0010] FIG. 1 is a system block diagram of one application scenario of the voice call method according to an embodiment of the present invention;

[0011] FIG. 2 is a system block diagram of another application scenario of the voice call method according to an embodiment of the present invention;

[0012] FIG. 3 is a system block diagram of another application scenario of the voice call method according to an embodiment of the present invention;

[0013] FIG. 4 is a system block diagram of another application scenario of the voice call method according to an embodiment of the present invention;

[0014] FIG. 5 is a module diagram of a first embodiment of the voice call apparatus of the present invention;

[0015] FIG. 6 is a module diagram of the first translation processing module in FIG. 5;

[0016] FIG. 7 is a module diagram of a second embodiment of the voice call apparatus of the present invention;

[0017] FIG. 8 is a module diagram of the second translation processing module in FIG. 7;

[0018] FIG. 9 is a module diagram of a third embodiment of the voice call apparatus of the present invention.

Best Mode for Carrying Out the Invention

Best Mode of the Invention

[0019] It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it.

[0020] The voice call method and apparatus of the embodiments of the present invention are mainly applied to VoLTE terminals, that is, communication terminals based on VoLTE (Voice over LTE) technology. VoLTE is an IP data transmission technology that does not require a 2G/3G network: all services are carried on the 4G network, so that data and voice services are unified on the same network. The method and apparatus can of course also be applied to communication terminals based on other IP data transmission technologies, provided that they can unify data and voice services on the same network; the present invention places no limitation on this.

[0021] In a first embodiment of the voice call method of the present invention, the method comprises the following steps:

[0022] S11. Collect voice information in the original first language.

[0023] In the embodiments of the present invention, the language used by the user of the VoLTE terminal is defined as the first language, and the language used by the peer user is defined as the second language. When the VoLTE terminal acts as the sending end, it collects the user's voice information in the first language through a microphone.

[0024] S12. Send the voice information in the original first language to a server for translation processing, so that the server translates it into voice information in the final second language.

[0025] The VoLTE terminal may send the voice information in the original first language to the server directly as a voice data stream. Preferably, the VoLTE terminal sends it to the server in separate data packets. For example, the VoLTE terminal first records the voice information in the original first language as a series of voice files and caches them, and then sends each cached voice file to the server in turn as a data packet.
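The packetized option in [0025] can be sketched as follows. This is only an illustration under stated assumptions: the audio is treated as raw bytes, the segment size is fixed and tiny, and the names `segment_audio`, `send_in_order`, `SEGMENT_BYTES`, and the per-packet sequence number are hypothetical and do not come from the source.

```python
from collections import deque
from typing import Callable, Deque, Iterator

SEGMENT_BYTES = 4  # hypothetical size, kept tiny so the example is easy to follow


def segment_audio(pcm: bytes, segment_bytes: int = SEGMENT_BYTES) -> Iterator[bytes]:
    """Split recorded audio into consecutive segments (the "voice files")."""
    for start in range(0, len(pcm), segment_bytes):
        yield pcm[start:start + segment_bytes]


def send_in_order(pcm: bytes, send: Callable[[int, bytes], None]) -> int:
    """Cache all segments, then send them one by one in recording order."""
    cache: Deque[bytes] = deque(segment_audio(pcm))
    sequence = 0
    while cache:
        send(sequence, cache.popleft())  # hypothetical sequence number per packet
        sequence += 1
    return sequence


# Usage: collect what a stand-in "server" would receive.
received = []
packet_count = send_in_order(b"0123456789", lambda seq, data: received.append((seq, data)))
```

Sending whole cached segments rather than a continuous stream gives the server self-contained units to recognize, at the cost of waiting for each segment to finish recording.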

[0026] Translation processing mainly comprises three stages: recognition, translation, and synthesis. The three stages may be performed by a single server, or by two or three servers.

[0027] In this embodiment, the servers comprise a speech recognition server, a translation server, and a speech synthesis server. The VoLTE terminal establishes an IP-based connection with the speech recognition server and sets the recognition information, that is, the language type to be recognized; this includes the language type of the local end (the first language) and may further include the language type of the peer end (the second language). The terminal establishes an IP-based connection with the translation server and sets the translation information, that is, the languages to be translated; this includes the mapping from the local end to the peer end and may further include the mapping from the peer end to the local end. The terminal establishes an IP-based connection with the speech synthesis server and sets the synthesis information, that is, the type of speech synthesis, such as a male or female voice and the speaking rate.
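The three groups of settings in [0027] could be modeled as plain data structures. The sketch below is an assumption about how such settings might be organized; the class names, field names, default values, and language codes are all hypothetical, and the source does not specify any data format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RecognitionInfo:
    local_language: str                  # first language (always set)
    peer_language: Optional[str] = None  # second language (optional)


@dataclass(frozen=True)
class TranslationInfo:
    local_to_peer: Tuple[str, str]                   # (source, target), always set
    peer_to_local: Optional[Tuple[str, str]] = None  # optional reverse mapping


@dataclass(frozen=True)
class SynthesisInfo:
    voice: str = "female"  # e.g. a male or female voice
    rate: float = 1.0      # speaking rate; 1.0 is an assumed "normal" value


@dataclass(frozen=True)
class CallSettings:
    recognition: RecognitionInfo
    translation: TranslationInfo
    synthesis: SynthesisInfo


def build_settings(first: str, second: str, both_directions: bool = False) -> CallSettings:
    """Build the settings a terminal would set on the three servers."""
    return CallSettings(
        recognition=RecognitionInfo(first, second if both_directions else None),
        translation=TranslationInfo(
            (first, second), (second, first) if both_directions else None
        ),
        synthesis=SynthesisInfo(),
    )


one_way = build_settings("zh", "en")
two_way = build_settings("zh", "en", both_directions=True)
```

The optional fields mirror the "may further include" wording: a terminal that only translates its own user's speech needs the local language and the local-to-peer mapping, while a terminal that also translates incoming speech needs the reverse entries as well.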

[0028] In step S12, the VoLTE terminal sends the voice information in the original first language to the servers for translation processing as follows:

[0029] S121. Send the voice information in the original first language to the speech recognition server, so that the speech recognition server recognizes the voice information as a character string in the first language.

[0030] The VoLTE terminal first records the voice information in the original first language as a series of voice files and caches them, and then sends each cached voice file in turn to the speech recognition server as a data packet. After receiving a voice file, the speech recognition server performs recognition on it according to the preset recognition information, recognizes it as a character string in the first language, and returns the character string to the VoLTE terminal.

[0031] S122. Receive the character string in the first language returned by the speech recognition server.

[0032] S123. Send the character string in the first language to the translation server, so that the translation server translates it into a character string in the second language.

[0033] After receiving the character string in the first language, the VoLTE terminal sends it to the translation server. After receiving it, the translation server translates it according to the preset translation information into a character string in the second language, and returns that character string to the VoLTE terminal.

[0034] S124. Receive the character string in the second language returned by the translation server.

[0035] S125. Send the character string in the second language to the speech synthesis server, so that the speech synthesis server synthesizes it into voice information in the final second language.

[0036] After receiving the character string in the second language, the VoLTE terminal sends it to the speech synthesis server. After receiving it, the speech synthesis server synthesizes it according to the preset synthesis information into voice information in the final second language, and returns that voice information to the VoLTE terminal as a voice code stream.
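Steps S121 to S125 form a chain in which the terminal relays each intermediate result to the next server. The sketch below illustrates only that control flow. The three functions are toy stand-ins rather than real recognition, translation, or synthesis: the "audio" is UTF-8 text, the `language` argument of `recognize` is accepted but unused, the phrase table holds a single entry, and every name is hypothetical.

```python
from typing import Dict, List, Tuple

# Toy phrase table; a real translation server would not work like this.
PHRASES: Dict[Tuple[str, str], Dict[str, str]] = {("zh", "en"): {"你好": "hello"}}


def recognize(audio: bytes, language: str) -> str:
    """Stand-in speech recognition server: audio -> text."""
    return audio.decode("utf-8")


def translate(text: str, source: str, target: str) -> str:
    """Stand-in translation server: source-language text -> target-language text."""
    return PHRASES[(source, target)].get(text, text)


def synthesize(text: str, language: str) -> bytes:
    """Stand-in speech synthesis server: text -> audio (tagged UTF-8 bytes)."""
    return f"<{language}>{text}".encode("utf-8")


def translate_speech(audio: bytes, source: str, target: str, trace: List[str]) -> bytes:
    """Run S121-S125; each intermediate string passes back through the terminal."""
    text = recognize(audio, source)               # S121, S122
    trace.append(text)
    translated = translate(text, source, target)  # S123, S124
    trace.append(translated)
    return synthesize(translated, target)         # S125


trace: List[str] = []
final_audio = translate_speech("你好".encode("utf-8"), "zh", "en", trace)
```

Here `trace` records the two character strings that the terminal receives in S122 and S124 before forwarding them.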

[0037] In other embodiments, recognition, translation, and synthesis of the voice information in the original first language may be performed by a single server. For example, the VoLTE terminal sends the voice information in the original first language to the server, and the server recognizes, translates, and synthesizes it and returns the result to the VoLTE terminal. In still other embodiments, the three stages may be performed by two servers. For example, the VoLTE terminal sends the voice information in the original first language to a first server, which performs recognition and translation and returns the result to the VoLTE terminal; the VoLTE terminal then sends the recognized and translated information to a second server, which performs synthesis and returns the result to the VoLTE terminal. As another example, the VoLTE terminal sends the voice information in the original first language to a first server, which performs recognition and returns the result to the VoLTE terminal; the VoLTE terminal then sends the recognized information to a second server, which performs translation and synthesis and returns the result to the VoLTE terminal.
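The one-, two-, and three-server arrangements differ only in how the three stages are grouped, which a short sketch can make concrete. Everything here is illustrative: the stages are toy functions, `make_server` and `run_via_terminal` are hypothetical helpers, and the round-trip count is simply the number of servers the terminal contacts.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Sequence, Tuple

Stage = Callable[[Any], Any]


def recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # toy recognition


def translate(text: str) -> str:
    return {"你好": "hello"}.get(text, text)  # toy translation


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # toy synthesis


def make_server(*stages: Stage) -> Stage:
    """Model a server as the stages it hosts, applied in order to one request."""
    def server(payload: Any) -> Any:
        return reduce(lambda value, stage: stage(value), stages, payload)
    return server


def run_via_terminal(servers: Sequence[Stage], payload: Any) -> Tuple[Any, int]:
    """The terminal contacts each server in turn and counts the round trips."""
    round_trips = 0
    for server in servers:
        payload = server(payload)
        round_trips += 1
    return payload, round_trips


LAYOUTS: Dict[str, List[Stage]] = {
    "three servers": [make_server(recognize), make_server(translate), make_server(synthesize)],
    "one server": [make_server(recognize, translate, synthesize)],
    "recognize+translate, then synthesize": [make_server(recognize, translate), make_server(synthesize)],
    "recognize, then translate+synthesize": [make_server(recognize), make_server(translate, synthesize)],
}

results = {
    name: run_via_terminal(servers, "你好".encode("utf-8"))
    for name, servers in LAYOUTS.items()
}
```

All four layouts yield the same final audio; what changes is how many times the terminal has to send a request and wait for a reply.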

[0038] S13. Receive the voice information in the final second language returned by the server.

[0039] S14. Send the voice information in the final second language to the peer end.

[0040] After receiving the voice information in the final second language returned by the server, the VoLTE terminal sends it to the peer end over the voice channel. After receiving it, the peer end processes it through its audio path and finally outputs it through a sound-producing device (earpiece, loudspeaker, etc.), so that the peer user, who uses the second language, can understand what the local user said.

[0041] In the voice call method of this embodiment, the collected voice information of the local user is sent to a server for translation processing and translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer end, so that the peer user can understand what the local user says. This adds a translation function to the communication terminal, enables users who speak different languages to communicate by voice remotely, solves the technical problem that such users cannot communicate by voice remotely through communication terminals, lowers the cost of communication, and improves the user experience.

[0042] Further, in a second embodiment of the voice call method of the present invention, the following steps are performed after step S14:

[0043] S15. Receive voice information in the original second language sent by the peer end.

[0044] When the VoLTE terminal acts as the receiving end, it receives over the voice channel the voice information in the original second language sent by the peer end, which acts as the sending end.

[0045] S16. Send the voice information in the original second language to a server for translation processing, so that the server translates it into voice information in the final first language.

[0046] The VoLTE terminal may send the voice information in the original second language to the server directly as a voice data stream. Preferably, the VoLTE terminal sends it to the server in separate data packets. For example, the VoLTE terminal first records the voice information in the original second language as a series of voice files and caches them, and then sends each cached voice file to the server in turn as a data packet.

[0047] In this embodiment, the servers comprise a speech recognition server, a translation server, and a speech synthesis server. In step S16, the VoLTE terminal sends the voice information in the original second language to the servers for translation processing as follows:

[0048] S161. Send the voice information in the original second language to the speech recognition server, so that the speech recognition server recognizes the voice information as a character string in the second language.

[0049] The VoLTE terminal first records the voice information in the original second language as a series of voice files and caches them, and then sends each cached voice file in turn to the speech recognition server as a data packet. After receiving a voice file, the speech recognition server performs recognition on it according to the preset recognition information, recognizes it as a character string in the second language, and returns the character string to the VoLTE terminal.

[0050] S162. Receive the character string in the second language returned by the speech recognition server.

[0051] S163. Send the character string in the second language to the translation server, so that the translation server translates it into a character string in the first language.

[0052] After receiving the character string in the second language, the VoLTE terminal sends it to the translation server. After receiving it, the translation server translates it according to the preset translation information into a character string in the first language, and returns that character string to the VoLTE terminal.

[0053] S164. Receive the character string in the first language returned by the translation server.

[0054] S165. Send the character string in the first language to the speech synthesis server, so that the speech synthesis server synthesizes it into voice information in the final first language.

[0055] After receiving the character string in the first language, the VoLTE terminal sends it to the speech synthesis server. After receiving it, the speech synthesis server synthesizes it according to the preset synthesis information into voice information in the final first language, and returns that voice information to the VoLTE terminal as a voice code stream.

[0056] In other embodiments, recognition, translation, and synthesis of the voice information in the original second language may be performed by a single server. For example, the VoLTE terminal sends the voice information in the original second language to the server, and the server recognizes, translates, and synthesizes it and returns the result to the VoLTE terminal. In still other embodiments, the three stages may be performed by two servers. For example, the VoLTE terminal sends the voice information in the original second language to a first server, which performs recognition and translation and returns the result to the VoLTE terminal; the VoLTE terminal then sends the recognized and translated information to a second server, which performs synthesis and returns the result to the VoLTE terminal. As another example, the VoLTE terminal sends the voice information in the original second language to a first server, which performs recognition and returns the result to the VoLTE terminal; the VoLTE terminal then sends the recognized information to a second server, which performs translation and synthesis and returns the result to the VoLTE terminal.

[0057] S17. Receive the voice information in the final first language returned by the server.

[0058] S18. Output the voice information in the final first language.

[0059] After receiving the voice information in the final first language returned by the server, the VoLTE terminal processes it through its audio path and finally outputs it through a sound-producing device (earpiece, loudspeaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.

[0060] In this embodiment, the received voice information of the peer user is additionally sent to the server for translation processing and translated into voice information that the local user can understand, and the translated voice information is then output, so that the local user can understand what the peer user says. As a result, even if the peer end is an ordinary terminal, users who speak different languages can communicate by voice remotely, which greatly broadens the range of application and further lowers the cost of communication.
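In the second embodiment one terminal handles both directions, and the receive path (S15 to S18) mirrors the send path (S11 to S14) with the language roles swapped. A minimal sketch of that symmetry follows. The `Terminal` class, its fields, and `fake_translate_speech` are hypothetical; the lists stand in for the voice channel and the loudspeaker.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# (audio, source language, target language) -> translated audio
TranslateSpeech = Callable[[bytes, str, str], bytes]


@dataclass
class Terminal:
    first_language: str   # language of the local user
    second_language: str  # language of the peer user
    translate_speech: TranslateSpeech
    sent_to_peer: List[bytes] = field(default_factory=list)  # stands in for the voice channel
    played: List[bytes] = field(default_factory=list)        # stands in for the loudspeaker

    def speak(self, captured: bytes) -> None:
        """S11-S14: translate the local user's speech, then send it to the peer."""
        final = self.translate_speech(captured, self.first_language, self.second_language)
        self.sent_to_peer.append(final)

    def hear(self, from_peer: bytes) -> None:
        """S15-S18: translate the peer's speech, then output it locally."""
        final = self.translate_speech(from_peer, self.second_language, self.first_language)
        self.played.append(final)


def fake_translate_speech(audio: bytes, source: str, target: str) -> bytes:
    """Stand-in for the server round trips; it only tags the audio with the direction."""
    return f"{source}->{target}:".encode("utf-8") + audio


terminal = Terminal("zh", "en", fake_translate_speech)
terminal.speak(b"A")
terminal.hear(b"B")
```

Injecting `translate_speech` keeps the terminal logic independent of how many servers perform the recognition, translation, and synthesis stages.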

[0061] A third embodiment of the voice call method of the present invention is provided, in which the method comprises the following steps:

[0062] S21. Receive voice information in the original second language sent by the peer end.

[0063] S22. Send the voice information in the original second language to a server for translation processing, so that the server translates the voice information in the second language into voice information in the final first language.

[0064] S23. Receive the voice information in the final first language returned by the server.

[0065] S24. Output the voice information in the final first language.

[0066] In this embodiment, steps S21 to S24 are the same as steps S15 to S18 of the second embodiment, respectively, and are not described again here.

[0067] In the voice call method of this embodiment, the received voice information of the peer user is sent to a server for translation processing and translated into voice information that the local user can understand, and the translated voice information is then output, so that the local user can understand what the peer user says. This adds a translation function to the communication terminal, enables users who speak different languages to communicate by voice remotely, solves the technical problem that such users cannot communicate by voice remotely through communication terminals, lowers the cost of communication, and improves the user experience.

[0068] Further, in a fourth embodiment of the voice call method of the present invention, the following steps are performed after step S24:

[0069] S25. Collect voice information in the original first language.

[0070] S26. Send the voice information in the original first language to a server for translation processing, so that the server translates the voice information in the first language into voice information in the final second language.

[0071] S27. Receive the voice information in the final second language returned by the server.

[0072] S28. Send the voice information in the final second language to the peer end.

[0073] In this embodiment, steps S25 to S28 are the same as steps S11 to S14 of the first embodiment, respectively, and are not described again here.

[0074] In this embodiment, the collected voice information of the local user is additionally sent to the server for translation processing and translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer end, so that the peer user can understand what the local user says. As a result, even if the peer end is an ordinary terminal, users who speak different languages can communicate by voice remotely, which greatly broadens the range of application and further lowers the cost of communication.

[0075] In the embodiments of the present invention, the first embodiment and the third embodiment may be applied to the application scenario shown in FIG. 1, in which VOLTE terminal A and VOLTE terminal B establish a connection through an IP Multimedia Subsystem (IMS) network, and VOLTE terminal A and VOLTE terminal B are each connected to a voice recognition server, a translation server, and a voice synthesis server. VOLTE terminal A and VOLTE terminal B both use the voice call method of the first embodiment or the second embodiment to conduct the voice call, so that users who speak different languages can communicate remotely by voice.

[0076] The second embodiment and the fourth embodiment may be applied to the application scenarios shown in FIG. 2 to FIG. 4. In FIG. 2, VOLTE terminal A and voice terminal B establish a connection through the IMS network, and VOLTE terminal A is connected to the voice recognition server, the translation server, and the voice synthesis server. VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users who speak different languages can communicate remotely by voice. In FIG. 3, VOLTE terminal A connects through the IMS network to the gateway between the IMS network and the 2G/3G network, voice terminal B connects through the 2G/3G network to the same gateway, and VOLTE terminal A is connected to the voice recognition server, the translation server, and the voice synthesis server. VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users who speak different languages can communicate remotely by voice. In FIG. 4, VOLTE terminal A connects through the IMS network to the gateway between the IMS network and the public switched telephone network (PSTN), voice terminal B connects through the PSTN to the same gateway, and VOLTE terminal A is connected to the voice recognition server, the translation server, and the voice synthesis server. VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users who speak different languages can communicate remotely by voice.

[0077] The processing delay of the voice recognition server is generally less than 3 seconds, the processing delay of the translation server is generally less than 200 milliseconds, the processing delay of the voice synthesis server is generally less than 200 milliseconds, and the transmission delay of the IMS network is generally on the order of seconds. Therefore, by exploiting the high data rate and low latency of LTE communication, a multi-language real-time translation function is implemented on the VOLTE terminal during voice calls. Voice translation is fast and its delay is small, so the user's call is not affected, and users who speak different languages can communicate remotely by voice without barriers.
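The figures in paragraph [0077] can be combined into a rough upper bound on server-side processing time. The sketch below assumes the three stages run strictly one after another for a single utterance. The text does not state whether they overlap, so this sequencing is an assumption, and the IMS transmission delay is left out because only its order of magnitude is given.

```python
# Upper bounds quoted in paragraph [0077], in seconds.
STAGE_BOUNDS_S = {
    "voice_recognition": 3.0,
    "translation": 0.2,
    "voice_synthesis": 0.2,
}


def processing_bound_s(bounds: dict) -> float:
    """Sum per-stage upper bounds, assuming the stages run sequentially."""
    return sum(bounds.values())


total = processing_bound_s(STAGE_BOUNDS_S)
print(f"server processing bound: {total:.1f} s")
```

Under this assumption, recognition dominates the bound: 3 s + 0.2 s + 0.2 s = 3.4 s.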

[0078] Referring to FIG. 5, a first embodiment of the voice call device of the present invention is provided. The device includes an information collection module 10, a first translation processing module 20, a first information receiving module 30, and an information sending module 40, where:

[0079] The information collection module 10 is configured to collect voice information in the original first language. The first translation processing module 20 is configured to send the voice information in the original first language to the server for translation processing, so that the server translates the voice information in the original first language into voice information in the final second language. The first information receiving module 30 is configured to receive the voice information in the final second language returned by the server. The information sending module 40 is configured to send the voice information in the final second language to the peer. In the embodiments of the present invention, the language used by the VOLTE terminal user is the first language, and the language used by the peer user is the second language. When the VOLTE terminal acts as the sending end, the information collection module 10 collects the user's voice information in the original first language through a microphone. The first translation processing module 20 may send the voice information in the original first language to the server directly as a voice data stream. Preferably, the first translation processing module 20 sends the voice information in the original first language to the server in separate data packets. For example, the first translation processing module 20 first records the voice information in the original first language as individual voice files and caches them, and then sends each cached voice file in turn to the server as a data packet.
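The record, cache, and send-in-order behaviour described for the first translation processing module 20 might look like the following sketch. The text does not say how one voice file is delimited from the next, so cutting at a fixed size is an assumption made here for illustration, and `record_segments`, `send_cached_segments`, and `send_packet` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def record_segments(samples: bytes, segment_size: int) -> List[bytes]:
    """Cut a captured byte stream into fixed-size 'voice files' (assumed rule)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [samples[i:i + segment_size] for i in range(0, len(samples), segment_size)]


def send_cached_segments(
    cache: Deque[bytes], send_packet: Callable[[int, bytes], None]
) -> int:
    """Send cached voice files in order, one packet each; return the count sent."""
    sent = 0
    while cache:
        send_packet(sent, cache.popleft())  # sequence number preserves ordering
        sent += 1
    return sent


cache: Deque[bytes] = deque(record_segments(b"abcdefghij", segment_size=4))
packets: List[Tuple[int, bytes]] = []
count = send_cached_segments(cache, lambda seq, data: packets.append((seq, data)))
```

A real terminal would more plausibly cut files at pauses in speech so that each file holds a complete phrase; the fixed size only keeps the sketch short.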

[0080] Translation processing mainly comprises three stages: recognition, translation, and synthesis. These three stages may be completed by one server, or by two or three servers.

[0081] In the embodiments of the present invention, the server includes a voice recognition server, a translation server, and a voice synthesis server. The VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the recognition information through a first setting module, that is, the language types to be recognized, which include the local language type (the first language) and may further include the peer's language type (the second language). It establishes an IP-based connection with the translation server and sets the translation information through a second setting module, that is, the languages to be translated, which include the local-to-peer mapping and may further include the peer-to-local mapping. It establishes an IP-based connection with the voice synthesis server and sets the synthesis information through a third setting module, that is, the type of voice synthesis, such as a male or female voice, the speech rate, and so on.
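The three kinds of settings in paragraph [0081] can be pictured as small configuration records. The sketch below is hypothetical throughout: the class names, field names, default values, and the language codes "zh" and "en" are illustrative choices, not part of the described device.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognitionSettings:
    languages: List[str]  # language types to recognize


@dataclass
class TranslationSettings:
    mappings: List[Tuple[str, str]]  # (source language, target language)


@dataclass
class SynthesisSettings:
    voice: str = "female"     # e.g. a male or female voice
    speech_rate: float = 1.0  # relative speaking speed


def build_settings(local: str, peer: str, both_directions: bool = False):
    """Build the recognition, translation, and synthesis settings for one call."""
    languages = [local]
    mappings = [(local, peer)]
    if both_directions:
        # Optional additions named in the text: the peer's language type and
        # the peer-to-local mapping.
        languages.append(peer)
        mappings.append((peer, local))
    return (
        RecognitionSettings(languages),
        TranslationSettings(mappings),
        SynthesisSettings(),
    )


recognition, translation, synthesis = build_settings("zh", "en", both_directions=True)
```

Setting `both_directions=True` corresponds to the case where one terminal handles translation in both directions, for example when the peer is an ordinary voice terminal.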

[0082] As shown in FIG. 6, the first translation processing module 20 includes a first sending unit 21, a first receiving unit 22, a second sending unit 23, a second receiving unit 24, and a third sending unit 25, where:

[0083] The first sending unit 21 is configured to send the voice information in the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string in the first language. The first sending unit 21 first records the voice information in the original first language as individual voice files and caches them, and then sends each cached voice file in turn to the voice recognition server as a data packet. After receiving a voice file, the voice recognition server recognizes it according to the preset recognition information, producing a character string in the first language, and returns that character string to the VOLTE terminal. The first receiving unit 22 is configured to receive the character string in the first language returned by the voice recognition server. The second sending unit 23 is configured to send the character string in the first language to the translation server, so that the translation server translates it into a character string in the second language. Once the character string in the first language has been received, the second sending unit 23 sends it to the translation server. After receiving the character string in the first language, the translation server translates it according to the preset translation information into a character string in the second language, and returns that character string to the VOLTE terminal. The second receiving unit 24 is configured to receive the character string in the second language returned by the translation server. The third sending unit 25 is configured to send the character string in the second language to the voice synthesis server, so that the voice synthesis server synthesizes it into voice information in the final second language. Once the character string in the second language has been received, the third sending unit 25 sends it to the voice synthesis server. After receiving the character string in the second language, the voice synthesis server synthesizes it according to the preset synthesis information into voice information in the final second language, and returns that voice information to the VOLTE terminal as a voice code stream.
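The hand-offs between units 21 to 25 and the three servers form a three-stage chain. The following sketch shows only that control flow. The three callables are hypothetical stand-ins for the voice recognition, translation, and voice synthesis servers, and the toy dictionary replaces real translation.

```python
from typing import Callable

Recognize = Callable[[bytes], str]   # speech -> first-language character string
Translate = Callable[[str], str]     # first-language -> second-language string
Synthesize = Callable[[str], bytes]  # second-language string -> speech


def translate_speech(
    speech: bytes,
    recognize: Recognize,
    translate: Translate,
    synthesize: Synthesize,
) -> bytes:
    """Chain recognition, translation, and synthesis through the terminal."""
    first_text = recognize(speech)        # units 21 and 22
    second_text = translate(first_text)   # units 23 and 24
    return synthesize(second_text)        # unit 25; reply is the final speech


# Toy stand-ins: text encoded as bytes plays the role of audio.
TOY_DICTIONARY = {"ni hao": "hello"}


def toy_recognize(speech: bytes) -> str:
    return speech.decode("utf-8")


def toy_translate(text: str) -> str:
    return TOY_DICTIONARY.get(text, text)


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


final_speech = translate_speech(b"ni hao", toy_recognize, toy_translate, toy_synthesize)
```

Each intermediate result passes back through the terminal before the next request is issued, matching the description in which every server returns its output to the VOLTE terminal.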

[0084] In other embodiments, the recognition, translation, and synthesis of the voice information in the original first language may also be completed by a single server. For example, the first translation processing module 20 sends the voice information in the original first language to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, the recognition, translation, and synthesis of the voice information in the original first language may be completed by two servers. For example, the first translation processing module 20 sends the voice information in the original first language to a first server, the first server recognizes and translates the voice information and returns the result to the VOLTE terminal, the first translation processing module 20 then sends the recognized and translated voice information to a second server, and the second server synthesizes it and returns the result to the VOLTE terminal. As another example, the first translation processing module 20 sends the voice information in the original first language to a first server, the first server recognizes the voice information and returns the result to the VOLTE terminal, the first translation processing module 20 then sends the recognized voice information to a second server, and the second server translates and synthesizes it and returns the result to the VOLTE terminal.
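The one-server, two-server, and three-server arrangements differ only in how the three stages are grouped. As a sketch, a "server" can be modelled as the ordered list of stages it hosts, with the terminal forwarding each reply to the next server. The stage functions below are toy stand-ins (upper-casing replaces translation), and `make_server` and `run_via_terminal` are hypothetical helpers.

```python
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]


def make_server(stages: Sequence[Stage]) -> Stage:
    """Model a server as the stages it hosts, applied in order."""
    def server(payload: Any) -> Any:
        for stage in stages:
            payload = stage(payload)
        return payload
    return server


def run_via_terminal(payload: Any, servers: Sequence[Stage]) -> Any:
    """The terminal sends the payload to each server in turn, forwarding replies."""
    for server in servers:
        payload = server(payload)
    return payload


def recognize(speech: bytes) -> str:
    return speech.decode("utf-8")


def translate(text: str) -> str:
    return text.upper()  # toy stand-in for real translation


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")


layouts = {
    "one server": [make_server([recognize, translate, synthesize])],
    "two servers (recognize+translate, synthesize)": [
        make_server([recognize, translate]), make_server([synthesize])],
    "two servers (recognize, translate+synthesize)": [
        make_server([recognize]), make_server([translate, synthesize])],
    "three servers": [
        make_server([recognize]), make_server([translate]), make_server([synthesize])],
}
results = {name: run_via_terminal(b"hello", servers) for name, servers in layouts.items()}
```

All four layouts produce the same final output; they differ only in the number of round trips between the terminal and the servers.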

[0085] In the voice call device of this embodiment of the present invention, the collected voice information of the local user is sent to the server for translation processing and translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer, so that the peer user can understand the local user's speech. A translation function is thereby added to the communication terminal, enabling users who speak different languages to communicate remotely by voice. This solves the technical problem that users who speak different languages cannot communicate remotely by voice through a communication terminal, reduces communication costs, and improves the user experience.

[0086] Referring to FIG. 7, a second embodiment of the voice call device of the present invention is provided. The device includes a second information receiving module 50, a second translation processing module 60, a third information receiving module 70, and an information output module 80, where: the second information receiving module 50 is configured to receive voice information in the original second language sent by the peer. When the VOLTE terminal acts as the receiving end, the second information receiving module 50 receives, through the voice channel, the voice information in the original second language sent by the peer acting as the sending end. The second translation processing module 60 is configured to send the voice information in the original second language to the server for translation processing, so that the server translates the voice information in the original second language into voice information in the final first language. The second translation processing module 60 may send the voice information in the original second language to the server directly as a voice data stream. Preferably, the second translation processing module 60 sends the voice information in the original second language to the server in separate data packets. For example, the second translation processing module 60 first records the voice information in the original second language as individual voice files and caches them, and then sends each cached voice file in turn to the server as a data packet.

[0087] In the embodiments of the present invention, the server includes a voice recognition server, a translation server, and a voice synthesis server. The VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the recognition information through a first setting module, that is, the language types to be recognized, which include the peer's language type (the second language) and may further include the local language type (the first language). It establishes an IP-based connection with the translation server and sets the translation information through a second setting module, that is, the languages to be translated, which include the peer-to-local mapping and may further include the local-to-peer mapping. It establishes an IP-based connection with the voice synthesis server and sets the synthesis information through a third setting module, that is, the type of voice synthesis, such as a male or female voice, the speech rate, and so on.

[0088] As shown in FIG. 8, the second translation processing module 60 includes a fourth sending unit 61, a third receiving unit 62, a fifth sending unit 63, a fourth receiving unit 64, and a sixth sending unit 65, where: the fourth sending unit 61 is configured to send the voice information in the original second language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string in the second language. The fourth sending unit 61 first records the voice information in the original second language as individual voice files and caches them, and then sends each cached voice file in turn to the voice recognition server as a data packet. After receiving a voice file, the voice recognition server recognizes it according to the preset recognition information, producing a character string in the second language, and returns that character string to the VOLTE terminal. The third receiving unit 62 is configured to receive the character string in the second language returned by the voice recognition server. The fifth sending unit 63 is configured to send the character string in the second language to the translation server, so that the translation server translates it into a character string in the first language. Once the character string in the second language has been received, the fifth sending unit 63 sends it to the translation server. After receiving the character string in the second language, the translation server translates it according to the preset translation information into a character string in the first language, and returns that character string to the VOLTE terminal. The fourth receiving unit 64 is configured to receive the character string in the first language returned by the translation server. The sixth sending unit 65 is configured to send the character string in the first language to the voice synthesis server, so that the voice synthesis server synthesizes it into voice information in the final first language. Once the character string in the first language has been received, the sixth sending unit 65 sends it to the voice synthesis server. After receiving the character string in the first language, the voice synthesis server synthesizes it according to the preset synthesis information into voice information in the final first language, and returns that voice information to the VOLTE terminal as a voice code stream.

[0089] In other embodiments, the recognition, translation, and synthesis of the voice information in the original second language may also be completed by a single server. For example, the second translation processing module 60 sends the voice information in the original second language to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, the recognition, translation, and synthesis of the voice information in the original second language may be completed by two servers. For example, the second translation processing module 60 sends the voice information in the original second language to a first server, the first server recognizes and translates the voice information and returns the result to the VOLTE terminal, the second translation processing module 60 then sends the recognized and translated voice information to a second server, and the second server synthesizes it and returns the result to the VOLTE terminal. As another example, the second translation processing module 60 sends the voice information in the original second language to a first server, the first server recognizes the voice information and returns the result to the VOLTE terminal, the second translation processing module 60 then sends the recognized voice information to a second server, and the second server translates and synthesizes it and returns the result to the VOLTE terminal. The third information receiving module 70 is configured to receive the voice information in the final first language returned by the server. The information output module 80 is configured to output the voice information in the final first language. After the voice information in the final first language returned by the server has been received, the information output module 80 processes it through the audio path and finally outputs it through a sound-producing device (an earpiece, a loudspeaker, or the like), so that the local user, who speaks the first language, can understand what the peer user said.

[0090] The voice call devices of the foregoing first embodiment and second embodiment may be applied to the application scenario shown in FIG. 1.

[0091] In the voice call device of this embodiment of the present invention, the voice information received from the peer user is sent to the server for translation processing and translated into voice information that the local user can understand, and the translated voice information is then output, so that the local user can understand the peer user's speech. A translation function is thereby added to the communication terminal, enabling users who speak different languages to communicate remotely by voice. This solves the technical problem that users who speak different languages cannot communicate remotely by voice through a communication terminal, reduces communication costs, and improves the user experience.

[0092] Further, as shown in FIG. 9, the voice call devices of the foregoing first embodiment and second embodiment may be combined to form a voice call device of a third embodiment. The voice call device can then both translate the voice information collected locally before sending it to the peer and translate the voice information sent by the peer before outputting it. As a result, users who speak different languages can communicate remotely by voice even if the peer is an ordinary voice terminal, which greatly broadens the range of application and further reduces communication costs.

[0093] The voice call device of this embodiment may be applied to the application scenarios shown in FIG. 2 to FIG. 4.

Claims

[Claim 1] A voice call method, comprising the following steps:

collecting voice information in an original first language;

sending the voice information in the original first language to a server for translation processing, so that the server translates the voice information in the original first language into voice information in a final second language;

receiving the voice information in the final second language returned by the server; and

sending the voice information in the final second language to a peer.

[Claim 2] The voice call method according to claim 1, wherein the server includes a voice recognition server, a translation server, and a voice synthesis server, and the step of sending the voice information in the original first language to a server for translation processing comprises:

sending the voice information in the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string in the first language;

receiving the character string in the first language returned by the voice recognition server;

sending the character string in the first language to the translation server, so that the translation server translates the character string in the first language into a character string in the second language;

receiving the character string in the second language returned by the translation server; and

sending the character string in the second language to the voice synthesis server, so that the voice synthesis server synthesizes the character string in the second language into the voice information in the final second language.

[Claim 3] The voice call method according to claim 1, wherein the method further comprises:

receiving voice information in an original second language sent by the peer;

sending the voice information in the original second language to the server for translation processing, so that the server translates the voice information in the original second language into voice information in a final first language;

receiving the voice information in the final first language returned by the server; and

outputting the voice information in the final first language.

[Claim 4] The voice call method according to claim 3, wherein the server includes a voice recognition server, a translation server, and a voice synthesis server, and the step of sending the voice information in the original second language to the server for translation processing comprises:

sending the voice information in the original second language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string in the second language;

receiving the character string in the second language returned by the voice recognition server;

sending the character string in the second language to the translation server, so that the translation server translates the character string in the second language into a character string in the first language;

receiving the character string in the first language returned by the translation server; and

sending the character string in the first language to the voice synthesis server, so that the voice synthesis server synthesizes the character string in the first language into the voice information in the final first language.

[Claim 5] The voice call method according to claim 1, wherein the method is applied to a VOLTE terminal.

[Claim 6] A voice call method, comprising the following steps:

receiving voice information in an original second language sent by a peer;

sending the voice information in the original second language to a server for translation processing, so that the server translates the voice information in the original second language into voice information in a final first language; and

outputting the voice information in the final first language.

[Claim 7] The voice call method according to claim 6, wherein the server includes a voice recognition server, a translation server, and a voice synthesis server, and the step of sending the voice information in the original second language to a server for translation processing comprises:

sending the voice information in the original second language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string in the second language;

receiving the character string in the second language returned by the voice recognition server;

sending the character string in the second language to the translation server, so that the translation server translates the character string in the second language into a character string in the first language;

receiving the character string in the first language returned by the translation server;

[Claim 8] The voice call method according to claim 6, wherein the method further comprises:

collecting voice information in an original first language;

sending the voice information in the original first language to a server for translation processing, so that the server translates the voice information in the first language into voice information in a final second language; and

sending the voice information in the final second language to the peer.

[Claim 9] The voice call method according to claim 6, wherein the method is applied to a VOLTE terminal.
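
The three-server chain recited in claims 6 and 7 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `TranslationChain`, `fake_recognize`, `fake_translate`, `fake_synthesize`, and `handle_incoming` are hypothetical, the servers are modeled as local callables rather than network services, and the language codes "en" and "zh" are assumed only for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for the three servers named in claim 7. A real
# terminal would reach them over a network, which is not modeled here.
Recognizer = Callable[[bytes, str], str]       # (audio, language) -> text
Translator = Callable[[str, str, str], str]    # (text, source, target) -> text
Synthesizer = Callable[[str, str], bytes]      # (text, language) -> audio


@dataclass
class TranslationChain:
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def process(self, audio: bytes, source: str, target: str) -> bytes:
        """Turn speech in `source` into speech in `target`."""
        text = self.recognize(audio, source)                # speech -> source string
        translated = self.translate(text, source, target)   # source string -> target string
        return self.synthesize(translated, target)          # target string -> speech


# Toy stubs so the sketch runs without any external service.
def fake_recognize(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, source: str, target: str) -> str:
    glossary = {("en", "zh"): {"hello": "你好"}}
    # Unknown words pass through unchanged in this toy glossary.
    return glossary.get((source, target), {}).get(text, text)


def fake_synthesize(text: str, language: str) -> bytes:
    return text.encode("utf-8")


chain = TranslationChain(fake_recognize, fake_translate, fake_synthesize)


def handle_incoming(audio_from_peer: bytes) -> bytes:
    # Claim 6: second-language speech from the opposite end becomes
    # first-language speech that the terminal can output.
    return chain.process(audio_from_peer, source="en", target="zh")
```

Keeping the three stages as separate callables mirrors the separate servers in the claim, so any one stage could be swapped without touching the others.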

[Claim 10] A voice call device, comprising: an information collection module, a first translation processing module, a first information receiving module, and an information sending module,

The information collection module is configured to collect voice information in the original first language;

The first translation processing module is configured to send the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language;

The first information receiving module is configured to receive the voice information of the final second language returned by the server;

The information sending module is configured to send the voice information of the final second language to the peer end.

[Claim 11] The voice communication device according to claim 10, wherein the server comprises a voice recognition server, a translation server, and a voice synthesis server, and the first translation processing module includes:

a first sending unit, configured to send the voice information of the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the first language;

a first receiving unit, configured to receive the character string of the first language returned by the voice recognition server;

a second sending unit, configured to send the character string of the first language to the translation server, so that the translation server translates the character string of the first language into a character string of a second language;

a second receiving unit, configured to receive the character string of the second language returned by the translation server;

a third sending unit, configured to send the character string of the second language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the second language into the speech information of the final second language.

[Claim 12] The voice communication device according to claim 10, wherein the device further includes: a second information receiving module, configured to receive the voice information of the original second language sent by the opposite end;

a second translation processing module, configured to send the voice information of the original second language to a server for translation processing, so that the server translates the voice information of the second language into voice information of a final first language;

a third information receiving module, configured to receive the voice information of the final first language returned by the server;

an information output module, configured to output the voice information of the final first language.

[Claim 13] The voice communication device according to claim 12, wherein the server includes a voice recognition server, a translation server, and a voice synthesis server, and the second translation processing module includes:

a fourth sending unit, configured to send the voice information of the original second language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the second language;

a third receiving unit, configured to receive a character string of the second language returned by the voice recognition server;

a fifth sending unit, configured to send the character string of the second language to the translation server, so that the translation server translates the character string of the second language into a character string of the first language;

a fourth receiving unit, configured to receive the character string of the first language returned by the translation server;

a sixth sending unit, configured to send the character string of the first language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the first language into the speech information of the final first language.

[Claim 14] The voice communication device according to claim 10, wherein the device is applied to a VOLTE terminal.

[Claim 15] A voice communication device, comprising:

a second information receiving module, configured to receive the voice information of the original second language sent by the opposite end;

a second translation processing module, configured to send the voice information of the original second language to a server for translation processing, so that the server translates the voice information of the original second language into voice information of a final first language;

[Claim 16] The voice communication device according to claim 15, wherein the server includes a voice recognition server, a translation server, and a voice synthesis server, and the second translation processing module includes:

a fifth sending unit, configured to send the character string of the second language to the translation server, so that the translation server translates the character string of the second language into a character string of the first language;

[Claim 17] The voice communication device according to claim 15, wherein the device further comprises: an information collection module, configured to collect voice information in the original first language;

a first translation processing module, configured to send the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the first language into voice information of a final second language;

a first information receiving module, configured to receive voice information of the final second language returned by the server;

[Claim 18] The voice communication device according to claim 15, wherein the device is applied to a VOLTE terminal.
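
The module arrangement of the device claims (uplink modules of claim 10 combined with the downlink modules of claims 12 and 15) can be sketched as a single terminal object with two paths. This is an illustrative sketch under stated assumptions: the class `VoiceCallDevice`, its method names, and the stub `fake_translate_speech` are hypothetical, and server-side translation is reduced to one callable that tags the audio instead of translating it.

```python
from typing import Callable, List

# Hypothetical signature for "send to a server for translation processing":
# (audio, source_language, target_language) -> translated audio.
TranslateSpeech = Callable[[bytes, str, str], bytes]


class VoiceCallDevice:
    """Terminal that translates in both directions; the opposite end is unmodified."""

    def __init__(
        self,
        translate_speech: TranslateSpeech,
        send_to_peer: Callable[[bytes], None],
        play: Callable[[bytes], None],
        local_lang: str,
        peer_lang: str,
    ) -> None:
        self._translate = translate_speech
        self._send_to_peer = send_to_peer
        self._play = play
        self._local = local_lang
        self._peer = peer_lang

    def on_microphone(self, audio: bytes) -> None:
        # Uplink: collected first-language speech is translated by the server,
        # the result is received, then sent to the opposite end.
        translated = self._translate(audio, self._local, self._peer)
        self._send_to_peer(translated)

    def on_peer_audio(self, audio: bytes) -> None:
        # Downlink: second-language speech from the opposite end is translated
        # by the server, the result is received, then output locally.
        translated = self._translate(audio, self._peer, self._local)
        self._play(translated)


# Stub server: tags the audio with the translation direction.
def fake_translate_speech(audio: bytes, source: str, target: str) -> bytes:
    return f"[{source}->{target}] ".encode("utf-8") + audio


sent: List[bytes] = []
played: List[bytes] = []
device = VoiceCallDevice(fake_translate_speech, sent.append, played.append, "zh", "en")
device.on_microphone(b"ni hao")
device.on_peer_audio(b"hello")
```

After these two calls, `sent` holds the audio bound for the opposite end and `played` holds the audio output locally, each tagged with its translation direction.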