WO2019000515A1 - Voice call method and device - Google Patents
- Publication number: WO2019000515A1 (PCT/CN2017/093741)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- language
- server
- voice
- information
- voice information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/006—Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Definitions
- the present invention relates to the field of communications technologies, and in particular, to a voice call method and apparatus.
- a primary object of the present invention is to provide a voice call method and apparatus for solving the technical problem that a user using a different language cannot perform remote voice communication through a communication terminal.
- An embodiment of the present invention provides a voice call method, where the method includes the following steps: collecting voice information of an original first language; sending the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language; receiving the voice information of the final second language returned by the server; and sending the voice information of the final second language to the peer end.
- An embodiment of the present invention further provides a voice call method, where the method includes the following steps: receiving voice information of an original second language sent by the peer end; sending the voice information of the original second language to a server for translation processing, so that the server translates the voice information of the original second language into voice information of a final first language; receiving the voice information of the final first language returned by the server; and outputting the voice information of the final first language.
- An embodiment of the present invention further provides a voice call device, where the device includes an information collection module, a first translation processing module, a first information receiving module, and an information sending module. The information collection module is configured to collect voice information of an original first language; the first translation processing module is configured to send the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language; the first information receiving module is configured to receive the voice information of the final second language returned by the server; and the information sending module is configured to send the voice information of the final second language to the peer end.
- The voice call method provided by an embodiment of the present invention sends the voice information of the local user to a server for translation processing, where it is translated into voice information that the peer user can recognize, and then sends the translated voice information to the peer end, so that the peer user can understand the voice of the local user.
- the translation function is added to the communication terminal, so that users who use different languages realize remote voice communication, which solves the technical problem that the users who use different languages cannot communicate remotely through the communication terminal, reduces the communication cost, and improves the user experience.
- FIG. 1 is a system block diagram of an application scenario of a voice call method according to an embodiment of the present invention
- FIG. 2 is a system block diagram of still another application scenario of a voice call method according to an embodiment of the present invention.
- FIG. 3 is a system block diagram of still another application scenario of a voice call method according to an embodiment of the present invention.
- FIG. 4 is a system block diagram of still another application scenario of a voice call method according to an embodiment of the present invention.
- FIG. 5 is a block diagram showing a first embodiment of a voice communication device according to the present invention.
- FIG. 6 is a block diagram of a first translation processing module of FIG. 5;
- FIG. 7 is a block diagram showing a second embodiment of a voice communication device according to the present invention.
- FIG. 8 is a block diagram of a second translation processing module of FIG. 7;
- FIG. 9 is a block diagram showing a third embodiment of a voice communication device of the present invention.
- VOLTE: Voice over LTE.
- VoLTE is an IP data transmission technology that does not require a 2G/3G network. All services are carried on a 4G network, which enables data and voice services to be unified under the same network.
- it can also be applied to a communication terminal based on other IP data transmission technologies, as long as it can unify data and voice services in the same network, which is not limited by the present invention.
- the first embodiment of the voice call method of the present invention includes the following steps:
- the language used by the VOLTE terminal user is defined as the first language, and the language used by the peer user is the second language.
- When the VOLTE terminal acts as the transmitting terminal, it collects the voice information of the user's first language through the microphone.
- S12 Send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the original first language into the voice information of the final second language.
- the VOLTE terminal may directly transmit the voice information of the original first language to the server as a voice data stream.
- the VOLTE terminal sends the voice information of the original first language to the server in the form of a data packet.
- the VOLTE terminal first records the voice information of the original first language, records it as a voice file and caches it, and then sends each cached voice file to the server in the form of a data packet.
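The record, cache, and send-as-packets approach described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `segment_voice` and `to_packets`, the 16 kHz 16-bit mono PCM format, and the 500 ms segment length are all assumptions chosen for the example, and a sequence number is added to each segment only to show how cached files could be kept in order.

```python
from typing import Iterator, List, Tuple


def segment_voice(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2,
                  segment_ms: int = 500) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of mono PCM audio."""
    bytes_per_segment = sample_rate * sample_width * segment_ms // 1000
    if bytes_per_segment <= 0:
        raise ValueError("segment_ms is too small for the given audio format")
    for start in range(0, len(pcm), bytes_per_segment):
        yield pcm[start:start + bytes_per_segment]


def to_packets(pcm: bytes, segment_ms: int = 500) -> List[Tuple[int, bytes]]:
    """Cache every segment together with a sequence number (hypothetical framing)."""
    return list(enumerate(segment_voice(pcm, segment_ms=segment_ms)))


# One second of silence at 16 kHz, 16-bit mono, split into 500 ms segments.
packets = to_packets(bytes(32000), segment_ms=500)
```

Each tuple in `packets` stands for one cached voice file; a real terminal would send them to the server one after another over its IP connection.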
- Translation processing mainly includes three processes of identification, translation and synthesis. These three processes can be completed by one server or by two or three servers.
- the server includes a voice recognition server, a translation server, and a voice synthesis server.
- The VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the recognition information, that is, the language types to be recognized, which include the local language type (first language) and may further include the language type of the peer end (second language). It establishes an IP-based connection with the translation server and sets the translation information, that is, the languages to be translated, which include the local-to-peer mapping and may further include the peer-to-local mapping. It establishes an IP-based connection with the speech synthesis server and sets the synthesis information, that is, the type of speech synthesis, such as male or female voice, speech rate, and the like.
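The three kinds of preset information (recognition, translation, synthesis) can be pictured as a small configuration structure. The sketch below is illustrative only: the class names, the field names, the language codes "zh" and "en", and the default voice and rate are assumptions, not values taken from the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RecognitionInfo:
    languages: Tuple[str, ...]              # language types to be recognized


@dataclass(frozen=True)
class TranslationInfo:
    mappings: Tuple[Tuple[str, str], ...]   # (source, target) language pairs


@dataclass(frozen=True)
class SynthesisInfo:
    voice: str = "female"                   # e.g. male or female voice
    speech_rate: float = 1.0                # relative speaking speed


def build_settings(local: str, peer: str, both_directions: bool = False):
    """Build the settings a terminal would preset on the three servers."""
    languages = (local, peer) if both_directions else (local,)
    mappings = ((local, peer),)
    if both_directions:
        mappings += ((peer, local),)
    return RecognitionInfo(languages), TranslationInfo(mappings), SynthesisInfo()


recognition, translation, synthesis = build_settings("zh", "en", both_directions=True)
```

With `both_directions=False` only the local language and the local-to-peer mapping are set, which matches the case where the terminal translates outgoing speech only.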
- In step S12, the specific process by which the VOLTE terminal sends the voice information of the original first language to the server for translation processing is as follows:
- S121 Send the voice information of the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the first language.
- The VOLTE terminal first records the voice information of the original first language as individual voice files and caches them, and then sends each cached voice file to the voice recognition server in the form of a data packet. After receiving a voice file, the voice recognition server recognizes it according to the preset recognition information, converts it into a character string of the first language, and returns the character string of the first language to the VOLTE terminal.
- S122 Receive a character string of the first language returned by the voice recognition server.
- S123 Send a character string of the first language to the translation server, so that the translation server translates the character string of the first language into the character string of the second language.
- After receiving the character string of the first language, the VOLTE terminal sends it to the translation server. After receiving the character string of the first language, the translation server translates it into a character string of the second language according to the preset translation information, and returns the character string of the second language to the VOLTE terminal.
- S124 Receive a character string of a second language returned by the translation server.
- S125 Send a character string of the second language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the second language into the speech information of the final second language.
- After receiving the character string of the second language, the VOLTE terminal sends it to the speech synthesis server. After receiving the character string of the second language, the speech synthesis server synthesizes it into the voice information of the final second language according to the preset synthesis information, and returns the voice information of the final second language to the VOLTE terminal in the form of a voice stream.
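Steps S121 to S125 amount to a three-stage pipeline in which the terminal relays each intermediate result to the next server. A minimal sketch follows, with the three servers replaced by local stand-in functions; `fake_recognize`, `fake_translate`, `fake_synthesize`, and the phrase table are hypothetical placeholders, since the patent does not name any concrete recognition, translation, or synthesis service.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]
Translator = Callable[[str], str]
Synthesizer = Callable[[str], bytes]


def translate_voice(voice: bytes, recognize: Recognizer,
                    translate: Translator, synthesize: Synthesizer) -> bytes:
    """Run the recognition, translation, and synthesis stages in order."""
    first_language_text = recognize(voice)                # S121, S122
    second_language_text = translate(first_language_text)  # S123, S124
    return synthesize(second_language_text)               # S125, returned as voice


# Stand-ins for the three servers (assumptions for illustration only).
PHRASES = {"ni hao": "hello"}


def fake_recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def fake_translate(text: str) -> str:
    return PHRASES.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = translate_voice(b"ni hao", fake_recognize, fake_translate, fake_synthesize)
```

Passing the stages in as callables keeps the terminal logic independent of how many physical servers actually host them.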
- The recognition, translation, and synthesis of the voice information of the original first language may also be performed by a single server.
- the VOLTE terminal transmits the voice information of the original first language to the server, and the server identifies, translates, and synthesizes the voice information, and returns the voice information to the VOLTE terminal.
- the identification, translation, and synthesis processing of the speech information of the original first language may also be performed by two servers.
- The VOLTE terminal sends the voice information of the original first language to the first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized and translated information to the second server, and the second server synthesizes it into voice information and returns it to the VOLTE terminal.
- Alternatively, the VOLTE terminal sends the voice information of the original first language to the first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized information to the second server, and the second server translates and synthesizes it and returns the voice information to the VOLTE terminal.
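The one-, two-, and three-server arrangements differ only in how the three stages are grouped; the terminal makes one round trip per server. The sketch below illustrates that equivalence under simplifying assumptions: the stage functions are trivial stand-ins (the "translation" merely upper-cases text), and `make_server` and `run_via_terminal` are hypothetical helper names.

```python
def recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def translate(text: str) -> str:
    return text.upper()  # placeholder for a real translation


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def make_server(*stages):
    """A server applies the stages it hosts, in order, to one request."""
    def server(payload):
        for stage in stages:
            payload = stage(payload)
        return payload
    return server


def run_via_terminal(voice: bytes, servers):
    """The terminal sends the current payload to each server in turn."""
    payload = voice
    for server in servers:
        payload = server(payload)  # one round trip per server
    return payload


deployments = {
    "one server": [make_server(recognize, translate, synthesize)],
    "two servers (recognize+translate | synthesize)":
        [make_server(recognize, translate), make_server(synthesize)],
    "two servers (recognize | translate+synthesize)":
        [make_server(recognize), make_server(translate, synthesize)],
    "three servers":
        [make_server(recognize), make_server(translate), make_server(synthesize)],
}
results = {name: run_via_terminal(b"hello", servers)
           for name, servers in deployments.items()}
```

Every deployment yields the same final voice data; fewer servers mean fewer round trips through the terminal.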
- After receiving the voice information of the final second language returned by the server, the VOLTE terminal sends it to the peer end through the voice channel. After receiving it, the peer end processes the voice information of the final second language through the audio path and outputs it through a sounding device (handset, speaker, etc.), so that the peer user, who uses the second language, can understand what the local user said.
- The voice call method of the embodiment of the present invention sends the collected voice information of the local user to the server for translation processing, where it is translated into voice information that the peer user can recognize, and then sends the translated voice information to the peer end, so that the peer user can understand the voice of the local user.
- the translation function is added to the communication terminal, so that users who use different languages realize remote voice communication, which solves the technical problem that the users who use different languages cannot communicate remotely through the communication terminal, reduces the communication cost, and improves the user experience.
- After step S14, the following steps are further included:
- S15 Receive voice information of the original second language sent by the opposite end.
- the VOLTE terminal may directly transmit the voice information of the original second language to the server as a voice data stream.
- the VOLTE terminal sends the voice information of the original second language to the server in the form of a data packet.
- the VOLTE terminal first records the voice information of the original second language, records it as a voice file and caches it, and then sends each cached voice file to the server in the form of a data packet.
- the server includes a voice recognition server, a translation server, and a voice synthesis server.
- In step S16, the specific process by which the VOLTE terminal sends the voice information of the original second language to the server for translation processing is as follows:
- The VOLTE terminal first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file to the voice recognition server in the form of a data packet.
- After receiving a voice file, the voice recognition server recognizes it according to the preset recognition information, converts it into a character string of the second language, and returns the character string of the second language to the VOLTE terminal.
- S162. Receive a character string of a second language returned by the voice recognition server.
- After receiving the character string of the second language, the VOLTE terminal sends it to the translation server. After receiving the character string of the second language, the translation server translates it into a character string of the first language according to the preset translation information, and returns the character string of the first language to the VOLTE terminal.
- S164 Receive a character string of the first language returned by the translation server.
- S165 Send the character string of the first language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the first language into the speech information of the final first language.
- After receiving the character string of the first language, the VOLTE terminal sends it to the speech synthesis server. After receiving the character string of the first language, the speech synthesis server synthesizes it into the voice information of the final first language according to the preset synthesis information, and returns the voice information of the final first language to the VOLTE terminal in the form of a voice stream.
- the identification, translation, and synthesis processing of the voice information of the original second language may also be performed by one server.
- the VOLTE terminal transmits the voice information of the original second language to the server, and the server identifies, translates, and synthesizes the voice information, and returns the voice information to the VOLTE terminal.
- The recognition, translation, and synthesis of the voice information of the original second language may also be performed by two servers.
- The VOLTE terminal sends the voice information of the original second language to the first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized and translated information to the second server, and the second server synthesizes it into voice information and returns it to the VOLTE terminal.
- Alternatively, the VOLTE terminal sends the voice information of the original second language to the first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized information to the second server, and the second server translates and synthesizes it and returns the voice information to the VOLTE terminal.
- S17 Receive voice information of the final first language returned by the server.
- The VOLTE terminal processes the voice information of the final first language through the audio path and outputs it through a sounding device (handset, speaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.
- The received voice information of the peer user is further sent to the server for translation processing and translated into voice information that the local user can recognize, and the translated voice information is output, so that the local user can understand the voice of the peer user. Therefore, even if the peer end is an ordinary terminal, remote voice communication can be realized for users using different languages, which greatly expands the application range and further reduces the communication cost.
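When one terminal handles both directions, outgoing speech (steps S11 to S14) and incoming speech (steps S15 to S18) follow the same pattern with the languages swapped. The sketch below shows that symmetry; the class `TranslatingTerminal`, its method names, and the lambda translators that merely tag the audio are hypothetical stand-ins for the server round trips.

```python
from typing import Callable, List

VoiceTranslator = Callable[[bytes], bytes]


class TranslatingTerminal:
    """A terminal that translates in both directions, so the peer can be ordinary."""

    def __init__(self, to_second_language: VoiceTranslator,
                 to_first_language: VoiceTranslator) -> None:
        self.to_second_language = to_second_language
        self.to_first_language = to_first_language
        self.sent_to_peer: List[bytes] = []      # stands in for the voice channel
        self.played_locally: List[bytes] = []    # stands in for the speaker

    def on_microphone(self, voice: bytes) -> None:
        """S11 to S14: collect, translate to the second language, send to the peer."""
        self.sent_to_peer.append(self.to_second_language(voice))

    def on_voice_channel(self, voice: bytes) -> None:
        """S15 to S18: receive, translate to the first language, output locally."""
        self.played_locally.append(self.to_first_language(voice))


terminal = TranslatingTerminal(
    to_second_language=lambda v: b"[second] " + v,
    to_first_language=lambda v: b"[first] " + v,
)
terminal.on_microphone(b"local speech")
terminal.on_voice_channel(b"peer speech")
```

Because both translations happen on this terminal, the peer only ever sends and receives plain voice in its own language.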
- a third embodiment of the voice call method of the present invention is proposed, and the method includes the following steps:
- S22 Send the voice information of the original second language to the server for translation processing, so that the server translates the voice information of the second language into the voice information of the final first language.
- S23 Receive voice information of the final first language returned by the server.
- steps S21 to S24 are the same as the steps S15-S18 in the second embodiment, and details are not described herein again.
- The voice call method of the embodiment of the present invention sends the received voice information of the peer user to the server for translation processing, where it is translated into voice information that the local user can recognize, and then outputs the translated voice information, so that the local user can understand the voice of the peer user.
- the translation function is added to the communication terminal, so that users who use different languages realize remote voice communication, which solves the technical problem that the users who use different languages cannot communicate remotely through the communication terminal, reduces the communication cost, and improves the user experience.
- After step S24, the following steps are further included:
- S26 Send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the first language into the voice information of the final second language.
- S27 Receive voice information of the final second language returned by the server.
- S28 Send the voice information of the final second language to the peer end.
- Steps S25-S28 correspond respectively to steps S11-S14 in the first embodiment, and details are not described herein again.
- The collected voice information of the local user is further sent to the server for translation processing and translated into voice information that the peer user can recognize, and the translated voice information is then sent to the peer end, so that the peer user can understand the voice of the local user. Therefore, even if the peer end is an ordinary terminal, remote voice communication can be realized for users using different languages, which greatly expands the application range and further reduces the communication cost.
- The first embodiment and the third embodiment may be applied to the application scenario shown in FIG. 1, where VOLTE terminal A and VOLTE terminal B establish a connection through the IP Multimedia Subsystem (IMS) network, and VOLTE terminal A and VOLTE terminal B are each connected to the voice recognition server, the translation server, and the speech synthesis server. VOLTE terminal A and VOLTE terminal B both use the voice call method of the first embodiment or the second embodiment to perform a voice call, so that users of different languages can implement remote voice communication.
- the second embodiment and the fourth embodiment can be applied to the application scenarios as shown in FIGS. 2 to 4.
- VOLTE terminal A and voice terminal B establish a connection through the IMS network, and VOLTE terminal A is connected to the voice recognition server, the translation server, and the speech synthesis server respectively. VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to make a voice call with voice terminal B, so that users of different languages can realize remote voice communication.
- VOLTE terminal A connects, through the IMS network, to the gateway between the IMS network and the 2G/3G network, and voice terminal B connects to the same gateway through the 2G/3G network. VOLTE terminal A is connected to the voice recognition server, the translation server, and the speech synthesis server respectively, and uses the voice call method of the second embodiment or the third embodiment to make a voice call with voice terminal B, so that users of different languages can realize remote voice communication.
- VOLTE terminal A connects, through the IMS network, to the gateway between the IMS network and the public switched telephone network (PSTN), and voice terminal B connects to the same gateway through the PSTN. VOLTE terminal A is connected to the voice recognition server, the translation server, and the speech synthesis server respectively, and uses the voice call method of the second embodiment or the third embodiment to make a voice call with voice terminal B, so that users of different languages can implement remote voice communication.
- the processing delay of the speech recognition server is generally less than 3 seconds
- the processing delay of the translation server is generally less than 200 milliseconds
- the processing delay of the speech synthesis server is generally less than 200 milliseconds
- The transmission delay of the IMS network is generally on the order of seconds. Therefore, by using the high-rate and low-latency characteristics of LTE communication, the multi-language real-time translation function during a voice call is implemented on the VOLTE terminal; the voice translation processing is fast, the delay is small, and the user's call is not affected, thereby enabling barrier-free remote voice communication for users of different languages.
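The per-stage figures above can be combined into a rough worst-case processing budget. The short sketch below uses only the upper bounds stated for the three servers (3 seconds, 200 milliseconds, 200 milliseconds); network transmission time is deliberately left out because no figure for it is given, and the helper names are illustrative.

```python
# Upper bounds stated in the text, in milliseconds (network transit excluded).
STAGE_DELAY_MS = {"recognition": 3000, "translation": 200, "synthesis": 200}


def processing_budget_ms(stage_delays: dict) -> int:
    """Worst-case server-side processing time when the stages run one after another."""
    return sum(stage_delays.values())


def dominant_stage(stage_delays: dict) -> str:
    """Name of the stage that contributes the largest share of the delay."""
    return max(stage_delays, key=stage_delays.get)


total_ms = processing_budget_ms(STAGE_DELAY_MS)   # 3400
slowest = dominant_stage(STAGE_DELAY_MS)          # "recognition"
```

On these numbers speech recognition accounts for most of the processing delay, so it is the stage where any reduction would matter most.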
- the device includes an information collection module 10, a first translation processing module 20, a first information receiving module 30, and an information sending module 40, where:
- the information collection module 10 is configured to collect voice information of the original first language.
- the first translation processing module 20 is configured to send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the original first language into the voice information of the final second language.
- The first information receiving module 30 is configured to receive the voice information of the final second language returned by the server.
- the information sending module 40 is configured to send the voice information of the final second language to the opposite end.
- the language used by the VOLTE terminal user is the first language
- the language used by the peer user is the second language.
- the information collecting module 10 collects the voice information of the original first language of the user through the microphone.
- the first translation processing module 20 may send the voice information of the original first language to the server directly as a voice data stream.
- The first translation processing module 20 may also segment the voice information of the original first language and send it to the server in the form of data packets. For example, the first translation processing module 20 first records the voice information of the original first language as individual voice files and caches them, and then sequentially sends each cached voice file to the server in the form of a data packet.
- the translation process mainly includes three processes of identification, translation and synthesis.
- the three processes can be completed by one server or by two or three servers.
- the server includes a voice recognition server, a translation server, and a voice synthesis server.
- The VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the recognition information through the first setting module, that is, the language types to be recognized, which include the local language type (first language) and may further include the language type of the peer end (second language). It establishes an IP-based connection with the translation server and sets the translation information through the second setting module, that is, the languages to be translated, which include the local-to-peer mapping and may further include the peer-to-local mapping. It establishes an IP-based connection with the speech synthesis server and sets the synthesis information through the third setting module, that is, the type of speech synthesis, such as male or female voice, speech rate, and the like.
- the first translation processing module 20 includes a first transmitting unit 21, a first receiving unit 22, a second transmitting unit 23, a second receiving unit 24, and a third transmitting unit 25, where:
- the first transmitting unit 21 is configured to transmit the voice information of the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the first language.
- The first transmitting unit 21 first records the voice information of the original first language as individual voice files and caches them, and then sequentially sends each cached voice file to the voice recognition server in the form of a data packet.
- After receiving a voice file, the voice recognition server recognizes it according to the preset recognition information, converts it into a character string of the first language, and returns the character string of the first language to the VOLTE terminal.
- the first receiving unit 22 is arranged to receive a character string of the first language returned by the voice recognition server.
- The second transmitting unit 23 is configured to send the character string of the first language to the translation server, so that the translation server translates the character string of the first language into a character string of the second language.
- the second transmitting unit 23 transmits the character string of the first language to the translation server.
- The translation server translates the character string of the first language into a character string of the second language according to the preset translation information, and returns the character string of the second language to the VOLTE terminal.
- the second receiving unit 24 is arranged to receive a character string of the second language returned by the translation server.
- The third transmitting unit 25 is configured to send the character string of the second language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the second language into the voice information of the final second language. After the character string of the second language is received, the third transmitting unit 25 sends it to the speech synthesis server. After receiving the character string of the second language, the speech synthesis server synthesizes it into the voice information of the final second language according to the preset synthesis information, and returns the voice information of the final second language to the VOLTE terminal in the form of a voice stream.
- the identification, translation, and synthesis processing of the voice information of the original first language may also be performed by one server.
- the first translation processing module 20 transmits the voice information of the original first language to the server, and the server identifies, translates, and synthesizes the voice information and returns it to the VOLTE terminal.
- the identification, translation, and synthesis processing of the speech information of the original first language may also be performed by two servers.
- The first translation processing module 20 sends the voice information of the original first language to the first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized and translated information to the second server, and the second server synthesizes it into voice information and returns it to the VOLTE terminal.
- Alternatively, the first translation processing module 20 sends the voice information of the original first language to the first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized information to the second server, and the second server translates and synthesizes it and returns the voice information to the VOLTE terminal.
- The voice call device of the embodiment of the present invention sends the collected voice information of the local user to the server for translation processing, where it is translated into voice information that the peer user can recognize, and then sends the translated voice information to the peer end, so that the peer user can understand the voice of the local user. The translation function is thus added to the communication terminal, so that users of different languages can realize remote voice communication; this solves the technical problem that users of different languages cannot perform remote voice communication through a communication terminal, reduces the communication cost, and improves the user experience.
- the device includes a second information receiving module 50, a second translation processing module 60, a third information receiving module 70, and an information output module 80, wherein:
- the second information receiving module 50 is configured to receive the voice information of the original second language sent by the peer end.
- the second information receiving module 50 receives, through the voice channel, the voice information of the original second language sent by the peer end acting as the transmitting end.
- the second translation processing module 60 is arranged to transmit the voice information of the original second language to the server for translation processing, so that the server translates the voice information of the original second language into the voice information of the final first language.
- the second translation processing module 60 may transmit the voice information of the original second language to the server directly as a voice data stream. Preferably, the second translation processing module 60 sends the voice information of the original second language to the server in segments, in the form of data packets. For example, the second translation processing module 60 first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file in turn to the server in the form of a data packet.
- the server includes a voice recognition server, a translation server, and a speech synthesis server.
- the VOLTE terminal establishes an IP-based communication connection with the voice recognition server and sets the recognition information through the first setting module, that is, the language types to be recognized, which include the language type of the peer end (the second language) and may further include the language type of the local end (the first language); it establishes an IP-based communication connection with the translation server and sets the translation information through the second setting module, that is, the languages to be translated, which include the mapping from the peer end to the local end and may further include the mapping from the local end to the peer end; and it establishes an IP-based communication connection with the speech synthesis server and sets the synthesis information through the third setting module, that is, the type of speech synthesis, such as a male or female voice, the speech rate, and the like.
- the second translation processing module 60 includes a fourth transmitting unit 61, a third receiving unit 62, a fifth transmitting unit 63, a fourth receiving unit 64, and a sixth transmitting unit 65, where:
- the fourth transmitting unit 61 is arranged to transmit the voice information of the original second language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the second language.
- the fourth transmitting unit 61 first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file in turn to the voice recognition server in the form of a data packet. After receiving a voice file, the voice recognition server recognizes it, according to the preset recognition information, as a character string of the second language and returns the character string of the second language to the VOLTE terminal.
- the third receiving unit 62 is arranged to receive the character string of the second language returned by the voice recognition server.
- the fifth transmitting unit 63 is arranged to transmit the character string of the second language to the translation server, so that the translation server translates the character string of the second language into a character string of the first language. After the character string of the second language is received, the fifth transmitting unit 63 transmits it to the translation server.
- after receiving the character string of the second language, the translation server translates it, according to the preset translation information, into a character string of the first language and returns the character string of the first language to the VOLTE terminal.
- the fourth receiving unit 64 is arranged to receive the character string of the first language returned by the translation server.
- the sixth transmitting unit 65 is arranged to transmit the character string of the first language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the first language into the voice information of the final first language. After the character string of the first language is received, the sixth transmitting unit 65 transmits it to the speech synthesis server.
- after receiving the character string of the first language, the speech synthesis server synthesizes it, according to the preset synthesis information, into the voice information of the final first language, and returns that voice information to the VOLTE terminal in the form of a voice code stream.
- the recognition, translation, and synthesis of the voice information of the original second language may also be performed by a single server.
- for example, the second translation processing module 60 transmits the voice information of the original second language to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal.
- the recognition, translation, and synthesis of the voice information of the original second language may also be performed by two servers.
- for example, the second translation processing module 60 sends the voice information of the original second language to the first server, and the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the information obtained after recognition and translation to the second server, and the second server synthesizes it and returns the result to the VOLTE terminal.
- as another example, the second translation processing module 60 sends the voice information of the original second language to the first server, and the first server recognizes the voice information and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the information obtained after recognition to the second server, and the second server translates and synthesizes it and returns the result to the VOLTE terminal.
- the third information receiving module 70 is configured to receive the voice information of the final first language returned by the server.
- the information output module 80 is configured to output the voice information of the final first language. After the voice information of the final first language returned by the server is received, the information output module 80 processes it through the audio path and finally outputs it through a sounding device (handset, speaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.
- the voice call device of the embodiment of the present invention transmits the received voice information of the peer user to the server for translation processing, so that it is translated into voice information that the local user can understand, and then outputs the translated voice information, so that the local user can understand the speech of the peer user.
- a translation function is thereby added to the communication terminal, so that users who speak different languages can communicate by voice remotely. This solves the technical problem that users who speak different languages cannot carry out remote voice communication through communication terminals, reduces communication costs, and improves the user experience.
- the voice call devices of the foregoing first embodiment and second embodiment may be combined to form the voice call device of the third embodiment.
- the voice call device can not only translate the voice information collected at the local end and then send it to the peer end, but can also translate the voice information sent by the peer end and then output it, so that even if the peer end is an ordinary voice terminal, remote voice communication between users who speak different languages can still be achieved, which greatly expands the scope of application and further reduces communication costs.
- the voice call device of this embodiment can be applied to the application scenarios shown in FIG. 2 to FIG. 4.
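The combined device of the third embodiment can be pictured as one object that handles both directions of a call. The following is a minimal sketch under stated assumptions: the class `VoiceCallDevice`, its method names, and the `toy_server` stand-in are hypothetical illustrations and do not appear in the patent. The server is modeled as a single callable that takes voice data plus a source and a target language.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical server interface: (voice, source language, target language) -> voice
TranslateVoice = Callable[[bytes, str, str], bytes]


@dataclass
class VoiceCallDevice:
    """Illustrative model of a device that translates in both directions."""
    first_language: str    # language of the local user
    second_language: str   # language of the peer user
    server: TranslateVoice
    sent_to_peer: List[bytes] = field(default_factory=list)
    played_locally: List[bytes] = field(default_factory=list)

    def on_local_speech(self, voice: bytes) -> None:
        # First-embodiment path: collect, translate, send to the peer end.
        translated = self.server(voice, self.first_language, self.second_language)
        self.sent_to_peer.append(translated)

    def on_peer_speech(self, voice: bytes) -> None:
        # Second-embodiment path: receive, translate, output locally.
        translated = self.server(voice, self.second_language, self.first_language)
        self.played_locally.append(translated)


def toy_server(voice: bytes, source: str, target: str) -> bytes:
    """Stand-in for real translation: tags the audio with the direction."""
    return f"[{source}->{target}] ".encode("utf-8") + voice
```

Because both translation directions run on the local device in this sketch, the peer can be an ordinary voice terminal, which matches the benefit the third embodiment claims.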
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Information Transfer Between Computers (AREA)
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention discloses a voice call method and device. The method comprises the following steps: collecting voice information of an original first language; sending the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language; receiving the voice information of the final second language returned by the server; and sending the voice information of the final second language to the peer end.
Description
Title of invention: Voice call method and device
Technical field
[0001] The present invention relates to the field of communications technologies, and in particular to a voice call method and device.
Background art
[0002] As communication terminals become ever more widely used, people can use them for many functions, such as listening to music, watching videos, and making voice calls. The voice call is a basic and commonly used function of a communication terminal: even people thousands of miles apart can talk to each other remotely through communication terminals, which in effect shortens the distance between people.
[0003] At the same time, with the globalization and internationalization of the economy, exchanges between people in different countries are becoming increasingly close. People in different countries usually speak different languages. When at least one of two users cannot understand the other's language, and the other user cannot speak that user's language either, the two users cannot carry out remote voice communication through communication terminals. They must talk face to face and rely on a human interpreter or a translation machine, which narrows the channels of communication and raises communication costs.
Technical problem
[0004] Therefore, how to enable remote voice communication through communication terminals for users who speak different languages is a technical problem that urgently needs to be solved.
Solution to the problem
Technical solution
[0005] The main object of the present invention is to provide a voice call method and device, aiming to solve the technical problem that users who speak different languages cannot carry out remote voice communication through communication terminals.
[0006] To achieve the above object, an embodiment of the present invention provides a voice call method, the method comprising the following steps: collecting voice information of an original first language; sending the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the first language into voice information of a final second language; receiving the voice information of the final second language returned by the server; and sending the voice information of the final second language to the peer end.
[0007] Based on the same inventive concept, an embodiment of the present invention further provides a voice call method, the method comprising the following steps: receiving voice information of an original second language sent by the peer end; sending the voice information of the original second language to a server for translation processing, so that the server translates the voice information of the second language into voice information of a final first language; receiving the voice information of the final first language returned by the server; and outputting the voice information of the final first language.
[0008] An embodiment of the present invention also provides a voice call device. The device comprises an information collection module, a first translation processing module, a first information receiving module, and an information sending module. The information collection module is configured to collect voice information of an original first language; the first translation processing module is configured to send the voice information of the original first language to a server for translation processing, so that the server translates the voice information of the original first language into voice information of a final second language; the first information receiving module is configured to receive the voice information of the final second language returned by the server; and the information sending module is configured to send the voice information of the final second language to the peer end.
Advantageous effects of the invention
Advantageous effects
[0009] In the voice call method provided by the embodiments of the present invention, the collected voice information of the local user is sent to a server for translation processing, so that it is translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer end, so that the peer user can understand the speech of the local user. A translation function is thereby added to the communication terminal, so that users who speak different languages can communicate by voice remotely. This solves the technical problem that users who speak different languages cannot carry out remote voice communication through communication terminals, reduces communication costs, and improves the user experience.
Brief description of the drawings
Description of the drawings
[0010] FIG. 1 is a system block diagram of one application scenario in which the voice call method of an embodiment of the present invention is implemented;
[0011] FIG. 2 is a system block diagram of another application scenario in which the voice call method of an embodiment of the present invention is implemented;
[0012] FIG. 3 is a system block diagram of another application scenario in which the voice call method of an embodiment of the present invention is implemented;
[0013] FIG. 4 is a system block diagram of another application scenario in which the voice call method of an embodiment of the present invention is implemented;
[0014] FIG. 5 is a schematic block diagram of a first embodiment of the voice call device of the present invention;
[0015] FIG. 6 is a schematic block diagram of the first translation processing module in FIG. 5;
[0016] FIG. 7 is a schematic block diagram of a second embodiment of the voice call device of the present invention;
[0017] FIG. 8 is a schematic block diagram of the second translation processing module in FIG. 7;
[0018] FIG. 9 is a schematic block diagram of a third embodiment of the voice call device of the present invention.
Best embodiment for carrying out the invention
Best mode of the invention
[0019] It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
[0020] The voice call method and device of the embodiments of the present invention are mainly applied to a VOLTE terminal, that is, a communication terminal based on VOLTE (Voice over LTE) technology. VoLTE is an IP data transmission technology that does not require a 2G/3G network: all services are carried on the 4G network, so that data and voice services are unified under the same network. Of course, the method and device can also be applied to communication terminals based on other IP data transmission technologies, provided that they can unify data and voice services under the same network; the present invention places no limitation on this.
[0021] In a first embodiment of the voice call method of the present invention, the method comprises the following steps:
[0022] S11. Collect voice information of the original first language.
[0023] In the embodiments of the present invention, the language used by the user of the VOLTE terminal is defined as the first language, and the language used by the peer user is defined as the second language. When the VOLTE terminal acts as the transmitting end, it collects the user's voice information in the first language through the microphone.
[0024] S12. Send the voice information of the original first language to the server for translation processing, so that the server translates the voice information of the original first language into voice information of the final second language.
[0025] The VOLTE terminal may send the voice information of the original first language to the server directly as a voice data stream. Preferably, the VOLTE terminal sends the voice information of the original first language to the server in segments, in the form of data packets. For example, the VOLTE terminal first records the voice information of the original first language as individual voice files and caches them, and then sends each cached voice file in turn to the server in the form of a data packet.
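The preferred packetized transfer can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify a segment size, a packet layout, or any of the names used here (`VoicePacket`, `segment_voice`, `send_in_order`, `reassemble`), all of which are hypothetical. Fixed-size byte segments stand in for the recorded voice files.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class VoicePacket:
    """One cached voice segment, ready to be sent to the server."""
    seq: int        # position of the segment within the utterance
    payload: bytes  # raw audio bytes of this segment
    last: bool      # True only for the final segment


def segment_voice(audio: bytes, segment_size: int) -> List[VoicePacket]:
    """Cut the recorded audio into fixed-size segments and cache them in order."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    chunks = [audio[i:i + segment_size] for i in range(0, len(audio), segment_size)]
    return [
        VoicePacket(seq=n, payload=chunk, last=(n == len(chunks) - 1))
        for n, chunk in enumerate(chunks)
    ]


def send_in_order(packets: Iterable[VoicePacket]) -> Iterator[VoicePacket]:
    """Yield the cached packets one after another, lowest sequence number first."""
    yield from sorted(packets, key=lambda p: p.seq)


def reassemble(packets: Iterable[VoicePacket]) -> bytes:
    """Server-side counterpart: rebuild the audio from the received packets."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))
```

The sequence number and the `last` flag are design choices of this sketch: they let a receiver restore the order and detect the end of an utterance even if packets arrive out of order.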
[0026] Translation processing mainly comprises three processes: recognition, translation, and synthesis. These three processes may be completed by one server, or by two or three servers.
[0027] In this embodiment of the present invention, the servers comprise a voice recognition server, a translation server, and a speech synthesis server. The VOLTE terminal establishes an IP-based communication connection with the voice recognition server and sets the recognition information, that is, the language types to be recognized, which include the language type of the local end (the first language) and may further include the language type of the peer end (the second language). It establishes an IP-based communication connection with the translation server and sets the translation information, that is, the languages to be translated, which include the mapping from the local end to the peer end and may further include the mapping from the peer end to the local end. It establishes an IP-based communication connection with the speech synthesis server and sets the synthesis information, that is, the type of speech synthesis, such as a male or female voice, the speech rate, and the like.
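The three kinds of preset information described above could be held in a small configuration structure. The sketch below is an assumption-laden illustration: the class names, field names, language codes, and default values are hypothetical and are not defined by the patent, which only names the kinds of settings (language types, language mappings, voice type, and speech rate).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognitionSettings:
    languages: List[str]  # language types the recognition server should recognize


@dataclass
class TranslationSettings:
    mappings: List[Tuple[str, str]]  # (source language, target language) pairs


@dataclass
class SynthesisSettings:
    voice: str = "female"  # e.g. male or female voice (default is an assumption)
    rate: float = 1.0      # relative speech rate (default is an assumption)


@dataclass
class CallTranslationConfig:
    recognition: RecognitionSettings
    translation: TranslationSettings
    synthesis: SynthesisSettings


def build_config(local_lang: str, peer_lang: str, bidirectional: bool = False,
                 voice: str = "female", rate: float = 1.0) -> CallTranslationConfig:
    """Always include the local-to-peer direction; add the reverse on request."""
    languages = [local_lang]
    mappings = [(local_lang, peer_lang)]
    if bidirectional:
        languages.append(peer_lang)
        mappings.append((peer_lang, local_lang))
    return CallTranslationConfig(
        recognition=RecognitionSettings(languages),
        translation=TranslationSettings(mappings),
        synthesis=SynthesisSettings(voice=voice, rate=rate),
    )
```

For a call in which only the local speech is translated, `build_config("zh", "en")` yields a single mapping; passing `bidirectional=True` adds the peer language and the reverse mapping, which corresponds to the optional settings the paragraph mentions.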
[0028] In step S12, the specific process by which the VOLTE terminal sends the voice information of the original first language to the servers for translation processing is as follows:
[0029] S121. Send the voice information of the original first language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the first language.
[0030] The VOLTE terminal first records the voice information of the original first language as individual voice files and caches them, and then sends each cached voice file in turn to the voice recognition server in the form of a data packet. After receiving a voice file, the voice recognition server recognizes it, according to the preset recognition information, as a character string of the first language and returns the character string of the first language to the VOLTE terminal.
[0031] S122. Receive the character string of the first language returned by the voice recognition server.
[0032] S123. Send the character string of the first language to the translation server, so that the translation server translates the character string of the first language into a character string of the second language.
[0033] After receiving the character string of the first language, the VOLTE terminal sends it to the translation server. After receiving the character string of the first language, the translation server translates it, according to the preset translation information, into a character string of the second language and returns the character string of the second language to the VOLTE terminal.
[0034] S124. Receive the character string of the second language returned by the translation server.
[0035] S125. Send the character string of the second language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the second language into the voice information of the final second language.
[0036] After receiving the character string of the second language, the VOLTE terminal sends it to the speech synthesis server. After receiving the character string of the second language, the speech synthesis server synthesizes it, according to the preset synthesis information, into the voice information of the final second language, and returns that voice information to the VOLTE terminal in the form of a voice code stream.
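Steps S121 to S125 form a three-stage chain in which the terminal relays each intermediate result to the next server. The sketch below illustrates that control flow only. The three servers are modeled as plain callables, and the toy stand-ins (`toy_recognize`, `toy_translate`, `toy_synthesize`, and the `PHRASES` table) are hypothetical placeholders, not real speech or translation services.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]   # voice -> character string (first language)
Translator = Callable[[str], str]     # first-language string -> second-language string
Synthesizer = Callable[[str], bytes]  # second-language string -> voice


def translate_voice(voice: bytes, recognize: Recognizer,
                    translate: Translator, synthesize: Synthesizer) -> bytes:
    """Terminal-side relay through the three servers."""
    text_first = recognize(voice)         # S121 send voice, S122 receive string
    text_second = translate(text_first)   # S123 send string, S124 receive string
    return synthesize(text_second)        # S125 send string, receive final voice


# Toy stand-ins so the sketch runs without any real service.
PHRASES = {"ni hao": "hello"}


def toy_recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def toy_translate(text: str) -> str:
    return PHRASES.get(text, text)


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

Calling `translate_voice(b"ni hao", toy_recognize, toy_translate, toy_synthesize)` walks through all five steps with the toy servers. In a real deployment each callable would wrap a network request to the corresponding server.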
[0037] In other embodiments, the recognition, translation, and synthesis of the voice information of the original first language may also be completed by a single server. For example, the VOLTE terminal sends the voice information of the original first language to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, the recognition, translation, and synthesis of the voice information of the original first language may be completed by two servers. For example, the VOLTE terminal sends the voice information of the original first language to the first server, and the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the information obtained after recognition and translation to the second server, and the second server synthesizes it and returns the result to the VOLTE terminal. As another example, the VOLTE terminal sends the voice information of the original first language to the first server, and the first server recognizes the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the information obtained after recognition to the second server, and the second server translates and synthesizes it and returns the result to the VOLTE terminal.
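The one-, two-, and three-server variants differ only in how the three stages are grouped; the terminal relays whatever each server returns to the next one. The following sketch makes that grouping explicit. All names (`make_server`, `run_via_terminal`) and the toy stages are hypothetical, and the count of round trips is an illustrative by-product of this model rather than a figure taken from the patent.

```python
from typing import Any, Callable, Sequence, Tuple

Stage = Callable[[Any], Any]


def make_server(stages: Sequence[Stage]) -> Stage:
    """A server runs the stages assigned to it back to back."""
    def serve(data: Any) -> Any:
        for stage in stages:
            data = stage(data)
        return data
    return serve


def run_via_terminal(data: Any, servers: Sequence[Stage]) -> Tuple[Any, int]:
    """The terminal sends data to each server in turn and relays the result."""
    round_trips = 0
    for server in servers:
        data = server(data)
        round_trips += 1
    return data, round_trips


# Toy stages standing in for recognition, translation, and synthesis.
def recognize(voice: bytes) -> str:
    return voice.decode("utf-8")


def translate(text: str) -> str:
    return text.upper()  # placeholder for a real translation


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

With `stages = [recognize, translate, synthesize]`, the single-server layout is `[make_server(stages)]`, and the two layouts from the paragraph above are `[make_server(stages[:2]), make_server(stages[2:])]` and `[make_server(stages[:1]), make_server(stages[1:])]`. Every layout produces the same final output in this model; only the number of terminal-to-server exchanges changes.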
[0038] S13. Receive the voice information of the final second language returned by the server.
[0039] S14. Send the voice information of the final second language to the peer end.
[0040] After receiving the voice information of the final second language returned by the server, the VOLTE terminal sends it to the peer end through the voice channel. After receiving the voice information of the final second language, the peer end processes it through the audio path and finally outputs it through a sounding device (handset, speaker, etc.), so that the peer user, who uses the second language, can understand what the local user said.
[0041] In the voice call method of the embodiment of the present invention, the collected voice information of the local user is sent to the server for translation processing, so that it is translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer end, so that the peer user can understand the speech of the local user. A translation function is thereby added to the communication terminal, so that users who speak different languages can communicate by voice remotely. This solves the technical problem that users who speak different languages cannot carry out remote voice communication through communication terminals, reduces communication costs, and improves the user experience.
[0042] Further, in a second embodiment of the voice call method of the present invention, the following steps are also included after step S14:
[0043] S15. Receive voice information of the original second language sent by the peer end.
[0044] When the VOLTE terminal acts as the receiving end, it receives, through the voice channel, the voice information of the original second language sent by the peer end acting as the transmitting end.
[0045] S16. Send the voice information of the original second language to the server for translation processing, so that the server translates the voice information of the original second language into voice information of the final first language.
[0046] The VOLTE terminal may send the voice information of the original second language to the server directly as a voice data stream. Preferably, the VOLTE terminal sends the voice information of the original second language to the server in segments, in the form of data packets. For example, the VOLTE terminal first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file in turn to the server in the form of a data packet.
[0047] In this embodiment of the present invention, the servers comprise a voice recognition server, a translation server, and a speech synthesis server. In step S16, the specific process by which the VOLTE terminal sends the voice information of the original second language to the servers for translation processing is as follows:
[0048] S161. Send the voice information of the original second language to the voice recognition server, so that the voice recognition server recognizes the voice information as a character string of the second language.
[0049] The VOLTE terminal first records the voice information of the original second language as individual voice files and caches them, and then sends each cached voice file in turn to the voice recognition server in the form of a data packet. After receiving a voice file, the voice recognition server recognizes it, according to the preset recognition information, as a character string of the second language and returns the character string of the second language to the VOLTE terminal.
[0050] S162. Receive the character string of the second language returned by the voice recognition server.
[0051] S163. Send the character string of the second language to the translation server, so that the translation server translates the character string of the second language into a character string of the first language.
[0052] After receiving the character string of the second language, the VOLTE terminal sends it to the translation server. After receiving the character string of the second language, the translation server translates it, according to the preset translation information, into a character string of the first language and returns the character string of the first language to the VOLTE terminal.
[0053] S164. Receive the character string of the first language returned by the translation server.
[0054] S165. Send the character string of the first language to the speech synthesis server, so that the speech synthesis server synthesizes the character string of the first language into the voice information of the final first language.
[0055] VOLTE终端接收到第一语言的字符串后, 将第一语言的字符串发送给语音合 成服务器。 语音合成服务器接收到第一语言的字符串后, 根据预设的合成信息 对第一语言的字符串进行合成处理, 合成为最终第一语言的语音信息, 并将最 终第一语言的语音信息以语音码流的形式返回给 VOLTE终端。 [0055] After receiving the character string in the first language, the VOLTE terminal sends the character string of the first language to the voice combination Become a server. After receiving the character string in the first language, the speech synthesis server synthesizes the character string of the first language according to the preset synthesis information, synthesizes the voice information into the final first language, and uses the voice information of the final first language to The form of the voice stream is returned to the VOLTE terminal.
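Steps S161 to S165 amount to the terminal chaining three request/response exchanges. The sketch below models each server as a plain callable; the function name `translate_incoming_voice`, the stub servers, and the tiny phrase table are hypothetical stand-ins, not interfaces defined by the patent.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]    # voice file -> second-language string
Translator = Callable[[str], str]      # second-language -> first-language string
Synthesizer = Callable[[str], bytes]   # first-language string -> voice code stream


def translate_incoming_voice(voice_file: bytes,
                             recognize: Recognizer,
                             translate: Translator,
                             synthesize: Synthesizer) -> bytes:
    """Terminal-side orchestration of the three servers."""
    second_language_text = recognize(voice_file)            # S161, S162
    first_language_text = translate(second_language_text)   # S163, S164
    return synthesize(first_language_text)                  # S165


# Stub servers for illustration only (assumed behaviour, not real services).
phrase_table = {"bonjour": "hello"}
final_voice = translate_incoming_voice(
    b"bonjour",
    recognize=lambda voice: voice.decode("utf-8"),
    translate=lambda text: phrase_table[text],
    synthesize=lambda text: text.encode("utf-8"),
)
```

Each intermediate result returns to the terminal before the next request is made, which matches the description that every server replies to the VOLTE terminal rather than forwarding to the next server.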
[0056] In other embodiments, a single server may perform the recognition, translation, and synthesis of the original second-language voice information. For example, the VOLTE terminal sends the original second-language voice information to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, two servers may perform the recognition, translation, and synthesis of the original second-language voice information. For example, the VOLTE terminal sends the original second-language voice information to a first server; the first server performs recognition and translation on the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized and translated information to a second server; and the second server performs synthesis on it and returns the result to the VOLTE terminal. As another example, the VOLTE terminal sends the original second-language voice information to a first server; the first server performs recognition on the voice information and returns the result to the VOLTE terminal; the VOLTE terminal then sends the recognized information to a second server; and the second server performs translation and synthesis on it and returns the result to the VOLTE terminal.
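The one-, two-, and three-server variants differ only in how the recognition, translation, and synthesis stages are grouped. A minimal sketch, assuming each server simply applies the stages it hosts in order (the helpers `make_server` and `run_deployment` and the stub stages are hypothetical):

```python
from typing import Callable, Sequence

Stage = Callable[[object], object]


def make_server(stages: Sequence[Stage]) -> Stage:
    """Model one server as the in-order composition of the stages it hosts."""
    def handle(payload):
        for stage in stages:
            payload = stage(payload)
        return payload
    return handle


def run_deployment(payload, servers: Sequence[Stage]):
    """The terminal contacts each server in turn; returns (result, round trips)."""
    round_trips = 0
    for server in servers:
        payload = server(payload)
        round_trips += 1
    return payload, round_trips


# Stub stages for illustration only.
recognize = lambda voice: voice.decode("utf-8")
translate = lambda text: text.upper()
synthesize = lambda text: text.encode("utf-8")

one_server = [make_server([recognize, translate, synthesize])]
two_servers_a = [make_server([recognize, translate]), make_server([synthesize])]
two_servers_b = [make_server([recognize]), make_server([translate, synthesize])]
three_servers = [make_server([recognize]), make_server([translate]), make_server([synthesize])]

results = [run_deployment(b"hi", d)
           for d in (one_server, two_servers_a, two_servers_b, three_servers)]
```

Under these assumptions every grouping produces the same final output; only the number of terminal-to-server round trips changes (1, 2, 2, and 3 respectively).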
[0057] S17. Receive the final first-language voice information returned by the server.
[0058] S18. Output the final first-language voice information.
[0059] After receiving the final first-language voice information returned by the server, the VOLTE terminal processes it through the audio path and finally outputs it through a sound-producing device (earpiece, loudspeaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.
[0060] In this embodiment, the voice information received from the peer user is additionally sent to the server for translation processing and translated into voice information that the local user can understand, and the translated voice information is then output, so that the local user can understand the peer user's speech. Thus, even if the peer is an ordinary terminal, users of different languages can communicate by voice remotely, which greatly broadens the range of application and further reduces communication costs.
[0061] A third embodiment of the voice call method of the present invention is proposed. The method includes the following steps:
[0062] S21. Receive the original second-language voice information sent by the peer.
[0063] S22. Send the original second-language voice information to the server for translation processing, so that the server translates the second-language voice information into final first-language voice information.
[0064] S23. Receive the final first-language voice information returned by the server.
[0065] S24. Output the final first-language voice information.
[0066] In this embodiment, steps S21 to S24 are respectively the same as steps S15 to S18 in the second embodiment and are not described again here.
[0067] In the voice call method of this embodiment of the present invention, the voice information received from the peer user is sent to the server for translation processing and translated into voice information that the local user can understand, and the translated voice information is then output, so that the local user can understand the peer user's speech. A translation function is thereby added to the communication terminal, so that users of different languages can communicate by voice remotely. This solves the technical problem that users of different languages cannot communicate by voice remotely through a communication terminal, reduces communication costs, and improves the user experience.
[0068] Further, in a fourth embodiment of the voice call method of the present invention, the following steps are also included after step S24:
[0069] S25. Collect original first-language voice information.
[0070] S26. Send the original first-language voice information to the server for translation processing, so that the server translates the first-language voice information into final second-language voice information.
[0071] S27. Receive the final second-language voice information returned by the server.
[0072] S28. Send the final second-language voice information to the peer.
[0073] In this embodiment, steps S25 to S28 are respectively the same as steps S11 to S14 in the first embodiment and are not described again here.
[0074] In this embodiment, the collected voice information of the local user is additionally sent to the server for translation processing and translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer, so that the peer user can understand the local user's speech. Thus, even if the peer is an ordinary terminal, users of different languages can communicate by voice remotely, which greatly broadens the range of application and further reduces communication costs.
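When a single terminal handles both directions (steps S21 to S28), it plays translated incoming speech and sends translated outgoing speech. The following is a rough sketch under stated assumptions: the class `TranslatingTerminal`, its method names, and the byte-prefix stubs standing in for the server round trips are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

VoiceTranslator = Callable[[bytes], bytes]


@dataclass
class TranslatingTerminal:
    to_first_language: VoiceTranslator    # server path: second -> first language
    to_second_language: VoiceTranslator   # server path: first -> second language
    played: List[bytes] = field(default_factory=list)  # stands in for the earpiece
    sent: List[bytes] = field(default_factory=list)    # stands in for the voice channel

    def on_voice_from_peer(self, voice: bytes) -> None:
        """S21-S24: receive, translate via the server, then output locally."""
        self.played.append(self.to_first_language(voice))

    def on_voice_from_microphone(self, voice: bytes) -> None:
        """S25-S28: collect, translate via the server, then send to the peer."""
        self.sent.append(self.to_second_language(voice))


# Stub translators that just tag the audio, for illustration only.
terminal = TranslatingTerminal(
    to_first_language=lambda v: b"L1:" + v,
    to_second_language=lambda v: b"L2:" + v,
)
terminal.on_voice_from_peer(b"question")
terminal.on_voice_from_microphone(b"answer")
```

Because all translation happens on this terminal, the peer only ever receives and sends ordinary voice, which is why the peer can be an ordinary terminal.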
[0075] In the embodiments of the present invention, the first embodiment and the third embodiment may be applied to the application scenario shown in FIG. 1, in which VOLTE terminal A and VOLTE terminal B establish a connection through an IP Multimedia Subsystem (IMS) network, and VOLTE terminal A and VOLTE terminal B are each connected to the voice recognition server, the translation server, and the speech synthesis server. VOLTE terminal A and VOLTE terminal B both use the voice call method of the first embodiment or the second embodiment to conduct the voice call, so that users of different languages can communicate by voice remotely.
[0076] The second embodiment and the fourth embodiment may be applied to the application scenarios shown in FIG. 2 to FIG. 4. In FIG. 2, VOLTE terminal A and voice terminal B establish a connection through the IMS network, and VOLTE terminal A is connected to the voice recognition server, the translation server, and the speech synthesis server. VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users of different languages can communicate by voice remotely. In FIG. 3, VOLTE terminal A connects through the IMS network to the gateway between the IMS network and the 2G/3G network, voice terminal B connects through the 2G/3G network to the same gateway, and VOLTE terminal A is connected to the voice recognition server, the translation server, and the speech synthesis server. VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users of different languages can communicate by voice remotely. In FIG. 4, VOLTE terminal A connects through the IMS network to the gateway between the IMS network and the Public Switched Telephone Network (PSTN), voice terminal B connects through the PSTN to the same gateway, and VOLTE terminal A is connected to the voice recognition server, the translation server, and the speech synthesis server. VOLTE terminal A uses the voice call method of the second embodiment or the third embodiment to conduct a voice call with voice terminal B, so that users of different languages can communicate by voice remotely.
[0077] The processing delay of the voice recognition server is generally less than 3 seconds, the processing delay of the translation server is generally less than 200 milliseconds, the processing delay of the speech synthesis server is generally less than 200 milliseconds, and the transmission delay of the IMS network is generally on the order of seconds. Therefore, by exploiting the high data rate and low latency of LTE communication, multilingual real-time translation during a voice call is implemented on the VOLTE terminal. The voice translation processing is fast and its delay is small, so the user's call is not affected, and users of different languages can communicate by voice remotely without barriers.
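As a quick arithmetic check of the figures in [0077], the three server-side bounds can be summed. The sketch below covers only the server processing delays quoted in the text; it deliberately leaves out the IMS transmission delay and the terminal-to-server round trips, for which the text gives no precise figure. The variable names are illustrative.

```python
# Typical per-stage processing delay bounds quoted in the text, in milliseconds.
TYPICAL_DELAY_BOUND_MS = {
    "voice recognition": 3000,
    "translation": 200,
    "speech synthesis": 200,
}

# Upper bound on server-side processing if the three stages run one after another.
total_processing_ms = sum(TYPICAL_DELAY_BOUND_MS.values())

# The stage that contributes most to that bound.
slowest_stage = max(TYPICAL_DELAY_BOUND_MS, key=TYPICAL_DELAY_BOUND_MS.get)

print(total_processing_ms, slowest_stage)  # prints: 3400 voice recognition
```

On these numbers, voice recognition accounts for most of the roughly 3.4-second processing bound, so it is the stage whose delay matters most for the perceived lag.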
[0078] Referring to FIG. 5, a first embodiment of the voice call device of the present invention is proposed. The device includes an information collection module 10, a first translation processing module 20, a first information receiving module 30, and an information sending module 40, where:
[0079] The information collection module 10 is configured to collect original first-language voice information. The first translation processing module 20 is configured to send the original first-language voice information to the server for translation processing, so that the server translates the original first-language voice information into final second-language voice information. The first information receiving module 30 is configured to receive the final second-language voice information returned by the server. The information sending module 40 is configured to send the final second-language voice information to the peer. In this embodiment of the present invention, the language used by the VOLTE terminal user is the first language, and the language used by the peer user is the second language. When the VOLTE terminal acts as the transmitting end, the information collection module 10 collects the user's original first-language voice information through the microphone. The first translation processing module 20 may send the original first-language voice information to the server directly as a voice data stream. Preferably, the first translation processing module 20 sends the original first-language voice information to the server in packets. For example, the first translation processing module 20 first records the original first-language voice information as a series of voice files and caches them, and then sends each cached voice file to the server in turn as a data packet.
[0080] Translation processing mainly comprises three procedures: recognition, translation, and synthesis. These three procedures may be performed by one server, or by two or three servers.
[0081] In this embodiment of the present invention, the server includes a voice recognition server, a translation server, and a speech synthesis server. The VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the recognition information through a first setting module, that is, the language type to be recognized, which includes the local language type (the first language) and may further include the peer's language type (the second language). The VOLTE terminal establishes an IP-based connection with the translation server and sets the translation information through a second setting module, that is, the languages to translate between, which includes the local-to-peer mapping and may further include the peer-to-local mapping. The VOLTE terminal establishes an IP-based connection with the speech synthesis server and sets the synthesis information through a third setting module, that is, the type of speech synthesis, such as male or female voice, speaking rate, and so on.
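The three kinds of preset information in [0081] can be pictured as a small configuration record. The sketch below is an assumption-laden illustration: the class `TerminalSettings`, the helper `sender_settings`, the language codes, and the default voice and rate values are hypothetical, since the text does not define a concrete format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TerminalSettings:
    recognition_languages: Tuple[str, ...]             # recognition information
    translation_mappings: Tuple[Tuple[str, str], ...]  # translation information
    synthesis_voice: str                               # synthesis information
    synthesis_rate: float


def sender_settings(local: str, peer: str, both_directions: bool = False,
                    voice: str = "female", rate: float = 1.0) -> TerminalSettings:
    """Build settings for a sending terminal; the reverse direction is optional."""
    languages = (local, peer) if both_directions else (local,)
    mappings = ((local, peer), (peer, local)) if both_directions else ((local, peer),)
    return TerminalSettings(languages, mappings, voice, rate)


# Usage: a first-language ("zh") user calling a second-language ("en") user.
one_way = sender_settings("zh", "en")
two_way = sender_settings("zh", "en", both_directions=True)
```

The optional second entries mirror the wording "may further include": the local language and the local-to-peer mapping are always present, while the peer language and the peer-to-local mapping are added only when the terminal also translates incoming speech.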
[0082] As shown in FIG. 6, the first translation processing module 20 includes a first sending unit 21, a first receiving unit 22, a second sending unit 23, a second receiving unit 24, and a third sending unit 25, where:
[0083] The first sending unit 21 is configured to send the original first-language voice information to the voice recognition server, so that the voice recognition server recognizes the voice information as a first-language character string. The first sending unit 21 first records the original first-language voice information as a series of voice files and caches them, and then sends each cached voice file to the voice recognition server in turn as a data packet. After receiving a voice file, the voice recognition server performs recognition on it according to the preset recognition information, obtains a first-language character string, and returns the first-language character string to the VOLTE terminal. The first receiving unit 22 is configured to receive the first-language character string returned by the voice recognition server. The second sending unit 23 is configured to send the first-language character string to the translation server, so that the translation server translates the first-language character string into a second-language character string. After the first-language character string is received, the second sending unit 23 sends it to the translation server. After receiving the first-language character string, the translation server translates it according to the preset translation information into a second-language character string and returns the second-language character string to the VOLTE terminal. The second receiving unit 24 is configured to receive the second-language character string returned by the translation server. The third sending unit 25 is configured to send the second-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the second-language character string into final second-language voice information. After the second-language character string is received, the third sending unit 25 sends it to the speech synthesis server. After receiving the second-language character string, the speech synthesis server synthesizes it according to the preset synthesis information into final second-language voice information and returns the final second-language voice information to the VOLTE terminal in the form of a voice code stream.
[0084] In other embodiments, a single server may perform the recognition, translation, and synthesis of the original first-language voice information. For example, the first translation processing module 20 sends the original first-language voice information to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, two servers may perform the recognition, translation, and synthesis of the original first-language voice information. For example, the first translation processing module 20 sends the original first-language voice information to a first server; the first server performs recognition and translation on the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized and translated information to a second server; and the second server performs synthesis on it and returns the result to the VOLTE terminal. As another example, the first translation processing module 20 sends the original first-language voice information to a first server; the first server performs recognition on the voice information and returns the result to the VOLTE terminal; the first translation processing module 20 then sends the recognized information to a second server; and the second server performs translation and synthesis on it and returns the result to the VOLTE terminal.
[0085] In the voice call device of this embodiment of the present invention, the collected voice information of the local user is sent to the server for translation processing and translated into voice information that the peer user can understand, and the translated voice information is then sent to the peer, so that the peer user can understand the local user's speech. A translation function is thereby added to the communication terminal, so that users of different languages can communicate by voice remotely. This solves the technical problem that users of different languages cannot communicate by voice remotely through a communication terminal, reduces communication costs, and improves the user experience.
[0086] Referring to FIG. 7, a second embodiment of the voice call device of the present invention is proposed. The device includes a second information receiving module 50, a second translation processing module 60, a third information receiving module 70, and an information output module 80, where: the second information receiving module 50 is configured to receive the original second-language voice information sent by the peer. When the VOLTE terminal acts as the receiving end, the second information receiving module 50 receives, through the voice channel, the original second-language voice information sent by the peer acting as the transmitting end. The second translation processing module 60 is configured to send the original second-language voice information to the server for translation processing, so that the server translates the original second-language voice information into final first-language voice information. The second translation processing module 60 may send the original second-language voice information to the server directly as a voice data stream. Preferably, the second translation processing module 60 sends the original second-language voice information to the server in packets. For example, the second translation processing module 60 first records the original second-language voice information as a series of voice files and caches them, and then sends each cached voice file to the server in turn as a data packet.
[0087] In this embodiment of the present invention, the server includes a voice recognition server, a translation server, and a speech synthesis server. The VOLTE terminal establishes an IP-based connection with the voice recognition server and sets the recognition information through a first setting module, that is, the language type to be recognized, which includes the peer's language type (the second language) and may further include the local language type (the first language). The VOLTE terminal establishes an IP-based connection with the translation server and sets the translation information through a second setting module, that is, the languages to translate between, which includes the peer-to-local mapping and may further include the local-to-peer mapping. The VOLTE terminal establishes an IP-based connection with the speech synthesis server and sets the synthesis information through a third setting module, that is, the type of speech synthesis, such as male or female voice, speaking rate, and so on.
[0088] As shown in FIG. 8, the second translation processing module 60 includes a fourth sending unit 61, a third receiving unit 62, a fifth sending unit 63, a fourth receiving unit 64, and a sixth sending unit 65, where: the fourth sending unit 61 is configured to send the original second-language voice information to the voice recognition server, so that the voice recognition server recognizes the voice information as a second-language character string. The fourth sending unit 61 first records the original second-language voice information as a series of voice files and caches them, and then sends each cached voice file to the voice recognition server in turn as a data packet. After receiving a voice file, the voice recognition server performs recognition on it according to the preset recognition information, obtains a second-language character string, and returns the second-language character string to the VOLTE terminal. The third receiving unit 62 is configured to receive the second-language character string returned by the voice recognition server. The fifth sending unit 63 is configured to send the second-language character string to the translation server, so that the translation server translates the second-language character string into a first-language character string. After the second-language character string is received, the fifth sending unit 63 sends it to the translation server. After receiving the second-language character string, the translation server translates it according to the preset translation information into a first-language character string and returns the first-language character string to the VOLTE terminal. The fourth receiving unit 64 is configured to receive the first-language character string returned by the translation server. The sixth sending unit 65 is configured to send the first-language character string to the speech synthesis server, so that the speech synthesis server synthesizes the first-language character string into final first-language voice information. After the first-language character string is received, the sixth sending unit 65 sends it to the speech synthesis server. After receiving the first-language character string, the speech synthesis server synthesizes it according to the preset synthesis information into final first-language voice information and returns the final first-language voice information to the VOLTE terminal in the form of a voice code stream.
In other embodiments, the recognition, translation, and synthesis of the original second-language voice information may also be performed by a single server. For example, the second translation processing module 60 sends the original second-language voice information to the server, and the server recognizes, translates, and synthesizes the voice information and returns the result to the VOLTE terminal. In still other embodiments, the recognition, translation, and synthesis may be performed by two servers. For example, the second translation processing module 60 sends the original second-language voice information to a first server; the first server recognizes and translates the voice information and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the recognized and translated information to a second server; and the second server synthesizes it and returns the result to the VOLTE terminal. As another example, the second translation processing module 60 sends the original second-language voice information to a first server; the first server recognizes the voice information and returns the result to the VOLTE terminal; the second translation processing module 60 then sends the recognized information to a second server; and the second server translates and synthesizes it and returns the result to the VOLTE terminal.
The third information receiving module 70 is configured to receive the final first-language voice information returned by the server. The information output module 80 is configured to output the final first-language voice information. After the final first-language voice information returned by the server is received, the information output module 80 processes it through the audio path and finally outputs it through a sound-producing device (earpiece, loudspeaker, etc.), so that the local user, who uses the first language, can understand what the peer user said.
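The one-server, two-server, and three-server variants differ only in how the recognition, translation, and synthesis stages are grouped onto servers. A minimal sketch, under the assumption that each server can be modeled as a function bundling consecutive stages, might look like the following. All names here (`make_server`, `run_call`, `deployments`) are hypothetical, and the three stage functions are toy placeholders rather than real speech services.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Stage = Callable[[Any], Any]


# Toy placeholder stages (assumptions for illustration only).
def recognize(voice: bytes) -> str:
    return voice.decode("ascii")


def translate(text: str) -> str:
    return text.upper()


def synthesize(text: str) -> bytes:
    return text.encode("ascii")


def make_server(stages: Sequence[Stage]) -> Stage:
    """Bundle consecutive stages into one hypothetical server (one request, one response)."""
    def server(payload: Any) -> Any:
        for stage in stages:
            payload = stage(payload)
        return payload
    return server


def run_call(voice: bytes, servers: Sequence[Stage]) -> Tuple[Any, int]:
    """Terminal side: send to each server in turn, forwarding each response to the next."""
    payload: Any = voice
    round_trips = 0
    for server in servers:
        payload = server(payload)
        round_trips += 1
    return payload, round_trips


# The four groupings described in the text.
deployments: Dict[str, List[List[Stage]]] = {
    "three servers": [[recognize], [translate], [synthesize]],
    "two servers (recognize+translate, synthesize)": [[recognize, translate], [synthesize]],
    "two servers (recognize, translate+synthesize)": [[recognize], [translate, synthesize]],
    "one server": [[recognize, translate, synthesize]],
}

results = {
    name: run_call(b"hi", [make_server(group) for group in groups])
    for name, groups in deployments.items()
}
for name, (speech, trips) in results.items():
    print(f"{name}: {trips} round trip(s), output {speech!r}")
```

In this model every grouping yields the same final speech; what changes is the number of round trips between the terminal and the servers, which is the trade-off the alternative embodiments expose.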
[0090] The voice call apparatuses of the foregoing first and second embodiments can be applied to the application scenario shown in FIG. 1.
[0091] The voice call apparatus of this embodiment of the present invention sends the received voice information of the peer user to a server for translation into voice information that the local user can understand, and then outputs the translated voice information, so that the local user can understand the peer user's speech. A translation function is thereby added to the communication terminal, so that users of different languages can communicate by voice remotely. This solves the technical problem that users of different languages cannot communicate by voice remotely through a communication terminal, reduces communication costs, and improves the user experience.
[0092] Further, as shown in FIG. 9, the voice call apparatuses of the foregoing first and second embodiments may be combined to form the voice call apparatus of a third embodiment. The resulting apparatus can both translate the voice information collected at the local end before sending it to the peer end and translate the voice information sent by the peer end before outputting it. Remote voice communication between users of different languages is therefore possible even when the peer end is an ordinary voice terminal, which greatly widens the range of application and further reduces communication costs.
[0093] The voice call apparatus of this embodiment can be applied to the application scenarios shown in FIG. 2 to FIG. 4.
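The combined apparatus of the third embodiment translates in both directions on a single terminal. The sketch below illustrates that idea only: `CombinedCallDevice`, `translate_speech`, and the language codes are hypothetical names, and the toy translator merely tags the audio with the direction instead of performing real recognition, translation, and synthesis.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (audio, source_language, target_language) -> translated audio.
TranslateSpeech = Callable[[bytes, str, str], bytes]


@dataclass
class CombinedCallDevice:
    local_lang: str
    peer_lang: str
    translate_speech: TranslateSpeech

    def on_local_speech(self, audio: bytes) -> bytes:
        """Outgoing direction: translate locally captured speech before sending it to the peer."""
        return self.translate_speech(audio, self.local_lang, self.peer_lang)

    def on_peer_speech(self, audio: bytes) -> bytes:
        """Incoming direction: translate speech received from the peer before playing it."""
        return self.translate_speech(audio, self.peer_lang, self.local_lang)


def fake_translate_speech(audio: bytes, src: str, dst: str) -> bytes:
    # Toy stand-in: append a direction tag rather than translating anything.
    return audio + f"[{src}->{dst}]".encode("ascii")


device = CombinedCallDevice("zh", "en", fake_translate_speech)
sent = device.on_local_speech(b"a")
played = device.on_peer_speech(b"b")
print(sent, played)
```

Because both directions are handled on one terminal, the peer in this sketch needs no translation capability of its own, which matches the point that the peer end may be an ordinary voice terminal.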
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710495264.2A CN107343113A (en) | 2017-06-26 | 2017-06-26 | Audio communication method and device |
| CN201710495264.2 | 2017-06-26 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019000515A1 (en) | 2019-01-03 |
Family
ID=60220070
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/093741 Ceased WO2019000515A1 (en) | 2017-06-26 | 2017-07-20 | Voice call method and device |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN107343113A (en) |
| WO (1) | WO2019000515A1 (en) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108228575A (en) * | 2017-12-20 | 2018-06-29 | 科大讯飞股份有限公司 | Voiced translation exchange method and system |
| WO2019134107A1 (en) * | 2018-01-05 | 2019-07-11 | 深圳市沃特沃德股份有限公司 | Method and device for speech-to-speech translation, and translation device |
| CN109446533B (en) * | 2018-09-17 | 2020-12-22 | 深圳市沃特沃德股份有限公司 | Bluetooth translation machine, interactive mode of Bluetooth translation and device thereof |
| CN109286725B (en) | 2018-10-15 | 2021-10-19 | 华为技术有限公司 | Translation method and terminal |
| CN114999535A (en) * | 2018-10-15 | 2022-09-02 | 华为技术有限公司 | Voice data processing method and device in online translation process |
| CN109582976A (en) * | 2018-10-15 | 2019-04-05 | 华为技术有限公司 | A kind of interpretation method and electronic equipment based on voice communication |
| CN109327613B (en) * | 2018-10-15 | 2020-09-29 | 华为技术有限公司 | Negotiation method based on voice call translation capability and electronic equipment |
| CN110111770A (en) * | 2019-05-10 | 2019-08-09 | 濮阳市顶峰网络科技有限公司 | A kind of multilingual social interpretation method of network, system, equipment and medium |
| CN110267309B (en) * | 2019-06-26 | 2022-09-23 | 广州三星通信技术研究有限公司 | Method and equipment for translating call voice in real time |
| CN110442881A (en) * | 2019-08-06 | 2019-11-12 | 上海祥久智能科技有限公司 | A kind of information processing method and device of voice conversion |
| CN113660375B (en) * | 2021-08-11 | 2023-02-03 | 维沃移动通信有限公司 | Call method and device and electronic equipment |
| CN114625336A (en) * | 2022-03-10 | 2022-06-14 | 北京小米移动软件有限公司 | Call method, device, terminal device and storage medium |
| CN115767484B (en) * | 2022-11-07 | 2024-07-09 | 中国联合网络通信集团有限公司 | Call processing method, device, server, system and medium in customer service scene |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102360347A (en) * | 2011-09-30 | 2012-02-22 | 宇龙计算机通信科技(深圳)有限公司 | Voice translation method and system and voice translation server |
| CN104394265A (en) * | 2014-10-31 | 2015-03-04 | 小米科技有限责任公司 | Automatic session method and device based on mobile intelligent terminal |
| CN104754536A (en) * | 2013-12-27 | 2015-07-01 | 中国移动通信集团公司 | Method and system for realizing communication between different languages |
| CN105430208A (en) * | 2015-10-23 | 2016-03-23 | 小米科技有限责任公司 | Voice conversation method and apparatus, and terminal equipment |
| US20160170970A1 (en) * | 2014-12-12 | 2016-06-16 | Microsoft Technology Licensing, Llc | Translation Control |
| CN106453043A (en) * | 2016-09-29 | 2017-02-22 | 安徽声讯信息技术有限公司 | Multi-language conversion-based instant communication system |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101867632A (en) * | 2009-06-12 | 2010-10-20 | 刘越 | Mobile phone speech instant translation system and method |
-
2017
- 2017-06-26 CN CN201710495264.2A patent/CN107343113A/en active Pending
- 2017-07-20 WO PCT/CN2017/093741 patent/WO2019000515A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN107343113A (en) | 2017-11-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019000515A1 (en) | | Voice call method and device |
| US10834252B2 (en) | | Transcribing audio communication sessions |
| US8606249B1 (en) | | Methods and systems for enhancing audio quality during teleconferencing |
| EP3276905B1 (en) | | System for audio communication using lte |
| CN103379232B (en) | | Communication server, communication terminal and voice communication method |
| US20140278402A1 (en) | | Automatic Channel Selective Transcription Engine |
| US11710488B2 (en) | | Transcription of communications using multiple speech recognition systems |
| US10506067B2 (en) | | Dynamic personalization of a communication session in heterogeneous environments |
| WO2016094598A1 (en) | | Translation control |
| CN113395284B (en) | | Multi-scene voice service real-time matching method, system, equipment and storage medium |
| CN114979545A (en) | | Multi-terminal calling method, storage medium and electronic device |
| CN103067188A (en) | | Network phone conference system and implementation method thereof |
| US20090299735A1 (en) | | Method for Transferring an Audio Stream Between a Plurality of Terminals |
| RU2015156799A (en) | | SYSTEM AND METHOD FOR CREATING A WIRELESS TUBE FOR STATIONARY PHONES USING A HOME GATEWAY AND A SMARTPHONE |
| CN111448567A (en) | | Real-time speech processing |
| CN113612759A (en) | | High-performance high-concurrency intelligent broadcasting system based on SIP protocol and implementation method |
| EP2536176B1 (en) | | Text-to-speech injection apparatus for telecommunication system |
| CN107566340B (en) | | Conference auxiliary communication method and storage medium and device thereof |
| CN105407243B (en) | | An Echo Cancellation VOIP System Using Improved Affine Projection Algorithm on Android Platform |
| KR101341893B1 (en) | | Telephone call service apparatus and method for magnetic telephone of roip gateway |
| US10721360B2 (en) | | Method and device for reducing telephone call costs |
| CN116233351A (en) | | Method and system for interactive video conference based on small program |
| HK40073421A (en) | | Multi-terminal communication method, and storage medium, and electronic device |
| KR102413621B1 (en) | | Terminal apparatus and service server for providing information |
| Rothbucher et al. | | Backwards compatible 3d audio conference server using hrtf synthesis and sip |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17916375; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17916375; Country of ref document: EP; Kind code of ref document: A1 |