
CN120434333A - System and method for telephone robot interaction acceleration based on large model - Google Patents

System and method for telephone robot interaction acceleration based on large model

Info

Publication number
CN120434333A
Authority
CN
China
Prior art keywords
tts
large model
telephone
speech synthesis
synthesis engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510359840.5A
Other languages
Chinese (zh)
Inventor
胡家鹰
李全忠
何国涛
蒲瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Original Assignee
Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Priority to CN202510359840.5A
Publication of CN120434333A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention provides a system and a method for accelerating telephone robot interaction based on a large model. The system comprises: a call platform (IVR, interactive voice response), which carries out automated communication services by telephone, controls telephone communication in the telephone robot dialogue system, and forms a complete telephone robot system by interfacing with a speech synthesis engine and a session management service; a speech synthesis engine (TTS, text-to-speech), which converts text information into natural-sounding speech; and a large model (LLM), a deep learning model with a large number of parameters and a complex structure, used for processing large amounts of data in the fields of natural language processing and speech recognition. The IVR is connected to the TTS, and the LLM is connected to the TTS. In the invention, the large model, the speech synthesis engine and call platform playback work simultaneously: session prompt fragments are synthesized one by one, and the resulting voice fragments are played one by one. This optimizes the delay of each link of the telephone robot dialogue system, accelerates both the synthesis link and the playback link, and achieves a low-latency human-computer interaction experience.

Description

System and method for telephone robot interaction acceleration based on large model
Technical Field
The invention relates to the technical field of intelligent telephone voice robots, in particular to a telephone robot interactive acceleration system and method based on a large model.
Background
In outbound-call robot projects based on a large model, the large model understands user input better than a traditional semantic understanding engine, but it takes a long time, usually 5 to 10 seconds, to return its complete result. A traditional telephone robot starts playback only after it has received the complete prompt, so it cannot meet the low-latency requirement of human-computer interaction.
Thus, although the semantic understanding capability of a telephone robot system based on a large model is far superior to that of a traditional semantic understanding engine, the very long latency of large-model calls makes it very difficult to introduce a large model into a telephone voice robot system.
Disclosure of Invention
In view of the above, the present invention aims to provide a system and a method for accelerating telephone robot interaction based on a large model that optimize the delay of each link of the telephone robot dialogue system: the session management service communicates directly with the speech synthesis engine and passes the sentences to be synthesized, as generated by the large model, to the speech synthesis engine sentence by sentence.
The invention provides a telephone robot interactive acceleration system based on a large model, which comprises:
a call platform IVR, used for carrying out automated communication services by telephone, controlling telephone communication in the telephone robot dialogue system, implementing telephone operations such as answering, outbound calling, transferring, playing announcements and collecting digits, and hanging up, and realizing a complete telephone robot system by interfacing with the speech synthesis engine and the session management service;
a speech synthesis engine TTS, used for converting text information into natural-sounding speech;
Specifically, the speech synthesis engine TTS converts the text answers of the telephone robot dialogue system into speech output, so that the user receives information audibly.
The technical options of the speech synthesis engine TTS include:
Concatenative TTS: splicing pre-recorded voice fragments.
Parametric TTS: neural network models such as WaveNet and Tacotron.
The implementation steps of the speech synthesis engine TTS include:
Text analysis: preprocessing the text, for example word segmentation and prosody prediction.
Speech synthesis: generating speech waveforms from the text features.
Voice optimization: adjusting speech rate, pitch and the like to improve naturalness.
a large model LLM, a deep learning model with a large number of parameters and a complex structure, used for processing large amounts of data in the fields of natural language processing and speech recognition;
wherein the call platform IVR is connected to the speech synthesis engine TTS, and the large model LLM is connected to the speech synthesis engine TTS.
Further, the system for phone robot interaction acceleration based on the large model further comprises:
a session management service DM, used for designing the human-machine dialogue flow, configuring the parameters of each node in the flow (prompt, pause duration, hot words and dynamic model), and configuring the rules and models for semantic understanding, thereby controlling the flow of multi-round human-machine interaction;
wherein the call platform IVR is connected to the session management service DM, and the session management service DM is connected to the large model LLM and to the speech synthesis engine TTS respectively.
Specifically, the session management service DM is the core of the dialogue system; it maintains the dialogue state and decides on the next action.
The technical options of the session management service DM include:
Rule-based systems: using predefined rules and state machines.
Machine-learning-based systems: using reinforcement learning or policy networks.
The implementation steps of the session management service DM include:
State maintenance: tracking the dialogue history and the current state.
Decision making: deciding the system response based on the user intent and the context information.
Action execution: calling the corresponding service or API to generate an answer.
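The three implementation steps above (state maintenance, decision making, action execution) can be illustrated with a minimal rule-based sketch. The node names, intents, transition table and prompts below are hypothetical placeholders and are not taken from the patent; a real session management service would call external services or APIs in the action step.

```python
from dataclasses import dataclass, field


@dataclass
class DialogState:
    """State maintenance: dialogue history plus the current node."""
    node: str = "welcome"
    history: list = field(default_factory=list)


# Hypothetical transition table: (current node, user intent) -> next node.
TRANSITIONS = {
    ("welcome", "confirm"): "collect_address",
    ("welcome", "deny"): "goodbye",
    ("collect_address", "confirm"): "goodbye",
}

# Hypothetical prompts; a real system would call a service or API here.
PROMPTS = {
    "welcome": "Welcome. Would you like to book a trip?",
    "collect_address": "Which address should we use?",
    "goodbye": "Thank you, goodbye.",
}


def step(state: DialogState, intent: str) -> str:
    """Run one turn: record the intent, pick the next node, return its prompt."""
    # State maintenance: remember where we were and what the user meant.
    state.history.append((state.node, intent))
    # Decision making: unknown (node, intent) pairs stay on the current node.
    state.node = TRANSITIONS.get((state.node, intent), state.node)
    # Action execution: here simply look up the prompt for the new node.
    return PROMPTS[state.node]
```

Staying on the current node for an unrecognized intent is one possible fallback; a production flow might instead re-prompt or hand over to a human agent.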
Further, the system for phone robot interaction acceleration based on the large model further comprises:
A speech recognition engine ASR for converting human speech into machine-understandable text;
The speech recognition engine ASR is connected with the call platform IVR.
Specifically, the speech recognition engine ASR is the first step in converting a user's speech input into text information. ASR is based on deep learning technology, and can realize high-accuracy speech-to-text conversion.
Technical options for the speech recognition engine ASR include:
End-to-end models: for example CTC (Connectionist Temporal Classification) and attention-based models.
Open-source tools: Mozilla DeepSpeech, Kaldi, etc.
The implementation steps of the speech recognition engine ASR include:
Audio acquisition: using a microphone or other audio input device.
Preprocessing: noise reduction, gain control and feature extraction.
Model training: training a model on a large amount of labeled speech data.
Recognition: converting speech to text in real time or non-real time.
The invention also provides a method for the telephone robot interaction acceleration based on the large model, which is applied to the telephone robot interaction acceleration system based on the large model, and comprises the following steps:
The speech synthesis engine TTS synthesizes session prompt fragments one by one, and the call platform IVR plays the session prompt fragments one by one, so that the large model LLM, the speech synthesis engine TTS and IVR playback work simultaneously. As soon as the large model LLM returns a sentence to be synthesized, the speech synthesis engine TTS is called; as soon as the speech synthesis engine TTS finishes synthesizing a session prompt fragment, the call platform IVR starts playing it. (One session prompt fragment is one sentence, typically 3-5 words if short and 20-30 words if long.)
When the intelligent telephone voice robot based on the large model makes an outbound call, the session management service DM passes the session prompt fragments to the speech synthesis engine sentence by sentence, so that the speech synthesis engine starts synthesizing the prompt immediately and the call platform starts playback immediately, giving a lower-latency human-computer interaction experience.
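The overlap of the three stages described above can be sketched as a producer/consumer pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the stage functions, the use of threads and `queue.Queue`, and the `synthesize` and `play` callbacks are all hypothetical stand-ins for the large model, the speech synthesis engine and the call platform.

```python
import queue
import threading

END = object()  # sentinel marking the end of one prompt


def llm_stage(sentences, to_tts):
    """Stand-in for the large model: hands over one sentence at a time."""
    for sentence in sentences:
        to_tts.put(sentence)
    to_tts.put(END)


def tts_stage(to_tts, to_ivr, synthesize):
    """Synthesizes each sentence as soon as it arrives."""
    while True:
        sentence = to_tts.get()
        if sentence is END:
            to_ivr.put(END)
            return
        to_ivr.put(synthesize(sentence))


def ivr_stage(to_ivr, play):
    """Plays each fragment as soon as it has been synthesized."""
    while True:
        clip = to_ivr.get()
        if clip is END:
            return
        play(clip)


def run_pipeline(sentences, synthesize, play):
    """Run the three stages concurrently; fragments keep their order."""
    to_tts, to_ivr = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=llm_stage, args=(sentences, to_tts)),
        threading.Thread(target=tts_stage, args=(to_tts, to_ivr, synthesize)),
        threading.Thread(target=ivr_stage, args=(to_ivr, play)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

Because each queue has a single producer and a single consumer, fragments are played in the order the sentences were generated, while playback of the first fragment can begin before later sentences exist.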
Further, the method for accelerating interaction of the telephone robot based on the large model further comprises the following steps:
In the open question-answering scenario of the telephone robot, the session management service DM calls the large model LLM, which returns the answer text as a stream. Each time the session management service DM receives a sentence generated by the large model LLM, it immediately notifies the speech synthesis engine TTS to synthesize it, and the call platform IVR calls the speech synthesis engine TTS and plays the recording fragments one by one. The speech synthesis speed of the speech synthesis engine TTS exceeds the playback speed of the call platform IVR, so playback proceeds without interruption until the last recording fragment has been played, which ends the playback of this round of prompts.
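Streaming output from a large model usually arrives as small text chunks rather than whole sentences, so something has to decide when "a sentence" is complete. The sketch below shows one simple way to do that; the terminator set and the function name are assumptions made for illustration, and the patent does not specify how sentence boundaries are detected.

```python
from typing import Iterable, Iterator

# Assumed sentence terminators (Latin and full-width CJK); adjust per language.
TERMINATORS = ".!?。！？"


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as its terminator arrives."""
    buffer = ""
    for chunk in chunks:
        for ch in chunk:
            buffer += ch
            if ch in TERMINATORS:
                sentence = buffer.strip()
                if sentence:
                    yield sentence
                buffer = ""
    tail = buffer.strip()
    if tail:  # flush trailing text that never received a terminator
        yield tail
```

For example, `list(sentences_from_stream(["Hel", "lo. How ", "are you", "? Bye"]))` gives `["Hello.", "How are you?", "Bye"]`. This naive rule also splits on decimal points and abbreviations, so a production system would need a more careful boundary detector.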
Further, the speech synthesis speed of the speech synthesis engine TTS is higher than the speed at which a human normally speaks.
Preferably, the speech synthesis speed of the speech synthesis engine TTS is greater than 3 words/sec.
Embodiments of the present invention have compressed the human-computer interaction delay of the telephone robot system to an acceptable level (the industry standard is typically within 1.8 to 2.2 seconds).
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of large model based telephonic robot interaction acceleration as described above.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of large model based telephony robot interaction acceleration as described above when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a system and a method for accelerating telephone robot interaction based on a large model, in which the large model, the speech synthesis engine and call platform playback work simultaneously. The existing approach, in which the complete prompt is received and only then sent to the speech synthesis engine, is replaced by a design in which session prompt fragments are synthesized one by one and played one by one. The session management service communicates directly with the speech synthesis engine and passes the sentences to be synthesized, as generated by the large model, to the speech synthesis engine sentence by sentence; as soon as the speech synthesis engine has synthesized one session prompt fragment, the call platform plays the synthesized voice fragment. As a result, while the large model is still returning sentences to be synthesized, the speech synthesis engine is already synthesizing session prompt fragments and the call platform is already playing them. This optimizes the delay of each link of the telephone robot dialogue system, accelerates both the synthesis link and the playback link, meets the requirement of reduced delay in human-computer interaction, and provides a practical and effective solution for using large models in production telephone robot projects.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
In the drawings:
FIG. 1 is a flow diagram of the session management service calling the TTS in a quasi-streaming manner to synthesize recording fragments according to an embodiment of the present invention;
FIG. 2 is a flow diagram of the large model returning results as a stream according to an embodiment of the present invention;
FIG. 3 is a schematic flow diagram of the session management service calling the TTS to synthesize recording fragments in a quasi-streaming manner according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and products consistent with some aspects of the disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The term "if" as used herein may be interpreted as "at the time of", "when" or "in response to a determination", depending on the context.
Embodiments of the present invention are described in further detail below.
The embodiment of the invention provides a system for accelerating telephone robot interaction based on a large model, which comprises:
a call platform IVR, used for carrying out automated communication services by telephone, controlling telephone communication in the telephone robot dialogue system, implementing telephone operations such as answering, outbound calling, transferring, playing announcements and collecting digits, and hanging up, and realizing a complete telephone robot system by interfacing with the speech synthesis engine and the session management service;
a speech synthesis engine TTS, used for converting text information into natural-sounding speech;
The speech synthesis engine TTS converts the text answers of the telephone robot dialogue system into speech output, so that the user receives information audibly.
The technical options of the speech synthesis engine TTS include:
Concatenative TTS: splicing pre-recorded voice fragments.
Parametric TTS: neural network models such as WaveNet and Tacotron.
The implementation steps of the speech synthesis engine TTS include:
Text analysis: preprocessing the text, for example word segmentation and prosody prediction.
Speech synthesis: generating speech waveforms from the text features.
Voice optimization: adjusting speech rate, pitch and the like to improve naturalness.
a large model LLM (large artificial intelligence model), a deep learning model with a large number of parameters and a complex structure, used for processing large amounts of data in the fields of natural language processing and speech recognition;
wherein the call platform IVR is connected to the speech synthesis engine TTS, and the large model LLM is connected to the speech synthesis engine TTS;
the system for telephone robot interaction acceleration based on the large model further comprises:
a session management service DM, used for designing the human-machine dialogue flow, configuring the parameters of each node in the flow (prompt, pause duration, hot words and dynamic model), and configuring the rules and models for semantic understanding, thereby controlling the flow of multi-round human-machine interaction;
The session management service in the telephone robot of the embodiment can realize the design of a man-machine conversation process, the configuration of parameters such as a prompt, a pause time, a hot word, a dynamic model and the like of each node in the process, and the configuration of rules and models for semantic understanding, thereby realizing the process control of man-machine multi-round interaction. The following are examples of parameters:
Prompt: "Welcome to a certain travel website. How can I help you?";
Pause duration: 800 ms;
Hot words: building, address;
Dynamic model parameter: cmn/yue/eng, where cmn denotes the Mandarin model, yue the Cantonese model, and eng the English model;
Rules for semantic understanding: a rule model, including regular expressions, used for intent judgment; for example, an expression such as (yes)|(confirm)|(right) represents the confirm intent;
Semantic understanding model: semantic understanding can use either a small or a large neural network model; the trained model returns the corresponding intent judgment for the text input by the customer.
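The parameter examples above can be pictured as a per-node configuration record plus a regular-expression rule for intent judgment. The sketch below is illustrative only: the `NodeConfig` fields, the English keywords in the rule, and the `judge_intent` helper are assumptions, not names defined by the patent.

```python
import re
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Per-node parameters, mirroring the examples listed above."""
    prompt: str
    pause_ms: int
    hotwords: tuple
    asr_model: str  # "cmn" (Mandarin), "yue" (Cantonese) or "eng" (English)


# Illustrative rule: any of these keywords signals the confirm intent.
CONFIRM_RULE = re.compile(r"(yes)|(confirm)|(right)", re.IGNORECASE)


def judge_intent(text: str) -> str:
    """Return "confirm" when the rule matches, otherwise "unknown"."""
    return "confirm" if CONFIRM_RULE.search(text) else "unknown"


welcome = NodeConfig(
    prompt="Welcome to the travel website. How can I help you?",
    pause_ms=800,
    hotwords=("building", "address"),
    asr_model="cmn",
)
```

The rule matches substrings, so a word that merely contains a keyword would also count as confirmation; word-boundary anchors or a trained semantic understanding model would be needed where that matters.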
The call platform IVR is connected with the session management service DM which is respectively connected with the large model LLM and the speech synthesis engine TTS.
The system for telephone robot interaction acceleration based on the large model further comprises:
A speech recognition engine ASR for converting human speech into machine-understandable text;
The speech recognition engine ASR is connected with the call platform IVR.
The speech recognition engine ASR is the first step in converting the user's speech input into text information. ASR is based on deep learning technology, and can realize high-accuracy speech-to-text conversion.
Technical options for the speech recognition engine ASR include:
End-to-end models: for example CTC (Connectionist Temporal Classification) and attention-based models.
Open-source tools: Mozilla DeepSpeech, Kaldi, etc.
The implementation steps of the speech recognition engine ASR include:
Audio acquisition: using a microphone or other audio input device.
Preprocessing: noise reduction, gain control and feature extraction.
Model training: training a model on a large amount of labeled speech data.
Recognition: converting speech to text in real time or non-real time.
The embodiment of the invention also provides a method for accelerating the interaction of the telephone robot based on the large model, which is applied to the system for accelerating the interaction of the telephone robot based on the large model, and comprises the following steps:
The speech synthesis engine TTS synthesizes session prompt fragments one by one, and the call platform IVR plays the session prompt fragments one by one, so that the large model LLM, the speech synthesis engine TTS and IVR playback work simultaneously. As soon as the large model LLM returns a sentence to be synthesized, the speech synthesis engine TTS is called; as soon as the speech synthesis engine TTS finishes synthesizing a session prompt fragment, the call platform IVR starts playing it. FIG. 1 shows the flow in which the session management service of the present embodiment calls the TTS in a quasi-streaming manner to synthesize recording fragments.
In the open question-answering scenario of the telephone robot, the session management service DM calls the large model LLM, which returns the answer text as a stream. Each time the session management service DM receives a sentence generated by the large model LLM, it immediately notifies the speech synthesis engine TTS to synthesize it, and the call platform IVR calls the speech synthesis engine TTS and plays the recording fragments one by one. The speech synthesis speed of the speech synthesis engine TTS exceeds the playback speed of the call platform IVR, so playback proceeds without interruption until the last recording fragment has been played, which ends the playback of this round of prompts. FIG. 2 shows the flow in which the large model of the present embodiment returns results as a stream, and FIG. 3 shows the flow in which the session management service of the present embodiment calls the TTS to synthesize recording fragments in a quasi-streaming manner.
The speech synthesis speed of the speech synthesis engine TTS is higher than the speed at which a human normally speaks. In this embodiment, the speech synthesis speed of the speech synthesis engine TTS is greater than 3 words/second.
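Why synthesis must outpace playback can be checked with a small timing model. The sketch below is an illustration under simplifying assumptions (a single TTS worker that processes sentences in order, no network delay); the function name and all timing values used with it are hypothetical and are not measurements from the patent.

```python
def simulate(arrival, synth, duration):
    """Return (time to first audio, list of silent gaps between fragments).

    arrival[i]:  time at which the model has emitted sentence i
    synth[i]:    time the TTS engine needs to synthesize sentence i
    duration[i]: playback length of the resulting fragment
    All values are seconds measured from the start of the turn.
    """
    ready = 0.0      # when the TTS engine finishes the current fragment
    play_end = 0.0   # when the IVR finishes playing the previous fragment
    first_audio = None
    gaps = []
    for a, s, d in zip(arrival, synth, duration):
        # Synthesis starts once the sentence exists and the engine is free.
        ready = max(a, ready) + s
        # Playback starts once the fragment exists and the line is free.
        start = max(ready, play_end)
        if first_audio is None:
            first_audio = start
        else:
            gaps.append(start - play_end)
        play_end = start + d
    return first_audio, gaps
```

With hypothetical values `arrival=[1.0, 1.5, 2.0]`, `synth=[0.5, 0.5, 0.5]` and `duration=[2.0, 2.0, 2.0]`, the call returns `(1.5, [0.0, 0.0])`: the caller hears audio after 1.5 seconds and playback never stalls. Raising the second synthesis time to 3.0 seconds gives `(1.5, [1.0, 0.0])`, that is, one second of silence before the second fragment, which is the stall that a synthesis speed above playback speed is meant to avoid.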
When the intelligent telephone voice robot based on the large model makes an outbound call, the session management service DM passes the session prompt fragments to the speech synthesis engine sentence by sentence, so that the speech synthesis engine can start synthesizing the prompt as early as possible and the call platform can start playback as early as possible, giving a lower-latency human-computer interaction experience.
In the system and method for accelerating telephone robot interaction based on a large model, the large model, the speech synthesis engine and call platform playback work simultaneously. The existing approach, in which the complete prompt is received and only then sent to the speech synthesis engine, is replaced by a design in which session prompt fragments are synthesized one by one and played one by one. The session management service communicates directly with the speech synthesis engine and passes the sentences to be synthesized, as generated by the large model, to the speech synthesis engine sentence by sentence; as soon as the speech synthesis engine has synthesized one session prompt fragment, the call platform plays the synthesized voice fragment. As a result, while the large model is still returning sentences to be synthesized, the speech synthesis engine is already synthesizing session prompt fragments and the call platform is already playing them. This optimizes the delay of each link of the telephone robot dialogue system, accelerates both the synthesis link and the playback link, and meets the requirement of reduced delay in human-computer interaction.
An embodiment of the present invention further provides a computer device. FIG. 4 is a schematic structural diagram of the computer device. Referring to FIG. 4, the computer device includes an input system 23, an output system 24, a memory 22 and a processor 21. The memory 22 stores one or more programs; when the one or more programs are executed by the one or more processors 21, the one or more processors 21 implement the method for accelerating telephone robot interaction based on a large model provided by the foregoing embodiment. The input system 23, the output system 24, the memory 22 and the processor 21 may be connected by a bus or in another manner; FIG. 4 takes connection by a bus as an example.
The memory 22 is a computer-readable storage medium that may be used to store software programs, computer-executable programs, and the program instructions corresponding to the method for accelerating telephone robot interaction based on a large model according to embodiments of the present invention. The memory 22 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created through use of the device. The memory 22 may further include high-speed random access memory and nonvolatile memory, such as at least one disk storage device, flash memory device, or other nonvolatile solid-state storage device. In some examples, the memory 22 may further include memory located remotely from the processor 21, which may be connected to the device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system 23 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the device, and the output system 24 may include a display device such as a display screen.
The processor 21 executes various functional applications of the device and data processing, i.e. implements the above-described method of large model-based telephony robot interaction acceleration, by running software programs, instructions and modules stored in the memory 22.
The computer equipment provided by the embodiment can be used for executing the method for accelerating the interaction of the telephone robot based on the large model, and has corresponding functions and beneficial effects.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method for accelerating telephone robot interaction based on a large model provided by the above embodiments. The storage medium may be any of various types of memory devices or storage devices, including: an installation medium, such as a CD-ROM, floppy disk or tape system; computer system memory or random access memory, such as DRAM, DDR RAM, SRAM, EDO RAM or Rambus RAM; non-volatile memory, such as flash memory or magnetic media (e.g., a hard disk) or optical storage; and registers or other similar types of memory elements. The storage medium may also include other types of memory, or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or in a different second computer system that is connected to the first computer system through a network such as the internet; the second computer system may provide program instructions to the first computer system for execution. The storage medium may include two or more storage media that reside in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present invention are not limited to the operations of the method described in the above embodiments, and may also perform related operations in the method for accelerating telephone robot interaction based on a large model provided by any embodiment of the present invention.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A system for accelerating telephone robot interaction based on a large model, characterized by comprising:
a call platform IVR, used to provide automated communication services by telephone, to control telephone communication in the telephone robot dialogue system, and to perform telephone operations such as answering calls, placing outbound calls, transferring calls, playing announcements and collecting digits, and hanging up, the call platform IVR forming a complete telephone robot system by interfacing with the speech synthesis engine and the session management service;
a speech synthesis engine TTS, used to convert text information into natural-sounding speech;
a large model LLM, being a deep learning model with a large number of parameters and a complex structure, used to process large amounts of data in the fields of natural language processing and speech recognition;
wherein the call platform IVR is connected to the speech synthesis engine TTS, and the large model LLM is connected to the speech synthesis engine TTS.

2. The system for accelerating telephone robot interaction based on a large model according to claim 1, characterized by further comprising:
a session management service DM, used to design the human-machine dialogue flow, to configure the prompts, pause durations, hot words, and dynamic model parameters of each node in the flow, and to configure the rules and models for semantic understanding, thereby controlling the flow of multi-turn human-machine interaction;
wherein the call platform IVR is connected to the session management service DM, and the session management service DM is connected to the large model LLM and to the speech synthesis engine TTS respectively.

3. The system for accelerating telephone robot interaction based on a large model according to claim 1, characterized by further comprising:
a speech recognition engine ASR, used to convert human speech into machine-understandable text;
wherein the speech recognition engine ASR is connected to the call platform IVR.

4. A method for accelerating telephone robot interaction based on a large model, applied to the system for accelerating telephone robot interaction based on a large model according to any one of claims 1 to 3, characterized by comprising:
synthesizing conversation prompt segments one by one with the speech synthesis engine TTS and playing the conversation prompt segments one by one with the call platform IVR, such that the large model LLM, the speech synthesis engine TTS, and the playback of the call platform IVR operate simultaneously; whenever the large model LLM returns a sentence to be synthesized, the speech synthesis engine TTS is called immediately; whenever the speech synthesis engine TTS finishes synthesizing a conversation prompt segment, the call platform IVR starts playback immediately.

5. The method for accelerating telephone robot interaction based on a large model according to claim 4, characterized by further comprising:
in an open question-and-answer scenario of the telephone robot, the session management service DM calls the large model LLM, which returns the answer text as a stream; each time the session management service DM receives a sentence generated by the large model LLM, it immediately notifies the speech synthesis engine TTS to synthesize that sentence, and the call platform IVR calls the speech synthesis engine TTS and plays the recorded segments one by one; the speech synthesis speed of the speech synthesis engine TTS exceeds the playback speed of the call platform IVR, so that playback proceeds without interruption until the last recorded segment has been played, which ends the playback of the prompt for the current turn.

6. The method for accelerating telephone robot interaction based on a large model according to claim 5, characterized in that the speech synthesis speed of the speech synthesis engine TTS is higher than the normal speaking speed of a human.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the steps of the method for accelerating telephone robot interaction based on a large model according to any one of claims 4 to 6 are implemented.

8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, the steps of the method for accelerating telephone robot interaction based on a large model according to any one of claims 4 to 6 are implemented.
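
The pipelining recited in claims 4 to 6 can be illustrated with a minimal sketch. It is not the applicant's implementation: the names `llm_stream`, `synthesize`, `tts_worker`, `ivr_worker`, and `run_turn` are hypothetical, the LLM, TTS, and IVR are replaced by simulated delays, and `run_turn` stands in for the session management service DM. The sketch only shows the claimed ordering: each sentence is handed to TTS as soon as the LLM returns it, and each synthesized segment is played as soon as it is ready, while the three stages run concurrently.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def llm_stream(sentences: List[str], delay: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM: yields one sentence at a time."""
    for sentence in sentences:
        await asyncio.sleep(delay)  # simulated generation time per sentence
        yield sentence


async def synthesize(sentence: str, delay: float) -> str:
    """Hypothetical stand-in for a TTS call; returns a label for the audio segment."""
    await asyncio.sleep(delay)  # simulated synthesis time
    return f"audio({sentence})"


async def tts_worker(text_q: asyncio.Queue, audio_q: asyncio.Queue, delay: float) -> None:
    """Synthesize each sentence as soon as it arrives; None marks the end of the turn."""
    while True:
        sentence: Optional[str] = await text_q.get()
        if sentence is None:
            await audio_q.put(None)  # propagate the end-of-turn marker to playback
            return
        await audio_q.put(await synthesize(sentence, delay))


async def ivr_worker(audio_q: asyncio.Queue, played: List[str], delay: float) -> None:
    """Play each segment as soon as it is synthesized, in arrival order."""
    while True:
        segment: Optional[str] = await audio_q.get()
        if segment is None:
            return
        await asyncio.sleep(delay)  # simulated playback time
        played.append(segment)


async def run_turn(
    sentences: List[str],
    llm_delay: float = 0.01,
    tts_delay: float = 0.01,
    play_delay: float = 0.03,
) -> List[str]:
    """Plays the role of the DM for one turn: wires LLM output to TTS and TTS output to IVR."""
    text_q: asyncio.Queue = asyncio.Queue()
    audio_q: asyncio.Queue = asyncio.Queue()
    played: List[str] = []
    tts_task = asyncio.create_task(tts_worker(text_q, audio_q, tts_delay))
    ivr_task = asyncio.create_task(ivr_worker(audio_q, played, play_delay))
    async for sentence in llm_stream(sentences, llm_delay):
        await text_q.put(sentence)  # notify TTS immediately, without waiting for the full answer
    await text_q.put(None)
    await asyncio.gather(tts_task, ivr_task)
    return played
```

Calling `asyncio.run(run_turn(["Hello.", "How can I help?"]))` returns the segments in the order they were played. The default delays are chosen so that synthesis is faster than playback, which mirrors the condition in claims 5 and 6: under that assumption the playback stage only waits for the first segment and then always finds the next one already queued.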
CN202510359840.5A 2025-03-25 2025-03-25 System and method for telephone robot interaction acceleration based on large model Pending CN120434333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510359840.5A CN120434333A (en) 2025-03-25 2025-03-25 System and method for telephone robot interaction acceleration based on large model

Publications (1)

Publication Number Publication Date
CN120434333A true CN120434333A (en) 2025-08-05

Family

ID=96553336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510359840.5A Pending CN120434333A (en) 2025-03-25 2025-03-25 System and method for telephone robot interaction acceleration based on large model

Country Status (1)

Country Link
CN (1) CN120434333A (en)

Similar Documents

Publication Publication Date Title
CN111128126B (en) Multi-language intelligent voice conversation method and system
US8694324B2 (en) System and method of providing an automated data-collection in spoken dialog systems
WO2021051506A1 (en) Voice interaction method and apparatus, computer device and storage medium
US7644000B1 (en) Adding audio effects to spoken utterance
Ward Using prosodic clues to decide when to produce back-channel utterances
CN109977218B (en) A kind of automatic answering system and method applied to session operational scenarios
US7490042B2 (en) Methods and apparatus for adapting output speech in accordance with context of communication
CN111294471B (en) Intelligent telephone answering method and system
CN112201222B (en) Voice interaction method, device, equipment and storage medium based on voice call
JP7713113B2 (en) Generalized automatic speech recognition for integrated acoustic echo cancellation, speech enhancement, and voice separation.
CN111462726B (en) Method, device, equipment and medium for answering out call
KR20210123545A (en) Method and apparatus for conversation service based on user feedback
KR20230007502A (en) Hotword-free preemption of automated assistant response presentations
CN111611407A (en) Customer service interaction method, customer service interaction device, storage medium and equipment
KR20240033265A (en) Joint acoustic echo cancellation, speech enhancement, and speech separation for automatic speech recognition
CN111696576A (en) Intelligent voice robot talk test system
JP7426917B2 (en) Program, device and method for interacting with a user according to multimodal information around the user
CN120434333A (en) System and method for telephone robot interaction acceleration based on large model
CN110534084B (en) Intelligent voice control method and system based on FreeWITCH
CN120238609A (en) A system and method for realizing telephone robot conversation status at the engine level
CN120263899A (en) A system and method for implementing semantic interruption of telephone robots at the engine level
CN118245008B (en) Intelligent voice interaction method for 3D digital human
KR102840099B1 (en) Method and system for automatic back-channel generation in interactive agent system
EA047172B1 (en) METHOD FOR TRAINING A VOICE ROBOT
CN117690460A (en) Method for improving response speed of voice dialogue system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination