CN111462726B

CN111462726B - Method, device, equipment and medium for answering out call

Info

Publication number: CN111462726B
Application number: CN202010235873.6A
Authority: CN
Inventors: 张晨
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2020-03-30
Filing date: 2020-03-30
Publication date: 2023-08-22
Anticipated expiration: 2040-03-30
Also published as: CN111462726A

Abstract

The embodiment of the invention discloses an external call response method, an external call response device and a medium, wherein the method comprises the following steps: when an external call response instruction is triggered, obtaining voice data to be responded corresponding to the external call response instruction; semantic understanding is carried out on the voice data to be responded, and a target intention corresponding to the voice data to be responded is obtained; and determining a target response strategy corresponding to the voice data to be responded according to the target intention, and responding according to the target response strategy. According to the outbound response method provided by the embodiment of the invention, through carrying out intention recognition on the voice data to be responded and carrying out response according to the recognition result, the outbound process is automatically completed, and the outbound efficiency is improved.

Description

Method, device, equipment and medium for answering out call

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to an external call response method, an external call response device, external call response equipment and a medium.

Background

With the rapid development of communication technology, outbound services are widely used in various fields: in the education and training industry, outbound calls can deliver course information to clients quickly and effectively; in the financial industry, outbound calls can be used for scenarios such as telephone debt collection, repayment reminders and banking outbound calls. A traditional outbound system places calls through human agents, which often requires substantial labor cost, and its outbound efficiency is unstable.

Disclosure of Invention

The embodiment of the invention provides an outbound response method, an outbound response device, outbound response equipment and an outbound response medium, so as to realize automatic completion of an outbound flow and improve outbound efficiency.

In a first aspect, an embodiment of the present invention provides an outbound response method, including:

when an external call response instruction is triggered, acquiring voice data to be responded corresponding to the external call response instruction;

semantic understanding is carried out on the voice data to be responded, and a target intention corresponding to the voice data to be responded is obtained;

and determining a target response strategy corresponding to the voice data to be responded according to the target intention, and responding according to the target response strategy.

In a second aspect, an embodiment of the present invention further provides an external call response apparatus, including:

the to-be-responded voice acquisition module is used for acquiring voice data to be responded corresponding to the external call response instruction when the external call response instruction is triggered;

the target intention determining module is used for carrying out semantic understanding on the voice data to be responded to and obtaining target intention corresponding to the voice data to be responded to;

and the external call response module is used for determining a target response strategy corresponding to the voice data to be responded according to the target intention and responding according to the target response strategy.

In a third aspect, an embodiment of the present invention further provides a computer apparatus, including:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the outbound response method as provided by any embodiment of the present invention.

In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an outbound response method as provided by any of the embodiments of the present invention.

According to the embodiment of the invention, when the external call response instruction is triggered, the voice data to be responded corresponding to the external call response instruction is obtained; semantic understanding is carried out on the voice data to be responded, and a target intention corresponding to the voice data to be responded is obtained; and a target response strategy corresponding to the voice data to be responded is determined according to the target intention, and a response is made according to the target response strategy, so that the outbound flow is automatically completed and the outbound efficiency is improved.

Drawings

Fig. 1 is a flowchart of an outbound method according to a first embodiment of the present invention;

fig. 2 is a schematic illustration of an outbound procedure according to a second embodiment of the present invention;

fig. 3 is a schematic structural diagram of an outbound device according to a third embodiment of the present invention;

fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

Fig. 1 is a flowchart of an outbound method according to an embodiment of the present invention. The present embodiment is applicable to the case when an outbound call is made. The method may be performed by an outbound device, which may be implemented in software and/or hardware, e.g., which may be configured in a computer apparatus. As shown in fig. 1, the method includes:

s110, when the external call response instruction is triggered, obtaining voice data to be responded corresponding to the external call response instruction.

In this embodiment, the external call response instruction may be triggered by voice information input by the user. Optionally, during an outbound call, the user can input voice information to trigger the external call response instruction, and the voice information input by the user is the voice data to be responded corresponding to that instruction. For example, when the outbound dialogue is "System: Is this user A? User: Yes", the voice information "Yes" input by the user triggers the external call response instruction and is used as the voice data to be responded.

S120, carrying out semantic understanding on the voice data to be responded to obtain a target intention corresponding to the voice data to be responded to.

In this embodiment, after the voice data to be responded is obtained, the user intention corresponding to the voice data to be responded is identified as the target intention. Optionally, the identifying the user intention corresponding to the voice data to be responded may be that text conversion is performed on the voice data to be responded to obtain text information corresponding to the voice data to be responded to, semantic understanding is performed on the text information, and the target intention corresponding to the voice data to be responded to is determined according to a semantic understanding result.

In an embodiment of the present invention, performing semantic understanding on the voice data to be responded to obtain the target intention corresponding to the voice data to be responded includes: performing text conversion on the voice data to be responded to obtain text information corresponding to the voice data to be responded; and inputting the text information into a pre-trained intention recognition model to obtain the target intention output by the intention recognition model. Optionally, the manner of text conversion is not limited here, as long as the voice data to be responded can be converted into text information. After the text information is obtained, intention recognition is performed by the pre-trained intention recognition model. In this embodiment, different outbound flows may be constructed according to the outbound purpose, and a corresponding intention recognition model is trained for each outbound flow using its own training samples. That is, the intention recognition model can be selected according to the flow identifier of the current outbound call. Because a separate model is trained for each outbound flow, its recognition results fit that flow more closely and are more accurate.
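The selection of an intention recognition model by flow identifier can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the flow identifiers, the intent labels, the keyword rules standing in for trained models, and the `speech_to_text` placeholder, which simply decodes bytes instead of calling a real speech recognizer.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a trained text classifier: text in, intent label out.
IntentModel = Callable[[str], str]


def identity_check_model(text: str) -> str:
    """Toy keyword rule standing in for a model trained on identity-check samples."""
    normalized = text.strip().lower()
    if normalized in {"yes", "yes, speaking", "that's me"}:
        return "confirm_identity"
    if normalized in {"no", "wrong number"}:
        return "deny_identity"
    return "unknown"


def repayment_reminder_model(text: str) -> str:
    """Toy keyword rule standing in for a model trained on repayment samples."""
    normalized = text.strip().lower()
    if "tomorrow" in normalized or "will pay" in normalized:
        return "promise_to_pay"
    return "unknown"


# One intention recognition model per outbound flow, keyed by flow identifier.
MODELS_BY_FLOW: Dict[str, IntentModel] = {
    "identity_check": identity_check_model,
    "repayment_reminder": repayment_reminder_model,
}


def speech_to_text(voice_data: bytes) -> str:
    """Placeholder for speech recognition; here the bytes are already UTF-8 text."""
    return voice_data.decode("utf-8")


def recognize_intent(flow_id: str, voice_data: bytes) -> str:
    """Convert the voice data to text, then classify it with the flow's own model."""
    text = speech_to_text(voice_data)
    model = MODELS_BY_FLOW[flow_id]
    return model(text)
```

Keying the models by flow identifier keeps each classifier's label set small and specific to one outbound purpose, which is the benefit the paragraph above attributes to per-flow training.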

S130, determining a target response strategy corresponding to the voice data to be responded according to the target intention, and responding according to the target response strategy.

In this embodiment, after determining the target intention of the user, a target response policy of the voice data to be responded is determined according to the target intention and the current outbound flow. Optionally, the outbound procedure may include a plurality of steps, and the target answer policy corresponding to the target intention may be determined according to preset procedure answer logic and a step corresponding to the voice data to be answered.
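One possible shape for the preset flow response logic is a lookup table keyed by the current step and the target intention. The sketch below is only an assumption about how such logic could be stored; the step names, intent labels, sub-content identifiers and the fallback behaviour are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class ResponsePolicy(NamedTuple):
    sub_content_ids: List[str]  # identifiers of the response sub-contents to play
    next_step: str              # step the outbound flow moves to afterwards


# Hypothetical flow response logic: (current step, target intention) -> policy.
FLOW_LOGIC: Dict[Tuple[str, str], ResponsePolicy] = {
    ("confirm_user", "confirm_identity"): ResponsePolicy(["greeting", "state_purpose"], "explain"),
    ("confirm_user", "deny_identity"): ResponsePolicy(["apology", "goodbye"], "end"),
}


def determine_policy(current_step: str, target_intent: str) -> ResponsePolicy:
    """Look up the policy; when no rule matches, ask again and stay on the same step."""
    fallback = ResponsePolicy(["ask_again"], current_step)
    return FLOW_LOGIC.get((current_step, target_intent), fallback)
```

For instance, `determine_policy("confirm_user", "deny_identity")` yields the apology-and-goodbye policy and moves the flow to the `end` step.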

In one embodiment of the present invention, responding according to the target response strategy includes: acquiring at least one response sub-content contained in the target response strategy and a content type corresponding to the response sub-content; and generating target response voice information according to the response sub-content and the content type corresponding to the response sub-content, and playing the target response voice information. Optionally, a plurality of response sub-contents may be predefined, and the response content is formed by splicing several response sub-contents, which improves the reusability of the response sub-contents. For each response sub-content, a corresponding content type can be set to identify how that response sub-content is stored. For example, if the response sub-content is stored as audio, its content type may be set to a voice type, and if it is stored as text, its content type may be set to a text type. In this embodiment, after the target response strategy is determined, the response sub-content identifiers contained in the target response strategy are obtained, and each response sub-content and its content type are obtained according to the corresponding identifier.

On the basis of the above scheme, generating the target response voice information according to the response sub-content and the content type corresponding to the response sub-content includes: for each response sub-content, generating sub-response voice information corresponding to the response sub-content according to the content type corresponding to the response sub-content; combining the sub-response voice information to generate the target response voice information; and playing the target response voice information. In this embodiment, the manner of generating the sub-response voice information differs with the content type of the response sub-content. After the response sub-contents contained in the target response strategy and their content types are obtained, sub-response voice information is generated for each response sub-content according to its content type, the pieces of sub-response voice information are spliced to obtain the target response voice information, and the target response voice information is played to complete the response to the voice data to be responded. For example, assume that the target response strategy includes response sub-content 1 "Hello" and response sub-content 2 "If you cannot answer right now, we will call again later; please keep your line open". Sub-response voice information 1 "Hello" is generated for response sub-content 1, and sub-response voice information 2 "If you cannot answer right now, we will call again later; please keep your line open" is generated for response sub-content 2. The two are then spliced to obtain the target response voice information "Hello. If you cannot answer right now, we will call again later; please keep your line open", which is played.

In one embodiment of the present invention, the content type includes a voice type, and generating sub-response voice information corresponding to the response sub-content according to the content type corresponding to the response sub-content includes: accessing a set path to acquire voice information corresponding to the response sub-content, and taking the voice information as the sub-response voice information corresponding to the response sub-content. Optionally, the content type corresponding to the response sub-content includes a voice type, which indicates that the response sub-content is stored in audio form. It can be understood that when the content type corresponding to the response sub-content is the voice type, the path corresponding to the response sub-content is accessed directly to obtain the pre-stored voice information, and the obtained voice information is used as the sub-response voice information corresponding to the response sub-content.

In one embodiment of the present invention, the content type includes a text type, and the generating sub-answer voice information corresponding to the answer sub-content according to the content type corresponding to the answer sub-content includes: and obtaining text information corresponding to the response sub-content, performing voice synthesis on the text information to obtain voice information corresponding to the text information, and taking the voice information as sub-response voice information corresponding to the response sub-content. Optionally, the content type corresponding to the reply sub-content may further include a text type, which indicates that the reply sub-content is stored in a text form. When the content type corresponding to the response sub-content is text type, the response sub-content in text form needs to be subjected to voice synthesis, and voice information obtained by voice synthesis is used as sub-response voice information corresponding to the response sub-content.
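The two content types and the splicing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `AUDIO_STORE` is a hypothetical in-memory stand-in for audio files at set paths, `synthesize` is a placeholder for a real text-to-speech engine, and audio is modelled as raw bytes so that splicing is plain concatenation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubContent:
    content_type: str  # "voice" (stored as audio) or "text" (stored as text)
    value: str         # storage path for "voice", literal text for "text"


# Hypothetical audio store keyed by path; a real system would read audio files.
AUDIO_STORE: Dict[str, bytes] = {"/audio/hello.wav": b"<audio:hello>"}


def synthesize(text: str) -> bytes:
    """Placeholder for a text-to-speech engine."""
    return f"<tts:{text}>".encode("utf-8")


def to_voice(sub: SubContent) -> bytes:
    """Produce sub-response voice information according to the content type."""
    if sub.content_type == "voice":
        return AUDIO_STORE[sub.value]   # fetch the pre-stored recording
    if sub.content_type == "text":
        return synthesize(sub.value)    # synthesize speech from the text
    raise ValueError(f"unsupported content type: {sub.content_type}")


def build_response(subs: List[SubContent]) -> bytes:
    """Splice the sub-response voice information in order."""
    return b"".join(to_voice(sub) for sub in subs)
```

With real audio, splicing would also have to respect the container format and sample rate; byte concatenation is used here only to keep the sketch self-contained.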

On the basis of the above scheme, the method further includes: acquiring the unanswered time of the user, generating timeout response information when the unanswered time is greater than a set timeout threshold, and outputting the timeout response information. Optionally, during an outbound call, when it is the user's turn to answer, the unanswered time of the user can be monitored in real time. When the unanswered time exceeds the preset timeout threshold, timeout response information is generated according to a set timeout strategy and played to prompt the user to answer. The timeout response information may, for example, repeat the question awaiting the user's answer, prompt the user to answer, or move the call to another response link. By generating and outputting timeout response information when the user has not answered for a long time, the timeout strategy ensures the timeliness of the outbound call and improves outbound efficiency.
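The timeout check itself reduces to a comparison against the threshold. In the sketch below, the function name, the unit (seconds) and the wording of the prompt are assumptions; repeating the pending question is just one of the timeout strategies the paragraph above mentions.

```python
from typing import Optional


def timeout_response(unanswered_seconds: float,
                     threshold_seconds: float,
                     pending_question: str) -> Optional[str]:
    """Return timeout response text once the user has been silent too long, else None."""
    if unanswered_seconds > threshold_seconds:
        # One possible strategy: prompt the user and repeat the pending question.
        return f"Sorry, I did not catch that. {pending_question}"
    return None
```

The comparison is strict, matching the wording "greater than a set timeout threshold": a silence exactly equal to the threshold does not yet trigger the prompt.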

Example two

This embodiment provides a preferred embodiment on the basis of the above-described embodiments. The outbound response method provided by the embodiment can be executed by an outbound system. Optionally, the outbound system comprises five modules of speech recognition, a flow engine, semantic understanding, a speech engine and speech synthesis. The speech recognition may be any general speech recognition technique.

The flow engine may fully define the entire outbound flow. In this embodiment, the flow engine includes the concepts of flow, link and step. It is understood that multiple flows may be created in the flow engine. One flow represents one intelligent outbound policy, and the outbound content is based on the content of that policy. One flow includes a plurality of links, and one link (Section) includes a plurality of steps (Step), one Step being one or more rounds of interaction between the client and the outbound system. Taking into account the timeliness of outbound calls, a timeout mechanism is also set for the flow.

By way of example, an xml file may be used to define a flow. A flow is defined with Process. The Process has a type attribute whose value is the service type of the flow; the unique identifier of the flow is defined by the flow id, and the name of the flow is defined by Name. The Timeout attribute of the flow defines the global timeout information of the entire flow. For example, a timeout is counted when the client does not speak for a certain time; the Count attribute defines the number of timeouts, and the step-ref attribute defines the step to jump to when a timeout occurs.

A link is defined with Section. The Start attribute of the Section marks the link at which each session starts. The id attribute is the unique identifier of the link within its flow, and the Name attribute is the name of the link. The Timeout attribute is timeout information specific to this link. It is optional: if it is not defined, the global timeout information is used; if it is defined, it overrides the global timeout information.

A step is defined with Step. The start attribute marks the initial step of the starting link, the id attribute is the unique identifier of the step within its link, and name is its name. The Driver attribute is the driving type of the step. With directDriver, the step is driven directly, and value gives the step to which the flow jumps directly. With engineDriver, the step is driven by the engine, and the engineStack attribute must then be defined. engineStack describes the semantic understanding engine and contains one or more engine attributes, each of whose values is the id of a classifier.
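A flow definition of this kind could look like the XML embedded in the following Python sketch, which also loads it with the standard `xml.etree.ElementTree` module. The exact element and attribute spellings (lowercase tags, a `seconds` attribute on the timeout, `driver` and `value` as step attributes), the identifiers and the classifier id are all assumptions chosen for illustration; the source does not give a concrete file.

```python
import xml.etree.ElementTree as ET

# Hypothetical flow file: one flow, two links, global and link-level timeouts.
FLOW_XML = """
<process type="repayment_reminder" id="flow-001" name="Repayment reminder">
  <timeout seconds="5" count="2" step-ref="s-timeout"/>
  <section id="sec-1" name="Identity check" start="true">
    <timeout seconds="8" count="1" step-ref="s-timeout"/>
    <step id="s-1" name="Greet" start="true" driver="directDriver" value="s-2"/>
    <step id="s-2" name="Confirm user" driver="engineDriver">
      <engineStack>
        <engine value="clf-identity"/>
      </engineStack>
    </step>
    <step id="s-timeout" name="Prompt again" driver="directDriver" value="s-2"/>
  </section>
  <section id="sec-2" name="Reminder">
    <step id="s-3" name="Remind" start="true" driver="directDriver" value="s-3"/>
  </section>
</process>
"""


def load_flow(xml_text: str) -> dict:
    """Parse a flow definition into nested dictionaries."""
    root = ET.fromstring(xml_text)
    global_timeout = dict(root.find("timeout").attrib)
    sections = {}
    for section in root.findall("section"):
        own_timeout = section.find("timeout")
        # A link-level timeout overrides the global one; otherwise the global applies.
        timeout = dict(own_timeout.attrib) if own_timeout is not None else global_timeout
        steps = {}
        for step in section.findall("step"):
            steps[step.get("id")] = {
                "driver": step.get("driver"),
                "jump_to": step.get("value"),  # only set for directDriver steps
                "classifiers": [e.get("value") for e in step.findall("engineStack/engine")],
            }
        sections[section.get("id")] = {"timeout": timeout, "steps": steps}
    return {"id": root.get("id"), "type": root.get("type"), "sections": sections}
```

Here link `sec-1` carries its own timeout, while `sec-2` falls back to the global timeout of the flow, which mirrors the override rule described above.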

The semantic understanding engine is formed by combining a plurality of machine learning models and deep learning models. In this embodiment, the intention expressed in the client's statement is understood through a self-designed semantic understanding strategy. The semantic understanding engine is composed of a plurality of model groups. Different model groups can be set for different links in the flow engine, and each model group comprises a plurality of machine learning or deep learning models. It should be noted that every machine learning or deep learning model used for semantic understanding must be a text classification model.
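The source does not state how the models within a group are combined. The sketch below assumes a simple majority vote over the non-abstaining classifiers, purely for illustration; the link identifier, the labels and the keyword rules standing in for trained text classification models are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Each model in a group is a text classifier: text in, label out.
TextClassifier = Callable[[str], str]


def contains_yes(text: str) -> str:
    return "affirm" if "yes" in text.lower() else "other"


def contains_no(text: str) -> str:
    return "deny" if "no" in text.lower().split() else "other"


def short_reply(text: str) -> str:
    return "affirm" if text.lower().strip(" .!") in {"yes", "yeah", "speaking"} else "other"


# Hypothetical model groups, one per link of the flow.
MODEL_GROUPS: Dict[str, List[TextClassifier]] = {
    "sec-1": [contains_yes, contains_no, short_reply],
}


def understand(link_id: str, text: str) -> str:
    """Majority vote over the link's model group, ignoring 'other' abstentions."""
    labels = [model(text) for model in MODEL_GROUPS[link_id]]
    votes = Counter(label for label in labels if label != "other")
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]
```

Because every model shares the same text-in, label-out interface, groups can be reassigned to links without changing the calling code, which is one reason to require that all models be text classifiers.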

The speech engine stores call scripts through a pre-designed script storage structure and script assembly policy. Considering that current text-to-speech technology is not yet mature and still falls short of real human speech, in this embodiment script segments can be defined in the speech engine, and a complete script is formed from a plurality of script segments. The script segments may be of several types, such as "text" and "sound recording", and the speech synthesis engine chooses how to synthesize the speech based on the type. It will be appreciated that defining multiple script segments improves their reusability. Illustratively, the speech engine may be defined as follows. Call-scripts is the complete script information for a given scenario; its type value is the scenario name, and the script information referenced in the flow engine is identified by this type. Call-scripts contains a plurality of call-script elements. In a call-script, id is the unique identifier of the script within the scenario, name is its name, and type is the default type for the script. The script content is defined in segments. One segments element contains a plurality of segment elements, each of which is one script segment; when the type of a segment is text, the segment holds its text content.
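A script definition along these lines could be stored and read as in the following sketch. The XML layout, the `recording` type name, the identifiers and the audio path are assumptions for illustration; only the general structure (Call-scripts, call-script, segments, segment, and a default type on the script) follows the description above.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical script file for one scenario.
SCRIPTS_XML = """
<call-scripts type="repayment_reminder">
  <call-script id="cs-1" name="Call back later" type="text">
    <segments>
      <segment type="recording">/audio/hello.wav</segment>
      <segment>We will call again later.</segment>
    </segments>
  </call-script>
</call-scripts>
"""


def load_script(xml_text: str, script_id: str) -> List[Tuple[str, str]]:
    """Return the (type, content) pairs of one script, in playback order."""
    root = ET.fromstring(xml_text)
    for script in root.findall("call-script"):
        if script.get("id") == script_id:
            default_type = script.get("type")
            return [
                # A segment without its own type inherits the script's default type.
                (segment.get("type", default_type), (segment.text or "").strip())
                for segment in script.findall("segments/segment")
            ]
    raise KeyError(script_id)
```

The returned list can then be handed to a synthesis step that plays recordings as stored and synthesizes speech for text segments.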

Fig. 2 is a schematic illustration of an outbound call flow provided in a second embodiment of the present invention. As shown in fig. 2, when an outbound call is in progress and voice data enters the system, the voice information is first converted into text information by the speech recognition module. Next, the processing logic of the current flow is obtained from the flow engine, and the corresponding semantic understanding engine is called. Control then returns to the flow engine, which determines the next step of the flow and passes it to the speech engine, where the corresponding script is assembled. Finally, the voice reply to the client is synthesized by the speech synthesis engine.
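One turn of the pipeline described for fig. 2 can be sketched end to end with stub modules. All function names, step names, labels and reply texts below are hypothetical, and each stub stands in for a real module (speech recognition, semantic understanding, flow engine, speech engine, speech synthesis).

```python
from typing import Dict, Tuple


def recognize_speech(voice: bytes) -> str:
    """Stub speech recognition: the bytes are treated as UTF-8 text."""
    return voice.decode("utf-8")


def understand(step: str, text: str) -> str:
    """Stub semantic understanding for the current step."""
    return "affirm" if text.strip().lower().startswith("yes") else "other"


# Stub flow engine plus speech engine: (step, intent) -> (next step, assembled script).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("confirm_user", "affirm"): ("explain", "Thank you. This is a repayment reminder."),
    ("confirm_user", "other"): ("confirm_user", "Sorry, is this user A?"),
}


def synthesize(text: str) -> bytes:
    """Stub speech synthesis."""
    return f"<tts:{text}>".encode("utf-8")


def handle_turn(step: str, voice: bytes) -> Tuple[str, bytes]:
    """Recognition -> understanding -> flow decision -> script -> synthesis."""
    text = recognize_speech(voice)
    intent = understand(step, text)
    next_step, script = TRANSITIONS[(step, intent)]
    return next_step, synthesize(script)
```

Calling `handle_turn` once per user utterance and feeding the returned step back in on the next turn reproduces the loop between the flow engine and the other modules.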

The embodiment of the invention configures the outbound flows, script storage and the like in a flexible, configurable manner. The outbound logic is determined by the outbound flow designed in the flow engine; the semantic understanding engine performs semantic understanding on the client's voice information; and after the user's intention is determined, the response is given with voice information synthesized according to the script storage structure and script assembly policy stored in advance in the speech engine. An intelligent outbound system with strong recognition capability is thereby constructed, and the working efficiency of outbound calls is improved.

Example III

Fig. 3 is a schematic structural diagram of an external call response device according to a third embodiment of the present invention. The external call response device may be implemented in software and/or hardware; for example, it may be configured in a computer device. As shown in fig. 3, the device includes a to-be-responded voice acquisition module 310, a target intention determination module 320, and an external call response module 330, in which:

the to-be-responded voice acquisition module 310 is configured to acquire to-be-responded voice data corresponding to an external call response instruction when the external call response instruction is triggered;

the target intention determining module 320 is configured to perform semantic understanding on the voice data to be responded, and obtain a target intention corresponding to the voice data to be responded;

and the external call response module 330 is configured to determine a target response policy corresponding to the voice data to be responded according to the target intention, and respond according to the target response policy.

According to the embodiment of the invention, when the external call response instruction is triggered, the to-be-responded voice data corresponding to the external call response instruction is acquired through the to-be-responded voice acquisition module; the target intention determining module carries out semantic understanding on the voice data to be responded to, and obtains a target intention corresponding to the voice data to be responded to; the outbound response module determines a target response strategy corresponding to the voice data to be responded according to the target intention, and responds according to the target response strategy, so that an outbound flow is automatically completed, and outbound efficiency is improved.

Optionally, based on the above scheme, the outbound response module 330 is specifically configured to:

acquiring at least one response sub-content contained in the target response strategy and a content type corresponding to the response sub-content;

and generating target response voice information according to the response sub-content and the content type corresponding to the response sub-content, and playing the target response voice information.

and generating sub-response voice information corresponding to the response sub-content according to the content type corresponding to the response sub-content aiming at each response sub-content, combining the sub-response voice information to generate target response voice information, and playing the target response voice information.

Optionally, on the basis of the above solution, the content type includes a voice type, and the external call response module 330 is specifically configured to:

and accessing a set path to acquire voice information corresponding to the response sub-content, and taking the voice information as sub-response voice information corresponding to the response sub-content.

Optionally, on the basis of the above solution, the content type includes a text type, and the outbound response module 330 is specifically configured to:

and obtaining text information corresponding to the response sub-content, performing voice synthesis on the text information to obtain voice information corresponding to the text information, and taking the voice information as sub-response voice information corresponding to the response sub-content.

Optionally, on the basis of the above solution, the target intention determining module 320 is specifically configured to:

performing text conversion on the voice data to be responded to obtain text information corresponding to the voice data to be responded;

and inputting the text information into a pre-trained intention recognition model to obtain the target intention output by the intention recognition model.

Optionally, on the basis of the above scheme, the device further includes a timeout response module, configured to:

and acquiring the unanswered time of the user, generating overtime response information when the unanswered time is larger than a set overtime threshold value, and outputting the overtime response information.

The external call response device provided by the embodiment of the invention can execute the external call response method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Example IV

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary computer device 412 suitable for use in implementing embodiments of the invention. The computer device 412 shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the invention.

As shown in FIG. 4, computer device 412 is in the form of a general purpose computing device. Components of computer device 412 may include, but are not limited to: one or more processors 416, a system memory 428, and a bus 418 that connects the various system components (including the system memory 428 and the processors 416).

Bus 418 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer device 412 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 412 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 430 and/or cache memory 432. The computer device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage 434 may be used to read from or write to non-removable, non-volatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 418 via one or more data medium interfaces. Memory 428 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.

A program/utility 440 having a set (at least one) of program modules 442 may be stored in, for example, memory 428, such program modules 442 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 442 generally perform the functions and/or methodologies in the described embodiments of the invention.

The computer device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, display 424, etc.), one or more devices that enable a user to interact with the computer device 412, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 412 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 422. Moreover, computer device 412 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 420. As shown, network adapter 420 communicates with other modules of computer device 412 over bus 418. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 412, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processor 416 executes various functional applications and data processing by running programs stored in the system memory 428, such as implementing the outbound response method provided by embodiments of the present invention, the method comprising:

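when an external call response instruction is triggered, obtaining voice data to be responded corresponding to the external call response instruction;

semantic understanding is carried out on the voice data to be responded, and a target intention corresponding to the voice data to be responded is obtained;

and determining a target response strategy corresponding to the voice data to be responded according to the target intention, and responding according to the target response strategy.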

Of course, those skilled in the art will understand that the processor may also implement the technical solution of the external call response method provided by any embodiment of the present invention.
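As an illustration only, the flow executed by the processor (obtain the voice data to be responded, convert it to text, recognize the target intention, then select a target response strategy) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiments: the names `speech_to_text`, `recognize_intent`, `STRATEGIES`, `FALLBACK` and the intent labels are all hypothetical, the speech recognition step is replaced by a simple byte decode, and the pre-trained intention recognition model is replaced by keyword rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseStrategy:
    """A hypothetical response strategy: a name plus the reply to play back."""
    name: str
    reply_text: str


# Hypothetical mapping from target intention to response strategy.
STRATEGIES = {
    "confirm_repayment": ResponseStrategy(
        "thank_and_close", "Thank you, we have noted your repayment plan."
    ),
    "request_callback": ResponseStrategy(
        "schedule_callback", "Understood, we will call you back later."
    ),
}

# Used when no known intention is recognized (an assumption of this sketch).
FALLBACK = ResponseStrategy("clarify", "Sorry, could you please repeat that?")


def speech_to_text(voice_data: bytes) -> str:
    """Stand-in for text conversion: assumes the 'voice data' is UTF-8 text."""
    return voice_data.decode("utf-8")


def recognize_intent(text: str) -> str:
    """Stand-in for the pre-trained intention recognition model (keyword rules)."""
    lowered = text.lower()
    if "call me back" in lowered or "busy" in lowered:
        return "request_callback"
    if "pay" in lowered:
        return "confirm_repayment"
    return "unknown"


def respond(voice_data: bytes) -> ResponseStrategy:
    """Run the flow: voice data -> text -> target intention -> response strategy."""
    text = speech_to_text(voice_data)
    intent = recognize_intent(text)
    return STRATEGIES.get(intent, FALLBACK)


if __name__ == "__main__":
    print(respond(b"I will pay tomorrow").reply_text)
```

In a real deployment the two stand-in functions would be replaced by a speech recognition service and a trained model; the table lookup is kept separate so that different outbound flows could supply their own strategy tables.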

Example five

The fifth embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the outbound response method provided by the embodiments of the present invention, the method comprising:

Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform the related operations of the outbound response method provided by any embodiment of the present invention.

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above describes only preferred embodiments of the present invention and the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to those embodiments, and may be embodied in many other equivalent forms without departing from the spirit of the invention, the scope of which is set forth in the following claims.

Claims

1. An outbound response method, comprising:

determining a target response strategy corresponding to the voice data to be responded according to the target intention, and responding according to the target response strategy;

wherein performing semantic understanding on the voice data to be responded to obtain the target intention corresponding to the voice data to be responded comprises:

inputting the text information into a pre-trained intention recognition model to obtain the target intention output by the intention recognition model; the intention recognition model is obtained by constructing different outbound flows according to outbound purposes and correspondingly training different training samples aiming at the different outbound flows;

wherein, the responding according to the target response strategy comprises:

2. The method of claim 1, wherein the content type includes a voice type, and wherein the generating sub-answer voice information corresponding to the answer sub-content according to the content type corresponding to the answer sub-content includes:

3. The method of claim 1, wherein the content type includes a text type, and the generating sub-answer speech information corresponding to the answer sub-content according to the content type corresponding to the answer sub-content includes:

4. The method as recited in claim 1, further comprising:

5. An external call response device, comprising:

the voice to be responded acquisition module is used for acquiring voice data to be responded corresponding to the external call response instruction when the external call response instruction is triggered;

the external call response module is used for determining a target response strategy corresponding to the voice data to be responded according to the target intention and responding according to the target response strategy;

the target intention determining module is specifically configured to perform text conversion on the voice data to be responded to, so as to obtain text information corresponding to the voice data to be responded to; inputting the text information into a pre-trained intention recognition model to obtain the target intention output by the intention recognition model;

the intention recognition model is obtained by constructing different outbound flows according to outbound purposes and correspondingly training different training samples aiming at the different outbound flows;

the external call response module is specifically configured to:

6. A computer device, the device comprising:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the outbound response method of any of claims 1-4.

7. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the external call response method according to any of claims 1-4.