CN111435981B - Call processing method and device

Info

Publication number: CN111435981B
Application number: CN201910028456.1A
Authority: CN
Inventors: 杨德波; 严宝亮; 章永伟
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2019-01-11
Filing date: 2019-01-11
Publication date: 2021-06-08
Anticipated expiration: 2039-01-11
Also published as: CN111435981A

Abstract

The present application provides a call processing method and device. The method includes: when a first server receives a trigger control request sent by a core network device, the first server sends, according to address information of a second server, a meeting place creation request to the second server, where the meeting place creation request instructs the second server to create a meeting place, and the trigger control request is used by the core network device to instruct the first server to control the current call when the core network device detects that the calling party and/or the called party needs a third party to participate in the current call. When the first server receives a meeting place creation response sent by the second server and controls the calling party, the called party, and the third party to join the meeting place, the first server obtains the calling party's voice information and the called party's voice information through the meeting place, where the meeting place creation response indicates that the second server has created the meeting place. The first server then executes an operation corresponding to a target keyword that the device corresponding to the third party identifies from the calling party's voice information and the called party's voice information, thereby meeting various needs of the user.

Description

Call processing method and device

Technical Field

The present application relates to the field of communications technologies, and in particular, to a call processing method and apparatus.

Background

With the continuous development of communication technology, calls between users have become increasingly common. However, call security and compatibility have always been factors affecting call performance. At present, users want to be protected from communication fraud and to be able, by voice during a call, to order food, query the weather, have the conversation translated, and so on. Therefore, a call processing method that can satisfy these varied user needs is required.

Disclosure of Invention

The present application provides a call processing method and device. When a first server learns, through a core network device, that a calling party and/or a called party needs a third party to participate in the current call, the first server can instruct a second server to create a meeting place for the calling party, the called party, and the third party. After the calling party, the called party, and the third party have joined the meeting place, the first server can monitor the call among them through the meeting place and thereby obtain the voice information of the calling party and the voice information of the called party. The first server then executes the operation corresponding to the target keyword that the device corresponding to the third party identifies from the voice information of the calling party and the voice information of the called party; that is, the user requirement corresponding to the target keyword is met.

In a first aspect, the present application provides a call processing method, including:

when receiving a trigger control request sent by a core network device, a first server sends a meeting place creation request to a second server according to address information of the second server, wherein the meeting place creation request is used to instruct the second server to create a meeting place, and the trigger control request is used to instruct the first server to control the current call when the core network device detects that a calling party and/or a called party needs a third party to participate in the current call;

when receiving a meeting place creation response sent by the second server and controlling the calling party, the called party, and the third party to join the meeting place, the first server acquires the voice information of the calling party and the voice information of the called party through the meeting place, wherein the meeting place creation response is used to indicate that the second server has created the meeting place;

and the first server executes the operation corresponding to a target keyword according to the target keyword that the device corresponding to the third party identifies from the voice information of the calling party and the voice information of the called party.

With the call processing method provided in the first aspect, when the core network device detects that the calling party and/or the called party needs a third party to participate in the current call, it may send the first server a trigger control request instructing the first server to control the current call. On receiving the trigger control request, the first server may send the second server, according to the address information of the second server, a meeting place creation request instructing it to create a meeting place; the second server creates the meeting place and notifies the first server through a meeting place creation response. When the first server has received the meeting place creation response and has controlled the calling party, the called party, and the third party to join the meeting place, it can acquire the voice information of the calling party and the voice information of the called party in real time or at irregular intervals, and can execute the operation corresponding to the target keyword that the device corresponding to the third party identifies from that voice information. Various user requirements are thus met in real time during the call, which improves both call performance and the user's call experience.
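The three steps of the first aspect can be illustrated with a minimal Python sketch. It is not the implementation of the present application: `SecondServerStub`, `handle_trigger_control_request`, and the `place-N` identifiers are hypothetical names, the speech is represented as plain text for simplicity, and the recognizer is passed in as a function standing in for the device corresponding to the third party.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SecondServerStub:
    """Hypothetical stand-in for the second server; it only tracks meeting places."""
    address: str
    places: Dict[str, List[str]] = field(default_factory=dict)

    def create_meeting_place(self) -> str:
        # The returned ID plays the role of the meeting place creation response.
        place_id = f"place-{len(self.places) + 1}"
        self.places[place_id] = []
        return place_id

    def join(self, place_id: str, party: str) -> None:
        self.places[place_id].append(party)


def handle_trigger_control_request(
    second_server: SecondServerStub,
    recognize: Callable[[str], List[str]],
    operations: Dict[str, Callable[[], str]],
    speech_text: str,
) -> List[str]:
    """Run the three steps of the first aspect for one call."""
    # Step 1: ask the second server to create a meeting place.
    place_id = second_server.create_meeting_place()
    # Step 2: control the three parties to join the meeting place.
    for party in ("calling", "called", "third"):
        second_server.join(place_id, party)
    # Step 3: the recognizer (third-party device) identifies target keywords,
    # and the first server executes the operation mapped to each of them.
    return [operations[k]() for k in recognize(speech_text) if k in operations]
```

In this sketch a keyword with no registered operation is simply ignored; how unmatched keywords are treated in practice is not specified by the text above.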

The device used by the calling party is a device corresponding to the calling party, the device used by the called party is a device corresponding to the called party, and the device used by the third party is a device corresponding to the third party.

The device corresponding to the calling party (hereinafter the first device) and the device corresponding to the called party (hereinafter the second device) may be terminals, which may be wireless terminals or wired terminals. A wireless terminal may be a device that provides voice and/or other service data connectivity to a user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem. A wireless terminal may communicate with one or more core networks via a radio access network (RAN). It may be a mobile terminal, such as a mobile telephone (or "cellular" telephone) or a computer with a mobile terminal, for example a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device that exchanges voice and/or data with the radio access network. Examples include Personal Communication Service (PCS) phones, cordless phones, Session Initiation Protocol (SIP) phones, Wireless Local Loop (WLL) stations, and Personal Digital Assistants (PDAs). A wireless terminal may also be referred to as a system, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, an access terminal, a user terminal, a user agent, a user device, or user equipment, which is not limited herein.

The device corresponding to the third party (hereinafter the third device) has a voice recognition function. It may be a separately provided terminal or a terminal managed by the second server, in which case a communication connection exists between the terminal and the second server. For example, the terminal may be a terminal device, or an application (APP), a web page, or an official account on a terminal device, or a combination of these. The third device may also be a server.

In the present application, the core network device includes, but is not limited to, an operator's core network device, and it can detect the communication services subscribed to by the calling party and the called party. The present application may distinguish different communication services by numbers, codes, identifiers, or the like, so that the core network device can detect them accurately. Generally, the core network device establishes communication connections with both the first device and the second device.

In this application, the second server may provide one meeting place or a plurality of meeting places at the same time. Each meeting place has its own meeting place identification (ID), and different meeting places can provide different media information to the calling party, the called party, and the third party. The media information may include, but is not limited to, the unique meeting place ID, a port number of the first device, a port number of the second device, a port number of the third device, a coding and decoding manner of the call service, and a media type (such as a video type or a voice type) of the call service. The second server may include, but is not limited to, a conference server, which may be a Media Server Markup Language (MSML) server based on the Session Initiation Protocol (SIP). In addition, the first server and the second server are generally provided separately.
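The media information listed above can be pictured as a small record keyed by the meeting place ID. The following Python sketch is only one possible representation: the class names `MeetingPlaceMedia` and `MeetingPlaceRegistry`, the field names, and the validation rules are assumptions made for illustration and are not taken from the present application.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MeetingPlaceMedia:
    """Hypothetical record of the media information of one meeting place."""
    place_id: str            # unique meeting place ID
    first_device_port: int   # port number of the first device
    second_device_port: int  # port number of the second device
    third_device_port: int   # port number of the third device
    codec: str               # coding and decoding manner of the call service
    media_type: str          # "voice" or "video"

    def __post_init__(self) -> None:
        if self.media_type not in ("voice", "video"):
            raise ValueError(f"unsupported media type: {self.media_type}")
        ports = (self.first_device_port, self.second_device_port,
                 self.third_device_port)
        for port in ports:
            if not 0 < port < 65536:
                raise ValueError(f"invalid port number: {port}")


class MeetingPlaceRegistry:
    """Hypothetical registry for a second server hosting several meeting places."""

    def __init__(self) -> None:
        self._places: Dict[str, MeetingPlaceMedia] = {}

    def add(self, media: MeetingPlaceMedia) -> None:
        # Each meeting place has its own ID, so duplicates are rejected.
        if media.place_id in self._places:
            raise KeyError(f"duplicate meeting place ID: {media.place_id}")
        self._places[media.place_id] = media

    def get(self, place_id: str) -> MeetingPlaceMedia:
        return self._places[place_id]
```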

In one possible design, the first server executing the operation corresponding to the target keyword, according to the target keyword that the device corresponding to the third party identifies from the voice information of the calling party and the voice information of the called party, includes at least one of the following:

when the first server determines that the target keyword comprises a fraud keyword, reminding the calling party and the called party by voice, through the meeting place, that the current call is a fraud call;

when the first server determines that the target keyword comprises a booking keyword, executing a booking operation and announcing the booking result by voice to the calling party and the called party through the meeting place;

when the first server determines that the target keyword comprises a query keyword, executing a query operation and announcing the query result by voice to the calling party and the called party through the meeting place;

when the first server determines that the target keyword comprises a translation keyword, executing a translation operation and announcing the translation result by voice to the calling party and the called party through the meeting place; and

when the first server determines that the target keyword comprises another keyword, executing the corresponding other operation and announcing the operation result by voice to the calling party and the called party through the meeting place.
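The mapping from keyword category to operation described above resembles a dispatch table. The sketch below illustrates that idea only: the category names, the handler functions, and the returned prompt strings are hypothetical, and each handler merely returns the text that would be announced by voice through the meeting place.

```python
from typing import Callable, Dict, Iterable, List


def remind_fraud() -> str:
    return "Reminder: the current call may be a fraud call."


def do_booking() -> str:
    return "Booking completed."


def do_query() -> str:
    return "Query result available."


def do_translation() -> str:
    return "Translation result available."


def do_other() -> str:
    return "Operation completed."


# Hypothetical dispatch table: keyword category -> operation.
HANDLERS: Dict[str, Callable[[], str]] = {
    "fraud": remind_fraud,
    "booking": do_booking,
    "query": do_query,
    "translation": do_translation,
}


def execute_operations(categories: Iterable[str]) -> List[str]:
    """Run one operation per recognized category; unknown categories
    fall back to the generic 'other' operation."""
    return [HANDLERS.get(category, do_other)() for category in categories]
```

A table of this kind keeps the "at least one of the following" structure open ended: a new category is supported by registering one more handler.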

In one possible design, the first server determining that the target keyword comprises a fraud keyword comprises:

the first server receives text information sent by the device corresponding to the third party, wherein the text information is obtained by the device corresponding to the third party by recognizing mixed voice information received from the second server when the calling party and the called party have not communicated with each other within a preset duration, and the mixed voice information is obtained by the second server by mixing the voice information of the calling party and the voice information of the called party;

the first server determines whether the text information includes the fraud keyword.
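The present application does not specify how the second server mixes the two voice streams. As one common possibility, the sketch below assumes both streams are 16-bit PCM sample sequences at the same sampling rate and mixes them by sample-wise addition with clipping; the function name `mix_pcm16` is hypothetical.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(calling: Sequence[int], called: Sequence[int]) -> List[int]:
    """Mix two 16-bit PCM streams by adding samples and clipping the sum.

    The shorter stream is treated as silence (zeros) past its end.
    """
    length = max(len(calling), len(called))
    mixed: List[int] = []
    for i in range(length):
        a = calling[i] if i < len(calling) else 0
        b = called[i] if i < len(called) else 0
        # Clip so the result still fits in a signed 16-bit sample.
        mixed.append(max(INT16_MIN, min(INT16_MAX, a + b)))
    return mixed
```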

In one possible design, the first server determining whether the text information includes the fraud keyword includes:

the first server matches the text information in a fraud keyword database to determine whether the fraud keyword is included in the text information.
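Matching text information against a fraud keyword database can be as simple as a substring search. The sketch below assumes the database is an in-memory collection of keyword strings and that matching is case-insensitive; both are illustrative assumptions, and `find_fraud_keywords` is a hypothetical name.

```python
from typing import Iterable, Set


def find_fraud_keywords(text: str, database: Iterable[str]) -> Set[str]:
    """Return every database keyword that occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return {keyword for keyword in database if keyword.lower() in lowered}
```

The text information includes a fraud keyword exactly when the returned set is non-empty.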

In one possible design, the first server voice alerts the calling party and the called party that the current call is a fraud call, comprising:

the first server records the occurrence frequency of the fraud keywords in a preset time period;

the first server judges whether the current call is a fraud call according to the occurrence frequency of the fraud keyword;

and when the current call is determined to be a fraud call, the first server reminds the calling party and the called party that the current call is a fraud call.
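Counting fraud keyword occurrences within a preset time period can be sketched with a sliding window. The window length, the threshold, and the rule "fraud if the count reaches the threshold" are assumptions for illustration; the present application only states that the judgment is based on the number of occurrences. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque


class FraudKeywordCounter:
    """Hypothetical sliding-window counter of fraud keyword occurrences."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._hits: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one occurrence and report whether the call now counts as fraud."""
        self._hits.append(timestamp)
        # Drop occurrences that fall outside the preset time period.
        while self._hits and timestamp - self._hits[0] > self.window_seconds:
            self._hits.popleft()
        return len(self._hits) >= self.threshold
```

When `record` returns `True`, the first server would remind the calling party and the called party that the current call is a fraud call.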

In one possible design, the method further includes:

and the first server stores the text information.

In one possible design, the current call is a voice call or a video call.

In a second aspect, the present application provides a call processing apparatus, where the call processing apparatus is applied to a server and includes:

a sending module, configured to send a meeting place creation request to another server according to address information of the other server when a receiving module receives a trigger control request sent by a core network device, where the meeting place creation request is used to instruct the other server to create a meeting place, and the trigger control request is used to instruct the server to control the current call when the core network device detects that a calling party and/or a called party needs a third party to participate in the current call;

an obtaining module, configured to obtain voice information of the calling party and voice information of the called party through the meeting place when the receiving module receives a meeting place creation response sent by the other server and a joining module controls the calling party, the called party, and the third party to join the meeting place, where the meeting place creation response is used to indicate that the other server has created the meeting place;

and an execution module, configured to execute the operation corresponding to a target keyword according to the target keyword that the device corresponding to the third party identifies from the voice information of the calling party and the voice information of the called party.

In a possible design, the execution module is specifically configured to execute, according to the target keyword that the device corresponding to the third party identifies from the voice information of the calling party and the voice information of the called party, at least one of the following operations corresponding to the target keyword:

when a determining module determines that the target keyword comprises a fraud keyword, reminding the calling party and the called party by voice, through the meeting place, that the current call is a fraud call;

when the determining module determines that the target keyword comprises a booking keyword, executing a booking operation and announcing the booking result by voice to the calling party and the called party through the meeting place;

when the determining module determines that the target keyword comprises a query keyword, executing a query operation and announcing the query result by voice to the calling party and the called party through the meeting place;

when the determining module determines that the target keyword comprises a translation keyword, executing a translation operation and announcing the translation result by voice to the calling party and the called party through the meeting place; and

when the determining module determines that the target keyword comprises another keyword, executing the corresponding other operation and announcing the operation result by voice to the calling party and the called party through the meeting place.

In a possible design, the determining module is specifically configured to: receive text information sent by the device corresponding to the third party, where the text information is obtained by the device corresponding to the third party by recognizing mixed voice information received from the other server when the calling party and the called party have not communicated with each other within a preset duration, and the mixed voice information is obtained by the other server by mixing the voice information of the calling party and the voice information of the called party; and determine whether the text information includes the fraud keyword.

In one possible design, the determining module is specifically configured to match the text information in a fraud keyword database to determine whether the fraud keyword is included in the text information.

In one possible design, the execution module is specifically configured to: record the number of occurrences of the fraud keyword within a preset time period; determine, according to the number of occurrences of the fraud keyword, whether the current call is a fraud call; and, when the current call is determined to be a fraud call, remind the calling party and the called party that the current call is a fraud call.

In one possible design, the apparatus further includes: a storage module;

the storage module is configured to store the text information.

In one possible design, the current call is a voice call or a video call.

The beneficial effects of the call processing apparatus provided in the second aspect and each possible design of the second aspect may refer to the beneficial effects brought by each possible implementation manner of the first aspect, and are not described herein again.

In a third aspect, the present application provides a call processing apparatus, including: a communication interface, a memory for storing program instructions, and a processor for calling the program instructions in the memory to execute the call processing method according to the first aspect and any one of the possible designs of the first aspect.

In a fourth aspect, the present application provides a readable storage medium storing execution instructions, where, when at least one processor of a server executes the execution instructions, the server executes the call processing method according to the first aspect and any one of the possible designs of the first aspect.

In a fifth aspect, the present application provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the server may read the executable instructions from the readable storage medium, and the execution of the executable instructions by the at least one processor causes the server to implement the call processing method according to the first aspect and any one of the possible designs of the first aspect.

In a sixth aspect, the present application provides a chip, where the chip is connected to a memory, or a memory is integrated on the chip, and when a software program stored in the memory is executed, the method for processing a call as described above is implemented.

Drawings

Fig. 1 is a schematic diagram of a call system;

Fig. 2 is a flowchart of a call processing method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a call processing apparatus according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a call processing apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a call processing apparatus according to an embodiment of the present application;

Fig. 6 is a schematic hardware structure diagram of a call processing device according to an embodiment of the present application.

Detailed Description

In the embodiments of the present application, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.

Fig. 1 is a schematic diagram of a call system. As shown in Fig. 1, the call system may include: a first server, a core network device, a second server, a device corresponding to a calling party, a device corresponding to a called party, and a device corresponding to a third party.

As will be understood by those skilled in the art, during a call, the calling party is the party that actively initiates the call, the called party is the party that passively answers the call, and the third party is an intelligent robot. The numbers of calling parties, called parties, and third parties are not limited; each may be one or more. In general, the calling party and the called party can subscribe to various call services, each of which meets a different requirement, so that different call services can conveniently be provided to them. The call services may include, but are not limited to, a color ring back tone service, a short message service, a multimedia message service, a data traffic service, and other services. The call services are divided into two types: call services that require the participation of a third party and call services that do not.

In the present application, the device used by the calling party is a device corresponding to the calling party, the device used by the called party is a device corresponding to the called party, and the device used by the third party is a device corresponding to the third party. For convenience of explanation, a device corresponding to a calling party is simply referred to as a first device, a device corresponding to a called party is simply referred to as a second device, and a device corresponding to a third party is simply referred to as a third device.

When the calling party calls the called party, the calling party can dial the number of the called party on the first device and initiate a call request, which the core network device receives. The call request usually includes the number of the calling party and the number of the called party, and the core network device may detect, according to these two numbers, whether the calling party or the called party has subscribed to a call service requiring the participation of a third party.

When the core network device determines that neither the calling party nor the called party has subscribed to a call service requiring the participation of a third party, the core network device can directly connect the first call channel between the first device and the second device, so that the calling party successfully calls the called party.

When the core network device determines that either the calling party or the called party has subscribed to a call service requiring the participation of a third party, the core network device establishes a communication connection with the first server and sends it a trigger control request. The trigger control request asks the first server to control the call between the calling party and the called party, so that the calling party, the called party, and the third party can participate in the call at the same time.

When the first server receives the trigger control request, on the one hand, it may send a first call connection request to the core network device, requesting the core network device to connect the first call channel between the first device and the second device, so that the calling party successfully calls the called party. Once the first call channel is connected, the core network device may send the first server a first call connection response indicating that the channel has been connected, and the first server then sends the core network device a success response indicating that the trigger control request has been received.

On the other hand, because the call service between the calling party and the called party requires the participation of a third party, the first server can, by means of the second server, provide the calling party, the called party, and the third party with a virtual place for the call service, namely a meeting place, so that all three can participate in the call at the same time. Because the first server and the second server are communicatively connected, the first server can send the second server a meeting place creation request asking it to create a meeting place.
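The branch taken by the core network device, and the order of the messages exchanged once the first server is involved, can be traced with a short sketch. The function name, the set of subscriber numbers, and the message labels are hypothetical; the sketch only records the message order described above and performs no real signalling.

```python
from typing import List, Set


def process_call_request(
    calling_number: str,
    called_number: str,
    subscribers: Set[str],
) -> List[str]:
    """Return the ordered message trace for one call request.

    `subscribers` holds the numbers that have subscribed to a call service
    requiring the participation of a third party (an assumed representation).
    """
    trace: List[str] = []
    if calling_number not in subscribers and called_number not in subscribers:
        # No third party needed: the core network connects the channel directly.
        trace.append("core -> devices: connect first call channel")
        return trace
    trace.append("core -> first server: trigger control request")
    trace.append("first server -> core: first call connection request")
    trace.append("core -> first server: first call connection response")
    trace.append("first server -> core: success response")
    trace.append("first server -> second server: meeting place creation request")
    return trace
```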

After the meeting place has been created, the first server needs to control the calling party, the called party, and the third party to join the meeting place.

The present application does not limit the order in which the calling party, the called party, and the third party join the meeting place; they may join simultaneously or sequentially.

In one aspect, since the first call channel between the first device and the second device is already connected, the first server may send a first call disconnection request to the first device, requesting the first device to disconnect the first call channel with the second device. The first device then sends the first server a first call disconnection response, which indicates that the first device has disconnected the first call channel with the second device and carries the media information of the first device.

The media information of the first device includes, but is not limited to, a port number of the first device, a coding and decoding manner of the call service, and a media type (such as a video type or a voice type) of the call service.

Further, the first server may send a first meeting place joining request to the second server, where the first meeting place joining request is used to request that the first device join the meeting place and carries the media information of the first device. The second server can then match the media information of the first device against its own media information to obtain first matched media information that suits both the first device and the second server; media information can then be transmitted between the first device and the second server, that is, the first device has joined the meeting place. The second server may send the first server a first meeting place joining response indicating that the first device has joined the meeting place. The first meeting place joining response carries the first matched media information, which the first server can forward to the first device so that the first device can conduct the call in the meeting place according to the first matched media information.
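The matching step above is reminiscent of codec negotiation. The sketch below shows one plausible reading, in which matching means intersecting the codecs and media types supported by the device and by the second server and keeping the device's most preferred common entry. The `MediaInfo` type, the `match_media` function, the preference rule, and the choice to return the second server's port are all assumptions, not details given in the present application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MediaInfo:
    """Hypothetical media information of a device or of the second server."""
    port: int
    codecs: Tuple[str, ...]       # listed in order of preference
    media_types: Tuple[str, ...]  # e.g. ("voice", "video")


def match_media(device: MediaInfo, server: MediaInfo) -> Optional[MediaInfo]:
    """Return media information usable by both sides, or None if none exists."""
    codecs = tuple(c for c in device.codecs if c in server.codecs)
    types = tuple(t for t in device.media_types if t in server.media_types)
    if not codecs or not types:
        return None
    # Assumption: keep the device's most preferred common codec and media type,
    # and return the server-side port the device should send media to.
    return MediaInfo(port=server.port, codecs=codecs[:1], media_types=types[:1])
```

A `None` result corresponds to the case in which no common media information exists; how that case is handled is not described in the text above.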

In another aspect, since the first call channel between the first device and the second device is already connected, the first server may further send a second call disconnection request to the second device, requesting the second device to disconnect the first call channel with the first device. The second device then sends the first server a second call disconnection response, which indicates that the second device has disconnected the first call channel with the first device and carries the media information of the second device.

The media information of the second device includes, but is not limited to, a port number of the second device, a coding and decoding manner of the call service, and a media type (such as a video type or a voice type) of the call service.

Further, the first server may send a second meeting place joining request to the second server, where the second meeting place joining request is used to request that the second device join the meeting place and carries the media information of the second device. The second server can then match the media information of the second device against its own media information to obtain second matched media information that suits both the second device and the second server; media information can then be transmitted between the second device and the second server, that is, the second device has joined the meeting place. The second server may send the first server a second meeting place joining response indicating that the second device has joined the meeting place. The second meeting place joining response carries the second matched media information, which the first server can forward to the second device so that the second device can conduct the call in the meeting place according to the second matched media information.

In yet another aspect, because the first server needs to control the third party to join the meeting place, the first server may send a second call connection request to the third device, requesting the third device to join the meeting place. The third device may send the first server a second call connection response, which is used to indicate that the third device has joined the meeting place and carries the media information of the third device.

The media information of the third device includes, but is not limited to, a port number of the third device, a coding and decoding manner of the call service, and a media type (such as a video type or a voice type) of the call service.

Further, the first server may send a third meeting place joining request to the second server, where the third meeting place joining request is used to request that the third device join the meeting place and carries the media information of the third device. The second server can then match the media information of the third device against its own media information to obtain third matched media information that suits both the third device and the second server; media information can then be transmitted between the third device and the second server, that is, the third device has joined the meeting place.

The second server may send the first server a third meeting place joining response indicating that the third device has joined the meeting place. The third meeting place joining response carries the third matched media information, which the first server can forward to the third device so that the third device can conduct the call in the meeting place according to the third matched media information.

Therefore, the calling party, the called party and the third party are all joined in the meeting place, so that the third party participates in the conversation process of the calling party and the called party.
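The media matching performed by the second server when a device joins the meeting place can be sketched as follows. This is a minimal illustration only: the `MediaInfo` structure, its field names, and the rule of keeping the first codec common to both sides are assumptions made for the example and are not specified in the present application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MediaInfo:
    """Hypothetical container for the media information carried in a request."""
    port: int
    codecs: Tuple[str, ...]        # coding/decoding manners, in order of preference
    media_types: Tuple[str, ...]   # e.g. ("voice",) or ("voice", "video")


def match_media(server: MediaInfo, device: MediaInfo) -> Optional[MediaInfo]:
    """Return media information usable by both sides, or None if nothing matches."""
    codecs = tuple(c for c in server.codecs if c in device.codecs)
    types = tuple(t for t in server.media_types if t in device.media_types)
    if not codecs or not types:
        return None
    # Assumption: the device sends media to the server's port, and only the
    # server's most preferred common codec is kept.
    return MediaInfo(port=server.port, codecs=codecs[:1], media_types=types)
```

If no common codec or media type exists, the sketch returns `None`, which would correspond to the device not being added to the meeting place.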

During an actual call, the calling party and the called party may discuss booking items such as meal ordering, ticket booking, and room booking, query items such as weather queries, traffic condition queries, and queries of the number of remaining movie theater tickets, or other items. The calling party and the called party may also not understand each other's language, and one of them may even be a fraudster.

Accordingly, the user may have various needs during a call, such as making a booking, making a query, obtaining real-time simultaneous interpretation, and preventing communication fraud. In view of the above problems, the present application provides a call processing method and device that can satisfy various requirements of a user in real time during a call, thereby improving the call performance and the call experience of the user.

Next, a specific implementation process of the call processing method according to the embodiment of the present application is described in detail with reference to fig. 2 by taking the first server as an execution main body.

Fig. 2 is a flowchart of a call processing method according to an embodiment of the present application, and as shown in fig. 2, the call processing method according to the embodiment of the present application may include:

S101, when receiving a trigger control request sent by a core network device, the first server sends a meeting place creating request to a second server according to address information of the second server, where the meeting place creating request is used to instruct the second server to create a meeting place, and the trigger control request is used by the core network device to instruct the first server to control the current call when the core network device detects that a calling party and/or a called party needs a third party to participate in the current call.

With reference to fig. 1, when the calling party calls the called party, if the core network device detects that either the calling party or the called party needs a third party to participate in the current call, the core network device may send a trigger control request to the first server, where the trigger control request is used to request the first server to control the current call.

In this application, the trigger control request may include information indicating that the first server is requested to control the current call, where the information is agreed in advance by the first server and the core network device and may be in the form of a number, an identifier, a code, a protocol format, or the like. For example, when the information is the identifier A, the first server determines whether to control the current call by determining whether the identifier A is included in the trigger control request. The current call can be a voice call or a video call, and the type of the current call is not limited in the application.

Since the first server contains the address information of the second server, the first server may send a meeting place creating request to the second server, where the meeting place creating request is used to instruct the second server to create a meeting place providing a call to the calling party, the called party, and the third party.

In this application, the address information of the second server uniquely identifies the second server, and may specifically be a Uniform Resource Locator (URL) of the second server. The meeting place creating request may include information indicating that the second server is requested to create a meeting place, where the information is agreed in advance by the first server and the second server and may be in the form of a number, an identifier, a code, a protocol format, or the like. For example, when the information is the identifier B, the second server determines whether to create the meeting place by determining whether the identifier B is included in the meeting place creating request.

It should be noted that, when multiple calling parties call multiple called parties simultaneously and the core network device detects that the calling parties and/or the called parties need a third party to participate in the current calls, the core network device may send multiple trigger control requests, where each trigger control request is used to request the first server to control one current call and the number of trigger control requests is the same as the number of current calls, so that each call initiated by a calling party has its own meeting place for communicating with the called party and the third party. The number of current calls is the same as the number of call requests initiated by the calling parties. Furthermore, the first server may still send a meeting place creating request to the second server according to the address information of the second server to request the second server to create multiple meeting places. The meeting place creating request can not only indicate that the second server is requested to create meeting places, but also carry the number of meeting places, so that each group of calling party, called party, and third party has a virtual place for its own call. The number of meeting places is the same as the number of current calls.
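The handling of trigger control requests by the first server can be sketched as follows. This is an illustration under stated assumptions: messages are modelled as dictionaries, the field names `indication`, `url`, and `meeting_place_count` are hypothetical, and the identifiers "A" and "B" stand in for whatever information the devices have agreed on in advance.

```python
from typing import Iterable, Optional

TRIGGER_CONTROL_ID = "A"        # agreed by the core network device and the first server
CREATE_MEETING_PLACE_ID = "B"   # agreed by the first server and the second server


def is_trigger_control_request(message: dict) -> bool:
    """Check whether a message carries the agreed trigger control identifier."""
    return message.get("indication") == TRIGGER_CONTROL_ID


def build_create_request(messages: Iterable[dict],
                         second_server_url: str) -> Optional[dict]:
    """Build one meeting place creating request covering all triggered calls."""
    count = sum(1 for m in messages if is_trigger_control_request(m))
    if count == 0:
        return None  # no call needs to be controlled
    return {
        "url": second_server_url,               # address information of the second server
        "indication": CREATE_MEETING_PLACE_ID,  # "please create a meeting place"
        "meeting_place_count": count,           # one meeting place per current call
    }
```

The sketch carries the number of meeting places in a single request, matching the case in which several calls are triggered at the same time.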

S102, when receiving the meeting place creating response sent by the second server and controlling the calling party, the called party and the third party to join the meeting place, the first server obtains the voice information of the calling party and the voice information of the called party through the meeting place, where the meeting place creating response is used to indicate that the second server has created the meeting place.

In this application, the second server may send a meeting place creating response to the first server when the meeting place creating is finished, where the meeting place creating response is used to notify the first server that the second server has established the meeting place, and at this time, the first server may control the calling party, the called party, and the third party to join the meeting place.

The meeting place creating response may include information indicating that the second server has created the meeting place, where the information is agreed in advance by the first server and the second server and may be in the form of a number, an identifier, a code, a protocol format, or the like. For example, when the information is the identifier C, the first server determines whether the second server has created the meeting place by determining whether the identifier C is included in the meeting place creating response.

It should be noted that, when the second server creates multiple meeting places, the meeting place creating response may, in addition to indicating that the second server has created the meeting places, carry information that uniquely identifies each meeting place, such as the meeting place ID of each meeting place. The first server can thereby distinguish the meeting places and add the multiple groups of calling parties, called parties, and third parties to different meeting places, so that the groups do not interfere with one another's calls.

Further, in the process of the call between the calling party and the called party, the second server can obtain the call content between the calling party and the called party in the meeting place in real time. Furthermore, the second server may perform processing such as voice effect compensation and noise filtering on the call content of the calling party and the called party, and then send the processed call content to the first server as the voice information of the calling party and the voice information of the called party.
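The use of meeting place IDs to keep several groups apart can be sketched as follows. The response layout, the `meeting_place_ids` field, the ID format, and the identifier "C" are assumptions chosen for the example; the application only requires that each meeting place be uniquely identified.

```python
from typing import Dict, List, Tuple

CREATED_ID = "C"  # agreed by the first server and the second server

Group = Tuple[str, str, str]  # (calling party, called party, third party)


def create_meeting_places(count: int) -> dict:
    """Second server side: create `count` meeting places and report their IDs."""
    return {
        "indication": CREATED_ID,
        "meeting_place_ids": [f"mp-{i}" for i in range(1, count + 1)],
    }


def assign_groups(response: dict, groups: List[Group]) -> Dict[str, Group]:
    """First server side: put each group into its own meeting place."""
    if response.get("indication") != CREATED_ID:
        raise ValueError("meeting places have not been created")
    ids = response["meeting_place_ids"]
    if len(ids) < len(groups):
        raise ValueError("fewer meeting places than current calls")
    return dict(zip(ids, groups))
```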

In addition, the second server may send the voice information of the calling party and the voice information of the called party to the first server in various manners such as real-time, periodic, or non-periodic, and the specific implementation manner of the second server transmitting the voice information of the calling party and the voice information of the called party to the first server is not limited.

In a possible implementation manner, the second server may directly send the voice information of the calling party and the voice information of the called party to the first server.

In another possible implementation manner, the second server may perform processing such as blank information removal and junk information removal on the voice information of the calling party and the voice information of the called party, and then send the processed voice information of the calling party and the processed voice information of the called party to the first server.

Illustratively, it will be understood by those skilled in the art that mixing is an integration of sound from multiple sources into one stereo or monophonic audio track. The second server can mix the voice information of the calling party and the voice information of the called party to obtain the voice information after mixing, so that a complete conversation process is formed, voice recognition is facilitated, and the speed of the voice recognition is improved.
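As a rough illustration of mixing, the following sketch combines two streams of 16-bit PCM samples into one track by sample-wise addition with clipping. The sample format, the padding of the shorter stream with silence, and the function name `mix_pcm16` are assumptions; the application does not prescribe a mixing algorithm.

```python
from itertools import zip_longest
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(caller: Sequence[int], callee: Sequence[int]) -> List[int]:
    """Mix two mono 16-bit PCM streams into a single mono stream."""
    mixed = []
    # The shorter stream is padded with silence (zero samples).
    for a, b in zip_longest(caller, callee, fillvalue=0):
        total = a + b
        # Clip to the 16-bit range to avoid overflow when both parties speak.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Because both parties end up on one time-aligned track, a recognizer that receives the mixed stream sees the conversation in its original order.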

Further, the first server may obtain the text information by means of a speech recognition function of the third device to avoid reducing a processing speed of the first server. Specifically, the second server may transmit the mixed voice information to the third device, so that the third device may recognize the mixed voice information according to its own voice recognition function to convert the mixed voice information into text information. Furthermore, the third device can send the text information to the first server, so that the first server can store and process the text information conveniently, the text information can be used as a training sample to realize processes such as answer acquisition, feature extraction and the like, and processes such as question consultation and service push are flexibly realized.

S103, the first server executes operation corresponding to the target keyword according to the target keyword which is identified by the equipment corresponding to the third party from the voice information of the calling party and the voice information of the called party.

When the first server acquires the voice information of the calling party and the voice information of the called party, if the first server has a voice recognition function, the first server can recognize the voice information of the calling party and the voice information of the called party to determine whether the voice information of the calling party and the voice information of the called party contain the target keyword. If the first server does not have the voice recognition function, it can be known in conjunction with fig. 1 that the third device has the voice recognition function, and therefore, the first server can recognize the voice information of the calling party and the voice information of the called party by means of the third device to determine whether the voice information of the calling party and the voice information of the called party contain a target keyword, where the target keyword may indicate the requirements of the calling party and the called party during the call.

The target keyword is agreed in advance by the third device and the first server, and may be implemented in various forms such as a text, a voice, an identifier, a code, or a protocol format. For example, the target keyword may be the text "reserve two movie tickets for movie A at movie theater X after 5 pm", the identifier 1 representing "query the current weather conditions in Beijing", and so on.

In addition, the target keyword may further include a wake-up identifier, where the wake-up identifier is used to wake up the third device, so that the third device switches from the silent state to the working state. The wake-up identifier may be implemented in various forms such as a text, a voice, an identifier, a code, or a protocol format. For example, the wake-up identifier may be the voice "hi, wisdom!".
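The switch from the silent state to the working state can be sketched as a small state machine. The class name, the state labels, and the choice not to forward the wake-up utterance itself are assumptions for illustration; recognized speech is represented here as plain text.

```python
from typing import Optional


class ThirdDevice:
    """Hypothetical model of the third device's wake-up behaviour."""

    def __init__(self, wake_up_identifier: str = "hi, wisdom") -> None:
        self.wake_up_identifier = wake_up_identifier.lower()
        self.state = "silent"

    def hear(self, recognized_text: str) -> Optional[str]:
        """Return text to be examined for target keywords, or None while silent."""
        if self.state == "silent":
            if self.wake_up_identifier in recognized_text.lower():
                self.state = "working"  # woken up; later speech is processed
            return None
        return recognized_text
```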

Further, when the third device recognizes the target keyword, the third device may transmit the target keyword to the first server. In general, the first server performs different operations corresponding to different target keywords. Furthermore, the first server may execute at least one of the following operations corresponding to the target keyword according to the target keyword.

In one possible implementation, when the first server determines that the target keyword includes a fraud keyword, the first server reminds the calling party and the called party by voice, through the meeting place, that the current call is a fraud call.

In the present application, the third device may use information that frequently occurs in telephone fraud as a fraud keyword according to different types of fraud techniques or fraud means, and the fraud keyword is pre-agreed by the third device and the first server. Among them, fraud keywords may include, but are not limited to, "make money", "account number", and "enter identity information".

Further, when the first server determines that the target keyword includes a fraud keyword, the first server may determine that the current call is a fraud call. Furthermore, the first server can remind the calling party and the called party through the meeting place that the current call is a fraud call, for example by playing the voice "this is a fraud call", so that the calling party or the called party is prevented from being cheated and the anti-fraud requirement of the user is met.

The specific implementation form of the voice reminding mode is not limited in the application.

For example, if the target keyword includes the fraud keyword "make money", the first server can play the voice reminder "this is a fraud call" to the calling party and the called party through the meeting place.

Those skilled in the art will appreciate that textual form information is easier to filter than voice form information. Based on the content in S102, the first server may receive the text message sent by the third device, so as to facilitate the first server to determine whether the text message contains a fraud keyword. Specifically, since the fraud keyword database contains fraud keywords corresponding to various fraud techniques or various fraud means, the first server can match the text information in the fraud keyword database to determine whether the text information contains fraud keywords.

The fraud keyword database may be disposed in the first server, and may also be disposed in other devices, which is not limited in this application.

In addition, fraud keywords such as "make money" may also occur during a normal conversation between a calling party and a called party. If the first server determined the current call to be a fraud call as soon as a fraud keyword appeared once, normal calls could be misjudged and the call experience of the user would be reduced. Therefore, in order to increase the accuracy of determining fraud calls, the first server may record the number of occurrences of fraud keywords within a preset time period. The preset time period can be set according to actual conditions, such as 1 minute or 5 minutes. The first server then compares the number of occurrences of the fraud keywords with a preset number of times to obtain a comparison result. The preset number of times may be set according to an empirical value, such as 3 times or 5 times.
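Matching the text information against a fraud keyword database can be sketched as a simple substring search. The in-memory set below stands in for the database, and the case-insensitive comparison is an assumption; only the three example keywords named above are used.

```python
from typing import Iterable, List

# Stand-in for the fraud keyword database; a real deployment might keep it
# in the first server or in another device.
FRAUD_KEYWORDS = frozenset({
    "make money",
    "account number",
    "enter identity information",
})


def find_fraud_keywords(text: str,
                        database: Iterable[str] = FRAUD_KEYWORDS) -> List[str]:
    """Return the fraud keywords contained in the text, sorted alphabetically."""
    lowered = text.lower()
    return sorted(keyword for keyword in database if keyword in lowered)
```

An empty result means the text information contains no fraud keyword from the database.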

Further, the first server may determine whether the current call is a fraud call according to the comparison result. When the number of occurrences of the fraud keyword is greater than or equal to the preset number as a result of the comparison, the first server may determine that the current call is a fraud call. When the number of occurrences of the fraud keyword is smaller than the preset number as a result of the comparison, the first server may determine that the current call is not a fraud call. Therefore, the first server can more accurately identify the fraud calls, and the phenomenon that the experience of the user is reduced due to mistaken identification of the fraud calls is avoided.
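The comparison of the number of occurrences with the preset number of times can be sketched as follows. Interpreting the preset time period as a sliding window over the most recent occurrences is an assumption of this sketch, as are the class name and the default values of 60 seconds and 3 times.

```python
from collections import deque


class FraudKeywordCounter:
    """Counts fraud keyword occurrences within a preset time period."""

    def __init__(self, period_seconds: float = 60.0, preset_times: int = 3) -> None:
        self.period_seconds = period_seconds
        self.preset_times = preset_times
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one occurrence; return True if the call is judged a fraud call."""
        self._timestamps.append(timestamp)
        # Discard occurrences that fall outside the preset time period.
        while timestamp - self._timestamps[0] > self.period_seconds:
            self._timestamps.popleft()
        # Fraud call when occurrences >= preset number of times.
        return len(self._timestamps) >= self.preset_times
```

With these defaults, three occurrences within one minute mark the call as a fraud call, while isolated occurrences spread over a long call do not.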

In another possible implementation manner, when the first server determines that the target keyword includes a booking keyword, the first server performs a booking operation and reminds the calling party and the called party of the booking result by voice through the meeting place.

In the present application, the third device may use dialogues related to booking items that the user encounters daily, such as ordering food, ordering flowers, booking train tickets, booking plane tickets, booking radio tickets, and booking rooms, as the basis for extracting the booking keyword. That is, the booking keyword includes the specific content of the booking item, and the booking keyword is agreed in advance by the third device and the first server.

Further, when the first server determines that the target keyword comprises the booking keyword, the first server can execute the booking operation and remind the calling party and the called party of the booking result through the meeting place by voice.

The specific implementation form of the voice reminding mode is not limited in the application. The booking result may be a voice of "successful booking" or "failed booking", or may be a specific booking voice content, which is not limited in this application.

For example, if the target keyword includes the booking keyword "buy two tickets for X's concert", the first server can search for tickets for X's concert on an APP or web page for booking concert tickets. Furthermore, the first server can directly order two concert tickets by default, or can first notify the calling party and the called party by voice through the meeting place and, by continuing to apply the process of the present application, order the two concert tickets only after obtaining the agreement of the calling party and the called party, thereby realizing an intelligent booking operation and meeting the booking requirement of the user.

In another possible implementation manner, when the first server determines that the target keyword includes the query keyword, the first server performs a query operation and prompts a query result to the calling party and the called party by voice through a meeting place.

In the application, the third device may use dialogs related to query items daily used by the user, such as weather query, bus query, geographic location query, traffic condition query, movie theater remaining ticket number, as a basis for extracting the query keyword, that is, the query keyword includes specific content of the query item, and the query keyword is agreed in advance by the third device and the first server.

Further, when the first server determines that the target keyword contains the query keyword, the first server can execute the query operation and remind the calling party and the called party of the query result through the meeting place by voice.

The specific implementation form of the voice reminding mode is not limited in the application. The query result may be a "query success" or "query failure" voice, or may be a specific query voice content, which is not limited in this application.

For example, if the target keyword includes the query keyword "query the current weather condition in Beijing", the first server may query the current weather condition in Beijing through a web page or a weather APP, and notify the calling party and the called party of the result by voice through the meeting place, so as to satisfy the query requirement of the user.

In another possible implementation manner, when the first server determines that the target keyword comprises the translation keyword, the first server performs a translation operation and prompts a translation result to the calling party and the called party through a meeting place.

In the application, the third device uses call content with translation semantics recognized in the voice information of the calling party and the voice information of the called party as a basis for determining the translation keyword, or uses the presence of two or more different languages in the voice information of the calling party and the voice information of the called party as a basis for determining the translation keyword. That is, the translation keyword contains the specific content of the translation item, such as the types and number of languages to be translated, and the translation keyword is agreed in advance by the third device and the first server.

Further, when the first server determines that the target keyword contains the translation keyword, the first server can execute the translation operation and remind the calling party and the called party of the translation result through the meeting place.

The specific implementation form of the voice reminding mode is not limited in the application. The translation result may be a speech of "translation success" or "translation failure", or may be a specific translation speech content, which is not limited in this application.

For example, if the target keyword includes the translation keyword "translate English to Chinese", the first server may translate the English voice information appearing in the voice information of the calling party and/or the called party into Chinese voice information through a translation APP or online translation on a web page, and notify the calling party and the called party by voice through the meeting place, so as to satisfy the simultaneous interpretation requirement of the user.

In another possible implementation manner, when the first server determines that the target keyword includes other keywords, the first server performs other operations and reminds the calling party and the called party of the operation result through a meeting place by voice.

In the application, the user can further include other requirements in the conversation process, and the other requirements can be expressed through other keywords, so that when the first server determines that the target keywords contain other keywords, the first server can execute other operations and remind the operation result to the calling party and the called party through the meeting place by voice.

The specific implementation form of other keywords is not limited in the present application.

The target keyword may include any one of the keywords, or may include any combination of the keywords, which is not limited in this application. When the target keyword includes any combination of the multiple keywords, the first server may perform an operation corresponding to each keyword according to the specific process of the implementation manner, which is not described herein again.
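The mapping from target keywords to operations can be sketched as a dispatch table. The category labels, the handler functions, and the returned strings (standing in for the voice reminders played through the meeting place) are hypothetical; real booking, query, and translation operations would call external services.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], str]


def remind_fraud(content: str) -> str:
    return "this is a fraud call"


def book(content: str) -> str:
    return f"booking result for: {content}"


def query(content: str) -> str:
    return f"query result for: {content}"


def translate(content: str) -> str:
    return f"translation result for: {content}"


def other_operation(content: str) -> str:
    return f"other operation result for: {content}"


HANDLERS: Dict[str, Handler] = {
    "fraud": remind_fraud,
    "booking": book,
    "query": query,
    "translation": translate,
}


def execute_operations(target_keywords: List[Tuple[str, str]]) -> List[str]:
    """Run one operation per (category, content) pair; unknown categories fall back."""
    return [HANDLERS.get(category, other_operation)(content)
            for category, content in target_keywords]
```

When the target keyword combines several keywords, each pair in the list is handled in turn, so the combination case needs no separate logic in this sketch.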

Further, no matter any requirement of the user occurs in the call process, the first server can execute the operation corresponding to the target keyword according to the target keyword identified from the voice information of the calling party and the voice information of the called party, so as to meet various requirements of the user in real time.

According to the call processing method, when the core network device detects that the calling party and/or the called party need a third party to participate in the current call, a trigger control request for indicating the first server to control the current call can be sent to the first server. When the first server receives the trigger control request, a meeting place creating request indicating that the second server creates a meeting place can be sent to the second server according to the address information of the second server, so that the second server creates the meeting place and informs the first server that the second server establishes the meeting place through a meeting place creating response. Furthermore, when the first server receives the meeting place establishing response and controls the calling party, the called party and the third party to join the meeting place, the first server can acquire the voice information of the calling party and the voice information of the called party in real time or irregularly, and can execute the operation corresponding to the target keyword according to the target keyword identified by the equipment corresponding to the third party from the voice information of the calling party and the voice information of the called party, so that various requirements of the user are met in real time in the conversation process, the conversation performance of the user is improved, and the conversation experience of the user is improved.

Illustratively, an embodiment of the present application further provides a call processing apparatus. Fig. 3 is a schematic structural diagram of the call processing apparatus provided in an embodiment of the present application. As shown in fig. 3, the call processing apparatus 100 may exist independently, for example as a server, or may be integrated in another device, and may communicate with a core network device, another server (the second server in fig. 1), a device corresponding to a calling party, a device corresponding to a called party, and a device corresponding to a third party, so as to implement the operations of the first server in any one of the above method embodiments. The call processing apparatus 100 in an embodiment of the present application may include:

a sending module 101, configured to send a meeting place creating request to another server according to address information of the other server when the receiving module 102 receives a trigger control request sent by a core network device, where the meeting place creating request is used to instruct the other server to create a meeting place, and the trigger control request is used for instructing, when the core network device detects that a calling party and/or a called party needs a third party to participate in a current call, the server to control the current call;

an obtaining module 103, configured to obtain, when the receiving module 102 receives a meeting place creating response sent by the another server and the joining module 104 controls the calling party, the called party, and the third party to join the meeting place, voice information of the calling party and voice information of the called party through the meeting place, where the meeting place creating response is used to indicate that the another server has created the meeting place;

the executing module 105 is configured to execute an operation corresponding to a target keyword according to the target keyword identified by the device corresponding to the third party from the voice information of the calling party and the voice information of the called party.

Fig. 4 is a schematic structural diagram of a call processing apparatus according to an embodiment of the present application, and as shown in fig. 4, the call processing apparatus 100 may further include, based on the structure shown in fig. 3: a determination module 106;

in some embodiments, the executing module 105 is specifically configured to, according to a target keyword that is identified by the device corresponding to the third party from the voice information of the calling party and the voice information of the called party, execute at least one of the following operations corresponding to the target keyword, including:

when determining module 106 determines that the target keyword comprises a fraud keyword, voice-alerting the calling party and the called party through the meeting place that the current call is a fraud call;

when the determining module 106 determines that the target keyword comprises a booking keyword, performing booking operation and reminding a booking result to the calling party and the called party through the meeting place by voice;

when the determining module 106 determines that the target keyword comprises a query keyword, executing a query operation and prompting a query result to the calling party and the called party through the meeting place by voice;

when the determining module 106 determines that the target keyword comprises a translation keyword, executing a translation operation and reminding a translation result to the calling party and the called party through the meeting place by voice; and

when the determining module 106 determines that the target keyword comprises other keywords, executing other operations and reminding an operation result to the calling party and the called party through the meeting place by voice.

In some embodiments, the determining module 106 is specifically configured to receive text information sent by the device corresponding to the third party, where the text information is obtained by the device corresponding to the third party identifying the received voice information after audio mixing sent by the other server when the calling party and the called party do not communicate within a preset time period, and the voice information after audio mixing is obtained by the other server mixing the voice information of the calling party and the voice information of the called party; and judging whether the text information contains the fraud keywords.

In some embodiments, the determining module 106 is configured to match the text information in a fraud keyword database to determine whether the fraud keyword is included in the text information.

In some embodiments, the executing module 105 is configured to record the number of occurrences of the fraud keyword within a preset time period; judging whether the current call is a fraud call or not according to the occurrence frequency of the fraud keywords; and when the current call is determined to be a fraud call, reminding the calling party and the called party that the current call is a fraud call.

Fig. 5 is a schematic structural diagram of a call processing apparatus according to an embodiment of the present application, and as shown in fig. 5, the call processing apparatus 100 may further include, on the basis of the structure shown in fig. 4: a storage module 107;

the storage module 107 is configured to store the text information.

In some embodiments, the current call is a voice call or a video call.

The call processing apparatus in the embodiment of the application may be configured to execute the technical solutions in the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.

Illustratively, an embodiment of the present application further provides a call processing apparatus. Fig. 6 is a schematic diagram of a hardware structure of the call processing apparatus provided in an embodiment of the present application. As shown in fig. 6, the call processing apparatus 200 may exist independently, for example as a server, or may be integrated in another device, and may communicate with a core network device, another server (the second server in fig. 1), a device corresponding to a calling party, a device corresponding to a called party, and a device corresponding to a third party, so as to implement the operations of the first server in any method embodiment described above. The call processing apparatus 200 in an embodiment of the present application may include: a memory 201 and a processor 202. The memory 201 and the processor 202 may be connected by a bus 203.

The memory 201 is configured to store program code.

The processor 202 calls the program code and, when the program code is executed, performs the call processing method in any of the above embodiments. For details, refer to the description of the method embodiments above.

Optionally, the call processing apparatus 200 in the embodiment of the present application further includes a communication interface 204, which may be connected to the processor 202 through the bus 203. The processor 202 may control the communication interface 204 to implement the above-described receiving and sending functions of the call processing apparatus 200.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of modules is only a logical division, and other divisions are possible in an actual implementation: a plurality of modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be in an electrical, mechanical, or other form.

Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each module may exist alone physically, or two or more modules may be integrated into one unit. The unit formed by the modules may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.

An integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present application.

It should be understood that the processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules within the processor.

The memory may comprise a high-speed RAM, and may further comprise a non-volatile memory (NVM), such as at least one disk memory. The memory may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

The application also provides a readable storage medium, in which an execution instruction is stored, and when at least one processor of the server executes the execution instruction, the server executes the call processing method in the above method embodiment.

The application also provides a chip. The chip is connected with a memory, or the chip is integrated with a memory, and when a software program stored in the memory is executed, the call processing method in the above method embodiment is implemented.

The present application also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the server may read the execution instruction from the readable storage medium, and the execution of the execution instruction by the at least one processor causes the server to implement the call processing method in the above method embodiment.

Those of ordinary skill in the art will understand that the above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

Claims

1. A call processing method, characterized by comprising:

When receiving a trigger control request sent by a core network device, a first server sends a site creation request to a second server according to address information of the second server, where the site creation request is used to instruct the second server to create a site, and the trigger control request is used by the core network device to instruct the first server to control the current call when the core network device detects that the calling party and/or the called party needs a third party to participate in the current call;

When the first server receives a site creation response sent by the second server and controls the calling party, the called party, and the third party to join the site, the first server obtains the voice information of the calling party and the voice information of the called party through the site, where the site creation response is used to indicate that the second server has established the site; and

The first server performs an operation corresponding to the target keyword according to the target keyword recognized by the device corresponding to the third party from the voice information of the calling party and the voice information of the called party.

2. The method according to claim 1, wherein the first server performing the operation corresponding to the target keyword, according to the target keyword recognized by the device corresponding to the third party from the voice information of the calling party and the voice information of the called party, comprises at least one of the following operations:

When determining that the target keyword includes a fraudulent keyword, the first server reminds the calling party and the called party by voice, through the conference site, that the current call is a fraudulent call;

When determining that the target keyword includes a reservation keyword, the first server performs a reservation operation and informs the calling party and the called party of the reservation result by voice through the conference site;

When determining that the target keyword includes a query keyword, the first server performs a query operation and informs the calling party and the called party of the query result by voice through the conference site;

When determining that the target keyword includes a translation keyword, the first server performs a translation operation and informs the calling party and the called party of the translation result by voice through the conference site; and

When determining that the target keyword includes another keyword, the first server performs another operation and informs the calling party and the called party of the operation result by voice through the conference site.

3. The method according to claim 2, wherein the first server determines that the target keyword includes a fraudulent keyword, comprising:

The first server receives text information sent by the device corresponding to the third party, where the text information is obtained by the device corresponding to the third party by recognizing the mixed voice information received from the second server when the calling party and the called party do not talk within a preset period of time, and the mixed voice information is obtained by the second server by mixing the voice information of the calling party and the voice information of the called party; and

The first server determines whether the fraudulent keyword is included in the text information.

4. The method according to claim 3, wherein the first server determining whether the text information contains the fraudulent keyword comprises:

The first server matches the text information against a fraudulent keyword database to determine whether the text information contains the fraudulent keyword.

5. The method according to any one of claims 2-4, wherein the first server reminding the calling party and the called party by voice that the current call is a fraudulent call comprises:

The first server records the number of occurrences of the fraudulent keyword within a preset time period;

The first server determines whether the current call is a fraudulent call according to the number of occurrences of the fraudulent keyword; and

When determining that the current call is a fraudulent call, the first server reminds the calling party and the called party by voice that the current call is a fraudulent call.

6. The method according to claim 3 or 4, wherein the method further comprises:

The first server stores the text information.

7. The method according to any one of claims 1-4, wherein the current call is a voice call or a video call.

8. A call processing device, wherein the call processing device is applied to a server and comprises:

A sending module, configured to, when a receiving module receives a trigger control request sent by a core network device, send a site creation request to another server according to address information of the other server, where the site creation request is used to instruct the other server to create a conference site, and the trigger control request is used by the core network device to instruct the server to control the current call when the core network device detects that the calling party and/or the called party needs a third party to participate in the current call;

An acquiring module, configured to, when the receiving module receives a site creation response sent by the other server and a joining module controls the calling party, the called party, and the third party to join the site, acquire the voice information of the calling party and the voice information of the called party through the site, where the site creation response is used to indicate that the other server has established the site; and

An execution module, configured to execute an operation corresponding to the target keyword according to the target keyword recognized by the device corresponding to the third party from the voice information of the calling party and the voice information of the called party.

9. The call processing device according to claim 8, wherein the execution module is specifically configured to perform, according to the target keyword recognized by the device corresponding to the third party from the voice information of the calling party and the voice information of the called party, at least one of the following operations corresponding to the target keyword:

When a determining module determines that the target keyword includes a fraudulent keyword, remind the calling party and the called party by voice, through the conference site, that the current call is a fraudulent call;

When the determining module determines that the target keyword includes a reservation keyword, perform a reservation operation and inform the calling party and the called party of the reservation result by voice through the conference site;

When the determining module determines that the target keyword includes a query keyword, perform a query operation and inform the calling party and the called party of the query result by voice through the conference site;

When the determining module determines that the target keyword includes a translation keyword, perform a translation operation and inform the calling party and the called party of the translation result by voice through the conference site; and

When the determining module determines that the target keyword includes another keyword, perform another operation and inform the calling party and the called party of the operation result by voice through the conference site.

10. The call processing device according to claim 9, wherein the determining module is specifically configured to: receive text information sent by the device corresponding to the third party, where the text information is obtained by the device corresponding to the third party by recognizing the mixed voice information received from the other server when the calling party and the called party do not talk within a preset period of time, and the mixed voice information is obtained by the other server by mixing the voice information of the calling party and the voice information of the called party; and determine whether the text information contains the fraudulent keyword.

11. The call processing device according to claim 10, wherein the determining module is configured to match the text information against a fraudulent keyword database to determine whether the text information contains the fraudulent keyword.

12. The call processing device according to any one of claims 9-11, wherein the execution module is configured to: record the number of occurrences of the fraudulent keyword within a preset time period; determine, according to the number of occurrences of the fraudulent keyword, whether the current call is a fraudulent call; and, when it is determined that the current call is a fraudulent call, remind the calling party and the called party by voice that the current call is a fraudulent call.

13. The call processing device according to claim 10 or 11, wherein the device further comprises: a storage module;

The storage module is used for storing the text information.

14. The call processing device according to claim 8, wherein the current call is a voice call or a video call.

15. A readable storage medium, wherein a computer program is stored on the readable storage medium, and when the computer program is executed, the call processing method according to any one of claims 1-7 is implemented.

16. A call processing device, characterized by comprising a memory and a processor, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions in the memory to execute the call processing method according to any one of claims 1-7.