CN111916086B - Voice interaction control method, device, computer equipment and storage medium - Google Patents
- Publication number: CN111916086B (application CN202010619579.5A)
- Authority
- CN
- China
- Prior art keywords
- target user
- voice data
- voiceprint
- user
- operation authority
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The application relates to a voice interaction control method, apparatus, computer device, and storage medium. The method comprises the following steps: receiving voice data input by a user in a current interaction scene; if the voice data is recognized to contain voice data of a non-target user, determining the operation authority of the non-target user according to that voice data; and, if the operation authority of the non-target user contains the operation authority corresponding to the current interaction scene, executing the corresponding operation according to the semantic content and the operation authority of the voice data. The scheme can identify non-target users besides the target user and execute the corresponding operation after authority verification, allowing a non-target user to assist the target user in completing interactive operations in specific scenes and improving operation efficiency and convenience.
Description
Technical Field
The present disclosure relates to the field of human-computer interaction technologies, and in particular, to a method and apparatus for controlling voice interaction, a computer device, and a storage medium.
Background
With the development of artificial intelligence and speech recognition technology, the traditional man-machine interaction mode has changed greatly, giving rise to man-machine interaction based on voice interaction control. Interactive voice operating systems on mobile terminal devices, computers, and electrical appliances of all types are replacing the graphical-user-interface interaction mode of existing operating systems.
The interactive voice operation mode relies mainly on voice input and voice output, with other input and output modes as auxiliary. For example, when remotely operating a terminal, the user can issue commands by voice; the terminal recognizes the operation instruction in the voice and performs the corresponding operation.
However, in existing interactive voice operation schemes, when the user performs a relatively private operation on the terminal device, the device generally recognizes only the voice operation instructions of the account owner. In some situations, if the account owner is unable to operate conveniently and requires the assistance of a parent or another person, that person's voice operations are not recognized by the terminal device, so the operation cannot proceed. The operational convenience of existing interactive voice operation schemes is therefore limited.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a voice interaction control method, apparatus, computer device, and storage medium capable of improving the operational convenience.
The embodiment of the invention provides a voice interaction control method, which comprises the following steps:
receiving voice data input by a user in a current interaction scene;
if the voice data are recognized to contain the voice data of the non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
and if the operation authority of the non-target user comprises the operation authority corresponding to the current interaction scene, executing corresponding operation according to the semantic content and the operation authority of the voice data.
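The three steps above can be sketched as a minimal control loop. This is an illustrative sketch, not the patent's implementation; the function names, the dictionary-shaped voice data, and the permission structure are all assumptions:

```python
def identify_speaker(voice_data):
    # Stand-in for voiceprint recognition: for illustration, the speaker id
    # is carried directly in the input.
    return voice_data["speaker"]

def execute(voice_data, scene):
    return f"executed '{voice_data['command']}' in scene '{scene}'"

def handle_voice_input(voice_data, scene, target_user, permissions):
    """Receive voice data, identify the speaker, check authority, then act."""
    speaker = identify_speaker(voice_data)
    if speaker == target_user:
        return execute(voice_data, scene)          # target user: act directly
    if scene in permissions.get(speaker, set()):   # non-target user: verify authority
        return execute(voice_data, scene)
    return None                                    # no authority: ignore the input
```

Voice data from an unknown or unauthorized speaker simply falls through to `None`, matching the "ignore and keep listening" behavior described later in the description.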
In one embodiment, identifying whether the voice data includes voice data of a non-target user specifically includes:
extracting voiceprint features in voice data;
and determining whether the user corresponding to the voice data is a target user according to the extracted voiceprint characteristics.
In one embodiment, determining whether the user corresponding to the voice data is the target user includes:
the extracted voiceprint features are sent to a third-party voiceprint service platform, and the voiceprint features of the target user are prestored in the third-party voiceprint service platform;
receiving an identification result returned by the third-party voiceprint service platform;
and determining whether the user corresponding to the voice data is a target user or not according to the identification result.
In one embodiment, determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint feature includes:
matching the extracted voiceprint characteristics with the voiceprint characteristics of a pre-stored target user to obtain a voiceprint matching result;
and determining whether the user corresponding to the voice data is a target user according to the voiceprint matching result.
In one embodiment, determining the operation authority of the non-target user according to the voice data of the non-target user includes:
matching the extracted voiceprint features in a preset voiceprint feature database;
if a pre-stored voiceprint feature consistent with the extracted voiceprint feature is matched, searching the identity data of the non-target user according to that pre-stored voiceprint feature;
and acquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identification data.
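The lookup described in these steps can be sketched as follows. The tuple-based voiceprint database and exact-match comparison are simplifying assumptions; a real system would score similarity between feature vectors rather than compare them for equality:

```python
def lookup_permissions(extracted_print, voiceprint_db, permission_db, scene):
    """Match an extracted voiceprint against pre-stored contact voiceprints,
    resolve the contact's identity data, then fetch the scene permissions."""
    for stored_print, identity in voiceprint_db:
        if stored_print == extracted_print:        # exact match stands in for scoring
            return permission_db.get(identity, {}).get(scene, set())
    return set()                                   # unknown speaker: no permissions
```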
In one embodiment, the identification result includes user identification data;
determining the operation authority of the non-target user according to the voice data of the non-target user comprises:
acquiring identity data of a non-target user;
and inquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identification data of the non-target user.
In one embodiment, if the operation authority of the non-target user includes the operation authority corresponding to the current interaction scene, executing the corresponding operation according to the semantic content and the operation authority of the voice data includes:
extracting operation item keywords contained in semantic content of voice data;
determining the interaction type of the current interaction scene according to the semantic content;
and if the operation item keywords and the interaction types are consistent with the operation authority of the current interaction scene, executing corresponding operations according to the operation item keywords.
In one embodiment, the method further comprises:
if the operation authority of the non-target user does not contain the operation authority corresponding to the current interaction scene, the voice data of the non-target user is ignored, and the step of receiving the voice data input by the user in the current interaction scene is returned.
The embodiment of the invention provides a voice interaction control device, which comprises:
the voice data receiving module is used for receiving voice data input by a user in a current interaction scene;
the permission data acquisition module is used for confirming the operation permission of the non-target user according to the voice data of the non-target user if the voice data are recognized to contain the voice data of the non-target user;
and the data processing module is used for executing corresponding operation according to the semantic content and the operation authority of the voice data if the operation authority of the non-target user contains the operation authority corresponding to the current interaction scene.
The embodiment of the invention provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the following steps when executing the computer program:
receiving voice data input by a user in a current interaction scene;
if the voice data are recognized to contain the voice data of the non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
and if the operation authority of the non-target user contains the operation authority corresponding to the current interaction scene, executing the corresponding operation according to the semantic content and the operation authority of the voice data.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
receiving voice data input by a user in a current interaction scene;
if the voice data are recognized to contain the voice data of the non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
and if the operation authority of the non-target user contains the operation authority corresponding to the current interaction scene, executing the corresponding operation according to the semantic content and the operation authority of the voice data.
According to the voice interaction control method, apparatus, computer device, and storage medium, voice data input by a user in the current interaction scene is received and the speaker's identity is recognized. When the voice data is recognized to contain voice data of a non-target user, operation authority verification is performed; if the non-target user holds the relevant operation authority, the corresponding operation is executed according to the semantic content of the voice data combined with the operation authority, completing the user interaction. The scheme can identify non-target users besides the target user and execute the corresponding operation after authority verification, allowing a non-target user to assist the target user in completing interactive operations in specific scenes and improving operation efficiency and convenience.
Drawings
FIG. 1 is an application environment diagram of a voice interaction control method in one embodiment;
FIG. 2 is a flow chart of a method of controlling voice interaction in one embodiment;
FIG. 3 is a flowchart of a voice interaction control method according to another embodiment;
FIG. 4 is a flow chart of a user identification step in one embodiment;
FIG. 5 is a block diagram of a voice interaction control apparatus in one embodiment;
FIG. 6 is a block diagram of a voice interaction control apparatus according to another embodiment;
fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The voice interaction control method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. Specifically, the user performs man-machine interaction on the terminal 102 by inputting voice in an interaction scene. The terminal 102 collects the voice data input by the user in real time and sends it to the server 104 for user identity recognition. The server 104 receives the voice data input in the current interaction scene and sent by the terminal 102; if the voice data is recognized to contain voice data of a non-target user, the operation authority of the non-target user is determined according to that voice data; and if the operation authority of the non-target user contains the operation authority corresponding to the current interaction scene, the corresponding operation is performed according to the semantic content and the operation authority of the voice data. The terminal 102 may be, but is not limited to, various electronic devices, computers, notebook computers, smartphones, tablet computers, and portable wearable devices; the server 104 may be implemented by a stand-alone server or a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a voice interaction control method is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:
step 202, receiving voice data input by a user in a current interaction scene.
The voice data is the speech input by the user in the current interaction scene; specifically, it can be captured by a microphone, headset, or other recording-capable device of the terminal. In this embodiment, the interaction scenario takes a home banking transaction as an example: a user logs in to an account and inputs voice data (voice commands) in front of a transaction terminal (hereinafter, the terminal) to operate it; further, the transaction terminal may be provided with a camera to collect the user's face image to identify the user or ensure transaction security. During the interaction, people other than the account owner of the current transaction (i.e., the target user) may be present; although no face image of them is acquired, their speech may be picked up, and such non-target-user speech must be distinguished and subjected to identity recognition to determine whether it should be adopted. Therefore, during the interaction, identity recognition is performed on each piece of received voice data to determine the relationship between the speaker of the voice data and the target user of the current interaction scene.
Step 204, if it is recognized that the voice data includes voice data of a non-target user, determining an operation authority of the non-target user according to the voice data of the non-target user.
In a practical application scenario, certain interactions (or transactions) may allow non-target users to assist by voice, taking into account scenarios of use within a family. For example, a parent may help a young child complete an interaction, a child may assist a parent, and assisted interaction may be allowed between any two people who trust each other. Therefore, in implementation, it is necessary to identify each piece of received voice data and verify whether it was input by the account owner (target user) of the current interaction scene. Specifically, the voice data is identified by voiceprint recognition technology. If the user identity corresponding to the current voice data is identified as the target user, the related operation is performed according to the target user's voice data. If the identity is identified as a non-target user, the authority data of the non-target user must be acquired (or queried) to verify whether that user's voice data can be adopted. The authority settings can be freely authorized by the user before the system is used and are not limited herein. In actual operation, the target user can add one or more contacts (i.e., non-target users) to their account and fill in the contacts' identity information, role information, authority information, and so on.
The voiceprint features and identity data of each contact are pre-stored to facilitate identity recognition. Meanwhile, to facilitate assisted transactions, the contact's authority data for different interaction scenes can be preset, and the identity data is associated with both the voiceprint features and the operation authority data: the corresponding operation authority data and voiceprint features can be found through the identity data, and the identity data and operation authority data can likewise be found through the voiceprint features.
And 206, if the operation authority data of the non-target user contains the operation authority for the current interaction scene, executing corresponding operation according to the semantic content and the operation authority of the voice data.
In practical applications, the operation authority data may include, but is not limited to, operation authority for a specific interaction scene and data operation authority within that scene. For example, in a home banking transaction scenario, the operation authority data may cover transaction types, such as inquiry, transfer, and deposit and withdrawal operations, as well as transaction data limits, such as the limit of a single transaction and the number of transactions per day. If the non-target user does not have the corresponding operation authority for the current interaction scene, the voice data input by the non-target user is ignored and the process returns to step 202, continuing to identify input voice data in real time. If the operation authority data of the non-target user contains operation authority applicable to the current interaction scene, the non-target user belongs to the contacts added in advance by the target user, the non-target user's voice data is judged valid, and the corresponding operation is executed according to the semantic content and the operation authority of the voice data. In a specific application, the voice data input by the user may contain a corresponding operation item keyword, such as "inquire the current balance"; text recognition can be performed on the voice data to obtain its semantic content, and the corresponding operation is then performed according to the interactive operation keyword contained in the semantic content, combined with the operation authority for the current interaction scene.
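A permission record along these lines might look like the following. Every field name and value is a hypothetical example for the home-banking scenario, not a format defined by the patent:

```python
# Hypothetical permission record for one contact added by the account owner
contact_permissions = {
    "identity": "contact-001",
    "scenes": {
        "banking": {
            "operations": {"balance_query", "deposit"},   # transfer not granted
            "single_transaction_limit": 500.0,            # per-transaction cap
            "daily_transaction_count": 3,                 # transactions per day
        }
    },
}

def can_operate(perms, scene, operation, amount=0.0):
    """Check both the operation permission and the transaction-data limits."""
    scene_perms = perms["scenes"].get(scene)
    if scene_perms is None or operation not in scene_perms["operations"]:
        return False
    return amount <= scene_perms["single_transaction_limit"]
```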
In the voice interaction control method, voice data input by a user in the current interaction scene is received and the speaker's identity is recognized. When the voice data is recognized to contain voice data of a non-target user, operation authority verification is performed; if the non-target user holds the relevant operation authority, the corresponding operation is executed according to the semantic content of the voice data combined with the operation authority, completing the user interaction. The scheme can identify non-target users besides the target user and execute the corresponding operation after authority verification, allowing a non-target user to assist the target user in completing interactive operations in specific scenes and improving operation efficiency and convenience.
In one embodiment, as shown in fig. 3, before step 204, the method further includes:
step 203, extracting voiceprint features in the voice data, and determining whether the user corresponding to the voice data is a target user according to the extracted voiceprint features.
Voiceprint features refer to the sound-wave spectrum characteristics that carry the user's speech information. In specific implementation, user identity can be identified from the voice data by using voiceprint recognition technology to extract the voiceprint features; because voiceprint features are unique, the identity of the speaker can be determined accordingly. In this embodiment, since voiceprint features are both specific and stable, using them for identity recognition enables rapid and accurate identification of user identity.
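As a toy illustration of spectrum-based features, the sketch below computes a naive DFT magnitude spectrum from one audio frame. Real voiceprint systems use far richer features (e.g. MFCCs or i-vectors); this only makes the spectrum idea concrete:

```python
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitude spectrum of a single audio frame -- a stand-in
    for production voiceprint features."""
    n = len(frame)
    return [
        abs(sum(frame[t] * complex(math.cos(2 * math.pi * k * t / n),
                                   -math.sin(2 * math.pi * k * t / n))
                for t in range(n)))
        for k in range(n // 2)
    ]
```

For a pure sine at frequency bin 2, the spectrum's energy concentrates at index 2, which is what a feature extractor would pick up on.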
In one embodiment, as shown in fig. 4, determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint feature includes:
step 223, the extracted voiceprint features are sent to a third-party voiceprint service platform, and the voiceprint features of the target user are prestored in the third-party voiceprint service platform;
step 243, receiving an identification result returned by the third-party voiceprint service platform;
step 263, determining whether the user corresponding to the voice data is the target user according to the identification result.
A voiceprint service is a comprehensive technical service that provides solutions for speech recognition, voiceprint modeling, authentication, identification, and so on, with a voiceprint recognition engine as its core. In practical applications, whether for face recognition or voiceprint recognition, the corresponding service can be obtained by calling a third party's service platform interface. In this embodiment, an interface of a third-party voiceprint service platform may be called: the extracted voiceprint features are sent to the platform, which has pre-stored the voiceprint features of the target user; the platform performs voiceprint recognition on the extracted features against the pre-stored ones and returns a corresponding identification result. Specifically, the third-party voiceprint service platform can reduce the influence of channel differences by means of i-vector and PLDA techniques, identifying voiceprint features with improved recognition performance. The identity recognition result may include user identity information such as identity data and role information (target user or non-target user), or may indicate that no corresponding identity was recognized. By calling the third-party voiceprint service platform, the up-front work of building a voiceprint recognition function can be saved, and identity recognition can be performed quickly and conveniently to obtain a voiceprint recognition result.
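A call to such a platform might be shaped as below. The endpoint URL, request format, and response fields (`matched`, `identity`, `role`) are entirely hypothetical; an actual third-party voiceprint service defines its own API:

```python
import json
import urllib.request

VOICEPRINT_API = "https://voiceprint.example.com/v1/identify"  # hypothetical endpoint

def request_identification(features, timeout=5):
    """POST extracted voiceprint features to the platform (sketch only)."""
    payload = json.dumps({"features": features}).encode("utf-8")
    req = urllib.request.Request(VOICEPRINT_API, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def parse_identification(response_text):
    """Interpret the returned result as (identity, role), or (None, None)
    when no corresponding identity was recognized."""
    result = json.loads(response_text)
    if not result.get("matched"):
        return None, None
    return result["identity"], result.get("role", "non-target")
```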
In one embodiment, determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint feature includes: and matching the extracted voiceprint characteristics with the voiceprint characteristics of the pre-stored target user to obtain a voiceprint matching result, and determining whether the user corresponding to the voice data is the target user according to the voiceprint matching result.
Besides invoking a third-party voiceprint service platform to identify the user identity, the voiceprint features of the target user can be pre-stored in a database, and the extracted voiceprint features can be matched directly against them during each recognition: if the match is consistent, the speaker is the target user; if inconsistent, the speaker is a non-target user. In this embodiment, matching the extracted voiceprint features directly against the pre-stored target-user voiceprint features, without depending on a third-party voiceprint service platform, ensures the privacy of user data and improves the operational security of the interaction.
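Local matching reduces, in the simplest case, to comparing feature vectors against the stored target-user voiceprint. Cosine similarity with a fixed threshold is one common choice; the threshold value here is an assumption, not a figure from the patent:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_target_user(extracted, stored_target_print, threshold=0.95):
    """Match locally against the pre-stored target-user voiceprint,
    without calling a third-party platform."""
    return cosine_similarity(extracted, stored_target_print) >= threshold
```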
In one embodiment, the identification result includes user identification data;
determining the operation authority of the non-target user according to the voice data of the non-target user comprises: acquiring identity data of a non-target user; and inquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identification data of the non-target user.
As described in the above embodiment, if a third-party voiceprint service platform identifies the user identity, the returned identification result may include user identity data; the identity data may be an identity number, an ID card number, a mobile phone number, or other data that distinguishes user identities. If the returned result includes the identity data of a non-target user, the corresponding operation authority data can be found according to that identity data. In another embodiment, determining the operation authority of the non-target user according to the voice data may further include: matching the extracted voiceprint features in a preset voiceprint feature database; if a pre-stored voiceprint feature consistent with the extracted one is matched, then, since pre-stored voiceprint features are associated with identity data, the preset identity data of the non-target user can be found from the pre-stored voiceprint feature, and the operation authority corresponding to the current interaction scene can then be obtained from the identity data. In this embodiment, the operation authority data can be found quickly and accurately according to the identity data.
In one embodiment, as shown in fig. 3, according to the semantic content and the interaction type, in combination with the operation authority of the current interaction scenario, performing the corresponding operation includes: step 226, extracting operation item keywords contained in semantic content of the voice data, determining the interaction type of the current interaction scene according to the semantic content, and executing corresponding operation according to the operation item keywords if the operation item keywords and the interaction type are consistent with the operation authority of the current interaction scene.
The interaction types of an interaction scene may include, but are not limited to, query interactions, information-update interactions, function-selection interactions, and the like. In implementation, extracting the operation item keywords from the semantic content may involve performing semantic recognition (or text recognition) on the voice data to obtain the semantic content, and then applying keyword recognition to the resulting semantic content. After the operation item keywords are extracted, it is judged whether the operation item keywords together with the interaction type satisfy the non-target user's operation authority for the current interaction scene; if they do, the corresponding operation is executed. For example, if the operation item contained in the non-target user's semantic content is "check balance" and the non-target user has the operation authority for the current transaction, the "check balance" operation item can be taken as the next operation of the current transaction. If the current transaction type and the operation items in the non-target user's voice content do not accord with the non-target user's operation authority for the current interaction scene, the operation contained in the voice content is ignored. For example, if the non-target user's voice content is "check balance" but the user has no operation authority for the current transaction or no "check balance" authority, the "check balance" operation item is not taken as the next operation; the voice content is ignored and the process returns to step 202.
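The keyword-and-type check in this paragraph can be sketched as follows; the keyword vocabulary and the authority structure keyed by interaction type are illustrative assumptions:

```python
OPERATION_KEYWORDS = {"check balance", "transfer", "deposit", "withdraw"}  # assumed vocabulary

def extract_operation(semantic_text):
    """Pick the operation-item keyword out of the recognized text."""
    text = semantic_text.lower()
    for kw in OPERATION_KEYWORDS:
        if kw in text:
            return kw
    return None

def next_operation(semantic_text, interaction_type, scene_authority):
    """Return the operation to execute, or None to ignore the utterance."""
    op = extract_operation(semantic_text)
    allowed = scene_authority.get(interaction_type, set())
    return op if op in allowed else None
```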
It should be understood that, although the steps in the flowcharts of figs. 2-4 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in figs. 2-4 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times, and the execution order of these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided a voice interaction control apparatus, including: a voice data receiving module 510, a rights data obtaining module 520 and a data processing module 530, wherein:
the voice data receiving module 510 is configured to receive voice data input by a user in a current interaction scenario.
The permission data acquisition module 520 is configured to determine the operation authority of a non-target user according to the voice data of the non-target user if it is recognized that the voice data includes voice data of the non-target user.
The data processing module 530 is configured to execute a corresponding operation according to the semantic content and the operation authority of the voice data if the operation authority of the non-target user includes the operation authority corresponding to the current interaction scenario.
In one embodiment, as shown in fig. 6, the apparatus further includes an identity recognition module 540, configured to extract voiceprint features in the voice data, and determine whether the user corresponding to the voice data is a target user according to the extracted voiceprint features.
In one embodiment, the identity recognition module 540 is further configured to send the extracted voiceprint feature to a third party voiceprint service platform, where the third party voiceprint service platform pre-stores the voiceprint feature of the target user, receive an identity recognition result returned by the third party voiceprint service platform, and determine, according to the identity recognition result, whether the user corresponding to the voice data is the target user.
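Delegating identification to the third-party voiceprint service platform might look like the sketch below, where `post` is an injected transport (e.g. a thin HTTP client wrapper) and the endpoint path and response shape are purely hypothetical, since the patent does not specify the platform's API.

```python
# Hedged sketch of third-party voiceprint identification. The endpoint
# "/voiceprint/identify" and the {"is_target": ...} response shape are
# assumptions for illustration, not a real platform's API.

def identify_via_third_party(voiceprint, post):
    """Send extracted voiceprint features to the platform; the platform
    pre-stores the target user's voiceprint and returns an identity result."""
    response = post("/voiceprint/identify", {"features": voiceprint})
    return bool(response.get("is_target"))
```

Injecting the transport keeps the sketch testable without a network and mirrors the embodiment's separation between feature extraction (local) and matching (remote).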
In one embodiment, the identity recognition module 540 is further configured to match the extracted voiceprint feature with a voiceprint feature of a pre-stored target user, obtain a voiceprint matching result, and determine whether the user corresponding to the voice data is the target user according to the voiceprint matching result.
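Local matching against a pre-stored target voiceprint can be illustrated with a simple cosine-similarity comparison. Real systems derive voiceprint features from spectral embeddings; the plain float vectors and the 0.8 threshold here are toy assumptions.

```python
import math

# Toy stand-in for voiceprint matching: a voiceprint is a list of floats,
# and the match result is a thresholded cosine similarity.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_target_user(extracted, stored_target, threshold=0.8):
    """Match the extracted voiceprint against the pre-stored target user's."""
    return cosine_similarity(extracted, stored_target) >= threshold
```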
In one embodiment, the permission data acquisition module 520 is further configured to acquire identity data of a non-target user, and query an operation permission corresponding to the current interaction scenario for the non-target user according to the identity data of the non-target user.
In one embodiment, the permission data obtaining module 520 is further configured to match the extracted voiceprint feature in a preset voiceprint feature database, and if a pre-stored voiceprint feature consistent with the extracted voiceprint feature is matched, search out the identity data of the preset non-target user according to the pre-stored voiceprint feature, and obtain the operation permission corresponding to the current interaction scenario by the non-target user according to the identity data.
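The lookup chain in this embodiment — voiceprint match, then identity data, then scenario authority — can be sketched as below. The database shape, the roles, and the quantized match key are all illustrative assumptions standing in for a real voiceprint feature database.

```python
# Hypothetical data shapes for the lookup chain described above:
# voiceprint -> identity data -> operation authority for the scenario.
VOICEPRINT_DB = {
    ("0.12", "0.98"): {"user_id": "u42", "role": "relative"},
}
ROLE_PERMISSIONS = {
    "relative": {"query"},   # may take part in query-class scenarios
    "stranger": set(),
}

def lookup_permissions(extracted_print, scenario_type):
    """Return identity data if a stored voiceprint matches and the role has
    authority for the scenario; otherwise None (the voice data is ignored)."""
    key = tuple(f"{v:.2f}" for v in extracted_print)  # toy quantized match key
    identity = VOICEPRINT_DB.get(key)
    if identity is None:
        return None  # no pre-stored voiceprint matched
    allowed = ROLE_PERMISSIONS.get(identity["role"], set())
    return identity if scenario_type in allowed else None
```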
In one embodiment, the data processing module 530 is further configured to extract an operation item keyword included in the semantic content of the voice data, determine, according to the semantic content, an interaction type of the current interaction scenario, and if the operation item keyword and the interaction type match the operation authority of the current interaction scenario, execute a corresponding operation according to the operation item keyword.
In one embodiment, the data processing module 530 is further configured to ignore the voice data of the non-target user if the operation authority of the non-target user does not include the operation authority corresponding to the current interaction scenario, and control the voice data receiving module 510 to perform an operation of receiving the voice data input by the user in the current interaction scenario.
For specific limitations of the voice interaction control apparatus, reference may be made to the above limitations of the voice interaction control method, which are not repeated here. Each module in the voice interaction control apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing voiceprint feature data, identity data and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a voice interaction control method.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program: receiving voice data input by a user in a current interaction scene, if the voice data are identified to contain voice data of a non-target user, determining operation authority of the non-target user according to the voice data of the non-target user, and if the operation authority data of the non-target user contain operation authority aiming at the current interaction scene, executing corresponding operation according to semantic content and operation authority of the voice data.
In one embodiment, the processor when executing the computer program further performs the steps of: and extracting voiceprint features in the voice data, and determining whether the user corresponding to the voice data is a target user according to the extracted voiceprint features.
In one embodiment, the processor when executing the computer program further performs the steps of: and sending the extracted voiceprint features to a third-party voiceprint service platform, pre-storing the voiceprint features of the target user by the third-party voiceprint service platform, receiving an identification result returned by the third-party voiceprint service platform, and determining whether the user corresponding to the voice data is the target user according to the identification result.
In one embodiment, the processor when executing the computer program further performs the steps of: and matching the extracted voiceprint characteristics with the voiceprint characteristics of the pre-stored target user to obtain a voiceprint matching result, and determining whether the user corresponding to the voice data is the target user according to the voiceprint matching result.
In one embodiment, the processor when executing the computer program further performs the steps of: and acquiring the identity data of the non-target user, and inquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identity data of the non-target user.
In one embodiment, the processor when executing the computer program further performs the steps of: extracting operation item keywords contained in semantic content of voice data, determining the interaction type of the current interaction scene according to the semantic content, and executing corresponding operation according to the operation item keywords if the operation item keywords and the interaction type are consistent with the operation authority of the current interaction scene.
In one embodiment, the processor when executing the computer program further performs the steps of: and matching the extracted voiceprint features in a preset voiceprint feature database, if the pre-stored voiceprint features consistent with the extracted voiceprint features are matched, searching out the identity identification data of the preset non-target user according to the pre-stored voiceprint features, and acquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identity identification data.
In one embodiment, the processor when executing the computer program further performs the steps of: if the operation authority of the non-target user does not contain the operation authority corresponding to the current interaction scene, the voice data of the non-target user is ignored, and the step of receiving the voice data input by the user in the current interaction scene is returned.
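Taken together, the processor steps of this embodiment form a control loop: receive voice data, identify the speaker, check authority, then either execute or ignore and return to receiving. The sketch below is a hedged rendering in which every helper (`recognize_speaker`, `get_permissions`, `execute`) is an assumed stand-in injected by the caller.

```python
# End-to-end control loop sketched from the embodiment above. All injected
# helpers are hypothetical stand-ins, not the patent's implementation.

def voice_interaction_loop(audio_stream, scenario, *,
                           recognize_speaker, get_permissions, execute):
    """Process utterances until one yields a permitted operation."""
    for voice_data in audio_stream:
        speaker = recognize_speaker(voice_data)    # target vs. non-target user
        if speaker == "target":
            return execute(voice_data)             # target user always proceeds
        permissions = get_permissions(voice_data)  # authority of non-target user
        if permissions and scenario in permissions:
            return execute(voice_data)             # authorized: perform operation
        # otherwise ignore and return to receiving voice data
    return None
```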
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of: receiving voice data input by a user in a current interaction scene, if the voice data are identified to contain voice data of a non-target user, determining operation authority of the non-target user according to the voice data of the non-target user, and if the operation authority data of the non-target user contain operation authority aiming at the current interaction scene, executing corresponding operation according to semantic content and operation authority of the voice data.
In one embodiment, the computer program when executed by the processor further performs the steps of: and extracting voiceprint features in the voice data, and determining whether the user corresponding to the voice data is a target user according to the extracted voiceprint features.
In one embodiment, the computer program when executed by the processor further performs the steps of: and sending the extracted voiceprint features to a third-party voiceprint service platform, pre-storing the voiceprint features of the target user by the third-party voiceprint service platform, receiving an identification result returned by the third-party voiceprint service platform, and determining whether the user corresponding to the voice data is the target user according to the identification result.
In one embodiment, the computer program when executed by the processor further performs the steps of: and matching the extracted voiceprint characteristics with the voiceprint characteristics of the pre-stored target user to obtain a voiceprint matching result, and determining whether the user corresponding to the voice data is the target user according to the voiceprint matching result.
In one embodiment, the computer program when executed by the processor further performs the steps of: and acquiring the identity data of the non-target user, and inquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identity data of the non-target user.
In one embodiment, the computer program when executed by the processor further performs the steps of: extracting operation item keywords contained in semantic content of voice data, determining the interaction type of the current interaction scene according to the semantic content, and executing corresponding operation according to the operation item keywords if the operation item keywords and the interaction type are consistent with the operation authority of the current interaction scene.
In one embodiment, the computer program when executed by the processor further performs the steps of: and matching the extracted voiceprint features in a preset voiceprint feature database, if the pre-stored voiceprint features consistent with the extracted voiceprint features are matched, searching out the identity identification data of the preset non-target user according to the pre-stored voiceprint features, and acquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identity identification data.
In one embodiment, the computer program when executed by the processor further performs the steps of: if the operation authority of the non-target user does not contain the operation authority corresponding to the current interaction scene, the voice data of the non-target user is ignored, and the step of receiving the voice data input by the user in the current interaction scene is returned.
Those skilled in the art will appreciate that all or part of the processes in the above-described method embodiments may be implemented by a computer program stored on a non-volatile computer-readable storage medium, which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. The volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above embodiments merely represent several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the patent application. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.
Claims (10)
1. A voice interaction control method, the method comprising:
receiving voice data input by a user in a current interaction scene;
extracting voiceprint features in the voice data, determining whether a user corresponding to the voice data is a target user according to the extracted voiceprint features, wherein the target user is a user corresponding to a login account, and determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint features comprises: the extracted voiceprint features are sent to a third-party voiceprint service platform, the voiceprint features of a target user are prestored in the third-party voiceprint service platform, an identification result returned by the third-party voiceprint service platform is received, and whether the user corresponding to the voice data is the target user or not is determined according to the identification result;
if the voice data are recognized to contain voice data of a non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
if the operation authority of the non-target user comprises the operation authority corresponding to the current interaction scene, executing corresponding operation according to the semantic content of the voice data and the operation authority;
if the operation authority of the non-target user includes the operation authority corresponding to the current interaction scene, executing the corresponding operation according to the semantic content of the voice data and the operation authority includes: extracting operation item keywords contained in the semantic content of the voice data, determining an interaction type of the current interaction scene according to the semantic content, and if the operation item keywords and the operation items contained in the interaction type conform to the operation authority of the current interaction scene, executing the corresponding operation according to the operation item keywords, wherein the interaction type of the current interaction scene includes query-class interactions, information-update-class interactions or function-selection interactions, and the operation authority of the current interaction scene includes the operation authority of the interaction scene and the data operation authority of the interaction scene.
2. The method of claim 1, wherein determining whether the user corresponding to the voice data is a target user according to the extracted voiceprint feature comprises:
matching the extracted voiceprint characteristics with the voiceprint characteristics of a pre-stored target user to obtain a voiceprint matching result;
and determining whether the user corresponding to the voice data is a target user or not according to the voiceprint matching result.
3. The method of claim 1, wherein determining the operational rights of the non-target user based on the voice data of the non-target user comprises:
matching the extracted voiceprint features in a preset voiceprint feature database;
if the pre-stored voiceprint characteristics consistent with the extracted voiceprint characteristics are matched, the identity data of the non-target user is found out according to the pre-stored voiceprint characteristics;
and acquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identification data.
4. The method of claim 1, wherein determining the operational rights of the non-target user based on the voice data of the non-target user comprises:
acquiring the identity data of the non-target user;
and inquiring the operation authority corresponding to the current interaction scene of the non-target user according to the identification data of the non-target user.
5. The method according to any one of claims 1 to 4, further comprising:
and if the operation authority of the non-target user does not contain the operation authority corresponding to the current interaction scene, ignoring the voice data of the non-target user, and returning to the step of receiving the voice data input by the user in the current interaction scene.
6. A voice interaction control apparatus, the apparatus comprising:
the voice data receiving module is used for receiving voice data input by a user in a current interaction scene;
the identity recognition module is used for extracting voiceprint features in the voice data, sending the extracted voiceprint features to a third-party voiceprint service platform, pre-storing voiceprint features of a target user by the third-party voiceprint service platform, receiving an identity recognition result returned by the third-party voiceprint service platform, and determining whether the user corresponding to the voice data is the target user according to the identity recognition result, wherein the identity recognition result comprises identity identification data and role information, and the target user is a user corresponding to a login account;
the permission data acquisition module is used for confirming the operation permission of the non-target user according to the voice data of the non-target user if the voice data are recognized to contain the voice data of the non-target user;
the data processing module is configured to execute a corresponding operation according to the semantic content of the voice data and the operation authority if the operation authority of the non-target user includes the operation authority corresponding to the current interaction scene, wherein if the operation authority of the non-target user includes the operation authority corresponding to the current interaction scene, executing the corresponding operation according to the semantic content of the voice data and the operation authority includes: extracting operation item keywords contained in semantic content of the voice data, determining the interaction type of a current interaction scene according to the semantic content, and executing corresponding operation according to the operation item keywords if the operation item keywords and the operation items contained in the interaction type accord with the operation authority of the current interaction scene, wherein the operation authority of the current interaction scene comprises the operation authority of the interaction scene and the data operation authority of the interaction scene.
7. The apparatus of claim 6, wherein the identity module is further configured to match the extracted voiceprint feature with a voiceprint feature of a pre-stored target user to obtain a voiceprint matching result, and determine whether the user corresponding to the voice data is the target user according to the voiceprint matching result.
8. The apparatus of claim 6, wherein the permission data obtaining module is further configured to match the extracted voiceprint feature in a preset voiceprint feature database, and if a pre-stored voiceprint feature that is consistent with the extracted voiceprint feature is matched, find identity data of the non-target user according to the pre-stored voiceprint feature, and obtain an operation permission corresponding to the current interaction scenario by the non-target user according to the identity data.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010619579.5A CN111916086B (en) | 2020-07-01 | 2020-07-01 | Voice interaction control method, device, computer equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111916086A CN111916086A (en) | 2020-11-10 |
| CN111916086B true CN111916086B (en) | 2024-03-29 |
Family
ID=73227101
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010619579.5A Active CN111916086B (en) | 2020-07-01 | 2020-07-01 | Voice interaction control method, device, computer equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111916086B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114446296A (en) * | 2022-01-24 | 2022-05-06 | 北京旗偲智能科技有限公司 | Vehicle-mounted voice interaction method and device and vehicle |
| CN115482626B (en) * | 2022-08-17 | 2023-08-29 | 宁波美喵科技有限公司 | Method, device, equipment and storage medium for voice broadcasting of shared bicycle |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107767875A (en) * | 2017-10-17 | 2018-03-06 | 深圳市沃特沃德股份有限公司 | Sound control method, device and terminal device |
| CN109729400A (en) * | 2018-06-27 | 2019-05-07 | 平安科技(深圳)有限公司 | Apparatus control method, device, equipment and storage medium based on sound |
| CN109979443A (en) * | 2017-12-27 | 2019-07-05 | 深圳市优必选科技有限公司 | Authority management control method and device for robot |
| CN110543129A (en) * | 2019-09-30 | 2019-12-06 | 深圳市酷开网络科技有限公司 | intelligent electric appliance control method, intelligent electric appliance control system and storage medium |
| CN111028835A (en) * | 2019-11-18 | 2020-04-17 | 北京小米移动软件有限公司 | Resource replacement method, apparatus, system, and computer-readable storage medium |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10956907B2 (en) * | 2014-07-10 | 2021-03-23 | Datalogic Usa, Inc. | Authorization of transactions based on automated validation of customer speech |
| US11113691B2 (en) * | 2018-09-25 | 2021-09-07 | American Express Travel Related Services Company, Inc. | Voice interface transaction system using audio signals |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106373575B (en) | User voiceprint model construction method, device and system | |
| US8938388B2 (en) | Maintaining and supplying speech models | |
| CN108510290B (en) | Method, device, computer equipment and storage medium for modifying customer information during a call | |
| CN112434677B (en) | Contract auditing method, device, equipment and storage medium | |
| CN105827581A (en) | Account login method and terminal | |
| CN104078045A (en) | Identifying method and electronic device | |
| CN111916086B (en) | Voice interaction control method, device, computer equipment and storage medium | |
| WO2023275606A1 (en) | Facial recognition method, system and apparatus, and electronic device and storage medium | |
| CN111899744A (en) | Voice information processing method, device, server and storage medium | |
| CN107437016A (en) | Application control method and related product | |
| US10936706B2 (en) | Biometric authentication | |
| CN110675869A (en) | Method and device for controlling applications in smart city app through voice | |
| WO2016124008A1 (en) | Voice control method, apparatus and system | |
| CN106095998A (en) | Precise question searching method and device applied to intelligent terminal | |
| KR101181060B1 (en) | Voice recognition system and method for speaker recognition using thereof | |
| CN109905366B (en) | Terminal equipment safety verification method and device, readable storage medium and terminal equipment | |
| CN119441553A (en) | Digital asset retrieval method, device, computer equipment, readable storage medium and program product | |
| CN108766442B (en) | A kind of identity authentication method and device based on voiceprint graphic identification | |
| CN116720076A (en) | Feature extraction model training, user intention prediction methods, devices and electronic equipment | |
| CN114372240A (en) | Data acquisition method, device, terminal and computer-readable storage medium | |
| CN113948092B (en) | Voiceprint-based target person identification method, voiceprint-based target person identification system, voiceprint-based target person identification device and storage medium | |
| CN118504054B (en) | Digital asset secure storage method and device, electronic device, and storage medium | |
| US20250371120A1 (en) | System and method for authenticating users in a computing system | |
| CN113760939B (en) | Method, device and equipment for determining account type | |
| US20190370491A1 (en) | Method for local profiling of a user of a terminal and method for searching for private information |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Address after: 100192 Room 401, building 4, area C, Dongsheng Science Park, 66 xixiaokou Road, Haidian District, Beijing; Applicant after: Zhongdian Jinxin Software Co.,Ltd. Address before: 100192 Room 401, building 4, area C, Dongsheng Science Park, 66 xixiaokou Road, Haidian District, Beijing; Applicant before: Beijing Wensi Haihui Jinxin Software Co.,Ltd. |
| | GR01 | Patent grant | |