CN115527531A

CN115527531A - Equipment control method, device, equipment and storage medium

Info

Publication number: CN115527531A
Application number: CN202210823600.2A
Authority: CN
Inventors: 余海超
Original assignee: Shenzhen Coocaa Network Technology Co Ltd
Current assignee: Shenzhen Coocaa Network Technology Co Ltd
Priority date: 2022-07-12
Filing date: 2022-07-12
Publication date: 2022-12-27

Abstract

The embodiments of the invention relate to a device control method, apparatus, device and storage medium. The method comprises: acquiring first text information and identification information of each module in the current interface of the device to obtain a first text information set and an identification information set; acquiring second text information corresponding to a voice signal of a user; determining, from the first text information set, target first text information that matches the second text information; determining, from the identification information set, target identification information corresponding to the target first text information; and controlling the target module to start according to the target identification information. In this way, the target module can be quickly matched in the current interface and started according to the user's voice signal, without recognizing the voice intention through a voice recognition model. This simplifies the voice recognition process, improves the efficiency of voice recognition and voice control, and improves the user experience, so that any module the user can see in the current interface can be invoked by speaking its text.

Description

Equipment control method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of voice control, in particular to a method, a device, equipment and a storage medium for controlling equipment.

Background

When the user uses the voice command to control the intelligent household appliance, the intention of the user needs to be recognized according to the model, and the corresponding module needs to be controlled according to the recognized intention.

In the prior art, global voice commands need to be recognized through a voice recognition model, and voice control cannot be performed quickly on the content of the current interface. As a result, the existing voice recognition process is cumbersome and inefficient, real-time generalized text query for the current interface is not achieved, and real-time dynamic update registration is not achieved.

Disclosure of Invention

In view of this, in order to solve the technical problems of tedious voice recognition process and low efficiency, embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for controlling a device.

In a first aspect, an embodiment of the present invention provides a method for controlling a device, including:

acquiring first text information and identification information of each module in the current interface of the equipment to obtain a first text information set and an identification information set;

acquiring second text information corresponding to a voice signal of a user;

determining target first text information matched with the second text information from the first text information set;

determining target identification information corresponding to the target first text information from the identification information set;

and controlling the target module to be started according to the target identification information.

In a possible embodiment, the obtaining the first text information of each module in the current interface of the device includes:

acquiring a text corresponding to a selectable module in the current interface of the equipment as the first text information;

the acquiring of the second text information corresponding to the voice signal of the user includes:

receiving a voice signal of a user in a preset mode;

and extracting second text information from the voice signal.

In one possible embodiment, the determining, from the first set of text information, the target first text information that matches the second text information includes:

generating an information list based on the first text information set;

determining, according to a shortest path algorithm, a plurality of matching degrees between a plurality of pieces of first text information in the information list and the second text information;

and determining the target first text information according to the matching degrees.

In one possible embodiment, the determining the target first text information according to the plurality of matching degrees includes:

determining a plurality of confidence levels for a plurality of said degrees of match;

determining, from the plurality of confidence levels, the confidence level that is the largest and greater than a set threshold as a target confidence level;

and determining the first text information corresponding to the target confidence level as the target first text information.

In one possible embodiment, the method further comprises:

when target first text information matching the second text information is not determined from the first text information set, identifying the second text information through a speech intention recognition model;

and controlling the target module to be started according to the identification result.

In one possible embodiment, the method further comprises:

generating an association relationship among the module, the first text information and the identification information;

determining target identification information corresponding to the target first text information from the identification information set, including:

determining target identification information corresponding to the target first text information from the identification information set according to the association relationship;

the controlling the target module to be started according to the target identification information comprises the following steps:

determining a target module corresponding to the target identification information according to the association relationship;

and controlling the target module to be started.

In one possible embodiment, the method further comprises:

when the target module is a module not in the current interface, generating display information;

and displaying the display information in the current interface.

In a second aspect, an embodiment of the present invention provides a device control apparatus, including:

the acquisition module is used for acquiring first text information and identification information of each module in the current interface of the equipment to obtain a first text information set and an identification information set;

the acquisition module is further used for acquiring second text information corresponding to the voice signal of the user;

the processing module is used for determining target first text information matched with the second text information from the first text information set;

the processing module is further configured to determine target identification information corresponding to the target first text information from the identification information set;

and the control module is used for controlling the target module to be started according to the target identification information.

In a third aspect, an embodiment of the present invention provides an apparatus, including: a processor and a memory, the processor being configured to execute a control program of the apparatus stored in the memory to implement the control method of the apparatus of any one of the above first aspects.

In a fourth aspect, an embodiment of the present invention provides a storage medium, where one or more programs are stored, and the one or more programs are executable by one or more processors to implement the method for controlling an apparatus according to any one of the first aspects.

According to the control scheme of the equipment provided by the embodiment of the invention, a first text information set and an identification information set are obtained by acquiring first text information and identification information of each module in a current interface of the equipment; acquiring second text information corresponding to a voice signal of a user; determining target first text information matched with the second text information from the first text information set; determining target identification information corresponding to the target first text information from the identification information set; and the target module is controlled to be started according to the target identification information, so that the target module can be quickly matched and controlled from the current interface according to the first text information spoken by the user, the voice recognition process is simplified, the voice recognition and voice control efficiency is improved, and the user experience is improved, so that the user can control the module in the current interface scene of the equipment through voice.

Drawings

Fig. 1 is a schematic flowchart of a method for controlling a device according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another method for controlling a device according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a control device of an apparatus according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.

Fig. 1 is a schematic flowchart of a method for controlling a device according to an embodiment of the present invention, and as shown in fig. 1, the method specifically includes:

S11, acquiring first text information and identification information of each module in the current interface of the device to obtain a first text information set and an identification information set;

The control method provided by the embodiment of the invention is applied to a smart home device provided with a display screen. The target module is controlled by acquiring first text information from the screen currently displayed by the device and second text information from the voice uttered by the user.

In this embodiment, the modules displayed in the current interface of the device are determined. The name of each module is extracted as first text information, and the first text information of all the modules forms the first text information set. The unique identifier of each module is extracted as identification information, and the identification information of all the modules forms the identification information set. The first text information may be the text corresponding to a module displayed on the current interface. For example, if the device is a television, the modules on its current interface may include recommended movies, categories in the navigation bar and television functions, and the movie names, category names and function names serve as the first text information. The identification information may be a unique ID or an address of each module displayed on the current interface. That is, this embodiment acquires and identifies only the modules in the interface currently displayed on the device.
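As a minimal sketch of step S11, the snippet below builds the two sets from a list of on-screen modules. The `UiModule` type, the `collect_interface_info` helper and the sample IDs are hypothetical names chosen for illustration; the source only states that each module contributes a name and a unique ID or address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UiModule:
    module_id: str  # identification information: unique ID or address (hypothetical values)
    text: str       # first text information: the name shown on screen


def collect_interface_info(modules):
    """Return (first_text_set, id_set) for the modules on the current interface."""
    first_text_set = [m.text for m in modules]
    id_set = [m.module_id for m in modules]
    return first_text_set, id_set


# Hypothetical television home screen.
current_interface = [
    UiModule("rec_001", "Recommended Movie"),
    UiModule("nav_movies", "Movies"),
    UiModule("fn_settings", "Settings"),
]
first_text_set, id_set = collect_interface_info(current_interface)
print(first_text_set)
print(id_set)
```

Keeping the two lists in the same order is one simple way to preserve the pairing between a module's text and its ID; an explicit mapping would work equally well.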

S12, acquiring second text information corresponding to the voice signal of the user;

in this embodiment, the device may receive audio of the user as a voice signal, and extract content related to a module in a current interface of the device in the voice signal as second text information, where the second text information is used to represent a control instruction of the user on the device.

Specifically, the user can speak a preset voice instruction, the voice control function of the device is awakened through the voice instruction, the awakened device receives audio sent by the user through the voice recognition device to serve as a voice signal, the voice signal is recognized, and recognized characters are extracted from the voice signal to serve as second text information.

S13, determining target first text information matched with the second text information from the first text information set;

in this embodiment, the device sends a connection request to the voice cloud backend, sends the first text information set, the identification information set and the second text information to the voice cloud backend after the connection is successful, registers a real-time information list of the first text information set, and determines, from the real-time information list, the first text information that is most matched with the second text information as the target first text information.

Specifically, the matching method may include: extracting keywords related to a module from the second text information, determining the similarity between the keywords and each piece of first text information in the real-time information list, and taking the first text information whose similarity is the highest and greater than a set threshold as the target first text information.
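The keyword-and-similarity matching described above might look like the following sketch. The source does not name a similarity measure, so Python's standard `difflib.SequenceMatcher` is used purely as a stand-in; the command-word list, the helper names and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical command prefixes stripped to obtain the module keyword.
COMMAND_WORDS = ("open", "play", "go to")


def extract_keyword(second_text):
    """Drop a leading command word so only the module-related keyword remains."""
    text = second_text.strip().lower()
    for word in COMMAND_WORDS:
        if text.startswith(word + " "):
            return text[len(word) + 1:].strip()
    return text


def best_match(second_text, first_texts, threshold=0.8):
    """Return the first text with the highest similarity above threshold, else None."""
    keyword = extract_keyword(second_text)
    best_text, best_score = None, 0.0
    for candidate in first_texts:
        score = SequenceMatcher(None, keyword, candidate.lower()).ratio()
        if score > best_score:
            best_text, best_score = candidate, score
    return best_text if best_score > threshold else None


print(best_match("open movies", ["Movies", "TV Series", "Variety Shows"]))
```

Returning `None` when nothing clears the threshold leaves room for the caller to fall back to another recognition path.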

S14, determining target identification information corresponding to the target first text information from the identification information set;

in this embodiment, each piece of identification information corresponds to a unique module, each module corresponds to a unique piece of first text information, an association relationship between the first text information and the identification information is generated when the first text information and the identification information are acquired, and the identification information having the association relationship with the first text information is determined to be the target identification information.

And S15, controlling the target module to be started according to the target identification information.

In this embodiment, since the target identification information is the unique ID of the module, the target module can be determined from the plurality of modules on the current interface according to the target identification information, a control instruction of the target module is generated, and the target module is controlled to be started according to the control instruction.

For example, the first text information set obtained from the current interface includes modules such as Movies, TV Series and Variety Shows, and the obtained second text information is "open Movies". "Movies" is extracted from the second text information as the keyword, the first text information matching the keyword is determined to be that of the Movies module, the position of the Movies module in the interface is determined according to the identification information corresponding to the Movies module, and the Movies module is controlled to open.
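Steps S11 to S15 can be tied together in a short end-to-end sketch. Everything here is illustrative: `control_by_voice` and the `launch` callback are hypothetical names, speech-to-text (S12) is assumed to have already produced `second_text`, and a plain substring test stands in for the matching of S13.

```python
def control_by_voice(modules, second_text, launch):
    """modules: list of (module_id, text) pairs for the current interface."""
    # S11: build the first text set and the text -> ID association.
    association = {text: module_id for module_id, text in modules}
    first_text_set = list(association)

    # S13: stand-in matching, pick the first on-screen text contained in the utterance.
    spoken = second_text.lower()
    target_text = next((t for t in first_text_set if t.lower() in spoken), None)
    if target_text is None:
        return None

    # S14: look up the target identification information.
    target_id = association[target_text]

    # S15: start the target module through the supplied callback.
    launch(target_id)
    return target_id


launched = []
screen = [("nav_movies", "Movies"), ("nav_tv", "TV Series"), ("nav_variety", "Variety Shows")]
print(control_by_voice(screen, "open movies", launched.append))
print(launched)
```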

According to the control method of the device provided by the embodiment of the invention, a first text information set and an identification information set are obtained by acquiring the first text information and the identification information of each module in the current interface of the device; second text information corresponding to a voice signal of a user is acquired; target first text information matching the second text information is determined from the first text information set; target identification information corresponding to the target first text information is determined from the identification information set; and the target module is controlled to start according to the target identification information. The target module can thus be quickly matched in the current interface and started according to the user's voice signal, without recognizing the voice intention through a voice recognition model. This simplifies the voice recognition process, speeds up voice recognition and voice control, and improves the user experience, so that any module the user can see in the current interface can be invoked by speaking its text.

Fig. 2 is a schematic flowchart of another method for controlling a device according to an embodiment of the present invention, where as shown in fig. 2, the method specifically includes:

S21, receiving a voice signal of a user in a preset mode; extracting second text information from the voice signal; acquiring the text corresponding to each selectable module in the current interface of the device as the first text information, and acquiring identification information of each module;

in this embodiment, the preset mode is a mode capable of performing voice control; the second text information is a text corresponding to the voice instruction of the user; the selectable module is a module which can be selected or clicked by a user in an interface currently displayed by the equipment, a text corresponding to the module can be described through spoken language of the user, and the first text information is a text displayed in the current interface by the module.

Specifically, the user initiates a voice conversation and wakes the device by far-field wake-up, whereupon the device enters the preset mode and starts the voice module to receive the user's audio as a voice signal. The voice module then calls a system interface to acquire the text corresponding to each selectable module in the current interface of the device as first text information, obtaining the first text information set, and acquires the unique ID of each module as identification information, obtaining the identification information set. The voice module sends the voice signal, the first text information set and the identification information set to the voice cloud background server. After receiving the voice signal, the voice cloud background server recognizes it and extracts the second text information from it.
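The upload from the voice module to the voice cloud background server could be sketched as a single JSON message. The source does not specify a wire format, so the JSON encoding, the base64 audio field and every field name below are assumptions for illustration only.

```python
import base64
import json


def build_registration_request(audio_bytes, modules):
    """Bundle the voice signal with the current interface's text and ID sets.

    modules: list of (module_id, text) pairs for the selectable modules.
    All field names are hypothetical.
    """
    return json.dumps({
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "first_text_set": [text for _, text in modules],
        "id_set": [module_id for module_id, _ in modules],
    })


request = build_registration_request(
    b"\x00\x01\x02",
    [("nav_movies", "Movies"), ("nav_tv", "TV Series")],
)
print(request)
```

Sending the text and ID sets along with each utterance is one way to keep the server-side list in step with whatever the screen currently shows.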

S22, generating an association relationship among the module, the first text information and the identification information;

In this embodiment, each module corresponds to unique identification information and to unique first text information. Therefore, after receiving the identification information and the first text information, the voice cloud background server generates an association relationship between the identification information and the first text information, and between these two and the module.

S23, generating an information list based on the first text information set; determining, according to a shortest path algorithm, a plurality of matching degrees between a plurality of pieces of first text information in the information list and the second text information;

S24, determining a plurality of confidence levels for the plurality of matching degrees; determining, from the plurality of confidence levels, the confidence level that is the largest and greater than a set threshold as the target confidence level; determining the first text information corresponding to the target confidence level as the target first text information;

In this embodiment, after receiving the first text information set, the voice cloud background server registers a real-time information list of the first text information and performs matching between the second text information and the real-time information list based on a shortest path detection algorithm. It determines the matching degree between the second text information and each piece of first text information to obtain a plurality of matching degrees, and scores the matching degrees to obtain a corresponding plurality of confidence levels. It then selects the maximum confidence level and determines whether it is greater than a set threshold (for example, 0.8). When the maximum confidence level is greater than the set threshold, it is determined as the target confidence level, and the first text information corresponding to the target confidence level is determined as the target first text information.
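The selection rule described above (take the largest confidence level and accept it only if it exceeds the set threshold) can be sketched as follows. The function name and the dictionary input are illustrative assumptions; the 0.8 default mirrors the example threshold in the text.

```python
def select_target(confidences, threshold=0.8):
    """confidences: mapping of first text information -> confidence level.

    Returns the text with the largest confidence if that confidence is
    strictly greater than threshold, otherwise None.
    """
    if not confidences:
        return None
    text, best = max(confidences.items(), key=lambda item: item[1])
    return text if best > threshold else None


print(select_target({"Movies": 0.95, "TV Series": 0.40}))
print(select_target({"Movies": 0.60, "TV Series": 0.40}))
```

A `None` result corresponds to the case in which no target first text information is determined.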

The shortest path detection algorithm of this embodiment supports efficient real-time matching: information lists for hundreds of modules on one page can be registered at once, the matching calculation takes about 1 millisecond, and the matching is generalized and fault-tolerant.

For example, when the first text information of a module in the registered real-time information list is "Huang Rihua version Tian Long Ba Bu", the user can directly say "open Tian Long Ba Bu", "Tian Long Ba Bu" or "Huang Rihua Tian Long Ba Bu", and the first text information can still be matched by the shortest path detection algorithm, without the user having to speak the corresponding first text information in full.
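The source does not disclose how its shortest path detection algorithm works. One plausible reading, offered only as an assumption, is a cheapest-path search through a character alignment grid between the utterance and a registered text, in which skipping registered characters is cheap so that partial titles still score well. The cost values, function names and sample strings below are all hypothetical stand-ins modeled on the title example above.

```python
def shortest_path_cost(query, registered, skip_cost=0.2):
    """Cost of the cheapest path through the alignment grid.

    Moves: diagonal (match = 0, substitute = 1), down (drop a query
    character = 1), right (skip a registered character = skip_cost).
    Skipping a registered prefix or suffix is free.
    """
    n, m = len(query), len(registered)
    prev = [0.0] * (m + 1)  # row 0: any registered prefix may be skipped for free
    for i in range(1, n + 1):
        cur = [prev[0] + 1.0] + [0.0] * m
        for j in range(1, m + 1):
            same = query[i - 1] == registered[j - 1]
            diag = prev[j - 1] + (0.0 if same else 1.0)
            down = prev[j] + 1.0
            right = cur[j - 1] + skip_cost
            cur[j] = min(diag, down, right)
        prev = cur
    return min(prev)  # the path may end anywhere, so a registered suffix is free


def match_degree(query, registered, skip_cost=0.2):
    """Map the path cost to a score in [0, 1]; 1.0 means the query aligns without cost."""
    if not query:
        return 0.0
    cost = shortest_path_cost(query.lower(), registered.lower(), skip_cost)
    return max(0.0, 1.0 - cost / len(query))


registered_title = "Huang Rihua version Tian Long Ba Bu"
for utterance in ("Tian Long Ba Bu", "Huang Rihua Tian Long Ba Bu", "Settings"):
    print(utterance, round(match_degree(utterance, registered_title), 2))
```

Because the grid is a directed acyclic graph, a row-by-row dynamic program finds the shortest path without a priority queue; the work grows with the product of the two string lengths.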

In a possible implementation, when no target first text information matching the second text information is determined from the first text information set, this indicates that what the user said does not refer to a module of the current interface, so the shortest path detection matching algorithm returns a no-match result. In this case, the second text information is recognized by the standard voice intention recognition model, and a corresponding control instruction is returned, according to the recognition result, to the voice module of the device.
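The fallback order described above, on-screen matching first and the intention recognition model only on a no-match result, could be sketched like this. Both matchers are passed in as callables; `resolve_command` and the result fields are hypothetical names, and the lambdas in the usage lines are toy stand-ins rather than real recognizers.

```python
def resolve_command(second_text, first_text_set, match_on_screen, recognize_intent):
    """Try the current interface first; fall back to the intent model on no match."""
    target = match_on_screen(second_text, first_text_set)
    if target is not None:
        return {"source": "current_interface", "target": target}
    return {"source": "intent_model", "target": recognize_intent(second_text)}


# Toy stand-ins for the two recognition paths.
def on_screen(text, texts):
    return next((t for t in texts if t.lower() in text.lower()), None)


def intent_model(text):
    return "intent:" + text


print(resolve_command("open movies", ["Movies"], on_screen, intent_model))
print(resolve_command("turn up the volume", ["Movies"], on_screen, intent_model))
```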

S25, determining target identification information corresponding to the target first text information from the identification information set according to the association relationship; determining a target module corresponding to the target identification information according to the association relationship; and controlling the target module to be started.

In this embodiment, the identification information corresponding to the target first text information is determined from the association relationship as the target identification information, and the target module corresponding to the target identification information is determined. The server generates a control instruction corresponding to the target first text information and returns the control instruction and the target identification information to the voice module of the device. After receiving the control instruction, the voice module calls the system interface of the current interface to make the target module identified by the target identification information execute the control instruction, so that the target module is started.
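A device-side sketch of this last step: the voice module receives a control instruction together with the target identification information and dispatches it through the system interface. The `ControlResponse` and `VoiceModule` types, and the representation of the system interface as a dictionary of callbacks keyed by module ID, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ControlResponse:
    instruction: str  # control instruction generated by the server (hypothetical format)
    target_id: str    # target identification information


class VoiceModule:
    def __init__(self, system_interface):
        # system_interface: mapping of module ID -> callable taking the instruction.
        self._system_interface = system_interface

    def handle(self, response):
        """Dispatch the instruction to the module identified by target_id."""
        handler = self._system_interface.get(response.target_id)
        if handler is None:
            return False  # the ID does not belong to the current interface
        handler(response.instruction)
        return True


log = []
voice_module = VoiceModule({"nav_movies": lambda instr: log.append(("nav_movies", instr))})
print(voice_module.handle(ControlResponse("open", "nav_movies")))
print(log)
```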

In a possible implementation, when the module corresponding to the second text information spoken by the user is not a module of the current interface, the target module that the user wants to open is not in the current interface, and display information is generated. The display information is displayed in the current interface to remind the user that the module corresponding to the voice input is not a module of the current interface, so that the user can speak again, or to ask the user whether to open the module that is not in the current interface.

In the control method of the device provided by this embodiment, a voice signal of a user is received in a preset mode; second text information is extracted from the voice signal; the text corresponding to each selectable module in the current interface of the device is acquired as first text information, and identification information of each module is acquired; an association relationship among the module, the first text information and the identification information is generated; an information list is generated based on the first text information set; a plurality of matching degrees between a plurality of pieces of first text information in the information list and the second text information are determined according to a shortest path algorithm; a plurality of confidence levels are determined for the plurality of matching degrees; the confidence level that is the largest and greater than a set threshold is determined as the target confidence level; the first text information corresponding to the target confidence level is determined as the target first text information; target identification information corresponding to the target first text information is determined from the identification information set according to the association relationship; the target module corresponding to the target identification information is determined according to the association relationship; and the target module is controlled to start.

A real-time registration mechanism between the cloud and the device provides real-time registration of data, and combining the cloud shortest path detection matching algorithm with this real-time registration provides efficient generalized matching based on text similarity. A module can therefore be selected directly, without clicking it with a remote controller and without recognizing and computing the voice content through a model, which speeds up voice recognition and voice control for the current interface and simplifies the voice recognition process.

Fig. 3 is a schematic structural diagram of a control device of an apparatus according to an embodiment of the present invention, which specifically includes:

the acquiring module 31 is configured to acquire first text information and identification information of each module in the current interface of the device, so as to obtain a first text information set and an identification information set;

the obtaining module 31 is further configured to obtain second text information corresponding to a voice signal of a user;

a processing module 32, configured to determine, from the first text information set, target first text information that matches the second text information;

the processing module 32 is further configured to determine target identification information corresponding to the target first text information from the identification information set;

and the control module 33 is used for controlling the target module to be started according to the target identification information.

In a possible implementation manner, the obtaining module 31 is specifically configured to obtain a text corresponding to a selectable module in the current interface of the device as the first text information;

receiving a voice signal of a user in a preset mode;

and extracting second text information from the voice signal.

In a possible embodiment, the processing module 32 is specifically configured to generate an information list based on the first text information set;

In a possible embodiment, the processing module 32 is specifically configured to determine a plurality of confidence degrees of a plurality of the matching degrees;

and determining the first text information corresponding to the target confidence coefficient as target first text information.

In one possible embodiment, the processing module 32 is specifically configured to identify the second text information through a speech intention recognition model when the target first text information matching the second text information is not determined from the first text information set;

the control module 33 is specifically configured to control the target module to be started according to the recognition result.

In a possible embodiment, the processing module 32 is specifically configured to generate an association relationship among the module, the first text information, and the identification information;

the control module 33 is specifically configured to control the target module to be started.

In a possible embodiment, the processing module 32 is specifically configured to generate the display information when the target module is a module not in the current interface;

and displaying the display information in the current interface.

The apparatus for controlling a device provided in this embodiment may be the apparatus shown in fig. 3, and may perform all the steps of the method for controlling the device shown in fig. 1 and 2, so as to achieve the technical effect of the method for controlling the device shown in fig. 1 and 2.

Fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention, where the apparatus 400 shown in fig. 4 includes: at least one processor 401, memory 402, at least one network interface 404, and other user interfaces 403. The various components in the device 400 are coupled together by a bus system 405. It is understood that the bus system 405 is used to enable connection communication between these components. The bus system 405 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 405 in fig. 4.

The user interface 403 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen).

It will be appreciated that the memory 402 in embodiments of the invention may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 402 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

In some embodiments, memory 402 stores the following elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system 4021 and application programs 4022.

The operating system 4021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is configured to implement various basic services and process hardware-based tasks. The application programs 4022 include various application programs, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. A program for implementing the method according to the embodiment of the present invention may be included in the application 4022.

In this embodiment of the present invention, by calling a program or an instruction stored in the memory 402, specifically, a program or an instruction stored in the application 4022, the processor 401 is configured to execute the method steps provided by the method embodiments, for example, including:

acquiring second text information corresponding to a voice signal of a user;
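As an illustration only, the steps summarized in the abstract (collect the first text information and identification information of each module, match the second text information against them, look up the target identification information, and start the target module) might be sketched as follows. The names `Module`, `matching_degree`, `find_target_id`, and `control_device`, the use of Python's `difflib.SequenceMatcher` as a matching degree, and the 0.6 threshold are all assumptions made for this sketch; the embodiment does not prescribe a particular similarity measure or data layout.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Module:
    """One module in the current interface (hypothetical structure)."""
    module_id: str  # identification information
    text: str       # first text information displayed for the module


def matching_degree(first_text: str, second_text: str) -> float:
    """Similarity in [0, 1]. SequenceMatcher is a stand-in metric (assumption)."""
    return SequenceMatcher(None, first_text.lower(), second_text.lower()).ratio()


def find_target_id(modules: Sequence[Module], second_text: str,
                   threshold: float = 0.6) -> Optional[str]:
    """Return the identification information of the best-matching module,
    or None when no module reaches the (assumed) threshold."""
    if not modules:
        return None
    best = max(modules, key=lambda m: matching_degree(m.text, second_text))
    if matching_degree(best.text, second_text) < threshold:
        return None
    return best.module_id


def control_device(modules: Sequence[Module], second_text: str,
                   start_module: Callable[[str], None]) -> bool:
    """Start the target module if one matches; report whether it was started."""
    target_id = find_target_id(modules, second_text)
    if target_id is None:
        return False
    start_module(target_id)  # device-specific start action supplied by the caller
    return True
```

In this sketch, `second_text` stands for the text already recognized from the user's voice signal, and `start_module` is a callback standing in for whatever mechanism the device uses to start a module by its identification information.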

The method disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 401. The processor 401 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 401 or by instructions in the form of software. The processor 401 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or by a combination of hardware and software elements in the decoding processor. The software elements may be located in storage media that are well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 402, and the processor 401 reads the information in the memory 402 and completes the steps of the method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

The device provided in this embodiment may be the device shown in fig. 4, and may perform all the steps of the control method of the device shown in figs. 1-2, thereby achieving the technical effect of that control method. For brevity, the details are not repeated here.

The embodiment of the invention also provides a storage medium (a computer-readable storage medium). The storage medium stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; and it may also comprise a combination of the above kinds of memory.

When the one or more programs in the storage medium are executed by one or more processors, the control method of the device executed on the device side, as described above, is implemented.

The processor is configured to execute a control program of the device stored in the memory to implement the following steps of the control method of the device executed on the device side:

acquiring second text information corresponding to a voice signal of a user;

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above-mentioned embodiments illustrate the objects, technical solutions, and advantages of the present invention in further detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method of controlling a device, comprising:

acquiring second text information corresponding to a voice signal of a user;

2. The method of claim 1, wherein the obtaining first text information of each module in the current interface of the device comprises:

receiving a voice signal of a user in a preset mode;

extracting the second text information from the voice signal.

3. The method of claim 1, wherein the determining the target first textual information from the first set of textual information that matches the second textual information comprises:

generating an information list based on the first text information set;

4. The method according to claim 3, wherein said determining the target first text information according to the plurality of matching degrees comprises:

5. The method of claim 1, further comprising:

and controlling the target module to be started according to the recognition result.

6. The method of claim 4, further comprising:

and controlling the target module to be started.

7. The method of claim 1, further comprising:

and displaying the display information in the current interface.

8. A control apparatus of a device, characterized by comprising:

9. An apparatus, comprising: a processor and a memory, the processor being configured to execute a control program of the apparatus stored in the memory to implement the control method of the apparatus of any one of claims 1 to 7.

10. A storage medium characterized in that the storage medium stores one or more programs executable by one or more processors to implement a control method of an apparatus according to any one of claims 1 to 7.