
CN111796926A - Instruction execution method, device, storage medium and electronic device - Google Patents


Info

Publication number
CN111796926A
CN111796926A (application CN201910282156.6A)
Authority
CN
China
Prior art keywords
user
instruction
data
feature vector
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910282156.6A
Other languages
Chinese (zh)
Other versions
CN111796926B (en)
Inventor
陈仲铭
何明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910282156.6A
Publication of CN111796926A
Application granted
Publication of CN111796926B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present application discloses an instruction execution method, an instruction execution apparatus, a storage medium, and an electronic device. When a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; an intent label matching the user instruction is acquired according to the user feature matrix and a pre-trained intent prediction model; and the user instruction is executed according to the intent label. By collecting panoramic data to supplement the context and situational information of the user instruction, the scheme can understand the user's implicit intention more accurately and thus execute the user instruction better.

Description

Instruction execution method, device, storage medium and electronic device

Technical Field

The present application relates to the field of terminal technologies, and in particular to an instruction execution method, apparatus, storage medium, and electronic device.

Background

Users generally control a terminal to perform corresponding operations through control instructions, such as voice instructions and touch instructions. With the development of terminal technology, the terminal needs to understand the implicit intention behind a user's control instruction in order to better execute it. Existing intent-understanding schemes mainly rely on speech recognition technology, combined with Gaussian mixture models, temporal classification models, and the like, to analyze voice data and extract keywords from the instruction as the understanding of the implicit intent. However, this approach to recognizing user intent depends solely on the user's voice instruction, making it difficult to accurately understand and characterize the user's actual needs.

Summary

Embodiments of the present application provide an instruction execution method, apparatus, storage medium, and electronic device, which can improve the accuracy with which a terminal recognizes a user's implicit intention, so as to execute user instructions more accurately.

In a first aspect, an embodiment of the present application provides an instruction execution method, including:

when a user instruction is received, generating a first feature vector according to the user instruction;

acquiring current panoramic data, and generating a second feature vector according to the panoramic data;

fusing the first feature vector and the second feature vector to generate a user feature matrix;

acquiring an intent label matching the user instruction according to the user feature matrix and a pre-trained intent prediction model; and

executing the user instruction according to the intent label.

In a second aspect, an embodiment of the present application provides an instruction execution apparatus, including:

a signal feature extraction module, configured to generate a first feature vector according to a user instruction when the user instruction is received;

a scene feature extraction module, configured to acquire current panoramic data and generate a second feature vector according to the panoramic data;

a feature fusion module, configured to fuse the first feature vector and the second feature vector to generate a user feature matrix;

an intent prediction module, configured to acquire an intent label matching the user instruction according to the user feature matrix and a pre-trained intent prediction model; and

an instruction execution module, configured to execute the user instruction according to the intent label.

In a third aspect, an embodiment of the present application provides a storage medium on which a computer program is stored; when the computer program runs on a computer, it causes the computer to execute the instruction execution method provided by any embodiment of the present application.

In a fourth aspect, an embodiment of the present application provides an electronic device including a processor and a memory; the memory stores a computer program, and the processor is configured to execute, by invoking the computer program, the instruction execution method provided by any embodiment of the present application.

According to the technical solution provided by the embodiments of the present application, when a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; the user feature matrix is then used as input data of a pre-trained intent prediction model to acquire an intent label matching the user instruction; and the user instruction is executed according to the intent label. By collecting panoramic data to supplement the context and situational information of the user instruction, the scheme can understand the user's implicit intention more accurately and thus execute the user instruction better.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic diagram of the panoramic perception architecture of the instruction execution method provided by an embodiment of the present application.

FIG. 2 is a schematic diagram of an application scenario of the instruction execution method provided by an embodiment of the present application.

FIG. 3 is a first schematic flowchart of the instruction execution method provided by an embodiment of the present application.

FIG. 4 is a second schematic flowchart of the instruction execution method provided by an embodiment of the present application.

FIG. 5 is a third schematic flowchart of the instruction execution method provided by an embodiment of the present application.

FIG. 6 is a schematic structural diagram of an instruction execution apparatus provided by an embodiment of the present application.

FIG. 7 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.

FIG. 8 is a second schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present application.

The terms "first", "second", "third", and the like in this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or modules is not limited to the listed steps or modules; some embodiments also include steps or modules that are not listed, or other steps or modules inherent to these processes, methods, products, or devices.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the panoramic perception architecture of the instruction execution method provided by an embodiment of the present application. The instruction execution method is applied to an electronic device. The electronic device is provided with a panoramic perception architecture, which is the integration of the hardware and software in the electronic device used to implement the instruction execution method.

The panoramic perception architecture includes an information perception layer, a data processing layer, a feature extraction layer, a scenario modeling layer, and an intelligent service layer.

The information perception layer is used to acquire information about the electronic device itself or information from the external environment. It may include multiple sensors, for example a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, and a heart rate sensor.

The distance sensor can be used to detect the distance between the electronic device and an external object. The magnetic field sensor can detect the magnetic field of the environment in which the electronic device is located. The light sensor can detect the ambient light of that environment. The acceleration sensor can detect the acceleration data of the electronic device. The fingerprint sensor can collect the user's fingerprint information. The Hall sensor is a magnetic field sensor based on the Hall effect and can be used for automatic control of the electronic device. The position sensor can detect the current geographic location of the electronic device. The gyroscope can detect the angular velocity of the electronic device in various directions. The inertial sensor can detect motion data of the electronic device. The attitude sensor can sense the attitude information of the electronic device. The barometer can detect the air pressure of the environment in which the electronic device is located. The heart rate sensor can detect the user's heart rate information.

The data processing layer is used to process the data acquired by the information perception layer, for example performing data cleaning, data integration, data transformation, and data reduction on that data.

Data cleaning refers to cleaning the large amount of data acquired by the information perception layer to remove invalid and duplicate data. Data integration refers to integrating multiple single-dimensional data acquired by the information perception layer into a higher or more abstract dimension, so that the single-dimensional data can be processed jointly. Data transformation refers to converting the data type or format of the acquired data so that the transformed data meets the processing requirements. Data reduction refers to minimizing the volume of data while preserving the original data as far as possible.
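The data-cleaning step described above (removing invalid and duplicate readings) can be sketched in a few lines; this is an illustrative toy, not the application's actual data-processing layer:

```python
def clean(records):
    # Drop invalid (None) readings and duplicates while preserving
    # order -- a toy version of the data-cleaning step.
    seen = set()
    out = []
    for r in records:
        if r is None or r in seen:
            continue
        seen.add(r)
        out.append(r)
    return out
```

For example, `clean([3.1, None, 3.1, 2.7])` returns `[3.1, 2.7]`.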

The feature extraction layer is used to extract features from the data processed by the data processing layer. The extracted features may reflect the state of the electronic device itself, the state of the user, or the state of the environment in which the electronic device is located.

The feature extraction layer can extract features, or process the extracted features, through filter, wrapper, and ensemble methods.

The filter method filters the extracted features to remove redundant feature data. The wrapper method is used to select among the extracted features. The ensemble method integrates multiple feature extraction methods to construct a more efficient and accurate combined feature extraction method.

The scenario modeling layer is used to build a model based on the features extracted by the feature extraction layer; the resulting model can represent the state of the electronic device, the state of the user, the state of the environment, and so on. For example, the scenario modeling layer can build a key-value model, a pattern identification model, a graph model, an entity-relationship model, or an object-oriented model from the extracted features.

The intelligent service layer is used to provide users with intelligent services based on the model built by the scenario modeling layer. For example, the intelligent service layer can provide basic application services, perform intelligent system optimization for the electronic device, and provide personalized intelligent services to the user.

In addition, the panoramic perception architecture may include multiple algorithms, each of which can be used to analyze and process data; together these algorithms may form an algorithm library. For example, the algorithm library may include the Markov algorithm, latent Dirichlet allocation, Bayesian classification, support vector machines, K-means clustering, the K-nearest-neighbor algorithm, conditional random fields, residual networks, long short-term memory networks, convolutional neural networks, and recurrent neural networks.

Based on the above panoramic perception architecture, the electronic device collects the user's panoramic data through the information perception layer and/or other means, and the data processing layer processes the panoramic data, for example performing data cleaning and data integration on it. The intelligent service layer then responds to user instructions according to the instruction execution method proposed in this application. For example, when a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first and second feature vectors are fused to generate a user feature matrix; the user feature matrix is used as input data of a pre-trained intent prediction model to acquire an intent label matching the user instruction; and the user instruction is executed according to the intent label. By collecting panoramic data to supplement the context and situational information of the user instruction, the scheme can understand the user's implicit intention more accurately and thus execute the user instruction better.
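The end-to-end flow just described can be sketched as follows; the feature values and the intent model here are hypothetical placeholders, since the application does not specify their concrete form:

```python
def fuse(v1, v2):
    # Row-wise stacking of the two equal-length feature vectors
    # yields the user feature matrix.
    assert len(v1) == len(v2)
    return [v1, v2]

def predict_intent(matrix):
    # Placeholder for the pre-trained intent prediction model;
    # the real model maps a feature matrix to an intent label.
    return "play_music" if matrix[0][0] > 0.5 else "unknown"

v1 = [0.9, 0.1, 0.4]   # first feature vector, from the user instruction
v2 = [0.2, 0.8, 0.5]   # second feature vector, from the panoramic data
matrix = fuse(v1, v2)
label = predict_intent(matrix)  # intent label used to execute the instruction
```

The instruction would then be executed according to `label`; the actual mapping from labels to actions is device-specific.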

The embodiments of the present application provide an instruction execution method. The execution body of the method may be the instruction execution apparatus provided by the embodiments of the present application, or an electronic device integrating the apparatus; the apparatus may be implemented in hardware or in software. The electronic device may be a smartphone, a tablet computer, a palmtop computer, a notebook computer, a desktop computer, or a similar device.

Referring to FIG. 2, FIG. 2 is a schematic diagram of an application scenario of the instruction execution method provided by an embodiment of the present application. Taking an instruction execution apparatus integrated in an electronic device as an example, the electronic device can receive user instructions such as voice instructions, touch instructions, or grip instructions. When a user instruction is received, a first feature vector is generated according to the user instruction; the electronic device's current panoramic data, such as user information, sensor state data, and usage information of applications on the electronic device, is then collected, and a second feature vector is generated from the collected panoramic data. Next, the first and second feature vectors are fused to generate a user feature matrix; an intent label corresponding to this user feature matrix is generated according to the user feature matrix and a pre-trained intent prediction model, and the user instruction is executed according to the intent label. In this way, the panoramic data collected when the instruction is received supplements the user instruction with complete context information and information about the situational environment, enabling the electronic device to understand the user's implicit intention more accurately and thus execute the user instruction better.

Referring to FIG. 3, FIG. 3 is a first schematic flowchart of the instruction execution method provided by an embodiment of the present application. The specific flow of the method may be as follows:

Step 101: when a user instruction is received, generate a first feature vector according to the user instruction.

In this embodiment, user instructions may be of various types, such as voice instructions, touch instructions, and grip instructions. When the electronic device receives a user instruction, it acquires the signal data corresponding to the instruction, normalizes the signal data, and generates a corresponding first feature vector that represents the information contained in the instruction. The first feature vector can be expressed as follows:

s1 = {yi1, yi2, …, yin}
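The text does not specify which normalization is used; as a minimal sketch assuming simple min-max scaling, converting raw signal samples into the first feature vector might look like:

```python
def normalize(signal):
    # Min-max scale raw signal samples into [0, 1] to form the first
    # feature vector s1 = {yi1, ..., yin}. (Min-max scaling is an
    # assumption; the scheme only says the data is "normalized".)
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(s - lo) / (hi - lo) for s in signal]
```

For example, `normalize([2, 4, 6, 10])` gives `[0.0, 0.25, 0.5, 1.0]`.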

For example, in one embodiment, the electronic device is provided with a voice component, such as a microphone, through which it can continuously collect the user's voice data. After the electronic device turns on the speech recognition function, the user can control it through voice instructions, such as "play music" or "make a call". In this case, the step of generating the first feature vector according to the user instruction includes: when a voice instruction is received, acquiring the voice data collected by the voice component; and generating a semantic feature vector of the voice data according to a pre-trained autoencoder recurrent neural network, using the semantic feature vector as the first feature vector.

The autoencoder network model consists of an encoder and a decoder, and the output of the network equals its input. The network also includes an intermediate hidden layer, which can extract the semantic feature vector of the voice data. In this scheme, an autoencoder recurrent neural network is used to extract the semantic feature vector from the voice data; both the input and the output of the network are the above voice data. During training, there is no need to label the voice data: a large amount of voice data is collected in advance as the network's input and output, and the network determines its parameters through self-learning.
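Structurally, an autoencoder's hidden layer compresses the input into a feature vector, and the decoder reconstructs the input from it. The following toy shows only that encoder/hidden-layer/decoder structure, with hand-picked (untrained) linear weights rather than a learned recurrent network:

```python
def matvec(W, x):
    # Multiply a weight matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# A 4-dim input compressed to a 2-dim hidden representation.
W_enc = [[1, 0, 0, 0],
         [0, 0, 1, 0]]
W_dec = [[1, 0],
         [0, 0],
         [0, 1],
         [0, 0]]

x = [0.5, 0.0, 0.8, 0.0]
h = matvec(W_enc, x)       # hidden layer -> semantic feature vector
x_rec = matvec(W_dec, h)   # decoder output approximates the input
```

Here `h` (equal to `[0.5, 0.8]`) would serve as the extracted feature vector; in the real network the weights are learned by training the output to match the input.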

Alternatively, in another embodiment, other feature extraction methods may be used to obtain the speech feature vector. In step 101, the step of generating the first feature vector according to the user instruction may include:

when a voice instruction is received, acquiring the voice data collected by the voice component;

converting the voice data into a spectrogram according to an audio feature extraction algorithm; and

generating a semantic feature vector of the voice data according to a pre-trained autoencoder convolutional neural network and the spectrogram, using the semantic feature vector as the first feature vector.

The audio feature extraction algorithm may be the MFCC (Mel-frequency cepstral coefficients) algorithm or the FFT (fast Fourier transform) algorithm. The voice data is converted into a spectrogram through the audio feature extraction algorithm, the spectrogram is used as the input and output data of the autoencoder convolutional neural network, and the semantic feature vector is extracted from the network. Similar to the autoencoder recurrent neural network above, the autoencoder convolutional neural network is also an autoencoder; it is built from convolutional layers and trained so that its output matches its input, in order to obtain the valuable information in its intermediate hidden layer.
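For illustration, one column of an FFT-based spectrogram is just the magnitude spectrum of one audio frame. A naive DFT sketch (not an optimized FFT, and not the full MFCC pipeline) is:

```python
import cmath

def dft_magnitudes(frame):
    # Naive discrete Fourier transform of one audio frame; the
    # magnitudes form one column of the spectrogram fed to the
    # autoencoder convolutional network.
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N)))
            for k in range(N)]
```

A unit impulse `[1, 0, 0, 0]` yields a flat magnitude spectrum of all ones, as expected for an impulse.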

The electronic device obtains the semantic feature vector through the above scheme and uses it as the first feature vector.

For another example, in one embodiment the user instruction is a touch instruction. The step of generating the first feature vector according to the user instruction may include: when a touch instruction is received, acquiring the touch data collected by a touch sensor; and converting the touch data into the first feature vector.

The electronic device is provided with a touch sensor for detecting touch operations, for example a touch sensor under the touch screen, which can detect the various touch operations and touch gestures input by the user. When a touch instruction is received, the touch data collected by the touch sensor, including the touch position and the touch track, is acquired.

For another example, in one embodiment the user instruction is a grip instruction. The step of generating the first feature vector according to the user instruction may include: when a grip instruction is received, acquiring the grip data collected by a grip sensor; and converting the grip data into the first feature vector.

Grip sensors are provided at positions such as the frame and the back cover of the electronic device, and can detect grip instructions triggered by the user's grip operations. The grip data may be position information, such as position coordinates. To facilitate the subsequent fusion of multiple feature vectors into one feature matrix, a preset length for the feature vectors needs to be defined in advance. The electronic device converts the acquired position coordinates into a vector representation of the preset length; for example, if the preset vector length is 10, a position coordinate of length 2 is converted into a first feature vector of length 10 by repeated tiling.
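The repeated-tiling step (a length-2 coordinate padded to the preset length 10) can be sketched as:

```python
def to_fixed_length(coords, target_len=10):
    # Repeat the short coordinate vector until it reaches the preset
    # length, e.g. a length-2 grip coordinate becomes length 10.
    out = []
    while len(out) < target_len:
        out.extend(coords)
    return out[:target_len]
```

For example, `to_fixed_length([120, 340])` returns `[120, 340, 120, 340, 120, 340, 120, 340, 120, 340]`.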

步骤102、获取当前的全景数据,并根据所述全景数据生成第二特征向量。Step 102: Acquire current panoramic data, and generate a second feature vector according to the panoramic data.

本申请实施例中,电子设备的全景数据包括但不限于以下几类数据:终端状态数据、用户状态数据和传感器状态数据。In this embodiment of the present application, the panoramic data of the electronic device includes, but is not limited to, the following types of data: terminal state data, user state data, and sensor state data.

其中,用户状态数据包括电子设备的摄像头间接性地捕获到的用户脸部图像,以及从用户数据库中获取的用户的年龄、性别等用户信息。The user state data includes the user's face image indirectly captured by the camera of the electronic device, and user information such as the user's age and gender obtained from the user database.

终端运行数据包括用户终端在各时间区间内所处的运行模式，其中运行模式包括游戏模式、娱乐模式、影音模式等，可以根据当前运行的应用程序的类型确定终端所处的运行模式，当前运行的应用程序的类型可以直接从应用程序安装包的分类信息中获得；或者，终端运行数据还可以包括终端的剩余电量、显示模式、网络状态、熄屏/锁屏状态等。The terminal operation data includes the operation mode of the user terminal in each time interval, where the operation mode includes a game mode, an entertainment mode, an audio-visual mode, and so on. The operation mode of the terminal can be determined according to the type of the currently running application, and the type of the currently running application can be obtained directly from the classification information of the application installation package. Alternatively, the terminal operation data may further include the terminal's remaining battery level, display mode, network status, screen-off/lock-screen status, and so on.

传感器状态数据包括电子设备上的各传感器采集到的信号，例如，电子设备上包括如下传感器：距离传感器、磁场传感器、光线传感器、加速度传感器、指纹传感器、霍尔传感器、位置传感器、陀螺仪、惯性传感器、姿态感应器、气压计、心率传感器等多个传感器。获取电子设备在接收到用户指令时的传感器状态数据，或者获取电子设备在接收到用户指令前的一段时间的传感器状态数据。在一些实施例中，可以有针对性的获取部分传感器的状态数据。The sensor state data includes the signals collected by the sensors on the electronic device. For example, the electronic device includes the following sensors: a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, a heart rate sensor, and so on. The sensor state data of the electronic device at the time the user instruction is received is acquired, or the sensor state data of the electronic device for a period of time before the user instruction is received is acquired. In some embodiments, the state data of selected sensors may be acquired in a targeted manner.

请参阅图4，图4为本申请实施例提供的指令执行方法的第二种流程示意图。步骤102、获取当前的全景数据，并根据所述全景数据生成第二特征向量可以包括：Please refer to FIG. 4, which is a second schematic flowchart of the instruction execution method provided by an embodiment of the present application. Step 102, acquiring the current panoramic data and generating the second feature vector according to the panoramic data, may include:

步骤1021、获取当前的终端状态数据、用户状态数据和传感器状态数据;Step 1021: Acquire current terminal status data, user status data and sensor status data;

步骤1022、根据所述终端状态数据生成终端状态特征,根据所述用户状态数据生成用户状态特征,并根据所述传感器状态数据生成终端情景特征;Step 1022: Generate a terminal status feature according to the terminal status data, generate a user status feature according to the user status data, and generate a terminal context feature according to the sensor status data;

步骤1023、融合所述终端状态特征、所述用户状态特征和所述终端情景特征,生成所述第二特征向量。Step 1023 , fuse the terminal state feature, the user state feature and the terminal context feature to generate the second feature vector.

其中，在一些实施例中，获取用户状态数据的步骤可以包括：调用摄像头组件捕获用户脸部图像；根据预设的卷积神经网络模型对所述用户脸部图像进行识别，生成用户情感标签；获取用户信息，并将所述用户情感标签和所述用户信息作为所述用户状态数据。用户信息可以从用户数据库中获取，用户信息可以包括用户性别、用户年龄、用户爱好等。In some embodiments, the step of acquiring the user state data may include: invoking a camera component to capture a face image of the user; recognizing the face image of the user according to a preset convolutional neural network model to generate a user emotion label; and acquiring user information, and using the user emotion label and the user information as the user state data. The user information may be acquired from a user database and may include the user's gender, age, hobbies, and the like.

其中，在一些实施例中，获取当前的终端状态数据的步骤，包括：确定当前运行的应用程序所属的程序类别；根据所述程序类别确定终端当前的运行模式，其中运行模式包括游戏模式、娱乐模式、影音模式；将所述运行模式作为终端状态数据。In some embodiments, the step of acquiring the current terminal state data includes: determining the program category to which the currently running application belongs; determining the current operation mode of the terminal according to the program category, where the operation mode includes a game mode, an entertainment mode and an audio-visual mode; and using the operation mode as the terminal state data.
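上述由程序类别确定运行模式的逻辑可以用如下示意代码表示（类别名称为假设值，模式索引取自下文的示例：1为游戏、2为娱乐、3为影音）。The category-to-mode logic above can be sketched as follows (the category names are assumptions; the mode indices follow the example given below: 1 = game, 2 = entertainment, 3 = audio-visual).

```python
# Hypothetical category names from an app installation package's classification
# information; only the three mode indices are taken from the text.
MODE_BY_CATEGORY = {"game": 1, "chess": 1, "social": 2, "video": 3, "music": 3}

def run_mode(app_category: str) -> int:
    # Categories outside the three named modes fall back to 0 ("unknown").
    return MODE_BY_CATEGORY.get(app_category, 0)

print(run_mode("video"))  # 3
```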

综上所述，根据终端状态数据生成终端状态特征ys1，根据传感器的状态数据，例如，从磁力计、加速度计、陀螺仪通过卡尔曼滤波算法获得四维的终端姿态特征ys2~ys5，通过气压计采集的数据获取气压特征ys6、通过网络模块确定WIFI连接状态ys7、通过位置传感器采集的数据进行定位，得到用户当前的位置属性(如商场、家里、公司、公园等)，生成特征ys8；还可以进一步的结合磁力计、加速度传感器、陀螺仪、气压计10轴信息使用滤波算法或者主成分分析算法得到新的多维数据，生成对应的特征ys9。根据用户的情感标签生成特征ys10，根据用户的性别、年龄、爱好生成特征ys11~ys13。对于上述特征中的非数字形式的特征，可以采用建立索引号的方式，将其转换为数字表示，例如，对于当前系统终端状态的运行模式这一特征，使用索引号代表当前的状态模式，比如1是游戏模式，2是娱乐模式，3是影音模式。若当前运行模式为游戏模式，则确定当前系统状态ys1=1。在获取到全部数字表示的特征后，融合上述特征数据得到一个长向量，将该长向量归一化处理，得到第二特征向量s2：To sum up, a terminal state feature ys1 is generated according to the terminal state data. According to the sensor state data, for example, four-dimensional terminal attitude features ys2~ys5 are obtained from the magnetometer, accelerometer and gyroscope through a Kalman filtering algorithm; an air pressure feature ys6 is obtained from the data collected by the barometer; a WIFI connection state ys7 is determined through the network module; and positioning is performed with the data collected by the position sensor to obtain the user's current location attribute (such as a shopping mall, home, office, park, etc.), generating a feature ys8. The 10-axis information of the magnetometer, acceleration sensor, gyroscope and barometer may further be combined, and a filtering algorithm or a principal component analysis algorithm may be used to obtain new multi-dimensional data, generating a corresponding feature ys9. A feature ys10 is generated according to the user's emotion label, and features ys11~ys13 are generated according to the user's gender, age and hobbies. Features in non-numeric form among the above features can be converted into numeric representations by establishing index numbers; for example, for the operation mode of the current terminal state, an index number represents the current mode: 1 is the game mode, 2 is the entertainment mode, and 3 is the audio-visual mode. If the current operation mode is the game mode, the current system state is determined as ys1 = 1. After all the numerically represented features are acquired, the above feature data are fused into one long vector, and the long vector is normalized to obtain the second feature vector s2:

s2 = {ys1, ys2, …, ysm}
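将上述13个特征拼接为长向量并归一化的过程可以用如下示意代码表示（特征取值与最小-最大归一化方式均为假设，原文仅说明对长向量做"归一化处理"）。Fusing the thirteen example features into one normalized long vector can be sketched as follows (the sample values and the min-max normalization are assumptions; the text only says the long vector is normalized).

```python
import numpy as np

MODE_INDEX = {"game": 1, "entertainment": 2, "video": 3}  # non-numeric -> index

def build_s2(mode, attitude4, pressure, wifi_connected, place_idx,
             fused, emotion_idx, gender_idx, age, hobby_idx):
    # ys1..ys13: mode, 4 attitude features, pressure, WIFI state, place,
    # fused multi-dimensional feature, emotion, gender, age, hobby.
    feats = [MODE_INDEX[mode], *attitude4, pressure, int(wifi_connected),
             place_idx, fused, emotion_idx, gender_idx, age, hobby_idx]
    v = np.asarray(feats, dtype=float)
    # Min-max normalization as one possible normalization choice.
    return (v - v.min()) / (v.max() - v.min())

s2 = build_s2("game", [0.1, 0.2, 0.3, 0.4], 1013.0, True, 2, 0.5, 1, 0, 30, 4)
print(len(s2))  # 13
```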

步骤103、对所述第一特征向量和所述第二特征向量进行融合处理,生成用户特征矩阵。Step 103: Perform fusion processing on the first feature vector and the second feature vector to generate a user feature matrix.

其中，可以将第一特征向量s1和第二特征向量s2进行矩阵叠加，生成如下用户特征矩阵：Here, the first feature vector s1 and the second feature vector s2 can be stacked as a matrix to generate the following user feature matrix:

⎡ yi1  yi2  …  yin ⎤
⎣ ys1  ys2  …  ysm ⎦

若第一特征向量和第二特征向量的长度不相等，则可以通过补零的方式调整长度短的向量，若n<m，则采用补零的方式，将第一特征向量s1的长度延伸为m。若n>m，则采用补零的方式，将第二特征向量s2的长度延伸为n。If the lengths of the first feature vector and the second feature vector are not equal, the shorter vector can be adjusted by zero-padding: if n < m, the first feature vector s1 is extended to length m by zero-padding; if n > m, the second feature vector s2 is extended to length n by zero-padding.
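补零对齐并叠加的过程可以用如下示意代码表示（向量取值为虚构的示例值）。The zero-padding and stacking can be sketched as follows (the vector values are made-up examples).

```python
import numpy as np

def pad_and_stack(s1, s2):
    """Zero-pad the shorter vector, then stack the two as a 2-row matrix."""
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    length = max(s1.size, s2.size)
    s1 = np.pad(s1, (0, length - s1.size))  # extend with trailing zeros
    s2 = np.pad(s2, (0, length - s2.size))
    return np.vstack([s1, s2])

S = pad_and_stack([1, 2, 3], [4, 5, 6, 7, 8])  # n = 3 < m = 5
print(S.shape)  # (2, 5)
```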

或者，在一可选的实施方式中，将第一特征向量和第二特征向量调整为相同的长度后，将第一特征向量s1和第二特征向量s2进行矩阵叠加，然后，为了产生更加丰富的特征向量提供后续操作，将叠加的矩阵进行翻转操作，生成如下矩阵：Alternatively, in an optional embodiment, after the first feature vector and the second feature vector are adjusted to the same length, the first feature vector s1 and the second feature vector s2 are stacked as a matrix; then, in order to provide richer feature vectors for subsequent operations, a flip operation is performed on the stacked matrix to generate the following matrix:

⎡ yin  …  yi2  yi1 ⎤
⎣ ysm  …  ys2  ys1 ⎦

将翻转操作前后的矩阵合并得到如下矩阵作为用户特征矩阵。Combine the matrices before and after the flip operation to obtain the following matrix as the user feature matrix.

⎡ yi1  yi2  …  yin ⎤
⎢ ys1  ys2  …  ysm ⎥
⎢ yin  …  yi2  yi1 ⎥
⎣ ysm  …  ys2  ys1 ⎦
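上述叠加、翻转与合并操作可以用如下示意代码表示（此处以左右翻转为例，翻转方向为假设，原文未限定）。The stacking, flip and merge operations above can be sketched as follows (a left-right flip is assumed here; the text does not specify the flip direction).

```python
import numpy as np

def user_feature_matrix(S):
    """Flip the stacked 2-row matrix left-right and merge it with the original."""
    flipped = np.fliplr(S)          # one possible reading of the flip operation
    return np.vstack([S, flipped])  # 4-row user feature matrix

S = np.array([[1., 2., 3.], [4., 5., 6.]])  # stacked s1 and s2 (toy values)
M = user_feature_matrix(S)
print(M.shape)  # (4, 3)
```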

步骤104、根据所述用户特征矩阵和预先训练好的意图预测模型,获取与所述用户指令匹配的意图标签。Step 104: Acquire an intent label matching the user instruction according to the user feature matrix and the pre-trained intent prediction model.

本申请实施例中，意图预测模型为一种分类模型，表征用户特征矩阵与意图标签之间的关系。例如，可以通过训练卷积神经网络、BP神经网络(Back Propagation,反向传播)或者SVM(Support Vector Machine,支持向量机)算法等分类算法得到意图预测模型。以下以卷积神经网络为例，采集大量的测试用户的样本数据，按照步骤101至103提取特征，然后为提取的特征贴标签，例如，先从全部用户中选择一部分用户作为测试用户，对这些用户的用户指令进行记录，并生成第一特征向量；并对接收该用户指令时的全景数据进行采集，根据全景数据生成第二特征向量。然后记录电子设备对用户指令的响应情况，以及用户基于该响应情况执行的操作，在一可选的实施方式中，可以根据电子设备对用户指令的响应情况，以及用户基于该响应情况执行的操作自动生成意图标签数据，或者，在另外一些实施方式中，可以采用人工贴标签的方式为用户特征矩阵添加标签。In the embodiment of the present application, the intent prediction model is a classification model that characterizes the relationship between the user feature matrix and the intent label. For example, the intent prediction model can be obtained by training a classification algorithm such as a convolutional neural network, a BP (Back Propagation) neural network, or an SVM (Support Vector Machine) algorithm. Taking a convolutional neural network as an example, a large amount of sample data of test users is collected, features are extracted according to steps 101 to 103, and the extracted features are then labeled. For example, a subset of all users is first selected as test users, the user instructions of these users are recorded, and first feature vectors are generated; the panoramic data at the time each user instruction is received is collected, and second feature vectors are generated from the panoramic data. Then the responses of the electronic device to the user instructions, as well as the operations performed by the users based on those responses, are recorded. In an optional embodiment, the intent label data can be generated automatically from the responses of the electronic device to the user instructions and the operations performed by the users based on those responses; alternatively, in other embodiments, the user feature matrices can be labeled manually.

将上述具有意图标签的用户特征矩阵输入至卷积神经网络进行训练，其中，卷积神经网络的结构以及超参数可以由用户根据需要预先设置，经过训练，确定网络的权重参数，生成意图预测模型。The above user feature matrices with intent labels are input into the convolutional neural network for training, where the structure and hyperparameters of the convolutional neural network can be preset by the user as needed. Through training, the weight parameters of the network are determined, and the intent prediction model is generated.
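训练分类器得到意图预测模型的过程可以用如下示意代码表示（此处用一个极简 softmax 线性分类器代替原文的卷积神经网络/BP/SVM，训练数据为虚构的合成数据）。The training step can be sketched as follows (a minimal softmax linear classifier stands in for the convolutional neural network / BP / SVM options of the text, and the training data is synthetic).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for labeled training data: flattened 4x8 user feature matrices
# from 3 intent classes, with class-dependent means so they are separable.
means = 3.0 * rng.normal(size=(3, 32))
y = rng.integers(0, 3, size=90)
X = means[y] + rng.normal(size=(90, 32))

# Gradient descent on the softmax cross-entropy loss.
W = np.zeros((32, 3))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - np.eye(3)[y]) / len(X)

accuracy = ((X @ W).argmax(axis=1) == y).mean()
print(accuracy)
```

训练完成后，对新的用户特征矩阵取 argmax 即得到匹配的意图标签。After training, applying argmax to a new user feature matrix yields the matching intent label.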

此外，需要说明的是，为了能够充分表征用户意图，一个用户特征矩阵对应的意图标签是一个意图标签集合，该意图标签集合中具有多个能够表征用户指令、地点、时间、环境、状态、习惯等信息的标签，这些标签刻画了用户当前的全景画像。In addition, it should be noted that, in order to fully characterize the user's intent, the intent label corresponding to one user feature matrix is an intent label set; the set contains multiple labels that can characterize information such as the user instruction, location, time, environment, state and habits, and together these labels depict the user's current panoramic portrait.

将步骤103中获取的用户特征矩阵输入该意图预测模型，即可生成与用户指令匹配的意图标签。By inputting the user feature matrix obtained in step 103 into the intent prediction model, an intent label matching the user instruction can be generated.

步骤105、根据所述意图标签执行所述用户指令。Step 105: Execute the user instruction according to the intent tag.

意图预测模型输出的意图标签是由多个标签构成的标签集合，这些标签构成与用户指令对应的全景画像，电子设备在响应用户指令并启动对应的目标应用的同时，将标签集合推送给目标应用，这样，电子设备在响应用户指令时执行的就不仅仅是启动软件这样一个简单的操作，还可以进一步地，目标应用开启后，根据收到的标签集合执行更加具体的操作。The intent label output by the intent prediction model is a label set composed of multiple labels, and these labels constitute the panoramic portrait corresponding to the user instruction. While responding to the user instruction and launching the corresponding target application, the electronic device pushes the label set to the target application. In this way, what the electronic device performs when responding to the user instruction is no longer merely a simple operation such as launching the application; further, after the target application is opened, it can perform more specific operations according to the received label set.

例如,参照图5所示,图5为本申请实施例提供的指令执行方法的第三种流程示意图。步骤105、根据所述意图标签执行所述用户指令包括:For example, referring to FIG. 5 , FIG. 5 is a third schematic flowchart of an instruction execution method provided by an embodiment of the present application. Step 105: Executing the user instruction according to the intent tag includes:

步骤1051、根据所述意图标签确定所述用户指令对应的目标应用;Step 1051: Determine the target application corresponding to the user instruction according to the intent tag;

步骤1052、启动所述目标应用,并将所述意图标签发送至所述目标应用,其中所述目标应用基于所述意图标签执行对应的操作。Step 1052: Start the target application, and send the intent tag to the target application, where the target application performs a corresponding operation based on the intent tag.

比如，用户触发语音指令"小欧小欧，给我点餐"，系统的语音识别模块检测到该指令后，通过全景感知技术获得更多的能够体现用户意图的数据，例如，终端状态数据、用户状态数据和传感器状态数据等，使用这些数据补充指令的上下文信息，获取更全面的意图信息，例如：当前时间、当前地点、所处场景、用户饮食习惯、用户饥饿感等。电子设备根据该指令开启第三方点餐应用时，将这些信息推送给第三方点餐应用，第三方点餐应用可以根据上述信息向用户推荐具体的点餐地点和点餐内容。For example, the user triggers the voice instruction "Xiaoou Xiaoou, order a meal for me". After the voice recognition module of the system detects the instruction, more data reflecting the user's intent is obtained through panoramic sensing technology, such as terminal state data, user state data and sensor state data; these data are used to supplement the context information of the instruction and obtain more comprehensive intent information, such as the current time, current location, current scene, the user's eating habits, the user's degree of hunger, and so on. When the electronic device opens a third-party meal-ordering application according to the instruction, it pushes this information to the application, and the third-party application can recommend a specific ordering place and ordering content to the user based on the above information.
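步骤1051～1052的分发逻辑可以用如下示意代码表示（标签名与应用注册表均为假设，原文未规定标签的具体格式）。The dispatch logic of steps 1051-1052 can be sketched as follows (the label names and the application registry are assumptions; the text does not fix a label schema).

```python
# Hypothetical registry mapping the instruction label to a target application.
TARGET_BY_INSTRUCTION = {"order_meal": "food_app", "play_music": "music_app"}

def execute(intent_labels: dict) -> tuple:
    # Step 1051: determine the target application from the intent label set.
    target = TARGET_BY_INSTRUCTION[intent_labels["instruction"]]
    # Step 1052: launch the target app and push the whole label set to it,
    # so it can perform more specific operations (e.g. recommend a restaurant).
    return target, intent_labels

app, pushed = execute({"instruction": "order_meal", "place": "office",
                       "time": "noon", "habit": "sichuan_food"})
print(app)  # food_app
```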

由上可知，本申请实施例提出的指令执行方法，当接收到用户指令时，根据用户指令生成第一特征向量，获取当前的全景数据，根据全景数据生成第二特征向量，对第一特征向量和第二特征向量进行融合处理，生成用户特征矩阵，接下来将该用户特征矩阵作为预先训练好的意图预测模型的输入数据，获取与用户指令匹配的意图标签，根据该意图标签执行用户指令，该方案通过采集全景数据对用户指令的上下文信息和情景信息进行补充，能够更准确地理解用户的隐含意图，从而更好地执行用户指令。As can be seen from the above, in the instruction execution method proposed in the embodiment of the present application, when a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; the user feature matrix is then used as the input data of a pre-trained intent prediction model to obtain an intent label matching the user instruction, and the user instruction is executed according to the intent label. By collecting panoramic data, this solution supplements the context and scene information of the user instruction, can understand the user's implicit intent more accurately, and thereby executes the user instruction better.

在一实施例中还提供了一种指令执行装置。请参阅图6,图6为本申请实施例提供的指令执行装置400的结构示意图。其中该指令执行装置400应用于电子设备,该指令执行装置400包括第一特征提取模块401、第二特征提取模块402、特征融合模块403、意图预测模块404以及指令执行模块405,如下:In an embodiment, an instruction execution apparatus is also provided. Please refer to FIG. 6 , which is a schematic structural diagram of an instruction execution apparatus 400 according to an embodiment of the present application. The instruction execution apparatus 400 is applied to electronic equipment, and the instruction execution apparatus 400 includes a first feature extraction module 401, a second feature extraction module 402, a feature fusion module 403, an intention prediction module 404 and an instruction execution module 405, as follows:

第一特征提取模块401,用于当接收到用户指令时,根据所述用户指令生成第一特征向量。The first feature extraction module 401 is configured to generate a first feature vector according to the user instruction when a user instruction is received.

在本实施例中，用户指令可以有多种类型，例如语音指令、触控指令和握持指令。在接收到用户指令时，第一特征提取模块401获取用户指令对应的信号数据，对信号数据进行归一化处理，生成对应的第一特征向量，使用第一特征向量来表征该用户指令中包含的信息。其中，第一特征向量可以表示如下：In this embodiment, the user instruction may be of various types, such as a voice instruction, a touch instruction and a holding instruction. When a user instruction is received, the first feature extraction module 401 acquires the signal data corresponding to the user instruction, normalizes the signal data, and generates the corresponding first feature vector; the first feature vector is used to represent the information contained in the user instruction. The first feature vector can be expressed as follows:

s1 = {yi1, yi2, …, yin}

例如，在一实施例中，电子设备中设置有语音组件，例如麦克风，电子设备可以通过麦克风持续采集用户的语音数据。在电子设备开启语音识别功能后，用户可以通过语音指令对电子设备进行控制，例如"播放音乐"、"拨打电话"等指令。例如，第一特征提取模块401还用于：当接收到语音指令时，获取语音组件采集到的语音数据；根据预先训练好的自编码循环神经网络，生成所述语音数据的语义特征向量，将所述语义特征向量作为所述第一特征向量。For example, in one embodiment, the electronic device is provided with a voice component, such as a microphone, through which the electronic device can continuously collect the user's voice data. After the electronic device turns on the voice recognition function, the user can control the electronic device through voice instructions, such as "play music" or "make a call". For example, the first feature extraction module 401 is further configured to: when a voice instruction is received, acquire the voice data collected by the voice component; and generate a semantic feature vector of the voice data according to a pre-trained auto-encoding recurrent neural network, using the semantic feature vector as the first feature vector.

其中，自编码神经网络模型由一个encoder编码器和一个decoder解码器组成，该网络的输出等于输入，网络还包括有中间隐藏层，中间隐藏层能够提取语音数据的语义特征向量。本方案中采用自编码循环神经网络从语音数据中提取语义特征向量，自编码循环神经网络的输入数据和输出数据均为上述语音数据。该网络在训练时，无需对语音数据贴标签，预先采集大量的语音数据作为网络的输入和输出，网络通过自学习，确定参数。The autoencoder neural network model consists of an encoder and a decoder, and the output of the network equals its input. The network also includes an intermediate hidden layer, which can extract the semantic feature vector of the voice data. In this solution, an auto-encoding recurrent neural network is used to extract the semantic feature vector from the voice data, and both the input data and the output data of the auto-encoding recurrent neural network are the above voice data. When the network is trained, the voice data does not need to be labeled: a large amount of voice data is collected in advance as the input and output of the network, and the network determines its parameters through self-learning.
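自编码器"输出等于输入、取中间隐藏层作为特征"的思想可以用如下示意代码表示（原文的网络为循环结构，此处为便于说明用一个线性自编码器代替，数据为虚构）。The autoencoder idea — train the output to equal the input and take the intermediate hidden layer as the feature — can be sketched as follows (the text's network is recurrent; a linear autoencoder is substituted here for brevity, on synthetic data).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))      # stand-in for framed voice feature data

d_hidden = 4                        # the intermediate hidden layer
W_enc = 0.1 * rng.normal(size=(16, d_hidden))
W_dec = 0.1 * rng.normal(size=(d_hidden, 16))

def loss():
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

loss_before = loss()
for _ in range(500):                # no labels: the target is the input itself
    H = X @ W_enc                   # encoder
    err = H @ W_dec - X             # decoder output minus target (= input)
    W_dec -= 0.01 * H.T @ err / len(X)
    W_enc -= 0.01 * X.T @ (err @ W_dec.T) / len(X)

semantic = X @ W_enc                # hidden-layer activations as the features
print(semantic.shape, loss() < loss_before)
```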

或者，在另一实施例中，还可以采用其他的特征提取方式获取语音特征向量。第一特征提取模块401还用于：当接收到语音指令时，获取语音组件采集到的语音数据；根据音频特征提取算法将所述语音数据转换为频谱图；根据预先训练好的自编码卷积神经网络和所述频谱图，生成所述语音数据的语义特征向量，将所述语义特征向量作为所述第一特征向量。Alternatively, in another embodiment, other feature extraction methods may also be used to obtain the voice feature vector. The first feature extraction module 401 is further configured to: when a voice instruction is received, acquire the voice data collected by the voice component; convert the voice data into a spectrogram according to an audio feature extraction algorithm; and generate a semantic feature vector of the voice data according to a pre-trained auto-encoding convolutional neural network and the spectrogram, using the semantic feature vector as the first feature vector.

其中，音频特征提取算法可以是MFCC(Mel Frequency Cepstrum Coefficient,梅尔频率倒谱系数)算法或者FFT(Fast Fourier Transformation,快速傅里叶变换)算法，通过音频特征提取算法将语音数据转换为频谱图，将频谱图作为自编码卷积神经网络的输入数据和输出数据，从网络中提取语义特征向量。与上述自编码循环神经网路类似，自编码卷积神经网络也是一种自编码器，其中，自编码卷积神经网络是使用卷积层构建自编码器，通过训练这种自编码器的输出数据与输入数据一致，以获取其中间隐藏层中有价值的信息。The audio feature extraction algorithm may be an MFCC (Mel Frequency Cepstral Coefficients) algorithm or an FFT (Fast Fourier Transform) algorithm. The voice data is converted into a spectrogram by the audio feature extraction algorithm, the spectrogram is used as the input data and output data of the auto-encoding convolutional neural network, and the semantic feature vector is extracted from the network. Similar to the auto-encoding recurrent neural network above, the auto-encoding convolutional neural network is also an autoencoder; it builds the autoencoder with convolutional layers and is trained so that its output is consistent with its input, in order to obtain the valuable information in its intermediate hidden layer.
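由语音信号计算频谱图的过程可以用如下示意代码表示（基于短时FFT的幅度谱，帧长、帧移等参数为假设值；完整的MFCC流程还需梅尔滤波与倒谱变换）。Computing a spectrogram from a voice signal can be sketched as follows (a magnitude spectrogram via short-time FFT; frame length and hop size are assumptions, and a full MFCC pipeline would additionally apply mel filtering and a cepstral transform).

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via short-time FFT over Hann-windowed frames."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1))

sig = np.sin(2 * np.pi * 50 * np.arange(4096) / 1000.0)  # 50 Hz tone at 1 kHz
S = spectrogram(sig)
print(S.shape)  # (31, 129): 31 frames, 129 frequency bins
```

该二维频谱图即可作为自编码卷积神经网络的输入。This two-dimensional spectrogram can then serve as the input to the auto-encoding convolutional neural network.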

第一特征提取模块401通过上述方案获取到语义向量特征,将语义向量特征作为第一特征向量。The first feature extraction module 401 obtains the semantic vector feature through the above solution, and uses the semantic vector feature as the first feature vector.

又例如，在一实施例中，所述用户指令为触控指令；第一特征提取模块401还用于：当接收到触控指令时，获取触控传感器采集到的触控数据；将所述触控数据转换为所述第一特征向量。For another example, in one embodiment, the user instruction is a touch instruction; the first feature extraction module 401 is further configured to: when a touch instruction is received, acquire the touch data collected by the touch sensor, and convert the touch data into the first feature vector.

其中，电子设备上设置有用于侦测触摸操作的触摸传感器，比如设置在触摸屏下的触摸传感器，可以侦测用户输入的各种触摸操作、触摸手势等。当接收到触控指令时，获取触控传感器采集到的触控数据，触控数据包括触摸位置、触摸轨迹。The electronic device is provided with a touch sensor for detecting touch operations, for example a touch sensor disposed under the touch screen, which can detect various touch operations, touch gestures, etc. input by the user. When a touch instruction is received, the touch data collected by the touch sensor is acquired; the touch data includes the touch position and the touch trajectory.

又例如，在一实施例中，所述用户指令为握持指令；第一特征提取模块401还用于：当接收到握持指令时，获取握持传感器采集到的握持数据；将所述握持数据转换为所述第一特征向量。For another example, in one embodiment, the user instruction is a holding instruction; the first feature extraction module 401 is further configured to: when a holding instruction is received, acquire the holding data collected by the holding sensor, and convert the holding data into the first feature vector.

其中,电子设备的边框、后盖等位置处设置有握持传感器,能够检测用户的握持操作触发的握持指令。该握持数据可以是位置信息,例如位置坐标。为了便于后续将多个特征向量融合为一个特征矩阵,需要预先定义特征向量的预设长度。电子设备将获取到的位置坐标转换预设长度的向量表示,例如,预设的向量长度为10,则采用重复叠加的方式将长度为2的位置坐标转换为长度为10的第一特征向量。Wherein, a holding sensor is provided at positions such as the frame and the back cover of the electronic device, which can detect the holding instruction triggered by the holding operation of the user. The grip data may be location information, such as location coordinates. In order to facilitate the subsequent fusion of multiple feature vectors into one feature matrix, a preset length of the feature vectors needs to be pre-defined. The electronic device converts the obtained position coordinates into a vector representation with a preset length. For example, if the preset vector length is 10, the position coordinates with a length of 2 are converted into a first feature vector with a length of 10 by means of repeated superposition.

第二特征提取模块402,用于获取当前的全景数据,并根据所述全景数据生成第二特征向量。The second feature extraction module 402 is configured to acquire current panoramic data, and generate a second feature vector according to the panoramic data.

本申请实施例中,电子设备的全景数据包括但不限于以下几类数据:终端状态数据、用户状态数据和传感器状态数据。其中,用户状态数据包括电子设备的摄像头间接性地捕获到的用户脸部图像,以及从用户数据库中获取的用户的年龄、性别等用户信息。In this embodiment of the present application, the panoramic data of the electronic device includes, but is not limited to, the following types of data: terminal state data, user state data, and sensor state data. The user state data includes the user's face image indirectly captured by the camera of the electronic device, and user information such as the user's age and gender obtained from the user database.

终端运行数据包括用户终端在各时间区间内所处的运行模式，其中运行模式包括游戏模式、娱乐模式、影音模式等，可以根据当前运行的应用程序的类型确定终端所处的运行模式，当前运行的应用程序的类型可以直接从应用程序安装包的分类信息中获得；或者，终端运行数据还可以包括终端的剩余电量、显示模式、网络状态、熄屏/锁屏状态等。The terminal operation data includes the operation mode of the user terminal in each time interval, where the operation mode includes a game mode, an entertainment mode, an audio-visual mode, and so on. The operation mode of the terminal can be determined according to the type of the currently running application, and the type of the currently running application can be obtained directly from the classification information of the application installation package. Alternatively, the terminal operation data may further include the terminal's remaining battery level, display mode, network status, screen-off/lock-screen status, and so on.

传感器状态数据包括电子设备上的各传感器采集到的信号，例如，电子设备上包括如下传感器：距离传感器、磁场传感器、光线传感器、加速度传感器、指纹传感器、霍尔传感器、位置传感器、陀螺仪、惯性传感器、姿态感应器、气压计、心率传感器等多个传感器。获取电子设备在接收到用户指令时的传感器状态数据，或者获取电子设备在接收到用户指令前的一段时间的传感器状态数据。在一些实施例中，可以有针对性的获取部分传感器的状态数据。The sensor state data includes the signals collected by the sensors on the electronic device. For example, the electronic device includes the following sensors: a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, a heart rate sensor, and so on. The sensor state data of the electronic device at the time the user instruction is received is acquired, or the sensor state data of the electronic device for a period of time before the user instruction is received is acquired. In some embodiments, the state data of selected sensors may be acquired in a targeted manner.

可选地，在一实施例中，第二特征提取模块402还用于：获取当前的终端状态数据、用户状态数据和传感器状态数据；根据所述终端状态数据生成终端状态特征，根据所述用户状态数据生成用户状态特征，并根据所述传感器状态数据生成终端情景特征；融合所述终端状态特征、所述用户状态特征和所述终端情景特征，生成所述第二特征向量。Optionally, in an embodiment, the second feature extraction module 402 is further configured to: acquire the current terminal state data, user state data and sensor state data; generate a terminal state feature according to the terminal state data, generate a user state feature according to the user state data, and generate a terminal context feature according to the sensor state data; and fuse the terminal state feature, the user state feature and the terminal context feature to generate the second feature vector.

其中，在一些实施例中，第二特征提取模块402还用于：调用摄像头组件捕获用户脸部图像；根据预设的卷积神经网络模型对所述用户脸部图像进行识别，生成用户情感标签；获取用户信息，并将所述用户情感标签和所述用户信息作为所述用户状态数据。用户信息可以从用户数据库中获取，用户信息可以包括用户性别、用户年龄、用户爱好等。In some embodiments, the second feature extraction module 402 is further configured to: invoke a camera component to capture a face image of the user; recognize the face image of the user according to a preset convolutional neural network model to generate a user emotion label; and acquire user information, and use the user emotion label and the user information as the user state data. The user information may be acquired from a user database and may include the user's gender, age, hobbies, and the like.

其中，在一些实施例中，第二特征提取模块402还用于：确定当前运行的应用程序所属的程序类别；根据所述程序类别确定终端当前的运行模式，其中运行模式包括游戏模式、娱乐模式、影音模式；将所述运行模式作为终端状态数据。In some embodiments, the second feature extraction module 402 is further configured to: determine the program category to which the currently running application belongs; determine the current operation mode of the terminal according to the program category, where the operation mode includes a game mode, an entertainment mode and an audio-visual mode; and use the operation mode as the terminal state data.

综上所述，根据终端状态数据生成终端状态特征ys1，根据传感器的状态数据，例如，从磁力计、加速度计、陀螺仪通过卡尔曼滤波算法获得四维的终端姿态特征ys2~ys5，通过气压计采集的数据获取气压特征ys6、通过网络模块确定WIFI连接状态ys7、通过位置传感器采集的数据进行定位，得到用户当前的位置属性(如商场、家里、公司、公园等)，生成特征ys8；还可以进一步的结合磁力计、加速度传感器、陀螺仪、气压计10轴信息使用滤波算法或者主成分分析算法得到新的多维数据，生成对应的特征ys9。根据用户的情感标签生成特征ys10，根据用户的性别、年龄、爱好生成特征ys11~ys13。对于上述特征中的非数字形式的特征，可以采用建立索引号的方式，将其转换为数字表示，例如，对于当前系统终端状态的运行模式这一特征，使用索引号代表当前的状态模式，比如1是游戏模式，2是娱乐模式，3是影音模式。若当前运行模式为游戏模式，则确定当前系统状态ys1=1。在获取到全部数字表示的特征后，融合上述特征数据得到一个长向量，将该长向量归一化处理，得到第二特征向量s2：To sum up, a terminal state feature ys1 is generated according to the terminal state data. According to the sensor state data, for example, four-dimensional terminal attitude features ys2~ys5 are obtained from the magnetometer, accelerometer and gyroscope through a Kalman filtering algorithm; an air pressure feature ys6 is obtained from the data collected by the barometer; a WIFI connection state ys7 is determined through the network module; and positioning is performed with the data collected by the position sensor to obtain the user's current location attribute (such as a shopping mall, home, office, park, etc.), generating a feature ys8. The 10-axis information of the magnetometer, acceleration sensor, gyroscope and barometer may further be combined, and a filtering algorithm or a principal component analysis algorithm may be used to obtain new multi-dimensional data, generating a corresponding feature ys9. A feature ys10 is generated according to the user's emotion label, and features ys11~ys13 are generated according to the user's gender, age and hobbies. Features in non-numeric form among the above features can be converted into numeric representations by establishing index numbers; for example, for the operation mode of the current terminal state, an index number represents the current mode: 1 is the game mode, 2 is the entertainment mode, and 3 is the audio-visual mode. If the current operation mode is the game mode, the current system state is determined as ys1 = 1. After all the numerically represented features are acquired, the above feature data are fused into one long vector, and the long vector is normalized to obtain the second feature vector s2:

s2 = {ys1, ys2, …, ysm}

特征融合模块403,用于对所述第一特征向量和所述第二特征向量进行融合处理,生成用户特征矩阵。The feature fusion module 403 is configured to perform fusion processing on the first feature vector and the second feature vector to generate a user feature matrix.

其中，特征融合模块403可以将第一特征向量s1和第二特征向量s2进行矩阵叠加，生成如下用户特征矩阵：The feature fusion module 403 can stack the first feature vector s1 and the second feature vector s2 as a matrix to generate the following user feature matrix:

⎡ yi1  yi2  …  yin ⎤
⎣ ys1  ys2  …  ysm ⎦

若第一特征向量和第二特征向量的长度不相等，则可以通过补零的方式调整长度短的向量，若n<m，则采用补零的方式，将第一特征向量s1的长度延伸为m。若n>m，则采用补零的方式，将第二特征向量s2的长度延伸为n。If the lengths of the first feature vector and the second feature vector are not equal, the shorter vector can be adjusted by zero-padding: if n < m, the first feature vector s1 is extended to length m by zero-padding; if n > m, the second feature vector s2 is extended to length n by zero-padding.

或者，在一可选的实施方式中，将第一特征向量和第二特征向量调整为相同的长度后，特征融合模块403将第一特征向量s1和第二特征向量s2进行矩阵叠加，然后，为了产生更加丰富的特征向量提供后续操作，将叠加的矩阵进行翻转操作，生成如下矩阵：Alternatively, in an optional embodiment, after the first feature vector and the second feature vector are adjusted to the same length, the feature fusion module 403 stacks the first feature vector s1 and the second feature vector s2 as a matrix; then, in order to provide richer feature vectors for subsequent operations, a flip operation is performed on the stacked matrix to generate the following matrix:

⎡ yin  …  yi2  yi1 ⎤
⎣ ysm  …  ys2  ys1 ⎦

将翻转操作前后的矩阵合并得到如下矩阵作为用户特征矩阵。Combine the matrices before and after the flip operation to obtain the following matrix as the user feature matrix.

⎡ yi1  yi2  …  yin ⎤
⎢ ys1  ys2  …  ysm ⎥
⎢ yin  …  yi2  yi1 ⎥
⎣ ysm  …  ys2  ys1 ⎦

意图预测模块404,用于根据所述用户特征矩阵和预先训练好的意图预测模型,获取与所述用户指令匹配的意图标签。The intent prediction module 404 is configured to acquire an intent label matching the user instruction according to the user feature matrix and the pre-trained intent prediction model.

本申请实施例中，意图预测模型为一种分类模型，表征用户特征矩阵与意图标签之间的关系。例如，可以通过训练卷积神经网络、BP神经网络(Back Propagation,反向传播)或者SVM(Support Vector Machine,支持向量机)算法等分类算法得到意图预测模型。以下以卷积神经网络为例，采集大量的测试用户的样本数据，按照步骤101至103提取特征，然后为提取的特征贴标签，例如，先从全部用户中选择一部分用户作为测试用户，对这些用户的用户指令进行记录，并生成第一特征向量；并对接收该用户指令时的全景数据进行采集，根据全景数据生成第二特征向量。然后记录电子设备对用户指令的响应情况，以及用户基于该响应情况执行的操作，在一可选的实施方式中，可以根据电子设备对用户指令的响应情况，以及用户基于该响应情况执行的操作自动生成意图标签数据，或者，在另外一些实施方式中，可以采用人工贴标签的方式为用户特征矩阵添加标签。In the embodiment of the present application, the intent prediction model is a classification model that characterizes the relationship between the user feature matrix and the intent label. For example, the intent prediction model can be obtained by training a classification algorithm such as a convolutional neural network, a BP (Back Propagation) neural network, or an SVM (Support Vector Machine) algorithm. Taking a convolutional neural network as an example, a large amount of sample data of test users is collected, features are extracted according to steps 101 to 103, and the extracted features are then labeled. For example, a subset of all users is first selected as test users, the user instructions of these users are recorded, and first feature vectors are generated; the panoramic data at the time each user instruction is received is collected, and second feature vectors are generated from the panoramic data. Then the responses of the electronic device to the user instructions, as well as the operations performed by the users based on those responses, are recorded. In an optional embodiment, the intent label data can be generated automatically from the responses of the electronic device to the user instructions and the operations performed by the users based on those responses; alternatively, in other embodiments, the user feature matrices can be labeled manually.

将上述具有意图标签的用户特征矩阵输入至卷积神经网络进行训练,其中,卷积神经网络的结构以及超参数可以由用户根据需要预先设置,经过训练,确定网络的权重参数,生成意图预测模型。Input the above-mentioned user feature matrix with intent labels into the convolutional neural network for training, wherein the structure and hyperparameters of the convolutional neural network can be preset by the user as needed, and after training, the weight parameters of the network are determined to generate an intent prediction model .
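To make the training step concrete, the following minimal sketch fits a toy intent classifier on labeled, flattened user feature matrices. It is an illustration only: the patent trains a convolutional neural network, BP network or SVM, whereas a nearest-centroid classifier stands in here so the example is self-contained, and all sample values and label names are made up.

```python
import math

# Hypothetical training samples: each is a flattened user feature matrix
# paired with an intent label. A nearest-centroid classifier stands in for
# the trained CNN / BP network / SVM to keep the sketch dependency-free.
X_TRAIN = [
    ([0.9, 0.1, 0.8, 0.2], "order_food"),
    ([0.8, 0.2, 0.9, 0.1], "order_food"),
    ([0.1, 0.9, 0.2, 0.8], "play_music"),
    ([0.2, 0.8, 0.1, 0.9], "play_music"),
]

def train_intent_model(samples):
    """'Training' here just computes one centroid per intent label."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }

def predict_intent(model, feature_matrix):
    """Return the intent label whose centroid is closest to the input."""
    return min(model, key=lambda label: math.dist(model[label], feature_matrix))

model = train_intent_model(X_TRAIN)
prediction = predict_intent(model, [0.85, 0.15, 0.80, 0.20])
```

A real implementation would replace the centroid lookup with a forward pass through the trained network, but the interface (feature matrix in, intent label out) is the same.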

In addition, it should be noted that, in order to fully characterize the user's intent, the intent label corresponding to a user feature matrix is an intent label set containing multiple labels that represent information such as the user instruction, location, time, environment, state and habits. Together, these labels depict the user's current panoramic portrait.

By inputting the user feature matrix obtained by the feature fusion module 403 into the intent prediction model, an intent label matching the user instruction is generated.

The instruction execution module 405 is configured to execute the user instruction according to the intent label.

The intent label acquired by the intent prediction module 404 is a label set composed of multiple labels that together form the panoramic portrait corresponding to the user instruction. When responding to the user instruction and launching the corresponding target application, the instruction execution module 405 pushes this label set to the target application. In this way, the electronic device does not merely perform a simple operation such as starting the software when responding to the user instruction; after the target application is opened, the application can further perform more specific operations according to the received label set.

For example, the instruction execution module 405 is further configured to: determine the target application corresponding to the user instruction according to the intent label; start the target application; and send the intent label to the target application, where the target application performs a corresponding operation based on the intent label.

For example, the user triggers the voice instruction "Xiaoou Xiaoou, order a meal for me". After the system's speech recognition module detects the instruction, it obtains additional data reflecting the user's intent through panoramic perception, such as terminal state data, user state data and sensor state data. This data supplements the contextual information of the instruction and yields more comprehensive intent information, for example: the current time, current location, current scene, the user's eating habits, and the user's hunger level. When the electronic device opens a third-party meal-ordering application in response to the instruction, it pushes this information to the application, which can then recommend a specific ordering venue and specific menu items to the user based on it.
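The meal-ordering example can be sketched as follows. All field names, label values and the recommendation rule are hypothetical; they only illustrate how a label set enriched from panoramic data could steer a third-party application.

```python
# Hypothetical sketch: the recognized command is enriched with panorama-derived
# labels, and the resulting label set is handed to the target application.
def build_label_set(command, panorama):
    labels = {"command": command}
    labels.update(panorama)          # time, place, scene, habits, ...
    return labels

def third_party_order_app(label_set):
    """Stand-in for a third-party app that refines its behavior with labels."""
    if label_set.get("scene") == "office" and label_set.get("hunger") == "high":
        return "recommend: fast delivery near " + label_set["place"]
    return "recommend: default menu"

panorama = {
    "time": "12:05",
    "place": "CBD office tower",
    "scene": "office",
    "habit": "prefers Sichuan food",
    "hunger": "high",
}
label_set = build_label_set("order a meal", panorama)
recommendation = third_party_order_app(label_set)
```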

As can be seen from the above, in the instruction execution apparatus proposed in the embodiments of the present application, when a user instruction is received, the first feature extraction module 401 generates a first feature vector according to the user instruction; the second feature extraction module 402 acquires the current panoramic data and generates a second feature vector from it; the feature fusion module 403 fuses the first feature vector and the second feature vector to generate a user feature matrix; the intent prediction module 404 then uses the user feature matrix as input to the pre-trained intent prediction model to acquire an intent label matching the user instruction; and the instruction execution module 405 executes the user instruction according to that label. By collecting panoramic data, this solution supplements the contextual and situational information of the user instruction, so that the user's implicit intent can be understood more accurately and the user instruction can be executed better.

The embodiments of the present application also provide an electronic device, which may be a smartphone, a tablet computer, or a similar device. As shown in FIG. 7, FIG. 7 is a schematic diagram of a first structure of an electronic device provided by an embodiment of the present application. The electronic device 300 includes a processor 301 and a memory 302, and the processor 301 is electrically connected to the memory 302.

The processor 301 is the control center of the electronic device 300. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or invoking the computer programs stored in the memory 302 and invoking the data stored in the memory 302, thereby monitoring the electronic device as a whole.

In this embodiment, the processor 301 in the electronic device 300 loads the instructions corresponding to the processes of one or more computer programs into the memory 302 according to the following steps, and runs the computer programs stored in the memory 302, thereby implementing various functions:

when a user instruction is received, generating a first feature vector according to the user instruction;

acquiring current panoramic data, and generating a second feature vector according to the panoramic data;

performing fusion processing on the first feature vector and the second feature vector to generate a user feature matrix;

acquiring an intent label matching the user instruction according to the user feature matrix and a pre-trained intent prediction model;

executing the user instruction according to the intent label.
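The five steps above can be sketched end to end. Every component below is a toy stand-in (hash-based text features, a fixed label set in place of the trained model); only the data flow mirrors the described pipeline.

```python
# A minimal end-to-end sketch of the five processor steps; real versions of
# each stage would be trained neural models rather than these stand-ins.
def first_feature_vector(instruction):
    # Step 1: hash-based toy embedding of the instruction text.
    return [(hash(w) % 100) / 100 for w in instruction.split()[:3]]

def second_feature_vector(panorama):
    # Step 2: toy numeric encoding of panoramic data fields.
    return [len(str(v)) / 10 for v in panorama.values()]

def fuse(v1, v2):
    # Step 3: "matrix superposition": stack the two vectors row-wise,
    # zero-padding the shorter one to equal width.
    width = max(len(v1), len(v2))
    pad = lambda v: list(v) + [0.0] * (width - len(v))
    return [pad(v1), pad(v2)]

def predict_label(matrix):
    # Step 4: stand-in for the trained intent prediction model.
    return {"intent": "order_food", "scene": "office"}

def execute(instruction, panorama):
    # Step 5: run the pipeline and execute according to the label set.
    matrix = fuse(first_feature_vector(instruction),
                  second_feature_vector(panorama))
    labels = predict_label(matrix)
    return f"launch app for {labels['intent']} with labels {sorted(labels)}"

result = execute("order me a meal", {"time": "12:05", "place": "office"})
```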

In some embodiments, the user instruction is a voice instruction, and when generating the first feature vector according to the received user instruction, the processor 301 performs the following steps:

when a voice instruction is received, acquiring the voice data collected by a voice component;

generating a semantic feature vector of the voice data according to a pre-trained self-encoding recurrent neural network, and using the semantic feature vector as the first feature vector.
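As an illustration of this step, the sketch below folds a token sequence into a fixed-size vector with a fixed-weight recurrent update. It is not the patent's self-encoding recurrent network: the vocabulary, embedding and weights are arbitrary assumptions, and a real model would be trained.

```python
import math

# Toy stand-in for a recurrent encoder: deterministic embeddings and a
# fixed-weight tanh update fold a token sequence into one hidden state.
VOCAB = {"order": 0, "me": 1, "a": 2, "meal": 3}
DIM = 3

def embed(token):
    idx = VOCAB.get(token, len(VOCAB))
    return [math.sin(idx + d) for d in range(DIM)]   # deterministic toy embedding

def encode(tokens):
    h = [0.0] * DIM
    for tok in tokens:
        x = embed(tok)
        h = [math.tanh(0.5 * h[d] + x[d]) for d in range(DIM)]  # recurrent update
    return h  # final hidden state serves as the semantic feature vector

first_feature_vector = encode("order me a meal".split())
```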

In some embodiments, the user instruction is a voice instruction, and when generating the first feature vector according to the received user instruction, the processor 301 performs the following steps:

when a voice instruction is received, acquiring the voice data collected by a voice component;

converting the voice data into a spectrogram according to an audio feature extraction algorithm;

generating a semantic feature vector of the voice data according to a pre-trained self-encoding convolutional neural network and the spectrogram, and using the semantic feature vector as the first feature vector.
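The voice-data-to-spectrogram step can be sketched with a plain framed DFT, as below. A production pipeline would use a windowed STFT (for example via an audio library), and the self-encoding convolutional network that consumes the spectrogram is not shown; the frame length and hop size here are arbitrary.

```python
import cmath, math

# Minimal sketch of "voice data -> spectrogram": frame the signal and take
# per-frame DFT magnitudes over the non-negative frequency bins.
def spectrogram(signal, frame_len=8, hop=4):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spec = []
    for frame in frames:
        mags = []
        for k in range(frame_len // 2 + 1):      # non-negative frequencies
            z = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame))
            mags.append(abs(z))
        spec.append(mags)
    return spec  # time x frequency magnitude matrix

# A pure tone at 1/4 of the sampling rate should peak in bin k = 2 (of 8).
tone = [math.cos(2 * math.pi * 0.25 * n) for n in range(32)]
spec = spectrogram(tone)
```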

In some embodiments, the user instruction is a touch instruction, and when generating the first feature vector according to the received user instruction, the processor 301 performs the following steps:

when a touch instruction is received, acquiring the touch data collected by a touch sensor;

converting the touch data into the first feature vector.
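One possible conversion of raw touch data into a fixed-length first feature vector is sketched below: gesture duration, path length, mean pressure, and the start and end coordinates. The patent does not fix a particular encoding; this is an illustrative assumption.

```python
import math

# Hypothetical encoding of a touch gesture into a fixed-length vector.
def touch_to_vector(events):
    """events: list of (t, x, y, pressure) tuples from the touch sensor."""
    t0, x0, y0, _ = events[0]
    t1, x1, y1, _ = events[-1]
    path = sum(math.hypot(b[1] - a[1], b[2] - a[2])
               for a, b in zip(events, events[1:]))          # total path length
    mean_p = sum(e[3] for e in events) / len(events)         # mean pressure
    return [t1 - t0, path, mean_p, x0, y0, x1, y1]

swipe = [(0.00, 10, 10, 0.5),
         (0.05, 40, 10, 0.6),
         (0.10, 70, 10, 0.7)]
vec = touch_to_vector(swipe)
```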

In some embodiments, the panoramic data includes terminal state data, user state data and sensor state data, and when acquiring the current panoramic data and generating the second feature vector according to the panoramic data, the processor 301 performs the following steps:

acquiring the current terminal state data, user state data and sensor state data;

generating a terminal state feature according to the terminal state data, generating a user state feature according to the user state data, and generating a terminal context feature according to the sensor state data;

fusing the terminal state feature, the user state feature and the terminal context feature to generate the second feature vector.
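A minimal sketch of this fusion step follows: each of the three data sources is encoded into a small numeric feature, and the three features are concatenated into the second feature vector. The fields, vocabularies and normalization constants are all illustrative assumptions.

```python
# Hypothetical encodings of the three panoramic data sources.
def terminal_state_feature(state):
    return [state["battery"] / 100, 1.0 if state["screen_on"] else 0.0]

def user_state_feature(state):
    emotions = ["neutral", "happy", "sad"]          # toy one-hot vocabulary
    return [1.0 if state["emotion"] == e else 0.0 for e in emotions]

def terminal_context_feature(sensors):
    return [sensors["lux"] / 1000, sensors["accel_mag"] / 20]

def second_feature_vector(terminal, user, sensors):
    # Fusion here is plain concatenation of the three feature groups.
    return (terminal_state_feature(terminal)
            + user_state_feature(user)
            + terminal_context_feature(sensors))

vec = second_feature_vector(
    {"battery": 80, "screen_on": True},
    {"emotion": "happy"},
    {"lux": 500, "accel_mag": 9.8},
)
```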

In some embodiments, when acquiring the current user state data, the processor 301 performs the following steps:

invoking a camera component to capture an image of the user's face;

recognizing the user's face image according to a preset convolutional neural network model to generate a user emotion label;

acquiring user information, and using the user emotion label and the user information as the user state data.

In some embodiments, when performing fusion processing on the first feature vector and the second feature vector to generate the user feature matrix, the processor 301 performs the following step:

performing matrix superposition processing on the first feature vector and the second feature vector to generate the user feature matrix.

In some embodiments, when executing the user instruction according to the intent label, the processor 301 performs the following steps:

determining the target application corresponding to the user instruction according to the intent label;

starting the target application and sending the intent label to the target application, where the target application performs a corresponding operation based on the intent label.
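This dispatch step can be sketched as follows. The application registry, package names and the shape of the label set are hypothetical; the point is that the intent label selects the target application and the full label set travels with the launch.

```python
# Hypothetical registry mapping intent labels to application packages.
APP_REGISTRY = {
    "order_food": "com.example.food",
    "play_music": "com.example.music",
}

def launch(package, label_set):
    # Stand-in for an OS-level application launch carrying the label set.
    return {"launched": package, "labels": label_set}

def execute_instruction(label_set):
    intent = label_set["intent"]
    package = APP_REGISTRY.get(intent)
    if package is None:
        return {"launched": None, "labels": label_set}
    return launch(package, label_set)

result = execute_instruction({"intent": "order_food", "place": "office"})
```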

The memory 302 can be used to store computer programs and data. The computer programs stored in the memory 302 contain instructions executable by the processor. A computer program can be composed of various functional modules. The processor 301 executes various functional applications and data processing by invoking the computer programs stored in the memory 302.

In some embodiments, as shown in FIG. 8, FIG. 8 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the present application. The electronic device 300 further includes: a radio frequency circuit 303, a display screen 304, a control circuit 305, an input unit 306, an audio circuit 307, a sensor 308 and a power supply 309. The processor 301 is electrically connected to the radio frequency circuit 303, the display screen 304, the control circuit 305, the input unit 306, the audio circuit 307, the sensor 308 and the power supply 309, respectively.

The radio frequency circuit 303 is used to send and receive radio frequency signals so as to communicate with network devices or other electronic devices through wireless communication.

The display screen 304 can be used to display information entered by or provided to the user, as well as the various graphical user interfaces of the electronic device, which may be composed of images, text, icons, video, and any combination thereof.

The control circuit 305 is electrically connected to the display screen 304 and is used to control the display screen 304 to display information.

The input unit 306 can be used to receive input numbers, character information or user characteristic information (for example, fingerprints), and to generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control. The input unit 306 may include a fingerprint recognition module.

The audio circuit 307 can provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 307 includes a microphone, which is electrically connected to the processor 301 and is used to receive voice information input by the user.

The sensor 308 is used to collect information about the external environment. The sensor 308 may include one or more of an ambient brightness sensor, an acceleration sensor, a gyroscope, and the like.

The power supply 309 is used to supply power to the various components of the electronic device 300. In some embodiments, the power supply 309 may be logically connected to the processor 301 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.

Although not shown in FIG. 8, the electronic device 300 may further include a camera, a Bluetooth module, and the like, which will not be described in detail here.

As can be seen from the above, the embodiments of the present application provide an electronic device that can receive user instructions, such as voice instructions, touch instructions, or grip instructions. When a user instruction is received, a first feature vector is generated according to the instruction; the current panoramic data of the electronic device, such as user information, sensor state data, and usage information of the applications on the device, is then collected, and a second feature vector is generated from the collected panoramic data. The first feature vector and the second feature vector are fused to generate a user feature matrix; according to the user feature matrix and a pre-trained intent prediction model, an intent label corresponding to the user feature matrix is generated; and the user instruction is executed according to that label. In this way, based on the panoramic data collected when the instruction is received, the user instruction is supplemented with complete contextual and situational information, so that the electronic device can more accurately understand the user's implicit intent and thereby better execute the user instruction.

An embodiment of the present application further provides a storage medium in which a computer program is stored. When the computer program runs on a computer, the computer executes the instruction execution method described in any of the foregoing embodiments.

It should be noted that those of ordinary skill in the art will understand that all or part of the steps in the various methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium. The storage medium may include, but is not limited to: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

The instruction execution method, apparatus, storage medium and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help in understanding the method of the present application and its core ideas. Meanwhile, those skilled in the art may, based on the ideas of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (12)

1. An instruction execution method, comprising:
when a user instruction is received, generating a first feature vector according to the user instruction;
acquiring current panoramic data and generating a second feature vector according to the panoramic data;
fusing the first feature vector and the second feature vector to generate a user feature matrix;
acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model;
executing the user instruction according to the intention label.
2. The instruction execution method of claim 1, wherein the user instruction is a voice instruction, and the step of generating a first feature vector according to the user instruction when a user instruction is received comprises:
when a voice instruction is received, acquiring voice data collected by a voice component;
generating a semantic feature vector of the voice data according to a pre-trained self-coding recurrent neural network, and taking the semantic feature vector as the first feature vector.
3. The instruction execution method of claim 1, wherein the user instruction is a voice instruction, and the step of generating a first feature vector according to the user instruction when a user instruction is received comprises:
when a voice instruction is received, acquiring voice data collected by a voice component;
converting the voice data into a spectrogram according to an audio feature extraction algorithm;
and generating a semantic feature vector of the voice data according to a pre-trained self-coding convolutional neural network and the spectrogram, and taking the semantic feature vector as the first feature vector.
4. The instruction execution method of claim 1, wherein the user instruction is a touch instruction, and the step of generating a first feature vector according to the user instruction when a user instruction is received comprises:
when a touch instruction is received, acquiring touch data acquired by a touch sensor;
converting the touch data into the first feature vector.
5. The instruction execution method of any one of claims 1 to 4, wherein the panoramic data comprises terminal state data, user state data and sensor state data, and the step of acquiring current panoramic data and generating a second feature vector according to the panoramic data comprises:
acquiring current terminal state data, user state data and sensor state data;
generating a terminal state feature according to the terminal state data, generating a user state feature according to the user state data, and generating a terminal scene feature according to the sensor state data;
and fusing the terminal state feature, the user state feature and the terminal scene feature to generate the second feature vector.
6. The instruction execution method of claim 5, wherein the step of acquiring current user state data comprises:
calling a camera assembly to capture a face image of a user;
identifying the user face image according to a preset convolutional neural network model to generate a user emotion label;
and acquiring user information, and taking the user emotion label and the user information as the user state data.
7. The instruction execution method of any one of claims 1 to 4, wherein the step of performing fusion processing on the first feature vector and the second feature vector to generate a user feature matrix comprises:
and performing matrix superposition processing on the first feature vector and the second feature vector to generate the user feature matrix.
8. The instruction execution method of any one of claims 1 to 4, wherein the step of executing the user instruction according to the intention label comprises:
determining a target application corresponding to the user instruction according to the intention label;
and starting the target application and sending the intention label to the target application, wherein the target application executes corresponding operation based on the intention label.
9. The instruction execution method of any one of claims 1 to 4, wherein the step of acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model comprises:
acquiring the intention label matched with the user instruction according to the user feature matrix and an intention prediction model obtained based on neural network training.
10. An instruction execution apparatus, comprising:
the first feature extraction module is used for generating a first feature vector according to a user instruction when the user instruction is received;
the second feature extraction module is used for acquiring current panoramic data and generating a second feature vector according to the panoramic data;
the feature fusion module is used for carrying out fusion processing on the first feature vector and the second feature vector to generate a user feature matrix;
the intention prediction module is used for acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model;
and the instruction execution module is used for executing the user instruction according to the intention label.
11. A storage medium having stored thereon a computer program, characterized in that, when the computer program is run on a computer, it causes the computer to execute the instruction execution method according to any one of claims 1 to 9.
12. An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the processor is configured to perform the instruction execution method of any of claims 1 to 9 by invoking the computer program.
CN201910282156.6A 2019-04-09 2019-04-09 Instruction execution method, device, storage medium and electronic device Active CN111796926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282156.6A CN111796926B (en) 2019-04-09 2019-04-09 Instruction execution method, device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910282156.6A CN111796926B (en) 2019-04-09 2019-04-09 Instruction execution method, device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN111796926A true CN111796926A (en) 2020-10-20
CN111796926B CN111796926B (en) 2025-01-21

Family

ID=72805726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282156.6A Active CN111796926B (en) 2019-04-09 2019-04-09 Instruction execution method, device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN111796926B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298321A (en) * 2021-06-22 2021-08-24 深圳市查策网络信息技术有限公司 User intention prediction method based on multi-data fusion
CN114153956A (en) * 2021-11-22 2022-03-08 深圳市北科瑞声科技股份有限公司 Multi-intention recognition method, device, equipment and medium
CN114301723A (en) * 2021-12-21 2022-04-08 珠海格力电器股份有限公司 A smart home control system and method
CN114691752A (en) * 2022-03-10 2022-07-01 青岛海尔科技有限公司 Usage intention prediction method and apparatus, storage medium, and electronic apparatus
CN115424617A (en) * 2022-07-05 2022-12-02 浙江猫精人工智能科技有限公司 Model training, dialogue recognition and voice interaction method, device and storage medium
WO2023103699A1 (en) * 2021-12-10 2023-06-15 杭州逗酷软件科技有限公司 Interaction method and apparatus, and electronic device and storage medium
CN118626152A (en) * 2024-08-14 2024-09-10 北京开源芯片研究院 A method, device, electronic device and storage medium for generating instruction stream

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105070288A (en) * 2015-07-02 2015-11-18 百度在线网络技术(北京)有限公司 Vehicle-mounted voice instruction recognition method and device
CN105117429A (en) * 2015-08-05 2015-12-02 广东工业大学 Scenario image annotation method based on active learning and multi-label multi-instance learning
CN107194412A (en) * 2017-04-20 2017-09-22 百度在线网络技术(北京)有限公司 A kind of method of processing data, device, equipment and computer-readable storage medium
WO2018219198A1 (en) * 2017-06-02 2018-12-06 腾讯科技(深圳)有限公司 Man-machine interaction method and apparatus, and man-machine interaction terminal
CN109256122A (en) * 2018-09-05 2019-01-22 深圳追科技有限公司 machine learning method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG, Jiawei: "Research on User Intent Classification Methods in Human-Machine Dialogue Systems", China Master's Theses Full-text Database (Information Science and Technology), 15 January 2019 (2019-01-15), pages 1 - 71 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298321A (en) * 2021-06-22 2021-08-24 深圳市查策网络信息技术有限公司 User intention prediction method based on multi-data fusion
CN113298321B (en) * 2021-06-22 2022-03-11 深圳市查策网络信息技术有限公司 User intention prediction method based on multi-data fusion
CN114153956A (en) * 2021-11-22 2022-03-08 深圳市北科瑞声科技股份有限公司 Multi-intention recognition method, device, equipment and medium
CN114153956B (en) * 2021-11-22 2025-04-29 深圳市北科瑞声科技股份有限公司 Multi-intention recognition method, device, equipment and medium
WO2023103699A1 (en) * 2021-12-10 2023-06-15 杭州逗酷软件科技有限公司 Interaction method and apparatus, and electronic device and storage medium
CN114301723A (en) * 2021-12-21 2022-04-08 珠海格力电器股份有限公司 A smart home control system and method
CN114691752A (en) * 2022-03-10 2022-07-01 青岛海尔科技有限公司 Usage intention prediction method and apparatus, storage medium, and electronic apparatus
CN115424617A (en) * 2022-07-05 2022-12-02 浙江猫精人工智能科技有限公司 Model training, dialogue recognition and voice interaction method, device and storage medium
CN118626152A (en) * 2024-08-14 2024-09-10 北京开源芯片研究院 A method, device, electronic device and storage medium for generating instruction stream

Also Published As

Publication number Publication date
CN111796926B (en) 2025-01-21

Similar Documents

Publication Publication Date Title
CN111796926B (en) Instruction execution method, device, storage medium and electronic device
CN109427333B (en) Method for activating voice recognition service and electronic device for implementing the method
CN108874967B (en) Method and device for determining dialog state, dialog system, terminal, storage medium
CN109558512A (en) A kind of personalized recommendation method based on audio, device and mobile terminal
CN111737573A (en) Resource recommendation method, device, equipment and storage medium
CN113515942A (en) Text processing method, device, computer equipment and storage medium
CN111796925A (en) Screening method, device, storage medium and electronic device for algorithm model
CN111341307A (en) Voice recognition method and device, electronic equipment and storage medium
CN113220590A (en) Automatic testing method, device, equipment and medium for voice interaction application
CN111738100A (en) Mouth-based speech recognition method and terminal device
CN111798019B (en) Intention prediction method, intention prediction device, storage medium and electronic equipment
CN111800445A (en) Message push method, device, storage medium and electronic device
CN111814475A (en) User portrait construction method, device, storage medium and electronic device
CN114996515A (en) Training method of video feature extraction model, text generation method and device
CN111816211B (en) Emotion recognition method and device, storage medium and electronic equipment
CN111797849B (en) User activity identification method, device, storage medium and electronic device
CN111798367B (en) Image processing method, device, storage medium and electronic device
CN108133708A (en) A kind of control method of voice assistant, device and mobile terminal
WO2020207297A1 (en) Information processing method, storage medium, and electronic device
US11830501B2 (en) Electronic device and operation method for performing speech recognition
CN113823266B (en) Keyword detection method, device, equipment and storage medium
CN111797660B (en) Image labeling method, device, storage medium and electronic device
CN111797873A (en) Scene recognition method, device, storage medium and electronic device
CN115579012A (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
CN111797655A (en) User activity identification method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant