
CN111816211B - Emotion recognition method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN111816211B
CN111816211B (application CN201910282465.3A)
Authority
CN
China
Prior art keywords
emotion
user
emotion recognition
candidate
text content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910282465.3A
Other languages
Chinese (zh)
Other versions
CN111816211A (en)
Inventor
陈仲铭
何明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910282465.3A
Publication of CN111816211A
Application granted granted Critical
Publication of CN111816211B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present application discloses an emotion recognition method and device, a storage medium, and an electronic device. The electronic device obtains text content input by a user and performs emotion recognition on it with a pre-trained first emotion recognition model to obtain the user's first candidate emotion; it then obtains the voice content captured while the user was entering the text and performs emotion recognition on it with a pre-trained second emotion recognition model to obtain the user's second candidate emotion; finally, it determines the user's target emotion from the first and second candidate emotions. By recognizing the user's emotion from different information sources and combining the recognition results obtained from those sources, the embodiment achieves accurate recognition of the user's emotion.

Description

Emotion recognition method and device, storage medium, and electronic equipment

Technical Field

The present application relates to the technical field of data processing, and in particular to an emotion recognition method and device, a storage medium, and an electronic device.

Background

Human beings are strongly emotional, experiencing joy, anger, worry, contemplation, sadness, fear, surprise, and other emotions. An electronic device can provide intelligent services by recognizing the user's emotion, for example by telling the user a joke when it detects that the user is unhappy. In the related art, however, a user's emotion is usually identified with a sentiment-dictionary method: emotion words are detected in the text the user inputs and then matched against a sentiment dictionary to obtain the corresponding emotion. This approach is often inaccurate.

Summary of the Invention

Embodiments of the present application provide an emotion recognition method and device, a storage medium, and an electronic device capable of accurately recognizing a user's emotion.

In a first aspect, an embodiment of the present application provides an emotion recognition method applied to an electronic device. The method includes:

obtaining text content input by a user, and performing emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain a first candidate emotion of the user;

obtaining voice content captured while the user inputs the text content, and performing emotion recognition according to the voice content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;

determining a target emotion of the user according to the first candidate emotion and the second candidate emotion.
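The three claimed steps can be sketched as a minimal pipeline. Note that both "models" below are hypothetical keyword and pitch heuristics standing in for the patent's pre-trained emotion recognition models, and the final combination rule is likewise an invented placeholder (a later embodiment uses a Bayesian classifier):

```python
# Minimal sketch of the three claimed steps. Both "models" are hypothetical
# stand-ins for the patent's pre-trained emotion recognition models.

def first_emotion_model(text: str) -> str:
    """Stand-in for the pre-trained text emotion recognition model."""
    sad_words = {"sad", "unhappy", "tired"}
    return "sad" if any(w in sad_words for w in text.lower().split()) else "happy"

def second_emotion_model(voice_features: dict) -> str:
    """Stand-in for the pre-trained voice emotion recognition model."""
    return "sad" if voice_features.get("pitch_hz", 200.0) < 150.0 else "happy"

def recognize_target_emotion(text: str, voice_features: dict) -> str:
    first_candidate = first_emotion_model(text)              # text source
    second_candidate = second_emotion_model(voice_features)  # voice source
    # Combination step: a trivial placeholder rule here; one embodiment
    # feeds the two candidates to a trained Bayesian classifier instead.
    if first_candidate == second_candidate:
        return first_candidate
    return second_candidate

print(recognize_target_emotion("i am so unhappy today", {"pitch_hz": 120.0}))  # sad
```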

In a second aspect, an embodiment of the present application provides an emotion recognition device applied to an electronic device. The device includes:

a first emotion recognition module, configured to obtain text content input by a user and perform emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain a first candidate emotion of the user;

a second emotion recognition module, configured to obtain voice content captured while the user inputs the text content and perform emotion recognition according to the voice content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;

a target emotion recognition module, configured to determine a target emotion of the user according to the first candidate emotion and the second candidate emotion.

In a third aspect, an embodiment of the present application provides a storage medium storing a computer program which, when run on a computer, causes the computer to perform the steps of the emotion recognition method provided in the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides an electronic device including a processor and a memory storing a computer program, the processor being configured to perform, by invoking the computer program, the steps of the emotion recognition method provided in the embodiments of the present application.

In the embodiments of the present application, the electronic device obtains text content input by the user and performs emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain the user's first candidate emotion; it then obtains the voice content captured while the user was entering the text and performs emotion recognition according to the voice content and a pre-trained second emotion recognition model to obtain the user's second candidate emotion; finally, it determines the user's target emotion according to the first and second candidate emotions. By recognizing the user's emotion from different information sources and combining the recognition results obtained from those sources, the embodiments achieve accurate recognition of the user's emotion.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; those skilled in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a panoramic perception architecture provided by an embodiment of the present application.

FIG. 2 is a schematic flowchart of an emotion recognition method provided by an embodiment of the present application.

FIG. 3 is another schematic flowchart of the emotion recognition method provided by an embodiment of the present application.

FIG. 4 is a schematic diagram of an application scenario of the emotion recognition method provided by an embodiment of the present application.

FIG. 5 is a schematic structural diagram of an emotion recognition device provided by an embodiment of the present application.

FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

FIG. 7 is another schematic structural diagram of the electronic device provided by an embodiment of the present application.

Detailed Description

Referring to the drawings, in which identical reference numerals denote identical components, the principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be construed as limiting other specific embodiments not detailed here.

With the miniaturization and growing intelligence of sensors, electronic devices such as mobile phones and tablet computers integrate more and more of them, for example light sensors, distance sensors, position sensors, acceleration sensors, and gravity sensors. Through these sensors, an electronic device can collect more data at lower power consumption. During operation it also collects data related to its own state and to the user's state. Broadly speaking, an electronic device can obtain data about the external environment (such as temperature, light, location, sound, and weather), about the user's state (such as posture, speed, phone usage habits, and basic personal information), and about the device's own state (such as power consumption, resource usage, and network conditions).

In the embodiments of the present application, a panoramic perception architecture is proposed so that the data obtained by the electronic device can be processed to provide intelligent services to the user. Referring to FIG. 1, which is a schematic structural diagram of the panoramic perception architecture provided by an embodiment of the present application and applied to an electronic device, the architecture comprises, from bottom to top, an information perception layer, a data processing layer, a feature extraction layer, a scenario modeling layer, and an intelligent service layer.

As the bottom layer of the panoramic perception architecture, the information perception layer obtains raw data, both dynamic and static, that can describe the user's various scenarios. It consists of multiple sensors for data collection, including but not limited to: a distance sensor for detecting the distance between the electronic device and external objects, a magnetic field sensor for detecting magnetic field information of the device's environment, a light sensor for detecting light information of the device's environment, an acceleration sensor for detecting the device's acceleration data, a fingerprint sensor for collecting the user's fingerprint information, a Hall sensor for sensing magnetic field information, a position sensor for detecting the device's current geographic location, a gyroscope for detecting the device's angular velocity in various directions, an inertial sensor for detecting the device's motion data, a posture sensor for sensing the device's attitude information, a barometer for detecting the air pressure of the device's environment, and a heart rate sensor for detecting the user's heart rate information.

As the second-lowest layer of the panoramic perception architecture, the data processing layer processes the raw data obtained by the information perception layer to eliminate noise, inconsistency, and similar problems in it. The data processing layer can perform data cleaning, data integration, data transformation, data reduction, and other processing on that data.

As the middle layer of the panoramic perception architecture, the feature extraction layer performs feature extraction on the data processed by the data processing layer to extract the features the data contains. The feature extraction layer can extract features, or process the extracted features, using filter, wrapper, and ensemble methods, among others.

The filter method filters the extracted features to delete redundant feature data. The wrapper method is used to screen the extracted features. The ensemble method combines multiple feature extraction methods to construct a more efficient and more accurate method for extracting features.
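As one concrete illustration of the filter method described above, the sketch below drops near-constant (low-variance) feature columns, a common way to remove redundant features; the variance threshold is an arbitrary value chosen for this example:

```python
# Filter-method sketch: drop near-constant (low-variance) feature columns.
# The 0.01 threshold is an arbitrary choice for this example.

def variance(column):
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def filter_features(rows, threshold=0.01):
    """Keep only the feature columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    kept = [i for i, col in enumerate(columns) if variance(col) > threshold]
    return [[row[i] for i in kept] for row in rows], kept

rows = [[1.0, 0.5, 3.2],
        [1.0, 0.7, 2.9],
        [1.0, 0.4, 3.1]]
filtered, kept_columns = filter_features(rows)
print(kept_columns)  # column 0 is constant, so only columns 1 and 2 survive
```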

As the second-highest layer of the panoramic perception architecture, the scenario modeling layer builds models from the features extracted by the feature extraction layer; the resulting models can represent the state of the electronic device, the user, or the environment. For example, the scenario modeling layer can construct key-value models, pattern identification models, graph models, entity-relationship models, and object-oriented models from the extracted features.

As the top layer of the panoramic perception architecture, the intelligent service layer provides intelligent services based on the models constructed by the scenario modeling layer. For example, the intelligent service layer can provide basic application services for the user, perform system-level intelligent optimization for the electronic device, and provide personalized intelligent services for the user.

The panoramic perception architecture also includes an algorithm library, which includes but is not limited to the illustrated Markov algorithm, latent Dirichlet allocation, Bayesian classification, support vector machines, K-means clustering, K-nearest neighbors, conditional random fields, residual networks, long short-term memory networks, convolutional neural networks, and recurrent neural networks.

Based on the panoramic perception architecture provided by the embodiments of the present application, an embodiment of the present application provides an emotion recognition method. The method may be executed by the emotion recognition device provided in the embodiments of the present application, or by an electronic device integrating that device, where the device may be implemented in hardware or software. The electronic device may be a smartphone, tablet computer, palmtop computer, notebook computer, desktop computer, or any other device equipped with a processor and having processing capability.

Based on the method provided by the embodiments of the present application, panoramic data can be obtained at the information perception layer and provided to the data processing layer. The data processing layer filters out of the panoramic data the text content input by the user and the voice content captured during that input, and provides them to the feature extraction layer. The feature extraction layer extracts features from the text content and the voice content respectively, obtaining a feature vector for each, and provides them to the scenario modeling layer. The scenario modeling layer performs emotion recognition on the two feature vectors, obtaining a first candidate emotion corresponding to the text content and a second candidate emotion corresponding to the voice content, determines the user's target emotion from the two candidates, and provides it to the intelligent service layer. The intelligent service layer then performs corresponding operations according to the user's target emotion, for example sending the text content together with the target emotion to a corresponding target device, so that other users can see not only the text the user entered but also the emotion with which it was entered, which helps communication.

Referring to FIG. 2, which is a schematic flowchart of the emotion recognition method provided by an embodiment of the present application, the flow of the method may be as follows:

In 101, text content input by the user is obtained, and emotion recognition is performed according to the text content and a pre-trained first emotion recognition model to obtain the user's first candidate emotion.

In the embodiments of the present application, the electronic device monitors user input and, upon detecting that the user is entering text content, triggers emotion recognition for the user. The text content includes but is not limited to words, sentences, and articles. For example, while the user chats with other users through an instant messaging application installed on the electronic device, detecting the chat text the user enters triggers emotion recognition for the user.

When emotion recognition is triggered, the electronic device first obtains the text content input by the user in order to make a preliminary identification of the user's emotion from it. It should be noted that in the embodiments of the present application, a first emotion recognition model for recognizing the user's emotion from input text content is trained in advance. For example, an initial convolutional neural network model is established, text content samples are obtained and the emotion corresponding to each sample is labeled to obtain emotion labels, and the initial convolutional neural network is trained on the samples and labels; the trained network serves as the first emotion recognition model for recognizing emotion in the user's text content. The first emotion recognition model may be stored locally on the electronic device or on a remote server. After obtaining the text content input by the user, the electronic device obtains the first emotion recognition model either locally or from the remote server, uses it to perform emotion recognition on the text content, and records the recognized emotion as the user's first candidate emotion.
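The label-train-predict flow described above can be illustrated with a lightweight stand-in. The patent trains a convolutional neural network on labeled text samples; the sketch below substitutes a bag-of-words scorer with Laplace smoothing for the CNN, and the tiny labeled corpus is invented for the example:

```python
# Stand-in for the patent's CNN-based first emotion recognition model:
# train a smoothed bag-of-words scorer on (text sample, emotion label)
# pairs, then classify new text. The corpus below is invented.
from collections import Counter, defaultdict

def train_text_model(samples):
    """samples: list of (text, label). Returns per-label word counts."""
    counts = defaultdict(Counter)
    for text, label in samples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Score each label by Laplace-smoothed word frequencies; pick the best."""
    vocab = len(set(w for c in counts.values() for w in c))
    def score(label):
        total = sum(counts[label].values())
        return sum((counts[label][w] + 1) / (total + 1) for w in text.lower().split())
    return max(counts, key=score)

model = train_text_model([
    ("what a wonderful day", "happy"),
    ("i love this", "happy"),
    ("this is terrible", "sad"),
    ("i feel awful today", "sad"),
])
print(predict(model, "terrible awful day"))  # sad
```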

In 102, the voice content captured while the user inputs the text content is obtained, and emotion recognition is performed according to the voice content and a pre-trained second emotion recognition model to obtain the user's second candidate emotion.

It should be noted that in the embodiments of the present application, the electronic device performs emotion recognition not only from the text content the user inputs but also from the user's voice. Correspondingly, a second emotion recognition model for recognizing the user's emotion from voice content is also trained in advance. For example, an initial convolutional neural network model is established, voice content samples are obtained and the emotion corresponding to each sample is labeled to obtain emotion labels, and the initial convolutional neural network is trained in a supervised manner on the samples and labels; the trained network serves as the second emotion recognition model for recognizing emotion in the user's voice content. The second emotion recognition model may be stored locally on the electronic device or on a remote server. After obtaining the voice content captured during text input, the electronic device obtains the second emotion recognition model either locally or from the remote server, uses it to perform emotion recognition on that voice content, and records the recognized emotion as the user's second candidate emotion.

When the electronic device detects that the user is entering text content, it activates a built-in or external microphone to collect sound, thereby capturing the user's voice content during text entry. In this way, when performing emotion recognition from the user's voice, the electronic device can directly use the previously captured voice content from the text-input period.

In 103, the user's target emotion is determined according to the first candidate emotion and the second candidate emotion.

From the above description, a person of ordinary skill in the art will understand that the first and second candidate emotions are obtained from independent information sources. Therefore, to ensure accurate recognition of the user's emotion, the electronic device also performs a combined analysis of the first and second candidate emotions to finally determine the user's target emotion.

As can be seen from the above, in the embodiments of the present application the electronic device obtains the text content input by the user and performs emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain the user's first candidate emotion; it then obtains the voice content captured during text input and performs emotion recognition according to the voice content and a pre-trained second emotion recognition model to obtain the user's second candidate emotion; finally, it determines the user's target emotion according to the two candidate emotions. By recognizing the user's emotion from different information sources and combining the recognition results, the embodiments achieve accurate recognition of the user's emotion.

In an embodiment, "determining the user's target emotion according to the first candidate emotion and the second candidate emotion" includes:

inputting the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier for classification, and obtaining the user's target emotion output by the Bayesian classifier.

Since the first candidate emotion and the second candidate emotion are obtained from independent information sources, a Bayesian classifier is pre-trained in the embodiments of the present application to perform a secondary emotion classification on the candidate emotion from the text source and the candidate emotion from the sound source, so that the two candidate emotions can be combined to obtain the user's target emotion. For example, emotion samples from text sources and the corresponding emotion samples from sound sources can be collected and labeled with emotion tags, and the Bayesian classifier is then trained on the text-source emotion samples, the corresponding sound-source emotion samples, and the corresponding emotion labels.

The trained Bayesian classifier can be stored locally on the electronic device or on a remote server. When determining the user's target emotion according to the first candidate emotion and the second candidate emotion, the electronic device obtains the Bayesian classifier locally or from the remote server, inputs the previously obtained first candidate emotion (from the text source) and second candidate emotion (from the sound source) into the Bayesian classifier for classification, and takes the emotion output by the Bayesian classifier as the target emotion finally obtained by emotion recognition of the user.
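The secondary classification step can be sketched as a naive Bayes classifier over the pair (text-source candidate emotion, sound-source candidate emotion). The patent does not specify the classifier's features or training data, so the samples and emotion vocabulary below are invented for illustration; this is a minimal stdlib-only sketch, not the patented implementation:

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical): each sample pairs a text-derived candidate
# emotion with a sound-derived one, labeled with the true (target) emotion.
SAMPLES = [
    (("happy", "happy"), "happy"),
    (("happy", "neutral"), "happy"),
    (("sad", "sad"), "sad"),
    (("neutral", "sad"), "sad"),
    (("happy", "sad"), "sad"),
    (("neutral", "neutral"), "neutral"),
]
EMOTIONS = 3  # size of the emotion vocabulary, used for add-one smoothing

def train(samples):
    """Count label frequencies and per-feature value frequencies."""
    label_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
    return label_counts, feature_counts

def classify(features, label_counts, feature_counts):
    """Pick argmax over labels of log P(label) + sum_i log P(feature_i | label)."""
    total = sum(label_counts.values())
    def score(label):
        s = math.log(label_counts[label] / total)
        for i, value in enumerate(features):
            count = feature_counts[(i, label)][value]
            s += math.log((count + 1) / (label_counts[label] + EMOTIONS))
        return s
    return max(label_counts, key=score)

label_counts, feature_counts = train(SAMPLES)
target = classify(("happy", "happy"), label_counts, feature_counts)
```

When both sources agree, the agreed emotion dominates; when they disagree, the learned conditional probabilities arbitrate, which is the point of the secondary classification.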

In an embodiment, "performing emotion recognition according to the text content and the pre-trained first emotion recognition model to obtain the user's first candidate emotion" includes:

(1) performing feature extraction on the text content to obtain a corresponding feature vector;

(2) converting the feature vector into a corresponding feature tensor, inputting the feature tensor into the first emotion recognition model for emotion recognition, and obtaining the user's first candidate emotion output by the first emotion recognition model.

In the embodiments of the present application, when performing emotion recognition according to the text content and the pre-trained first emotion recognition model, the electronic device does not directly input the original text content into the first emotion recognition model for prediction. Instead, it processes the original text content and inputs features that represent the original text content into the first emotion recognition model for emotion recognition.

Specifically, the electronic device first performs feature extraction on the text content input by the user using a preset feature extraction technique, converting the text content into a corresponding vector, denoted as the feature vector. The electronic device then combines the feature vectors corresponding to the text content into a tensor, denoted as the feature tensor.

It should be noted that, like vectors and matrices, a tensor is a data structure; here a tensor refers to a data structure with three or more dimensions, where the number of dimensions is called the order of the tensor. A tensor can be regarded as the generalization of vectors and matrices to multi-dimensional space, with a vector viewed as a first-order tensor and a matrix as a second-order tensor.

Correspondingly, the first emotion recognition model is not trained on raw text content samples either. Instead, the feature tensors corresponding to the text content samples are obtained in the same way, and the model is trained on those feature tensors together with the labeled emotion tags. Thus, after the feature vector is converted into the corresponding feature tensor, the feature tensor can be input into the first emotion recognition model for emotion recognition, and the user's first candidate emotion output by the first emotion recognition model is obtained.

In an embodiment, "performing feature extraction on the text content to obtain a corresponding feature vector" includes:

extracting keywords included in the text content, and mapping the keywords to a vector space through a word embedding model to obtain the feature vector.

Those of ordinary skill in the art will understand that not all of the text content input by the user is meaningful, and performing feature extraction on the complete text content would reduce the overall efficiency of emotion recognition. Therefore, in the embodiments of the present application, when extracting features from the text content input by the user, the electronic device first applies a preset keyword extraction algorithm to extract keywords from the text content and uses the extracted keywords to represent the complete text content, reducing the amount of content that needs feature extraction and thereby improving the efficiency of emotion recognition. The embodiments of the present application do not limit which keyword extraction algorithm is used; those skilled in the art may select a suitable algorithm according to actual needs. For example, the electronic device may use the TF-IDF algorithm to extract keywords from the text content input by the user. Supposing the input text is the sentence "I miss you very much today", applying the TF-IDF algorithm yields the keywords "today" and "miss you".

After extracting keywords that represent the text content, the electronic device maps the extracted keywords to a vector space through a word embedding model to obtain the feature vector corresponding to the text content. The word embedding model includes, but is not limited to, the Word2vec model, the GloVe model, the FastText model, and the ELMo model.
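As a toy illustration of this mapping step, the sketch below looks extracted keywords up in a tiny hand-written embedding table standing in for a trained Word2vec/GloVe-style model. The 4-dimensional vectors and the vocabulary are invented for the example; real embedding models have vocabularies of millions of words and hundreds of dimensions:

```python
# Hypothetical embedding table; in practice this would be a trained
# Word2vec, GloVe, FastText, or ELMo model, not a hand-written dict.
EMBEDDINGS = {
    "today":    [0.1, 0.3, -0.2, 0.5],
    "miss you": [0.7, -0.1, 0.4, 0.2],
}

def keywords_to_vectors(keywords, embeddings, dim=4):
    """Map each extracted keyword to its embedding vector;
    out-of-vocabulary keywords map to the all-zero vector."""
    return [embeddings.get(k, [0.0] * dim) for k in keywords]

# Keywords extracted from "I miss you very much today" (per the example above)
vectors = keywords_to_vectors(["today", "miss you"], EMBEDDINGS)
```

The list of per-keyword vectors is what the next step stacks into the feature tensor.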

In an embodiment, before "inputting the feature tensor into the first emotion recognition model for emotion recognition", the method further includes:

zero-padding the feature tensor.

Those of ordinary skill in the art will understand that because the length of the text content the user inputs varies from one input to the next, the feature tensors obtained from different inputs differ in data volume, and the entries within a feature tensor may not be aligned. Therefore, in the embodiments of the present application, before inputting the feature tensor into the first emotion recognition model for emotion recognition, the electronic device zero-pads the feature tensor so that its entries are internally aligned and its data volume reaches a preset data volume.

Correspondingly, during training of the first emotion recognition model, the feature tensors corresponding to the text content samples are likewise zero-padded so that they are internally aligned and their data volume reaches the preset data volume.
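A minimal sketch of the padding step, assuming (as an illustration, since the patent does not fix the representation) that the feature tensor is a list of keyword-embedding rows that must be aligned to a fixed row length and row count:

```python
def zero_pad(feature_rows, target_len, target_rows):
    """Pad each row with zeros to target_len, then append all-zero rows
    until target_rows, so every input yields the same preset tensor size."""
    padded = [row + [0.0] * (target_len - len(row)) for row in feature_rows]
    while len(padded) < target_rows:
        padded.append([0.0] * target_len)
    return padded

# Two keywords produced rows of different lengths; pad to a 3x4 tensor.
tensor = zero_pad([[0.1, 0.3], [0.7]], target_len=4, target_rows=3)
```

After padding, every input text, short or long, produces a tensor of identical shape, which is what a fixed-input-size model requires.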

In an embodiment, "performing emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain the user's second candidate emotion" includes:

(1) dividing the sound content into a plurality of sub-sound contents;

(2) inputting the plurality of sub-sound contents respectively into the second emotion recognition model for emotion recognition to obtain a corresponding plurality of candidate emotions;

(3) determining the user's second candidate emotion according to the plurality of candidate emotions.

In the embodiments of the present application, when performing emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain the user's second candidate emotion, the electronic device first divides the sound content captured while the user was inputting the aforementioned text content into a plurality of sub-sound contents of equal length. When dividing the sound content, two adjacent sub-sound contents may or may not share an overlapping sound portion.

After dividing the complete sound content into the plurality of sub-sound contents, the electronic device converts each sub-sound content into a corresponding spectrogram and uses the spectrogram to represent that sub-sound content; for example, the conversion may be performed using the fast Fourier transform or Mel-frequency cepstral coefficients.
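The division-plus-spectrogram step can be sketched as follows. This toy version frames a waveform with an overlapping hop (so adjacent sub-sounds share samples) and computes naive DFT magnitudes per frame; a real implementation would use an FFT library or Mel-frequency cepstral coefficients, and the 16-sample signal and frame sizes here are invented for the example:

```python
import cmath

def frames(signal, frame_len, hop):
    """Split a signal into frames; hop < frame_len makes adjacent
    frames overlap, hop == frame_len makes them disjoint."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for the non-negative frequency bins."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

signal = [0.0, 1.0, 0.0, -1.0] * 4  # toy waveform: a pure tone, 16 samples
spectrogram = [magnitude_spectrum(f) for f in frames(signal, frame_len=8, hop=4)]
```

Each row of `spectrogram` represents one sub-sound content; stacking the rows gives the time-frequency image that the second emotion recognition model consumes.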

Correspondingly, the second emotion recognition model is not trained on raw sound content samples either. Instead, the sound content samples are first converted into corresponding spectrograms, and the model is trained on those spectrograms together with the labeled emotion tags. Thus, after converting the sub-sound contents into corresponding spectrograms, the electronic device inputs the spectrogram of each sub-sound content into the second emotion recognition model for emotion recognition, and obtains the candidate emotion output by the second emotion recognition model for each sub-sound content.

The electronic device then determines the user's second candidate emotion according to the plurality of candidate emotions obtained. For example, the electronic device may check whether the proportion of identical candidate emotions among all candidate emotions reaches a preset ratio; if it does, that identical candidate emotion is determined as the user's second candidate emotion.

It should be noted that the embodiments of the present application do not limit the specific value of the preset ratio, which can be set by those skilled in the art according to actual needs; for example, in the embodiments of the present application the preset ratio is set to 60%. Suppose the sound content is divided into 5 sub-sound contents, which are converted into corresponding spectrograms and recognized by the second emotion recognition model to obtain 5 candidate emotions. If 3 of these 5 candidate emotions are identical, all being "happy", then "happy" is determined as the user's second candidate emotion.
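The ratio check above can be sketched directly; the 60% threshold and emotion labels follow the example, and returning `None` below the threshold is an assumption, since the patent does not say how the no-majority case is handled:

```python
from collections import Counter

def second_candidate_emotion(candidates, preset_ratio=0.6):
    """Return the most common candidate emotion if its share of all
    candidates reaches the preset ratio, otherwise None (undecided)."""
    emotion, count = Counter(candidates).most_common(1)[0]
    return emotion if count / len(candidates) >= preset_ratio else None

# 3 of 5 sub-sound candidates are "happy": 60% meets the preset ratio.
result = second_candidate_emotion(["happy", "happy", "sad", "happy", "neutral"])
```

With three agreeing candidates out of five, the share is exactly 60%, so "happy" is selected, matching the worked example.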

In an embodiment, after "determining the user's target emotion according to the first candidate emotion and the second candidate emotion", the method further includes:

sending the text content and the target emotion to a corresponding target device.

Those of ordinary skill in the art will understand that when a user communicates with other users through electronic devices based on text content, the user does not know the other users' emotions, nor do the other users know the user's emotion. For this reason, in the embodiments of the present application, when the text content input by the user is used for communication with other users, the text content input by the user and the recognized target emotion are sent to the corresponding target device, i.e., the electronic device of the other user communicating with the user. In this way, besides viewing the text content input by the user, the other user can also learn the user's emotion at the time of inputting that text content, which facilitates better communication.
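A minimal sketch of the sending step, bundling the text and the recognized target emotion into one message; the JSON field names and the idea of a JSON payload are assumptions for illustration, since the patent does not fix a message format or transport:

```python
import json

def build_message(text, target_emotion):
    """Bundle the input text and the recognized target emotion into a
    single JSON payload for the target device (field names are illustrative)."""
    return json.dumps({"text": text, "emotion": target_emotion},
                      ensure_ascii=False)

payload = build_message("I miss you very much today", "happy")
```

The receiving device can then render the text alongside an emotion indicator, which is how the other user learns the sender's emotional state.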

Please refer to FIG. 3 and FIG. 4 in conjunction. FIG. 3 is another schematic flowchart of the emotion recognition method provided by the embodiments of the present application, and FIG. 4 is a schematic diagram of an application scenario of the emotion recognition method. The emotion recognition method can be applied to an electronic device, and the flow of the emotion recognition method may include:

In 201, the electronic device obtains the text content input by the user while the user is communicating with another user.

Those of ordinary skill in the art will understand that when a user communicates with other users through electronic devices based on text content, the user does not know the other users' emotions, nor do the other users know the user's emotion. For this reason, in the embodiments of the present application, when the text content input by the user is used for communication with other users, the user's emotion can be recognized, and the text content input by the user together with the recognized emotion can be sent to the other users' electronic devices, thereby helping the user communicate better with others.

Specifically, the electronic device first determines whether the user is communicating with other users, for example by checking whether the application running in the foreground is a communication application (such as an instant messaging application or a text messaging application); if the foreground application is a communication application, the electronic device determines that the user is communicating with other users. Upon recognizing that the user is communicating with other users, the electronic device monitors the user's input, and upon detecting that the user inputs text content, it triggers emotion recognition of the user. The text content includes, but is not limited to, words, sentences, and articles.

In 202, the electronic device extracts keywords included in the aforementioned text content, and maps the extracted keywords to a vector space through a word embedding model to obtain a corresponding feature vector.

In 203, the electronic device converts the aforementioned feature vector into a corresponding feature tensor, and inputs the feature tensor into a pre-trained first emotion recognition model for emotion recognition to obtain the user's first candidate emotion.

It should be noted that, in the embodiments of the present application, a first emotion recognition model for recognizing the user's emotion according to the text content input by the user is pre-trained. Thus, after obtaining the text content input by the user, the electronic device can perform a preliminary recognition of the user's emotion according to the text content and the pre-trained first emotion recognition model.

Specifically, the electronic device first applies a preset keyword extraction algorithm to extract keywords from the text content and uses the extracted keywords to represent the complete text content. The embodiments of the present application do not limit which keyword extraction algorithm is used; those skilled in the art may select a suitable algorithm according to actual needs. For example, the electronic device may use the TF-IDF algorithm to extract keywords from the text content input by the user. Supposing the input text is the sentence "I miss you very much today", applying the TF-IDF algorithm yields the keywords "today" and "miss you".

After extracting keywords that represent the text content, the electronic device maps the extracted keywords to a vector space through a word embedding model to obtain the feature vector corresponding to the text content. The word embedding model includes, but is not limited to, the Word2vec model, the GloVe model, the FastText model, and the ELMo model.

The electronic device then combines the feature vectors corresponding to the text content into a tensor, denoted as the feature tensor. Like vectors and matrices, a tensor is a data structure; here a tensor refers to a data structure with three or more dimensions, where the number of dimensions is called the order of the tensor. A tensor can be regarded as the generalization of vectors and matrices to multi-dimensional space, with a vector viewed as a first-order tensor and a matrix as a second-order tensor. After the feature vector is converted into the corresponding feature tensor, the feature tensor can be input into the first emotion recognition model for emotion recognition, and the user's first candidate emotion output by the first emotion recognition model is obtained.

In 204, the electronic device obtains the sound content captured while the user was inputting the text content.

It should be noted that, in the embodiments of the present application, besides performing emotion recognition according to the text content input by the user, the electronic device also performs emotion recognition according to the user's voice. Upon detecting that the user is inputting text content, the electronic device activates a built-in or external microphone to capture sound, thereby collecting the sound content produced while the user inputs the aforementioned text content. In this way, the electronic device can directly obtain the previously collected sound content from the period during which the user input the text content.

In 205, the electronic device performs emotion recognition according to the aforementioned sound content and a pre-trained second emotion recognition model to obtain the user's second candidate emotion.

In the embodiments of the present application, a second emotion recognition model for recognizing the user's emotion according to the user's sound content is also pre-trained. The electronic device converts the aforementioned sound content into a corresponding spectrogram and uses the spectrogram to represent the sound content; for example, the conversion may be performed using the fast Fourier transform or Mel-frequency cepstral coefficients.

After converting the sound content into the corresponding spectrogram, the electronic device inputs the spectrogram into the second emotion recognition model for emotion recognition, and obtains the user's second candidate emotion output by the second emotion recognition model.

In 206, the electronic device inputs the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier for classification, and obtains the user's target emotion output by the Bayesian classifier.

As can be understood from the above description, the first candidate emotion and the second candidate emotion are obtained from independent information sources. Therefore, to ensure the accuracy of emotion recognition, the electronic device performs a comprehensive analysis based on the first candidate emotion and the second candidate emotion to finally determine the user's target emotion.

In the embodiments of the present application, a Bayesian classifier is also pre-trained to perform a secondary emotion classification on the candidate emotion from the text source and the candidate emotion from the sound source. For example, emotion samples from text sources and the corresponding emotion samples from sound sources can be collected and labeled with emotion tags, and the Bayesian classifier is then trained on the text-source emotion samples, the corresponding sound-source emotion samples, and the corresponding emotion labels.

The trained Bayesian classifier can be stored locally on the electronic device or on a remote server. When determining the user's target emotion according to the first candidate emotion and the second candidate emotion, the electronic device obtains the Bayesian classifier locally or from the remote server, inputs the previously obtained first candidate emotion (from the text source) and second candidate emotion (from the sound source) into the Bayesian classifier for classification, and takes the emotion output by the Bayesian classifier as the target emotion finally obtained by emotion recognition of the user.

In 207, the electronic device sends the aforementioned text content and the target emotion to the other user's electronic device.

After recognizing the user's target emotion, the electronic device sends the text content input by the user and the recognized target emotion to the corresponding target device, i.e., the electronic device of the other user communicating with the user. In this way, besides viewing the text content input by the user, the other user can also learn the user's emotion at the time of inputting that text content, which facilitates better communication.

The embodiments of the present application further provide an emotion recognition device. Please refer to FIG. 5, which is a schematic structural diagram of the emotion recognition device provided by an embodiment of the present application. The emotion recognition device is applied to an electronic device and includes a first emotion recognition module 301, a second emotion recognition module 302, a target emotion recognition module 303, and a behavior prediction module 304, as follows:

The first emotion recognition module 301 is configured to obtain the text content input by the user, and perform emotion recognition according to the text content and the pre-trained first emotion recognition model to obtain the user's first candidate emotion.

The second emotion recognition module 302 is configured to obtain the sound content captured while the user was inputting the text content, and perform emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain the user's second candidate emotion.

The target emotion recognition module 303 is configured to determine the user's target emotion according to the first candidate emotion and the second candidate emotion.

In an embodiment, when determining the user's target emotion according to the first candidate emotion and the second candidate emotion, the target emotion recognition module 303 may be configured to:

input the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier for classification, and obtain the user's target emotion output by the Bayesian classifier.

In an embodiment, when performing emotion recognition according to the text content and the pre-trained first emotion recognition model to obtain the user's first candidate emotion, the first emotion recognition module 301 may be configured to:

perform feature extraction on the text content to obtain a corresponding feature vector;

convert the feature vector into a corresponding feature tensor, input the feature tensor into the first emotion recognition model for emotion recognition, and obtain the user's first candidate emotion output by the first emotion recognition model.

In an embodiment, when performing feature extraction on the text content to obtain the corresponding feature vector, the first emotion recognition module 301 may be configured to:

extract keywords included in the text content, and map the keywords to a vector space through a word embedding model to obtain the feature vector.

In an embodiment, before the feature tensor is input into the first emotion recognition model for emotion recognition, the first emotion recognition module 301 may further be configured to:

zero-pad the feature tensor.

In an embodiment, when performing emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain the user's second candidate emotion, the second emotion recognition module 302 may be configured to:

divide the sound content into a plurality of sub-sound contents;

input the plurality of sub-sound contents respectively into the second emotion recognition model for emotion recognition to obtain a corresponding plurality of candidate emotions;

determine the user's second candidate emotion according to the plurality of candidate emotions.

In an embodiment, the emotion recognition device further includes a content sending module, configured to send the text content and the target emotion to the corresponding target device after the user's target emotion is determined according to the first candidate emotion and the second candidate emotion.

It should be noted that the emotion recognition device provided by the embodiments of the present application belongs to the same concept as the emotion recognition method in the above embodiments; any method provided in the emotion recognition method embodiments can be run on the emotion recognition device, and its specific implementation is detailed in the emotion recognition method embodiments, which will not be repeated here.

The embodiments of the present application provide a computer-readable storage medium on which a computer program is stored. When the stored computer program is executed on a computer, the computer is caused to perform the steps in the emotion recognition method provided by the embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

An embodiment of the present application further provides an electronic device, including a memory and a processor. The processor executes the steps in the emotion recognition method provided in this embodiment by invoking a computer program stored in the memory.

In one embodiment, an electronic device is also provided. Referring to FIG. 6, the electronic device includes a processor 401 and a memory 402, where the processor 401 is electrically connected to the memory 402.

The processor 401 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or loading the computer program stored in the memory 402 and invoking the data stored in the memory 402.

The memory 402 can be used to store software programs and modules, and the processor 401 executes various functional applications and performs data processing by running the computer programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, a computer program required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created according to the use of the electronic device, and the like. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Correspondingly, the memory 402 may further include a memory controller to provide the processor 401 with access to the memory 402.

In this embodiment of the present application, the processor 401 in the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory 402 according to the following steps, and runs the computer programs stored in the memory 402 to implement various functions, as follows:

acquiring text content input by a user, and performing emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain a first candidate emotion of the user;

acquiring sound content during the period when the user inputs the text content, and performing emotion recognition according to the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;

determining a target emotion of the user according to the first candidate emotion and the second candidate emotion.
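The three functions above form a two-branch pipeline whose branches run independently and whose outputs are then fused. A minimal sketch, in which `first_model`, `second_model`, and `fuse` are hypothetical stand-ins for the two pre-trained recognition models and the fusion step (the embodiment does not fix their concrete interfaces):

```python
def recognize_target_emotion(text, sound, first_model, second_model, fuse):
    # Branch 1: text content entered by the user -> first recognition model
    first_candidate = first_model.predict(text)
    # Branch 2: sound captured while the text was input -> second model
    second_candidate = second_model.predict(sound)
    # Fusion step: combine both candidate emotions into the target emotion
    return fuse(first_candidate, second_candidate)
```

Keeping the two branches behind a common `predict` interface means either model can be retrained or replaced without touching the orchestration code.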

Referring to FIG. 7, FIG. 7 is another schematic structural diagram of an electronic device provided by an embodiment of the present application. It differs from the electronic device shown in FIG. 6 in that the electronic device further includes components such as an input unit 403 and an output unit 404.

The input unit 403 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.

The output unit 404, such as a screen, can be used to display information input by the user or information provided to the user.

In this embodiment of the present application, the processor 401 in the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory 402 according to the following steps, and runs the computer programs stored in the memory 402 to implement various functions, as follows:

acquiring text content input by a user, and performing emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain a first candidate emotion of the user;

acquiring sound content during the period when the user inputs the text content, and performing emotion recognition according to the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;

determining a target emotion of the user according to the first candidate emotion and the second candidate emotion.

In one embodiment, when determining the user's target emotion according to the first candidate emotion and the second candidate emotion, the processor 401 may execute:

inputting the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier for classification, to obtain the user's target emotion output by the Bayesian classifier.
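One way to realize such a fusion classifier is a small naive Bayes model over (first candidate, second candidate) label pairs. The sketch below is an illustrative hand-rolled version with simple Laplace-style smoothing, trained on hypothetical labeled pairs; the embodiment does not specify the classifier's training data, feature encoding, or implementation:

```python
from collections import Counter, defaultdict

class TinyBayesFusion:
    """Naive Bayes over two categorical inputs: P(y) * P(x1|y) * P(x2|y)."""

    def fit(self, pairs, labels):
        # Count class priors and, per class, how often each candidate
        # emotion appears in each of the two input positions.
        self.prior = Counter(labels)
        self.cond = [defaultdict(Counter), defaultdict(Counter)]
        for (x1, x2), y in zip(pairs, labels):
            self.cond[0][y][x1] += 1
            self.cond[1][y][x2] += 1
        return self

    def predict(self, x1, x2):
        def score(y):
            p = self.prior[y]
            # Rough add-one smoothing keeps unseen combinations from
            # zeroing out the score (assumes a small label set).
            p *= (self.cond[0][y][x1] + 1) / (self.prior[y] + 2)
            p *= (self.cond[1][y][x2] + 1) / (self.prior[y] + 2)
            return p
        return max(self.prior, key=score)
```

A pre-trained classifier from an ML library (e.g. a categorical naive Bayes implementation) would serve the same role in practice.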

In one embodiment, when performing emotion recognition according to the text content and the pre-trained first emotion recognition model to obtain the user's first candidate emotion, the processor 401 may execute:

performing feature extraction on the text content to obtain a corresponding feature vector;

converting the feature vector into a corresponding feature tensor, and inputting the feature tensor into the first emotion recognition model for emotion recognition, to obtain the user's first candidate emotion output by the first emotion recognition model.
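The claims describe the feature tensor as a data structure with three or more dimensions, so the conversion can be as simple as reshaping the flat feature vector into a fixed multi-dimensional layout. A minimal sketch with NumPy, where the target shape `(1, 8, 8)` is an assumption rather than anything specified by the embodiment:

```python
import numpy as np

def to_feature_tensor(feature_vector, shape=(1, 8, 8)):
    # Reshape the flat feature vector into a 3-D tensor so it matches
    # the input layout the first emotion recognition model expects.
    # The vector length must equal the product of the shape dimensions.
    vec = np.asarray(feature_vector, dtype=np.float32)
    return vec.reshape(shape)
```

If the vector is shorter than the tensor, the zero-padding step described later in this embodiment would bring it up to size before reshaping.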

In one embodiment, when performing feature extraction on the text content to obtain the corresponding feature vector, the processor 401 may execute:

extracting keywords included in the text content, and mapping the keywords to a vector space through a word embedding model to obtain the feature vector.
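Mapping keywords into a vector space can be sketched with a toy embedding table; in practice the table would come from a trained word embedding model such as word2vec or GloVe (an assumption — the embodiment does not name a specific model), and averaging the word vectors is one common way to collapse the keywords into a single feature vector:

```python
import numpy as np

# Toy 2-D word-embedding table used purely for illustration.
EMBEDDINGS = {
    "great": np.array([0.9, 0.1]),
    "awful": np.array([-0.8, 0.2]),
    "day":   np.array([0.1, 0.5]),
}

def text_to_feature_vector(keywords):
    # Map each extracted keyword into the vector space and average the
    # word vectors to obtain a single feature vector for the text.
    vectors = [EMBEDDINGS[w] for w in keywords if w in EMBEDDINGS]
    if not vectors:
        return np.zeros(2)  # no known keyword: fall back to a zero vector
    return np.mean(vectors, axis=0)
```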

In one embodiment, before inputting the feature tensor into the first emotion recognition model for emotion recognition, the processor 401 may execute:

performing zero-padding processing on the feature tensor.
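Zero-padding brings a variable-sized feature tensor up to the fixed input size a model expects, without altering the values already present. A minimal sketch with NumPy, where the target shape is a caller-supplied assumption:

```python
import numpy as np

def zero_pad_tensor(tensor, target_shape):
    # Pad each dimension at the end with zeros up to the target shape,
    # leaving the original values in the leading positions.
    pad_widths = [(0, t - s) for s, t in zip(tensor.shape, target_shape)]
    return np.pad(tensor, pad_widths, mode="constant", constant_values=0)
```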

In one embodiment, when performing emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain the user's second candidate emotion, the processor 401 may execute:

dividing the sound content into a plurality of sub-sound contents;

inputting the plurality of sub-sound contents into the second emotion recognition model respectively for emotion recognition, to obtain a plurality of corresponding candidate emotions;

determining a second candidate emotion of the user according to the plurality of candidate emotions.

In one embodiment, after determining the user's target emotion according to the first candidate emotion and the second candidate emotion, the processor 401 may execute:

sending the text content and the target emotion to a corresponding target device.

It should be noted that the electronic device provided in the embodiments of the present application belongs to the same concept as the emotion recognition method in the above embodiments, and any method provided in the emotion recognition method embodiments can run on the electronic device. For details of the specific implementation process, refer to the emotion recognition method embodiments, which will not be repeated here.

It should be noted that, for the emotion recognition method of the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the process of implementing the emotion recognition method of the embodiments of the present application can be completed by controlling the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, for example in the memory of an electronic device, and executed by at least one processor in the electronic device; the execution process may include the flow of the embodiments of the emotion recognition method. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.

For the emotion recognition device of the embodiments of the present application, its functional modules may be integrated into one processing chip, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.

The emotion recognition method, device, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementations and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (8)

1. A method for emotion recognition, applied to an electronic device, comprising:
the data processing layer in the panoramic sensing architecture acquires text content input by a user from panoramic data and provides the text content to the feature extraction layer, and the scene modeling layer performs emotion recognition according to the text content and a first emotion recognition model trained in advance to obtain a first candidate emotion of the user, which specifically includes: the feature extraction layer performs feature extraction on the text content to obtain a feature vector corresponding to the text content; the feature vector of the text content is converted into a corresponding feature tensor at the scene modeling layer, and the feature tensor is input into the first emotion recognition model for emotion recognition to obtain the first candidate emotion of the user output by the first emotion recognition model, wherein the feature tensor is a data structure with three or more dimensions;
the data processing layer in the panoramic sensing architecture acquires sound content during the user's input of the text content from panoramic data and provides the sound content to the feature extraction layer, and the scene modeling layer performs emotion recognition according to the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;
determining a target emotion of the user according to the first candidate emotion and the second candidate emotion;
and sending the text content and the target emotion to electronic equipment of other users, so that the other users can view the input text content and know the emotion of the user when the text content was input.
2. The emotion recognition method of claim 1, wherein the determining the target emotion of the user from the first candidate emotion and the second candidate emotion comprises:
and inputting the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier to classify, so as to obtain the target emotion of the user output by the Bayesian classifier.
3. The emotion recognition method according to claim 1, wherein the feature extraction of the text content to obtain a corresponding feature vector includes:
and extracting keywords included in the text content, and mapping the keywords to a vector space through a word embedding model to obtain the feature vector.
4. The emotion recognition method of claim 1, wherein before inputting the feature tensor into the first emotion recognition model for emotion recognition, further comprising:
and performing zero filling processing on the characteristic tensor.
5. The method of claim 1, wherein the performing emotion recognition based on the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user comprises:
dividing the sound content into a plurality of sub-sound contents;
respectively inputting the plurality of sub-sound contents into the second emotion recognition model to perform emotion recognition to obtain a plurality of corresponding candidate emotions;
and determining a second candidate emotion of the user according to the plurality of candidate emotions.
6. An emotion recognition device applied to an electronic apparatus, comprising:
the first emotion recognition module, used for the data processing layer in the panoramic sensing architecture to acquire text content input by a user from panoramic data and provide the text content to the feature extraction layer, and for the scene modeling layer to perform emotion recognition according to the text content and a first emotion recognition model trained in advance to obtain a first candidate emotion of the user, which specifically includes: the feature extraction layer performs feature extraction on the text content to obtain a feature vector corresponding to the text content; the feature vector of the text content is converted into a corresponding feature tensor at the scene modeling layer, and the feature tensor is input into the first emotion recognition model for emotion recognition to obtain the first candidate emotion of the user output by the first emotion recognition model, wherein the feature tensor is a data structure with three or more dimensions;
the second emotion recognition module, used for the data processing layer in the panoramic sensing architecture to acquire sound content of the user during the text content input period from panoramic data and provide the sound content to the feature extraction layer, and for the scene modeling layer to perform emotion recognition according to the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;
a target emotion recognition module for determining a target emotion of the user according to the first candidate emotion and the second candidate emotion; and a content sending module for sending the text content and the target emotion to electronic equipment of other users after determining the target emotion of the user according to the first candidate emotion and the second candidate emotion, so that the other users can view the input text content and know the emotion of the user when the text content was input.
7. A storage medium having stored thereon a computer program, which when run on a computer causes the computer to perform the emotion recognition method of any of claims 1 to 5.
8. An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the processor is configured to perform the emotion recognition method of any of claims 1 to 5 by invoking the computer program.
CN201910282465.3A 2019-04-09 2019-04-09 Emotion recognition method and device, storage medium and electronic equipment Expired - Fee Related CN111816211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282465.3A CN111816211B (en) 2019-04-09 2019-04-09 Emotion recognition method and device, storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN111816211A CN111816211A (en) 2020-10-23
CN111816211B true CN111816211B (en) 2023-06-02

Family

ID=72843540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282465.3A Expired - Fee Related CN111816211B (en) 2019-04-09 2019-04-09 Emotion recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111816211B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860995B (en) * 2021-02-04 2024-10-15 北京百度网讯科技有限公司 Interaction method, device, client, server and storage medium
CN114973209A (en) * 2021-02-20 2022-08-30 宝马股份公司 Method, apparatus, apparatus, medium and vehicle for recognizing driver emotion
US20240127335A1 (en) * 2022-10-13 2024-04-18 Actimize Ltd. Usage of emotion recognition in trade monitoring
CN117953919A (en) * 2022-10-31 2024-04-30 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, storage medium and computer program product

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108521369A (en) * 2018-04-03 2018-09-11 平安科技(深圳)有限公司 Information transmission method, receiving terminal device and sending terminal device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN108334583B (en) * 2018-01-26 2021-07-09 上海智臻智能网络科技股份有限公司 Emotional interaction method and apparatus, computer-readable storage medium, and computer device
CN108717406B (en) * 2018-05-10 2021-08-24 平安科技(深圳)有限公司 Text emotion analysis method and device and storage medium
CN108985358B (en) * 2018-06-29 2021-03-02 北京百度网讯科技有限公司 Emotion recognition method, device, device and storage medium
CN115083434B (en) * 2022-07-22 2022-11-25 平安银行股份有限公司 Emotion recognition method and device, computer equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230602