Disclosure of Invention
In view of this, the present application provides a method and a related apparatus for recognizing a user emotion, which realize multi-angle comprehensive emotion recognition and greatly improve the accuracy of user emotion recognition.
In a first aspect, an embodiment of the present application provides a method for recognizing a user emotion, where the method includes:
obtaining current input information and emotion factors of a user;
extracting emotion characteristics of the current input information to obtain current emotion characteristics corresponding to the current input information;
and obtaining the current emotion type of the user based on the current emotion characteristics, the emotion factors and an emotion recognition model.
Optionally, the current input information includes at least two of current text input information, current voice input information, and current image input information.
Optionally, the current image input information includes a current face image and/or a current body posture image.
Optionally, the emotion factors include a personality type and/or a historical emotion type.
Optionally, the personality type of the user is obtained based on preset time period input information of the user and a personality recognition model, wherein the personality recognition model is obtained by pre-training a first preset recognition network based on personality features and personality type labels corresponding to preset time period input information samples.
Optionally, the training step of the personality recognition model includes:
inputting the personality features corresponding to the preset time period input information samples into the first preset recognition network according to the time sequence to obtain a predicted personality type;
adjusting parameters of the first preset recognition network based on the predicted personality type, the personality type label and a first preset loss function;
and taking the trained first preset recognition network as the personality recognition model.
Optionally, the obtaining of the personality type of the user includes:
performing personality feature extraction on the preset time period input information of the user to obtain personality features corresponding to the preset time period input information;
determining a time sequence of the personality features corresponding to the preset time period input information;
and inputting the personality features corresponding to the preset time period input information into the personality recognition model according to the time sequence to obtain the personality type of the user.
Optionally, the historical emotion type of the user is pre-stored in a graph manner and/or an embedding manner.
Optionally, the emotion recognition model is obtained by pre-training a second preset recognition network based on emotion characteristics corresponding to preset input information samples, emotion factor samples and corresponding emotion type labels.
Optionally, the training step of the emotion recognition model includes:
inputting the emotion characteristics corresponding to the preset input information samples and the emotion factor samples into the second preset recognition network to obtain a predicted emotion type;
adjusting parameters of the second preset recognition network based on the predicted emotion type, the emotion type label and a second preset loss function;
and taking the trained second preset recognition network as the emotion recognition model.
In a second aspect, an embodiment of the present application provides an apparatus for recognizing a user emotion, where the apparatus includes:
a first obtaining unit, configured to obtain current input information and emotion factors of a user;
a second obtaining unit, configured to perform emotion feature extraction on the current input information to obtain current emotion characteristics corresponding to the current input information;
and a third obtaining unit, configured to obtain the current emotion type of the user based on the current emotion characteristics, the emotion factors and an emotion recognition model.
Optionally, the current input information includes at least two of current text input information, current voice input information, and current image input information.
Optionally, the current image input information includes a current face image and/or a current body posture image.
Optionally, the emotion factors include a personality type and/or a historical emotion type.
Optionally, the personality type of the user is obtained based on preset time period input information of the user and a personality recognition model; the personality recognition model is obtained by pre-training a first preset recognition network based on personality features and personality type labels corresponding to preset time period input information samples.
Optionally, the apparatus further includes a first training unit, where the first training unit is configured to:
input the personality features corresponding to the preset time period input information samples into the first preset recognition network according to the time sequence to obtain a predicted personality type;
adjust parameters of the first preset recognition network based on the predicted personality type, the personality type label and a first preset loss function;
and take the trained first preset recognition network as the personality recognition model.
Optionally, the apparatus further includes a fourth obtaining unit, where the fourth obtaining unit is configured to:
perform personality feature extraction on the preset time period input information of the user to obtain personality features corresponding to the preset time period input information;
determine a time sequence of the personality features corresponding to the preset time period input information;
and input the personality features corresponding to the preset time period input information into the personality recognition model according to the time sequence to obtain the personality type of the user.
Optionally, the historical emotion type of the user is pre-stored in a graph manner and/or an embedding manner.
Optionally, the emotion recognition model is obtained by pre-training a second preset recognition network based on emotion characteristics corresponding to preset input information samples, emotion factor samples and corresponding emotion type labels.
Optionally, the apparatus further includes a second training unit, where the second training unit is configured to:
input the emotion characteristics corresponding to the preset input information samples and the emotion factor samples into the second preset recognition network to obtain a predicted emotion type;
adjust parameters of the second preset recognition network based on the predicted emotion type, the emotion type label and a second preset loss function;
and take the trained second preset recognition network as the emotion recognition model.
In a third aspect, an embodiment of the present application provides an apparatus for recognizing a user emotion, the apparatus including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for:
obtaining current input information and emotion factors of a user;
extracting emotion characteristics of the current input information to obtain current emotion characteristics corresponding to the current input information;
and obtaining the current emotion type of the user based on the current emotion characteristics, the emotion factors and an emotion recognition model.
Optionally, the current input information includes at least two of current text input information, current voice input information, and current image input information.
Optionally, the current image input information includes a current face image and/or a current body posture image.
Optionally, the emotion factors include a personality type and/or a historical emotion type.
Optionally, the personality type of the user is obtained based on preset time period input information of the user and a personality recognition model, wherein the personality recognition model is obtained by pre-training a first preset recognition network based on personality features and personality type labels corresponding to preset time period input information samples.
Optionally, the one or more programs further include instructions for:
inputting the personality features corresponding to the preset time period input information samples into the first preset recognition network according to the time sequence to obtain a predicted personality type;
adjusting parameters of the first preset recognition network based on the predicted personality type, the personality type label and a first preset loss function;
and taking the trained first preset recognition network as the personality recognition model.
Optionally, the one or more programs further include instructions for:
performing personality feature extraction on the preset time period input information of the user to obtain personality features corresponding to the preset time period input information;
determining a time sequence of the personality features corresponding to the preset time period input information;
and inputting the personality features corresponding to the preset time period input information into the personality recognition model according to the time sequence to obtain the personality type of the user.
Optionally, the historical emotion type of the user is pre-stored in a graph manner and/or an embedding manner.
Optionally, the emotion recognition model is obtained by pre-training a second preset recognition network based on emotion characteristics corresponding to preset input information samples, emotion factor samples and corresponding emotion type labels.
Optionally, the one or more programs further include instructions for:
inputting the emotion characteristics corresponding to the preset input information samples and the emotion factor samples into the second preset recognition network to obtain a predicted emotion type;
adjusting parameters of the second preset recognition network based on the predicted emotion type, the emotion type label and a second preset loss function;
and taking the trained second preset recognition network as the emotion recognition model.
In a fourth aspect, an embodiment of the present application provides a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the method for recognizing a user emotion according to any implementation of the first aspect.
Compared with the prior art, the present application has at least the following advantages:
By adopting the technical solution of the embodiments of the present application, current input information and emotion factors of a user are first obtained; then, emotion feature extraction is performed on the current input information to obtain current emotion characteristics corresponding to the current input information; and finally, the current emotion characteristics and the emotion factors are processed by an emotion recognition model to obtain the current emotion type of the user. In this way, emotion features of the current input information of the user are extracted to obtain the current emotion characteristics, and emotion recognition is then performed on the current emotion characteristics in combination with the emotion factors of the user to obtain the current emotion type of the user. The solution not only analyzes the user emotion represented by the current emotion characteristics, but also analyzes the influence of the emotion factors on that emotion, thereby realizing multi-angle comprehensive emotion recognition and greatly improving the accuracy of user emotion recognition.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, in a user input scenario, emotion recognition is generally performed on input information of a user to obtain the user emotion. However, the inventors have found that this approach only recognizes the user emotion represented by the input information itself and does not consider the influence of other information on that emotion, so the recognized user emotion is easily inaccurate and is likely to deviate considerably from the true user emotion.
In order to solve this problem, in the embodiments of the present application, current input information and emotion factors of a user are first obtained; then, emotion feature extraction is performed on the current input information to obtain current emotion characteristics corresponding to the current input information; and finally, the current emotion characteristics and the emotion factors are processed by an emotion recognition model to obtain the current emotion type of the user. In this way, emotion features of the current input information of the user are extracted to obtain the current emotion characteristics, and emotion recognition is then performed on the current emotion characteristics in combination with the emotion factors of the user to obtain the current emotion type of the user. The solution not only analyzes the user emotion represented by the current emotion characteristics, but also analyzes the influence of the emotion factors on that emotion, thereby realizing multi-angle comprehensive emotion recognition and greatly improving the accuracy of user emotion recognition.
For example, the embodiment of the present application may be applied to the scenario shown in fig. 1, where the scenario includes a client 101 and a processor 102, and the client 101 and the processor 102 belong to the same artificial intelligence product. The user provides input through an input product installed on the client 101, and the processor 102 recognizes the emotion of the user by adopting the implementation provided by the embodiment of the present application, so that the processor 102 can provide artificial intelligence services based on the recognized user emotion, making the artificial intelligence product more intelligent.
It is to be understood that, in the above application scenario, although the actions of the embodiments of the present application are described as being performed by the processor 102, the present application is not limited with respect to the executing subject, as long as the actions disclosed in the embodiments of the present application are performed.
It is to be understood that the above scenario is only one example of a scenario provided in the embodiment of the present application, and the embodiment of the present application is not limited to this scenario.
The following describes in detail a specific implementation manner of the method for recognizing a user emotion and the related apparatus in the embodiments of the present application by using embodiments in conjunction with the accompanying drawings.
Exemplary method
Referring to fig. 2, a flow chart of a method for recognizing a user emotion in an embodiment of the present application is shown. In this embodiment, the method may include, for example, the steps of:
step 201: current input information and emotional factors of the user are obtained.
In the embodiment of the present application, in order to avoid the low accuracy that results from performing emotion recognition on current input information of a single modality, current input information of at least two different modalities of the user may be obtained, thereby improving the accuracy of subsequently recognizing the user emotion. The current input information may refer to input information at the current time, or may refer to input information in the current time period.
Common user input modes include a text input mode, a voice input mode and an image input mode; the current input information corresponding to the text input mode is current text input information, the current input information corresponding to the voice input mode is current voice input information, and the current input information corresponding to the image input mode is current image input information, which is generally image input information capable of representing the user emotion, such as a face image and a body posture image. Therefore, in an optional implementation of the embodiment of the present application, the current input information includes at least two of current text input information, current voice input information, and current image input information; the current image input information includes a current face image and/or a current body posture image.
The current face image may be a current still face image, for example, a current still face picture, or a current dynamic face image, for example, a current dynamic face picture or a current face video; similarly, the current body posture image may be either a current static body posture image, such as a current static body posture picture, or a current dynamic body posture image, such as a current dynamic body posture picture or a current body posture video.
If only the user emotion represented by the input information of the user is recognized, without considering the influence of other information on that emotion, the recognized user emotion may be inaccurate and is likely to deviate considerably from the true user emotion. Therefore, in the embodiment of the present application, on the basis of obtaining the current input information of the user, information that affects the user emotion represented by the current input information also needs to be obtained as the emotion factors of the user.
The user emotion represented by the same current input information differs across personality types, and also differs across historical emotion types; that is, the personality type and the historical emotion type of the user both affect the user emotion represented by the current input information, and either one or both of them can be used as the emotion factors of the user. Therefore, in an optional implementation of the embodiment of the present application, the emotion factors include a personality type and/or a historical emotion type.
In the embodiment of the present application, when the emotion factors include the personality type, because the personality type of the user can be reflected by the input information of the user in a preset time period, personality recognition processing can be performed on the preset time period input information of the user to obtain the personality type of the user. For example, the personality type may be an open type, a conscientious type, an extraverted type, an agreeable type, a neurotic type, a passive type, or the like.
The personality recognition processing is premised on pre-training a first preset recognition network, based on personality features and personality type labels corresponding to preset time period input information samples, to obtain a personality recognition model. Based on the personality recognition model, personality recognition processing is performed on the preset time period input information of the user, so that the personality type of the user can be obtained. Therefore, in an optional implementation of the embodiment of the present application, the personality type of the user is obtained based on the preset time period input information of the user and a personality recognition model; the personality recognition model is obtained by pre-training a first preset recognition network based on personality features and personality type labels corresponding to preset time period input information samples.
The training of the personality recognition model refers to the following: personality features corresponding to preset time period input information samples, ordered according to the time sequence, together with the corresponding personality type labels, are used as training data; a preset recognition network, namely the first preset recognition network, is trained with the training data so as to fully mine the association between the time-ordered personality features and the personality type labels; after multiple iterations of training on a certain amount of such training data, the personality recognition model can be obtained.
In specific implementation, first, the personality features corresponding to the preset time period input information samples are input into the first preset recognition network according to the time sequence, and the first preset recognition network predicts a personality type and outputs the predicted personality type; then, the loss between the predicted personality type and the personality type label is calculated with a first preset loss function so as to adjust the parameters of the first preset recognition network; after repeated iterative training, the training of the first preset recognition network is finished when a preset number of training iterations is reached or the first preset recognition network converges, and the trained first preset recognition network is taken as the personality recognition model. Therefore, in an optional implementation of the embodiment of the present application, the training step of the personality recognition model includes the following steps:
step A: inputting the personality features corresponding to the preset time period input information samples into the first preset recognition network according to the time sequence to obtain a predicted personality type;
step B: adjusting parameters of the first preset recognition network based on the predicted personality type, the personality type label and a first preset loss function;
step C: taking the trained first preset recognition network as the personality recognition model.
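As an illustration of steps A to C, the following is a minimal training sketch in Python, assuming (this is not specified by the present application) that the first preset recognition network is a GRU-based sequence classifier, that the first preset loss function is cross-entropy, and that PyTorch is used; all names are illustrative.

import torch
import torch.nn as nn

class FirstPresetRecognitionNetwork(nn.Module):
    """Assumed form of the first preset recognition network: a GRU over
    time-ordered personality feature vectors followed by a classifier."""
    def __init__(self, feature_dim, hidden_dim, num_personality_types):
        super().__init__()
        self.gru = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_personality_types)

    def forward(self, features):  # features: (batch, time, feature_dim), ordered by time sequence
        _, last_hidden = self.gru(features)
        return self.classifier(last_hidden[-1])  # personality type logits

def train_personality_model(sample_features, type_labels, num_types, epochs=10):
    """sample_features: float tensor (N, T, feature_dim) of personality features
    ordered by time sequence; type_labels: long tensor (N,) of personality type labels."""
    network = FirstPresetRecognitionNetwork(sample_features.shape[-1], 128, num_types)
    loss_fn = nn.CrossEntropyLoss()  # assumed first preset loss function
    optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)
    for _ in range(epochs):
        predicted = network(sample_features)    # step A: predicted personality type
        loss = loss_fn(predicted, type_labels)  # step B: loss against the personality type label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                        # step B: adjust network parameters
    return network                              # step C: trained network used as the model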
Correspondingly, when the personality type of the user is obtained, the personality features in the preset time period input information of the user first need to be extracted, the time sequence of the personality features then needs to be determined, and finally the personality features are input into the personality recognition model according to the time sequence, so that the personality type of the user can be obtained. That is, in an optional implementation of the embodiment of the present application, the obtaining of the personality type of the user includes:
step D: performing personality feature extraction on the preset time period input information of the user to obtain personality features corresponding to the preset time period input information;
step E: determining a time sequence of the personality features corresponding to the preset time period input information;
step F: inputting the personality features corresponding to the preset time period input information into the personality recognition model according to the time sequence to obtain the personality type of the user.
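Steps D to F could be sketched as follows, assuming the trained model from the training sketch above and a feature extractor that returns, for each piece of input information, a time stamp and a feature vector; that dictionary layout is an assumption made only for illustration.

import torch

def recognize_personality_type(preset_period_inputs, personality_model, extract_personality_features):
    # Step D: extract personality features from each piece of preset time period input information.
    features = [extract_personality_features(item) for item in preset_period_inputs]
    # Step E: determine the time sequence of the personality features.
    features.sort(key=lambda f: f["timestamp"])
    sequence = torch.stack([f["vector"] for f in features]).unsqueeze(0)  # (1, T, feature_dim)
    # Step F: feed the time-ordered features into the personality recognition model.
    logits = personality_model(sequence)
    return int(logits.argmax(dim=-1))  # index of the predicted personality type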
In the embodiment of the present application, when the emotion factors include a historical emotion type, since the historical emotion type of the user is an emotion type obtained by historical emotion recognition for the user, the historical emotion type of the user is stored in a user emotion database in advance, before the current input information of the user is processed. The historical emotion type of the user may be stored, for example, in a graph manner and/or an embedding manner. That is, in an optional implementation of the embodiment of the present application, the historical emotion type of the user is pre-stored in a graph manner and/or an embedding manner.
When the historical emotion type of the user is pre-stored in a graph manner, a historical emotion type graph is obtained, and the graph is composed of a plurality of historical emotion type triples. A single historical emotion type triple is in the (s, p, o) format, where s refers to the user, o refers to the historical emotion type, and p represents the temporal relationship between the user and the historical emotion type. The historical emotion type graph is divided into a long-term historical emotion type graph and a short-term historical emotion type graph; the long-term graph memorizes the long-term historical emotion types of the user, the short-term graph memorizes the short-term historical emotion types of the user, and the short-term historical emotion type graph needs to be updated in time.
When the historical emotion type of the user is pre-stored in an embedding manner, an embedded historical emotion type module is obtained, and the embedded historical emotion type module represents the plurality of historical emotion type triples with continuous numerical values. The embedded historical emotion type module is likewise divided into a long-term embedded historical emotion type module and a short-term embedded historical emotion type module; the long-term module memorizes the long-term historical emotion types of the user, the short-term module memorizes the short-term historical emotion types of the user, and the short-term embedded historical emotion type module also needs to be updated in time.
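The two storage manners could be sketched as follows; the (s, p, o) triple format and the long-term/short-term split follow the description above, while the concrete field names, example values and update policy are illustrative assumptions only.

from dataclasses import dataclass
import numpy as np

@dataclass
class EmotionTriple:
    s: str  # the user
    p: str  # the temporal relationship between the user and the historical emotion type
    o: str  # the historical emotion type

# Graph manner: triples kept explicitly, split into long-term and short-term graphs.
long_term_graph = [EmotionTriple("user_42", "felt_last_month", "calm")]
short_term_graph = [EmotionTriple("user_42", "felt_in_last_session", "happy")]

def update_short_term_graph(graph, user, relation, emotion_type, max_size=50):
    """The short-term graph needs to be updated in time; keep only the most recent triples."""
    graph.append(EmotionTriple(user, relation, emotion_type))
    del graph[:-max_size]

# Embedding manner: each triple is represented with continuous numerical values, e.g. by
# concatenating learned vectors for s, p and o (the vectors themselves are assumed given).
def embed_triple(s_vec: np.ndarray, p_vec: np.ndarray, o_vec: np.ndarray) -> np.ndarray:
    return np.concatenate([s_vec, p_vec, o_vec])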
Step 202: and extracting emotion characteristics of the current input information to obtain current emotion characteristics corresponding to the current input information.
In the embodiment of the present application, for the current input information of the user, the emotion features representing the user emotion in the current input information first need to be extracted as the current emotion characteristics corresponding to the current input information. In a specific implementation of step 202, the current input information of the at least two different modalities includes at least two of current text input information, current voice input information, and current image input information, and the specific implementations for these three modalities differ, as follows:
First, for the current text input information, when the current input information of the at least two different modalities includes current text input information (referred to here as first current text input information), step 202 may include, for example: extracting emotion-related semantic features from the first current text input information to obtain first current emotion semantic features corresponding to the first current text input information.
Second, for the current voice input information, when the current input information of the at least two different modalities includes current voice input information, step 202 may include, for example: converting the current voice input information into second current text input information; extracting emotion-related semantic features from the second current text input information to obtain second current emotion semantic features corresponding to the second current text input information; and performing emotion-related voice feature extraction on the current voice input information to obtain current emotion voice features corresponding to the current voice input information. The current voice input information may first be preprocessed, for example by pre-emphasis, framing and windowing, so as to avoid the influence on its quality of factors such as aliasing, higher harmonic distortion and high-frequency components introduced by the user's vocal organs and by the equipment that collects the voice input information.
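The preprocessing just mentioned (pre-emphasis, framing and windowing) could look roughly as follows; the pre-emphasis coefficient, frame length and hop size are typical values rather than values specified by the present application.

import numpy as np

def preprocess_speech(signal: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    # Pre-emphasis: boost high frequencies to compensate for spectral tilt.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing: split the signal into short overlapping frames (25 ms frames, 10 ms hop).
    frame_len = int(0.025 * sample_rate)
    hop = int(0.010 * sample_rate)
    num_frames = max(1, 1 + (len(emphasized) - frame_len) // hop)
    pad = (num_frames - 1) * hop + frame_len - len(emphasized)
    if pad > 0:
        emphasized = np.append(emphasized, np.zeros(pad))
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] for i in range(num_frames)])
    # Windowing: apply a Hamming window to each frame to reduce spectral leakage.
    return frames * np.hamming(frame_len)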
Third, for the current image input information, when the current input information of at least two different modalities includes the current image input information, step 202 may include, for example: and performing emotion-related image feature extraction on the current image input information to obtain current emotion image features corresponding to the current image input information. When the current image input information comprises a current face image, the current emotion image characteristics corresponding to the current face image mainly refer to current face characteristics and the like; when the current image input information includes the current body posture image, the current emotion image characteristics corresponding to the current body posture image mainly refer to current human body characteristics and the like.
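Assembling the current emotion characteristics from the per-modality features described above might be sketched as follows; concatenation of fixed-length vectors is an assumption, since the application does not prescribe a particular fusion method.

import numpy as np

def build_current_emotion_features(text_semantic=None, speech_semantic=None,
                                   speech_acoustic=None, image_features=None):
    """Concatenate whichever per-modality emotion feature vectors are available."""
    parts = [vec for vec in (text_semantic, speech_semantic, speech_acoustic, image_features)
             if vec is not None]
    if len(parts) < 2:
        raise ValueError("current input information of at least two modalities is expected")
    return np.concatenate(parts)  # the current emotion characteristics used in step 203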
Step 203: and obtaining the current emotion type of the user based on the current emotion characteristics, the emotion factors and an emotion recognition model.
In the embodiment of the present application, after the current emotion characteristics and the corresponding emotion factors are obtained, in order to avoid the inaccuracy, and the likely large deviation from the true user emotion, that results from performing emotion recognition on the current emotion characteristics alone, emotion recognition needs to be performed by integrating the current emotion characteristics and the corresponding emotion factors to obtain the current emotion type of the user. In this way, not only is the user emotion represented by the current emotion characteristics analyzed, but the influence of the emotion factors on that emotion is also analyzed, thereby realizing multi-angle comprehensive emotion recognition and greatly improving the accuracy of user emotion recognition.
Performing emotion recognition by integrating the current emotion characteristics and the corresponding emotion factors actually means inputting the current emotion characteristics and the corresponding emotion factors into an emotion recognition model obtained through pre-training, and taking the recognized emotion type as the current emotion type of the user. In an optional implementation of the embodiment of the present application, the emotion recognition model is obtained by pre-training a second preset recognition network based on emotion characteristics corresponding to preset input information samples, emotion factor samples and corresponding emotion type labels.
The training of the emotion recognition model refers to the following: emotion characteristics corresponding to preset input information samples, emotion factor samples and emotion type labels are used as training data; another preset recognition network, namely the second preset recognition network, is trained with the training data so as to fully mine the association between the emotion characteristics corresponding to the preset input information samples, the emotion factor samples and the emotion type labels; after multiple iterations of training on a certain amount of such training data, the emotion recognition model can be obtained.
In specific implementation, first, the emotion characteristics corresponding to the preset input information samples and the emotion factor samples are input into the second preset recognition network, and the second preset recognition network recognizes an emotion type and outputs the predicted emotion type; then, the loss between the predicted emotion type and the emotion type label is calculated with a second preset loss function so as to adjust the parameters of the second preset recognition network; after repeated iterative training, the training of the second preset recognition network is finished when a preset number of training iterations is reached or the second preset recognition network converges, and the trained second preset recognition network is taken as the emotion recognition model. That is, in an optional implementation of the embodiment of the present application, the training step of the emotion recognition model includes:
step G: inputting the emotion characteristics corresponding to the preset input information samples and the emotion factor samples into the second preset recognition network to obtain a predicted emotion type;
step H: adjusting parameters of the second preset recognition network based on the predicted emotion type, the emotion type label and a second preset loss function;
step I: taking the trained second preset recognition network as the emotion recognition model.
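As an illustration of steps G to I, the following minimal sketch assumes (this is not specified by the present application) that the second preset recognition network is a small feed-forward classifier over the concatenation of the emotion characteristics and an emotion factor vector, and that the second preset loss function is cross-entropy; all names are illustrative.

import torch
import torch.nn as nn

class SecondPresetRecognitionNetwork(nn.Module):
    """Assumed form of the second preset recognition network: a feed-forward classifier
    over the concatenated current emotion characteristics and emotion factor vector."""
    def __init__(self, emotion_dim, factor_dim, num_emotion_types):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emotion_dim + factor_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_emotion_types),
        )

    def forward(self, emotion_features, emotion_factors):
        return self.net(torch.cat([emotion_features, emotion_factors], dim=-1))

def train_emotion_model(emotion_features, emotion_factors, emotion_labels, num_types, epochs=10):
    """emotion_features: (N, emotion_dim); emotion_factors: (N, factor_dim);
    emotion_labels: (N,) emotion type labels."""
    network = SecondPresetRecognitionNetwork(emotion_features.shape[-1],
                                             emotion_factors.shape[-1], num_types)
    loss_fn = nn.CrossEntropyLoss()  # assumed second preset loss function
    optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)
    for _ in range(epochs):
        predicted = network(emotion_features, emotion_factors)  # step G: predicted emotion type
        loss = loss_fn(predicted, emotion_labels)               # step H: loss against the emotion type label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                        # step H: adjust network parameters
    return network                                              # step I: trained network used as the model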
As an example, the obtained current input information of the user includes current text input information and current image input information, where the current text input information is "I like to eat apples" and the current image input information is a current face image; the emotion factor includes a personality type, and the personality type is extraverted. Emotion-related semantic feature extraction is performed on the current text input information "I like to eat apples" to obtain the corresponding current emotion semantic features; emotion-related image feature extraction is performed on the current image input information, namely the current face image, to obtain the corresponding current emotion image features; the current emotion semantic features and the current emotion image features form the current emotion characteristics. Compared with performing emotion recognition on the current emotion characteristics alone, which recognizes the current emotion type of the user as happy, obtaining the current emotion type of the user based on the current emotion characteristics, the personality type (extraverted) and the emotion recognition model is a multi-angle comprehensive recognition and is therefore more accurate.
As another example, with the current input information unchanged from the above example, the emotion factor includes a personality type, and the personality type is passive. Referring to the details of the above example, in the case where the current emotion type of the user obtained by performing emotion recognition only on the current emotion characteristics is happy, the current emotion type of the user obtained based on the current emotion characteristics, the personality type (passive) and the emotion recognition model is very happy.
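A hypothetical usage corresponding to the example above might look as follows; the feature extractors, the personality embedding table, the emotion label list and the trained emotion_model are assumed to exist (see the sketches above) and are named only for illustration.

import torch

# Hypothetical helpers and objects assumed to exist: extract_text_emotion_semantics,
# extract_face_emotion_features, current_face_image, personality_embeddings,
# emotion_type_names, and the trained emotion_model from the training sketch above.
current_emotion_features = build_current_emotion_features(
    text_semantic=extract_text_emotion_semantics("I like to eat apples"),
    image_features=extract_face_emotion_features(current_face_image),
)
emotion_factor_vector = personality_embeddings["extraverted"]  # personality type as an emotion factor
logits = emotion_model(
    torch.tensor(current_emotion_features, dtype=torch.float32).unsqueeze(0),
    torch.tensor(emotion_factor_vector, dtype=torch.float32).unsqueeze(0),
)
current_emotion_type = emotion_type_names[int(logits.argmax(dim=-1))]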
Through the various implementations provided by this embodiment, current input information and emotion factors of a user are first obtained; then, emotion feature extraction is performed on the current input information to obtain current emotion characteristics corresponding to the current input information; and finally, the current emotion characteristics and the emotion factors are processed by the emotion recognition model to obtain the current emotion type of the user. In this way, emotion features of the current input information of the user are extracted to obtain the current emotion characteristics, and emotion recognition is then performed on the current emotion characteristics in combination with the emotion factors of the user to obtain the current emotion type of the user. The solution not only analyzes the user emotion represented by the current emotion characteristics, but also analyzes the influence of the emotion factors on that emotion, thereby realizing multi-angle comprehensive emotion recognition and greatly improving the accuracy of user emotion recognition.
Exemplary devices
Referring to fig. 3, a schematic structural diagram of an apparatus for recognizing a user emotion in an embodiment of the present application is shown. In this embodiment, the apparatus may specifically include:
a first obtaining unit 301 for obtaining current input information and emotional factors of a user;
a second obtaining unit 302, configured to perform emotion feature extraction on the current input information, and obtain a current emotion feature corresponding to the current input information;
a third obtaining unit 303, configured to obtain a current emotion type of the user based on the current emotion feature, the emotion factor, and an emotion recognition model.
In an optional implementation manner of the embodiment of the present application, the current input information includes at least two of current text input information, current voice input information, and current image input information.
In an optional implementation manner of the embodiment of the present application, the current image input information includes a current face image and/or a current body posture image.
In an optional implementation of the embodiment of the present application, the emotion factors include a personality type and/or a historical emotion type.
In an optional implementation of the embodiment of the present application, the personality type of the user is obtained based on preset time period input information of the user and a personality recognition model; the personality recognition model is obtained by pre-training a first preset recognition network based on personality features and personality type labels corresponding to preset time period input information samples.
In an optional implementation of the embodiment of the present application, the apparatus further includes a first training unit, where the first training unit is configured to:
input the personality features corresponding to the preset time period input information samples into the first preset recognition network according to the time sequence to obtain a predicted personality type;
adjust parameters of the first preset recognition network based on the predicted personality type, the personality type label and a first preset loss function;
and take the trained first preset recognition network as the personality recognition model.
In an optional implementation of the embodiment of the present application, the apparatus further includes a fourth obtaining unit, where the fourth obtaining unit is configured to:
perform personality feature extraction on the preset time period input information of the user to obtain personality features corresponding to the preset time period input information;
determine a time sequence of the personality features corresponding to the preset time period input information;
and input the personality features corresponding to the preset time period input information into the personality recognition model according to the time sequence to obtain the personality type of the user.
In an optional implementation of the embodiment of the present application, the historical emotion type of the user is pre-stored in a graph manner and/or an embedding manner.
In an optional implementation of the embodiment of the present application, the emotion recognition model is obtained by pre-training a second preset recognition network based on emotion characteristics corresponding to preset input information samples, emotion factor samples and corresponding emotion type labels.
In an optional implementation manner of the embodiment of the present application, the apparatus further includes a second training unit, where the second training unit is configured to:
input the emotion characteristics corresponding to the preset input information samples and the emotion factor samples into the second preset recognition network to obtain a predicted emotion type;
adjust parameters of the second preset recognition network based on the predicted emotion type, the emotion type label and a second preset loss function;
and take the trained second preset recognition network as the emotion recognition model.
Through the various implementations provided by this embodiment, current input information and emotion factors of a user are first obtained; then, emotion feature extraction is performed on the current input information to obtain current emotion characteristics corresponding to the current input information; and finally, the current emotion characteristics and the emotion factors are processed by the emotion recognition model to obtain the current emotion type of the user. In this way, emotion features of the current input information of the user are extracted to obtain the current emotion characteristics, and emotion recognition is then performed on the current emotion characteristics in combination with the emotion factors of the user to obtain the current emotion type of the user. The solution not only analyzes the user emotion represented by the current emotion characteristics, but also analyzes the influence of the emotion factors on that emotion, thereby realizing multi-angle comprehensive emotion recognition and greatly improving the accuracy of user emotion recognition.
Fig. 4 is a block diagram illustrating an apparatus 400 for recognizing a user's emotion according to an exemplary embodiment. For example, the apparatus 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 4, the apparatus 400 may include one or more of the following components: processing components 402, memory 404, power components 406, multimedia components 408, audio components 410, input/output (I/O) interfaces 412, sensor components 414, and communication components 416.
The processing component 402 generally controls overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support operations at the device 400. Examples of such data include instructions for any application or method operating on the device 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply components 406 provide power to the various components of device 400. The power components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 400.
The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 400 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when apparatus 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing various aspects of status assessment for the apparatus 400. For example, the sensor component 414 can detect the open/closed state of the device 400 and the relative positioning of components, such as a display and keypad of the apparatus 400; the sensor component 414 can also detect a change in the position of the apparatus 400 or a component of the apparatus 400, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in the temperature of the apparatus 400. The sensor assembly 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 404 comprising instructions, executable by the processor 420 of the apparatus 400 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium having instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method for recognizing a user emotion, the method comprising:
obtaining current input information and emotion factors of a user;
extracting emotion characteristics of the current input information to obtain current emotion characteristics corresponding to the current input information;
and obtaining the current emotion type of the user based on the current emotion characteristics, the emotion factors and an emotion recognition model.
In an optional implementation manner of the embodiment of the present application, the current input information includes at least two of current text input information, current voice input information, and current image input information.
In an optional implementation manner of the embodiment of the present application, the current image input information includes a current face image and/or a current body posture image.
In an optional implementation of the embodiment of the present application, the emotion factors include a personality type and/or a historical emotion type.
In an optional implementation of the embodiment of the present application, the personality type of the user is obtained based on preset time period input information of the user and a personality recognition model, where the personality recognition model is obtained by pre-training a first preset recognition network based on personality features and personality type labels corresponding to preset time period input information samples.
In an optional implementation of the embodiment of the present application, the training step of the personality recognition model includes:
inputting the personality features corresponding to the preset time period input information samples into the first preset recognition network according to the time sequence to obtain a predicted personality type;
adjusting parameters of the first preset recognition network based on the predicted personality type, the personality type label and a first preset loss function;
and taking the trained first preset recognition network as the personality recognition model.
In an optional implementation of the embodiment of the present application, the obtaining of the personality type of the user includes:
performing personality feature extraction on the preset time period input information of the user to obtain personality features corresponding to the preset time period input information;
determining a time sequence of the personality features corresponding to the preset time period input information;
and inputting the personality features corresponding to the preset time period input information into the personality recognition model according to the time sequence to obtain the personality type of the user.
In an optional implementation of the embodiment of the present application, the historical emotion type of the user is pre-stored in a graph manner and/or an embedding manner.
In an optional implementation of the embodiment of the present application, the emotion recognition model is obtained by pre-training a second preset recognition network based on emotion characteristics corresponding to preset input information samples, emotion factor samples and corresponding emotion type labels.
In an optional implementation manner of the embodiment of the present application, the training step of the emotion recognition model includes:
inputting the emotion characteristics corresponding to the preset input information samples and the emotion factor samples into the second preset recognition network to obtain a predicted emotion type;
adjusting parameters of the second preset recognition network based on the predicted emotion type, the emotion type label and a second preset loss function;
and taking the trained second preset recognition network as the emotion recognition model.
Fig. 5 is a schematic structural diagram of a server in an embodiment of the present application. The server 500 may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors) and memory 532, one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. Memory 532 and storage media 530 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the server 500.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit the present application in any way. Those skilled in the art can make numerous possible variations and modifications to the disclosed embodiments, or modify them into equivalent embodiments, using the methods and technical content disclosed above, without departing from the scope of the technical solution of the present application. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present application, without departing from the content of the technical solution of the present application, still falls within the protection scope of the technical solution of the present application.