
WO2025046064A1 - A controller for a conversational system using emotion context and method for operating the same - Google Patents

A controller for a conversational system using emotion context and method for operating the same Download PDF

Info

Publication number
WO2025046064A1
Authority
WO
WIPO (PCT)
Prior art keywords
emotion
user
controller
context
estimated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2024/074266
Other languages
French (fr)
Inventor
Shanmuga Sundaram KARTHIKEYANI
Purvish KHALPADA
Arvind Devarajan SANKRUTHI
Ravisankar SWETHA SHANKAR
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Bosch Global Software Technologies Pvt Ltd
Original Assignee
Robert Bosch GmbH
Bosch Global Software Technologies Pvt Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH, Bosch Global Software Technologies Pvt Ltd filed Critical Robert Bosch GmbH
Publication of WO2025046064A1 publication Critical patent/WO2025046064A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • G10L 25/63 - Speech or voice analysis techniques specially adapted for comparison or discrimination, for estimating an emotional state


Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The conversational system (100) comprises at least one input signal (128), to detect user characteristics, selected from a group comprising speech, texts in the speech, physiological parameters, and a facial image. The controller (110) connected to the at least one means (132), and configured to estimate an emotion profile (116), by an emotion model (112), using input signals (128) from each of the at least one means (132). The emotion profile (116) comprises an estimated emotion, an intensity of the estimated emotion and a confidence score of the estimated emotion. The controller (110), characterized in that, while a speech input is available, configured to determine a context, by a context model (114), of an ongoing conversation detected in the speech input, and store the emotion profile (116) as baseline for the emotion and for the user (130) if the estimated emotion and the emotion detected in the context are same.

Description

Title of the invention:
A CONTROLLER FOR A CONVERSATIONAL SYSTEM USING EMOTION CONTEXT AND METHOD FOR OPERATING THE SAME
Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention:
[0001] The present invention relates to a controller for a conversational system using emotion context and a method for operating the same.
Background of the invention:
[0002] Emotions play a vital role in effective communication. Emotions often provide important context and insight into what a person is really saying. If emotions are not considered, there is a high chance of missing the nuances of the message. By understanding the emotions of the other person, an effective communication style can be tailored. For example, if someone is feeling stressed, a more calming and reassuring approach must be used.
[0003] The existing conversational systems do not have emotion awareness. For sure, if you say, "I am stressed," you might get a generic response on how stress is bad and what you can do. However, you must explicitly express your emotion every time, which is unlike human-to-human conversation. That context is lost, and your future communication is not influenced by it. It is as if the conversational system did not really care about your emotions.
[0004] According to patent literature WO2023113717, a smart vehicle assistant with artificial intelligence is disclosed. The invention relates to a smart car assistant with artificial intelligence designed for use in automobiles, VIP vehicles, commercial vehicles and smart home systems. The assistant recognizes the user's face, detects the user's emotional state, offers suggestions according to the detected emotion, and has a three-dimensional holographic face to create an emotional bond between the user and the vehicle. It has the ability to read news, give weather information, send e-mails, create notes and record alarms, and allows voice control of the equipment in the vehicle by speaking in Turkish or in any desired language. It makes it possible to perceive commands given in a daily and natural speaking language and to answer the questions asked, can translate between different languages, can report malfunctions that may occur in the vehicle and transmit range information audibly, and connects to the phone and allows the user to use the features of the phone.
Brief description of the accompanying drawings:
[0005] An embodiment of the disclosure is described with reference to the following accompanying drawings,
[0006] Fig. 1 illustrates a block diagram of a controller for a conversational system for a user, according to an embodiment of the present invention, and
[0007] Fig. 2 illustrates a method of operating the controller for the conversational system, according to the present invention.
Detailed description of the embodiments:
[0008] Fig. 1 illustrates a block diagram of a controller for a conversational system for a user, according to an embodiment of the present invention. The conversational system 100 facilitates contextual conversation with the user 130. The conversational system 100 comprises the controller 110 with an input interface 122 and an output interface 124. The conversational system 100 comprises at least one input signal 128, to detect user characteristics, selected from a group comprising speech, texts in the speech, physiological parameters, and a facial image. The at least one input signal 128 is received from at least one means 132 selected from a group comprising a microphone 102, an Automatic Speech Recognition (ASR) module 104 or Speech-to-Text module, a wearable device 106 and at least one camera 108. The controller 110 is connected to the at least one means 132 and configured to estimate an emotion profile 116, by an emotion model 112, using the input signals 128 from each of the at least one means 132. The emotion profile 116 comprises an estimated emotion, an intensity of the estimated emotion and a confidence score of the estimated emotion. The controller 110 is characterized in that, while a speech input is available, it is configured to determine a context, by a context model 114, of an ongoing conversation detected in the speech input, and store the emotion profile 116 as baseline for the emotion and for the user 130 if the estimated emotion and the emotion detected in the context are the same. Alternatively, while the speech input is unavailable, the controller 110 is configured to prompt the user 130, through an output means 120 (such as a speaker or display), for a response to validate the estimated emotion, and store the emotion profile 116 as baseline if validated by the user 130 with a high confidence score. Otherwise, the controller 110 stores the emotion profile 116 as baseline with the existing confidence score or a lower score.
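For illustration only, the baselining decision described in this paragraph can be sketched in Python as below. The names EmotionProfile, decide_baseline and ask_user, as well as the 0.9 and 0.5 confidence values, are assumptions made for the sketch and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class EmotionProfile:
    emotion: str       # estimated emotion, e.g. "happiness"
    intensity: float   # 0..100
    confidence: float  # 0..1

def decide_baseline(profile: EmotionProfile,
                    context_emotion: Optional[str],
                    speech_available: bool,
                    ask_user: Callable[[str], bool]) -> Tuple[bool, float]:
    """Decide whether to store the profile as baseline and with which confidence."""
    if speech_available:
        # Store as baseline only when the estimated emotion matches the
        # emotion detected in the conversational context.
        return (context_emotion == profile.emotion, profile.confidence)
    # No speech input: validate via an output means (speaker/display prompt).
    if ask_user(profile.emotion):
        return (True, 0.9)                       # illustrative "high" confidence
    return (True, min(profile.confidence, 0.5))  # existing or lower score (illustrative)

# Example: no speech available, the user confirms "happiness" when prompted.
store, conf = decide_baseline(EmotionProfile("happiness", 72.0, 0.6),
                              context_emotion=None,
                              speech_available=False,
                              ask_user=lambda emotion: True)   # -> (True, 0.9)
```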
[0009] According to an embodiment of the present invention, the conversational system 100 as explained above comprises the use of at least two input signals 128 from respective means 132, instead of at least one input signal 128, to ensure that multimodal sensor signals are considered. However, the conversational system 100 is also adaptable, usable or implementable with one input signal 128 from a respective means 132 as well. The wearable device 106 is a health monitoring device which is worn by the user 130 and has the ability to measure vitals of the user 130 such as, but not limited to, heart rate, blood pressure, oxygen level, etc. Further, the means 132 mentioned are not limited to the above list but may comprise other devices known in the art.
[0010] According to an embodiment of the present invention, the controller 110 is either for the conversational system 100 which uses the emotion profile 116 of the user 130 for emotion based contextual conversation, or the controller 110 is for the standalone emotion profiling system, in which case the output of the emotion profiling system is used by the conversational system 100 for the emotion aware conversation with the user 130.
[0011] According to another embodiment of the present invention, while the baseline emotion is already stored in a memory element 118 of the controller 110, the controller 110 is configured to normalize the estimated emotion with the baseline emotion. The controller 110 is then configured to use the normalized emotion of the user 130 and perform an action corresponding to the estimated emotion. The action comprises at least one of a response to the user 130 through an output means 120 and assisting the user 130 in a task.
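As a minimal sketch of this normalization and the follow-on action selection, assuming a simple intensity difference and an illustrative threshold of 20 (neither of which is specified in the disclosure):

```python
def normalize_emotion(estimated_intensity: float, baseline_intensity: float) -> float:
    """Deviation of the estimated intensity from the user's stored baseline (0..100 scale)."""
    return estimated_intensity - baseline_intensity

def choose_action(emotion: str, estimated_intensity: float, baseline_intensity: float) -> str:
    """Map the normalized emotion to an action (respond to the user or assist in a task)."""
    deviation = normalize_emotion(estimated_intensity, baseline_intensity)
    if emotion in ("anger", "stress") and deviation > 20:   # illustrative threshold
        return "defer non-critical prompts such as service reminders"
    return "respond normally and assist with the requested task"
```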
[0012] According to an embodiment of the present invention, the controller 110 is configured to monitor the emotion profile 116 of the user 130 for a predefined time period before the emotion profile 116 is stored as the baseline. Further, the baseline is set for each type of emotion for each user 130.
[0013] According to an embodiment of the present invention, the controller 110 is configured to adjust the weightage of each of the input signals 128 received from the respective means 132 (based on availability). The controller 110 monitors the emotion profile 116 (or the emotions from each of the at least one signal 128) estimated by the emotion model 112 in comparison to the context, determines that the context is more aligned with one of the input signals 128, and thus increases the weightage of the at least one input signal 128 which aligns with the context. Thus, the controller 110 adjusts the weightage of each of the at least one input signal 128 with higher allocation to that at least one input signal 128 which is close to the determined context. For example, if there are two input signals 128 in the conversational system 100, i.e. one microphone 102 and one camera 108, and the input signal 128 from the camera 108 estimates the emotion with higher confidence while the input signal 128 from the at least one microphone 102 is neutral, then the controller 110 allocates higher weightage to the input signal 128 from the camera 108 than to the at least one microphone 102 for estimating the emotion profile 116 for the user 130.
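A minimal sketch of such a re-weighting step follows; the adjustment step of 0.1 and the renormalization are illustrative assumptions, the disclosure only requires that weights stay between 0 and 1 (or 0% and 100%).

```python
def adjust_weights(weights: dict, per_signal_emotions: dict,
                   context_emotion: str, step: float = 0.1) -> dict:
    """Raise the weightage of input signals whose estimated emotion aligns with
    the determined context, then renormalize so the weights stay within 0..1."""
    updated = dict(weights)
    for signal, emotion in per_signal_emotions.items():
        if emotion == context_emotion:
            updated[signal] = updated[signal] + step
    total = sum(updated.values())
    return {signal: weight / total for signal, weight in updated.items()}

# Example: the camera agrees with the context ("sad"), the microphone reports neutral.
new_weights = adjust_weights({"camera": 0.5, "microphone": 0.5},
                             {"camera": "sad", "microphone": "neutral"},
                             context_emotion="sad")
# new_weights -> {"camera": ~0.545, "microphone": ~0.455}
```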
[0014] In simple words, the controller 110, over a time period, finetunes the weightage. For example, if the facial emotion is sad but the voice is neutral, and there is already an established context of the user having lost her phone, the controller 110 determines that the user 130 expresses emotion facially more than vocally. So, slightly more weightage is given to the emotion estimated from the camera 108 than to the emotion estimated from the microphone 102. Further, when the adjustment happens over multiple instances, over a period of time, the controller 110 learns the characteristic of the user's expression of emotion by gradually finetuning the weightages.
[0015] Further, each user 130 (or person) has a different medium of expressing their emotion. Hence, the equal weight given to all the means 132 is finetuned to subjective values for the user 130, based on the understanding of the baseline profile. As the emotion model 112 builds the emotion profile 116 of the user 130, the controller 110 monitors the differential emotion value. The controller 110 adjusts the weights based on the feedback response from the emotion model 112. As the emotion history is built, the conversational and assistive system uses the emotion. This allows the conversational system 100 not only to understand the conversational context better, but also to change the course of actions and conversation based on the user emotion (for example, not prompting for vehicle servicing).
[0016] According to an embodiment of the present invention, the conversational system 100 may use an always-on microphone 102 to listen to, capture or monitor the conversations within an environment. The at least one microphone 102 collects the speech data and passes it to the controller 110, where a speech module or speech processor (not shown) continuously analyzes and processes the incoming speech data. The speech module performs speech processing such as speaker diarization, recognition, tonal and emotion analysis, and speech-to-text conversion. The emotion model 112 estimates the emotion profile 116 of the user 130 based on the same, followed by formation of the baseline or execution of the action. Alternatively, the at least one microphone 102 is selectively switched ON by the user 130 before the conversation.
[0017] The speech input refers to dialogue or utterances in the environment with one or more users 130. The context model 114 uses at least one of a rule-based and a learning-based model to process the incoming text data, along with the conversation history and/or emotion history, to estimate the context of the ongoing conversation.
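For the rule-based variant, a toy sketch could look as follows; the keyword rules, the fallback to the latest emotion in the history, and the function name estimate_context are all illustrative assumptions rather than disclosed details.

```python
import re
from typing import List, Optional

# Hypothetical keyword rules; the disclosure also allows a learning-based model here.
CONTEXT_RULES = {
    r"\b(interview|exam|deadline)\b": "stress",
    r"\blost (my|her|his) phone\b": "sadness",
    r"\b(promotion|birthday|holiday)\b": "happiness",
}

def estimate_context(utterances: List[str],
                     emotion_history: List[str]) -> Optional[str]:
    """Estimate the emotional context of the ongoing conversation from recent
    utterances, falling back to the emotion history when no rule matches."""
    text = " ".join(utterances).lower()
    for pattern, emotion in CONTEXT_RULES.items():
        if re.search(pattern, text):
            return emotion
    return emotion_history[-1] if emotion_history else None

# Example: estimate_context(["I lost my phone on the bus"], []) -> "sadness"
```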
[0018] According to an embodiment of the present invention, the task is at least one of, but not limited to, scheduling a meeting, rescheduling a service of an equipment or appliance, postponing a reminder, setting the reminder, booking tickets for an event such as a movie or theater, playing a song, and the like in a smart environment. The controller 110 is configured to perform an action in relation to the determined emotion and the application domain. The application domain corresponds to the environment in which the conversational system 100 is deployed, such as the automotive domain, home, office, hospital, hospitality, etc. Further, the user 130 in the environment is not just one, but one or more users 130 who are in proximity to the at least one microphone 102.
[0019] It is important to understand some aspects of Artificial Intelligence (AI)/Machine Learning (ML) technology and AI/ML based devices/systems (such as the conversational system 100), which can be explained as follows. Depending on the architecture of the implementation, AI/ML devices/systems may include many components. One such component is an AI/ML model or AI/ML module. Different modules are described later in this disclosure. The AI/ML model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these AI/ML models and the data from these AI/ML models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI/ML models such as linear regression, naive Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI/ML module irrespective of the AI/ML model being executed. A person skilled in the art will also appreciate that the AI/ML model may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same.
[0020] Some of the typical tasks performed by AI/ML systems are classification, clustering, regression, etc. The majority of classification tasks depend upon labeled datasets; that is, the datasets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are face recognition, object identification, gesture recognition, voice recognition, etc. In a regression task, the model is trained on labeled datasets where the target labels are numeric values. Some of the typical applications of regression are weather forecasting, stock price prediction, house price estimation, energy consumption forecasting, etc. Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities.
[0021] In accordance with an embodiment of the present invention, the controller 110 is provided with the necessary signal detection, acquisition, and processing circuits. The controller 110 comprises the input interface 122 and output interface 124 having pins or ports, the memory element 118 such as Random Access Memory (RAM) and/or Read Only Memory (ROM), an Analog-to-Digital Converter (ADC) and a Digital-to-Analog Converter (DAC), clocks, timers, counters and at least one processor (capable of implementing machine learning) connected with each other and to other components through communication bus channels. The memory element 118 is pre-stored with logics or instructions or programs or applications or modules/models and/or threshold values/ranges, reference values, predefined/predetermined criteria/conditions, and predetermined lists, which is/are accessed by the at least one processor as per the defined routines. The internal components of the controller 110 are not explained for being state of the art, and the same must not be understood in a limiting manner. The controller 110 may also comprise communication units such as transceivers to communicate through wireless or wired means such as Global System for Mobile Communications (GSM), 3G, 4G, 5G, Wi-Fi, Bluetooth, Ethernet, serial networks, and the like. The controller 110 is implementable in the form of a System-in-Package (SiP) or System-on-Chip (SoC) or any other known type. Examples of the controller 110 comprise, but are not limited to, a microcontroller, microprocessor, microcomputer, etc.
[0022] Further, the processor may be implemented as any or a combination of one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored in the memory element 118 and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The processor is configured to exchange and manage the processing of various AI models.
[0023] According to an embodiment of the present invention, the controller 110 is part of at least one of an infotainment unit of the vehicle, a smartphone, a wearable device 106, or a cloud computer. Alternatively, the conversational system 100 is at least one of the infotainment unit of the vehicle, the smartphone, the wearable device, the cloud computer, a smart speaker, or a smart display and the like. In other words, the controller 110 is part of an internal device of the vehicle, or part of an external device which is connected to the vehicle through known wired or wireless means as described earlier, or an external device to be used in a non-automotive environment such as home, office, hospitals, etc. In the vehicle, the conversational system 100 can be distributed, for example with multiple cameras 108 and microphones 102 spread across a cabin of the vehicle. In the case of the cloud computer, the controller 110 is in the cloud and receives the signals from the means 132.
[0024] In accordance with an embodiment of the present invention, the controller 110 to enable conversation with emotion context with the user 130 is disclosed. A block diagram 126 illustrates the same. The controller 110 is configured to determine/receive an estimated emotion of the user 130 from an emotion history or the memory element 118, characterized in that the controller 110 is configured to perform the action corresponding to the estimated emotion. The action comprises at least one of a response to the user 130 through the output means 120 and assisting the user 130 in the task.
[0025] According to the present invention, a working of the controller 110 of the conversational system 100 is explained. Fig. 1 is an abstract view of the collection of emotion from different sources/means 132 by the conversational system 100 (or multimodal emotion understanding system). The conversational system 100 collects the weighted data of the emotion predictions, intensity, and confidence from various means 132, such as emotion detection from voice, emotion detection from vitals (or physiological parameters) through wearable devices 106, emotion detected from the text of what the user speaks, and emotion detection from the user's face through the camera 108. The emotion models 112 in the controller 110 are neural models (or other AI or ML models) that individually predict emotions from the respective input signals 128 (voice, text, image stream, etcetera). The conversational system 100 initially considers all these predictions with equal weight. The aggregated data is considered as the emotion profile 116 of the user 130 and is ready for further processing, i.e. for baselining.
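A minimal sketch of this equally-weighted multimodal aggregation is given below; the per-modality tuple layout (emotion, intensity, confidence) and the choice of the final label by weighted confidence are assumptions made for illustration, not disclosed specifics.

```python
from typing import Dict, Optional, Tuple

def aggregate_emotion_profile(
        predictions: Dict[str, Tuple[str, float, float]],
        weights: Optional[Dict[str, float]] = None) -> dict:
    """Weighted aggregation of per-modality (emotion, intensity, confidence)
    predictions into a single emotion profile; equal weights are used initially."""
    if weights is None:
        weights = {modality: 1.0 / len(predictions) for modality in predictions}
    intensity = sum(weights[m] * p[1] for m, p in predictions.items())
    confidence = sum(weights[m] * p[2] for m, p in predictions.items())
    # Choose the emotion label carrying the largest weighted confidence.
    label_scores: Dict[str, float] = {}
    for modality, (emotion, _, conf) in predictions.items():
        label_scores[emotion] = label_scores.get(emotion, 0.0) + weights[modality] * conf
    return {"emotion": max(label_scores, key=label_scores.get),
            "intensity": intensity,
            "confidence": confidence}

# Example with voice, text, face and wearable predictions, equally weighted:
profile = aggregate_emotion_profile({
    "voice":    ("happiness", 70.0, 0.8),
    "text":     ("neutral",   40.0, 0.5),
    "face":     ("happiness", 90.0, 0.9),
    "wearable": ("happiness", 60.0, 0.6),
})  # -> emotion "happiness" with averaged intensity and confidence
```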
[0026] The controller 110 checks if the user's 130 baseline profile is already established and is confident enough. If so, the controller 110 normalizes the feature matrix (or the emotion profile 116) and stores it in the user's emotion history inside the memory element 118. The emotion profile 116 is usable later for any action to be performed. If the baseline is not present, then the controller 110 checks if there is a known context which the user 130 mentioned in ongoing utterances or conversation, such as an interview or a meeting with a friend. The controller 110 stores the estimated emotion profile 116 (or emotion matrix) as the baseline value for the detected emotion. This indicates that the emotion profile 116 is estimated for each emotion type and stored under the user's emotion history in the memory element 118.
[0027] In case the context is not present and the emotion is intense (the multimodal system has marked high intensity with high confidence), the controller 110 is configured to randomly send a prompt through an output means 120 of the conversational system 100. Depending on the ongoing conversation and context, the controller 110 may or may not prompt the user 130, for example with "Hey, you seem happy! Anything special you would like to share?" In response, the user 130 might either confirm or decline the estimated emotion prompt (or choose not to respond at all), in an indirect or direct manner. If validated, the controller 110 sends the emotion back to be stored in the emotion profile 116 of the user 130. Alternatively, the controller 110 saves/stores the estimated emotion profile 116 as the baseline in the memory element 118.
[0028] According to the present invention, the technical effect of the controller 110 is envisaged with an example. A smiling person has a higher baseline emotion of happiness, let us say 55 (out of 100). Hence, if the emotion model 112 estimates the emotion as happiness with an intensity of 80, the controller 110 determines a small variation, compared to someone whose baseline emotion for happiness is 0. This process allows the controller 110 to understand the variation in subjective expression of the emotion.
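Plugging the numbers of this example into the assumed baseline-relative normalization makes the comparison explicit (illustrative arithmetic only, not a disclosed formula):

```python
# Deviation from each user's own baseline (illustrative arithmetic only).
smiling_user_deviation = 80 - 55   # 25: a small variation for the habitual smiler
neutral_user_deviation = 80 - 0    # 80: a large variation for a baseline of 0
```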
[0029] According to the present invention, the controller 110 builds a baseline emotion profile 116 from multiple means 132, understands the variation from the baseline and starts modulating the conversation or actions based on the user emotion, for example, a virtual assistant in the car not prompting for service reminders if the user 130 is stressed or angry.
[0030] Fig. 2 illustrates a method of operating the controller for the conversational system, according to the present invention. The method comprises a plurality of steps, of which a step 202 comprises receiving, by the controller 110, at least one input signal 128 from at least one means 132 for a user characteristic selected from a group comprising speech, texts in the speech, physiological parameters and a facial image. The at least one means 132 comprises at least one microphone 102, the ASR module 104, the wearable device 106, and at least one camera 108, respectively connected to the controller 110. A step 204 comprises estimating the emotion profile 116, by the emotion model 112 of the controller 110, using the input signals 128 from each of the at least one means 132. The emotion profile 116 comprises the estimated emotion, the intensity of the estimated emotion and the confidence score of the estimated emotion. The method is characterized by, while the speech input signal 128 is available, a step 206 which comprises determining the context, by the context model 114 of the controller 110, of the ongoing conversation detected in the speech input signal 128. A step 208 comprises storing, by the controller 110, the emotion profile 116 as baseline for the emotion and for the user 130 if the estimated emotion and the emotion detected in the context are the same. However, while the speech input signal 128 is unavailable, a step 210 comprises prompting, by the controller 110, the user 130 for the response to validate the estimated emotion through the output means 120. A step 212 comprises storing, by the controller 110, the emotion profile 116 as baseline if validated by the user 130 with a high confidence score; otherwise, i.e. if not validated by the user 130, the method comprises storing the estimated emotion profile 116 as the baseline with the existing score or a lower score. The method is executed by the controller 110.
[0031] According to the method, while the baseline emotion is stored in the memory element 118, the method comprises a step 214 which comprises normalizing the estimated emotion with the baseline emotion. A step 216 comprises using the normalized emotion of the user 130, and performing the action corresponding to the estimated emotion. The action comprises at least one of the response to the user 130 through the output means 120 and assisting the user 130 in the task. According to the present invention, the method also comprises monitoring the emotion profile 116 of the user 130 for the predefined time period before the emotion profile 116 is stored as the baseline.
[0032] According to the present invention, once the baseline is established/set, the method performs a step 218 (periodically) which comprises monitoring the estimated emotion of the at least one input signal 128 received from the at least one means 132 in comparison to the context, and adjusting the weightage of each of the at least one input signal 128 with higher allocation to that at least one input signal 128 which is determined to be close to the determined context. The weightage is adjusted between 0 and 1, or 0% and 100%.
[0033] According to the present invention, a method for enabling conversation with emotion context with the user 130 is disclosed. The method comprises a plurality of steps, of which a step 220 comprises determining the estimated emotion of the user 130. The method is characterized by a step 222 which comprises performing the action corresponding to the estimated emotion. The action comprises at least one of the response to the user 130 through the output means 120 and assisting the user 130 in the task.
[0034] According to an embodiment of the present invention, the conversational system 100 is preferably used for the vehicle to provide more convenience to the driver or passengers. The conversational system 100 may also be referred to as a digital companion or virtual companion, which is more than a digital assistant in the sense that the conversational system 100 is able to extract/derive and give more information for a detected or asked query. Again, as indicated above, the automatic conversational system 100 is applicable to different domains and environments such as home, office, hospital, airports, the hospitality industry and the like, and is not just limited to the vehicle.
[0035] According to the present invention, an emotion aware personal companion is provided through the controller 110 and the method. The present invention uses multiple means 132 and processes these expressions to create a multimodal emotional awareness. In other words, the controller 110 analyses speech, the user's words, the user's face, and the user's wearable to understand the user's emotional state. Every person's intensity of feeling and manner of expression is different. Someone might show a happiness intensity of 60 on their face while they are feeling 100. On the other hand, many people have a "smiling" face, and a facial emotion recognition system will always mark such a person as happy, even if the person is neutral, or sometimes angry. So, standard emotion analysis models might not work well for everyone. The present invention uses the common practice from psychology of establishing a baseline. As humans, we subconsciously profile the baseline of our near and dear ones. The present invention ticks both essential conditions. Unlike existing emotion detection systems, in addition to the emotion history, the controller 110 considers a weighted aggregation of the sensory information from the means 132, along with the baseline profile, to understand the current emotion of the user 130. On top of that, the understanding of the emotion is non-intrusive and happens without the user 130 mentioning it explicitly. Understanding the emotion as context and modulating future conversation or the course of action in a conversational and assistive system makes the conversational system 100 a personal companion which is perceivably more intelligent and, more importantly, sensible, which is a far-fetched aim for the existing conversational systems.
[0036] The personal companion seamlessly understands the user's emotion and influences the conversation to fit the situation. For example, in the evening, while going home, if the user 130 is stressed and a little furious, the personal companion avoids the service reminder; or if the car fuel drops below the reserve level but is more than sufficient for the user 130 to comfortably reach home (as per navigation information), the personal companion refrains from alerting about the low fuel. On the other hand, if the user 130 is driving fast, the personal companion understands that stress and anger may reduce reaction time, and it may prompt the user 130 about the speed in a caring way, for which it would not have prompted otherwise. State-of-the-art virtual assistants in modern cars cannot perform this. This is provided just as an example to understand the invention in a better way, and the invention is in no sense limited by the same.
[0037] It should be understood that the embodiments explained in the description above are only illustrative and do not limit the scope of this invention. Many such embodiments and other modifications and changes in the embodiment explained in the description are envisaged. The scope of the invention is only limited by the scope of the claims.

Claims

We claim:
1. A controller (110) of a conversational system (100), said conversational system (100) comprises at least one means (132) to receive input signals (128), to detect user characteristics, selected from a group comprising speech, texts in said speech, a physiological parameters and facial image, and said controller (110) connected to said at least one means (132), and configured to estimate an emotion profile (116), by an emotion model (112), using said input signals (128) from each of said at least one means (132), said emotion profile (116) comprises an estimated emotion, an intensity and confidence score, characterized in that, while a speech input is available, determine a context, by a context model (114), of an ongoing conversation detected in said speech input, and store said emotion profile (116) as baseline for said emotion and for said user (130) if said estimated emotion and the emotion detected in said context are same, and while a speech input is unavailable, prompt said user (130) for a response to validate said estimated emotion, and store said emotion profile (116) as baseline if validated by said user (130) with high confidence score.
2. The controller (110) as claimed in claim 1, wherein while a baseline emotion is stored in a memory element (118), said controller (110) configured to normalize said estimated emotion with said baseline emotion.
3. The controller (110) as claimed in claim 1 configured to monitor said emotion profile (116) of said user (130) for a predefined time period before said emotion profile (116) is stored as the baseline emotion.
4. The controller (110) as claimed in claim 1 configured to monitor said estimated emotion of said at least one input signal (128) received from said at least one means (132) in comparison to said context, and adjust weightage of each of said at least one input signal (128) with higher allocation to said at least one input signal (128) which is close to said context.
5. A controller (110) to enable conversation with emotion context with a user (130), said controller (110) configured to determine an estimated emotion of said user (130), characterized in that, and perform action corresponding to said estimated emotion, said action comprises at least one of a response to said user (130) through an output means (120) and assist said user (130) in a task.
6. A method for a conversational system (100), said method comprising the steps of: receiving at least one input signal (128) from at least one means (132) for user characteristic selected from a group comprising speech, texts in said speech, a physiological parameters and facial image, and estimating an emotion profile (116), by an emotion model (112), using said input signal (128) from each of said at least one means (132), said emotion profile (116) comprises an estimated emotion, an intensity and confidence score, characterized by, while a speech input is available, determining a context, by a context model (114), of an ongoing conversation detected in said speech input, and storing said emotion profile (116) as baseline for said emotion and for said user (130) if said estimated emotion and the emotion detected in said context are same, and while a speech input is unavailable, prompting said user (130) for a response to validate said estimated emotion, and storing said emotion profile (116) as baseline if validated by said user (130) with high confidence score.
7. The method as claimed in claim 6, while a baseline emotion is stored in a memory element (118), said method comprises normalizing said estimated emotion with said baseline emotion.
8. The method as claimed in claim 6, comprises monitoring said emotion profile (116) of said user (130) for a predefined time period before said emotion profile (116) is stored as the baseline.
9. The method as claimed in claim 6 comprises, monitoring said estimated emotion of said at least one input signal (128) received from said at least one means (132) in comparison to said context, and adjusting weightage of each of the at least one input signal (128) with higher allocation to said at least one input signal (128) which is close to said context.
10. A method for enabling conversation with emotion context with a user (130), said method comprising the steps of determining an estimated emotion of said user (130), characterized by, and performing an action corresponding to said estimated emotion, said action comprises at least one of a response to said user (130) through an output means (132) and assist said user (130) in a task.
PCT/EP2024/074266 2023-09-01 2024-08-30 A controller for a conversational system using emotion context and method for operating the same Pending WO2025046064A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202341058659 2023-09-01
IN202341058659 2023-09-01

Publications (1)

Publication Number Publication Date
WO2025046064A1 true WO2025046064A1 (en) 2025-03-06

Family

ID=92672177

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/074266 Pending WO2025046064A1 (en) 2023-09-01 2024-08-30 A controller for a conversational system using emotion context and method for operating the same

Country Status (1)

Country Link
WO (1) WO2025046064A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120245934A1 (en) * 2011-03-25 2012-09-27 General Motors Llc Speech recognition dependent on text message content
US20190378515A1 (en) * 2018-06-12 2019-12-12 Hyundai Motor Company Dialogue system, vehicle and method for controlling the vehicle
US20200279553A1 (en) * 2019-02-28 2020-09-03 Microsoft Technology Licensing, Llc Linguistic style matching agent
WO2023113717A1 (en) 2021-12-13 2023-06-22 Di̇zaynvi̇p Teknoloji̇ Bi̇li̇şi̇m Ve Otomoti̇v Sanayi̇ Anoni̇m Şi̇rketi̇ Smart vehicle assistant with artificial intelligence



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24765552

Country of ref document: EP

Kind code of ref document: A1