
DK178068B1 - Mood based recommendation - Google Patents

Mood based recommendation

Info

Publication number
DK178068B1
Authority
DK
Denmark
Prior art keywords
mood
emotion
vector
user
formula
Prior art date
Application number
DK201400147A
Other languages
Danish (da)
Inventor
Sven Ewan Shepstone
Original Assignee
Bang & Olufsen As
Priority date
Filing date
Publication date
Application filed by Bang & Olufsen As
Priority to DK201400147A
Application granted
Publication of DK178068B1


Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

A method and a multimedia rendering apparatus, where selection of media files is related to the mood of a user, and where the mood is detected from spoken words given by the user. Allowing the user to take part in the recommendation process makes it possible to compute an affective offset personal to each user, to be used in their future recommendation sessions.

Description

Mood Based Recommendation

Technical Field

This invention relates to control and use of multimedia rendering systems, specifically for browsing and selecting media files in a more user-friendly mode of operation.

Background of the invention

Even with the steady increase of on-demand services such as Netflix and HBO, broadcast TV is still firmly entrenched in the home. It is typically the place where local news and programming are to be found, and something many consumers would be reluctant to part with. It is easy to use: turn on the TV, find a channel and watch. Since the consumer does not take part in the selection of the program lineup, recommendations can be serendipitous, something that customers value. On the provider side there have been substantial investments in satellite, terrestrial and cable networks, and providers want to see the best return on investment. Thus, it is anticipated that broadcast TV will not be going away any time soon.

The Electronic Program Guide (EPG) is today an integrated part of most home television sets and set top boxes, and is still the predominant method when it comes to navigating both currently-showing and up-and-coming TV programs in the broadcast realm. It is typically presented in a grid-like fashion, with channels down and programs across the grid. However, while the EPG does provide the consumer some assistance, there can still be an overwhelming amount of content to choose from. Not only is the currently-airing program of interest, but also future programs that the consumer may wish to record or be reminded about. To illustrate: over a three-hour period with 30 available channels and a program length of 30 minutes (typical during prime-time viewing), there are 180 programs to choose from, making an informed decision difficult.

In order to recommend something personal, a user profile is needed. The user profile data can be collected explicitly, e.g. by requesting users to supply data, or implicitly, through usage patterns. Matching the user profile to the potentially recommendable content of interest can take place on two levels. At the cognitive level, semantic information such as content descriptors, for example genre or user ratings, is utilized. The affective level, on the other hand, deals with the emotional context of the user, and how this relates to the content.

One area that has received little attention, in the context of recommending content within the EPG framework, is using the user's direct audio environment to extract profile information that can be used to make recommendations. State-of-the-art speaker recognition methods have made it substantially more feasible to extract information about users, such as their age, gender or emotions, using models built on a text-independent speaker recognition framework.

The present invention proposes a novel framework that takes users' audio-derived moods into account to provide the most relevant TV channel recommendation to them. A state-of-the-art audio classifier classifies users' speech into individual emotions, which ultimately contribute to their mood. Since, for a given mood, two separate users might have different ideas of what would be applicable to watch, they are not expected to find the initially recommended item immediately appealing. Users are therefore given the possibility to critique the item by navigating the emotion space of all candidate items to find a more suitable item, should they wish to do so. To quantify the difference between the initial item and the finally selected item, we model what is called the affective offset between the items. The novelty lies in leveraging this affective offset to provide system adjustments in such a way that future recommendations are more tailored to the individual.

Prior art

With reference to Yazid Attabi et al., the use of i-vectors is a general method applicable in speaker recognition, which allows for an easy relabeling process since the background part of the i-vector system does not need to be retrained, which saves time.

With reference to US2013/0132988, we agree that the mere art of condensing emotions to moods is common knowledge.

The new method disclosed by the present invention, of how the emotions are condensed to a mood profile over time, is however a contribution.

With reference to US2013/0132988, the mere art of deriving a valence arousal parameter is by conventional means. As an example of deriving a valence arousal parameter, patent US2013/0132988 proposes to use regression to obtain parameters from audio data, as known (Projection of Acoustic Features to continuous valence-arousal mood labels via regression, E. M. Schmidt, 2009).

The present invention discloses a new method which is not regression based, but assumes discrete emotion categories and their fixed placement in the emotion space. It uses probabilities from emotion detection to condense the incoming mood of a user, obtained over a period of time, into a vector in emotion space, which is then used to identify a potential item for recommendation.

With respect to US2013/0132988, the patent refers to using distance from the valence arousal parameter to identify a set of items, where the number of items depends on a user-specified range setting.

The present invention discloses a new method concerned with measuring the distance from a directional mood vector in valence arousal space to a single content item.

The invention introduces a new method of controlling the selection of media files provided to the user; a method by which the mood of the user is detected and calculated from spoken words said by the user. It is not specific system control commands that the user gives; it is the mood as such, represented in whatever the user speaks, that is detected and applied to control the system.

Applied terms and definitions:
• Content items have been pre-annotated in the valence arousal (VA) space.
• A recent archive of clean speech data is available for each person (as might be collected through microphones, telephone data, cellular data, etc.) from which audio parameters can be extracted.
• Valence: degree of pleasantness/unpleasantness of an experienced emotion.
• Arousal: degree of activation or intensity of an experienced emotion.
• A valence arousal mood parameter is also called a directional mood vector.
• I-vectors are low-dimensional, fixed-length features extracted from audio utterances. They are currently the state of the art in speaker recognition and have been shown to perform well. Regardless of the length of the audio utterance, the i-vector will always have a fixed length. I-vectors are extracted from audio utterances before any classification or modeling is done on them.
• Circumplex Model of Affect: psychological model stating that emotions exhibit a structured ordering along the periphery of a circle in the valence arousal space.
• The mood offset is also called the affective offset.
• SVM stands for "Support Vector Machine". It can be used to build a classifier from the i-vectors (one emotion per class).
• PLDA stands for "Probabilistic Linear Discriminant Analysis" and is used to enhance models by modelling noise and other nuisance effects.
• PHP stands for "PHP: Hypertext Preprocessor".

The present invention introduces several new technologies, and in summary these are:
• The concept of affective offset is disclosed, which is the difference between a user's perceived affective state and the affective annotation of the content they wish to see.

It is described how this affective offset can be used within a framework for providing recommendations for e.g. TV programs.
• First, a user's mood profile is determined using 12-class audio-based emotion classification. An initial TV content item is then displayed to the user based on the extracted mood profile.
• The user has the option to either accept the recommendation, or to critique the item once or several times, by navigating the emotion space to request an alternative match. The final match is then compared to the initial match, in terms of the difference in the items' affective parameterization.
• This offset is then utilized in future recommendation sessions.

Thus, a first aspect of the invention is: a method for identifying the mood of a person through audio parameters, and using this mood to enhance both current and future recommendation of content items, the method characterized as:
• the vector of all emotion category scores is termed the emotion profile, according to formula 1;
• individual emotion profiles, collected at different points in time, are summed to give a mood profile, according to formula 2;
• the number of components in the mood profile is equal to the total number of emotion categories;
• each emotion category score in the mood profile is converted to a vector, whose direction is indicative of the actual emotion itself, and whose magnitude is proportional to the confidence score for that emotion, according to formula 5;
• having the same number of vectors in the emotion space as the number of predefined emotion categories;
• all vectors are summed to obtain a single directional mood vector that encapsulates the overall mood, according to formula 6;
• emotion categories that received a low confidence score in the mood profile will play a small part in the makeup of the directional mood vector;
• the directional mood vector is used to identify an initial item for recommendation, by choosing the item with the shortest Euclidean distance from this vector.
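Formulas 1, 2, 5 and 6 are not reproduced in this excerpt, but the aggregation described above can be sketched as follows. This is a hedged illustration only: the circumplex angles, confidence scores and item annotations below are invented for the example and are not the patent's empirical values.

```python
import math

# Hypothetical circumplex placements (degrees counterclockwise from the
# positive valence axis); the patent's empirical angles are not given here.
EMOTION_ANGLES = {"happy": 10.0, "excited": 50.0, "angry": 170.0,
                  "sad": 210.0, "calm": 320.0}

def directional_mood_vector(mood_profile):
    """Formulas 5-6, sketched: each emotion score becomes a vector whose
    direction is the emotion's circumplex angle and whose magnitude is the
    accumulated confidence score; the vectors are then summed."""
    vx = sum(s * math.cos(math.radians(EMOTION_ANGLES[e]))
             for e, s in mood_profile.items())
    vy = sum(s * math.sin(math.radians(EMOTION_ANGLES[e]))
             for e, s in mood_profile.items())
    return (vx, vy)

def recommend(mood_profile, items):
    """Choose the item with the shortest Euclidean distance from the
    directional mood vector in valence arousal space."""
    v = directional_mood_vector(mood_profile)
    return min(items, key=lambda name: math.dist(v, items[name]))

# A mood profile dominated by "calm" pulls the vector toward positive
# valence and negative arousal; low-confidence emotions contribute little.
profile = {"happy": 0.2, "excited": 0.1, "angry": 0.05, "sad": 0.15, "calm": 0.5}
items = {"news": (0.0, 0.8), "comedy": (0.6, 0.4), "nature doc": (0.4, -0.3)}
print(recommend(profile, items))  # nature doc
```

Note how the low-confidence "angry" score barely moves the summed vector, exactly the behavior the claim describes for weakly detected emotions.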

Overview of figures

Figure 1 illustrates the system functionality with a TV EPG as an example.

Figure 2 illustrates the emotion labels.

Figure 3 illustrates different modes of system operation.

Figure 4 illustrates a system control concept applying the invention.

Description

Moods cannot be measured in the same way as emotions. Emotions are more transient in nature; they give rise to moods, and certain emotions, such as anger, can cause one to be in a bad mood for a longer period. The more frequently occurring or dominant an emotion is, the greater its ultimate bearing on the person's mood. A person with a propensity for being in a bad mood might more easily be triggered into becoming angry; it is generally understood that moods last longer than emotions.

Since mood cannot be measured directly, emotions are evaluated to see how they might be used to determine an entry mood for the system. The emotional state of a TV viewer can be acquired either explicitly or implicitly. Due to certain problems seen with explicit acquisition of emotions, it has been chosen that they be collected implicitly. Of all the induction methodologies available for obtaining the emotional state, speech is the cheapest and most non-intrusive method.

For the modeling of emotions, the most prominent model used is the dimensional approach, which is based on separate valence, arousal and dominance dimensions, where any emotional state can be represented as a linear combination of these three basic dimensions. In the valence arousal dimension space, valence is more commonly referred to as the pleasantness of the emotion, whereas arousal refers to the actual intensity.

In this invention the well-known dimensional model known as the Circumplex Model of Affect is applied. It is also based on valence and arousal, and shows how emotional states exhibit a very particular ordering around the periphery of a circle. Emotions that are close in nature are adjacent to one another, whereas bipolar emotions are situated on opposite sides of the circle. Furthermore, the emotional states are not distributed evenly around the circle, and some states lie closer to each other on the circumplex than others do. The location of these affective states has been determined using empirical data from psychological studies. The location of each emotion is expressed in degrees going counterclockwise around the circle, starting at 0° from the positive valence axis.

The Circumplex Model of Affect has the advantage of treating emotion categories as single points around a circle while at the same time giving a sense of location, and ordering, for the emotions. Furthermore, since emotion points are relative to the valence arousal axes, the model gives an easy interpretation of what happens when the valence arousal axes are shifted, or tilted. All this will help to relate the emotion categories to the valence arousal space shortly.
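The circumplex geometry just described can be made concrete in a few lines. This is a hedged sketch: converting an emotion's angle to valence arousal coordinates, and showing that tilting the axes amounts to a uniform angular shift of every emotion point. The 70° angle used is illustrative, not an empirical value from the patent.

```python
import math

def emotion_to_va(angle_deg):
    """Unit-circle valence arousal coordinates of an emotion, with the
    angle measured counterclockwise from the positive valence axis."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def tilt_axes(angle_deg, tilt_deg):
    """Tilting the valence arousal axes by tilt_deg is equivalent to
    shifting every emotion's angle by -tilt_deg."""
    return emotion_to_va(angle_deg - tilt_deg)

# A hypothetical emotion at 70 degrees: high arousal, mildly positive valence.
v, a = emotion_to_va(70)
print(round(v, 2), round(a, 2))  # 0.34 0.94
```

The second function is the property the text points at: axis shifts need no re-annotation of emotions, only an angular offset.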

In the emotion classification applied to speech, utterances are generally assigned fixed labels, such as the emotions anger, disgust, fear, happiness, sadness and surprise. Emotion corpora typically contain either acted speech or spontaneous speech assigned to fixed emotion labels.

After any necessary pre-processing of the speech signal has taken place, low-level feature descriptors are extracted, from which an appropriate model is constructed. Many parameters are used to detect emotion, including mel-frequency cepstral coefficients (MFCCs), as well as pitch, intensity, formants and even zero-crossing rate. Furthermore, the modeling can be based on either fixed-length features or variable-length features.
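Two of the simpler low-level descriptors named above, zero-crossing rate and intensity, can be sketched directly on raw sample frames (MFCCs, pitch and formants need a DSP library and are omitted); the frames below are toy data:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def intensity(frame):
    """Mean squared amplitude of the frame: a crude energy measure."""
    return sum(x * x for x in frame) / len(frame)

# A rapidly alternating frame has the maximum ZCR; a constant one has zero.
print(zero_crossing_rate([1, -1, 1, -1, 1]))  # 1.0
print(zero_crossing_rate([1, 1, 1, 1, 1]))    # 0.0
```

In a real front end these would be computed per short overlapping window and stacked into the feature vectors the classifier consumes.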

Emotions are represented in the i-vector model, where each utterance is expressed as a low-dimensional i-vector (usually between 10 and 300 dimensions). One of the advantages of modeling in the i-vector space is that i-vectors themselves are trained on unsupervised data, without any knowledge of emotion classes. What this essentially means is that when emotion classes are enrolled, a more traditional classifier, such as SVM or PLDA, can be used, allowing for very quick enrollment of the users' emotional data. This is particularly advantageous when a large amount of background data is needed to increase the performance of the classification.
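Enrollment and classification on fixed-length vectors can be sketched as follows. This is a hedged stand-in: a cosine-similarity nearest-centroid scorer replaces the SVM or PLDA back-end named above, and the three-dimensional "i-vectors" are toy data, not real extractions.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def enroll(labelled_ivectors):
    """Enrollment is cheap: average the pre-extracted i-vectors per emotion
    class; the i-vector extractor itself is never retrained."""
    return {label: centroid(vecs) for label, vecs in labelled_ivectors.items()}

def classify(model, ivector):
    """Score against each class centroid; an SVM or PLDA scorer would
    replace this cosine comparison in a real system."""
    return max(model, key=lambda label: cosine(model[label], ivector))

# Toy 3-dimensional "i-vectors" (real ones are 10-300 dimensional).
data = {"angry": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        "happy": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]}
model = enroll(data)
print(classify(model, [0.85, 0.15, 0.05]))  # angry
```

The point the text makes survives the simplification: because the vectors are extracted unsupervised, adding or relabeling an emotion class touches only the cheap `enroll` step.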

There is a strong basis for providing recommendation of EPG items by quickly singling out the most relevant channel. A more flexible selection system is required for the benefit of the user.

In a typical case-based recommender system, which is just one of the main types of knowledge-based systems, a process known as critiquing is used in the following manner:
• The consumer's preference is extracted, either explicitly or implicitly.
• Using some sort of similarity metric, the system provides an initial recommendation.
• The consumer either accepts the recommendation, which ends the entire process, or critiques it, by selecting one of the critique options available.
• For each critique made, the item space is narrowed down by filtering out the unwanted items, and a new recommendation is made.
• The process continues until the customer finally selects an item.

The present invention makes use of critiquing to allow navigation of items in the valence arousal space, and to gather the feedback needed for computing the affective offset. Allowing users themselves to take part in the recommendation process gives feedback on how the user's perceived affective state differs from their desired state, and on what they really would like to watch.

Thus a second aspect of the invention is: the user can navigate through alternative media files by:
a. selecting a first item, or browsing to find an alternative second item by means of critiquing;
b. for each browsing iteration, alternatives are found by specifying constraints, whereby the space of all items is reduced according to the constraint specified;
c. for each browsing iteration, the most applicable item is found by minimizing the distance between the currently recommended item and the set of items bound by the new constraint;
d. this step is repeated until a suitable item is found.
A preferred embodiment of the invention is described in the following.
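One browsing iteration (steps b and c above) can be sketched as a filter followed by a distance minimization. The catalogue, its valence arousal annotations, and the "more positive" constraint are all hypothetical examples:

```python
import math

def critique_step(current, candidates, constraint):
    """One browsing iteration: reduce the item space by the user's
    constraint, then pick the remaining item closest (in VA space) to the
    currently recommended item. Accepting an item ends the loop."""
    remaining = {name: va for name, va in candidates.items() if constraint(va)}
    if not remaining:
        return current, candidates  # constraint filtered everything out
    best = min(remaining, key=lambda n: math.dist(candidates[current], remaining[n]))
    return best, remaining

# Hypothetical catalogue annotated as (valence, arousal) pairs.
items = {"thriller": (-0.2, 0.8), "comedy": (0.6, 0.6), "drama": (0.1, -0.1)}
# User critique: "something more positive" -> constrain valence upward.
current, items = critique_step("thriller", items, lambda va: va[0] > 0.0)
print(current)  # comedy
```

Each call both narrows the candidate set and moves the recommendation the shortest possible distance, which is what keeps successive critiques feeling incremental to the user.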

Figure 1 illustrates the system functionality with a TV EPG as an example, but any media file in a playlist, or any individual media file, may be addressed, the files having related mood attributes. A typical system operation can be realized as follows:

Once the user's mood has been detected from audio-based parameters, the closest item matching the user's mood profile is displayed to the user. The user can either accept the item or request the recommendation of a new item. To be able to make a new recommendation, the user provides information on how the system should constrain its search. The process continues until the user finally accepts an item.

After the recommendation process has completed, the system calculates the affective offset between the initially recommended item and the finally selected item, and takes this into account when processing the output labels from the classification stage, in such a way as to reflect the new mood offset.

Mood Detection

Mood detection is the first step of the process. Since it is emotions themselves that are detectable and give rise to moods, emotion detection proceeds as follows. Let E be the total number of emotion classes. Emotions can then be detected by analyzing the speech utterances from each user and assigning an emotion class e ∈ E to each. The more classes that need to be classified, the lower the classification accuracy. What this entails is that, for a set of utterances over a time interval for which the actual emotion was e_a and the predicted emotion is e_p, there will almost always exist a subset of these utterances where e_a ≠ e_p, i.e. utterances for which the actual class was not predicted correctly. What is important here is not so much that each emotion is categorized 100% correctly, but that the areas of the emotion space, and hence adjacent emotions, that were detected are reflected in the profile. With this in mind, the emotion profile for a single user u can be modeled by:

$$e^u = (p_1, p_2, \ldots, p_E) \qquad \text{(Formula 1)}$$

where p_j, 0 ≤ p_j ≤ 1, simply represents the actual predicted probability for emotion class j, 1 ≤ j ≤ E, and

$$\sum_{j=1}^{E} p_j = 1$$

Over a sequence of time intervals, e.g. over the last 12 hours, the system collects the individual emotion profiles, and condenses them to a mood profile.

$$m^u = \sum_{i=1}^{T} w_i \, e^u_i$$

where e^u_i = the emotion profile e^u corresponding to the i-th time interval, T = the total number of discrete time intervals, and w_i = the weighting for the i-th time interval.

To compute the weighting, a modified form of the depreciated citation count method is used, where earlier items contribute less to a given score. The significance of applying such a weighting in this context is to allow more recent emotions to be expressed more strongly in the mood profile m^u.

The weighting is thus given by the following:

[Equation image in the original: the definition of the weights w_i, in which more recent time intervals receive greater weight.]
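The two steps above can be sketched in Python (not part of the original disclosure). The exact depreciated-count weighting used here, with weights proportional to the interval index and summing to 1, is an assumption, since the precise form of the weighting is not fixed by the surrounding text:

```python
def emotion_profile(scores):
    """Normalize raw classifier scores so that each component lies in
    [0, 1] and all components sum to 1 (one probability per emotion class)."""
    total = sum(scores)
    return [s / total for s in scores]

def mood_profile(profiles):
    """Condense per-interval emotion profiles into a single mood profile.
    Assumed weighting: w_i proportional to i, so later (more recent)
    intervals are expressed more strongly, and the weights sum to 1."""
    T = len(profiles)
    denom = T * (T + 1) / 2
    weights = [(i + 1) / denom for i in range(T)]
    E = len(profiles[0])
    return [sum(w * p[j] for w, p in zip(weights, profiles)) for j in range(E)]

# Two intervals, two emotion classes: the later interval dominates,
# since its weight (2/3) exceeds that of the earlier one (1/3).
profiles = [emotion_profile([1.0, 0.0]), emotion_profile([0.0, 1.0])]
mood = mood_profile(profiles)
```

The normalization reflects the constraint that the emotion-profile components sum to one; any recency-decaying weight scheme could be substituted for the assumed one.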

Determination of entry item in VA space

Now for a given fixed set of emotions, each emotion can be characterized by associating it with an affective location (angle in degrees) around a circle. There is a mapping from each emotion category to its corresponding angle. More formally, this set of emotions can be expressed in the following way:

$$\Theta = (\theta_1, \theta_2, \ldots, \theta_E)$$

To map the mood profile, which was introduced in the previous section, to a point in VA space, we introduce the concept of a directional mood vector.

Each component of both Θ and m^u is associated with a separate emotion. Therefore, for each emotion j, 1 ≤ j ≤ E, we create a new vector Mood_va^j with magnitude m^u_j and angle θ_j, with the angle measured in degrees from the positive valence axis in the VA space:

$$Mood_{va}^{\,j} = m^u_j \, (\cos\theta_j, \; \sin\theta_j)$$

This results in E separate emotion vectors, where the angle of each serves as identification for an emotion and the magnitude indicates the confidence of that emotion, as detected by the audio classifier. Finally, all E components are summed to obtain the final directional mood vector. More formally, this is denoted as:

$$Mood_{va} = \sum_{j=1}^{E} Mood_{va}^{\,j}$$
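The construction of the directional mood vector can be sketched as follows; the angle values in the example are hypothetical, since the actual affective locations are taken from the literature:

```python
import math

def directional_mood_vector(mood, angles_deg):
    """Sum E per-emotion vectors: component j has magnitude mood[j]
    (the classifier confidence) and angle angles_deg[j], measured in
    degrees from the positive valence axis of the VA space."""
    valence = sum(m * math.cos(math.radians(t))
                  for m, t in zip(mood, angles_deg))
    arousal = sum(m * math.sin(math.radians(t))
                  for m, t in zip(mood, angles_deg))
    return (valence, arousal)

# A mood dominated by a single emotion located at 90 degrees
# (neutral valence, high arousal): the resulting vector points
# straight up the arousal axis.
v, a = directional_mood_vector([1.0], [90.0])
```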

In order to find an appropriate entry item, which forms the first stage of the recommendation process, we need to associate the directional mood vector with a suitable point in VA space. The content items, being these points, are clustered beforehand into E separate partitions, with the cluster means given as follows:

$$\mu(\Phi_1), \; \mu(\Phi_2), \; \ldots, \; \mu(\Phi_E)$$

The cluster k, 1 ≤ k ≤ E, to which the directional mood vector should be associated is then the one where the Euclidean distance between its mean μ(Φ_k) and the directional mood vector Mood_va is shortest. The first item to be recommended, or entry item, is then simply the item that is closest to the mean of that cluster.
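A minimal sketch of this entry-item selection, assuming items and cluster means are represented as (valence, arousal) pairs:

```python
import math

def euclid(p, q):
    """Standard Euclidean distance between two VA points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def entry_item(mood_vec, cluster_means, cluster_items):
    """Choose the cluster whose mean is nearest the directional mood
    vector, then return the item in that cluster nearest its mean."""
    k = min(range(len(cluster_means)),
            key=lambda i: euclid(cluster_means[i], mood_vec))
    return min(cluster_items[k], key=lambda it: euclid(it, cluster_means[k]))

# A mood vector at (4, 4) is nearest the second cluster mean (5, 5),
# whose nearest member becomes the entry item.
means = [(0.0, 0.0), (5.0, 5.0)]
items = [[(0.1, 0.0), (1.0, 1.0)], [(5.0, 5.2), (4.0, 4.0)]]
best = entry_item((4.0, 4.0), means, items)
```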

Thus, a third aspect of the invention is:
• the mood profile is converted to a valence-arousal mood parameter, whose direction is indicative of the pervasive detected mood of the user and whose magnitude is proportional to the confidence of this mood; and
• with the mood parameter as input, a single content item is presented to the user, by finding the cluster with the shortest distance to the mood parameter and extracting a first item, being an audio/video object, from this cluster; and
• the mood offset between the finally selected item and the initially recommended item is computed; and
• the mood offset is applied in future recommendations for quick access to the desired region of the VA space.

Critiquing Stage

At this stage the user has the opportunity to examine the entry item. If he or she decides not to accept the item, a critique is specified for the next item. The possible critiques are more pleasant, less pleasant, more intense and less intense. These correspond to the affective operations more valence, less valence, more arousal and less arousal, respectively. The algorithm determines beforehand whether there are items available to satisfy each potential constraint. If this condition is not satisfied, the constraint is simply not presented. Although it is possible to implement compound constraints, due to the low dimensionality of the number of free parameters available (only four), we have opted for simple constraints only in this invention.

Once the user has selected a constraint, the best matching item is determined and displayed in the following way: for a given iteration r, let S be the set of items subject to the new constraint C_r. The next item to be recommended is then the item with the shortest distance between the currently displayed item item_c, i.e. the last recommended item, and all other items subject to the constraint, and is given as:

$$item_{r+1} = \underset{item' \in S}{\operatorname{arg\,min}} \; d(item_c, item')$$

where the distance d(item_c, item') is a weighted form of the standard Euclidean distance in VA space:

$$d(item_c, item') = \sqrt{w_v\,(v_c - v')^2 + w_a\,(a_c - a')^2}$$

One of the problems with using the standard Euclidean distance is that it is based on pure distance, and no consideration is given to the direction in which the user really wishes to traverse the space. The weights w_v and w_a are therefore introduced and chosen empirically to ensure that more preference is given to either the valence or the arousal dimension, depending on which constraint was chosen. This allows a larger distance in the desired direction to be taken into consideration than would otherwise be the case, and results in a more direct path.
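The critique-constrained selection can be sketched as follows. The weight values match those stated in the example data later in the text (w_v = 0.45, w_a = 1 for valence critiques, and vice versa for arousal critiques); the representation of items as (valence, arousal) pairs is an assumption:

```python
import math

# Empirical weights from the text: de-weighting the critiqued dimension
# lets the search take larger steps in the desired direction.
WEIGHTS = {"more_pleasant": (0.45, 1.0), "less_pleasant": (0.45, 1.0),
           "more_intense": (1.0, 0.45), "less_intense": (1.0, 0.45)}

def weighted_distance(cur, other, wv, wa):
    """Weighted Euclidean distance in VA space."""
    return math.sqrt(wv * (cur[0] - other[0]) ** 2 +
                     wa * (cur[1] - other[1]) ** 2)

def next_item(cur, items, critique):
    """Constrain the item set according to the critique, then pick the
    item with the shortest weighted distance from the current item.
    Returns None when no item satisfies the constraint."""
    wv, wa = WEIGHTS[critique]
    constraints = {"more_pleasant": lambda it: it[0] > cur[0],
                   "less_pleasant": lambda it: it[0] < cur[0],
                   "more_intense":  lambda it: it[1] > cur[1],
                   "less_intense":  lambda it: it[1] < cur[1]}
    s = [it for it in items if constraints[critique](it)]
    if not s:
        return None
    return min(s, key=lambda it: weighted_distance(cur, it, wv, wa))

# From (0, 0), "more pleasant" restricts the search to items with
# higher valence, and the weighting favors movement along valence.
choice = next_item((0.0, 0.0), [(1.0, 0.0), (0.5, 2.0), (-1.0, 0.0)],
                   "more_pleasant")
```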

The recommendation process continues until the user selects an item as acceptable, in which case it is terminated.

Affective Offset Determination

Once the recommendation process has completed, the user will be located at another point in VA space. How far this point is located from the initial recommendation depends both on the number of cycles taken and on the overall affective bearing the user took. In order to know how far off the user is from the initial recommendation, we now compute the affective offset. This offset will then be taken into account in future recommendation sessions to offset the user's mood profile (the perceived mood) against the recommended content (which relates to the desired mood).

Let A be the vector passing through the origin and the initial point where the user set out from, and let B be the vector passing through the origin and the point representing the finally selected item. Then the angle of this offset, in degrees, is computed as:

$$Offset_{Angle} = \arccos\big(CosSim(A, B)\big)$$

where CosSim(A, B) is the cosine similarity between A and B, given by:

$$CosSim(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

and arccos(x) = Θ gives Θ in degrees and not radians.
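The offset-angle computation can be sketched as follows (the clamp guards against floating-point rounding pushing the cosine similarity marginally outside [−1, 1]):

```python
import math

def offset_angle(A, B):
    """Angle in degrees between vectors A and B through the origin,
    obtained from their cosine similarity."""
    dot = A[0] * B[0] + A[1] * B[1]
    cos_sim = dot / (math.hypot(A[0], A[1]) * math.hypot(B[0], B[1]))
    # Clamp before arccos, then convert radians to degrees.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sim))))

theta = offset_angle((1.0, 0.0), (0.0, 1.0))  # perpendicular vectors
```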

However, not only is the angle important here, but also the direction (on the emotion circumplex) of B relative to A. If future recommendation rounds were to offset the emotions in the wrong direction, then instead of compensating for the mismatch between the detected mood and the recommended item, the system would effectively be contributing to the error rather than reducing it. Therefore it is determined whether this direction is clockwise or counter-clockwise. To do this, first compute the absolute angle of both A and B. The absolute angle of a vector through the origin (from the positive valence axis) to a given point P, P = A, P = B, is computed in the following way:

$$Angle_P = \operatorname{mod}\big(\arctan(y_P,\, x_P),\; 360\big)$$

where arctan(y, x) = Θ gives Θ in degrees (−180 < Θ ≤ 180), and mod(Θ, 360) = Θ gives Θ in degrees (0 ≤ Θ < 360).

Depending on the location of A and B, two possible angles can be computed:

$$Diff_c = \operatorname{mod}(Angle_A - Angle_B,\; 360), \qquad Diff_{cc} = \operatorname{mod}(Angle_B - Angle_A,\; 360)$$

where

Diff_c = the angle computed when B is located clockwise relative to A, and Diff_cc = the angle computed when B is located counterclockwise relative to A.

If Diff_c = Offset_Angle, then this indicates that the offset occurs in the clockwise direction and Offset_sign = −1. Likewise, if Diff_cc = Offset_Angle, then Offset_sign = 1. The sign is combined with the previously computed offset angle to give the directional offset:

$$Offset = Offset_{sign} \times Offset_{Angle}$$
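The direction test and the resulting signed offset can be sketched as follows:

```python
import math

def absolute_angle(P):
    """Angle of point P measured from the positive valence axis,
    normalized into [0, 360) degrees."""
    return math.degrees(math.atan2(P[1], P[0])) % 360.0

def directional_offset(A, B):
    """Signed offset angle: positive when B lies counter-clockwise of A,
    negative when it lies clockwise, with magnitude equal to the
    smaller angular difference between the two vectors."""
    diff_cc = (absolute_angle(B) - absolute_angle(A)) % 360.0
    diff_c = (absolute_angle(A) - absolute_angle(B)) % 360.0
    # Whichever difference is smaller is the offset angle; its
    # direction determines the sign.
    return diff_cc if diff_cc <= diff_c else -diff_c

off = directional_offset((1.0, 0.0), (0.0, 1.0))  # counter-clockwise
```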

Relabeling stage

When training the 12-class emotion classifier, labels indicate the emotion that each audio utterance is associated with. With respect to the labeling of emotions, there is no concept of distance or overlap between labels; the labels themselves are simply emotional categories. However, these concepts do hold for the emotional spaces themselves, and where the spaces move, the labels will move with them.

Figure 2 illustrates the emotion labels. Tilting the valence and arousal axes by Θ imposes a new ordering of the labels.

Labels before tilt: Amusement, Joy, Interest, ..., Pride, Pleasure.

Labels after tilt: Cold Anger, Hot Anger, ..., Joy, Interest.

For a given configuration, starting from 0°, there exists an explicit fixed ordering of the emotion labels. Tilting the valence and arousal axes by Θ, which happens to be the affective offset calculated in the previous stage, effectively changes the ordering of the labels. An important design consideration was whether to rotate the directional mood vector, as computed in equation 6, or to rotate the labels themselves. Rotating the speech labels themselves allows for the possibility of incorporating future enrollment data, for example as might be retrieved through multi-modal emotional systems, and leads to better accuracy over time. Simply rotating the directional mood vector would make the system unadaptable.

Now, more formally, let L = {l_1, l_2, ..., l_E} be the set of labels; then X = (l_1, l_2, ..., l_E) represents the sequence of labels from L before applying the affective offset.

The labels in the list are arranged in order of their respective locations, starting from Θ = θ_1.

Likewise, Y = (l_1, l_2, ..., l_E) represents the sequence of labels from L after applying the affective offset, but where the list now starts from Θ = θ_2 instead.

The mapping from an old label l to a new label is then simply carried out by the mapping function f: L → L, l ↦ Y[Index_X(l)], where Index_X(l) is the index of label l in X.
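The relabeling can be sketched as a rotation of the ordered label list. Converting the affective offset in degrees into a whole number of label positions, here round(offset / (360 / E)), is an assumption about how the continuous offset is quantized onto the discrete label circle:

```python
def rotate_labels(ordered_labels, offset_deg):
    """Map each old label to the label occupying its index after
    tilting the axes: f(l) = Y[Index_X(l)], with Y a rotation of X."""
    E = len(ordered_labels)
    # Assumed quantization: degrees -> whole label positions.
    shift = round(offset_deg / (360.0 / E)) % E
    X = ordered_labels
    Y = X[shift:] + X[:shift]
    return {label: Y[i] for i, label in enumerate(X)}

# With four labels, a 90-degree offset shifts every label one position.
mapping = rotate_labels(["Amusement", "Joy", "Interest", "Pride"], 90.0)
```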

Yet another aspect of the invention is:
• the mood offset computation is realized by modifying the emotion labels of detected emotions, which amounts to a mood rotation in the VA space, according to the Circumplex Model of Affect.

Example data for a preferred embodiment is given below:

Mood Determination and Audio Classification of Emotions

The audio data used to represent the home user's emotional state is taken from the Geneva Multimodal Emotion Portrayals (GEMEP) corpus, which was chosen as the dataset. The dataset contains 1260 short voice utterances, divided into 18 emotional classes. The data is split across 10 actors, of which half are male and the other half female. Because 6 of the 18 emotions occur very sparsely in the dataset, classification was restricted to 12 separate emotions. These were amusement, pride, joy and interest (positive valence, positive arousal), anger, fear, irritation and anxiety (negative valence, positive arousal), despair and sadness (negative valence, negative arousal), and finally pleasure and relief (positive valence, negative arousal).

Each case connects the mood configuration to real speaker utterances from the dataset by assigning each mood to the most appropriate emotions. The good mood is associated with the emotions amusement, joy and interest; the bad mood with cold anger (irritation), hot anger, fear, despair, anxiety and sadness; and the neutral mood with the emotions relief and pleasure.

For each test trial, a speaker is randomly identified from the GEMEP dataset. For the given speaker, a mood configuration is selected, and the relevant emotion features for that speaker, taken from the test set, are concatenated and used for mood profile determination. 12-way classification of the data was carried out using frontend factor analysis (i-vectors), using the ALIZE 3.0 framework.

The process is as follows: 13 Mel Frequency Cepstral Coefficients (MFCCs) (including log energy) plus first and second derivatives are extracted, to give a fixed 39-feature frame for each 25 ms voice frame, with a 10 ms overlap between frames. MFCCs are simply a compact representation of the spectral envelope of a speech signal. A 128-component Gaussian mixture model (GMM) is trained with the entire training set. At this point, the six unused classes were not utilized further in the system. Using the data from the GMM, a total variability matrix is trained. Subsequently, for each utterance, a 90-dimensional i-vector is extracted from the total variability matrix. Once in the i-vector space, classification of the utterances is carried out using probabilistic linear discriminant analysis (PLDA), after performing normalization on the i-vectors. PLDA is known to exhibit good performance when used for the classification of i-vectors.

Other System Parameters

The affective state locations used for computing the directional mood vector were adopted from experience. For the operations more pleasant and less pleasant the weights were set to w_v = 0.45 and w_a = 1, and for the operations more intense and less intense the weights were set to w_v = 1 and w_a = 0.45.

The user interface used for presenting the items to users and used for critiquing was implemented in PHP.

Figure 3 illustrates different modes of system operation:
• (A) In which mood attributes are allocated to a collection of media files.

This may be processed automatically by an application as a pre-process and offline, i.e. not as a real-time system process. This mode may be a time-consuming task to perform.

This mode is useful when preparing mood attributes related to a collection of media file candidates to be distributed via streaming from a service.
• (B) In which the mood attributes may be allocated manually by a user, and related to a user-selected and currently playing media file.
• (C) In which a user's spoken word or sentence is detected and mapped into a mood offset that is applied to address and select a corresponding media file to be rendered via a multimedia system or a multimedia player.

This system being an A/V system, TV, audio system, tablet, smartphone, laptop, desktop or the like.

Figure 4 illustrates a system control concept into which the invention is embedded:
• Media files reside on Internet service providers (with e.g. music files) or on virtual services in the "cloud". Files may be available locally on the system device as an AV media collection with the related mood attributes and mood mapping information. Media files to access, manage and execute are e.g. files/streams from Spotify and Netflix.
• The system controller in the multimedia player manages the rendering AV means, i.e. display and loudspeaker, and the user input enabled to accept touch-based commands or key inputs. In addition, remote control may be applied via wireless means (WiFi, Bluetooth).
• The system is configured with at least one microphone enabled to detect the user's speech sentences, this oral information being the source for the mood detection and the corresponding system-related events for selecting the media files to be current.

The invention as disclosed is very applicable as a user-friendly interface to multimedia, and thus enhances the experience; it relates to the control and use of multimedia rendering systems, specifically to browsing and selecting media files in a more user-friendly mode of operation. The media files include one or more of the data types: audio files, video files, audio streams and video streams, images, text or the like.The invention disclosed is very applicable as a user friendly interface to multimedia, thus enhancing the experience, and relates to control and use of multimedia rendering systems, specifically for browsing and selecting media files in a more user friendly mode of operation. The media files including all or one of the data types: audio files, video files, audio streams and video streams, images, text or alike.
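The speech-to-recommendation flow of the system described above can be sketched at a high level as follows. This is an illustrative skeleton only: `classify_emotion` and the catalogue layout are stand-ins assumed for the example, not APIs defined in the patent.

```python
def recommend_from_speech(utterances, classify_emotion, catalogue):
    """Microphone utterances -> per-utterance emotion scores ->
    averaged mood -> the media file whose mood tag matches best.

    classify_emotion(u) returns a list of per-category emotion scores;
    catalogue maps each title to an emotion-score vector of the same length.
    """
    profiles = [classify_emotion(u) for u in utterances]
    n = len(profiles)
    # Average the per-utterance emotion scores into one mood estimate.
    avg = [sum(p[j] for p in profiles) / n for j in range(len(profiles[0]))]

    # Pick the catalogue entry closest to the averaged mood
    # (squared Euclidean distance over the score vectors).
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(avg, v))

    return min(catalogue, key=lambda title: dist(catalogue[title]))
```

With an identity classifier (score vectors fed in directly), two utterances scored `[1, 0]` and `[0, 1]` average to `[0.5, 0.5]`, so a title tagged `[0.5, 0.5]` is chosen over one tagged `[1, 0]`.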

Claims (5)

1. A method for identifying the mood of a person via audio parameters, and using this mood to enhance both current and future recommendations of items in a content set; the method includes the use of audio utterances, from the persons whose mood is to be identified, collected over a short time period, and for each utterance Mel Frequency Cepstral Coefficient features are derived and converted to i-vectors, which are used in an emotion classifier assigning a score to each utterance for each of the total number of emotion categories, where the highest score is assigned to the most likely emotion category of the utterance and the lowest score is assigned to the least likely emotion category of the utterance, the method being characterized in that: a. the vector of all emotion-category scores is termed the emotion profile and is according to formula 1;

(Formula 1, shown in the source only as image DK178068B1C00191)

where p_j, 0 < p_j < 1, represents the currently predicted probability of emotion class j, 1 < j < E, and

(Formula shown in the source only as image DK178068B1C00192)

b. individual emotion profiles, collected at different points in time, are summed to give a mood profile, as given in formula 2;

(Formula 2, shown in the source only as image DK178068B1C00193)

where the i-th emotion profile corresponds to the i-th time interval, T = the total number of discrete time intervals, and w_i = the weighting value for the i-th time interval; c. the number of components in the mood profile equals the number of emotion categories; d. each emotion-category score in the mood profile is converted to a vector whose direction is indicative of the emotion in question and whose magnitude is proportional to the degree of confidence in that emotion, as given in formula 5;

(Formula 5, shown in the source only as image DK178068B1C00194)

where, for each emotion j, 1 < j < E, a new vector Mood_va with magnitude u_m and angle θ_j is formed, the angle being measured in degrees from the positive valence axis of the VA space; e. the emotion space contains the same number of vectors as the number of predefined emotion categories; f. all vectors are summed to obtain a single oriented mood vector that encapsulates the overall mood, as given in formula 6;

(Formula 6, shown in the source only as image DK178068B1C00201)

g. emotion categories assigned a low confidence in the mood profile play a lesser role in generating the overall mood vector; h. the directional mood vector is used to identify an initial object to be recommended, by selecting the object with the shortest Euclidean distance from that vector.
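The mood-profile and mood-vector construction of claim 1 can be sketched as follows. Since formulas 1, 2, 5 and 6 appear in the source only as images, the exact notation here is an assumption based on the claim wording: a weighted sum of emotion profiles, one VA-space vector per category, and a final vector sum.

```python
import math

def mood_vector(emotion_profiles, weights, angles_deg):
    """emotion_profiles: per-utterance probability vectors (formula 1),
    each summing to 1 over the E emotion categories.
    weights: per-time-interval weights w_i (formula 2).
    angles_deg: angle of each emotion category, in degrees from the
    positive valence axis of the VA space (formula 5).
    Returns the summed mood vector of formula 6 as (x, y)."""
    E = len(angles_deg)
    # Formula 2: weighted sum of emotion profiles -> mood profile.
    mood_profile = [sum(w * ep[j] for w, ep in zip(weights, emotion_profiles))
                    for j in range(E)]
    # Formulas 5-6: one vector per category (magnitude = confidence,
    # direction = category angle), summed into a single mood vector.
    x = sum(m * math.cos(math.radians(a)) for m, a in zip(mood_profile, angles_deg))
    y = sum(m * math.sin(math.radians(a)) for m, a in zip(mood_profile, angles_deg))
    return (x, y)

def nearest_item(mood_vec, items):
    """Claim 1(h): pick the item with the shortest Euclidean distance
    from the mood vector; items maps name -> (valence, arousal)."""
    return min(items, key=lambda name: math.dist(mood_vec, items[name]))
```

Low-confidence categories contribute short vectors, so, as claim 1(g) states, they barely shift the direction of the summed mood vector.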
2. A method for identifying the mood of a person according to claim 1, wherein the user can a. select a first object or search for an alternative second object by means of critiquing; b. for each iteration of a search, alternatives are found by specifying constraints, whereby the space containing all objects is reduced as a consequence of the specified constraint; c. for each iteration, the most suitable object is found by minimizing the distance between the currently recommended object and the set of objects bound by the new constraint, according to formula 8 and formula 9;

(Formula shown in the source only as image DK178068B1C00202)

where the distance d(item_c, item_s) is a weighted form of the standard Euclidean distance in the VA space:

(Formula shown in the source only as image DK178068B1C00203)

d. the above is repeated until a suitable object is found.
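One critiquing iteration of claim 2 can be sketched as below. Formulas 8 and 9 are shown in the source only as images, so the exact weighting form is an assumption: per-axis weights applied inside a Euclidean distance over the valence-arousal plane.

```python
import math

def weighted_va_distance(a, b, w_valence=1.0, w_arousal=1.0):
    """Assumed form of the weighted Euclidean distance (formulas 8-9):
    valence and arousal differences are weighted before being combined.
    a, b are (valence, arousal) points."""
    dv = a[0] - b[0]
    da = a[1] - b[1]
    return math.sqrt(w_valence * dv * dv + w_arousal * da * da)

def best_constrained_item(current, candidates, constraint):
    """One critiquing iteration: keep only the candidates satisfying the
    user's constraint, then pick the one closest to the current
    recommendation under the weighted distance (None if nothing is left)."""
    allowed = [c for c in candidates if constraint(c)]
    if not allowed:
        return None
    return min(allowed, key=lambda c: weighted_va_distance(current, c))
```

Repeating this step with successive constraints, as claim 2(d) describes, walks the recommendation through an ever smaller candidate space until the user accepts an object.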
3. En metode til at identificere mood for en person ifølge krav 2, hvor mood-offset mellem det endeligt valgte objekt og initialt rekommanderede objekt er beregnet.A method of identifying the mood of a person according to claim 2, wherein the mood offset between the finally selected object and initially recommended object is calculated. 4. En metode til at identificere mood for en person ifølge krav 3, hvor mood offset er anvendt i fremtidige anbefalinger som en hurtig adgang til det ønskede område i VA rummet ifølge formel 10,11,12,13 og formel 14;A method for identifying mood for a person according to claim 3, wherein mood offset is used in future recommendations as a quick access to the desired area of the VA space according to Formulas 10,11,12,13 and Formula 14;
Figure DK178068B1C00211
Figure DK178068B1C00211
hvor A er en vektor gennem origin og et initial-punkt og B er en vektor gennem origin og et slut-punktwhere A is a vector through origin and an initial point and B is a vector through origin and an end point
Figure DK178068B1C00212
Figure DK178068B1C00212
hvor Py, Px er vektorer gennem origin til givne punkter hvor arctan(y;x) = Θ giver Θ i grader (-180 < Θ < 180) mod (Θ; 360) = Θ giver Θ i grader (0 < Θ < 360)where Py, Px are vectors through origin to given points where arctane (y; x) = Θ gives Θ in degrees (-180 <Θ <180) toward (Θ; 360) = Θ gives Θ in degrees (0 <Θ <360 )
Figure DK178068B1C00213
Figure DK178068B1C00213
hvor Diffc = B er lokaliseret med uret relativt til A Dif fee = B er lokaliseret mod uret relativt til Awhere Diffc = B is clockwise relative to A Difc = B is clockwise relative to A
Figure DK178068B1C00214
Figure DK178068B1C00214
5. A method for identifying the mood of a person according to claim 4, wherein the mood-offset computation is realized by modifying the emotion labels of detected emotions, corresponding to a mood rotation in the VA space according to the "Circumplex Model of Affect".
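The angular mood offset of claims 3–5 can be sketched as follows. Formulas 10–14 are shown in the source only as images, so the concrete computation below is an assumed reading of the claim text: angles measured from the positive valence axis via arctan, mapped to [0, 360) with a modulo, and compared clockwise versus counter-clockwise.

```python
import math

def angle_deg(p):
    """Angle of the vector from the origin to point p = (valence, arousal),
    in degrees from the positive valence axis, mapped to [0, 360)."""
    return math.degrees(math.atan2(p[1], p[0])) % 360.0

def mood_offset(initial, final):
    """Signed angular offset between the initially recommended point and
    the finally chosen point. Both the clockwise (Diff_c) and
    counter-clockwise (Diff_cc) differences are formed; the shorter
    rotation is reported, positive meaning counter-clockwise."""
    a = angle_deg(initial)
    b = angle_deg(final)
    diff_ccw = (b - a) % 360.0   # final counter-clockwise from initial
    diff_cw = (a - b) % 360.0    # final clockwise from initial
    return diff_ccw if diff_ccw <= diff_cw else -diff_cw
```

In a later session this offset can be applied as a rotation of the emotion labels in the VA space, per claim 5, so the initial recommendation starts closer to where the user previously ended up.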
DK201400147A 2014-01-21 2014-03-17 Mood based recommendation DK178068B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
DK201400147A DK178068B1 (en) 2014-01-21 2014-03-17 Mood based recommendation

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
DKPA201400035 2014-01-21
DK201400035 2014-01-21
DK201400147 2014-03-17
DK201400147A DK178068B1 (en) 2014-01-21 2014-03-17 Mood based recommendation

Publications (1)

Publication Number Publication Date
DK178068B1 true DK178068B1 (en) 2015-04-20

Family

ID=52824622

Family Applications (1)

Application Number Title Priority Date Filing Date
DK201400147A DK178068B1 (en) 2014-01-21 2014-03-17 Mood based recommendation

Country Status (1)

Country Link
DK (1) DK178068B1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002043391A1 (en) * 2000-11-22 2002-05-30 Koninklijke Philips Electronics N.V. Method and apparatus for generating recommendations based on current mood of user
US20090182736A1 (en) * 2008-01-16 2009-07-16 Kausik Ghatak Mood based music recommendation method and system
US20130132988A1 (en) * 2011-11-21 2013-05-23 Electronics And Telecommunications Research Institute System and method for content recommendation


Similar Documents

Publication Publication Date Title
US11398236B2 (en) Intent-specific automatic speech recognition result generation
US9824150B2 (en) Systems and methods for providing information discovery and retrieval
US10964312B2 (en) Generation of predictive natural language processing models
US8930189B2 (en) Distributed user input to text generated by a speech to text transcription service
US9336772B1 (en) Predictive natural language processing models
US9190055B1 (en) Named entity recognition with personalized models
CN110164416B (en) Voice recognition method and device, equipment and storage medium thereof
Muckenhirn et al. Long-term spectral statistics for voice presentation attack detection
US20190318737A1 (en) Dynamic gazetteers for personalized entity recognition
JP2019216408A (en) Method and apparatus for outputting information
US9922650B1 (en) Intent-specific automatic speech recognition result generation
US11107462B1 (en) Methods and systems for performing end-to-end spoken language analysis
US20130132988A1 (en) System and method for content recommendation
WO2017051601A1 (en) Dialogue system, terminal, method for control of dialogue, and program for causing computer to function as dialogue system
CN111859008B (en) A method and terminal for recommending music
US12488800B2 (en) Display apparatus and operating method thereof
Shepstone et al. Using audio-derived affective offset to enhance tv recommendation
CN118467780A (en) Film and television search recommendation method, system, equipment and medium based on large model
Qu et al. Towards building voice-based conversational recommender systems: datasets, potential solutions and prospects
van Bemmel et al. Automatic selection of the most characterizing features for detecting COPD in speech
DK178068B1 (en) Mood based recommendation
CN118972752B (en) Audio recognition-based classification playback methods, audio equipment, and storage media
US9412395B1 (en) Narrator selection by comparison to preferred recording features
CN118098215A (en) Audio recognition model training method, device, storage medium and electronic device
CN121462835A (en) Methods, devices, equipment, and media for music recommendation and determination of musical emotion