WO2018154098A1 - Method and system for recognizing mood by means of image analysis - Google Patents

Method and system for recognizing mood by means of image analysis

Info

Publication number
WO2018154098A1
WO2018154098A1 · PCT/EP2018/054622 · EP2018054622W
Authority
WO
WIPO (PCT)
Prior art keywords
mood
subject
facial
images
distance
Prior art date
Application number
PCT/EP2018/054622
Other languages
French (fr)
Inventor
Javier Varona Gómez
Diana Arellano Távara
Miquel Mascaró Oliver
Cristina Manresa Yee
Simón Garcés Rayo
Juan Sebastián Filippini
Original Assignee
Universitat De Les Illes Balears
Priority date
Filing date
Publication date
Application filed by Universitat De Les Illes Balears filed Critical Universitat De Les Illes Balears
Publication of WO2018154098A1 publication Critical patent/WO2018154098A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions


Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a mood recognition method for recognizing the mood of a subject (1) based on the subject's relationship with facial expressions/movements. The method of the invention focuses on recognizing moods, a concept that is different from emotion. The manner of transforming the captured images of the subjects (1) into facial movements is customized by learning the particular form of the facial features of the analyzed subject (1). The invention is based on the analysis of a set of a given number of images, said number being greater than that used in standard emotion recognition. A more robust mood recognition method is thereby defined. The method comprises three fundamental steps: defining general previous criteria and data, defining customized resting patterns, and evaluating the mood.

Description

DESCRIPTION
"Method and system for recognizing mood by means of image analysis"
FIELD OF THE INVENTION
The present invention is comprised in the technical field corresponding to the sector of artificial intelligence and facial expression recognition. More specifically, the invention relates to a mood recognition method based on image sequence processing.
BACKGROUND OF THE INVENTION
The recognition of emotions from facial expressions is a very dynamic field today given its various applications in the field of psychology, advertising or marketing, among others. Said recognition is typically performed according to the system known as the Facial Action Coding System (FACS). FACS allows analyzing human facial expressions through facial coding, and it can be used to classify virtually any anatomical facial expression by analyzing the possible movements of the muscles associated with said facial expression. These movements are divided into what is commonly referred to as Action Units (AU), which are the fundamental actions of muscles or individual muscle groups (for example, according to the mentioned classification, AU6 refers to raising cheeks). Terms such as action units, gestures, facial expressions and AU will be used interchangeably herein.
On the other hand, the terms mood and emotion are normally confused in colloquial language and in their formal definitions. There is a general consensus today that establishes at least three main differences between both terms:
- moods last longer than emotions do;
- moods are not outwardly expressed in a direct manner, unlike emotions;
- moods relate to emotions insofar as a person who is in a certain mood tends to experience certain emotions. In other words, by means of noticeable effects produced by emotions, facial expressions or gestures, it is possible to recognize a person's mood.
As mentioned, the applications for mood-based facial recognition may be very useful in various sectors, such as commercial or political marketing, human resources, video games, distance learning, digital signage and human-computer interactions in general.
In the field of facial recognition for recognizing emotions, different analysis technologies are known, such as those disclosed in patents US 8798374 B2, US 8879854 B2 or US 9405962 B2. These patent documents disclose systems focusing on recognizing emotions, not moods (which are different concepts), and their associated methods of analysis therefore focus on the recognition and processing of instantaneous images of the subjects under study. These patent documents primarily disclose the construction of a set of descriptors based on detectable geometric facial features, and a method of classifying AUs based on these descriptors. The heuristic definition of a set of rules is used to obtain suitable descriptors, and even automatic methods for selecting features in the context of automatic learning methods are used for the same purpose. Therefore, US 8798374 B2 discloses an automatic method for image processing for the detection of AUs, and US 8879854 B2 discloses a method and apparatus for recognizing emotions based on action units. The descriptors constructed in a heuristic manner have very little discriminatory power, fundamentally in interpersonal detection. This is why various lines of work have tended to construct more complex descriptors by means of automatic methods for selecting features. For example, US 9405962 B2 discloses a method for determining emotions in a set of images in the presence of a facial artifact (beard, mustache, glasses, etc.), including the detection of action units.
On the other hand, the "Pleasure-Arousal-Dominance" (or PAD) model is also known today as a theoretical framework for mood recognition. The PAD model is a system that allows defining and measuring different moods, emotional traits and personality traits as a function of three orthogonal dimensions: pleasure (P), arousal (A), and dominance (D). The PAD model is a framework that is generally used for defining moods and it allows the interrelation thereof with the facial coding in FACS. In other words, PAD can describe a mood in terms of action units. In the PAD model, based on the intersection of the pleasure, arousal and dominance axes, eight octants representing the basic categories of moods can be derived (Table 1 ).
Figure imgf000004_0001
Table 1 : Moods, PAD space octants.
It is possible to express emotions in terms of pleasure, arousal and dominance according to a certain correlation (Table 2). Therefore, a mood can give rise to various emotions. For example, the mood "anxious" can manifest itself in emotions such as "confused", "fearful", "worried", "ashamed", etc., which in turn can be related to action units (AUs).
Table 2: Example of emotions represented in PAD space. (The table itself is provided as an image in the original document.)
In particular, it is possible to define the correspondence between AUs and PAD space octants by means of the PAD model. The main objective of this correspondence is the description of each of the eight moods in AU terms. The Facial Expression Repertoire (or FER) is known for this description. In the state of the art, captured images of people are transformed into facial expressions/movements through generic methods based on processing instantaneous images of the subjects under analysis. However, these methods entail errors, since the particular form of the facial features of the analyzed subject cannot be "learned" and customized in a way that would make the recognition method more precise. Additionally, said methods of the state of the art are restricted to the identification of emotions (happiness, sadness, etc.), but they do not allow detecting complex constructs such as moods, the activation of which may comprise, at the same time, different configurations of emotions, sometimes even opposing emotions. For example, an anxious mood can be reflected both in a sad subject and in a happy subject. Therefore, the known solutions of the state of the art are still unable to solve the technical problem of providing a precise mood recognition method.
The present invention proposes a solution to this technical problem by means of a novel facial recognition method for recognizing moods in a set of images, which provides for the customization of the subject to minimize AU detection errors.
BRIEF DISCLOSURE OF THE INVENTION
The main object of the invention relates to a method for recognizing the mood of a subject based on their relationship with facial expressions/movements. The method of the invention focuses on recognizing moods, a concept that is technically different from emotion. In the method of the invention, the manner of transforming the captured images of the subjects into facial gestures/movements is customized, "learning" the particular form of the facial features of the person analyzed, such that the mood recognition method is more precise than if this customization were not performed.
The mentioned object of the invention is achieved by means of a mood recognition method for recognizing the mood of a subject based on facial images of said subject, obtained by means of a system comprising a camera suitable for taking said images and a processor for storing and/or processing said images. Advantageously, said method comprises carrying out the following steps:
a) registering one or more facial images of the subject in a reference mood;
b) defining a plurality of characteristic facial points of the subject in one or more of the images associated with the reference mood;
c) defining one or more resting patterns corresponding to the distances between the characteristic facial points of the subject, defined in step b);
d) defining one or more action units (AUs) corresponding to the movement of the facial points with respect to the resting patterns;
e) defining one or more activation rules of each action unit for the mood to be recognized based on threshold values associated with the amount of movement of the characteristic facial points with respect to the resting patterns;
f) defining a standard probability distribution associated with the activation of one or more action units associated with a mood;
g) registering a sequence of facial images of the subject that is associated with the mood to be recognized;
h) obtaining, for each image of the sequence, the activation probability distribution of the action units associated with the mood to be recognized, according to the rules defined in step e);
i) determining the similarity between the probability distribution obtained in step h) and the standard probability distribution defined in step f).
A reliable and robust mood recognition method is thereby achieved, where image analysis is performed in sequences captured by the camera, such that said sequences allow dynamically evaluating the contribution of the action units to the mood of the subject.
In another preferred embodiment of the invention, the mood recognition method further comprises carrying out the following steps in step f):
- defining a standard probability distribution associated with the activation of one or more action units associated with a mood i, defining to that end a value p_ij between 0 and 1 to designate the contribution of each action unit j, where the value 0 is assigned to the minimum contribution and the value 1 to the maximum;
- constructing with these values p_ij a vector p_i for each mood i, where n is the number of action units involved in determining the moods:

p_i = C_p {p_ij} = C_p {p_i1, p_i2, ..., p_in},

where C_p is a normalization constant for imposing the condition that ∑_{j=1}^{n} p_ij = 1.
In another preferred embodiment of the invention, the mood recognition method further comprises carrying out the following steps in step h):
- registering a number W of facial images of the subject;
- obtaining, for the set of W images, the activation probability distribution of the action units j associated with the mood i to be recognized, defining to that end a value q_j to designate the contribution of each action unit j, according to the expression:

q_j = C_q (1/W) ∑_k s_kj,

where k = 1, 2, ..., W indexes the images; j = 1, 2, ..., n; s_kj is assigned the value s_kj = 1 if the action unit j is activated in image k, and s_kj = 0 if the action unit j is not activated; and C_q is a normalization constant for imposing the condition that ∑_{j=1}^{n} q_j = 1.
In another preferred embodiment of the invention, the mood recognition method further comprises carrying out the following step in step i):
- determining the similarity between the probability distribution obtained in step h) and the standard probability distribution defined in step f) by calculating the Bhattacharyya coefficient, D_i, for each mood i, according to the expression:

D_i = ∑_{j=1}^{n} √(p_ij · q_j)
More preferably, the W facial images of the subject are consecutive in the sequence captured by the camera.
In another preferred embodiment of the method of the invention, the set of n action units involved in determining the mood or moods of the subject are selected from all the action units existing in FACS.
More preferably, the action units involved in determining the mood or moods of the subject are one or more of the following: inner brow raiser; outer brow raiser; brow lowerer; upper lid raiser; cheek raiser; upper lip raiser; lip corner puller; lip corner depressor; lips part; jaw drop; eyes closed.
In another preferred embodiment of the method of the invention, the moods considered are the eight moods of the Pleasure-Arousal-Dominance (PAD) space.
More preferably, the relationship between the eight moods of the PAD space developed by Mehrabian and the action units that are activated in each of them is that defined in the Facial Expression Repertoire (FER).
In another preferred embodiment of the method of the invention, one or more resting patterns corresponding to the distances between the characteristic facial points of the subject are defined, with said distances being one or more of the following: middle right eye-eyebrow distance; inner right eye-eyebrow distance; middle left eye-eyebrow distance; inner left eye-eyebrow distance; open right eye distance; open left eye distance; horizontal mouth distance; upper mouth-nose distance; jaw-nose distance; almost lower mouth-outer mouth distance; left eyebrow-upper lid distance; left eyebrow-lower lid distance; right eyebrow-upper lid distance; right eyebrow-lower lid distance.
In another preferred embodiment of the method of the invention, the mood or moods of the subject are gauged in a session with known and controlled stimuli, such that one or more action units can be associated with one or more moods i of said subject.
Another object of the invention relates to a mood recognition system for recognizing the mood of a subject through the mood recognition method according to any of the embodiments described herein, comprising:
- a camera suitable for taking facial images of said subject;
- one or more processing means (3) for storing and/or processing the facial images, wherein said processing means (3) are configured by means of hardware and/or software for carrying out an emotional state recognition method according to any of the embodiments described herein.
In a preferred embodiment of the system of the invention, said system additionally comprises a learning subsystem configured by means of hardware and/or software to establish classification criteria for the sequences taken by the camera, as a function of results obtained in previous analyses. More preferably, said learning subsystem is locally or remotely connected to the processing means.
DESCRIPTION OF THE DRAWINGS
Figure 1 shows a flowchart of the steps of the method of the invention according to a preferred embodiment thereof.
Figure 2 shows the characteristic facial points used in detecting action units of the method of the invention according to a preferred embodiment thereof.
Figure 3 depicts the detection of the activation of an action unit (specifically, AU1) in a sequence of images upon comparing the minimum theoretical variation in pixels with the experimental variation of facial parameters with respect to the customized resting pattern parameters (in this case parameter P2).
Figure 4 shows a mood recognition system according to a preferred embodiment of the invention, showing in detail the elements thereof.
DETAILED DISCLOSURE OF THE INVENTION
A detailed description of the method of the invention is provided below in reference to a preferred embodiment thereof based on Figure 1 of the present patent document. Said embodiment is provided for the purpose of illustrating the claimed invention in a non-limiting manner.
One object of the invention relates to a mood recognition method for recognizing the mood of a subject (1) based on their relationship with facial expressions/movements. The method of the invention focuses on recognizing moods, a concept that is different from emotion. In defining said relationship, the theory relating facial gestures/movements and emotions (FACS coding) and the theory relating emotions and moods (PAD model) are used. In the method of the invention, the manner of transforming the captured images of the subjects (1) into facial gestures/movements is customized, "learning" the particular form of the facial features of the analyzed subject (1), such that the mood recognition method is more precise than if this customization were not performed.
The method of the invention furthermore takes into account the prior history of the sequence of images (i.e., the recognition of expressions in the images preceding the processed image). The invention is therefore based on the analysis of a set of a given number of images, unlike methods based on instantaneous recognition for the identification of emotions.
According to Figure 1 , the method comprises three fundamental steps: defining general previous criteria and data, defining customized resting patterns, and evaluating the mood. Each of these steps is described below in detail.
1. Defining general previous criteria and data
The method requires basic data, prior to the analysis of the mood of the subject (1):
- Firstly, a subset of n action units (AUs), which are considered sufficient for describing and recognizing any mood of the PAD space, must be selected from among all those existing in FACS. For example, Table 3 shows a possible subset with n = 11. There is therefore a set of gestures or AUj, with j = 1, 2, ..., n, whose combinations can describe moods.
Table 3: Subset of action units considered in mood recognition. (The table itself is provided as an image in the original document.)
- Secondly, a previous criterion relating the eight moods of the PAD space developed by Mehrabian with the facial gestures or action units (AUs) that are activated in each of them is required. Table 4 shows all this starting data, defined by Russell and Mehrabian and the Facial Expression Repertoire (FER), according to the subset of action units considered.
Mood | Active AUs
Exuberant | AU5, AU6, AU12, AU25, AU26
Anxious | AU1, AU2, AU4, AU5, AU15, AU25, AU26
Bored | AU1, AU2, AU4, AU15, AU43
Docile | AU1, AU2, AU12, AU43
Hostile | AU4, AU10, AU5, AU15, AU25, AU26
Relaxed | AU6, AU12, AU43
Dependent | AU1, AU2, AU5, AU12, AU25, AU26
Disdainful | AU4, AU15, AU43
Table 4: Active AUs per PAD octant.
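For convenience when following the sketches below, Table 4 can be transcribed into a simple lookup structure. The Python mapping here is an illustration added for this purpose; the patent itself defines only the table, not any particular data structure.

```python
# Table 4 transcribed as a mood -> active-AUs mapping (an illustrative
# convenience; the patent defines the table, not this structure).
ACTIVE_AUS = {
    "Exuberant":  ["AU5", "AU6", "AU12", "AU25", "AU26"],
    "Anxious":    ["AU1", "AU2", "AU4", "AU5", "AU15", "AU25", "AU26"],
    "Bored":      ["AU1", "AU2", "AU4", "AU15", "AU43"],
    "Docile":     ["AU1", "AU2", "AU12", "AU43"],
    "Hostile":    ["AU4", "AU10", "AU5", "AU15", "AU25", "AU26"],
    "Relaxed":    ["AU6", "AU12", "AU43"],
    "Dependent":  ["AU1", "AU2", "AU5", "AU12", "AU25", "AU26"],
    "Disdainful": ["AU4", "AU15", "AU43"],
}
```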
The starting data must also indicate the importance of each gesture or AUj in the corresponding mood. To that end, a number between 0 and 1 is defined to determine the weight of each gesture or AUj. If an AUj is highly determinant, it is assigned the value 1, whereas if it is not important for a certain mood, it is assigned the value 0. A vector is constructed for each mood i with these values. This vector is called p_i, for example:

p_i = C_p {p_ij} = C_p {p_i1, p_i2, ..., p_in} = C_p {1, 1, 1, 0.7, 0, 0, 0, 1, 1, 1, 0} (Eq. 1)

Each p_ij is a scalar that determines the importance of an AUj in the mood i, and C_p is a normalization constant for imposing the condition that ∑_{j=1}^{n} p_ij = 1 in Eq. 1. Then p_i is a pattern of the mood that relates it with gestures or AUs.
A standard probability distribution associated with the activation of one or more action units associated with a mood is thereby defined.
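As a minimal illustrative sketch of this step (assuming Python with numpy; the AU ordering below is an assumption, chosen because it reproduces the example vector of Eq. 1, which the patent does not fix), the pattern vector p_i can be built as follows.

```python
import numpy as np

# Subset of n = 11 action units (Table 3). The ordering is assumed here;
# it is chosen so that the example below reproduces Eq. 1.
AUS = ["AU1", "AU2", "AU4", "AU5", "AU6", "AU10",
       "AU12", "AU15", "AU25", "AU26", "AU43"]

def mood_pattern(weights):
    """Build the normalized pattern vector p_i = C_p {p_ij} of Eq. 1.

    `weights` maps an AU name to a scalar in [0, 1] grading its importance
    for the mood; AUs not listed get weight 0. The constant C_p rescales
    the vector so that its components sum to 1.
    """
    p = np.array([weights.get(au, 0.0) for au in AUS], dtype=float)
    total = p.sum()
    return p / total if total > 0 else p  # C_p = 1 / sum_j p_ij

# Example weights yielding the patent's Eq. 1 vector
# C_p {1, 1, 1, 0.7, 0, 0, 0, 1, 1, 1, 0}:
p_example = mood_pattern({"AU1": 1, "AU2": 1, "AU4": 1, "AU5": 0.7,
                          "AU15": 1, "AU25": 1, "AU26": 1})
assert abs(p_example.sum() - 1.0) < 1e-9
```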
2. Defining customized resting patterns
In a second step, the method requires defining criteria for activating each AUj, which can be used to determine whether a gesture or AUj has been made by the subject (1) under study when interpreting the image data. To define said criteria, the following steps are carried out:
- Registering one or more facial images of the subject (1) in a reference mood.
- Defining a plurality of characteristic facial points of the subject (1) in one or more of the images in the reference mood. For example, as shown in Figure 2, 24 facial points or curves can be taken. These characteristic points are strategically associated with the facial points or curves that are most susceptible to undergoing changes in position upon activating one or more AUj.
- Defining a plurality of distances between the characteristic facial points selected in the preceding step. These distances are called parameters P. As an example, 15 distance parameters that will be used in detecting AUs are defined in Table 5.
Table 5: Distance parameters for detecting AUs. (The table itself is provided as an image in the original document.)
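As a hedged sketch of how the parameters P of Table 5 can be measured (the landmark names below are hypothetical, since the patent's Figure 2 defines 24 characteristic points but no identifiers), each parameter reduces to a Euclidean distance between two tracked facial points.

```python
import numpy as np

# Hypothetical landmark names; only the distances themselves are defined
# by the patent (Table 5), not these identifiers.
PARAM_DEFS = {
    "P2": ("inner_right_eyebrow", "inner_right_eye"),   # inner right eye-eyebrow
    "P7": ("left_mouth_corner", "right_mouth_corner"),  # horizontal mouth
    # ... the remaining parameters of Table 5 follow the same pattern
}

def measure_parameters(landmarks):
    """Compute each distance parameter P as the Euclidean distance (in
    pixels) between its two characteristic facial points."""
    return {
        name: float(np.linalg.norm(np.asarray(landmarks[a], dtype=float)
                                   - np.asarray(landmarks[b], dtype=float)))
        for name, (a, b) in PARAM_DEFS.items()
    }
```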
- Defining one or more resting patterns for each parameter P. The definition of these resting patterns includes the definition of a mean value μ and a maximum deviation σ from the mean value. These resting patterns must be found for each subject (1) subjected to the method of facial recognition analysis. It is a step included in each analysis, not a prior independent gauging.
- Defining one or more rules relating the non-resting measurements with the resting patterns to indicate the activation of each AUj. If the comparison with respect to the resting pattern,

ΔP = P − μ,

is a positive number, it is an expansion ΔP(+), and if in contrast there is a negative difference, it is a contraction of this facial parameter ΔP(-). Table 6 shows an example of a set of rules for detecting activations of AUs; these rules describe a threshold value for each variation of the parameters relating to the AUs and are defined as a function of the deviation σ. For example, if in an image ΔP7(+) > 2σ, AU12 will have been activated.
Table 6: Rules used in detecting AUs. (The table itself is provided as an image in the original document.)
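A minimal sketch of this second step, assuming per-parameter samples taken from the subject's reference-mood images; the 2σ rule shown is the patent's own AU12/P7 example from Table 6, while the function names and threshold handling are ours.

```python
import numpy as np

def resting_pattern(samples):
    """Estimate the customized resting pattern of one parameter P: the mean
    mu and the deviation sigma of its values in the reference-mood images."""
    s = np.asarray(samples, dtype=float)
    return float(s.mean()), float(s.std())

def au12_active(p7, mu, sigma):
    """Patent's example rule (Table 6): AU12 is considered activated when
    the expansion of parameter P7 exceeds 2 sigma."""
    delta = p7 - mu              # delta_P = P - mu; positive means expansion (+)
    return delta > 2.0 * sigma   # activation rule: delta_P7(+) > 2 sigma

# Usage: mu, sigma = resting_pattern([101.2, 100.8, 101.0])
#        au12_active(108.0, mu, sigma)  # True if the mouth has widened enough
```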
3. Evaluating the mood
With the steps described above, it is possible to determine changes in facial parameters in a package of consecutive images, as shown in Figure 3 by way of example. According to the activation rules, the theoretical change required for activating an AU can be compared with the actual changes experienced by the subject (1) throughout a sequence of images. Figure 3 shows as an example the rule for activating AU1 with respect to parameter P2 and the experimental value of parameter P2 in pixels in a sequence of images. Since the theoretical variation fits the experimental variation, AU1 is considered to have been activated in that set of images.
The method of the invention then comprises a comparison for carrying out the final step of evaluating the mood:
Assume that there is a number W of images, which are preferably consecutive. If each of those images is compared with the criteria for activating the AUs, it can be determined for each gesture or AUj whether it has been activated. By repeating that comparison with all the images, it is possible to determine whether it has been activated in one or in several images. In other words, an occurrence or relevance value can be obtained for each gesture or AUj. Each of those occurrence values can be referred to as q_j. Each q_j is calculated with the following expression:

q_j = C_q (1/W) ∑_k s_kj (Eq. 2)

where k = 1, 2, ..., W designates an image and s_kj represents the activation or non-activation of the gesture AUj in image k. If the gesture AUj has been activated, s_kj is assigned the value s_kj = 1, whereas if it has not been activated, s_kj = 0. Finally, C_q is a normalization constant for imposing the condition that ∑_{j=1}^{n} q_j = 1 in Eq. 2.
With the set of resulting scalars, a vector q = {q_j} having the same dimensions as the pattern p_i can be constructed, but this time it denotes the experimental weight or activation probability distribution of each gesture in a set of W images associated with the mood to be recognized.
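Under the same assumptions as the sketches above, Eq. 2 reduces to a column mean over a W × n activation matrix followed by normalization.

```python
import numpy as np

def activation_distribution(activations):
    """Eq. 2: build the vector q = {q_j} from a W x n matrix in which
    activations[k, j] is 1 (or True) iff gesture AU_j was detected in image k."""
    s = np.asarray(activations, dtype=float)  # s_kj in {0, 1}
    q = s.mean(axis=0)                        # (1/W) * sum_k s_kj
    total = q.sum()
    return q / total if total > 0 else q      # C_q normalization: sum_j q_j = 1
```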
The final step of the method of the invention consists of comparing the pattern with the experiment. To that end, the Bhattacharyya coefficient, D_i, is used for each mood i:

D_i = ∑_{j=1}^{n} √(p_ij · q_j)
This coefficient gives a value indicating the proximity of the probability distribution of the experiment with respect to the standard probability distribution.
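A short illustrative sketch of this final comparison: compute D_i for every mood pattern and rank the moods by proximity (D_i approaches 1 as the experimental distribution q approaches the pattern p_i).

```python
import numpy as np

def bhattacharyya(p_i, q):
    """Bhattacharyya coefficient D_i = sum_j sqrt(p_ij * q_j)."""
    return float(np.sum(np.sqrt(np.asarray(p_i) * np.asarray(q))))

def closest_moods(patterns, q):
    """Rank moods by the proximity of the experimental distribution q to
    each standard pattern p_i; the highest D_i comes first."""
    scores = {mood: bhattacharyya(p, q) for mood, p in patterns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```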
Through these steps it is possible to determine the mood or moods that are closest to the experimental mood of the subject (1) under analysis. In conclusion, this invention considers the use of descriptors of the temporal dynamics of a person's facial expression to determine said person's mood. These descriptors encode the importance of the occurrence of each AU for each mood. The invention uses a method of detecting AUs capable of learning the particular parameters of the appearance of the facial movement in a customized manner within the same analysis session, without a prior learning step. The final system that is provided also allows defining a temporal analysis parameter W relating to the set of images to be processed, which allows a robust interpretation of the mood of the person participating in the analysis that is tolerant of errors in individual images. The analysis process is an iteration whose duration depends on the number of image sequences.
Alternatively, it is possible to perform a special gauging of the subject (1) in a session with known stimuli, which allow evaluating the degree of response of the subject (1) to standard stimuli, to then recognize moods in non-standard stimuli with greater precision.
Another object of the invention relates to a facial recognition system for recognizing the mood of a subject (1) through the mood recognition method such as the one described in the preceding embodiment, comprising:
- a camera (2) suitable for taking facial images of said subject (1);
- one or more processing means (3) for storing and/or processing the facial images, where said processing means (3) are configured by means of hardware and/or software for carrying out an emotional state recognition method according to any of the embodiments described herein.
The system of the invention can additionally comprise a learning subsystem configured by means of hardware and/or software to establish classification criteria for the sequences taken by the camera (2), as a function of results obtained in previous analyses. This allows progressively improving system precision and feeding the previously obtained information back into said system, associating certain action units with moods of the subject in a customized manner. The learning subsystem can be locally or remotely connected to the processing means (3).

Claims

1. A mood recognition method for recognizing the mood of a subject (1) based on facial images of said subject (1) obtained by means of a system comprising a camera (2) suitable for taking said images, and a processor (3) for storing and/or processing said images; where said method is characterized in that it comprises carrying out the following steps:
a) registering one or more facial images of the subject (1) in a reference mood;
b) defining a plurality of characteristic facial points of the subject (1) in one or more of the images associated with the reference mood;
c) defining one or more resting patterns corresponding to the distances between the characteristic facial points of the subject (1), defined in step b);
d) defining one or more action units (AUs) corresponding to the movement of the facial points with respect to the resting patterns;
e) defining one or more activation rules of each action unit for the mood to be recognized based on threshold values associated with the amount of movement of the characteristic facial points with respect to the resting patterns;
f) defining a standard probability distribution associated with the activation of one or more action units associated with a mood;
g) registering a sequence of facial images of the subject (1) that is associated with the mood to be recognized;
h) obtaining, for each image of the sequence, the activation probability distribution of the action units associated with the mood to be recognized, according to the rules defined in step e);
i) determining the similarity between the probability distribution obtained in step h) and the standard probability distribution defined in step f).
2. The mood recognition method according to the preceding claim, wherein:
- a standard probability distribution associated with the activation of one or more action units associated with a mood i is defined, defining to that end a value p_ij between 0 and 1 to designate the contribution of each action unit j, where the value 0 is assigned to the minimum contribution and the value 1 is assigned to the maximum contribution;
- a vector p_i is constructed with these values p_ij for each mood i, where n is the number of action units involved in determining the moods:

p_i = C_p {p_ij} = C_p {p_i1, p_i2, ..., p_in},

where C_p is a normalization constant for imposing the condition that ∑_{j=1}^{n} p_ij = 1.
3. The mood recognition method according to any of the preceding claims, wherein:
- a number W of facial images of the subject (1) are registered;
- the activation probability distribution of the action units j associated with the mood i to be recognized is obtained for the set of W images, defining to that end a value q_j to designate the contribution of each action unit j, according to the expression:

q_j = C_q (1/W) ∑_k s_kj,

where k = 1, 2, ..., W; j = 1, 2, ..., n; s_kj is assigned the value s_kj = 1 if the action unit j is activated in image k, and s_kj = 0 if the action unit j is not activated; and C_q is a normalization constant for imposing the condition that ∑_{j=1}^{n} q_j = 1.
4. The mood recognition method according to any of the preceding claims, wherein the similarity between the probability distribution obtained in step h) and the standard probability distribution defined in step f) is determined by calculating the Bhattacharyya coefficient, D_i, for each mood i, according to the expression:

D_i = ∑_{j=1}^{n} √(p_ij · q_j)
5. The mood recognition method according to the preceding claim, wherein the W facial images of the subject (1) are consecutive images in a sequence captured by the camera (2).
6. The mood recognition method according to any of the preceding claims, wherein the set of n action units involved in determining the mood or moods of the subject (1) are selected from all the action units existing in the Facial Action Coding System (FACS).
7. The mood recognition method according to the preceding claim, wherein the action units involved in determining the mood or moods of the subject (1) are one or more of the following: inner brow raiser; outer brow raiser; brow lowerer; upper lid raiser; cheek raiser; upper lip raiser; lip corner puller; lip corner depressor; lips part; jaw drop; eyes closed.
8. The mood recognition method according to any of the preceding claims, wherein the moods considered are the eight moods of the PAD space developed by Mehrabian.
9. The mood recognition method according to the preceding claim, wherein the relationship between the eight moods of the PAD space developed by Mehrabian and the action units that are activated in each of them is that defined by Russell and Mehrabian and the Facial Expression Repertoire (FER).
10. The mood recognition method according to any of the preceding claims, wherein one or more resting patterns corresponding to the distances between the characteristic facial points of the subject (1) are defined, with said distances being one or more of the following: middle right eye-eyebrow distance; inner right eye-eyebrow distance; middle left eye-eyebrow distance; inner left eye-eyebrow distance; open right eye distance; open left eye distance; horizontal mouth distance; upper mouth-nose distance; jaw-nose distance; almost lower mouth-outer mouth distance; left eyebrow-upper lid distance; left eyebrow-lower lid distance; right eyebrow-upper lid distance; right eyebrow-lower lid distance.
11. The mood recognition method according to any of the preceding claims, wherein the mood or moods of the subject (1) are gauged in a session with known stimuli.
12. A mood recognition system for recognizing the mood of a subject (1) through a mood recognition method according to any of claims 1 to 11, comprising:
- a camera (2) suitable for taking facial images of said subject (1);
- one or more processing means (3) for storing and/or processing the facial images, wherein said processing means (3) are configured by means of hardware and/or software for carrying out a mood recognition method according to any of the preceding claims.
13. The mood recognition system for recognizing the mood of a subject (1) according to the preceding claim, wherein the images of the registered sequence are consecutive images obtained by the camera (2).
14. The mood recognition system for recognizing the mood of a subject (1) according to any of the preceding claims, additionally comprising a learning subsystem configured by means of hardware and/or software to establish classification criteria for the sequences taken by the camera (2) as a function of results obtained in previous analyses, wherein said learning subsystem is locally or remotely connected to the processing means (3).
PCT/EP2018/054622 2017-02-27 2018-02-26 Method and system for recognizing mood by means of image analysis WO2018154098A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ESP201730259 2017-02-27
ES201730259A ES2633152B1 (en) 2017-02-27 2017-02-27 METHOD AND SYSTEM FOR THE RECOGNITION OF THE STATE OF MOOD THROUGH IMAGE ANALYSIS

Publications (1)

Publication Number Publication Date
WO2018154098A1 true WO2018154098A1 (en) 2018-08-30

Family

ID=59846800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/054622 WO2018154098A1 (en) 2017-02-27 2018-02-26 Method and system for recognizing mood by means of image analysis

Country Status (2)

Country Link
ES (1) ES2633152B1 (en)
WO (1) WO2018154098A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI756681B (en) * 2019-05-09 2022-03-01 李至偉 Artificial intelligence assisted evaluation method applied to aesthetic medicine and system using the same

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100266213A1 (en) * 2009-04-16 2010-10-21 Hill Daniel A Method of assessing people's self-presentation and actions to evaluate personality type, behavioral tendencies, credibility, motivations and other insights through facial muscle activity and expressions

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110263946A1 (en) * 2010-04-22 2011-10-27 Mit Media Lab Method and system for real-time and offline analysis, inference, tagging of and responding to person(s) experiences

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100266213A1 (en) * 2009-04-16 2010-10-21 Hill Daniel A Method of assessing people's self-presentation and actions to evaluate personality type, behavioral tendencies, credibility, motivations and other insights through facial muscle activity and expressions

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ADAMS ANDRA ET AL: "Automated recognition of complex categorical emotions from facial expressions and head motions", 2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), IEEE, 21 September 2015 (2015-09-21), pages 355 - 361, XP032825053, DOI: 10.1109/ACII.2015.7344595 *
BOUKRICHA H ET AL: "Pleasure-arousal-dominance driven facial expression simulation", AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION AND WORKSHOPS, 2009. ACII 2009. 3RD INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 10 September 2009 (2009-09-10), pages 1 - 7, XP031577868, ISBN: 978-1-4244-4800-5 *
DIANA DI LORENZA E ARELLANO TAVARA: "Visualization of Affect in Faces based on Context Appraisal", 1 January 2012 (2012-01-01), pages 881 - 905, XP055481766, Retrieved from the Internet <URL:https://www.tdx.cat/bitstream/handle/10803/84078/Tddlat1de1.pdf?sequence=1> [retrieved on 20180606], DOI: 10.1016/j.jm.2004.06.005 *
EL KALIOUBY R ET AL: "Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures", 20040627; 20040627 - 20040602, 27 June 2004 (2004-06-27), pages 154 - 154, XP010761935 *
GUNES HATICE ET AL: "Categorical and dimensional affect analysis in continuous input: Current trends and future directions", IMAGE AND VISION COMPUTING, ELSEVIER, GUILDFORD, GB, vol. 31, no. 2, 20 July 2012 (2012-07-20), pages 120 - 136, XP028973723, ISSN: 0262-8856, DOI: 10.1016/J.IMAVIS.2012.06.016 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109523290A (en) * 2018-09-14 2019-03-26 平安科技(深圳)有限公司 Evaluation method, device, equipment and medium are paid attention to the class based on the micro- expression of audience
CN109961054A (en) * 2019-03-29 2019-07-02 山东大学 It is a kind of based on area-of-interest characteristic point movement anxiety, depression, angry facial expression recognition methods
CN112115751A (en) * 2019-06-21 2020-12-22 北京百度网讯科技有限公司 Training method and device for animal mood recognition model
CN110889908A (en) * 2019-12-10 2020-03-17 吴仁超 Intelligent sign-in system integrating face recognition and data analysis
CN110889908B (en) * 2019-12-10 2020-11-27 苏州鱼得水电气科技有限公司 Intelligent sign-in system integrating face recognition and data analysis
CN112507959A (en) * 2020-12-21 2021-03-16 中国科学院心理研究所 Method for establishing emotion perception model based on individual face analysis in video

Also Published As

Publication number Publication date
ES2633152A1 (en) 2017-09-19
ES2633152B1 (en) 2018-05-03

Similar Documents

Publication Publication Date Title
WO2018154098A1 (en) Method and system for recognizing mood by means of image analysis
US10573313B2 (en) Audio analysis learning with video data
Bandini et al. Analysis of facial expressions in parkinson's disease through video-based automatic methods
Girard et al. Spontaneous facial expression in unscripted social interactions can be measured automatically
Bishay et al. Schinet: Automatic estimation of symptoms of schizophrenia from facial behaviour analysis
Rudovic et al. Context-sensitive dynamic ordinal regression for intensity estimation of facial action units
US9547808B2 (en) Head-pose invariant recognition of facial attributes
JP6467965B2 (en) Emotion estimation device and emotion estimation method
EP3740898A1 (en) Systems and methods for evaluating individual, group, and crowd emotion engagement and attention
Griffin et al. Laughter type recognition from whole body motion
Al Osman et al. Multimodal affect recognition: Current approaches and challenges
Miyakoshi et al. Facial emotion detection considering partial occlusion of face using Bayesian network
US20220101146A1 (en) Neural network training with bias mitigation
Khatri et al. Facial expression recognition: A survey
Wilhelm Towards facial expression analysis in a driver assistance system
Alshamsi et al. Automated facial expression and speech emotion recognition app development on smart phones using cloud computing
JP2022553779A (en) Method and device for adjusting environment in cabin
Rudovic et al. 1 Machine Learning Methods for Social Signal Processing
Silva et al. Real-time emotions recognition system
Ponce-López et al. Non-verbal communication analysis in victim–offender mediations
Alugupally et al. Analysis of landmarks in recognition of face expressions
Bakchy et al. Facial expression recognition based on support vector machine using Gabor wavelet filter
Chiarugi et al. Facial Signs and Psycho-physical Status Estimation for Well-being Assessment.
Liliana et al. The fuzzy emotion recognition framework using semantic-linguistic facial features
EP3799407B1 (en) Initiating communication between first and second users

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18710784

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18710784

Country of ref document: EP

Kind code of ref document: A1