CN111816211B - Emotion recognition method and device, storage medium and electronic equipment - Google Patents

Emotion recognition method and device, storage medium and electronic equipment

Info

Publication number
CN111816211B
Authority
CN
China
Prior art keywords
emotion
user
emotion recognition
candidate
text content
Prior art date
Legal status
Active
Application number
CN201910282465.3A
Other languages
Chinese (zh)
Other versions
CN111816211A (en)
Inventor
陈仲铭
何明
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910282465.3A
Publication of CN111816211A
Application granted
Publication of CN111816211B
Active legal-status: Current
Anticipated expiration legal-status

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses an emotion recognition method and device, a storage medium and electronic equipment. The electronic equipment can acquire text content input by a user and perform emotion recognition according to the text content and a first emotion recognition model trained in advance to obtain a first candidate emotion of the user; acquire sound content captured while the user inputs the text content and perform emotion recognition according to the sound content and a second emotion recognition model trained in advance to obtain a second candidate emotion of the user; and finally determine a target emotion of the user according to the first candidate emotion and the second candidate emotion. In this way, the user's emotion is recognized from different information sources, and the target emotion is determined by combining the recognition results obtained from those sources, so that the user's emotion can be recognized accurately.

Description

Emotion recognition method and device, storage medium and electronic equipment
Technical Field
The application relates to the technical field of data processing, in particular to a method and a device for identifying emotion, a storage medium and electronic equipment.
Background
As beings with strong emotional factors, humans experience emotions such as joy, anger, worry, pensiveness, sadness, fear, and surprise. An electronic device can provide intelligent services to a user by recognizing the user's emotion, for example telling the user a joke when the user is recognized as unhappy. However, in the related art, the user's emotion is generally recognized with emotion-dictionary-based methods, for example by detecting emotion words in the text content input by the user and then matching the corresponding emotion against an emotion dictionary; such recognition is not accurate.
Disclosure of Invention
The embodiment of the application provides a method and a device for recognizing emotion, a storage medium and electronic equipment, which can realize accurate recognition of emotion of a user.
In a first aspect, an embodiment of the present application provides an emotion recognition method, applied to an electronic device, where the emotion recognition method includes:
acquiring text content input by a user, and carrying out emotion recognition according to the text content and a first emotion recognition model trained in advance to obtain a first candidate emotion of the user;
acquiring sound content captured while the user inputs the text content, and carrying out emotion recognition according to the sound content and a second emotion recognition model trained in advance to obtain a second candidate emotion of the user;
and determining the target emotion of the user according to the first candidate emotion and the second candidate emotion.
In a second aspect, an embodiment of the present application provides an emotion recognition device, applied to an electronic device, where the emotion recognition device includes:
the first emotion recognition module is used for acquiring text content input by a user, and carrying out emotion recognition according to the text content and a first emotion recognition model trained in advance to obtain a first candidate emotion of the user;
The second emotion recognition module is used for acquiring sound content captured while the user inputs the text content, and carrying out emotion recognition according to the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;
and the target emotion recognition module is used for determining the target emotion of the user according to the first candidate emotion and the second candidate emotion.
In a third aspect, embodiments of the present application provide a storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the steps in the emotion recognition method as provided by embodiments of the present application.
In a fourth aspect, embodiments of the present application provide an electronic device, including a processor and a memory, where the memory has a computer program, and the processor is configured to execute steps in the emotion recognition method as provided in the embodiments of the present application by calling the computer program.
In the embodiment of the application, the electronic device may acquire text content input by the user and perform emotion recognition according to the text content and the first emotion recognition model trained in advance to obtain a first candidate emotion of the user, then acquire sound content captured while the user inputs the text content and perform emotion recognition according to the sound content and the second emotion recognition model trained in advance to obtain a second candidate emotion of the user, and finally determine a target emotion of the user according to the first candidate emotion and the second candidate emotion. The user's emotion is thus recognized from different information sources, and the target emotion is determined by combining the recognition results obtained from those sources, so that the user's emotion can be recognized accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a panoramic sensing architecture according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of an emotion recognition method according to an embodiment of the present application.
Fig. 3 is another flow chart of an emotion recognition method according to an embodiment of the present application.
Fig. 4 is an application scenario schematic diagram of an emotion recognition method provided in an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an emotion recognition device according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 7 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements throughout, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on the illustrated embodiments of the present application and should not be taken as limiting other embodiments not described in detail herein.
With the miniaturization and increasing intelligence of sensors, electronic devices such as mobile phones and tablet computers integrate more and more sensors, such as light sensors, distance sensors, position sensors, acceleration sensors, gravity sensors, and the like. Through its configured sensors, an electronic device can collect more data with less power consumption. Meanwhile, during operation the electronic device can acquire data related to its own state, data related to the state of the user, and the like. In general, the electronic device can acquire data related to the external environment (such as temperature, illumination, location, sound, and weather), data related to the user's state (such as gestures, speed, mobile phone usage habits, and basic personal information), and data related to the state of the electronic device itself (such as power consumption, resource usage, and network status).
In the embodiment of the application, in order to process the data acquired by the electronic device and provide intelligent services for the user, a panoramic sensing architecture is provided. Referring to fig. 1, fig. 1 is a schematic structural diagram of the panoramic sensing architecture provided in the embodiment of the present application; it is applied to an electronic device and includes, from bottom to top, an information sensing layer, a data processing layer, a feature extraction layer, a scenario modeling layer, and an intelligent service layer.
As the bottom layer of the panoramic sensing architecture, the information sensing layer is used for acquiring raw data capable of describing the user's various scenarios, including dynamic data and static data. The information sensing layer is composed of a plurality of sensors for data acquisition, including but not limited to a distance sensor for detecting the distance between the electronic equipment and an external object, a magnetic field sensor for detecting magnetic field information of the environment where the electronic equipment is located, a light sensor for detecting light information of that environment, an acceleration sensor for detecting acceleration data of the electronic equipment, a fingerprint sensor for acquiring fingerprint information of the user, a Hall sensor for sensing magnetic field information, a position sensor for detecting the current geographic position of the electronic equipment, a gyroscope for detecting the angular velocity of the electronic equipment in all directions, an inertial sensor for detecting motion data of the electronic equipment, an attitude sensor for sensing attitude information of the electronic equipment, a barometer for detecting the air pressure of the environment where the electronic equipment is located, a heart rate sensor for detecting heart rate information of the user, and the like.
As the second-lowest layer of the panoramic sensing architecture, the data processing layer is used for processing the raw data acquired by the information sensing layer to eliminate problems such as noise and inconsistency in the raw data. The data processing layer can perform data cleaning, data integration, data transformation, data reduction and the like on the data acquired by the information sensing layer.
As the middle layer of the panoramic sensing architecture, the feature extraction layer is used for extracting features from the data processed by the data processing layer. The feature extraction layer may extract or select features by filter, wrapper, or integration methods, or further process the extracted features.
The filter method screens the extracted features to delete redundant feature data. The wrapper method selects the extracted features with respect to a specific model. The integration method combines multiple feature extraction methods to construct a more efficient and accurate feature extraction method.
As the next layer up in the panoramic sensing architecture, the scenario modeling layer is configured to construct a model from the features extracted by the feature extraction layer; the obtained model may be used to represent the state of the electronic device, the state of the user, the state of the environment, and the like. For example, the scenario modeling layer may construct a key-value model, a pattern recognition model, a graph model, an entity-relationship model, an object-oriented model, and the like from the features extracted by the feature extraction layer.
As the highest layer of the panoramic sensing architecture, the intelligent service layer is used for providing intelligent service according to the model constructed by the scene modeling layer. For example, the intelligent service layer can provide basic application service for users, can perform system intelligent optimization service for electronic equipment, and can provide personalized intelligent service for users.
In addition, the panoramic sensing architecture further comprises an algorithm library, which includes, but is not limited to, a hidden Markov model algorithm, a latent Dirichlet allocation algorithm, a Bayesian classification algorithm, a support vector machine, a K-means clustering algorithm, a K-nearest-neighbor algorithm, a conditional random field, a residual network, a long short-term memory network, a convolutional neural network, a recurrent neural network, and the like.
Based on the panoramic sensing architecture provided by the embodiment of the present application, the embodiment of the present application provides an emotion recognition method, and an execution subject of the emotion recognition method may be an emotion recognition device provided by the embodiment of the present application, or an electronic device integrated with the emotion recognition device, where the emotion recognition device may be implemented in a hardware or software manner. The electronic device may be a device with a processing capability, such as a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer, configured with a processor.
Based on the emotion recognition method provided by the embodiment of the application, panoramic data can be acquired at the information sensing layer and provided to the data processing layer; the data processing layer screens out, from the panoramic data, the text content input by the user and the sound content captured while the user inputs the text content, and provides them to the feature extraction layer; the feature extraction layer extracts features from the text content and the sound content respectively to obtain a feature vector corresponding to the text content and a feature vector corresponding to the sound content, and provides the feature vectors to the scenario modeling layer; the scenario modeling layer performs emotion recognition on the feature vector corresponding to the text content and on the feature vector corresponding to the sound content to obtain a first candidate emotion corresponding to the text content and a second candidate emotion corresponding to the sound content, determines a target emotion of the user according to the first candidate emotion and the second candidate emotion, and provides the target emotion to the intelligent service layer; and the intelligent service layer executes corresponding operations according to the target emotion of the user, for example sending the text content and the target emotion to a corresponding target device, so that another user can view the text content input by the user and know the user's emotion when the text content was input, which facilitates better communication.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for emotion recognition according to an embodiment of the present application. As shown in fig. 2, the flow of the emotion recognition method provided in the embodiment of the present application may be as follows:
in 101, text content input by a user is acquired, and emotion recognition is performed according to the text content and a first emotion recognition model trained in advance, so as to obtain a first candidate emotion of the user.
In the embodiment of the application, the electronic device monitors the user's input and triggers emotion recognition of the user when it detects that the user has input text content. Text content includes, but is not limited to, words, sentences, articles, and the like. For example, while the user is conversing with other users through an instant messaging application installed on the electronic device, the electronic device triggers emotion recognition of the user when it detects the chat text input by the user.
When emotion recognition of the user is triggered, the electronic device first acquires the text content input by the user so as to preliminarily recognize the user's emotion from the text content. It should be noted that, in the embodiment of the present application, a first emotion recognition model for recognizing the user's emotion from the text content input by the user is trained in advance. For example, an initial convolutional neural network model is built in advance, text content samples are acquired, the emotion corresponding to each text content sample is calibrated to obtain a corresponding emotion label, the initial convolutional neural network is trained according to the text content samples and the calibrated emotion labels, and the trained convolutional neural network serves as the first emotion recognition model for performing emotion recognition on the user's text content. The first emotion recognition model may be stored locally on the electronic device or in a remote server. Accordingly, after acquiring the text content input by the user, the electronic device obtains the first emotion recognition model either locally or from the remote server. After the first emotion recognition model is obtained, the electronic device uses it to perform emotion recognition on the text content input by the user, and the emotion recognized at this point is recorded as the first candidate emotion of the user.
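By way of illustration only, a minimal sketch of such a first emotion recognition model is given below, assuming TensorFlow/Keras; the fixed sequence length, embedding dimension, emotion label set and layer sizes are illustrative assumptions rather than values specified by this application.

```python
# Minimal sketch of a convolutional text-emotion classifier (assumed TensorFlow/Keras).
# MAX_LEN, EMBED_DIM, NUM_EMOTIONS and all layer sizes are illustrative assumptions.
import numpy as np
import tensorflow as tf

MAX_LEN = 32        # fixed keyword-sequence length after zero filling
EMBED_DIM = 64      # dimension of each word-embedding feature vector
NUM_EMOTIONS = 7    # e.g. joy, anger, worry, pensiveness, sadness, fear, surprise

def build_first_emotion_model() -> tf.keras.Model:
    """CNN that maps a (MAX_LEN, EMBED_DIM) feature tensor to an emotion label."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(64, 3, activation="relu",
                               input_shape=(MAX_LEN, EMBED_DIM)),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Supervised training on calibrated samples, then recognition on new text content:
# model = build_first_emotion_model()
# model.fit(x_train, y_train, epochs=5, batch_size=32)   # x_train: feature tensors
# first_candidate = int(np.argmax(model.predict(x_new), axis=-1)[0])
```

A one-dimensional convolution over the padded word-embedding tensor is only one reasonable choice; the embodiment merely requires a convolutional neural network trained on calibrated text samples.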
In 102, sound content during text content input by a user is acquired, and emotion recognition is performed according to the sound content and a second emotion recognition model trained in advance, so as to obtain a second candidate emotion of the user.
It should be noted that, in the embodiment of the present application, the electronic device performs emotion recognition not only according to the text content input by the user but also according to the user's voice. Correspondingly, a second emotion recognition model for recognizing the user's emotion from the user's sound content is also trained in advance. For example, an initial convolutional neural network model is built in advance, sound content samples are acquired, the emotion corresponding to each sound content sample is calibrated to obtain a corresponding emotion label, the initial convolutional neural network is trained in a supervised manner according to the sound content samples and the calibrated emotion labels, and the trained convolutional neural network serves as the second emotion recognition model for performing emotion recognition on the user's sound content. The second emotion recognition model may likewise be stored locally on the electronic device or in a remote server. Accordingly, after acquiring the sound content captured while the user inputs the text content, the electronic device obtains the second emotion recognition model either locally or from the remote server, uses it to perform emotion recognition on that sound content, and records the emotion recognized at this point as the second candidate emotion of the user.
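Analogously, a minimal sketch of the second emotion recognition model is given below, assuming TensorFlow/Keras and assuming each sound content sample has already been converted into a fixed-size spectrogram as described in the later embodiments; the input shape and layer sizes are illustrative assumptions.

```python
# Minimal sketch of a convolutional sound-emotion classifier (assumed TensorFlow/Keras).
# The spectrogram input shape and layer sizes are illustrative assumptions.
import tensorflow as tf

SPEC_SHAPE = (128, 128, 1)   # assumed (mel bins, time frames, channels)
NUM_EMOTIONS = 7

def build_second_emotion_model() -> tf.keras.Model:
    """CNN that maps one spectrogram 'image' to an emotion label."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=SPEC_SHAPE),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```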
When the electronic device detects that the user is inputting text content, it turns on an internal or external microphone to collect sound, thereby capturing the user's sound content while the text content is being input. In this way, when performing emotion recognition according to the user's voice, the electronic device can directly use the previously captured sound content.
In 103, a target emotion of the user is determined from the first candidate emotion and the second candidate emotion.
From the above description, it will be appreciated by those skilled in the art that the first candidate emotion and the second candidate emotion obtained above are obtained through independent information sources, and therefore, in order to ensure accuracy of emotion recognition of the user, the electronic device further performs comprehensive analysis according to the first candidate emotion and the second candidate emotion, and finally determines the target emotion of the user.
As can be seen from the above, in the embodiment of the present application, the electronic device may acquire text content input by the user and perform emotion recognition according to the text content and the pre-trained first emotion recognition model to obtain a first candidate emotion of the user; acquire the sound content captured while the user inputs the text content and perform emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain a second candidate emotion of the user; and finally determine a target emotion of the user according to the first candidate emotion and the second candidate emotion. The user's emotion is thus recognized from different information sources, and the target emotion is determined by combining the recognition results obtained from those sources, so that the user's emotion can be recognized accurately.
In an embodiment, "determining a target emotion of a user from a first candidate emotion and a second candidate emotion" includes:
inputting the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier for classification, and obtaining the target emotion of the user output by the Bayesian classifier.
Since the first candidate emotion and the second candidate emotion are obtained from independent information sources, in order to combine them into the target emotion of the user, a Bayesian classifier for performing secondary emotion classification on the candidate emotion from the text source and the candidate emotion from the sound source is also trained in advance in the embodiment of the application. For example, emotion samples from a text source and the corresponding emotion samples from a sound source can be obtained, each pair of samples is calibrated with the corresponding emotion label, and the Bayesian classifier is obtained by training on the sample pairs and their emotion labels.
The trained Bayesian classifier may be stored locally on the electronic device or in a remote server. Thus, when determining the target emotion of the user according to the first candidate emotion and the second candidate emotion, the electronic device obtains the Bayesian classifier locally or from the remote server, inputs the previously obtained first candidate emotion (from the text source) and second candidate emotion (from the sound source) into the Bayesian classifier for classification, and takes the emotion output by the Bayesian classifier as the target emotion finally recognized for the user.
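For illustration, the fusion step could look like the following minimal sketch, assuming scikit-learn; the emotion label set, the integer encoding and the toy training pairs are illustrative assumptions.

```python
# Minimal sketch of fusing the two candidate emotions with a naive Bayes classifier
# (assumed scikit-learn). EMOTIONS and the toy training pairs are illustrative.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

EMOTIONS = ["joy", "anger", "worry", "sadness", "fear"]
IDX = {e: i for i, e in enumerate(EMOTIONS)}

# Each training row is (text-source candidate, sound-source candidate);
# the target is the calibrated emotion label for that pair.
X_train = np.array([[IDX["joy"], IDX["joy"]],
                    [IDX["joy"], IDX["worry"]],
                    [IDX["anger"], IDX["anger"]],
                    [IDX["sadness"], IDX["fear"]]])
y_train = np.array([IDX["joy"], IDX["worry"], IDX["anger"], IDX["sadness"]])

clf = CategoricalNB(min_categories=len(EMOTIONS)).fit(X_train, y_train)

def target_emotion(first_candidate: str, second_candidate: str) -> str:
    """Classify the (first, second) candidate pair into the user's target emotion."""
    pred = clf.predict([[IDX[first_candidate], IDX[second_candidate]]])[0]
    return EMOTIONS[pred]

# e.g. target_emotion("joy", "worry") -> emotion output by the Bayesian classifier
```

Because the two candidates come from independent sources, a naive Bayes classifier over the pair of categorical predictions is a natural fit for this secondary classification.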
In one embodiment, "performing emotion recognition according to text content and a first emotion recognition model trained in advance, obtaining a first candidate emotion of a user" includes:
(1) Extracting the characteristics of the text content to obtain corresponding characteristic vectors;
(2) And converting the feature vector into a corresponding feature tensor, inputting the feature tensor into the first emotion recognition model for emotion recognition, and obtaining a first candidate emotion of the user output by the first emotion recognition model.
In the embodiment of the application, when performing emotion recognition according to the text content and the first emotion recognition model trained in advance, the electronic device does not directly input the original text content into the first emotion recognition model to perform prediction, but inputs the characteristics capable of representing the original text content into the first emotion recognition model to perform emotion recognition after processing the original text content.
The electronic equipment firstly adopts a preset feature extraction technology to extract features of text contents input by a user, converts the text contents into corresponding vectors, and records the corresponding vectors as feature vectors. The electronic device then further combines the feature vectors of the corresponding text content into tensors, denoted feature tensors.
It should be noted that, like vectors and matrices, a tensor is a data structure, but a tensor is a data structure of three or more dimensions, where the number of dimensions is referred to as the order of the tensor. A tensor can be considered as the generalization of vectors and matrices to a multidimensional space: a vector is a first-order tensor and a matrix is a second-order tensor.
Correspondingly, during training, the first emotion recognition model is not trained on the original text content samples; instead, each text content sample is converted into a corresponding feature tensor in the same way, and the model is trained on the feature tensors and the calibrated emotion labels. Thus, after the feature vectors are converted into the corresponding feature tensor, the feature tensor can be input into the first emotion recognition model for emotion recognition to obtain the first candidate emotion of the user output by the first emotion recognition model.
In one embodiment, "extracting features of text content to obtain corresponding feature vectors" includes:
and extracting keywords included in the text content, and mapping the keywords to a vector space through a word embedding model to obtain feature vectors.
It will be appreciated by those of ordinary skill in the art that not all of the text content entered by the user is meaningful, and that extracting features from the complete text content would reduce the overall efficiency of emotion recognition. Therefore, when extracting features from the text content input by the user, the electronic device in the embodiment of the application uses a preset keyword extraction algorithm to extract keywords from the text content, and the extracted keywords are used to represent the complete text content, which reduces the amount of content requiring feature extraction and improves emotion recognition efficiency. The embodiment of the application does not specifically limit which keyword extraction algorithm is used, and one of ordinary skill in the art can select a suitable keyword extraction algorithm according to actual needs. For example, the electronic device may use the TF-IDF algorithm to extract keywords from the text content input by the user; assuming the text content input by the user is the sentence "I miss you today", keyword extraction with TF-IDF yields the keywords "today" and "miss you".
After extracting keywords capable of representing the text content, the electronic device further maps the extracted keywords to a vector space through a word embedding model to obtain the feature vectors corresponding to the text content. Word embedding models include, but are not limited to, word2vec models, GloVe models, fastText models, ELMo models, and the like.
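A minimal sketch of this keyword-extraction and embedding step is given below, assuming scikit-learn for the TF-IDF weighting and gensim for the word2vec embedding; the toy corpus, the tokenization and the vector size are illustrative assumptions.

```python
# Minimal sketch: TF-IDF keyword extraction followed by word2vec embedding
# (assumed scikit-learn + gensim). The toy corpus and parameters are illustrative.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

corpus = [["today", "miss", "you"],            # already-tokenised text contents
          ["work", "was", "tiring", "today"]]

# 1. TF-IDF weights select the keywords that represent the full text content.
vectorizer = TfidfVectorizer(analyzer=lambda tokens: tokens)  # tokens passed through as-is
tfidf = vectorizer.fit_transform(corpus)
vocab = np.array(vectorizer.get_feature_names_out())

def top_keywords(doc_index, k=2):
    weights = tfidf[doc_index].toarray().ravel()
    return vocab[np.argsort(weights)[::-1][:k]].tolist()

# 2. A word embedding model maps each keyword into the vector space.
w2v = Word2Vec(sentences=corpus, vector_size=64, window=2, min_count=1)
keyword_vectors = np.stack([w2v.wv[w] for w in top_keywords(0)])  # one row per keyword
```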
In an embodiment, before the inputting the feature tensor into the first emotion recognition model for emotion recognition, the method further includes:
and performing zero filling processing on the characteristic tensor.
As will be appreciated by those of ordinary skill in the art, because the text content input by the user differs in length from one input to the next, the amount of data in the feature tensor obtained for each text content also differs, and the feature tensor cannot be internally aligned. Therefore, before inputting the feature tensor into the first emotion recognition model for emotion recognition, the electronic device in the embodiment of the application performs zero filling on the feature tensor so that the feature tensor is internally aligned and its amount of data reaches a preset amount.
Correspondingly, when the first emotion recognition model is trained, the feature tensor corresponding to each text content sample is zero-filled in the same way, so that the feature tensor is internally aligned and its amount of data reaches the preset amount.
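A minimal sketch of the zero-filling step is given below, assuming NumPy; the preset length is an illustrative assumption and must match the length used when the first emotion recognition model was trained.

```python
# Minimal sketch of zero-filling a feature tensor to a preset, aligned size (assumed NumPy).
import numpy as np

MAX_LEN = 32   # preset data volume: every feature tensor is padded to this many rows

def pad_feature_tensor(keyword_vectors: np.ndarray) -> np.ndarray:
    """Pad (or truncate) a (num_keywords, embed_dim) array to (MAX_LEN, embed_dim)."""
    keyword_vectors = keyword_vectors[:MAX_LEN]
    missing_rows = MAX_LEN - keyword_vectors.shape[0]
    return np.pad(keyword_vectors, ((0, missing_rows), (0, 0)), mode="constant")

# padded = pad_feature_tensor(keyword_vectors)      # internally aligned sample
# batch = padded[np.newaxis, ...]                   # add a batch axis before inference
```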
In an embodiment, "performing emotion recognition according to the sound content and the second emotion recognition model trained in advance, obtaining the second candidate emotion of the user" includes:
(1) Dividing the sound content into a plurality of sub-sound contents;
(2) Respectively inputting the plurality of sub-sound contents into a second emotion recognition model to perform emotion recognition to obtain a plurality of corresponding candidate emotions;
(3) A second candidate emotion of the user is determined from the plurality of candidate emotions.
In this embodiment of the present application, when performing emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain the second candidate emotion of the user, the electronic device first divides the sound content captured while the user input the text content into a plurality of sub-sound contents of the same length. When dividing the sound content, two adjacent sub-sound contents may or may not overlap, that is, they may or may not share a common sound portion.
After dividing the complete sound content into a plurality of sub-sound contents, the electronic device converts each of the sub-sound contents into a corresponding spectrogram and uses the spectrogram to represent that sub-sound content; for example, the spectrograms can be obtained by means of a fast Fourier transform or Mel-frequency cepstral coefficients.
Correspondingly, the second emotion recognition model is not trained on the original sound content samples; instead, each sound content sample is converted into a corresponding spectrogram, and the model is trained on the spectrograms and the calibrated emotion labels. In this way, after converting the sub-sound contents obtained by division into corresponding spectrograms, the electronic device inputs each spectrogram into the second emotion recognition model for emotion recognition and obtains the candidate emotion output by the second emotion recognition model for that sub-sound content.
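For illustration, the spectrogram conversion could be implemented as in the following minimal sketch, assuming the librosa library; the sampling rate, FFT size and number of mel bands are illustrative assumptions.

```python
# Minimal sketch of converting one sub-sound content into a spectrogram (assumed librosa).
import numpy as np
import librosa

def to_spectrogram(samples: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Return a log-mel spectrogram (an FFT-based time-frequency 'image') for one segment."""
    mel = librosa.feature.melspectrogram(y=samples, sr=sr,
                                         n_fft=512, hop_length=256, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)

# Mel-frequency cepstral coefficients, also mentioned above, are an alternative:
# mfcc = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=40)
```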
The electronic device then determines the second candidate emotion of the user from the plurality of candidate emotions. For example, the electronic device may determine whether the proportion of identical candidate emotions among all the candidate emotions reaches a preset proportion; if so, that candidate emotion may be determined as the second candidate emotion of the user.
It should be noted that the embodiment of the present application does not specifically limit the value of the preset proportion, which may be set by a person skilled in the art according to actual needs; for example, the preset proportion is set to 60% in the embodiment of the application. For instance, suppose the sound content is divided into 5 sub-sound contents, the 5 sub-sound contents are converted into corresponding spectrograms, and emotion recognition through the second emotion recognition model yields 5 candidate emotions; if 3 of the 5 candidate emotions are identical, say all 3 are "happy", the proportion reaches 60%, and "happy" is therefore determined as the second candidate emotion of the user.
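A minimal sketch of the segmentation and voting logic is given below; the segment length, the 60% preset proportion and the predict_emotion() helper (which would wrap the second emotion recognition model) are illustrative assumptions.

```python
# Minimal sketch of splitting the sound content and voting on the per-segment results.
# segment_len, overlap, preset_ratio and predict_emotion are illustrative assumptions.
from collections import Counter
import numpy as np

def split_sound(samples: np.ndarray, segment_len: int, overlap: int = 0):
    """Divide the recording into equal-length sub-sound contents (optionally overlapping)."""
    step = segment_len - overlap
    return [samples[i:i + segment_len]
            for i in range(0, len(samples) - segment_len + 1, step)]

def second_candidate_emotion(samples, segment_len, predict_emotion, preset_ratio=0.6):
    """Keep an emotion as the second candidate only if it covers >= preset_ratio of segments."""
    candidates = [predict_emotion(seg) for seg in split_sound(samples, segment_len)]
    emotion, count = Counter(candidates).most_common(1)[0]
    return emotion if count / len(candidates) >= preset_ratio else None
```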
In an embodiment, after "determining the target emotion of the user according to the first candidate emotion and the second candidate emotion", further comprising:
and sending the text content and the target emotion to the corresponding target equipment.
It will be appreciated by those of ordinary skill in the art that when a user communicates with other users in text via an electronic device, neither side can perceive the other's emotion. For this reason, in the embodiment of the present application, when the text content input by the user is used for communicating with another user, the text content and the recognized target emotion are sent to the corresponding target device, that is, the electronic device of the other user communicating with the user. In this way, the other user can both view the text content input by the user and know the user's emotion when the text content was input, which facilitates better communication.
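By way of illustration only, the text content and target emotion could be packaged for the target device as in the sketch below; the message fields and the send_to_device() transport call are hypothetical and not defined by this application.

```python
# Minimal sketch of packaging the text content and target emotion for the target device.
# The field names and send_to_device() are hypothetical assumptions.
import json

def build_emotion_message(text_content: str, target_emotion: str) -> str:
    return json.dumps({"text": text_content, "emotion": target_emotion},
                      ensure_ascii=False)

# payload = build_emotion_message("I miss you today", "joy")
# send_to_device(target_device_id, payload)   # hypothetical transport to the other user's device
```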
Referring to fig. 3 and fig. 4 in combination, fig. 3 is another flow chart of the emotion recognition method provided in the embodiment of the present application, and fig. 4 is an application scenario diagram of the emotion recognition method. The emotion recognition method may be applied to an electronic device, and the flow of the emotion recognition method may include:
In 201, the electronic device obtains text content entered by a user while the user communicates with other users.
It will be appreciated by those of ordinary skill in the art that when a user communicates with other users in text via an electronic device, neither side can perceive the other's emotion. Therefore, in the embodiment of the application, when the text content input by the user is used for communicating with other users, the user's emotion can be recognized, and the text content input by the user together with the recognized emotion can be sent to the other users' electronic devices, helping the user communicate better with others.
The electronic device first identifies whether the user is communicating with other users, for example by identifying whether the application running in the foreground is a communication application (such as an instant messaging application or a short message application); if so, the electronic device determines that the user is communicating with other users. When the electronic device recognizes that the user is communicating with other users, it monitors the user's input and triggers emotion recognition of the user when it detects that the user has input text content. Text content includes, but is not limited to, words, sentences, articles, and the like.
In 202, the electronic device extracts keywords included in the text content, and maps the extracted keywords to a vector space through a word embedding model, so as to obtain corresponding feature vectors.
In 203, the electronic device converts the feature vector into a corresponding feature tensor, and inputs the feature tensor into a first emotion recognition model trained in advance to perform emotion recognition, so as to obtain a first candidate emotion of the user.
It should be noted that, in the embodiment of the present application, a first emotion recognition model for recognizing the emotion of the user from the text content input by the user is trained in advance. Thus, after the electronic device acquires the text content input by the user, the electronic device can primarily recognize the emotion of the user according to the text content and the first emotion recognition model trained in advance.
The electronic device first uses a preset keyword extraction algorithm to extract keywords from the text content, and the extracted keywords are used to represent the complete text content. The embodiment of the application does not specifically limit which keyword extraction algorithm is used, and a person skilled in the art can select a suitable keyword extraction algorithm according to actual needs. For example, the electronic device may use the TF-IDF algorithm to extract keywords from the text content input by the user; assuming the text content input by the user is the sentence "I miss you today", keyword extraction with TF-IDF yields the keywords "today" and "miss you".
After extracting keywords capable of representing the text content, the electronic device further maps the extracted keywords to a vector space through a word embedding model to obtain the feature vectors corresponding to the text content. Word embedding models include, but are not limited to, word2vec models, GloVe models, fastText models, ELMo models, and the like.
The electronic device then further combines the feature vectors corresponding to the text content into a tensor, denoted a feature tensor. Like vectors and matrices, a tensor is a data structure, but a tensor has three or more dimensions, where the number of dimensions is referred to as the order of the tensor; a tensor can be regarded as the generalization of vectors and matrices to a multidimensional space, a vector being a first-order tensor and a matrix a second-order tensor. After the feature vectors are converted into the corresponding feature tensor, the feature tensor can be input into the first emotion recognition model for emotion recognition to obtain the first candidate emotion of the user output by the first emotion recognition model.
In 204, the electronic device obtains sound content during user input of text content.
It should be noted that, in the embodiment of the present application, the electronic device performs emotion recognition not only according to the text content input by the user but also according to the user's voice. When the electronic device detects that the user is inputting text content, it turns on an internal or external microphone to collect sound, thereby capturing the user's sound content while the text content is being input. In this way, the electronic device can directly use the previously captured sound content from the period during which the user input the text content.
In 205, the electronic device performs emotion recognition according to the foregoing sound content and the second emotion recognition model trained in advance, to obtain a second candidate emotion of the user.
In the embodiment of the application, a second emotion recognition model for recognizing the user's emotion from the user's sound content is trained in advance. The electronic device converts the sound content into a corresponding spectrogram and uses the spectrogram to represent the sound content; for example, the spectrogram may be obtained by means of a fast Fourier transform or Mel-frequency cepstral coefficients.
After the electronic equipment converts the sound content into the corresponding spectrogram, the spectrogram obtained through conversion is input into a second emotion recognition model for emotion recognition, and a second candidate emotion of the user output by the second emotion recognition model is obtained.
In 206, the electronic device inputs the first candidate emotion and the second candidate emotion into a pre-trained bayesian classifier to classify, and obtains a target emotion of the user output by the bayesian classifier.
From the above description, it will be appreciated by those skilled in the art that the first candidate emotion and the second candidate emotion obtained above are obtained through independent information sources, and therefore, in order to ensure accuracy of emotion recognition of the user, the electronic device further performs comprehensive analysis according to the first candidate emotion and the second candidate emotion, and finally determines the target emotion of the user.
In the embodiment of the application, a Bayesian classifier for performing secondary emotion classification on the candidate emotion from the text source and the candidate emotion from the sound source is also trained in advance. For example, emotion samples from a text source and the corresponding emotion samples from a sound source can be obtained, each pair of samples is calibrated with the corresponding emotion label, and the Bayesian classifier is obtained by training on the sample pairs and their emotion labels.
The trained Bayesian classifier may be stored locally on the electronic device or in a remote server. Thus, when determining the target emotion of the user according to the first candidate emotion and the second candidate emotion, the electronic device obtains the Bayesian classifier locally or from the remote server, inputs the previously obtained first candidate emotion (from the text source) and second candidate emotion (from the sound source) into the Bayesian classifier for classification, and takes the emotion output by the Bayesian classifier as the target emotion finally recognized for the user.
At 207, the electronic device transmits the aforementioned text content and the target emotion to the electronic devices of other users.
After the electronic equipment recognizes the target emotion of the user, the text content input by the user and the recognized target emotion are sent to the corresponding target equipment, namely the electronic equipment of other users communicating with the user. Therefore, other users can view text content input by the user, and can know emotion of the user when inputting the text content, so that better communication is facilitated.
The embodiment of the application also provides an emotion recognition device. Referring to fig. 5, fig. 5 is a schematic structural diagram of an emotion recognition device according to an embodiment of the present application. The emotion recognition device is applied to an electronic device and comprises a first emotion recognition module 301, a second emotion recognition module 302 and a target emotion recognition module 303, as follows:
the first emotion recognition module 301 is configured to obtain text content input by a user, and perform emotion recognition according to the text content and a first emotion recognition model trained in advance, so as to obtain a first candidate emotion of the user;
a second emotion recognition module 302, configured to obtain sound content during text content input by a user, and perform emotion recognition according to the sound content and a second emotion recognition model trained in advance, so as to obtain a second candidate emotion of the user;
The target emotion recognition module 303 is configured to determine a target emotion of the user according to the first candidate emotion and the second candidate emotion.
In an embodiment, in determining the target emotion of the user from the first candidate emotion and the second candidate emotion, the target emotion recognition module 303 may be configured to:
inputting the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier for classification, and obtaining the target emotion of the user output by the Bayesian classifier.
In an embodiment, when performing emotion recognition according to text content and a first emotion recognition model trained in advance, to obtain a first candidate emotion of the user, the first emotion recognition module 301 may be configured to:
extracting the characteristics of the text content to obtain corresponding characteristic vectors;
and converting the feature vector into a corresponding feature tensor, inputting the feature tensor into the first emotion recognition model for emotion recognition, and obtaining a first candidate emotion of the user output by the first emotion recognition model.
In an embodiment, when extracting features of text content to obtain corresponding feature vectors, the first emotion recognition module 301 may be configured to:
and extracting keywords included in the text content, and mapping the keywords to a vector space through a word embedding model to obtain feature vectors.
In an embodiment, before inputting the feature tensor into the first emotion recognition model for emotion recognition, the method further comprises:
and performing zero filling processing on the characteristic tensor.
In an embodiment, when performing emotion recognition according to the sound content and the second emotion recognition model trained in advance, to obtain the second candidate emotion of the user, the second emotion recognition module 302 may be configured to:
dividing the sound content into a plurality of sub-sound contents;
respectively inputting the plurality of sub-sound contents into a second emotion recognition model to perform emotion recognition to obtain a plurality of corresponding candidate emotions;
a second candidate emotion of the user is determined from the plurality of candidate emotions.
In an embodiment, the emotion recognition device further comprises a content sending module, configured to send the text content and the target emotion to the corresponding target device after determining the target emotion of the user according to the first candidate emotion and the second candidate emotion.
It should be noted that, the emotion recognition device provided in the embodiment of the present application and the emotion recognition method in the foregoing embodiment belong to the same concept, and any method provided in the emotion recognition method embodiment may be run on the emotion recognition device, and detailed implementation processes of the method embodiment are shown in the emotion recognition method embodiment and will not be repeated herein.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed on a computer, causes the computer to perform the steps in the emotion recognition method as provided in the present embodiment. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the processor executes the steps in the emotion recognition method provided by the embodiment by calling the computer program stored in the memory.
In an embodiment, an electronic device is also provided. Referring to fig. 6, the electronic device includes a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.
The processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by running or loading computer programs stored in the memory 402, and calling data stored in the memory 402.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the computer programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a computer program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
In the embodiment of the present application, the processor 401 in the electronic device loads the instructions corresponding to the processes of one or more computer programs into the memory 402 according to the following steps, and the processor 401 executes the computer programs stored in the memory 402, so as to implement various functions, as follows:
Acquiring text content input by a user, and carrying out emotion recognition according to the text content and a first emotion recognition model trained in advance to obtain a first candidate emotion of the user;
acquiring sound content during text content input by a user, and carrying out emotion recognition according to the sound content and a second emotion recognition model trained in advance to obtain a second candidate emotion of the user;
and determining the target emotion of the user according to the first candidate emotion and the second candidate emotion.
Referring to fig. 7, fig. 7 is another schematic structural diagram of an electronic device according to an embodiment of the present application, which is different from the electronic device shown in fig. 6 in that the electronic device further includes an input unit 403, an output unit 404, and other components.
The input unit 403 may be used to receive input numbers, character information or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical or trackball signal inputs, etc. in connection with user settings and function control.
The output unit 404 may be used to display information input by the user or information provided to the user, and may be, for example, a display screen.
In the embodiment of the present application, the processor 401 in the electronic device loads the instructions corresponding to the processes of one or more computer programs into the memory 402 according to the following steps, and the processor 401 executes the computer programs stored in the memory 402, so as to implement various functions, as follows:
Acquiring text content input by a user, and carrying out emotion recognition according to the text content and a first emotion recognition model trained in advance to obtain a first candidate emotion of the user;
acquiring sound content during text content input by a user, and carrying out emotion recognition according to the sound content and a second emotion recognition model trained in advance to obtain a second candidate emotion of the user;
and determining the target emotion of the user according to the first candidate emotion and the second candidate emotion.
In an embodiment, in determining the target emotion of the user from the first and second candidate emotions, the processor 401 may perform:
inputting the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier for classification, and obtaining the target emotion of the user output by the Bayesian classifier.
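As a non-limiting illustration of this fusion step, the following Python sketch combines a text-based candidate emotion and a voice-based candidate emotion with a naive Bayes classifier. The emotion label set, the toy training pairs, and the use of scikit-learn's CategoricalNB are assumptions made for the example, not details taken from the embodiment.

# Hypothetical sketch: fuse the text-based and voice-based candidate emotions
# with a naive Bayes classifier. Labels and training data are illustrative only.
from sklearn.naive_bayes import CategoricalNB

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # assumed label set
to_idx = {e: i for i, e in enumerate(EMOTIONS)}

# Toy training set: (text candidate, voice candidate) -> annotated target emotion
X_train = [["happy", "happy"], ["sad", "neutral"], ["angry", "angry"], ["neutral", "sad"]]
y_train = ["happy", "sad", "angry", "sad"]

clf = CategoricalNB()
clf.fit([[to_idx[a], to_idx[b]] for a, b in X_train], y_train)

def fuse(first_candidate: str, second_candidate: str) -> str:
    """Return the target emotion for one (text, voice) candidate pair."""
    return clf.predict([[to_idx[first_candidate], to_idx[second_candidate]]])[0]

print(fuse("happy", "neutral"))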
In one embodiment, when performing emotion recognition according to the text content and the pre-trained first emotion recognition model to obtain the first candidate emotion of the user, the processor 401 may perform:
extracting the characteristics of the text content to obtain corresponding characteristic vectors;
and converting the feature vector into a corresponding feature tensor, inputting the feature tensor into the first emotion recognition model for emotion recognition, and obtaining a first candidate emotion of the user output by the first emotion recognition model.
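For illustration only, the following Python sketch shows one possible shape of this step: keyword feature vectors are stacked into a three-dimensional feature tensor and passed to a small text emotion model. The model architecture, the embedding dimension, and the number of emotion classes are assumptions, not taken from the embodiment.

# Assumed dimensions and architecture; the patent does not specify the model.
import torch
import torch.nn as nn

EMBED_DIM, MAX_KEYWORDS, NUM_EMOTIONS = 128, 16, 4

class TextEmotionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(EMBED_DIM, 64, kernel_size=3, padding=1)
        self.head = nn.Linear(64, NUM_EMOTIONS)

    def forward(self, x):             # x: (batch, EMBED_DIM, MAX_KEYWORDS)
        h = torch.relu(self.conv(x)).mean(dim=-1)
        return self.head(h)           # emotion logits

# Feature vectors for the extracted keywords, e.g. from a word embedding model.
keyword_vectors = torch.randn(MAX_KEYWORDS, EMBED_DIM)
feature_tensor = keyword_vectors.t().unsqueeze(0)    # (1, EMBED_DIM, MAX_KEYWORDS): 3-D

model = TextEmotionModel()
first_candidate = model(feature_tensor).argmax(dim=-1)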
In one embodiment, in performing feature extraction on text content to obtain corresponding feature vectors, the processor 401 may perform:
and extracting keywords included in the text content, and mapping the keywords to a vector space through a word embedding model to obtain feature vectors.
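A minimal Python sketch of this keyword-to-vector mapping is given below, assuming a placeholder embedding table and a trivial keyword extractor; in practice the vectors would come from a trained word embedding model.

# Placeholder embedding table; a real system would load pre-trained word vectors.
import numpy as np

embedding_table = {
    "great": np.array([0.9, 0.1, 0.3]),
    "day":   np.array([0.2, 0.7, 0.5]),
}
EMBED_DIM = 3

def extract_keywords(text: str) -> list[str]:
    # Toy keyword extraction: keep only words present in the embedding table.
    return [w for w in text.lower().split() if w in embedding_table]

def text_to_feature_vectors(text: str) -> np.ndarray:
    keywords = extract_keywords(text)
    if not keywords:
        return np.zeros((1, EMBED_DIM))
    return np.stack([embedding_table[w] for w in keywords])

print(text_to_feature_vectors("What a great day"))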
In an embodiment, before inputting the feature tensor into the first emotion recognition model for emotion recognition, the processor 401 may perform:
and performing zero filling processing on the characteristic tensor.
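As one possible reading of the zero-filling step, the sketch below pads the keyword dimension of the feature tensor to a fixed length so that variable-length inputs share one shape; the fixed length of 16 is an assumption for illustration.

# Pad the last dimension of the feature tensor with zeros up to MAX_KEYWORDS.
import torch
import torch.nn.functional as F

MAX_KEYWORDS = 16

def zero_pad(feature_tensor: torch.Tensor) -> torch.Tensor:
    # feature_tensor: (batch, embed_dim, num_keywords) with num_keywords <= MAX_KEYWORDS
    missing = MAX_KEYWORDS - feature_tensor.shape[-1]
    return F.pad(feature_tensor, (0, missing))   # append zeros on the last dimension

padded = zero_pad(torch.randn(1, 128, 9))
print(padded.shape)   # torch.Size([1, 128, 16])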
In one embodiment, when performing emotion recognition according to the sound content and the pre-trained second emotion recognition model to obtain the second candidate emotion of the user, the processor 401 may perform:
dividing the sound content into a plurality of sub-sound contents;
respectively inputting the plurality of sub-sound contents into a second emotion recognition model to perform emotion recognition to obtain a plurality of corresponding candidate emotions;
a second candidate emotion of the user is determined from the plurality of candidate emotions.
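One way to realize this voice branch is sketched below: the recording is sliced into fixed-length segments, each segment is scored by a placeholder voice emotion model, and the most frequent label is taken as the second candidate emotion. The segment length, the placeholder model, and the majority-vote rule are assumptions; the embodiment only states that a second candidate emotion is determined from the plurality of candidate emotions.

# Assumed sample rate, segment length, and voting rule; the per-segment model is a stub.
from collections import Counter
import numpy as np

SAMPLE_RATE = 16000
SEGMENT_SECONDS = 2

def split_sound(waveform: np.ndarray) -> list[np.ndarray]:
    step = SAMPLE_RATE * SEGMENT_SECONDS
    return [waveform[i:i + step] for i in range(0, len(waveform), step)]

def recognize_segment(segment: np.ndarray) -> str:
    # Placeholder for the second emotion recognition model.
    return "neutral" if np.abs(segment).mean() < 0.1 else "excited"

def second_candidate_emotion(waveform: np.ndarray) -> str:
    labels = [recognize_segment(s) for s in split_sound(waveform)]
    return Counter(labels).most_common(1)[0][0]

print(second_candidate_emotion(np.random.randn(10 * SAMPLE_RATE) * 0.05))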
In an embodiment, after determining the target emotion of the user from the first and second candidate emotions, the processor 401 may perform:
and sending the text content and the target emotion to the corresponding target equipment.
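Purely as an illustrative sketch, the text content and the target emotion could be packaged together and pushed to the receiving device as follows; the endpoint URL and the message format are hypothetical and not specified by the embodiment.

# Hypothetical message format and endpoint; error handling and delivery
# guarantees are omitted for brevity.
import json
import urllib.request

def send_to_target(text_content: str, target_emotion: str, target_url: str) -> None:
    payload = json.dumps({"text": text_content, "emotion": target_emotion}).encode("utf-8")
    req = urllib.request.Request(
        target_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# send_to_target("see you tonight", "happy", "https://example.com/messages")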
It should be noted that the electronic device provided in the embodiment of the present application belongs to the same concept as the emotion recognition method in the foregoing embodiments. Any of the methods provided in the emotion recognition method embodiments may be run on the electronic device; for the detailed implementation process, reference is made to the emotion recognition method embodiments, which is not repeated herein.
It should also be noted that, for the emotion recognition method of the embodiment of the present application, those skilled in the art will understand that all or part of the flow of the emotion recognition method may be implemented by controlling the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, such as the memory of the electronic device, and executed by at least one processor within the electronic device, and the execution may include the flow of the emotion recognition method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
For the emotion recognition device of the embodiment of the present application, the functional modules may be integrated in one processing chip, each module may exist alone physically, or two or more modules may be integrated in one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored on a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The emotion recognition method, device, storage medium, and electronic device provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in light of the ideas of the present application. In view of the above, the contents of this description should not be construed as limiting the present application.

Claims (8)

1. A method for emotion recognition, applied to an electronic device, comprising:
the data processing layer in a panoramic sensing architecture acquires text content input by a user from panoramic data and provides the text content to a feature extraction layer, and a scene modeling layer performs emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain a first candidate emotion of the user, which specifically comprises: the feature extraction layer performs feature extraction on the text content to obtain a feature vector corresponding to the text content; the scene modeling layer converts the feature vector of the text content into a corresponding feature tensor, inputs the feature tensor into the first emotion recognition model for emotion recognition, and obtains the first candidate emotion of the user output by the first emotion recognition model, wherein the feature tensor is a data structure with three or more dimensions;
the data processing layer in the panoramic sensing architecture acquires, from the panoramic data, sound content recorded while the user inputs the text content and provides the sound content to the feature extraction layer, and the scene modeling layer performs emotion recognition according to the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;
determining a target emotion of the user according to the first candidate emotion and the second candidate emotion;
and sending the text content and the target emotion to electronic devices of other users, so that the other users can view the input text content and know the emotion of the user when the text content was input.
2. The emotion recognition method of claim 1, wherein the determining the target emotion of the user from the first candidate emotion and the second candidate emotion comprises:
and inputting the first candidate emotion and the second candidate emotion into a pre-trained Bayesian classifier to classify, so as to obtain the target emotion of the user output by the Bayesian classifier.
3. The emotion recognition method according to claim 1, wherein the feature extraction of the text content to obtain a corresponding feature vector includes:
And extracting keywords included in the text content, and mapping the keywords to a vector space through a word embedding model to obtain the feature vector.
4. The emotion recognition method of claim 1, wherein before inputting the feature tensor into the first emotion recognition model for emotion recognition, further comprising:
and performing zero filling processing on the characteristic tensor.
5. The method of claim 1, wherein the performing emotion recognition based on the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user comprises:
dividing the sound content into a plurality of sub-sound contents;
respectively inputting the plurality of sub-sound contents into the second emotion recognition model to perform emotion recognition to obtain a plurality of corresponding candidate emotions;
and determining a second candidate emotion of the user according to the plurality of candidate emotions.
6. An emotion recognition device applied to an electronic apparatus, comprising:
a first emotion recognition module, wherein the data processing layer in a panoramic sensing architecture acquires text content input by a user from panoramic data and provides the text content to a feature extraction layer, and a scene modeling layer performs emotion recognition according to the text content and a pre-trained first emotion recognition model to obtain a first candidate emotion of the user, which specifically comprises: the feature extraction layer performs feature extraction on the text content to obtain a feature vector corresponding to the text content; the scene modeling layer converts the feature vector of the text content into a corresponding feature tensor, inputs the feature tensor into the first emotion recognition model for emotion recognition, and obtains the first candidate emotion of the user output by the first emotion recognition model, wherein the feature tensor is a data structure with three or more dimensions;
a second emotion recognition module, wherein the data processing layer in the panoramic sensing architecture acquires, from the panoramic data, sound content recorded while the user inputs the text content and provides the sound content to the feature extraction layer, and the scene modeling layer performs emotion recognition according to the sound content and a pre-trained second emotion recognition model to obtain a second candidate emotion of the user;
a target emotion recognition module, configured to determine a target emotion of the user according to the first candidate emotion and the second candidate emotion; and a content sending module, configured to, after the target emotion of the user is determined according to the first candidate emotion and the second candidate emotion, send the text content and the target emotion to electronic devices of other users, so that the other users can view the input text content and know the emotion of the user when the text content was input.
7. A storage medium having stored thereon a computer program, which when run on a computer causes the computer to perform the emotion recognition method of any of claims 1 to 5.
8. An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the processor is configured to perform the emotion recognition method of any of claims 1 to 5 by invoking the computer program.
CN201910282465.3A 2019-04-09 2019-04-09 Emotion recognition method and device, storage medium and electronic equipment Active CN111816211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282465.3A CN111816211B (en) 2019-04-09 2019-04-09 Emotion recognition method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910282465.3A CN111816211B (en) 2019-04-09 2019-04-09 Emotion recognition method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111816211A CN111816211A (en) 2020-10-23
CN111816211B true CN111816211B (en) 2023-06-02

Family

ID=72843540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282465.3A Active CN111816211B (en) 2019-04-09 2019-04-09 Emotion recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111816211B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860995A (en) * 2021-02-04 2021-05-28 北京百度网讯科技有限公司 Interaction method, device, client, server and storage medium
CN117953919A (en) * 2022-10-31 2024-04-30 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, storage medium and computer program product

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108521369A (en) * 2018-04-03 2018-09-11 平安科技(深圳)有限公司 Information transmission method, receiving terminal device and sending terminal device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN108334583B (en) * 2018-01-26 2021-07-09 上海智臻智能网络科技股份有限公司 Emotion interaction method and device, computer readable storage medium and computer equipment
CN108717406B (en) * 2018-05-10 2021-08-24 平安科技(深圳)有限公司 Text emotion analysis method and device and storage medium
CN108985358B (en) * 2018-06-29 2021-03-02 北京百度网讯科技有限公司 Emotion recognition method, device, equipment and storage medium
CN115083434B (en) * 2022-07-22 2022-11-25 平安银行股份有限公司 Emotion recognition method and device, computer equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108521369A (en) * 2018-04-03 2018-09-11 平安科技(深圳)有限公司 Information transmission method, receiving terminal device and sending terminal device

Also Published As

Publication number Publication date
CN111816211A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
JP7005099B2 (en) Voice keyword recognition methods, devices, computer-readable storage media, and computer devices
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
CN107688790B (en) Human behavior recognition method and device, storage medium and electronic equipment
CN110972112B (en) Subway running direction determining method, device, terminal and storage medium
US20210110815A1 (en) Method and apparatus for determining semantic meaning of pronoun
CN111814475A (en) User portrait construction method and device, storage medium and electronic equipment
CN111816211B (en) Emotion recognition method and device, storage medium and electronic equipment
CN111796925A (en) Method and device for screening algorithm model, storage medium and electronic equipment
CN111798259A (en) Application recommendation method and device, storage medium and electronic equipment
CN111796926A (en) Instruction execution method and device, storage medium and electronic equipment
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN111798019B (en) Intention prediction method, intention prediction device, storage medium and electronic equipment
WO2023208134A1 (en) Image processing method and apparatus, model generation method and apparatus, vehicle, storage medium, and computer program product
CN111797874B (en) Behavior prediction method and device, storage medium and electronic equipment
CN113782014B (en) Speech recognition method and device
CN112632222B (en) Terminal equipment and method for determining data belonging field
CN111800535B (en) Terminal running state evaluation method and device, storage medium and electronic equipment
WO2020207297A1 (en) Information processing method, storage medium, and electronic device
CN111796663B (en) Scene recognition model updating method and device, storage medium and electronic equipment
CN114756662A (en) Task-specific text generation based on multimodal input
Shane et al. Sign Language Detection Using Faster RCNN Resnet
CN111797660A (en) Image labeling method and device, storage medium and electronic equipment
Raju et al. Continuous multi-modal emotion prediction in video based on recurrent neural network variants with attention
Melnyk et al. Towards computer assisted international sign language recognition system: a systematic survey
CN111797299A (en) Model training method, webpage classification method, device, storage medium and equipment

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant