CN107945790B - Emotion recognition method and emotion recognition system - Google Patents

Emotion recognition method and emotion recognition system

Info

Publication number
CN107945790B
CN107945790B (application number CN201810007403.7A)
Authority
CN
China
Prior art keywords
features
acoustic
text
emotion
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810007403.7A
Other languages
Chinese (zh)
Other versions
CN107945790A (en)
Inventor
王雪云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd filed Critical BOE Technology Group Co Ltd
Priority to CN201810007403.7A
Publication of CN107945790A
Application granted
Publication of CN107945790B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques specially adapted for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses an emotion recognition method and an emotion recognition system. The method includes: acquiring a current voice signal; extracting voice features of the current voice signal, the voice features including acoustic features and text features; and recognizing, according to the voice features and a preset depth model, the emotion type corresponding to the current voice signal, the emotion type being positive, neutral or negative. The technical scheme of the invention can identify the corresponding emotion type from a voice signal, so as to supervise service personnel and improve the service level.

Description

Emotion recognition method and emotion recognition system
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to an emotion recognition method and an emotion recognition system.
Background
In interpersonal communication, language is one of the most natural and important means. The emotion carried in a speaker's voice can strongly affect the mood of the people around, in both positive and negative ways. This is especially true for service personnel: in public places such as buses, nursing homes or hospitals, if a service worker has a poor attitude, an arrogant tone or harsh language, that is, if the emotion is negative, the people being served are adversely affected, which is detrimental to social harmony and to improving the happiness index.
The inventor has found through research that there is currently no effective technical means of judging the emotion of service personnel from their speech, so that their service can be supervised and the service level improved.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present invention provide an emotion recognition method and an emotion recognition system, which can recognize corresponding emotion through a speech signal.
In one aspect, an embodiment of the present invention provides an emotion recognition method, including:
acquiring a current voice signal;
extracting voice features of a current voice signal, wherein the voice features comprise: acoustic features and text features;
according to the voice features and a preset depth model, recognizing emotion types corresponding to the current voice signals, wherein the emotion types comprise: positive, neutral and negative.
Optionally, before the extracting the speech feature of the current speech signal, the method further includes:
and preprocessing the current voice signal.
Optionally, after the emotion type corresponding to the current speech signal is identified, the method further includes:
and activating a corresponding preset coping scheme according to the emotion type.
Optionally, the acoustic features include: fundamental frequency, duration, energy and frequency spectrum.
Optionally, the recognizing, according to the speech feature and the preset depth model, an emotion type corresponding to the current speech signal includes:
obtaining acoustic characteristic information and text characteristic information for emotion recognition according to the acoustic characteristic and the text characteristic;
obtaining K acoustic feature vectors according to the acoustic feature information;
obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information;
and recognizing the emotion type of the current voice signal according to the K acoustic feature vectors, the K text feature vectors and the preset depth model.
Optionally, the obtaining acoustic feature information and text feature information for emotion recognition according to the acoustic feature and the text feature includes:
respectively converting the acoustic features and the text features into corresponding vectors;
and respectively inputting the vector corresponding to the acoustic feature and the vector corresponding to the text feature into a convolutional neural network to obtain acoustic feature information and text feature information for emotion recognition.
Optionally, the obtaining K acoustic feature vectors according to the acoustic feature information includes:
pooling the acoustic feature information to obtain K acoustic feature vectors;
the obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information includes:
focusing the text feature information by adopting a focusing mechanism according to the mean value of the K acoustic feature vectors;
pooling the focused text feature information to obtain K text feature vectors.
On the other hand, an embodiment of the present invention further provides an emotion recognition system, including:
a voice acquisition module configured to acquire a current voice signal;
a feature extraction module configured to extract voice features of a current voice signal, the voice features including: acoustic features and text features;
the emotion recognition module is configured to recognize an emotion type corresponding to the current voice signal according to the voice feature and a preset depth model, wherein the emotion type comprises: positive, neutral and negative.
Optionally, the system further comprises: the device comprises a signal preprocessing module and an activation module;
the signal preprocessing module is configured to preprocess the current voice signal;
the activation module is configured to activate a corresponding preset coping scheme according to the emotion type.
Optionally, the emotion recognition module includes:
the first obtaining unit is configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic feature and the text feature, and specifically includes: respectively converting the acoustic features and the text features into corresponding vectors; respectively inputting the vector corresponding to the acoustic feature and the vector corresponding to the text feature into a convolutional neural network to obtain acoustic feature information and text feature information for emotion recognition; the acoustic features include: fundamental frequency, duration, energy and frequency spectrum;
a second obtaining unit, configured to obtain K acoustic feature vectors according to the acoustic feature information, including: pooling the acoustic feature information to obtain K acoustic feature vectors; and further configured to obtain K text feature vectors according to the K acoustic feature vectors and the text feature information, specifically including: focusing the text feature information by adopting a focusing mechanism according to the mean value of the K acoustic feature vectors; pooling the focused text feature information to obtain K text feature vectors;
and the emotion recognition unit is configured to recognize the emotion type of the current voice signal according to the K acoustic feature vectors, the K text feature vectors and the preset depth model.
The embodiment of the invention provides an emotion recognition method and an emotion recognition system. The method includes: acquiring a current voice signal; extracting voice features of the current voice signal, the voice features including acoustic features and text features; and recognizing, according to the voice features and a preset depth model, the emotion type corresponding to the current voice signal, the emotion type being positive, neutral or negative. The technical scheme of the invention can identify the corresponding emotion type from a voice signal, so as to supervise service personnel and improve the service level.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the embodiments of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without limiting it.
Fig. 1 is a flowchart of an emotion recognition method according to an embodiment of the present invention;
FIG. 2 is another flow chart of a method for emotion recognition according to an embodiment of the present invention;
FIG. 3 is a flow chart of step 300 provided by an embodiment of the present invention;
FIG. 4 is a block diagram of an emotion recognition system provided in an embodiment of the present invention;
FIG. 5 is another block diagram of an emotion recognition system provided in an embodiment of the present invention;
FIG. 6 is a block diagram of an emotion recognition module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In order to explain the technical solutions of the embodiments of the present invention, the following description is given by way of specific examples.
Example one
Fig. 1 is a flowchart of an emotion recognition method provided in an embodiment of the present invention, and as shown in fig. 1, the emotion recognition method provided in the embodiment of the present invention specifically includes the following steps:
step 100, obtaining a current voice signal.
Specifically, step 100 acquires a speech signal through a microphone or a microphone array.
Step 200, extracting the voice characteristics of the current voice signal.
Wherein the voice features include: acoustic features and text features.
Optionally, the acoustic features include fundamental frequency, duration, energy and frequency spectrum. The fundamental frequency determines pitch, and fundamental-frequency features can be extracted with an autocorrelation algorithm. Duration is related to speech rate, and the silence information in the current voice signal is valuable for emotion recognition; duration features can be extracted with a tool such as Visual Speech. Energy is related to amplitude, and energy and spectral features can be extracted with existing techniques.
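To make the autocorrelation-based fundamental-frequency extraction concrete, the following is a minimal NumPy sketch of pitch estimation for a single frame; the sample rate, pitch search range and function name are illustrative assumptions, not parameters fixed by the invention.

```python
import numpy as np

def estimate_f0_autocorr(frame, sample_rate=16000, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one speech frame by autocorrelation."""
    frame = frame - np.mean(frame)                    # remove DC offset
    corr = np.correlate(frame, frame, mode="full")    # full autocorrelation
    corr = corr[len(corr) // 2:]                      # keep non-negative lags only
    lag_min = int(sample_rate / f0_max)               # shortest plausible pitch period
    lag_max = min(int(sample_rate / f0_min), len(corr) - 1)
    peak_lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / peak_lag                     # convert period (in samples) to Hz
```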
Optionally, the text features are the text information contained in the current speech signal, extracted by speech recognition technology, for example, iFlytek's automatic speech recognition.
Step 300, recognizing the emotion type corresponding to the current voice signal according to the voice features and the preset depth model.
Wherein the emotion types include positive, neutral and negative. It should be noted that a positive emotion type may please the served person, a neutral emotion type may not affect the mood of the served person, and a negative emotion type may make the served person uncomfortable. The same sentence, such as "you are a fool", may be banter directed at a friend or a jeer at an adversary, so its emotion may be positive or negative.
It should be noted that the preset depth model has been extensively trained on a sample database, so the accuracy of the recognized emotion types is high.
Optionally, the emotion recognition method provided by the embodiment of the invention can be applied to public places such as buses, nursing homes and hospitals.
The emotion recognition method provided by the embodiment of the invention includes: acquiring a current voice signal; extracting voice features of the current voice signal, the voice features including acoustic features and text features; and recognizing, according to the voice features and the preset depth model, the emotion type corresponding to the current voice signal, the emotion type being positive, neutral or negative. The technical scheme of the invention can identify the corresponding emotion type from the voice signal, so as to supervise service personnel and improve the service level.
Optionally, fig. 2 is another flowchart of the emotion recognition method provided in the embodiment of the present invention, as shown in fig. 2, before step 200, the emotion recognition method provided in the embodiment of the present invention further includes:
step 400, preprocessing the current voice signal.
Specifically, the preprocessing in step 400 includes eliminating environmental noise, enhancing the useful signal, segmenting the current speech signal, and the like. It should be noted that segmenting the current speech signal can be realized by windowing and framing, for example using a Hamming window with a window length of 25 ms and a window shift of 10 ms (i.e., each frame covers 25 ms of speech and successive frames advance in 10 ms steps).
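A minimal sketch of the windowing and framing described above, assuming a 16 kHz sample rate (the function name and NumPy usage are illustrative):

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Split a speech signal into overlapping Hamming-windowed frames
    (25 ms window length, 10 ms window shift, as in the example above)."""
    frame_len = int(sample_rate * frame_ms / 1000)    # 400 samples at 16 kHz
    frame_shift = int(sample_rate * shift_ms / 1000)  # 160 samples at 16 kHz
    window = np.hamming(frame_len)
    n_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    frames = np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)
```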
Optionally, after step 300, the emotion recognition method provided in the embodiment of the present invention further includes:
and 500, activating a corresponding preset coping scheme according to the emotion type.
Specifically, in step 500, when the emotion type is positive or neutral, the service personnel are encouraged to keep it up; when the emotion type is negative, a preset coping scheme is activated. The coping scheme includes: (1) raising a timely alarm to remind the service personnel to pay attention to their service attitude, where the alarm may optionally be a text display, a buzzer, a voice broadcast, or the like; (2) collecting the current voice signals corresponding to the negative emotion and storing them in the cloud, so that the service organization can evaluate and improve service quality; (3) pushing scheduled messages, that is, pushing the day's service-quality information to the service personnel's mobile phone after work each day, so that they gain a comprehensive picture of their service that day and can further improve their service level.
Optionally, fig. 3 is a flowchart of step 300 provided in the embodiment of the present invention, as shown in fig. 3, step 300 includes:
step 301, obtaining acoustic feature information and text feature information for emotion recognition according to the acoustic feature and the text feature.
Specifically, step 301 includes: respectively converting the acoustic features and the text features into corresponding vectors; and respectively inputting the vector corresponding to the acoustic feature and the vector corresponding to the text feature into a convolutional neural network to obtain acoustic feature information and text feature information for emotion recognition.
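Purely as an illustrative sketch of step 301, not the patented network, the two convolutional encoders might look as follows; the framework (PyTorch), layer sizes and input dimensions are assumptions:

```python
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """1-D convolutional encoder that turns a sequence of feature vectors into
    per-time-step information usable for emotion recognition."""
    def __init__(self, in_dim, hidden_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, hidden_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=5, padding=2),
            nn.ReLU(),
        )

    def forward(self, x):                       # x: (batch, time, in_dim)
        x = x.transpose(1, 2)                   # -> (batch, in_dim, time) for Conv1d
        return self.conv(x).transpose(1, 2)     # -> (batch, time, hidden_dim)

# One encoder per modality (the dimensions are assumed for the example):
acoustic_encoder = FeatureEncoder(in_dim=40)    # e.g. 40-dim acoustic vectors per frame
text_encoder = FeatureEncoder(in_dim=300)       # e.g. 300-dim word-embedding vectors
```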
Step 302, obtaining K acoustic feature vectors according to the acoustic feature information.
Specifically, step 302 includes: pooling the acoustic feature information to obtain K acoustic feature vectors.
Step 303, obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information.
Specifically, step 303 includes: focusing the text feature information by adopting a focusing mechanism according to the mean value of the K acoustic feature vectors; pooling the focused text feature information to obtain K text feature vectors.
It should be noted that the focusing mechanism assigns different weights to different words; for example, a higher weight is assigned to offensive words, which strongly influence the judgment of emotion. Put colloquially: if the features output by the convolutional network indicate that the current speaker's attitude is harsh, the focusing mechanism assigns a higher weight to offensive words (e.g., "bastard", "idiot"); if the features indicate that the speaker's attitude is mild, the focusing mechanism does not assign a higher weight to such words.
Specifically, the focusing mechanism operates on the text feature information as follows: weights are assigned to the text feature information, and the weights are determined according to the K acoustic feature vectors.
In particular, suppose that at time t the text feature information is h_a(t) and the acoustic feature information is O_q. After the action of the focusing mechanism, each piece of text feature information becomes the focused feature ĥ_a(t), where

m_{a,q}(t) = tanh(W_am · h_a(t) + W_qm · O_q)

S_{a,q}(t) = exp(W_ms · m_{a,q}(t)) / Σ_t' exp(W_ms · m_{a,q}(t'))

ĥ_a(t) = S_{a,q}(t) · h_a(t)

where W_am, W_qm and W_ms are the focus parameters, S_{a,q}(t) is the weight, and ĥ_a(t) is the focused text feature information.
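The following NumPy sketch illustrates one way to realize the focusing mechanism and the pooling into K vectors described above; the matrix shapes, the chunk-wise mean pooling and the function names are assumptions made for this example, not details fixed by the patent:

```python
import numpy as np

def focus_text_features(h_a, O_q, W_am, W_qm, W_ms):
    """Apply the focusing (attention) equations above to text features h_a,
    guided by the mean acoustic feature vector O_q.
    Assumed shapes: h_a (T, d_text), O_q (d_ac,),
    W_am (d_m, d_text), W_qm (d_m, d_ac), W_ms (d_m,)."""
    m = np.tanh(h_a @ W_am.T + O_q @ W_qm.T)   # m_{a,q}(t), shape (T, d_m)
    scores = m @ W_ms                          # scalar score per time step, shape (T,)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    S = weights / weights.sum()                # S_{a,q}(t): weight for each time step
    return h_a * S[:, None]                    # focused text features ĥ_a(t)

def pool_into_k_vectors(features, K=4):
    """Pool a (T, d) feature sequence into K vectors by mean-pooling K time chunks."""
    chunks = np.array_split(features, K, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])  # shape (K, d)
```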
Step 304, recognizing the emotion type of the current voice signal according to the K acoustic feature vectors, the K text feature vectors and the preset depth model.
Specifically, step 304 includes: performing logistic regression on the K acoustic feature vectors and the K text feature vectors, and recognizing the emotion type of the current voice signal according to the K acoustic feature vectors, the K text feature vectors and the depth model after the logistic regression.
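A minimal sketch of this classification step, assuming the logistic regression is a softmax over the concatenated K acoustic and K text feature vectors with trained parameters W and b (the exact interaction with the depth model is an assumption here):

```python
import numpy as np

EMOTIONS = ["positive", "neutral", "negative"]

def classify_emotion(acoustic_vecs, text_vecs, W, b):
    """Multinomial logistic regression (softmax) over the K acoustic and
    K text feature vectors; W with shape (3, 2*K*d) and b with shape (3,)
    are trained parameters."""
    x = np.concatenate([acoustic_vecs.ravel(), text_vecs.ravel()])  # (2*K*d,)
    logits = W @ x + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # class probabilities
    return EMOTIONS[int(np.argmax(probs))], probs
```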
The working principle of the embodiment of the invention is as follows: a current voice signal is obtained through a microphone or a microphone array; the current voice signal is preprocessed; acoustic features of the current voice signal are extracted, text features of the current voice signal are extracted through speech recognition technology, and the acoustic features and the text features are respectively converted into corresponding vectors; the vector corresponding to the acoustic features and the vector corresponding to the text features are respectively input into a convolutional neural network to obtain acoustic feature information and text feature information for emotion recognition; the acoustic feature information is pooled to obtain K acoustic feature vectors; the text feature information is focused by a focusing mechanism according to the mean value of the K acoustic feature vectors; the focused text feature information is pooled to obtain K text feature vectors; logistic regression is performed on the K acoustic feature vectors and the K text feature vectors, and the emotion type of the current voice signal is recognized according to the K acoustic feature vectors, the K text feature vectors and the depth model after the logistic regression; and a corresponding preset coping scheme is activated according to the emotion type.
Example two
Based on the inventive concept of the above embodiment, fig. 4 is a schematic structural diagram of an emotion recognition system provided in an embodiment of the present invention, and as shown in fig. 4, the emotion recognition system provided in an embodiment of the present invention includes: a voice acquisition module 10, a feature extraction module 20 and an emotion recognition module 30.
In the present embodiment, the voice acquiring module 10 is configured to acquire a current voice signal; a feature extraction module 20 configured to extract a voice feature of the current voice signal; and the emotion recognition module 30 is configured to recognize an emotion type corresponding to the current voice signal according to the voice feature and the preset depth model.
Optionally, the acoustic features include fundamental frequency, duration, energy and frequency spectrum. The fundamental frequency determines pitch, and fundamental-frequency features can be extracted with an autocorrelation algorithm. Duration is related to speech rate, and the silence information in the current voice signal is valuable for emotion recognition; duration features can be extracted with a tool such as Visual Speech. Energy is related to amplitude, and energy and spectral features can be extracted with existing techniques.
Optionally, the text features are the text information contained in the current speech signal, extracted by speech recognition technology, for example, iFlytek's automatic speech recognition.
Wherein the emotion types include positive, neutral and negative. It should be noted that a positive emotion type may please the served person, a neutral emotion type may not affect the mood of the served person, and a negative emotion type may make the served person uncomfortable. The same sentence, such as "you are a fool", may be banter directed at a friend or a jeer at an adversary, so its emotion may be positive or negative.
Optionally, the emotion recognition system provided by the embodiment of the invention can be applied to public places such as buses, nursing homes and hospitals.
The emotion recognition system provided by the embodiment of the invention includes: a voice acquisition module configured to acquire a current voice signal; a feature extraction module configured to extract voice features of the current voice signal, the voice features including acoustic features and text features; and an emotion recognition module configured to recognize, according to the voice features and a preset depth model, the emotion type corresponding to the current voice signal, the emotion type being positive, neutral or negative. The technical scheme of the invention can identify the corresponding emotion type from the voice signal, so as to supervise service personnel and improve the service level.
Optionally, fig. 5 is another schematic structural diagram of the emotion recognition system provided in the embodiment of the present invention, and as shown in fig. 5, the system provided in the embodiment of the present invention further includes: a signal preprocessing module 40 and an activation module 50.
A signal preprocessing module 40 configured to preprocess the current speech signal.
Specifically, the preprocessing includes eliminating environmental noise, enhancing the useful signal, segmenting the current speech signal, and the like. It should be noted that segmenting the current speech signal can be realized by windowing and framing, for example using a Hamming window with a window length of 25 ms and a window shift of 10 ms (i.e., each frame covers 25 ms of speech and successive frames advance in 10 ms steps).
And the activation module 50 is configured to activate the corresponding preset coping schemes according to the emotion types.
Specifically, the activation module 50 encourages the service personnel to keep it up when the emotion type is positive or neutral, and activates a preset coping scheme when the emotion type is negative. The coping scheme includes, but is not limited to: (1) raising a timely alarm to remind the service personnel to pay attention to their service attitude, where the alarm may optionally be a text display, a buzzer, a voice broadcast, or the like; (2) collecting the current voice signals corresponding to the negative emotion and storing them in the cloud, so that the service organization can evaluate and improve service quality; (3) pushing scheduled messages, that is, pushing the day's service-quality information to the service personnel's mobile phone after work each day, so that they gain a comprehensive picture of their service that day and can further improve their service level.
Optionally, fig. 6 is a schematic structural diagram of an emotion recognition module provided in an embodiment of the present invention, and as shown in fig. 6, the emotion recognition module includes: a first obtaining unit 31, a second obtaining unit 32 and an emotion recognition unit 33.
The first obtaining unit 31 is configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic features and the text features, specifically by: respectively converting the acoustic features and the text features into corresponding vectors; and respectively inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into a convolutional neural network to obtain acoustic feature information and text feature information for emotion recognition; the acoustic features include: fundamental frequency, duration, energy and frequency spectrum;
The second obtaining unit 32 is configured to obtain K acoustic feature vectors according to the acoustic feature information, specifically by pooling the acoustic feature information to obtain the K acoustic feature vectors; and is further configured to obtain K text feature vectors according to the K acoustic feature vectors and the text feature information, specifically by focusing the text feature information with a focusing mechanism according to the mean value of the K acoustic feature vectors and pooling the focused text feature information to obtain the K text feature vectors;
and the emotion recognition unit 33 is configured to recognize the emotion type of the current voice signal according to the K acoustic feature vectors, the K text feature vectors and the preset depth model.
Those skilled in the art can understand that each module or unit included in the second embodiment is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It will be further understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium includes: ROM/RAM, magnetic disks, optical disks, and the like.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. An emotion recognition method, comprising:
acquiring a current voice signal;
extracting voice features of a current voice signal, wherein the voice features comprise: acoustic features and text features;
according to the voice features and a preset depth model, recognizing emotion types corresponding to the current voice signals, wherein the emotion types comprise: positive, neutral and negative;
respectively converting the acoustic features and the text features into corresponding vectors; respectively inputting the vector corresponding to the acoustic feature and the vector corresponding to the text feature into a convolutional neural network to obtain acoustic feature information and text feature information for emotion recognition;
pooling the acoustic feature information to obtain K acoustic feature vectors;
focusing the text feature information by adopting a focusing mechanism according to the mean value of the K acoustic feature vectors; pooling focused text feature information to obtain K text feature vectors;
and performing logistic regression on the K acoustic feature vectors and the K text feature vectors, and identifying the emotion type of the current voice signal according to the K acoustic feature vectors, the K text feature vectors and the depth model after the logistic regression.
2. The method of claim 1, wherein prior to extracting the speech feature of the current speech signal, the method further comprises:
and preprocessing the current voice signal.
3. The method of claim 1 or 2, wherein after the emotion type corresponding to the current speech signal is identified, the method further comprises:
and activating a corresponding preset coping scheme according to the emotion type.
4. The method of claim 1, wherein the acoustic features comprise: fundamental frequency, duration, energy and frequency spectrum.
5. An emotion recognition system, comprising:
a voice acquisition module configured to acquire a current voice signal;
a feature extraction module configured to extract voice features of a current voice signal, the voice features including: acoustic features and text features;
an emotion recognition module configured to include:
a first obtaining unit configured to convert the acoustic features and the text features into corresponding vectors, respectively; respectively inputting the vector corresponding to the acoustic feature and the vector corresponding to the text feature into a convolutional neural network to obtain acoustic feature information and text feature information for emotion recognition;
the second obtaining unit is configured to pool the acoustic feature information, obtain K acoustic feature vectors, and focus the text feature information by adopting a focusing mechanism according to a mean value of the K acoustic feature vectors; pooling focused text feature information to obtain K text feature vectors;
the emotion recognition unit is configured to perform logistic regression on the K acoustic feature vectors and the K text feature vectors, and recognize the emotion type of the current voice signal according to the K acoustic feature vectors, the K text feature vectors and the depth model after the logistic regression;
the emotion types include: positive, neutral and negative.
6. The system of claim 5, further comprising: the device comprises a signal preprocessing module and an activation module;
the signal preprocessing module is configured to preprocess the current voice signal;
the activation module is configured to activate a corresponding preset coping scheme according to the emotion type.
7. The system of claim 5, wherein the acoustic features comprise: fundamental frequency, duration, energy and frequency spectrum.
CN201810007403.7A 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system Active CN107945790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810007403.7A CN107945790B (en) 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810007403.7A CN107945790B (en) 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system

Publications (2)

Publication Number Publication Date
CN107945790A CN107945790A (en) 2018-04-20
CN107945790B true CN107945790B (en) 2021-01-26

Family

ID=61938328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810007403.7A Active CN107945790B (en) 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system

Country Status (1)

Country Link
CN (1) CN107945790B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108833722B (en) * 2018-05-29 2021-05-11 平安科技(深圳)有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN110660412A (en) * 2018-06-28 2020-01-07 Tcl集团股份有限公司 Emotion guiding method and device and terminal equipment
CN110728983B (en) * 2018-07-16 2024-04-30 科大讯飞股份有限公司 Information display method, device, equipment and readable storage medium
CN109741732B (en) * 2018-08-30 2022-06-21 京东方科技集团股份有限公司 Named entity recognition method, named entity recognition device, equipment and medium
CN109192225B (en) * 2018-09-28 2021-07-09 清华大学 Method and device for recognizing and marking speech emotion
CN109243490A (en) * 2018-10-11 2019-01-18 平安科技(深圳)有限公司 Driver's Emotion identification method and terminal device
CN109410986B (en) * 2018-11-21 2021-08-06 咪咕数字传媒有限公司 Emotion recognition method and device and storage medium
CN111354361A (en) * 2018-12-21 2020-06-30 深圳市优必选科技有限公司 Emotion communication method and system and robot
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN110473571A (en) * 2019-07-26 2019-11-19 北京影谱科技股份有限公司 Emotion identification method and device based on short video speech
CN110600033B (en) * 2019-08-26 2022-04-05 北京大米科技有限公司 Learning condition evaluation method and device, storage medium and electronic equipment
CN111128189A (en) * 2019-12-30 2020-05-08 秒针信息技术有限公司 Warning information prompting method and device
US11810596B2 (en) 2021-08-16 2023-11-07 Hong Kong Applied Science and Technology Research Institute Company Limited Apparatus and method for speech-emotion recognition with quantified emotional states

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1391876A1 (en) * 2002-08-14 2004-02-25 Sony International (Europe) GmbH Method of determining phonemes in spoken utterances suitable for recognizing emotions using voice quality features
EP1429314A1 (en) * 2002-12-13 2004-06-16 Sony International (Europe) GmbH Correction of energy as input feature for speech processing
JP2005283647A (en) * 2004-03-26 2005-10-13 Matsushita Electric Ind Co Ltd Feeling recognition device
CN101894550A (en) * 2010-07-19 2010-11-24 东南大学 Speech emotion classifying method for emotion-based characteristic optimization
US9493130B2 (en) * 2011-04-22 2016-11-15 Angel A. Penilla Methods and systems for communicating content to connected vehicle users based detected tone/mood in voice input
JP5772448B2 (en) * 2011-09-27 2015-09-02 富士ゼロックス株式会社 Speech analysis system and speech analysis apparatus
JP6213476B2 (en) * 2012-10-31 2017-10-18 日本電気株式会社 Dissatisfied conversation determination device and dissatisfied conversation determination method
CN104050965A (en) * 2013-09-02 2014-09-17 广东外语外贸大学 English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
US9324320B1 (en) * 2014-10-02 2016-04-26 Microsoft Technology Licensing, Llc Neural network-based speech processing
KR20160116586A (en) * 2015-03-30 2016-10-10 한국전자통신연구원 Method and apparatus for emotion recognition
US10276188B2 (en) * 2015-09-14 2019-04-30 Cogito Corporation Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices
CN107516511B (en) * 2016-06-13 2021-05-25 微软技术许可有限责任公司 Text-to-speech learning system for intent recognition and emotion
CN106297826A (en) * 2016-08-18 2017-01-04 竹间智能科技(上海)有限公司 Speech emotional identification system and method
CN106782615B (en) * 2016-12-20 2020-06-12 科大讯飞股份有限公司 Voice data emotion detection method, device and system

Also Published As

Publication number Publication date
CN107945790A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN107945790B (en) Emotion recognition method and emotion recognition system
CN109256150B (en) Speech emotion recognition system and method based on machine learning
CN105096941B (en) Audio recognition method and device
CN106601259B (en) Information recommendation method and device based on voiceprint search
CN106504768B (en) Phone testing audio frequency classification method and device based on artificial intelligence
CN109543020B (en) Query processing method and system
WO2016173132A1 (en) Method and device for voice recognition, and user equipment
CN106782615A (en) Speech data emotion detection method and apparatus and system
CN103531198A (en) Speech emotion feature normalization method based on pseudo speaker clustering
CN108597505A (en) Audio recognition method, device and terminal device
CN106033669B (en) Audio recognition method and device
CN110428853A (en) Voice activity detection method, Voice activity detection device and electronic equipment
CN104103272A (en) Voice recognition method and device and blue-tooth earphone
CN108986798A (en) Processing method, device and the equipment of voice data
Gupta et al. Speech feature extraction and recognition using genetic algorithm
CN113744742B (en) Role identification method, device and system under dialogue scene
CN116665676A (en) Semantic recognition method for intelligent voice outbound system
CN110246518A (en) Speech-emotion recognition method, device, system and storage medium based on more granularity sound state fusion features
CN109817223A (en) Phoneme marking method and device based on audio fingerprints
CN113743267A (en) Multi-mode video emotion visualization method and device based on spiral and text
KR20130068624A (en) Apparatus and method for recognizing speech based on speaker group
CN115168563B (en) Airport service guiding method, system and device based on intention recognition
CN110930794A (en) Intelligent language education system and method
CN110807370B (en) Conference speaker identity noninductive confirmation method based on multiple modes
CN114242061A (en) Order dispatching method and system based on voice recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant