CN112002331A - Method and system for recognizing emotion based on conversation voice - Google Patents


Info

Publication number
CN112002331A
Authority
CN
China
Prior art keywords
employee
voiceprint
conversation sound
conversation
working process
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010779424.8A
Other languages
Chinese (zh)
Inventor
王韬 (Wang Tao)
秦瀚 (Qin Han)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jingrui Intelligent Technology Co ltd
Original Assignee
Guangzhou Jingrui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jingrui Intelligent Technology Co., Ltd.
Priority to CN202010779424.8A
Publication of CN112002331A
Current legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of emotion recognition, and in particular discloses a method and a system for recognizing emotion based on conversational speech. The method comprises the following steps: obtaining the conversation audio of an employee during the working process; extracting the employee's voiceprint features from that conversation audio; comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state; and determining whether the employee's mood is suited to the work being performed. The method and the system can be used on a factory production line to identify whether the mood of the employee at each station is suited to the work that employee is carrying out, and achieve fast and accurate recognition.

Description

Method and system for recognizing emotion based on conversation voice
Technical Field
The invention relates to the technical field of emotion recognition, and in particular to a method and a system for recognizing emotion based on conversational speech.
Background
Emotion refers to a person's attitude toward external things, arising alongside the processes of cognition and consciousness; it is a response to the relationship between objective things and the subject's needs, and a psychological activity mediated by individual desires and needs. In daily life it is chiefly expressed as a few basic types such as joy, anger, sorrow, and happiness, and these emotions are reflected in a person's voice during conversation. With the development of AI technology, existing systems can already recognize a person's principal emotion types by extracting voiceprint features from conversational speech.
Beyond daily life, mood changes also have a considerable effect on work. In labor-intensive industrial fields that require fine processing, employee mood has an important influence on product yield. When an employee's daily mood is off, the pass rate of products that depend on that employee drops sharply. This is especially true for products with many production stages: one employee's poor mood can render a single processing stage substandard, wasting the effort invested in every other stage and preventing a qualified product from ever being obtained. Such defective products require rework on the one hand and increase raw-material costs on the other, so an employee whose mood is unsuited to their working position indirectly raises production cost. Identifying whether employees' moods suit their current work therefore plays an important role in production that depends on manual, fine processing. If a change in an employee's current mood can be recognized in time and the work in hand paused, an excessive drop in product yield can be effectively avoided.
However, no automated employee-emotion recognition technology for factory production lines currently exists. In particular, there is no technique for identifying whether the mood of production-line personnel is suited to the work they are doing.
Disclosure of Invention
To overcome the shortcoming that existing emotion-recognition methods cannot be used on a factory production line to identify whether an employee's mood is suited to the work the employee is doing, the invention provides a method for recognizing emotion based on conversational speech. The method can be used on a factory production line to identify whether the mood of the employee at each station is suited to the work being carried out.
The technical solution adopted to solve this problem is as follows:
a method of recognizing emotion based on conversational speech, comprising the steps of:
obtaining the conversation audio of an employee during the working process;
extracting the employee's voiceprint features from the conversation audio;
comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state, and determining whether the employee's mood is suited to the work being performed.
The inventors found in their research that the difficulty in developing such a technology lies here: for a specific employee at a specific station, conventional emotion categories such as joy, anger, sorrow, and happiness have no direct relation to improving product yield; these basic emotions cannot be used directly to judge whether an employee is fit for the work in progress, and judgments based on them are not accurate. Existing emotion-recognition technology therefore cannot be used on a factory production line to identify whether an employee's mood is suited to the work currently underway.
The invention innovatively obtains an employee's voiceprint features from the employee's conversation audio and compares them with the employee's voiceprint features in the normal working state to judge whether the employee's mood is suited to the current work. The method does not need to identify whether the employee's emotion is joy, anger, or sorrow; a direct comparison against the voiceprint features of the employee's normal working state suffices. Because only a comparison with one employee's standard voiceprint features is required, both the accuracy and the convenience of recognition are greatly improved. The method thus overcomes the shortcoming that existing emotion-recognition methods cannot be used on a factory production line to identify whether an employee's mood suits the ongoing work, and achieves fast and accurate recognition.
The invention also provides a system for recognizing emotion based on conversational speech, comprising:
a conversation audio acquisition module, used to obtain the conversation audio of an employee during the working process;
a voiceprint feature extraction module, used to extract the employee's voiceprint features from the conversation audio;
an emotion judging module, used to compare the employee's voiceprint features with the employee's standard voiceprint features in the normal working state and to determine whether the employee's mood is suited to the work being performed.
The invention also provides a terminal comprising a processor and a memory, the processor being configured to execute a program stored in the memory so as to implement the above method.
The invention also provides a storage medium storing one or more programs, which are executable by one or more processors to implement the above method.
Advantageous effects: the invention provides a brand-new method and system for recognizing emotion based on conversational speech. The method and system can be used on a factory production line to identify whether the mood of the employee at each station is suited to the work being carried out, solving the problem that existing emotion-recognition methods cannot be used on a factory production line for this purpose, and achieving fast and accurate recognition.
Drawings
Fig. 1 is a flowchart of a method for recognizing emotion based on conversational voice according to the present invention.
Fig. 2 is a schematic diagram of the system for recognizing emotion based on conversational voice according to the present invention.
Detailed Description
The present invention is further explained below with reference to specific examples, which are not intended to limit the present invention in any way.
As shown in fig. 1, the method for recognizing emotion based on conversational voice of the present invention includes the following steps:
s100, obtaining conversation sound in the working process of a certain employee.
Specifically, the dialogue sound is obtained by a person or a machine in dialogue with an employee. For example, a conversation device can be arranged on a work station of an employee to carry out conversation with the employee; the conversation can be carried out with the staff through manual or set voice, so that the conversation voice of the staff is obtained and is used for extracting the voiceprint characteristics of the staff. For example, employees may be called at intervals: "do you work well on a certain Zhao (employee name)? ", the employee must make a canonical answer to the call: "Zhao somebody (employee name) receives the call and works all the way to normal … …".
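Since the scripted call above fixes the wording of each reply, the system can verify that a reply follows the script before using its audio for voiceprint extraction, so that only comparable utterances are analyzed. The helper below is a hypothetical sketch of that check; the function name and the exact scripted wording are illustrative assumptions based on the example, not part of the patent.

```python
import re

def is_canonical_reply(transcript: str, employee_name: str) -> bool:
    """Return True if a transcribed reply follows the scripted answer form.

    The expected form ("<name> receives the call ...") is an illustrative
    assumption taken from the example in the text, not a prescribed format.
    """
    pattern = rf"^{re.escape(employee_name)}\s+receives the call"
    return re.match(pattern, transcript.strip()) is not None
```

A reply that does not match the script would be discarded and the call repeated, rather than risk comparing voiceprint features drawn from different utterances.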
S200, extracting the employee's voiceprint features from the conversation audio obtained during the working process.
Specifically, the employee's high-pitch, mid-pitch and/or low-pitch voiceprint features are extracted. Analyzing the employee's speech in terms of these high-, mid- and low-pitch voiceprint features makes it possible to determine accurately and quickly whether the employee's current mood is suited to the work in progress. The high-, mid- and low-pitch voiceprint features can be extracted with conventional methods.
In a preferred embodiment, the voiceprint features of the employee are extracted from the conversation audio as follows: analyzing the high-pitch part of the employee's speech in the conversation audio and extracting the employee's high-pitch voiceprint feature; analyzing the mid-pitch part and extracting the employee's mid-pitch voiceprint feature; or analyzing the low-pitch part and extracting the employee's low-pitch voiceprint feature. The inventors found that further analyzing the conversation audio in terms of the employee's high-, mid- and low-pitch parts can further improve the accuracy of judging whether the mood of the employee at each station is suited to the work in progress.
In another preferred embodiment, the voiceprint features of the employee are extracted from the conversation audio as follows: S201, analyzing the high-pitch part of the employee's speech in the conversation audio and extracting the employee's high-pitch voiceprint feature; S202, analyzing the mid-pitch part and extracting the employee's mid-pitch voiceprint feature; S203, analyzing the low-pitch part and extracting the employee's low-pitch voiceprint feature; and S204, combining the high-pitch, mid-pitch and low-pitch voiceprint features to obtain the voiceprint features. The inventors found that combining the high-, mid- and low-pitch parts of the employee's speech in this way can further improve the accuracy of judging whether the mood of the employee at each station is suited to the work in progress.
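The patent does not prescribe how the high-, mid- and low-pitch voiceprint features of steps S201 to S204 are computed. One minimal sketch, assuming normalized band-limited spectral energy as a stand-in feature (the band edges in hertz are illustrative assumptions):

```python
import numpy as np

def voiceprint_features(signal, sr, bands=((80, 300), (300, 1000), (1000, 4000))):
    """Combine low-, mid- and high-band energies into one feature vector.

    Each band's spectral energy is a stand-in for the low-, mid- and
    high-pitch voiceprint features of S201-S204; the band edges are
    illustrative assumptions, not taken from the patent.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)  # bin frequencies (Hz)
    energies = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in bands])
    return energies / (energies.sum() or 1.0)         # combine and normalize (S204)
```

A pure 200 Hz tone, for example, concentrates its energy in the first (low) band, while a 2000 Hz tone lands in the third (high) band.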
S300, comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state, and determining whether the employee's mood is suited to the work being performed.
Specifically, the standard voiceprint features of the employee in the normal working state are prestored; they include a high-pitch voiceprint feature, a mid-pitch voiceprint feature and/or a low-pitch voiceprint feature.
In a preferred embodiment, the employee's standard voiceprint features in the normal working state are obtained as follows: obtaining the employee's conversation audio during normal work; analyzing the high-pitch part of the employee's speech in that audio and extracting the high-pitch voiceprint feature; analyzing the mid-pitch part and extracting the mid-pitch voiceprint feature; or analyzing the low-pitch part and extracting the low-pitch voiceprint feature. The standard voiceprint features obtained in this way are compared with the employee's current voiceprint features in terms of the high-, mid- and/or low-pitch components; usable standard voiceprint features therefore need to be constructed only once, overcoming the shortcoming of the prior art, in which standard voiceprint features can be obtained only through a large amount of learning.
In another preferred embodiment, the employee's standard voiceprint features in the normal working state are obtained as follows: S301, obtaining the employee's conversation audio during normal work; S302, analyzing the high-pitch part of the employee's speech in that audio and extracting the high-pitch voiceprint feature; S303, analyzing the mid-pitch part and extracting the mid-pitch voiceprint feature; S304, analyzing the low-pitch part and extracting the low-pitch voiceprint feature; and S305, combining the high-pitch, mid-pitch and low-pitch voiceprint features to obtain the standard voiceprint features. Combining the three features in this way can further improve the accuracy of the standard voiceprint features.
Specifically, whether the employee's mood is suited to the ongoing work is determined as follows: the employee's voiceprint features are compared with the employee's standard voiceprint features in the normal working state; if they are consistent, the employee's mood is judged suited to the current work, and if they are inconsistent, it is judged unsuited.
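The text states only that the features are judged "consistent" or "inconsistent" without defining the comparison. A minimal sketch of that decision, assuming cosine similarity against a threshold (both the metric and the 0.95 value are illustrative assumptions, not from the patent):

```python
import numpy as np

def mood_suits_work(features, standard_features, threshold=0.95):
    """Judge consistency between current and standard voiceprint features.

    Cosine similarity >= threshold is taken as 'consistent'; the metric
    and the threshold value are illustrative assumptions.
    """
    f = np.asarray(features, dtype=float)
    s = np.asarray(standard_features, dtype=float)
    similarity = float(f @ s / (np.linalg.norm(f) * np.linalg.norm(s)))
    return similarity >= threshold
```

In practice the threshold would be tuned per deployment to balance false alarms against missed mood changes.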
As shown in fig. 2, the invention provides a system for recognizing emotion based on conversational speech, comprising:
a conversation audio acquisition module 100, used to obtain the conversation audio of an employee during the working process;
a voiceprint feature extraction module 200, used to extract the employee's voiceprint features from the conversation audio;
an emotion judging module 300, used to compare the employee's voiceprint features with the employee's standard voiceprint features in the normal working state and to determine whether the employee's mood is suited to the work being performed.
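The three modules can be wired together as one pipeline. The class below is a hypothetical sketch under the same assumptions as before (band-energy features and a cosine-similarity threshold, neither prescribed by the patent); its methods mirror modules 100, 200 and 300.

```python
import numpy as np

class EmotionRecognitionSystem:
    """Hypothetical sketch of modules 100/200/300 as one pipeline.

    Feature extraction (band energies) and the similarity threshold
    are illustrative assumptions; the patent prescribes no algorithms.
    """

    BANDS = ((80, 300), (300, 1000), (1000, 4000))  # assumed Hz band edges

    def __init__(self, sr=8000, threshold=0.95):
        self.sr = sr
        self.threshold = threshold
        self.baselines = {}  # employee id -> standard voiceprint features

    def _extract(self, signal):
        # Module 200: normalized spectral energy per band as the voiceprint.
        spectrum = np.abs(np.fft.rfft(signal)) ** 2
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / self.sr)
        e = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in self.BANDS])
        return e / (e.sum() or 1.0)

    def enroll(self, employee_id, normal_state_audio):
        # One-time construction of the standard voiceprint (S301-S305).
        self.baselines[employee_id] = self._extract(normal_state_audio)

    def mood_suits_work(self, employee_id, dialogue_audio):
        # Module 300: compare current features with the stored baseline.
        f = self._extract(dialogue_audio)
        b = self.baselines[employee_id]
        sim = float(f @ b / (np.linalg.norm(f) * np.linalg.norm(b)))
        return sim >= self.threshold
```

Enrollment corresponds to recording the employee once in a normal working state; each later scripted call is then checked against that stored baseline.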
With the method and system provided by the invention, an employee's voiceprint features are obtained from the employee's conversation audio and compared with the employee's voiceprint features in the normal working state to judge whether the employee's mood is suited to the current work. There is no need to identify whether the employee's emotion is joy, anger, or sorrow; a direct comparison against the normal-working-state voiceprint features suffices. Because only a comparison with one employee's standard voiceprint features is required, the accuracy and convenience of recognition are greatly improved, overcoming the shortcoming that existing emotion-recognition methods cannot be used on a factory production line to identify whether an employee's mood suits the ongoing work, and achieving fast and accurate recognition.
In a specific application case, suppose employee Zhang San is in a poor mood while working. The system obtains Zhang San's conversation audio through the dialogue device at his workstation, derives his working-state voiceprint features by the method provided by the invention, and compares them with his standard voiceprint features. If the system finds that the two are inconsistent, it issues a prompt that Zhang San's current mood is not suited to the work in progress. A manager can then pause the work in Zhang San's hands in time and either counsel him on his mood or reassign him to another station less affected by his current state, thereby avoiding a drop in product yield caused by employee mood.

Claims (9)

1. A method for recognizing emotion based on conversational speech, comprising the steps of:
obtaining the conversation audio of an employee during the working process;
extracting the employee's voiceprint features from the conversation audio;
comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state, and determining whether the employee's mood is suited to the work being performed.
2. The method for recognizing emotion based on conversational speech according to claim 1, wherein the conversation audio is obtained by a human or a machine conversing with the employee.
3. The method according to claim 1, wherein the voiceprint features comprise a high-pitch voiceprint feature, a mid-pitch voiceprint feature and/or a low-pitch voiceprint feature.
4. The method for recognizing emotion based on conversational speech according to claim 3, wherein the voiceprint features of the employee are extracted from the conversation audio as follows:
analyzing the high-pitch part of the employee's speech in the conversation audio, and extracting the employee's high-pitch voiceprint feature;
analyzing the mid-pitch part of the employee's speech in the conversation audio, and extracting the employee's mid-pitch voiceprint feature; or
analyzing the low-pitch part of the employee's speech in the conversation audio, and extracting the employee's low-pitch voiceprint feature.
5. The method for recognizing emotion based on conversational speech according to claim 3, wherein the voiceprint features of the employee are extracted from the conversation audio as follows:
analyzing the high-pitch part of the employee's speech in the conversation audio, and extracting the employee's high-pitch voiceprint feature;
analyzing the mid-pitch part of the employee's speech in the conversation audio, and extracting the employee's mid-pitch voiceprint feature;
analyzing the low-pitch part of the employee's speech in the conversation audio, and extracting the employee's low-pitch voiceprint feature;
and combining any two or all three of the high-pitch, mid-pitch and low-pitch voiceprint features to obtain the voiceprint features.
6. The method for recognizing emotion based on conversational speech according to claim 1, wherein the standard voiceprint features of the employee in the normal working state are prestored.
7. A system for recognizing emotion based on conversational speech, comprising:
a conversation audio acquisition module, used to obtain the conversation audio of an employee during the working process;
a voiceprint feature extraction module, used to extract the employee's voiceprint features from the conversation audio;
an emotion judging module, used to compare the employee's voiceprint features with the employee's standard voiceprint features in the normal working state and to determine whether the employee's mood is suited to the work being performed.
8. A terminal, comprising a processor and a memory, wherein the processor is configured to execute a program stored in the memory to implement the method of any one of claims 1 to 6.
9. A storage medium, characterized in that the storage medium stores one or more programs, which are executable by one or more processors to implement the method of any one of claims 1 to 6.
CN202010779424.8A 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice Withdrawn CN112002331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010779424.8A CN112002331A (en) 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010779424.8A CN112002331A (en) 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice

Publications (1)

Publication Number Publication Date
CN112002331A (en) 2020-11-27

Family

ID=73464131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010779424.8A Withdrawn CN112002331A (en) 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice

Country Status (1)

Country Link
CN (1) CN112002331A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105895101A (en) * 2016-06-08 2016-08-24 国网上海市电力公司 Speech processing equipment and processing method for power intelligent auxiliary service system
CN108174046A (en) * 2017-11-10 2018-06-15 大连金慧融智科技股份有限公司 A kind of personnel monitoring system and method for call center
CN111179943A (en) * 2019-10-30 2020-05-19 王东 Conversation auxiliary equipment and method for acquiring information



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 2020-11-27