CN112002331A - Method and system for recognizing emotion based on conversation voice - Google Patents


Info

Publication number
CN112002331A
Authority
CN
China
Prior art keywords
employee
voiceprint
conversation sound
conversation
working process
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010779424.8A
Other languages
Chinese (zh)
Inventor
王韬 (Wang Tao)
秦瀚 (Qin Han)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jingrui Intelligent Technology Co ltd
Original Assignee
Guangzhou Jingrui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jingrui Intelligent Technology Co., Ltd.
Priority to CN202010779424.8A
Publication of CN112002331A
Current legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of emotion recognition, and in particular discloses a method and a system for recognizing emotion based on conversational speech. The method comprises the following steps: obtaining the conversation audio of an employee during the working process; extracting the employee's voiceprint features from that conversation audio; comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state; and determining whether the employee's mood is suited to the work being performed. The method and the system can be used on a factory production line to identify whether the mood of the employee at each station is suited to the work that employee is carrying out, and achieve fast and accurate recognition.

Description

Method and system for recognizing emotion based on conversation voice
Technical Field
The invention relates to the technical field of emotion recognition, and in particular to a method and a system for recognizing emotion based on conversational speech.
Background
Emotion refers to a person's attitude toward external things, arising alongside the processes of cognition and consciousness; it is a response to the relationship between objective things and the subject's needs, and a psychological activity mediated by individual desires and needs. In daily life it is chiefly expressed as a few basic types such as joy, anger, sorrow, and happiness, and these emotions are reflected in a person's voice during conversation. With the development of AI technology, existing systems can already recognize a person's principal emotion types by extracting voiceprint features from conversational speech.
Beyond daily life, mood changes also have a considerable effect on work. In labor-intensive industrial fields that require fine processing, employee mood has an important influence on product yield. When an employee's daily mood is off, the pass rate of products that depend on that employee drops sharply. This is especially true for products with many production stages: one employee's poor mood can render a single processing stage substandard, wasting the effort invested in every other stage and preventing a qualified product from ever being obtained. Such defective products require rework on the one hand and increase raw-material costs on the other, so an employee whose mood is unsuited to their working position indirectly raises production cost. Identifying whether employees' moods suit their current work therefore plays an important role in production that depends on manual, fine processing. If a change in an employee's current mood can be recognized in time and the work in hand paused, an excessive drop in product yield can be effectively avoided.
However, no automated employee-emotion recognition technology for factory production lines currently exists. In particular, there is no technique for identifying whether the mood of production-line personnel is suited to the work they are doing.
Disclosure of Invention
To overcome the shortcoming that existing emotion-recognition methods cannot be used on a factory production line to identify whether an employee's mood is suited to the work the employee is doing, the invention provides a method for recognizing emotion based on conversational speech. The method can be used on a factory production line to identify whether the mood of the employee at each station is suited to the work being carried out.
The technical solution adopted to solve this problem is as follows:
a method of recognizing emotion based on conversational speech, comprising the steps of:
obtaining the conversation audio of an employee during the working process;
extracting the employee's voiceprint features from the conversation audio;
comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state, and determining whether the employee's mood is suited to the work being performed.
The inventors found in their research that the difficulty in developing such a technology lies here: for a specific employee at a specific station, conventional emotion categories such as joy, anger, sorrow, and happiness have no direct relation to improving product yield; these basic emotions cannot be used directly to judge whether an employee is fit for the work in progress, and judgments based on them are not accurate. Existing emotion-recognition technology therefore cannot be used on a factory production line to identify whether an employee's mood is suited to the work currently underway.
The invention innovatively obtains an employee's voiceprint features from the employee's conversation audio and compares them with the employee's voiceprint features in the normal working state to judge whether the employee's mood is suited to the current work. The method does not need to identify whether the employee's emotion is joy, anger, or sorrow; a direct comparison against the voiceprint features of the employee's normal working state suffices. Because only a comparison with one employee's standard voiceprint features is required, both the accuracy and the convenience of recognition are greatly improved. The method thus overcomes the shortcoming that existing emotion-recognition methods cannot be used on a factory production line to identify whether an employee's mood suits the ongoing work, and achieves fast and accurate recognition.
The invention also provides a system for recognizing emotion based on conversational speech, comprising:
a conversation audio acquisition module, used to obtain the conversation audio of an employee during the working process;
a voiceprint feature extraction module, used to extract the employee's voiceprint features from the conversation audio;
an emotion judging module, used to compare the employee's voiceprint features with the employee's standard voiceprint features in the normal working state and to determine whether the employee's mood is suited to the work being performed.
The invention also provides a terminal comprising a processor and a memory, the processor being configured to execute a program stored in the memory so as to implement the above method.
The invention also provides a storage medium storing one or more programs, which are executable by one or more processors to implement the above method.
Advantageous effects: the invention provides a brand-new method and system for recognizing emotion based on conversational speech. The method and system can be used on a factory production line to identify whether the mood of the employee at each station is suited to the work being carried out, solving the problem that existing emotion-recognition methods cannot be used on a factory production line for this purpose, and achieving fast and accurate recognition.
Drawings
Fig. 1 is a flowchart of a method for recognizing emotion based on conversational voice according to the present invention.
Fig. 2 is a schematic diagram of the system for recognizing emotion based on conversational voice according to the present invention.
Detailed Description
The present invention is further explained below with reference to specific examples, which are not intended to limit the present invention in any way.
As shown in fig. 1, the method for recognizing emotion based on conversational voice of the present invention includes the following steps:
s100, obtaining conversation sound in the working process of a certain employee.
Specifically, the dialogue sound is obtained by a person or a machine in dialogue with an employee. For example, a conversation device can be arranged on a work station of an employee to carry out conversation with the employee; the conversation can be carried out with the staff through manual or set voice, so that the conversation voice of the staff is obtained and is used for extracting the voiceprint characteristics of the staff. For example, employees may be called at intervals: "do you work well on a certain Zhao (employee name)? ", the employee must make a canonical answer to the call: "Zhao somebody (employee name) receives the call and works all the way to normal … …".
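Since the scripted call above fixes the wording of each reply, the system can verify that a reply follows the script before using its audio for voiceprint extraction, so that only comparable utterances are analyzed. The helper below is a hypothetical sketch of that check; the function name and the exact scripted wording are illustrative assumptions based on the example, not part of the patent.

```python
import re

def is_canonical_reply(transcript: str, employee_name: str) -> bool:
    """Return True if a transcribed reply follows the scripted answer form.

    The expected form ("<name> receives the call ...") is an illustrative
    assumption taken from the example in the text, not a prescribed format.
    """
    pattern = rf"^{re.escape(employee_name)}\s+receives the call"
    return re.match(pattern, transcript.strip()) is not None
```

A reply that does not match the script would be discarded and the call repeated, rather than risk comparing voiceprint features drawn from different utterances.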
S200, extracting the employee's voiceprint features from the conversation audio obtained during the working process.
Specifically, the employee's high-pitch, mid-pitch and/or low-pitch voiceprint features are extracted. Analyzing the employee's speech in terms of these high-, mid- and low-pitch voiceprint features makes it possible to determine accurately and quickly whether the employee's current mood is suited to the work in progress. The high-, mid- and low-pitch voiceprint features can be extracted with conventional methods.
In a preferred embodiment, the voiceprint features of the employee are extracted from the conversation audio as follows: analyzing the high-pitch part of the employee's speech in the conversation audio and extracting the employee's high-pitch voiceprint feature; analyzing the mid-pitch part and extracting the employee's mid-pitch voiceprint feature; or analyzing the low-pitch part and extracting the employee's low-pitch voiceprint feature. The inventors found that further analyzing the conversation audio in terms of the employee's high-, mid- and low-pitch parts can further improve the accuracy of judging whether the mood of the employee at each station is suited to the work in progress.
In another preferred embodiment, the voiceprint features of the employee are extracted from the conversation audio as follows: S201, analyzing the high-pitch part of the employee's speech in the conversation audio and extracting the employee's high-pitch voiceprint feature; S202, analyzing the mid-pitch part and extracting the employee's mid-pitch voiceprint feature; S203, analyzing the low-pitch part and extracting the employee's low-pitch voiceprint feature; and S204, combining the high-pitch, mid-pitch and low-pitch voiceprint features to obtain the voiceprint features. The inventors found that combining the high-, mid- and low-pitch parts of the employee's speech in this way can further improve the accuracy of judging whether the mood of the employee at each station is suited to the work in progress.
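The patent does not prescribe how the high-, mid- and low-pitch voiceprint features of steps S201 to S204 are computed. One minimal sketch, assuming normalized band-limited spectral energy as a stand-in feature (the band edges in hertz are illustrative assumptions):

```python
import numpy as np

def voiceprint_features(signal, sr, bands=((80, 300), (300, 1000), (1000, 4000))):
    """Combine low-, mid- and high-band energies into one feature vector.

    Each band's spectral energy is a stand-in for the low-, mid- and
    high-pitch voiceprint features of S201-S204; the band edges are
    illustrative assumptions, not taken from the patent.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)  # bin frequencies (Hz)
    energies = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in bands])
    return energies / (energies.sum() or 1.0)         # combine and normalize (S204)
```

A pure 200 Hz tone, for example, concentrates its energy in the first (low) band, while a 2000 Hz tone lands in the third (high) band.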
S300, comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state, and determining whether the employee's mood is suited to the work being performed.
Specifically, the standard voiceprint features of the employee in the normal working state are prestored; they include a high-pitch voiceprint feature, a mid-pitch voiceprint feature and/or a low-pitch voiceprint feature.
In a preferred embodiment, the employee's standard voiceprint features in the normal working state are obtained as follows: obtaining the employee's conversation audio during normal work; analyzing the high-pitch part of the employee's speech in that audio and extracting the high-pitch voiceprint feature; analyzing the mid-pitch part and extracting the mid-pitch voiceprint feature; or analyzing the low-pitch part and extracting the low-pitch voiceprint feature. The standard voiceprint features obtained in this way are compared with the employee's current voiceprint features in terms of the high-, mid- and/or low-pitch components; usable standard voiceprint features therefore need to be constructed only once, overcoming the shortcoming of the prior art, in which standard voiceprint features can be obtained only through a large amount of learning.
In another preferred embodiment, the employee's standard voiceprint features in the normal working state are obtained as follows: S301, obtaining the employee's conversation audio during normal work; S302, analyzing the high-pitch part of the employee's speech in that audio and extracting the high-pitch voiceprint feature; S303, analyzing the mid-pitch part and extracting the mid-pitch voiceprint feature; S304, analyzing the low-pitch part and extracting the low-pitch voiceprint feature; and S305, combining the high-pitch, mid-pitch and low-pitch voiceprint features to obtain the standard voiceprint features. Combining the three features in this way can further improve the accuracy of the standard voiceprint features.
Specifically, whether the employee's mood is suited to the ongoing work is determined as follows: the employee's voiceprint features are compared with the employee's standard voiceprint features in the normal working state; if they are consistent, the employee's mood is judged suited to the current work, and if they are inconsistent, it is judged unsuited.
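The text states only that the features are judged "consistent" or "inconsistent" without defining the comparison. A minimal sketch of that decision, assuming cosine similarity against a threshold (both the metric and the 0.95 value are illustrative assumptions, not from the patent):

```python
import numpy as np

def mood_suits_work(features, standard_features, threshold=0.95):
    """Judge consistency between current and standard voiceprint features.

    Cosine similarity >= threshold is taken as 'consistent'; the metric
    and the threshold value are illustrative assumptions.
    """
    f = np.asarray(features, dtype=float)
    s = np.asarray(standard_features, dtype=float)
    similarity = float(f @ s / (np.linalg.norm(f) * np.linalg.norm(s)))
    return similarity >= threshold
```

In practice the threshold would be tuned per deployment to balance false alarms against missed mood changes.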
As shown in fig. 2, the invention provides a system for recognizing emotion based on conversational speech, comprising:
a conversation audio acquisition module 100, used to obtain the conversation audio of an employee during the working process;
a voiceprint feature extraction module 200, used to extract the employee's voiceprint features from the conversation audio;
an emotion judging module 300, used to compare the employee's voiceprint features with the employee's standard voiceprint features in the normal working state and to determine whether the employee's mood is suited to the work being performed.
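The three modules can be wired together as one pipeline. The class below is a hypothetical sketch under the same assumptions as before (band-energy features and a cosine-similarity threshold, neither prescribed by the patent); its methods mirror modules 100, 200 and 300.

```python
import numpy as np

class EmotionRecognitionSystem:
    """Hypothetical sketch of modules 100/200/300 as one pipeline.

    Feature extraction (band energies) and the similarity threshold
    are illustrative assumptions; the patent prescribes no algorithms.
    """

    BANDS = ((80, 300), (300, 1000), (1000, 4000))  # assumed Hz band edges

    def __init__(self, sr=8000, threshold=0.95):
        self.sr = sr
        self.threshold = threshold
        self.baselines = {}  # employee id -> standard voiceprint features

    def _extract(self, signal):
        # Module 200: normalized spectral energy per band as the voiceprint.
        spectrum = np.abs(np.fft.rfft(signal)) ** 2
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / self.sr)
        e = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in self.BANDS])
        return e / (e.sum() or 1.0)

    def enroll(self, employee_id, normal_state_audio):
        # One-time construction of the standard voiceprint (S301-S305).
        self.baselines[employee_id] = self._extract(normal_state_audio)

    def mood_suits_work(self, employee_id, dialogue_audio):
        # Module 300: compare current features with the stored baseline.
        f = self._extract(dialogue_audio)
        b = self.baselines[employee_id]
        sim = float(f @ b / (np.linalg.norm(f) * np.linalg.norm(b)))
        return sim >= self.threshold
```

Enrollment corresponds to recording the employee once in a normal working state; each later scripted call is then checked against that stored baseline.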
With the method and system provided by the invention, an employee's voiceprint features are obtained from the employee's conversation audio and compared with the employee's voiceprint features in the normal working state to judge whether the employee's mood is suited to the current work. There is no need to identify whether the employee's emotion is joy, anger, or sorrow; a direct comparison against the normal-working-state voiceprint features suffices. Because only a comparison with one employee's standard voiceprint features is required, the accuracy and convenience of recognition are greatly improved, overcoming the shortcoming that existing emotion-recognition methods cannot be used on a factory production line to identify whether an employee's mood suits the ongoing work, and achieving fast and accurate recognition.
In a specific application case, suppose employee Zhang San is in a poor mood while working. The system obtains Zhang San's conversation audio through the dialogue device at his workstation, derives his working-state voiceprint features by the method provided by the invention, and compares them with his standard voiceprint features. If the system finds that the two are inconsistent, it issues a prompt that Zhang San's current mood is not suited to the work in progress. A manager can then pause the work in Zhang San's hands in time and either counsel him on his mood or reassign him to another station less affected by his current state, thereby avoiding a drop in product yield caused by employee mood.

Claims (9)

1. A method for recognizing emotion based on conversational speech, comprising the steps of:
obtaining the conversation audio of an employee during the working process;
extracting the employee's voiceprint features from the conversation audio;
comparing the employee's voiceprint features with the employee's standard voiceprint features in the normal working state, and determining whether the employee's mood is suited to the work being performed.
2. The method for recognizing emotion based on conversational speech according to claim 1, wherein the conversation audio is obtained by a human or a machine conversing with the employee.
3. The method according to claim 1, wherein the voiceprint features comprise a high-pitch voiceprint feature, a mid-pitch voiceprint feature and/or a low-pitch voiceprint feature.
4. The method for recognizing emotion based on conversational speech according to claim 3, wherein the voiceprint features of the employee are extracted from the conversation audio as follows:
analyzing the high-pitch part of the employee's speech in the conversation audio, and extracting the employee's high-pitch voiceprint feature;
analyzing the mid-pitch part of the employee's speech in the conversation audio, and extracting the employee's mid-pitch voiceprint feature; or
analyzing the low-pitch part of the employee's speech in the conversation audio, and extracting the employee's low-pitch voiceprint feature.
5. The method for recognizing emotion based on conversational speech according to claim 3, wherein the voiceprint features of the employee are extracted from the conversation audio as follows:
analyzing the high-pitch part of the employee's speech in the conversation audio, and extracting the employee's high-pitch voiceprint feature;
analyzing the mid-pitch part of the employee's speech in the conversation audio, and extracting the employee's mid-pitch voiceprint feature;
analyzing the low-pitch part of the employee's speech in the conversation audio, and extracting the employee's low-pitch voiceprint feature;
and combining any two or all three of the high-pitch, mid-pitch and low-pitch voiceprint features to obtain the voiceprint features.
6. The method for recognizing emotion based on conversational speech according to claim 1, wherein the standard voiceprint features of the employee in the normal working state are prestored.
7. A system for recognizing emotion based on conversational speech, comprising:
a conversation audio acquisition module, used to obtain the conversation audio of an employee during the working process;
a voiceprint feature extraction module, used to extract the employee's voiceprint features from the conversation audio;
an emotion judging module, used to compare the employee's voiceprint features with the employee's standard voiceprint features in the normal working state and to determine whether the employee's mood is suited to the work being performed.
8. A terminal, comprising a processor and a memory, wherein the processor is configured to execute a program stored in the memory to implement the method of any one of claims 1 to 6.
9. A storage medium, characterized in that the storage medium stores one or more programs, which are executable by one or more processors to implement the method of any one of claims 1 to 6.
CN202010779424.8A 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice Withdrawn CN112002331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010779424.8A CN112002331A (en) 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010779424.8A CN112002331A (en) 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice

Publications (1)

Publication Number Publication Date
CN112002331A (en) 2020-11-27

Family

ID=73464131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010779424.8A Withdrawn CN112002331A (en) 2020-08-05 2020-08-05 Method and system for recognizing emotion based on conversation voice

Country Status (1)

Country Link
CN (1) CN112002331A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105895101A (en) * 2016-06-08 2016-08-24 国网上海市电力公司 Speech processing equipment and processing method for power intelligent auxiliary service system
CN108174046A (en) * 2017-11-10 2018-06-15 大连金慧融智科技股份有限公司 A kind of personnel monitoring system and method for call center
CN111179943A (en) * 2019-10-30 2020-05-19 王东 Conversation auxiliary equipment and method for acquiring information



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 2020-11-27