CN116665281B - Key emotion extraction method based on doctor-patient interaction - Google Patents
- Publication number: CN116665281B
- Application number: CN202310773657.0A
- Authority
- CN
- China
- Legal status: Active
Classifications
- G06V40/161: Human faces; detection, localisation, normalisation
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/08: Neural network learning methods
- G06V10/764: Image or video recognition using classification, e.g. of video objects
- G06V10/82: Image or video recognition using neural networks
- G06V20/46: Extracting features or characteristics from video content, e.g. representative shots or key frames
- G06V40/168: Feature extraction; face representation
- G06V40/172: Face classification, e.g. identification
- Y02D10/00: Energy efficient computing
Abstract
The invention discloses a key emotion extraction method based on doctor-patient interaction, comprising the following steps: acquiring facial image data of a patient to form a first image set; constructing a face detection model and performing face detection and recognition on the image frames in the first image set to form a second image set; constructing an emotion frame detection model and performing emotion recognition on the second image set to form a third image set; and constructing an emotion key frame detection model and extracting the patient's emotion key frames from the third image set, so as to recognize the patient's real-time emotion and extract emotion key frames during the doctor-patient interaction. The invention solves the technical problems that the traditional approach, in which medical staff assess the patient's emotional state offline, is not convenient enough and yields an insufficiently accurate evaluation of the patient's mental state.
Description
Technical Field
The invention relates to the technical field of intelligent medical services, and in particular to a key emotion extraction method based on doctor-patient interaction.
Background
Mental disorders have become a focal point of modern medicine and of social attention. As populations age and the pace of life quickens, stress and emotional problems have become everyday challenges. Mental state assessment is an important problem in the medical field and can assist in diagnosing mental illnesses such as depression and dysphoria, and emotion recognition is a core component of assessing a patient's mental state.
Currently, researchers explore patient emotion recognition under doctor-patient interaction using various technical means such as facial expression analysis, speech emotion recognition and physiological signal monitoring. For example, facial expressions of doctor and patient can be analyzed and their emotions classified and identified using computer vision; or the patient's voice and physiological signals can be monitored, emotion features extracted from them with machine learning, and the patient's emotion identified and classified. Facial expression emotion recognition is an important branch of computer vision and artificial intelligence, which aims to analyze facial expressions through computer vision and pattern recognition so as to judge a person's emotional state. In practice, however, mental state is still evaluated through offline doctor-patient interaction, and emotion recognition through facial expression analysis, speech emotion recognition or physiological signal detection is not accurate enough and requires large amounts of data. There is therefore a need for a key emotion extraction method based on doctor-patient interaction that solves the technical problems that the existing practice of medical staff assessing the patient's emotional state offline is not convenient enough and the resulting mental state evaluation is not accurate enough.
Disclosure of Invention
The main object of the present invention is to provide a key emotion extraction method based on doctor-patient interaction, aiming to solve the technical problems that the existing practice of medical staff assessing the patient's emotional state offline is not convenient enough and the resulting mental state evaluation of the patient is not accurate enough.
To achieve the above object, the present invention provides a key emotion extraction method based on doctor-patient interaction, which includes the following steps:
S1, acquiring facial image data of a patient to form a first image set;
S2, constructing a face detection model, and performing face detection and recognition on the image frames in the first image set through the face detection model to form a second image set;
S3, constructing an emotion frame detection model, and performing emotion recognition on the second image set through the emotion frame detection model to form a third image set;
S4, constructing an emotion key frame detection model, and extracting the patient's emotion key frames from the third image set through the emotion key frame detection model, so as to recognize the patient's real-time emotion and extract emotion key frames during the doctor-patient interaction.
In one preferred embodiment, step S1 acquires facial image data of the patient to form a first image set, specifically:
facial image data of the patient are obtained through an image acquisition device and parsed into a number of image frames, and the image frames are normalized to form the first image set.
In one preferred embodiment, the face detection model adopts the RetinaFace network architecture.
In one preferred embodiment, the face detection model includes a linear rectification function (ReLU), which is used to perform pixel correction on the image frames in the first image set.
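As a concrete illustration of this step, the linear rectification function simply zeroes out negative values. The following minimal numpy sketch (an illustration only, not code from the patent) applies it to a frame:

```python
import numpy as np

def relu(frame):
    # Linear rectification (ReLU): replace every negative value with 0,
    # leaving non-negative pixel/feature values unchanged.
    return np.maximum(frame, 0.0)
```

Applied to each frame (or feature map) in the first image set, this is the "pixel correction" the text describes: negatives become 0, everything else passes through unchanged.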
In one preferred embodiment, the face detection model includes the PFLD facial key-point recognition algorithm.
In one preferred embodiment, the emotion frame detection model includes a ResNet network, which classifies and identifies the patient's emotion in the second image set to obtain an emotion prediction result.
In one preferred embodiment, the emotion prediction result includes neutral, happy, sad, angry, fearful, surprised and indifferent emotions.
In one preferred embodiment, step S4 extracts the patient's emotion key frames from the third image set through the emotion key frame detection model, specifically:
storing the emotion prediction result and time tag of each image frame in an emotion linked list;
judging whether the emotion prediction result of the image frame at the current moment is consistent with that of the frame at the previous moment; if not, setting the current image frame as an emotion key frame and updating the emotion linked list; if so, executing the next step; and
counting the frequency value of each image frame's emotion prediction result in the emotion linked list; if the frequency value of the current image frame's emotion prediction result over continuous time falls within the emotion frequency threshold interval, setting the current image frame as an emotion key frame and updating the emotion linked list; otherwise, marking the current image frame as an ordinary frame.
In one preferred embodiment, the emotion frequency threshold interval is (7, 10).
In one preferred embodiment, after the recognition of the patient's real-time emotion and the extraction of the emotion key frames in step S4 are completed, the method further includes:
encrypting the patient's emotion prediction results and time tags with an encryption algorithm, and uploading the generated hash value to a blockchain network for storage.
In the technical scheme of the invention, the key emotion extraction method based on doctor-patient interaction includes the following steps: acquiring facial image data of a patient to form a first image set; constructing a face detection model and performing face detection and recognition on the image frames in the first image set to form a second image set; constructing an emotion frame detection model and performing emotion recognition on the second image set to form a third image set; and constructing an emotion key frame detection model and extracting the patient's emotion key frames from the third image set, so as to recognize the patient's real-time emotion and extract emotion key frames during the doctor-patient interaction. The invention solves the technical problems that the traditional approach, in which medical staff assess the patient's emotional state offline, is not convenient enough and yields an insufficiently accurate evaluation of the patient's mental and psychological state.
In the invention, a ResNet network is adopted to recognize the patient's emotion; the resulting emotion predictions are used to extract the emotion key frames, and the patient's mental state is evaluated based on the key frames and their corresponding time tags, which improves the accuracy and real-time performance of mental state evaluation.
In the invention, after the patient's real-time emotion and key frames have been identified during the doctor-patient interaction, the patient's emotion prediction results and time tags are encrypted and the resulting hash value is uploaded to a blockchain network; recording the emotion prediction results and time tags on-chain in this way protects the patient's privacy.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings required for describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of a key emotion extraction method based on doctor-patient interaction according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a face detection model according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of emotion frame recognition for a patient according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of emotion key frame recognition according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a face detection and recognition result according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of emotion recognition results for a patient according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an emotion key frame recognition result according to an embodiment of the present invention.
The achievement of the object, functional features and advantages of the present invention will be further described with reference to the drawings in connection with the embodiments.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings; the described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
Furthermore, descriptions involving "first", "second" and the like are for descriptive purposes only and are not to be construed as indicating or implying relative importance or an order among the indicated technical features; a feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature.
Moreover, the technical solutions of the embodiments of the present invention may be combined with each other, provided that a person skilled in the art can implement the combination; when technical solutions are contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the scope of protection claimed by the present invention.
Referring to fig. 1, according to an aspect of the present invention, a key emotion extraction method based on doctor-patient interaction is provided, which includes the following steps:
S1, acquiring facial image data of a patient to form a first image set;
S2, constructing a face detection model, and performing face detection and recognition on the image frames in the first image set through the face detection model to form a second image set;
S3, constructing an emotion frame detection model, and performing emotion recognition on the second image set through the emotion frame detection model to form a third image set;
S4, constructing an emotion key frame detection model, and extracting the patient's emotion key frames from the third image set through the emotion key frame detection model, so as to recognize the patient's real-time emotion and extract emotion key frames during the doctor-patient interaction.
Specifically, in this embodiment, the meta-diagnosis room is a new medical scenario: a virtual doctor-patient diagnosis and treatment environment generated with high-tech equipment and information technology, which concentrates medical resources in one room to provide one-stop medical service. The meta-diagnosis room includes functional areas such as a doctor workstation, a patient bed, diagnosis and treatment equipment and information technology equipment. At the workstation, the doctor can review the patient's medical records, monitor the patient's physiological information, and carry out diagnosis and treatment; the patient can receive diagnosis and treatment and have physiological information monitored in bed. The diagnosis and treatment equipment includes medical devices such as electrocardiographs, sphygmomanometers and thermometers; the information technology equipment includes video monitoring, telemedicine and intelligent diagnosis devices, enabling functions such as remote medical care and information sharing between doctor and patient. The meta-diagnosis room can provide more humanized medical services, bring patient and doctor into closer interaction, improve medical efficiency, reduce medical costs and improve the medical experience, and offers a new direction for the development of medical technology. In the present invention, doctor-patient interaction takes place in a meta-diagnosis room scenario and facial image data of the patient are acquired, so that the patient's emotion key frames are extracted in real time, the accuracy and efficiency of mental state assessment are improved, and a better and more accurate diagnostic basis is provided for doctors.
Specifically, in this embodiment, before step S1 acquires the facial image data of the patient, the method further includes: guiding the patient to start the acquisition device and launching the AI digital-human doctor. The AI digital-human doctor questions the patient according to a mental assessment scale and carries out interactions such as emotion perception and game interaction, while the acquisition device records the patient's current image information, i.e. the facial image data, and uploads it to a server in real time. If the AI digital-human doctor has finished questioning according to the mental assessment scale, the acquisition device is closed and the doctor-patient interaction ends once the patient confirms; if the patient does not confirm and the acquisition device captures no facial data beyond a time threshold, the device closes automatically. The time threshold is 10 minutes; the invention does not specifically limit it, and it may be set as required. If questioning has not finished, the AI digital-human doctor continues asking questions according to the psychological assessment scale. In the present invention, the acquisition device is an image capture device; the invention does not specifically limit it, and it may be set as required.
Specifically, in this embodiment, step S1 acquires facial image data of the patient to form a first image set, specifically: facial image data of the patient are obtained through an image acquisition device and parsed into a number of image frames, and the image frames are normalized to form the first image set.
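The acquisition-and-normalization step above can be sketched as follows. This assumes frames arrive as 8-bit numpy arrays and uses simple [0, 1] scaling; the patent does not specify the exact normalization scheme, so these choices are illustrative assumptions:

```python
import numpy as np

def normalize_frames(frames):
    """Form the 'first image set': scale each 8-bit frame to float32 in [0, 1].

    `frames` is an iterable of (H, W[, C]) uint8 arrays, e.g. decoded from the
    video stream captured during the doctor-patient interaction.
    """
    first_image_set = []
    for frame in frames:
        # Pixel-value normalization: 0..255 -> 0.0..1.0
        first_image_set.append(frame.astype(np.float32) / 255.0)
    return first_image_set
```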
Specifically, in this embodiment, referring to fig. 2 and fig. 5, the face detection model adopts the RetinaFace network architecture, which uses a pyramid structure. The face detection model includes a linear rectification function (ReLU) used to perform pixel correction on the image frames in the first image set, replacing all negative values in the feature maps of the first image set with 0; bottom-up convolution operations then form the feature layers Conv1_x, Conv2_x, Conv3_x, Conv4_x and Conv5_x. A 1×1 convolution with stride 1 is applied to Conv5_x to form feature layer M5. M5 is up-sampled and fused with the feature layer obtained by applying a 1×1, stride-1 convolution to Conv4_x, and the fused feature layer is up-sampled once to form feature layer M4. M4 is up-sampled once and fused with the 1×1, stride-1 convolution of Conv3_x, and the fused layer is up-sampled again to form feature layer M3. M3 is up-sampled once and fused with the 1×1, stride-1 convolution of Conv2_x, and the fused layer is up-sampled once to form feature layer M2. A 3×3 convolution with stride 2 is then applied to the M2, M3, M4, M5 and Conv5_x feature layers to form the effective feature layers P2, P3, P4, P5 and P6. Finally, classification, regression and facial key-point prediction are performed on P2, P3, P4, P5 and P6 respectively: the classification task outputs the class (cls) of face pixels; the regression task outputs the position (box) of the face in the image as four vertex coordinates; and the PFLD facial key-point recognition algorithm predicts the facial key points (landmark), outputting ten values that locate the coordinates of the eyes, nose tip and mouth corners.
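The top-down pyramid fusion described above can be illustrated with a shape-only sketch. The 1×1 lateral convolutions and 3×3 output convolutions are stood in for by identity mappings and the up-sampling is nearest-neighbour, so this shows only the data flow of the fusion, not RetinaFace's actual weights or its exact up-sampling order; all of that is an illustrative simplification:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x up-sampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(conv2_x, conv3_x, conv4_x, conv5_x):
    """Fuse bottom-up feature layers top-down, in the spirit of M5..M2 above.

    The lateral 1x1 convolutions are replaced by identity projections here,
    which is an assumption made purely for illustration.
    """
    m5 = conv5_x
    m4 = conv4_x + upsample2x(m5)   # fuse lateral C4 with up-sampled M5
    m3 = conv3_x + upsample2x(m4)   # fuse lateral C3 with up-sampled M4
    m2 = conv2_x + upsample2x(m3)   # fuse lateral C2 with up-sampled M3
    return m2, m3, m4, m5
```

In the real model, each fused map would also pass through an output convolution to yield the effective layers P2 to P6 on which classification, box regression and landmark prediction run.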
Specifically, in this embodiment, referring to fig. 3 and fig. 6, the emotion frame detection model includes a ResNet network, which classifies and identifies the patient's emotion in the second image set to obtain an emotion prediction result. The ResNet network is trained iteratively on a dataset with pre-annotated emotions so that it can classify and recognize emotions; the invention does not specifically limit this, and conventional training techniques are used. The face images in the second image set identified by the face detection model are input into the pretrained ResNet network to obtain the patient's emotion prediction result at each moment and form the third image set.
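Downstream of the pretrained ResNet, per-frame classification reduces to taking the arg-max of a 7-way output. The sketch below assumes the network's final layer emits one logit per emotion category in the order listed in the text; both the label order and the softmax read-out are assumptions, not details fixed by the patent:

```python
import numpy as np

# Assumed label order matching the seven categories named in the text.
EMOTIONS = ["neutral", "happy", "sad", "angry", "fearful", "surprised", "indifferent"]

def classify_emotion(logits):
    """Turn a 7-way logit vector (e.g. a ResNet head's output for one face
    frame) into (label, probability vector) via softmax and arg-max."""
    z = np.exp(logits - np.max(logits))   # numerically stable softmax
    probs = z / z.sum()
    return EMOTIONS[int(np.argmax(probs))], probs
```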
Specifically, in this embodiment, the emotion prediction result includes neutral, happy, sad, angry, fearful, surprised and indifferent emotions, among others; the invention does not specifically limit the set, and it may be configured as required.
Specifically, in this embodiment, referring to fig. 4 and fig. 7, step S4 extracts the patient's emotion key frames from the third image set through the emotion key frame detection model, specifically: storing the emotion prediction result and time tag of each image frame in an emotion linked list; judging whether the emotion prediction result of the image frame at the current moment is consistent with that of the frame at the previous moment, and, if not, setting the current image frame as an emotion key frame and updating the emotion linked list, while if so, executing the next step; counting the frequency value of each image frame's emotion prediction result in the emotion linked list, and, if the frequency value of the current image frame's emotion prediction result over continuous time falls within the emotion frequency threshold interval, setting the current image frame as an emotion key frame and updating the emotion linked list; otherwise, marking the current image frame as an ordinary frame.
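The two-rule key-frame decision above can be written out directly. The run-length counter below is one plain-Python reading of the linked-list bookkeeping, and the open interval (7, 10) is the threshold stated in the text; the data layout and the strict-inequality reading of the interval are assumptions:

```python
FREQ_LOW, FREQ_HIGH = 7, 10   # the emotion frequency threshold interval (7, 10)

def label_frames(predictions):
    """predictions: time-ordered list of (time_tag, emotion) pairs.
    Returns a parallel list of 'key' / 'ordinary' labels.

    Rule 1: the predicted emotion changed vs. the previous frame -> key frame.
    Rule 2: the same emotion is sustained for a run length inside the
            threshold interval -> key frame; otherwise ordinary frame.
    """
    labels = []
    prev_emotion = None
    run = 0   # consecutive frames carrying the same predicted emotion
    for _time_tag, emotion in predictions:
        if emotion != prev_emotion:
            labels.append("key")              # rule 1: emotion changed
            run = 1
        else:
            run += 1
            labels.append("key" if FREQ_LOW < run < FREQ_HIGH else "ordinary")
        prev_emotion = emotion
    return labels
```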
Specifically, in the present embodiment, the emotion frequency threshold interval is (7, 10); the invention does not specifically limit it, and it may be set as required.
Specifically, in this embodiment, after step S4 completes the recognition of the patient's real-time emotion and key frames during the doctor-patient interaction, the method further includes: encrypting the patient's emotion prediction results and time tags, together with the patient's identity information and consultation ID, using an encryption algorithm, and uploading the generated hash value to a blockchain network for storage to better protect patient privacy. In the present invention, the encryption algorithm is a Chinese national (SM-series) cryptographic algorithm; the invention does not specifically limit it, and it may be set as required.
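Before any on-chain upload, the record is reduced to a fixed-size digest. The sketch below hashes the emotion results, time tags and identifiers with SHA-256 from the standard library; the patent specifies a Chinese national (SM-series) cryptographic algorithm instead, so the digest choice, the JSON serialization and the field names here are all illustrative assumptions:

```python
import hashlib
import json

def seal_record(patient_id, visit_id, predictions):
    """Serialize the patient's emotion predictions with their time tags and
    identifiers, and return a hex digest suitable for on-chain storage.

    `predictions` is a list of (time_tag, emotion) pairs; the field names
    are hypothetical.
    """
    payload = json.dumps(
        {"patient": patient_id, "visit": visit_id, "results": list(predictions)},
        sort_keys=True,            # canonical key order -> reproducible digest
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Only the digest leaves the hospital system; the raw emotion data stays off-chain, which is what makes the scheme privacy-preserving.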
The foregoing describes only preferred embodiments of the present invention and does not limit its patent scope; equivalent structural changes made using the contents of the description and the accompanying drawings under the inventive concept of the present invention, applied directly or indirectly in other related technical fields, fall within the scope of patent protection of the present invention.
Claims (8)
1. A key emotion extraction method based on doctor-patient interaction, characterized by comprising the following steps:
S1, acquiring facial image data of a patient to form a first image set;
S2, constructing a face detection model, and performing face detection and recognition on the image frames in the first image set through the face detection model to form a second image set;
S3, constructing an emotion frame detection model, and performing emotion recognition on the second image set through the emotion frame detection model to form a third image set; the emotion frame detection model comprises a ResNet network, and the ResNet network is used for classifying and identifying emotions in the second image set to obtain emotion prediction results;
S4, constructing an emotion key frame detection model, and extracting the patient's emotion key frames from the third image set through the emotion key frame detection model, so as to recognize the patient's real-time emotion and extract emotion key frames during the doctor-patient interaction; specifically:
storing emotion prediction results and time labels of each image frame to an emotion linked list;
judging whether the emotion prediction results of the current moment and the image frame at the previous moment are consistent, if not, setting the current image frame as an emotion key frame, updating an emotion linked list, and if so, executing the next step;
and counting the frequency value of the emotion prediction result of each image frame in the emotion linked list, if the frequency value of the emotion prediction result of the current image frame in continuous time is positioned in an emotion frequency threshold value interval, setting the current image frame as an emotion key frame, updating the emotion linked list, and otherwise, setting the current image frame as a common frame.
2. The key emotion extraction method based on doctor-patient interaction according to claim 1, wherein in step S1, the facial image data of the patient is acquired to form the first image set, specifically:
acquiring the facial image data of the patient through an image acquisition device, performing data analysis on the facial image data to obtain a plurality of image frames, and normalizing the plurality of image frames to form the first image set.
3. The key emotion extraction method based on doctor-patient interaction according to any one of claims 1-2, wherein the face detection model adopts the RetinaFace network model architecture.
4. The key emotion extraction method based on doctor-patient interaction according to any one of claims 1-2, wherein the face detection model includes a linear rectification function (ReLU) for performing pixel correction processing on the image frames in the first image set.
5. The key emotion extraction method based on doctor-patient interaction according to any one of claims 1-2, wherein the face detection model includes the PFLD facial key point recognition algorithm.
6. The key emotion extraction method based on doctor-patient interaction according to claim 1, wherein the emotion prediction results include neutral, happy, sad, angry, fearful, surprised, and indifferent emotions.
7. The key emotion extraction method based on doctor-patient interaction according to claim 1, wherein the emotion frequency threshold interval is (7, 10).
8. The key emotion extraction method based on doctor-patient interaction according to claim 7, wherein after step S4 completes the recognition of the patient's real-time emotion and the extraction of emotion key frames, the method further comprises:
encrypting the patient's emotion prediction results and time tags with an encryption algorithm, and uploading the generated hash value to a blockchain network for storage.
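The acquisition and normalization step of claim 2 admits a simple sketch. The 224×224 target size, the nearest-neighbour resizing, and the scaling of pixel values to [0, 1] are illustrative assumptions; the claims do not specify a particular normalization scheme.

```python
import numpy as np

TARGET = (224, 224)  # assumed input size for the downstream face detection model

def resize_nearest(frame, size=TARGET):
    """Nearest-neighbour resize using pure NumPy index arithmetic."""
    h, w = frame.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each output row
    cols = np.arange(size[1]) * w // size[1]   # source column for each output column
    return frame[rows][:, cols]

def build_first_image_set(frames):
    """frames: iterable of uint8 (H, W, 3) arrays from the image
    acquisition device. Resizes each frame and scales its pixels to
    [0, 1] as float32, yielding the 'first image set'."""
    return [resize_nearest(f).astype(np.float32) / 255.0 for f in frames]
```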
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310773657.0A CN116665281B (en) | 2023-06-28 | 2023-06-28 | Key emotion extraction method based on doctor-patient interaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116665281A CN116665281A (en) | 2023-08-29 |
CN116665281B true CN116665281B (en) | 2024-05-10 |
Family
ID=87727956
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310773657.0A Active CN116665281B (en) | 2023-06-28 | 2023-06-28 | Key emotion extraction method based on doctor-patient interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116665281B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117671774B (en) * | 2024-01-11 | 2024-04-26 | 好心情健康产业集团有限公司 | Face emotion intelligent recognition analysis equipment |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169426A (en) * | 2017-04-27 | 2017-09-15 | 广东工业大学 | Crowd abnormal emotion detection and localization method based on deep neural networks |
CN109190487A (en) * | 2018-08-07 | 2019-01-11 | 平安科技(深圳)有限公司 | Face Emotion identification method, apparatus, computer equipment and storage medium |
WO2020029406A1 (en) * | 2018-08-07 | 2020-02-13 | 平安科技(深圳)有限公司 | Human face emotion identification method and device, computer device and storage medium |
CN110287895A (en) * | 2019-04-17 | 2019-09-27 | 北京阳光易德科技股份有限公司 | Method for emotion measurement based on convolutional neural networks |
CN110807394A (en) * | 2019-10-23 | 2020-02-18 | 上海能塔智能科技有限公司 | Emotion recognition method, test driving experience evaluation method, device, equipment and medium |
CN110781810A (en) * | 2019-10-24 | 2020-02-11 | 合肥盛东信息科技有限公司 | Face emotion recognition method |
CN112163459A (en) * | 2020-09-04 | 2021-01-01 | 三峡大学 | Face abnormal emotion recognition method adopting 3D convolution feature fusion network |
CN114943997A (en) * | 2022-05-18 | 2022-08-26 | 上海大学 | Expression classification algorithm and system for stroke patients based on attention and neural networks |
CN116343314A (en) * | 2023-05-30 | 2023-06-27 | 之江实验室 | Expression recognition method and device, storage medium and electronic equipment |
Non-Patent Citations (2)
Title |
---|
Saman Sarraf. Machine learning applications to recognize autism and Alzheimer's disease. Neurological Disorders and Imaging Physics. 2019, Vol. 3, pp. 1-23. *
Key frame detection technology based on image difference; Xu Hong; Computer Engineering and Design; 2010-06-28 (No. 12); pp. 177-180 *
Also Published As
Publication number | Publication date |
---|---|
CN116665281A (en) | 2023-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112120716A (en) | Wearable multi-mode emotional state monitoring device | |
JP4401079B2 (en) | Subject behavior analysis | |
US20220044821A1 (en) | Systems and methods for diagnosing a stroke condition | |
CN111326253A (en) | Method for evaluating multi-modal emotional cognitive ability of patients with autism spectrum disorder | |
CN116665281B (en) | Key emotion extraction method based on doctor-patient interaction | |
CN109460749A (en) | Patient monitoring method, device, computer equipment and storage medium | |
US20150305662A1 (en) | Remote assessment of emotional status | |
Wang et al. | A novel facial thermal feature extraction method for non-contact healthcare system | |
CN111920420A (en) | Patient behavior multi-modal analysis and prediction system based on statistical learning | |
Li et al. | An EEG-based multi-modal emotion database with both posed and authentic facial actions for emotion analysis | |
CN108882853A (en) | Measurement physiological parameter is triggered in time using visual context | |
CN111222464B (en) | Emotion analysis method and system | |
CN115299947A (en) | Psychological scale confidence evaluation method and system based on multi-modal physiological data | |
CN211862821U (en) | Autism auxiliary evaluation system based on deep learning | |
CN113822164A (en) | Dynamic emotion recognition method and device, computer equipment and storage medium | |
CN112220455A (en) | Emotion recognition method and device based on video electroencephalogram signals and computer equipment | |
CN117224080B (en) | Human body data monitoring method and device for big data | |
CN110364260A (en) | Autism earlier evaluations apparatus and system based on indicative language paradigm | |
CN113033387A (en) | Intelligent assessment method and system for automatically identifying chronic pain degree of old people | |
CN112741620A (en) | Cervical spondylosis evaluation device based on limb movement | |
CN113326729B (en) | Multi-mode classroom concentration detection method and device | |
CN113180594A (en) | Method for evaluating postoperative pain of newborn through multidimensional space-time deep learning | |
CN117809354B (en) | Emotion recognition method, medium and device based on head wearable device perception | |
CN113887311B (en) | Method, device and storage medium for protecting privacy of ophthalmic patient | |
US20240223884A1 (en) | Image capturing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||