CN115050075A - Cross-granularity interactive learning micro-expression image labeling method and device - Google Patents

Cross-granularity interactive learning micro-expression image labeling method and device Download PDF

Info

Publication number
CN115050075A
CN115050075A
Authority
CN
China
Prior art keywords
expression
micro
image
category
labeled
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210736803.8A
Other languages
Chinese (zh)
Inventor
刘海
张昭理
周启云
石佛波
朱俊艳
宋云霄
刘婷婷
杨兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University
Central China Normal University
Original Assignee
Hubei University
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University and Central China Normal University
Priority to CN202210736803.8A
Publication of CN115050075A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification
    • G06V40/174 Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a micro-expression image labeling method and device based on cross-granularity interactive learning, relating to the technical field of image processing. The method comprises the following steps: acquiring a micro-expression image sequence to be labeled; acquiring a preset number of labeled micro-expression images, and inputting the labeled micro-expression images and the micro-expression image sequence to be labeled into a pre-trained feature extractor model; labeling the category of each micro-expression to be labeled; acquiring a standard confidence score corresponding to each micro-expression category; obtaining the confidence score of each recognized micro-expression, comparing it with the standard confidence score of the corresponding category, and outputting the micro-expressions whose confidence scores are greater than or equal to the standard confidence score; and updating the micro-expression image sequence to be labeled until all micro-expressions are labeled, then outputting the labeled micro-expression image set. The method and device automatically label the micro-expressions of students in a teaching scene efficiently and accurately, avoid the subjectivity of manual labeling and the ambiguity of the collected micro-expressions, and save a large amount of manpower and material resources.

Description

Cross-granularity interactive learning micro-expression image labeling method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a micro-expression image labeling method and device for cross-granularity interactive learning.
Background
To study the classroom performance of students in a teaching scene, such as concentration, engagement, and activity, video data of the students is typically collected and analyzed, so that the students' classroom states can be inferred from their expressions. A student's micro-expression in class is a weak facial movement of extremely short duration, produced unconsciously when the student tries to hide his or her true inner emotion; compared with other data, micro-expressions therefore reflect the students' true emotional states in class.
At present, the micro-expression data of students in teaching scenes is labeled and analyzed manually. Manual labeling introduces subjectivity into the judgment of expression categories, and different people may label the same micro-expression data differently, which easily produces ambiguity in data processing. The subjectivity of the labels and the ambiguity of the expressions seriously affect the accuracy of any analysis of the students' learning states. In addition, with the rapid development of information technology, daily teaching activities generate massive educational data; a traditional manual labeling method would consume a large amount of labor, and the time cost would be extremely high.
Disclosure of Invention
The invention provides a micro-expression image labeling method based on cross-granularity interactive learning, which addresses the defect that manual labeling in the prior art cannot accurately identify the category of each student micro-expression in a large amount of expression data. The method automatically labels student micro-expressions in a teaching scene efficiently and accurately; on the one hand it effectively avoids the subjectivity of manual labeling and the ambiguity of the collected micro-expressions, and on the other hand it saves a large amount of manpower and material resources.
The invention provides a micro-expression image labeling method for cross-granularity interactive learning, which comprises the following steps of:
S1, acquiring a micro-expression image sequence to be labeled;
S2, acquiring a preset number of labeled micro-expression images, and inputting the labeled micro-expression images and the micro-expression image sequence to be labeled into a pre-trained feature extractor model;
labeling, through the feature extractor model, the category corresponding to each micro-expression in the micro-expression image sequence to be labeled; acquiring a standard confidence score corresponding to each micro-expression category based on the labeled micro-expression images;
S3, obtaining the confidence score of each micro-expression recognized in the micro-expression image sequence to be labeled and comparing it with the standard confidence score of the micro-expression of the corresponding category; if the confidence score is greater than or equal to the standard confidence score, adding that micro-expression to the labeled micro-expression image set;
S4, updating the micro-expression image sequence to be labeled until all micro-expressions are labeled, then outputting the labeled micro-expression image set.
According to the micro-expression image annotation method provided by the invention, before the step S1, the method comprises the following steps:
collecting video data of students in a classroom;
preprocessing the video data, and converting the video data into image data of each frame;
carrying out face detection on the image data to generate a plurality of face images;
and storing a plurality of face images as the micro-expression image sequence to be labeled.
According to the micro-expression image labeling method provided by the invention, training the feature extractor model comprises the following steps:
inputting the marked micro expression image and the micro expression image sequence to be marked into a feature extractor model after performing a data enhancement strategy:
and acquiring the category corresponding to the marked micro expression image, acquiring the standard confidence score corresponding to each micro expression category, and outputting the trained feature extractor model.
According to the micro-expression image labeling method provided by the invention, training the feature extractor model comprises the following steps:
selecting any image from the micro-expression image sequence to be labeled, applying a strong enhancement strategy and a weak enhancement strategy to obtain a corresponding strongly enhanced image $u_s$ and weakly enhanced image $u_w$; selecting any image from the labeled micro-expression images and applying a weak enhancement strategy to obtain a weakly enhanced image $s_w$;
inputting the strongly enhanced image $u_s$ and the weakly enhanced images $u_w$ and $s_w$ into the feature extractor, acquiring the corresponding image features, and judging the category of the micro-expression;
based on a plurality of weakly enhanced images $u_w$, outputting through the feature extractor the category probability distribution corresponding to each micro-expression category, comparing the maximum value of the category probability distribution with a preset threshold, and taking the micro-expression category that is greater than or equal to the preset threshold as the pseudo-category of the weakly enhanced image; obtaining the cross entropy between the micro-expression categories of a plurality of strongly enhanced images $u_s$ and the pseudo-categories;
based on a plurality of weakly enhanced images $s_w$, outputting through the feature extractor the category probability distribution corresponding to each micro-expression category, and obtaining the cross entropy between the output category probability distribution and the original category probability distribution of the labeled micro-expression images.
According to the micro-expression image labeling method provided by the invention, comparing the confidence score of a micro-expression recognized in the micro-expression image sequence to be labeled with the standard confidence score of the micro-expression of the corresponding category comprises the steps:
obtaining the first category probability distribution of the image data in the micro-expression image sequence to be labeled, $p^u = (p^u_1, \ldots, p^u_C)$;
obtaining the second category probability distribution of the labeled micro-expression data set, $p^l = (p^l_1, \ldots, p^l_C)$;
obtaining, for the same micro-expression category $c$, the average probability $\bar{p}_c = \frac{1}{2}(p^u_c + p^l_c)$ of the first and second category probability distributions;
comparing $\bar{p}_c$ with the preset adaptive score $T_c$: if $\bar{p}_c \ge T_c$, judging that the confidence score of the corresponding micro-expression is greater than or equal to the standard confidence score, updating the micro-expression image sequence to be labeled, and adding the micro-expressions in the sequence that are greater than or equal to the standard confidence score to the labeled micro-expression image set;
if $\bar{p}_c < T_c$, judging that the confidence score of the corresponding micro-expression is smaller than the standard confidence score, adding the corresponding micro-expression to a retraining image set, and using the retraining image set to retrain the feature extractor model.
According to the micro-expression image labeling method provided by the invention, the feature extractor model comprises a fine-grained feature extractor model and a coarse-grained feature extractor model, and the training of the feature extractor model comprises the following steps:
acquiring a neutral expression image and other expression images of the same target to be identified in the micro-expression image data; acquiring the identity characteristics of the neutral expression image, and acquiring the identity characteristics and expression category characteristics of other expression images through an encoder;
performing expression reconstruction by combining the identity characteristics of the neutral expression image and the expression category characteristics of the other expression images through a decoder;
the encoder and the decoder perform adversarial learning, and the encoder classifies the micro-expression images of different targets to be recognized according to the identity features, so that the difference between the micro-expression category distributions of different targets to be recognized is minimized;
and analyzing the reconstructed expression image through an expression classifier, acquiring corresponding expression category characteristics, and outputting category probability distribution.
According to the micro-expression image labeling method provided by the invention, the training of the feature extractor model comprises the following steps:
forming a triplet $(x_{anchor}, x_{nega}, x_{posi})$ from a micro-expression anchor $x_{anchor}$, a macro-expression positive example $x_{posi}$ of the same expression category as the anchor, and a micro-expression negative example $x_{nega}$ of a different expression category, and inputting it into the feature extractor model;
wherein the micro-expression anchor and the micro-expression negative example are input into the fine-grained feature extractor to obtain the expression embeddings $f_{anchor}$ and $f_{nega}$ respectively, and the macro-expression positive example is input into the coarse-grained feature extractor model to obtain the expression embedding $f_{posi}$;
obtaining the triplet loss between the expression embeddings:
$L_{tri} = \max\{\|f_{anchor} - f_{posi}\|_2^2 - \|f_{anchor} - f_{nega}\|_2^2 + m, \; 0\}$,
wherein m is a hyperparameter (the margin);
performing adversarial learning between the fine-grained and coarse-grained feature extractor models: the coarse-grained feature learning module provides the embedded representation of a macro-expression, and the classification of the corresponding macro-expression image is recorded as the correct category; the fine-grained feature extractor provides the embedded representation of a micro-expression, and the classification of the corresponding micro-expression image is recorded as the wrong category;
distinguishing the two embedded representations through a discriminator, and adjusting the common expression features between macro-expressions and micro-expressions through the fine-grained feature extractor model so that the gradient between the macro-expression and the micro-expression is greater than or equal to a lowest threshold.
On the other hand, the invention also provides a micro-expression image labeling device for cross-granularity interactive learning, which comprises:
the image processing module is used for acquiring micro-expression image sequences to be labeled, acquiring a preset number of labeled micro-expression images, and inputting the labeled micro-expression images and the micro-expression image sequences to be labeled into the feature extractor module;
the characteristic extractor module is used for marking the category corresponding to each micro expression in the micro expression image sequence to be marked through the characteristic extractor model; acquiring a standard confidence score corresponding to each micro-expression category based on the marked micro-expression image;
the image labeling module is used for acquiring the confidence score of the identified micro expression in the micro expression image sequence to be labeled, comparing the confidence score with the standard confidence score of the micro expression of the corresponding category, and adding the micro expression of which the confidence score is more than or equal to the standard confidence score into the labeled micro expression image set if the confidence score is more than or equal to the standard confidence score;
the image labeling module is also used for updating the micro-expression image sequence to be labeled until all the micro-expressions are labeled and then outputting a labeled micro-expression image set.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for annotating a microexpressive image as described in any of the above.
The invention provides a micro-expression image labeling method and device for cross-granularity interactive learning, wherein a labeled micro-expression image and a micro-expression image sequence to be labeled are input into a pre-trained feature extractor model, and a standard confidence score corresponding to each micro-expression category is obtained from the labeled micro-expression image, so that the category corresponding to each micro-expression in the micro-expression image sequence to be labeled is further identified, the confidence score of the identified micro-expression in the micro-expression image sequence to be labeled is obtained and is compared with the standard confidence score of the micro-expression of the corresponding category, and the identification accuracy of the micro-expression is improved; the micro-expression images which do not meet the confidence score requirement are input into the feature extractor model again for training, so that the precision of the feature extractor model can be effectively improved; the micro-expression of the students in the teaching scene is identified by the method provided by the invention, so that the feedback of the students on the teaching contents in the classroom and the classroom emotion of the students are obtained, and the teacher or the teaching staff can effectively master the classroom states of different students, thereby being convenient for the teacher or the teaching staff to pertinently provide guidance for different students and improve the teaching effect in the classroom.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a micro-expression image annotation method according to the present invention;
FIG. 2 is a schematic diagram of image acquisition of the micro-expression image annotation method provided by the present invention;
FIG. 3 is a second schematic flow chart of the micro-expression image annotation method provided by the present invention;
fig. 4 is a third schematic flow chart of the micro-expression image annotation method provided by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the term "first/second" referred to in the present invention is only used to distinguish similar objects and does not represent a specific ordering of the objects; it should be understood that "first/second" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the invention described herein can be practiced in sequences other than those described or illustrated herein.
In one embodiment, as shown in fig. 1, the present invention provides a cross-granularity interactive learning micro-expression image labeling method, including the steps of:
S1, acquiring a micro-expression image sequence to be labeled;
S2, acquiring a preset number of labeled micro-expression images, and inputting the labeled micro-expression images and the micro-expression image sequence to be labeled into a pre-trained feature extractor model;
labeling, through the feature extractor model, the category corresponding to each micro-expression in the micro-expression image sequence to be labeled; acquiring a standard confidence score corresponding to each micro-expression category based on the labeled micro-expression images;
it should be noted that the labeled micro-expression images and their categories can be selected from currently published micro-expression databases;
S3, obtaining the confidence score of each micro-expression recognized in the micro-expression image sequence to be labeled and comparing it with the standard confidence score of the micro-expression of the corresponding category; if the confidence score is greater than or equal to the standard confidence score, adding that micro-expression to the labeled micro-expression image set;
further, if the confidence score is smaller than the standard confidence score, the micro-expression image below the standard confidence score is input into the feature extractor model again for training;
it should be noted that learning the confidence score corresponding to each micro-expression category provides a reference for judging whether the recognition result of the micro-expression data to be labeled is qualified; the confidence score represents the probability of correct recognition for each micro-expression category, and a higher score indicates a higher probability that the image contains that category;
S4, updating the micro-expression image sequence to be labeled until all micro-expressions are labeled, then outputting the labeled micro-expression image set.
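For illustration only, steps S1 to S4 amount to an iterative self-labeling loop; the following Python sketch shows this flow under assumed helper routines (the model.predict and model.retrain interfaces are placeholders for this example, not part of the invention):

```python
def label_sequence(unlabeled, labeled, model, thresholds):
    """Iterate S1-S4: accept confident predictions, recycle the rest."""
    progress = True
    while unlabeled and progress:
        remaining = []
        for img in unlabeled:
            cat, score = model.predict(img)     # category + confidence score
            if score >= thresholds[cat]:        # S3: meets the standard score
                labeled.append((img, cat))
            else:
                remaining.append(img)           # kept for retraining
        progress = len(remaining) < len(unlabeled)
        if remaining:
            model.retrain(remaining, labeled)   # refine the feature extractor
        unlabeled = remaining                   # S4: update the sequence
    return labeled
```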
Optionally, before step S1, the method includes the steps of:
collecting video data of students in a classroom;
preprocessing the video data, and converting the video data into image data of each frame;
carrying out face detection on the image data to generate a plurality of face images;
storing a plurality of face images as the micro-expression image sequence to be labeled;
specifically, video data of students in a classroom is collected through front-mounted RGB smart cameras and a smart tracking camera;
as shown in fig. 2, the cameras include a smart tracking camera facing the teaching scene, and two front-mounted smart RGB cameras CL and CR arranged so that every student in the teaching scene is covered; this guarantees all-round, multi-angle collection of the students' facial micro-expression data, and the lenses can zoom in and out for students at different distances.
Specifically, after the video data of the students in the classroom is collected by the camera devices, data preprocessing is performed on it, comprising: converting the video data into image data frame by frame, performing face detection on the students, cropping each student's face image in each frame to the same size, and finally storing the preprocessed data set as the micro-expression image sequence to be labeled.
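For illustration only, the preprocessing described above can be sketched with OpenCV as follows; the Haar-cascade detector, frame-by-frame loop, and output size are assumptions of this example rather than requirements of the invention:

```python
import cv2

def video_to_face_sequence(video_path, out_size=(128, 128)):
    # A minimal sketch of the preprocessing step, assuming OpenCV's bundled
    # Haar cascade for face detection; the patent does not fix a detector.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()          # convert video into per-frame images
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
            # crop every detected face to the same size
            faces.append(cv2.resize(frame[y:y + h, x:x + w], out_size))
    cap.release()
    return faces                        # the sequence to be labeled
```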
In one embodiment, as shown in FIG. 3, training the feature extractor model comprises:
inputting the marked micro expression image and the micro expression image sequence to be marked into a feature extractor model after performing a data enhancement strategy:
and acquiring the category corresponding to the marked micro expression image, acquiring the standard confidence score corresponding to each micro expression category, and outputting the trained feature extractor model.
It should be noted that the feature extractor model is a feature extractor for cross-granularity interactive learning; the trained feature extractor model is stored after model training on the labeled micro-expression data set and after learning the confidence score corresponding to each micro-expression category. Macro-expressions and micro-expressions share some features, but macro-expression features are coarse-grained while micro-expression features are finer-grained, so guiding the training of the fine-grained micro-expression feature extractor with coarse-grained macro-expression features benefits the accuracy of recognizing the micro-expressions to be labeled;
it should be noted that micro-expressions are short in duration (less than 0.5 second), small in amplitude, and difficult to perceive with the naked eye; they are unconscious spontaneous behaviors that appear quickly after an emotion-evoking event and are difficult to suppress, and can therefore reveal a person's true emotional state; macro-expressions, by contrast, are long in duration (1 s to 5 s), large in amplitude, and easy to observe;
here, macro-expressions are coarse-grained and micro-expressions are fine-grained, where granularity refers to the fineness of the expression features: micro-expressions are usually short-lived and fleeting, with small facial-muscle movements that are difficult to capture, such as a momentary rise of the corners of the mouth or eyebrows, or a minute contraction or stretch around the eyes, and are therefore fine-grained; macro-expressions last longer and involve larger facial-muscle movements, so their features are easy to capture, and they are therefore coarse-grained;
specifically, training the feature extractor model includes two stages:
the first stage comprises:
dividing the acquired video data into a micro-expression data set and a macro-expression data set; acquiring, from the same video, the neutral expression (Neutral) image $x^N_i$, the other-expression (Other Expression) image $x^O_i$, and the corresponding expression category $y_i$;
pre-training the fine-grained feature extractor model on the micro-expression data set $\{(x^N_i, x^O_i, y_i)\}_{i=1}^{M_1}$, where $M_1$ denotes the number of videos; simultaneously pre-training the coarse-grained feature extractor model on the macro-expression data set $\{(x^N_i, x^O_i, y_i)\}_{i=1}^{M_2}$;
the inputs of the fine-grained feature extractor model and the coarse-grained feature extractor model are paired images from the same video, comprising a neutral expression image $x_N$ and an other-expression image $x_O$; the training of both feature extractor models comprises four steps: expression feature extraction, expression reconstruction, identity classification, and expression classification:
wherein expression feature extraction comprises: the fine-grained feature learning module encodes the neutral expression image $x_N$ and the other-expression image $x_O$ from the micro-expression video and obtains their embedded representations, respectively: $f_N = E(x_N)$, $f_O = E(x_O)$;
obtaining through the encoder the identity feature $f^{id}_N$ of the neutral expression image, and the identity feature $f^{id}_O$ and expression category feature $f^{exp}_O$ of the other-expression image; the expression category feature $f^{exp}_N$ of the neutral expression image can be regarded as a fixed value; the neutral expression image refers to a face without obvious expression and can generally be regarded as containing only identity features; the other-expression images refer to expressions produced by facial muscle movement other than the neutral expression, such as the basic expressions of happiness, sadness, anger, surprise, and disgust;
it should be noted that the neutral expression image $x_N$ and the other-expression image $x_O$ originate from the same subject, so their identity-related features $f^{id}_N$ and $f^{id}_O$ are similar, and the difference between the two can be expressed as a loss function:
$L_{id} = \| f^{id}_N - f^{id}_O \|_2^2$;
further, the other-expression-related feature $f^{exp}_O$ carries enough category feature information of the original expression, so expression reconstruction comprises: combining, through a decoder $D_r$, the identity feature of the neutral expression image with the expression category feature of the other-expression image to reconstruct the expression $\hat{x}_O = D_r(f^{id}_N, f^{exp}_O)$, and outputting the reconstruction loss
$L_{rec} = \| \hat{x}_O - x_O \|$;
Further, the step of identity classification comprises: the encoder E and the identity classifier D s Performing antagonistic learning; classifying the micro expression images of different targets to be recognized according to the identity characteristics through an encoder to obtain micro expression category distribution of a plurality of different targets to be recognized, so that the difference between the micro expression category distribution of the different targets to be recognized is minimum, namely the probability distribution difference of expression categories in the recognition results of the plurality of images of the different targets is minimum; thereby making D s Related expression characteristics of target difficult to recognize
Figure BDA00037158575100001011
Classifying; the goal of the confrontational training is expression class distribution
Figure BDA00037158575100001012
Cross entropy loss with the real expression class s of the recognition target:
Figure BDA0003715857510000111
it should be noted that the similarity between the expression category distribution and the real category distribution obtained by recognition can be measured through cross entropy loss, which is beneficial to improving the learning rate of the recognition model;
further, the step of expression classification comprises: generating the expression-related features through the expression classifier $D_e$, i.e., generating expression features based on micro-expressions and/or macro-expressions, and introducing the cross-entropy loss $L_c$ defined as:
$L_c = H(\hat{y}, y)$,
wherein $y$ is the expression category of $x_O$ and $\hat{y} = D_e(f^{exp}_O)$ is the recognized expression category distribution;
finally, the total loss function is obtained; for the fine-grained feature extractor model, the total loss function $L_{micro}$ is defined as:
$L_{micro} = L_c + \lambda L_{id} + \beta L_{rec} + \gamma L_{adv}$,
wherein $\lambda$, $\beta$, $\gamma$ are hyperparameters controlling the loss terms;
it should be noted that the cross entropy loss is the difference between the real probability distribution and the prediction probability distribution, and the smaller the cross entropy is, the better the model prediction effect is; smaller values for the overall loss function indicate better model training.
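For illustration only, a single first-stage training step under the total loss $L_{micro}$ might be sketched in PyTorch as follows; the module interfaces, the split of the embedding into identity and expression parts, and the concrete norms of $L_{id}$ and $L_{rec}$ are assumptions of this example:

```python
import torch
import torch.nn.functional as F

def first_stage_loss(enc, dec, id_head, exp_head,
                     x_n, x_o, y, s, lam=1.0, beta=1.0, gamma=0.1):
    # Expression feature extraction: split embeddings into identity /
    # expression parts (the split itself is an assumption of this sketch).
    id_n, _     = enc(x_n)          # neutral image: identity feature only
    id_o, exp_o = enc(x_o)          # other expression: identity + expression

    l_id  = F.mse_loss(id_n, id_o)               # identities should match
    x_rec = dec(id_n, exp_o)                     # expression reconstruction
    l_rec = F.l1_loss(x_rec, x_o)

    l_adv = F.cross_entropy(id_head(exp_o), s)   # identity classifier D_s,
                                                 # trained adversarially
    l_c   = F.cross_entropy(exp_head(exp_o), y)  # expression classifier D_e

    return l_c + lam * l_id + beta * l_rec + gamma * l_adv
```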
Further, the second stage comprises two steps of feature space guidance and category space guidance:
wherein the feature space guide comprises: fixing the coarse-grained feature extractor model, and training the fine-grained feature extractor model to improve the identification precision of the fine-grained feature extractor model;
forming a triplet $(x_{anchor}, x_{nega}, x_{posi})$ from the micro-expression anchor $x_{anchor}$, the macro-expression positive example $x_{posi}$ of the same expression category as the anchor, and the micro-expression negative example $x_{nega}$ of a different expression category, and inputting it into the fine-grained feature extractor model;
for each input triplet, inputting the micro-expression anchor and the micro-expression negative example into the fine-grained feature extractor to obtain the expression embeddings $f_{anchor}$ and $f_{nega}$ respectively, and inputting the macro-expression positive example into the coarse-grained feature extractor model to obtain the expression embedding $f_{posi}$;
obtaining the triplet loss between the expression embeddings:
$L_{tri} = \max\{\|f_{anchor} - f_{posi}\|_2^2 - \|f_{anchor} - f_{nega}\|_2^2 + m, \; 0\}$,
wherein m is a hyperparameter (the margin);
specifically, the triplet loss makes the details of expressions better distinguishable, which benefits the recognition accuracy;
it should be noted that the micro-expression anchor is a micro-expression reconstructed image from the fine-grained feature extractor, the macro-expression positive example is a macro-expression reconstructed image from the coarse-grained feature extractor, and the micro-expression negative example is an expression image with the same identity feature as the micro-expression anchor;
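For illustration only, the triplet loss $L_{tri}$ above can be written directly in PyTorch; the batched squared-Euclidean form is an assumption consistent with the standard triplet loss:

```python
import torch

def triplet_loss(f_anchor, f_posi, f_nega, m=0.2):
    # L_tri = max(||f_a - f_p||^2 - ||f_a - f_n||^2 + m, 0)
    d_pos = (f_anchor - f_posi).pow(2).sum(dim=1)   # anchor vs macro positive
    d_neg = (f_anchor - f_nega).pow(2).sum(dim=1)   # anchor vs micro negative
    return torch.clamp(d_pos - d_neg + m, min=0).mean()
```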
further, performing adversarial training between the coarse-grained feature extractor model and the fine-grained feature extractor model: the coarse-grained feature learning module provides the embedded representation of a macro-expression, and the classification of the corresponding macro-expression image is recorded as the correct category; the fine-grained feature extractor provides the embedded representation of a micro-expression, and the classification of the corresponding micro-expression image is recorded as the wrong category;
distinguishing the two embedded representations through a discriminator D, and adjusting the common expression features between macro-expressions and micro-expressions through the fine-grained feature extractor model so that the difference gradient between the macro-expression and the micro-expression is greater than or equal to a lowest threshold; the goal of the adversarial learning is:
$\min_{E_{micro}} \max_{D} \; \mathbb{E}[\log D(f_{posi})] + \mathbb{E}[\log(1 - D(f_{anchor}))]$;
the loss of the discriminator is defined as
$L_D = -\log D(f_{posi}) - \log(1 - D(f_{anchor}))$;
to avoid the gradient vanishing, the adversarial loss of the fine-grained feature learning module is computed by minimizing
$L_{adv} = -\log D(f_{anchor})$;
it should be noted that the common expression features of macro-expressions and micro-expressions refer to the following: for example, the macro-expression of a smile may be a laugh while the micro-expression may be a slight smile; the shared features between the two include muscle stretching of the cheeks, rising of the mouth corners, changes at the eye corners, and the like, which are common feature points of macro-expressions and micro-expressions, differing only in degree.
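For illustration only, the cross-granularity adversarial step can be sketched as follows; the two-layer discriminator, the 128-dimensional embedding, and the non-saturating loss form are assumptions of this example:

```python
import torch
import torch.nn as nn

# assumes 128-dim expression embeddings
disc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

def discriminator_loss(f_macro, f_micro):
    # D learns to score macro embeddings as "correct", micro as "wrong"
    real = torch.sigmoid(disc(f_macro))
    fake = torch.sigmoid(disc(f_micro.detach()))
    return -(torch.log(real + 1e-8) + torch.log(1 - fake + 1e-8)).mean()

def generator_adv_loss(f_micro):
    # non-saturating form: the fine-grained extractor pushes its micro
    # embeddings toward the features D accepts as macro
    return -torch.log(torch.sigmoid(disc(f_micro)) + 1e-8).mean()
```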
Further, the class space guidance includes:
the expression classification loss is used for controlling the recognition precision of the expression, the classification loss is introduced as follows, and the classification loss of the fine-grained encoder branch is as follows:
Figure BDA0003715857510000131
wherein y represents x anchor The category of the expression of (a) is,
Figure BDA0003715857510000132
a distribution of identified expression categories;
during training, assuming that a fine-grained feature learning module and a coarse-grained feature learning module generate similar outputs, so as to jointly train the two networks, and considering the difference that a regularization term is added in a loss function to punish the two networks;
it should be noted that the fine-grained features are more important than the coarse-grained features, the implied feature information is richer, and regularization can be added for constraint, so that the network learns the finer features
Wherein the loss function is defined as:
L LIR =max{L ds -L cls′ ,0};
wherein L is cls′ Representing positive case characteristics for classification loss of coarse-grained encoder branches
Figure BDA0003715857510000133
And cross entropy loss of classification results between the true expression categories y of the regular images;
the overall loss function is defined as:
L MM =L cls2 L tri3 L adv4 L LIR
wherein λ 2 ,λ 3 ,λ 4 A hyper-parameter to control the loss-factor;
it should be noted that, in the training process, the total loss function needs to be minimized as much as possible, so as to improve the effect of the trained fine-grained recognition model, and update some model parameters in the training process, optimize the model, and improve the performance, which is not limited in this invention;
the feature extractor model obtained after the above training acquires the category recognition results of all micro-expression images, and the recognition results are compared with the real categories of the micro-expression images;
selecting the correctly recognized samples
$S_C = \{(s_i, \hat{y}_i)\}_{i=1}^{N}$,
wherein $s_i$ denotes the confidence score of the i-th labeled datum, $\hat{y}_i$ denotes the i-th recognition label, and N denotes the number of samples in $S_C$;
constructing the adaptive confidence interval $T = \{(T_1, \ldots, T_C) \mid T_c \in \mathbb{R},\; c = 1, \ldots, C\}$ by the formula:
$T_c = \frac{1}{N_c} \sum_{i:\, \hat{y}_i = c} s_i$,
wherein $N_c$ denotes the number of samples in $S_C$ labeled as the c-th expression;
after a complete training process, the feature extractor model with the best effect and the highest prediction accuracy is stored together with the confidence score corresponding to each expression category;
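For illustration only, the adaptive score $T_c$ is the mean confidence of the correctly recognized samples of each category and can be computed as follows (the array names are assumptions of this example):

```python
import numpy as np

def adaptive_thresholds(scores, labels, num_classes):
    # T_c = mean confidence of correctly recognized samples of class c;
    # `scores` and `labels` are taken from the set S_C only.
    t = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            t[c] = scores[mask].mean()
    return t
```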
in one embodiment, training the feature extractor model as shown in FIG. 4 further comprises:
performing the data enhancement strategy on the micro-expression images, comprising: selecting any image from the micro-expression image sequence to be labeled, applying a strong enhancement strategy and a weak enhancement strategy to obtain a corresponding strongly enhanced image $u_s$ and weakly enhanced image $u_w$; selecting any image from the labeled micro-expression images and applying a weak enhancement strategy to obtain a weakly enhanced image $s_w$;
it should be noted that the strong enhancement strategy is: enhancing with RandAugment and then applying Cutout; the weak enhancement strategy is: random horizontal flip with a probability of 50% and random horizontal and vertical translation with a probability of 13.5%, which yields the image $u_w$; more image samples are obtained through the strong and weak enhancement strategies;
further, inputting the strongly enhanced image $u_s$ and the weakly enhanced images $u_w$ and $s_w$ into the feature extractor, acquiring the corresponding image features, and judging the category of the micro-expression;
based on a plurality of weakly enhanced images $u_w$, outputting through the feature extractor the category probability distribution corresponding to each micro-expression category, comparing the maximum value of the category probability distribution with a preset threshold $\tau$, and taking the micro-expression category that is greater than or equal to $\tau$ as the pseudo-category of the weakly enhanced image; obtaining the cross entropy between the micro-expression categories of a plurality of strongly enhanced images $u_s$ and the pseudo-categories:
$L_u = \frac{1}{M} \sum_{i=1}^{M} \mathbb{1}\big(\max(p_w) \ge \tau\big) \, H(\hat{q}_i, p_s)$,
wherein M denotes the number of unlabeled images, $p_w$ the recognized class probability distribution after the weak enhancement strategy, $\hat{q}_i$ the pseudo-category of the weakly enhanced image, $p_s$ the class probability distribution of the strongly enhanced image, and $H(\cdot)$ the cross-entropy loss function;
the pseudo-category is the image category, greater than or equal to the preset threshold in the category probability distribution, recognized after the weak enhancement strategy;
optionally, after the weak enhancement strategy is performed on a small number of labeled micro-expression images, the weakly enhanced image $s_w$ is recognized to obtain the corresponding class probability distribution, and the cross-entropy loss $L_s$ between this distribution and the real class distribution of the image is obtained, defined as:
$L_s = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log p_c(s_i; \theta)$,
wherein N denotes the number of data, C the number of categories, $y_{i,c}$ the true class distribution, $p_c(s_i; \theta)$ the recognized probability of class c for the i-th datum $s_i$, and $\theta$ the network parameters;
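For illustration only, the two loss terms $L_u$ and $L_s$ follow a FixMatch-style semi-supervised recipe and can be sketched in PyTorch as follows; the model interface and the threshold value are assumptions of this example:

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, u_w, u_s, s_w, y, tau=0.95):
    # pseudo-label the weakly enhanced unlabeled images
    with torch.no_grad():
        p_w = torch.softmax(model(u_w), dim=1)
        conf, pseudo = p_w.max(dim=1)
        mask = (conf >= tau).float()        # keep only confident pseudo-labels

    # L_u: strongly enhanced views must match the pseudo-categories
    l_u = (F.cross_entropy(model(u_s), pseudo, reduction="none") * mask).mean()

    # L_s: supervised cross entropy on weakly enhanced labeled images
    l_s = F.cross_entropy(model(s_w), y)
    return l_s + l_u
```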
further, comparing the confidence score of a micro-expression recognized in the micro-expression image sequence to be labeled with the standard confidence score of the micro-expression of the corresponding category comprises the steps:
obtaining the first category probability distribution of the image data in the micro-expression image sequence to be labeled, $p^u = (p^u_1, \ldots, p^u_C)$;
obtaining the second category probability distribution of the labeled micro-expression data set, $p^l = (p^l_1, \ldots, p^l_C)$;
obtaining, for the same micro-expression category, the average probability distribution of the first and second category probability distributions:
$\bar{p}_c = \frac{1}{2}(p^u_c + p^l_c)$,
wherein c is the corresponding micro-expression category;
comparing $\bar{p}_c$ with the preset adaptive score $T_c$: if $\bar{p}_c \ge T_c$, judging that the confidence score of the corresponding micro-expression is greater than or equal to the standard confidence score, updating the micro-expression image sequence to be labeled, and adding the micro-expressions in the sequence that are greater than or equal to the standard confidence score to the labeled micro-expression image set;
if $\bar{p}_c < T_c$, judging that the confidence score of the corresponding micro-expression is smaller than the standard confidence score, adding the corresponding micro-expression to a retraining image set, and using the retraining image set to retrain the feature extractor model.
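For illustration only, the comparison against $T_c$ that routes each micro-expression either into the labeled set or into the retraining set can be sketched as follows (the array layout is an assumption of this example):

```python
import numpy as np

def route_samples(p_u, p_l, thresholds, preds):
    """p_u: (n, C) distributions for unlabeled images; p_l: (C,) distribution
    of the labeled set; thresholds: (C,) adaptive scores T_c."""
    labeled, retrain = [], []
    for i, c in enumerate(preds):
        p_bar = 0.5 * (p_u[i, c] + p_l[c])     # average probability for class c
        if p_bar >= thresholds[c]:
            labeled.append((i, c))             # accept the pseudo label
        else:
            retrain.append(i)                  # send back for retraining
    return labeled, retrain
```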
In another embodiment, the present invention further provides a micro-expression image labeling apparatus for cross-granularity interactive learning, including:
the image processing module is used for acquiring micro-expression image sequences to be labeled, acquiring a preset number of labeled micro-expression images, and inputting the labeled micro-expression images and the micro-expression image sequences to be labeled into the feature extractor module;
the characteristic extractor module is used for marking the category corresponding to each micro expression in the micro expression image sequence to be marked through the characteristic extractor model; acquiring a standard confidence score corresponding to each micro-expression category based on the marked micro-expression image;
the image labeling module is used for acquiring the confidence score of the identified micro expression in the micro expression image sequence to be labeled, comparing the confidence score with the standard confidence score of the micro expression of the corresponding category, and adding the micro expression of which the confidence score is more than or equal to the standard confidence score into the labeled micro expression image set if the confidence score is more than or equal to the standard confidence score;
the image labeling module is also used for updating the micro expression image sequence to be labeled until all micro expressions are labeled and outputting a labeled micro expression image set;
specifically, the micro-expression image labeling device further comprises image acquisition devices arranged at various positions in the teaching scene, and video data of students in the classroom is collected through devices including RGB smart cameras and a smart tracking camera; the invention is not limited in this respect, and how the images are acquired and which devices collect the image information in the classroom do not affect the implementation of the invention;
specifically, the student expression categories are divided into three classes: positive, neutral, and negative; a neutral expression is a blank face with no obvious expressive features; positive expressions include smiling and the like, and negative expressions include anger, sadness, and the like;
specifically, the type of an expression can be judged from facial features including the eye corners, the mouth corners, and the muscle stretching of the cheeks, and the degree of the expression, such as the degree of change of the eye corners, the degree of rise of the mouth corners, and the degree of stretching of the facial muscles, can be judged further to determine the person's expression;
the device provided by the invention and the micro-expression labeling method provided by the invention correspond to and reference each other; based on student images acquired in a teaching scene, the micro-expressions of students can be recognized and labeled through the pre-trained feature extractor, and the micro-expressions and their corresponding categories can be accurately identified from the student image data captured in the teaching scene;
with a simple setup, the emotional state and classroom state of each student in the classroom are recognized and the subjectivity of micro-expression recognition is avoided; based on the device and method provided by the invention, the micro-expressions of students in a teaching scene can be recognized simply by inputting the collected classroom images, and large-scale expression data can be labeled in batches, allowing teaching staff to concentrate on improving the teaching effect and raising their teaching and research efficiency.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the micro-expression image annotation method provided by the above methods.
In still another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the micro-expression image labeling method provided by the methods above.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A micro-expression image labeling method for cross-granularity interactive learning is characterized by comprising the following steps:
S1, acquiring a micro-expression image sequence to be labeled;
S2, acquiring a preset number of labeled micro-expression images, and inputting the labeled micro-expression images and the micro-expression image sequence to be labeled into a pre-trained feature extractor model;
labeling, through the feature extractor model, the category corresponding to each micro-expression in the micro-expression image sequence to be labeled; acquiring a standard confidence score corresponding to each micro-expression category based on the labeled micro-expression images;
S3, obtaining the confidence score of each micro-expression recognized in the micro-expression image sequence to be labeled and comparing it with the standard confidence score of the micro-expression of the corresponding category; if the confidence score is greater than or equal to the standard confidence score, adding that micro-expression to the labeled micro-expression image set;
S4, updating the micro-expression image sequence to be labeled until all micro-expressions are labeled, then outputting the labeled micro-expression image set.
2. The micro-expression image annotation method for cross-granularity interactive learning according to claim 1, wherein before step S1, the method comprises the steps of:
collecting video data of students in a classroom;
preprocessing the video data, and converting the video data into image data of each frame;
carrying out face detection on the image data to generate a plurality of face images;
and storing a plurality of face images as the micro-expression image sequence to be labeled.
3. The method of claim 2, wherein training the feature extractor model comprises:
inputting the marked micro expression image and the micro expression image sequence to be marked into a feature extractor model after performing a data enhancement strategy:
and acquiring the category corresponding to the marked micro expression image, acquiring the standard confidence score corresponding to each micro expression category, and outputting the trained feature extractor model.
4. The method of claim 3, wherein training the feature extractor model comprises:
selecting any image from the micro-expression image sequence to be labeled, applying a strong enhancement strategy and a weak enhancement strategy to obtain a corresponding strongly enhanced image $u_s$ and weakly enhanced image $u_w$; selecting any image from the labeled micro-expression images and applying a weak enhancement strategy to obtain a weakly enhanced image $s_w$;
inputting the strongly enhanced image $u_s$ and the weakly enhanced images $u_w$ and $s_w$ into the feature extractor, acquiring the corresponding image features, and judging the category of the micro-expression;
based on a plurality of weakly enhanced images $u_w$, outputting through the feature extractor the category probability distribution corresponding to each micro-expression category, comparing the maximum value of the category probability distribution with a preset threshold, and taking the micro-expression category that is greater than or equal to the preset threshold as the pseudo-category of the weakly enhanced image; obtaining the cross entropy between the micro-expression categories of a plurality of strongly enhanced images $u_s$ and the pseudo-categories;
based on a plurality of weakly enhanced images $s_w$, outputting through the feature extractor the category probability distribution corresponding to each micro-expression category, and obtaining the cross entropy between the output category probability distribution and the original category probability distribution of the labeled micro-expression images.
5. The method for labeling micro-expression images for cross-granularity interactive learning according to claim 4, wherein comparing the confidence score of a micro-expression recognized in the micro-expression image sequence to be labeled with the standard confidence score of the micro-expression of the corresponding category comprises the steps of:
obtaining the first category probability distribution of the image data in the micro-expression image sequence to be labeled, $p^u = (p^u_1, \ldots, p^u_C)$;
obtaining the second category probability distribution of the labeled micro-expression data set, $p^l = (p^l_1, \ldots, p^l_C)$;
obtaining, for the same micro-expression category c, the average probability $\bar{p}_c = \frac{1}{2}(p^u_c + p^l_c)$ of the first and second category probability distributions;
comparing $\bar{p}_c$ with the preset adaptive score $T_c$: if $\bar{p}_c \ge T_c$, judging that the confidence score of the corresponding micro-expression is greater than or equal to the standard confidence score, updating the micro-expression image sequence to be labeled, and adding the micro-expressions in the sequence that are greater than or equal to the standard confidence score to the labeled micro-expression image set;
if $\bar{p}_c < T_c$, judging that the confidence score of the corresponding micro-expression is smaller than the standard confidence score, adding the corresponding micro-expression to a retraining image set, and using the retraining image set to retrain the feature extractor model.
6. The method for labeling the micro-expression images for cross-granularity interactive learning according to claim 3 or 5, wherein the feature extractor model comprises a fine-grained feature extractor model and a coarse-grained feature extractor model, and training the feature extractor model comprises the following steps:
acquiring a neutral expression image and other expression images of the same target to be identified in the micro-expression image data; acquiring the identity features of the neutral expression image, and acquiring the identity features and expression category features of the other expression images through an encoder;
performing expression reconstruction through a decoder by combining the identity features of the neutral expression image with the expression category features of the other expression images;
performing adversarial learning between the encoder and the decoder, the encoder classifying the micro-expression images of different targets to be recognized according to their identity features, so that the difference between the micro-expression category distributions of the different targets to be recognized is minimized;
analyzing the reconstructed expression image through an expression classifier, acquiring the corresponding expression category features, and outputting the category probability distribution.
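A minimal sketch of the identity/expression disentanglement described in this claim, assuming flattened grayscale faces and single linear layers as placeholders (layer sizes and class names are illustrative, not from the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentanglingAutoencoder(nn.Module):
    # The encoder output is split into an identity half and an expression half;
    # the decoder recombines the neutral image's identity with another image's
    # expression category features to reconstruct the expressive face.
    def __init__(self, pixels=64 * 64, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(pixels, 2 * dim))
        self.decoder = nn.Linear(2 * dim, pixels)

    def forward(self, x_neutral, x_expr):
        identity, _ = self.encoder(x_neutral).chunk(2, dim=-1)  # identity features
        _, expression = self.encoder(x_expr).chunk(2, dim=-1)   # expression features
        return self.decoder(torch.cat([identity, expression], dim=-1))

# Reconstruction objective against the expressive image of the same person:
# loss = F.mse_loss(model(x_neutral, x_expr), x_expr.flatten(1))
```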
7. The method for labeling the micro-expression images for cross-granularity interactive learning according to claim 6, wherein the training of the feature extractor model comprises the following steps:
forming a triplet (x_anchor, x_nega, x_posi) from the micro-expression anchor x_anchor, the macro-expression positive example x_posi having the same expression category as the micro-expression anchor, and the micro-expression negative example x_nega having a different expression category from the micro-expression anchor, and inputting the triplet into the feature extractor model;
wherein the micro-expression anchor and the micro-expression negative example are input into the fine-grained feature extractor to obtain the expression embeddings f(x_anchor) and f(x_nega), respectively;
inputting the macro-expression positive example into the coarse-grained feature extractor model to obtain the expression embedding g(x_posi);
obtaining the triplet loss between the expression embeddings:
L_tri = max(0, ||f(x_anchor) - g(x_posi)||_2 - ||f(x_anchor) - f(x_nega)||_2 + m);
performing adversarial learning between the fine-grained feature extractor and the coarse-grained feature extractor model; the coarse-grained feature extractor model provides the embedded representation of the macro-expression, and the classification of the corresponding macro-expression image is marked as the correct category; the fine-grained feature extractor provides the embedded representation of the micro-expression, and the classification of the corresponding micro-expression image is marked as the incorrect category;
distinguishing the two embedded representations through a discriminator, and adjusting the common expression features between macro-expressions and micro-expressions through the fine-grained feature extractor model, so that the gradient between the macro-expression and the micro-expression is greater than or equal to a lowest threshold value;
wherein m is the margin hyperparameter of the triplet loss.
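A sketch of the triplet loss above, assuming Euclidean distances between the embeddings; the margin default 0.2 and the function name are illustrative:

```python
import torch
import torch.nn.functional as F

def cross_granularity_triplet_loss(f_anchor, g_posi, f_nega, m=0.2):
    # f_anchor: fine-grained embedding of the micro-expression anchor
    # g_posi:   coarse-grained embedding of the macro-expression positive example
    # f_nega:   fine-grained embedding of the micro-expression negative example
    d_pos = F.pairwise_distance(f_anchor, g_posi)  # pull anchor toward the positive
    d_neg = F.pairwise_distance(f_anchor, f_nega)  # push anchor from the negative
    return F.relu(d_pos - d_neg + m).mean()        # hinge with margin m
```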
8. A micro-expression image labeling device for cross-granularity interactive learning is characterized by comprising:
the image processing module is used for acquiring micro-expression image sequences to be labeled, acquiring a preset number of labeled micro-expression images, and inputting the labeled micro-expression images and the micro-expression image sequences to be labeled into the feature extractor module;
the feature extractor module is used for labeling the category corresponding to each micro-expression in the micro-expression image sequence to be labeled through the feature extractor model, and for acquiring the standard confidence score corresponding to each micro-expression category based on the labeled micro-expression images;
the image labeling module is used for acquiring the confidence score of each micro-expression identified in the micro-expression image sequence to be labeled, comparing it with the standard confidence score of the micro-expression of the corresponding category, and adding the micro-expressions whose confidence scores are greater than or equal to the standard confidence score into the labeled micro-expression image set;
the image labeling module is also used for updating the micro-expression image sequence to be labeled, and for outputting the labeled micro-expression image set once all the micro-expressions are labeled.
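For illustration, a sketch of the iterative loop these modules implement; `predict` and `retrain` are assumed helpers standing in for the feature extractor module and its retraining step:

```python
def label_all(model, to_label, labeled, thresholds):
    # Move confident micro-expressions into the labeled set, retrain on the
    # remainder, and repeat until nothing remains unlabeled or no progress is made.
    while to_label:
        pending = []
        for image in to_label:
            category, score = predict(model, image)   # assumed helper
            if score >= thresholds[category]:
                labeled.append((image, category))     # confident: accept the label
            else:
                pending.append(image)                 # below the standard score
        if len(pending) == len(to_label):             # no progress: stop early
            break
        if pending:
            model = retrain(model, pending, labeled)  # assumed helper
        to_label = pending
    return labeled
```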
9. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the micro-expression image labeling method according to any one of claims 1 to 7.
CN202210736803.8A 2022-06-27 2022-06-27 Cross-granularity interactive learning micro-expression image labeling method and device Pending CN115050075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210736803.8A CN115050075A (en) 2022-06-27 2022-06-27 Cross-granularity interactive learning micro-expression image labeling method and device


Publications (1)

Publication Number Publication Date
CN115050075A true CN115050075A (en) 2022-09-13

Family

ID=83162792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210736803.8A Pending CN115050075A (en) 2022-06-27 2022-06-27 Cross-granularity interactive learning micro-expression image labeling method and device

Country Status (1)

Country Link
CN (1) CN115050075A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894985A (en) * 2023-09-08 2023-10-17 吉林大学 Semi-supervised image classification method and semi-supervised image classification system
CN116894985B (en) * 2023-09-08 2023-12-15 吉林大学 Semi-supervised image classification method and semi-supervised image classification system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination