CN113065460B - Establishment method of pig face facial expression recognition framework based on multitask cascade

Establishment method of pig face facial expression recognition framework based on multitask cascade

Info

Publication number: CN113065460B
Authority: CN (China)
Prior art keywords: attention, pig face, pig, network, representing
Legal status: Active (granted)
Application number: CN202110350752.0A
Other languages: Chinese (zh)
Other versions: CN113065460A
Inventors: 温长吉, 张笑然, 吴建双, 于合龙, 石磊, 郭宏亮, 毕春光, 李卓识, 苏恒强, 薛明轩, 杨之音
Current Assignee: Jilin Agricultural University
Original Assignee: Jilin Agricultural University
Application filed by Jilin Agricultural University; priority to CN202110350752.0A; published as CN113065460A; application granted and published as CN113065460B.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for establishing a pig face facial expression recognition framework based on multitask cascading, and belongs to the technical field of computer image recognition and artificial intelligence. It is the first to apply a cascade framework model to the classification and recognition of time-sequenced pig facial expression images. The network model consists of three cascaded stages. First, pig face facial expression video frames are sampled at equal intervals and input into a simplified multitask cascaded convolutional neural network for pig face detection and localization. Second, the extracted pig face sequence feature maps are input into a multi-attention mechanism module that captures the salient facial regions produced by expression changes, realizing attention to fine facial variation. Then the refined feature map extracted from the video frames and the multi-attention feature maps are merged through an array-merging operation and input into a long short-term memory (LSTM) network to classify and recognize the expressions. Expression recognition of livestock enables better emotion regulation, thereby improving feed digestibility and utilization, accelerating growth, and raising yield.

Description

Establishment method of pig face facial expression recognition framework based on multitask cascade
Technical Field
The invention relates to the technical field of computer image recognition and artificial intelligence, and in particular to a method for establishing a pig face facial expression recognition framework based on multitask cascading: an end-to-end model framework for recognizing livestock facial expressions in video.
Background
Animal emotion research is one of the important research targets of animal science and enables better evaluation of livestock welfare. Good emotional states of livestock such as pigs during feeding play an important role in maximizing feed digestibility and utilization, thereby accelerating growth and increasing yield, so animal emotion research based on facial expression recognition is of great significance.

Animal facial expression recognition faces several challenges. First, compared with human facial expression recognition, changes in animal facial expression are difficult to perceive and recognize, because they depend mainly on the zygomatic muscles on both sides of the cheeks, a muscle group that is structurally simple and moves over a small range. Second, most existing work on animal facial expression recognition is based on physiological anatomy, which is costly and inefficient. Finally, collecting physiological signs from animal faces is difficult, and no large-scale standardized dataset exists for supervised or semi-supervised learning; at present only a few machine-vision methods recognize animal facial expressions in static images, and no work classifies and recognizes temporally sequenced expressions in video frames. A facial expression in a static image is only a record of expression features at a single time point, yet facial expression is inherently spatio-temporal; the very few static-image methods therefore lose, during feature extraction and representation, a large amount of the spatio-temporal logical features produced by change along the time dimension, violating the inherent regularity of how facial expressions unfold. An end-to-end model framework for recognizing livestock facial expressions in video is therefore urgently needed.
Disclosure of Invention
The invention aims to provide a method for establishing a pig face facial expression recognition framework based on multitask cascading that solves the problems in the prior art. The method classifies and recognizes pig facial expressions in video images with a multi-attention-mechanism cascaded long short-term memory (LSTM) network model. First, a simplified multitask cascaded convolutional network rapidly detects and localizes the pig face in each video frame, removing the influence of non-pig-face regions on recognition performance. The detected and localized pig face sequence feature maps are then sent to a multi-attention convolution module that attends to the salient regions produced by the various expression changes, overcoming the difficulty that the simple structure and small-amplitude movement of the domestic pig's facial expression muscles make expressions hard to perceive and recognize. Finally, the extracted global feature map and the attention feature maps are fused into refined features by an array-merging operation and fed as a sequence into the LSTM network to realize expression recognition.
The above object of the present invention is achieved by the following technical solutions:
the establishment method of the pig face facial expression recognition framework based on multitask cascade comprises the following steps:
S1, inputting pig face facial expression video segments and labelling each input segment with one of the four domestic-pig expression classes: anger, joy, fear, and calm;

S2, first stage of the cascade framework model: frame images are sampled from the pig face facial expression video at equal intervals and input into a simplified multitask cascaded convolutional neural network for pig face detection and localization; the simplified network detects and localizes the pig face region rapidly in two steps, coarse-grained and fine-grained;

S3, second stage of the cascade framework model: the extracted pig face sequence frame images are input into a multi-attention mechanism module that extracts and constructs salient-region feature maps of pig facial expression change; first, a shallow residual network extracts a global convolution feature map of the pig face; second, a channel-grouping response attention mechanism captures and generates the salient-region feature maps of facial expression change; then the attention-region feature maps and the global convolution feature map are merged to generate a pig face feature map fused with the attention mechanism;

S4, third stage of the cascade framework model: the attention-fused pig face feature maps are input in sequence into a long short-term memory (LSTM) network, and pig facial expressions are recognized and classified through a fully connected layer and a softmax classifier.
The architecture of the simplified multitask cascaded convolutional neural network in step S2 is as follows:

S21, coarse-grained detection and localization: a fully convolutional network, i.e. a proposal network, obtains candidate pig face windows and their bounding-box regression vectors, and the candidate windows are corrected according to the estimated regression vectors; finally, non-maximum suppression merges highly overlapping candidate windows;

S22, fine-grained detection and localization: all pig-face-containing candidates obtained in step S21 are passed to a refinement network, which screens out erroneous candidate windows, calibrates with the bounding-box regression vectors, performs non-maximum suppression, and finally outputs the bounding-box coordinates containing the pig face, realizing pig face detection and localization;
S23, loss optimization function: the loss of the simplified multitask cascaded convolutional neural network is composed of a pig face classification loss and a Euclidean-distance regression loss on the face-region bounding box, and the network is learned by jointly optimizing them; the joint optimization loss is:

$$L_{cd}=\min\frac{1}{N}\sum_{i=1}^{N}\sum_{j\in\{det,\,box\}}\alpha_{j}\,\beta_{i}^{j}\,L_{i}^{j}$$

$$L_{i}^{det}=-\left(y_{i}^{det}\log p_{i}+\left(1-y_{i}^{det}\right)\log\left(1-p_{i}\right)\right)$$

$$L_{i}^{box}=\left\|\hat{y}_{i}^{box}-y_{i}^{box}\right\|_{2}^{2}$$

where $L_{cd}$ denotes the optimization objective of the simplified multitask cascaded convolutional neural network for pig face detection, $N$ is the total number of samples in the training set, $i$ indexes the $i$-th sample, $j$ denotes the task type and takes the value det or box (det: pig face discrimination; box: pig face regression-box detection), $L_{i}^{j}$ is the loss of the $i$-th sample on the $j$-th task, $\alpha_{j}$ is the weight of the loss for the $j$-th task, $\beta_{i}^{j}\in\{0,1\}$ is the sample-type label of the $i$-th sample on the $j$-th task, the weights assigned to the coarse-grained and fine-grained tasks are $\alpha_{det}=1$ and $\alpha_{box}=0.5$ respectively, $y_{i}^{det}$ is the true label of the sample, $p_{i}$ is the network-output probability that sample $i$ is a pig face, $\hat{y}_{i}^{box}$ are the pig face bounding-box coordinates predicted by the network, and $y_{i}^{box}$ are the manually annotated ground-truth bounding-box coordinates; both are four-dimensional vectors in $\mathbb{R}^{4}$, namely the horizontal and vertical coordinates of the top-left corner of the regression box together with its width and height.
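To make the joint objective concrete, the following is a minimal PyTorch-style sketch of $L_{cd}$ (not part of the patent; the function name, tensor shapes, and the batched formulation are assumptions):

```python
import torch
import torch.nn.functional as F

def joint_detection_loss(p, y_det, box_pred, box_gt, beta_det, beta_box,
                         alpha_det=1.0, alpha_box=0.5):
    """Joint loss L_cd of the simplified multitask cascaded network.

    p        : (N,) predicted pig-face probabilities p_i
    y_det    : (N,) ground-truth labels y_i^det in {0, 1}
    box_pred : (N, 4) predicted boxes (x, y, w, h)
    box_gt   : (N, 4) annotated ground-truth boxes
    beta_det, beta_box : (N,) sample-type indicators beta_i^j in {0, 1}
    """
    # L_i^det: cross-entropy classification loss per sample.
    l_det = F.binary_cross_entropy(p, y_det.float(), reduction="none")
    # L_i^box: squared Euclidean distance between predicted and true boxes.
    l_box = ((box_pred - box_gt) ** 2).sum(dim=1)
    # Task-weighted sum, averaged over the N training samples.
    return (alpha_det * beta_det * l_det + alpha_box * beta_box * l_box).mean()
```

Here the indicators `beta_det` and `beta_box` switch each sample in or out of a task, so, for example, a negative (non-pig-face) sample can contribute to the classification loss while contributing nothing to the box regression.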
The method in step S3 for extracting and constructing the salient-region feature maps of pig facial expression change is as follows:

S31, the feature maps extracted in step S2 from the video frame sequence containing pig facial expressions are input into a shallow residual network to generate time-sequenced global feature maps;

S32, the global feature maps obtained in step S31 are grouped by channel response pattern: first the contribution of each feature channel to each attention region is computed, with the weight expression $d_{\upsilon}(X)=f_{\upsilon}(\mathbf{W}*X)$, where $d_{\upsilon}(X)=[d_{\upsilon}(1),\dots,d_{\upsilon}(c)]$; to generate $N$ attention regions, a set of fully connected functions $F(\cdot)=\{f_{1}(\cdot),\dots,f_{\upsilon}(\cdot),\dots,f_{N}(\cdot)\}$ is defined, where each $f_{\upsilon}(\cdot)$ takes the convolution features as input, corresponds to the $\upsilon$-th attention region, receives a $c$-dimensional feature-channel input, and produces a $c$-dimensional weight vector $d_{\upsilon}$ indicating the contribution of each feature channel to attention region $\upsilon$; $\mathbf{W}*X$ denotes the convolution features of input sample $X$, $\mathbf{W}$ denotes the parameter set of the feature extraction unit, $w$, $h$, $c$ denote the width, height, and number of feature channels of the input sample respectively, and "$*$" denotes the convolution, pooling, and activation operations of the feature extraction unit;

S33, the attention-region feature maps are computed from the weights obtained in step S32: first, based on the learned weight vector $d_{\upsilon}$, the attention mask matrix $M_{\upsilon}$ of each attention region is obtained:

$$M_{\upsilon}(X)=\operatorname{sigmoid}\left(\sum_{k=1}^{c}d_{\upsilon}(k)\,[\mathbf{W}*X]_{k}\right)$$

where $X$ denotes the input sample, the sigmoid function normalizes values to the range 0–1, and $[\cdot]_{k}$ denotes multiplying the $k$-th component of the weight vector $d_{\upsilon}$ by the corresponding elements of the $k$-th feature channel of the convolution features $\mathbf{W}*X$; then the attention-region feature map is computed:

$$P_{\upsilon}(X)=\sum_{k=1}^{c}\operatorname{pool}\left([\mathbf{W}*X]_{k}\odot M_{\upsilon}(X)\right)$$

where $P_{\upsilon}(X)$ denotes the feature map of the $\upsilon$-th attention region, computed by pooling on each channel: the mask matrix of the $\upsilon$-th attention region is multiplied element-wise with the convolution feature map and the results are accumulated;

S34: a feature-channel group-clustering optimization objective $L_{cg}$ is constructed to cluster the feature channels into attention regions. $L_{cg}$ measures the correlation between feature points of strong and weak attention regions so that coordinates within the same attention region become more concentrated, expressed by the function $Dis(\cdot)$, while coordinates of different regions stay as far apart as possible, expressed by the function $Div(\cdot)$; $\lambda$ denotes the weight assigned to this constraint. The optimization objective is:

$$\min\,L_{cg}(M_{\upsilon})=Dis(M_{\upsilon})+\lambda\,Div(M_{\upsilon})$$

$$Dis(M_{\upsilon})=\sum_{(x,y)}m_{\upsilon}(x,y)\left[\left\|x-t_{x}\right\|^{2}+\left\|y-t_{y}\right\|^{2}\right]$$

$$Div(M_{\upsilon})=\sum_{(x,y)}m_{\upsilon}(x,y)\left[\max_{k\neq\upsilon}m_{k}(x,y)-T_{mrg}\right]$$

where $(x,y)$ ranges over the attention-region coordinates, $m_{\upsilon}(x,y)$ is the response value of the region's attention mask matrix $M_{\upsilon}(X)$ at coordinate $(x,y)$, $t_{x}$ and $t_{y}$ are the peak-response coordinates of the $\upsilon$-th attention region over the training set, $k$ and $\upsilon$ are attention-region indices with $(k\neq\upsilon)\in\{1,2,\dots,N\}$, $\max_{k\neq\upsilon}m_{k}(x,y)$ expresses the strongest competing response at coordinate $(x,y)$, and $T_{mrg}$ is a preset margin threshold, a constant that prevents extreme values, making the loss insensitive to noise and the network robust.
The method in step S4 for recognizing and classifying pig facial expressions is as follows:

The long short-term memory network classifies pig facial expressions in real time: the global facial convolution feature map obtained in step S31 and the multi-attention feature maps obtained in step S33 are fused by array merging and input into the LSTM network, which outputs the probability values of the four expression classes, anger, joy, fear, and calm, realizing classification and recognition of pig facial expressions; the optimization function of the cascaded pig face facial expression recognition framework model is

$$L=\min\left(\gamma\,L_{cd}+L_{cg}(\lambda,M_{\upsilon})\right)$$

where $\gamma$ is the weight balancing the stage objectives, $L_{cd}$ denotes the optimization objective of the simplified multitask cascaded convolutional neural network for pig face detection, $\alpha_{j}$ denotes the weight of the loss for the $j$-th task in that network with $j\in\{det,box\}$ (det: pig face discrimination; box: pig face regression-box detection), $L_{i}^{j}$ is the loss value of the $i$-th sample on task $j$; $L_{cg}$ denotes the feature-channel group-clustering optimization objective, $\lambda$ the target-constraint weight within the attention regions, and $M_{\upsilon}$ the attention mask matrix of the $\upsilon$-th region.
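A minimal PyTorch-style sketch of this third stage follows (hypothetical module and dimension choices; the patent specifies only the single 128-unit hidden layer, the fully connected layer, the softmax classifier, and the four classes, and the per-frame descriptors are assumed here to be globally pooled to one vector per frame):

```python
import torch
import torch.nn as nn

class ExpressionHead(nn.Module):
    """Third cascade stage: fused feature sequence -> LSTM -> FC -> softmax."""

    def __init__(self, feat_dim=512, hidden=128, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)  # anger / joy / fear / calm

    def forward(self, seq):
        # seq: (batch, T, feat_dim) -- per-frame descriptors obtained by
        # merging the global feature map with the attention feature maps.
        out, _ = self.lstm(seq)
        logits = self.fc(out[:, -1])           # last time step summarizes the clip
        return torch.softmax(logits, dim=1)    # four expression probabilities
```

For example, `ExpressionHead()(torch.randn(2, 16, 512))` would return a (2, 4) tensor of class probabilities for two 16-frame clips.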
The invention has the beneficial effects that:

1. The invention is the first to propose a multi-attention-mechanism cascaded LSTM network framework model for classifying and recognizing pig facial expressions in video images. It differs from existing livestock expression recognition methods that study changes in animal facial muscle groups through physiological anatomy, which are costly and inefficient. It also differs from existing animal (livestock) expression recognition methods based on single static images, whose machine-vision pipelines lose the temporal information of the expression-change process during feature extraction and representation. As an end-to-end model framework for recognizing livestock facial expressions in video, it is both novel and advanced.

2. The model framework is a multitask cascaded framework with an innovative structural design. The first stage of the cascade is a simplified multitask convolutional network that detects and localizes pig faces in video frames to remove the influence of non-pig-face regions on recognition performance. The second stage is a multi-attention mechanism module: exploiting the fact that different channels of a feature map attend to different visual information and have different peak-response regions, it groups feature channels and obtains attention regions by weakly supervised clustering, attending to the salient regions produced by the various expression changes. The third stage is a long short-term memory network model: the extracted convolution features and attention feature maps are fused into refined features by an array-merging operation and fed as a sequence into the LSTM network to realize expression recognition.

3. Livestock emotion research is one of the important research targets of animal science. Understanding livestock emotion by recognizing expression changes enables better evaluation of livestock welfare and plays an important role in maximizing feed digestibility and utilization, thereby accelerating growth and increasing production benefit.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention.
FIG. 1 is a framework diagram of the process of establishing a pig face facial expression recognition model according to the present invention;
FIG. 2 is a diagram of steps for establishing a pig face facial expression recognition model according to the present invention;
FIG. 3 is a flow chart of a multi-attention convolutional network implementation of the present invention.
Detailed Description
The details of the present invention and its embodiments are further described below with reference to the accompanying drawings.
Referring to fig. 1 to 3, the method for establishing a pig face facial expression recognition framework based on multitask cascade connection comprises the following steps:
S1: pig face facial expression video segments are input; segments shot in a pig farm and containing a frontal pig face are selected, and each input segment is labelled by class according to related research results and human experience. The video-segment dataset is divided into the four live-pig expression classes of anger, joy, fear, and calm, and is used to train, validate, and test the framework model;
S2: frame images sampled at equal intervals from the pig face facial expression video are input into the simplified multitask cascaded convolutional neural network for pig face detection and localization. The simplified network detects and localizes the pig face region rapidly in two steps, coarse-grained and fine-grained. The specific steps are as follows:

S21: coarse-grained detection and localization; a fully convolutional network, i.e. a proposal network, obtains candidate pig face windows and their bounding-box regression vectors, and the candidate windows are corrected according to the estimated regression vectors. Finally, non-maximum suppression merges highly overlapping candidate windows.

S22: fine-grained detection and localization; all pig-face-containing candidates obtained in step S21 are passed to a refinement network, which screens out erroneous candidate windows, calibrates with the bounding-box regression vectors, performs non-maximum suppression, and finally outputs the bounding-box coordinates containing the pig face, realizing pig face detection and localization.
S23: and the loss optimization function is a simplified loss function of the multitask cascade convolution neural network and is respectively composed of a pig face classification loss function and an Euclidean distance regression loss function regressed by a face region boundary frame, and the network learning is realized by the joint optimization loss function. The joint optimization loss function is:
Figure BDA0003002035490000061
Figure BDA0003002035490000071
Figure BDA0003002035490000072
wherein L iscdThe optimization objective function of the simplified multi-task cascade convolution neural network for pig face detection is represented, N is the total number of samples in a training set, i represents the ith sample, j represents the task type and takes the value of det or box, det is used for representing that the task type is pig face discrimination, box is used for representing that the task type is pig face regression box detection,
Figure BDA0003002035490000073
indicating that the ith sample is in the jth taskLoss function of alphajIndicating the weight possessed by the jth task corresponding to the penalty function,
Figure BDA0003002035490000074
the value of the label of the ith sample in the jth task is 0 or 1, and the corresponding weight distribution proportion of the coarse-grained task and the fine-grained task is respectively alphadet1 and αbox=0.5,
Figure BDA0003002035490000075
Figure BDA0003002035490000076
True tags representing samples, piRepresenting the probability that the i sample network output is a pig face,
Figure BDA0003002035490000077
the coordinates of the pig face bounding box predicted for the network,
Figure BDA0003002035490000078
the coordinates of the artificially labeled real bounding box, both of which are four-dimensional vectors R4The horizontal and vertical coordinates of the upper left corner of the regression frame and the width and height of the regression frame are respectively.
S3: on the basis of the step S2, the extracted pig face facial sequence frame images are input into a multi-attention machine module for extracting and constructing a salient region feature map of the change of pig face facial expression. The specific execution steps are as follows:
S31: the video sequence containing pig facial expressions extracted in step S2 is input into a residual network of depth 24. The network comprises 8 groups of residual units; within each unit the first two layers have the structure BN-ReLU-Conv(3 × 3) and the last has the structure BN-Conv(3 × 3), with stride 1. A downsampling structure is added at each stage of the network, where the stride changes to 2, yielding a pig face convolution feature map of size 28 × 28 × 512.
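A minimal PyTorch-style sketch of one such residual unit follows (the channel widths and the 1 × 1 projection shortcut are assumptions; the patent fixes only the BN-ReLU-Conv(3 × 3), BN-ReLU-Conv(3 × 3), BN-Conv(3 × 3) layer pattern and the stride-2 downsampling):

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Pre-activation residual unit: two BN-ReLU-Conv(3x3) layers followed
    by one BN-Conv(3x3); stride 2 on the first conv opens a new stage."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
        )
        # 1x1 projection (the linear transform W_s) when shapes differ.
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, stride=stride,
                                        bias=False))

    def forward(self, x):
        return self.body(x) + self.shortcut(x)  # y = F(x, {W_i}) + W_s x
```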
S32: grouping the global feature maps obtained in the step S31 according to the channel response mode: first, the attention area of each feature channel is calculatedThe domain contribution degree and the weight calculation expression are as follows: dυ(X)=fυ(W X X), wherein dυ(X)=[dυ(1),…,dυ(c)]To generate n attention regions, a set of fully connected functions F (·) { F ] is defined1(·),…fυ(·),…fN(. o) }, each fυ(. h) taking convolution characteristics as input, respectively corresponding to a upsilon attention areas, receiving input of c-dimensional characteristic channels, and generating a c-dimensional weight vector d upsilon for indicating the contribution degree of each characteristic channel to the attention areas upsilon, wherein W X represents the convolution characteristics of input samples X, W represents a parameter set of a characteristic extraction unit, and is respectively W, h, c represents the width, height and number of the characteristic channels of the input samples, and the' represents the convolution, pooling and activation operations of the characteristic extraction unit;
s33, calculating an attention area feature map according to the weight calculated in the step S32: first based on the learned weight vector dυObtaining an attention mask matrix M for each region of interestυ
Figure BDA0003002035490000081
Wherein X represents an input sample, a sigmoid function is taken to normalize the input sample to be 0-1, k and upsilon represent different characteristic channels, namely different attention area index values, (k ≠ upsilon) E {1, 2, …, N }, [ · c]kK-th eigen-channel weight vector d representing convolution features W XkMultiplying with corresponding elements of the corresponding feature channel; then calculating the attention area feature map
Figure BDA0003002035490000082
Wherein P isυ(X) representing a characteristic diagram of the upsilon attention areas, and calculating through pooling on each channel, wherein the operation mark is multiplied by points of a mask matrix and a convolution characteristic diagram of the upsilon attention areas and then accumulated;
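A minimal PyTorch-style sketch of this mask-and-pool computation (assumed tensor layouts; `torch.einsum` implements the weighted channel sum):

```python
import torch

def attention_features(feat, d):
    """Compute M_v(X) and P_v(X) from a convolution feature map.

    feat : (C, H, W) convolution features W*X
    d    : (R, C) channel weight vectors d_v, one row per attention region
    Returns masks of shape (R, H, W) and spatially pooled region features
    of shape (R, C).
    """
    # M_v(X) = sigmoid( sum_k d_v(k) * [W*X]_k ), normalized to (0, 1).
    masks = torch.sigmoid(torch.einsum("rc,chw->rhw", d, feat))
    # P_v(X): mask multiplied element-wise with each feature channel,
    # then accumulated over the spatial positions (pooling).
    masked = feat.unsqueeze(0) * masks.unsqueeze(1)  # (R, C, H, W)
    region_feats = masked.sum(dim=(2, 3))            # (R, C)
    return masks, region_feats
```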
S34: a feature-channel group-clustering optimization objective $L_{cg}$ is constructed to cluster the feature channels into attention regions. $L_{cg}$ measures the correlation between feature points of strong and weak attention regions so that coordinates within the same attention region become more concentrated, expressed by $Dis(\cdot)$, while coordinates of different regions stay as far apart as possible, expressed by $Div(\cdot)$; $\lambda$ denotes the target-constraint weight. The optimization objective is:

$$\min\,L_{cg}(M_{\upsilon})=Dis(M_{\upsilon})+\lambda\,Div(M_{\upsilon})$$

$$Dis(M_{\upsilon})=\sum_{(x,y)}m_{\upsilon}(x,y)\left[\left\|x-t_{x}\right\|^{2}+\left\|y-t_{y}\right\|^{2}\right]$$

$$Div(M_{\upsilon})=\sum_{(x,y)}m_{\upsilon}(x,y)\left[\max_{k\neq\upsilon}m_{k}(x,y)-T_{mrg}\right]$$

where $(x,y)$ ranges over the attention-region coordinates, $m_{\upsilon}(x,y)$ is the response value of the attention mask matrix $M_{\upsilon}(X)$ at coordinate $(x,y)$, $t_{x}$ and $t_{y}$ are the peak-response coordinates of the $\upsilon$-th attention region over the training set, $\max_{k\neq\upsilon}m_{k}(x,y)$ expresses the strongest competing response at $(x,y)$, and $T_{mrg}$ is a preset margin threshold, a constant that prevents extreme values, making the loss insensitive to noise and the network robust.
S4: the long-short term memory network classifies the pig face facial expressions in real time, the facial global convolution feature map obtained in the step S31 and the multi-attention feature map obtained in the step S33 are input into the long-short term memory network through merging array fusion, the pig face facial expressions are identified and classified through a full connection layer and a softmax classifier, and four types of expression probability values of anger, joy, fear, peace and the like are output, so that the pig face facial expression classification identification is realized. The optimization function of the cascaded pig face facial expression recognition framework model is as follows,
Figure BDA0003002035490000086
where γ is the weight of the objective function in the equilibrium stage, LcdRepresenting a simplified multitasking level for pig face detectionOptimization objective function of the network, alphajRepresenting the weight of a loss function corresponding to the jth task in the simplified multi-task convolutional network, wherein j belongs to { det, box }, det is used for representing that the task type is pig face discrimination, box is used for representing that the task type is pig face regression box detection,
Figure BDA0003002035490000094
represents the loss value of the ith sample in task j; l iscgRepresenting an optimized objective function for clustering of constrained channel groups, λ representing the assigned weight of the target constraint in the attention area, MυAn attention mask matrix representing a first v regions.
S5: optimizing an objective function and updating parameters, and carrying out model training and verification on the network structure in a training set and a verification set through multiple iterations. According to a random gradient descent method
Figure BDA0003002035490000091
we+1=we+ve+1Updating network parameters, optimizing an objective function, wherein e in the above formula represents iteration times, v represents momentum, eta represents learning rate,
Figure BDA0003002035490000092
denotes D in the e-th iterationeThe partial derivatives of the loss function L generated by the training for each batch with respect to the weight w.
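A minimal sketch of one such update step (plain Python over array-like parameters; the momentum coefficient and learning-rate values are assumptions, not values from the patent):

```python
def sgd_momentum_step(w, v, grad, m=0.9, lr=0.01):
    """One stochastic-gradient-descent-with-momentum update:
    v_{e+1} = m * v_e - eta * grad,  w_{e+1} = w_e + v_{e+1}.

    w, v, grad : parameter, velocity, and batch-averaged gradient
    m, lr      : momentum coefficient and learning rate eta
    Returns the updated (w, v) pair.
    """
    v_next = m * v - lr * grad
    return w + v_next, v_next
```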
The pig face facial expression recognition model based on the multi-attention-mechanism cascaded LSTM network is suitable not only for recognizing pig facial expressions but also for recognizing expressions in video images of other livestock.
Example:

Referring to figs. 1 to 3, for the pig face facial expression recognition framework model based on the multi-attention-mechanism cascaded LSTM network, data augmentation is first performed on a shared pig face facial expression dataset, specifically brightness changes, small-angle rotations, and left-right flips of the videos; the dataset divides expressions into four classes: anger, joy, fear, and calm. Training and validation sets are split by five-fold cross-validation. The training set is used for training: the error between the actual training output and the label value is computed, the difference is propagated from top to bottom by the backpropagation algorithm, and the weights are updated. After training, the trained neural-network model is saved, and the validation set is used to tune parameters and make a preliminary evaluation of the model's training. The method specifically comprises the following steps:

S1: pig face facial expression video segments are input; segments shot in a pig farm and containing a frontal pig face are selected, and the video-segment dataset is divided into four classes: live-pig anger, joy, fear, and the neutral expression in a calm state.

S2: the simplified multitask cascaded convolutional network localizes the pig face in each video segment and extracts the video images containing the frontal pig face, normalized to size 224 × 224 × 3.
The network first defines the learning objective as a two-class pig face/non-pig face problem; for each sample $x_{i}$ the cross-entropy loss is used:

$$L_{i}^{det}=-\left(y_{i}^{det}\log p_{i}+\left(1-y_{i}^{det}\right)\log\left(1-p_{i}\right)\right)$$

where $i$ indexes the $i$-th sample, det denotes that the task class of the simplified multitask cascaded convolutional neural network is pig face discrimination, $p_{i}$ is the network-output probability that the sample is a pig face, and $y_{i}^{det}$ is the manually annotated ground-truth label.
Secondly, the offset between the predicted bounding box generated by the network and the nearest ground-truth bounding box is computed; for each sample $x_{i}$ the loss is calculated as a Euclidean distance:

$$L_{i}^{box}=\left\|\hat{y}_{i}^{box}-y_{i}^{box}\right\|_{2}^{2}$$

where box denotes that the task class of the simplified multitask cascaded convolutional neural network is pig face regression-box detection, $\hat{y}_{i}^{box}$ are the pig face bounding-box coordinates predicted by the network, and $y_{i}^{box}$ are the manually annotated ground-truth coordinates, composed of four values, namely the horizontal and vertical coordinates of the top-left corner of the bounding box together with its height and width, so that $y_{i}^{box}\in\mathbb{R}^{4}$ is a four-dimensional vector.
The simplified multitask cascaded convolutional neural network places a weight $\alpha$ before the final loss function, with different weights for the two tasks det and box; the training optimization objective is:

$$L_{cd}=\min\frac{1}{N}\sum_{i=1}^{N}\sum_{j\in\{det,\,box\}}\alpha_{j}\,\beta_{i}^{j}\,L_{i}^{j}$$

where $L_{cd}$ denotes the optimization objective of the simplified multitask cascaded convolutional neural network for pig face detection, $N$ is the total number of samples in the training set, $i$ indexes the $i$-th sample, $j$ denotes the task type and takes the value det or box, $L_{i}^{j}$ is the loss of the $i$-th sample on the $j$-th task, $\alpha_{j}$ is the weight of the loss for the $j$-th task, $\beta_{i}^{j}\in\{0,1\}$ is the sample-type label of the $i$-th sample on the $j$-th task, and the weights assigned to the coarse-grained and fine-grained tasks are $\alpha_{det}=1$ and $\alpha_{box}=0.5$; when $\beta_{i}^{det}=0$ the sample is judged to be a non-pig face, and when $\beta_{i}^{det}=1$ the sample is judged to be a pig face.
S3: constructing a network recognition model based on the facial expression of the pig face, and extracting the global features and the multi-attention features of the pig face;
(1) firstly, extracting global convolution characteristics from the face of a pig by using a 24-layer residual error network model;
inputting a video sequence which is extracted by S2 and contains a pig face and has the size of 224 multiplied by 3 into a residual error network with the depth of 24 layers, wherein the network structure comprises 8 groups of residual error units, the first two layers of the residual error units in each group have the structure of BN-ReLu-Conv (3 multiplied by 3), the last structure of the residual error units is BN-Conv (3 multiplied by 3), the step size is 1, a downsampling structure is required to be added for realizing each stage of the network, the step size is changed into 2 at the moment to obtain a pig face convolution characteristic diagram with the size of 28 multiplied by 512, the residual error structure can effectively avoid the problems of gradient disappearance and the like, and the robustness is higher aiming at object characteristic expression. An exemplary expression for the residual network structure is:
F=W2σ(W1x)
y=F(x,{Wi})+x
where the representation σ denotes the ReLu function, x and y are the input and output vectors of the network layer, i is the ith sample, and the function F (x, { W)i}) represents the residual mapping to be learned, and it can be found from the formula that neither additional parameters are introduced nor the computational complexity is increased, and the sizes of x and F must be equal, e.g. if the sizes are different, the linear transformation W needs to be performedsTo match the size.
(2) Second, a multi-attention mechanism is introduced to generate attention features for the key regions of the pig face, as shown in fig. 3.

A video segment is input into the residual network model to extract a convolution feature map. Unfolding the feature channels of the first convolution layer shows that each feature channel has a peak-response region. Exploiting the fact that different channels of the feature map attend to different visual information and have different peak-response regions, channels with similar response regions are clustered, with the number of fully connected functions matching the number of fine-grained attention regions, and $N$ attention regions are generated by this weakly supervised clustering. A set of fully connected functions $F(\cdot)=\{f_{1}(\cdot),\dots,f_{\upsilon}(\cdot),\dots,f_{N}(\cdot)\}$ is defined, where $f_{\upsilon}(\cdot)$ corresponds to the $\upsilon$-th attention region, receives a $c$-dimensional feature-channel input, and produces a $c$-dimensional weight vector $d_{\upsilon}$; channels of the same type are added and a sigmoid function (normalizing to 0–1) gives the corresponding probability values, yielding the attention regions required by the recognition process.

The feature channels obtained from the first convolution layer of the residual network model are unfolded, and $N$ stacked fully connected layers generate the weight vectors $d_{\upsilon}$ indicating the contribution of each feature channel to attention region $\upsilon$: $d_{\upsilon}(X)=f_{\upsilon}(\mathbf{W}*X)$.

Here $X$ denotes the input image and $\mathbf{W}$ the overall parameters, of dimension $w\times h\times c$, where $w$, $h$, $c$ are the width, height, and number of feature channels of the image; $\mathbf{W}*X$ denotes the extracted depth features, "$*$" denotes the convolution, pooling, and activation operations of the feature extraction unit, $d_{\upsilon}(X)=[d_{\upsilon}(1),\dots,d_{\upsilon}(c)]$, and $f_{\upsilon}(\cdot)$ is the fully connected function corresponding to the $\upsilon$-th attention region.

Channels with similar response regions are clustered by the learned weight vectors and normalized to 0–1 by the sigmoid function to obtain the corresponding probability values, and the attention mask matrix $M_{\upsilon}(X)$, $\upsilon\in\{1,2,\dots,N\}$, of each attention region is obtained based on the learned weight vectors:

$$M_{\upsilon}(X)=\operatorname{sigmoid}\left(\sum_{k=1}^{c}d_{\upsilon}(k)\,[\mathbf{W}*X]_{k}\right)$$

where $X$ denotes the input sample, the sigmoid normalizes values to 0–1, and $[\cdot]_{k}$ denotes multiplying the $k$-th component of the weight vector $d_{\upsilon}$ by the corresponding elements of the $k$-th feature channel of the convolution features $\mathbf{W}*X$.
(3) A spatial pooling operation over the global features and the attention-region features yields the multi-attention feature maps of the pig face.

The input sequence has size 28 × 28 × 512 and each attention region has size 28 × 28 × 1; finally, the spatial pooling operation on the $\upsilon$-th attention-region mask and the convolution feature map yields the multi-attention feature $P_{\upsilon}(X)$ of the pig face attention map, of size 28 × 28 × 512:

$$P_{\upsilon}(X)=\sum_{k=1}^{c}\operatorname{pool}\left([\mathbf{W}*X]_{k}\odot M_{\upsilon}(X)\right)$$

where $P_{\upsilon}(X)$ denotes the feature map of the $\upsilon$-th attention region, computed by pooling on each channel: the mask matrix of the $\upsilon$-th attention region is multiplied element-wise with the convolution feature map and the results are accumulated.
S4: the extracted sequence images of multi-attention features and convolution features are merged into a combined array and input into the LSTM network, and pig facial expressions are recognized and classified through a fully connected layer and a softmax classifier.

Because a change in pig expression usually lasts 3–4 seconds at a frame rate of 25 frames per second, because the expression change is regarded as one continuous dynamic process, and because the opening and closing frames carry little information, feature-frame extraction trims the head and tail and keeps evenly spaced middle frames: one frame is taken every 5 frames in the middle section of the video, and if the original segment has fewer frames than the required length, the last frame is duplicated, so that each video sequence reaches the 16-frame length required by the experiment. The invention uses these 16 frames, of dimension 28 × 28 × 512, as input; inside the network they pass through the input gate $i_{t}$ and forget gate $f_{t}$, the memory cell $c_{t}$ whose candidate vector $\tilde{c}_{t}$ continuously updates the feature information, and finally the output gate $o_{t}$ to obtain the class vector of the sample; the hidden layer of the LSTM network is set to a single layer of 128 units.
S5: and defining a network model loss function, and performing model training and verification on the network structure in a training set and a verification set through multiple iterations. The loss function is specifically as follows:
Figure BDA0003002035490000122
Figure BDA0003002035490000123
Figure BDA0003002035490000124
Figure BDA0003002035490000125
Lcg(λ,Mυ)=Dis(Mυ)+λDiv(Mυ)
Figure BDA0003002035490000126
Figure BDA0003002035490000127
where γ is the weight of the objective function in the equilibrium stage, αjThe weight of the jth task in the simplified multi-task cascaded convolutional neural network corresponding to the loss function is shown, lambda represents the target constraint distribution weight in the attention area,
Figure BDA0003002035490000131
represents the loss value of the ith sample in task j, MυAn attention mask matrix, L, representing a first v regionscdRepresenting an optimized objective function, L, of a simplified multi-task cascaded convolutional neural network for pig face detectioncgAn optimization objective function for constraining the clustering of channel packets is represented. N is a radical ofIn order to train the total number of samples in the set,
Figure BDA0003002035490000132
the value of the label of the ith sample in the jth task is 0 or 1, j belongs to { det, box }, det is used for indicating that the task type is pig face discrimination, box is used for indicating that the task type is pig face regression box detection,
Figure BDA0003002035490000133
true tags representing samples, piRepresenting the probability that the network output is a pig face.
Figure BDA0003002035490000134
The coordinates of the pig face bounding box predicted for the network,
Figure BDA0003002035490000135
and (3) manually labeling the coordinates of the real boundary box, wherein the coordinates are four-dimensional vectors, namely the horizontal and vertical coordinates of the upper left corner of the regression box and the width and height of the regression box. L iscgThe method aims to judge the correlation between the characteristic points of the high attention area and the characteristic points of the weak attention area, namely, the coordinates close to the peak response area of the channel in the same attention area are more gathered and expressed by a function Dis (-), and the coordinates of the peak response areas of the channel in different attention areas are far as possible and expressed by a function Div (-), (x, y) are taken from the coordinates of the attention area mυ(x, y) attention mask matrix M corresponding to the region of interestυ(X) response value at (X, y) coordinate, txAnd tyCoordinates representing the peak response of the training set to the upsilon attention areas, k and upsilon representing different attention area index values,
Figure BDA0003002035490000136
where k, υ 1, 2, …, N is used to indicate the response value of the (x, y) coordinate position that can best represent the υ zone, TmrgThe threshold value represents a preset boundary threshold value and is a constant value used for preventing extreme values from appearing, so that the loss is not sensitive to noise, and the robustness of the network is realized.
The method is suitable not only for recognizing pig facial expressions but also for recognizing the four simple expression classes, anger, fear, joy, and calm, in facial videos of other livestock. The pig face facial expression recognition model framework based on the multi-attention-mechanism cascaded LSTM network recognizes pig expressions in video frame images, differing from existing livestock expression recognition methods based mainly on physiological anatomy and from livestock expression recognition in static images. The invention is the first to apply a cascade framework model to the classification and recognition of time-sequenced pig facial expression images. The network model consists of three cascaded stages: pig face facial expression video frames are sampled at equal intervals and input into the simplified multitask cascaded convolutional neural network for pig face detection and localization; the extracted pig face sequence feature maps are then input into the multi-attention mechanism module, which captures the salient facial regions produced by expression changes and realizes attention to fine facial variation; finally, the refined feature map extracted from the video frames and the multi-attention feature maps are merged by the array-merging operation and input into the LSTM network to classify and recognize the expressions. The proposed end-to-end cascade framework effectively addresses the difficulty that the facial expression muscles of domestic pigs are structurally simple, few in number, and produce expressions of short duration, making facial expressions hard to perceive and recognize. The proposed model framework serves livestock emotion research, one of the important research targets of animal science; it enables better evaluation of livestock welfare and better emotion regulation through livestock expression recognition, thereby improving feed digestibility and utilization, accelerating growth, and increasing yield.
The above description is only a preferred example of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement of the present invention shall be included within its protection scope.

Claims (3)

1. A method for establishing a pig face facial expression recognition framework based on multitask cascading, characterized by comprising the following steps:

S1, inputting pig face facial expression video segments and labelling each input segment with one of the four domestic-pig expression classes: anger, joy, fear, and calm;

S2, first stage of the cascade framework model: frame images are sampled from the pig face facial expression video at equal intervals and input into a simplified multitask cascaded convolutional neural network for pig face detection and localization; the simplified network detects and localizes the pig face region rapidly in two steps, coarse-grained and fine-grained; the architecture of the simplified multitask cascaded convolutional neural network is as follows:

S21, coarse-grained detection and localization: a fully convolutional network, i.e. a proposal network, obtains candidate pig face windows and their bounding-box regression vectors, and the candidate windows are corrected according to the estimated regression vectors; finally, non-maximum suppression merges highly overlapping candidate windows;

S22, fine-grained detection and localization: all pig-face-containing candidates obtained in step S21 are passed to a refinement network, which screens out erroneous candidate windows, calibrates with the bounding-box regression vectors, performs non-maximum suppression, and finally outputs the bounding-box coordinates containing the pig face, realizing pig face detection and localization;

S23, loss optimization function: the loss of the simplified multitask cascaded convolutional neural network is composed of a pig face classification loss and a Euclidean-distance regression loss on the face-region bounding box, and the network is learned by jointly optimizing them; the joint optimization loss is:

$$L_{cd}=\min\frac{1}{N}\sum_{i=1}^{N}\sum_{j\in\{det,\,box\}}\alpha_{j}\,\beta_{i}^{j}\,L_{i}^{j}$$

$$L_{i}^{det}=-\left(y_{i}^{det}\log p_{i}+\left(1-y_{i}^{det}\right)\log\left(1-p_{i}\right)\right)$$

$$L_{i}^{box}=\left\|\hat{y}_{i}^{box}-y_{i}^{box}\right\|_{2}^{2}$$

where $L_{cd}$ denotes the optimization objective of the simplified multitask cascaded convolutional neural network for pig face detection, $N$ is the total number of samples in the training set, $i$ indexes the $i$-th sample, $j$ denotes the task type and takes the value det or box (det: pig face discrimination; box: pig face regression-box detection), $L_{i}^{j}$ is the loss of the $i$-th sample on the $j$-th task, $\alpha_{j}$ is the weight of the loss for the $j$-th task, $\beta_{i}^{j}\in\{0,1\}$ is the sample-type label of the $i$-th sample on the $j$-th task, the weights assigned to the coarse-grained and fine-grained tasks are $\alpha_{det}=1$ and $\alpha_{box}=0.5$ respectively, $y_{i}^{det}$ is the true label of the sample, $p_{i}$ is the network-output probability that sample $i$ is a pig face, $\hat{y}_{i}^{box}$ are the pig face bounding-box coordinates predicted by the network, and $y_{i}^{box}$ are the manually annotated ground-truth bounding-box coordinates; both are four-dimensional vectors in $\mathbb{R}^{4}$, representing the horizontal and vertical coordinates of the top-left corner of the regression box together with its width and height;

S3, second stage of the cascade framework model: the extracted pig face sequence frame images are input into a multi-attention mechanism module that extracts and constructs salient-region feature maps of pig facial expression change; first, a shallow residual network extracts a global convolution feature map of the pig face; second, a channel-grouping response attention mechanism captures and generates the salient-region feature maps of facial expression change; then the attention-region feature maps and the global convolution feature map are merged to generate a pig face feature map fused with the attention mechanism;

S4, third stage of the cascade framework model: the attention-fused pig face feature maps are input in sequence into a long short-term memory (LSTM) network, and pig facial expressions are recognized and classified through a fully connected layer and a softmax classifier.
2. The method for establishing a pig face facial expression recognition framework based on multitask cascade as claimed in claim 1, wherein the salient-region feature maps of pig facial expression change in step S3 are extracted and constructed as follows:
s31, the sequence of video frames containing pig facial expressions extracted in step S2 is input into a shallow residual network to generate a global feature map carrying the time sequence;
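A shallow residual network of the kind step S31 describes could look like the following sketch; the stem, depth, and channel width are assumptions made for illustration.

```python
# Minimal "shallow residual network" sketch producing a global feature map
# per frame; configuration values are illustrative assumptions.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut

class ShallowResNet(nn.Module):
    def __init__(self, in_ch: int = 3, ch: int = 64, n_blocks: int = 2):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 7, stride=2, padding=3)
        self.blocks = nn.Sequential(*[BasicBlock(ch) for _ in range(n_blocks)])

    def forward(self, frames):  # frames: (B*T, 3, H, W) stacked sequence
        return self.blocks(torch.relu(self.stem(frames)))
```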
s32, the global feature maps obtained in step S31 are grouped according to channel response: first, the contribution degree of each feature channel to each attention region is calculated, with the weight computation expressed as $d_{\upsilon}(X)=f_{\upsilon}(W*X)$, where $d_{\upsilon}(X)=[d_{\upsilon}(1),\dots,d_{\upsilon}(c)]$; to generate $N$ attention regions, a set of fully connected functions $F(\cdot)=\{f_{1}(\cdot),\dots,f_{\upsilon}(\cdot),\dots,f_{N}(\cdot)\}$ is defined; each $f_{\upsilon}(\cdot)$ takes the convolution features as input, corresponds to the $\upsilon$-th attention region, receives the input of a $c$-dimensional feature channel, and generates a $c$-dimensional weight vector $d_{\upsilon}$ denoting the contribution degree of each feature channel to attention region $\upsilon$; $W*X$ denotes the convolution features of the input sample $X$, $W$ denotes the parameter set of the feature extraction unit, $w$, $h$ and $c$ denote the width, height and number of feature channels of the input sample respectively, and "$*$" denotes the convolution, pooling and activation operations of the feature extraction unit;
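A sketch of the per-region channel-weight computation $d_{\upsilon}(X)=f_{\upsilon}(W*X)$; the use of global average pooling to form the $c$-dimensional channel descriptor fed to each fully connected function is an assumption.

```python
# Sketch of channel-grouping weight computation: one fully connected
# function f_v per attention region maps a c-dim channel descriptor
# (here, global average pooling of the features) to c channel weights.
import torch
import torch.nn as nn

class ChannelWeights(nn.Module):
    def __init__(self, c: int, n_regions: int):
        super().__init__()
        self.fcs = nn.ModuleList([nn.Linear(c, c) for _ in range(n_regions)])

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, c, H, W) convolution features W*X
        desc = feat.mean(dim=(2, 3))  # (B, c) channel descriptor
        # one c-dimensional weight vector d_v per attention region
        return torch.stack([fc(desc) for fc in self.fcs], dim=1)  # (B, N, c)
```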
s33, the attention-region feature maps are calculated from the weights obtained in step S32: first, based on the learned weight vector $d_{\upsilon}$, an attention mask matrix $M_{\upsilon}$ is obtained for each attention region:

$$M_{\upsilon}(X)=\mathrm{sigmoid}\left(\sum_{k=1}^{c}d_{\upsilon}(k)\,[W*X]_{k}\right)$$

wherein $X$ denotes the input sample, the sigmoid function normalizes the response to the interval 0 to 1, $k$ and $\upsilon$ index the feature channels and the attention regions respectively, with $(k\neq\upsilon)\in\{1,2,\dots,N\}$, and $[\,\cdot\,]_{k}$ denotes the $k$-th feature channel of the convolution features $W*X$, which is multiplied element-wise by the corresponding component $d_{\upsilon}(k)$ of the weight vector; the attention-region feature map is then calculated as

$$P_{\upsilon}(X)=\sum_{k=1}^{c}[W*X]_{k}\odot M_{\upsilon}(X)$$

wherein $P_{\upsilon}(X)$ denotes the feature map of the $\upsilon$-th attention region, computed by pooling over the channels: the mask matrix is point-multiplied with each channel of the convolution feature map of the $\upsilon$-th attention region and the results are accumulated;
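Steps S32 and S33 combined in one hedged sketch: the $d_{\upsilon}$-weighted channel sum is squashed with a sigmoid to form the mask $M_{\upsilon}$, and the mask is point-multiplied with the feature channels and accumulated to form $P_{\upsilon}$. The tensor shapes and the einsum formulation are illustrative choices.

```python
# Combined sketch of S32-S33: mask M_v from weighted channel sum, region
# feature map P_v from mask times channels, accumulated over channels.
import torch

def region_feature_maps(feat: torch.Tensor, d: torch.Tensor):
    # feat: (B, c, H, W) convolution features W*X
    # d:    (B, N, c)    per-region channel weight vectors d_v
    masks = torch.sigmoid(torch.einsum("bnc,bchw->bnhw", d, feat))  # M_v
    # sum_k [W*X]_k elementwise-times M_v == M_v * (channel sum of feat)
    p = masks * feat.sum(dim=1, keepdim=True)                       # P_v
    return masks, p  # both (B, N, H, W)
```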
s34, a feature-channel grouping-and-clustering optimization objective function $L_{cg}$ is constructed to realize feature-channel clustering and obtain the attention regions; $L_{cg}$ evaluates the correlation between feature points of strong and weak attention regions, so that coordinates within the same attention region become more aggregated, expressed by the function $\mathrm{Dis}(\cdot)$, while coordinates of different regions are kept as far apart as possible, expressed by the function $\mathrm{Div}(\cdot)$; $\lambda$ denotes the weight distributing the two target constraints; the optimization objective function is:

$$\min L_{cg}(M_{\upsilon})=\mathrm{Dis}(M_{\upsilon})+\lambda\,\mathrm{Div}(M_{\upsilon})$$

$$\mathrm{Dis}(M_{\upsilon})=\sum_{(x,y)}m_{\upsilon}(x,y)\left[\left\|x-t_{x}\right\|^{2}+\left\|y-t_{y}\right\|^{2}\right]$$

$$\mathrm{Div}(M_{\upsilon})=\sum_{(x,y)}m_{\upsilon}(x,y)\left[\max_{k\neq\upsilon}m_{k}(x,y)-T_{mrg}\right]$$

wherein $(x,y)$ is taken from the attention-region coordinates; $m_{\upsilon}(x,y)$ is the response value of the attention mask matrix $M_{\upsilon}(X)$ of the $\upsilon$-th region at coordinate $(x,y)$; $t_{x}$ and $t_{y}$ denote the coordinates of the peak response of the $\upsilon$-th attention region over the training set; $\max_{k\neq\upsilon}m_{k}(x,y)$ denotes the strongest response among the other attention regions at coordinate $(x,y)$; and $T_{mrg}$ denotes a preset margin threshold, a constant that prevents extreme values so that the loss is insensitive to noise, giving the network robustness.
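A hedged sketch of $L_{cg}$ as reconstructed above; taking the per-batch mask peak as $(t_{x},t_{y})$ and clamping the margined competitor response at zero are interpretive choices made for a self-contained example, not taken from the claim text.

```python
# Hedged sketch of L_cg = Dis + lambda * Div over attention masks.
import torch

def channel_grouping_loss(masks: torch.Tensor,
                          lam: float = 0.5,
                          t_mrg: float = 0.02) -> torch.Tensor:
    # masks: (B, N, H, W) attention masks M_v with values in (0, 1)
    B, N, H, W = masks.shape
    ys = torch.arange(H, dtype=masks.dtype).view(1, 1, H, 1)
    xs = torch.arange(W, dtype=masks.dtype).view(1, 1, 1, W)
    flat = masks.flatten(2)                  # (B, N, H*W)
    idx = flat.argmax(dim=2)                 # peak response index per mask
    t_y = torch.div(idx, W, rounding_mode="floor").to(masks.dtype)
    t_x = (idx % W).to(masks.dtype)
    t_y = t_y.view(B, N, 1, 1)
    t_x = t_x.view(B, N, 1, 1)
    # Dis: pull mask responses toward the peak coordinate
    dis = (masks * ((xs - t_x) ** 2 + (ys - t_y) ** 2)).mean()
    # Div: penalize overlap with the strongest competing region
    div = masks.new_zeros(())
    for v in range(N):
        others = torch.cat([masks[:, :v], masks[:, v + 1:]], dim=1)
        competitor = others.max(dim=1).values  # max_{k != v} m_k(x, y)
        div = div + (masks[:, v] * (competitor - t_mrg).clamp(min=0)).mean()
    return dis + lam * div
```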
3. The method for establishing a pig face facial expression recognition framework based on multitask cascade as claimed in claim 1, wherein the pig facial expressions in step S4 are recognized and classified as follows:
the long short-term memory network classifies the pig facial expressions in real time: the global facial convolution feature map obtained in step S31 and the multi-attention feature maps obtained in step S33 are merged by array concatenation and input into the long short-term memory network, which outputs probability values for the four expression classes of angry, happy, fearful and calm, thereby realizing the classification and recognition of pig facial expressions; the optimization function of the cascaded pig face facial expression recognition framework model is

$$\min L=\gamma\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{j\in\{det,\,box\}}\alpha_{j}\,\beta_{i}^{j}\,L_{i}^{j}\right)+\mathrm{Dis}(M_{\upsilon})+\lambda\,\mathrm{Div}(M_{\upsilon})$$

wherein $\gamma$ is the weight balancing the stage objective functions; the first term is the optimization objective $L_{cd}$ of the simplified multi-task cascaded convolutional neural network for pig face detection, in which $\alpha_{j}$ denotes the weight of the loss function corresponding to the $j$-th task, $j\in\{det,box\}$, with det indicating pig-face discrimination and box indicating pig-face regression-box detection, and $L_{i}^{j}$ denotes the loss value of the $i$-th sample in task $j$; the remaining terms form the feature-channel grouping-and-clustering objective $L_{cg}$, in which $\lambda$ denotes the target-constraint distribution weight for the attention regions and $M_{\upsilon}$ denotes the attention mask matrix of the $\upsilon$-th region.
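Tying the stages together, a one-line sketch of the cascade objective as read above, reusing the illustrative joint_detection_loss and channel_grouping_loss sketches; gamma is the stage-balancing weight.

```python
# Sketch of min L = gamma * L_cd + L_cg; l_cd and l_cg would come from the
# illustrative joint_detection_loss and channel_grouping_loss above.
import torch

def cascade_objective(l_cd: torch.Tensor, l_cg: torch.Tensor,
                      gamma: float = 1.0) -> torch.Tensor:
    # gamma balances the detection-stage and attention-stage objectives
    return gamma * l_cd + l_cg
```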
CN202110350752.0A 2021-03-31 2021-03-31 Establishment method of pig face facial expression recognition framework based on multitask cascade Active CN113065460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110350752.0A CN113065460B (en) 2021-03-31 2021-03-31 Establishment method of pig face facial expression recognition framework based on multitask cascade

Publications (2)

Publication Number Publication Date
CN113065460A (en) 2021-07-02
CN113065460B (en) 2022-04-29

Family

ID=76564956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110350752.0A Active CN113065460B (en) 2021-03-31 2021-03-31 Establishment method of pig face facial expression recognition framework based on multitask cascade

Country Status (1)

Country Link
CN (1) CN113065460B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205085B * 2021-07-05 2021-11-19 Wuhan Huaxin Data *** Co., Ltd. Image identification method and device
CN114359958B * 2021-12-14 2024-02-20 Hefei University of Technology Pig face recognition method based on channel attention mechanism
CN114639156B * 2022-05-17 2022-07-22 Wuhan University Depression angle face recognition method and system based on axial attention weight distribution network
CN115761569B * 2022-10-20 2023-07-04 Zhejiang Lab Video emotion positioning method based on emotion classification
CN116385070B * 2023-01-18 2023-10-03 University of Science and Technology of China Multi-target prediction method, system, equipment and storage medium for short video advertisement of E-commerce
CN117908369B * 2023-04-23 2024-07-02 Chongqing Academy of Animal Sciences Pig farm cultivation environment dynamic adjustment method based on different temperature areas

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460638A * 2018-05-18 2018-08-28 Zhengzhou Waisi Creativity Culture Communication Co., Ltd. Personalized advertisement delivery system and method based on face recognition
CN110147699A * 2018-04-12 2019-08-20 Peking University Image recognition method, apparatus and related device
CN111666838A * 2020-05-22 2020-09-15 Jilin University Pig face recognition method with improved residual network
CN111783620A * 2020-06-29 2020-10-16 Beijing Baidu Netcom Science and Technology Co., Ltd. Expression recognition method, device, equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014191602A * 2013-03-27 2014-10-06 Casio Comput Co Ltd Display device, program, and display system
CN109460690B * 2017-09-01 2022-10-14 ArcSoft Corporation Limited Method and device for pattern recognition
CN108830262A * 2018-07-25 2018-11-16 Shanghai University of Electric Power Multi-angle human face expression recognition method under natural conditions
CN109522818B * 2018-10-29 2021-03-30 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Expression recognition method and device, terminal equipment and storage medium
CN109815785A * 2018-12-05 2019-05-28 Sichuan University Face emotion recognition method based on two-stream convolutional neural networks
CN110728179A * 2019-09-04 2020-01-24 Tianjin University Pig face identification method using multi-path convolutional neural network
CN110737783B * 2019-10-08 2023-01-17 Tencent Technology (Shenzhen) Co., Ltd. Method and device for recommending multimedia content and computing equipment
CN112101241A * 2020-09-17 2020-12-18 Southwest University of Science and Technology Lightweight expression recognition method based on deep learning
CN112287891B * 2020-11-23 2022-06-10 Fuzhou University Method for evaluating learning concentration through video based on expression behavior feature extraction
CN112464865A * 2020-12-08 2021-03-09 Beijing Institute of Technology Facial expression recognition method based on pixel and geometric mixed features
CN112559835B * 2021-02-23 2021-09-14 Institute of Automation, Chinese Academy of Sciences Multi-modal emotion recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant