CN112487989B - Video expression recognition method based on capsule-long short-term memory neural network - Google Patents

Video expression recognition method based on capsule-long short-term memory neural network

Info

Publication number
CN112487989B
Authority
CN
China
Prior art keywords
capsule
network
long
neural network
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011384713.4A
Other languages
Chinese (zh)
Other versions
CN112487989A (en)
Inventor
刘思苇
舒坤贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Nutrition Tree Biotechnology Co., Ltd.
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202011384713.4A
Publication of CN112487989A
Application granted
Publication of CN112487989B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of facial expression recognition, and particularly relates to a video expression recognition method based on a capsule-long short-term memory (LSTM) neural network. The method comprises: converting a video containing a human face into video frames; detecting the face image in each video frame and preprocessing it; constructing a capsule network, and using it to extract features of the face image and to reconstruct the image; constructing an LSTM neural network, taking the output of the capsule network encoder as the input of the LSTM network, and extracting temporal features; and taking the expression class corresponding to the maximum probability value in the LSTM output as the label of the sequence. By combining the capsule network, which extracts spatial information, with the LSTM network, which extracts temporal information, the method effectively improves the accuracy of expression classification.

Description

Video expression recognition method based on capsule-long short-term memory neural network
Technical Field
The invention belongs to the technical field of facial expression recognition, and particularly relates to a video expression recognition method based on a capsule-long short-term memory (LSTM) neural network.
Background
The human face is one of the important biological features of a person and carries a large amount of information, among which expression information is particularly important. An expression is the intuitive reflection of human emotion, formed by the state of the facial muscles and features, and expression recognition, as an important part of human-computer interaction, has long been one of the important research directions of computer vision. With the progress of computer technology, the arrival of the big-data era, and the development of computer hardware such as GPUs, achievements in the field of face recognition, such as face detection, facial feature extraction, and image classification, can serve as references for expression recognition. Expression recognition has thus developed greatly in both software and hardware, and the corresponding research institutions, expression databases, and new algorithms are increasingly numerous.
Expression recognition has very wide application, appearing in fields such as human-computer interaction, robotics, medical health, and distance education. Using artificial intelligence to recognize expressions makes it possible to analyze the other party's emotion objectively, avoiding misreadings caused by one's own mood, compensating for the limits of human attention, and capturing subtle expressions that the naked eye may miss. In criminal investigation, it assists the police in monitoring suspects' expressions and detecting their real psychological state, helping to solve cases. In clinical medicine, observing the expressions of autistic children reveals their psychological activity and assists doctors in formulating more appropriate treatment plans for their recovery. In shopping malls, monitoring customer satisfaction with a product helps staff design better promotion schemes. In traffic, researchers study fatigued driving to prevent accidents caused by a driver's poor condition. In distance education, detecting the emotional changes of students in class helps teachers judge how well students have mastered the knowledge points and adjust the learning schedule scientifically, thereby better promoting students' learning.
At present, a common video-sequence expression recognition method combines a convolutional neural network (CNN) with an LSTM neural network to model the changes of facial expressions in video: typically a deep CNN extracts spatial information, and a multilayer LSTM network captures temporal information. CNNs have strong feature-learning ability and many advantages, but practical applications also reveal shortcomings: (1) recognition consistency under image transformations is low, because a CNN has difficulty perceiving the equivalence of left-right translation, rotation, added borders, and similar changes; as a result the training set required by a CNN is very large, and although data augmentation helps, its benefit is limited; (2) the neurons of a CNN are all equal, with no internal organizational structure, so the same object at different positions and angles cannot be identified as the same, and the relations between the substructures captured by different convolution kernels cannot be extracted. A CNN performs well at extracting and detecting object features, but it ignores local and internal relative pose information (such as relative position, orientation, and skew), thereby losing some important information.
Disclosure of Invention
In order to overcome the above defects, the invention provides a video expression recognition method based on a capsule-LSTM neural network, which, as shown in fig. 1, specifically comprises the following steps:
converting a video containing a human face into video frames;
detecting the face image in each video frame and preprocessing the face image;
constructing a capsule network, extracting features of the face image with the capsule network encoder, and reconstructing the image with the capsule network decoder;
constructing an LSTM neural network and taking the output of the capsule network encoder as its input;
and taking the expression class corresponding to the maximum probability value in the LSTM output as the label of the sequence.
Further, detecting the face image in the video frames and preprocessing it comprises:
performing face detection on the video frames, cropping the face region of interest (ROI), and applying size normalization and graying;
specifically, detecting and locating the face in each video frame with the MTCNN algorithm, cropping the detected face to a fixed size, and applying graying;
and selecting a fixed number of frames from the frames of each video as a video sequence, completing face-image extraction and preprocessing.
Furthermore, the capsule network uses three convolution layers, a convolutional capsule layer, and a digital capsule layer as its encoder, and four deconvolution layers as the decoder of the digital capsule. The convolution layers extract image features, and the feature map after the last convolution operation is converted into primary capsules for use by the dynamic routing algorithm; the routing algorithm iterates over the capsules and stacks them along the last dimension. The digital capsule layer uses the length of each capsule vector to represent the probability of each expression class and is used to compute the classification loss. The decoder is used to optimize the network: it reconstructs the image of the class with the highest output probability, compares the Euclidean distance between the reconstructed image and the original image, and computes the reconstruction loss.
Further, the compression (squashing) operation is expressed as:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$

where $v_j$ and $s_j$ are both capsule vectors, and $v_j$ is computed from the input capsule $s_j$.
Further, the dynamic routing algorithm obtains higher-level capsules from the primary capsules, and comprises:

$$s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$$

$$\hat{u}_{j|i} = W_{ij}\,u_i$$

where $s_j$ is a higher-level capsule, $\hat{u}_{j|i}$ is the prediction vector computed from the bottom-level capsule $u_i$, $c_{ij}$ is the coupling coefficient, and $W_{ij}$ is a weight matrix.
Further, each pair of upper-level and lower-level capsules has a coupling coefficient $c_{ij}$, and the coupling coefficients of a given lower-level capsule sum to 1; the coefficients are expressed as:

$$b'_{ij} = b_{ij} + \hat{u}_{j|i} \cdot v_j$$

$$c_{ij} = \operatorname{softmax}(b'_{ij})$$

where $b'_{ij}$ is the updated value and $b_{ij}$ is the value before updating, which is initialized to zero; $v_j$ is the vector of the higher-level capsule.
Further, the loss function of the encoder is expressed as:

$$L_c = T_c \max(0,\, m^+ - \|v_c\|)^2 + \lambda (1 - T_c) \max(0,\, \|v_c\| - m^-)^2$$

where $T_c$ indicates whether expression class $c$ is present, taking the value 1 when present and 0 otherwise; $m^+$ and $m^-$ are the upper and lower margins respectively; and $\|v_c\|$ is the module length of the capsule, i.e. the probability of expression class $c$.
Further, the loss function of the decoder is expressed as:

$$L_d = \frac{1}{n} \sum_{i=1}^{n} (r_i - a_i)^2$$

where $n$ is the number of pixels, $r_i$ is the reconstructed value of the $i$-th pixel finally obtained through the capsule-based facial expression recognition network and the decoder, and $a_i$ is the true value of the $i$-th pixel.
Further, when the LSTM neural network is constructed, the cross entropy between the input vector and the actual label of the vector is computed, and the mean cross entropy over all elements of the vector is used as the loss function of the LSTM network; the cross entropy is expressed as:

$$L = -\frac{1}{n} \sum_{i=1}^{n} y'_i \log(y_i)$$

where $y'_i$ is the actual expression class label and $y_i$ is the predicted expression probability for sample $i$.
The invention has the following beneficial effects:
1. an improved capsule network replaces the convolutional neural network for facial expression image recognition;
2. three convolution layers perform feature extraction on the preprocessed image, so that latent features are extracted more easily;
3. the capsule network is combined with the LSTM network: the capsule network extracts spatial information and the LSTM network extracts temporal information, effectively improving the accuracy of expression classification;
4. compared with the traditional recurrent neural network (RNN), the LSTM network alleviates the vanishing-gradient problem and reduces the difficulty of model training.
Drawings
FIG. 1 is a flow chart of the video expression recognition method based on the capsule-LSTM neural network according to the present invention;
FIG. 2 is a schematic diagram of the AFEW data set after face detection and preprocessing according to the present invention;
FIG. 3 is a network model structure diagram of the capsule network encoder of the present invention;
FIG. 4 is a network model structure diagram of the decoder in the capsule network according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a video expression recognition method based on a capsule-LSTM neural network, which specifically comprises the following steps (a minimal code sketch follows this list):
converting a video containing a human face into video frames;
detecting the face image in each video frame and preprocessing the face image;
constructing a capsule network, and using it to extract features of the face image and to reconstruct the image;
constructing an LSTM neural network, and using it to extract temporal features;
and taking the expression class corresponding to the maximum probability value in the LSTM output as the label of the sequence.
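For orientation, the five steps above can be wired together as in the following minimal Python sketch. The helper names (video_to_sequence, encoder, lstm_head) are illustrative assumptions corresponding to the components sketched in Example 1 below; they are not names from the patent text.

```python
# Minimal end-to-end sketch of the five method steps. All helper
# names are assumed; they are sketched in Example 1 below.
import torch

def recognize_video(path, encoder, lstm_head, n_frames=16):
    frames = video_to_sequence(path, n_frames)                 # steps 1-2: frames -> face crops
    x = torch.from_numpy(frames).float().unsqueeze(1) / 255.0  # (16, 1, 48, 48)
    caps = encoder(x)                                          # step 3: spatial features per frame
    feats = caps.flatten(1).unsqueeze(0)                       # (1, 16, feat_dim) sequence
    probs = torch.softmax(lstm_head(feats), dim=-1)            # step 4: temporal modeling
    return probs.argmax(dim=-1).item()                         # step 5: max-probability label
```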
Example 1
This embodiment further illustrates the video expression recognition method based on the capsule-LSTM neural network.
In this embodiment, the method is divided into three steps, specifically:
(I) Acquiring the original data and preprocessing it
In this embodiment, the MMI data set and the AFEW data set are used, and the videos in the data sets are converted into video frames with ffmpeg.
MTCNN is used to locate the face in each video frame, and the detected face is cropped to a fixed size; the image is reduced to 48 × 48 with resize().
The reduced image is grayed, and 16 frames are selected from the frames of each video as a video sequence; the preprocessing result is shown in fig. 2. A code sketch of this stage follows.
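The sketch below assumes the facenet-pytorch implementation of MTCNN, reads frames with OpenCV rather than ffmpeg for brevity, and samples the 16 frames uniformly across the video, which is an assumption since the patent does not specify the sampling strategy.

```python
# Preprocessing sketch: face detection, 48x48 grayscale crops,
# 16-frame sequences. Frame-sampling strategy is an assumption.
import cv2
import numpy as np
from facenet_pytorch import MTCNN

detector = MTCNN(keep_all=False)  # one face per frame

def preprocess_frame(frame_bgr):
    """Detect the face, crop the ROI, resize to 48x48, and gray it."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    boxes, _ = detector.detect(rgb)
    if boxes is None:
        return None
    x1, y1, x2, y2 = boxes[0].astype(int)
    face = frame_bgr[max(y1, 0):y2, max(x1, 0):x2]
    face = cv2.resize(face, (48, 48))
    return cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)

def video_to_sequence(path, n_frames=16):
    """Read a video and return a (16, 48, 48) grayscale face sequence."""
    cap = cv2.VideoCapture(path)
    faces = []
    ok, frame = cap.read()
    while ok:
        face = preprocess_frame(frame)
        if face is not None:
            faces.append(face)
        ok, frame = cap.read()
    cap.release()
    # Uniformly sample 16 faces across the video (assumed strategy).
    idx = np.linspace(0, len(faces) - 1, n_frames).astype(int)
    return np.stack([faces[i] for i in idx])
```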
(II) extracting characteristics by utilizing capsule network and reconstructing pictures
In this embodiment, the constructed capsule network comprises convolution layers, a convolutional capsule layer, a digital capsule layer, and deconvolution layers; as shown in fig. 3, there are 3 convolution layers and 4 deconvolution layers. The first convolution layer uses 5 × 5 convolution kernels with stride 1; the second convolution layer uses 5 × 5 kernels with stride 2; the third layer is the convolutional capsule layer, whose essence is to convert the feature map produced by a convolution operation into the primary capsules used by the dynamic routing algorithm. A convolution is performed first in this layer; in practice its kernel is 9 × 9 with stride 2, and every convolution layer uses the ReLU activation function.
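The three-layer convolutional front end and the conversion into primary capsules might be sketched in PyTorch as follows. The channel counts and the 8-dimensional capsules are assumptions, since the patent fixes only the kernel sizes, strides, and ReLU activations; with these settings a 48 × 48 grayscale input yields a 6 × 6 × 256 feature map, i.e. 1152 primary capsules of dimension 8.

```python
# Capsule-network encoder sketch (PyTorch). Channel counts and the
# 8-dimensional primary capsules are assumptions; the patent fixes
# only the kernel sizes, strides, and ReLU activations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CapsuleEncoder(nn.Module):
    def __init__(self, caps_dim=8):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=5, stride=1)         # 48 -> 44
        self.conv2 = nn.Conv2d(64, 128, kernel_size=5, stride=2)       # 44 -> 20
        self.conv_caps = nn.Conv2d(128, 256, kernel_size=9, stride=2)  # 20 -> 6
        self.caps_dim = caps_dim

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv_caps(x))
        # Reshape the final feature map into primary capsule vectors:
        # group channels so each capsule is caps_dim values at a location.
        b = x.size(0)
        x = x.permute(0, 2, 3, 1).contiguous()
        return x.view(b, -1, self.caps_dim)  # (batch, n_capsules, caps_dim)
```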
The digital capsule layer is the key of the capsule network. It takes the converted capsules of the third (convolutional capsule) layer as input, performs the iterative computation of the dynamic routing algorithm on the input capsules, and stacks the extracted information along the last dimension. After stacking, a compression operation is performed, which normalizes each element of the vector to lie between 0 and 1; the compression (squashing) function squash() is expressed as:
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$

where $v_j$ and $s_j$ are both capsule vectors, and $v_j$ is computed from the input capsule $s_j$.
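In code, the squashing function might be sketched as follows; the small epsilon guarding the division is an added numerical-stability assumption.

```python
# Squashing sketch: normalizes each capsule vector so its length lies
# in (0, 1) while preserving its direction. The epsilon is an
# assumption added for numerical stability.
import torch

def squash(s, dim=-1):
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + 1e-8)
```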
The dynamic routing algorithm obtains higher-level capsules from the primary capsules, and comprises:

$$s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$$

$$\hat{u}_{j|i} = W_{ij}\,u_i$$

where $s_j$ is a higher-level capsule, $\hat{u}_{j|i}$ is the prediction vector computed from the bottom-level capsule $u_i$, $W_{ij}$ is a weight matrix, and $c_{ij}$ is the coupling coefficient. Each pair of upper-level and lower-level capsules has a coupling coefficient $c_{ij}$, and the coupling coefficients of a given lower-level capsule sum to 1. The coefficients are expressed as:
$$b'_{ij} = b_{ij} + \hat{u}_{j|i} \cdot v_j$$

$$c_{ij} = \operatorname{softmax}(b'_{ij})$$

where $b'_{ij}$ is the updated value and $b_{ij}$ is the value before updating, which is initialized to zero; $c_{ij}$ is obtained from $b'_{ij}$ through the softmax function; $v_j$ is the vector of the higher-level capsule.
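Building on the squash() sketch above, the routing iterations might be written as follows; using three iterations is an assumption, as the patent does not state the count.

```python
# Dynamic routing sketch (PyTorch), implementing the update equations
# above; reuses squash() from the previous sketch. Three routing
# iterations are an assumption.
import torch
import torch.nn.functional as F

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: (batch, n_in, n_out, out_dim), the predictions W_ij @ u_i.
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # logits b_ij
    for _ in range(n_iters):
        c = F.softmax(b, dim=2)                    # c_ij, summing to 1 over j
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # s_j = sum_i c_ij u_hat_j|i
        v = squash(s)                              # high-level capsules v_j
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)   # b'_ij = b_ij + u_hat . v_j
    return v  # (batch, n_out, out_dim)
```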
The final output value is also a vector that represents entity characteristics, and the module length of the vector represents the probability of the expression. The total loss function of the capsule network consists of two parts, the margin loss of the encoder and the reconstruction loss of the decoder, and the parameters of the capsule network are updated iteratively according to this loss function. The loss function of the encoder part of the capsule network is defined as:
$$L_c = T_c \max(0,\, m^+ - \|v_c\|)^2 + \lambda (1 - T_c) \max(0,\, \|v_c\| - m^-)^2$$

where $T_c$ is 1 when expression class $c$ is present and 0 otherwise; $m^+$ and $m^-$ are the upper and lower margins, set to 0.9 and 0.1 respectively in the invention; and $\|v_c\|$ is the module length of the capsule, i.e. the probability of expression class $c$.
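A direct transcription of this margin loss in PyTorch could look like the following; λ = 0.5 is an assumed value, as the patent does not specify it.

```python
# Margin-loss sketch following the formula above, with m+ = 0.9 and
# m- = 0.1 as stated; lambda = 0.5 is an assumption.
import torch

def margin_loss(v_norm, target_onehot, m_pos=0.9, m_neg=0.1, lam=0.5):
    # v_norm: (batch, n_classes) capsule lengths; target_onehot: same shape.
    pos = target_onehot * torch.clamp(m_pos - v_norm, min=0) ** 2
    neg = lam * (1 - target_onehot) * torch.clamp(v_norm - m_neg, min=0) ** 2
    return (pos + neg).sum(dim=1).mean()
```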
The decoder network structure is shown in fig. 4: the features extracted by the encoder are reconstructed with the deconvolution layers and compared with the original image.
Before entering the decoder network, the output of the encoder is passed through a softmax to obtain the expression capsule with the maximum module length; the capsules characterizing the other classes are then masked, and the masked result is input into the deconvolution-based decoder. First, a fully connected layer with 2304 output neurons is applied; this output is reshaped into a 12 × 12 × 16 tensor, from which the decoder recovers the original image information. The deconvolution kernels in the deconvolution layers are 3 × 3 with stride 1, and after the four deconvolution layers a 48 × 48 structure is obtained. Comparing the obtained reconstruction values with the true values of the corresponding pixels, the reconstruction loss function can be defined as:
$$L_d = \frac{1}{n} \sum_{i=1}^{n} (r_i - a_i)^2$$

where $n$ is the number of pixels, $r_i$ is the reconstructed value of the $i$-th pixel finally obtained through the capsule-based facial expression recognition network and the decoder, and $a_i$ is the true value of the $i$-th pixel.
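A decoder sketch under the stated and assumed parameters follows. Note that four 3 × 3 stride-1 deconvolutions alone cannot expand a 12 × 12 map to 48 × 48, so this sketch assumes two of the four layers use stride 2; the channel counts are likewise assumptions.

```python
# Decoder sketch (PyTorch): FC to 2304 neurons, reshape to 16x12x12,
# then four deconvolutions up to 48x48. Two stride-2 layers and the
# channel counts are assumptions.
import torch
import torch.nn as nn

class CapsuleDecoder(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, 2304)  # 2304 = 16 * 12 * 12
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(16, 64, 3, stride=1, padding=1), nn.ReLU(),                    # 12 -> 12
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),  # 12 -> 24
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),  # 24 -> 48
            nn.ConvTranspose2d(16, 1, 3, stride=1, padding=1), nn.Sigmoid(),                  # 48 -> 48
        )

    def forward(self, masked_caps):
        x = self.fc(masked_caps.flatten(1)).view(-1, 16, 12, 12)
        return self.deconv(x)

def reconstruction_loss(recon, target):
    # Mean squared difference over pixels, per the formula above.
    return ((recon - target) ** 2).mean()
```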
(III) Extracting temporal features with the LSTM neural network
The output of the capsule network encoder is used as the input of the LSTM neural network, the number of hidden units of the LSTM is set to 128, and the cross entropy between the output vector and the actual label of the sample is computed with the following formula:
$$L = -\frac{1}{n} \sum_{i=1}^{n} y'_i \log(y_i)$$

where $y'_i$ is the actual expression class label and $y_i$ is the predicted expression probability for sample $i$.
From the output of the LSTM network, the expression class corresponding to the maximum probability value is selected as the label of the sequence sample, completing the video expression classification for that sequence.
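The temporal stage might be sketched as follows; the 128 hidden units follow the text, while the single LSTM layer and the 7 expression classes are assumptions.

```python
# LSTM classification head sketch (PyTorch): 128 hidden units as
# stated; the single LSTM layer and 7 expression classes are
# assumptions. Input is one feature vector per video frame.
import torch
import torch.nn as nn

class LstmHead(nn.Module):
    def __init__(self, feat_dim, n_classes=7, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, 16, feat_dim), one feature vector per frame.
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])  # class logits from the last time step

# Usage: probs = torch.softmax(head(features), dim=-1); the class with
# the maximum probability labels the sequence. Training would minimize
# nn.CrossEntropyLoss() on these logits.
```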
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (1)

1. A video expression recognition method based on a capsule-long short-term memory (LSTM) neural network, characterized by comprising the following steps:
converting a video containing a human face into video frames;
detecting the face image in each video frame and preprocessing the face image, specifically comprising:
performing face detection on the video frames, cropping the face region of interest (ROI), and applying size normalization and graying;
detecting and locating the face in each video frame with the MTCNN algorithm, cropping the detected face to a fixed size, and applying graying;
selecting a fixed number of frames from the frames of each video as a video sequence, completing face-image extraction and preprocessing;
constructing a capsule network, extracting features of the face image with the capsule network encoder, and reconstructing the image with the capsule network decoder, specifically comprising:
the capsule network uses three convolution layers, a convolutional capsule layer, and a digital capsule layer as its encoder, and four deconvolution layers as the decoder of the digital capsule; the convolution layers extract image features, and the feature map after the last convolution operation is converted into primary capsules for use by the dynamic routing algorithm, which iterates over the capsules and stacks them along the last dimension; the digital capsule layer uses the length of each capsule vector to represent the probability of each expression class and is used to compute the classification loss; the decoder is used to optimize the network, reconstructing the image of the class with the highest output probability, comparing the Euclidean distance between the reconstructed image and the original image, and computing the reconstruction loss;
the dynamic routing algorithm obtains higher-level capsules from the primary capsules, and comprises:

$$s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$$

$$\hat{u}_{j|i} = W_{ij}\,u_i$$

where $s_j$ is a higher-level capsule, $\hat{u}_{j|i}$ is the prediction vector computed from the bottom-level capsule $u_i$, and $W_{ij}$ is a weight matrix; $c_{ij}$ is the coupling coefficient, each pair of upper-level and lower-level capsules has a coupling coefficient $c_{ij}$, the coupling coefficients of a given lower-level capsule sum to 1, and the coefficients are expressed as:
$$b'_{ij} = b_{ij} + \hat{u}_{j|i} \cdot v_j$$

$$c_{ij} = \operatorname{softmax}(b'_{ij})$$

where $b'_{ij}$ is the updated value and $b_{ij}$ is the value before updating, which is initialized to zero; $v_j$ is the vector of the higher-level capsule;
the loss function of the encoder is expressed as:

$$L_c = T_c \max(0,\, m^+ - \|v_c\|)^2 + \lambda (1 - T_c) \max(0,\, \|v_c\| - m^-)^2$$

where $T_c$ indicates whether expression class $c$ is present, taking the value 1 when present and 0 otherwise; $m^+$ and $m^-$ are the upper and lower margins respectively; and $\|v_c\|$ is the module length of the capsule, i.e. the probability of expression class $c$;
the loss function of the decoder is expressed as:

$$L_d = \frac{1}{n} \sum_{i=1}^{n} (r_i - a_i)^2$$

where $n$ is the number of pixels, $r_i$ is the reconstructed value of the $i$-th pixel finally obtained through the capsule-based facial expression recognition network and the decoder, and $a_i$ is the true value of the $i$-th pixel;
constructing a long short-term memory (LSTM) neural network and taking the output of the capsule network encoder as the input of the LSTM network; when constructing the LSTM network, the cross entropy between the input vector and the actual label of the vector is computed, and the mean cross entropy over all elements of the vector is used as the loss function of the LSTM network; the cross entropy is expressed as:

$$L = -\frac{1}{n} \sum_{i=1}^{n} y'_i \log(y_i)$$

where $y'_i$ is the actual expression class label and $y_i$ is the predicted expression probability for sample $i$;
and taking the expression class corresponding to the maximum probability value in the LSTM output as the label of the sequence.
CN202011384713.4A 2020-12-01 2020-12-01 Video expression recognition method based on capsule-long-and-short-term memory neural network Active CN112487989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011384713.4A CN112487989B (en) 2020-12-01 2020-12-01 Video expression recognition method based on capsule-long-and-short-term memory neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011384713.4A CN112487989B (en) 2020-12-01 2020-12-01 Video expression recognition method based on capsule-long-and-short-term memory neural network

Publications (2)

Publication Number Publication Date
CN112487989A CN112487989A (en) 2021-03-12
CN112487989B (en) 2022-07-15

Family

ID=74938620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011384713.4A Active CN112487989B (en) 2020-12-01 2020-12-01 Video expression recognition method based on capsule-long-and-short-term memory neural network

Country Status (1)

Country Link
CN (1) CN112487989B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283393B (en) * 2021-06-28 2023-07-25 南京信息工程大学 Deepfake video detection method based on image group and two-stream network
CN113268994B (en) * 2021-07-16 2021-10-01 中国平安人寿保险股份有限公司 Intention identification method and device based on capsule network
CN113486863A (en) * 2021-08-20 2021-10-08 西南大学 Expression recognition method and device
CN113642540B (en) * 2021-10-14 2022-01-28 中国科学院自动化研究所 Capsule network-based facial expression recognition method and device
CN114694219A (en) * 2022-03-24 2022-07-01 华南师范大学 Facial expression recognition method and device for improving capsule network
CN116824677B (en) * 2023-08-28 2023-12-12 腾讯科技(深圳)有限公司 Expression recognition method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341452A (en) * 2017-06-20 2017-11-10 东北电力大学 Human bodys' response method based on quaternary number space-time convolutional neural networks
CN109376636A (en) * 2018-10-15 2019-02-22 电子科技大学 Eye ground image classification method based on capsule network
CN109410575A (en) * 2018-10-29 2019-03-01 北京航空航天大学 A kind of road network trend prediction method based on capsule network and the long Memory Neural Networks in short-term of nested type
CN110533004A (en) * 2019-09-07 2019-12-03 哈尔滨理工大学 A kind of complex scene face identification system based on deep learning
EP3629246A1 (en) * 2018-09-27 2020-04-01 Swisscom AG Systems and methods for neural architecture search
CN111241958A (en) * 2020-01-06 2020-06-05 电子科技大学 Video image identification method based on residual error-capsule network
CN111986188A (en) * 2020-08-27 2020-11-24 深圳市智源空间创新科技有限公司 Capsule robot drainage pipe network defect identification method based on Resnet and LSTM

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341452A (en) * 2017-06-20 2017-11-10 东北电力大学 Human bodys' response method based on quaternary number space-time convolutional neural networks
EP3629246A1 (en) * 2018-09-27 2020-04-01 Swisscom AG Systems and methods for neural architecture search
CN109376636A (en) * 2018-10-15 2019-02-22 电子科技大学 Eye ground image classification method based on capsule network
CN109410575A (en) * 2018-10-29 2019-03-01 北京航空航天大学 A kind of road network trend prediction method based on capsule network and the long Memory Neural Networks in short-term of nested type
CN110533004A (en) * 2019-09-07 2019-12-03 哈尔滨理工大学 A kind of complex scene face identification system based on deep learning
CN111241958A (en) * 2020-01-06 2020-06-05 电子科技大学 Video image identification method based on residual error-capsule network
CN111986188A (en) * 2020-08-27 2020-11-24 深圳市智源空间创新科技有限公司 Capsule robot drainage pipe network defect identification method based on Resnet and LSTM

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GF-CapsNet: Using Gabor Jet and Capsule Networks for Facial Age, Gender, and Expression Recognition; Sepidehsadat Hosseini; IEEE; 2019-12-31; full text *
Yao Yuqian. Research on facial expression feature extraction and recognition algorithms based on capsule networks. China Excellent Master's Theses Collection. 2019. *

Also Published As

Publication number Publication date
CN112487989A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
CN112487989B (en) Video expression recognition method based on capsule-long-and-short-term memory neural network
Vankdothu et al. A brain tumor identification and classification using deep learning based on CNN-LSTM method
Anand et al. Fusion of U-Net and CNN model for segmentation and classification of skin lesion from dermoscopy images
He et al. Automatic depression recognition using CNN with attention mechanism from videos
Liao et al. Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN113947609B (en) Deep learning network structure and multi-label aortic dissection CT image segmentation method
Pan et al. A deep spatial and temporal aggregation framework for video-based facial expression recognition
Abdul et al. Intelligent real-time Arabic sign language classification using attention-based inception and BiLSTM
Sharma et al. Recognition of Indian sign language (ISL) using deep learning model
Ma et al. A crossmodal multiscale fusion network for semantic segmentation of remote sensing data
CN112418166B (en) Emotion distribution learning method based on multi-mode information
Feng et al. 3D convolutional neural network and stacked bidirectional recurrent neural network for Alzheimer’s disease diagnosis
Yu et al. Adaptive depth and receptive field selection network for defect semantic segmentation on castings X-rays
Hazourli et al. Multi-facial patches aggregation network for facial expression recognition and facial regions contributions to emotion display
Chen et al. Skin lesion segmentation using recurrent attentional convolutional networks
Li et al. Robustness comparison between the capsule network and the convolutional network for facial expression recognition
Nemani et al. Deep learning-based holistic speaker independent visual speech recognition
Podder et al. Time efficient real time facial expression recognition with CNN and transfer learning
CN115862120A (en) Separable variation self-encoder decoupled face action unit identification method and equipment
Francis et al. Diagnostic of cystic fibrosis in lung computer tomographic images using image annotation and improved PSPNet modelling
Marais et al. An evaluation of hand-based algorithms for sign language recognition
Sarveshwaran et al. Investigation on human activity recognition using deep learning
Boukdir et al. 3D gesture segmentation for word-level Arabic sign language using large-scale RGB video sequences and autoencoder convolutional networks
ALISAWI et al. Real-Time Emotion Recognition Using Deep Learning Methods: Systematic Review

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240626

Address after: 410000 Ruixin Enterprise Plaza, No. 67 Jinzhou Avenue, High tech Development Zone, Changsha City, Hunan Province, China. Ruixin Building 3-101 and Ruixin Building 201-1

Patentee after: Hunan Nutrition Tree Biotechnology Co.,Ltd.

Country or region after: China

Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing

Patentee before: Chongqing University of Posts and Telecommunications

Country or region before: China