CN114882570A - Remote examination abnormal state pre-judging method, system, equipment and storage medium - Google Patents

Remote examination abnormal state pre-judging method, system, equipment and storage medium

Info

Publication number
CN114882570A
CN114882570A (application CN202210604509.1A)
Authority
CN
China
Prior art keywords
examinee
module
blood pressure
space
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210604509.1A
Other languages
Chinese (zh)
Inventor
刘海
张昭理
周启云
何嘉文
刘俊强
王书通
刘婷婷
杨兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University
Central China Normal University
Original Assignee
Hubei University
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University and Central China Normal University
Priority to CN202210604509.1A
Publication of CN114882570A
Legal status: Pending

Classifications

    • G06V 40/174 Facial expression recognition
    • A61B 5/021 Measuring pressure in heart or blood vessels
    • A61B 5/746 Alarms related to a physiological condition, e.g. details of setting alarm thresholds or avoiding false alarms
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Cardiology (AREA)
  • Pathology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Computational Linguistics (AREA)
  • Physiology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Vascular Medicine (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The application discloses a method, system, device and storage medium for pre-judging abnormal states in remote examinations. The method comprises the following steps: acquiring the examinee video resources captured by an RGB camera and a near-infrared camera in a remote examination system, and decomposing each video into a time-ordered sequence of frames to obtain examinee RGB multi-frame images and near-infrared multi-frame images; inputting the preprocessed examinee RGB multi-frame images and near-infrared multi-frame images into the trained fine expression recognition model and the trained remote blood pressure recognition model to obtain, respectively, the facial fine expression recognition result and the blood pressure recognition result of the examinee in the examination room; and judging and issuing an early warning on the abnormal state of the examinee according to the facial fine expression and blood pressure recognition results. By recognizing and analyzing the examinee's fine facial expressions and facial remote blood pressure signals during the remote examination, the invention can accurately pre-judge in real time whether the examinee tends toward an abnormal state in the examination room.

Description

Remote examination abnormal state pre-judging method, system, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a method, system, device and storage medium for pre-judging abnormal states in remote examinations.
Background
With the development of educational information technology, examinations have gradually shifted from the traditional offline form to remote examinations; paperless remote examinations reduce the influence of environmental factors such as invigilation space and time. Remote examinations expanded rapidly under the influence of the epidemic, and the traditional mode of on-site invigilation by examination staff can no longer meet their requirements. At present, the remote examination platforms used by colleges and universities have released various invigilation modes, such as camera snapshots and background monitoring of the examinees' answering status. Some courses with higher invigilation requirements adopt a dual-camera setup: in addition to the camera on the computer screen, students set up a mobile phone behind and to the side of themselves, and the phone camera records synchronously. Although remote invigilation includes various measures against student cheating, existing remote invigilation systems struggle to achieve the desired effect in terms of both accuracy and operability.
Disclosure of Invention
In view of at least one defect or improvement requirement of the prior art, the present invention provides a remote examination abnormal state pre-judging method, system, device and storage medium, which can accurately pre-judge in real time whether an examinee tends toward an abnormal state in the examination room by recognizing and analyzing the examinee's fine facial expressions and facial remote blood pressure signals during the remote examination.
To achieve the above object, according to a first aspect of the present invention, there is provided a remote test abnormal state prediction method, including:
respectively acquiring examinee video resources acquired by an RGB camera and a near-infrared camera in a remote examination system, dividing the examinee video resources into multi-frame images according to a time sequence, and acquiring examinee RGB multi-frame images and near-infrared multi-frame images;
inputting an examinee RGB multi-frame image and a near-infrared multi-frame image into a trained fine expression recognition model, obtaining a face fine expression recognition result of the examinee in an examination room, inputting the examinee RGB multi-frame image and the near-infrared multi-frame image into the trained face remote blood pressure recognition model, and obtaining a blood pressure recognition result of the examinee in the examination room;
and judging and early warning the abnormal state of the examinee according to the fine facial expression and blood pressure recognition result of the examinee.
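Purely for illustration, the three steps above could be wired together as in the following sketch; the model objects, the equal-weight score fusion and the threshold value are hypothetical placeholders rather than details disclosed in this application.

```python
import torch

def prejudge_remote_exam(rgb_frames: torch.Tensor,
                         nir_frames: torch.Tensor,
                         expr_model: torch.nn.Module,
                         bp_model: torch.nn.Module,
                         threshold_f: float = 0.6) -> bool:
    """Hypothetical wiring of the three method steps (a sketch, not the claimed implementation)."""
    # Step 2: run the trained recognition models on the decomposed frame stacks
    expr_score = expr_model(rgb_frames, nir_frames)   # fine expression recognition result
    bp_score = bp_model(rgb_frames, nir_frames)       # blood pressure recognition result

    # Step 3: fuse the two recognition results and compare with a threshold f
    # (equal weighting is an assumption; the fusion rule is not reproduced here)
    fused = 0.5 * expr_score.float().mean() + 0.5 * bp_score.float().mean()
    return bool(fused.item() > threshold_f)           # True -> issue an early warning
```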
Further, the fine expression recognition model includes: the system comprises a feature extraction and fusion module, a convolution space Transformer, a time Transformer and a full connection layer, wherein the convolution space Transformer comprises N space encoders, the time Transformer comprises M time encoders, and the space encoders and the time encoders are both composed of a multi-head attention mechanism;
the feature extraction and fusion module is used for extracting features from RGB (red, green and blue) images and performing feature fusion and inputting the features into the convolution space Transformer, the convolution space Transformer is used for extracting face space features of examinees from the input fusion features and inputting the face space features into the time Transformer, the time Transformer takes the face space features of the examinees in all frames as input to generate judgment feature representation and inputs the judgment feature representation into the full-connection layer, and the full-connection layer is used for obtaining a face fine expression recognition result.
Further, the convolutional space Transformer is used for implementing the steps of:
extracting the examinee's facial spatial features from the input fused features to obtain a feature map, and flattening the feature map into a one-dimensional sequence M_f;
obtaining an input embedded representation from M_f, and computing the query vector, key vector and value vector in each space encoder from the embedded representation of the previous layer;
computing the self-attention coefficients of the N space encoders from the query and key vectors, computing the weighted sum of the value vectors with the self-attention coefficient of each attention head, projecting the concatenation of the vectors of all attention heads, and outputting an embedded representation through an MLP operation;
concatenating the embedded representations at the spatial level to generate a feature map M_r, and computing the feature embedding of each frame as x′_t = GMP(g(M_r)), where g(·) denotes a convolutional layer and GMP denotes global max pooling.
Further, the time Transformer is used for realizing the steps of:
obtaining an input embedded representation according to the spatial features of the face of the examinee;
computing a query vector, a key vector, and a value vector in each time encoder by a previous layer of embedded representation;
calculating respective self-attention coefficients of the M time encoders according to the query vector and the keyword vector, calculating the weight sum of the value vector by using the self-attention coefficient of each attention head, projecting the connection of the vectors of all the attention heads, and outputting a judgment feature representation through MLP operation.
Furthermore, the facial remote blood pressure recognition model comprises a feature extraction and fusion module, a blood pressure signal extraction module and a blood pressure signal processing module. The feature extraction and fusion module is used for extracting feature maps from the preprocessed examinee RGB multi-frame images and near-infrared multi-frame images, performing feature fusion and inputting the fused features into the blood pressure signal extraction module; the blood pressure signal extraction module is used for extracting facial color change features and inputting them, as blood pressure signals, into the blood pressure signal processing module; and the blood pressure signal processing module is used for suppressing interference caused by facial movement.
Further, the feature extraction and fusion module comprises a convolution layer, an average pooling layer and a feature fusion layer, wherein the convolution layer is a 2D convolution layer or a 3D convolution layer, feature maps are extracted from preprocessed examinee RGB multi-frame images and near-infrared multi-frame images through the convolution layer and then input into the average pooling layer, the feature maps are subjected to average pooling operation through the average pooling layer and then input into the feature fusion layer, and the feature fusion layer merges the feature maps into short-segment feature maps.
Further, the blood pressure signal processing module comprises a space-time attention module, an average pooling module and a 3D convolution module, the space-time attention module comprises a space-time strip convergence module and a space-time convolution block, and the blood pressure signal processing module is configured to implement the steps of:
inputting the blood pressure signal into a space-time attention module, and acquiring the horizontal, vertical and time dimension strip convergence of the input blood pressure signal in a three-dimensional space;
adjusting the current position and neighboring features through a space-time convolution block to obtain y_v, y_h and y_K, and finally outputting Z, where
Z = σ(f(Y, ω))
and f(Y, ω) denotes the space-time convolution block parameterized by ω, σ denotes the sigmoid function, Y_{c,i,j,t} denotes the space-time strip convergence of channel c along the horizontal, vertical and time directions, obtained from the horizontal strip convergence y_h, the vertical strip convergence y_v and the temporal strip convergence y_K of channel c, and Z denotes the nonlinear normalization of the strip convergence;
generating a spatial information descriptor F_avg from the tensor Z by average pooling, computed as F_avg = σ(g_avg(Z));
taking the Hadamard product of the average-pooled spatial information descriptor F_avg and the tensor Z, and outputting X′;
average-pooling X′, convolving it with the 3D convolution module, and outputting the predicted DBPG signal.
According to a second aspect of the present invention, there is also provided a remote test abnormal state prediction system, comprising:
the system comprises an acquisition module, a remote examination module and a control module, wherein the acquisition module is used for respectively acquiring examinee video resources acquired by an RGB camera and a near-infrared camera in the remote examination system, dividing the examinee video resources into multi-frame images according to the time sequence and acquiring examinee RGB multi-frame images and near-infrared multi-frame images;
the recognition module is used for inputting an examinee RGB multi-frame image and a near-infrared multi-frame image into the trained fine expression recognition model, acquiring a face fine expression recognition result of the examinee in the examination room, inputting the examinee RGB multi-frame image and the near-infrared multi-frame image into the trained face remote blood pressure recognition model, and acquiring a blood pressure recognition result of the examinee in the examination room;
and the judging and early warning module is used for judging and early warning the abnormal state of the examinee according to the fine facial expression and blood pressure recognition result of the examinee.
According to a third aspect of the present invention, there is also provided an electronic device comprising at least one processor, and at least one memory module, wherein the memory module stores a computer program that, when executed by the processor, causes the processor to perform the steps of any of the methods described above.
According to a fourth aspect of the present invention, there is also provided a storage medium storing a computer program executable by a processor, the computer program, when run on the processor, causing the processor to perform the steps of any of the methods described above.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) By acquiring the examinee videos captured by the RGB camera and the near-infrared camera in the remote examination system and recognizing and analyzing the examinee's fine expressions and facial remote blood pressure signals during the remote examination, the invention can accurately pre-judge in real time whether the examinee tends toward an abnormal state in the examination room.
(2) The examinee fine expression recognition model provided by the invention can extract more discriminative fine expression features from both the spatial and the temporal perspective, and can robustly handle problems such as the examinee's non-frontal head pose, head movement and examination-room lighting.
(3) The facial remote blood pressure recognition model provided by the invention can effectively reduce redundant information and strengthen long-range associations across the video; the space-time strip pooling designed in the space-time module effectively reduces head-motion noise; and the loss function makes the model focus on the DBPG (Deep Cross-Space Blood Pressure Signal) signal rather than on interference information, effectively improving the accuracy of facial remote blood pressure recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a remote test abnormal state anticipation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a remote test abnormal state pre-judging method according to an embodiment of the present invention;
FIG. 3 is a network diagram of a fine expression recognition model according to an embodiment of the present invention;
fig. 4 is a network diagram of a remote blood pressure recognition model of a face according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The terms "including" and "having," and any variations thereof, in the description and claims of this application and the drawings described above, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 and fig. 2, the remote test abnormal state prejudging method according to the embodiment of the present invention includes the steps of:
s101, respectively obtaining examinee video resources collected by an RGB camera and a Near Infrared (NIR) camera in a remote examination system, dividing the examinee video resources into multi-frame images according to a time sequence, and obtaining an examinee RGB multi-frame image and a near infrared multi-frame image.
An RGB camera and a near-infrared camera are arranged at the examination location. The examinee takes the remote examination using the remote examination equipment; in this scenario, the examinee's facial video sequence is captured by the RGB and near-infrared binocular video capture instrument. The RGB and near-infrared examinee video images collected in this scenario provide an important data source for the fine expression recognition module and the facial remote blood pressure recognition module.
Dividing the video resource of the examinee into multi-frame images according to the time sequence to obtain an RGB multi-frame image sequence and a near-infrared multi-frame image sequence of the examinee.
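As a minimal sketch of this decomposition step, the snippet below uses OpenCV to split a recorded examinee video into a time-ordered list of frames; the file names and the resize resolution are illustrative assumptions.

```python
import cv2

def video_to_frames(video_path, size=(224, 224)):
    """Decompose a video file into a time-ordered list of resized frames (sketch)."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:                       # end of the video stream
            break
        frames.append(cv2.resize(frame, size))
    cap.release()
    return frames

rgb_frames = video_to_frames("examinee_rgb.mp4")   # RGB camera recording (assumed path)
nir_frames = video_to_frames("examinee_nir.mp4")   # near-infrared camera recording (assumed path)
```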
Furthermore, the examinee RGB multi-frame images and near-infrared multi-frame images are each preprocessed before being input to step S102 for processing.
The preprocessed images are divided proportionally into a training set and a test set to train and test the fine facial expression recognition model and the facial remote blood pressure recognition model.
S102, inputting an RGB multi-frame image and a near-infrared multi-frame image of the examinee into the trained fine expression recognition model, obtaining a face fine expression recognition result of the examinee in the examination room, inputting the RGB multi-frame image and the near-infrared multi-frame image of the examinee into the trained face remote blood pressure recognition model, and obtaining a blood pressure recognition result of the examinee in the examination room.
As shown in fig. 3, in the present embodiment, the fine expression recognition model includes a feature extraction and fusion module, a convolution space Transformer, a time Transformer and a full-connection layer. The convolution space Transformer (FER-S-Former) comprises feature mapping, spatial position embedding and N space encoders; the time Transformer (FER-T-Former) comprises feature mapping, temporal position embedding and M time encoders; the space encoder and the time encoder each consist of a multi-head attention mechanism followed by a feed-forward network. The convolution space Transformer takes each frame image of the examinee as input and extracts the examinee's facial spatial features; the time Transformer takes the facial spatial features of the examinee in all frames as input to generate a discriminative feature representation; and the full-connection layer outputs the final fine expression recognition result.
The feature extraction and fusion process is as follows: the collected RGB examinee video images are R = [r_1, r_2, …, r_t] and the collected NIR examinee video images are N = [n_1, n_2, …, n_t]; the t frame images of the RGB and NIR near-infrared examinee videos are taken as input respectively, facial expression features are extracted through convolution and pooling operations, and feature fusion is then performed, represented as A = [a_1, a_2, …, a_n].
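A minimal sketch of one possible form of this feature extraction and fusion is given below, assuming simple convolution-plus-pooling branches for the RGB and NIR frames and concatenation followed by a 1×1 convolution as the fusion; the channel sizes and the 3-channel NIR input are assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractFuse(nn.Module):
    """Sketch: per-frame conv + pooling branches for RGB and NIR frames,
    fused into A = [a_1, ..., a_n] by concatenation and a 1x1 convolution."""
    def __init__(self, out_ch: int = 64):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Conv2d(3, out_ch, 3, padding=1),
                                 nn.ReLU(), nn.AvgPool2d(2))
        self.rgb_branch, self.nir_branch = branch(), branch()
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)  # fusion layer

    def forward(self, rgb: torch.Tensor, nir: torch.Tensor) -> torch.Tensor:
        # rgb, nir: (T, 3, H, W) frame stacks from the two cameras
        a = torch.cat([self.rgb_branch(rgb), self.nir_branch(nir)], dim=1)
        return self.fuse(a)              # fused per-frame feature maps
```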
The data processing procedure of the convolution space Transformer is as follows:
(1) From the spatial perspective, the examinee's fine expression features in the examination-room facial image can be modeled as a series of vector representations A = [a_1, a_2, …, a_n].
(2) The examinee's facial spatial features are extracted and mapped. The convolution space Transformer (FER-S-Former) flattens the fused examinee fine expression spatial features into a one-dimensional sequence M_f ∈ R^{Q×C} (Q = H′ × W′), which is then input to the FER-S-Former.
(3) The query, key and value vectors in each space encoder are computed. The input embedded representation of the space encoder is obtained by adding the learnable position embedding e_p to the corresponding visual word embedding. From the embedded representation z^(l−1) of the previous layer, the query, key and value vectors of the k-th attention head at layer l in each space encoder are computed as
q^(l,k) = W_Q^(l,k) · LN(z^(l−1)), k^(l,k) = W_K^(l,k) · LN(z^(l−1)), v^(l,k) = W_V^(l,k) · LN(z^(l−1)),
where LN(·) denotes layer normalization, k ∈ {1, …, K} indexes the attention heads, D_h denotes the latent dimension of each attention head, and W_Q^(l,k), W_K^(l,k) and W_V^(l,k) denote the parameter matrices of the query, key and value vectors of the k-th attention head at layer l.
(4) The self-attention weight of the k-th attention head at layer l is computed for each query p as α_p^(l,k) = SM(q_p^(l,k) · k^(l,k) / √D_h), i.e. the softmax over the scaled dot products of the query with the key vectors, where SM denotes the softmax activation function. To compute the embedded representation of layer l, the weighted sum of the value vectors is computed with the self-attention coefficients of each attention head, where α^(l,k) denotes the self-attention weight coefficient corresponding to the value vector in each attention head at layer l; the concatenation of the vectors of all attention heads is then projected with a learnable parameter matrix W_O and passed through an MLP operation to produce the layer-l embedded representation.
(5) The Q embedded representations are concatenated at the spatial level to generate a finer feature map M_r, and the feature embedding of each frame is computed as x′_t = GMP(g(M_r)), where g(·) denotes a convolutional layer and GMP denotes global max pooling.
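A minimal PyTorch sketch of one space encoder as described above (layer normalization, multi-head self-attention over the flattened sequence M_f, projection and an MLP) is given below; the hidden sizes, the residual connections and the use of nn.MultiheadAttention are illustrative assumptions, not the claimed implementation.

```python
import torch
import torch.nn as nn

class SpatialEncoder(nn.Module):
    """One FER-S-Former encoder block (sketch): LN -> multi-head self-attention
    (query/key/value per head, softmax-weighted sum of values) -> MLP."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (B, Q, C) flattened spatial sequence M_f with position embeddings added
        h = self.norm1(z)
        attn_out, _ = self.attn(h, h, h)   # weighted sum of the value vectors
        z = z + attn_out                   # residual connection (assumption)
        return z + self.mlp(self.norm2(z))
```

Stacking N such blocks and applying a convolution followed by global max pooling to the reshaped output would yield per-frame embeddings x′_t = GMP(g(M_r)) as described above.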
The data processing procedure of the time Transformer model is as follows:
(1) The output of the convolution space Transformer is used as the input of the time Transformer model. Given an input X′ ∈ R^{T×F}, the input embedded representation of the time Transformer is obtained by adding the learnable position embedding e_{t′} ∈ R^F, which encodes the temporal position, to the feature embedding x′_t of each frame. Unlike the space encoder, a learnable vector x′_0, the classification token (the embedded representation of the class), is added at the first position of the sequence.
(2) For the time Transformer, the query, key and value vectors of the k-th attention head at layer l in each time encoder are computed from the embedded representation of the previous layer, analogously to the space encoder, with W_Q^(l,k), W_K^(l,k) and W_V^(l,k) denoting the parameter matrices of the query, key and value vectors of the k-th attention head at layer l.
(3) The self-attention coefficients of the M time encoders are computed from the query and key vectors; the weighted sum of the value vectors is computed with the self-attention coefficient of each attention head; and the concatenation of the vectors of all attention heads is projected and passed through an MLP operation to output the discriminative feature representation.
The self-attention weight of each query vector is computed as α_p^(l,k) = SM(q_p^(l,k) · k^(l,k) / √F′), where SM denotes the softmax activation function and F′ = F/K denotes the latent dimension of each attention head. The embedded representation at layer l is obtained from the weighted sum of the value vectors, using the self-attention weight coefficients corresponding to the value vector in each attention head at layer l (including the coefficient corresponding to the learnable classification vector), followed by the projection and the MLP operation.
The full-connection layer computes the final recognition result. The final clip embedding is obtained from the last-layer classification token of the time Transformer, and the final recognition result is computed by feeding the class-token embedded representation of the last of the M time encoders into a full-connection layer FC, where J denotes the expression recognition category. The categories of expression recognition are divided into three classes: positive, neutral and negative.
As shown in fig. 4, the facial remote blood pressure recognition model includes a feature extraction and fusion module, a blood pressure signal extraction module (DBPG signal extraction module) and a blood pressure signal processing module (DBPG signal processing module). The feature extraction and fusion module is used for extracting feature maps from the preprocessed examinee RGB multi-frame images and near-infrared multi-frame images, performing feature fusion and inputting the fused features into the blood pressure signal extraction module. The DBPG signal extraction module is used for extracting the facial color change from the fused features and inputting it, as a deep blood pressure signal, to the DBPG signal processing module. The DBPG signal processing module is used to avoid ignoring important local information and to suppress interference from the relevant facial regions under motion. The DBPG signal extraction module is composed of space-time blocks (ST Block). The DBPG signal processing module comprises a space-time attention module, an average pooling module and a 3D convolution module; the space-time attention module comprises a space-time strip convergence module and a space-time convolution block.
The feature extraction and fusion module comprises a convolutional layer, an average pooling layer and a feature fusion layer. The convolutional layer is a 2D or 3D convolutional layer; feature maps are extracted from the preprocessed examinee RGB multi-frame images and near-infrared multi-frame images by the convolutional layer and input into the average pooling layer; the average pooling layer applies average pooling to the feature maps and inputs them into the feature fusion layer; and the feature fusion layer merges the feature maps into short-segment feature maps.
The data processing steps of the feature extraction and fusion module are as follows:
Step 3.3.1: the preprocessed examinee RGB video sequence V ∈ R^{C×L×H×W} and the NIR near-infrared video sequence N ∈ R^{C×L×H×W} are taken as the input tensors, where C, L, H and W denote the number of channels, the video length, the height and the width, respectively;
Step 3.3.2: feature maps are extracted. A 3DCNN with two layers of [3, 3, 3] convolution kernels extracts the feature map X_conv3D ∈ R^{C′×L×H×W}, where C′ denotes the number of channels of the feature map; the 3DCNN can enhance the learning of information in the temporal dimension. A feature map can also be obtained by a 2DCNN with two layers of [3, 3] convolution kernels; the 2DCNN places more emphasis on extracting facial features from the video sequence V ∈ R^{B×C×H×W}, where B is the batch size and is numerically equal to the video length;
Step 3.3.3: after average pooling, the feature maps are merged by a feature fusion function into a space-time feature map X, where T is the number of short segments of the video sequence, g_avg(·) denotes the average pooling function, the 3DCNN or 2DCNN is represented as a function with convolution filter parameters ω, and f denotes the feature fusion function.
the data processing steps of the DBPG signal extraction module are as follows:
(1) mapping the space-time characteristics obtained from the face characteristic extraction module
Figure BDA0003670808800000114
Inputting the DBPG signal into a DBPG signal extraction module;
(2) the DBPG signal is extracted using 3 space-time blocks (ST Block). ST Block consists of two sets of concatenated space-time convolutions (ST Conv) and an average pooling layer; ST Conv consists of two cascaded sets of 3D convolution filters with kernel sizes [1,3,3] and [3,1,1], and can significantly reduce the number of parameters of the convolution filters. Additionally, ST Conv may also encapsulate information related to objects, scenes, and motion in the video. The key formula for extracting the DBPG signal is as follows:
I k =h(X+∑ k (1+a kk ) (11)
wherein, I k Represents the DBPG signal extracted from the kth space-time block, X represents the space-time feature mapping, k is the number of space-time blocks, a k Represents the scaling factor, δ, of the k-th space-time block k And h represents a signal feature extraction function.
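A minimal sketch of one such space-time block is shown below: each ST Conv cascades a [1, 3, 3] spatial 3D convolution with a [3, 1, 1] temporal 3D convolution, and the block stacks two ST Conv units with an average pooling layer; the activation functions and pooling size are assumptions.

```python
import torch.nn as nn

class STConv(nn.Module):
    """Factorized space-time convolution (sketch): a [1,3,3] spatial conv
    followed by a [3,1,1] temporal conv, which cuts the filter parameter count."""
    def __init__(self, ch: int):
        super().__init__()
        self.spatial = nn.Conv3d(ch, ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(ch, ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.temporal(self.act(self.spatial(x))))

class STBlock(nn.Module):
    """One ST Block (sketch): two cascaded ST Conv units plus average pooling."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1, self.conv2 = STConv(ch), STConv(ch)
        self.pool = nn.AvgPool3d((1, 2, 2))

    def forward(self, x):                  # x: space-time feature map X
        return self.pool(self.conv2(self.conv1(x)))
```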
The data processing steps of the DBPG signal processing module are as follows:
(1) The extracted DBPG signal I_k ∈ R^{C×T×H×W} is the tensor input to the space-time attention module, which comprises space-time strip convergence and space-time convolution (ST Conv). The space-time strip convergence generates the convergence of the DBPG signal along the horizontal, vertical and time directions in three-dimensional space; the strip convergence along the vertical direction, and likewise along the horizontal and time directions, is obtained by strip pooling of the input tensor along the corresponding dimensions.
(2) The current position and neighboring features are adjusted by ST Conv to obtain y_v ∈ R^{C×W}, y_h ∈ R^{C×H} and y_K ∈ R^{C×T}, and Z ∈ R^{C×T×H×W} is finally output. The key equation of the space-time strip pooling is
Z = σ(f(Y, ω))    (13)
where f(Y, ω) denotes the ST Conv parameterized by ω, σ denotes the sigmoid function, Y_{c,i,j,t} denotes the space-time strip convergence of channel c along the horizontal, vertical and time directions, obtained from the horizontal, vertical and temporal strip convergences of channel c, and Z denotes the nonlinear normalization of the strip convergence.
(3) The tensor Z generates a spatial information descriptor F_avg by average pooling:
F_avg = σ(g_avg(Z))    (14)
(4) The Hadamard product of the average-pooled spatial information descriptor F_avg and the tensor Z is taken, and X′ ∈ R^{C×T×H×W} is finally output:
X′ = Z ⊙ F_avg    (15)
(5) After global average pooling, X′ is convolved with the [1, 1, 1] 3D convolution module, and the predicted DBPG signal is output.
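The sketch below illustrates the strip-convergence attention described in steps (1) to (4); because the original combination formula is not reproduced here, the broadcast sum of the three strip poolings and the single 3D convolution standing in for ST Conv are explicit assumptions.

```python
import torch
import torch.nn as nn

class SpatioTemporalStripAttention(nn.Module):
    """Sketch of the DBPG space-time attention: strip pooling along the vertical,
    horizontal and time directions, Z = sigmoid(f(Y, w)),
    F_avg = sigmoid(avg_pool(Z)), output X' = Z (Hadamard) F_avg."""
    def __init__(self, ch: int):
        super().__init__()
        self.st_conv = nn.Conv3d(ch, ch, kernel_size=3, padding=1)  # stands in for f(Y, w)

    def forward(self, i_k: torch.Tensor) -> torch.Tensor:
        # i_k: (B, C, T, H, W) DBPG signal tensor from the extraction module
        y_v = i_k.mean(dim=(2, 3), keepdim=True)  # vertical strips   -> (B, C, 1, 1, W)
        y_h = i_k.mean(dim=(2, 4), keepdim=True)  # horizontal strips -> (B, C, 1, H, 1)
        y_t = i_k.mean(dim=(3, 4), keepdim=True)  # temporal strips   -> (B, C, T, 1, 1)
        y = y_v + y_h + y_t                       # assumed combination into Y
        z = torch.sigmoid(self.st_conv(y))        # Z = sigma(f(Y, w))
        f_avg = torch.sigmoid(z.mean(dim=(2, 3, 4), keepdim=True))   # F_avg
        return z * f_avg                          # X' = Z (Hadamard product) F_avg
```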
To minimize the position error between the estimated signal peak and the true signal peak, the negative Pearson correlation coefficient is used as the loss function:
Loss = 1 − (L·Σ(x·y) − Σx·Σy) / (√(L·Σx² − (Σx)²) · √(L·Σy² − (Σy)²))    (16)
where L denotes the length of the signal, x is the ground-truth DBPG signal and y is the predicted DBPG signal.
The loss of the facial remote blood pressure recognition model is divided into two parts. Loss_pre is generated in the DBPG signal extraction module; its purpose is to pre-guide the network to learn the DBPG signal rather than other physiological signals. Loss_post is generated in the DBPG signal processing module and makes the learned DBPG signal as accurate as possible, thereby improving the periodicity of the recovered DBPG signal. Loss_post and Loss_pre are both calculated by formula (16). With the two-part loss, the model can focus more on the DBPG signal, and the total loss function L is
L = λ·Loss_pre + (1 − λ)·Loss_post    (17)
where λ ∈ [0, 1] is used to adjust the relative importance of the two losses.
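A minimal sketch of the negative Pearson loss and the two-part total loss is given below; the batching convention and the small epsilon for numerical stability are assumptions.

```python
import torch

def neg_pearson_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """1 - Pearson correlation between the predicted and ground-truth DBPG
    signals, averaged over the batch (signals lie along the last dimension)."""
    pred = pred - pred.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    r = (pred * target).sum(dim=-1) / (
        pred.norm(dim=-1) * target.norm(dim=-1) + 1e-8)
    return (1.0 - r).mean()

def total_loss(pred_pre, pred_post, target, lam: float = 0.5) -> torch.Tensor:
    """L = lambda * Loss_pre + (1 - lambda) * Loss_post, with lambda in [0, 1]."""
    loss_pre = neg_pearson_loss(pred_pre, target)    # from the DBPG extraction module
    loss_post = neg_pearson_loss(pred_post, target)  # from the DBPG processing module
    return lam * loss_pre + (1.0 - lam) * loss_post
```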
And S103, judging and early warning abnormal states of the examinees according to the fine facial expressions and blood pressure recognition results of the examinees.
According to the examinee's facial fine expression and facial remote blood pressure recognition results at different moments, the two results are fused and the fused value is compared with a threshold f: if it is greater than the threshold f, the examinee is judged to exhibit abnormal behavior; if it is less than the threshold f, the examinee is judged to be in a normal state.
According to the abnormal-behavior judgment result, an early warning of the abnormal behavior is issued to the examinee, and corresponding intervention measures can be taken to prevent the abnormal behavior.
The invention provides a remote test abnormal state prejudging system, which comprises:
the system comprises an acquisition module, a remote examination module and a control module, wherein the acquisition module is used for respectively acquiring examinee video resources acquired by an RGB camera and a near-infrared camera in the remote examination system, dividing the examinee video resources into multi-frame images according to the time sequence and acquiring examinee RGB multi-frame images and near-infrared multi-frame images;
the recognition module is used for inputting an examinee RGB multi-frame image and a near-infrared multi-frame image into the trained fine expression recognition model, acquiring a face fine expression recognition result of the examinee in the examination room, inputting the examinee RGB multi-frame image and the near-infrared multi-frame image into the trained face remote blood pressure recognition model, and acquiring a blood pressure recognition result of the examinee in the examination room;
and the judging and early warning module is used for judging and early warning the abnormal state of the examinee according to the fine facial expression and blood pressure recognition result of the examinee.
The implementation method of the system is the same as the above method, and is not described herein again.
The present embodiment further provides an electronic device, which includes at least one processor and at least one memory, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes any one of the steps of the remote test abnormal state prediction method, where the specific steps refer to method embodiments and are not described herein again; in this embodiment, the types of the processor and the memory are not particularly limited, for example: the processor may be a microprocessor, digital information processor, on-chip programmable logic system, or the like; the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.
The present application further provides a storage medium storing a computer program executable by a processor, wherein the computer program causes the processor to execute the steps of any one of the above-mentioned remote test abnormal state prejudging methods when the computer program runs on the processor. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some service interfaces, indirect coupling or communication connection of systems or modules, and may be in electrical or other forms.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned memory comprises: various media capable of storing program codes, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, and the memory may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A remote test abnormal state prejudging method is characterized by comprising the following steps:
respectively acquiring examinee video resources acquired by an RGB camera and a near-infrared camera in a remote examination system, dividing the examinee video resources into multi-frame images according to a time sequence, and acquiring examinee RGB multi-frame images and near-infrared multi-frame images;
inputting an examinee RGB multi-frame image and a near-infrared multi-frame image into a trained fine expression recognition model, obtaining a face fine expression recognition result of the examinee in an examination room, inputting the examinee RGB multi-frame image and the near-infrared multi-frame image into the trained face remote blood pressure recognition model, and obtaining a blood pressure recognition result of the examinee in the examination room;
and judging and early warning the abnormal state of the examinee according to the fine facial expression and blood pressure recognition result of the examinee.
2. The remote examination abnormal state prediction method of claim 1, wherein the fine expression recognition model comprises: the system comprises a feature extraction and fusion module, a convolution space Transformer, a time Transformer and a full connection layer, wherein the convolution space Transformer comprises N space encoders, the time Transformer comprises M time encoders, and the space encoders and the time encoders are both composed of a multi-head attention mechanism;
the feature extraction and fusion module is used for extracting features from RGB images and near-infrared images respectively, performing feature fusion and inputting the features into the convolution space Transformer, the convolution space Transformer is used for extracting face space features of examinees from the input fusion features and inputting the face space features into the time Transformer, the time Transformer takes the face space features of the examinees in all frames as input to generate judgment feature representation and inputs the judgment feature representation into the full-connection layer, and the full-connection layer is used for obtaining a face fine expression recognition result.
3. The method of remote examination abnormal state prediction according to claim 2, wherein the convolution space Transformer is used for realizing the steps of:
extracting the examinee's facial spatial features from the input fused features to obtain a feature map, and flattening the feature map into a one-dimensional sequence M_f;
obtaining an input embedded representation from M_f, and computing the query vector, key vector and value vector in each space encoder from the embedded representation of the previous layer;
computing the self-attention coefficients of the N space encoders from the query and key vectors, computing the weighted sum of the value vectors with the self-attention coefficient of each attention head, projecting the concatenation of the vectors of all attention heads, and outputting an embedded representation through an MLP operation;
concatenating the embedded representations at the spatial level to generate a feature map M_r, and computing the feature embedding of each frame as x′_t = GMP(g(M_r)), where g(·) denotes a convolutional layer and GMP denotes global max pooling.
4. The method of remote test abnormal state anticipation as claimed in claim 2, wherein the time Transformer is configured to perform the steps of:
obtaining an input embedded representation according to the spatial features of the face of the examinee;
computing a query vector, a key vector, and a value vector in each time encoder by a previous layer of embedded representation;
calculating respective self-attention coefficients of the M time encoders according to the query vector and the keyword vector, calculating the weight sum of the value vector by using the self-attention coefficient of each attention head, projecting the connection of the vectors of all the attention heads, and outputting a judgment feature representation through MLP operation.
5. The remote examination abnormal state pre-judging method according to claim 1, wherein the facial remote blood pressure recognition model comprises a feature extraction and fusion module, a blood pressure signal extraction module and a blood pressure signal processing module, the feature extraction and fusion module is used for extracting feature maps from the preprocessed examinee RGB multi-frame images and near-infrared multi-frame images, performing feature fusion and inputting the fused features to the blood pressure signal extraction module, and the blood pressure signal extraction module is used for extracting facial color change features and inputting them as blood pressure signals to the blood pressure signal processing module; the blood pressure signal processing module is used for suppressing interference caused by facial movement.
6. The method according to claim 5, wherein the feature extraction and fusion module comprises a convolution layer, an average pooling layer and a feature fusion layer, the convolution layer is a 2D convolution layer or a 3D convolution layer, the feature map is extracted from the preprocessed RGB multi-frame image and near-infrared multi-frame image of the examinee respectively through the convolution layer and then input into the average pooling layer, the feature map is subjected to average pooling operation through the average pooling layer and then input into the feature fusion layer, and the feature fusion layer merges the feature map into a short-segment feature map.
7. The remote examination abnormal state pre-judging method according to claim 5, wherein the blood pressure signal processing module comprises a space-time attention module, an average pooling module and a 3D convolution module, the space-time attention module comprises a space-time strip convergence module and a space-time convolution module, and the blood pressure signal processing module is configured to perform the steps of:
inputting the blood pressure signal into the space-time attention module, and obtaining the strip convergence of the input blood pressure signal along the horizontal, vertical and temporal dimensions in three-dimensional space;
adjusting the current position and its neighboring features with a space-time convolution block to obtain y^v, y^h and y^k, and finally outputting Z, wherein the calculation formulas are as follows:

Y_{c,i,j,t} = y^h_{c,i} + y^v_{c,j} + y^k_{c,t}

Z = σ(f(Y, ω))

where f(Y, ω) denotes the space-time convolution block parameterized by ω, σ denotes the sigmoid function, Y_{c,i,j,t} denotes the convergence of channel c along the horizontal, vertical and temporal space-time strips, y^h_{c,i} denotes the strip convergence of channel c along the horizontal direction, y^v_{c,j} denotes the strip convergence of channel c along the vertical direction, y^k_{c,t} denotes the strip convergence of channel c along the temporal direction, and Z denotes the nonlinear normalization of the strip convergence;
the tensor Z generates a spatial information descriptor F_avg through average pooling, with the calculation formula:

F_avg = σ(g_avg(Z))

the spatial information descriptor F_avg generated by average pooling is combined with the tensor Z through the Hadamard product, and X′ is finally output;

and after X′ is subjected to average pooling, X′ is convolved by the 3D convolution module to output the predicted DBPG signal.
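A hedged PyTorch sketch of the space-time attention computation in claim 7: strip convergence of the input along the horizontal, vertical and temporal directions, the space-time convolution f(Y, ω) with sigmoid normalization giving Z, the average-pooled descriptor F_avg = σ(g_avg(Z)), and the Hadamard product producing X′. Mean pooling as the strip-convergence operator and the kernel size of f are assumptions, and the per-strip adjustment convolution mentioned in the claim is omitted for brevity.

```python
import torch
import torch.nn as nn

class SpaceTimeStripAttention(nn.Module):
    """Sketch of the claim-7 space-time attention; see the assumptions stated above."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Conv3d(channels, channels, kernel_size=3, padding=1)  # space-time convolution block f(., w)
        self.g_avg = nn.AdaptiveAvgPool3d(1)                              # average pooling used for F_avg

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: blood pressure signal features, shape (batch, C, T, H, W)
        y_h = x.mean(dim=4, keepdim=True)     # strip convergence along the horizontal direction
        y_v = x.mean(dim=3, keepdim=True)     # strip convergence along the vertical direction
        y_k = x.mean(dim=2, keepdim=True)     # strip convergence along the temporal direction
        y = y_h + y_v + y_k                   # Y_{c,i,j,t}: broadcast sum of the three convergences
        z = torch.sigmoid(self.f(y))          # Z = sigma(f(Y, w))
        f_avg = torch.sigmoid(self.g_avg(z))  # F_avg = sigma(g_avg(Z))
        return f_avg * z                      # Hadamard product of F_avg and Z -> X'
```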
8. A remote examination abnormal state pre-judging system, comprising:
an acquisition module, which is used for respectively acquiring the examinee video resources collected by the RGB camera and the near-infrared camera in the remote examination system, dividing the examinee video resources into multi-frame images in time order, and obtaining examinee RGB multi-frame images and near-infrared multi-frame images;
a recognition module, which is used for inputting the examinee RGB multi-frame images and near-infrared multi-frame images into the trained fine expression recognition model to obtain the facial fine expression recognition result of the examinee in the examination room, and inputting the examinee RGB multi-frame images and near-infrared multi-frame images into the trained facial remote blood pressure recognition model to obtain the blood pressure recognition result of the examinee in the examination room;
and a judging and early-warning module, which is used for judging and giving early warning of the examinee's abnormal state according to the examinee's facial fine expression and blood pressure recognition results.
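For the acquisition module of claim 8, splitting a recorded examinee video into time-ordered frames could look like the OpenCV sketch below; the file names are hypothetical, and a deployed system would read the RGB and near-infrared camera streams directly instead of files.

```python
import cv2

def split_video_into_frames(video_path: str, max_frames: int | None = None) -> list:
    """Split an examinee video resource into a time-ordered list of frames."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok or (max_frames is not None and len(frames) >= max_frames):
            break
        frames.append(frame)  # frames are kept in capture (time) order
    cap.release()
    return frames

# Usage (paths are hypothetical): one clip per camera
rgb_frames = split_video_into_frames("examinee_rgb.mp4")
nir_frames = split_video_into_frames("examinee_nir.mp4")
```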
9. An electronic device, comprising at least one processor and at least one memory module, wherein the memory module stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 7.
10. A storage medium, characterized in that it stores a computer program which, when run on a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN202210604509.1A 2022-05-31 2022-05-31 Remote examination abnormal state pre-judging method, system, equipment and storage medium Pending CN114882570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210604509.1A CN114882570A (en) 2022-05-31 2022-05-31 Remote examination abnormal state pre-judging method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210604509.1A CN114882570A (en) 2022-05-31 2022-05-31 Remote examination abnormal state pre-judging method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114882570A true CN114882570A (en) 2022-08-09

Family

ID=82680414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210604509.1A Pending CN114882570A (en) 2022-05-31 2022-05-31 Remote examination abnormal state pre-judging method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114882570A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116884067A (en) * 2023-07-12 2023-10-13 成都信息工程大学 Micro-expression recognition method based on improved implicit semantic data enhancement


Similar Documents

Publication Publication Date Title
US10089556B1 (en) Self-attention deep neural network for action recognition in surveillance videos
CN111709409B (en) Face living body detection method, device, equipment and medium
CN111310731B (en) Video recommendation method, device, equipment and storage medium based on artificial intelligence
Ciancio et al. No-reference blur assessment of digital pictures based on multifeature classifiers
CN110678875A (en) System and method for guiding user to take self-photo
CN110599421B (en) Model training method, video fuzzy frame conversion method, device and storage medium
Xu et al. Visual quality assessment by machine learning
US20200387718A1 (en) System and method for counting objects
CN113196289A (en) Human body action recognition method, human body action recognition system and device
JP7292492B2 (en) Object tracking method and device, storage medium and computer program
US11501482B2 (en) Anonymization apparatus, surveillance device, method, computer program and storage medium
CN109063643B (en) Facial expression pain degree identification method under condition of partial hiding of facial information
CN109117774B (en) Multi-view video anomaly detection method based on sparse coding
CN110287848A (en) The generation method and device of video
Ding Visual quality assessment for natural and medical image
CN114332911A (en) Head posture detection method and device and computer equipment
CN114882570A (en) Remote examination abnormal state pre-judging method, system, equipment and storage medium
Khan et al. Classification of human's activities from gesture recognition in live videos using deep learning
Chen et al. Sound to visual: Hierarchical cross-modal talking face video generation
CN113269013A (en) Object behavior analysis method, information display method and electronic equipment
Huang et al. Posture-based infant action recognition in the wild with very limited data
CN115690934A (en) Master and student attendance card punching method and device based on batch face recognition
Prajapati et al. Mri-gan: A generalized approach to detect deepfakes using perceptual image assessment
CN115841602A (en) Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles
Chaabouni et al. Prediction of visual attention with Deep CNN for studies of neurodegenerative diseases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination