CN114343612B - Non-contact respiration rate measuring method based on Transformer

Publication number: CN114343612B (grant); application number: CN202210232829.9A; authority: CN (China)
Other versions: CN114343612A (application publication)
Original language: Chinese (zh)
Inventors: 王金桥, 葛国敬, 朱贵波
Assignee: Institute of Automation, Chinese Academy of Sciences
Legal status: Active (granted)
Prior art keywords: sequence, module, respiration rate, transformer, feature extraction

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/08: Detecting, measuring or recording devices for evaluating the respiratory organs
    • A61B 5/0816: Measuring devices for examining respiratory frequency
    • A61B 5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7235: Details of waveform analysis
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B 5/7267: Classification involving training the classification device
    • A61B 5/74: Details of notification to user or communication with user or patient; user input means
    • A61B 5/7475: User input or interface means, e.g. keyboard, pointing device, joystick
    • A61B 5/748: Selection of a region of interest, e.g. using a graphics tablet
    • A61B 5/7485: Automatic selection of region of interest
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Pulmonology (AREA)
  • Physiology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention belongs to the field of machine vision and data identification, and in particular relates to a Transformer-based non-contact respiration rate measurement method, system and device, aiming to solve the problem that models obtained by existing respiration rate measurement methods generalize poorly, so the measured respiration rate is inaccurate. The method comprises the following steps: acquiring a video frame sequence to be measured that contains face information within a set time period; obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured; and obtaining a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence. The invention improves the measurement accuracy of the respiration rate.

Description

Non-contact respiration rate measuring method based on Transformer
Technical Field
The invention relates to the field of machine vision and data identification, and in particular to a Transformer-based non-contact respiration rate measurement method, system and device.
Background
The respiration rate is defined as the number of breaths a person takes per minute at rest. The number of breaths per minute indicates how often the brain instructs the body to breathe. The "normal" respiration rate varies with age. Normal respiration rate ranges for children of different ages are: newborns: 30-60 breaths per minute; infants (1 to 12 months): 30-60 breaths per minute; toddlers (1-2 years old): 24-40 breaths per minute; preschool children (3-5 years old): 22-34 breaths per minute; school-age children (6-12 years old): 18-30 breaths per minute; adolescents (13-17 years old): 12-16 breaths per minute. In general, the respiration rate is measured when the person is at rest.
Early research on respiration rate measurement generally used contact methods; compared with contact heart rate measurement, non-contact respiration rate measurement has so far not drawn enough attention. For example, some traditional methods extract rPPG information, extract the respiration rate change, reject abnormal sample points and then perform spectral analysis to obtain the person's respiration rate at that moment; the measurement performance is poor, that is, the deviation between the predicted result and the true result is large.
Deep learning has been a popular research direction in machine learning in recent years; it has achieved great success in fields such as computer vision and natural language processing, and has also been explored for respiration rate measurement. Existing deep-learning-based methods for measuring the respiration rate from the face have the following shortcomings. First, the existing data sets are not large enough and contain only a small number of samples; given this reality, fine-tuning a well-performing pre-trained model is the practical way to reach relatively good accuracy. Second, the expressive power of a CNN comes mainly from its convolutional layers, which is limited and leads to low measurement accuracy; the Transformer, by contrast, has been highly successful in the NLP field and shows strong modeling ability on time-series data. Time-series prediction built on the Transformer can break through previous limitations; most notably, a Transformer for time series can model long-term and short-term temporal features simultaneously on the basis of its multi-head attention structure. Third, the respiration rate differs from the heart rate in that it is generally measured when the person is at rest, is relatively stable, and does not change rapidly in a short time the way the heart rate can. Based on this, the invention proposes a Transformer-based non-contact respiration rate measurement method.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that models obtained by existing respiration rate measurement methods generalize poorly and the measured respiration rate is therefore inaccurate, the invention provides a Transformer-based non-contact respiration rate measurement method, which comprises the following steps:
step S100, acquiring a video frame sequence to be measured that contains face information within a set time period;
step S200, obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
step S300, obtaining a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
wherein the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
In some preferred embodiments, a sample amplification step is further included between step S200 and step S300:
obtaining face image sets at different scales from the face region-of-interest image sequence by cropping and affine transformation;
performing sample amplification on the face image sets at different scales by partial region erasing and horizontal flipping to obtain an amplified face image set, and sorting the amplified face image set in time order to generate an amplified face region-of-interest image sequence.
In some preferred embodiments, the query in each window of the window multi-head attention layer performs attention calculation only with the keys in the same window, not with all keys in the feature map.
In some preferred embodiments, when the attention mechanism is calculated, the shift window multi-head attention layer first cuts, shifts and re-splices the original feature blocks in sequence, and then calculates the attention between the feature blocks.
In some preferred embodiments, the training method of the end-to-end Transformer model is as follows:
step A100, acquiring a training video frame sequence; based on the training video frame sequence, acquiring a human face region-of-interest image sequence through a human face detection model and a human face key point model; taking a face interesting region image sequence corresponding to a training video frame sequence and a standard respiratory rate sequence thereof as training samples to construct a training sample set;
a200, preprocessing a face region-of-interest image sequence in a training sample set; the preprocessing is to uniformly sample F images as sampling frames to be processed according to a time sequence based on the human face interesting region image sequence;
step A300, inputting the sampling frame to be processed into the end-to-end Transformer model to obtain a predicted respiration rate sequence in a set time period;
step A400, calculating a loss value based on a respiration rate sequence and a standard respiration rate sequence within a set time period predicted by an end-to-end Transformer model, and adjusting parameters of the end-to-end Transformer model;
and step A500, circularly executing the step A200 to the step A400 until a trained end-to-end Transformer model is obtained.
In some preferred embodiments, the loss functions of the end-to-end Transformer model during training are as follows:

$$L_{time} = \frac{1}{T}\sum_{t=1}^{T}\left(x_{pred}(t) - x_{gt}(t)\right)^{2}$$

$$y_i = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(i-gt)^{2}}{2\sigma^{2}}\right)$$

$$L_{LD} = \mathrm{KL}\left(\mathrm{PSD}(gt)\,\|\,\hat{y}\right)$$

$$L_{total} = L_{time} + \alpha L_{CE} + \beta L_{LD}$$

wherein $L_{time}$ denotes the time-domain loss; $T$ denotes the length of the video signal corresponding to the video frame sequence to be measured; $x_{pred}$ denotes the respiration rate sequence within the set time period predicted by the end-to-end Transformer model; $x_{gt}$ denotes the standard respiration rate sequence within the set time period; $\alpha$ and $\beta$ denote preset weights; $L_{CE}$ denotes the cross-entropy loss; $L_{LD}$ denotes the label distribution learning loss; $L_{total}$ denotes the total loss; $\mathrm{PSD}(gt)$ denotes the energy spectral density of the respiration rate GT; $y_i$ denotes the Gaussian-normalized result of the GT (ground truth, i.e. the standard respiration rate sequence) obtained by a respiration rate device; $\hat{y}$ denotes the predicted respiration rate distribution; $\sigma$ denotes the standard deviation; $i$ denotes a number from 1 to the label length; and $N$ denotes the label length.
In a second aspect of the present invention, a Transformer-based non-contact respiration rate measurement system is provided, comprising: a video frame acquisition unit, a region-of-interest extraction unit and a respiration rate prediction unit;
the video frame acquisition unit is configured to acquire a video frame sequence to be measured that contains face information within a set time period;
the region-of-interest extraction unit is configured to obtain a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
the respiration rate prediction unit is configured to obtain a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
In a third aspect of the present invention, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the Transformer-based non-contact respiration rate measurement method described above.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by a computer to implement the Transformer-based non-contact respiration rate measurement method described above.
The invention has the following beneficial effects:
The invention improves the accuracy of respiration rate measurement.
1) The invention differs from the traditional method, which divides respiration rate measurement into four stages (namely extracting rPPG information, extracting the respiration rate change, rejecting abnormal sample points and performing spectral analysis); the invention is a single-stage respiration rate measurement method that directly uses an end-to-end Transformer network structure. Compared with 3D convolution, the Transformer has stronger long-range temporal modeling capability, so the model obtains better feature representations, which further improves the prediction accuracy of the respiration rate.
2) The invention trains the model by jointly optimizing the time-domain loss and the frequency-domain loss, which improves the generalization capability and robustness of the model. The frequency-domain loss is optimized jointly by the cross-entropy loss and the label distribution learning loss; the purpose of the label distribution learning loss is to construct a more reasonable label space from the label space of the samples, compensating for the insufficient supervision signal of plain classification and increasing the amount of information.
Drawings
FIG. 1 is a schematic flow chart of a method for contactless measurement of respiration rate based on Transformer according to an embodiment of the present invention;
FIG. 2 is a block diagram of a Transformer-based non-contact respiration rate measurement system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an end-to-end Transformer model according to an embodiment of the present invention;
FIG. 4 is a Block diagram of a Swin Transformer Block module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a window multi-head attention layer according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a shift window multi-head attention layer according to an embodiment of the present invention;
FIG. 7 is a detailed structural diagram of a multi-head attention according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention relates to a non-contact respiration rate measuring method based on a Transformer, which comprises the following steps:
step S100, acquiring a video frame sequence to be detected containing face information in a set time period;
step S200, acquiring a human face interesting region image sequence through a human face detection model and a human face key point model based on the video frame sequence to be detected;
step S300, acquiring a respiration rate sequence in a set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed on the basis of a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a full connection layer which are connected in sequence;
the preprocessing module is used for carrying out block cutting operation on the input video frame sequence to be detected;
the first-order feature extraction module is constructed on the basis of a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to be tested to a set dimension;
the Swin Transformer Block module comprises a first submodule and a second submodule;
the first submodule is constructed on the basis of a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron which are connected in sequence; the second submodule is constructed on the basis of a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron which are sequentially connected; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed on the basis of a Block fusion module and a Swin transform Block module;
and the block fusion module is used for sequentially carrying out down-sampling, series connection, normalization and linear mapping processing on the input features.
In order to more clearly illustrate the Transformer-based non-contact respiration rate measurement method of the present invention, it is described in detail below with reference to FIG. 1.
In the following embodiments, the construction and training of the end-to-end Transformer model are detailed first, and then the process by which the Transformer-based non-contact respiration rate measurement method obtains the respiration rate sequence of a video frame sequence to be measured within a set time period is described.
1. Construction and training process of end-to-end Transformer model
step A100, acquiring a training video frame sequence; obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the training video frame sequence; and constructing a training sample set by taking the face region-of-interest image sequence corresponding to the training video frame sequence and its standard respiration rate sequence as training samples;
In this embodiment, a training video frame sequence is first obtained; this video frame sequence contains face information within a set time period. A face region-of-interest image sequence is then obtained from the training video frame sequence through the face detection model and the face key point detection model. The face detection model and the face key point detection model are existing models and are not described here one by one.
Finally, the face region-of-interest image sequence and its standard respiration rate sequence are taken as training samples to construct the training sample set. The values in the respiration rate sequence represent respiration rate values corresponding to different time points.
step A200, preprocessing the face region-of-interest image sequences in the training sample set; the preprocessing is to uniformly sample F images in temporal order from each face region-of-interest image sequence as the sampling frames to be processed;
In this embodiment, preferably 16, 32 or more images are uniformly sampled in temporal order from the face region-of-interest image sequence in the training sample set as the sampling frames to be processed. In other embodiments, the sampling may be chosen according to the actual situation, for example by key-frame selection.
step A300, inputting the sampling frames to be processed into the end-to-end Transformer model to obtain a predicted respiration rate sequence within the set time period;
In this embodiment, the preprocessed face region-of-interest image sequence is input into the end-to-end Transformer model, which is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence. The structure of the model is shown in FIG. 3 and is specified as follows:
the preprocessing module is used for carrying out block cutting operation on the input video frame sequence to be detected; for example, the input video frame sequence to be tested is
Figure 84613DEST_PATH_IMAGE020
Figure 378191DEST_PATH_IMAGE021
The time is represented by a time-of-day,
Figure 294195DEST_PATH_IMAGE022
Figure 421551DEST_PATH_IMAGE023
representing width and height, divided into non-overlapping blocks by a dicing operation, e.g. using
Figure 55794DEST_PATH_IMAGE024
The block of (a) is taken as token, then the output of the first-order characteristic extraction module after the segmentation is
Figure 153063DEST_PATH_IMAGE025
(ii) a Where 96 is the characteristic dimension of each cut.
The first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module.
The linear mapping module is used for mapping the cut video frame sequence to a set dimension; for example, the $\frac{T}{2}\times\frac{H}{4}\times\frac{W}{4}\times 96$ feature above is mapped to a $\frac{T}{2}\times\frac{H}{4}\times\frac{W}{4}\times C$ output, where $C$ denotes the mapping dimension. In the present invention, $C$ is preferably 128 or 192; in other embodiments, $C$ may be selected according to the actual situation.
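A minimal PyTorch sketch of the block cutting plus linear mapping described above; the 2x4x4 block size and the use of a strided 3D convolution are illustrative assumptions consistent with the 96-dimensional cut features:

```python
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Sketch of the preprocessing (block cutting) plus linear mapping.

    Assumes a 2x4x4 (frames x height x width) block as one token, so each
    token carries 2*4*4*3 = 96 raw values, which are then mapped to C dims.
    """
    def __init__(self, patch=(2, 4, 4), in_ch=3, embed_dim=128):
        super().__init__()
        # A strided 3D convolution cuts non-overlapping blocks and linearly
        # projects them in one step.
        self.proj = nn.Conv3d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, 3, T, H, W)
        x = self.proj(x)                       # (B, C, T/2, H/4, W/4)
        return x.flatten(2).transpose(1, 2)    # (B, num_tokens, C)
```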
The Swin Transformer Block module comprises a first sub-module and a second sub-module; as shown in FIG. 4, the first sub-module is the left half of FIG. 4 and the second sub-module is the right half.
The first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron (constructed based on a fully connected layer, an activation function layer, a Dropout layer, a fully connected layer and a Dropout layer connected in sequence); the first attention layer is a window multi-head attention layer (i.e., the W-MSA layer); the second attention layer is a shift window multi-head attention layer (i.e., the SW-MSA layer). Layer normalization is applied before each multi-head attention module and each perceptron module in the first and second sub-modules, and a residual connection is applied after each multi-head attention and each multilayer perceptron, as shown in FIG. 4.
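A sketch of the multilayer perceptron as described (fully connected layer, activation function layer, Dropout, fully connected layer, Dropout); the GELU activation, hidden ratio and dropout rate are our assumptions, since the patent does not fix them:

```python
import torch.nn as nn

class Mlp(nn.Module):
    """Multilayer perceptron: FC -> activation -> Dropout -> FC -> Dropout."""
    def __init__(self, dim, hidden_ratio=4, drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * hidden_ratio),
            nn.GELU(),                # activation function layer (GELU assumed)
            nn.Dropout(drop),
            nn.Linear(dim * hidden_ratio, dim),
            nn.Dropout(drop),
        )

    def forward(self, x):
        return self.net(x)
```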
Multi-head attention contains several heads. It differs from the plain attention mechanism in that the inputs $Q$, $K$ and $V$ are split into multiple $Q_i$, $K_i$ and $V_i$; the attentions are computed independently and their results are then integrated, which helps prevent overfitting. The benefit is analogous to having several copies of the same neural network whose weights differ because of different initialization, and then integrating their results for a weighted judgment. In the present invention, the attention mechanism adopted by each head of the multi-head attention, as shown in FIG. 7, is as follows:
The output of the normalization layer is multiplied by weight matrices to obtain q, k and v:

$$q_i^{l} = \mathrm{LN}\left(z^{l-1}\right) W_i^{q} \qquad (1)$$

$$k_i^{l} = \mathrm{LN}\left(z^{l-1}\right) W_i^{k} \qquad (2)$$

$$v_i^{l} = \mathrm{LN}\left(z^{l-1}\right) W_i^{v} \qquad (3)$$

wherein $z^{l-1}$ denotes the input of the $l$-th first/second sub-module, $\mathrm{LN}(\cdot)$ denotes the layer normalization operation performed by the normalization layer within the first/second sub-module, $q_i^{l}$, $k_i^{l}$ and $v_i^{l}$ denote the query, key and value of the $i$-th head in the $l$-th first/second sub-module, and $W_i^{q}$, $W_i^{k}$ and $W_i^{v}$ denote weight matrices.
Calculating the dot product of q and k, and multiplying the result obtained by the dot product calculation by v as a coefficient after the result passes through an activation function layer and a Dropout layer in sequence;
and outputting the result obtained by multiplying after passing through a linear layer and a normalization layer, namely outputting the result which is the output of a single head in the multi-head attention. In addition, the input in fig. 7 refers to the input of multi-head attention, namely, the inputGo outReferred to as the output of a single head in a multi-head concentration.
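The per-head computation of equations (1)-(3) and the FIG. 7 description can be sketched as follows; the softmax activation and the 1/sqrt(d) scaling are standard assumptions the patent does not spell out:

```python
import torch.nn as nn

class SingleHead(nn.Module):
    """One attention head following equations (1)-(3) and the FIG. 7 flow."""
    def __init__(self, dim, head_dim, drop=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)                    # LN(z^{l-1})
        self.w_q = nn.Linear(dim, head_dim, bias=False)  # W^q
        self.w_k = nn.Linear(dim, head_dim, bias=False)  # W^k
        self.w_v = nn.Linear(dim, head_dim, bias=False)  # W^v
        self.drop = nn.Dropout(drop)
        self.out = nn.Sequential(nn.Linear(head_dim, head_dim),
                                 nn.LayerNorm(head_dim))
        self.scale = head_dim ** -0.5

    def forward(self, z):                    # z: (B, N, dim), sub-module input
        h = self.norm(z)
        q, k, v = self.w_q(h), self.w_k(h), self.w_v(h)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # dot product of q and k
        attn = self.drop(attn.softmax(dim=-1))         # activation, then Dropout
        return self.out(attn @ v)                      # times v, then linear + norm
```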
The outputs of all heads are integrated to form the output of the multi-head attention. In each feature extraction stage containing a block fusion module, the output feature size is half the input feature size and the number of output channels is twice the number of input channels.
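A sketch of the block fusion module (down-sampling, concatenation, normalization, linear mapping), which realizes the halving of the feature size and the doubling of channels; the even/odd sub-sampling pattern follows common Swin-style patch merging and is our assumption:

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Block fusion sketch: down-sample, concatenate, normalize, linearly map;
    the spatial size is halved and the channel count doubled."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                 # x: (B, T, H, W, C)
        # Down-sample by taking every second row/column, then concatenate.
        x0 = x[:, :, 0::2, 0::2, :]
        x1 = x[:, :, 1::2, 0::2, :]
        x2 = x[:, :, 0::2, 1::2, :]
        x3 = x[:, :, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)   # (B, T, H/2, W/2, 4C)
        return self.reduction(self.norm(x))       # (B, T, H/2, W/2, 2C)
```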
As shown in FIG. 5, the window multi-head attention layer partitions the first-order to fourth-order features into windows. Compared with the conventional Transformer structure, the query in each window performs attention calculation only with the keys in the same window rather than with all keys in the feature map, which reduces the amount of computation (for example, if the feature map is divided into four windows, the computation is reduced to roughly 1/4), thereby reducing the time complexity and speeding up forward inference.
As shown in FIG. 6, the shift window multi-head attention layer solves the problem that the window multi-head attention layer computes attention only within each feature block and never between feature blocks: the original feature blocks are re-cut, shifted and re-spliced, so that attention between feature blocks can also be calculated.
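A minimal sketch of window partitioning and of the cyclic shift applied before re-partitioning in the shift window layer (window-size handling is simplified, and the masking of wrapped-around positions is omitted):

```python
import torch

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def shift_then_partition(x, ws):
    """Cyclically shift the feature map before windowing, so that attention is
    also computed across the boundaries of the original windows (SW-MSA)."""
    shifted = torch.roll(x, shifts=(-ws // 2, -ws // 2), dims=(1, 2))
    return window_partition(shifted, ws)
```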
Based on the end-to-end Transformer model, the predicted respiration rate sequence within the set time period is obtained as follows (a structural sketch of this forward process is given after the steps):
step S310, preprocessing the F sampling frames to be processed to obtain F embedded vectors, including: dividing the sampling frames to be processed into F x N sampling blocks of size P x P, where each sampling frame corresponds to N sampling blocks;
flattening each sampling block into a vector to obtain a vector to be processed, and obtaining an embedded vector to be processed from it through linear mapping;
stacking the embedded vectors to be processed that correspond to the same sampling frame to obtain F embedded vectors;
step S320, outputting the extracted feature vector through the first-order, second-order, third-order and fourth-order feature extraction modules based on the embedded vectors;
step S330, obtaining the predicted respiration rate sequence within the set time period through the fully connected layer based on the extracted feature vector.
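Putting steps S310-S330 together, a structural PyTorch sketch; SwinStage here is a stand-in for one block fusion plus Swin Transformer Block stage (a real stage would use the modules sketched earlier), and PatchEmbed3D is the illustrative patch embedding defined above:

```python
import torch.nn as nn

class SwinStage(nn.Module):
    """Placeholder for one feature extraction stage (block fusion + Swin
    Transformer Block); a linear layer stands in for merging and attention."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)

    def forward(self, x):
        return self.proj(x)

class RespRateSwin(nn.Module):
    """Skeleton of the end-to-end model: patch embedding (step S310), four
    stages (step S320) and a fully connected head (step S330)."""
    def __init__(self, embed_dim=128, out_len=32):
        super().__init__()
        self.embed = PatchEmbed3D(embed_dim=embed_dim)   # sketched earlier
        dims = [embed_dim, embed_dim, embed_dim * 2, embed_dim * 4]
        outs = [embed_dim, embed_dim * 2, embed_dim * 4, embed_dim * 8]
        self.stages = nn.ModuleList(
            SwinStage(d, o) for d, o in zip(dims, outs))
        self.head = nn.Linear(embed_dim * 8, out_len)

    def forward(self, frames):                  # frames: (B, 3, F, H, W)
        tokens = self.embed(frames)             # (B, N, C)
        for stage in self.stages:
            tokens = stage(tokens)
        return self.head(tokens.mean(dim=1))    # respiration rate sequence
```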
Step A400, calculating a loss value based on a respiration rate sequence and a standard respiration rate sequence within a set time period predicted by an end-to-end Transformer model, and adjusting parameters of the end-to-end Transformer model;
in this embodiment, a respiration rate sequence within a set time period is predicted by an end-to-end Transformer model, a loss value is calculated by a loss function pre-constructed in the present invention in combination with a standard respiration rate sequence, and an end-to-end Transformer model parameter is adjusted according to the loss value.
For the label distribution learning loss, the original respiration rate is first normalized with a Gaussian distribution during the training stage, specifically:

$$y_i = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(i-gt)^{2}}{2\sigma^{2}}\right) \qquad (1)$$

wherein $y_i$ denotes the Gaussian-normalized value of the GT (ground truth, i.e. the annotation data (standard respiration rate sequence), such as the respiration rate change of a person within 1 s) obtained by a respiration rate device, $\sigma$ denotes the standard deviation, $i$ denotes a number from 1 to the label length, and $N$ denotes the label length.
Then, the label distribution learning loss is calculated:

$$L_{LD} = \mathrm{KL}\left(\mathrm{PSD}(gt)\,\|\,\hat{y}\right) \qquad (2)$$

wherein $\mathrm{PSD}(gt)$ is the energy spectral density of the respiration rate GT, $\hat{y}$ is the predicted respiration rate distribution, and $\mathrm{KL}$ is the relative entropy (KL divergence).
The total loss function of the end-to-end Transformer model is constructed by combining the label distribution learning loss:

$$L_{time} = \frac{1}{T}\sum_{t=1}^{T}\left(x_{pred}(t) - x_{gt}(t)\right)^{2} \qquad (3)$$

$$L_{total} = L_{time} + \alpha L_{CE} + \beta L_{LD} \qquad (4)$$

wherein $L_{time}$ denotes the time-domain loss, $T$ denotes the length of the video signal corresponding to the video frame sequence to be measured, $x_{pred}$ denotes the respiration rate sequence within the set time period predicted by the end-to-end Transformer model, $x_{gt}$ denotes the standard respiration rate sequence within the set time period, $\alpha$ and $\beta$ denote preset weights, $L_{CE}$ denotes the cross-entropy loss, and $L_{total}$ denotes the total loss.
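A hedged PyTorch sketch of equations (1)-(4); here the Gaussian label of equation (1) stands in for the energy spectral density of the GT in the KL term, and alpha, beta and sigma are placeholder values, since the patent does not state them:

```python
import math
import torch
import torch.nn.functional as F

def gaussian_label(gt_rate, num_bins, sigma=1.0):
    """Equation (1): spread the scalar GT respiration rate over the label space."""
    i = torch.arange(1, num_bins + 1, dtype=torch.float32)
    y = torch.exp(-(i - gt_rate) ** 2 / (2 * sigma ** 2)) \
        / (sigma * math.sqrt(2 * math.pi))
    return y / y.sum()                        # normalize to a distribution

def total_loss(x_pred, x_gt, logits, gt_rate, alpha=1.0, beta=1.0, sigma=1.0):
    """Equations (3)-(4): time-domain loss plus weighted cross-entropy and
    label distribution (KL) losses."""
    l_time = F.mse_loss(x_pred, x_gt)                    # time-domain term
    target = gaussian_label(gt_rate, logits.shape[-1], sigma)
    l_ce = F.cross_entropy(logits.unsqueeze(0),
                           target.argmax().unsqueeze(0)) # cross-entropy term
    log_pred = F.log_softmax(logits, dim=-1)
    l_ld = F.kl_div(log_pred, target, reduction='sum')   # equation (2), KL term
    return l_time + alpha * l_ce + beta * l_ld
```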
And step A500, circularly executing the step A200 to the step A400 until a trained end-to-end Transformer model is obtained.
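Steps A200-A500 then reduce to a standard supervised loop; the sketch below reuses the total_loss sketch above, uses the predicted sequence as the distribution logits purely for illustration, and picks AdamW and the epoch count arbitrarily:

```python
import torch

def train(model, loader, epochs=30, lr=1e-4):
    """Sketch of steps A200-A500 (optimizer, learning rate and epoch count
    are our assumptions, not values from the patent)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):                         # step A500: repeat A200-A400
        for frames, x_gt, gt_rate in loader:        # A200: sampled ROI frames
            x_pred = model(frames)                  # A300: predicted sequence
            loss = total_loss(x_pred, x_gt,         # A400: compute the loss...
                              logits=x_pred, gt_rate=gt_rate)
            opt.zero_grad()
            loss.backward()                         # ...and adjust parameters
            opt.step()
    return model
```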
2. Non-contact respiration rate measuring method based on Transformer
step S100, acquiring a video frame sequence to be measured that contains face information within a set time period;
step S200, obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
step S300, obtaining a respiration rate sequence within the set time period through the trained end-to-end Transformer model based on the face region-of-interest image sequence.
In this embodiment, the obtained face region-of-interest image sequence is input into the trained end-to-end Transformer model to obtain the respiration rate sequence within the set time period; the invention preferably outputs a real-time respiration rate result every 1 s.
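End to end, steps S100-S300 at inference time look like the following sketch; face_detector, landmark_model, crop_roi and to_tensor are hypothetical stand-ins for the existing face models and glue code the patent refers to, and uniform_sample_frames is the earlier sampling sketch:

```python
import cv2

def measure_respiration_rate(video_path, face_detector, landmark_model, model,
                             num_frames=32):
    """Inference sketch for steps S100-S300: read frames, crop the face ROI
    with off-the-shelf face models, then predict the respiration rate sequence."""
    cap = cv2.VideoCapture(video_path)
    rois = []
    ok, frame = cap.read()
    while ok:                                      # S100: video frame sequence
        box = face_detector(frame)                 # S200: detect the face,
        points = landmark_model(frame, box)        # locate the key points,
        rois.append(crop_roi(frame, box, points))  # and crop the ROI (hypothetical helper)
        ok, frame = cap.read()
    cap.release()
    frames = to_tensor(uniform_sample_frames(rois, num_frames))  # hypothetical helper
    return model(frames)                           # S300: respiration rate sequence
```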
In addition, a sample amplification step is further included between step S200 and step S300:
obtaining face image sets at different scales from the face region-of-interest image sequence by cropping and affine transformation;
performing sample amplification on the face image sets at different scales by partial region erasing and horizontal flipping to obtain an amplified face image set, and sorting the amplified face image set in time order to generate an amplified face region-of-interest image sequence.
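A minimal OpenCV sketch of this sample amplification; the rotation/scale ranges, erase size and flip probability are illustrative choices, not values from the patent:

```python
import random
import cv2

def augment_roi(img):
    """Sample amplification sketch: affine transform to vary scale,
    random partial-region erasing and horizontal (left-right) flipping."""
    h, w = img.shape[:2]
    # Affine warp with random rotation and scale yields faces at different
    # scales (a random crop could be added similarly).
    m = cv2.getRotationMatrix2D((w / 2, h / 2),
                                random.uniform(-10, 10),
                                random.uniform(0.8, 1.2))
    img = cv2.warpAffine(img, m, (w, h))
    # Erase a random partial region.
    eh, ew = h // 8, w // 8
    y, x = random.randint(0, h - eh), random.randint(0, w - ew)
    img[y:y + eh, x:x + ew] = 0
    # Horizontal flip with probability 0.5.
    if random.random() < 0.5:
        img = cv2.flip(img, 1)
    return img
```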
A second embodiment of the invention is a Transformer-based non-contact respiration rate measurement system, as shown in FIG. 2, comprising: a video frame acquisition unit 100, a region-of-interest extraction unit 200 and a respiration rate prediction unit 300;
the video frame acquisition unit 100 is configured to acquire a video frame sequence to be measured that contains face information within a set time period;
the region-of-interest extraction unit 200 is configured to obtain a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
the respiration rate prediction unit 300 is configured to obtain a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that the Transformer-based non-contact respiration rate measurement system provided in the above embodiment is only illustrated by the division of the above functional modules; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules in the above embodiment may be combined into one module, or further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps and are not to be construed as unduly limiting the present invention.
An electronic device of a third embodiment of the present invention comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the Transformer-based non-contact respiration rate measurement method described above.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by a computer to implement the Transformer-based non-contact respiration rate measurement method described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the electronic device and the computer-readable storage medium described above may refer to corresponding processes in the foregoing method examples, and are not described herein again.
Referring now to FIG. 8, there is illustrated a block diagram of a computer system suitable for use as a server in implementing embodiments of the system, method and apparatus of the present application. The server shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the computer system includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM803, various programs and data necessary for system operation are also stored. The CPU801, ROM 802, and RAM803 are connected to each other via a bus 804. An Input/Output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube, a liquid crystal display, and the like, and a speaker and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a local area network card, modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted on the storage section 808 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the CPU801, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer-readable storage medium may be, for example but not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, etc., or any suitable combination of the foregoing.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including object oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network or a wide area network, or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is apparent to those skilled in the art that the scope of the present invention is not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (9)

1. A method for contactless measurement of respiration rate based on Transformer, comprising:
step S100, acquiring a video frame sequence to be measured that contains face information within a set time period;
step S200, obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
step S300, obtaining a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
wherein the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
2. The Transformer-based non-contact respiration rate measurement method according to claim 1, wherein a sample amplification step is further included between step S200 and step S300:
obtaining face image sets at different scales from the face region-of-interest image sequence by cropping and affine transformation;
performing sample amplification on the face image sets at different scales by partial region erasing and horizontal flipping to obtain an amplified face image set, and sorting the amplified face image set in time order to generate an amplified face region-of-interest image sequence.
3. The method of claim 1, wherein the query in each window of the window multi-head attention layer performs attention calculation with the keys in the same window, and does not perform attention calculation with all the keys in the feature map.
4. The Transformer-based non-contact respiration rate measurement method according to claim 1, wherein, when the attention mechanism is calculated, the shift window multi-head attention layer first cuts, shifts and splices the original feature blocks in sequence, and then the attention between the feature blocks is calculated.
5. The method for contactless measurement of respiration rate based on Transformer according to claim 1, wherein the training method of the end-to-end Transformer model is as follows:
step A100, acquiring a training video frame sequence; obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the training video frame sequence; and constructing a training sample set by taking the face region-of-interest image sequence corresponding to the training video frame sequence and its standard respiration rate sequence as training samples;
step A200, preprocessing the face region-of-interest image sequences in the training sample set; the preprocessing is to uniformly sample F images in temporal order from each face region-of-interest image sequence as the sampling frames to be processed;
step A300, inputting the sampling frame to be processed into the end-to-end Transformer model to obtain a predicted respiration rate sequence in a set time period;
step A400, calculating a loss value based on a respiration rate sequence and a standard respiration rate sequence within a set time period predicted by an end-to-end Transformer model, and adjusting parameters of the end-to-end Transformer model;
and step A500, circularly executing the step A200 to the step A400 until a trained end-to-end Transformer model is obtained.
6. The Transformer-based non-contact respiration rate measurement method according to claim 5, wherein the loss function of the end-to-end Transformer model during training is:

$L_{time} = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_t - y_t\right)^2$

$L_{CE} = -\sum_{i=1}^{n} p_i \log q_i$

$L_{LD} = D_{KL}(p \,\|\, q) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i}$

$L_{total} = L_{time} + \alpha L_{CE} + \beta L_{LD}$

wherein $L_{time}$ represents the time-domain loss; $T$ represents the length of the video signal corresponding to the video frame sequence to be tested; $\hat{y}$ represents the respiration rate sequence within the set time period predicted by the end-to-end Transformer model; $y$ represents the standard respiration rate sequence within the set time period; $\alpha$ and $\beta$ represent preset weights; $L_{CE}$ represents the cross-entropy loss; $L_{LD}$ represents the label distribution learning loss; $L_{total}$ represents the total loss; $p$ represents the energy spectral density of the respiration rate GT, GT being the ground truth, i.e. the standard respiration rate sequence obtained by a respiration rate measurement device, normalized by a Gaussian distribution whose standard deviation is $\sigma$; $q$ represents the corresponding predicted distribution; $i$ represents an index running from 1 to the label length; $n$ represents the label length; and $D_{KL}$ represents the relative entropy (KL) divergence.
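The formulas above are reconstructed from the surviving variable definitions, since the originals were embedded as images; the exact functional forms (mean squared error for the time-domain term, a softmax cross-entropy, a KL divergence against a Gaussian-smoothed label distribution) are therefore assumptions. Under those assumptions, the combined loss can be sketched as:

```python
import torch
import torch.nn.functional as F

def gaussian_label_distribution(gt_idx, n, sigma=1.0):
    """Gaussian-normalized label distribution centered on the GT respiration
    rate bin (an assumption consistent with the sigma in claim 6)."""
    i = torch.arange(n, dtype=torch.float32)
    p = torch.exp(-(i - gt_idx) ** 2 / (2 * sigma ** 2))
    return p / p.sum()

def total_loss(pred_seq, gt_seq, pred_logits, gt_idx, alpha=1.0, beta=1.0):
    """L_total = L_time + alpha*L_CE + beta*L_LD, per the reconstruction above.
    pred_logits: scores over n respiration-rate bins; gt_idx: GT bin index."""
    n = pred_logits.shape[-1]
    l_time = F.mse_loss(pred_seq, gt_seq)               # time-domain loss
    l_ce = F.cross_entropy(pred_logits.unsqueeze(0),
                           torch.tensor([gt_idx]))      # cross-entropy loss
    p = gaussian_label_distribution(gt_idx, n)          # GT label distribution
    log_q = F.log_softmax(pred_logits, dim=-1)          # predicted distribution
    l_ld = F.kl_div(log_q, p, reduction="sum")          # relative entropy (KL)
    return l_time + alpha * l_ce + beta * l_ld
```

Setting `alpha` and `beta` to zero reduces this to a pure time-domain regression, which makes the contribution of the two distribution-based terms easy to ablate.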
7. A Transformer-based non-contact respiration rate measurement system, comprising a video frame acquisition unit, a region-of-interest extraction unit and a respiration rate prediction unit;
the video frame acquisition unit is configured to acquire a video frame sequence to be tested containing face information within a set time period;
the region-of-interest extraction unit is configured to obtain a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be tested;
the respiration rate prediction unit is configured to obtain a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed on the basis of a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer which are connected in sequence;
the preprocessing module is used for performing a block-cutting operation on the input video frame sequence to be tested;
the first-order feature extraction module is constructed on the basis of a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to be tested to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed on the basis of a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed on the basis of a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a windowed multi-head attention layer; the second attention layer is a shifted-window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed on the basis of a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
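The block fusion module recited in claims 1 and 7 matches the patch-merging pattern of Swin-style networks. A minimal sketch, assuming 2x2 down-sampling and a 4C-to-2C linear reduction — both assumptions, since the claim fixes only the order of operations (down-sampling, concatenation, normalization, linear mapping):

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Block-fusion sketch: 2x down-sampling by taking the four pixel phases,
    concatenating them on the channel axis, normalizing, then linearly mapping
    4*C channels down to 2*C (a common Swin convention; dims are assumptions)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                        # x: (B, H, W, C) with even H, W
        x0 = x[:, 0::2, 0::2, :]                 # down-sample: every 2nd row/col
        x1 = x[:, 1::2, 0::2, :]
        x2 = x[:, 0::2, 1::2, :]
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # concatenation -> 4*C channels
        return self.reduction(self.norm(x))      # normalize, then linear mapping

feat = torch.randn(1, 56, 56, 96)
print(PatchMerging(96)(feat).shape)              # torch.Size([1, 28, 28, 192])
```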
8. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the processor to implement the Transformer-based non-contact respiration rate measurement method of any one of claims 1-6.
9. A computer-readable storage medium storing computer instructions, the computer instructions being executed by a computer to implement the Transformer-based non-contact respiration rate measurement method of any one of claims 1-6.
CN202210232829.9A 2022-03-10 2022-03-10 Non-contact respiration rate measuring method based on Transformer Active CN114343612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210232829.9A CN114343612B (en) 2022-03-10 2022-03-10 Non-contact respiration rate measuring method based on Transformer


Publications (2)

Publication Number Publication Date
CN114343612A CN114343612A (en) 2022-04-15
CN114343612B true CN114343612B (en) 2022-05-24

Family

ID=81094417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210232829.9A Active CN114343612B (en) 2022-03-10 2022-03-10 Non-contact respiration rate measuring method based on Transformer

Country Status (1)

Country Link
CN (1) CN114343612B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115844424B (en) * 2022-10-17 2023-09-22 北京大学 Sleep spindle wave hierarchical identification method and system


Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US7918801B2 (en) * 2005-12-29 2011-04-05 Medility Llc Sensors for monitoring movements, apparatus and systems therefor, and methods for manufacture and use
US8792969B2 (en) * 2012-11-19 2014-07-29 Xerox Corporation Respiratory function estimation from a 2D monocular video
US20170055878A1 (en) * 2015-06-10 2017-03-02 University Of Connecticut Method and system for respiratory monitoring
CN105520724A (en) * 2016-02-26 2016-04-27 严定远 Method for measuring heart rate and respiratory frequency of human body
WO2018188993A1 (en) * 2017-04-14 2018-10-18 Koninklijke Philips N.V. Person identification systems and methods
CN110367950B (en) * 2019-07-22 2022-06-07 西安奇点融合信息科技有限公司 Non-contact physiological information detection method and system
CN112200162B (en) * 2020-12-03 2021-02-23 中国科学院自动化研究所 Non-contact heart rate measuring method, system and device based on end-to-end network

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN113197558A (en) * 2021-03-26 2021-08-03 中南大学 Heart rate and respiratory rate detection method and system and computer storage medium
CN113255635A (en) * 2021-07-19 2021-08-13 中国科学院自动化研究所 Multi-mode fused psychological stress analysis method
CN113408508A (en) * 2021-08-20 2021-09-17 中国科学院自动化研究所 Transformer-based non-contact heart rate measurement method

Non-Patent Citations (4)

Title
Improving Accuracy of Respiratory Rate Estimation by Restoring High Resolution Features with Transformers and Recursive Convolutional Models; Kwasniewska A, et al.; Proceedings of the IEEE; 2021-12-31; full text *
Instantaneous Physiological Estimation using Video Transformers; Revanur A, et al.; arXiv preprint arXiv:2202.12368; 2022-02-24; full text *
Anxiety state recognition based on the Relief feature selection algorithm and multiple physiological signals; Lei Pei, et al.; Chinese Journal of Medical Instrumentation; 2014-12-31; Vol. 38, No. 3; full text *
Non-interference respiration detection method based on a time-frequency information fusion network; Shen Jianfei, et al.; High Technology Letters; 2020-12-31; Vol. 30, No. 10; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant