CN114343612B - Non-contact respiration rate measuring method based on Transformer

Publication number: CN114343612B (grant); application number: CN202210232829.9A; authority: CN (China)
Other versions: CN114343612A (application publication)
Original language: Chinese (zh)
Inventors: 王金桥, 葛国敬, 朱贵波
Assignee: Institute of Automation, Chinese Academy of Sciences
Legal status: Active (granted)
Prior art keywords: sequence, module, respiration rate, transformer, feature extraction

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/08: Detecting, measuring or recording devices for evaluating the respiratory organs
    • A61B 5/0816: Measuring devices for examining respiratory frequency
    • A61B 5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7235: Details of waveform analysis
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B 5/7267: Classification involving training the classification device
    • A61B 5/74: Details of notification to user or communication with user or patient; user input means
    • A61B 5/7475: User input or interface means, e.g. keyboard, pointing device, joystick
    • A61B 5/748: Selection of a region of interest, e.g. using a graphics tablet
    • A61B 5/7485: Automatic selection of region of interest
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Pulmonology (AREA)
  • Physiology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention belongs to the field of machine vision and data identification, and in particular relates to a Transformer-based non-contact respiration rate measurement method, system and device, aiming to solve the problem that models obtained by existing respiration rate measurement methods generalize poorly, so the measured respiration rate is inaccurate. The method comprises the following steps: acquiring a video frame sequence to be measured that contains face information within a set time period; obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured; and obtaining a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence. The invention improves the measurement accuracy of the respiration rate.

Description

Non-contact respiration rate measuring method based on Transformer
Technical Field
The invention relates to the field of machine vision and data identification, and in particular to a Transformer-based non-contact respiration rate measurement method, system and device.
Background
The respiration rate is defined as the number of breaths a person takes per minute at rest. The number of breaths per minute indicates how often the brain instructs the body to breathe. The "normal" respiration rate varies with age. Normal respiration rate ranges for children of different ages are: newborns: 30-60 breaths per minute; infants (1 to 12 months): 30-60 breaths per minute; toddlers (1-2 years old): 24-40 breaths per minute; preschool children (3-5 years old): 22-34 breaths per minute; school-age children (6-12 years old): 18-30 breaths per minute; adolescents (13-17 years old): 12-16 breaths per minute. In general, the respiration rate is measured when the person is at rest.
Early research on respiration rate measurement generally used contact methods; compared with contact heart rate measurement, non-contact respiration rate measurement has so far not drawn enough attention. For example, some traditional methods extract rPPG information, extract the respiration rate change, reject abnormal sample points and then perform spectral analysis to obtain the person's respiration rate at that moment; the measurement performance is poor, that is, the deviation between the predicted result and the true result is large.
Deep learning has been a popular research direction in machine learning in recent years; it has achieved great success in fields such as computer vision and natural language processing, and has also been explored for respiration rate measurement. Existing deep-learning-based methods for measuring the respiration rate from the face have the following shortcomings. First, the existing data sets are not large enough and contain only a small number of samples; given this reality, fine-tuning a well-performing pre-trained model is the practical way to reach relatively good accuracy. Second, the expressive power of a CNN comes mainly from its convolutional layers, which is limited and leads to low measurement accuracy; the Transformer, by contrast, has been highly successful in the NLP field and shows strong modeling ability on time-series data. Time-series prediction built on the Transformer can break through previous limitations; most notably, a Transformer for time series can model long-term and short-term temporal features simultaneously on the basis of its multi-head attention structure. Third, the respiration rate differs from the heart rate in that it is generally measured when the person is at rest, is relatively stable, and does not change rapidly in a short time the way the heart rate can. Based on this, the invention proposes a Transformer-based non-contact respiration rate measurement method.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that models obtained by existing respiration rate measurement methods generalize poorly and the measured respiration rate is therefore inaccurate, the invention provides a Transformer-based non-contact respiration rate measurement method, which comprises the following steps:
step S100, acquiring a video frame sequence to be measured that contains face information within a set time period;
step S200, obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
step S300, obtaining a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
wherein the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
In some preferred embodiments, a sample amplification step is further included between step S200 and step S300:
obtaining face image sets at different scales from the face region-of-interest image sequence by cropping and affine transformation;
performing sample amplification on the face image sets at different scales by partial region erasing and horizontal flipping to obtain an amplified face image set, and sorting the amplified face image set in time order to generate an amplified face region-of-interest image sequence.
In some preferred embodiments, the query in each window of the window multi-head attention layer performs attention calculation only with the keys in the same window, not with all keys in the feature map.
In some preferred embodiments, when the attention mechanism is calculated, the shift window multi-head attention layer first cuts, shifts and re-splices the original feature blocks in sequence, and then calculates the attention between the feature blocks.
In some preferred embodiments, the training method of the end-to-end Transformer model is as follows:
step A100, acquiring a training video frame sequence; based on the training video frame sequence, acquiring a human face region-of-interest image sequence through a human face detection model and a human face key point model; taking a face interesting region image sequence corresponding to a training video frame sequence and a standard respiratory rate sequence thereof as training samples to construct a training sample set;
a200, preprocessing a face region-of-interest image sequence in a training sample set; the preprocessing is to uniformly sample F images as sampling frames to be processed according to a time sequence based on the human face interesting region image sequence;
step A300, inputting the sampling frame to be processed into the end-to-end Transformer model to obtain a predicted respiration rate sequence in a set time period;
step A400, calculating a loss value based on a respiration rate sequence and a standard respiration rate sequence within a set time period predicted by an end-to-end Transformer model, and adjusting parameters of the end-to-end Transformer model;
and step A500, circularly executing the step A200 to the step A400 until a trained end-to-end Transformer model is obtained.
In some preferred embodiments, the loss functions of the end-to-end Transformer model during training are as follows:

$$L_{time} = \frac{1}{T}\sum_{t=1}^{T}\left(x_{pred}(t) - x_{gt}(t)\right)^{2}$$

$$y_i = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(i-gt)^{2}}{2\sigma^{2}}\right)$$

$$L_{LD} = \mathrm{KL}\left(\mathrm{PSD}(gt)\,\|\,\hat{y}\right)$$

$$L_{total} = L_{time} + \alpha L_{CE} + \beta L_{LD}$$

wherein $L_{time}$ denotes the time-domain loss; $T$ denotes the length of the video signal corresponding to the video frame sequence to be measured; $x_{pred}$ denotes the respiration rate sequence within the set time period predicted by the end-to-end Transformer model; $x_{gt}$ denotes the standard respiration rate sequence within the set time period; $\alpha$ and $\beta$ denote preset weights; $L_{CE}$ denotes the cross-entropy loss; $L_{LD}$ denotes the label distribution learning loss; $L_{total}$ denotes the total loss; $\mathrm{PSD}(gt)$ denotes the energy spectral density of the respiration rate GT; $y_i$ denotes the Gaussian-normalized result of the GT (ground truth, i.e. the standard respiration rate sequence) obtained by a respiration rate device; $\hat{y}$ denotes the predicted respiration rate distribution; $\sigma$ denotes the standard deviation; $i$ denotes a number from 1 to the label length; and $N$ denotes the label length.
In a second aspect of the present invention, a Transformer-based non-contact respiration rate measurement system is provided, comprising: a video frame acquisition unit, a region-of-interest extraction unit and a respiration rate prediction unit;
the video frame acquisition unit is configured to acquire a video frame sequence to be measured that contains face information within a set time period;
the region-of-interest extraction unit is configured to obtain a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
the respiration rate prediction unit is configured to obtain a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
In a third aspect of the present invention, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the Transformer-based non-contact respiration rate measurement method described above.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by a computer to implement the Transformer-based non-contact respiration rate measurement method described above.
The invention has the following beneficial effects:
The invention improves the accuracy of respiration rate measurement.
1) The invention differs from the traditional method, which divides respiration rate measurement into four stages (namely extracting rPPG information, extracting the respiration rate change, rejecting abnormal sample points and performing spectral analysis); the invention is a single-stage respiration rate measurement method that directly uses an end-to-end Transformer network structure. Compared with 3D convolution, the Transformer has stronger long-range temporal modeling capability, so the model obtains better feature representations, which further improves the prediction accuracy of the respiration rate.
2) The invention trains the model by jointly optimizing the time-domain loss and the frequency-domain loss, which improves the generalization capability and robustness of the model. The frequency-domain loss is optimized jointly by the cross-entropy loss and the label distribution learning loss; the purpose of the label distribution learning loss is to construct a more reasonable label space from the label space of the samples, compensating for the insufficient supervision signal of plain classification and increasing the amount of information.
Drawings
FIG. 1 is a schematic flow chart of a method for contactless measurement of respiration rate based on Transformer according to an embodiment of the present invention;
FIG. 2 is a block diagram of a Transformer-based non-contact respiration rate measurement system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an end-to-end Transformer model according to an embodiment of the present invention;
FIG. 4 is a Block diagram of a Swin Transformer Block module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a window multi-head attention layer according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a shift window multi-head attention layer according to an embodiment of the present invention;
FIG. 7 is a detailed structural diagram of a multi-head attention according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention relates to a non-contact respiration rate measuring method based on a Transformer, which comprises the following steps:
step S100, acquiring a video frame sequence to be detected containing face information in a set time period;
step S200, acquiring a human face interesting region image sequence through a human face detection model and a human face key point model based on the video frame sequence to be detected;
step S300, acquiring a respiration rate sequence in a set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed on the basis of a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a full connection layer which are connected in sequence;
the preprocessing module is used for carrying out block cutting operation on the input video frame sequence to be detected;
the first-order feature extraction module is constructed on the basis of a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to be tested to a set dimension;
the Swin Transformer Block module comprises a first submodule and a second submodule;
the first submodule is constructed on the basis of a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron which are connected in sequence; the second submodule is constructed on the basis of a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron which are sequentially connected; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed on the basis of a Block fusion module and a Swin transform Block module;
and the block fusion module is used for sequentially carrying out down-sampling, series connection, normalization and linear mapping processing on the input features.
In order to more clearly illustrate the Transformer-based non-contact respiration rate measurement method of the present invention, it is described in detail below with reference to FIG. 1.
In the following embodiments, the construction and training of the end-to-end Transformer model are detailed first, and then the process by which the Transformer-based non-contact respiration rate measurement method obtains the respiration rate sequence of a video frame sequence to be measured within a set time period is described.
1. Construction and training process of end-to-end Transformer model
step A100, acquiring a training video frame sequence; obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the training video frame sequence; and constructing a training sample set by taking the face region-of-interest image sequence corresponding to the training video frame sequence and its standard respiration rate sequence as training samples;
In this embodiment, a training video frame sequence is first obtained; this video frame sequence contains face information within a set time period. A face region-of-interest image sequence is then obtained from the training video frame sequence through the face detection model and the face key point detection model. The face detection model and the face key point detection model are existing models and are not described here one by one.
Finally, the face region-of-interest image sequence and its standard respiration rate sequence are taken as training samples to construct the training sample set. The values in the respiration rate sequence represent respiration rate values corresponding to different time points.
step A200, preprocessing the face region-of-interest image sequences in the training sample set; the preprocessing is to uniformly sample F images in temporal order from each face region-of-interest image sequence as the sampling frames to be processed;
In this embodiment, preferably 16, 32 or more images are uniformly sampled in temporal order from the face region-of-interest image sequence in the training sample set as the sampling frames to be processed. In other embodiments, the sampling may be chosen according to the actual situation, for example by key-frame selection.
step A300, inputting the sampling frames to be processed into the end-to-end Transformer model to obtain a predicted respiration rate sequence within the set time period;
In this embodiment, the preprocessed face region-of-interest image sequence is input into the end-to-end Transformer model, which is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence. The structure of the model is shown in FIG. 3 and is specified as follows:
the preprocessing module is used for carrying out block cutting operation on the input video frame sequence to be detected; for example, the input video frame sequence to be tested is
Figure 84613DEST_PATH_IMAGE020
Figure 378191DEST_PATH_IMAGE021
The time is represented by a time-of-day,
Figure 294195DEST_PATH_IMAGE022
Figure 421551DEST_PATH_IMAGE023
representing width and height, divided into non-overlapping blocks by a dicing operation, e.g. using
Figure 55794DEST_PATH_IMAGE024
The block of (a) is taken as token, then the output of the first-order characteristic extraction module after the segmentation is
Figure 153063DEST_PATH_IMAGE025
(ii) a Where 96 is the characteristic dimension of each cut.
The first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module.
The linear mapping module is used for mapping the cut video frame sequence to a set dimension; for example, the $\frac{T}{2}\times\frac{H}{4}\times\frac{W}{4}\times 96$ feature above is mapped to a $\frac{T}{2}\times\frac{H}{4}\times\frac{W}{4}\times C$ output, where $C$ denotes the mapping dimension. In the present invention, $C$ is preferably 128 or 192; in other embodiments, $C$ may be selected according to the actual situation.
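A minimal PyTorch sketch of the block cutting plus linear mapping described above; the 2x4x4 block size and the use of a strided 3D convolution are illustrative assumptions consistent with the 96-dimensional cut features:

```python
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Sketch of the preprocessing (block cutting) plus linear mapping.

    Assumes a 2x4x4 (frames x height x width) block as one token, so each
    token carries 2*4*4*3 = 96 raw values, which are then mapped to C dims.
    """
    def __init__(self, patch=(2, 4, 4), in_ch=3, embed_dim=128):
        super().__init__()
        # A strided 3D convolution cuts non-overlapping blocks and linearly
        # projects them in one step.
        self.proj = nn.Conv3d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, 3, T, H, W)
        x = self.proj(x)                       # (B, C, T/2, H/4, W/4)
        return x.flatten(2).transpose(1, 2)    # (B, num_tokens, C)
```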
The Swin Transformer Block module comprises a first sub-module and a second sub-module; as shown in FIG. 4, the first sub-module is the left half of FIG. 4 and the second sub-module is the right half.
The first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron (constructed based on a fully connected layer, an activation function layer, a Dropout layer, a fully connected layer and a Dropout layer connected in sequence); the first attention layer is a window multi-head attention layer (i.e., the W-MSA layer); the second attention layer is a shift window multi-head attention layer (i.e., the SW-MSA layer). Layer normalization is applied before each multi-head attention module and each perceptron module in the first and second sub-modules, and a residual connection is applied after each multi-head attention and each multilayer perceptron, as shown in FIG. 4.
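A sketch of the multilayer perceptron as described (fully connected layer, activation function layer, Dropout, fully connected layer, Dropout); the GELU activation, hidden ratio and dropout rate are our assumptions, since the patent does not fix them:

```python
import torch.nn as nn

class Mlp(nn.Module):
    """Multilayer perceptron: FC -> activation -> Dropout -> FC -> Dropout."""
    def __init__(self, dim, hidden_ratio=4, drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * hidden_ratio),
            nn.GELU(),                # activation function layer (GELU assumed)
            nn.Dropout(drop),
            nn.Linear(dim * hidden_ratio, dim),
            nn.Dropout(drop),
        )

    def forward(self, x):
        return self.net(x)
```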
Multi-head attention contains several heads. It differs from the plain attention mechanism in that the inputs $Q$, $K$ and $V$ are split into multiple $Q_i$, $K_i$ and $V_i$; the attentions are computed independently and their results are then integrated, which helps prevent overfitting. The benefit is analogous to having several copies of the same neural network whose weights differ because of different initialization, and then integrating their results for a weighted judgment. In the present invention, the attention mechanism adopted by each head of the multi-head attention, as shown in FIG. 7, is as follows:
The output of the normalization layer is multiplied by weight matrices to obtain q, k and v:

$$q_i^{l} = \mathrm{LN}\left(z^{l-1}\right) W_i^{q} \qquad (1)$$

$$k_i^{l} = \mathrm{LN}\left(z^{l-1}\right) W_i^{k} \qquad (2)$$

$$v_i^{l} = \mathrm{LN}\left(z^{l-1}\right) W_i^{v} \qquad (3)$$

wherein $z^{l-1}$ denotes the input of the $l$-th first/second sub-module, $\mathrm{LN}(\cdot)$ denotes the layer normalization operation performed by the normalization layer within the first/second sub-module, $q_i^{l}$, $k_i^{l}$ and $v_i^{l}$ denote the query, key and value of the $i$-th head in the $l$-th first/second sub-module, and $W_i^{q}$, $W_i^{k}$ and $W_i^{v}$ denote weight matrices.
Calculating the dot product of q and k, and multiplying the result obtained by the dot product calculation by v as a coefficient after the result passes through an activation function layer and a Dropout layer in sequence;
and outputting the result obtained by multiplying after passing through a linear layer and a normalization layer, namely outputting the result which is the output of a single head in the multi-head attention. In addition, the input in fig. 7 refers to the input of multi-head attention, namely, the inputGo outReferred to as the output of a single head in a multi-head concentration.
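The per-head computation of equations (1)-(3) and the FIG. 7 description can be sketched as follows; the softmax activation and the 1/sqrt(d) scaling are standard assumptions the patent does not spell out:

```python
import torch.nn as nn

class SingleHead(nn.Module):
    """One attention head following equations (1)-(3) and the FIG. 7 flow."""
    def __init__(self, dim, head_dim, drop=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)                    # LN(z^{l-1})
        self.w_q = nn.Linear(dim, head_dim, bias=False)  # W^q
        self.w_k = nn.Linear(dim, head_dim, bias=False)  # W^k
        self.w_v = nn.Linear(dim, head_dim, bias=False)  # W^v
        self.drop = nn.Dropout(drop)
        self.out = nn.Sequential(nn.Linear(head_dim, head_dim),
                                 nn.LayerNorm(head_dim))
        self.scale = head_dim ** -0.5

    def forward(self, z):                    # z: (B, N, dim), sub-module input
        h = self.norm(z)
        q, k, v = self.w_q(h), self.w_k(h), self.w_v(h)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # dot product of q and k
        attn = self.drop(attn.softmax(dim=-1))         # activation, then Dropout
        return self.out(attn @ v)                      # times v, then linear + norm
```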
The outputs of all heads are integrated to form the output of the multi-head attention. In each feature extraction stage containing a block fusion module, the output feature size is half the input feature size and the number of output channels is twice the number of input channels.
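A sketch of the block fusion module (down-sampling, concatenation, normalization, linear mapping), which realizes the halving of the feature size and the doubling of channels; the even/odd sub-sampling pattern follows common Swin-style patch merging and is our assumption:

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Block fusion sketch: down-sample, concatenate, normalize, linearly map;
    the spatial size is halved and the channel count doubled."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                 # x: (B, T, H, W, C)
        # Down-sample by taking every second row/column, then concatenate.
        x0 = x[:, :, 0::2, 0::2, :]
        x1 = x[:, :, 1::2, 0::2, :]
        x2 = x[:, :, 0::2, 1::2, :]
        x3 = x[:, :, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)   # (B, T, H/2, W/2, 4C)
        return self.reduction(self.norm(x))       # (B, T, H/2, W/2, 2C)
```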
As shown in FIG. 5, the window multi-head attention layer partitions the first-order to fourth-order features into windows. Compared with the conventional Transformer structure, the query in each window performs attention calculation only with the keys in the same window rather than with all keys in the feature map, which reduces the amount of computation (for example, if the feature map is divided into four windows, the computation is reduced to roughly 1/4), thereby reducing the time complexity and speeding up forward inference.
As shown in FIG. 6, the shift window multi-head attention layer solves the problem that the window multi-head attention layer computes attention only within each feature block and never between feature blocks: the original feature blocks are re-cut, shifted and re-spliced, so that attention between feature blocks can also be calculated.
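A minimal sketch of window partitioning and of the cyclic shift applied before re-partitioning in the shift window layer (window-size handling is simplified, and the masking of wrapped-around positions is omitted):

```python
import torch

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def shift_then_partition(x, ws):
    """Cyclically shift the feature map before windowing, so that attention is
    also computed across the boundaries of the original windows (SW-MSA)."""
    shifted = torch.roll(x, shifts=(-ws // 2, -ws // 2), dims=(1, 2))
    return window_partition(shifted, ws)
```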
Based on the end-to-end Transformer model, the predicted respiration rate sequence within the set time period is obtained as follows (a structural sketch of this forward process is given after the steps):
step S310, preprocessing the F sampling frames to be processed to obtain F embedded vectors, including: dividing the sampling frames to be processed into F x N sampling blocks of size P x P, where each sampling frame corresponds to N sampling blocks;
flattening each sampling block into a vector to obtain a vector to be processed, and obtaining an embedded vector to be processed from it through linear mapping;
stacking the embedded vectors to be processed that correspond to the same sampling frame to obtain F embedded vectors;
step S320, outputting the extracted feature vector through the first-order, second-order, third-order and fourth-order feature extraction modules based on the embedded vectors;
step S330, obtaining the predicted respiration rate sequence within the set time period through the fully connected layer based on the extracted feature vector.
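Putting steps S310-S330 together, a structural PyTorch sketch; SwinStage here is a stand-in for one block fusion plus Swin Transformer Block stage (a real stage would use the modules sketched earlier), and PatchEmbed3D is the illustrative patch embedding defined above:

```python
import torch.nn as nn

class SwinStage(nn.Module):
    """Placeholder for one feature extraction stage (block fusion + Swin
    Transformer Block); a linear layer stands in for merging and attention."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)

    def forward(self, x):
        return self.proj(x)

class RespRateSwin(nn.Module):
    """Skeleton of the end-to-end model: patch embedding (step S310), four
    stages (step S320) and a fully connected head (step S330)."""
    def __init__(self, embed_dim=128, out_len=32):
        super().__init__()
        self.embed = PatchEmbed3D(embed_dim=embed_dim)   # sketched earlier
        dims = [embed_dim, embed_dim, embed_dim * 2, embed_dim * 4]
        outs = [embed_dim, embed_dim * 2, embed_dim * 4, embed_dim * 8]
        self.stages = nn.ModuleList(
            SwinStage(d, o) for d, o in zip(dims, outs))
        self.head = nn.Linear(embed_dim * 8, out_len)

    def forward(self, frames):                  # frames: (B, 3, F, H, W)
        tokens = self.embed(frames)             # (B, N, C)
        for stage in self.stages:
            tokens = stage(tokens)
        return self.head(tokens.mean(dim=1))    # respiration rate sequence
```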
Step A400, calculating a loss value based on a respiration rate sequence and a standard respiration rate sequence within a set time period predicted by an end-to-end Transformer model, and adjusting parameters of the end-to-end Transformer model;
in this embodiment, a respiration rate sequence within a set time period is predicted by an end-to-end Transformer model, a loss value is calculated by a loss function pre-constructed in the present invention in combination with a standard respiration rate sequence, and an end-to-end Transformer model parameter is adjusted according to the loss value.
For the label distribution learning loss, the original respiration rate is first normalized with a Gaussian distribution during the training stage, specifically:

$$y_i = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(i-gt)^{2}}{2\sigma^{2}}\right) \qquad (1)$$

wherein $y_i$ denotes the Gaussian-normalized value of the GT (ground truth, i.e. the annotation data (standard respiration rate sequence), such as the respiration rate change of a person within 1 s) obtained by a respiration rate device, $\sigma$ denotes the standard deviation, $i$ denotes a number from 1 to the label length, and $N$ denotes the label length.
Then, the label distribution learning loss is calculated:

$$L_{LD} = \mathrm{KL}\left(\mathrm{PSD}(gt)\,\|\,\hat{y}\right) \qquad (2)$$

wherein $\mathrm{PSD}(gt)$ is the energy spectral density of the respiration rate GT, $\hat{y}$ is the predicted respiration rate distribution, and $\mathrm{KL}$ is the relative entropy (KL divergence).
The total loss function of the end-to-end Transformer model is constructed by combining the label distribution learning loss:

$$L_{time} = \frac{1}{T}\sum_{t=1}^{T}\left(x_{pred}(t) - x_{gt}(t)\right)^{2} \qquad (3)$$

$$L_{total} = L_{time} + \alpha L_{CE} + \beta L_{LD} \qquad (4)$$

wherein $L_{time}$ denotes the time-domain loss, $T$ denotes the length of the video signal corresponding to the video frame sequence to be measured, $x_{pred}$ denotes the respiration rate sequence within the set time period predicted by the end-to-end Transformer model, $x_{gt}$ denotes the standard respiration rate sequence within the set time period, $\alpha$ and $\beta$ denote preset weights, $L_{CE}$ denotes the cross-entropy loss, and $L_{total}$ denotes the total loss.
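A hedged PyTorch sketch of equations (1)-(4); here the Gaussian label of equation (1) stands in for the energy spectral density of the GT in the KL term, and alpha, beta and sigma are placeholder values, since the patent does not state them:

```python
import math
import torch
import torch.nn.functional as F

def gaussian_label(gt_rate, num_bins, sigma=1.0):
    """Equation (1): spread the scalar GT respiration rate over the label space."""
    i = torch.arange(1, num_bins + 1, dtype=torch.float32)
    y = torch.exp(-(i - gt_rate) ** 2 / (2 * sigma ** 2)) \
        / (sigma * math.sqrt(2 * math.pi))
    return y / y.sum()                        # normalize to a distribution

def total_loss(x_pred, x_gt, logits, gt_rate, alpha=1.0, beta=1.0, sigma=1.0):
    """Equations (3)-(4): time-domain loss plus weighted cross-entropy and
    label distribution (KL) losses."""
    l_time = F.mse_loss(x_pred, x_gt)                    # time-domain term
    target = gaussian_label(gt_rate, logits.shape[-1], sigma)
    l_ce = F.cross_entropy(logits.unsqueeze(0),
                           target.argmax().unsqueeze(0)) # cross-entropy term
    log_pred = F.log_softmax(logits, dim=-1)
    l_ld = F.kl_div(log_pred, target, reduction='sum')   # equation (2), KL term
    return l_time + alpha * l_ce + beta * l_ld
```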
And step A500, circularly executing the step A200 to the step A400 until a trained end-to-end Transformer model is obtained.
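Steps A200-A500 then reduce to a standard supervised loop; the sketch below reuses the total_loss sketch above, uses the predicted sequence as the distribution logits purely for illustration, and picks AdamW and the epoch count arbitrarily:

```python
import torch

def train(model, loader, epochs=30, lr=1e-4):
    """Sketch of steps A200-A500 (optimizer, learning rate and epoch count
    are our assumptions, not values from the patent)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):                         # step A500: repeat A200-A400
        for frames, x_gt, gt_rate in loader:        # A200: sampled ROI frames
            x_pred = model(frames)                  # A300: predicted sequence
            loss = total_loss(x_pred, x_gt,         # A400: compute the loss...
                              logits=x_pred, gt_rate=gt_rate)
            opt.zero_grad()
            loss.backward()                         # ...and adjust parameters
            opt.step()
    return model
```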
2. Non-contact respiration rate measuring method based on Transformer
step S100, acquiring a video frame sequence to be measured that contains face information within a set time period;
step S200, obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
step S300, obtaining a respiration rate sequence within the set time period through the trained end-to-end Transformer model based on the face region-of-interest image sequence.
In this embodiment, the obtained face region-of-interest image sequence is input into the trained end-to-end Transformer model to obtain the respiration rate sequence within the set time period; the invention preferably outputs a real-time respiration rate result every 1 s.
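End to end, steps S100-S300 at inference time look like the following sketch; face_detector, landmark_model, crop_roi and to_tensor are hypothetical stand-ins for the existing face models and glue code the patent refers to, and uniform_sample_frames is the earlier sampling sketch:

```python
import cv2

def measure_respiration_rate(video_path, face_detector, landmark_model, model,
                             num_frames=32):
    """Inference sketch for steps S100-S300: read frames, crop the face ROI
    with off-the-shelf face models, then predict the respiration rate sequence."""
    cap = cv2.VideoCapture(video_path)
    rois = []
    ok, frame = cap.read()
    while ok:                                      # S100: video frame sequence
        box = face_detector(frame)                 # S200: detect the face,
        points = landmark_model(frame, box)        # locate the key points,
        rois.append(crop_roi(frame, box, points))  # and crop the ROI (hypothetical helper)
        ok, frame = cap.read()
    cap.release()
    frames = to_tensor(uniform_sample_frames(rois, num_frames))  # hypothetical helper
    return model(frames)                           # S300: respiration rate sequence
```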
In addition, a sample amplification step is further included between step S200 and step S300:
obtaining face image sets at different scales from the face region-of-interest image sequence by cropping and affine transformation;
performing sample amplification on the face image sets at different scales by partial region erasing and horizontal flipping to obtain an amplified face image set, and sorting the amplified face image set in time order to generate an amplified face region-of-interest image sequence.
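A minimal OpenCV sketch of this sample amplification; the rotation/scale ranges, erase size and flip probability are illustrative choices, not values from the patent:

```python
import random
import cv2

def augment_roi(img):
    """Sample amplification sketch: affine transform to vary scale,
    random partial-region erasing and horizontal (left-right) flipping."""
    h, w = img.shape[:2]
    # Affine warp with random rotation and scale yields faces at different
    # scales (a random crop could be added similarly).
    m = cv2.getRotationMatrix2D((w / 2, h / 2),
                                random.uniform(-10, 10),
                                random.uniform(0.8, 1.2))
    img = cv2.warpAffine(img, m, (w, h))
    # Erase a random partial region.
    eh, ew = h // 8, w // 8
    y, x = random.randint(0, h - eh), random.randint(0, w - ew)
    img[y:y + eh, x:x + ew] = 0
    # Horizontal flip with probability 0.5.
    if random.random() < 0.5:
        img = cv2.flip(img, 1)
    return img
```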
A second embodiment of the invention is a Transformer-based non-contact respiration rate measurement system, as shown in FIG. 2, comprising: a video frame acquisition unit 100, a region-of-interest extraction unit 200 and a respiration rate prediction unit 300;
the video frame acquisition unit 100 is configured to acquire a video frame sequence to be measured that contains face information within a set time period;
the region-of-interest extraction unit 200 is configured to obtain a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
the respiration rate prediction unit 300 is configured to obtain a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that the Transformer-based non-contact respiration rate measurement system provided in the above embodiment is only illustrated by the division of the above functional modules; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules in the above embodiment may be combined into one module, or further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps and are not to be construed as unduly limiting the present invention.
An electronic device of a third embodiment of the present invention comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the Transformer-based non-contact respiration rate measurement method described above.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by a computer to implement the Transformer-based non-contact respiration rate measurement method described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the electronic device and the computer-readable storage medium described above may refer to corresponding processes in the foregoing method examples, and are not described herein again.
Referring now to FIG. 8, there is illustrated a block diagram of a computer system suitable for use as a server in implementing embodiments of the system, method and apparatus of the present application. The server shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the computer system includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM803, various programs and data necessary for system operation are also stored. The CPU801, ROM 802, and RAM803 are connected to each other via a bus 804. An Input/Output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube, a liquid crystal display, and the like, and a speaker and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a local area network card, modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted on the storage section 808 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the CPU801, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer-readable storage medium may be, for example but not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, etc., or any suitable combination of the foregoing.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including object oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network or a wide area network, or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is apparent to those skilled in the art that the scope of the present invention is not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (9)

1. A method for contactless measurement of respiration rate based on Transformer, comprising:
step S100, acquiring a video frame sequence to be measured that contains face information within a set time period;
step S200, obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be measured;
step S300, obtaining a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
wherein the end-to-end Transformer model is constructed based on a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer connected in sequence;
the preprocessing module is used for performing a block cutting operation on the input video frame sequence to be measured;
the first-order feature extraction module is constructed based on a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed based on a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed based on a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a window multi-head attention layer; the second attention layer is a shift window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed based on a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
2. The Transformer-based non-contact respiration rate measurement method according to claim 1, wherein a sample amplification step is further included between step S200 and step S300:
obtaining face image sets at different scales from the face region-of-interest image sequence by cropping and affine transformation;
performing sample amplification on the face image sets at different scales by partial region erasing and horizontal flipping to obtain an amplified face image set, and sorting the amplified face image set in time order to generate an amplified face region-of-interest image sequence.
3. The method of claim 1, wherein the query in each window of the window multi-head attention layer performs attention calculation with the keys in the same window, and does not perform attention calculation with all the keys in the feature map.
4. The Transformer-based non-contact respiration rate measurement method according to claim 1, wherein, when the attention mechanism is calculated, the shift window multi-head attention layer first cuts, shifts and splices the original feature blocks in sequence, and then the attention between the feature blocks is calculated.
5. The method for contactless measurement of respiration rate based on Transformer according to claim 1, wherein the training method of the end-to-end Transformer model is as follows:
step A100, acquiring a training video frame sequence; obtaining a face region-of-interest image sequence through a face detection model and a face key point model based on the training video frame sequence; and constructing a training sample set by taking the face region-of-interest image sequence corresponding to the training video frame sequence and its standard respiration rate sequence as training samples;
step A200, preprocessing the face region-of-interest image sequences in the training sample set; the preprocessing is to uniformly sample F images in temporal order from each face region-of-interest image sequence as the sampling frames to be processed;
step A300, inputting the sampling frame to be processed into the end-to-end Transformer model to obtain a predicted respiration rate sequence in a set time period;
step A400, calculating a loss value based on a respiration rate sequence and a standard respiration rate sequence within a set time period predicted by an end-to-end Transformer model, and adjusting parameters of the end-to-end Transformer model;
and step A500, circularly executing the step A200 to the step A400 until a trained end-to-end Transformer model is obtained.
6. The Transformer-based non-contact respiration rate measurement method according to claim 5, wherein the loss function of the end-to-end Transformer model during training is:

$L_{time} = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_t - y_t\right)^2$

$L_{CE} = -\sum_{i=1}^{n} p_i \log q_i$

$L_{LD} = D_{KL}(p \,\|\, q) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i}$

$L_{total} = L_{time} + \alpha L_{CE} + \beta L_{LD}$

wherein $L_{time}$ represents the time-domain loss; $T$ represents the length of the video signal corresponding to the video frame sequence to be tested; $\hat{y}$ represents the respiration rate sequence within the set time period predicted by the end-to-end Transformer model; $y$ represents the standard respiration rate sequence within the set time period; $\alpha$ and $\beta$ represent preset weights; $L_{CE}$ represents the cross-entropy loss; $L_{LD}$ represents the label distribution learning loss; $L_{total}$ represents the total loss; $p$ represents the energy spectral density of the respiration rate GT, GT being the ground truth, i.e. the standard respiration rate sequence obtained by a respiration rate measurement device, normalized by a Gaussian distribution whose standard deviation is $\sigma$; $q$ represents the corresponding predicted distribution; $i$ represents an index running from 1 to the label length; $n$ represents the label length; and $D_{KL}$ represents the relative entropy (KL) divergence.
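The formulas above are reconstructed from the surviving variable definitions, since the originals were embedded as images; the exact functional forms (mean squared error for the time-domain term, a softmax cross-entropy, a KL divergence against a Gaussian-smoothed label distribution) are therefore assumptions. Under those assumptions, the combined loss can be sketched as:

```python
import torch
import torch.nn.functional as F

def gaussian_label_distribution(gt_idx, n, sigma=1.0):
    """Gaussian-normalized label distribution centered on the GT respiration
    rate bin (an assumption consistent with the sigma in claim 6)."""
    i = torch.arange(n, dtype=torch.float32)
    p = torch.exp(-(i - gt_idx) ** 2 / (2 * sigma ** 2))
    return p / p.sum()

def total_loss(pred_seq, gt_seq, pred_logits, gt_idx, alpha=1.0, beta=1.0):
    """L_total = L_time + alpha*L_CE + beta*L_LD, per the reconstruction above.
    pred_logits: scores over n respiration-rate bins; gt_idx: GT bin index."""
    n = pred_logits.shape[-1]
    l_time = F.mse_loss(pred_seq, gt_seq)               # time-domain loss
    l_ce = F.cross_entropy(pred_logits.unsqueeze(0),
                           torch.tensor([gt_idx]))      # cross-entropy loss
    p = gaussian_label_distribution(gt_idx, n)          # GT label distribution
    log_q = F.log_softmax(pred_logits, dim=-1)          # predicted distribution
    l_ld = F.kl_div(log_q, p, reduction="sum")          # relative entropy (KL)
    return l_time + alpha * l_ce + beta * l_ld
```

Setting `alpha` and `beta` to zero reduces this to a pure time-domain regression, which makes the contribution of the two distribution-based terms easy to ablate.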
7. A Transformer-based non-contact respiration rate measurement system, comprising a video frame acquisition unit, a region-of-interest extraction unit and a respiration rate prediction unit;
the video frame acquisition unit is configured to acquire a video frame sequence to be tested containing face information within a set time period;
the region-of-interest extraction unit is configured to obtain a face region-of-interest image sequence through a face detection model and a face key point model based on the video frame sequence to be tested;
the respiration rate prediction unit is configured to obtain a respiration rate sequence within the set time period through a trained end-to-end Transformer model based on the face region-of-interest image sequence;
the end-to-end Transformer model is constructed on the basis of a preprocessing module, a first-order feature extraction module, a second-order feature extraction module, a third-order feature extraction module, a fourth-order feature extraction module and a fully connected layer which are connected in sequence;
the preprocessing module is used for performing a block-cutting operation on the input video frame sequence to be tested;
the first-order feature extraction module is constructed on the basis of a linear mapping module and a Swin Transformer Block module; the linear mapping module is used for mapping the cut video frame sequence to be tested to a set dimension;
the Swin Transformer Block module comprises a first sub-module and a second sub-module;
the first sub-module is constructed on the basis of a normalization layer, a first attention layer, a normalization layer and a multilayer perceptron connected in sequence; the second sub-module is constructed on the basis of a normalization layer, a second attention layer, a normalization layer and a multilayer perceptron connected in sequence; the first attention layer is a windowed multi-head attention layer; the second attention layer is a shifted-window multi-head attention layer;
the second-order feature extraction module, the third-order feature extraction module and the fourth-order feature extraction module are all constructed on the basis of a block fusion module and a Swin Transformer Block module;
and the block fusion module is used for sequentially performing down-sampling, concatenation, normalization and linear mapping on the input features.
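The block fusion module recited in claims 1 and 7 matches the patch-merging pattern of Swin-style networks. A minimal sketch, assuming 2x2 down-sampling and a 4C-to-2C linear reduction — both assumptions, since the claim fixes only the order of operations (down-sampling, concatenation, normalization, linear mapping):

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Block-fusion sketch: 2x down-sampling by taking the four pixel phases,
    concatenating them on the channel axis, normalizing, then linearly mapping
    4*C channels down to 2*C (a common Swin convention; dims are assumptions)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                        # x: (B, H, W, C) with even H, W
        x0 = x[:, 0::2, 0::2, :]                 # down-sample: every 2nd row/col
        x1 = x[:, 1::2, 0::2, :]
        x2 = x[:, 0::2, 1::2, :]
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # concatenation -> 4*C channels
        return self.reduction(self.norm(x))      # normalize, then linear mapping

feat = torch.randn(1, 56, 56, 96)
print(PatchMerging(96)(feat).shape)              # torch.Size([1, 28, 28, 192])
```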
8. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the processor to implement the Transformer-based non-contact respiration rate measurement method of any one of claims 1-6.
9. A computer-readable storage medium storing computer instructions, the computer instructions being executed by a computer to implement the Transformer-based non-contact respiration rate measurement method of any one of claims 1-6.
CN202210232829.9A 2022-03-10 2022-03-10 Non-contact respiration rate measuring method based on Transformer Active CN114343612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210232829.9A CN114343612B (en) 2022-03-10 2022-03-10 Non-contact respiration rate measuring method based on Transformer


Publications (2)

Publication Number Publication Date
CN114343612A CN114343612A (en) 2022-04-15
CN114343612B true CN114343612B (en) 2022-05-24

Family

ID=81094417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210232829.9A Active CN114343612B (en) 2022-03-10 2022-03-10 Non-contact respiration rate measuring method based on Transformer

Country Status (1)

Country Link
CN (1) CN114343612B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115844424B (en) * 2022-10-17 2023-09-22 北京大学 Sleep spindle wave hierarchical identification method and system


Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US7918801B2 (en) * 2005-12-29 2011-04-05 Medility Llc Sensors for monitoring movements, apparatus and systems therefor, and methods for manufacture and use
US8792969B2 (en) * 2012-11-19 2014-07-29 Xerox Corporation Respiratory function estimation from a 2D monocular video
US20170055878A1 (en) * 2015-06-10 2017-03-02 University Of Connecticut Method and system for respiratory monitoring
CN105520724A (en) * 2016-02-26 2016-04-27 严定远 Method for measuring heart rate and respiratory frequency of human body
WO2018188993A1 (en) * 2017-04-14 2018-10-18 Koninklijke Philips N.V. Person identification systems and methods
CN110367950B (en) * 2019-07-22 2022-06-07 西安奇点融合信息科技有限公司 Non-contact physiological information detection method and system
CN112200162B (en) * 2020-12-03 2021-02-23 中国科学院自动化研究所 Non-contact heart rate measuring method, system and device based on end-to-end network

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN113197558A (en) * 2021-03-26 2021-08-03 中南大学 Heart rate and respiratory rate detection method and system and computer storage medium
CN113255635A (en) * 2021-07-19 2021-08-13 中国科学院自动化研究所 Multi-mode fused psychological stress analysis method
CN113408508A (en) * 2021-08-20 2021-09-17 中国科学院自动化研究所 Transformer-based non-contact heart rate measurement method

Non-Patent Citations (4)

Title
Improving Accuracy of Respiratory Rate Estimation by Restoring High Resolution Features with Transformers and Recursive Convolutional Models; Kwasniewska A, et al.; Proceedings of the IEEE; 2021-12-31; full text *
Instantaneous Physiological Estimation using Video Transformers; Revanur A, et al.; arXiv preprint arXiv:2202.12368; 2022-02-24; full text *
Anxiety state recognition based on the Relief feature selection algorithm and multiple physiological signals; Lei Pei, et al.; Chinese Journal of Medical Instrumentation; 2014-12-31; Vol. 38, No. 3; full text *
Non-interference respiration detection method based on a time-frequency information fusion network; Shen Jianfei, et al.; High Technology Letters; 2020-12-31; Vol. 30, No. 10; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant