CN114895275B - Efficient multidimensional attention neural network-based radar micro gesture recognition method - Google Patents

Efficient multidimensional attention neural network-based radar micro gesture recognition method

Info

Publication number
CN114895275B
CN114895275B CN202210551031.0A CN202210551031A
Authority
CN
China
Prior art keywords
layer
attention
doppler
sequence
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210551031.0A
Other languages
Chinese (zh)
Other versions
CN114895275A (en)
Inventor
张文鹏
杨磊
姜卫东
张双辉
刘永祥
霍凯
高勋章
卢杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202210551031.0A priority Critical patent/CN114895275B/en
Publication of CN114895275A publication Critical patent/CN114895275A/en
Application granted granted Critical
Publication of CN114895275B publication Critical patent/CN114895275B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/41Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S7/417Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section involving the use of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141Discrete Fourier transforms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Discrete Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a radar micro-gesture recognition method based on an efficient multidimensional attention neural network, together with computer equipment and a storage medium. The method comprises the following steps: constructing a multidimensional attention module from a global max-pooling layer, a global average-pooling layer and a split-splice convolution module, the multidimensional attention module comprising a spatial attention module, a channel attention module and a temporal attention module; constructing an efficient multidimensional attention block from the multidimensional attention module and a plurality of convolution blocks, constructing an efficient multidimensional attention neural network from the efficient multidimensional attention block together with a preset convolution layer, max-pooling layer, global average-pooling layer, fully connected layer and Softmax layer, and training the efficient multidimensional attention neural network; and inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network for gesture recognition. By adopting the method, the accuracy of radar micro-gesture recognition can be improved.

Description

Efficient multidimensional attention neural network-based radar micro gesture recognition method
Technical Field
The application relates to the technical field of radar target recognition, in particular to a radar micro-gesture recognition method, a computer device and a storage medium based on an efficient multidimensional attention neural network.
Background
Gestures carry rich information in daily life and have become a hot topic in the field of human-computer interaction. Gesture recognition has been applied in many areas, such as smart homes and virtual reality. Current gesture recognition technologies are mainly vision-based, wearable-device-based and radar-based. In vision-based techniques, hand motions are captured by an optical camera or a depth sensor, which generates red-green-blue (RGB) images or depth images for recognition; this approach, however, performs poorly in dark conditions and raises privacy concerns. Wearable-device-based technology requires users to wear designated sensors and devices to gather gesture data; wearable devices are generally expensive and the user experience is poor. Compared with these two, radar sensors are non-contact, unaffected by illumination and do not compromise user privacy. Radar-based gesture recognition methods have therefore attracted the attention of many researchers and are widely used. Micro-motion refers to movements of relatively small amplitude, such as rotation, vibration and acceleration of a target and its components during motion, so the micro-motion features of an object can be used to classify different objects or distinguish different actions. Since the amplitude of a gesture is small compared with the body, gesture recognition can be regarded as a kind of micro-motion recognition. Existing radar-based micro-gesture recognition methods mainly process radar echo data into feature maps and design two-dimensional convolutional neural networks to extract features from and recognize these maps, chiefly exploiting information such as the range and frequency of the gesture motion.
However, the radar echo data of a dynamic gesture simultaneously contains range, velocity and time information, and, owing to its structure, a two-dimensional convolutional neural network cannot fully extract the effective information in such data. A three-dimensional convolutional neural network alleviates the problem of insufficient gesture motion information, but it is still at an early stage in the field of radar target recognition: in complex scenes it makes insufficient use of the gesture echo data, carries a large number of parameters and cannot extract features effectively, so the accuracy of radar micro-gesture recognition remains low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a radar micro-gesture recognition method, a computer device and a storage medium based on an efficient multidimensional attention neural network that can improve the accuracy of radar micro-gesture recognition.
A radar micro-gesture recognition method based on an efficient multidimensional attention neural network, the method comprising:
acquiring radar echo data, the radar echo data comprising a plurality of gestures to be detected;
performing a two-dimensional Fourier transform and filtering on the radar echo data to obtain a range-Doppler map sequence of the gestures to be detected, and dividing the range-Doppler map sequence according to a preset ratio to obtain a training set and a test set;
constructing a multidimensional attention module from a global max-pooling layer, a global average-pooling layer and a split-splice convolution module, the multidimensional attention module comprising a spatial attention module, a channel attention module and a temporal attention module;
constructing an efficient multidimensional attention block from the multidimensional attention module and a plurality of convolution blocks, and constructing an efficient multidimensional attention neural network from the efficient multidimensional attention block together with a preset convolution layer, max-pooling layer, global average-pooling layer, fully connected layer and Softmax layer;
training the efficient multidimensional attention neural network with the training set and the test set to obtain a trained efficient multidimensional attention neural network;
and inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network for gesture recognition.
In one embodiment, the radar echo data comprises an intra-pulse fast time and an inter-pulse slow time, and performing the two-dimensional Fourier transform and filtering on the radar echo data to obtain the range-Doppler map sequence of the gesture to be detected comprises the following steps:
performing a two-dimensional Fourier transform over the intra-pulse fast time and the inter-pulse slow time of the radar echo data to obtain a function of target scattering point range and velocity;
and filtering out the zero-frequency component of the function of target scattering point range and velocity by mean filtering to obtain the range-Doppler map sequence of the gesture to be detected.
In one embodiment, performing the two-dimensional Fourier transform over the intra-pulse fast time and the inter-pulse slow time of the radar echo data to obtain the function of target scattering point range and velocity comprises:
performing a two-dimensional Fourier transform over the intra-pulse fast time and the inter-pulse slow time of the radar echo data to obtain the function of target scattering point range and velocity as
S_if(f_i, f_d) = Σ_{l=1}^{L} A_l · sinc[T_p(f_i − 2γR_l/c)] · sinc[N·T_p(f_d − 2v_l/λ)]
where N denotes that the slow-time FFT is taken over every N pulse repetition periods T_p, A_l denotes the intensity of the l-th scattering point, R_l the range from the l-th scattering point to the radar, v_l the velocity of the l-th scattering point, γ the chirp rate of the transmitted signal, λ = c/f_c the carrier wavelength, L the total number of target scattering points, and f_i and f_d the frequency-domain representations of the fast time t̂ and the slow time t_m after the Fourier transform, corresponding to range and velocity respectively.
In one embodiment, constructing the multidimensional attention module from the global max-pooling layer, the global average-pooling layer and the split-splice convolution module comprises:
constructing the channel attention module from the global max-pooling layer and the global average-pooling layer;
constructing the spatial attention module from the split-splice convolution module;
and constructing the temporal attention module from the global average-pooling layer.
In one embodiment, the efficient multidimensional attention neural network comprises an input layer, an intermediate layer and an output layer, and constructing the efficient multidimensional attention neural network from the efficient multidimensional attention block together with the preset convolution layer, max-pooling layer, global average-pooling layer, fully connected layer and Softmax layer comprises:
constructing the input layer of the efficient multidimensional attention neural network from the convolution layer and the max-pooling layer;
constructing the intermediate layer of the efficient multidimensional attention neural network from a plurality of efficient multidimensional attention blocks containing different multidimensional attention modules;
and constructing the output layer of the efficient multidimensional attention neural network from the global average-pooling layer, the fully connected layer and the Softmax layer.
In one embodiment, inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network for gesture recognition comprises:
inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network, preprocessing it in the input layer and passing the result to the intermediate layer for feature extraction to obtain a multidimensional feature map;
and, after convolving the multidimensional feature map, applying three-dimensional global average pooling to the convolved multidimensional feature map in the output layer according to the global average-pooling layer, and classifying the pooled multidimensional feature map with the Softmax layer to obtain the recognition result.
In one embodiment, inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network, preprocessing it in the input layer and passing it to the intermediate layer for feature extraction to obtain the multidimensional feature map comprises:
inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network, preprocessing it in the input layer, passing it to the intermediate layer and extracting features from it with the spatial attention module to obtain a multi-scale fusion feature map;
computing weights over the range-Doppler map sequence with the channel attention module to obtain the channel weights corresponding to the feature map;
extracting features from the range-Doppler map sequence with the temporal attention module to obtain a temporal feature map;
and multiplying the multi-scale fusion feature map point by point with the corresponding channel weights and then adding the temporal feature map to obtain the multidimensional feature map.
In one embodiment, extracting features from the range-Doppler map sequence with the spatial attention module to obtain the multi-scale fusion feature map comprises:
extracting features from the range-Doppler map sequence with the spatial attention module to obtain the multi-scale fusion feature map as
F_s = Conv(1×1×1, N→C)(F_s_all)
where F_s_all = Cat([F_s1, F_s2, …, F_sN]) and F_si = Conv(3×k_i×k_i, C'→1)(F_i), i = 1, 2, …, N; F_i denotes the i-th group of the range-Doppler map sequence, k_i the i-th split-splice convolution kernel size, F_si the feature map at the i-th scale, N the total number of split-splice convolution groups, C' the number of channels per group and C the number of input channels.
In one embodiment, computing weights over the range-Doppler map sequence with the channel attention module to obtain the channel weights corresponding to the feature map comprises:
applying global average pooling and global max pooling to the range-Doppler map sequence along the temporal and spatial dimensions to obtain an average-pooled feature map and a max-pooled feature map;
splicing the average-pooled feature map and the max-pooled feature map along the channel dimension to obtain a pooled feature map;
and fusing and exciting the spliced pooled feature map with two fully connected layers to obtain the channel weights corresponding to the feature map.
In one embodiment, extracting features from the range-Doppler map sequence with the temporal attention module to obtain the temporal feature map comprises:
extracting features from the range-Doppler map sequence with the temporal attention module to obtain the temporal feature map as
F_t = σ(g_t W_t1) W_t2
where g_t is obtained by globally average-pooling F over the spatial dimensions, F denotes the range-Doppler map sequence, H and W denote its height and width respectively, W_t1 and W_t2 denote fully connected layers, and σ denotes the GeLU operation.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
Acquiring radar echo data; the radar echo data comprises a plurality of gestures to be detected;
performing a two-dimensional Fourier transform and filtering on the radar echo data to obtain a range-Doppler map sequence of the gestures to be detected, and dividing the range-Doppler map sequence according to a preset ratio to obtain a training set and a test set;
constructing a multidimensional attention module from a global max-pooling layer, a global average-pooling layer and a split-splice convolution module, the multidimensional attention module comprising a spatial attention module, a channel attention module and a temporal attention module;
constructing an efficient multidimensional attention block from the multidimensional attention module and a plurality of convolution blocks, and constructing an efficient multidimensional attention neural network from the efficient multidimensional attention block together with a preset convolution layer, max-pooling layer, global average-pooling layer, fully connected layer and Softmax layer;
training the efficient multidimensional attention neural network with the training set and the test set to obtain a trained efficient multidimensional attention neural network;
and inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network for gesture recognition.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
Acquiring radar echo data; the radar echo data comprises a plurality of gestures to be detected;
performing a two-dimensional Fourier transform and filtering on the radar echo data to obtain a range-Doppler map sequence of the gestures to be detected, and dividing the range-Doppler map sequence according to a preset ratio to obtain a training set and a test set;
constructing a multidimensional attention module from a global max-pooling layer, a global average-pooling layer and a split-splice convolution module, the multidimensional attention module comprising a spatial attention module, a channel attention module and a temporal attention module;
constructing an efficient multidimensional attention block from the multidimensional attention module and a plurality of convolution blocks, and constructing an efficient multidimensional attention neural network from the efficient multidimensional attention block together with a preset convolution layer, max-pooling layer, global average-pooling layer, fully connected layer and Softmax layer;
training the efficient multidimensional attention neural network with the training set and the test set to obtain a trained efficient multidimensional attention neural network;
and inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network for gesture recognition.
According to the radar micro-gesture recognition method based on the efficient multidimensional attention neural network, the radar echo data are processed by a two-dimensional Fourier transform into a range-Doppler map sequence that serves as the network input. A multidimensional attention neural network with joint spatial, channel and temporal attention then extracts this input effectively: multi-scale convolution kernels extract the multi-scale spatial features of the feature map; a squeeze-and-excitation mechanism applied in the channel dimension generates the channel attention weights, with a Softmax layer further establishing channel interaction; and the proposed temporal self-attention module models all frames in the time dimension to obtain global temporal cues. The radar echo data are thereby used effectively and fully, the insufficient use of gesture echo data in complex scenes and the large parameter counts of existing networks are overcome, and the method has significant engineering application value.
Drawings
FIG. 1 is a flow diagram of the radar micro-gesture recognition method based on the efficient multidimensional attention neural network in one embodiment;
FIG. 2 is a flow diagram of generating the radar gesture target range-Doppler map sequence in one embodiment;
FIG. 3 is a schematic diagram of the multidimensional attention (MDA) module structure in one embodiment;
FIG. 4 is a schematic diagram of the split-splice convolution (SCC) module structure in another embodiment;
FIG. 5 is a schematic diagram of the efficient multidimensional attention (EMDA) block structure in one embodiment;
FIG. 6 is a structural diagram of the efficient multidimensional attention residual network (EMDANet) in one embodiment;
FIG. 7 shows the range-Doppler maps corresponding to six gestures in one embodiment;
FIG. 8 shows the parameter counts and recognition accuracies of 2D-ResNet-50, C3D, P3D-60, 3D-ResNet-50 and EMDANet-50 when D_All_train and D_All_test are used as the training and test sets in one embodiment;
FIG. 9 shows the recognition rates of 2D-ResNet-50, C3D, P3D-60, 3D-ResNet-50 and EMDANet-50 when D_All_n_train and D_All_n_test are used as the training and test sets in one embodiment;
FIG. 10 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, a radar micro-gesture recognition method based on an efficient multidimensional attention neural network is provided, comprising the following steps:
Step 102: acquire radar echo data, the radar echo data comprising a plurality of gestures to be detected; perform a two-dimensional Fourier transform and filtering on the radar echo data to obtain a range-Doppler map sequence of the gestures to be detected; and divide the range-Doppler map sequence according to a preset ratio into a training set and a test set.
Performing the two-dimensional Fourier transform and filtering on the radar echo data to obtain the range-Doppler map sequence of the gesture to be detected facilitates extraction of the gesture features contained in the echo data, and thereby improves the accuracy and efficiency of gesture recognition. As shown in fig. 7, (a)-(f) are the range-Doppler maps corresponding to six radar gestures: pushing forward, waving up, waving down, waving left, waving right and crossing both hands. The two-dimensional Fourier transform, filtering and data division proceed as follows:
S1.1 Obtain radar echo data:
S1.1.1 Assume the wideband radar transmits a linear frequency modulated (chirp) signal
s(t̂, t_m) = rect(t̂/T_p) · exp{j2π(f_c·t + γ·t̂²/2)}
where t̂ is the intra-pulse fast time, which records the propagation of the wave; t_m = mT_p (m = 0, 1, 2, …) is the inter-pulse slow time; t = t_m + t̂ is the full time; and f_c, T_p, γ and rect(·) are the center frequency, pulse repetition period, chirp rate and envelope of the transmitted signal, respectively. The signal is transmitted at time t_m and is a chirp within [t_m, t_m + T_p].
S1.1.2 Because the duration of a radar chirp is typically on the order of microseconds, the target radar echo can be modeled with a "stop-and-go" model. Let r_l(t_m) denote the radial distance between the l-th scattering center and the radar at the start of the m-th pulse; the radar echo signal of the target is then
s_r(t̂, t_m) = Σ_{l=1}^{L} σ_l(t_m) · rect[(t̂ − τ_l(t_m))/T_p] · exp{j2π[f_c(t − τ_l(t_m)) + γ(t̂ − τ_l(t_m))²/2]}
where σ_l(t_m) and τ_l(t_m) = 2r_l(t_m)/c are the echo amplitude and echo delay of the l-th scattering center, respectively.
S1.1.3 The echo signal is demodulated (dechirped), and a two-dimensional FFT is taken over the fast time t̂ and the slow time t_m, yielding the function of target scattering point range and velocity:
S_if(f_i, f_d) = Σ_{l=1}^{L} A_l · sinc[T_p(f_i − 2γR_l/c)] · sinc[N·T_p(f_d − 2v_l/λ)]
where N denotes that the slow-time FFT is taken over every N pulse repetition periods T_p, A_l denotes the intensity of the l-th scattering point, R_l the range from the l-th scattering point to the radar, v_l the velocity of the l-th scattering point, γ the chirp rate of the transmitted signal, λ = c/f_c the carrier wavelength, L the total number of target scattering points, and f_i and f_d the frequency-domain representations of the fast time t̂ and the slow time t_m after the Fourier transform, corresponding to range and velocity respectively.
S1.1.4 The zero-frequency component of the obtained S_if(f_i, f_d) is filtered out by mean filtering, giving the range-Doppler map F_rd of the gesture to be detected; applying this processing to every frame of the echo gives the range-Doppler map sequence F_rds of the gesture to be detected.
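The chain of S1.1.3-S1.1.4 can be illustrated with a short sketch. This is a minimal NumPy illustration under assumptions, not the patented implementation: the dechirped beat matrix `echo`, its layout and the log-magnitude scaling are choices made only for the example.

```python
import numpy as np

def range_doppler_frame(echo: np.ndarray) -> np.ndarray:
    """One range-Doppler map from a dechirped frame of shape
    (slow_time_pulses, fast_time_samples); names are illustrative."""
    # Mean filtering: subtracting the slow-time mean of every range bin
    # zeroes the zero-frequency (static clutter) Doppler component (S1.1.4).
    echo = echo - echo.mean(axis=0, keepdims=True)
    rd = np.fft.fft(echo, axis=1)                          # fast time -> range
    rd = np.fft.fftshift(np.fft.fft(rd, axis=0), axes=0)   # slow time -> Doppler
    return 20.0 * np.log10(np.abs(rd) + 1e-12)             # log-magnitude map

# Stacking one such map per frame yields the range-Doppler map sequence F_rds.
```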
S1.2 Preprocess the range-Doppler map sequence data:
From the dynamic-gesture range-Doppler map sequence F_rds obtained from the radar echo, each range-Doppler frame is scaled to 256×256, and 16 frames are randomly selected from F_rds as the input of the neural network.
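A sketch of this preprocessing step; assuming OpenCV for resizing, `rd_sequence` as a list of 2-D maps, and keeping the sampled frames in temporal order are illustrative choices rather than requirements of the embodiment.

```python
import numpy as np
import cv2  # assumed resizer; any image-resampling routine would do

def preprocess(rd_sequence, num_frames=16, size=(256, 256)):
    """Scale every RD map to 256x256 and randomly keep 16 frames."""
    frames = [cv2.resize(f, size) for f in rd_sequence]
    # sorted indices keep the selected frames in temporal order (a choice)
    idx = np.sort(np.random.choice(len(frames), num_frames, replace=False))
    return np.stack([frames[i] for i in idx])   # shape (16, 256, 256)
```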
S1.3 Divide the range-Doppler map sequence data into a training set and a test set (see the sketch after this list):
S1.3.1 Suppose p dynamic gestures are to be recognized, n subjects are measured in m scenes and each subject performs each gesture q times in each scene; the total number of measured gesture actions is then D_All = n×m×p×q.
S1.3.2 All measured gesture actions are randomly divided in a 7:3 ratio into a training set D_All_train and a test set D_All_test.
S1.3.3 The measured gesture actions are also randomly divided by subject, in a 7:3 ratio, into a training set D_All_n_train and a test set D_All_n_test.
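The 7:3 division of S1.3.2 can be sketched as follows; `samples` is an assumed list of (sequence, label) pairs, and the fixed seed exists only to make the example reproducible.

```python
import random

def split_7_3(samples, seed=0):
    """Randomly divide measured gesture samples into a 7:3 train/test split."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]   # (training set, test set)
```

For the by-subject split of S1.3.3, the same helper would be applied to the list of subjects rather than to individual samples.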
Step 104: construct a multidimensional attention module from the global max-pooling layer, the global average-pooling layer and the split-splice convolution module; the multidimensional attention module comprises a spatial attention module, a channel attention module and a temporal attention module.
With the spatial and temporal attention modules, features are extracted comprehensively in the spatial and temporal dimensions, while the channel attention module applies a squeeze-and-excitation mechanism in the channel dimension to generate feature-map weights, so the radar echo data are used effectively and fully.
Step 106: construct an efficient multidimensional attention block from the multidimensional attention module and a plurality of convolution blocks, and construct an efficient multidimensional attention neural network from the efficient multidimensional attention block together with a preset convolution layer, max-pooling layer, global average-pooling layer, fully connected layer and Softmax layer.
By introducing a residual structure, i.e. the two convolution blocks, the efficient multidimensional attention block is obtained from the multidimensional attention module. The resulting network is easy to optimize; the vanishing- and exploding-gradient problems of deep networks are alleviated and the network degradation problem is addressed, which speeds up training of the efficient multidimensional attention neural network and further improves gesture recognition accuracy. The degradation problem refers to slow network training in which gradients vanish or explode and the optimal solution is not reached, manifesting in gesture recognition as slow training and a low recognition rate.
Step 108: train the efficient multidimensional attention neural network with the training set and the test set to obtain a trained efficient multidimensional attention neural network.
The efficient multidimensional attention neural network is trained with the training set and the test set, and the initialization parameters of the network are set as follows. Considering the computing resources and the convergence of the network model, the mini-batch size is set to 16; the network loss function is the cross-entropy loss (CrossEntropyLoss); the optimization algorithm is stochastic gradient descent (SGD) with a learning rate of 0.00002, a momentum factor of 0.9 and a weight decay of 0.0005; and the learning rate is adjusted dynamically, decaying to 0.75 times its previous value every 10 iterations. With the training set D_All_train as input and the number of iterations set to 51, the test set D_All_test is evaluated after each iteration to obtain the recognition rate of gesture actions over all subjects. Likewise, with the training set D_All_n_train as input and 51 iterations, the test set D_All_n_test is evaluated after each iteration to obtain the recognition rate for subjects who did not participate in training. The trained efficient multidimensional attention neural network can then be used for radar micro-gesture recognition.
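The stated configuration maps directly onto a deep-learning framework. Below is a hedged PyTorch sketch: `EMDANet` refers to the network whose skeleton is sketched in the construction embodiment further below, and `train_loader` is an assumed DataLoader yielding mini-batches of 16 range-Doppler map sequences with labels.

```python
import torch
import torch.nn as nn

model = EMDANet(num_classes=6)              # six gestures, as in fig. 7
criterion = nn.CrossEntropyLoss()           # the stated loss function
optimizer = torch.optim.SGD(model.parameters(), lr=2e-5,
                            momentum=0.9, weight_decay=5e-4)
# Learning rate decays to 0.75x its value every 10 iterations (epochs).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.75)

for epoch in range(51):                     # 51 iterations, as stated
    model.train()
    for x, y in train_loader:               # assumed loader, batch size 16
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
    # the corresponding test set is evaluated here after every iteration
```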
Step 110: input the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network for gesture recognition.
In this radar micro-gesture recognition method based on the efficient multidimensional attention neural network, the radar echo data are processed by a two-dimensional Fourier transform into a range-Doppler map sequence that serves as the network input, and a multidimensional attention neural network with joint spatial, channel and temporal attention is designed to extract this input effectively: multi-scale convolution kernels extract the multi-scale spatial features of the feature map; a squeeze-and-excitation mechanism applied in the channel dimension generates the channel attention weights, and a Softmax layer further establishes channel interaction; and the proposed temporal self-attention module models all frames in the time dimension to obtain global temporal cues. The radar echo data are thereby used effectively and fully, the insufficient use of gesture echo data in complex scenes and the large parameter counts of existing networks are overcome, and the method has significant engineering application value.
In one embodiment, the radar echo data comprises an intra-pulse fast time and an inter-pulse slow time, and performing the two-dimensional Fourier transform and filtering on the radar echo data to obtain the range-Doppler map sequence of the gesture to be detected comprises the following steps:
performing a two-dimensional Fourier transform over the intra-pulse fast time and the inter-pulse slow time of the radar echo data to obtain a function of target scattering point range and velocity;
and filtering out the zero-frequency component of the function of target scattering point range and velocity by mean filtering to obtain the range-Doppler map sequence of the gesture to be detected.
In one embodiment, performing the two-dimensional Fourier transform over the intra-pulse fast time and the inter-pulse slow time of the radar echo data to obtain the function of target scattering point range and velocity comprises:
performing a two-dimensional Fourier transform over the intra-pulse fast time and the inter-pulse slow time of the radar echo data to obtain the function of target scattering point range and velocity as
S_if(f_i, f_d) = Σ_{l=1}^{L} A_l · sinc[T_p(f_i − 2γR_l/c)] · sinc[N·T_p(f_d − 2v_l/λ)]
where N denotes that the slow-time FFT is taken over every N pulse repetition periods T_p, A_l denotes the intensity of the l-th scattering point, R_l the range from the l-th scattering point to the radar, v_l the velocity of the l-th scattering point, γ the chirp rate of the transmitted signal, λ = c/f_c the carrier wavelength, L the total number of target scattering points, and f_i and f_d the frequency-domain representations of the fast time t̂ and the slow time t_m after the Fourier transform, corresponding to range and velocity respectively.
In one embodiment, constructing the multidimensional attention module from the global max-pooling layer, the global average-pooling layer and the split-splice convolution module comprises:
constructing the channel attention module from the global max-pooling layer and the global average-pooling layer;
constructing the spatial attention module from the split-splice convolution module;
and constructing the temporal attention module from the global average-pooling layer.
In a specific embodiment, fig. 3 shows the structure of the multidimensional attention (MDA) module, where GMP denotes the global max-pooling layer, GAP the global average-pooling layer and SCC the split-splice convolution module, and fig. 4 shows the structure of the split-splice convolution (SCC) module, which contains a plurality of multi-scale convolution kernels.
In one embodiment, the efficient multidimensional attention neural network comprises an input layer, an intermediate layer and an output layer, and constructing the efficient multidimensional attention neural network from the efficient multidimensional attention block together with the preset convolution layer, max-pooling layer, global average-pooling layer, fully connected layer and Softmax layer comprises:
constructing the input layer of the efficient multidimensional attention neural network from the convolution layer and the max-pooling layer;
constructing the intermediate layer of the efficient multidimensional attention neural network from a plurality of efficient multidimensional attention blocks containing different multidimensional attention modules;
and constructing the output layer of the efficient multidimensional attention neural network from the global average-pooling layer, the fully connected layer and the Softmax layer.
In a specific embodiment, as shown in fig. 6, where EMDA denotes an efficient multidimensional attention block, the specific steps of constructing the efficient multidimensional attention network (EMDANet) are as follows:
S4.1 Construct the input layer:
S4.1.1 The input range-Doppler map sequence of size 3×16×256×256 (the number of channels, number of frames, height and width of the feature-map sequence, respectively) is passed through a convolution layer with kernel size (3, 7, 7) and stride (1, 2, 2) that converts the number of channels to 64; the output size is 64×16×128×128.
S4.1.2 The output then passes through a max-pooling layer of size (2, 2, 2) with stride (1, 2, 2); the output size is 64×16×64×64.
S4.2 The intermediate layer consists of four stages of structurally different EMDA blocks, containing 3, 4, 6 and 3 EMDA blocks in sequence.
S4.2.1 Construct the EMDA1 blocks: the number of output channels of the first-part 1×1×1 convolution layer in the EMDA block is set to 64, that of the second-part MDA module to 64, that of the third-part 1×1×1 convolution layer to 256 and that of the identity-mapping part to 256; the output size is 256×16×64×64. The above procedure is repeated 2 times.
S4.2.2 Construct the EMDA2 blocks:
S4.2.2.1 The number of output channels of the first-part 1×1×1 convolution layer in the EMDA block is set to 128 and that of the second-part MDA module to 128, with downsampling at stride (2, 2, 2); the number of output channels of the third-part 1×1×1 convolution layer is set to 512 and that of the identity-mapping part to 512; the output size is 512×8×32×32.
S4.2.2.2 The number of output channels of the first-part 1×1×1 convolution layer is set to 128, that of the second-part MDA module to 128, that of the third-part 1×1×1 convolution layer to 512 and that of the identity-mapping part to 512; the output size is 512×8×32×32. The above procedure is repeated 2 times.
S4.2.3 Construct the EMDA3 blocks:
S4.2.3.1 The number of output channels of the first-part 1×1×1 convolution layer is set to 256 and that of the second-part MDA module to 256, with downsampling at stride (2, 2, 2); the number of output channels of the third-part 1×1×1 convolution layer is set to 1024 and that of the identity-mapping part to 1024; the output size is 1024×4×16×16.
S4.2.3.2 The number of output channels of the first-part 1×1×1 convolution layer is set to 256, that of the second-part MDA module to 256, that of the third-part 1×1×1 convolution layer to 1024 and that of the identity-mapping part to 1024; the output size is 1024×4×16×16. The above procedure is repeated 4 times.
S4.2.4 Construct the EMDA4 blocks:
S4.2.4.1 The number of output channels of the first-part 1×1×1 convolution layer is set to 512 and that of the second-part MDA module to 512, with downsampling at stride (2, 2, 2); the number of output channels of the third-part 1×1×1 convolution layer is set to 2048 and that of the identity-mapping part to 2048; the output size is 2048×2×8×8.
S4.2.4.2 The number of output channels of the first-part 1×1×1 convolution layer is set to 512, that of the second-part MDA module to 512, that of the third-part 1×1×1 convolution layer to 2048 and that of the identity-mapping part to 2048; the output size is 2048×2×8×8. The above procedure is repeated once.
S4.3 Construct the output layer:
S4.3.1 The result of S4.2 is passed through three-dimensional global average pooling over the temporal and spatial dimensions; the output size is 2048×1×1×1.
S4.3.2 A fully connected layer with p neurons is constructed, and a Softmax layer with p neurons is connected to the fully connected layer for classifying the p dynamic gestures.
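Putting S4.1-S4.3 together, the following PyTorch skeleton reproduces the stated tensor sizes. It is a sketch under assumptions: batch-norm/ReLU placement, padding values, the pooling kernel and the 3×3×3 convolution standing in for the MDA module (sketched in later embodiments) are not fixed by the embodiment and are chosen only so the skeleton is self-contained and runnable.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Simplified EMDA block: 1x1x1 conv -> middle op -> 1x1x1 conv plus a
    shortcut; a plain 3x3x3 conv stands in for the MDA module here."""
    def __init__(self, c_in, c_mid, c_out, stride=(1, 1, 1)):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv3d(c_in, c_mid, 1, bias=False),
            nn.BatchNorm3d(c_mid), nn.ReLU(inplace=True),
            nn.Conv3d(c_mid, c_mid, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm3d(c_mid), nn.ReLU(inplace=True),
            nn.Conv3d(c_mid, c_out, 1, bias=False), nn.BatchNorm3d(c_out))
        self.shortcut = (
            nn.Identity() if c_in == c_out and stride == (1, 1, 1) else
            nn.Sequential(nn.Conv3d(c_in, c_out, 1, stride=stride, bias=False),
                          nn.BatchNorm3d(c_out)))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + self.shortcut(x))

def make_stage(c_in, c_mid, c_out, n_blocks, stride):
    blocks = [Bottleneck(c_in, c_mid, c_out, stride)]
    blocks += [Bottleneck(c_out, c_mid, c_out) for _ in range(n_blocks - 1)]
    return nn.Sequential(*blocks)

class EMDANet(nn.Module):
    """EMDANet skeleton per S4.1-S4.3: stem conv and max-pool, four stages
    of 3/4/6/3 blocks, 3-D global average pooling and a classifier."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(3, 64, (3, 7, 7), stride=(1, 2, 2),
                      padding=(1, 3, 3), bias=False),        # 64x16x128x128
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            # pooling parameters chosen to reproduce the stated 64x16x64x64
            nn.MaxPool3d((1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)))
        self.stages = nn.Sequential(
            make_stage(64, 64, 256, 3, (1, 1, 1)),      # 256x16x64x64
            make_stage(256, 128, 512, 4, (2, 2, 2)),    # 512x8x32x32
            make_stage(512, 256, 1024, 6, (2, 2, 2)),   # 1024x4x16x16
            make_stage(1024, 512, 2048, 3, (2, 2, 2)))  # 2048x2x8x8
        self.pool = nn.AdaptiveAvgPool3d(1)             # 3-D global average pool
        self.fc = nn.Linear(2048, num_classes)          # Softmax lives in the loss

    def forward(self, x):                               # x: (B, 3, 16, 256, 256)
        return self.fc(self.pool(self.stages(self.stem(x))).flatten(1))
```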
In one embodiment, inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network for gesture recognition comprises:
inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network, preprocessing it in the input layer and passing the result to the intermediate layer for feature extraction to obtain a multidimensional feature map;
and, after convolving the multidimensional feature map, applying three-dimensional global average pooling to the convolved multidimensional feature map in the output layer according to the global average-pooling layer, and classifying the pooled multidimensional feature map with the Softmax layer to obtain the recognition result.
In a specific embodiment, as shown in fig. 5, two convolution blocks are added to the multidimensional attention module to build the efficient multidimensional attention block. During feature extraction the feature map is fed into two branches: one branch consists of a 1×1×1 convolution layer, the multidimensional attention (MDA) module and another 1×1×1 convolution layer; the other branch is a shortcut connected to the output of the first. The convolved multidimensional feature map is formed by adding the outputs of the two branches, which addresses the degradation problem. The main steps are:
The input feature-map sequence F is passed in turn through a 1×1×1 convolution layer, the MDA module and a 1×1×1 convolution layer, yielding a feature-map sequence F' of the same size as F.
The input feature-map sequence F is identity-mapped through the shortcut connection and added element-wise to F', alleviating the degradation problem; the convolved multidimensional feature map finally output is
output' = F + F'
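The two-branch structure above reduces to a few lines. In this sketch the MDA module is injected as a parameter so the block stays self-contained; passing `nn.Identity()` exercises only the residual wiring.

```python
import torch.nn as nn

class EMDABlock(nn.Module):
    """Efficient multidimensional attention block per the steps above:
    1x1x1 conv -> MDA -> 1x1x1 conv, plus an identity shortcut, so that
    output' = F + F'. `mda` is any size-preserving module."""
    def __init__(self, channels, mid_channels, mda: nn.Module):
        super().__init__()
        self.reduce = nn.Conv3d(channels, mid_channels, 1)
        self.mda = mda
        self.expand = nn.Conv3d(mid_channels, channels, 1)

    def forward(self, f):
        f_prime = self.expand(self.mda(self.reduce(f)))  # F', same size as F
        return f + f_prime                               # element-wise shortcut add
```

For instance, `EMDABlock(256, 64, nn.Identity())` wires the reduce/expand pair of the EMDA1 stage around a placeholder MDA module.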
As shown in fig. 8, all measured data are divided into training and test sets in a 7:3 ratio, and the parameter counts and recognition rates of 2D-ResNet-50, C3D, P3D-60, 3D-ResNet-50 and EMDANet-50 are compared. Compared with the two-dimensional convolutional neural network, the three-dimensional networks add temporal information and the recognition rate rises markedly. Because attention mechanisms are added in the spatial, channel and temporal dimensions, EMDANet has 16.5% fewer parameters than 3D-ResNet-50 while its recognition rate reaches 95.2%; it requires fewer parameters, achieves higher recognition accuracy and makes full use of the radar echo data. Fig. 9 shows the recognition rates of 2D-ResNet-50, C3D, P3D-60, 3D-ResNet-50 and EMDANet-50 when all measured data are divided by subject in a 7:3 ratio into training and test sets; the recognition rate of EMDANet-50 is higher than that of the other networks, exceeding 90% at the 35th iteration.
In one embodiment, inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network, preprocessing it in the input layer and passing it to the intermediate layer for feature extraction to obtain the multidimensional feature map comprises:
inputting the range-Doppler map sequence of the gesture to be detected into the trained efficient multidimensional attention neural network, preprocessing it in the input layer, passing it to the intermediate layer and extracting features from it with the spatial attention module to obtain a multi-scale fusion feature map;
computing weights over the range-Doppler map sequence with the channel attention module to obtain the channel weights corresponding to the feature map;
extracting features from the range-Doppler map sequence with the temporal attention module to obtain a temporal feature map;
and multiplying the multi-scale fusion feature map point by point with the corresponding channel weights and then adding the temporal feature map to obtain the multidimensional feature map, as the sketch below illustrates.
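The fusion described in these steps can be written compactly; the three submodules are injected and are sketched in the embodiments that follow, so this wrapper is an illustration rather than the patented code.

```python
import torch.nn as nn

class MDA(nn.Module):
    """Multidimensional attention fusion: output = F_s * w_c + F_t, i.e. the
    multi-scale fusion map weighted point by point by the channel weights,
    plus the temporal map (all broadcastable to the input size)."""
    def __init__(self, spatial: nn.Module, channel: nn.Module, temporal: nn.Module):
        super().__init__()
        self.spatial = spatial      # produces the multi-scale fusion map F_s
        self.channel = channel      # produces the channel weights w_c
        self.temporal = temporal    # produces the temporal feature map F_t

    def forward(self, f):
        return self.spatial(f) * self.channel(f) + self.temporal(f)
```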
In one embodiment, extracting features from the range-Doppler map sequence with the spatial attention module to obtain the multi-scale fusion feature map comprises:
extracting features from the range-Doppler map sequence with the spatial attention module to obtain the multi-scale fusion feature map as
F_s = Conv(1×1×1, N→C)(F_s_all)
where F_s_all = Cat([F_s1, F_s2, …, F_sN]) and F_si = Conv(3×k_i×k_i, C'→1)(F_i), i = 1, 2, …, N; F_i denotes the i-th group of the range-Doppler map sequence, k_i the i-th split-splice convolution kernel size, F_si the feature map at the i-th scale, N the total number of split-splice convolution groups, C' the number of channels per group and C the number of input channels.
In a specific embodiment, the specific process of extracting features from the range-Doppler map sequence with the spatial attention module is as follows:
S2.1.1 For an input range-Doppler map sequence F ∈ R^(C×T×H×W), where H, W, C and T denote its height, width, number of input channels and number of frames, respectively, F is divided equally into N parts along the channel dimension, denoted [F_1, F_2, …, F_N], each part having C' = C/N channels; the i-th group of feature-map sequences is denoted F_i ∈ R^(C'×T×H×W), i = 1, 2, …, N.
S2.1.2 Multi-scale convolution kernels are applied to extract multi-scale spatial features, and the number of output channels of each group of feature-map sequences is set to 1 to reduce the parameter count; the multi-scale feature maps are
F_si = Conv(3×k_i×k_i, C'→1)(F_i), i = 1, 2, …, N
where k_i and F_si ∈ R^(1×T×H×W) denote the i-th convolution kernel size and the feature map at the i-th scale, respectively.
S2.1.3 The obtained multi-scale feature maps are spliced:
F_s_all = Cat([F_s1, F_s2, …, F_sN])
where F_s_all ∈ R^(N×T×H×W).
S2.1.4 The different feature-map sequences are fused by a 1×1×1 convolution layer whose number of output channels is set to C:
F_s = Conv(1×1×1, N→C)(F_s_all)
where F_s ∈ R^(C×T×H×W).
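S2.1.1-S2.1.4 correspond to the following sketch; the kernel sizes k_i = 3, 5, 7, 9 and the choice of N = 4 groups are assumptions made for illustration, since the embodiment leaves N and k_i open.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Split-splice convolution (SCC) spatial module: split the input into N
    channel groups, convolve group i with a 3 x k_i x k_i kernel down to one
    channel, splice the N maps and fuse back to C channels with a 1x1x1 conv."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        n = len(kernel_sizes)
        assert channels % n == 0
        self.group = channels // n                 # C' = C / N
        self.convs = nn.ModuleList(
            nn.Conv3d(self.group, 1, (3, k, k), padding=(1, k // 2, k // 2))
            for k in kernel_sizes)
        self.fuse = nn.Conv3d(n, channels, 1)      # Conv(1x1x1, N -> C)

    def forward(self, f):                          # f: (B, C, T, H, W)
        groups = torch.split(f, self.group, dim=1)             # [F_1, ..., F_N]
        f_all = torch.cat([conv(g) for conv, g in zip(self.convs, groups)],
                          dim=1)                               # F_s_all
        return self.fuse(f_all)                                # F_s
```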
In one embodiment, computing weights over the range-Doppler map sequence with the channel attention module to obtain the channel weights corresponding to the feature map comprises:
applying global average pooling and global max pooling to the range-Doppler map sequence along the temporal and spatial dimensions to obtain an average-pooled feature map and a max-pooled feature map;
splicing the average-pooled feature map and the max-pooled feature map along the channel dimension to obtain a pooled feature map;
and fusing and exciting the spliced pooled feature map with two fully connected layers to obtain the channel weights corresponding to the feature map.
In a specific embodiment, the specific process of computing weights over the range-Doppler map sequence with the channel attention module to obtain the channel weights corresponding to the feature map is as follows:
S2.2.1 For the input feature-map sequence F, global average pooling along the temporal and spatial dimensions gives F_gc ∈ R^C.
S2.2.2 For the input feature-map sequence F, global max pooling along the temporal and spatial dimensions gives F_mc ∈ R^C.
S2.2.3 F_gc and F_mc are spliced along the channel dimension:
F_c = Cat[(F_gc, F_mc), C]
where F_c ∈ R^(2C).
S2.2.4 The spliced features are fused and excited by two fully connected layers to obtain the weights of the channel dimension of the feature-map sequence:
w_c = σ(W_2 δ(W_1(F_c)))
where δ denotes the ReLU operation, W_1 and W_2 denote the two fully connected layers, σ denotes the Sigmoid function and w_c ∈ R^C is the attention weight.
S2.2.5 w_c is replicated along the temporal and spatial dimensions into w_c' ∈ R^(C×T×H×W), matching the size of F.
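A sketch of S2.2.1-S2.2.5; the reduction ratio `r` of the two fully connected layers is an assumed hyperparameter, since the embodiment does not state their hidden width.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel module: global average and max pooling over the temporal and
    spatial dimensions, channel-wise splice, then two fully connected layers
    (ReLU between, Sigmoid after) give one weight per channel."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc1 = nn.Linear(2 * channels, channels // r)   # W_1 (squeeze)
        self.fc2 = nn.Linear(channels // r, channels)       # W_2 (excite)
        self.act = nn.ReLU(inplace=True)                    # delta

    def forward(self, f):                        # f: (B, C, T, H, W)
        f_gc = f.mean(dim=(2, 3, 4))             # global average pool -> (B, C)
        f_mc = f.amax(dim=(2, 3, 4))             # global max pool     -> (B, C)
        f_c = torch.cat([f_gc, f_mc], dim=1)     # splice along channels
        w_c = torch.sigmoid(self.fc2(self.act(self.fc1(f_c))))
        return w_c[:, :, None, None, None]       # broadcastable to F's size
```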
In one embodiment, extracting features from the range-Doppler map sequence with the temporal attention module to obtain the temporal feature map comprises:
extracting features from the range-Doppler map sequence with the temporal attention module to obtain the temporal feature map as
F_t = σ(g_t W_t1) W_t2
where g_t is obtained by globally average-pooling F over the spatial dimensions, F denotes the range-Doppler map sequence, H and W denote its height and width respectively, W_t1 and W_t2 denote fully connected layers with different weights, and σ denotes the GeLU operation.
In a specific embodiment, the temporal attention module captures cross-frame relationships with a multi-layer perceptron (MLP) to obtain a global temporal cue. The specific process of extracting features from the range-Doppler map sequence with the temporal attention module is as follows:
S2.3.1 For the input feature-map sequence F, global average pooling is applied over the spatial dimensions, giving g_t ∈ R^(C×T).
S2.3.2 Two fully connected layers of the same dimension (W_t1 and W_t2) mix g_t across frames, modeling the global temporal relationship in the time dimension:
F_t = σ(g_t W_t1) W_t2
where W_t1 and W_t2 denote the fully connected layers and σ denotes the GeLU operation.
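A sketch of S2.3.1-S2.3.2; reading "cross-frame hybrid sharing" as two T×T fully connected layers that mix features along the frame dimension is an assumption of this example.

```python
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Temporal module: spatial global average pooling, then two equal-sized
    fully connected layers mix across the T frames, F_t = GeLU(g_t W_t1) W_t2."""
    def __init__(self, num_frames):
        super().__init__()
        self.fc1 = nn.Linear(num_frames, num_frames)   # W_t1
        self.fc2 = nn.Linear(num_frames, num_frames)   # W_t2
        self.act = nn.GELU()                           # sigma

    def forward(self, f):                   # f: (B, C, T, H, W)
        g_t = f.mean(dim=(3, 4))            # spatial GAP -> (B, C, T)
        f_t = self.fc2(self.act(self.fc1(g_t)))
        return f_t[:, :, :, None, None]     # broadcast over H and W
```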
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 1 may comprise multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different moments, and whose order of execution is not necessarily sequential; they may be executed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of other steps.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 10. The computer device comprises a processor, a memory, a network interface, a display screen and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. When executed by the processor, the computer program implements the radar micro-gesture recognition method based on an efficient multidimensional attention neural network. The display screen of the computer device may be a liquid crystal display or an electronic-ink display, and the input device may be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
It will be appreciated by those skilled in the art that the structure shown in FIG. 10 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment a computer device is provided comprising a memory storing a computer program and a processor implementing the steps of the method of the above embodiments when the computer program is executed.
In one embodiment, a computer storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the application in detail and are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, and these all fall within the protection scope of the application. Accordingly, the protection scope of the present application shall be determined by the appended claims.

Claims (10)

1. A radar micro gesture recognition method based on an efficient multidimensional attention neural network, the method comprising:
Acquiring radar echo data; the radar echo data comprises a plurality of gestures to be detected;
Performing two-dimensional Fourier transform and filtering processing on the radar echo data to obtain a range-Doppler graph sequence of the detected gesture; dividing the range-Doppler graph sequence data according to a preset ratio to obtain a training set and a test set;
Constructing a multidimensional attention module according to a global max pooling layer, a global average pooling layer and a split-splice convolution module; the multidimensional attention module comprises a spatial attention module, a channel attention module and a temporal attention module;
Constructing an efficient multidimensional attention block using the multidimensional attention module and a plurality of convolution blocks, and constructing an efficient multidimensional attention neural network according to the efficient multidimensional attention block, a preset convolution layer, a max pooling layer, a global average pooling layer, a fully connected layer and a Softmax layer;
Training the efficient multidimensional attention neural network with the training set and the test set to obtain a trained efficient multidimensional attention neural network;
and inputting the range-Doppler graph sequence of the detected gesture into the trained efficient multidimensional attention neural network for gesture recognition.
2. The method of claim 1, wherein the radar echo data includes an intra-pulse fast time and an intra-pulse slow time; and performing two-dimensional Fourier transform and filtering processing on the radar echo data to obtain a range-Doppler graph sequence of the detected gesture comprises:
performing two-dimensional Fourier transform on the intra-pulse fast time and the intra-pulse slow time of the radar echo data to obtain a function of the distance and velocity of the target scattering points;
and filtering out the zero-frequency component of the function of the distance and velocity of the target scattering points by mean filtering to obtain the range-Doppler graph sequence of the detected gesture.
3. The method of claim 1, wherein performing two-dimensional Fourier transform on the intra-pulse fast time and the intra-pulse slow time of the radar echo data to obtain a function of the distance and velocity of the target scattering points comprises:
performing two-dimensional Fourier transform on the intra-pulse fast time and the intra-pulse slow time of the radar echo data to obtain the function of the distance and velocity of the target scattering points as

G(f_i, f_d) = Σ_{l=1}^{L} A_l · sinc[T_p(f_i + 2γR_l/c)] · sinc[N·T_p(f_d + 2v_l/λ)]

wherein N denotes that the slow-time FFT is performed every N pulse repetition periods T_p, A_l denotes the intensity of the l-th scattering point, R_l is the distance from the l-th scattering point to the radar, v_l is the velocity of the l-th scattering point, γ is the chirp rate of the transmitted signal, L denotes the total number of target scattering points, c is the speed of light, λ is the carrier wavelength, and f_i and f_d denote the fast time t̂ and the slow time t_m in the frequency domain after Fourier transform, corresponding to distance and velocity, respectively.
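For illustration only, not part of the claims: the processing of claims 2 and 3 can be sketched in a few lines of NumPy. The frame length, array shapes, and the mean-subtraction form of the zero-frequency filtering below are editorial assumptions, not the patented parameters.

```python
import numpy as np

def range_doppler_map(frame):
    """One frame of dechirped echo, shape (N, M):
    N pulses (slow time) x M fast-time samples per pulse."""
    # Mean filtering (assumed form): subtracting the slow-time mean
    # suppresses the zero-Doppler (static clutter) component, i.e. the
    # zero-frequency line of the subsequent FFT.
    frame = frame - frame.mean(axis=0, keepdims=True)
    # 2-D FFT: fast time -> range axis (f_i), slow time -> Doppler axis (f_d).
    rd = np.fft.fft2(frame)
    return np.abs(np.fft.fftshift(rd, axes=0))

def rd_sequence(echo, n_pulses_per_frame):
    """Split a long pulse stream into frames of N pulses each and form
    the range-Doppler graph sequence."""
    n_frames = echo.shape[0] // n_pulses_per_frame
    echo = echo[: n_frames * n_pulses_per_frame]
    frames = echo.reshape(n_frames, n_pulses_per_frame, -1)
    return np.stack([range_doppler_map(f) for f in frames])

# Example: 640 pulses of 256 samples -> a sequence of 10 range-Doppler maps.
echo = np.random.randn(640, 256) + 1j * np.random.randn(640, 256)
maps = rd_sequence(echo, 64)          # shape (10, 64, 256)
```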
4. The method according to any one of claims 1 to 3, wherein constructing a multidimensional attention module according to the global max pooling layer, the global average pooling layer and the split-splice convolution module comprises:
constructing the channel attention module according to the global max pooling layer and the global average pooling layer;
constructing the spatial attention module according to the split-splice convolution module;
and constructing the temporal attention module according to the global average pooling layer.
5. The method of claim 4, wherein the efficient multidimensional attention neural network comprises an input layer, an intermediate layer and an output layer; and constructing the efficient multidimensional attention neural network according to the efficient multidimensional attention block, the preset convolution layer, the max pooling layer, the global average pooling layer, the fully connected layer and the Softmax layer comprises:
constructing the input layer of the efficient multidimensional attention neural network according to the convolution layer and the max pooling layer;
constructing the intermediate layer of the efficient multidimensional attention neural network using a plurality of efficient multidimensional attention blocks comprising different multidimensional attention modules;
and constructing the output layer of the efficient multidimensional attention neural network according to the global average pooling layer, the fully connected layer and the Softmax layer.
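For orientation only: a hypothetical PyTorch skeleton of the input / intermediate / output layering described in claim 5, followed by a dummy inference call in the spirit of claim 6. The channel counts, kernel sizes, and the identity placeholder standing in for the efficient multidimensional attention blocks are assumptions, not the patented configuration.

```python
import torch
import torch.nn as nn

class EMANet(nn.Module):
    """Hypothetical skeleton of the efficient multidimensional attention
    network; all hyperparameters are illustrative."""
    def __init__(self, num_classes, ema_blocks):
        super().__init__()
        # Input layer: convolution layer + max pooling layer (claim 5).
        self.stem = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.MaxPool3d(kernel_size=2),
        )
        # Intermediate layer: stacked efficient multidimensional
        # attention blocks (placeholders here).
        self.body = nn.Sequential(*ema_blocks)
        # Output layer: global average pooling + fully connected + Softmax.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(32, num_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x):          # x: (batch, 1, time, height, width)
        return self.head(self.body(self.stem(x)))

# Dummy inference: a batch of range-Doppler graph sequences passes through
# the input, intermediate and output layers and yields class probabilities.
model = EMANet(num_classes=6, ema_blocks=[nn.Identity()])
x = torch.randn(2, 1, 32, 64, 64)     # dummy range-Doppler graph sequences
probs = model(x)                      # (2, 6) Softmax probabilities
```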
6. The method of claim 5, wherein inputting the range-Doppler graph sequence of the detected gesture into the trained efficient multidimensional attention neural network for gesture recognition comprises:
inputting the range-Doppler graph sequence of the detected gesture into the trained efficient multidimensional attention neural network, preprocessing it in the input layer, and outputting it to the intermediate layer for feature extraction to obtain a multidimensional feature graph;
and after convolving the multidimensional feature graph, performing three-dimensional global average pooling on the convolved multidimensional feature graph in the global average pooling layer of the output layer, and performing gesture classification on the pooled multidimensional feature graph with the Softmax layer to obtain a recognition result.
7. The method of claim 4, wherein inputting the range-Doppler graph sequence of the detected gesture into the trained efficient multidimensional attention neural network, preprocessing it in the input layer, and outputting it to the intermediate layer for feature extraction to obtain a multidimensional feature graph comprises:
inputting the range-Doppler graph sequence of the detected gesture into the trained efficient multidimensional attention neural network, preprocessing it in the input layer, outputting it to the intermediate layer, and performing feature extraction on the range-Doppler graph sequence with the spatial attention module to obtain a multi-scale fusion feature graph;
performing weight calculation on the range-Doppler graph sequence with the channel attention module to obtain channel weights corresponding to the feature graph;
performing feature extraction on the range-Doppler graph sequence with the temporal attention module to obtain a temporal feature graph;
and multiplying the multi-scale fusion feature graph by the corresponding channel weights point by point, then adding the temporal feature graph to obtain the multidimensional feature graph.
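Under broadcasting, the fusion step at the end of claim 7 reduces to a one-liner; the tensor shapes below are editorial assumptions.

```python
import torch

def fuse_multidimensional(f_spatial, channel_weights, f_temporal):
    """f_spatial: (B, C, T, H, W) multi-scale fusion feature graph;
    channel_weights: (B, C, 1, 1, 1) from the channel attention module;
    f_temporal: (B, C, T, 1, 1) from the temporal attention module.
    Broadcasting performs the point-by-point multiply, then the add."""
    return f_spatial * channel_weights + f_temporal

# Dummy tensors illustrating the assumed shapes.
fused = fuse_multidimensional(torch.randn(2, 32, 16, 32, 32),
                              torch.rand(2, 32, 1, 1, 1),
                              torch.randn(2, 32, 16, 1, 1))
```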
8. The method of claim 4, wherein performing feature extraction on the range-Doppler graph sequence with the spatial attention module to obtain a multi-scale fusion feature graph comprises:
performing feature extraction on the range-Doppler graph sequence with the spatial attention module to obtain the multi-scale fusion feature graph as

F_s = Conv(1×1×1, N→C)(F_s_all)

wherein F_s_all = Cat([F_s1, F_s2, …, F_sN]) and F_si = Conv(3×k_i×k_i, C'→1)(F_i), i = 1, 2, …, N; F_i denotes the range-Doppler graph sequence, k_i denotes the kernel size of the i-th split-splice convolution, F_si denotes the feature graphs at different scales, N denotes the total number of split-splice convolution modules, C' denotes the number of channels of each split, and C denotes the number of input channels.
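Read literally, the formula of claim 8 suggests the following hypothetical sketch: split the C input channels into N groups of C' channels, convolve group i down to one channel with a 3×k_i×k_i kernel (F_si), splice the N maps (F_s_all), and fuse back to C channels with a 1×1×1 convolution (F_s). The kernel sizes (3, 5, 7, 9) and the 4-way split are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SplitSpliceSpatialAttention(nn.Module):
    """Hypothetical sketch of F_s = Conv(1x1x1, N->C)(Cat([F_s1..F_sN]))."""
    def __init__(self, in_channels, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        n = len(kernel_sizes)
        assert in_channels % n == 0, "C must divide evenly into N splits"
        c_split = in_channels // n                # C' channels per split
        # Branch i: Conv(3 x k_i x k_i, C' -> 1), spatial size preserved.
        self.branches = nn.ModuleList(
            nn.Conv3d(c_split, 1, kernel_size=(3, k, k),
                      padding=(1, k // 2, k // 2))
            for k in kernel_sizes)
        self.fuse = nn.Conv3d(n, in_channels, kernel_size=1)   # N -> C

    def forward(self, x):                         # x: (B, C, T, H, W)
        splits = torch.chunk(x, len(self.branches), dim=1)
        maps = [conv(s) for conv, s in zip(self.branches, splits)]
        return self.fuse(torch.cat(maps, dim=1))  # splice, then 1x1x1 fuse

# Example: 32 channels split into 4 scales.
out = SplitSpliceSpatialAttention(32)(torch.randn(2, 32, 16, 32, 32))
```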
9. The method of claim 4, wherein performing weight calculation on the range-Doppler graph sequence with the channel attention module to obtain channel weights corresponding to the feature graph comprises:
performing global average pooling and global max pooling on the range-Doppler graph sequence along the time and space dimensions to obtain an average-pooled feature graph and a max-pooled feature graph;
splicing the average-pooled feature graph and the max-pooled feature graph along the channel dimension to obtain a pooled feature graph;
and fusing and exciting the spliced pooled feature graph with two fully connected layers to obtain the channel weights corresponding to the feature graph.
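A hypothetical sketch of the channel attention steps of claim 9: global average and global max pooling over the time and space dimensions, splicing along the channel dimension, then two fully connected layers for fusion and excitation. The reduction ratio and the sigmoid on the output are editorial choices, not taken from the patent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Hypothetical sketch of the channel attention module of claim 9."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(2 * channels, channels // reduction)  # fusion
        self.fc2 = nn.Linear(channels // reduction, channels)      # excitation
        self.act = nn.GELU()

    def forward(self, x):                     # x: (B, C, T, H, W)
        avg = x.mean(dim=(2, 3, 4))           # global average pool over T, H, W
        mx = x.amax(dim=(2, 3, 4))            # global max pool over T, H, W
        pooled = torch.cat([avg, mx], dim=1)  # splice along channel dimension
        w = torch.sigmoid(self.fc2(self.act(self.fc1(pooled))))
        return w.view(w.size(0), -1, 1, 1, 1) # channel weights (B, C, 1, 1, 1)

weights = ChannelAttention(32)(torch.randn(2, 32, 16, 32, 32))
```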
10. The method of claim 4, wherein performing feature extraction on the range-Doppler graph sequence with the temporal attention module to obtain a temporal feature graph comprises:
performing feature extraction on the range-Doppler graph sequence with the temporal attention module to obtain the temporal feature graph as

F_t = σ(g_t W_t1) W_t2

wherein g_t is obtained by global average pooling of the range-Doppler graph sequence F over its spatial dimensions, F denotes the range-Doppler graph sequence, H and W denote the height and width of the range-Doppler graph sequence, W_t1 and W_t2 denote the fully connected layers, and σ denotes the GeLU operation.
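A hypothetical sketch of the temporal attention of claim 10: global average pooling over the spatial dimensions produces g_t, which passes through the two fully connected layers W_t1 and W_t2 with a GeLU in between. The hidden width and the output reshaping are assumptions.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Hypothetical sketch of F_t = GeLU(g_t @ W_t1) @ W_t2 (claim 10)."""
    def __init__(self, channels, hidden=16):
        super().__init__()
        self.w_t1 = nn.Linear(channels, hidden)    # W_t1
        self.w_t2 = nn.Linear(hidden, channels)    # W_t2
        self.act = nn.GELU()                       # sigma in the claim

    def forward(self, x):                          # x: (B, C, T, H, W)
        g_t = x.mean(dim=(3, 4))                   # spatial GAP over H and W
        g_t = g_t.transpose(1, 2)                  # (B, T, C) for the linears
        f_t = self.w_t2(self.act(self.w_t1(g_t)))  # (B, T, C)
        # Back to (B, C, T, 1, 1) so it broadcasts against the input.
        return f_t.transpose(1, 2).unsqueeze(-1).unsqueeze(-1)

f_t = TemporalAttention(32)(torch.randn(2, 32, 16, 32, 32))
```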
CN202210551031.0A 2022-05-20 2022-05-20 Efficient multidimensional attention neural network-based radar micro gesture recognition method Active CN114895275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551031.0A CN114895275B (en) 2022-05-20 2022-05-20 Efficient multidimensional attention neural network-based radar micro gesture recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210551031.0A CN114895275B (en) 2022-05-20 2022-05-20 Efficient multidimensional attention neural network-based radar micro gesture recognition method

Publications (2)

Publication Number Publication Date
CN114895275A CN114895275A (en) 2022-08-12
CN114895275B (en) 2024-06-14

Family

ID=82724596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551031.0A Active CN114895275B (en) 2022-05-20 2022-05-20 Efficient multidimensional attention neural network-based radar micro gesture recognition method

Country Status (1)

Country Link
CN (1) CN114895275B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758631B (en) * 2023-06-13 2023-12-22 杭州追形视频科技有限公司 Big data driven behavior intelligent analysis method and system
CN116509382A (en) * 2023-07-03 2023-08-01 深圳市华翌科技有限公司 Human body activity intelligent detection method and health monitoring system based on millimeter wave radar

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3364342A1 (en) * 2017-02-17 2018-08-22 Cogisen SRL Method for image processing and video compression
US10726062B2 (en) * 2018-11-30 2020-07-28 Sony Interactive Entertainment Inc. System and method for converting image data into a natural language description
KR102228524B1 (en) * 2019-06-27 2021-03-15 한양대학교 산학협력단 Non-contact type gesture recognization apparatus and method
CN111091045B (en) * 2019-10-25 2022-08-23 重庆邮电大学 Sign language identification method based on space-time attention mechanism
WO2021068470A1 (en) * 2020-04-09 2021-04-15 浙江大学 Radar signal-based identity and gesture recognition method
CN112329525A (en) * 2020-09-27 2021-02-05 中国科学院软件研究所 Gesture recognition method and device based on space-time diagram convolutional neural network
CN113850135A (en) * 2021-08-24 2021-12-28 中国船舶重工集团公司第七0九研究所 Dynamic gesture recognition method and system based on time shift frame

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shahzad Ahmed, "Radar-Based Air-Writing Gesture Recognition Using a Novel Multistream CNN Approach," IEEE Internet of Things Journal, 2022-07-08, full text. *

Also Published As

Publication number Publication date
CN114895275A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN109949255B (en) Image reconstruction method and device
CN114895275B (en) Efficient multidimensional attention neural network-based radar micro gesture recognition method
Hazirbas et al. Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
Khan et al. SD-Net: Understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network
CN113874883A (en) Hand pose estimation
CN110222760B (en) Quick image processing method based on winograd algorithm
KR102140805B1 (en) Neural network learning method and apparatus for object detection of satellite images
CN111476806A (en) Image processing method, image processing device, computer equipment and storage medium
US20230153965A1 (en) Image processing method and related device
CN117079098A (en) Space small target detection method based on position coding
Jain et al. Encoded motion image-based dynamic hand gesture recognition
Fang et al. SCENT: A new precipitation nowcasting method based on sparse correspondence and deep neural network
CN113534678B (en) Migration method from simulation of operation question-answering task to physical system
KR102637342B1 (en) Method and apparatus of tracking target objects and electric device
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN111914809B (en) Target object positioning method, image processing method, device and computer equipment
CN112346056B (en) Resolution characteristic fusion extraction method and identification method of multi-pulse radar signals
He et al. From macro to micro: rethinking multi-scale pedestrian detection
Khan et al. Suspicious Activities Recognition in Video Sequences Using DarkNet-NasNet Optimal Deep Features.
CN113807330A (en) Three-dimensional sight estimation method and device for resource-constrained scene
Murata et al. Segmentation of Cell Membrane and Nucleus using Branches with Different Roles in Deep Neural Network.
CN113743189B (en) Human body posture recognition method based on segmentation guidance
JP2019125128A (en) Information processing device, control method and program
Ma et al. Har enhanced weakly-supervised semantic segmentation coupled with adversarial learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant