CN116304815A - Motor imagery electroencephalogram signal classification method based on self-attention mechanism and parallel convolution - Google Patents


Info

Publication number
CN116304815A
Authority
CN
China
Prior art keywords
convolution
layer
training
electroencephalogram
motor imagery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310212631.9A
Other languages
Chinese (zh)
Inventor
王丹
周浩
陈佳明
许萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202310212631.9A priority Critical patent/CN116304815A/en
Publication of CN116304815A publication Critical patent/CN116304815A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent


Abstract

A motor imagery electroencephalogram (EEG) signal classification method based on a multi-head self-attention mechanism and parallel convolution, belonging to the field of computer software. Aiming at the difficulty of feature extraction caused by the low signal-to-noise ratio of EEG signals, an improved EEGNet-based network model, EEG-MATCNet for short, is proposed. First, a parallel convolution layer performs preliminary feature extraction on the raw EEG signal; convolution kernels of different scales extract temporal features at different time scales. Meanwhile, a multi-head self-attention mechanism computes attention weights for the EEG signals between electrodes, so that spatial features are better extracted during network training. In addition, a temporal convolutional network enlarges the receptive field of the convolution kernels, allowing the model to extract higher-level temporal features. Experiments show that the proposed classification method more effectively improves feature extraction and classification performance for motor imagery EEG signals.

Description

Motor imagery electroencephalogram signal classification method based on self-attention mechanism and parallel convolution
Technical Field
The invention discloses a motor imagery electroencephalogram signal classification method based on a self-attention mechanism and parallel convolution, which can be used for decoding motor imagery electroencephalogram signals and belongs to the field of computers.
Background
Brain-computer interface technology is a leading research direction of multidisciplinary fusion: brain electrical signals are acquired and decoded by a device, converted into commands, and forwarded to an output device to perform the required operation. It is widely applied in biomedicine, entertainment, education, smart home, military, and other fields. Brain activity may be recorded by various neuroimaging methods, which may be invasive or non-invasive. The most popular non-invasive acquisition method for brain-computer interfaces at present is electroencephalography (EEG). Its popularity stems from the low cost of the equipment, fewer complications than invasive surgery, portability, ease of setup and use, and the ability to measure neural activity directly. In EEG-based brain-computer interfaces, Motor Imagery (MI) is a very classical paradigm: motor imagery EEG signals are the electrical signals measured on the scalp when a person imagines moving different parts of the body. When a person performs a hand motor imagery task, the alpha (8-12 Hz) and beta (13-30 Hz) waves of the sensorimotor EEG contralateral to the imagined hand decrease in amplitude, known as event-related desynchronization (ERD), while the alpha and beta waves ipsilateral to the imagined hand increase in amplitude, known as event-related synchronization (ERS). According to this rule, a person's intention can be interpreted. Motor imagery is considered one of the most promising paradigms for aiding rehabilitation in patients with quadriplegia, spinal cord injury, and amyotrophic lateral sclerosis (ALS).
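The ERD/ERS rule above can be illustrated numerically: alpha-band power drops during imagery relative to rest in the contralateral sensorimotor channel. The following sketch is illustrative only and not part of the patent; the 10 Hz signals and their amplitudes are made-up toy data standing in for real EEG.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Mean power of `signal` within the [low, high] Hz band, via the FFT power spectrum."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()

# Toy example: a 10 Hz "mu rhythm" whose amplitude is attenuated during imagery (ERD).
fs = 250                                      # sampling rate used by the BCI IV datasets
t = np.arange(0, 2.0, 1.0 / fs)
rest    = 2.0 * np.sin(2 * np.pi * 10 * t)    # stronger alpha at rest
imagery = 0.5 * np.sin(2 * np.pi * 10 * t)    # attenuated alpha during imagery

# ERD: alpha-band (8-12 Hz) power during imagery falls below its resting level.
erd = band_power(imagery, fs, 8, 12) < band_power(rest, fs, 8, 12)
```

On real data the comparison would be made between baseline and post-cue windows of the same trial rather than between two synthetic signals.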
Although brain-computer interface technology based on the motor imagery paradigm has been widely applied in fields such as rehabilitation and medical treatment, its decoding performance still cannot adequately meet the needs of practical applications. Because the EEG signal is non-Gaussian, non-stationary, and nonlinear, the acquired signal is susceptible to external noise (e.g., power-line interference from electrical equipment) and internal noise (e.g., physiological sources such as electro-oculogram signals). In addition, due to physiological differences between people, the EEG signals of different subjects performing the same motor imagery task may differ considerably, and even the EEG signals of the same subject performing the same imagery task at different times may vary substantially. Therefore, how to extract valid features from motor imagery EEG signals and decode them correctly remains a challenging problem.
A common motor imagery EEG classification pipeline generally comprises four parts: preprocessing, feature extraction, feature selection, and classification. Traditional feature extraction methods mainly select time-domain, frequency-domain, or spatial-domain characteristics of the EEG signal. For example, temporal features at different time points or over different time periods can be extracted in the time domain via the mean, variance, Hjorth parameters, skewness, and so on. Time-frequency features of the raw EEG signal can be extracted using the wavelet transform, power spectral density, or the fast Fourier transform. Spatial features of the EEG signal are extracted using the Common Spatial Pattern (CSP) and its variants. The extracted features are then classified with methods such as linear discriminant analysis, support vector machines, neural networks, or Bayesian classifiers. However, traditional methods generally require abundant prior knowledge and extensive feature selection, and with the popularity of deep learning, more and more researchers have applied end-to-end deep models to motor imagery EEG classification with good results. Lawhern et al. proposed EEGNet, a compact CNN architecture that convolves along the time dimension and applies depthwise convolution along the spatial dimension; with a substantial reduction in trainable parameters, it performs stably across 4 different paradigms and achieves high accuracy in both cross-subject and within-subject scenarios.
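The EEGNet idea described above, a temporal convolution followed by a depthwise spatial convolution that collapses the electrode axis, can be sketched in PyTorch as follows. The filter counts and kernel sizes here are illustrative stand-ins, not the exact EEGNet hyperparameters.

```python
import torch
import torch.nn as nn

C, T = 22, 500               # electrodes x time samples (2 s at 250 Hz, as in BCI IV-2a)
x = torch.randn(1, 1, C, T)  # (batch, 1, channels, time)

eegnet_style = nn.Sequential(
    # Temporal convolution: the (1, 64) kernel slides along the time axis only.
    nn.Conv2d(1, 8, kernel_size=(1, 64), padding=(0, 32), bias=False),
    nn.BatchNorm2d(8),
    # Depthwise spatial convolution: groups=8 gives each temporal feature map its
    # own spatial filters, and the (C, 1) kernel collapses the electrode axis to 1.
    nn.Conv2d(8, 16, kernel_size=(C, 1), groups=8, bias=False),
    nn.BatchNorm2d(16),
    nn.ELU(),
    nn.AvgPool2d((1, 4)),    # downsample along time
)
out = eegnet_style(x)        # spatial axis is now size 1
```

The depthwise step is what keeps the parameter count low: it learns per-feature-map spatial filters instead of a dense mixing of all input channels.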
Dai et al propose a convolutional neural network (HS-CNN) with a mixed convolutional scale, and the proposed method effectively solves the problem of limiting the classification effect by using a single convolutional scale in CNN, thereby further improving the classification accuracy. However, there is room for further optimization in the above studies.
Recently, a new variant of the CNN, the Temporal Convolutional Network (TCN), has been dedicated to time-series modeling and classification. The TCN performs better than recurrent networks such as LSTM and GRU on many sequence-related tasks. Compared with a typical CNN, a TCN can enlarge its receptive field exponentially with only a linear increase in the number of parameters, and unlike an RNN, it is not affected by vanishing or exploding gradients. Ingolfsson et al. proposed EEG-TCNet, a model that combines the TCN with the well-known EEGNet architecture. The attention mechanism is an effort to mimic the behavior of the human brain, selectively focusing on some important elements while ignoring others. Integrating attention with a deep learning model helps it focus automatically (through learning) on the most important parts of the input data. In 2017, researchers proposed a purely attention-based model with multi-head attention, composed of multiple self-attention layers. The self-attention mechanism helps the model focus on the most informative channel information in the data, and multiple heads attend to multiple positions, producing multiple attention representations and thereby strengthening the network's feature learning. In contrast to convolutional neural networks, the self-attention mechanism can directly relate information at different positions of a sequence and compute a comprehensive representation of it. The effect of fusing the self-attention mechanism into motor imagery EEG classification is therefore well worth studying.
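The exponential receptive-field growth claimed above follows from the dilation schedule: for stacked dilated causal convolutions with kernel size k and per-layer dilations d_i, the receptive field is 1 + (k - 1) * sum(d_i). A small arithmetic sketch (the kernel size and dilation schedule are illustrative):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Doubling the dilation each layer makes the receptive field grow
# exponentially with depth, while the number of kernel weights grows
# only linearly (one fixed-size kernel per layer).
print(receptive_field(3, [1]))               # prints 3
print(receptive_field(3, [1, 2, 4, 8, 16]))  # prints 63
```

Five layers with geometric dilations thus cover 63 time steps, whereas five undilated layers of the same kernel size would cover only 11.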
By discussing and analyzing the advantages and disadvantages of the prior methods, the invention draws its inspiration and research direction. Based on EEGNet, a new improved network model (EEG-MATCNet) is proposed: a parallel convolution layer replaces the ordinary convolution layer in the EEGNet model to encode motor imagery EEG signals into a high-level time series, a self-attention layer then highlights the most valuable information in the series, and finally a TCN extracts high-level temporal features from the highlighted information for classification. Compared with the EEGNet and HS-CNN models, the proposed classification method more effectively improves the decoding performance of motor imagery EEG signals.
Disclosure of Invention
The invention provides a motor imagery EEG signal classification method based on a self-attention mechanism and parallel convolution, which effectively improves feature extraction and classification performance for motor imagery EEG signals. Aiming at the difficulty of feature extraction caused by the low signal-to-noise ratio of EEG signals, the ordinary convolution layer in the EEGNet model is replaced by a parallel temporal convolution layer for better feature extraction, thereby improving classification accuracy; meanwhile, a self-attention layer uses the self-attention mechanism to focus on the most informative channel information in the EEG data; finally, a TCN module extracts high-level temporal features from the highlighted information for classification. Compared with models such as EEGNet and HS-CNN, the proposed method achieves higher classification accuracy.
In order to achieve the aim of the invention, through research, discussion and repeated practice, the method determines the final scheme as follows:
First, the raw motor imagery EEG dataset is preprocessed and divided into a training set, a validation set, and a test set, which are input into the constructed EEG-MATCNet model for training and testing; the resulting classification results are then evaluated to verify the effectiveness of the method.
The technical scheme of the invention comprises the following specific steps:
step 1, data preprocessing: firstly, carrying out common average reference on the acquired motor imagery electroencephalogram signals, carrying out band-pass filtering processing on the motor imagery electroencephalogram signals by using a band-pass filter, and then carrying out exponential moving average standardization on the filtered signals; dividing the preprocessed electroencephalogram signal data set for training into a training set and a verification set according to the ratio of 4:1 so as to carry out 5-fold cross verification;
step 2, constructing an EEG-MATCNet model: a parallel multi-scale temporal convolution layer replaces the ordinary convolution layer in the EEGNet model to obtain mixed-time-scale EEG features; a multi-head self-attention module (MSA) computes attention weights for the EEG signals of different electrodes to strengthen spatial feature extraction; and finally a TCN module extracts high-level temporal features from the highlighted information for classification;
step 3, inputting the training set and the verification set in the step 1 into an EEG-MATCNet model for training;
and 4, inputting the test set in the step 1 into the trained model in the step 3 for classification, and evaluating the classification accuracy.
The invention has the following advantages:
1. compared with a network of a single-scale convolution layer, the network of the parallel multi-scale time convolution layer can be used for extracting local and long-range time characteristics of the electroencephalogram signals. Better precision and efficiency can be obtained, so that the accuracy of the motor imagery classification task is further improved.
2. The multi-head self-attention module is introduced, so that the most effective channel information in the electroencephalogram data can be focused more during network training, and multiple heads can be helpful for focusing on multiple positions, so that multiple attention representations can be generated. Meanwhile, the multi-head self-attention mechanism can calculate attention weights in parallel, and the loss of model performance is small.
3. The TCN module is introduced, the advanced time features are extracted from the time sequence, the receptive field can be continuously enlarged by stacking the TCN layers, and the problems of gradient disappearance and gradient explosion can be effectively avoided through the residual structure.
Drawings
FIG. 1 General flow chart of the invention
FIG. 2 EEG-MATCNet network structure
FIG. 3 Schematic diagram of the temporal convolutional network
Detailed Description
Aiming at the problems of difficult feature extraction and classification caused by the low signal-to-noise ratio of EEG signals, the invention provides a motor imagery EEG classification method based on a self-attention mechanism and parallel convolution. A mixed-scale temporal convolution layer replaces the ordinary convolution layer in the EEGNet model to extract time-domain features better and improve classification accuracy. Meanwhile, a multi-head self-attention module (MSA) and a temporal convolutional network (TCN) are added, so that the network attends to global sequence features during training and the receptive field is enlarged, further improving model performance and providing an efficient, better-performing deep learning method for motor imagery EEG classification. Fig. 1 shows the general flow chart of the invention, which can be broken down into the following steps.
Step one, data preprocessing and data set division.
And secondly, constructing an EEG-MATCNet model.
And thirdly, training the model by using the training set and the verification set.
And step four, testing the model effect and evaluating the classification accuracy.
Specific details of each step are set forth below:
step 1:
(1) Carrying out common average reference on the original brain electrical signals;
(2) Extracting 4-40Hz electroencephalogram signals through a 3-order Butterworth band-pass filter;
(3) Carrying out exponential sliding average standardization on the filtered electroencephalogram signals, wherein an attenuation factor is set to be 0.999;
(4) Dividing the training set into a training set and a verification set according to the ratio of 4:1 for later 5-fold cross verification;
(5) The electroencephalogram signals are selected in a segmented mode, each segment represents a complete motor imagery electroencephalogram task, and the length of each intercepted electroencephalogram signal is 4s;
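Steps (1)-(3) above can be sketched as follows. The band-pass filter and decay factor match the text; the exact exponential-moving-average standardization update is not given in the patent, so the recursive mean/variance formulation below is one common variant and should be read as an assumption.

```python
import numpy as np
from scipy.signal import butter, lfilter

def preprocess(eeg, fs=250, decay=0.999):
    """eeg: (channels, samples). CAR -> 4-40 Hz Butterworth -> EMA standardization."""
    # (1) common average reference: subtract the mean across electrodes
    eeg = eeg - eeg.mean(axis=0, keepdims=True)
    # (2) 3rd-order Butterworth band-pass, 4-40 Hz
    b, a = butter(3, [4, 40], btype="bandpass", fs=fs)
    eeg = lfilter(b, a, eeg, axis=1)
    # (3) exponential moving average standardization, decay factor 0.999
    out = np.empty_like(eeg)
    mean = eeg[:, 0].copy()
    var = np.ones(eeg.shape[0])
    for t in range(eeg.shape[1]):
        mean = decay * mean + (1 - decay) * eeg[:, t]
        var = decay * var + (1 - decay) * (eeg[:, t] - mean) ** 2
        out[:, t] = (eeg[:, t] - mean) / np.sqrt(var + 1e-8)
    return out

x = np.random.randn(22, 4 * 250)   # one 4 s trial, 22 electrodes (step (5))
z = preprocess(x)
```

The 4:1 training/validation split of step (4) is then a simple index partition over trials, repeated across the 5 folds.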
step 2:
Aiming at the problems of difficult feature extraction and classification caused by the nonlinearity, non-stationarity, and low signal-to-noise ratio of EEG signals, the invention proposes a new improved model based on EEGNet, EEG-MATCNet for short. The network structure of the method is shown in fig. 2 and is mainly composed of four parts: the parallel convolution layer, the self-attention layer, the temporal convolutional network layer, and the fully connected layer. The model is built using PyTorch. Each part is described in detail below:
(1) Parallel convolution layer
Preliminary temporal feature extraction is performed on the input EEG signal using convolution kernels of different scales in 3 branches. The optimal parallel structure obtained through repeated experiments is as follows: branch 1 uses 2 convolution kernels of size (1, 16) with stride 1; branch 2 uses 4 convolution kernels of size (1, 32) with stride 1; branch 3 uses 8 convolution kernels of size (1, 64) with stride 1; and the padding mode of all 3 branches is set to 'same'. A batch normalization layer then prevents gradient vanishing during network training while reducing overfitting, and finally an ELU activation function helps the network converge faster.
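The three-branch structure above can be sketched in PyTorch with the stated kernel sizes, filter counts, and 'same' padding; the concatenation order and the placement of batch normalization after the concatenation are assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class ParallelConv(nn.Module):
    """Three parallel temporal-convolution branches with kernels (1,16), (1,32), (1,64)."""
    def __init__(self):
        super().__init__()
        self.b1 = nn.Conv2d(1, 2, (1, 16), stride=1, padding="same", bias=False)
        self.b2 = nn.Conv2d(1, 4, (1, 32), stride=1, padding="same", bias=False)
        self.b3 = nn.Conv2d(1, 8, (1, 64), stride=1, padding="same", bias=False)
        self.bn = nn.BatchNorm2d(2 + 4 + 8)   # over the concatenated feature maps
        self.act = nn.ELU()

    def forward(self, x):                     # x: (batch, 1, C electrodes, T samples)
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
        return self.act(self.bn(y))

x = torch.randn(2, 1, 22, 1000)               # 22 electrodes, 4 s at 250 Hz
y = ParallelConv()(x)                         # 'same' padding keeps (C, T) unchanged
```

Because all branches use stride 1 and 'same' padding, their outputs align exactly along the time axis and can be concatenated on the channel dimension.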
(2) Self-attention layer
First, the EEG signal of each electrode after preliminary temporal feature extraction is converted through a linear layer into three vectors: a query vector (Q), a key vector (K), and a value vector (V). Then, using scaled dot-product attention, the dot product of each electrode's query vector with the key vectors of all electrodes is computed, scaled by the square root of the key-vector dimension d_k, and normalized with a softmax to obtain the attention weights. Computing this for every query vector in the sequence yields an output of the same length as the input sequence, with dimension equal to that of the weight matrix. The output vector of the above process is calculated as follows:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
After the feature vectors strengthened along the spatial axis are obtained, a depthwise convolution with kernel size (C, 1) and stride 1, where C is the number of electrodes, flattens the C-dimensional EEG features to 1 dimension. The resulting features are then downsampled by an average pooling layer with kernel size (1, 4) and stride (1, 4). To prevent overfitting, a Dropout mechanism randomly discards parameters learned in the preceding layer, with the dropout rate set to 0.5. The features are then input into a separable convolution layer, which comprises a depthwise convolution and a pointwise convolution: the depthwise convolution has kernel size (1, 16), stride 1, and 'same' padding; the pointwise convolution has kernel size (1, 1), stride 1, and 'same' padding. These are followed in turn by a batch normalization layer, an activation function layer, an average pooling layer, and a dropout layer. The average pooling kernel size and stride are (1, 8) to reduce the number of parameters, and the dropout rate is again set to 0.5 to prevent overfitting.
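The scaled dot-product attention across electrodes can be sketched with PyTorch's built-in multi-head attention, which implements exactly the formula above per head; the batch size, electrode count, feature dimension, and head count below are illustrative assumptions rather than the patent's settings.

```python
import torch
import torch.nn as nn

class ElectrodeSelfAttention(nn.Module):
    """Multi-head scaled dot-product self-attention across the electrode axis:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with Q = K = V = input."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        # The internal in_proj of MultiheadAttention plays the role of the
        # linear layer that produces Q, K, V from each electrode's features.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                  # x: (batch, C electrodes, d_model)
        out, weights = self.attn(x, x, x)  # self-attention: Q = K = V = x
        return out, weights                # weights: (batch, C, C), head-averaged

x = torch.randn(8, 22, 64)                 # 22 electrodes, 64-dim features each
out, w = ElectrodeSelfAttention(64)(x)
```

Row i of the weight matrix is electrode i's softmax-normalized attention over all electrodes, so each row sums to 1.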
(3) Time convolutional network layer
To expand the receptive field of the convolution kernel, a 2-layer temporal convolution module is introduced, and residual connections are used to avoid the gradient vanishing that network training might otherwise cause. The causal dilated convolution of the layer-1 temporal convolution module has kernel size 4 and dilation factor 1, so that each output point aggregates the feature information of four consecutive input points. The causal dilated convolution of the layer-2 temporal convolution module has kernel size 4 and dilation factor 2, which further expands the receptive field. The specific structure of the temporal convolutional network layer is shown in fig. 3 of the specification.
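The two residual causal blocks can be sketched as follows, with kernel size 4 and dilations 1 and 2 as stated; the channel count, activation, and the simple additive residual (without the normalization/dropout a full TCN block often carries) are assumptions.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Causal dilated 1-D convolution with a residual connection."""
    def __init__(self, ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(ch, ch, kernel_size, dilation=dilation)
        self.act = nn.ELU()

    def forward(self, x):
        # Left-only padding makes the convolution causal: output t sees inputs <= t.
        y = self.conv(nn.functional.pad(x, (self.pad, 0)))
        return self.act(y + x)             # residual connection

tcn = nn.Sequential(
    TCNBlock(16, kernel_size=4, dilation=1),  # each output sees 4 consecutive points
    TCNBlock(16, kernel_size=4, dilation=2),  # receptive field grows to 10 points
)
x = torch.randn(2, 16, 125)
y = tcn(x)                                    # sequence length is preserved
```

The residual path also gives gradients a direct route past each convolution, which is what counters vanishing gradients as blocks are stacked.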
(4) Full connection layer
The obtained high-level temporal features are flattened and input into the fully connected layer. A max-norm constraint is applied to the fully connected layer for regularization, with the maximum norm set to 0.25, to prevent overfitting and improve the generalization ability of the model.
Finally, the output is fed into a Softmax classifier to obtain the final predicted motor imagery task class.
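The max-norm constraint of 0.25 matches the text; the feature and class counts below are illustrative, and applying the renormalization after each optimizer step is a common convention assumed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(in_features=32, out_features=4)    # e.g. 4 motor imagery classes

def apply_max_norm(layer, max_val=0.25):
    """Rescale any weight row whose L2 norm exceeds max_val (max-norm regularization)."""
    with torch.no_grad():
        norm = layer.weight.norm(dim=1, keepdim=True)
        layer.weight.mul_((max_val / norm).clamp(max=1.0))

apply_max_norm(fc)                                # typically called after each optimizer step
probs = F.softmax(fc(torch.randn(1, 32)), dim=1)  # Softmax yields class probabilities
```

Unlike weight decay, the max-norm constraint leaves small weights untouched and only clips rows that grow beyond the bound.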
Step 3:
In the training stage, a 5-fold cross-validation method is used: the training set is divided evenly into 5 parts, and 5 experiments are carried out, each time taking 4 different parts as the training set and the remaining part as the validation set. The training set is input into the EEG-MATCNet model in batches of 64 EEG trials, for up to 1000 epochs, using the cross-entropy loss function. The best loss value is recorded during training; if the loss does not fall below the best value for 300 consecutive epochs, training stops early. The average accuracy on the training and validation sets is recorded, and the weights of the best model are saved. An Adam optimizer is used during network training to alleviate gradient oscillation, with the learning rate set to 0.001. The above training and testing are carried out separately for each of the 9 subjects to obtain 9 validation-set accuracies, whose mean is recorded as the final model accuracy.
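The training protocol above (5-fold split, Adam at lr 0.001, cross-entropy, patience-based early stopping) can be sketched with a toy stand-in model and random stand-in data; the epoch count and patience are scaled down here for brevity, and full-batch updates replace the text's batches of 64.

```python
import numpy as np
import torch
import torch.nn as nn

X = torch.randn(200, 16)            # stand-in features: 200 trials
y = torch.randint(0, 4, (200,))     # 4 motor imagery classes

folds = np.array_split(np.arange(len(X)), 5)   # 5-fold cross-validation
accs = []
for k in range(5):
    val = torch.as_tensor(folds[k])
    tr = torch.as_tensor(np.concatenate([folds[j] for j in range(5) if j != k]))
    model = nn.Linear(16, 4)        # stand-in for EEG-MATCNet
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr = 0.001 as in the text
    loss_fn = nn.CrossEntropyLoss()
    best, patience = float("inf"), 0
    for epoch in range(50):         # the text trains up to 1000 epochs
        opt.zero_grad()
        loss = loss_fn(model(X[tr]), y[tr])
        loss.backward()
        opt.step()
        if loss.item() < best:
            best, patience = loss.item(), 0
        else:
            patience += 1
            if patience >= 10:      # early stopping (300-epoch patience in the text)
                break
    with torch.no_grad():
        pred = model(X[val]).argmax(dim=1)
        accs.append((pred == y[val]).float().mean().item())
mean_acc = float(np.mean(accs))     # reported per subject, then averaged over subjects
```

On random labels the mean accuracy hovers near chance (0.25 for 4 classes); the structure, not the number, is the point of the sketch.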
Step 4:
inputting the test set in the step 1 into the trained model in the step 3 for classification recognition, and evaluating the classification accuracy.
The data set and experimental results used in the method of the invention are described as follows:
1. data set
The invention was evaluated on two published datasets, BCI Competition IV Datasets 2a and 2b, whose trial timing scheme is shown in fig. 3. All experimental data had been pre-processed with a 0.5-100 Hz band-pass filter.
The 2a dataset contains 4 classes of motor imagery EEG tasks (left hand, right hand, both feet, and tongue) provided by 9 healthy subjects. Each subject completed 2 sessions, each including 288 motor imagery trials (72 per class), with session 1 used as the training set and session 2 as the test set. The EEG signals were collected from 22 electrodes at a sampling rate of 250 Hz, and the data from 0.5 s to 2.5 s after cue presentation were extracted as one sample. All samples are labeled (i.e., each sample is marked with the body part whose movement was imagined).
The 2b dataset contains 2 classes of motor imagery EEG signals (left hand and right hand) from 9 subjects. These signals were acquired from 3 electrodes, again at a sampling rate of 250 Hz. For each subject, the motor imagery task was divided into 5 sessions. Unlike the 2a dataset, the first 2 sessions of the 2b dataset were recorded without feedback, i.e., motor imagery EEG data without visual feedback, while the last 3 sessions contain visual feedback. For the 2b dataset, the data from 0.5 s to 4 s after cue presentation were extracted as one sample.
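The epoching described for both datasets (a fixed window after each cue, at 250 Hz) can be sketched as follows; the continuous signal and the cue sample indices are made-up placeholders, since the actual event markers come from the dataset files.

```python
import numpy as np

fs = 250  # sampling rate of both BCI Competition IV datasets

def extract_epochs(run, cue_onsets, t_start, t_end):
    """Slice a (channels, samples) recording into trials spanning
    [cue + t_start, cue + t_end) seconds after each cue onset."""
    a, b = int(t_start * fs), int(t_end * fs)
    return np.stack([run[:, c + a : c + b] for c in cue_onsets])

run = np.random.randn(22, 60 * fs)               # one minute of 22-channel EEG (toy)
cues = [2 * fs, 10 * fs, 20 * fs]                # hypothetical cue sample indices

trials_2a = extract_epochs(run, cues, 0.5, 2.5)  # 2a window: 0.5-2.5 s -> 500 samples
```

For dataset 2b the same call with `t_start=0.5, t_end=4.0` yields 875-sample trials from the 3-channel recordings.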
Because EEG characteristics differ considerably between subjects, the classification experiments of the invention compute the classification accuracy separately for each subject and take the mean accuracy over all subjects as the performance index of the model.
2. Experimental results and discussion
In order to verify the effectiveness and versatility of the method of the present invention, a comparison experiment and an ablation experiment were performed on the public data sets 2a and 2b, respectively, with the following experimental results:
Cross-dataset comparison experiments: the new method proposed by the invention was compared with EEGNet on the 2a and 2b datasets respectively, with experimental results shown in the following table:
table 1 results of cross dataset comparative experiments
[Table 1 appears as an image in the original publication.]
The experimental results show that the average classification accuracy of EEG-MATCNet over the 9 subjects is superior to that of EEGNet on both the 2a and 2b datasets; the accuracy improves by up to 6.45% on the 2a dataset and by up to 8% on the 2b dataset.
To verify the effect of each module, ablation experiments were performed on the 2a dataset by removing the self-attention mechanism, the parallel convolution, and the temporal convolution network from the proposed method in turn. Experimental data are shown in the following table:
Table 2 Ablation experiment results on the 2a dataset
[Table 2 appears as an image in the original publication.]
From Table 2 it can be seen that with the self-attention mechanism, parallel convolution, or temporal convolution network removed, the classification accuracy of the model remains higher than EEGNet but lower than that of the full method, demonstrating that all 3 schemes are effective.

Claims (5)

1. A motor imagery electroencephalogram signal classification method based on a self-attention mechanism and parallel convolution is characterized by comprising the following steps:
step 1, data preprocessing: performing band-pass filtering processing on the motor imagery electroencephalogram signals by using a band-pass filter, and then performing exponential moving average standardization on the filtered signals; dividing an electroencephalogram signal data set into a training set, a verification set and a test set;
step 2, constructing an EEG-MATCNet model: a parallel convolution layer replaces the ordinary convolution layer in the EEGNet model to extract multi-scale temporal features, a spatial self-attention mechanism is added so that the network better extracts spatial features, and a temporal convolutional network is added to extract high-level temporal features;
step 3, inputting the training set and the validation set in step 1 into EEG-MATCNet for training;
and 4, inputting the test set in the step 1 into the trained model in the step 3 for classification, and evaluating classification accuracy.
2. The motor imagery electroencephalogram classification method based on a self-attention mechanism and parallel convolution according to claim 1, wherein:
the step 1 specifically comprises the following steps:
(1) Carrying out common average reference on the original brain electrical signals;
(2) Extracting 4-40Hz electroencephalogram signals through a 3-order Butterworth band-pass filter;
(3) Performing exponential sliding average standardization on the filtered electroencephalogram signals, wherein an attenuation factor is set to be 0.999 so as to reduce the influence of numerical value difference on model effect;
(4) Dividing the training set into a training set and a verification set according to the ratio of 4:1 for later 5-fold cross verification;
(5) And selecting the electroencephalogram signals in a segmented way, wherein each segment represents a complete motor imagery electroencephalogram task, and the length of each intercepted electroencephalogram signal is 4s.
3. The motor imagery electroencephalogram classification method based on a self-attention mechanism and parallel convolution according to claim 1, wherein:
the step 2 is specifically as follows:
the specific structure of EEG-MATCNet is mainly summarized in four parts: parallel convolution layer, self-attention layer, time convolution network layer, full connection layer; the model is built here using Pytorch; each of which is described in detail below:
(1) Parallel convolution layer
Preliminary temporal feature extraction is performed on the input EEG signal using convolution kernels of different scales in 3 branches; the optimal parallel structure obtained through repeated experiments is as follows: branch 1 uses 2 convolution kernels of size (1, 16) with stride 1; branch 2 uses 4 convolution kernels of size (1, 32) with stride 1; branch 3 uses 8 convolution kernels of size (1, 64) with stride 1; and the padding mode of all 3 branches is set to 'same'; a batch normalization layer then prevents gradient vanishing during network training, and finally an ELU activation function helps the network converge faster;
(2) Self-attention layer
Firstly, the electroencephalogram signal of each electrode, after preliminary temporal feature extraction, is transformed by a linear layer into three vectors: a query vector (Q), a key vector (K), and a value vector (V); using scaled dot-product attention, the dot product of each electrode's query vector with the key vectors of all electrodes is computed and scaled by the square root of the key dimension, and normalization yields the attention weights; each query vector in the sequence thus produces an output vector with the same length as the input sequence and the same dimension as the value vectors; the output vector of the above process is computed as:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V, where d_k is the dimension of the key vectors;
after the feature vectors strengthened along the spatial axis are obtained, a depthwise convolution with kernel size (C, 1) and stride 1, where C is the number of electrodes, flattens the C-dimensional electroencephalogram features into 1 dimension; an average pooling layer with kernel size (1, 4) and stride (1, 4) then reduces the sampling rate, and a Dropout mechanism with a drop rate of 0.5 randomly discards parameters learned in the preceding layers to prevent overfitting; the features are then fed into a separable convolution layer comprising a depthwise convolution and a pointwise convolution: the depthwise convolution has kernel size (1, 16), stride 1, and "same" padding, and the pointwise convolution has kernel size (1, 1), stride 1, and "same" padding; this is followed in sequence by a batch normalization layer, an ELU activation layer, an average pooling layer, and a dropout layer; the average pooling layer has kernel size and stride (1, 8) to reduce the number of parameters, and the drop rate is set to 0.5 to prevent overfitting;
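The scaled dot-product attention across electrodes can be sketched as follows. This standalone NumPy function (the name is illustrative; the patent implements it inside a PyTorch model) computes the formula above, with rows indexing the C electrodes:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Rows index electrodes: each electrode's query is matched against
    every electrode's key, and the resulting weights mix the value
    vectors along the spatial axis.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (C, C) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (C, d_v) attended features
```

When all scores are equal the softmax is uniform, so each output row is simply the mean of the value vectors, which is a convenient sanity check.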
(3) Time convolutional network layer
To enlarge the receptive field of the convolution kernel, a 2-layer temporal convolution module is introduced, with residual connections used to avoid the vanishing-gradient problem that may arise during network training; the causal dilated convolution of the 1st temporal convolution module has kernel size 4 and dilation factor 1, so that each output point contains feature information from the preceding four points; the causal dilated convolution of the 2nd temporal convolution module has kernel size 4 and dilation factor 2, further enlarging the receptive field of the convolution kernel;
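The causal dilated convolution used in these modules can be sketched as follows. This NumPy function (name and left-padding scheme are illustrative) shows the key property: output sample t depends only on inputs at t, t-d, t-2d, ..., so no future information leaks in, and stacking dilations 1 and 2 widens the receptive field as the claim describes:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """Causal dilated 1-D convolution: y[t] = sum_j w[j] * x[t - j*d].

    Left-padding with (k-1)*d zeros keeps the output the same length as
    the input while preventing any dependence on future samples.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```

Feeding a unit impulse through the convolution recovers the kernel spread out by the dilation factor, which makes the receptive-field claim easy to verify.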
(4) Full connection layer
The obtained high-level temporal features are concatenated and fed into a fully connected layer; a max-norm constraint with a maximum norm of 0.25 is applied to the fully connected layer as regularization to prevent overfitting; finally, the output is passed to a Softmax classifier to obtain the final motor imagery task category.
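The max-norm constraint on the fully connected layer can be sketched as follows. This NumPy function (name illustrative; in PyTorch the same effect is achieved by renorming the weight after each update) rescales each output unit's weight vector so its L2 norm never exceeds the stated bound of 0.25:

```python
import numpy as np

def apply_max_norm(weight, max_norm=0.25):
    """Max-norm regularization of a fully connected layer's weights.

    Each row (one output unit's weight vector) is rescaled so its L2
    norm does not exceed max_norm; rows already within the bound are
    left unchanged.
    """
    norms = np.linalg.norm(weight, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return weight * scale
```

Applying this after every optimizer step caps the layer's capacity, which is the overfitting control the claim describes.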
4. The motor imagery electroencephalogram classification method based on a self-attention mechanism and parallel convolution according to claim 1, wherein:
the step 3 is specifically as follows:
in the training stage, 5-fold cross-validation is used: the training set is divided evenly into 5 parts and 5 experiments are carried out, each time taking 4 different parts as the training set and the remaining part as the validation set; the training set is fed into the EEG-MATCNet model in batches of 64 electroencephalogram segments for up to 1000 epochs; a cross-entropy loss function is used and the best loss value is recorded during training; if the loss does not fall below the best recorded value for 300 consecutive epochs, training stops early; the average accuracy on the training and validation sets is recorded and the weights of the best model are saved; an Adam optimizer with a learning rate of 0.001 is used to mitigate gradient oscillation during network training; the above model training and testing are carried out separately for each of the 9 subjects, yielding 9 validation-set accuracies whose mean is recorded as the final model accuracy.
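The 5-fold split can be sketched as follows. This NumPy generator (name and the single up-front shuffle are illustrative assumptions; the claim only specifies an even 5-way division) yields, for each fold, the 4-part training indices and the held-out validation indices:

```python
import numpy as np

def five_fold_indices(n_trials, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Trial indices are shuffled once, split evenly into n_folds parts;
    each fold holds one part out for validation and trains on the rest.
    """
    idx = np.random.default_rng(seed).permutation(n_trials)
    folds = np.array_split(idx, n_folds)
    for i in range(n_folds):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, val
```

Each trial appears in exactly one validation fold, so averaging the 5 validation accuracies uses every training trial exactly once for evaluation.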
5. The motor imagery electroencephalogram classification method based on a self-attention mechanism and parallel convolution according to claim 1, wherein:
the step 4 is specifically as follows:
inputting the test set from step 1 into the model trained in step 3 for classification and recognition, and evaluating the classification accuracy.
CN202310212631.9A 2023-03-07 2023-03-07 Motor imagery electroencephalogram signal classification method based on self-attention mechanism and parallel convolution Pending CN116304815A (en)

Publications (1)

Publication Number Publication Date
CN116304815A true CN116304815A (en) 2023-06-23


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116807479A * 2023-08-28 2023-09-29 成都信息工程大学 Driving attention detection method based on multi-mode deep neural network
CN116807479B * 2023-08-28 2023-11-10 成都信息工程大学 Driving attention detection method based on multi-mode deep neural network
CN117313532A * 2023-09-27 2023-12-29 盐城工学院 Lithium ion battery SOC estimation method based on neural network
CN117520903A * 2023-12-05 2024-02-06 上海韶脑传感技术有限公司 Semi-brain electroencephalogram motor imagery classification algorithm based on Transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination