CN117195099A - Electroencephalogram signal emotion recognition algorithm integrating multi-scale features - Google Patents

Electroencephalogram signal emotion recognition algorithm integrating multi-scale features

Info

Publication number
CN117195099A
Authority
CN
China
Prior art keywords
convolution
dimensional feature
feature matrix
size
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311156664.2A
Other languages
Chinese (zh)
Inventor
杜秀丽
孟一飞
邱少明
吕亚娜
刘庆利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN202311156664.2A
Publication of CN117195099A
Legal status: Pending


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an electroencephalogram signal emotion recognition algorithm integrating multi-scale features, which effectively improves electroencephalogram-based emotion recognition performance by introducing deformable convolution, an attention mechanism and a bottom-up feature pyramid network. The method comprises the following steps: converting an original electroencephalogram signal into a three-dimensional feature matrix, inputting the three-dimensional feature matrix into a multi-scale deformable convolution interaction attention network model based on a residual network for training, and finally predicting the emotion type of the electroencephalogram signal through a classifier. The network model adopts deformable convolution as the core component for feature extraction and uses an efficient channel attention mechanism to capture important information among channels and optimize feature extraction; a bottom-up feature pyramid network is applied at the top of the network model to fuse multi-scale features; and a bidirectional gated recurrent unit is introduced into the network model to extract contextual semantic information in both the forward and backward directions.

Description

Electroencephalogram signal emotion recognition algorithm integrating multi-scale features
Technical Field
The invention belongs to the field of deep learning, relates to emotion recognition technology, and particularly relates to an electroencephalogram emotion recognition algorithm integrating multi-scale features.
Background
Emotion is generated from human mental activities and has profound effects on human mental states, cognition and decisions. Emotion recognition has wide application and potential in the field of man-machine interaction, because effective emotion recognition can improve the intelligent level of a computer. In emotion recognition, facial expression and voice signals are easily influenced by subjective control of an individual, and brain electrical signals are not influenced by subjectivity of the individual, so that the actual emotion state of the individual can be reflected more objectively. The realization of reliable recognition is challenging due to the characteristics of electroencephalogram (EEG) signals, such as weak amplitude, complex background noise, randomness, significant individual variability, and the abundance of temporal and spatial information involved in the formation process.
A Residual Network (ResNet) is a deep convolutional neural network structure proposed by He et al. in 2015, aiming to solve the problems of vanishing and exploding gradients in deep networks. Its core idea is to introduce residual connections: by transmitting information directly across layers, the problem of information loss is effectively reduced, so that deeper networks can be trained. The advent of ResNet has greatly driven the development of deep neural networks, enabling models to achieve better performance on more complex tasks.
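As an illustrative sketch (not taken from the patent), a basic residual block with an identity shortcut can be written in PyTorch as follows; the channel count and layer names are placeholders:

    import torch
    import torch.nn as nn

    class BasicResidualBlock(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            identity = x                          # cross-layer shortcut
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + identity)      # residual connection

Because the shortcut adds the input back to the convolutional output, gradients can flow directly through the addition, which is what makes very deep networks trainable.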
A feature pyramid network (Feature Pyramid Network, FPN) is a network structure for multi-scale feature fusion proposed by Lin et al. in 2017. By constructing a pyramidal feature hierarchy, features from different scales are fused, achieving better feature expression in image analysis tasks. The FPN has achieved remarkable results in fields such as target detection and image segmentation, improving the adaptability of models to different scales and object sizes.
Although these two types of network models perform well on electroencephalogram emotion recognition, they have several shortcomings. First, information is easily lost in the process of mapping the low-dimensional feature matrix to the high-dimensional feature matrix. Second, the time-frequency-domain and spatial-domain characteristics of the signals are not sufficiently extracted, and the shape of the traditional convolution kernel can only be square or rectangular and cannot change dynamically according to the recognition target. Finally, these studies often treat the different electrode channels as independent individuals, so the correlation between electrode channels is insufficiently exploited. These deficiencies lead to the poor performance of electroencephalogram emotion recognition algorithms.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an electroencephalogram signal emotion recognition algorithm fused with multi-scale features, and the emotion recognition performance based on electroencephalogram signals is effectively improved by introducing a deformable convolution, an attention mechanism and a bottom-up feature pyramid network.
To solve the above technical problems, the invention adopts the following technical solution:
an electroencephalogram signal emotion recognition algorithm integrating multi-scale features comprises the following steps: converting an original electroencephalogram signal into a three-dimensional feature matrix, inputting the three-dimensional feature matrix into a multi-scale deformable convolution interaction attention network model based on a residual error network for training, and finally predicting the emotion type of the electroencephalogram signal through a classifier; the network model adopts deformable convolution as a core component for feature extraction, and captures important information among channels by using an efficient channel attention mechanism to optimize feature extraction, a bottom-up feature pyramid network is applied to the top of the network model to fuse multi-scale features, a bidirectional gating circulation unit is introduced into the network model, and context semantic information is extracted from the forward direction and the reverse direction.
Further, the step of converting the original electroencephalogram signal into the three-dimensional feature matrix includes:
dividing the last 60 s of the electroencephalogram signal with a sliding window, using the first 3 s of the electroencephalogram signal as a baseline; the step length of the sliding window is set to 1 s, there is no overlap between the divided electroencephalogram signal segments, and each segment is assigned the same label as the original electroencephalogram signal; then, dividing each segment into a plurality of frequency bands with a Butterworth band-pass filter, and extracting the differential entropy features of each electroencephalogram signal channel from the different frequency bands respectively; then converting all electroencephalogram signal channels into a 2D matrix, placing the differential entropy features extracted from the different electroencephalogram signal channels into the corresponding positions of the 2D matrix according to their relative position coordinates, setting the unused positions in the 2D matrix to 0, and constructing the three-dimensional feature matrix; and finally inputting the constructed three-dimensional feature matrix into the multi-scale deformable convolution interaction attention network model based on the residual network.
Furthermore, the multi-scale deformable convolution interaction attention network model based on the residual network has a four-layer residual structure; an efficient channel attention mechanism is introduced into each of the four residual layers, traditional convolution is used in the first three residual layers, and the traditional convolution in the fourth residual layer is replaced by deformable convolution. The network model first takes the three-dimensional feature matrix as input, and after a convolution operation it passes through the first, second, third and fourth residual layers in sequence; the three-dimensional feature matrix output by each residual layer undergoes the convolution operation of a bottom-up feature pyramid network so that the dimensions of the three-dimensional feature matrices are consistent; finally, the electroencephalogram emotion type result is obtained through a bidirectional gated recurrent unit, a fully connected layer and a softmax layer.
Further, the network model input is a 9×9×4 three-dimensional feature matrix, which becomes 9×9×64 after a convolution operation with 64 kernels of size 7×7; the size remains 9×9×64 through the first residual layer, becomes 5×5×128 after the second residual layer, 3×3×256 after the third residual layer, and 2×2×512 after the fourth residual layer. The three-dimensional feature matrix output by each residual layer then undergoes a convolution operation with 256 kernels of size 1×1 in the bottom-up feature pyramid network so that the dimensions of the three-dimensional feature matrices are consistent.
Further, the convolution operation with 256 kernels of size 1×1 applied in the bottom-up feature pyramid network to the three-dimensional feature matrix output by each residual layer, which makes the dimensions of the three-dimensional feature matrices consistent, is specifically:
the three-dimensional feature matrices obtained after the outputs of the first to fourth residual layers pass through the 1×1 convolution of the bottom-up feature pyramid network are denoted T2, T3, T4 and T5 in order, and downsampling is carried out starting from T2:
m = (n + 2p - f)/s + 1 (1)
where m is the size of the output feature matrix; n is the size of the input feature matrix; p (padding) is the size of the pixel padding; f (filter) is the size of the convolution kernel; and s (stride) is the sliding step length. After T2 is downsampled, tensor addition with T3 yields the three-dimensional feature matrix D3; D3 is processed in the same way and added to T4 to obtain D4, which in turn yields the feature map D5; finally, a 3×3 convolution produces the smaller, information-richer three-dimensional feature matrix P5.
The beneficial effects of the invention include:
An improved multi-scale deformable convolution interaction attention network model based on a residual network (Residual Network based Multi-scale Deformable Convolutional Interacting Attention Network, MDCNAResnet) is used, which combines the ideas of deformable convolution (Deformable Convolution, DCN), the feature pyramid network (Feature Pyramid Network, FPN) and the efficient channel attention mechanism (Efficient Channel Attention, ECANet), and inherits from the residual network (ResNet). In this network model, ECANet is first used to capture important information between channels and is then combined with DCN to obtain richer spatial features; the DCN operation helps capture nonlinear changes in the EEG signal and allows the convolution kernel to sample adaptively according to the spatial distribution of the data. In addition, a bidirectional gated recurrent unit (Bidirectional Gated Recurrent Unit, BiGRU) is introduced into the network model to extract contextual semantic information in both the forward and backward directions, realizing the fusion of multi-scale spatial features and contextual semantic features. On top of the network model, a bottom-up feature pyramid network (BU-FPN) is applied to further fuse multi-scale features; through a bottom-up downsampling process, the network model gradually extracts and fuses features of different scales, improving the model's ability to capture both detail and global information. Finally, the emotion class is predicted by the fully connected layer and the softmax classification layer. Based on this, the scheme has the following characteristics and advantages:
The algorithm uses a multi-scale feature fusion method that can extract information from electroencephalogram signals at different time scales and sampling frequencies, allowing the model to understand the signals more comprehensively and thus improving the accuracy of emotion recognition; emotion recognition generally needs to consider changes ranging from instantaneous emotion to long-term emotion, and multi-scale feature fusion helps capture these changes. The residual network and the deformable convolution help learn high-level features from the original electroencephalogram signals and improve the model's sensitivity to emotion-related information; the residual network also helps alleviate the vanishing gradient problem, making training more stable. The introduction of the efficient channel attention mechanism helps the model identify the most important channel (electrode) information in the electroencephalogram signals, reducing the influence of noise, improving the robustness of emotion recognition, and enabling the model to attend better to emotion-related signals. The feature pyramid network fuses features from different convolutional layers, so the model can process information at multiple levels of abstraction simultaneously, improving its generalization ability and allowing it to adapt better to the electroencephalogram signals of different individuals and emotional states. The bidirectional gated recurrent unit allows the model to extract contextual semantic information in both the forward and backward directions, helping it better understand the time-dependent information in the electroencephalogram signal and improving emotion recognition performance.
In summary, the electroencephalogram signal emotion recognition algorithm fusing multi-scale features improves the accuracy and robustness of emotion recognition and achieves notable gains in emotion recognition tasks.
Drawings
FIG. 1 is a flow chart of an algorithm of the present invention;
FIG. 2 is a flow chart of the construction of the three-dimensional feature matrix of the present invention;
FIG. 3 is a plan view of the International 10/20 System Standard and a 2D matrix diagram thereof;
FIG. 4 is a diagram of the overall structure of the network model of the present invention;
FIG. 5 is a diagram of a four-layer residual error architecture of the network model of the present invention;
FIG. 6 is a flow chart of the bottom-up feature pyramid network convolution operation of the present invention;
FIG. 7 shows the recognition accuracy of different K values in Example 2;
FIG. 8 is an overall performance diagram of the network model of the present invention on the DEAP dataset.
Detailed Description
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without making any inventive effort are within the scope of the invention.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Example 1
An electroencephalogram signal emotion recognition algorithm fusing multi-scale features is provided; the overall algorithm structure is shown in FIG. 1.
First, referring to FIG. 2, a three-dimensional feature matrix is constructed by extracting differential entropy features from the original electroencephalogram (EEG) signal:
in order to generate a 3D feature matrix from the original multi-channel EEG signals, a sliding window with the length of 1s is adopted to segment the back 60s EEG signals; the step length of the sliding window is set to be 1s, and the window is dividedWithout repetition between EEG signal samples, each segment is assigned the same label as the original EEG signal. Then, each segment is divided into the following frequency bands (θ [4 to 8Hz],α[8~14HZ],β[14~31Hz]And gamma [ 31-45 Hz]) Differential entropy (Differential Entropy (DE)) features of the EEG channel are extracted from the different frequency bands, respectively. The differential entropy feature can distinguish low-frequency and high-frequency energy of the electroencephalogram signal, which is shannon information entropy-sigma x Generalized form of p (x) log (p (x)) on continuous variables.
Assume that an original EEG signal is represented as S_n ∈ R^(m×r), where m and r represent the number of electrodes and the sampling rate of the original EEG signal, respectively. For each set of EEG sampled signals, we calculate the DE features of each band with a 1 s sliding window, so each EEG segment is converted into a DE segment D_n ∈ R^(m×d), where d represents the number of frequency bands, set to 4 in the present embodiment.
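As an illustrative sketch (not part of the patent), the per-band DE extraction can be written as follows. The closed form DE = 0.5·log(2πe·σ²) assumes each band-limited signal is approximately Gaussian, a common convention in the EEG literature rather than something the patent states:

    import numpy as np
    from scipy.signal import butter, filtfilt

    BANDS = {"theta": (4, 8), "alpha": (8, 14), "beta": (14, 31), "gamma": (31, 45)}

    def de_features(segment: np.ndarray, fs: int = 128, order: int = 3) -> np.ndarray:
        """segment: (channels, samples) for one 1 s window -> DE array (channels, 4)."""
        feats = []
        for lo, hi in BANDS.values():
            # Butterworth band-pass for the current frequency band
            b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
            filtered = filtfilt(b, a, segment, axis=-1)
            var = np.var(filtered, axis=-1)                     # per-channel band power
            feats.append(0.5 * np.log(2 * np.pi * np.e * var))  # Gaussian-form DE
        return np.stack(feats, axis=-1)                         # (channels, bands)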
Thereafter, to utilize the spatial information of the electrodes, all EEG channels are converted into a 2D grid matrix. The present embodiment uses 32 channels of EEG data: Fp1, AF3, F3, F7, FC5, FC1, C3, T7, CP5, CP1, P3, P7, PO3, O1, Oz, Pz, Fp2, AF4, Fz, F4, F8, FC6, FC2, Cz, C4, T8, CP6, CP2, P4, P8, PO4 and O2, as shown in FIG. 3, which shows a plan view of the international 10/20 system standard and its 2D matrix. The left side of the figure is the international 10/20 system, in which the EEG electrodes marked with solid circles are the test points used in the DEAP dataset and the remaining circles are test points that are not used. In the feature matrix, the frequency-domain features extracted from the different electroencephalogram channels are put into the corresponding positions of the matrix according to their relative position coordinates, and the positions of unused electrodes in the matrix are set to 0. Thus the DE segment D_n is converted into X_n ∈ R^(h×w×d), where h and w are the height and width of the 2D matrix; the present embodiment sets h = 9, w = 9. Finally, the constructed three-dimensional feature matrix is input into the neural network model.
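A sketch of this grid mapping follows; the (row, column) coordinates shown for a few electrodes are illustrative assumptions, since the full layout is given only in FIG. 3:

    import numpy as np

    # Illustrative cells in the 9 x 9 grid for a few electrodes; the
    # remaining channels would follow FIG. 3 and are omitted here.
    GRID_POS = {"Fp1": (0, 3), "Fp2": (0, 5), "AF3": (1, 3), "AF4": (1, 5),
                "F7": (2, 0), "F3": (2, 2), "Fz": (2, 4), "F4": (2, 6), "F8": (2, 8)}

    def to_grid(de: np.ndarray, channels: list, h: int = 9, w: int = 9) -> np.ndarray:
        """de: (n_channels, n_bands) -> X_n of shape (h, w, n_bands); unused cells stay 0."""
        grid = np.zeros((h, w, de.shape[1]))
        for i, name in enumerate(channels):
            if name in GRID_POS:            # skip electrodes without a mapped cell
                grid[GRID_POS[name]] = de[i]
        return grid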
Based on the above embodiment, a further detailed description is given. In the DEAP dataset, the raw EEG signal is expressed as 32 (subjects) × 40 (trials) × 40 (channels) × 8064 (samples), where 8064 = 128 (Hz) × 63 (s), and the labels are expressed as 40 (trials) × 4. The raw data were preprocessed to extract the required 32 EEG channels from the 40 channels; because of the delayed response in human vision, this example uses the first 3 s as a baseline and the following 60 s of EEG signal as experimental data, so the preprocessed data are expressed as 32 (subjects) × 40 (trials) × 32 (channels) × 7680 (samples). Two label dimensions, valence and arousal, are selected, i.e., 40 (trials) × 2.
In this embodiment, the EEG sequence is segmented without overlap using 1 s windows: each trial yields 60 segments, each segment contains 128 sampling points, and each sampling point covers 32 channels, so each subject's EEG data can be expressed as 40 × 128 × 60 × 32. Dimension transformation then gives 2400 × 32 × 128 electroencephalogram data, i.e., 2400 EEG segments per subject, each of size 32 × 128. The labels are transformed to the same dimension and can be expressed as 2400 × 1.
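The segmentation just described amounts to a pair of array reshapes; the following sketch assumes one subject's data has already been trimmed to the last 60 s:

    import numpy as np

    data = np.zeros((40, 32, 7680))             # one subject: trials x channels x samples
    labels = np.zeros((40, 1))                  # one label dimension per trial

    segments = data.reshape(40, 32, 60, 128)    # cut each trial into 60 windows of 128
    segments = segments.transpose(0, 2, 1, 3)   # -> (40, 60, 32, 128)
    segments = segments.reshape(2400, 32, 128)  # 2400 segments per subject
    seg_labels = np.repeat(labels, 60, axis=0)  # -> (2400, 1), label shared within a trial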
Differential entropy features are extracted from the four frequency bands of the original features, and the data of the 32 channels are converted into a two-dimensional grid structure according to the above method, giving a 128 × 2400 × 9 × 9 data representation; splicing the 4 band features yields a three-dimensional feature matrix with dimensions 307200 × 9 × 9 × 4, i.e., the number of samples input to the deep model is 307200, with corresponding labels of 307200 × 1.
Second, the MDCNAResnet-BiGRU network framework is constructed:
the MDCNARENet-BiGRU network framework is provided with four layers of residual structures, an efficient channel attention mechanism is introduced into the four layers of residual structures, traditional convolution is used in the first three layers of residual structures, and the traditional convolution in the fourth layer of residual structures is replaced by deformable convolution; firstly, inputting a three-dimensional feature matrix by a network model, and after convolution operation, sequentially passing through a first layer residual error structure, a second layer residual error structure, a third layer residual error structure and a fourth layer residual error structure; the three-dimensional feature matrix output by each layer of residual structure is subjected to convolution operation of a bottom-up feature pyramid network to enable the dimensions of each three-dimensional feature matrix to be consistent, and finally a final electroencephalogram emotion type result is obtained through a two-way gating circulating unit, a full-connection layer and a softmax layer.
First, deformable convolution is employed as the core component of feature extraction. Compared with conventional convolution, deformable convolution can flexibly adapt to feature patterns within different receptive fields, thereby better capturing subtle changes in the EEG signal. This capability is particularly critical for emotion recognition tasks, because the expression of emotional states may vary over different frequency ranges and time scales. By introducing deformable convolution, the model can better capture these complex patterns and variations, improving the accuracy of emotion recognition. Second, an attention mechanism is introduced to enhance the expressive power of the features. The attention mechanism can adaptively adjust feature weights so that the model focuses more on important information related to the emotional state. In the model of this scheme, efficient channel attention (ECANet) is adopted to optimize feature extraction, allowing the model to better focus on features that are significant for emotion recognition. The introduction of this mechanism further enhances the expressive power of the model and helps capture emotion-related information in the EEG signal more accurately. Finally, a bottom-up feature pyramid network (BU-FPN) is adopted to realize multi-scale feature fusion.
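The ECANet block referred to above can be sketched as follows; this is the standard ECA formulation, and the kernel size k = 3 is an assumption, as the patent does not state it:

    import torch
    import torch.nn as nn

    class ECA(nn.Module):
        def __init__(self, k: int = 3):          # k: local cross-channel kernel (assumed)
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (B, C, H, W) -> per-channel descriptor, 1D conv across channels
            y = self.pool(x).squeeze(-1).transpose(-1, -2)     # (B, 1, C)
            y = self.sigmoid(self.conv(y))                     # channel weights in (0, 1)
            return x * y.transpose(-1, -2).unsqueeze(-1)       # reweight feature channels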
Referring to FIG. 4, the network model takes a 9×9×4 three-dimensional feature matrix as input. After a convolution operation with 64 kernels of size 7×7, the size becomes 9×9×64; it remains 9×9×64 through the first residual layer, becomes 5×5×128 in the second residual layer, 3×3×256 in the third residual layer, and 2×2×512 in the fourth residual layer. The output of each residual layer then undergoes a convolution operation with 256 kernels of size 1×1 so that the dimensions of the feature matrices are consistent. The final result is then obtained through the BiGRU, the fully connected layer and the softmax layer.
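The tail of the pipeline can be sketched as below. How the fused 2×2×256 map is serialized into a sequence for the BiGRU is not specified in the patent, so the four-step ordering and hidden size here are assumptions:

    import torch
    import torch.nn as nn

    class EmotionHead(nn.Module):
        def __init__(self, feat_dim: int = 256, hidden: int = 128, n_classes: int = 2):
            super().__init__()
            self.bigru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, n_classes)   # forward + backward states

        def forward(self, p5: torch.Tensor) -> torch.Tensor:
            # p5: (B, 256, 2, 2) fused map -> 4-step sequence of 256-dim vectors (assumed)
            seq = p5.flatten(2).transpose(1, 2)          # (B, 4, 256)
            out, _ = self.bigru(seq)                     # context from both directions
            return torch.softmax(self.fc(out[:, -1]), dim=-1)  # class probabilities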
Referring to FIG. 5, the specific change process of the three-dimensional feature matrix in the four-layer residual structure is as follows. The input of the network model is a feature matrix of size 9×9×4, which becomes 9×9×64 after a convolution with a 7×7 kernel and stride 1. Each residual layer then contains two basic blocks, each consisting of two 3×3 convolutions followed by an ECANet operation that leaves the size unchanged. In the first residual layer, all convolutions use 64 kernels with stride 1, so the size stays at 9×9×64. In the second residual layer, the first convolution uses 128 kernels with stride 2, changing the size to 5×5×128; the remaining convolutions use 128 kernels with stride 1 and keep that size. In the third residual layer, the first convolution uses 256 kernels with stride 2, changing the size to 3×3×256; the remaining convolutions use 256 kernels with stride 1 and keep that size. In the fourth residual layer, the convolutions are replaced by deformable convolutions: the first uses 512 kernels with stride 2, changing the size to 2×2×512, and the remaining deformable convolutions use 512 kernels with stride 1, keeping the size at 2×2×512.
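The deformable convolutions of the fourth residual layer can be sketched with torchvision's DeformConv2d, where a plain convolution predicts the sampling offsets; this offset-predictor design is the common convention rather than something spelled out in the patent:

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class DeformableConvBlock(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
            super().__init__()
            # 2 offsets (dy, dx) per position of the 3 x 3 kernel -> 18 channels
            self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, stride=stride, padding=1)
            self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.deform(x, self.offset(x))   # kernel samples at learned offsets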
Referring to FIG. 6, the convolution operation process of the three-dimensional feature matrix output by each residual layer in the bottom-up feature pyramid network is as follows: the ResNet-18 neural network architecture is used as the backbone of the model, and feature maps C2–C5 are obtained by convolution. 1×1 convolution layers are then used to make the dimensions of the feature maps consistent, yielding T2–T5. Downsampling is carried out starting from T2:
m = (n + 2p - f)/s + 1 (1)
where m is the size of the output feature map; n is the size of the input feature map; p (padding) is the size of the pixel padding; f (filter) is the size of the convolution kernel; and s (stride) is the sliding step length. After T2 is downsampled, tensor addition with T3 yields D3; D3 is processed in the same way and added to T4 to obtain D4, which in turn yields the feature map D5; finally, a 3×3 convolution produces the smaller, information-richer feature map P5.
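The fusion just described can be sketched as follows. Note that a 3×3 stride-2 convolution with padding 1 realizes exactly the size chain 9 → 5 → 3 → 2 given by formula (1); sharing one downsampling convolution across the three steps is a simplification for illustration, not necessarily the patent's exact configuration:

    import torch.nn as nn

    class BUFPN(nn.Module):
        def __init__(self, ch: int = 256):
            super().__init__()
            # 3x3 stride-2 conv: m = (n + 2p - f)/s + 1, so 9 -> 5 -> 3 -> 2
            self.down = nn.Conv2d(ch, ch, kernel_size=3, stride=2, padding=1)
            self.out = nn.Conv2d(ch, ch, kernel_size=3, padding=1)  # final 3x3 convolution

        def forward(self, t2, t3, t4, t5):
            d3 = t3 + self.down(t2)      # downsample T2, tensor-add to T3
            d4 = t4 + self.down(d3)      # same treatment for D3 and T4
            d5 = t5 + self.down(d4)
            return self.out(d5)          # P5: smaller but information-richer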
The expression of emotional states may vary on different spatial scales, so fusing features of different scales is critical to improving the accuracy of emotion recognition. The BU-FPN can fuse features at different levels, enabling the model to focus on both local detail and global context. The multi-scale feature fusion is helpful for the model to more comprehensively understand and explain emotion related information in the EEG signals, so that emotion recognition performance is improved.
Compared with traditional EEG emotion recognition algorithms, the introduction of deformable convolution, the attention mechanism and the bottom-up feature pyramid network effectively improves EEG-based emotion recognition performance. These key components play a central role in capturing complex emotional expression patterns, optimizing feature extraction and achieving multi-scale feature fusion. Their integration, and the notable results achieved in the emotion recognition task, bring new insight and possibilities for research and application in this field.
Example 2
1. Evaluation index
In the experiments of the present invention, accuracy and standard deviation were used to evaluate the performance of MDCNAResnet-BiGRU.
(1)Accuracy
The calculation formula of Accuracy is shown as formula (2):
Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)
where TP and TN represent the numbers of correctly classified instances (TP for the positive class, TN for the negative class), and FP and FN represent the numbers of misclassified instances (FP for instances of the negative class, FN for instances of the positive class).
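Formula (2) in code form, for reference:

    def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
        # formula (2): proportion of correctly classified instances
        return (tp + tn) / (tp + tn + fp + fn)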
2. Experimental results
Experimental data were derived from the public DEAP dataset, on which the method was validated.
FIG. 7 shows the recognition accuracy for different K values ("K value" refers to the fold number used for cross-validation). It can be seen that at K = 5, the improvement in the Valence dimension is 2.63%–4.80% and the improvement in the Arousal dimension is 2.63%–4.69% compared with the other K values; five-fold cross-validation is therefore chosen. Accordingly, for the DEAP dataset, all samples of each subject were randomly divided into 5 groups, with 1 group used as the test set and the remaining 4 groups as the training set. FIG. 8 shows the overall performance of MDCNAResnet-BiGRU on the DEAP dataset.
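A sketch of this five-fold split with scikit-learn follows; the array shapes follow the per-subject dimensions of Example 1, and the training call is elided:

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.zeros((2400, 9, 9, 4))   # placeholder: one subject's 3D feature matrices
    y = np.zeros(2400, dtype=int)   # placeholder: binarized valence (or arousal) labels

    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        X_train, X_test = X[train_idx], X[test_idx]   # 4 folds train, 1 fold test
        y_train, y_test = y[train_idx], y[test_idx]
        # ... train MDCNAResnet-BiGRU on the training folds, evaluate on the held-out fold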
Different network structures affect the final classification accuracy. Performance comparisons were made on the DEAP dataset by changing the position of the deformable convolution layer in the ResNet-18 network, with the experimental results shown in Table 1. The numbers in brackets indicate the position of the deformable convolution in the residual structure; for example, (4) denotes substitution in the fourth residual structure.
Table 1 Recognition accuracy of different network structures
Based on the above analysis, we found that substituting the deformable convolution layer in the fourth residual structure improved accuracy by 1.29%–3.35% in the Valence dimension and 1.17%–3.25% in the Arousal dimension compared with the other alternatives. Therefore, substitution of the deformable convolution layer in the fourth residual structure is selected.
To verify the effectiveness of the proposed model, an ablation experiment was performed on each module of the network. The baseline model is denoted Resnet-BiGRU; introducing the efficient channel attention mechanism into Resnet-BiGRU forms ECAResnet-BiGRU; replacing the traditional convolution in the fourth residual structure of Resnet with deformable convolution forms DCNAResnet-BiGRU; and finally, multi-scale fusion with BU-FPN forms MDCNAResnet-BiGRU.
Table 2 shows the average metrics over all subjects. It can be seen that the recognition accuracy of Resnet-BiGRU is low, while ECAResnet-BiGRU performs 'purposeful' learning on the different feature maps; compared with Resnet-BiGRU, its accuracy improves in both the Valence and Arousal dimensions, demonstrating the effectiveness of the channel attention mechanism. After the traditional convolution is replaced by deformable convolution to form DCNAResnet-BiGRU, the recognition accuracy improves by 3.53% in the Valence dimension and 3.85% in the Arousal dimension, showing that deformable convolution can effectively extract hidden features and improve the expressive power of the network. Finally, multi-scale fusion with BU-FPN forms MDCNAResnet-BiGRU, which further extracts multi-scale features, combines feature information of different scales, and reaches recognition accuracies of 98.63% and 98.89% in the two dimensions respectively. The ablation results demonstrate the effectiveness of the proposed model for EEG emotion recognition.
Table 2 Ablation experiments on DCNA-FPN-BiGRU on the DEAP dataset
This scheme aims to develop an efficient EEG emotion recognition system; by combining a multi-scale deformable convolution network (MDCNAResnet) with an attention mechanism and applying a bottom-up feature pyramid network (BU-FPN), it achieves a series of notable results.
First, the method has scientific value in the field of EEG emotion recognition. Introducing the multi-scale deformable convolution operation into the network structure is an innovation in this field; the operation better captures the spatial information in EEG signals and improves emotion recognition performance. In addition, the efficient channel attention mechanism (ECA) further strengthens the network's attention to key information. The scheme is not only methodologically innovative but also achieves excellent recognition accuracy in the EEG emotion recognition task, making an important contribution to the development of the field.
Second, comparison with existing methods shows the significant advantages of the present method. The model of this scheme was compared with other recent network models, with the results shown in Table 3: MDCNAResnet excels in the emotion recognition task and is clearly superior to the other methods, achieving recognition accuracies of 98.63% and 98.89% on the DEAP dataset, far higher than the alternatives. This further demonstrates the significant advantage of the present method in EEG emotion recognition.
In short, the scheme is methodologically innovative, achieves notable results and has scientific value. The MDCNAResnet method performs excellently in EEG emotion recognition tasks and provides strong support for the development and application of emotion recognition technology.
Table 3 Comparison of different models
It is apparent that the above examples are given by way of illustration only and do not limit the embodiments. Other variations or modifications based on the above description will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Such variations or modifications made by those skilled in the art remain within the scope of the invention.

Claims (5)

1. An electroencephalogram signal emotion recognition algorithm integrating multi-scale features, characterized by comprising the following steps: converting an original electroencephalogram signal into a three-dimensional feature matrix, inputting the three-dimensional feature matrix into a multi-scale deformable convolution interaction attention network model based on a residual network for training, and finally predicting the emotion type of the electroencephalogram signal through a classifier; the network model adopts deformable convolution as the core component for feature extraction and uses an efficient channel attention mechanism to capture important information among channels and optimize feature extraction; a bottom-up feature pyramid network is applied at the top of the network model to fuse multi-scale features; and a bidirectional gated recurrent unit is introduced into the network model to extract contextual semantic information in both the forward and backward directions.
2. The electroencephalogram emotion recognition algorithm integrating multi-scale features according to claim 1, wherein the step of converting the original electroencephalogram signal into a three-dimensional feature matrix comprises:
dividing the last 60 s of the electroencephalogram signal with a sliding window, using the first 3 s of the electroencephalogram signal as a baseline; the step length of the sliding window is set to 1 s, there is no overlap between the divided electroencephalogram signal segments, and each segment is assigned the same label as the original electroencephalogram signal; then, dividing each segment into a plurality of frequency bands with a Butterworth band-pass filter, and extracting the differential entropy features of each electroencephalogram signal channel from the different frequency bands respectively; then converting all electroencephalogram signal channels into a 2D matrix, placing the differential entropy features extracted from the different electroencephalogram signal channels into the corresponding positions of the 2D matrix according to their relative position coordinates, setting the unused positions in the 2D matrix to 0, and constructing the three-dimensional feature matrix; and finally inputting the constructed three-dimensional feature matrix into the multi-scale deformable convolution interaction attention network model based on the residual network.
3. The electroencephalogram emotion recognition algorithm integrating multi-scale features according to claim 1, wherein the multi-scale deformable convolution interaction attention network model based on the residual network has a four-layer residual structure; an efficient channel attention mechanism is introduced into each of the four residual layers, traditional convolution is used in the first three residual layers, and the traditional convolution in the fourth residual layer is replaced by deformable convolution; the network model first takes the three-dimensional feature matrix as input, and after a convolution operation it passes through the first, second, third and fourth residual layers in sequence; the three-dimensional feature matrix output by each residual layer undergoes the convolution operation of a bottom-up feature pyramid network so that the dimensions of the three-dimensional feature matrices are consistent; finally, the electroencephalogram emotion type result is obtained through a bidirectional gated recurrent unit, a fully connected layer and a softmax layer.
4. The electroencephalogram emotion recognition algorithm integrating multi-scale features according to claim 3, wherein the network model input is a 9×9×4 three-dimensional feature matrix, which becomes 9×9×64 after a convolution operation with 64 kernels of size 7×7; the size remains 9×9×64 through the first residual layer, becomes 5×5×128 after the second residual layer, 3×3×256 after the third residual layer, and 2×2×512 after the fourth residual layer; the three-dimensional feature matrix output by each residual layer then undergoes a convolution operation with 256 kernels of size 1×1 in the bottom-up feature pyramid network so that the dimensions of the three-dimensional feature matrices are consistent.
5. The electroencephalogram emotion recognition algorithm integrating multi-scale features according to claim 4, wherein the convolution operation with 256 kernels of size 1×1 applied in the bottom-up feature pyramid network to the three-dimensional feature matrix output by each residual layer, which makes the dimensions of the three-dimensional feature matrices consistent, is specifically:
the three-dimensional feature matrices obtained after the outputs of the first to fourth residual layers pass through the 1×1 convolution of the bottom-up feature pyramid network are denoted T2, T3, T4 and T5 in order, and downsampling is carried out starting from T2:
m = (n + 2p - f)/s + 1 (1)
where m is the size of the output feature matrix; n is the size of the input feature matrix; p (padding) is the size of the pixel padding; f (filter) is the size of the convolution kernel; and s (stride) is the sliding step length; after T2 is downsampled, tensor addition with T3 yields the three-dimensional feature matrix D3; D3 is processed in the same way and added to T4 to obtain D4, which in turn yields the feature map D5; finally, a 3×3 convolution produces the smaller, information-richer three-dimensional feature matrix P5.
CN202311156664.2A 2023-09-08 2023-09-08 Electroencephalogram signal emotion recognition algorithm integrating multi-scale features Pending CN117195099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311156664.2A CN117195099A (en) 2023-09-08 2023-09-08 Electroencephalogram signal emotion recognition algorithm integrating multi-scale features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311156664.2A CN117195099A (en) 2023-09-08 2023-09-08 Electroencephalogram signal emotion recognition algorithm integrating multi-scale features

Publications (1)

Publication Number Publication Date
CN117195099A true CN117195099A (en) 2023-12-08

Family

ID=88984520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311156664.2A Pending CN117195099A (en) 2023-09-08 2023-09-08 Electroencephalogram signal emotion recognition algorithm integrating multi-scale features

Country Status (1)

Country Link
CN (1) CN117195099A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117562542A (en) * 2024-01-17 2024-02-20 小舟科技有限公司 Emotion recognition method based on electroencephalogram signals, computer equipment and storage medium
CN117562542B (en) * 2024-01-17 2024-04-30 小舟科技有限公司 Emotion recognition method based on electroencephalogram signals, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111012336B (en) Parallel convolutional network motor imagery electroencephalogram classification method based on spatio-temporal feature fusion
CN111062250B (en) Multi-subject motor imagery electroencephalogram signal identification method based on deep feature learning
CN111523601B (en) Potential emotion recognition method based on knowledge guidance and generation of countermeasure learning
CN111461201B (en) Sensor data classification method based on phase space reconstruction
CN112932502A (en) Electroencephalogram emotion recognition method combining mutual information channel selection and hybrid neural network
CN109497990B (en) Electrocardiosignal identity recognition method and system based on canonical correlation analysis
CN105139371A (en) Multi-focus image fusion method based on transformation between PCNN and LP
CN117195099A (en) Electroencephalogram signal emotion recognition algorithm integrating multi-scale features
CN110377049B (en) Brain-computer interface-based unmanned aerial vehicle cluster formation reconfiguration control method
CN113158964A (en) Sleep staging method based on residual learning and multi-granularity feature fusion
Zhang et al. Human identification driven by deep CNN and transfer learning based on multiview feature representations of ECG
CN113180659B (en) Electroencephalogram emotion recognition method based on three-dimensional feature and cavity full convolution network
CN104751186A (en) Iris image quality classification method based on BP (back propagation) network and wavelet transformation
CN117520891A (en) Motor imagery electroencephalogram signal classification method and system
Zhao et al. Trustworthy authorization method for security in Industrial Internet of Things
Jiang et al. Analytical comparison of two emotion classification models based on convolutional neural networks
Gu et al. Frame-level teacher–student learning with data privacy for EEG emotion recognition
Li et al. Eeg-based emotion recognition using spatial-temporal-connective features via multi-scale CNN
CN109662710A (en) A kind of EMG Feature Extraction based on convolutional neural networks
CN114936583B (en) Dual-step field self-adaptive cross-user myoelectricity mode identification method based on teacher-student model
CN113780134B (en) Motor imagery brain electrolysis code method based on SheffleNetV 2 network
Wang et al. Improved brain–computer interface signal recognition algorithm based on few-channel motor imagery
CN113516101B (en) Electroencephalogram signal emotion recognition method based on network structure search
CN110432899B (en) Electroencephalogram signal identification method based on depth stacking support matrix machine
CN114638253A (en) Identity recognition system and method based on emotion electroencephalogram feature fusion optimization mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination