CN113052254A - Multi-attention ghost residual fusion classification model and classification method thereof - Google Patents

Multi-attention ghost residual fusion classification model and classification method thereof

Info

Publication number
CN113052254A
CN113052254A (application CN202110366308.8A)
Authority
CN
China
Prior art keywords
ghost
network
classification
image
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110366308.8A
Other languages
Chinese (zh)
Other versions
CN113052254B (en)
Inventor
贾晓芬
杜圣杰
郭永存
黄友锐
赵佰亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Science and Technology
Original Assignee
Anhui University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Science and Technology filed Critical Anhui University of Science and Technology
Priority to CN202110366308.8A priority Critical patent/CN113052254B/en
Publication of CN113052254A publication Critical patent/CN113052254A/en
Application granted granted Critical
Publication of CN113052254B publication Critical patent/CN113052254B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-attention ghost residual fusion classification model (MAGR) for image classification and a classification method thereof, comprising a basic feature extraction network, a ghost residual mapping network and an image classification network connected in sequence. The basic feature extraction network uses an attention mechanism to extract useful feature information with emphasis; it is responsible for extracting the basic features of an input image and sending them to the ghost residual mapping network. The ghost residual mapping network integrates ghost convolution, multi-branch ghost group convolution and residual connection, and is responsible for extracting the high-level features of the network. The image classification network judges the category of the image according to all the extracted feature information, obtains the label corresponding to the image, and realizes classification. The method is used for image classification and achieves a highly efficient, lightweight classification model while ensuring high-precision classification of images.

Description

Multi-attention ghost residual fusion classification model and classification method thereof
Technical Field
The invention belongs to the technical field of image classification, within new-generation information technology, and relates to a multi-attention ghost residual fusion classification model for images and a classification method thereof.
Background
The image classification technique is an image information processing technique that, given an input image, determines the category to which the image belongs by an algorithm. Image classification is widely applied to face recognition, pedestrian detection and intelligent video analysis in the security field; vehicle counting, wrong-way driving detection, and license plate detection and recognition in the traffic field; content-based image retrieval and automatic album classification in the internet field; and other areas.
Traditional image classification algorithms perform well on simple classification tasks, but cannot meet the requirements of classifying images with serious interference or only slight differences between categories. Intelligent classification methods based on neural networks have therefore gained recognition, and deepening the network is the most common way to improve the classification effect, e.g. the deep residual network ResNet; see "K. He, X. Zhang, S. Ren and J. Sun, 'Deep Residual Learning for Image Recognition,' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778". ResNet effectively improves the classification accuracy of the network, but also brings a high computational cost.
In order to accelerate network training, researchers began to reduce model parameters and computational cost by replacing the traditional convolution operation. For example, K. Han et al. first put forward the concept of ghost convolution and used it to construct the GhostNet neural network, which greatly reduces the number of parameters but suffers from low classification accuracy; see "K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu and C. Xu, 'GhostNet: More Features From Cheap Operations,' 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 1577-1586".
The existing image classification method is usually focused on the performance of a certain aspect, a high-precision classification model may have the problem of high calculation cost, and a high-efficiency classification model may also have the defect of low classification precision, so that the dual requirements of high precision and high efficiency in certain specific occasions are difficult to meet at the same time. With the increasing abundance of application occasions of artificial intelligence technology, various intelligent machines or products have higher and higher requirements on image classification technology, and how to realize image classification with the advantages of high efficiency and high precision is a problem to be solved urgently at present.
Disclosure of Invention
The embodiment of the invention provides a multi-attention ghost residual fusion classification model and a classification method thereof, and aims to solve the problems of high model calculation cost, difficulty in training, low classification precision and the like in the technical field of image classification.
The technical scheme adopted by the embodiment of the invention is that the multiple attention ghost residual fusion classification model comprises a basic feature extraction network, a ghost residual mapping network and an image classification network which are sequentially connected;
the basic feature extraction network uses a CBAM attention mechanism to help the network extract, with emphasis, the feature information of image channels and spatial positions, better helping the classifier obtain more of the key features beneficial to classification;
the ghost residual mapping network establishes a nonlinear mapping relation between input and output by replacing the convolution operation, widening the network, and adding residual connections; it repeatedly extracts the high-dimensional feature information of the image and transmits it to the image classification network;
the image classification network further extracts feature information such as details and textures from the output of the ghost residual mapping network by using an ECA (Efficient Channel Attention) attention mechanism, and then sends the feature information to a classifier to finish the image classification task.
The embodiment of the invention adopts another technical scheme that the classification method of the multi-attention ghost residual fusion classification model is carried out according to the following steps:
step S1, sending the image to be classified to the basic feature extraction network of the multi-attention ghost residual fusion classification model;
step S2, the basic feature extraction network extracts the basic features of the input image to obtain basic feature information;
step S3, sending the basic feature information into the ghost residual mapping network, and repeatedly extracting the high-dimensional feature information of the input image with 4 MGR-Block modules;
and step S4, sending the high-dimensional feature information of the image into the image classification network, which uses an ECA module to realize information interaction among channels for the input high-dimensional feature information, extracting with emphasis the feature information more useful for classification, and then transmits the finally obtained feature information to a classifier to realize classification.
The beneficial effects of the embodiment of the invention are that a multi-attention ghost residual fusion classification model and a classification method thereof are provided. A basic feature extraction network is designed, which uses a 3 × 3 convolution layer, a maximum pooling layer and a mixed channel-and-spatial attention mechanism to extract, with emphasis, key feature information such as color and texture of the input image. By connecting 4 MGR-Block modules in sequence, a ghost residual mapping network (GRM) is provided. The GRM uses the ghost convolution operation to replace all traditional convolutions so as to reduce the computation and parameter count of the model, and then adopts a multi-branch group convolution mode for all ghost convolution layers in the GRM so as to widen the network, enhance its feature extraction capability, and help the GRM obtain richer feature information from the output features provided by the basic feature network. In addition, the 4 MGR-Blocks are constructed by connecting 3, 4, 6 and 3 ghost residual sub-networks (GRS) in sequence, together with a dimensionality reducer, in a local residual manner. Each GRS is formed by sequentially cascading a 1 × 1 ghost group convolution, a 3 × 3 ghost group convolution, a 1 × 1 ghost group convolution and a ReLU nonlinear activation layer, wherein the input of the first 1 × 1 ghost group convolution layer and the output of the last 1 × 1 ghost group convolution layer are directly connected through a residual connection to serve jointly as the input of the next GRS module; this is repeated up to the last GRS, after which the ghost residual mapping process of the whole GRM network is complete and its final output is obtained.
The image classification network is composed of an ECA attention module, a global average pooling layer (GAP) and a SoftMax classifier connected in sequence. The final output of the GRM network is sent into the image classification network; the ECA attention module performs channel-by-channel global average pooling without reducing dimensionality, so that each feature map corresponds to one feature point; the feature vector formed by all the feature points is then sent to the SoftMax layer, which identifies the category and corresponding label of the image to be classified according to the obtained input features, finally yielding the classification result and completing an efficient, high-precision classification task.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a multi-attention ghost residual fusion classification model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a CBAM attention module in a basic feature extraction network in the multi-attention ghost residual fusion classification model according to the embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a ghost convolution operation in a ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a multi-branch ghost group convolution module MGR-Block in a ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to the embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a ghost sub-network GRS in a ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a residual connection structure in a ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of an image classification network in the multiple attention ghost residual fusion classification model according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The inventor finds that existing image classification algorithms based on deep learning have a poor classification effect, and that existing classification models mainly have the following defects: (1) the number of model parameters is large and the computational cost is high; (2) existing deep learning models improve the classification effect by increasing the network depth, so the network suffers from gradient dispersion, which makes training difficult; (3) existing deep learning models have insufficient ability to extract the key feature information in an input image that is most useful for classification. In view of the above drawbacks, an embodiment of the present invention provides a multi-attention ghost residual fusion classification model, whose structure is shown in fig. 1, comprising a basic feature extraction network, a ghost residual mapping network and an image classification network connected in sequence. The basic feature extraction network extracts basic feature information such as color and contour of the initial input image, and its CBAM attention mechanism makes up for the insufficient feature extraction capability of ordinary convolution and pooling operations. The ghost residual mapping network repeatedly extracts high-dimensional feature information of the image by establishing a multi-branch ghost group convolution residual mapping relation between input and output, and transmits it to the image classification network. The image classification network further extracts feature information such as details and textures from the output of the ghost residual mapping network using an ECA (Efficient Channel Attention) attention mechanism, and then sends the feature information to a classifier to finish the image classification task.
The basic feature extraction network in the embodiment of the invention consists of a 3 × 3 convolution layer, a maximum pooling layer Maxpool and a CBAM attention mechanism. The 3 × 3 convolution layer extracts basic feature information, including color and contour, of the initial input image. The maximum pooling layer reduces parameters and computation while keeping the main features, preventing overfitting and improving the generalization capability of the model. The CBAM attention mechanism helps the basic feature extraction network extract, with emphasis, the feature information more useful for classification.
Regarding convolution kernel size, a large-scale kernel can learn complex features but loses detail information, while a small-scale kernel is easy to learn and brings richer detail information but is poor at learning complex features. Therefore, the CBAM mixed channel-and-spatial attention mechanism helps the network realize feature extraction, effectively compensating for the insufficient feature extraction capability of a small-scale single convolution kernel and ensuring that richer feature information is extracted from the initial input image in preparation for the subsequent classification task. The CBAM attention mechanism consists of a channel attention module CA and a spatial attention module SA: the input image passes sequentially through the convolution layer and the maximum pooling layer, the extracted image features are sent to the CA module, the weighted result is then sent to the SA module, and its weighted output is the extracted basic feature information. Thus a basic feature map is obtained through the basic feature extraction network.
All convolution layers contained in a ghost residual sub-network GRS in the ghost residual mapping network are replaced by ghost convolution layers. The input of the first 1 × 1 ghost convolution layer in each GRS is directly connected with the output of the last 1 × 1 ghost convolution layer to serve as the input of the next GRS module, and this is repeated up to the last GRS. Residual connection is adopted inside the GRS: the input of a GRS is divided into 32 inputs, and each input is, on the one hand, transmitted forward by ghost group convolution and, on the other hand, transmitted directly to the output layer through the residual connection, giving the final output of the GRS. In deep learning, deepening the network can cause the problem of gradient dispersion, and residual connections effectively improve the flow of information and gradients in the network.
Ghost convolution can effectively reduce computation; widening the network can enhance its feature extraction capability; and residual connection can solve the problem of gradient dispersion while improving information flow and gradient updates throughout the network. For an image classification algorithm, these measures effectively improve the classification effect. Therefore, ghost convolution is used to replace traditional convolution, effectively reducing computational cost; a multi-branch ghost group convolution mode is used to widen the network and enhance its feature extraction capability; and residual connection is adopted to solve gradient dispersion and improve information flow and gradient updates across the whole network. Thus the ghost residual mapping network is constructed.
The image classification network comprises an ECA attention module, a global average pooling layer GAP and a SoftMax classifier. The ECA attention module performs channel-by-channel global average pooling on the final output of all MGR-Blocks in the ghost residual mapping network without reducing dimensionality, generating a 1 × 1 × C feature vector; information interaction among channels is then completed through a one-dimensional convolution layer. The kernel size of the one-dimensional convolution is determined by an adaptive function, so that layers with a larger number of channels perform more cross-channel interaction. The calculation formula for the adaptive convolution kernel size is:
k = ψ(C) = | log2(C)/γ + b/γ |odd
where C is the channel dimension, |t|odd denotes the odd number nearest to t, and γ and b are hyperparameters of the adaptive function.
the ECA module adaptively determines the kernel size of the one-dimensional convolution through nonlinear mapping of channel dimensions, the kernel size k represents the coverage of network cross-channel interaction, and the range size is increased in proportion to the channel dimensions. And finally, the classifier identifies and judges the category of the initial input picture and the label information corresponding to the category according to the transmitted feature graph, and prints a classification result. At this point, the image classification network completes the classification task of the whole model.
Because the images to be classified belong to different categories, the feature information they contain differs, and they may contain objects of different sizes, colors and contours. When extracting feature information with convolution layers, however, a small object yields very little feature information, the receptive field may also capture surrounding irrelevant information, and as the convolution operations continue, some feature information of the original image may be lost. Therefore, residual connections are established to introduce the original image features and supplement the processed information, so that information about small objects still appears in the finally output feature map.
Different application backgrounds have different emphases. For example, super-resolution reconstruction of images focuses on the nonlinear mapping between a low-resolution image and a high-resolution image: by extracting feature information at each position in the image, originally weak contour features are enhanced, details and textures are improved, and the low-resolution image is used to infer all the missing high-frequency details, which is the key to reconstruction. Image classification, by contrast, is concerned with analyzing and understanding the image content, identifying the targets of interest in the model design, and separating the target information from the whole for identification and classification. In order to fully extract the feature information of the original image and achieve a better classification effect, the embodiment of the invention provides a multi-attention ghost residual fusion classification model and a classification method thereof. The method adopts the mixed channel-and-spatial attention mechanism CBAM to help the network extract, with emphasis, the basic feature information of the original image that is more useful for classification, and then sends the extracted basic feature information to the ghost residual mapping network. The ghost residual mapping network GRM is formed by connecting 4 MGR-Blocks in sequence and is responsible for repeatedly extracting high-dimensional feature information from the upper-layer input and sending it to the image classification network, which completes the classification task with an ECA attention module, a global average pooling layer GAP and a SoftMax classifier.
The embodiment of the invention provides a multi-attention ghost residual fusion classification model and a classification method thereof, and the classification method is carried out according to the following steps as shown in figure 1:
Step S1, inputting the original image into the basic feature extraction network of the multi-attention ghost residual fusion classification model, which passes it sequentially through a 3 × 3 convolution layer, a maximum pooling layer and a CBAM attention module.
Step S2, the basic feature extraction network performs feature extraction on the original image to obtain the basic feature, namely x″ in figure 1;
Extracting the basic feature information with only a single convolution layer and a maximum pooling layer may cause useful feature details to be omitted in the extraction process. The CBAM attention module is therefore adopted in the basic feature extraction part to help the network extract, with emphasis, more useful key feature information; its structure is shown in figure 2, and it effectively solves the problem of insufficient feature extraction capability of a single convolution layer. The basic feature extraction formulas are as follows:
n = H3×3(m)
x = Maxpool(n)
x′ = Mc(x) ⊗ x
x″ = Ms(x′) ⊗ x′
where m is the input image to be classified, H denotes the convolution operator with the subscript giving the kernel size, Maxpool denotes the maximum pooling operation, Mc and Ms denote the channel feature extraction and spatial feature extraction operations respectively, and ⊗ denotes point multiplication. n, x′ and x″ are the feature maps obtained from m after the 3 × 3 convolution (with pooling), channel feature extraction and spatial feature extraction respectively, and x″ is the basic feature finally output by the basic feature extraction network.
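The two weighting steps just described (channel attention, then spatial attention, each applied by point multiplication) can be illustrated with a minimal pure-Python sketch on nested lists. The attention weights are supplied by hand here as stand-ins for the outputs of the CA and SA sub-modules, which the patent does not detail:

```python
def channel_attention(x, weights):
    """x: C x H x W feature maps; weights: one attention scalar per channel.
    Implements the per-channel point multiplication Mc(x) * x."""
    return [[[w * v for v in row] for row in fmap]
            for fmap, w in zip(x, weights)]

def spatial_attention(x, mask):
    """mask: H x W attention map shared by all channels, Ms(x') * x'."""
    return [[[m * v for m, v in zip(mrow, row)]
             for mrow, row in zip(mask, fmap)]
            for fmap in x]

x = [[[2.0, 2.0], [2.0, 2.0]],          # channel 0
     [[4.0, 4.0], [4.0, 4.0]]]          # channel 1
x1 = channel_attention(x, [0.5, 1.0])             # reweight channels
x2 = spatial_attention(x1, [[1.0, 0.0], [0.0, 1.0]])  # reweight positions
print(x2)
```

The channel step scales whole feature maps, the spatial step scales individual positions, matching the cascade x → x′ → x″ in the formulas above.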
Step S3, sending the basic feature into the ghost residual mapping network, and repeatedly extracting the high-dimensional feature information of the image through the MGR-Blocks;
the ghost residual mapping network GRM replaces a traditional convolution layer with ghost convolution, the ghost convolution is shown in figure 3, the ghost convolution comprises two parts, the first part generates a feature graph with a small channel number through the traditional convolution, the second part generates more feature graphs by utilizing the result of the first part through linear operation, the two groups of feature graphs are spliced together to obtain final output, and the mathematical model of the ghost convolution is given as follows:
Y1=x1′*f′
Figure BDA0003007149550000074
Y=Y1+Y2
where x 'denotes a basic feature obtained by the basic feature extraction network, and x' is assumed to be x1′+x′2(x1′<x′2′),x1'and x'2' useful basic feature information and redundant basic feature information, x, respectively1"for generating m intrinsic profiles Y1Linearly operating on each intrinsic feature mapjGenerating s ghost features, generating n as m x s ghost feature maps Y by m intrinsic features2Obtaining m + n output characteristic diagrams Y after the Ghost convolution operation, wherein f' is belonged to Rc×k×k×mDenotes the filter used in ghost convolution operation, k × k is the size of convolution kernel, m eigen feature maps Y ∈ Rh′×w′×nH 'and w' are the height and width of the output characteristic diagram respectively, and n is the number of ghost characteristic diagrams.
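A rough parameter count makes the saving concrete: only a fraction of the output maps come from ordinary convolution, the rest from cheap linear operations. The sketch below assumes the cheap operations are depthwise convolutions with kernel size d, as in the GhostNet paper; the exact linear operation used by the patent is not specified, so the numbers are illustrative:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard convolution layer (bias ignored)."""
    return c_out * c_in * k * k

def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Ghost module: a primary conv producing c_out // s intrinsic maps,
    then (s - 1) cheap depthwise operations (kernel d) per intrinsic map."""
    m = c_out // s                    # intrinsic feature maps
    primary = m * c_in * k * k        # ordinary convolution part
    cheap = (s - 1) * m * d * d       # linear (depthwise) operations
    return primary + cheap

std = conv_params(64, 128, 3)
gho = ghost_params(64, 128, 3, s=2, d=3)
print(std, gho, round(std / gho, 2))  # compression ratio close to s
```

With s = 2 the ghost module needs roughly half the weights of the standard convolution it replaces, which is the source of the model's lightweight property.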
The GRM network internally connects 4 MGR-Block modules in series; the MGR-Block structure is shown in figure 4, and the 4 MGR-Block modules are formed by connecting 3, 4, 6 and 3 ghost residual sub-networks GRS in sequence. The GRS structure is shown in figure 5: all convolution layers in the GRS are replaced by ghost convolution layers, and each ghost convolution layer is divided into 32 branches to form a multi-branch grouped ghost convolution structure. Each GRS is formed by sequentially cascading a 1 × 1 ghost group convolution layer, a 3 × 3 ghost group convolution layer, a 1 × 1 ghost group convolution layer and a ReLU nonlinear activation layer, wherein the input of the first 1 × 1 ghost group convolution layer is directly connected with the output of the last 1 × 1 ghost group convolution layer to serve as the input of the next GRS module, repeated up to the last GRS. The mathematical model of a single GRS is:
Y = X + Σ_{i=1}^{Q} T(X_i)
where the input X of the first GRS is the basic feature x″; X is divided into Q inputs X_i, i = 1, …, Q, and T(X_i) denotes the mapping result of the i-th branch. Several GRS connected in sequence form an MGR-Block (3, 4, 6 and 3 GRS are adopted in turn to construct the network); except for the first GRS, whose input is the output of the basic feature extraction network, the input of each GRS is the output of the previous GRS, realizing the transmission of features through the 4 MGR-Blocks. The mathematical model of the complete ghost group convolution mapping process in the ghost mapping network is:
T_K(X) = T_{K,M}( T_{K,M-1}( … T_{K,1}(X) … ) ), K = 1, …, P
where K = 1, …, P indexes the multi-branch ghost group convolution blocks, M is the number of GRS contained in the K-th block, and T_K(X) represents the output result of the ghost mapping network. The GRS internally uses residual connection to transfer the input directly to the output layer: the input of a GRS is divided into Q inputs, each of which is, on the one hand, transmitted forward by the aforementioned ghost group convolution and, on the other hand, transferred directly to the output layer through the residual connection, as shown in fig. 6. The complete mathematical model of the GRM network is thus obtained as:
T_K^Fin(X) = X + Σ_{i=1}^{Q} T_K(X_i)
where T_K^Fin(X) represents the final output of the complete ghost residual mapping network GRM. Ghost convolution effectively reduces the computation of the model, multi-branch group convolution widens the network and enhances its feature extraction capability, and residual connection effectively improves the flow of feature information and gradient updates in the network, preventing gradient dispersion or explosion.
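The per-GRS aggregation Y = X + Σ T(X_i) can be sketched on a flat vector. Here each branch handles one disjoint group of the input, grouped-convolution style, with the group outputs concatenated before the identity is added; the toy branch functions stand in for the 1 × 1 / 3 × 3 ghost group convolutions:

```python
def grs_forward(x, branch_fns):
    """Multi-branch residual aggregation: split x into Q equal groups,
    transform each group with its branch, concatenate, then add the
    identity shortcut (the residual connection of fig. 6)."""
    q = len(branch_fns)
    size = len(x) // q
    transformed = []
    for i, fn in enumerate(branch_fns):
        transformed.extend(fn(x[i * size:(i + 1) * size]))
    return [a + b for a, b in zip(x, transformed)]

# Q = 4 toy branches, each doubling its group of the input
double = lambda g: [2 * v for v in g]
y = grs_forward([1.0] * 8, [double] * 4)
print(y)  # identity + transformed -> [3.0] * 8
```

Because the shortcut bypasses the branches entirely, the input reaches the output unchanged even if every branch output were zero, which is what keeps gradients flowing through a deep stack of such blocks.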
The input to the image classification network shown in figure 7 is the output of the GRM, T_KFin(X). After entering the image classification network, T_KFin(X) first passes through the ECA module, which performs channel-by-channel global average pooling without reducing the dimensionality. The mathematical model of the ECA module is:

$$T'_{KFin}(X)=M_{e}\left(T_{KFin}(X)\right)\otimes T_{KFin}(X)$$

where M_e denotes the ECA operation, ⊗ denotes point-wise multiplication, and T′_KFin(X) represents the result obtained after the input T_KFin(X) passes through the ECA module. T′_KFin(X) is then sent to the GAP layer, which performs global average pooling on each input feature map so that each feature map corresponds to one feature point. The feature vector formed by all the feature points is sent to the SoftMax layer, which identifies the category and the corresponding label of the image to be classified from the input features, finally obtaining the classification result and finishing the classification task.
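The ECA-plus-GAP-plus-SoftMax head can be sketched in PyTorch as follows. The adaptive kernel-size formula uses the common ECA-Net defaults (gamma=2, beta=1), which are an assumption here, as is replacing the bare SoftMax layer with a linear layer followed by softmax:

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Sketch of efficient channel attention: per-channel global average
    pooling (no dimensionality reduction), a 1-D convolution whose kernel
    size k is chosen adaptively from the channel count for cross-channel
    interaction, a sigmoid gate, and channel-wise reweighting."""

    def __init__(self, channels, gamma=2, beta=1):
        super().__init__()
        t = int(abs((math.log2(channels) + beta) / gamma))
        k = t if t % 2 else t + 1                 # kernel size must be odd
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                     # channel-wise GAP -> (N, C)
        w = self.conv(w.unsqueeze(1)).squeeze(1)   # cross-channel interaction
        w = torch.sigmoid(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                               # reweight the channels

class ClassifierHead(nn.Module):
    """ECA -> global average pooling -> linear layer -> softmax scores."""

    def __init__(self, channels, num_classes):
        super().__init__()
        self.eca = ECA(channels)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.eca(x).mean(dim=(2, 3))           # GAP: one point per map
        return torch.softmax(self.fc(x), dim=1)
```

For C = 512 channels the adaptive formula gives k = 5, so layers with more channels interact over a wider neighbourhood, matching the claim that wider layers get more cross-channel interaction.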
The activation functions in the multi-attention ghost residual fusion classification model MAGR all adopt the ReLU activation function. The ReLU activation function is the key to realizing nonlinear mapping, and it helps the network model of the embodiment of the invention learn the complex features of the input images. Since a convolutional layer is a linear filter with cross-correlation properties, using the nonlinear ReLU as the activation function of the convolutional layer allows the multiple input signals of a node to be converted into one output signal, realizing a nonlinear mapping between input and output feature images.
Given a training data set E = {(X^(k), Y^(k))}, k = 1, 2, 3, …, |D|, where X^(k) and Y^(k) respectively represent an original image and the corresponding category label of the image. The multi-attention ghost residual fusion classification model is an end-to-end mapping from image feature information to image class labels. In other words, the goal of the multi-attention ghost residual fusion classification model of the embodiment of the invention is to learn a deductive model Γ that deduces the corresponding class label Y^(k) from the input original image X^(k):

$$\hat{Y}^{(k)}=\Gamma\left(X^{(k)};\Theta\right)$$

where Θ = [ω, b] are the network model parameters, ω is the weight matrix, and b is the bias. The model parameters Θ are determined by minimizing the loss between the predicted class label and the true label. We define the loss function as

$$L(\Theta)=\frac{1}{|D|}\sum_{k=1}^{|D|}\ell\left(\Gamma\left(X^{(k)};\Theta\right),\,Y^{(k)}\right)$$
the process of training the MAGR with the training set E is to minimize the loss and find the optimal parameters of the model Θ. The structure of the MAGR model is shown in fig. 1, and it is composed of a basic feature extraction network (BFE), a ghost residual mapping network (GRM), and an image classification network (IC). The BFE is responsible for extracting basic characteristics of an original image and transmitting the basic characteristics to the GRM, the GRM is responsible for extracting high-dimensional characteristics of the image and sending the high-dimensional characteristics to the IC, and the IC is sent to a SoftMax classifier to perform classification tasks after being processed by the ECA attention module.
To verify the effectiveness of the multi-attention ghost residual fusion classification model of the present embodiment, images of different scenes were selected as test datasets, and the results of the present invention were verified by comparative analysis, in both subjective and objective aspects, against K. Han's algorithm (K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu and C. Xu, "GhostNet: More Features From Cheap Operations," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 1577-1586); Liu Z's algorithm (Liu Z, Sun M, and Zhou T, "Rethinking the value of network pruning," arXiv preprint arXiv:1810.05270, 2018); Shen Y T's algorithm (Shen Y T, and Wen Y, "Convolutional Network Optimization Channel Response Attention Module," arXiv preprint arXiv:2010.05605, 2020); Liu Y's algorithm (Liu Y, Wentzlaff D, and Kung S Y, "Rethinking Class-Discrimination Based CNN Channel Pruning," arXiv preprint arXiv:2004.14492, 2020); Ren's algorithm (F. Ren, W. Liu and G. Wu, "Feature Reuse Residual Networks for Insect Pest Recognition," in IEEE Access, vol. 7, pp. 122758-122768, 2019); Wang M's algorithm (Wang M, Zhang X, and Niu X, "Scene classification of high-resolution remotely sensed image based on ResNet," Journal of Geovisualization and Spatial Analysis, 2019, 3(2): 16); and L. Li's algorithm (L. Li, T. Tian and H. Li, "Classification of Remote Sensing Based on Neural Architecture Search Network," 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 2019, pp. 176-180).
In order to avoid the deviation caused by purely qualitative analysis, quantitative evaluation is carried out with three objective indexes: model parameter quantity (Params/M), floating-point operations (FLOPs/M) and classification accuracy (Acc/%). Analysis and comparison are carried out on experimental results over three datasets: CIFAR10, CIFAR100 and UC Merced Land Use (UC-M).
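The Params/M index used below is simply the trainable parameter count in millions, which can be computed directly in PyTorch; FLOPs additionally depend on the input resolution and are typically measured with a profiler, so they are not computed here:

```python
import torch.nn as nn

def count_params_m(model):
    """Report trainable parameters in millions (the Params/M index)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# illustrative two-layer network, not the MAGR itself
net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 64, 3, padding=1))
print(f"{count_params_m(net):.3f} M parameters")   # 0.039 M parameters
```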
The results of the experiments on the CIFAR10 dataset are shown in Table 1, with the best results shown in bold and the second best results shown in blue. As can be seen from the table: Han's Ghost-ResNet-56 algorithm achieves the lowest Params and FLOPs values, but its classification accuracy is not high. Liu Z's algorithm L1-ResNet-56 ranks second on the Params and FLOPs indexes, but its classification accuracy is only 92.5%. The Params and FLOPs indexes of the MAGR rank third, but its classification accuracy is the highest, reaching 94.7%, improvements of 2.1% and 2.3% over Ghost-ResNet-56 and L1-ResNet-56 respectively.
Table 1 Performance comparison with other models on CIFAR10
The results of the experiments on the CIFAR100 dataset are shown in Table 2, with the best results shown in bold and the second best results shown in blue. As can be seen from Table 2: Liu Y's algorithm ResNet-164-S-GD yields the best Params and FLOPs indexes. The Params index of the MAGR ranks second, but the MAGR obtains the highest classification accuracy, reaching 78.4%, an improvement of 1.7% over ResNet-164-S-GD.
Table 2 Performance comparison with other models on CIFAR100
Finally, the results of the experiments on the UC-M dataset are shown in Table 3, with the best results shown in bold. As can be seen from the data in Table 3, the multi-attention ghost residual fusion classification model and its classification method MAGR provided by the invention achieve the best effect, with a classification accuracy of 96.7%. From the above experimental results, although the MAGR does not obtain the best Params and FLOPs indexes, its scores are mid-ranked among the compared methods, and it obtains the highest classification accuracy on all three public datasets CIFAR10, CIFAR100 and UC-M. Therefore, the method provided by the embodiment of the invention has advantages in training efficiency and is clearly competitive with other CNN models in classification accuracy.
Table 3 Performance comparison with other models on UC-M
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. The multi-attention ghost residual fusion classification model is characterized by comprising a basic feature extraction network, a ghost residual mapping network GRM and an image classification network which are sequentially connected;
the CBAM attention mechanism can help the basic feature extraction network to extract key information which is useful for a classification stage;
the ghost residual error mapping network GRM establishes a nonlinear mapping relation between input and output by replacing convolution operation, widening network width and connecting residual error, reduces the calculated amount of a model, and effectively improves the feature extraction capability of the network;
the image classification network further extracts information such as details and textures from the features output by the ghost residual mapping network using an ECA (efficient channel attention) mechanism, and then sends the information to a classifier to finish the classification task of the image.
2. The multi-attention ghost residual error fusion classification model according to claim 1, wherein the basic feature extraction network sequentially performs a convolution operation and a maximum pooling layer on an input image, sends the extracted image features to a channel attention module CA, sends the weighted result to a space attention module SA, and weights the weighted result to obtain extracted basic feature information;
and the final basic features obtained by the feature extraction module are sent to the input end of the ghost residual error mapping network GRM.
3. The multi-attention ghost residual fusion classification model according to any one of claims 1-2, wherein the ghost residual mapping network GRM is formed by cascading 4 multi-branch ghost group convolution networks MGR-Block;
the first MGR-Block in the GRM is formed by cascading a dimensionality reducer and 3 ghost residual error sub-networks GRS;
the second MGR-Block in the GRM is formed by cascading a dimensionality reducer and 4 GRSs;
the third MGR-Block in the GRM is formed by cascading a dimensionality reducer and 6 GRSs;
the fourth MGR-Block in the GRM is formed by cascading a dimensionality reducer and 3 GRSs;
the GRM inputs the basic features extracted by the basic feature extraction network, namely the basic features are input of the first MGR-Block.
4. The multi-attention ghost residual fusion classification model according to claim 3, wherein the ghost residual sub-networks GRS contained in the multi-branch ghost group convolutional networks MGR-Block use ghost convolution to extract features, the network width is widened to 32 branches through grouping, each GRS is formed by connecting in series in the sequence of 1 × 1 convolution, 3 × 3 convolution and 1 × 1 convolution, a ReLU activation layer follows each convolutional layer, and a local residual connection across the two ends of the GRS fuses the input and output feature information as the input of the next GRS, thereby realizing ghost residual mapping and transmission of the input features through the whole GRM network;
the grouping convolution operation in the GRS means that the input features are split and then combined, and the number of channels is not changed;
the convolutional layers involved in the GRS are all ghost convolutions;
the local residual in the GRS is connected and fused to represent a combined feature graph, and the number of channels is increased.
5. The multi-attention ghost residual fusion classification model according to any one of claims 1-4, wherein the image classification network comprises three parts connected in sequence: an efficient channel attention mechanism ECA, a global average pooling GAP and a classifier SoftMax;
the ECA strengthens the relationship between channels of the high-dimensional information output by the ghost residual mapping, helping the model extract further useful feature information without increasing the amount of calculation; after the output of the upper layer enters the ECA module, it is globally average-pooled channel by channel into a 1 × 1 × C feature vector without reducing the dimensionality, and information interaction across channels is then completed through a one-dimensional convolutional layer, the convolution kernel size of the one-dimensional convolution being determined by an adaptive function, so that layers with a larger number of channels perform more interaction between channels, thereby extracting with emphasis the detail feature information most useful for the final classification;
the global average pooling GAP is used for performing global average pooling on each input feature obtained by the ECA, so that each feature map corresponds to one feature point, and finally, a feature vector consisting of all the feature points is obtained;
and the classifier SoftMax judges the class label of the original input image according to the feature vector which is output by the GAP layer and consists of all the feature points, obtains a classification result and realizes the final image classification.
6. The classification method of the multi-attention ghost residual fusion classification model according to any one of claims 1 to 5, characterized by comprising the following steps:
step S1, inputting the image to be classified into a basic feature extraction network of a multi-attention ghost residual fusion classification model of the image;
step S2, the basic feature extraction network extracts the features of the image to be classified to obtain basic features;
step S3, sending the basic features into a ghost residual error mapping network, and repeatedly extracting high-dimensional feature information of the input image by adopting 4 MGR-Block modules;
and step S4, sending the high-dimensional feature information of the image into an image classification network, wherein the image classification network uses the ECA module to realize information interaction among channels for the input high-dimensional feature information, so as to extract with emphasis the feature information more useful for classification, and then transmits the finally obtained feature information to a classifier to realize classification.
7. The method for classifying the multi-attention ghost residual fusion classification model according to claim 6, wherein the mathematical model of the basic feature extraction network in the step S2 is:

$$n=H_{3\times 3}(m),$$
$$x=\mathrm{Maxpool}(n),$$
$$x'=M_{c}(x)\otimes x,$$
$$x''=M_{s}(x')\otimes x',$$

wherein m is the input image to be classified, H represents a convolution operator whose subscript indicates the size of the convolution kernel, Maxpool represents the maximum pooling operation, M_c and M_s respectively represent the channel feature extraction and spatial feature extraction operations, ⊗ represents point-wise multiplication, n, x′ and x″ respectively represent the feature maps obtained by performing the 3 × 3 convolution operation, channel feature extraction and spatial feature extraction on m, and x″ is the basic feature finally obtained by the basic feature extraction network.
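The chain above (3 × 3 convolution, max pooling, channel attention M_c, spatial attention M_s) can be sketched in PyTorch. The reduction ratio 16 and the 7 × 7 spatial kernel are common CBAM defaults assumed here, not values stated in the claim:

```python
import torch
import torch.nn as nn

class BFESketch(nn.Module):
    """Sketch of the basic feature extraction chain:
    n = H_3x3(m); x = Maxpool(n); x' = M_c(x) * x; x'' = M_s(x') * x'."""

    def __init__(self, in_ch=3, ch=64, reduction=16):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, ch, 3, padding=1)   # H_3x3
        self.pool = nn.MaxPool2d(2)                      # Maxpool
        self.mlp = nn.Sequential(                        # shared MLP for M_c
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)     # conv for M_s

    def forward(self, m):
        x = self.pool(self.conv(m))
        # channel attention M_c: avg- and max-pooled descriptors -> shared MLP
        mc = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x1 = mc * x                                      # x' = M_c(x) (.) x
        # spatial attention M_s: channel-wise avg and max, then 7x7 conv
        ms = torch.sigmoid(self.spatial(torch.cat(
            [x1.mean(1, keepdim=True), x1.amax(1, keepdim=True)], dim=1)))
        return ms * x1                                   # x'' = M_s(x') (.) x'
```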
8. The method for classifying the multi-attention ghost residual fusion classification model according to claim 6, wherein the process of repeatedly extracting the high-dimensional feature information of the image by the ghost residual mapping network in the step S3 is as follows:
the input of the first 1 × 1 ghost convolutional layer and the output of the last 1 × 1 ghost convolutional layer in each GRS in the ghost residual mapping network are directly connected and used as the input of the next GRS module, and this operation repeats until the last GRS; the ghost group convolution mathematical model in the GRS is:

$$T(X)=\sum_{i=1}^{32}T\left(X_{i}\right)$$

where the input X = x′, the input X is split into 32 inputs X_i, and T(X_i) represents the mapping result of the i-th branch; the mathematical model of the complete ghost group convolution mapping process in the ghost mapping network is:

$$T_{K}(X)=\underbrace{T\left(T\left(\cdots T(X)\right)\right)}_{M}$$

wherein M represents the number of GRS contained in each of the 4 sequentially connected MGR-Blocks, the values of M being 3, 4, 6 and 3 respectively, K represents the serial number of the 4 sequentially connected MGR-Blocks, taking the values 1, 2, 3 and 4 corresponding to M, and T_K(X) represents the output result of the ghost mapping network;

the GRS adopts a residual connection to transmit the input directly to the output layer: the input of one GRS is divided into 32 inputs, each of which is on the one hand propagated forward by the ghost group convolution described above and on the other hand transmitted directly to the output layer through the residual connection, so that the final mathematical model of the GRS is:

$$T_{KFin}(X)=\sum_{i=1}^{32}T_{K}\left(X_{i}\right)+X$$

wherein T_KFin(X) represents the final output of the complete ghost residual mapping network GRM, and M and K have the same meanings as in the mathematical model of the ghost group convolution mapping process.
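The 32-branch grouped structure with a residual connection described in this claim can be sketched in PyTorch. Plain grouped convolutions stand in here for the ghost group convolutions, and batch normalization is an added assumption:

```python
import torch
import torch.nn as nn

class GRS(nn.Module):
    """Sketch of one ghost residual sub-network: three grouped
    convolutions (1x1 -> 3x3 -> 1x1, 32 branches) whose outputs are
    summed with the input via a residual connection, mirroring
    T_KFin(X) = sum_i T_K(X_i) + X."""

    def __init__(self, channels, groups=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 1, groups=groups, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1,
                      groups=groups, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1, groups=groups, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)     # residual connection

def mgr_block(channels, n_grs):
    """An MGR-Block is several GRS in series (3, 4, 6, 3 per block)."""
    return nn.Sequential(*[GRS(channels) for _ in range(n_grs)])
```

Chaining `mgr_block(c, n)` for n = 3, 4, 6, 3 reproduces the serial connection of the 4 MGR-Blocks (the dimensionality reducers of claim 3 are omitted for brevity).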
9. The method for classifying the multi-attention ghost residual error fusion classification model according to any one of claims 6 to 8, wherein the image classification network in the step S4 comprises the following steps:
the input of the image classification network is the output T_KFin(X) of the GRM; after T_KFin(X) enters the image classification network, the ECA module first performs channel-by-channel global average pooling without reducing the dimensionality, and the mathematical model of the ECA module is:

$$T'_{KFin}(X)=M_{e}\left(T_{KFin}(X)\right)\otimes T_{KFin}(X)$$

wherein M_e denotes the ECA operation, ⊗ denotes point-wise multiplication, and T′_KFin(X) represents the processing result obtained after the input T_KFin(X) passes through the ECA module; T′_KFin(X) then enters the GAP layer, which performs global average pooling on each input feature map so that each feature map corresponds to one feature point; the feature vector formed by all the feature points is transmitted to the SoftMax layer, which identifies the category and the corresponding label of the image to be classified according to the obtained input features, finally obtaining the classification result and finishing the classification task.
CN202110366308.8A 2021-04-06 2021-04-06 Multi-attention ghost residual fusion classification model and classification method thereof Active CN113052254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110366308.8A CN113052254B (en) 2021-04-06 2021-04-06 Multi-attention ghost residual fusion classification model and classification method thereof


Publications (2)

Publication Number Publication Date
CN113052254A true CN113052254A (en) 2021-06-29
CN113052254B CN113052254B (en) 2022-10-04

Family

ID=76517598


Country Status (1)

Country Link
CN (1) CN113052254B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470827A (en) * 2021-06-30 2021-10-01 上海商汤智能科技有限公司 Classification method and device, electronic equipment and storage medium
CN113658044A (en) * 2021-08-03 2021-11-16 长沙理工大学 Method, system, device and storage medium for improving image resolution
CN114842240A (en) * 2022-04-06 2022-08-02 盐城工学院 Method for classifying images of leaves of MobileNet V2 crops by fusing ghost module and attention mechanism
CN114882281A (en) * 2022-05-16 2022-08-09 安徽理工大学 Lightweight intelligent separation model, method, equipment and storage medium for coal and gangue
WO2023060459A1 (en) * 2021-10-13 2023-04-20 Intel Corporation Sample-adaptive 3d feature calibration and association agent

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027630A (en) * 2019-12-13 2020-04-17 安徽理工大学 Image classification method based on convolutional neural network
CN111046967A (en) * 2019-12-18 2020-04-21 江苏科技大学 Underwater image classification method based on convolutional neural network and attention mechanism
US20200151506A1 (en) * 2018-11-14 2020-05-14 Boe Technology Group Co., Ltd. Training method for tag identification network, tag identification apparatus/method and device
CN111325155A (en) * 2020-02-21 2020-06-23 重庆邮电大学 Video motion recognition method based on residual difference type 3D CNN and multi-mode feature fusion strategy
CN111640101A (en) * 2020-05-29 2020-09-08 苏州大学 Ghost convolution characteristic fusion neural network-based real-time traffic flow detection system and method
CN111738300A (en) * 2020-05-27 2020-10-02 复旦大学 Optimization algorithm for detecting and identifying traffic signs and signal lamps
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device
US20200364494A1 (en) * 2019-05-17 2020-11-19 Drvision Technologies Llc Deep model matching methods for image transformation
CN112149747A (en) * 2020-09-27 2020-12-29 浙江物产信息技术有限公司 Hyperspectral image classification method based on improved Ghost3D module and covariance pooling
CN112364944A (en) * 2020-12-18 2021-02-12 福州大学 Deep learning-based household garbage classification method
CN112528879A (en) * 2020-12-15 2021-03-19 杭州电子科技大学 Multi-branch pedestrian re-identification method based on improved GhostNet
CN112541409A (en) * 2020-11-30 2021-03-23 北京建筑大学 Attention-integrated residual network expression recognition method


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
KAI HAN ET AL: "GhostNet: More Features from Cheap Operations", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
QILONG WANG ET AL: "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
SANGHYUN WOO ET AL: "CBAM: Convolutional Block Attention Module", 《ECCV 2018:COMPUTER VISION》 *
WENJU WANG ET AL: "Double Ghost Convolution Attention Mechanism Network: A Framework for Hyperspectral Reconstruction of a Single RGB Image", 《SENSORS》 *
LI Jie et al: "Research on Beach Plastic Product Detection Algorithms", 《Computer Aided Technology》 *


Also Published As

Publication number Publication date
CN113052254B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN113052254B (en) Multi-attention ghost residual fusion classification model and classification method thereof
CN111310773B (en) Efficient license plate positioning method of convolutional neural network
CN111192270A (en) Point cloud semantic segmentation method based on point global context reasoning
CN105354273A (en) Method for fast retrieving high-similarity image of highway fee evasion vehicle
Cai et al. Softer pruning, incremental regularization
Atif et al. A review on semantic segmentation from a modern perspective
CN112862015A (en) Paper classification method and system based on hypergraph neural network
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN114419413A (en) Method for constructing sensing field self-adaptive transformer substation insulator defect detection neural network
Ma et al. An improved ResNet-50 for garbage image classification
CN113128424A (en) Attention mechanism-based graph convolution neural network action identification method
Wang et al. TF-SOD: a novel transformer framework for salient object detection
Mereu et al. Learning sequential descriptors for sequence-based visual place recognition
CN113870160A (en) Point cloud data processing method based on converter neural network
CN115830575A (en) Transformer and cross-dimension attention-based traffic sign detection method
Zhu et al. Local information fusion network for 3D shape classification and retrieval
CN111612046B (en) Feature pyramid graph convolution neural network and application thereof in 3D point cloud classification
Xiang et al. Double-branch fusion network with a parallel attention selection mechanism for camouflaged object detection
Singh et al. Iml-gcn: Improved multi-label graph convolutional network for efficient yet precise image classification
Jiang et al. Cross-level reinforced attention network for person re-identification
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
Li et al. A new algorithm of vehicle license plate location based on convolutional neural network
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
CN114998702A (en) Entity recognition and knowledge graph generation method and system based on BlendMask
Wang et al. A spatio-temporal attention convolution block for action recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant