CN113052254B - Multi-attention ghost residual fusion classification model and classification method thereof - Google Patents


Info

Publication number
CN113052254B
Authority
CN
China
Prior art keywords
ghost
network
classification
image
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110366308.8A
Other languages
Chinese (zh)
Other versions
CN113052254A (en)
Inventor
贾晓芬
杜圣杰
郭永存
黄友锐
赵佰亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Science and Technology
Original Assignee
Anhui University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Science and Technology filed Critical Anhui University of Science and Technology
Priority to CN202110366308.8A priority Critical patent/CN113052254B/en
Publication of CN113052254A publication Critical patent/CN113052254A/en
Application granted granted Critical
Publication of CN113052254B publication Critical patent/CN113052254B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-attention ghost residual fusion classification model (MAGR) for image classification and a classification method thereof, comprising a basic feature extraction network, a ghost residual mapping network and an image classification network connected in sequence. The basic feature extraction network uses an attention mechanism to extract useful feature information with emphasis; it is responsible for extracting the basic features of an input image and sending them into the ghost residual mapping network. The ghost residual mapping network integrates ghost convolution, multi-branch ghost group convolution and residual connection, and is responsible for extracting the high-level features of the image. The image classification network judges the category of the image according to all the extracted feature information, obtains the label corresponding to the image and realizes classification. The method is used for image classification and can realize a highly efficient and lightweight classification model while ensuring high-precision classification of images.

Description

Multi-attention ghost residual fusion classification model and classification method thereof
Technical Field
The invention belongs to the technical field of image classification within new-generation information technology, and relates to a multi-attention ghost residual fusion classification model for images and a classification method thereof.
Background
The image classification technique is an image information processing technique for determining a category to which an image belongs by an algorithm given an input image. The image classification is widely applied to the fields of face recognition, pedestrian detection, intelligent video analysis in the security field, vehicle counting, retrograde motion detection, license plate detection and recognition in the traffic field, content-based image retrieval in the internet field, automatic album classification and the like.
Traditional image classification algorithms perform well on simple classification tasks, but cannot meet the requirements of classifying images with serious interference or only slight differences between classes. Intelligent classification methods based on neural networks have gained wide recognition, and deepening the network is the most common way to improve the classification effect, as in the deep residual neural network ResNet; see "He K., Zhang X., Ren S., and Sun J., 'Deep residual learning for image recognition,' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778". ResNet effectively improves the classification accuracy of the network, but also brings high calculation cost.
To accelerate network training, researchers began reducing the number of parameters and the calculation cost of models by replacing the traditional convolution operation. For example, K. Han et al. first put forward the concept of ghost convolution and used it to construct the GhostNet neural network, which greatly reduces the parameter count but suffers from low classification accuracy; see "K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu and C. Xu, 'GhostNet: More Features From Cheap Operations,' 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 1577-1586".
The existing image classification method is usually focused on the performance of a certain aspect, a high-precision classification model may have the problem of high calculation cost, and a high-efficiency classification model may also have the defect of low classification precision, so that the dual requirements of high precision and high efficiency in certain specific occasions are difficult to meet at the same time. With the increasing abundance of application occasions of artificial intelligence technology, various intelligent machines or products have higher and higher requirements on image classification technology, and how to realize image classification with the advantages of high efficiency and high precision is a problem to be solved urgently at present.
Disclosure of Invention
The embodiment of the invention provides a multi-attention ghost residual fusion classification model and a classification method thereof, and aims to solve the problems of high model calculation cost, difficulty in training, low classification precision and the like in the technical field of image classification.
The technical scheme adopted by the embodiment of the invention is that the multiple attention ghost residual fusion classification model comprises a basic feature extraction network, a ghost residual mapping network and an image classification network which are sequentially connected;
the CBAM attention mechanism can help the network to extract the characteristic information of image channels and space positions with emphasis, and can better help the classifier to extract more key characteristics beneficial to classification;
the ghost residual error mapping network establishes a nonlinear mapping relation between input and output by replacing convolution operation, widening network width and connecting residual error, repeatedly extracts high-dimensional characteristic information of an image and transmits the high-dimensional characteristic information to an image classification network;
the image classification network further extracts feature information such as details, textures and the like from the output of the ghost residual error mapping network by using an ECA (equal cost error) attention mechanism, and then sends the feature information to a classifier to finish the classification task of the image.
The embodiment of the invention adopts another technical scheme that the classification method of the multi-attention ghost residual fusion classification model is carried out according to the following steps:
s1, sending an image to be classified to a basic feature extraction network of a multi-attention ghost residual fusion classification model;
s2, basic feature extraction is carried out on the input image by a basic feature extraction network to obtain basic feature information;
s3, sending the basic feature information into a ghost residual error mapping network, and repeatedly extracting high-dimensional feature information of an input image by adopting 4 MGR-Block modules;
and S4, sending the high-dimensional characteristic information of the image into an image classification network, wherein the image classification network realizes information interaction among channels for the input high-dimensional characteristic information by utilizing an ECA (equal cost analysis) module, realizes the purpose of extracting characteristic information more useful for classification in a side-by-side manner, and then transmits the finally obtained characteristic information to a classifier to realize classification.
The embodiment of the invention has the beneficial effects of providing a multi-attention ghost residual fusion classification model and a classification method thereof. A basic feature extraction network is designed which uses a 3 × 3 convolution layer, a maximum pooling layer and a mixed channel-and-spatial attention mechanism to extract with emphasis key feature information such as the color and texture of an input image. By connecting 4 MGR-Block modules in sequence, a ghost residual mapping network (GRM) is provided. The GRM replaces all traditional convolutions with ghost convolution operations to reduce the calculation and parameter amounts of the model, and then reshapes all ghost convolution layers in the GRM into a multi-branch group convolution form to widen the network, strengthen its feature extraction capability, and help the GRM obtain richer feature information from the output features provided by the basic feature network. The 4 MGR-Blocks are built from 3, 4, 6 and 3 ghost sub-networks (GRS) respectively, plus a dimensionality reducer, in a local residual connection mode. Each GRS cascades a 1 × 1 ghost group convolution, a 3 × 3 ghost group convolution, a 1 × 1 ghost group convolution and a ReLU nonlinear activation layer in sequence; the input of the first 1 × 1 ghost group convolution layer and the output of the last 1 × 1 ghost group convolution layer are joined directly through a residual connection and together serve as the input of the next GRS module, cycling through to the last GRS to obtain the final output of the GRM network once the ghost residual mapping process of the whole GRM network is completed.
The image classification network is composed of an ECA attention module, a global average pooling layer (GAP) and a SoftMax classifier connected in sequence. The final output of the GRM network is sent into the image classification network; without reducing dimensionality, the channels are globally average-pooled one by one through the ECA attention module so that each feature map corresponds to one feature point. The feature vector formed by all the feature points is then sent to the SoftMax layer, which identifies the category and corresponding label of the image to be classified from the input features, finally producing the classification result and completing a highly efficient, high-precision classification task.
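The GAP-plus-SoftMax head described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the embodiment's trained network: the feature shapes and the classifier weight matrix below are illustrative stand-ins.

```python
import numpy as np

def gap_softmax_head(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """features: (C, H, W) feature maps from the GRM network.
    Global average pooling collapses each map to one feature point, then a
    linear layer followed by SoftMax produces class probabilities."""
    vec = features.mean(axis=(1, 2))        # GAP: one value per channel
    logits = weights @ vec                  # (num_classes, C) @ (C,)
    exp = np.exp(logits - logits.max())     # numerically stable SoftMax
    return exp / exp.sum()

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))      # stand-in GRM output, 8 channels
w = rng.standard_normal((3, 8))             # stand-in weights for 3 classes
probs = gap_softmax_head(feats, w)
print(probs.shape)  # (3,)
```

Because each channel collapses to a single feature point before the SoftMax stage, the classifier sees one value per feature map regardless of the spatial size of the input.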
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of a multi-attention ghost residual fusion classification model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a CBAM attention module in a basic feature extraction network in the multi-attention ghost residual fusion classification model according to the embodiment of the present invention.
Fig. 3 is a schematic structural diagram of the ghost convolution operation in the ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to the embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a multi-branch ghost group convolution module MGR-Block in a ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to the embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a ghost sub-network GRS in a ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a residual connection structure in a ghost residual mapping network GRM in the multi-attention ghost residual fusion classification model according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of an image classification network in the multiple attention ghost residual fusion classification model according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The inventor researches and discovers that the existing image classification algorithm based on deep learning has poor classification effect, and the existing classification model mainly has the following defects: (1) the number of model parameters is large, and the calculation cost is high; (2) The existing deep learning model mostly improves the classification effect of the network by increasing the network depth, so that the network has the problem of gradient dispersion, and the network training is difficult; (3) The existing deep learning model has insufficient extraction capability on key feature information which is more useful for classification in an input image. In view of the above drawbacks, an embodiment of the present invention provides a multiple attention ghost residual fusion classification model, which has a structure as shown in fig. 1, and includes a basic feature extraction network, a ghost residual mapping network, and an image classification network, which are connected in sequence, where the basic feature extraction network is used to extract basic feature information such as color and contour of an initial input image, and a CBAM attention mechanism can make up for the problem of insufficient feature extraction capability of a general convolution pooling operation. And the ghost residual mapping network is used for repeatedly extracting high-dimensional characteristic information of the image and transmitting the high-dimensional characteristic information to the image classification network by establishing a multi-branch ghost group convolution residual mapping relation between input and output. 
And the image classification network is used for further extracting feature information such as details and textures from the output of the ghost residual mapping network using an ECA (Efficient Channel Attention) mechanism, and then sending the extracted feature information into a classifier to finish the image classification task.
The basic feature extraction network in the embodiment of the invention consists of a 3 x 3 convolutional layer, a maximum pooling layer Maxpool and a CBAM attention mechanism. A 3 × 3 convolutional layer for extracting basic feature information including color, contour, etc. of the initial input image; the maximum pooling layer is used for reducing parameters and calculated amount while keeping main characteristics, preventing overfitting and improving the generalization capability of the model; the CBAM attention mechanism is used for helping the basic feature extraction network to emphatically extract feature information which is more useful for classification.
Regarding the size of the convolution kernel: a large-scale convolution kernel can learn complex features but loses detail information, while a small-scale convolution kernel is easy to learn and captures richer detail, but is weaker at learning complex features. Therefore, the CBAM channel-and-spatial mixed attention mechanism helps the network carry out feature extraction, effectively compensating for the insufficient capability of a small-scale single convolution kernel to extract feature information and ensuring that richer feature information is extracted from the initial input image in preparation for the subsequent classification task. The CBAM attention mechanism consists of a channel attention module CA and a spatial attention module SA: the input image passes through the convolution layer and the maximum pooling layer in sequence, the extracted image features are sent to the CA module, the weighted result is then sent to the SA module, and its weighted output is the extracted basic feature information. A basic feature map is thus obtained through the basic feature extraction network.
All convolution layers contained in a ghost residual sub-network GRS in the ghost residual mapping network are replaced by ghost convolution layers. The input of the first 1 × 1 ghost convolution layer in each GRS is connected directly with the output of the last 1 × 1 ghost convolution layer to serve as the input of the next GRS module, and the process cycles through to the last GRS. Residual connection is adopted in the GRS: the input of one GRS is divided into 32 inputs, and each input is propagated forward in the ghost group convolution manner on the one hand and passed directly to the output layer through the residual connection on the other, yielding the final output of the GRS. In deep learning, increasing network depth can cause gradient dispersion, and residual connections effectively improve the flow of information and gradients in the network.
Ghost convolution effectively reduces the amount of calculation; widening the network enhances its feature extraction capability; and residual connection overcomes gradient dispersion while improving information flow and gradient updates throughout the network, which for an image classification algorithm effectively improves the classification effect. Therefore, ghost convolution is used to replace traditional convolution and cut calculation cost, a multi-branch ghost group convolution form is used to widen the network and strengthen its feature extraction capability, and residual connection is adopted to overcome gradient dispersion and improve information flow and gradient updates. The ghost residual mapping network is thus constructed.
The image classification network comprises an ECA attention module, a global average pooling layer GAP and a SoftMax classifier. Without reducing dimensionality, the ECA attention module applies channel-by-channel global average pooling to the final output of all MGR-Blocks in the ghost residual mapping network to generate a 1 × 1 × C feature vector, then completes information interaction among channels through a one-dimensional convolution layer. The kernel size of the one-dimensional convolution is determined by an adaptive function, so that layers with more channels perform cross-channel interaction over a wider range; the adaptive kernel size is calculated as follows:
k = ψ(C) = |log2(C)/γ + b/γ|_odd
where C is the channel dimension, |·|_odd denotes the nearest odd number, and γ and b are preset hyperparameters.
the ECA module adaptively determines the kernel size of the one-dimensional convolution through nonlinear mapping of channel dimensions, the kernel size k represents the coverage of network cross-channel interaction, and the range size is increased in proportion to the channel dimensions. And finally, the classifier identifies and judges the category of the initial input picture and the label information corresponding to the category according to the transmitted feature graph, and prints a classification result. At this point, the image classification network completes the classification task of the whole model.
Because the images to be classified belong to different categories, the feature information they contain differs, and they may depict objects of different sizes, colors and contours. When convolution layers extract feature information, however, a small object yields very little feature information, the size of the receptive field may draw in surrounding irrelevant information, and as the convolution processing continues, some feature information of the original image may be lost. Therefore, residual connections are established to reintroduce original image features and supplement the processed information, so that the information of small objects still appears in the finally output feature map.
Depending on the application background, super-resolution reconstruction of images, for example, focuses on the nonlinear mapping relation between a low-resolution image and a high-resolution image: it strengthens originally weak contour features by extracting feature information at every position in the image, improves details and textures, and infers all the missing high-frequency details from the low-resolution image, which is the key to reconstruction. Image classification, by contrast, is concerned with analyzing and understanding the image content, identifying the targets the model design cares about most, and separating the target information from the whole for recognition and classification. To fully extract the feature information of the original image and achieve a better classification effect, the embodiment of the invention provides a multi-attention ghost residual fusion classification model and a classification method thereof. The method adopts the mixed channel-and-spatial attention mechanism CBAM to help the network extract with emphasis the basic feature information in the original image most useful for classification, then sends the extracted basic feature information into the ghost residual mapping network. The ghost residual mapping network GRM is formed by connecting 4 MGR-Blocks in sequence and is responsible for repeatedly extracting high-dimensional feature information from the upper-layer input and sending it to the image classification network, which completes the classification task with an ECA attention module, a global average pooling layer GAP and a SoftMax classifier.
The embodiment of the invention provides a multi-attention ghost residual fusion classification model and a classification method thereof, and the classification method is carried out according to the following steps as shown in figure 1:
s1, inputting an original image into a basic feature extraction network of a multi-attention ghost residual fusion classification model, and sequentially passing through a 3 x 3 convolutional layer, a maximum pooling layer and a CBAM attention module.
S2, the basic feature extraction network extracts features of the original image to obtain the basic features, namely x′ in figure 1;
extracting the basic feature information with a single convolution layer and maximum pooling layer alone may cause useful feature details to be missed during extraction. In the basic feature extraction network, a CBAM attention module is therefore adopted to help the network extract the more useful key feature information with emphasis; the structure of the CBAM attention module is shown in figure 2, and it effectively remedies the insufficient feature extraction capability of a single convolution layer. The basic feature extraction formulas are as follows:
n = H_{3×3}(m)
x = Maxpool(n)
x̃ = M_c(x) ⊗ x
x′ = M_s(x̃) ⊗ x̃
where m is the input image to be classified, H denotes the convolution operator and its subscript the convolution kernel size, Maxpool denotes the maximum pooling operation, M_c and M_s denote the channel feature extraction and spatial feature extraction operations respectively, and ⊗ denotes point-wise multiplication. n, x̃ and x′ respectively denote the feature maps obtained from m by the 3 × 3 convolution operation, channel feature extraction and spatial feature extraction, and x′ is the basic feature finally obtained by the basic feature extraction network.
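A minimal NumPy sketch of this sequential channel-then-spatial weighting follows. It is an illustration of the weighting order only: the shared MLP and 7 × 7 convolution of the full CBAM are deliberately omitted, and the input tensor is an illustrative stand-in.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_weight(x: np.ndarray) -> np.ndarray:
    """Simplified CBAM pass over x of shape (C, H, W): channel weights M_c
    from per-channel avg/max pooling, then spatial weights M_s from
    per-position avg/max over channels."""
    # channel attention: rescale each channel by a scalar gate
    mc = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))   # (C,)
    x1 = mc[:, None, None] * x
    # spatial attention: rescale each spatial position by a scalar gate
    ms = sigmoid(x1.mean(axis=0) + x1.max(axis=0))           # (H, W)
    return ms[None, :, :] * x1

x = np.random.default_rng(1).standard_normal((4, 5, 5))
print(cbam_weight(x).shape)  # (4, 5, 5)
```

Both gates lie in (0, 1), so the module reweights the feature map without changing its shape, emphasising informative channels and positions.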
S3, sending the basic features into the ghost residual mapping network, and repeatedly extracting high-dimensional feature information of the image through the MGR-Blocks;
the ghost residual mapping network GRM replaces the traditional convolution layer with ghost convolution, shown in figure 3. Ghost convolution comprises two parts: the first generates feature maps with a small channel number through traditional convolution, and the second applies linear operations to the result of the first part to generate more feature maps; the two groups of feature maps are spliced together to obtain the final output. The mathematical model of ghost convolution is given as:
Y_1 = x_1′ * f′
Y_2 = Φ_j(Y_1), j = 1, …, s
Y = Y_1 + Y_2
where x′ denotes the basic features obtained by the basic feature extraction network and is assumed to split as x′ = x_1′ + x_2′ (x_1′ < x_2′), with x_1′ and x_2′ being the useful and the redundant basic feature information respectively. x_1′ is used to generate the m intrinsic feature maps Y_1; applying a linear operation Φ_j to each intrinsic feature map generates s ghost features, so the m intrinsic maps yield n = m × s ghost feature maps Y_2, and m + n output feature maps Y are obtained after the ghost convolution operation. f′ ∈ R^{c×k×k×m} denotes the filter used in the ghost convolution operation and k the convolution kernel size; Y ∈ R^{h′×w′×n}, where h′ and w′ are the height and width of the output feature maps and n is the number of ghost feature maps.
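The two-part ghost convolution can be sketched in NumPy as follows. The random channel-mixing filter and the per-map scalar multiplications are illustrative stand-ins for the learned filter f′ and the cheap linear operations Φ_j; only the shape bookkeeping (m intrinsic maps plus m × s ghost maps) mirrors the formulas above.

```python
import numpy as np

def ghost_conv(x: np.ndarray, m: int, s: int, rng) -> np.ndarray:
    """Ghost-convolution sketch on x of shape (C, H, W): m intrinsic maps
    from an ordinary convolution (simulated by random channel mixing), then
    s cheap linear ops per intrinsic map produce the ghost maps; the output
    concatenates both groups."""
    C, H, W = x.shape
    f = rng.standard_normal((m, C))
    y1 = np.einsum('mc,chw->mhw', f, x)                 # intrinsic maps Y_1
    ghosts = [a * y1 for a in rng.standard_normal(s)]   # cheap linear ops
    y2 = np.concatenate(ghosts, axis=0)                 # m*s ghost maps Y_2
    return np.concatenate([y1, y2], axis=0)             # m + m*s output maps

rng = np.random.default_rng(2)
y = ghost_conv(rng.standard_normal((16, 8, 8)), m=4, s=3, rng=rng)
print(y.shape)  # (16, 8, 8): 4 intrinsic + 12 ghost maps
```

Only the m intrinsic maps require a full convolution; the remaining m × s maps come from cheap per-map operations, which is where the parameter and computation savings arise.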
4 MGR-Block modules are connected in series in the GRM network; the structure of an MGR-Block is shown in figure 4, and the 4 MGR-Blocks are formed by connecting 3, 4, 6 and 3 ghost residual sub-networks GRS in sequence. The GRS structure is shown in figure 5: all convolution layers in a GRS are replaced by ghost convolution layers, each ghost convolution layer is divided into 32 branches to form a multi-branch grouped ghost convolution structure, and each GRS cascades a 1 × 1 ghost group convolution, a 3 × 3 ghost group convolution, a 1 × 1 ghost group convolution and a ReLU nonlinear activation layer in sequence. The input of the first 1 × 1 ghost group convolution layer and the output of the last 1 × 1 ghost group convolution layer are connected directly to serve as the input of the next GRS module, and the process cycles through to the last GRS. The mathematical model of a single GRS is:
T(X) = ∑_{i=1}^{Q} T(X_i)
where the input X = x′, the input X is divided into Q inputs X_i, i = 1, …, Q, and T(X_i) denotes the mapping result of the ith branch. Several GRS in sequence form an MGR-Block (the numbers of GRS used to construct the network are 3, 4, 6 and 3 in turn); except that the input of the first GRS is the output of the basic feature extraction network, the input of every other GRS is the output of the previous GRS, which realizes the transmission of features within the 4 MGR-Blocks. The mathematical model of the complete ghost group convolution mapping process in the ghost mapping network is:
T_K(X) = T_M(T_{M-1}(⋯T_1(X)⋯))
where K = 1, …, P indexes the multi-branch ghost group convolution modules and M is the number of GRS contained in the Kth module; T_K(X) represents the output result of the ghost mapping network. Residual connection is adopted in the GRS to pass inputs directly to the output layer: the input of one GRS is divided into Q inputs, and each input is propagated forward in the ghost group convolution manner on the one hand and passed directly to the output layer through the residual connection on the other, as shown in figure 6. The complete mathematical model of the GRM network is thus obtained as:
T_KFin(X) = T_K(X) + X
where T_KFin(X) represents the final output of the complete ghost residual mapping network GRM. The ghost convolution effectively reduces the computational cost of the model, the multi-branch group convolution widens the network and enhances its feature extraction capability, and the residual connections improve the flow of feature information and gradient updating in the network, preventing vanishing or exploding gradients.
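To make the parameter saving concrete, the following sketch (an illustration only, not the patented implementation; the helper names, the ghost ratio of 2 and the 3 × 3 depthwise kernel for the cheap operations are assumptions) counts the weights of a plain convolution against a grouped ghost convolution of the kind described above:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a bias-free 2-D convolution layer."""
    return (c_in // groups) * c_out * k * k

def ghost_conv_params(c_in, c_out, k, dw_k=3, ratio=2, groups=1):
    """Ghost convolution: a primary convolution produces c_out/ratio
    intrinsic maps; cheap depthwise operations generate the rest."""
    intrinsic = c_out // ratio
    primary = conv_params(c_in, intrinsic, k, groups)
    cheap = intrinsic * (ratio - 1) * dw_k * dw_k  # one depthwise filter per ghost map
    return primary + cheap

plain = conv_params(256, 256, 3)                    # standard 3x3 convolution
ghost = ghost_conv_params(256, 256, 3)              # ghost replacement
ghost32 = ghost_conv_params(256, 256, 3, groups=32) # 32-branch ghost group convolution
print(plain, ghost, ghost32)  # 589824 296064 10368
```

The count shows why combining the ghost operation with 32-way grouping cuts the weight budget by well over an order of magnitude while the residual shortcut itself adds no parameters.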
The input to the image classification network shown in FIG. 7 is the output T_kFin(X) of the GRM. After T_kFin(X) enters the image classification network, the ECA module first performs channel-by-channel global average pooling without reducing dimensionality; the mathematical model of the ECA module is as follows:

T′_kFin(X) = M_e(T_kFin(X)) ⊗ T_kFin(X),

where M_e denotes the ECA operation, ⊗ denotes dot multiplication, and T′_kFin(X) denotes the result of processing the input T_kFin(X) through the ECA module. T′_kFin(X) is then sent to the GAP layer, which performs global average pooling on each input feature map; finally the classification result is obtained and the final classification task is completed.
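The channel-by-channel pooling and gating just described can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the averaging kernel stands in for the learned one-dimensional convolution weights, and the adaptive-kernel formula follows the common ECA convention as an assumption:

```python
import numpy as np

def eca(x, gamma=2, b=1):
    """Efficient Channel Attention sketch. x: feature maps, shape (C, H, W)."""
    C = x.shape[0]
    # adaptive 1-D kernel size derived from the channel count (forced odd)
    t = int(abs((np.log2(C) + b) / gamma))
    k = t if t % 2 else t + 1
    y = x.mean(axis=(1, 2))                  # channel-wise global average pooling -> (C,)
    w = np.ones(k) / k                       # stand-in for the learned 1-D conv kernel
    yp = np.pad(y, k // 2, mode="edge")
    conv = np.array([np.dot(yp[i:i + k], w) for i in range(C)])  # cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-conv))       # sigmoid gate, one weight per channel
    return x * gate[:, None, None]           # reweight channels, no dimension reduction

out = eca(np.ones((8, 4, 4)))
print(out.shape)  # (8, 4, 4)
```

Note that the output keeps the input shape: ECA only rescales channels, which is why it adds almost no computation before the GAP layer.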
The activation functions in the multi-attention ghost residual fusion classification model MAGR all adopt the ReLU activation function, which is the key to realizing nonlinear mapping and helps the network model of this embodiment learn the complex features of input images. Since a convolutional layer is a linear filter with cross-correlation properties, the ReLU, used as the activation function of the convolutional layer, contributes the nonlinearity that converts the multiple input signals of a node into one output signal, realizing a nonlinear mapping between input and output feature images.
Given a training data set E = {X^(k), Y^(k)}, k = 1, 2, 3, …, |E|, where X^(k) and Y^(k) respectively denote the original image and its corresponding category label, the multi-attention ghost residual fusion classification model is an end-to-end mapping from image feature information to image class labels. In other words, the objective of the multi-attention ghost residual fusion classification model of the embodiment of the invention is to learn a deductive model Γ that infers the corresponding category label Y^(k) from the input original image X^(k):
Y^(k) = Γ(X^(k); Θ),
where Θ = [ω, b] denotes the network model parameters, ω being the weight matrix and b the bias. The model parameters Θ are determined by minimizing the loss between the predicted category label and the true label. We define the loss function as
L(Θ) = (1/|E|) Σ_{k=1}^{|E|} ℓ(Γ(X^(k); Θ), Y^(k)),
the process of training the MAGR with the training set E is to minimize the loss and find the optimal parameters of the model Θ. The structure of the MAGR model is shown in fig. 1, and it is composed of a basic feature extraction network (BFE), a ghost residual mapping network (GRM), and an image classification network (IC). The BFE is responsible for extracting basic characteristics of an original image and transmitting the basic characteristics to the GRM, the GRM is responsible for extracting high-dimensional characteristics of the image and sending the high-dimensional characteristics to the IC, and the IC is sent to a SoftMax classifier to perform classification tasks after being processed by the ECA attention module.
To verify the effectiveness of the multi-attention ghost residual fusion classification model of the present embodiment, different scene images were selected as test datasets, and the results of the present invention were compared with K. Han's algorithm (K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu and C. Xu, "GhostNet: More Features From Cheap Operations," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 1577-1586); Liu Z's algorithm (Liu Z, Sun M, and Zhou T, "Rethinking the value of network pruning," arXiv preprint arXiv:1810.05270, 2018); Shen Y T's algorithm (Shen Y T and Wen Y, "Convolutional Network Optimization with a Channel Response Attention Module," arXiv preprint arXiv:2010.05605, 2020); Liu Y's algorithm (Liu Y, Wentzlaff D, and Kung S Y, "Rethinking Class-Discrimination Based CNN Channel Pruning," arXiv preprint arXiv:2004.14492, 2020); Ren's algorithm (F. Ren, W. Liu and G. Wu, "Feature Reuse Residual Networks for Insect Pest Recognition," IEEE Access, vol. 7, pp. 122758-122768, 2019); Wang M's algorithm (Wang M, Zhang X, and Niu X, "Scene classification of high-resolution remotely sensed image based on ResNet," Journal of Geovisualization and Spatial Analysis, vol. 3, no. 2, p. 16, 2019); and L. Li's algorithm (L. Li, T. Tian and H. Li, "Classification of Remote Sensing Scenes Based on Neural Architecture Search Network," 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 2019, pp. 176-180); the results were verified by comparative analysis in both subjective and objective aspects.
In order to avoid the deviation caused by qualitative analysis, the present embodiment uses three objective indexes, model parameters (Params/M), floating point operations (FLOPs/M) and classification accuracy (Acc/%), for quantitative evaluation, and performs analysis and comparison through experimental results on the three datasets CIFAR10, CIFAR100 and UC Merced Land Use (UC-M).
The results of the experiments on the CIFAR10 dataset are shown in Table 1, with the best results shown in bold and the second-best results shown in blue. As can be seen from the table: Han's Ghost-ResNet-56 algorithm obtains the lowest Params and FLOPs values, but its classification accuracy is not high. Liu Z's algorithm L1-ResNet-56 ranks second on the Params and FLOPs indexes, but its classification accuracy is only 92.5%. The MAGR ranks third on the Params and FLOPs indexes, but its classification accuracy is the highest, reaching 94.7%, an improvement of 2.1% and 2.3% over Ghost-ResNet-56 and L1-ResNet-56, respectively.
Table 1 Performance comparison with other models on CIFAR10
[Table 1 is provided as an image in the source document.]
The results of the experiments on the CIFAR100 dataset are shown in Table 2, with the best results shown in bold and the second-best results shown in blue. As can be seen from Table 2: Liu Y's algorithm ResNet-164-S-GD yields the best Params and FLOPs indexes. The MAGR ranks second on the Params index but obtains the highest classification accuracy, reaching 78.4%, an improvement of 1.7% over ResNet-164-S-GD.
Table 2 Performance comparison with other models on CIFAR100
[Table 2 is provided as an image in the source document.]
Finally, the results of the experiments on the UC-M dataset are shown in Table 3, with the best results shown in bold. As can be seen from the data in Table 3, the multi-attention ghost residual fusion classification model and its classification method MAGR provided by the invention achieve the best effect, with a classification accuracy of 96.7%. From the above experimental results it can be seen that, although the MAGR does not obtain the best Params and FLOPs indexes, it ranks in the middle among the comparison methods, while its classification accuracy is the highest on all three public datasets CIFAR10, CIFAR100 and UC-M. Therefore, the method provided by the embodiment of the invention has superiority in training efficiency and is clearly competitive with other CNN models in classification accuracy.
Table 3 Performance comparison with other models on UC-M
[Table 3 is provided as an image in the source document.]
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. An image classification method based on a multi-attention ghost residual fusion classification model is characterized by comprising a basic feature extraction network, a ghost residual mapping network GRM and an image classification network which are sequentially connected;
the CBAM attention mechanism can help the basic feature extraction network to extract key information which is useful for a classification stage;
the ghost residual error mapping network GRM establishes a nonlinear mapping relation between input and output by replacing convolution operation, widening network width and connecting residual error, reduces the calculated amount of a model, and effectively improves the feature extraction capability of the network;
the ghost residual mapping network GRM is formed by cascading 4 multi-branch ghost group convolutional networks MGR-Block;
the first MGR-Block in the GRM is formed by cascading a dimensionality reducer and 3 ghost residual error sub-networks GRS;
the second MGR-Block in the GRM is formed by cascading a dimensionality reducer and 4 GRSs;
the third MGR-Block in the GRM is formed by cascading a dimensionality reducer and 6 GRSs;
a fourth MGR-Block in the GRM is formed by cascading a dimensionality reducer and 3 GRSs;
the GRM inputs basic features extracted by a basic feature extraction network, namely the basic features are input of a first MGR-Block;
the ghost residual sub-network GRS uses ghost convolution to extract features, the network width being widened by grouping into 32 branches; each GRS is formed by serially connecting a 1 × 1 convolution, a 3 × 3 convolution and a 1 × 1 convolution in sequence, and a ReLU activation layer follows each convolutional layer;
the image classification network further extracts detail and texture information from the features output by the ghost residual mapping network by using an ECA (equal cost adaptive array) attention mechanism, and then sends the information to a classifier to finish the classification task of the image.
2. The image classification method based on the multi-attention ghost residual fusion classification model according to claim 1, characterized in that the basic feature extraction network sequentially performs a convolution operation and a maximum pooling layer on an input image, sends the extracted image features to a channel attention module CA, obtains a weighting processing result, sends the weighting processing result to a space attention module SA, and performs weighting to obtain extracted basic feature information;
and the final basic features obtained by the basic feature extraction network are sent to the input end of the ghost residual error mapping network GRM.
3. The image classification method based on the multi-attention ghost residual fusion classification model according to claim 1, characterized in that two ends of GRSs included in the multi-branch ghost group convolutional network MGR-Block adopt a local residual connection mode to fuse input and output feature information and input the fused input and output feature information to the next GRS, thereby realizing ghost residual mapping and transmission of input features in the whole GRM network;
the grouping convolution operation in the GRS means that the input features are split and then combined, and the number of channels is not changed;
the convolutional layers involved in the GRS are all ghost convolutions;
the local residual error in the GRS is connected and fused to represent a combined feature graph, and the number of channels is increased.
4. The image classification method based on the multi-attention ghost residual fusion classification model is characterized in that the image classification network comprises three parts, namely an effective channel attention mechanism (ECA), a Global Average Pooling (GAP) and a classifier SoftMax, which are connected in sequence;
the ECA can strengthen the relation between the channels of the high-dimensional information output by the ghost residual mapping and help the model extract further useful feature information without increasing the calculated amount; after the upper-layer output enters the ECA module, channel-by-channel global average pooling is performed without dimensionality reduction to generate a 1 × 1 × C feature vector, and cross-channel information interaction is then completed through a one-dimensional convolutional layer whose kernel size is determined by an adaptive function, so that layers with more channels perform more cross-channel interaction, thereby selectively extracting the detail feature information most useful for the final classification;
the global average pooling GAP is used for performing global average pooling on each input feature obtained by the ECA, so that each feature map corresponds to one feature point, and finally, a feature vector consisting of all the feature points is obtained;
and the classifier SoftMax judges the class label of the original input image according to the feature vector which is output by the GAP layer and consists of all the feature points, obtains a classification result and realizes the final image classification.
5. The image classification method based on the multiple attention ghost residual fusion classification model according to any one of claims 1 to 4, characterized by comprising the following steps:
s1, inputting an image to be classified into a basic feature extraction network of a multi-attention ghost residual fusion classification model of the image;
s2, extracting the features of the image to be classified by a basic feature extraction network to obtain basic features;
s3, sending the basic features into a ghost residual error mapping network, and repeatedly extracting high-dimensional feature information of the input image by adopting 4 MGR-Block modules;
and S4, sending the high-dimensional feature information of the image into the image classification network, which uses the ECA module to realize information interaction between the channels of the input high-dimensional feature information, so as to selectively extract the feature information most useful for classification, and then transmits the resulting feature information to the classifier to realize classification.
6. The image classification method based on the multi-attention ghost residual fusion classification model according to claim 5, wherein the mathematical model of the basic feature extraction network in the step S2 is as follows:
n=H 3×3 (m),
x=Maxpool(n),
y = M_c(x) ⊗ x,

z = M_s(y) ⊗ y,

wherein m is the input image to be classified, H represents the convolution operator with its subscript denoting the size of the convolution kernel, Maxpool represents the max-pooling operation, M_c and M_s respectively represent the channel feature extraction and spatial feature extraction operations, ⊗ represents dot multiplication, and n, y and z respectively represent the feature maps obtained from m by the 3 × 3 convolution operation, channel feature extraction and spatial feature extraction, z being the basic feature finally obtained by the basic feature extraction network.
7. The image classification method based on the multi-attention ghost residual fusion classification model according to claim 5, wherein the process of repeatedly extracting the high-dimensional feature information of the image by the ghost residual mapping network in the step S3 is as follows:
the input of the first 1 × 1 ghost convolutional layer and the output of the last 1 × 1 ghost convolutional layer in each GRS in the ghost residual mapping network are directly connected and used as the input of the next GRS module, this operation being repeated up to the last GRS; the ghost group convolution mathematical model in the GRS is as follows:
T(X) = Σ_{i=1}^{32} T(X_i),
where the input X = z and X is divided into 32 inputs X_i; T(X_i) represents the mapping result of the ith branch; the mathematical model of the complete ghost group convolution mapping process in the ghost mapping network is as follows:
T_k(X) = T_M(T_{M−1}(⋯T_1(X)⋯)),
wherein M represents the number of GRSs contained in each of the 4 sequentially connected MGR-Blocks, the values of M being 3, 4, 6 and 3 respectively; k represents the serial number of the 4 sequentially connected MGR-Blocks, the values of k corresponding to M being 1, 2, 3 and 4 respectively; and T_k(X) represents the output result of the ghost mapping network;
the GRS adopts residual connection to directly transmit the input to the output layer, the input of one GRS is divided into 32 inputs, each input is transmitted forward according to the mode of the ghost group convolution on one hand, and is directly transmitted to the output layer by means of the residual connection on the other hand, and therefore the final mathematical model of the GRS is obtained as follows:
T_kFin(X) = T_k(X) + X,
wherein T_kFin(X) denotes the final output of the complete ghost residual mapping network GRM, and the meanings of M and k are the same as in the mathematical model of the ghost group convolution mapping process.
8. The image classification method based on the multi-attention ghost residual fusion classification model according to claim 5, wherein the image classification network in the step S4 comprises the following steps:
the input of the image classification network is the output T_kFin(X) of the GRM; after T_kFin(X) enters the image classification network, the ECA module performs channel-by-channel global average pooling without reducing dimensionality, the mathematical model of the ECA module being as follows:

T′_kFin(X) = M_e(T_kFin(X)) ⊗ T_kFin(X),

where M_e denotes the ECA operation, ⊗ denotes dot multiplication, and T′_kFin(X) denotes the result of processing the input T_kFin(X) through the ECA module; T′_kFin(X) is then sent to the GAP layer, which performs global average pooling on each input feature map so that each feature map corresponds to one feature point; the feature vector formed by all the feature points is then sent to the SoftMax layer, which identifies the category and corresponding label of the image to be classified from the input features, finally obtaining the classification result and completing the classification task.
CN202110366308.8A 2021-04-06 2021-04-06 Multi-attention ghost residual fusion classification model and classification method thereof Active CN113052254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110366308.8A CN113052254B (en) 2021-04-06 2021-04-06 Multi-attention ghost residual fusion classification model and classification method thereof


Publications (2)

Publication Number Publication Date
CN113052254A CN113052254A (en) 2021-06-29
CN113052254B true CN113052254B (en) 2022-10-04

Family

ID=76517598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110366308.8A Active CN113052254B (en) 2021-04-06 2021-04-06 Multi-attention ghost residual fusion classification model and classification method thereof

Country Status (1)

Country Link
CN (1) CN113052254B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470827A (en) * 2021-06-30 2021-10-01 上海商汤智能科技有限公司 Classification method and device, electronic equipment and storage medium
CN113658044B (en) * 2021-08-03 2024-02-27 长沙理工大学 Method, system, device and storage medium for improving image resolution
CN117616471A (en) * 2021-10-13 2024-02-27 英特尔公司 Sample adaptive 3D feature calibration and associated proxy
CN114842240A (en) * 2022-04-06 2022-08-02 盐城工学院 Method for classifying images of leaves of MobileNet V2 crops by fusing ghost module and attention mechanism

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046967A (en) * 2019-12-18 2020-04-21 江苏科技大学 Underwater image classification method based on convolutional neural network and attention mechanism
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191664B (en) * 2018-11-14 2024-04-23 京东方科技集团股份有限公司 Training method of tag identification network, tag identification device/method and equipment
US11373066B2 (en) * 2019-05-17 2022-06-28 Leica Microsystems Cms Gmbh Deep model matching methods for image transformation
CN111027630B (en) * 2019-12-13 2023-04-07 安徽理工大学 Image classification method based on convolutional neural network
CN111325155B (en) * 2020-02-21 2022-09-23 重庆邮电大学 Video motion recognition method based on residual difference type 3D CNN and multi-mode feature fusion strategy
CN111738300A (en) * 2020-05-27 2020-10-02 复旦大学 Optimization algorithm for detecting and identifying traffic signs and signal lamps
CN111640101B (en) * 2020-05-29 2022-04-29 苏州大学 Ghost convolution characteristic fusion neural network-based real-time traffic flow detection system and method
CN112149747A (en) * 2020-09-27 2020-12-29 浙江物产信息技术有限公司 Hyperspectral image classification method based on improved Ghost3D module and covariance pooling
CN112541409B (en) * 2020-11-30 2021-09-14 北京建筑大学 Attention-integrated residual network expression recognition method
CN112528879B (en) * 2020-12-15 2024-02-02 杭州电子科技大学 Multi-branch pedestrian re-identification method based on improved GhostNet
CN112364944B (en) * 2020-12-18 2022-07-05 福州大学 Deep learning-based household garbage classification method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046967A (en) * 2019-12-18 2020-04-21 江苏科技大学 Underwater image classification method based on convolutional neural network and attention mechanism
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device

Also Published As

Publication number Publication date
CN113052254A (en) 2021-06-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant