CN113011444B - Image identification method based on neural network frequency domain attention mechanism - Google Patents


Info

Publication number: CN113011444B
Authority: CN (China)
Prior art keywords: attention, frequency domain, image, network, spectrum
Legal status: Active (granted)
Application number: CN202011504311.3A
Other languages: Chinese (zh)
Other versions: CN113011444A
Inventors: 李玺 (Li Xi), 秦泽群 (Qin Zequn), 张芃怡 (Zhang Pengyi)
Assignee (current and original): Zhejiang University ZJU
Application filed by Zhejiang University ZJU; priority to CN202011504311.3A; published as CN113011444A, granted as CN113011444B


Classifications

    • G06V 10/431 — Frequency domain transformation; autocorrelation (global feature extraction for image or video recognition)
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 — Combinations of networks (neural network architectures)


Abstract

The invention discloses a neural-network-based frequency domain attention mechanism design method for image recognition. The method comprises the following steps: acquiring an image recognition data set for training the neural network and defining the algorithm target; establishing a single frequency domain transform basis function selection model; establishing a combined frequency domain transform basis function selection model; establishing a frequency domain attention mechanism based on the neural network; training a prediction model based on the modeling results; and performing image recognition with the prediction model. By incorporating information from different frequency bands into the attention mechanism, the invention achieves a substantial improvement in accuracy on a variety of image recognition tasks (image classification, object detection and instance segmentation) at the same computational cost and complexity, and has good application value.

Description

Image identification method based on neural network frequency domain attention mechanism
Technical Field
The invention belongs to the field of image processing, and particularly relates to an image identification method based on a neural network frequency domain attention mechanism.
Background
In recent years, neural network attention mechanisms have gradually attracted attention owing to their low computational cost and remarkable effect, and are widely applied in computer vision and many other fields. Such a mechanism involves two key steps: first, how to efficiently extract information from the neural network as the input of the attention mechanism; second, how to design the attention computation so that reasonable attention weights are obtained from that input and the learning of the neural network is improved. For the first point, existing methods all use a global average pooling operation to extract information efficiently for the attention computation. For the second point, existing methods generally use a fully-connected network as the attention computation; since a fully-connected network has computational complexity quadratic in its input size, it also constrains the complexity of the first step, which is why a global average pooling operation must be used to extract information. Although global average pooling is simple and efficient to compute, it is equivalent to extracting only the lowest-frequency portion of the information, while the information of all other frequencies is discarded entirely.
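To make the last point concrete: with the discrete cosine transform used later in this document, the basis function at frequency (h, w) = (0, 0) is constant, so global average pooling equals the lowest-frequency DCT component up to a factor of H·W. A minimal numerical check (PyTorch is assumed here purely for illustration; the snippet is not part of the original patent text):

    import torch

    x = torch.randn(1, 3, 7, 7)        # a toy feature map with H = W = 7
    gap = x.mean(dim=(2, 3))           # global average pooling over H and W
    f00 = x.sum(dim=(2, 3))            # DCT component at (h, w) = (0, 0): an all-ones basis
    assert torch.allclose(gap * 7 * 7, f00, atol=1e-4)   # GAP = f_{0,0} / (H * W)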
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an image identification method based on a neural network frequency domain attention mechanism. It adopts a neural network frequency domain attention design that combines information from multiple frequency bands: it has the same computational complexity as the global average pooling operation, yet extracts more spectrum information, so that the input of the attention mechanism carries richer information. This improves the accuracy of the whole network while keeping the computational cost unchanged.
In order to achieve the purpose, the technical scheme of the invention is as follows:
an image identification method based on a neural network frequency domain attention mechanism comprises the following steps:
s1, acquiring an image recognition data set for training a neural network;
s2, establishing an attention basic network by taking ResNet as a backbone;
s3, establishing a single frequency domain transformation basis function selection model based on the attention basic network in the S2;
s4, establishing a combined frequency domain transformation basis function selection model on the basis of S2 and S3;
s5, establishing a frequency domain attention mechanism based on the neural network on the basis of S4 to form a final model;
s6, training the final model in S5 based on the image recognition data set in S1 to obtain an image prediction model;
and S7, inputting the image to be recognized into the image prediction model for image recognition.
Preferably, in step S1, the image recognition data set comprises an image group {I_i}, i = 1, …, K, where I_i is the i-th image and K is the number of images in the group;
the algorithm target is defined as: obtaining a classification result for each image.
Further, in step S2, the process of establishing the attention base network is as follows:
S21, constructing ResNet as the basic backbone network;
S22, adding an attention mechanism on the basis of ResNet to construct the attention base network. Let X ∈ R^(C×H×W) be the output feature of a single layer in the ResNet network, where C, H and W are respectively the number of channels, the height of the feature map and the width of the feature map. The attention mechanism transforms the output X of that layer as follows:

att = sigmoid(fc(f_i))

where att ∈ R^C is the attention vector obtained after the transform, sigmoid(·) is the sigmoid activation function, fc(·) is a two-layer fully-connected network, and f_i ∈ R^C is the spectrum of the input data X;

the transformed output feature X̃ of the ResNet layer is:

X̃_{i,:,:} = att_i · X_{i,:,:}

where X̃_{i,:,:} is the i-th channel of the transformed feature, att_i is the i-th value of the attention vector, and X_{i,:,:} is the i-th channel of the input data X; an attention mechanism is added to each layer of the ResNet network to transform the output feature of that layer, and the attention-processed feature X̃ is then fed into the next layer of ResNet, yielding the attention base network.
Further, in step S3, the process of establishing the single frequency domain transform basis function selection model is as follows:
S31, dividing the output feature X ∈ R^(C×H×W) of each layer into C two-dimensional feature maps x^{2d} ∈ R^(H×W), and performing a discrete cosine transform on each two-dimensional feature map x^{2d}, the transform being:

f^{2d}_{h,w} = Σ_{i=0}^{H-1} Σ_{j=0}^{W-1} x^{2d}_{i,j} · cos(πh(i + 1/2)/H) · cos(πw(j + 1/2)/W)

s.t. h ∈ {0, 1, …, H-1}, w ∈ {0, 1, …, W-1}

so that a two-dimensional feature map x^{2d} of size H × W yields H × W transformed spectrum components; f^{2d} ∈ R^(H×W) is the resulting discrete cosine transform spectrum, and f^{2d}_{h,w} is the value of the spectrum f^{2d} at position [h, w];

S32, for the C spectra f^{2d} obtained from the C two-dimensional feature maps x^{2d}, selecting one spectrum component of f^{2d} at a time (the same position [h, w] in each of the C spectra), so that for X ∈ R^(C×H×W) each selection yields one f_i ∈ R^C; substituting this f_i into the attention base network established in S2, and training and testing the performance of that spectrum component as the sole input; the performance ranking of all spectrum components is finally obtained from the test results of the different frequency components.
Further, in step S4, the process of establishing the combined frequency domain transform basis function selection model is as follows:
S41, according to the performance ranking of single spectra as input obtained in step S32, taking in turn the 1, 2, 4, 8, 16 and 32 highest-performing spectrum components to form 6 combinations with different numbers of frequency components;
S42, for any combination, dividing the input X ∈ R^(C×H×W) along the channel dimension, i.e. the C dimension, according to the number of frequency components; assuming the number of frequency bands in a combination is nf, nf should divide C evenly; with [X^0, X^1, …, X^{nf-1}] denoting the divided parts, the input is divided as follows:

X^j = X_{(j·C/nf) : ((j+1)·C/nf), :, :}, s.t. j ∈ {0, 1, …, nf-1}

where X^j comprises channels j·C/nf to (j+1)·C/nf − 1 of X; after the division, the spectrum of each part is decomposed in turn with the corresponding frequency band of the frequency component combination according to the method of S32, giving [f^0, f^1, …, f^{nf-1}], where each f^j ∈ R^(C/nf), s.t. j ∈ {0, 1, …, nf-1}; the spectra of all parts are then spliced:

f_i = cat([f^0, f^1, …, f^{nf-1}])

where cat(·) is the splicing (concatenation) function, yielding f_i ∈ R^C;
S43, substituting the f_i ∈ R^C obtained from each of the 6 combinations of 1, 2, 4, 8, 16 and 32 spectrum components into the attention base network established in S2, and training and testing the model to obtain the performance of each combination;
S44, selecting the highest-performing combination as the spectrum input f′_i of the final model.
Further, in step S5, the process of establishing the frequency domain attention mechanism based on the neural network is as follows:
S51, for the input spectrum f′_i of the final model obtained in S44, establishing the following attention mechanism and obtaining the attention vector:

att′ = sigmoid(fc(f′_i))

S53, for each channel of the input image or of the feature X of the base network in S2, performing attention scaling according to the attention vector att′ to obtain the final output X̃:

X̃_{i,:,:} = att′_i · X_{i,:,:}

where X̃_{i,:,:} is the i-th channel of the transformed feature, att′_i is the i-th value of the attention vector att′, and X_{i,:,:} is the i-th channel of the input image or feature; the frequency domain attention mechanism of the neural network is thus established, forming the final model.
Further, the specific process of step S6 is as follows: based on the image recognition data set in S1, using the single-spectrum performance ranking obtained from S2 and S3, the 1, 2, 4, 8, 16 and 32 highest-performing frequencies are taken respectively to obtain 6 spectrum combinations; these are substituted into S4 to obtain the performance ranking of the combinations and the highest-performing spectrum combination; the highest-performing spectrum combination is then substituted into S5 as the input spectrum of the final model, and final model training is performed on the image recognition data set in S1 to obtain the image recognition prediction model.
Further, step S7 is specifically as follows: after the prediction model of step S6 is obtained, the image to be recognized is input into the prediction model for prediction, obtaining the image classification prediction result.
Compared with existing attention mechanism methods, the image identification method based on the neural network frequency domain attention mechanism has the following beneficial effects:
First, the method defines an attention mechanism grounded in frequency domain analysis. It generalizes the original attention mechanism to the frequency domain, and thanks to the completeness of the frequency domain representation, the information attended to by the mechanism is more complete.
Secondly, compared with the original mean-value (global average pooling) approach, the frequency domain analysis introduced by this method has the same number of parameters and the same computational cost, and can seamlessly extend any existing attention mechanism network.
Finally, by incorporating information from different frequency bands into the attention mechanism, the invention achieves a substantial improvement in accuracy on a variety of image recognition tasks (image classification, object detection and instance segmentation) at the same computational cost and complexity, and has good application value.
Drawings
FIG. 1 is a flowchart of an image recognition method based on a neural network frequency domain attention mechanism.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description. The technical characteristics of the embodiments of the invention can be correspondingly combined without mutual conflict.
In a preferred embodiment of the present invention, as shown in fig. 1, there is provided an image recognition method based on a neural network frequency domain attention mechanism, which includes the following steps:
and S1, acquiring an image recognition data set for training the neural network.
In step S1 of the present embodiment, the image recognition data set comprises an image group {I_i}, i = 1, …, K, where I_i is the i-th image and K is the number of images in the group;
the algorithm target is defined as: obtaining a classification result for each image.
And S2, establishing an attention base network by using ResNet as a backbone.
In step S2 of this embodiment, the specific process is as follows:
S21, constructing ResNet as the basic backbone network;
S22, adding an attention mechanism on the basis of ResNet to construct the attention base network. Let X ∈ R^(C×H×W) be the output feature of a single layer in the ResNet network, where C, H and W are respectively the number of channels, the height of the feature map and the width of the feature map. The attention mechanism transforms the output X of that layer as follows:

att = sigmoid(fc(f_i))

where att ∈ R^C is the attention vector obtained after the transform, sigmoid(·) is the sigmoid activation function, fc(·) is a two-layer fully-connected network, and f_i ∈ R^C is the spectrum of the input data X. The spectrum may be obtained either with the single frequency domain transform basis function selection model of S3 or with the combined frequency domain transform basis function selection model of S4.

The transformed output feature X̃ of the ResNet layer is:

X̃_{i,:,:} = att_i · X_{i,:,:}

where X̃_{i,:,:} is the i-th channel of the transformed feature, att_i is the i-th value of the attention vector, and X_{i,:,:} is the i-th channel of the input data X; an attention mechanism is added to each layer of the ResNet network to transform the output feature of that layer, and the attention-processed feature X̃ is then fed into the next layer of ResNet, yielding the attention base network.
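Purely as an illustrative sketch (PyTorch assumed; the class name ChannelAttention, the reduction ratio of 16 inside fc(·), and the spectrum_fn argument are choices made here, not specified by the patent), the attention transform of S22 might be written as:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Sketch of att = sigmoid(fc(f_i)) and X~_{i,:,:} = att_i * X_{i,:,:}
        def __init__(self, channels, spectrum_fn, reduction=16):
            super().__init__()
            self.spectrum_fn = spectrum_fn      # any callable mapping X (N, C, H, W) -> f_i (N, C)
            self.fc = nn.Sequential(            # the two-layer fully-connected network fc(.)
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x):
            f_i = self.spectrum_fn(x)           # spectrum of the input data X
            att = torch.sigmoid(self.fc(f_i))   # attention vector, att in R^C
            return x * att[:, :, None, None]    # scale the i-th channel of X by att_i

In the patent, spectrum_fn would correspond to either the single-component selection of S3 or the combined selection of S4, as noted above.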
S3, establishing a single frequency domain transformation basis function selection model based on the attention base network in S2.
In step S3 of this embodiment, the specific process is as follows:
S31, dividing the output feature X ∈ R^(C×H×W) of each layer into C two-dimensional feature maps x^{2d} ∈ R^(H×W), and performing a discrete cosine transform on each two-dimensional feature map x^{2d}, the transform being:

f^{2d}_{h,w} = Σ_{i=0}^{H-1} Σ_{j=0}^{W-1} x^{2d}_{i,j} · cos(πh(i + 1/2)/H) · cos(πw(j + 1/2)/W)

s.t. h ∈ {0, 1, …, H-1}, w ∈ {0, 1, …, W-1}

so that a two-dimensional feature map x^{2d} of size H × W yields H × W transformed spectrum components; f^{2d} ∈ R^(H×W) is the resulting discrete cosine transform spectrum, and f^{2d}_{h,w} is the value of the spectrum f^{2d} at position [h, w];

S32, for the C spectra f^{2d} obtained from the C two-dimensional feature maps x^{2d}, selecting one spectrum component of f^{2d} at a time (e.g. the first time selecting only f^{2d}_{h_1,w_1} from each of the C spectra, the second time only f^{2d}_{h_2,w_2}, and so on), so that for X ∈ R^(C×H×W) each selection yields one f_i ∈ R^C; substituting this f_i into the attention base network established in S2, and training and testing the performance of that spectrum component as the sole input; the performance ranking of all spectrum components is finally obtained from the test results of the different frequency components.
And S4, establishing a combined frequency domain transformation basis function selection model on the basis of S2 and S3.
In step S4 of this embodiment, the specific process is as follows:
S41, according to the performance ranking of single spectra as input obtained in step S32, taking in turn the 1, 2, 4, 8, 16 and 32 highest-performing spectrum components to form 6 combinations with different numbers of frequency components;
S42, for any combination, dividing the input X ∈ R^(C×H×W) along the channel dimension, i.e. the C dimension, according to the number of frequency components; assuming the number of frequency bands in a combination is nf, nf should divide C evenly; with [X^0, X^1, …, X^{nf-1}] denoting the divided parts, the input is divided as follows:

X^j = X_{(j·C/nf) : ((j+1)·C/nf), :, :}, s.t. j ∈ {0, 1, …, nf-1}

where X^j comprises channels j·C/nf to (j+1)·C/nf − 1 of X; after the division, the spectrum of each part is decomposed in turn with the corresponding frequency band of the frequency component combination according to the method of S32, giving [f^0, f^1, …, f^{nf-1}], where each f^j ∈ R^(C/nf), s.t. j ∈ {0, 1, …, nf-1}; the spectra of all parts are then spliced:

f_i = cat([f^0, f^1, …, f^{nf-1}])

where cat(·) is the splicing (concatenation) function, yielding f_i ∈ R^C;
S43, substituting the f_i ∈ R^C obtained from each of the 6 combinations of 1, 2, 4, 8, 16 and 32 spectrum components into the attention base network established in S2, and training and testing the model to obtain the performance of each combination;
S44, selecting the highest-performing combination as the spectrum input f′_i of the final model.
And S5, establishing a frequency domain attention mechanism based on the neural network on the basis of S4, forming the final model. In step S5 of the present embodiment, the process of establishing the frequency domain attention mechanism based on the neural network is as follows:
S51, for the input spectrum f′_i of the final model obtained in S44, establishing the following attention mechanism and obtaining the attention vector:

att′ = sigmoid(fc(f′_i))

S53, for each channel of the input image or of the feature X of the base network in S2, performing attention scaling according to the attention vector att′ to obtain the final output X̃:

X̃_{i,:,:} = att′_i · X_{i,:,:}

where X̃_{i,:,:} is the i-th channel of the transformed feature, att′_i is the i-th value of the attention vector att′, and X_{i,:,:} is the i-th channel of the input image or feature; the frequency domain attention mechanism of the neural network is thus established, forming the final model.
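Putting the sketches together, a hypothetical use of the final frequency domain attention on one stage output might read as follows; the four frequency indices are placeholders, not the combination actually selected by S41-S44:

    freq_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]   # placeholder combination, nf = 4
    attn = ChannelAttention(64, lambda t: multi_spectral_spectrum(t, freq_pairs))
    x = torch.randn(8, 64, 56, 56)                  # a ResNet stage output X in R^(C x H x W)
    y = attn(x)                                     # X~ = att' · X, fed to the next layer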
And S6, training the final model in S5 based on the image recognition data set in S1 to obtain an image prediction model.
In step S6 of the present embodiment, the process of training the prediction model based on the modeling results of S3, S4 and S5 is as follows: based on the image recognition data set in S1, using the single-spectrum performance ranking obtained from S2 and S3, the 1, 2, 4, 8, 16 and 32 highest-performing frequencies are taken respectively to obtain 6 spectrum combinations; these are substituted into S4 to obtain the performance ranking of the combinations and the highest-performing spectrum combination; the highest-performing spectrum combination is then substituted into S5 as the input spectrum of the final model, and final model training is performed on the image recognition data set in S1 to obtain the image recognition prediction model.
And S7, inputting the image to be recognized into the image prediction model for image recognition.
In step S7 of this embodiment, the specific process is as follows: after the prediction model of step S6 is obtained, the image to be recognized is input into the prediction model for prediction, obtaining the image classification prediction result.
The methods of S1-S7 are applied to specific data sets to demonstrate the technical effects that can be achieved.
Examples
The implementation method of this embodiment is as described above; the specific steps are not elaborated again, and only the results on the case data are shown. The invention is implemented on two data sets with ground-truth image labels, namely:
ImageNet data set [1]: the data set contains natural images of 1000 classes, with 1,281,167 training images and 50,000 validation images; each image is labeled with one category.
MS COCO data set [2]: the data set covers object detection and instance segmentation tasks, comprising 80 countable object ("thing") classes and 91 background material ("stuff") classes. The data set has over 330,000 images and 1.5 million object instances.
In this embodiment, classification accuracy is mainly compared on the ImageNet data set, in terms of Top-1 accuracy and Top-5 accuracy. In addition, this embodiment compares the number of parameters (Parameters) and the computational cost (FLOPs).
Table 1. Comparison of evaluation indexes on the ImageNet data set in this example
(The table is rendered as an image in the original publication; its values are not reproduced here.)
On the MS COCO data set, this embodiment uses the network proposed in this patent as the backbone network, and uses Faster R-CNN and Mask R-CNN to perform the object detection task and the instance segmentation task, respectively; the comparison indexes include the average precision AP, the average precision AP50 at an IoU threshold of 0.5, and the average precision AP75 at an IoU threshold of 0.75.
Table 2. Comparison of object detection indexes on the MS COCO data set in this embodiment
(The table is rendered as an image in the original publication; its values are not reproduced here.)
Table 3. Comparison of instance segmentation indexes on the MS COCO data set in this example

Method                   AP    AP50  AP75
ResNet-50                34.1  55.5  36.2
SENet                    35.4  57.4  37.8
GCNet                    35.7  58.4  37.6
ECANet                   35.6  58.1  37.7
Method of the invention  36.2  58.6  38.1
The prior art cited above for comparison with the present invention can be found in the following references:
[1] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. IEEE Conference on Computer Vision & Pattern Recognition. IEEE, 2009.
[2] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common Objects in Context. European Conference on Computer Vision. Springer International Publishing, 2014.
[3] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision & Pattern Recognition. IEEE Computer Society, 2016.
[4] Hu J, Shen L, Albanie S, et al. Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, PP(99).
[5] Wang Q, Wu B, Zhu P, et al. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020.
[6] Woo S, Park J, Lee J Y, So Kweon I. CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[7] Gao Z, Xie J, Wang Q, Li P. Global Second-order Pooling Convolutional Networks. 2019 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2019.
[8] Cao Y, Xu J, Lin S, Wei F, Hu H. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. 2019 IEEE International Conference on Computer Vision Workshops. IEEE, 2019.
[9] Bello I, Zoph B, Le Q, et al. Attention Augmented Convolutional Networks. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2020.
[10] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[11] He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. 2017 IEEE International Conference on Computer Vision. IEEE, 2017.
the above-described embodiments are merely preferred embodiments of the present invention, and are not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical scheme obtained by adopting the mode of equivalent replacement or equivalent transformation is within the protection scope of the invention.

Claims (3)

1. An image identification method based on a neural network frequency domain attention mechanism is characterized by comprising the following steps:
s1, acquiring an image recognition data set for training a neural network;
s2, establishing an attention basic network by taking ResNet as a backbone;
s3, establishing a single frequency domain transformation basis function selection model based on the attention basic network in the S2;
s4, establishing a combined frequency domain transformation basis function selection model on the basis of S2 and S3;
s5, establishing a frequency domain attention mechanism based on the neural network on the basis of S4 to form a final model;
s6, training the final model in S5 based on the image recognition data set in S1 to obtain an image prediction model;
s7, inputting the image to be recognized into the image prediction model for image recognition;
in step S1, the image recognition data set comprises an image group {I_i}, i = 1, …, K, where I_i is the i-th image and K is the number of images in the group;
the algorithm target is defined as: obtaining a classification result for each image;
in step S2, the process of establishing the attention base network is as follows:
S21, constructing ResNet as the basic backbone network;
S22, adding an attention mechanism on the basis of ResNet to construct the attention base network. Let X ∈ R^(C×H×W) be the output feature of a single layer in the ResNet network, where C, H and W are respectively the number of channels, the height of the feature map and the width of the feature map. The attention mechanism transforms the output X of that layer as follows:

att = sigmoid(fc(f_i))

where att ∈ R^C is the attention vector obtained after the transform, sigmoid(·) is the sigmoid activation function, fc(·) is a two-layer fully-connected network, and f_i ∈ R^C is the spectrum of the input data X;

the transformed output feature X̃ of the ResNet layer is:

X̃_{i,:,:} = att_i · X_{i,:,:}

where X̃_{i,:,:} is the i-th channel of the transformed feature, att_i is the i-th value of the attention vector, and X_{i,:,:} is the i-th channel of the input data X; an attention mechanism is added to each layer of the ResNet network to transform the output feature of that layer, and the attention-processed feature X̃ is then fed into the next layer of ResNet, yielding the attention base network;
in step S3, the process of establishing the single frequency domain transform basis function selection model is as follows:
S31, dividing the output feature X ∈ R^(C×H×W) of each layer into C two-dimensional feature maps x^{2d} ∈ R^(H×W), and performing a discrete cosine transform on each two-dimensional feature map x^{2d}, the transform being:

f^{2d}_{h,w} = Σ_{i=0}^{H-1} Σ_{j=0}^{W-1} x^{2d}_{i,j} · cos(πh(i + 1/2)/H) · cos(πw(j + 1/2)/W), s.t. h ∈ {0, 1, …, H-1}, w ∈ {0, 1, …, W-1}

so that a two-dimensional feature map x^{2d} of size H × W yields H × W transformed spectrum components; f^{2d} ∈ R^(H×W) is the resulting discrete cosine transform spectrum, and f^{2d}_{h,w} is the value of the spectrum f^{2d} at position [h, w];

S32, for the C spectra f^{2d} obtained from the C two-dimensional feature maps x^{2d}, selecting one spectrum component of f^{2d} at a time (the same position [h, w] in each of the C spectra), so that for X ∈ R^(C×H×W) each selection yields one f_i ∈ R^C; substituting this f_i into the attention base network established in S2, and training and testing the performance of that spectrum component as the sole input; the performance ranking of all spectrum components is finally obtained from the test results of the different frequency components;
in step S4, the process of establishing the combined frequency domain transform basis function selection model is as follows:
S41, according to the performance ranking of single spectra as input obtained in step S32, taking in turn the 1, 2, 4, 8, 16 and 32 highest-performing spectrum components to form 6 combinations with different numbers of frequency components;
S42, for any combination, dividing the input X ∈ R^(C×H×W) along the channel dimension, i.e. the C dimension, according to the number of frequency components; assuming the number of frequency bands in a combination is nf, nf should divide C evenly; with [X^0, X^1, …, X^{nf-1}] denoting the divided parts, the input is divided as follows:

X^j = X_{(j·C/nf) : ((j+1)·C/nf), :, :}, s.t. j ∈ {0, 1, …, nf-1}

where X^j comprises channels j·C/nf to (j+1)·C/nf − 1 of X; after the division, the spectrum of each part is decomposed in turn with the corresponding frequency band of the frequency component combination according to the method of S32, giving [f^0, f^1, …, f^{nf-1}], where each f^j ∈ R^(C/nf), s.t. j ∈ {0, 1, …, nf-1}; the spectra of all parts are then spliced:

f_i = cat([f^0, f^1, …, f^{nf-1}])

where cat(·) is the splicing (concatenation) function, yielding f_i ∈ R^C;
S43, substituting the f_i ∈ R^C obtained from each of the 6 combinations of 1, 2, 4, 8, 16 and 32 spectrum components into the attention base network established in S2, and training and testing the model to obtain the performance of each combination;
S44, selecting the highest-performing combination as the spectrum input f′_i of the final model;
In step S5, the process of establishing the frequency domain attention mechanism based on the neural network is as follows:
s51, input Spectrum f 'for the Final model obtained in S44'iThe following attention mechanism is established and the attention vector is obtained:
att′=sigmoid(fc(f′i))
s53, for each channel of the input image or the characteristic X of the basic network in S2, performing attention scale transformation according to the attention vector att' to obtain final output
Figure FDA0003555747640000031
Figure FDA0003555747640000032
Wherein
Figure FDA0003555747640000033
Att 'as the ith channel of the transformed feature'iIs the ith value, X, of attention vector atti,:,:And inputting the ith channel of the image or the feature, and establishing a frequency domain attention mechanism of the neural network according to the ith channel to form a final model.
2. The image recognition method based on the neural network frequency domain attention mechanism as claimed in claim 1, wherein step S6 is as follows: based on the image recognition data set in S1, using the single-spectrum performance ranking obtained from S2 and S3, the 1, 2, 4, 8, 16 and 32 highest-performing frequencies are taken respectively to obtain 6 spectrum combinations; these are substituted into S4 to obtain the performance ranking of the combinations and the highest-performing spectrum combination; the highest-performing spectrum combination is then substituted into S5 as the input spectrum of the final model, and final model training is performed on the image recognition data set in S1 to obtain the image recognition prediction model.
3. The image recognition method based on the neural network frequency domain attention mechanism as claimed in claim 2, wherein step S7 is as follows: after the prediction model of step S6 is obtained, the image to be recognized is input into the prediction model for prediction, obtaining the image classification prediction result.
CN202011504311.3A 2020-12-18 2020-12-18 Image identification method based on neural network frequency domain attention mechanism Active CN113011444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011504311.3A CN113011444B (en) 2020-12-18 2020-12-18 Image identification method based on neural network frequency domain attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011504311.3A CN113011444B (en) 2020-12-18 2020-12-18 Image identification method based on neural network frequency domain attention mechanism

Publications (2)

Publication Number Publication Date
CN113011444A CN113011444A (en) 2021-06-22
CN113011444B true CN113011444B (en) 2022-05-13

Family

ID=76383532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011504311.3A Active CN113011444B (en) 2020-12-18 2020-12-18 Image identification method based on neural network frequency domain attention mechanism

Country Status (1)

Country Link
CN (1) CN113011444B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706570B (en) * 2021-08-02 2023-09-15 中山大学 Segmentation method and device for zebra fish fluorescence image
CN113643261B (en) * 2021-08-13 2023-04-18 江南大学 Lung disease diagnosis method based on frequency attention network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107024987A (en) * 2017-03-20 2017-08-08 南京邮电大学 A kind of real-time human brain Test of attention and training system based on EEG
DE102018202440A1 (en) * 2018-02-19 2019-08-22 Aktiebolaget Skf measuring system
CN111382795A (en) * 2020-03-09 2020-07-07 交叉信息核心技术研究院(西安)有限公司 Image classification processing method of neural network based on frequency domain wavelet base processing
CN111539449A (en) * 2020-03-23 2020-08-14 广东省智能制造研究所 Sound source separation and positioning method based on second-order fusion attention network model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245665B (en) * 2019-05-13 2023-06-06 天津大学 Image semantic segmentation method based on attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107024987A (en) * 2017-03-20 2017-08-08 南京邮电大学 A kind of real-time human brain Test of attention and training system based on EEG
DE102018202440A1 (en) * 2018-02-19 2019-08-22 Aktiebolaget Skf measuring system
CN111382795A (en) * 2020-03-09 2020-07-07 交叉信息核心技术研究院(西安)有限公司 Image classification processing method of neural network based on frequency domain wavelet base processing
CN111539449A (en) * 2020-03-23 2020-08-14 广东省智能制造研究所 Sound source separation and positioning method based on second-order fusion attention network model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Dual Attention Network for Scene Segmentation; Jun Fu, et al.; IEEE Conf. Comput. Vis. Pattern Recog.; 2019-12-31; full text *
TF²AN: A Temporal-Frequency Fusion Attention Network for Spectrum Energy Level Prediction; Li K, et al.; 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking; 2019-06-30; full text *
Dimensional emotion recognition method based on hierarchical attention mechanism; Tang Yuhao, et al.; Computer Engineering (《计算机工程》); 2019-05-30; full text *
Ultrasound thyroid segmentation combining piecewise frequency domain and local attention; Hu Yishan, et al.; Journal of Image and Graphics (《中国图象图形学报》); 2020-10-16 (No. 10); full text *

Also Published As

Publication number Publication date
CN113011444A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
Hayder et al. Boundary-aware instance segmentation
Cao et al. Landmark recognition with sparse representation classification and extreme learning machine
CN110738207A (en) character detection method for fusing character area edge information in character image
CN111738143B (en) Pedestrian re-identification method based on expectation maximization
CN105956560A (en) Vehicle model identification method based on pooling multi-scale depth convolution characteristics
CN112966137B (en) Image retrieval method and system based on global and local feature rearrangement
CN102750385B (en) Correlation-quality sequencing image retrieval method based on tag retrieval
CN109740679B (en) Target identification method based on convolutional neural network and naive Bayes
CN104778476B (en) A kind of image classification method
CN104778457A (en) Video face identification algorithm on basis of multi-instance learning
CN113011444B (en) Image identification method based on neural network frequency domain attention mechanism
CN111126396A (en) Image recognition method and device, computer equipment and storage medium
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
Hayder et al. Shape-aware instance segmentation
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN111461039A (en) Landmark identification method based on multi-scale feature fusion
CN113554654A (en) Point cloud feature extraction model based on graph neural network and classification and segmentation method
CN112017162B (en) Pathological image processing method, pathological image processing device, storage medium and processor
CN114332544A (en) Image block scoring-based fine-grained image classification method and device
CN113269224A (en) Scene image classification method, system and storage medium
CN112990282A (en) Method and device for classifying fine-grained small sample images
Sun et al. Deep learning based pedestrian detection
CN113822134A (en) Instance tracking method, device, equipment and storage medium based on video
CN116796248A (en) Forest health environment assessment system and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant