CN113011444B - Image identification method based on neural network frequency domain attention mechanism - Google Patents
- Publication number
- CN113011444B (application CN202011504311.3A)
- Authority
- CN
- China
- Prior art keywords
- attention
- frequency domain
- image
- network
- spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/431—Frequency domain transformation; Autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a neural-network-based frequency domain attention mechanism design method for image recognition. The method comprises the following steps: acquiring an image recognition data set for training a neural network and defining the algorithm target; establishing a single frequency domain transform basis function selection model; establishing a combined frequency domain transform basis function selection model; establishing a neural network frequency domain attention mechanism; training a prediction model based on the modeling results; and performing image recognition with the prediction model. By bringing information from different frequency bands into the attention mechanism, the invention achieves a large improvement in accuracy on a variety of image recognition tasks (image classification, object detection and instance segmentation) at the same computational cost and complexity, and has good application value.
Description
Technical Field
The invention belongs to the field of image processing, and particularly relates to an image identification method based on a neural network frequency domain attention mechanism.
Background
In recent years, neural network attention mechanisms have attracted growing interest because they are computationally simple and markedly effective, and they are widely applied in computer vision and many other fields. Such a mechanism involves two key steps: first, how to efficiently extract information from the neural network as the input to the attention mechanism; second, how to design the attention computation so that reasonable attention is obtained from this input and the learning of the neural network is improved. For the first step, existing methods all use a global average pooling operation to extract information efficiently for the attention computation. For the second step, existing methods generally use a fully-connected network; since a fully-connected network has computational complexity quadratic in its input size, it also constrains the complexity of the first step, which is why global average pooling must be used to extract the information. Although global average pooling is computationally simple and efficient, it is equivalent to extracting only the lowest-frequency part of the information, while the information at all other frequencies is discarded entirely.
Disclosure of Invention
To address the problems in the prior art, the invention provides an image recognition method based on a neural network frequency domain attention mechanism. The design combines multi-band information, has the same computational complexity as the global average pooling operation, and extracts more spectral information, so that the input of the attention mechanism carries richer information. This improves the accuracy of the whole network while keeping the amount of computation unchanged.
In order to achieve the purpose, the technical scheme of the invention is as follows:
an image identification method based on a neural network frequency domain attention mechanism comprises the following steps:
s1, acquiring an image recognition data set for training a neural network;
s2, establishing an attention basic network by taking ResNet as a backbone;
s3, establishing a single frequency domain transformation basis function selection model based on the attention basic network in the S2;
s4, establishing a combined frequency domain transformation basis function selection model on the basis of S2 and S3;
s5, establishing a frequency domain attention mechanism based on the neural network on the basis of S4 to form a final model;
s6, training the final model in S5 based on the image recognition data set in S1 to obtain an image prediction model;
and S7, inputting the image to be recognized into the image prediction model for image recognition.
Preferably, in step S1, the data set for image recognition includes a group of images {I_i}, i = 1, …, K, where I_i is the ith image and K is the number of images in the image group;
the algorithm target is defined as: obtaining the classification result of each image.
Further, in step S2, the process of establishing the attention infrastructure network is as follows:
s21, constructing ResNet as a basic backbone network;
s22, adding an attention mechanism on the basis of ResNet to construct the attention base network. Let X ∈ R^{C×H×W} be the output feature of a single layer in the ResNet network, where C, H, W are the number of channels, the height of the feature map and the width of the feature map, respectively. The attention mechanism transforms the output X of that layer as follows:
att = sigmoid(fc(f_i))
where att ∈ R^C is the attention vector obtained after the transformation, sigmoid(·) is the sigmoid activation function, fc(·) is a two-layer fully-connected network, and f_i ∈ R^C is the frequency spectrum of the input data X;
X̃_{i,:,:} = att_i · X_{i,:,:}
where X̃_{i,:,:} is the ith channel of the transformed feature, att_i is the ith value of the attention vector, and X_{i,:,:} is the ith channel of the input data X. An attention mechanism is added to each layer of the ResNet network: the output feature of the current layer is transformed, and the transformed (attention-processed) feature X̃ is fed into the next layer of ResNet, giving the attention base network.
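As an illustration, the per-channel attention transform of S22 can be sketched as follows. This is a minimal NumPy sketch, not the patent's implementation: the hidden width of the two-layer fully-connected network fc(·) and the ReLU between its layers are assumptions, since the patent does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(X, f_i, W1, W2):
    """att = sigmoid(fc(f_i)); each channel X[i, :, :] is scaled by att[i].

    X      : (C, H, W) output feature of one ResNet layer
    f_i    : (C,) frequency spectrum extracted from X
    W1, W2 : weights of the two fully-connected layers (the ReLU in
             between is an assumption; the patent only says 'two-layer')
    """
    hidden = np.maximum(W1 @ f_i, 0.0)   # first FC layer + assumed ReLU
    att = sigmoid(W2 @ hidden)           # attention vector att in R^C
    return att[:, None, None] * X        # X~[i,:,:] = att_i * X[i,:,:]
```

Because sigmoid outputs lie in (0, 1), each channel of the result is the original channel scaled by a learned weight between 0 and 1.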
Further, in step S3, the process of establishing the single frequency domain transform basis function selection model is as follows:
s31, the output feature X ∈ R^{C×H×W} of each layer is divided into C two-dimensional feature maps x^{2d} ∈ R^{H×W}, and a discrete cosine transform is applied to each two-dimensional feature map x^{2d}:
f^{2d}_{h,w} = Σ_{i=0}^{H−1} Σ_{j=0}^{W−1} x^{2d}_{i,j} · cos(πh(i + 1/2)/H) · cos(πw(j + 1/2)/W)
s.t. h ∈ {0, 1, …, H−1}, w ∈ {0, 1, …, W−1}
For a two-dimensional feature map x^{2d} of size H × W, this yields H × W transformed spectrum components; f^{2d} ∈ R^{H×W} is the discrete cosine transform spectrum result, and f^{2d}_{h,w} is the value of the spectrum f^{2d} at position [h, w];
s32, for the C spectra f^{2d} obtained from the C two-dimensional feature maps x^{2d}, one spectrum component (the same position [h, w] in every f^{2d}) is selected at a time, so that for X ∈ R^{C×H×W} one vector f_i ∈ R^C is obtained each time. Each f_i is substituted into the attention base network established in S2, which is trained and tested to measure the performance of that spectrum component as the single input; from the test results of the different frequency components, a performance ranking of all spectrum components is finally obtained.
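A minimal sketch of the per-component discrete cosine transform of S31, assuming the standard unnormalized DCT-II cosine basis (the patent names only "discrete cosine transform", so the normalization convention is an assumption):

```python
import numpy as np

def dct2_component(x2d, h, w):
    """Spectrum component f2d[h, w] of a 2-D feature map x2d (H x W),
    using the basis cos(pi*h*(i+1/2)/H) * cos(pi*w*(j+1/2)/W)."""
    H, W = x2d.shape
    i = np.arange(H)[:, None]
    j = np.arange(W)[None, :]
    basis = np.cos(np.pi * h * (i + 0.5) / H) * np.cos(np.pi * w * (j + 0.5) / W)
    return float((x2d * basis).sum())

def dct2(x2d):
    """All H x W spectrum components f2d of one two-dimensional feature map."""
    H, W = x2d.shape
    return np.array([[dct2_component(x2d, h, w) for w in range(W)]
                     for h in range(H)])
```

Note that the [0, 0] component reduces to the plain sum of the feature map, i.e. global average pooling up to a constant factor; this is exactly the observation in the Background section that average pooling keeps only the lowest-frequency component.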
Further, in step S4, the process of establishing the combined frequency domain transform basis function selection model is as follows:
s41, according to the performance ranking of the single spectra used as input obtained in step S32, the 1, 2, 4, 8, 16 and 32 spectrum components with the highest performance are taken in turn, forming 6 combinations with different numbers of frequency components;
s42, for any combination, the input X ∈ R^{C×H×W} is divided along the channel dimension (the C dimension) according to the number of frequency components. Assuming the number of frequency components in the combination is nf, nf must divide C. Let [X^0, X^1, …, X^{nf−1}] denote the divided parts, where X^j ∈ R^{(C/nf)×H×W} contains channels j·C/nf to (j+1)·C/nf − 1 of X. After division, each part is decomposed in turn using the corresponding frequency band of the combination according to the method of S32, giving [f^0, f^1, …, f^{nf−1}], where each f^j ∈ R^{C/nf}, s.t. j ∈ {0, 1, …, nf−1}; the spectra of the parts are then concatenated:
f_i = cat([f^0, f^1, …, f^{nf−1}])
where cat(·) is the concatenation function, yielding f_i ∈ R^C;
S43, the f_i ∈ R^C obtained from each of the 6 combinations of 1, 2, 4, 8, 16 and 32 spectrum components is substituted into the attention base network established in S2, and the model is trained and tested to obtain the performance of each combination;
S44, the combination with the highest performance is selected as the spectrum input f′_i of the final model.
Further, in step S5, the process of establishing the frequency domain attention mechanism based on the neural network is as follows:
s51, for the input spectrum f′_i of the final model obtained in S44, the following attention mechanism is established and the attention vector is obtained:
att′=sigmoid(fc(f′i))
s53, for each channel of the input image or of the feature X of the base network in S2, an attention scaling is performed according to the attention vector att′ to obtain the final output:
X̃′_{i,:,:} = att′_i · X_{i,:,:}
where X̃′_{i,:,:} is the ith channel of the transformed feature, att′_i is the ith value of the attention vector att′, and X_{i,:,:} is the ith channel of the input image or feature. The frequency domain attention mechanism of the neural network is thereby established, forming the final model.
Further, the specific process of step S6 is as follows: based on the image recognition data set in S1, the single-spectrum performance ranking obtained through S2 and S3 is used to take the 1, 2, 4, 8, 16 and 32 frequencies with the highest performance, giving 6 spectrum combinations; these 6 combinations are then substituted into S4 to obtain the performance ranking of the combinations and the combination with the highest performance. The highest-performing spectrum combination is substituted into S5 as the input spectrum of the final model, and the final model is trained on the image recognition data set of S1 to obtain the image recognition prediction model.
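The combination search in step S6 can be sketched as a simple loop. The `evaluate` callable is hypothetical: it stands in for training and testing the attention base network on the data set with the given spectrum combination as input.

```python
def select_best_combination(single_scores, evaluate):
    """single_scores: dict mapping a frequency index (h, w) to its measured
    single-input performance (from S3). The top-1/2/4/8/16/32 frequencies form
    6 candidate combinations; each is evaluated and the best one is returned."""
    ranked = sorted(single_scores, key=single_scores.get, reverse=True)
    best_combo, best_score = None, float("-inf")
    for k in (1, 2, 4, 8, 16, 32):
        combo = ranked[:k]            # k highest-performing frequencies
        score = evaluate(combo)       # train + test with this spectrum input
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score
```

Only 6 full training runs are needed on top of the per-frequency screening, which keeps the search tractable compared with evaluating all subsets of components.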
Further, step S7 is specifically as follows: after the prediction model of step S6 is obtained, the image to be recognized is input into the prediction model for prediction, yielding the image classification prediction result.
Compared with existing attention mechanism methods, the image recognition method based on the neural network frequency domain attention mechanism has the following beneficial effects:
Firstly, the method defines an attention mechanism based on frequency domain analysis. The original attention mechanism is generalized to the frequency domain, and owing to the completeness property of the frequency domain, the information attended to by the mechanism is more complete.
Secondly, compared with the original mean-value (global average pooling) method, the extended frequency domain analysis has the same number of parameters and the same amount of computation, and can seamlessly extend any existing attention mechanism network.
Finally, by bringing the information of different frequency bands into the attention mechanism, the invention achieves a large improvement in accuracy on a variety of image recognition tasks (image classification, object detection and instance segmentation) at the same computational cost and complexity, and has good application value.
Drawings
FIG. 1 is a flowchart of an image recognition method based on a neural network frequency domain attention mechanism.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description. The technical characteristics of the embodiments of the invention can be correspondingly combined without mutual conflict.
In a preferred embodiment of the present invention, as shown in fig. 1, there is provided an image recognition method based on a neural network frequency domain attention mechanism, which includes the following steps:
and S1, acquiring an image recognition data set for training the neural network.
In step S1 of the present embodiment, the data set for image recognition includes a group of images {I_i}, i = 1, …, K, where I_i is the ith image and K is the number of images in the image group;
the algorithm target is defined as: obtaining the classification result of each image.
And S2, establishing an attention base network by using ResNet as a backbone.
In step S2 of this embodiment, the specific process is as follows:
s21, constructing ResNet as a basic backbone network;
s22, adding an attention mechanism on the basis of ResNet to construct the attention base network. Let X ∈ R^{C×H×W} be the output feature of a single layer in the ResNet network, where C, H, W are the number of channels, the height of the feature map and the width of the feature map, respectively. The attention mechanism transforms the output X of that layer as follows:
att = sigmoid(fc(f_i))
where att ∈ R^C is the attention vector obtained after the transformation, sigmoid(·) is the sigmoid activation function, fc(·) is a two-layer fully-connected network, and f_i ∈ R^C is the frequency spectrum of the input data X. The method of obtaining the frequency spectrum may be the single frequency domain transform basis function selection model of S3, or the combined frequency domain transform basis function selection model of S4.
X̃_{i,:,:} = att_i · X_{i,:,:}
where X̃_{i,:,:} is the ith channel of the transformed feature, att_i is the ith value of the attention vector, and X_{i,:,:} is the ith channel of the input data X. An attention mechanism is added to each layer of the ResNet network: the output feature of the current layer is transformed, and the transformed (attention-processed) feature X̃ is fed into the next layer of ResNet, giving the attention base network.
S3, establishing a single frequency domain transformation basis function selection model based on the attention base network in S2.
In step S3 of this embodiment, the specific process is as follows:
s31, the output feature X ∈ R^{C×H×W} of each layer is divided into C two-dimensional feature maps x^{2d} ∈ R^{H×W}, and a discrete cosine transform is applied to each two-dimensional feature map x^{2d}:
f^{2d}_{h,w} = Σ_{i=0}^{H−1} Σ_{j=0}^{W−1} x^{2d}_{i,j} · cos(πh(i + 1/2)/H) · cos(πw(j + 1/2)/W)
s.t. h ∈ {0, 1, …, H−1}, w ∈ {0, 1, …, W−1}
For a two-dimensional feature map x^{2d} of size H × W, this yields H × W transformed spectrum components; f^{2d} ∈ R^{H×W} is the discrete cosine transform spectrum result, and f^{2d}_{h,w} is the value of the spectrum f^{2d} at position [h, w];
s32, for the C spectra f^{2d} obtained from the C two-dimensional feature maps x^{2d}, one spectrum component of f^{2d} is selected at a time (e.g., in one round only a single fixed position [h, w] is taken from each of the C spectra f^{2d}, and in the next round a different position is taken), so that for X ∈ R^{C×H×W} one vector f_i ∈ R^C is obtained each time. Each f_i is substituted into the attention base network established in S2, which is trained and tested to measure the performance of that spectrum component as the single input; from the test results of the different frequency components, a performance ranking of all spectrum components is finally obtained.
And S4, establishing a combined frequency domain transformation basis function selection model on the basis of S2 and S3.
In step S4 of this embodiment, the specific process is as follows:
s41, according to the performance ranking of the single spectra used as input obtained in step S32, the 1, 2, 4, 8, 16 and 32 spectrum components with the highest performance are taken in turn, forming 6 combinations with different numbers of frequency components;
s42, for any combination, the input X ∈ R^{C×H×W} is divided along the channel dimension (the C dimension) according to the number of frequency components. Assuming the number of frequency components in the combination is nf, nf must divide C. Let [X^0, X^1, …, X^{nf−1}] denote the divided parts, where X^j ∈ R^{(C/nf)×H×W} contains channels j·C/nf to (j+1)·C/nf − 1 of X. After division, each part is decomposed in turn using the corresponding frequency band of the combination according to the method of S32, giving [f^0, f^1, …, f^{nf−1}], where each f^j ∈ R^{C/nf}, s.t. j ∈ {0, 1, …, nf−1}; the spectra of the parts are then concatenated:
f_i = cat([f^0, f^1, …, f^{nf−1}])
where cat(·) is the concatenation function, yielding f_i ∈ R^C;
s43, the f_i ∈ R^C obtained from each of the 6 combinations of 1, 2, 4, 8, 16 and 32 spectrum components is substituted into the attention base network established in S2, and the model is trained and tested to obtain the performance of each combination;
s44, the combination with the highest performance is selected as the spectrum input f′_i of the final model.
And S5, establishing the frequency domain attention mechanism based on the neural network on the basis of S4 to form the final model. In step S5 of the present embodiment, the process of establishing the frequency domain attention mechanism based on the neural network is as follows:
s51, for the input spectrum f′_i of the final model obtained in S44, the following attention mechanism is established and the attention vector is obtained:
att′ = sigmoid(fc(f′_i))
s53, for each channel of the input image or of the feature X of the base network in S2, an attention scaling is performed according to the attention vector att′ to obtain the final output:
X̃′_{i,:,:} = att′_i · X_{i,:,:}
where X̃′_{i,:,:} is the ith channel of the transformed feature, att′_i is the ith value of the attention vector att′, and X_{i,:,:} is the ith channel of the input image or feature. The frequency domain attention mechanism of the neural network is thereby established, forming the final model.
And S6, training the final model in S5 based on the image recognition data set in S1 to obtain an image prediction model.
In step S6 of the present embodiment, the process of training the prediction model based on the modeling results of S3, S4 and S5 is as follows: based on the image recognition data set in S1, the single-spectrum performance ranking obtained through S2 and S3 is used to take the 1, 2, 4, 8, 16 and 32 frequencies with the highest performance, giving 6 spectrum combinations; these 6 combinations are substituted into S4 to obtain the performance ranking of the combinations and the combination with the highest performance. The highest-performing spectrum combination is substituted into S5 as the input spectrum of the final model, and the final model is trained on the image recognition data set of S1 to obtain the image recognition prediction model.
And S7, inputting the image to be recognized into the image prediction model for image recognition.
In step S7 of this embodiment, the specific process is as follows: and after the prediction model in the step S6 is obtained, inputting the image to be recognized into the prediction model for prediction to obtain an image classification prediction result.
The methods of S1-S7 are applied to specific data sets to demonstrate the technical effects that can be achieved.
Examples
The implementation method of this embodiment is as described above; the specific steps are not repeated here, and only the effect on the case data is shown. The invention is implemented on two image data sets with ground-truth labels, which are respectively:
ImageNet data set [1]: the data set contains 1000 classes of natural images, with 1,281,167 training images and 50,000 validation images; each image is labeled with one category.
MS COCO data set [2]: the data set includes object detection and instance segmentation tasks, covering 80 countable object ("thing") classes and 91 "stuff" classes. The data set has over 330,000 images and 1.5 million object instances.
In this embodiment, classification accuracy is compared mainly on the ImageNet data set, using Top-1 accuracy and Top-5 accuracy. In addition, the number of parameters (Parameters) and the amount of computation (FLOPs) are compared.
Table 1 comparison of evaluation indexes on ImageNet dataset in this example
On the MS COCO data set, this embodiment uses the network proposed in the patent as the backbone and uses Faster RCNN and Mask RCNN to implement the object detection and instance segmentation tasks respectively. The comparison metrics include the average precision AP, the average precision AP50 at threshold 0.5, and the average precision AP75 at threshold 0.75.
Table 2 comparison of each index of object detection task on MS COCO data set in this embodiment
Table 3 comparison of each index of example segmentation task on MS COCO dataset in this example
Method | AP | AP50 | AP75 |
ResNet-50 | 34.1 | 55.5 | 36.2 |
SENet | 35.4 | 57.4 | 37.8 |
GCNet | 35.7 | 58.4 | 37.6 |
ECANet | 35.6 | 58.1 | 37.7 |
The method of the invention | 36.2 | 58.6 | 38.1 |
The prior art cited above for comparison with the present invention can be found in the following references:
[1] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database[C]//IEEE Conference on Computer Vision & Pattern Recognition. IEEE, 2009.
[2] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common Objects in Context[C]//European Conference on Computer Vision. Springer International Publishing, 2014.
[3] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[C]//IEEE Conference on Computer Vision & Pattern Recognition. IEEE Computer Society, 2016.
[4] Hu J, Shen L, Albanie S, et al. Squeeze-and-Excitation Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[5] Wang Q, Wu B, Zhu P, et al. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020.
[6] Woo S, Park J, Lee J Y, Kweon I S. CBAM: Convolutional Block Attention Module[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[7] Gao Z, Xie J, Wang Q, Li P. Global Second-order Pooling Convolutional Networks[C]//2019 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2019.
[8] Cao Y, Xu J, Lin S, Wei F, Hu H. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond[C]//2019 IEEE International Conference on Computer Vision Workshops. IEEE, 2019.
[9] Bello I, Zoph B, Le Q, et al. Attention Augmented Convolutional Networks[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2020.
[10] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[11] He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. IEEE, 2017.
The above-described embodiments are merely preferred embodiments of the present invention and are not intended to limit it. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, technical schemes obtained by equivalent replacement or equivalent transformation fall within the protection scope of the invention.
Claims (3)
1. An image identification method based on a neural network frequency domain attention mechanism is characterized by comprising the following steps:
s1, acquiring an image recognition data set for training a neural network;
s2, establishing an attention basic network by taking ResNet as a backbone;
s3, establishing a single frequency domain transformation basis function selection model based on the attention basic network in the S2;
s4, establishing a combined frequency domain transformation basis function selection model on the basis of S2 and S3;
s5, establishing a frequency domain attention mechanism based on the neural network on the basis of S4 to form a final model;
s6, training the final model in S5 based on the image recognition data set in S1 to obtain an image prediction model;
s7, inputting the image to be recognized into the image prediction model for image recognition;
in step S1, the data set for image recognition includes a group of images {I_i}, i = 1, …, K, where I_i is the ith image and K is the number of images in the image group;
the algorithm target is defined as: obtaining the classification result of each image;
in step S2, the process of establishing the attention-based network is as follows:
s21, constructing ResNet as a basic backbone network;
s22, adding an attention mechanism on the basis of ResNet to construct the attention base network. Let X ∈ R^{C×H×W} be the output feature of a single layer in the ResNet network, where C, H, W are the number of channels, the height of the feature map and the width of the feature map, respectively. The attention mechanism transforms the output X of that layer as follows:
att = sigmoid(fc(f_i))
where att ∈ R^C is the attention vector obtained after the transformation, sigmoid(·) is the sigmoid activation function, fc(·) is a two-layer fully-connected network, and f_i ∈ R^C is the frequency spectrum of the input data X;
X̃_{i,:,:} = att_i · X_{i,:,:}
where X̃_{i,:,:} is the ith channel of the transformed feature, att_i is the ith value of the attention vector, and X_{i,:,:} is the ith channel of the input data X. An attention mechanism is added to each layer of the ResNet network: the output feature of the current layer is transformed, and the transformed (attention-processed) feature X̃ is fed into the next layer of ResNet, giving the attention base network;
in step S3, the process of establishing the single frequency domain transform basis function selection model is as follows:
s31, the output feature X ∈ R^{C×H×W} of each layer is divided into C two-dimensional feature maps x^{2d} ∈ R^{H×W}, and a discrete cosine transform is applied to each two-dimensional feature map x^{2d}:
f^{2d}_{h,w} = Σ_{i=0}^{H−1} Σ_{j=0}^{W−1} x^{2d}_{i,j} · cos(πh(i + 1/2)/H) · cos(πw(j + 1/2)/W)
s.t. h ∈ {0, 1, …, H−1}, w ∈ {0, 1, …, W−1}
For a two-dimensional feature map x^{2d} of size H × W, this yields H × W transformed spectrum components; f^{2d} ∈ R^{H×W} is the discrete cosine transform spectrum result, and f^{2d}_{h,w} is the value of the spectrum f^{2d} at position [h, w];
s32, for the C spectra f^{2d} obtained from the C two-dimensional feature maps x^{2d}, one spectrum component of f^{2d} is selected at a time, so that for X ∈ R^{C×H×W} one vector f_i ∈ R^C is obtained each time. Each f_i is substituted into the attention base network established in S2, which is trained and tested to measure the performance of that spectrum component as the single input; from the test results of the different frequency components, a performance ranking of all spectrum components is finally obtained;
in step S4, the process of establishing the combined frequency domain transform basis function selection model is as follows:
s41, according to the performance ranking of the single spectra used as input obtained in step S32, the 1, 2, 4, 8, 16 and 32 spectrum components with the highest performance are taken in turn, forming 6 combinations with different numbers of frequency components;
s42, for any combination, the input X ∈ R^{C×H×W} is divided along the channel dimension (the C dimension) according to the number of frequency components. Assuming the number of frequency components in the combination is nf, nf must divide C. Let [X^0, X^1, …, X^{nf−1}] denote the divided parts, where X^j ∈ R^{(C/nf)×H×W} contains channels j·C/nf to (j+1)·C/nf − 1 of X. After division, each part is decomposed in turn using the corresponding frequency band of the combination according to the method of S32, giving [f^0, f^1, …, f^{nf−1}], where each f^j ∈ R^{C/nf}; the spectra of the parts are then concatenated:
f_i = cat([f^0, f^1, …, f^{nf−1}])
where cat(·) is the concatenation function, yielding f_i ∈ R^C;
s43, the f_i ∈ R^C obtained from each of the 6 combinations of 1, 2, 4, 8, 16 and 32 spectrum components is substituted into the attention base network established in S2, and the model is trained and tested to obtain the performance of each combination;
s44, the combination with the highest performance is selected as the spectrum input f′_i of the final model;
In step S5, the process of establishing the frequency domain attention mechanism based on the neural network is as follows:
s51, input Spectrum f 'for the Final model obtained in S44'iThe following attention mechanism is established and the attention vector is obtained:
att′=sigmoid(fc(f′i))
s53, for each channel of the input image, or of the feature X of the base network in S2, perform attention scaling according to the attention vector att′ to obtain the final output X̃, i.e. X̃_c = att′_c · X_c for each channel c.
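The attention step of S51–S53 can be sketched as below. Here fc(·) is modeled as a single fully connected layer with hypothetical weights `W_fc`, `b_fc`; the text writes fc(·) without fixing its depth, and related channel-attention designs often use a two-layer bottleneck instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def frequency_channel_attention(X, f, W_fc, b_fc):
    """Compute att' = sigmoid(fc(f)) from the spectrum f in R^C, then
    scale each channel of X (C, H, W) by its attention weight (S53)."""
    att = sigmoid(W_fc @ f + b_fc)   # attention vector att' in (0, 1)^C
    return att[:, None, None] * X    # X~ with channel-wise scaling
```

With zero weights and bias, sigmoid(0) = 0.5, so every channel is uniformly halved; training the fc layer is what makes the scaling channel-selective.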
2. The image recognition method based on the neural network frequency domain attention mechanism as claimed in claim 1, wherein step S6 is as follows: based on the image recognition data set in S1, using the single-spectrum performance ranking obtained through S2 and S3, take the 1, 2, 4, 8, 16 and 32 highest-performing frequencies respectively to obtain 6 spectrum combinations; substitute the 6 spectrum combinations into S4 to obtain the performance ranking of the combinations and the highest-performing spectrum combination; substitute the highest-performing spectrum combination into S5 as the input spectrum of the final model, and perform final model training based on the image recognition data set in S1 to obtain an image recognition prediction model.
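The search procedure of claim 1, steps S3–S4 as summarized above, can be sketched as a small driver. `evaluate` stands in for the train-and-test step of S43 and is hypothetical, as is the function name.

```python
def select_best_combination(single_scores, evaluate):
    """single_scores: dict mapping a frequency index (u, v) to the
    validation performance measured with that single spectrum (S3).
    evaluate: hypothetical callable scoring a list of frequencies by
    training and testing the attention network on it (S43).
    Returns the highest-scoring of the 6 candidate combinations (S44)."""
    ranked = sorted(single_scores, key=single_scores.get, reverse=True)
    candidates = [ranked[:k] for k in (1, 2, 4, 8, 16, 32)
                  if k <= len(ranked)]
    return max(candidates, key=evaluate)
```

The candidate sizes 1, 2, 4, 8, 16, 32 mirror the 6 combinations named in the claims; each candidate is simply a prefix of the single-frequency ranking.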
3. The image recognition method based on the neural network frequency domain attention mechanism as claimed in claim 2, wherein step S7 is as follows: after the prediction model of step S6 is obtained, input the image to be recognized into the prediction model for prediction to obtain the image classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011504311.3A CN113011444B (en) | 2020-12-18 | 2020-12-18 | Image identification method based on neural network frequency domain attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113011444A CN113011444A (en) | 2021-06-22 |
CN113011444B true CN113011444B (en) | 2022-05-13 |
Family
ID=76383532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011504311.3A Active CN113011444B (en) | 2020-12-18 | 2020-12-18 | Image identification method based on neural network frequency domain attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113011444B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113706570B (en) * | 2021-08-02 | 2023-09-15 | 中山大学 | Segmentation method and device for zebra fish fluorescence image |
CN113643261B (en) * | 2021-08-13 | 2023-04-18 | 江南大学 | Lung disease diagnosis method based on frequency attention network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107024987A (en) * | 2017-03-20 | 2017-08-08 | 南京邮电大学 | A kind of real-time human brain Test of attention and training system based on EEG |
DE102018202440A1 (en) * | 2018-02-19 | 2019-08-22 | Aktiebolaget Skf | measuring system |
CN111382795A (en) * | 2020-03-09 | 2020-07-07 | 交叉信息核心技术研究院(西安)有限公司 | Image classification processing method of neural network based on frequency domain wavelet base processing |
CN111539449A (en) * | 2020-03-23 | 2020-08-14 | 广东省智能制造研究所 | Sound source separation and positioning method based on second-order fusion attention network model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245665B (en) * | 2019-05-13 | 2023-06-06 | 天津大学 | Image semantic segmentation method based on attention mechanism |
Non-Patent Citations (4)
Title |
---|
Dual Attention Network for Scene Segmentation; Jun Fu, et al.; IEEE Conf. Comput. Vis. Pattern Recog.; 2019-12-31; full text *
TF2AN: A Temporal-Frequency Fusion Attention Network for Spectrum Energy Level Prediction; Li, K., et al.; 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking; 2019-06-30; full text *
Dimensional emotion recognition method based on a hierarchical attention mechanism (基于层次注意力机制的维度情感识别方法); Tang Yuhao, et al.; Computer Engineering (计算机工程); 2019-05-30; full text *
Ultrasound thyroid segmentation combining piecewise frequency domain and local attention (结合分段频域和局部注意力的超声甲状腺分割); Hu Yishan, et al.; Journal of Image and Graphics (中国图象图形学报); 2020-10-16, No. 10; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111462126B (en) | Semantic image segmentation method and system based on edge enhancement | |
Hayder et al. | Boundary-aware instance segmentation | |
Cao et al. | Landmark recognition with sparse representation classification and extreme learning machine | |
CN110738207A (en) | character detection method for fusing character area edge information in character image | |
CN111738143B (en) | Pedestrian re-identification method based on expectation maximization | |
CN105956560A (en) | Vehicle model identification method based on pooling multi-scale depth convolution characteristics | |
CN112966137B (en) | Image retrieval method and system based on global and local feature rearrangement | |
CN102750385B (en) | Correlation-quality sequencing image retrieval method based on tag retrieval | |
CN109740679B (en) | Target identification method based on convolutional neural network and naive Bayes | |
CN104778476B (en) | A kind of image classification method | |
CN104778457A (en) | Video face identification algorithm on basis of multi-instance learning | |
CN113011444B (en) | Image identification method based on neural network frequency domain attention mechanism | |
CN111126396A (en) | Image recognition method and device, computer equipment and storage medium | |
CN104077742B (en) | Human face sketch synthetic method and system based on Gabor characteristic | |
Hayder et al. | Shape-aware instance segmentation | |
CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning | |
CN111461039A (en) | Landmark identification method based on multi-scale feature fusion | |
CN113554654A (en) | Point cloud feature extraction model based on graph neural network and classification and segmentation method | |
CN112017162B (en) | Pathological image processing method, pathological image processing device, storage medium and processor | |
CN114332544A (en) | Image block scoring-based fine-grained image classification method and device | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
CN112990282A (en) | Method and device for classifying fine-grained small sample images | |
Sun et al. | Deep learning based pedestrian detection | |
CN113822134A (en) | Instance tracking method, device, equipment and storage medium based on video | |
CN116796248A (en) | Forest health environment assessment system and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||