CN112861970B - Fine-grained image classification method based on feature fusion - Google Patents

Fine-grained image classification method based on feature fusion

Info

Publication number
CN112861970B
CN112861970B · CN202110179265.2A
Authority
CN
China
Prior art keywords
image
feature map
feature
network
resnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110179265.2A
Other languages
Chinese (zh)
Other versions
CN112861970A (en)
Inventor
初妍
王丽娜
莫世奇
李思纯
李松
时洁
胡博
苗晓晨
赵佳昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN202110179265.2A priority Critical patent/CN112861970B/en
Publication of CN112861970A publication Critical patent/CN112861970A/en
Application granted granted Critical
Publication of CN112861970B publication Critical patent/CN112861970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition in computer vision, and particularly relates to a fine-grained image classification method based on feature fusion. The invention extracts local detail features of fine-grained images for the classification task and accurately localizes the target regions of interest, addressing the difficulty that fine-grained images exhibit only small inter-class differences. An improved soft non-maximum suppression (soft-NMS) is used to optimize the region proposal network (RPN) so that the target object is acquired while interference from background information is avoided. The bilinear convolutional neural networks (B-CNNs) are improved through the attention module SCA and applied to the fine-grained classification task to obtain attention features of different dimensions. Compared with existing classification methods, the method localizes the discriminative key parts and achieves higher accuracy.

Description

Fine-grained image classification method based on feature fusion
Technical Field
The invention belongs to the technical field of image recognition in computer vision, and particularly relates to a fine-grained image classification method based on feature fusion.
Background
Traditional classification tasks mostly refer to coarse-grained classification, for example distinguishing cats from dogs. Because such categories have many distinctive features, the task is relatively easier than fine-grained image classification. Fine-grained image classification is a subtask of image classification that mainly identifies hundreds of sub-categories under the same basic category, such as hundreds of sub-categories of birds, cars, pets, flowers, and airplanes. Unlike general classification tasks, fine-grained image classification is characterized by small inter-class differences, and these fine, local differences are the key to fine-grained image classification.
Because the inter-class differences are slight, different sub-categories can often be distinguished only by subtle local differences. Fine-grained classification methods mainly fall into two groups. The first comprises classification models based on strong supervision, which, in order to obtain better classification accuracy, require additional information such as manually labeled object bounding boxes and part annotation points besides the class labels of the images. For example, the Part R-CNN algorithm adopts a region-based convolutional neural network to detect objects and local regions in an image. Because acquiring such annotation information is very expensive, the practicality of these algorithms is limited to a great extent. The second comprises classification models based on weak supervision, which rely only on class labels to complete classification well without using additional part annotation information. For example, the two-level attention algorithm does not depend on additional annotation information and uses only class labels to complete fine-grained image classification. Although the extracted features have a certain expressive capability, how to effectively extract features of the discriminative parts of the key attention regions with only class labels available remains challenging.
Disclosure of Invention
The invention aims to extract local detail features of fine-grained images for the classification task and to accurately localize the target regions of interest, and provides a fine-grained image classification method based on feature fusion.
The purpose of the invention is realized by the following technical scheme: the method comprises the following steps:
step 1: acquiring an image data set to be classified, taking partial image data to construct a training set, and forming a test set by the rest data; labeling the images in the training set to obtain a category label corresponding to each image;
Step 2: extracting a feature map of each image in the training set by using a VGG-19 convolutional neural network, and obtaining a feature vector of each image in the training set through a sliding-window operation on the final conv5-3 feature map;
Step 3: inputting the feature vector of each image in the training set into a regression layer and a classification layer to obtain a set of region candidate detection boxes for each image in the training set; calculating a confidence score f_i for each detection box in the set of region candidate detection boxes, and selecting the detection box with the highest confidence to crop the image, obtaining a cropped image training set;
Step 4: inputting the cropped image training set into an SC-B-CNNs model for training (a minimal implementation sketch is given after step 5);
the SC-B-CNNs model comprises a first ResNet-50 network, a second ResNet-50 network and a softmax classifier; the first ResNet-50 network is a ResNet-50 network pre-trained on ImageNet with the final fully connected layer removed, and an attention module SCA is added between the conv2 and conv3 convolutional blocks of this ResNet-50 network; the second ResNet-50 network is not pre-trained and has an attention module SCA added between its conv4 and conv5 convolutional blocks;
Step 4.1: inputting the cropped image training set into the first ResNet-50 network and the second ResNet-50 network respectively, where the first ResNet-50 network outputs a first weighted feature map f_A of each image and the second ResNet-50 network outputs a second weighted feature map f_B of each image;
Step 4.2: performing a bilinear pooling operation on the first weighted feature map f_A and the second weighted feature map f_B of each image in the cropped image training set to obtain a bilinear feature vector of each image in the cropped image training set;
Step 4.3: inputting the bilinear feature vector of each image in the cropped image training set into the softmax classifier to obtain the category of the image;
Step 5: inputting the test set into the trained SC-B-CNNs model to obtain the classification result of the image data set to be classified.
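For illustration only, a minimal PyTorch sketch of the SC-B-CNNs model of step 4 is given below. The mapping of the conv2 to conv5 blocks onto torchvision's layer1 to layer4, the 2048-channel branch outputs, and the signed square-root normalisation of the bilinear vector are implementation assumptions rather than details fixed by this disclosure; the SCA module (sca_a, sca_b) is assumed to be implemented as described in steps 4.1.1 to 4.1.5.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class SCBCNNs(nn.Module):
    """Illustrative two-branch bilinear model with SCA attention."""

    def __init__(self, num_classes, sca_a, sca_b):
        super().__init__()
        # First branch: ImageNet-pretrained ResNet-50 without the final FC layer,
        # with an SCA module inserted between the conv2 and conv3 blocks.
        a = resnet50(pretrained=True)
        self.branch_a = nn.Sequential(a.conv1, a.bn1, a.relu, a.maxpool,
                                      a.layer1, sca_a, a.layer2, a.layer3, a.layer4)
        # Second branch: ResNet-50 without pre-training,
        # with an SCA module inserted between the conv4 and conv5 blocks.
        b = resnet50(pretrained=False)
        self.branch_b = nn.Sequential(b.conv1, b.bn1, b.relu, b.maxpool,
                                      b.layer1, b.layer2, b.layer3, sca_b, b.layer4)
        # Bilinear combination of two 2048-channel maps feeds a very wide linear classifier.
        self.classifier = nn.Linear(2048 * 2048, num_classes)

    def forward(self, x):
        fa = self.branch_a(x)                                 # first weighted feature map f_A
        fb = self.branch_b(x)                                 # second weighted feature map f_B
        n, c, h, w = fa.shape
        fa = fa.reshape(n, c, h * w)
        fb = fb.reshape(n, c, h * w)
        bil = torch.bmm(fa, fb.transpose(1, 2)) / (h * w)     # bilinear pooling (step 4.2)
        bil = bil.reshape(n, -1)
        bil = torch.sign(bil) * torch.sqrt(torch.abs(bil) + 1e-10)  # assumed normalisation
        bil = nn.functional.normalize(bil)
        return self.classifier(bil)                           # softmax applied in the loss (step 4.3)
```

During training, the cropped images from step 3 would be fed to this model, with a cross-entropy loss supplying the softmax of step 4.3.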
The present invention may further comprise:
the attention module SCA is used for extracting a feature map F with weight distribution of the input feature map G sc The method comprises the following specific steps:
step 4.1.1: generating a feature map F by 1 multiplied by 1 convolution for the feature map G input to the attention module SCA;
step 4.1.2: feature graph F is dimensionality reduced using global mean pooling by having a parameter W fc The full-connection layer assigns weight to the full-connection layer, then compresses the w multiplied by h multiplied by 1 characteristic diagram into a channel according to the channel direction through convolution operation, and generates a space attention diagram A by adopting a sigmoid activation function s
Figure BDA0002941705150000021
Wherein G ∈ R w×h×c W is the length of the feature map G, h is the width of the feature map G, and w × h represents the two-dimensional space size of the feature map G; c represents the number of channels; f. of 7×7 Representing the size of the convolution kernel; σ () represents a sigmoid activation function;
step 4.1.3: element-by-element dot multiplication method for spatial attention diagram A s Performing feature fusion with the feature map F to obtain a spatial attention feature F s
Figure BDA0002941705150000022
Step 4.1.4: feature spatial attention F s Compressing according to the spatial dimension w multiplied by h to generate a global compressed feature vector z of the current feature map c
Figure BDA0002941705150000031
Wherein, f sq () Representing a compression operation; u. of c Representing the c channel characteristic diagram;
step 4.1.5: obtaining the weight value of each channel in the feature map through two full-connection layers, and obtaining a feature map F with weight distribution by using sigmoid activation sc
Figure BDA0002941705150000032
A=σ(W s2 ×tanh(W s1 ×z c ))
Wherein σ () represents a sigmoid activation function, and tanh () represents a tanh activation function; a is a feature vector of weight distribution; w s1 Is the weight of the first fully connected layer; w is a group of s2 Is the weight of the second fully connected layer; u. of c Representing the c channel feature map;
Figure BDA0002941705150000033
representing element-by-element dot multiplication.
The invention has the beneficial effects that:
the invention realizes the extraction of local detail characteristics of the fine-grained images on the classification task, accurately positions the fine-grained images in the concerned target area, solves the difficulty of small intra-class difference of the fine-grained images on the classification task, utilizes the improved non-maximum value to inhibit the soft-NMS optimization area to suggest the RPN to acquire the target object, and avoids the interference of background information. According to the invention, the bilinear convolutional neural network B-CNNs are improved through the attention module SCA and used for a fine-grained classification task so as to obtain attention characteristics with different dimensions. Compared with the existing classification method, the method is positioned in the key part of the distinction, and has higher accuracy.
Drawings
Fig. 1 is a frame diagram of the fine-grained image classification method based on feature fusion according to the present invention.
Fig. 2 is a specific flowchart of the RPN network according to the present invention.
FIG. 3 is a schematic diagram of the framework of the B-CNNs based on SCA in the invention.
FIG. 4 is a schematic diagram of the attention module SCA of the present invention.
Fig. 5 is a specific algorithm code diagram of the SCA-based bilinear CNNs in the present invention.
FIG. 6 is a table of the results of comparative experiments performed on the three datasets CUB-200, Stanford Cars and Oxford Flowers.
FIG. 7 is a table of the results of comparative experiments performed on the CUB-200 dataset.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention aims to extract local detail features of fine-grained images for the classification task and to accurately localize the target regions of interest, and provides a weakly supervised fine-grained image classification method based on feature fusion. An improved soft Non-Maximum Suppression (soft-NMS) is used to optimize the Region Proposal Network (RPN) so as to acquire the target object and avoid interference from background information. An attention module SCA (Spatial-Channel Attention) is designed to improve the bilinear convolutional neural networks (B-CNNs) for the fine-grained classification task, so as to acquire attention features of different dimensions. Compared with existing classification methods, the method localizes the discriminative key parts and achieves higher accuracy.
Step 1, inputting the images in the data set and the corresponding class labels, and extracting a feature map of each image by using a VGG-19 convolutional neural network;
Step 2, obtaining a 256-dimensional feature vector through a 3×3 sliding-window operation on the final conv5-3 feature map;
Step 3, inputting the 256-dimensional feature vectors into two fully connected layers, namely a boundary regression layer and a classification layer, to obtain a set of region candidate boxes;
Step 4, selecting the detection box with the highest confidence among the boxes to be detected by using the improved soft-NMS algorithm;
Step 5, cropping and segmenting the detected target region with the highest confidence;
Step 6, inputting the cropped image;
Step 7, using two ResNet-50 networks with the last fully connected layer removed to extract convolutional features of the input image respectively;
Step 8, the first branch network uses a ResNet-50 pre-trained on ImageNet, and adds the designed attention module SCA between the conv2 and conv3 convolutional blocks to obtain a weighted feature map;
Step 9, the second branch network uses a ResNet-50 without pre-training, and adds the designed attention module SCA between the conv4 and conv5 convolutional blocks to obtain a weighted feature map;
Step 10, obtaining bilinear feature vectors by a bilinear pooling operation on the weighted feature maps from steps 8 and 9;
Step 11, inputting the bilinear feature vectors into a softmax classifier to obtain the category of the image;
Step 12, inputting the test data set and calculating the classification accuracy of the model.
The invention extracts image features through the RPN network and completes the selection of candidate boxes. Taking the picture as input, VGG-19 is used to extract coarse features of the image to be detected, and the output of the RPN (Region Proposal Network) is the regions of interest obtained by convolving the feature map. To prevent overfitting, the RPN network is optimized with the improved soft-NMS, selecting the regions where higher-confidence targets are located. To refine the preset regions, anchors with 3 scales and 3 aspect ratios are selected, i.e., 9 types of anchors are generated; at each sliding-window position the classification layer outputs 18 confidence values and the regression layer outputs 36 items of position information for the target regions of interest, giving more accurate candidate regions. The target is parameterized according to the boundary coordinates by the following formulas:
t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
t*_x = (x* − x_a)/w_a,  t*_y = (y* − y_a)/h_a
t*_w = log(w*/w_a),  t*_h = log(h*/h_a)
where x, y, w, h denote the center coordinates and the width and height of the predicted box; t_i denotes the parameterization of the object boundary coordinates; t*_i denotes the annotation information associated with a positive anchor; x_a, y_a, w_a, h_a denote the center coordinates and the width and height of the anchor box; and x*, y*, w*, h* denote the center coordinates and the width and height of the ground-truth labeled box.
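As a small, hedged illustration of the parameterization above, the function below encodes a predicted box against an anchor box; applying the same encoding to the ground-truth box yields t*_x, t*_y, t*_w, t*_h. The function name and the example values are hypothetical and only mirror the formulas.

```python
import math


def encode_box(x, y, w, h, x_a, y_a, w_a, h_a):
    """Parameterize a box (center x, y, width w, height h) relative to an anchor."""
    t_x = (x - x_a) / w_a
    t_y = (y - y_a) / h_a
    t_w = math.log(w / w_a)
    t_h = math.log(h / h_a)
    return t_x, t_y, t_w, t_h


# Example: a predicted box encoded against a 16x16 anchor centered at (32, 32).
print(encode_box(34.0, 30.0, 20.0, 12.0, 32.0, 32.0, 16.0, 16.0))
```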
All detected boxes are sorted by their scores (the score given by the classifier is a probability value representing the probability that the current detection box contains the object to be detected). The detection box A with the highest score is selected and a threshold b is set; the IoU (Intersection over Union) between each remaining detection box and the highest-scoring box A is calculated, and a box whose IoU is greater than the threshold b has a high overlap rate and is deleted. Detection boxes that do not overlap the current box at all, or whose overlapping area is very small (IoU smaller than the threshold b), are kept. The remaining unprocessed boxes are then re-sorted, the highest-scoring box is again selected, the IoU values between the other boxes and this box are calculated, boxes whose IoU exceeds the threshold are deleted again, and this process is iterated until all detection boxes have been processed and the final detection result is output.
The candidate boxes extracted by the RPN are highly overlapping. To reduce redundancy, the improved soft-NMS optimizes them according to the classification scores of the detection boxes. When the score of a detection box is greater than the threshold t, the detection box is put into the final detection result set. When regions overlap, the score of the detection box is multiplied by an attenuation function, which effectively reduces the error probability and improves the detection accuracy. The specific calculation formula is as follows:
f_i = f_i,                      IoU(A, b_i) < t
f_i = f_i · (1 − IoU(A, b_i)),  IoU(A, b_i) ≥ t
where f_i denotes the score corresponding to the i-th detection box, b_i denotes the i-th detection box, A denotes the current highest-scoring detection box, and t is the threshold.
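A minimal sketch of the score-decay behaviour described above follows. The linear attenuation f_i · (1 − IoU) and the final score threshold are assumptions drawn from the surrounding description; the invention's exact attenuation function is the one given by the formula above.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, overlap_thresh=0.3, score_thresh=0.001):
    """Decay the scores of overlapping candidate boxes instead of deleting them."""
    boxes, scores = list(boxes), list(scores)
    kept = []
    while boxes:
        m = max(range(len(scores)), key=scores.__getitem__)   # highest-scoring box
        best_box, best_score = boxes.pop(m), scores.pop(m)
        kept.append((best_box, best_score))
        for i, b in enumerate(boxes):
            o = iou(best_box, b)
            if o >= overlap_thresh:
                scores[i] *= 1.0 - o      # attenuation instead of hard suppression
    # Only boxes whose (possibly decayed) score exceeds the threshold enter the result set.
    return [(b, s) for b, s in kept if s > score_thresh]


candidates = [(10, 10, 60, 60), (12, 12, 58, 62), (100, 100, 150, 160)]
confidences = [0.95, 0.90, 0.80]
print(soft_nms(candidates, confidences))
```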
The SC-B-CNNs network architecture provided by the invention can be described by a quadruple B = (f_A, f_B, P, C). Bilinear features are obtained by bilinear combination through an outer-product operation, calculated as:
b = f_A^T · f_B
where f_A and f_B are the feature functions containing the added attention module SCA, P is the pooling function, and C is the classification function.
The feature outputs at each location are combined using bilinear pooling. The bilinear pooling operation for the input image I at location l is defined as:
bilinear(l, I, f_A, f_B) = f_A(l, I)^T f_B(l, I)
where f_A and f_B are the outputs of the two feature extraction functions of the B-CNNs.
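As an illustration under assumed feature shapes (not the invention's exact implementation), the sum over locations of the per-location outer products above can be computed for a whole feature map as a single matrix product:

```python
import torch


def bilinear_pool(fa, fb):
    """fa, fb: (N, C, H, W) outputs of the two feature extraction functions."""
    n, c, h, w = fa.shape
    fa = fa.reshape(n, c, h * w)              # flatten the spatial locations l
    fb = fb.reshape(n, c, h * w)
    # Sum over l of f_A(l, I)^T f_B(l, I) == matrix product over the location axis.
    b = torch.bmm(fa, fb.transpose(1, 2))     # (N, C, C) bilinear feature
    return b.reshape(n, -1)                   # vectorized for the classifier


fa = torch.randn(2, 512, 28, 28)
fb = torch.randn(2, 512, 28, 28)
print(bilinear_pool(fa, fb).shape)            # torch.Size([2, 262144])
```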
First, the feature map extracted by the feature function is taken as the original input G, G ∈ R^(w×h×c), where w×h denotes the two-dimensional spatial size of G and c denotes the number of channels. A feature map F is generated by a 1×1 convolution, and F is dimensionality-reduced using Global Average Pooling; a fully connected layer with parameter W_fc assigns weights to it, a convolution operation then compresses the feature map along the channel direction into a single w×h×1 channel, and a sigmoid activation function generates the spatial attention map A_s, A_s ∈ R^(w×h×1). The spatial attention extraction process is expressed as:
A_s = σ(f^(7×7)(W_fc(GAP(F)) ⊙ F))
where f^(7×7) denotes a convolution with a 7×7 kernel, σ(·) denotes the sigmoid activation function, GAP(·) denotes global average pooling, and W_fc denotes the fully connected layer with parameter W_fc.
Then the spatial attention map A_s is fused with the original input F by element-wise multiplication to obtain the spatial attention feature F_s:
F_s = A_s ⊙ F
Next, the global spatial information is compressed into channel-wise descriptive feature information. The global compressed feature vector z_c of the current feature map is generated by compressing the feature map F_s along the spatial dimension w×h, calculated as:
z_c = f_sq(u_c) = (1/(w×h)) Σ_{i=1..w} Σ_{j=1..h} u_c(i, j)
where f_sq(·) denotes the compression (squeeze) operation and u_c denotes the feature map of the c-th channel.
Then an excitation operation is carried out: by learning the weight parameters, the nonlinear correlations between channels are found. The weight value of each channel of the feature map is obtained through the two fully connected layers, and the weighted feature map is taken as the input of the next network layer. The channel weight assignment is calculated as follows:
A_c = f_ex(z, W) = σ(W_s2 × tanh(W_s1 × z_c))
where f_ex(·) denotes the excitation operation, z denotes the global compressed feature vector, σ(·) denotes the sigmoid activation function, and tanh(·) denotes the tanh activation function.
After the weight-distribution vector of the feature map is obtained by the above operations, a simple gating mechanism with sigmoid activation is used to obtain the feature map F_sc with weight distribution, calculated as:
F_sc = A_c ⊙ u_c
where A_c is the weight-distribution feature vector, u_c denotes the feature map of the c-th channel, and ⊙ denotes element-wise multiplication.
The purpose of using two fully connected layers is to keep the input and output dimensions consistent. The first fully connected layer reduces the channel dimension to 1/16 of its original size; after the tanh activation function, a second fully connected layer restores the original input dimension.
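To make the above steps concrete, a minimal PyTorch sketch of the SCA module is given below (the invention's specific algorithm is the one shown in FIG. 5). The padding of the 7×7 convolution, the exact placement of the W_fc re-weighting, and the use of mean pooling for the squeeze step are assumptions filled in from the description above.

```python
import torch
import torch.nn as nn


class SCA(nn.Module):
    """Illustrative Spatial-Channel Attention module."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1)      # F = conv1x1(G)
        self.fc_spatial = nn.Linear(channels, channels)                   # W_fc weighting after GAP
        self.conv7x7 = nn.Conv2d(channels, 1, kernel_size=7, padding=3)   # compress to one channel
        self.fc1 = nn.Linear(channels, channels // reduction)             # W_s1: reduce to 1/16
        self.fc2 = nn.Linear(channels // reduction, channels)             # W_s2: restore dimension
        self.sigmoid = nn.Sigmoid()
        self.tanh = nn.Tanh()

    def forward(self, g):
        n, c, _, _ = g.shape
        f = self.conv1x1(g)                                    # feature map F
        gap = f.mean(dim=(2, 3))                               # global average pooling, (N, C)
        weighted = f * self.fc_spatial(gap).view(n, c, 1, 1)   # re-weight channels with W_fc
        a_s = self.sigmoid(self.conv7x7(weighted))             # spatial attention map A_s, (N, 1, H, W)
        f_s = a_s * f                                          # spatial attention feature F_s
        z = f_s.mean(dim=(2, 3))                               # squeeze to z_c, (N, C)
        a = self.sigmoid(self.fc2(self.tanh(self.fc1(z))))     # channel weights A
        return a.view(n, c, 1, 1) * f_s                        # weighted feature map F_sc


# Example: an SCA block for a 256-channel insertion point (e.g. after conv2 of ResNet-50).
x = torch.randn(2, 256, 56, 56)
print(SCA(256)(x).shape)          # torch.Size([2, 256, 56, 56])
```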
The specific algorithm of the SCA-based bilinear CNNs is shown in FIG. 5. To demonstrate the effectiveness of the proposed method, comparative experiments were performed on the three datasets CUB-200, Stanford Cars and Oxford Flowers, and the results are shown in FIG. 6. To further verify the validity and accuracy of the improved RPN network and the SCA, comparative experiments were performed on the CUB-200 dataset, with the results shown in FIG. 7.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A fine-grained image classification method based on feature fusion is characterized by comprising the following steps:
step 1: acquiring an image data set to be classified, taking partial image data to construct a training set, and forming a test set by the rest data; labeling the images in the training set to obtain class labels corresponding to the images;
Step 2: extracting a feature map of each image in the training set by using a VGG-19 convolutional neural network, and obtaining a feature vector of each image in the training set through a sliding-window operation on the final conv5-3 feature map;
Step 3: inputting the feature vector of each image in the training set into a regression layer and a classification layer to obtain a set of region candidate detection boxes for each image in the training set; calculating a confidence score f_i for each detection box in the set of region candidate detection boxes, and selecting the detection box with the highest confidence to crop the image, obtaining a cropped image training set;
Step 4: inputting the cropped image training set into an SC-B-CNNs model for training;
the SC-B-CNNs model comprises a first ResNet-50 network, a second ResNet-50 network and a softmax classifier; the first ResNet-50 network is a ResNet-50 network pre-trained on ImageNet with the final fully connected layer removed, and an attention module SCA is added between the conv2 and conv3 convolutional blocks of this ResNet-50 network; the second ResNet-50 network is not pre-trained and has an attention module SCA added between its conv4 and conv5 convolutional blocks;
Step 4.1: inputting the cropped image training set into the first ResNet-50 network and the second ResNet-50 network respectively, where the first ResNet-50 network outputs a first weighted feature map f_A of each image and the second ResNet-50 network outputs a second weighted feature map f_B of each image;
The attention module SCA is used to extract, from an input feature map G, a feature map F_sc with weight distribution; the specific steps are as follows:
Step 4.1.1: for the feature map G input to the attention module SCA, generating a feature map F by a 1×1 convolution;
Step 4.1.2: reducing the dimensionality of the feature map F by global average pooling, assigning weights through a fully connected layer with parameter W_fc, then compressing the feature map along the channel direction into a single w×h×1 channel by a convolution operation, and generating a spatial attention map A_s with a sigmoid activation function:
A_s = σ(f^(7×7)(W_fc(GAP(F)) ⊙ F))
where G ∈ R^(w×h×c), w is the length of the feature map G, h is the width of the feature map G, and w×h represents the two-dimensional spatial size of the feature map G; c represents the number of channels; f^(7×7) represents a convolution with a 7×7 kernel; GAP(·) represents global average pooling; σ(·) represents the sigmoid activation function;
Step 4.1.3: fusing the spatial attention map A_s with the feature map F by element-wise multiplication to obtain the spatial attention feature map F_s:
F_s = A_s ⊙ F
Step 4.1.4: compressing the spatial attention feature map F_s along the spatial dimension w×h to generate the global compressed feature vector z_c of the spatial attention feature map F_s;
Step 4.1.5: obtaining the weight value of each channel of the spatial attention feature map F_s through two fully connected layers, activating with sigmoid, and obtaining the feature map F_sc with weight distribution:
F_sc = A ⊙ u_c
A = σ(W_s2 × tanh(W_s1 × z_c))
where σ(·) represents the sigmoid activation function and tanh(·) represents the tanh activation function; A is the weight-distribution feature vector; W_s1 is the weight of the first fully connected layer; W_s2 is the weight of the second fully connected layer; u_c represents the feature map of the c-th channel; ⊙ represents element-wise multiplication;
Step 4.2: performing a bilinear pooling operation on the first weighted feature map f_A and the second weighted feature map f_B of each image in the cropped image training set to obtain a bilinear feature vector of each image in the cropped image training set;
Step 4.3: inputting the bilinear feature vector of each image in the cropped image training set into the softmax classifier to obtain the category of the image;
Step 5: inputting the test set into the trained SC-B-CNNs model to obtain the classification result of the image data set to be classified.
CN202110179265.2A 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion Active CN112861970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110179265.2A CN112861970B (en) 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110179265.2A CN112861970B (en) 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion

Publications (2)

Publication Number Publication Date
CN112861970A CN112861970A (en) 2021-05-28
CN112861970B true CN112861970B (en) 2023-01-03

Family

ID=75989506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110179265.2A Active CN112861970B (en) 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion

Country Status (1)

Country Link
CN (1) CN112861970B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393371B (en) * 2021-06-28 2024-02-27 北京百度网讯科技有限公司 Image processing method and device and electronic equipment
CN113869347B (en) * 2021-07-20 2022-08-02 西安理工大学 Fine-grained classification method for severe weather image
CN113744292A (en) * 2021-09-16 2021-12-03 安徽世绿环保科技有限公司 Garbage classification station garbage throwing scanning system
CN114067316B (en) * 2021-11-23 2024-05-03 燕山大学 Rapid identification method based on fine-granularity image classification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN110866907A (en) * 2019-11-12 2020-03-06 中原工学院 Full convolution network fabric defect detection method based on attention mechanism

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8879855B2 (en) * 2012-08-17 2014-11-04 Nec Laboratories America, Inc. Image segmentation for large-scale fine-grained recognition
CN108898137B (en) * 2018-05-25 2022-04-12 黄凯 Natural image character recognition method and system based on deep neural network
CN110443116B (en) * 2019-06-19 2023-06-20 平安科技(深圳)有限公司 Video pedestrian detection method, device, server and storage medium
CN110826558B (en) * 2019-10-28 2022-11-11 桂林电子科技大学 Image classification method, computer device, and storage medium
CN111709265A (en) * 2019-12-11 2020-09-25 深学科技(杭州)有限公司 Camera monitoring state classification method based on attention mechanism residual error network
CN111210907A (en) * 2020-01-14 2020-05-29 西北工业大学 Pain intensity estimation method based on space-time attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN110866907A (en) * 2019-11-12 2020-03-06 中原工学院 Full convolution network fabric defect detection method based on attention mechanism

Also Published As

Publication number Publication date
CN112861970A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN110287960B (en) Method for detecting and identifying curve characters in natural scene image
CN107633513B (en) 3D image quality measuring method based on deep learning
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
CN109583483B (en) Target detection method and system based on convolutional neural network
CN114758383A (en) Expression recognition method based on attention modulation context spatial information
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN113269054B (en) Aerial video analysis method based on space-time 2D convolutional neural network
CN111046787A (en) Pedestrian detection method based on improved YOLO v3 model
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN112861917B (en) Weak supervision target detection method based on image attribute learning
CN109670555B (en) Instance-level pedestrian detection and pedestrian re-recognition system based on deep learning
CN111768415A (en) Image instance segmentation method without quantization pooling
CN112861785B (en) Instance segmentation and image restoration-based pedestrian re-identification method with shielding function
CN111652273A (en) Deep learning-based RGB-D image classification method
CN112396036A (en) Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction
CN115497122A (en) Method, device and equipment for re-identifying blocked pedestrian and computer-storable medium
CN112329771A (en) Building material sample identification method based on deep learning
CN114743126A (en) Lane line sign segmentation method based on graph attention machine mechanism network
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
CN116796248A (en) Forest health environment assessment system and method thereof
CN116543338A (en) Student classroom behavior detection method based on gaze target estimation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant