CN116797847A - Enhanced complementary fine-grained image classification network system - Google Patents

Enhanced complementary fine-grained image classification network system

Info

Publication number
CN116797847A
Authority
CN
China
Prior art keywords
network
area
complementary
reinforced
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310842868.5A
Other languages
Chinese (zh)
Inventor
胡静 (Hu Jing)
王芳 (Wang Fang)
王梦瑶 (Wang Mengyao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Science and Technology
Original Assignee
Taiyuan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Science and Technology filed Critical Taiyuan University of Science and Technology
Priority to CN202310842868.5A
Publication of CN116797847A

Landscapes

  • Image Analysis (AREA)

Abstract

An enhanced complementary fine-grained image classification network system, the classification network system comprising: a reinforced complementary learning network structure, in which a main network extracts features and drives two sub-networks to perform reinforced learning and complementary learning respectively, so that target features are extracted jointly and the target object is identified in a detailed and comprehensive manner; a DM driving module, which crops and amplifies the region with the greatest influence on the result, deletes that region from the original image, and sends the results into the reinforced complementary learning network structure, helping the model carry out end-to-end training; a DM loss function, which enables the DM driving module to locate key regions, continuously optimizes the position information of the reinforced complementary learning network structure, and provides accurate mask positions; and a verification data set, used to verify the performance of the reinforced complementary learning network structure.

Description

Enhanced complementary fine-grained image classification network system
Technical field:
The present application relates to an enhanced complementary fine-grained image classification network system.
Background art:
Image analysis uses mathematical models combined with image processing techniques to analyze low-level features and high-level structures and to extract information with a degree of intelligence. Image analysis focuses on constructing descriptions of images: it is more concerned with symbolizing images and reasoning over them with relevant knowledge than with computing on the images themselves. Image analysis is also closely related to research on human vision, and studying identifiable modules of the human visual mechanism may help improve computer vision capabilities.
Fine-grained images are an important component of image analysis. Because the objects to be identified are subclasses of the same category, the differences between subclasses are very subtle and concentrated in a few local regions, which makes fine-grained recognition highly challenging. When judging the target class, some fine-grained networks focus on a single region and lack the features of other auxiliary discriminative regions, so the target cannot be identified in a detailed and comprehensive manner.
Summary of the application:
The embodiment of the present application provides an enhanced complementary fine-grained image classification network system with a reasonable structural design. An enhanced complementary fine-grained image classification network is added: a main network performs feature extraction and drives two sub-networks to perform reinforced learning and complementary learning respectively. The learning method of the reinforcement model is used to acquire finer fine-grained image features, while attention erasure is used to acquire the complementary discriminative regions of the target, increasing the network's overall perception of the target. The performance of the system model is evaluated through verification experiments on several public data sets, achieving detailed and comprehensive identification of the target and improving the effect of fine-grained image recognition.
The technical scheme adopted by the application for solving the technical problems is as follows:
An enhanced complementary fine-grained image classification network system, the classification network system comprising:
a reinforced complementary learning network structure, in which a main network extracts features and drives the other two sub-networks to perform reinforced learning and complementary learning respectively, so that target features are extracted jointly and the target object is identified in a detailed and comprehensive manner;
a DM driving module, which crops and amplifies the region with the greatest influence on the result, deletes that region from the original image, and sends the results into the reinforced complementary learning network structure, helping the model carry out end-to-end training;
a DM loss function, which enables the DM driving module to locate key regions, continuously optimizes the position information of the reinforced complementary learning network structure, and provides accurate mask positions;
a verification data set, used to verify the performance of the reinforced complementary learning network structure, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
The backbone network of the reinforced complementary learning network structure is Inception-V3. The reinforced complementary learning network structure comprises a basic network, a reinforcement network, and a complementary network, so that three classification networks are constructed to aggregate the global and local characteristics of the target object; both the overall semantic information and the local semantic information of the object can be obtained. The features output by each network are then subjected to global average pooling, the pooled features are concatenated into a 6144-dimensional vector, a 200-dimensional classification layer is added to this vector for end-to-end training, and the classification result is finally obtained through Softmax.
The DM driving module crops and amplifies the region with the greatest influence on the result and sends it into the reinforcement network; the DM driving module also deletes the region with the greatest influence on the result from the original image and sends the result to the complementary network.
The DM driving module receives the feature map obtained after training the basic network, then generates a square region centered at (x, y) with half side length l, crops and amplifies this region, and sends it into the reinforcement network; an image mask is generated from the region and input into the complementary network for complementary learning.
The DM driving module consists of two fully connected layers; its input is a feature map and its output is the local region most important to the neural network, so the most important local region can be located automatically through the fully connected layers.
Given an image X, it is input into the trained convolution layers for feature extraction, where Tn denotes the overall network parameters. The whole process can be described as convolving, pooling, and activating X, finally generating a probability distribution p:
p(X) = f(Tn * X)
where f(·) denotes the fully connected layer, which converts the features extracted by the convolutional neural network into feature vectors and uses softmax to convert the vectors into probability values.
The initialization parameter calculation formula of the DM driving module is:
F = ∑_{n=1}^{d} F_n
where F_n denotes the n-th feature map output by the last layer of the convolutional neural network, d denotes the total number of feature maps, and F is the total feature map obtained by adding the feature maps together;
the average value comparison formula of the DM driving module is:
F̄ = (1/(h·w)) ∑_{i=1}^{h} ∑_{j=1}^{w} F_{i,j}
where h and w denote the height and width of the feature map respectively, and F̄ denotes the mean value of the feature map;
the initialization coordinates of the bounding box center are generated by comparing F̄ with F_{i,j}. After the initialization coordinates are obtained, the model automatically optimizes them during training; the region is then cropped and amplified to obtain a finer local region, which is sent into the reinforcement network for learning. The upper-left and lower-right corner coordinates of the local region are obtained from the center coordinates and the side length, the upper-left corner being denoted (t_lx, t_rx) and the lower-right corner (t_ly, t_ry), calculated as:
t_lx = x − l, t_rx = y − l
t_ly = x + l, t_ry = y + l.
The cropping operation can be seen as a multiplication between the original image and the mask, expressed as:
X_crop = X ⊙ M(·)
where X_crop is the cropped region, ⊙ denotes the cropping operation between the original image and the template, and M(·) is an attention mask whose expression is:
M(·) = [μ(i − t_lx) − μ(i − t_ly)] × [μ(j − t_rx) − μ(j − t_ry)]
where (i, j) is any point in the feature map; M(·) takes the value 1 if (i, j) lies inside the local region and 0 otherwise. μ(·) is a continuous, differentiable function whose expression is:
μ(x) = 1 / (1 + exp(−kx))
where k is a scaling factor.
The size of the extracted local region is enlarged using a bilinear interpolation algorithm, and the enlarged local region is obtained according to the ratio of the original image to the local region:
λ = X_a / X_p, X_local = Bilinear(X_crop, λ)
where X_p and X_a denote the areas of the local region and the whole image respectively, λ is the area ratio, and X_local is the enlarged local region.
The DM loss function is:
L_s = −(1/m) ∑_{i=1}^{m} log( exp(W_{y_i}ᵀ x_i + b_{y_i}) / ∑_{j=1}^{s} exp(W_jᵀ x_i + b_j) )
where m denotes the batch size, W denotes the weights of the fully connected layer, y_i denotes the category of the i-th picture, x_i denotes the feature vector of the i-th picture before the fully connected layer, b denotes the network bias, and s is the number of target categories. The loss continuously optimizes the position information of the reinforcement network while providing more accurate mask positions so that the complementary network can learn secondary features.
The verification method for the verification data set comprises the following steps:
S1, train the feature extraction network using the weights of the backbone network Inception-V3 pre-trained on ImageNet, retaining the parameters of the pooling, input, and convolution layers while removing the existing fully connected and softmax layers, and fine-tune the network on the training data used in this process;
S2, calculate the key region through the reinforced complementary learning network structure, find the coordinate information of the most critical region, and crop and amplify it to generate a finer training result.
According to the present application, the reinforced complementary learning network structure performs feature extraction with the main network and drives the other two sub-networks to perform reinforced learning and complementary learning respectively, so that target features are extracted jointly and the target object is identified in a detailed and comprehensive manner. The DM driving module crops and amplifies the region with the greatest influence on the result, deletes that region from the original image, and sends the results into the reinforced complementary learning network structure, helping the model carry out end-to-end training. The DM loss function enables the DM driving module to locate key regions, continuously optimizes the position information of the reinforced complementary learning network structure, and provides accurate mask positions. The performance of the reinforced complementary learning network structure is verified on the verification data sets. The method has the advantages of accuracy, practicality, and excellent performance.
Description of the drawings:
fig. 1 is a schematic structural view of the present application.
Fig. 2 is a diagram showing the network configuration of the present application.
The specific embodiment is as follows:
in order to clearly illustrate the technical features of the present solution, the present application will be described in detail below with reference to the following detailed description and the accompanying drawings.
As shown in figs. 1-2, an enhanced complementary fine-grained image classification network system comprises:
a reinforced complementary learning network structure, in which a main network extracts features and drives the other two sub-networks to perform reinforced learning and complementary learning respectively, so that target features are extracted jointly and the target object is identified in a detailed and comprehensive manner;
a DM driving module, which crops and amplifies the region with the greatest influence on the result, deletes that region from the original image, and sends the results into the reinforced complementary learning network structure, helping the model carry out end-to-end training;
a DM loss function, which enables the DM driving module to locate key regions, continuously optimizes the position information of the reinforced complementary learning network structure, and provides accurate mask positions;
a verification data set, used to verify the performance of the reinforced complementary learning network structure, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
The backbone network of the reinforced complementary learning network structure is Inception-V3. The reinforced complementary learning network structure comprises a basic network, a reinforcement network, and a complementary network, so that three classification networks are constructed to aggregate the global and local characteristics of the target object; both the overall semantic information and the local semantic information of the object can be obtained. The features output by each network are then subjected to global average pooling, the pooled features are concatenated into a 6144-dimensional vector, a 200-dimensional classification layer is added to this vector for end-to-end training, and the classification result is finally obtained through Softmax.
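For illustration only, a minimal PyTorch sketch of this aggregation head follows, assuming each of the three branches outputs a 2048-channel feature map (as Inception-V3 does); the class and variable names are illustrative and not part of the patent text:

    import torch
    import torch.nn as nn

    class AggregationHead(nn.Module):
        # Sketch: global-average-pool each branch's 2048-channel feature map,
        # concatenate to a 6144-dimensional vector, apply a 200-way
        # classification layer, and return the Softmax result.
        def __init__(self, channels=2048, num_classes=200):
            super().__init__()
            self.gap = nn.AdaptiveAvgPool2d(1)
            self.fc = nn.Linear(3 * channels, num_classes)

        def forward(self, f_basic, f_reinforce, f_complement):
            pooled = [self.gap(f).flatten(1) for f in (f_basic, f_reinforce, f_complement)]
            fused = torch.cat(pooled, dim=1)        # 6144-dimensional vector
            return torch.softmax(self.fc(fused), dim=1)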
The DM driving module crops and amplifies the region with the greatest influence on the result and sends it into the reinforcement network; the DM driving module also deletes the region with the greatest influence on the result from the original image and sends the result to the complementary network.
The DM driving module receives the feature map obtained after training the basic network, then generates a square region centered at (x, y) with half side length l, crops and amplifies this region, and sends it into the reinforcement network; an image mask is generated from the region and input into the complementary network for complementary learning.
The DM driving module consists of two fully connected layers; its input is a feature map and its output is the local region most important to the neural network, so the most important local region can be located automatically through the fully connected layers.
Given an image X, it is input into the trained convolution layers for feature extraction, where Tn denotes the overall network parameters. The whole process can be described as convolving, pooling, and activating X, finally generating a probability distribution p:
p(X) = f(Tn * X)
where f(·) denotes the fully connected layer, which converts the features extracted by the convolutional neural network into feature vectors and uses softmax to convert the vectors into probability values.
The initialization parameter calculation formula of the DM driving module is:
F = ∑_{n=1}^{d} F_n
where F_n denotes the n-th feature map output by the last layer of the convolutional neural network, d denotes the total number of feature maps, and F is the total feature map obtained by adding the feature maps together;
the average value comparison formula of the DM driving module is:
F̄ = (1/(h·w)) ∑_{i=1}^{h} ∑_{j=1}^{w} F_{i,j}
where h and w denote the height and width of the feature map respectively, and F̄ denotes the mean value of the feature map;
the initialization coordinates of the bounding box center are generated by comparing F̄ with F_{i,j}. After the initialization coordinates are obtained, the model automatically optimizes them during training; the region is then cropped and amplified to obtain a finer local region, which is sent into the reinforcement network for learning. The upper-left and lower-right corner coordinates of the local region are obtained from the center coordinates and the side length, the upper-left corner being denoted (t_lx, t_rx) and the lower-right corner (t_ly, t_ry), calculated as:
t_lx = x − l, t_rx = y − l
t_ly = x + l, t_ry = y + l.
The cropping operation can be seen as a multiplication between the original image and the mask, expressed as:
X_crop = X ⊙ M(·)
where X_crop is the cropped region, ⊙ denotes the cropping operation between the original image and the template, and M(·) is an attention mask whose expression is:
M(·) = [μ(i − t_lx) − μ(i − t_ly)] × [μ(j − t_rx) − μ(j − t_ry)]
where (i, j) is any point in the feature map; M(·) takes the value 1 if (i, j) lies inside the local region and 0 otherwise. μ(·) is a continuous, differentiable function whose expression is:
μ(x) = 1 / (1 + exp(−kx))
where k is a scaling factor.
The size of the extracted local region is enlarged using a bilinear interpolation algorithm, and the enlarged local region is obtained according to the ratio of the original image to the local region:
λ = X_a / X_p, X_local = Bilinear(X_crop, λ)
where X_p and X_a denote the areas of the local region and the whole image respectively, λ is the area ratio, and X_local is the enlarged local region.
The DM loss function is:
L_s = −(1/m) ∑_{i=1}^{m} log( exp(W_{y_i}ᵀ x_i + b_{y_i}) / ∑_{j=1}^{s} exp(W_jᵀ x_i + b_j) )
where m denotes the batch size, W denotes the weights of the fully connected layer, y_i denotes the category of the i-th picture, x_i denotes the feature vector of the i-th picture before the fully connected layer, b denotes the network bias, and s is the number of target categories. The loss continuously optimizes the position information of the reinforcement network while providing more accurate mask positions so that the complementary network can learn secondary features.
The verification method for the verification data set comprises the following steps:
S1, train the feature extraction network using the weights of the backbone network Inception-V3 pre-trained on ImageNet, retaining the parameters of the pooling, input, and convolution layers while removing the existing fully connected and softmax layers, and fine-tune the network on the training data used in this process;
S2, calculate the key region through the reinforced complementary learning network structure, find the coordinate information of the most critical region, and crop and amplify it to generate a finer training result.
The working principle of the enhanced complementary fine-grained image classification network system in the embodiment of the present application is as follows: an enhanced complementary fine-grained image classification network is added, in which the main network performs feature extraction and drives two sub-networks to perform reinforced learning and complementary learning respectively. The learning method of the reinforcement model is used to acquire finer fine-grained image features, while the complementary network acquires the complementary discriminative regions of the target through attention erasure, increasing the network's overall perception of the target. The performance of the system model is evaluated through verification experiments on several public data sets, achieving detailed and comprehensive recognition of the target and improving the effect of fine-grained image recognition, so the system can be widely applied to classification tasks.
For a recognition network, the features of interest tend to be concentrated in one region of the target, which becomes the most prominent feature for identifying the target. The model designed in the present application can recognize the target over a larger range; it no longer depends on a single salient feature and can also achieve detailed and comprehensive identification of the target by means of secondary features.
The reinforced complementary learning network structure comprises a basic network, a reinforcement network, and a complementary network, so that three classification networks are constructed to aggregate the global and local characteristics of the target object; both the overall semantic information and the local semantic information of the object can be obtained. The features output by each network are then subjected to global average pooling, the pooled features are concatenated into a 6144-dimensional vector, a 200-dimensional classification layer is added to this vector for end-to-end training, and the classification result is finally obtained through Softmax.
Because traditional neural networks do not exploit the advantages of deep neural networks for joint localization and recognition learning, the present application provides a DM driving module to help the backbone network find, during training, the rectangular region with the greatest influence on the result. At the same time, the DM driving module is computationally very cheap, and it helps the model perform end-to-end training.
The DM driving module receives the feature map obtained after training the basic network, then generates a square region centered at (x, y) with half side length l, crops and amplifies this region, and sends it into the reinforcement network; an image mask can be generated from the region and input into the complementary network for complementary learning.
In this process, the high-response area of the feature map is the key to obtaining the coordinates (x, y). The DM driving module consists of two fully connected layers; its input is a feature map and its output is the local region most important to the neural network, so the most important local region can be located automatically through the fully connected layers. The size of the bounding box is limited: it may not exceed 2/3 of the longest side of the whole image, and it may not be smaller than 1/3 of the shortest side of the image.
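A minimal sketch of this size constraint, with an illustrative function name and assuming l is given as half of the side length:

    import torch

    def clamp_half_side(l, img_h, img_w):
        # Box side 2*l is kept within [shortest_side/3, 2*longest_side/3].
        side = torch.clamp(2 * l,
                           min=min(img_h, img_w) / 3.0,
                           max=2.0 * max(img_h, img_w) / 3.0)
        return side / 2.0   # back to half side length l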
Specifically, given an image X, it is input into the trained convolution layers for feature extraction, where Tn denotes the overall network parameters. The whole process can be described as convolving, pooling, and activating X, finally generating a probability distribution p:
p(X) = f(Tn * X)
where f(·) denotes the fully connected layer, which converts the features extracted by the convolutional neural network into feature vectors and uses softmax to convert the vectors into probability values.
The next step is to generate the position and length parameters of the square bounding box:
[x, y, l] = g(Tn * X)
where x and y are the center coordinates of the bounding box in X and l is half of the side length; g(·) denotes the DM driving module, whose structure consists of two fully connected layers. Because the weight parameters used to initialize the network have a great influence on the model, the feature maps output by the last layer of the basic network are added together; the richer the semantic information of the feature map, the more accurate the generated bounding box.
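For illustration, a sketch of g(·) as two fully connected layers follows; the hidden width and the sigmoid scaling to pixel coordinates are assumptions, not taken from the patent text:

    import torch
    import torch.nn as nn

    class DMDriver(nn.Module):
        # Two fully connected layers mapping the pooled feature map to (x, y, l).
        def __init__(self, in_channels=2048, hidden=512):
            super().__init__()
            self.fc1 = nn.Linear(in_channels, hidden)
            self.fc2 = nn.Linear(hidden, 3)

        def forward(self, feat, img_size):
            v = feat.mean(dim=(2, 3))               # pool the (B, C, H, W) feature map
            xyl = torch.sigmoid(self.fc2(torch.relu(self.fc1(v))))
            return xyl * img_size                   # (x, y, l) in pixel units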
The calculation formula of the initialization parameters of the DM driving module is:
F = ∑_{n=1}^{d} F_n
where F_n denotes the n-th feature map output by the last layer of the convolutional neural network, d denotes the total number of feature maps, and F is the total feature map obtained by adding the feature maps together.
Further, the average value comparison formula of the DM driving module is:
F̄ = (1/(h·w)) ∑_{i=1}^{h} ∑_{j=1}^{w} F_{i,j}
where h and w denote the height and width of the feature map respectively, and F̄ denotes the mean value of the feature map;
the initialization coordinates of the bounding box center are generated by comparing F̄ with F_{i,j}. After the initialization coordinates are obtained, the model automatically optimizes them during training; the region is then cropped and amplified to obtain a finer local region, which is sent into the reinforcement network for learning. The upper-left and lower-right corner coordinates of the local region are obtained from the center coordinates and the side length, the upper-left corner being denoted (t_lx, t_rx) and the lower-right corner (t_ly, t_ry), calculated as:
t_lx = x − l, t_rx = y − l
t_ly = x + l, t_ry = y + l.
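A sketch of this initialization under one assumption: the comparison with the mean is reduced to a single coordinate pair by taking the centroid of the high-response positions:

    import torch

    def init_center_and_corners(feature_maps, l):
        # feature_maps: (d, h, w) maps from the last convolutional layer.
        F = feature_maps.sum(dim=0)                  # total feature map
        ys, xs = torch.nonzero(F > F.mean(), as_tuple=True)
        x, y = xs.float().mean(), ys.float().mean()  # initialization center (assumed centroid)
        return (x - l, y - l), (x + l, y + l)        # (t_lx, t_rx), (t_ly, t_ry)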
When the corresponding coordinate information is obtained, the cropping operation can be regarded as a multiplication between the original image and the mask, expressed as:
X_crop = X ⊙ M(·)
where X_crop is the cropped region, ⊙ denotes the cropping operation between the original image and the template, and M(·) is an attention mask whose expression is:
M(·) = [μ(i − t_lx) − μ(i − t_ly)] × [μ(j − t_rx) − μ(j − t_ry)]
where (i, j) is any point in the feature map; M(·) takes the value 1 if (i, j) lies inside the local region and 0 otherwise. μ(·) is a continuous, differentiable function whose expression is:
μ(x) = 1 / (1 + exp(−kx))
where k is a scaling factor.
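A sketch of the attention mask, assuming the logistic form of μ(·) with an assumed steepness factor k:

    import torch

    def attention_mask(h, w, t_lx, t_ly, t_rx, t_ry, k=10.0):
        mu = lambda z: torch.sigmoid(k * z)          # mu(x) = 1 / (1 + exp(-k*x))
        i = torch.arange(h, dtype=torch.float32).view(h, 1)
        j = torch.arange(w, dtype=torch.float32).view(1, w)
        # ~1 inside the located square, ~0 outside
        return (mu(i - t_lx) - mu(i - t_ly)) * (mu(j - t_rx) - mu(j - t_ry))

    # The crop is then the element-wise product: X_crop = X * attention_mask(...)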
In order to crop and amplify the picture, the size of the extracted local region is enlarged using bilinear interpolation, and the enlarged local region can be obtained according to the ratio of the original image to the local region:
λ = X_a / X_p, X_local = Bilinear(X_crop, λ)
where X_p and X_a denote the areas of the local region and the whole image respectively, λ is the area ratio, and X_local is the enlarged local region.
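A sketch of the crop-and-zoom step using bilinear interpolation; the output resolution is an assumed value:

    import torch.nn.functional as F

    def crop_and_zoom(x, t_lx, t_ly, t_rx, t_ry, out_size=448):
        # x: image tensor (B, C, H, W); rows span [t_lx, t_ly), columns [t_rx, t_ry).
        region = x[:, :, int(t_lx):int(t_ly), int(t_rx):int(t_ry)]
        return F.interpolate(region, size=(out_size, out_size),
                             mode="bilinear", align_corners=False)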
Similarly, in order to train the complementary network, the generated local region is turned into a mask picture: the pixels inside the mask are set uniformly to the mean pixel value of the original image, and the rest are replaced by white pixels, with the specific formula:
X_mask(i, j) = mean(X) if M(i, j) = 1, and 255 (white) otherwise.
Then, according to the previously obtained position information, the mask is erased from the original image, and the resulting masked image is sent to the complementary model for training:
X_erase = X ⊙ (1 − M(·)) + X_mask ⊙ M(·)
In the above formulas, the values at each position of the pixel matrix formed by the original image represent different pixels; a 1 in the mask image represents a black pixel whose RGB channel values are (0, 0, 0). The computation is performed position by position between the original image and the mask image: where the mask image is black, the original image pixel is kept directly, and elsewhere the mask image pixel replaces the original image pixel, yielding the image with the key region erased.
For the DM loss function: since a suitable loss function has a positive effect on model training, and the loss function commonly used for fine-grained image recognition is the softmax loss, the specific formula is:
L_s = −(1/m) ∑_{i=1}^{m} log( exp(W_{y_i}ᵀ x_i + b_{y_i}) / ∑_{j=1}^{s} exp(W_jᵀ x_i + b_j) )
where m denotes the batch size, W denotes the weights of the fully connected layer, y_i denotes the category of the i-th picture, x_i denotes the feature vector of the i-th picture before the fully connected layer, b denotes the network bias, and s is the number of target categories. The loss continuously optimizes the position information of the reinforcement network while providing more accurate mask positions so that the complementary network can learn secondary features.
To help the reinforced complementary learning network structure find more accurate features, the probability value p_k output by the trunk model for a sample is compared with the probability value p_{k+1} generated by the reinforcement model: when p_k < p_{k+1}, no loss is generated; when p_k > p_{k+1}, a ranking loss L_rank is generated. This loss function therefore helps the reinforcement network find more accurate features, and once accurate features are extracted it in turn helps the backbone network localize more precisely, so the two networks reinforce each other.
Meanwhile, in the complementary model, because the features extracted by the trunk are deleted, the features extracted by the trunk model have no connection with the complementary model; rather, the trunk model provides accurate local regions that help the complementary model learn secondary features, so only the trunk model and the reinforcement model can locate the key local regions. From the above, the total loss of the model is:
L_total = L_s + β·L_rank
where β is the modulation factor that balances the two loss functions.
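A sketch of the two losses, assuming a pairwise ranking form with a small margin for the comparison between p_k and p_{k+1}; the margin and the modulation factor value are assumptions:

    import torch
    import torch.nn.functional as F

    def rank_loss(p_trunk, p_reinforce, margin=0.05):
        # Loss arises only when the reinforcement model is not more confident
        # than the trunk model on the true class.
        return torch.clamp(p_trunk - p_reinforce + margin, min=0.0).mean()

    def total_loss(logits, targets, p_trunk, p_reinforce, beta=1.0):
        # L_total = L_s + beta * L_rank (beta: modulation factor, assumed 1.0)
        return F.cross_entropy(logits, targets) + beta * rank_loss(p_trunk, p_reinforce)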
The verification data sets in the present application mainly comprise CUB-200-2011, Stanford Cars, and FGVC-Aircraft. In order to improve the model's attention to secondary features, a small amount of data may require multiple key-region erasures before all features are obtained, so erasure experiments need to be carried out on the data sets used to find a suitable number of erasures.
The verification method for the verification data set comprises the following steps: train the feature extraction network using the weights of the backbone network Inception-V3 pre-trained on ImageNet, retaining the parameters of the pooling, input, and convolution layers while removing the existing fully connected and softmax layers, and fine-tune the network on the training data used in this process; then calculate the key region through the reinforced complementary learning network structure, find the coordinate information of the most critical region, and crop and amplify it to generate a finer training result.
Specifically, the experiments were performed under PyTorch version 1.7.1, with an Nvidia GeForce 3060 Ti GPU and an i7-10700K CPU. SGD was selected as the optimizer, with an initial learning rate of 0.0001, a momentum hyperparameter of 0.9, a batch size of 32, and 200 training epochs.
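For illustration, the reported configuration can be set up as follows; the tiny stand-in model and the random batch are placeholders for the real network and data loader:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 299 * 299, 200))  # stand-in network
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)

    images = torch.randn(32, 3, 299, 299)           # batch_size = 32
    labels = torch.randint(0, 200, (32,))
    for epoch in range(200):                        # epoch set to 200
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()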
In summary, the enhanced complementary fine-grained image classification network system in the embodiment of the present application adds an enhanced complementary fine-grained image classification network: the main network performs feature extraction and drives the two sub-networks to perform reinforced learning and complementary learning respectively. The learning method of the reinforcement model is used to acquire finer fine-grained image features, while attention erasure is used to acquire the complementary discriminative regions of the target, increasing the network's overall perception of the target. The performance of the system model is evaluated through verification experiments on several public data sets, achieving detailed and comprehensive identification of the target and improving the effect of fine-grained image recognition, so the system can be widely applied to classification tasks.
The above embodiments are not to be taken as limiting the scope of the application, and any alternatives or modifications to the embodiments of the application will be apparent to those skilled in the art and fall within the scope of the application.
Matters not described in detail in the present application are well known to those skilled in the art.

Claims (10)

1. An enhanced complementary fine-grained image classification network system, the classification network system comprising:
a reinforced complementary learning network structure, in which a main network extracts features and drives the other two sub-networks to perform reinforced learning and complementary learning respectively, so that target features are extracted jointly and the target object is identified in a detailed and comprehensive manner;
a DM driving module, which crops and amplifies the region with the greatest influence on the result, deletes that region from the original image, and sends the results into the reinforced complementary learning network structure, helping the model carry out end-to-end training;
a DM loss function, which enables the DM driving module to locate key regions, continuously optimizes the position information of the reinforced complementary learning network structure, and provides accurate mask positions;
a verification data set, used to verify the performance of the reinforced complementary learning network structure, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
2. The enhanced complementary fine-grained image classification network system of claim 1, wherein: the backbone network of the reinforced complementary learning network structure is Inception-V3; the reinforced complementary learning network structure comprises a basic network, a reinforcement network, and a complementary network, so that three classification networks are constructed to aggregate the global and local characteristics of the target object; both the overall semantic information and the local semantic information of the object can be obtained; the features output by each network are then subjected to global average pooling, the pooled features are concatenated into a 6144-dimensional vector, a 200-dimensional classification layer is added to this vector for end-to-end training, and the classification result is finally obtained through Softmax.
3. The enhanced complementary fine-grained image classification network system of claim 2, wherein: the DM driving module crops and amplifies the region with the greatest influence on the result and sends it into the reinforcement network; and the DM driving module deletes the region with the greatest influence on the result from the original image and sends the result to the complementary network.
4. The enhanced complementary fine-grained image classification network system according to claim 3, wherein: the DM driving module receives the feature map obtained after training the basic network, then generates a square region centered at (x, y) with half side length l, crops and amplifies this region, and sends it into the reinforcement network; an image mask is generated from the region and input into the complementary network for complementary learning.
5. The enhanced complementary fine-grained image classification network system according to claim 4, wherein: the DM driving module consists of two fully connected layers; its input is a feature map and its output is the local region most important to the neural network, so the most important local region can be located automatically through the fully connected layers;
given an image X, it is input into the trained convolution layers for feature extraction, where Tn denotes the overall network parameters; the whole process can be described as convolving, pooling, and activating X, finally generating a probability distribution p:
p(X) = f(Tn * X)
where f(·) denotes the fully connected layer, which converts the features extracted by the convolutional neural network into feature vectors and uses softmax to convert the vectors into probability values.
6. The enhanced complementary fine-grained image classification network system according to claim 5, wherein the initialization parameter calculation formula of the DM driving module is:
F = ∑_{n=1}^{d} F_n
where F_n denotes the n-th feature map output by the last layer of the convolutional neural network, d denotes the total number of feature maps, and F is the total feature map obtained by adding the feature maps together;
the average value comparison formula of the DM driving module is:
F̄ = (1/(h·w)) ∑_{i=1}^{h} ∑_{j=1}^{w} F_{i,j}
where h and w denote the height and width of the feature map respectively, and F̄ denotes the mean value of the feature map;
the initialization coordinates of the bounding box center are generated by comparing F̄ with F_{i,j}; after the initialization coordinates are obtained, the model automatically optimizes them during training, and the region is then cropped and amplified to obtain a finer local region, which is sent into the reinforcement network for learning; the upper-left and lower-right corner coordinates of the local region are obtained from the center coordinates and the side length, the upper-left corner being denoted (t_lx, t_rx) and the lower-right corner (t_ly, t_ry), calculated as:
t_lx = x − l, t_rx = y − l
t_ly = x + l, t_ry = y + l.
7. The enhanced complementary fine-grained image classification network system according to claim 6, wherein the cropping operation can be regarded as a multiplication between the original image and the mask, expressed as:
X_crop = X ⊙ M(·)
where X_crop is the cropped region, ⊙ denotes the cropping operation between the original image and the template, and M(·) is an attention mask whose expression is:
M(·) = [μ(i − t_lx) − μ(i − t_ly)] × [μ(j − t_rx) − μ(j − t_ry)]
where (i, j) is any point in the feature map; M(·) takes the value 1 if (i, j) lies inside the local region and 0 otherwise; μ(·) is a continuous, differentiable function whose expression is:
μ(x) = 1 / (1 + exp(−kx))
where k is a scaling factor.
8. The enhanced complementary fine-grained image classification network system according to claim 7, wherein the size of the extracted local region is enlarged by a bilinear interpolation algorithm, and the enlarged local region is obtained according to the ratio of the original image to the local region:
λ = X_a / X_p, X_local = Bilinear(X_crop, λ)
where X_p and X_a denote the areas of the local region and the whole image respectively, λ is the area ratio, and X_local is the enlarged local region.
9. The enhanced complementary fine-grained image classification network system according to claim 1, wherein the DM loss function is:
L_s = −(1/m) ∑_{i=1}^{m} log( exp(W_{y_i}ᵀ x_i + b_{y_i}) / ∑_{j=1}^{s} exp(W_jᵀ x_i + b_j) )
where m denotes the batch size, W denotes the weights of the fully connected layer, y_i denotes the category of the i-th picture, x_i denotes the feature vector of the i-th picture before the fully connected layer, b denotes the network bias, and s is the number of target categories; the loss continuously optimizes the position information of the reinforcement network while providing more accurate mask positions so that the complementary network can learn secondary features.
10. The enhanced complementary fine-grained image classification network system of claim 1, wherein the verification method for the verification data set comprises the following steps:
S1, train the feature extraction network using the weights of the backbone network Inception-V3 pre-trained on ImageNet, retaining the parameters of the pooling, input, and convolution layers while removing the existing fully connected and softmax layers, and fine-tune the network on the training data used in this process;
S2, calculate the key region through the reinforced complementary learning network structure, find the coordinate information of the most critical region, and crop and amplify it to generate a finer training result.
CN202310842868.5A 2023-07-10 2023-07-10 Enhanced complementary fine-grained image classification network system Pending CN116797847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310842868.5A CN116797847A (en) 2023-07-10 2023-07-10 Enhanced complementary fine-grained image classification network system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310842868.5A CN116797847A (en) 2023-07-10 2023-07-10 Enhanced complementary fine-grained image classification network system

Publications (1)

Publication Number Publication Date
CN116797847A (en) 2023-09-22

Family

ID=88044937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310842868.5A Pending CN116797847A (en) 2023-07-10 2023-07-10 Enhanced complementary fine-grained image classification network system

Country Status (1)

Country Link
CN (1) CN116797847A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination