CN113139587A - Double-quadratic pooling model for self-adaptive interactive structure learning - Google Patents

Info

Publication number
CN113139587A
Authority
CN
China
Prior art keywords
pooling
model
features
layer
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110350164.7A
Other languages
Chinese (zh)
Other versions
CN113139587B (en)
Inventor
谭敏
袁富
俞俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Application filed by Hangzhou Dianzi University
Priority to CN202110350164.7A
Publication of CN113139587A
Application granted
Publication of CN113139587B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a biquadratic pooling model for adaptive interaction structure learning. The method comprises the following steps. First, multi-level deep features of an image are extracted with a hierarchical depth model; after several groups of biquadratic pooling features are obtained between cross-level features, a weight vector whose dimension equals the number of pooling groups is constructed, a module that multiplies the weights with the pooling features is added to the deep network, and classification is performed on the weighted pooling features. Second, an L1-norm sparse constraint is applied to the whole weight vector. Third, a supervision module is designed that builds a classification loss on all the weighted pooling features. Finally, a multi-task end-to-end deep learning model is established according to these steps, the whole network is trained and fine-tuned on a specific data set, and the performance of the final model is tested on a test set. The invention can adaptively mine the interaction structure best suited to a specific data set, and has strong practicability and generality.

Description

Double-quadratic pooling model for self-adaptive interactive structure learning
Technical Field
The invention relates to the field of fine-grained image classification, and in particular to adaptive interaction structure learning based on biquadratic pooling, which achieves competitive classification results on common benchmark data sets using visual information alone.
Background
Image classification is a popular research topic in computer vision. With the progress of deep learning, fine-grained image classification has received considerable attention, and the technology plays an important role in fields such as the protection of endangered species, commodity identification and vehicle management for traffic violations. Many deep-learning-based fine-grained classification methods have been proposed in recent years. Fine-grained image classification aims to distinguish objects of different sub-categories within a general category, e.g. different species of birds or dogs, or different models of cars. The task is very challenging because objects from similar sub-categories may differ only slightly, while objects of the same sub-category may vary greatly in appearance owing to different shooting scales or viewpoints, object poses, complex backgrounds and occlusion. Fine-grained image classification therefore still faces significant challenges.
Fine-grained image classification methods can be divided into two categories, strongly supervised and weakly supervised, according to whether manual annotation is used. Strongly supervised fine-grained classification requires annotation information during training, mainly bounding boxes and part locations, and uses it to localize parts accurately and extract the foreground object; however, manual annotation is expensive, which limits the practicability of such algorithms.
Weakly supervised fine-grained classification requires only the class label of each image, so its application scenarios are wider. Most algorithms in recent years address weakly supervised fine-grained image classification and have achieved major breakthroughs; a common approach is high-order pooling. A series of algorithms have been derived from the idea of high-order pooling, such as the Bilinear Convolutional Neural Network (BCNN), Hierarchical Bilinear Pooling (HBP) and biquadratic pooling (HQP). These models aim to fully exploit the discriminative information in images. However, existing pooling methods usually consider only fixed feature interactions; they neither fully explore the complementarity of feature interactions across different levels and scales of a deep neural network, nor consider how to select the most suitable feature combination or interaction structure from multiple groups of pooled features.
Disclosure of Invention
Based on the ideas of biquadratic pooling (HQP) and adaptive learning, the invention provides a biquadratic pooling model for adaptive interaction structure learning. The method integrates adaptive interaction structure selection and image classification into a unified multi-task model framework, can be trained end to end, and achieves competitive classification accuracy on common fine-grained classification benchmark data sets. It comprises the following steps:
Step (1): Image data preprocessing
Because the images in an existing data set differ in size, they are resized and subjected to conventional data augmentation before model training, so that all images have the same size.
Step (2): Constructing a hierarchical depth model based on biquadratic pooling multi-scale feature interaction
In a convolutional neural network, the outputs of different levels contain target features of different granularities; features from coarse to fine correspond to the outputs of the model from deep to shallow layers. Fusing the features of different levels with the biquadratic pooling (HQP) method effectively extracts the detail features that are critical for classification.
Step (3): Constructing the weight vector
A plurality of biquadratic pooling features of the preprocessed image are extracted with the hierarchical depth model, and a weight vector whose dimension equals the number of biquadratic pooling features is constructed. Weighted pooling features are added to the hierarchical depth model; each weighted pooling feature is obtained by multiplying a biquadratic pooling feature by the corresponding entry of the weight vector.
Step (4): Sparsely constraining the weight vector
An L1-norm sparse constraint is applied to the weight vector while the hierarchical depth model is trained, so that the model reaches good classification performance more easily while the weighted pooling features are optimized.
Step (5): Designing the supervision module
To guarantee convergence of the hierarchical depth model and a stable gradient flow during training, a supervision module is designed that constructs a global classification loss from all the weighted pooling features.
Step (6): Model training and testing
A multi-task end-to-end hierarchical depth model is established according to the above steps, the whole model is trained and fine-tuned on a specified data set, and the performance of the final hierarchical depth model is tested on a test set.
The image data preprocessing in step (1) is as follows:
Because the images in the data sets differ in size, all images are first resized to a specified size by bilinear interpolation; the best specified size differs between data sets. The resized image is then randomly cropped to 448 × 448, and the cropped image is flipped horizontally with a probability of 50%. Finally, the image is normalized.
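By way of illustration only, the crop, flip and normalization operations described above might be sketched as follows. The sketch assumes the image has already been resized by bilinear interpolation; the function name `augment` and the ImageNet channel statistics used for normalization are assumptions, since the text only states that the image is normalized.

```python
import numpy as np

# Assumed ImageNet channel statistics; the text does not specify the normalization.
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def augment(image, rng, crop=448, flip_prob=0.5):
    """image: float array (3, H, W) in [0, 1], already resized so H, W >= crop."""
    _, h, w = image.shape
    top = rng.integers(0, h - crop + 1)    # random crop origin (upper bound exclusive)
    left = rng.integers(0, w - crop + 1)
    patch = image[:, top:top + crop, left:left + crop]
    if rng.random() < flip_prob:
        patch = patch[:, :, ::-1]          # horizontal flip
    return (patch - MEAN) / STD            # per-channel normalization

rng = np.random.default_rng(0)
resized = rng.random((3, 600, 600))        # stand-in for a bilinearly resized image
out = augment(resized, rng)
```

At test time the same function could be called with `flip_prob=0.0`, in line with the statement that testing does not use random horizontal flipping.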
The hierarchical depth model based on biquadratic pooling multi-scale feature interaction in step (2) is established as follows:
In a convolutional neural network, the spatial size of the features output by the convolutional layers decreases gradually as the network goes from shallow to deep. The convolutional neural network is therefore divided into stages, such that convolutional layers outputting features of the same size belong to the same stage.
2-1. In the same convolutional neural network, the last three stages are selected and called the low-level, middle-level and high-level stages in order of decreasing feature size. The features of one or more convolutional layers are selected from each stage, and the features selected in the three stages are called the low-level, middle-level and high-level feature groups, respectively. The low-level feature group contains at least one convolutional layer feature and at most all convolutional layer features of the low-level stage; the middle-level and high-level feature groups each contain at least two convolutional layer features and at most all convolutional layer features of the corresponding stage. The features in the low-level and middle-level feature groups are then adjusted by a residual downsampling module so that their size matches the feature size of the high-level feature group.
The residual downsampling module is as follows:
The residual downsampling structure has two branches. The main branch contains a max pooling layer of size k × k with stride k, followed by a convolutional layer whose kernel size and stride are both 1. The residual branch contains a convolutional layer whose kernel size and stride are both k, and compensates for the information lost by the max pooling in the main branch. Finally, the features of the two branches are added and passed through a normalization layer.
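The two-branch structure above can be illustrated with a minimal NumPy sketch. The function names and the random weights are hypothetical, and the final per-channel standardization merely stands in for the unspecified normalization layer; it is an assumption, not the layer used by the invention.

```python
import numpy as np

def max_pool(x, k):
    # x: (C, H, W) with H, W divisible by k; non-overlapping k x k windows.
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def conv1x1(x, weight):
    # weight: (C_out, C_in); convolution with kernel size 1 and stride 1.
    return np.einsum("oc,chw->ohw", weight, x)

def conv_k_stride_k(x, weight, k):
    # weight: (C_out, C_in, k, k); kernel size k and stride k, so windows do not overlap.
    c, h, w = x.shape
    patches = x.reshape(c, h // k, k, w // k, k)
    return np.einsum("ocij,chiwj->ohw", weight, patches)

def residual_downsample(x, w_main, w_res, k, eps=1e-5):
    main = conv1x1(max_pool(x, k), w_main)     # main branch: max pool, then 1 x 1 conv
    res = conv_k_stride_k(x, w_res, k)         # residual branch: k x k conv, stride k
    y = main + res
    mean = y.mean(axis=(1, 2), keepdims=True)  # assumed normalization: per-channel
    std = y.std(axis=(1, 2), keepdims=True)    # standardization over spatial positions
    return (y - mean) / (std + eps)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))        # small stand-in for a 128 x 56 x 56 feature
w_main = rng.standard_normal((16, 8))
w_res = rng.standard_normal((16, 8, 4, 4))
out = residual_downsample(feat, w_main, w_res, k=4)   # shape (16, 4, 4)
```

Both branches change the channel count and reduce the spatial size by a factor of k, which is what allows their outputs to be added.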
2-2. Biquadratic pooling is performed on the features across the low-level, middle-level and high-level feature groups. After the new low-level and middle-level feature groups are obtained through the residual downsampling module in step 2-1, the inner product is first taken pairwise, across levels, between the features of different feature groups, so that the features of the different groups interact with each other. The matrix outer product of each interacted feature with its own transpose is then computed to obtain the biquadratic pooling features, referred to below as pooling features, which yields the hierarchical depth model based on biquadratic pooling.
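A minimal sketch of one such pooling operation is given below. It assumes that the cross-level inner product is an element-wise product, because the detailed description states that the feature size is unchanged by it; the function name `biquadratic_pool` is hypothetical.

```python
import numpy as np

def biquadratic_pool(x, y):
    """x, y: (C, H, W) features from two different feature groups, same shape."""
    c = x.shape[0]
    z = (x * y).reshape(c, -1)     # cross-level interaction, reshaped to (C, H*W)
    return (z @ z.T).reshape(-1)   # product with its own transpose, flattened to (C*C,)

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 4, 4))
b = rng.standard_normal((8, 4, 4))
feat = biquadratic_pool(a, b)      # length 8 * 8 = 64
```

With C = 512 and a 14 × 14 spatial grid, the same operations give the 512 × 196 intermediate and the 262144-dimensional pooled feature described in the detailed description.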
The weight vector construction process in step (3) is as follows:
3-1. After the hierarchical depth model established in step 2-2 generates a plurality of pooling features, a weight vector whose dimension equals the number of pooling features is constructed.
3-2. Because the importance of a pooling feature is positively correlated with the saliency of the output visual features, in the first round of training the mean value of each pooling feature produced by the hierarchical depth model is computed and the weight vector is initialized with these mean values. During the training iterations the weight vector w is normalized so that each of its values lies in [0,1], according to the following formula:
Figure BDA0003002180180000041
where max(·) and min(·) take the maximum and minimum, respectively, over all values in the weight vector, and ReLU(w) denotes the rectified linear unit activation function.
3-3. The normalized weight vector is multiplied element by element with the corresponding pooling features to obtain the weighted pooling features.
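Steps 3-1 to 3-3 might be sketched as follows. Because the normalization formula is only available as an image, the min-max normalization of ReLU(w) used here is an assumption that is merely consistent with the description (it uses max, min and ReLU and maps the values into [0,1]); the function names and the small constant `eps` are likewise hypothetical.

```python
import numpy as np

def init_weights(pooled):
    # pooled: (G, D), one row per pooling feature; one weight per row, set to its mean.
    return pooled.mean(axis=1)

def normalize_weights(w, eps=1e-12):
    # Assumed form of the normalization: min-max scaling applied to ReLU(w).
    r = np.maximum(w, 0.0)
    return (r - r.min()) / (r.max() - r.min() + eps)

def weight_features(pooled, w):
    # Multiply each pooling feature by its normalized weight.
    return normalize_weights(w)[:, None] * pooled

pooled = np.array([[1.0, 3.0], [2.0, 6.0], [0.5, 0.5]])
w = init_weights(pooled)                 # means of the rows: 2.0, 4.0, 0.5
weighted = weight_features(pooled, w)    # same shape as pooled
```

Under this assumed form, the pooling feature with the smallest weight is scaled to zero and the one with the largest weight is left essentially unchanged.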
The sparse constraint in step (4) is an L1-norm regularization applied to the weight vector during model training, which keeps the weight vector sparse and improves the final classification performance of the model.
The supervision module in step (5) uses all the weighted pooling features to construct a global classification loss.
Because concatenating all the weighted pooling features would yield a very high dimension, the weighted pooling features are instead averaged and then classified through a single fully connected layer. This branch is called the supervision module because: 1) it provides smooth gradients to all network sub-branches involved in the weighted pooling features, which facilitates stable training; 2) it helps to learn a more reasonable weight vector by minimizing the overall loss over all weighted pooling features; 3) its global classification loss is used only to supervise training and is ignored at test time. The supervision module thus ensures stable training and a reliable weight vector.
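The supervision branch described above can be sketched as an average over the weighted pooling features followed by one fully connected layer and a cross-entropy loss. The function names and the parameters `fc_weight` and `fc_bias` are hypothetical placeholders for the trainable layer.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single sample.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def supervision_loss(weighted, fc_weight, fc_bias, label):
    """weighted: (G, D) weighted pooling features of one image."""
    mean_feat = weighted.mean(axis=0)          # average over the G features -> (D,)
    logits = fc_weight @ mean_feat + fc_bias   # single fully connected layer
    return cross_entropy(logits, label)

rng = np.random.default_rng(0)
weighted = rng.standard_normal((15, 32))       # 15 features, reduced dimension for the demo
fc_weight = rng.standard_normal((5, 32)) * 0.01
fc_bias = np.zeros(5)
loss = supervision_loss(weighted, fc_weight, fc_bias, label=2)
```

Averaging keeps the classifier input at the size of a single pooling feature instead of G times that size.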
Constructing the multi-task deep learning model in step (6) means that, after the end-to-end framework is established according to steps (2), (3), (4) and (5), the actual classification loss, the global classification loss of the supervision module and the sparse constraint on the weight vector are optimized simultaneously on a specified data set.
The actual classification loss is constructed as follows:
The weighted pooling features corresponding to the K largest values in the weight vector are selected and concatenated, and the concatenated feature is passed through a single fully connected layer for the final classification; the resulting classification loss is called the actual classification loss. The value of K is at least 1 and at most the number of weighted pooling features.
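The top-K selection and concatenation can be illustrated with the short sketch below; the function name `select_top_k` is hypothetical, and the subsequent fully connected layer is omitted.

```python
import numpy as np

def select_top_k(weighted, w, k):
    """weighted: (G, D) weighted pooling features; w: (G,) weights; 1 <= k <= G."""
    order = np.argsort(-w, kind="stable")[:k]   # indices of the K largest weights
    return weighted[order].reshape(-1), order   # concatenated feature of length K * D

weighted = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
w = np.array([0.1, 0.9, 0.5])
feature, chosen = select_top_k(weighted, w, k=2)
```

Here the features with weights 0.9 and 0.5 are kept, in that order, and the feature with weight 0.1 is dropped.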
First, a hierarchical depth model is constructed from a specific convolutional neural network, and the final biquadratic pooling model for adaptive interaction structure learning is obtained by adding the weight vector, the sparse constraint module and the supervision module to the hierarchical depth model. During training, the parameters of the convolutional neural network part, pre-trained on the ImageNet data set, are first frozen and only the parameters of the newly added modules are trained; the whole network is then fine-tuned to obtain the final model, whose performance is evaluated on the test set. The optimization objective of the whole model is as follows:
Figure BDA0003002180180000051
where θ and w denote the parameters and the weight vector of the model, respectively; y_s denotes the label of sample s;
Figure BDA0003002180180000052
denote, respectively, the actual classification output of the model for sample s and the global classification output of the supervision module; α, λ, λ' and δ are the coefficients that balance the loss terms; and N is the number of images in the training set.
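As a rough illustration of how the three loss components named above might be combined, the sketch below averages the two classification losses over the training samples and adds an L1 penalty on the weight vector. The exact objective is only available as an image, so the form of the sum, the function name and the coefficient names `coef_actual`, `coef_global` and `coef_l1` are assumptions and do not claim to reproduce the roles of α, λ, λ' and δ.

```python
import numpy as np

def total_loss(actual_losses, global_losses, w,
               coef_actual=1.0, coef_global=1.0, coef_l1=1e-3):
    """actual_losses, global_losses: per-sample losses of length N; w: weight vector."""
    n = len(actual_losses)
    data_term = (coef_actual * np.sum(actual_losses)
                 + coef_global * np.sum(global_losses)) / n
    sparsity_term = coef_l1 * np.abs(w).sum()   # L1-norm sparse constraint
    return data_term + sparsity_term

value = total_loss([1.0, 3.0], [2.0, 2.0], np.array([1.0, -2.0]),
                   coef_actual=1.0, coef_global=0.5, coef_l1=0.1)
```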
The beneficial effects of the invention are as follows:
Based on the ideas of biquadratic pooling (HQP) and adaptive learning, a biquadratic pooling model for adaptive interaction structure learning (MSHQP) is proposed for fine-grained image classification. Through the weight vector, the model adaptively selects, from a plurality of pooling features, the pooling feature combination best suited to a specific data set, and achieves state-of-the-art or competitive accuracy on common benchmark data sets. In addition, the proposed adaptive interaction structure learning model is applicable not only to fine-grained image classification but also, as a more general module, to various other tasks, where it can improve model performance without affecting inference efficiency.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a schematic diagram of a model framework constructed in the method of the present invention.
Detailed Description
The present invention will be further described with reference to FIGS. 1 and 2.
First, multi-level deep features of an image are extracted with a hierarchical depth model; after several groups of biquadratic pooling features are obtained between cross-level features, a weight vector whose dimension equals the number of pooling groups is constructed, a module that multiplies the weights with the pooling features is added to the deep network, and classification is performed on the weighted pooling features. Second, an L1-norm sparse constraint is applied to the whole weight vector. Third, to guarantee convergence of model training and a stable gradient flow, a supervision module is designed for the training process that builds a classification loss on all the weighted pooling features. Finally, a multi-task end-to-end deep learning model is established according to these steps, the whole network is trained and fine-tuned on a specific data set, and the performance of the final model is tested on a test set. The invention can adaptively mine the interaction structure best suited to a specific data set, and has strong practicability and generality.
The invention is implemented through the following steps:
The first step:
Three data sets, CUB-200-2011, Stanford Cars and FGVC-Aircraft, are used to validate the biquadratic pooling model for adaptive interaction structure learning. For training, the images of the three data sets are first resized by bilinear interpolation to 600 × 600, 500 × 500 and 500 × 480, respectively; each image is then randomly cropped to 448 × 448 and flipped horizontally with a probability of 50%, and finally its pixel values are normalized. For testing, the data processing is similar to that for training but does not include random horizontal flipping.
The second step:
The following procedure is illustrated using ResNet34 as an example. In ResNet34, the low-level feature group is Conv3_4, whose original feature size is 128 × 56 × 56 (number of channels × feature height × feature width). The middle-level feature group consists of Conv4_2, Conv4_4 and Conv4_6, with original feature size 256 × 28 × 28. The high-level feature group consists of Conv5_1, Conv5_2 and Conv5_3, with original feature size 512 × 14 × 14. All features in the low-level and middle-level feature groups pass through residual downsampling modules with k = 4 and k = 2, respectively, and the downsampled features have size 512 × 14 × 14.
When biquadratic pooling (HQP) is performed, the inner product is taken between two features from different feature groups; the resulting feature still has size 512 × 14 × 14 and is reshaped to 512 × 196. The outer product of the reshaped feature with its own transpose gives a feature of size 512 × 512, which is reshaped again into a pooled feature of size 1 × 262144. Biquadratic pooling between the low-level and middle-level feature groups yields three pooled features, between the low-level and high-level feature groups three pooled features, and between the middle-level and high-level feature groups nine pooled features. In total, the biquadratic pooling features have dimension 15 × 262144.
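The count of 15 pooled features and the dimension 262144 follow from simple arithmetic, which can be checked with the short sketch below. The dictionary `groups` only restates the layer names of the ResNet34 example; the variable names are illustrative.

```python
from itertools import combinations, product

groups = {
    "low": ["Conv3_4"],
    "mid": ["Conv4_2", "Conv4_4", "Conv4_6"],
    "high": ["Conv5_1", "Conv5_2", "Conv5_3"],
}

# Every feature is paired with every feature of each *other* group.
pairs = [
    (a, b)
    for g1, g2 in combinations(groups, 2)
    for a, b in product(groups[g1], groups[g2])
]

channels = 512
pooled_dim = channels * channels   # size of one flattened 512 x 512 pooled feature

print(len(pairs), pooled_dim)
```

The pair count is 1 × 3 + 1 × 3 + 3 × 3 = 15, and 512 × 512 = 262144.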
The third step:
A trainable one-dimensional weight vector of length 15 is constructed. The pooled features of dimension 15 × 262144 output by the first forward pass of model training are averaged over the second dimension to obtain a vector of dimension 15 × 1, whose values are used to initialize the weight vector. The weight vector is normalized according to formula 1 in every forward pass. The normalized weight vector is then multiplied with the corresponding pooling features of dimension 15 × 262144 to obtain the weighted pooling features. Finally, the K weighted pooling features with the largest values in the weight vector are selected and concatenated to form the final actual classification feature; the cross-entropy loss obtained after the fully connected layer and softmax is called the actual classification loss. In the experiments, K ranges from 1 to 5, and the classification results on the three data sets are shown in Table 1 below. The best classification performance is obtained when K = 3, i.e. when the three weighted pooling features with the largest weights are concatenated.
TABLE 1 Classification accuracy (%) for different numbers K of selected weighted pooling features
K                1      2      3      4      5
CUB-200-2011     87.2   87.9   88.5   88.2   88.3
Stanford Cars    94.0   93.9   94.4   94.1   94.1
FGVC-Aircraft    92.0   92.3   92.8   92.4   92.5
The fourth step:
During model training, a sparse constraint using the L1 norm alone, the L2 norm alone, or a combination of the L1 and L2 norms is applied to the weight vector. As shown in Table 2 below, on the CUB-200-2011 data set the model classifies best when the L1 norm is used for the sparse constraint.
TABLE 2 Classification accuracy (%) on CUB-200-2011 under different sparse constraints
Sparse constraint   L2     L1     L2+L1
Accuracy            88.1   88.5   88.2
The fifth step:
The weighted pooling features from the third step are averaged over the first dimension to obtain an average weighted pooling feature of dimension 1 × 262144; the cross-entropy loss of this feature after the fully connected layer and softmax is called the global classification loss of the supervision module. To verify the proposed adaptive interaction structure learning model, an ablation experiment is carried out on the sparse constraint and the supervision module. The results are shown in Table 3 below, where Baseline denotes the adaptive interaction structure learning model without the sparse constraint and the supervision module, and Joint denotes the model with both the sparse constraint and the supervision module.
TABLE 3 Ablation experiment of the adaptive interaction structure learning module (classification accuracy, %)
Data set         Baseline   Sparse constraint   Supervision signal   Joint
CUB-200-2011     87.6       87.2                88.0                 88.5
Stanford Cars    94.1       93.9                94.3                 94.4
FGVC-Aircraft    92.1       92.1                92.6                 92.8
The sixth step:
The ResNet34 model pre-trained on the ImageNet data set is loaded, and its final fully connected layer is removed so that it serves as the hierarchical deep visual feature extraction model. The biquadratic pooling and adaptive interaction structure learning modules are built after the visual feature extraction model according to the second, third, fourth and fifth steps. The final model loss consists of the actual classification loss, the global classification loss of the supervision module and the sparse constraint loss on the weight vector. First, the parameters of the visual feature extraction module are frozen and the parameters of the subsequent pooling and adaptive interaction structure learning parts are trained alone; when the model approaches convergence, the parameters of the whole model are fine-tuned until the model fully converges. In the inference stage, the supervision module branch is discarded to reduce the number of model parameters and speed up inference.
Finally, the Stanford Dogs and VegFru data sets are added, and the proposed biquadratic pooling model for adaptive interaction structure learning is validated on the VGG16, ResNet34, ResNet50 and ResNet152 convolutional neural networks; the classification performance on the five benchmark data sets is shown in Table 4 below.
TABLE 4 Classification accuracy under different convolutional neural networks
Figure BDA0003002180180000091

Claims (6)

1. A biquadratic pooling model for adaptive interaction structure learning, characterized in that adaptive interaction structure selection and image classification are integrated into a unified multi-task model framework, training can be completed end to end, and competitive classification accuracy is achieved on fine-grained classification benchmark data sets, the specific implementation steps being as follows:
Step (1): image data preprocessing
before model training, resizing the images and applying data augmentation so that all images have the same size;
Step (2): constructing a hierarchical depth model based on biquadratic pooling multi-scale feature interaction;
Step (3): constructing a weight vector
extracting a plurality of biquadratic pooling features of the preprocessed image using the hierarchical depth model, and constructing a weight vector whose dimension equals the number of biquadratic pooling features; adding weighted pooling features to the hierarchical depth model, each weighted pooling feature being obtained by multiplying a biquadratic pooling feature by the corresponding entry of the weight vector;
Step (4): applying a sparse constraint to the weight vector;
Step (5): designing a supervision module that constructs a global classification loss using all the weighted pooling features;
Step (6): training and testing the model.
2. The bi-quadratic pooling model for adaptive interaction structure learning of claim 1, wherein the step (2) is implemented as follows:
2-1. selecting the last three stages in the same convolutional neural network, and calling them the low-level, middle-level and high-level stages in order of decreasing feature size; selecting the features of one or more convolutional layers from each stage, the features selected in the three stages being called the low-level, middle-level and high-level feature groups, respectively; the low-level feature group containing at least one convolutional layer feature and at most all convolutional layer features of the low-level stage; the middle-level and high-level feature groups each containing at least two convolutional layer features and at most all convolutional layer features of the corresponding stage; then adjusting the features in the low-level and middle-level feature groups with a residual downsampling module so that their size matches the feature size of the high-level feature group;
2-2. performing biquadratic pooling on the features across the low-level, middle-level and high-level feature groups; after the new low-level and middle-level feature groups are obtained through the residual downsampling module in step 2-1, first taking the inner product pairwise, across levels, between the features of different feature groups, so that the features of the different groups interact with each other; then computing the matrix outer product of each interacted feature with its own transpose to obtain the biquadratic pooling features, referred to as pooling features, thereby obtaining the hierarchical depth model based on biquadratic pooling.
3. The bi-quadratic pooling model for adaptive interaction structure learning of claim 2, wherein the residual down-sampling module is as follows:
the residual down-sampling structure has two branches: the main branch comprises a max-pooling layer of size k × k with stride k, followed by a convolutional layer whose kernel size and stride are both 1; the residual branch comprises a convolutional layer whose kernel size and stride are both k, and serves to compensate for the information lost through max pooling in the main branch; finally, the features of the two branches are added and then passed through a normalization layer.
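The two-branch structure of claim 3 can be sketched in plain Python for a single-channel feature map. This is a simplified illustration under stated assumptions: with one channel the 1 × 1 convolution reduces to a single scalar weight (`w_main`), the claim does not name the type of normalization layer, so a zero-mean, unit-variance rescaling is used as a stand-in, and all names are hypothetical.

```python
from statistics import fmean, pstdev


def max_pool(x, k):
    """k x k max pooling with stride k on a 2-D map given as a list of rows."""
    h, w = len(x) // k, len(x[0]) // k
    return [[max(x[i * k + a][j * k + b] for a in range(k) for b in range(k))
             for j in range(w)] for i in range(h)]


def strided_conv(x, kernel):
    """Convolution whose kernel size and stride both equal len(kernel)."""
    k = len(kernel)
    h, w = len(x) // k, len(x[0]) // k
    return [[sum(kernel[a][b] * x[i * k + a][j * k + b]
                 for a in range(k) for b in range(k))
             for j in range(w)] for i in range(h)]


def normalize(x, eps=1e-5):
    """Stand-in normalization layer (assumption): zero mean, unit variance."""
    flat = [v for row in x for v in row]
    mu, sigma = fmean(flat), pstdev(flat)
    return [[(v - mu) / (sigma + eps) for v in row] for row in x]


def residual_downsample(x, k, w_main, w_res):
    # Main branch: k x k max pooling (stride k), then a 1 x 1 convolution,
    # which for a single channel is a multiplication by one scalar weight.
    main = [[w_main * v for v in row] for row in max_pool(x, k)]
    # Residual branch: convolution with kernel size k and stride k.
    res = strided_conv(x, w_res)
    # Add the two branches, then normalize.
    summed = [[m + r for m, r in zip(m_row, r_row)] for m_row, r_row in zip(main, res)]
    return normalize(summed)


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
averaging_kernel = [[0.25, 0.25], [0.25, 0.25]]
out = residual_downsample(x, 2, 1.0, averaging_kernel)
```

Both branches reduce a 4 × 4 map to 2 × 2 when k = 2, so they can be added element by element before normalization.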
4. The bi-quadratic pooling model for adaptive interaction structure learning according to claim 2 or 3, wherein the weight vector construction process in step (3) is specifically as follows:
3-1, after the hierarchical depth model has generated a plurality of pooling features, constructing a weight vector whose dimension equals the number of pooling features;
3-2, during the first round of training of the hierarchical depth model, computing the mean of each pooling feature produced by the hierarchical depth model and initializing the weight vector with the means of all pooling features; during the training iterations, normalizing the weight vector so that every value in the weight vector w lies in [0,1], according to the following formula:
Figure FDA0003002180170000021
wherein max{} and min{} take the maximum and the minimum, respectively, over all values in the weight vector; ReLU(w) denotes the rectified linear unit activation function;
and 3-3, multiplying each pooling feature by the corresponding value of the normalized weight vector to obtain the weighted pooling features.
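Steps 3-1 to 3-3 can be sketched as follows. The normalization formula itself is only available as a figure, so the sketch assumes one plausible reading of the surrounding text: min-max scaling applied to ReLU(w). That form, the small `eps` guard against division by zero, and all function names are assumptions and not the patent's stated formula.

```python
def feature_mean(pooled_feature):
    """Mean of all entries of one pooling feature (a nested list)."""
    values = [v for row in pooled_feature for v in row]
    return sum(values) / len(values)


def init_weights(pooled):
    """Step 3-2 (initialization): one weight per pooling feature, set to its mean."""
    return [feature_mean(p) for p in pooled]


def normalize_weights(w, eps=1e-12):
    """Assumed normalization: min-max scaling of ReLU(w) into [0, 1]."""
    r = [max(0.0, v) for v in w]  # ReLU
    lo, hi = min(r), max(r)
    return [(v - lo) / (hi - lo + eps) for v in r]


def weight_features(w_norm, pooled):
    """Step 3-3: scale every pooling feature by its normalized weight."""
    return [[[wn * v for v in row] for row in p] for wn, p in zip(w_norm, pooled)]


pooled_features = [[[10, 0], [0, 4]], [[1, 1], [1, 1]], [[-2, 0], [0, -2]]]
w = init_weights(pooled_features)   # one entry per pooling feature
w_norm = normalize_weights(w)       # every entry lies in [0, 1]
weighted = weight_features(w_norm, pooled_features)
```

Under this reading a pooling feature whose weight is negative is mapped to 0 by the ReLU and therefore contributes nothing after weighting.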
5. The bi-quadratic pooling model for adaptive interaction structure learning of claim 4, wherein the supervision module designed in step (5) uses all weighted pooling features to construct a global classification loss.
6. The bi-quadratic pooling model for adaptive interaction structure learning according to claim 5, wherein step (6) is implemented by constructing a multi-task deep learning model; specifically, after an end-to-end framework has been established according to steps (2), (3), (4) and (5), the actual classification loss, the global classification loss of the supervision module and the sparsity constraint on the weight vector are optimized simultaneously on a designated data set;
the actual classification loss is constructed as follows:
selecting the weighted pooling features corresponding to the K largest values in the weight vector, concatenating the selected weighted pooling features, and passing the concatenated result through one fully connected layer for final classification, the resulting classification loss being called the actual classification loss; wherein K is at least 1 and at most the number of weighted pooling features;
first, a hierarchical depth model is constructed on a specific convolutional neural network, and the final bi-quadratic pooling model for adaptive interaction structure learning is obtained by adding the weight vector, the sparsity constraint module and the supervision module to the hierarchical depth model; during training, the parameters of the convolutional neural network part, pre-trained on the ImageNet data set, are first frozen and only the parameters of the newly added modules are trained; the whole network is then fine-tuned to obtain the final model, and the training effect is evaluated on the test set; the optimization objective function of the whole model is as follows:
Figure FDA0003002180170000031
wherein θ and w denote the parameters of the model and the weight vector, respectively; y_s denotes the label of sample s;
Figure FDA0003002180170000032
respectively denote the actual classification output of the model and the global classification output of the supervision module for sample s; α, λ, λ' and δ denote the relative weights of the loss terms; N denotes the number of images in the training set.
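The selection step behind the actual classification loss in claim 6 (keep the K highest-weighted pooling features, concatenate them, apply one fully connected layer) can be sketched as below. This is a minimal illustration: the flattening order, the toy weights and the function names are assumptions, and the loss computation on the resulting logits is omitted.

```python
def select_top_k(w_norm, weighted, k):
    """Concatenate the weighted pooling features with the K largest weights."""
    if not 1 <= k <= len(weighted):
        raise ValueError("K must lie between 1 and the number of pooling features")
    # Indices of the K largest weights, largest first.
    order = sorted(range(len(w_norm)), key=lambda i: w_norm[i], reverse=True)[:k]
    # Flatten each selected feature row by row and join them into one vector.
    concat = [v for i in order for row in weighted[i] for v in row]
    return order, concat


def fully_connected(x, weights, bias):
    """One fully connected layer: logits = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


w_norm = [0.2, 1.0, 0.0, 0.7]
weighted = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]   # four toy 1 x 2 features
order, concat = select_top_k(w_norm, weighted, k=2)
logits = fully_connected(concat, [[1, 0, 0, 0], [0, 0, 0, 1]], [0, 1])
```

The supervision module of claim 5 differs only in that it uses all weighted pooling features, which corresponds to setting K to the total number of features in this sketch.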
CN202110350164.7A 2021-03-31 2021-03-31 Double secondary pooling model for self-adaptive interactive structure learning Active CN113139587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110350164.7A CN113139587B (en) 2021-03-31 2021-03-31 Double secondary pooling model for self-adaptive interactive structure learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110350164.7A CN113139587B (en) 2021-03-31 2021-03-31 Double secondary pooling model for self-adaptive interactive structure learning

Publications (2)

Publication Number Publication Date
CN113139587A true CN113139587A (en) 2021-07-20
CN113139587B CN113139587B (en) 2024-02-06

Family

ID=76810226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110350164.7A Active CN113139587B (en) 2021-03-31 2021-03-31 Double secondary pooling model for self-adaptive interactive structure learning

Country Status (1)

Country Link
CN (1) CN113139587B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578060A (en) * 2017-08-14 2018-01-12 University of Electronic Science and Technology of China Method for vegetable image classification using a deep neural network based on discriminative regions
CN108764281A (en) * 2018-04-18 2018-11-06 South China University of Technology Image classification method based on a semi-supervised self-paced learning cross-task deep network
CN109255381A (en) * 2018-09-06 2019-01-22 South China University of Technology Image classification method based on a second-order VLAD sparse adaptive deep network
CN110349148A (en) * 2019-07-11 2019-10-18 University of Electronic Science and Technology of China Image target detection method based on weakly supervised learning

Also Published As

Publication number Publication date
CN113139587B (en) 2024-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant