CN112132004B - Fine granularity image recognition method based on multi-view feature fusion - Google Patents
- Publication number
- CN112132004B (application CN202010992253.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- loss function
- bilinear
- image
- fine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Abstract
A fine-grained image recognition method based on multi-view feature fusion relates to the technical field of image processing. It addresses the shortcomings of existing fine-grained recognition methods: detail information in images is ignored, adaptability to visual differences between images is poor, the introduced loss functions are complex, and the number of model parameters grows. The invention introduces a suppression branch that masks the most salient region of the image, forcing the network to find subtle discriminative features among easily confused categories. A similar-sample comparison module fuses the feature vectors of samples of the same class, increasing the interaction of information between different images of that class. A center loss function is also introduced to minimize the distance between features and their class centers, making the learned features more discriminative. The accuracy of fine-grained image recognition is thereby improved.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a fine-granularity image recognition method based on multi-view feature fusion.
Background
Fine-grained image classification distinguishes finer subcategories within a basic category, such as species of birds or breeds of dogs. The problem therefore requires capturing subtle inter-class differences and fully mining the discriminative features of an image.
Fine-grained objects are ubiquitous in real life, and the corresponding fine-grained image recognition is an important research topic in computer vision. It currently presents three main challenges: (1) images of the same category may vary greatly because of differences in pose, background, and shooting angle; (2) different categories under the same parent category differ only in a few subtle regions, such as the beak or tail of a bird; (3) collecting and labeling fine-grained images is time-consuming and labor-intensive. These challenges are illustrated in Fig. 1.
The existing method mainly achieves the aim of identification through the following three aspects: (1) Fine-grained image recognition is performed based on a location-classification network. (2) More discriminant characterization is learned directly by developing a powerful depth model for fine-grained recognition. (3) And combining the global features and the local features of the image to realize fine-grained classification of the image.
In prior art 1, bilinear-pooling fine-grained image classification (bilinear pooling) extracts features through a pretrained twin convolutional neural network and performs bilinear pooling over the channel dimension of the features to obtain a high-order representation, which strengthens the discriminative power of the features. The improvement in fine-grained recognition accuracy comes from this new pooling scheme.
Although this method provides a new bilinear pooling scheme, it makes no effective design for fine-grained recognition with respect to the relations among fine-grained categories, the number of model parameters, or the number of detail regions. It also ignores the fact that fine-grained images contain rich detail information under small inter-class and large intra-class differences.
Prior art 2 adopts a multi-attention multi-class constraint network (Multi-Attention Multi-Class Constraint). Multiple attention regions of the input image are extracted by a one-squeeze multi-excitation module, and metric learning is then introduced: the network is trained with a triplet loss and a softmax loss that pull together the same attention of similar features and push apart different attentions or features of different classes. This strengthens the relations among parts and improves fine-grained recognition accuracy.
This method mainly uses metric learning to improve the sample distribution in feature space, so it adapts poorly to mining the visual differences between a pair of images. Moreover, the introduced loss function is complex, a large number of sample pairs must be constructed, and the number of model parameters increases greatly.
Disclosure of Invention
The invention provides a fine-grained image recognition method based on multi-view feature fusion, which aims to solve the problems of existing fine-grained recognition methods: detail information of images is ignored, adaptability to visual differences between images is poor, the introduced loss function is complex, and the number of model parameters increases.
A fine granularity image recognition method based on multi-view feature fusion is realized by the following steps:
Step one, bilinear feature extraction;
Inputting an original image into a bilinear feature extraction network, and fusing the feature maps output by different convolution layers to obtain bilinear feature vectors; the feature extraction network adopts a network structure pre-trained on the ImageNet dataset;
Step two, suppression branch learning, with the following specific process:
Step 2.1, generating an attention map according to the values of the feature maps output by different convolution layers of the feature extraction network in step one and a threshold;
Step 2.2, generating a suppression mask from the attention map of step 2.1 and overlaying it on the original image to produce a suppression image with the local area masked;
Step 2.3, performing bilinear feature extraction as in step one on the suppression image of step 2.2 to obtain a bilinear feature vector, inputting it into a fully connected layer to obtain predicted class probabilities, and computing the multi-class cross entropy over these predictions;
Step three, similar-sample comparison module learning;
Step 3.1, randomly selecting N other images of the same category as the original image as positive sample images;
Step 3.2, sending the target image and the positive sample images of step 3.1 into the feature extraction network of step one for bilinear feature vector fusion, obtaining bilinear feature vectors whose fusion integrates several images of the same category;
Step 3.3, averaging the bilinear feature vectors of the several same-category images obtained in step 3.2 to obtain a fused feature vector, inputting it into a fully connected layer to obtain a predicted probability, and computing the multi-class cross entropy for this same-category prediction;
Step four, calculating a center loss function L C;
Let v_i be the bilinear feature of the i-th sample, c_i the mean feature of all samples of the class of sample i (i.e. the class center), and N the number of samples in the current batch; the center loss function L_C is then:

L_C = (1/(2N)) · Σ_{i=1}^{N} ||v_i − c_i||_2^2
Step five, calculating a model optimization loss function;
The cross-entropy loss function of the original image's bilinear feature vector, the cross-entropy loss function of the suppression image's bilinear feature vector, the cross-entropy loss function of the fused feature, and the center loss function are weighted and summed to obtain the model's optimization loss function.
The invention has the beneficial effects that: the invention comprehensively considers factors such as large intra-class difference, small inter-class difference, large background noise influence and the like of fine-granularity images, introduces a suppression branch, and forces the network to find subtle distinguishing characteristics among confusable classes by suppressing the most obvious area in the images. A similar comparison learning module is also introduced, and feature vectors of similar samples are fused, so that interaction information of different images under the same category is increased. Meanwhile, a center loss function is introduced, the distance between the features and the corresponding class centers is minimized, and the learned features are more discriminant.
Taken together, the method makes comprehensive use of global and local features during discrimination, achieves clear performance gains on several fine-grained image classification tasks, is more robust than existing methods, and is easy to deploy in practice. The accuracy of fine-grained image recognition is improved.
Drawings
A, b, c and d in fig. 1 are all schematic diagrams of the existing 4-group fine-grained images;
FIG. 2 is a schematic diagram of bilinear feature extraction in a fine-grained image recognition method based on multi-view feature fusion according to the invention;
FIG. 3 is a schematic diagram of similar contrast learning in a fine-grained image recognition method based on multi-view feature fusion according to the invention;
FIG. 4 is a schematic diagram of model optimization loss function calculation in a fine-grained image recognition method based on multi-view feature fusion according to the invention;
Fig. 5 is a feature visualization effect diagram obtained by a fine-grained image recognition method based on multi-view feature fusion according to the invention.
Detailed Description
The present embodiment will be described with reference to fig. 2 to 5, which are a fine-grained image recognition method based on multi-view feature fusion, and the method is implemented by the following steps:
Step one, bilinear feature extraction: a ResNet-50 network pre-trained on ImageNet takes fixed-size original images as input, and the feature maps output by different convolution layers are fused to obtain bilinear feature vectors.
Referring to Fig. 2, the feature extraction step uses a network pre-trained on the ImageNet dataset as the base network; a common image classification network such as VGGNet, GoogLeNet, or ResNet can be fine-tuned to adapt the model to the specific task. Specifically, the original image is fed into the feature extraction network to obtain the feature maps output by the last two convolution layers, denoted F_1 ∈ R^{D1×H×W} and F_2 ∈ R^{D2×H×W}, where D1 and D2 are the channel counts of the two features and H and W are the height and width of the feature maps. To keep the dimension of the fused feature manageable while the generated feature vector still carries enough information, only the feature maps of n randomly selected channels of F_2 are fused with F_1. At each spatial position, the vectors f_1 ∈ R^{D1} and f_2 ∈ R^{D2} taken along the channel dimension of F_1 and F_2 are multiplied to give a bilinear matrix B = f_1 · f_2^T ∈ R^{D1×D2}. The bilinear matrices of all positions in the feature map are summed and the result is flattened into a vector, the bilinear vector v ∈ R^d with d = D1 × D2. Compared with a linear model, the bilinear vector provides a stronger representation of the features.
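The bilinear fusion above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: it skips the random selection of n channels, and the signed square root and L2 normalisation at the end are common post-processing for bilinear features that the patent does not specify — both are assumptions here.

```python
import numpy as np

def bilinear_pool(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """Sum-pooled outer product of two feature maps.

    f1: (D1, H, W) and f2: (D2, H, W) feature maps from two conv layers.
    Returns the bilinear vector of length d = D1 * D2.
    """
    d1, h, w = f1.shape
    d2 = f2.shape[0]
    # B = sum over all positions (x, y) of the outer product
    #     f1(:, x, y) * f2(:, x, y)^T, computed as one matrix product.
    bilinear = f1.reshape(d1, h * w) @ f2.reshape(d2, h * w).T  # (D1, D2)
    v = bilinear.reshape(-1)                                    # flatten to d
    # Assumed post-processing: signed square root, then L2 normalisation.
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

If D1 and D2 are in the thousands, d = D1 × D2 explodes, which is what motivates the patent's random selection of only n channels of F_2 before fusing.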
Step two, suppression branch learning:
A. Attention map generation: generate the attention map from the feature map values and a threshold.
B. Suppression mask generation: generate a suppression mask from the attention map and overlay it on the original image to produce a suppression image with the local area masked.
C. Multi-class cross-entropy calculation: extract the bilinear feature vector of the suppression image via step one, input it into a fully connected layer to obtain predicted class probabilities, and compute the multi-class cross entropy over them.
In the suppression branch learning step, the following three aspects are included:
Step A: for the feature map F ∈ R^{D×H×W} output by a convolution layer of the feature extraction network, average-pool each channel to obtain the channel means P_k (k = 1, …, D), sort them in descending order, and select the top-5 values p_k (normalized to sum to 1) to compute the entropy:

E = −Σ_{k=1}^{5} p_k · log p_k
The attention map A is then constructed by comparing the entropy E with the magnitude of the threshold δ.
Step B: enlarge the attention map A to the original image size and compute its average value m; with a threshold θ set in the range 0 to 1, take m·θ as the cut-off, set elements of A larger than m·θ to 0 and all other elements to 1, obtaining the suppression mask M:

M(x, y) = 0, if A(x, y) > m·θ; M(x, y) = 1, otherwise
Step C: overlay the suppression mask onto the original image to obtain a suppression image with the local area masked:

I_s(x, y) = I(x, y) · M(x, y)

where I(x, y) is the value at position (x, y) of the original image I.
Because the most salient regions of the image are suppressed, attention is dispersed and the neural network is forced to learn discriminative information from other regions. This reduces the network's dependence on particular training samples, prevents overfitting, and further improves the robustness of the model.
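Steps A–C above can be sketched as follows. This is a hedged NumPy sketch: the patent does not disclose the exact rule that turns the entropy comparison into an attention map, so the rule in `attention_map` (low entropy → average only the top-5 channels, otherwise average all channels) and the nearest-neighbour mask upsampling are illustrative assumptions; the mask rule M(x, y) = 0 iff A(x, y) > m·θ follows the text.

```python
import numpy as np

def attention_map(feat: np.ndarray, delta: float = 1.2) -> np.ndarray:
    """Build an attention map from a (D, H, W) feature map.

    Channels are ranked by their spatial mean; the entropy of the
    normalised top-5 means is compared with the threshold delta.
    The selection rule below is an assumption, not the patent's.
    """
    means = feat.mean(axis=(1, 2))
    order = np.argsort(means)[::-1]                 # channels, best first
    top5 = np.clip(means[order[:5]], 1e-12, None)
    p = top5 / top5.sum()
    entropy = -(p * np.log(p)).sum()
    chosen = order[:5] if entropy < delta else order
    return feat[chosen].mean(axis=0)                # (H, W)

def suppression_mask(att: np.ndarray, theta: float = 0.5) -> np.ndarray:
    """M(x, y) = 0 where A(x, y) > m * theta (m = mean of A), else 1."""
    m = att.mean()
    return np.where(att > m * theta, 0.0, 1.0)

def suppress(image: np.ndarray, att: np.ndarray, theta: float = 0.5) -> np.ndarray:
    """I_s(x, y) = I(x, y) * M(x, y); the mask is upsampled to image size."""
    h, w = image.shape[:2]
    mask = suppression_mask(att, theta)
    ys = np.arange(h) * att.shape[0] // h           # nearest-neighbour rows
    xs = np.arange(w) * att.shape[1] // w           # nearest-neighbour cols
    big = mask[np.ix_(ys, xs)]
    return image * (big if image.ndim == 2 else big[..., None])
```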
Step three, similar-sample comparison module learning:
A. Image sampling: randomly select N other images of the same category as positive samples.
B. Feature fusion: fuse the bilinear feature vectors (obtained as in step one) of the target image and the randomly sampled positive sample images; the fused feature integrates the feature information of several images of the same category.
C. Fusion-feature loss calculation: input the fused feature vector into a fully connected layer to obtain a predicted probability, and compute the multi-class cross entropy for the class prediction.
Referring to fig. 3, step a randomly selects N images belonging to the same category as the input image, and all the N images are sent to the bilinear feature extraction network of step one.
Step B averages the bilinear feature vectors of the several same-category images output by step A to obtain the fused feature vector:

V'(j) = (V(j) + Σ_{r=1}^{T} V_r(j)) / (T + 1)

where j is the position in the feature vector, V(j) is the value of the target image's feature vector at position j, T is the number of selected positive samples, and V_r(j) is the value of the r-th positive sample at position j;
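The averaging step is straightforward; a minimal sketch (assuming, as in the formula above, that the target's own vector is included in the average alongside its T positive samples):

```python
import numpy as np

def fuse_features(v: np.ndarray, positives: list) -> np.ndarray:
    """Fused vector V'(j) = (V(j) + sum_{r=1..T} V_r(j)) / (T + 1):
    the element-wise mean of the target vector and its T positive samples."""
    return np.stack([v, *positives]).mean(axis=0)
```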
step four, calculating center loss;
A. Class center generation: in the training process, the feature vectors of the learned centers of all the categories are continuously updated.
B. Center loss calculation step: the distance between the bilinear feature vector and the class center vector obtained by each input image is used as the center loss, and the distance is continuously optimized in the training process.
In this embodiment, a feature vector is maintained for each class as its class center and is continuously updated as training progresses. By penalizing the offset between each sample's bilinear feature vector and the center of its class, samples of the same class are pulled together, and the costly construction of sample pairs is avoided. Let v_i be the bilinear feature of the i-th sample, c_i the mean feature of all samples of the class of sample i (i.e. the class center), and N the number of samples in the current batch; then:

L_C = (1/(2N)) · Σ_{i=1}^{N} ||v_i − c_i||_2^2
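A sketch of the center loss and a running-mean center update. The 1/2 scaling follows the original center-loss formulation, and the update rate `alpha` is an assumed hyperparameter: the patent only states that centers are continuously updated during training.

```python
import numpy as np

def center_loss(features: np.ndarray, labels: np.ndarray,
                centers: np.ndarray) -> float:
    """L_C = (1/(2N)) * sum_i ||v_i - c_i||^2 over the current batch."""
    diff = features - centers[labels]            # v_i - c_i for each sample
    return 0.5 * float(np.mean(np.sum(diff ** 2, axis=1)))

def update_centers(features: np.ndarray, labels: np.ndarray,
                   centers: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Move each class center toward the batch mean of its samples."""
    new = centers.copy()
    for c in np.unique(labels):
        new[c] = (1 - alpha) * centers[c] + alpha * features[labels == c].mean(axis=0)
    return new
```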
step five, calculating a model optimization loss function:
The cross-entropy loss function of the original image's bilinear features, the cross-entropy loss function of the suppression image's bilinear features, the cross-entropy loss function of the fused features, and the center loss function are weighted and summed to obtain the model's optimized loss function.
Referring to fig. 4, denote the cross-entropy loss of the original image's bilinear feature vector by L_CE1, the cross-entropy loss of the suppression image's bilinear feature vector by L_CE2, the cross-entropy loss of the fused feature by L_CE3, and the center loss by L_C. The weighted sum of these loss functions gives the model's optimization loss L:

L = L_CE1 + L_CE2 + L_CE3 + λ·L_C
Where λ is the weight of the center loss function.
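The four-term objective can be sketched as follows. Unit weights on the three cross-entropy terms are an assumption — the patent names only λ, the weight of the center loss — and `softmax_cross_entropy` is the standard multi-class cross entropy computed for a single prediction.

```python
import numpy as np

def softmax_cross_entropy(logits: np.ndarray, label: int) -> float:
    """Multi-class cross entropy for one sample: -log softmax(logits)[label]."""
    z = logits - logits.max()            # stabilise the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return float(-np.log(p[label]))

def total_loss(l_ce1: float, l_ce2: float, l_ce3: float,
               l_c: float, lam: float) -> float:
    """L = L_CE1 + L_CE2 + L_CE3 + lambda * L_C."""
    return l_ce1 + l_ce2 + l_ce3 + lam * l_c
```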
Referring to fig. 5: the first row shows original images randomly selected from the dataset, the second row the class activation maps produced by the global branch for these inputs, and the third row the class activation maps produced by the suppression branch. In the global branch the network learns the most salient regions of the image, such as a bird's beak or a car's headlights, while in the suppression branch it learns the subtle features that aid fine-grained classification, such as a bird's torso or a car's wheels. Combining the two views gives the network a more comprehensive basis for its decisions: it captures both the salient regions and the subtle fine-grained cues.
The fine-grained image recognition method of this embodiment introduces a new data augmentation scheme: guided by the attention map, salient part regions of the image are suppressed so that attention is dispersed and the network learns more complementary regional feature information. The similar-sample comparison module fuses feature information from several images of the same category, so that representations of same-category images lie as close as possible in the embedding space, improving classification performance.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (2)
1. A fine granularity image recognition method based on multi-view feature fusion is characterized by comprising the following steps: the method is realized by the following steps:
Step one, bilinear feature extraction;
Inputting an original image into a bilinear feature extraction network, and fusing the feature maps output by different convolution layers to obtain bilinear feature vectors; the feature extraction network adopts a network structure pre-trained on the ImageNet dataset;
Step two, suppression branch learning, with the following specific process:
Step 2.1, generating an attention map according to the values of the feature maps output by different convolution layers of the feature extraction network in step one and a threshold;
In step 2.1, the specific process of generating the attention map is as follows:
For the feature map F ∈ R^{D×H×W} output by the last convolution layer of the feature extraction network, where D is the number of channels and H and W are the height and width of the feature map, average-pool each channel to obtain P_k, sort the channels by these average values, and compute the entropy E over the top-5 values p_k (normalized to sum to 1):

E = −Σ_{k=1}^{5} p_k · log p_k

The attention map A is constructed by comparing the entropy E with the magnitude of the threshold δ, where F_k is the two-dimensional feature map of each channel after channel sorting;
Step 2.2, generating a suppression mask from the attention map of step 2.1 and overlaying it on the original image to produce a suppression image with the local area masked;
In step 2.2, the specific process of generating the suppression mask is as follows:
Enlarge the attention map of step 2.1 to the original image size, compute its average value m, and set a threshold θ in the range 0 to 1; taking m·θ as the cut-off, set elements of the attention map larger than m·θ to 0 and all other elements to 1, obtaining the suppression mask M:

M(x, y) = 0, if A(x, y) > m·θ; M(x, y) = 1, otherwise

where A(x, y) is the value at position (x, y) of attention map A;
Overlay the suppression mask onto the original image to obtain a suppression image I_s with the local area masked:

I_s(x, y) = I(x, y) · M(x, y)

where I(x, y) is the value at position (x, y) of the original image I;
Step 2.3, performing bilinear feature extraction as in step one on the suppression image of step 2.2 to obtain a bilinear feature vector, inputting it into a fully connected layer to obtain predicted class probabilities, and computing the multi-class cross entropy over these predictions;
Step three, similar-sample comparison module learning;
Step 3.1, randomly selecting N other images of the same category as the original image as positive sample images;
Step 3.2, sending the target image and the positive sample images of step 3.1 into the feature extraction network of step one for bilinear feature vector fusion, obtaining bilinear feature vectors whose fusion integrates several images of the same category;
Step 3.3, averaging the bilinear feature vectors of the several same-category images obtained in step 3.2 to obtain a fused feature vector, inputting it into a fully connected layer to obtain a predicted probability, and computing the multi-class cross entropy for this same-category prediction;
Step four, calculating a center loss function L C;
Let v_i be the bilinear feature of the i-th sample, c_i the mean feature of all samples of the class of sample i (i.e. the class center), and N the number of samples in the current batch; the center loss function L_C is then:

L_C = (1/(2N)) · Σ_{i=1}^{N} ||v_i − c_i||_2^2
Step five, calculating a model optimization loss function;
The model's optimized loss function is obtained by a weighted sum of the cross-entropy loss function of the original image's bilinear feature vector, the cross-entropy loss function of the suppression image's bilinear feature vector, the cross-entropy loss function of the fused feature, and the center loss function;
In step five, with L_CE1 the cross-entropy loss function of the original image's bilinear feature vector, L_CE2 that of the suppression image's bilinear feature vector, L_CE3 that of the fused feature, and L_C the center loss function, the weighted sum gives the model's optimized loss function L, finally realizing fine-grained image recognition; it is expressed as:

L = L_CE1 + L_CE2 + L_CE3 + λ·L_C
Where λ is the weight of the center loss function.
2. The fine-grained image recognition method based on multi-view feature fusion according to claim 1, characterized in that: in step 3.3, the bilinear feature vectors of the several same-category images are averaged to obtain the fused feature vector V'(j), expressed as:

V'(j) = (V(j) + Σ_{r=1}^{T} V_r(j)) / (T + 1)

where j is the position in the feature vector, V(j) is the value of the feature vector at position j, T is the number of selected positive samples, and V_r(j) is the value of the r-th positive sample at position j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010992253.7A CN112132004B (en) | 2020-09-21 | 2020-09-21 | Fine granularity image recognition method based on multi-view feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010992253.7A CN112132004B (en) | 2020-09-21 | 2020-09-21 | Fine granularity image recognition method based on multi-view feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112132004A CN112132004A (en) | 2020-12-25 |
CN112132004B true CN112132004B (en) | 2024-06-25 |
Family
ID=73841694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010992253.7A Active CN112132004B (en) | 2020-09-21 | 2020-09-21 | Fine granularity image recognition method based on multi-view feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112132004B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112733912B (en) * | 2020-12-31 | 2023-06-09 | 华侨大学 | Fine granularity image recognition method based on multi-granularity countering loss |
CN112712066B (en) * | 2021-01-19 | 2023-02-28 | 腾讯科技(深圳)有限公司 | Image recognition method and device, computer equipment and storage medium |
CN112766378B (en) * | 2021-01-19 | 2023-07-21 | 北京工商大学 | Cross-domain small sample image classification model method focusing on fine granularity recognition |
CN112800927B (en) * | 2021-01-25 | 2024-03-29 | 北京工业大学 | Butterfly image fine-granularity identification method based on AM-Softmax loss |
CN112990270B (en) * | 2021-02-10 | 2023-04-07 | 华东师范大学 | Automatic fusion method of traditional feature and depth feature |
CN113255793B (en) * | 2021-06-01 | 2021-11-30 | 之江实验室 | Fine-grained ship identification method based on contrast learning |
CN113449613B (en) * | 2021-06-15 | 2024-02-27 | 北京华创智芯科技有限公司 | Multi-task long tail distribution image recognition method, system, electronic equipment and medium |
CN113642571B (en) * | 2021-07-12 | 2023-10-10 | 中国海洋大学 | Fine granularity image recognition method based on salient attention mechanism |
CN113705489B (en) * | 2021-08-31 | 2024-06-07 | 中国电子科技集团公司第二十八研究所 | Remote sensing image fine-granularity airplane identification method guided by prior region knowledge |
CN115424086A (en) * | 2022-07-26 | 2022-12-02 | 北京邮电大学 | Multi-view fine-granularity identification method and device, electronic equipment and medium |
CN117725483A (en) * | 2023-09-26 | 2024-03-19 | 电子科技大学 | Supervised signal classification method based on neural network |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685115A (en) * | 2018-11-30 | 2019-04-26 | 西北大学 | A fine-grained conceptual model and learning method based on bilinear feature fusion |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135502B (en) * | 2019-05-17 | 2023-04-18 | 东南大学 | Image fine-grained identification method based on reinforcement learning strategy |
CN110210550A (en) * | 2019-05-28 | 2019-09-06 | 东南大学 | Image fine granularity recognition method based on ensemble learning strategy |
CN110222636B (en) * | 2019-05-31 | 2023-04-07 | 中国民航大学 | Pedestrian attribute identification method based on background suppression |
CN110807465B (en) * | 2019-11-05 | 2020-06-30 | 北京邮电大学 | Fine-grained image identification method based on channel loss function |
CN111523534B (en) * | 2020-03-31 | 2022-04-05 | 华东师范大学 | Image description method |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685115A (en) * | 2018-11-30 | 2019-04-26 | 西北大学 | A fine-grained conceptual model and learning method based on bilinear feature fusion |
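The cited work builds on bilinear feature fusion. As a rough illustration only (a sketch of classic bilinear pooling, not the patented method or the citation's exact model), two convolutional feature maps can be fused by sum-pooling their per-location outer products, followed by signed square-root and L2 normalization; all shapes and names below are assumptions:

```python
import numpy as np

def bilinear_pool(fa: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """Fuse two feature maps of shape (C1, N) and (C2, N), where N is the
    number of spatial positions, into one bilinear feature vector."""
    assert fa.shape[1] == fb.shape[1], "feature maps must share spatial size"
    b = fa @ fb.T                        # (C1, C2): sum of outer products over positions
    v = b.flatten()                      # bilinear vector of length C1 * C2
    v = np.sign(v) * np.sqrt(np.abs(v))  # signed square-root normalization
    n = np.linalg.norm(v)
    return v / n if n > 0 else v         # L2 normalization

# two hypothetical 8-channel feature maps over a 4x4 spatial grid
rng = np.random.default_rng(0)
fa = rng.random((8, 16))
fb = rng.random((8, 16))
v = bilinear_pool(fa, fb)
print(v.shape)  # (64,)
```

The resulting unit-norm vector captures pairwise channel interactions between the two views and is typically fed to a linear classifier.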
Non-Patent Citations (1)
Title |
---|
Fine-grained image classification method based on multi-view fusion; 黄伟锋; 张甜; 常东良; 闫冬; 王嘉希; 王丹; 马占宇; Journal of Signal Processing (09); full text * |
Also Published As
Publication number | Publication date |
---|---|
CN112132004A (en) | 2020-12-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112132004B (en) | Fine granularity image recognition method based on multi-view feature fusion | |
Zhao et al. | Discriminative feature learning for unsupervised change detection in heterogeneous images based on a coupled neural network | |
CN110443143B (en) | Multi-branch convolutional neural network fused remote sensing image scene classification method | |
CN111881714B (en) | Unsupervised cross-domain pedestrian re-identification method | |
Xie et al. | Multilevel cloud detection in remote sensing images based on deep learning | |
Awad et al. | Multicomponent image segmentation using a genetic algorithm and artificial neural network | |
CN107016357B (en) | Video pedestrian detection method based on time domain convolutional neural network | |
Chuang et al. | A feature learning and object recognition framework for underwater fish images | |
Dornaika et al. | Building detection from orthophotos using a machine learning approach: An empirical study on image segmentation and descriptors | |
Ghamisi et al. | Multilevel image segmentation based on fractional-order Darwinian particle swarm optimization | |
Firpi et al. | Swarmed feature selection | |
CN111832514B (en) | Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels | |
CN108596211B (en) | Shielded pedestrian re-identification method based on centralized learning and deep network learning | |
CN111639564B (en) | Video pedestrian re-identification method based on multi-attention heterogeneous network | |
CN104346620A (en) | Inputted image pixel classification method and device, and image processing system | |
JP6867054B2 (en) | A learning method and a learning device for improving segmentation performance used for detecting road user events by utilizing a double embedding configuration in a multi-camera system, and a testing method and a testing device using the same | |
CN105761238B (en) | A method of passing through gray-scale statistical data depth information extraction well-marked target | |
CN111709313B (en) | Pedestrian re-identification method based on local and channel combination characteristics | |
CN110705591A (en) | Heterogeneous transfer learning method based on optimal subspace learning | |
CN111161307A (en) | Image segmentation method and device, electronic equipment and storage medium | |
Geetha et al. | Detection and estimation of the extent of flood from crowd sourced images | |
CN110427835B (en) | Electromagnetic signal identification method and device for graph convolution network and transfer learning | |
CN117475236B (en) | Data processing system and method for mineral resource exploration | |
CN111274964B (en) | Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle | |
Alsanad et al. | Real-time fuel truck detection algorithm based on deep convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |