CN111860068A - Fine-grained bird identification method based on cross-layer simplified bilinear network - Google Patents

Publication number
CN111860068A
Authority
CN
China
Prior art keywords
bilinear
cross
feature
network
grained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910360985.1A
Other languages
Chinese (zh)
Inventor
何小海
蓝洁
滕奇志
卿粼波
任超
吴小强
吴晓红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201910360985.1A
Publication of CN111860068A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fine-grained bird identification method based on a cross-layer simplified bilinear network. The method comprises the following steps. The 5994 training pictures and 5794 test pictures of the CUB-200-2011 data set are preprocessed, and the processed images are input into a VGG-16 convolutional neural network to extract feature maps of the bird images. To account for inter-layer feature interaction, three groups of simplified bilinear feature representations are extracted from the feature maps of different high-level convolutional layers, normalized, concatenated, and sent to a softmax classifier. Finally, the whole network is optimized with a cross-entropy loss assisted by a pairwise confusion loss. The method offers low feature dimensionality, a small amount of computation, a high recognition rate and strong robustness; it has practical value for fine-grained image classification and can be applied in practice.

Description

Fine-grained bird identification method based on cross-layer simplified bilinear network
Technical Field
The invention provides a fine-grained bird identification method based on a cross-layer simplified bilinear network, and relates to deep learning and fine-grained image classification.
Background
Fine-grained classification aims to distinguish the numerous sub-categories within one basic category, such as different species of birds or flowers. Compared with coarse-grained images, fine-grained images show slight inter-class differences and pronounced intra-class differences, so discriminative fine-grained features are often harder to obtain. The complex parameters of the model are determined from the image annotations, and overfitting caused by a small amount of data must be avoided as far as possible. Early fine-grained identification methods rely on manually annotated local information to train the classification model with strong supervision. Local annotation usually has to be done by experts in the corresponding field, so such methods require a high degree of manual involvement. In recent years, weakly supervised learning methods that need only image-level class labels have become a research focus.
Mainstream fine-grained classification methods based on weak supervision fall into two types. The first type uses a "localization" sub-network to assist the main "classification" network, enhancing the learning capability of the classification network with the local information (e.g., part locations or segmentation masks) provided by the localization network. Such approaches require a trade-off between localization and recognition capability, which may degrade the performance of a single network. The trade-off is also reflected in practice: training usually involves either alternating optimization of the two networks, or training the two networks separately and then fine-tuning them jointly. The second type is end-to-end feature encoding, which enhances the learning capability of convolutional neural networks by encoding the higher-order statistics of the convolutional feature maps. Such methods seek a robust representation of the image; conventional representations include VLAD and Fisher vectors built on SIFT features. Such models capture local feature interactions in a translation-invariant manner, which is particularly useful for texture and fine-grained recognition tasks.
Building on the end-to-end-encoded simplified bilinear network, the invention provides a fine-grained bird identification method based on a cross-layer simplified bilinear network. The method makes full use of the inter-layer correlation and interaction of feature maps from different convolutional layers, and regularizes the cross-entropy loss function with pairwise confusion. It compensates for the inadequacy of bilinear features obtained from a single convolutional layer, has lower dimensionality and less computation than the bilinear CNN (BCNN) feature, and achieves a recognition rate of 86.6% on the CUB-200-2011 data set.
Disclosure of Invention
The invention achieves this purpose through the following technical scheme, which comprises the following steps:
(1) Bird image feature extraction. The 5994 training pictures and 5794 test pictures of the CUB-200-2011 data set are preprocessed, and the processed images are input into a convolutional neural network to extract deep feature representations of the images.
(2) Cross-layer simplified bilinear feature fusion. To account for inter-layer feature interaction, the simplified bilinear operation is applied to the feature maps of different high-level convolutional layers obtained in step (1), yielding three groups of bilinear feature representations; these are normalized, concatenated, and sent to a softmax classifier.
(3) Network optimization with the cross-entropy loss assisted by the pairwise confusion loss. The samples in a training batch are randomly divided into two groups to form picture pairs. If a picture pair has the same label, only the cross-entropy loss is calculated; if a picture pair has different labels, a pairwise Euclidean loss is added to the cross-entropy loss as a regularization term.
Drawings
FIG. 1 shows the bird image feature extraction network
FIG. 2 is a schematic diagram of the activation responses of different high-level convolutional layers
FIG. 3 is a schematic diagram of the simplified bilinear operation
FIG. 4 is a block diagram of the cross-layer simplified bilinear network
FIG. 5 shows the network training method with pairwise confusion loss
Detailed Description
The invention is further described below with reference to the accompanying drawings:
FIG. 1 shows the VGG-16-based bird image feature extraction network. VGG-16 is selected as the image feature extractor, with the fifth pooling layer pool5 and the three fully connected layers fc6, fc7 and fc8 removed. The data set pictures are first preprocessed by scaling them to 512 × S while preserving the aspect ratio. In the training stage, the pictures are shuffled, horizontally flipped and randomly cropped to an input size of 448 × 448; in the testing stage, only center cropping is applied.
FIG. 2 is a schematic diagram of the activation responses of different high-level convolutional layers in the feature extraction network. As shown in FIG. 2, different convolutional layers discriminate the parts of the input image differently. For example, in the first row of pictures in FIG. 2, conv5_1 responds strongly to the tail, head and wings of the bird, while conv5_3 retains only the activation response to the head. Inspired by this observation, and in order to better capture the feature relations between layers, the invention provides a cross-layer simplified bilinear pooling method. The method takes inter-layer feature interaction into account, integrates several cross-layer bilinear features, and fuses them before the final classification to strengthen the representation capability of the features, without introducing additional training parameters. In contrast to BCNN, which uses features from a single convolutional layer only, the method treats each convolutional layer of the convolutional neural network as a part-attribute extractor and exploits part-feature interactions across multiple layers.
FIG. 3 is a simplified bilinear operation diagram. The concrete implementation steps are as follows:
(1) First, the Count Sketch function $\Psi$ maps each feature vector $f_k \in \mathbb{R}^c$, $k = 1, 2$, to a $d$-dimensional feature space. Two vectors $s_k \in \{-1, 1\}^c$ and $h_k \in \{1, \ldots, d\}^c$ are defined; they are initialized from uniform distributions and kept fixed in all subsequent operations. $h_k$ gives, for the $i$-th element $f_k(i)$ of $f_k$, the corresponding index $j = h_k(i)$ in the feature space. Then
$$\Psi(f_k, h_k, s_k) = \{Q_1, Q_2, \ldots, Q_d\}$$
$$Q_j = \sum_{i:\, h_k(i) = j} s_k(i)\, f_k(i)$$
where $i \in \{1, \ldots, c\}$ and $j \in \{1, \ldots, d\}$.
(2) The Tensor Sketch algorithm shows that the Count Sketch of the outer product of two vectors can be obtained by convolving the Count Sketches of the two vectors, which can be expressed as
$$\Psi(f_1 \otimes f_2, h, s) = \Psi(f_1, h_1, s_1) * \Psi(f_2, h_2, s_2)$$
where $*$ denotes the convolution operation. The convolution theorem states that convolution in the time domain is equivalent to a product in the frequency domain. Thus, the above formula can be written as
$$\Psi(f_1 \otimes f_2, h, s) = F^{-1}\big(F(\Psi(f_1, h_1, s_1)) \circ F(\Psi(f_2, h_2, s_2))\big)$$
where $F$ denotes the fast Fourier transform, $F^{-1}$ denotes the inverse Fourier transform, and $\circ$ denotes element-wise multiplication.
(3) The three groups of bilinear feature vectors obtained in the above steps are normalized. First, the signed square root of the bilinear feature $x$ is taken,
$$y \leftarrow \mathrm{sign}(x)\sqrt{|x|}$$
and then $\ell_2$ normalization is applied, $z \leftarrow y / \|y\|_2$.
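The three steps above can be sketched in NumPy for a single pair of feature vectors. This is an illustrative sketch under stated assumptions: indices are 0-based (so $h_k \in \{0, \ldots, d-1\}$), the sizes `c = 16` and `d = 32` are toy values, and the function names are hypothetical. The last lines compare the FFT result with a direct Count Sketch of the outer product, using the combined hash `(h1[i] + h2[j]) mod d` and sign `s1[i] * s2[j]`.

```python
import numpy as np


def count_sketch(f, h, s, d):
    """Q_j = sum of s[i] * f[i] over all i with h[i] == j."""
    q = np.zeros(d)
    np.add.at(q, h, s * f)  # unbuffered add, so repeated indices accumulate
    return q


def simplified_bilinear(f1, f2, h1, s1, h2, s2, d):
    """Tensor Sketch: circular convolution of the two sketches, computed via FFT."""
    q1 = count_sketch(f1, h1, s1, d)
    q2 = count_sketch(f2, h2, s2, d)
    return np.real(np.fft.ifft(np.fft.fft(q1) * np.fft.fft(q2)))


def normalize(x, eps=1e-12):
    """Signed square root followed by l2 normalization."""
    y = np.sign(x) * np.sqrt(np.abs(x))
    return y / (np.linalg.norm(y) + eps)


rng = np.random.default_rng(0)
c, d = 16, 32  # toy sizes chosen for the example
h1, h2 = rng.integers(0, d, size=c), rng.integers(0, d, size=c)
s1, s2 = rng.choice([-1.0, 1.0], size=c), rng.choice([-1.0, 1.0], size=c)
f1, f2 = rng.standard_normal(c), rng.standard_normal(c)

x = simplified_bilinear(f1, f2, h1, s1, h2, s2, d)
z = normalize(x)

# Direct Count Sketch of the c*c outer product, for comparison with x.
outer = np.outer(f1, f2).ravel()
h12 = ((h1[:, None] + h2[None, :]) % d).ravel()
s12 = (s1[:, None] * s2[None, :]).ravel()
direct = count_sketch(outer, h12, s12, d)
```

The FFT route never forms the c × c outer product, which is where the saving in computation comes from.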
FIG. 4 is a block diagram of the cross-layer simplified bilinear network. The concrete implementation steps are as follows:
(1) VGG-16 is selected as the feature extractor. The output feature maps of different high-level convolutional layers are obtained from the bird image feature extraction network and denoted $f_1(x, y)$, $f_2(x, y)$, $f_3(x, y)$, where $f_1$, $f_2$, $f_3$ are the output feature functions of conv5_1, conv5_2 and conv5_3 in the fifth convolutional block of VGG-16, respectively.
(2) Following the method of FIG. 3, the simplified bilinear operation is applied to the output feature map $f_A$ of one layer and the feature map $f_B$ of another layer, yielding three groups of bilinear feature vectors.
(3) The normalized feature vectors are concatenated and sent to a softmax classifier for classification.
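The fusion in steps (1) to (3) might look like the sketch below. Several points are assumptions rather than statements of the patent: the three groups are taken to be the pairs (conv5_1, conv5_2), (conv5_1, conv5_3) and (conv5_2, conv5_3); the per-location features are sum-pooled over the spatial grid; the same two hash/sign sets are reused for every pair; and random arrays stand in for real VGG-16 activations. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def sketch_map(fmap, h, s, d):
    """Count-sketch every spatial location of a (C, H, W) map; returns (d, H*W)."""
    c = fmap.shape[0]
    flat = fmap.reshape(c, -1) * s[:, None]
    out = np.zeros((d, flat.shape[1]))
    np.add.at(out, h, flat)  # channels hashed to the same index are summed
    return out


def cross_layer_feature(fa, fb, ha, sa, hb, sb, d):
    """Simplified bilinear feature of two layers, pooled and normalized."""
    qa = np.fft.fft(sketch_map(fa, ha, sa, d), axis=0)
    qb = np.fft.fft(sketch_map(fb, hb, sb, d), axis=0)
    x = np.real(np.fft.ifft(qa * qb, axis=0)).sum(axis=1)  # sum over locations
    y = np.sign(x) * np.sqrt(np.abs(x))                    # signed square root
    return y / (np.linalg.norm(y) + 1e-12)                 # l2 normalization


rng = np.random.default_rng(0)
C, H, W, d = 8, 4, 4, 32  # toy sizes for the example
h1, h2 = rng.integers(0, d, size=C), rng.integers(0, d, size=C)
s1, s2 = rng.choice([-1.0, 1.0], size=C), rng.choice([-1.0, 1.0], size=C)

# Random non-negative stand-ins for the conv5_1, conv5_2, conv5_3 outputs.
layers = [np.maximum(rng.standard_normal((C, H, W)), 0.0) for _ in range(3)]

pairs = list(combinations(range(3), 2))  # assumed pairing of the three layers
groups = [cross_layer_feature(layers[a], layers[b], h1, s1, h2, s2, d)
          for a, b in pairs]
fused = np.concatenate(groups)           # vector passed on to the softmax classifier
```

If the value 8192 in claim 3 is read as the per-group dimension d, the concatenated feature would have 3 × 8192 = 24,576 entries, compared with 512 × 512 = 262,144 for a full bilinear feature over the 512 conv5 channels of VGG-16.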
FIG. 5 shows the network training method with pairwise confusion loss. The core idea of pairwise confusion is as follows: the samples in a training batch are randomly divided into two groups to form picture pairs. If a picture pair has the same label, only the cross-entropy loss is calculated; if a picture pair has different labels, a pairwise Euclidean loss is added to the cross-entropy loss as a regularization term. The method mainly comprises the following steps:
(1) The samples in a training batch are randomly divided into two groups, $(x_1, y_1)$ and $(x_2, y_2)$.
(2) The class label vectors $\mathrm{label}(x_1)$ and $\mathrm{label}(x_2)$ of the two groups of samples are obtained.
(3) If the two groups of samples have the same label, only the cross-entropy loss is calculated:
$$L = \sum_{i=1}^{2} L_{CE}\big(p_\theta(y \mid x_i),\, y_i\big)$$
If the two groups of samples have different labels, the Euclidean pairwise confusion loss is added to the cross-entropy loss as a regularization term, namely
$$L = \sum_{i=1}^{2} L_{CE}\big(p_\theta(y \mid x_i),\, y_i\big) + \lambda\, D_{EC}\big(p_\theta(y \mid x_1),\, p_\theta(y \mid x_2)\big)$$
where $D_{EC}$ denotes the Euclidean distance, $L_{CE}$ denotes the cross-entropy loss, $p_\theta(y \mid x_i)$ is the probability vector output by the softmax classifier, and $\lambda$ is the weight of the Euclidean loss.
(4) Back-propagating the losses and updating the network parameters.
(5) Enter the next batch and jump to step (1).
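One possible reading of the training loss in steps (1) to (3) is sketched below in NumPy (forward computation only). The assumptions are: the batch has already been shuffled, so its first and second halves serve as the two groups; the Euclidean term is the squared distance between the two softmax outputs; both terms are averaged over the batch or over the pairs; and the default weights 20 and 1 are the values given in claim 5. The function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pairwise_confusion_loss(logits, labels, weight_ec=20.0, weight_ce=1.0):
    """Cross entropy on every sample plus a Euclidean penalty on
    pairs whose two pictures carry different labels."""
    probs = softmax(logits)
    n = len(labels)
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()

    half = n // 2
    p1, p2 = probs[:half], probs[half:2 * half]
    differ = labels[:half] != labels[half:2 * half]   # True where labels differ
    ec = (np.sum((p1 - p2) ** 2, axis=1) * differ).sum() / max(half, 1)
    return weight_ce * ce + weight_ec * ec


rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 3))     # toy batch: 4 samples, 3 classes
labels_same = np.array([0, 1, 0, 1])     # both pairs share a label
labels_diff = np.array([0, 1, 1, 0])     # both pairs have different labels

loss_same = pairwise_confusion_loss(logits, labels_same)
loss_diff = pairwise_confusion_loss(logits, labels_diff)
```

With `labels_same` the Euclidean term vanishes and the loss reduces to the plain cross entropy; with `labels_diff` the regularizer pulls the two predicted distributions toward each other.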

Claims (5)

1. A fine-grained bird identification method based on a cross-layer simplified bilinear network is characterized by comprising the following steps:
(1) firstly, 5994 training pictures and 5794 testing pictures in a CUB-200-2011 data set are preprocessed, and then the processed images are input into a convolutional neural network VGG-16 to extract a feature map of a bird image;
(2) in order to consider interlayer feature interaction, three groups of simplified bilinear feature vectors are extracted from the feature maps of different high-level convolutions obtained in the step (1) in a cross-layer mode, normalized, cascaded and sent to a softmax classifier;
(3) the network is optimized using the cross-entropy loss, assisted by the pairwise confusion loss.
2. The cross-layer bilinear feature extraction of claim 1, wherein the selected convolutional layers are the fifth group of convolutional layers of VGG-16, and the specific combination is
[formula image FDA0002046834080000011]
where the operator
[formula image FDA0002046834080000012]
is defined as the simplified bilinear operation.
3. The three groups of simplified bilinear feature vectors of claim 1, wherein the dimension of the output feature vector is 8192.
4. The network optimization using the cross-entropy loss according to claim 1, wherein the samples in a training batch are randomly divided into two groups to form picture pairs; if a picture pair has the same label, only the cross-entropy loss is calculated; if a picture pair has different labels, a pairwise Euclidean loss is added to the cross-entropy loss as a regularization term.
5. The Euclidean loss of claim 4, wherein the weight of the Euclidean loss is 20 and the weight of the cross-entropy loss is 1.
CN201910360985.1A 2019-04-30 2019-04-30 Fine-grained bird identification method based on cross-layer simplified bilinear network Pending CN111860068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910360985.1A CN111860068A (en) 2019-04-30 2019-04-30 Fine-grained bird identification method based on cross-layer simplified bilinear network

Publications (1)

Publication Number Publication Date
CN111860068A (en) 2020-10-30

Family

ID=72966490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910360985.1A Pending CN111860068A (en) 2019-04-30 2019-04-30 Fine-grained bird identification method based on cross-layer simplified bilinear network

Country Status (1)

Country Link
CN (1) CN111860068A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019018063A1 (en) * 2017-07-19 2019-01-24 Microsoft Technology Licensing, Llc Fine-grained image recognition
CN107704877A (en) * 2017-10-09 2018-02-16 哈尔滨工业大学深圳研究生院 A kind of image privacy cognitive method based on deep learning
CN108549926A (en) * 2018-03-09 2018-09-18 中山大学 A kind of deep neural network and training method for refining identification vehicle attribute
CN108875827A (en) * 2018-06-15 2018-11-23 广州深域信息科技有限公司 A kind of method and system of fine granularity image classification
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN109002845A (en) * 2018-06-29 2018-12-14 西安交通大学 Fine granularity image classification method based on depth convolutional neural networks
CN109685115A (en) * 2018-11-30 2019-04-26 西北大学 A kind of the fine granularity conceptual model and learning method of bilinearity Fusion Features

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ABHIMANYU DUBEY et al.: "Pairwise Confusion for Fine-Grained Visual Classification", Proceedings of the European Conference on Computer Vision *
AKIRA FUKUI et al.: "Multimodal compact bilinear pooling for visual question answering and visual grounding", Computer Vision and Pattern Recognition *
CHAOJIAN YU et al.: "Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition", Proceedings of the European Conference on Computer Vision *
YANG GAO et al.: "Compact Bilinear Pooling", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
SHAN QIANWEN et al.: "Fast object detection and recognition algorithm based on improved multi-scale feature maps", Laser & Optoelectronics Progress *
ZHANG YANG: "Research on fine-grained image classification algorithms", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648667A (en) * 2022-03-31 2022-06-21 北京工业大学 Bird image fine-granularity identification method based on lightweight bilinear CNN model
CN114648667B (en) * 2022-03-31 2024-06-07 北京工业大学 Bird image fine-granularity recognition method based on lightweight bilinear CNN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030