CN107491782B - Image classification method for a small amount of training data by utilizing semantic space information

Info

Publication number
CN107491782B
CN107491782B (application CN201710603221.1A)
Authority
CN
China
Prior art keywords
training
neural network
data
network
image
Prior art date
Legal status
Active
Application number
CN201710603221.1A
Other languages
Chinese (zh)
Other versions
CN107491782A (en)
Inventor
付彦伟
林航宇
马建奇
姜育刚
张寅达
薛向阳
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Application filed by Fudan University
Priority to CN201710603221.1A
Publication of CN107491782A
Application granted
Publication of CN107491782B
Legal status: Active
Anticipated expiration: pending

Classifications

    • G06F18/241 Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214 Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/04 Neural networks; architecture, e.g. interconnection topology


Abstract

The invention belongs to the technical field of computer image processing, and relates in particular to an image classification method that uses semantic space information to learn from a small amount of training data. The invention combines semantic space information with an auto-encoder to augment the data, thereby obtaining more effective samples when only a few samples are available; a classifier based on a deep neural network is trained with the augmented data; the classifier network and the feature extraction network are then connected and trained together to obtain an end-to-end neural network, so that, given a picture, classification information is output directly. Because the data augmentation method enlarges the available data, training the deep neural network becomes more effective; and because the algorithm is an end-to-end neural network, a single input picture suffices to produce the corresponding classification result.

Description

Image classification method for a small amount of training data by utilizing semantic space information
Technical Field
The invention belongs to the technical field of computer image processing, and relates in particular to an image classification method that uses semantic space information to classify images from a small amount of training data.
Background
Much of the current progress in machine learning and deep learning relies on large amounts of labeled data. In practice, however, acquiring large amounts of labeled data costs considerable manpower and material resources, and in many cases is simply not feasible. On the other hand, humans are known to learn to identify objects correctly from only a small amount of data (for example, after seeing a few apples we can recognize other apples). Studying how to train classifiers with little data is therefore both meaningful and practical; in the field of artificial intelligence this is known as the One-shot Learning problem. Although One-shot Learning is a classical problem, there is still no very effective method or model for fine-grained image recognition.
The One-shot Learning problem derives from the human ability to learn to recognize objects from a small number of samples. Training with little data, however, is in a sense contrary to existing machine learning methods [1]. Ordinary gradient-optimization-based models are not well suited to this setting, so the earliest models were Bayesian approaches [2], deep generative models [3], and variational auto-encoders [4]. More broadly, existing knowledge from another domain can be used to assist learning in the current domain; this is called transfer learning [5], whose key idea is to exploit knowledge already available in other fields. The method used in this work can also be regarded as a kind of transfer learning. From another perspective, the One-shot Learning problem can also be attacked by augmenting the data: machine learning algorithms require large amounts of data, so generating more effective samples from the existing data with the help of other knowledge is itself a solution. Several related approaches exist: (1) learning from small sample sets by combining unsupervised meta-training with CNNs [6]; (2) borrowing examples from related sources or vocabularies [7]; (3) compositing synthesized representations [8]. These methods each rely on essentially a single technique, and most of them address the classification of coarse-grained categories. In the present invention, knowledge of the semantic space is used to augment the data and train a deep learning classifier, thereby solving the problem of fine-grained image recognition under the One-shot condition.
Disclosure of Invention
The invention aims to provide an image classification method that uses semantic space information for a small amount of training data, so as to solve the problem of fine-grained image recognition under the One-shot condition.
In the image classification method for a small amount of training data provided by the invention, semantic space information is combined with an auto-encoder to augment the data, so that more effective samples are obtained when only a few samples are available; a classifier based on a deep neural network is trained with the augmented data; the classifier network and the feature extraction network are then connected and trained together to obtain an end-to-end neural network, so that, given a picture, classification information is output directly.
The method comprises the following specific steps:
(1) The data set is split into a training data set and a test data set, and the image features of both data sets are extracted with the same neural network. This neural network is an existing, effective feature extraction network, such as the VGG-16 network;
(2) Word vectors of the two data sets are acquired as semantic features. Specifically, word2vec is trained on a corresponding text corpus to obtain a mapping from words to word vectors. For both data sets, the word vectors corresponding to the label words are their semantic features;
(3) An auto-encoder neural network is constructed in two parts: the first part, composed of fully connected layers, takes the image features of the training data as input and outputs the corresponding semantic features; the second part, also composed of fully connected layers, takes those semantic features as input and outputs reconstructed image features. The two parts are connected, i.e. the output of the first part is the input of the second, and training aims to make the semantic features output by the first part as close as possible to the real semantic features, and the finally output image features as close as possible to the real image features;
(4) The second half of the auto-encoder network obtained in step (3) outputs reconstructed image features for the nearest neighbors of the semantic features corresponding to the training data; these are added to the training data set, completing the data augmentation;
(5) An end-to-end neural network is trained with the resulting training data and outputs the classification result; once the deep neural network model is trained, the classification result is obtained directly for a given picture.
In the invention, the first part of the end-to-end neural network is the feature extraction network and the second part is the classification network; these networks are made up of fully connected layers.
In the invention, the fully connected layers may be repeated several times within the deep neural network structure.
The innovations of the invention are:
1. Semantic space information is used for data augmentation to solve the problem of image classification with few samples; the information from the semantic space compensates well for the lack of information caused by having few samples, so that the deep neural network can be trained effectively;
2. The algorithm is an end-to-end neural network, so a single input picture suffices to produce the corresponding classification result.
Drawings
Fig. 1 is a schematic structural diagram of the designed deep neural network.
Fig. 2 is a schematic structural diagram of the auto-encoder.
Fig. 3 is a schematic diagram of the data augmentation.
Detailed Description
Step 1: acquire the corresponding data sets and split them into a training data set and a test data set; after the split, the training data are few. Extract the image features of both data sets with the same neural network, a pre-trained deep convolutional neural network. We use a VGG-16 network comprising 13 convolutional layers and 3 fully connected layers, which outputs a 4096-dimensional feature vector for each input picture.
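As an illustration of this step, the following is a minimal sketch, assuming PyTorch and torchvision's pre-trained VGG-16 (the patent names neither library); the preprocessing constants are the standard ImageNet values and the image path is a placeholder.

```python
# Hypothetical sketch of step 1: 4096-d VGG-16 feature extraction.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a pre-trained VGG-16 and drop its final 1000-way layer, keeping
# the 4096-d output of the last hidden fully connected layer.
vgg = models.vgg16(pretrained=True)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])
vgg.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(image_path: str) -> torch.Tensor:
    """Return the 4096-d feature vector of one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return vgg(img).squeeze(0)  # shape: (4096,)
```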
Step 2: train word2vec on a corresponding text corpus to obtain a mapping, or dictionary, from words to word vectors. Since every picture carries a label word, the previously obtained mapping yields the corresponding word vector, giving the semantic features we need.
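A minimal sketch of this step, assuming the gensim word2vec implementation (the patent does not name a library); the corpus file, the 100-dimensional vector size (chosen to match the auto-encoder bottleneck below), and the example word are assumptions.

```python
# Hypothetical sketch of step 2: word-to-vector dictionary via word2vec.
from gensim.models import Word2Vec

# Each line of the corpus file is taken as one whitespace-tokenized sentence.
sentences = [line.split() for line in open("corpus.txt", encoding="utf-8")]
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# The semantic feature of a labeled class is the vector of its label word.
semantic_feature = w2v.wv["beaver"]  # numpy array of shape (100,)
```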
Step 3: construct the auto-encoder neural network. It comprises two parts: the first, composed of fully connected layers, takes the image features of the training data as input and outputs the corresponding semantic features; the second, also composed of fully connected layers, takes those semantic features as input and outputs reconstructed image features. As shown in Fig. 2, the first part f(x) is a fully connected network with layer sizes 4096 × 2048 × 1024 × 512 × 256 × 100, whose output should resemble the real semantic features; the second part g(x) is a fully connected network with layer sizes 100 × 256 × 512 × 1024 × 2048 × 4096, symmetric to the first, whose input is the output of the first part and whose output is the reconstructed image features. Training aims to make the semantic features output by the first part as close as possible to the real semantic features, and the finally output image features as close as possible to the real image features. The specific loss function is:
$$L(\Theta) = \sum_{(x_i, u_i) \in D_s} \left( \|x_i' - x_i\|^2 + \|u_i' - u_i\|^2 \right) + \lambda P(\Theta)$$

where $\Theta$ denotes the parameter set of the auto-encoder, $D_s$ the training data set, $u_i$ a word vector, $x_i$ an image feature vector, and $x_i'$, $u_i'$ the corresponding outputs generated by the network; $P(\Theta)$ is a regularization term on the parameters and $\lambda$ its weight, which can be tuned by hand for the best result. The loss function can be divided into three parts. The first,

$$\sum_{(x_i, u_i) \in D_s} \|x_i' - x_i\|^2,$$

requires the gap between the image features reconstructed by the auto-encoder and the real image features to be as small as possible. The second,

$$\sum_{(x_i, u_i) \in D_s} \|u_i' - u_i\|^2,$$

minimizes the gap between the output of the first part of the network and the true semantic features. The last term, $\lambda P(\Theta)$, is a regularization term that prevents overfitting.
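A minimal PyTorch sketch of this auto-encoder and its loss, under stated assumptions: the layer widths follow the text, but the ReLU activations, the Adam optimizer, the squared-L2 form of P(Θ), and the value of λ are illustrative choices the patent does not specify.

```python
# Hypothetical sketch of step 3: the two-part auto-encoder f and g.
import torch
import torch.nn as nn

def mlp(dims):
    """Fully connected stack with ReLU between hidden layers."""
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

f = mlp([4096, 2048, 1024, 512, 256, 100])  # encoder: image feature -> semantic
g = mlp([100, 256, 512, 1024, 2048, 4096])  # decoder: semantic -> image feature

opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-4)
lam = 1e-4  # regularization weight lambda (hand-tuned per the text)

def train_step(x, u):
    """x: (B, 4096) image features, u: (B, 100) word vectors."""
    u_hat = f(x)      # u'_i, predicted semantic features
    x_hat = g(u_hat)  # x'_i, reconstructed image features
    reg = sum(p.pow(2).sum() for p in f.parameters()) \
        + sum(p.pow(2).sum() for p in g.parameters())  # assumed form of P(theta)
    loss = ((x_hat - x) ** 2).sum() + ((u_hat - u) ** 2).sum() + lam * reg
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```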
Step 4: using the dictionary obtained in step 2, find the nearest neighbors (several of them) of each training datum's label word. For each neighbor, use the second half of the auto-encoder constructed in step 3 to obtain the corresponding reconstructed image feature. This yields image features that share commonality with, but are not identical to, the original image, which completes the data augmentation; these augmented feature vectors are added to the data set for subsequent training. As shown in Fig. 3, for "beaver" we search the nearest neighbors in the semantic space (for example, "muskrat"), then feed the word vectors of these neighbors into the second half g(x) of the auto-encoder from step 3 to obtain the corresponding image features, i.e. the augmented data. Because the process exploits both the image features of the data and the semantic features of its labels, it alleviates the small information content of a few samples and supports the training of an effective deep neural network classifier.
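A sketch of this augmentation step, reusing the w2v model from the step-2 sketch and the decoder g from the step-3 sketch; the neighbor count k is an illustrative assumption.

```python
# Hypothetical sketch of step 4: augmenting a class with reconstructed
# features of its nearest-neighbor label words.
import torch

def augment(label_word: str, k: int = 2):
    """Return k reconstructed image features for label_word's class."""
    neighbors = w2v.wv.most_similar(label_word, topn=k)  # [(word, score), ...]
    feats = []
    for word, _score in neighbors:
        u = torch.tensor(w2v.wv[word]).unsqueeze(0)  # (1, 100) word vector
        with torch.no_grad():
            feats.append(g(u).squeeze(0))  # (4096,) reconstructed feature
    return feats  # appended to the training set under label_word's class

extra_beaver_features = augment("beaver")
```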
Step 5: train the end-to-end deep neural network. The training process is divided into two parts: in the first, a classifier is trained from image features to classification results; in the second, the network is expanded into an end-to-end network and fine-tuned. The two parts are described in detail below.
First part: train a classifier from image features to classification results; the task is defined as follows.
In the image classification task, the method treats class prediction as computing a classification vector: for each class it outputs the probability that the input belongs to that class, and the class with the highest probability is taken as the prediction. Assuming that a 200-dimensional classification result $y \in \mathbb{R}^{200}$ is finally obtained, the index of the maximum entry of $y$ is taken as the final classification result.
Here the network consists of 3 fully connected layers plus a softmax layer, with parameter dimensions $W_1 \in \mathbb{R}^{4096 \times 1024}$, $W_2 \in \mathbb{R}^{1024 \times 256}$, $W_3 \in \mathbb{R}^{256 \times d}$, where $d$ denotes the number of classes. The loss function is the cross entropy

$$L = -\sum_i y_i^{\top} \log y_i',$$

where $y_i$ denotes the true classification result vector and $y_i'$ the predicted classification result (in probability form). Because the data from the augmentation are used, training this neural network yields a rather successful classifier; with only the original small number of samples, an effective classifier would be difficult to train.
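A sketch of this first-stage classifier with the stated dimensions; the ReLU activations between layers are an assumption, and the explicit softmax layer is folded into PyTorch's CrossEntropyLoss, which combines log-softmax with the cross-entropy loss above.

```python
# Hypothetical sketch of the image-feature classifier (3 FC layers + softmax).
import torch.nn as nn

d = 200  # number of classes; 200 is the example dimensionality in the text
classifier = nn.Sequential(
    nn.Linear(4096, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, d),  # logits; softmax is applied inside the loss below
)
criterion = nn.CrossEntropyLoss()  # cross entropy over softmax probabilities
```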
Second part: train the end-to-end network. As shown in Fig. 1, the image feature extraction network extracts the features of a given picture, which are fed into the trained classifier to obtain the classification result. To obtain better results, we put the existing data through the whole network and train it again, i.e. fine-tuning. This gives the final result.
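A sketch of this second stage, chaining the truncated VGG-16 from the step-1 sketch with the classifier above into one end-to-end network; the SGD optimizer and its small learning rate are assumptions, as the patent does not specify the fine-tuning hyperparameters.

```python
# Hypothetical sketch of step 5, second part: end-to-end fine-tuning.
import torch
import torch.nn as nn
from PIL import Image

end_to_end = nn.Sequential(vgg, classifier)  # picture in, class scores out
opt = torch.optim.SGD(end_to_end.parameters(), lr=1e-5, momentum=0.9)

def finetune_step(images, labels):
    """images: (B, 3, 224, 224) tensors, labels: (B,) class indices."""
    end_to_end.train()
    loss = criterion(end_to_end(images), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def predict(image_path: str) -> int:
    """Inference: give one picture, get its class index directly."""
    end_to_end.eval()
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return end_to_end(img).argmax(dim=1).item()
```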
References
[1] S. Thrun. Learning to Learn: Introduction. Kluwer Academic Publishers, 1996.
[2] L. Fei-Fei, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. In IEEE International Conference on Computer Vision, 2003.
[3] D. J. Rezende, S. Mohamed, I. Danihelka, K. Gregor, and D. Wierstra. One-shot generalization in deep generative models. In ICML, 2016.
[4] D. Kingma and M. Welling. Auto-encoding variational Bayes. In ICLR, 2014.
[5] E. Bart and S. Ullman. Cross-generalization: learning novel classes from a single example by feature replacement. In CVPR, 2005.
[6] Y. Wang and M. Hebert. Learning from small sample sets by combining unsupervised meta-training with CNNs. In NIPS, 2016.
[7] J. Lim, R. Salakhutdinov, and A. Torralba. Transfer learning by borrowing examples for multiclass object detection. In NIPS, 2011.
[8] Y. Movshovitz-Attias. Dataset curation through renders and ontology matching. Ph.D. thesis, CMU, 2015.

Claims (3)

1. An image classification method for a small amount of training data using semantic space information, characterized in that semantic space information is combined with an auto-encoder to augment the data, so that more effective samples are obtained when only a few samples are available; a classifier based on a deep neural network is trained with the augmented data; the classifier network and the feature extraction network are then connected and trained together to obtain an end-to-end neural network, so that, given a picture, classification information is output directly;
the method comprises the following specific steps:
(1) segmenting a data set into a training data set and a test data set, and extracting the image features of both data sets with the same neural network, an existing, effective feature extraction network;
(2) acquiring word vectors of the two data sets as semantic features; specifically, training word2vec on a corresponding text corpus to obtain a mapping from words to word vectors; for both data sets, the word vectors corresponding to the label words are their semantic features;
(3) constructing an auto-encoder neural network in two parts: the first part, composed of fully connected layers, takes the image features of the training data as input and outputs the corresponding semantic features; the second part, also composed of fully connected layers, takes the semantic features as input and outputs reconstructed image features; the two parts are connected, i.e. the output of the first part is the input of the second part, and training aims to make the gap between the semantic features output by the first part and the real semantic features small, and the gap between the finally output image features and the real image features small;
(4) using the second half of the auto-encoder network obtained in step (3) to output reconstructed image features for the nearest neighbors of the semantic features corresponding to the training data, and adding them to the training data set to complete the data augmentation;
(5) training an end-to-end neural network with the resulting training data and outputting the classification result; after the deep neural network model is trained, the classification result is obtained directly for a given picture.
2. The image classification method according to claim 1, characterized in that in step (3), the loss function used is:
$$L(\Theta) = \sum_{(x_i, u_i) \in D_s} \left( \|x_i' - x_i\|^2 + \|u_i' - u_i\|^2 \right) + \lambda P(\Theta)$$

where $\Theta$ denotes the parameter set of the auto-encoder, $D_s$ the training data set, $u_i$ a word vector, $x_i$ an image feature vector, and $x_i'$, $u_i'$ the corresponding outputs generated by the network; $P(\Theta)$ is a regularization term on the parameters and $\lambda$ an adjustable regularization weight.
3. The image classification method according to claim 1, wherein in step (5), the training of an end-to-end neural network is divided into two parts: a first part, training a classifier from image features to classification results; a second part, expanding the network into an end-to-end network for fine-tuning; wherein:
in the first part, the process of training a classifier from image features to classification results is as follows:
here, the network consists of 3 fully-connected layers plus one softmax layer, with the parameter dimensions W1∈R4096×1024,W2∈R1024×256,W3∈R256×dD represents the number of classifications; the loss function is:
Figure FDA0002668271320000021
here, yiRepresents the true classification result vector, y'iRepresenting the predicted classification result; by training this neural network, a successful classifier can be obtained;
training the end-to-end network: the image feature extraction network extracts the features of a given picture, which are fed into the trained classifier to obtain the classification result; to obtain a better result, the existing data are put through the whole network and trained again, i.e. fine-tuned; this gives the final result.
CN201710603221.1A 2017-07-22 2017-07-22 Image classification method for small amount of training data by utilizing semantic space information Active CN107491782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710603221.1A CN107491782B (en) 2017-07-22 2017-07-22 Image classification method for small amount of training data by utilizing semantic space information


Publications (2)

Publication Number Publication Date
CN107491782A CN107491782A (en) 2017-12-19
CN107491782B (en) 2020-11-20

Family

ID=60644673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710603221.1A Active CN107491782B (en) 2017-07-22 2017-07-22 Image classification method for small amount of training data by utilizing semantic space information

Country Status (1)

Country Link
CN (1) CN107491782B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657697B (en) * 2018-11-16 2023-01-06 中山大学 Classification optimization method based on semi-supervised learning and fine-grained feature learning
CN109871791A (en) * 2019-01-31 2019-06-11 北京字节跳动网络技术有限公司 Image processing method and device
US11328221B2 (en) 2019-04-09 2022-05-10 International Business Machines Corporation Hybrid model for short text classification with imbalanced data
CN110298388A (en) * 2019-06-10 2019-10-01 天津大学 Based on the 5 kinds of damage caused by a drought recognition methods of corn for improving VGG19 network
CN113673635B (en) * 2020-05-15 2023-09-01 复旦大学 Hand-drawn sketch understanding deep learning method based on self-supervision learning task
EP3913544A1 (en) * 2020-05-22 2021-11-24 Toyota Jidosha Kabushiki Kaisha A computer-implemented training method, classification method and system, computer program and computer-readable recording medium


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331442A (en) * 2014-10-24 2015-02-04 华为技术有限公司 Video classification method and device
WO2017004803A1 (en) * 2015-07-08 2017-01-12 Xiaoou Tang An apparatus and a method for semantic image labeling
CN105631466A (en) * 2015-12-21 2016-06-01 中国科学院深圳先进技术研究院 Method and device for image classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Semi-supervised Vocabulary-Informed Learning; Yanwei Fu et al.; 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016-12-12; Abstract and Sections 2-3 *

Also Published As

Publication number Publication date
CN107491782A (en) 2017-12-19

Similar Documents

Publication Publication Date Title
CN107491782B (en) Image classification method for small amount of training data by utilizing semantic space information
Chen et al. A tutorial on network embeddings
Bekker et al. Training deep neural-networks based on unreliable labels
Taherkhani et al. Deep-FS: A feature selection algorithm for Deep Boltzmann Machines
Zhu et al. Structured attentions for visual question answering
US10846589B2 (en) Automated compilation of probabilistic task description into executable neural network specification
Schulz et al. Deep learning: Layer-wise learning of feature hierarchies
Fischer et al. An introduction to restricted Boltzmann machines
Ji et al. Unsupervised few-shot feature learning via self-supervised training
CN109977094B (en) Semi-supervised learning method for structured data
Arsov et al. Network embedding: An overview
CN110188827A (en) A kind of scene recognition method based on convolutional neural networks and recurrence autocoder model
CN111080551B (en) Multi-label image complement method based on depth convolution feature and semantic neighbor
CN112861976B (en) Sensitive image identification method based on twin graph convolution hash network
Tyagi Automated multistep classifier sizing and training for deep learner
Wang A survey on graph neural networks
Bayoudh A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges
Baek et al. Deep convolutional decision jungle for image classification
Dou et al. Learning global and local consistent representations for unsupervised image retrieval via deep graph diffusion networks
Janković Babić A comparison of methods for image classification of cultural heritage using transfer learning for feature extraction
Wang et al. Distance correlation autoencoder
Mudiyanselage et al. Feature selection with graph mining technology
Heindl Graph Neural Networks for Node-Level Predictions
Yuan et al. Metric learning algorithms for meta learning
Geras et al. Composite denoising autoencoders

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant