CN111222545A

CN111222545A - Image classification method based on linear programming incremental learning

Info

Publication number: CN111222545A
Application number: CN201911348984.1A
Authority: CN
Inventors: 白静; 员安然; 王鼎臣; 周华吉; 肖竹; 张丹; 杨韦洁
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2020-06-02
Anticipated expiration: 2039-12-24
Also published as: CN111222545B

Abstract

The invention discloses an image classification method based on linear programming incremental learning, which comprises the following steps: constructing a convolutional neural network; generating an initial training set; initially training a convolutional neural network; obtaining a class-average feature vector of an initial training set; judging whether the category of the image to be classified belongs to the category in the initial training set, if so, classifying by using a convolutional neural network, and if not, executing the next step; generating an incremental training set; obtaining a class-average feature vector of an incremental training set; solving a weight column vector by using a linear programming model; updating the convolutional neural network; and classifying by using a convolutional neural network. The method has the advantages of strong self-adaptive capacity, capability of generating the incremental training set by only one image, capability of finishing incremental learning by only needing few computing resources and computing time, and high classification accuracy of the initial training set and the incremental training set.

Description

Image classification method based on linear programming incremental learning

Technical Field

The invention relates to the technical field of image processing, in particular to an image classification method based on linear programming incremental learning in the technical field of image classification. The method can be used for classifying the main target in the optical image or the ground object target in the hyperspectral image.

Background

Image classification is an important field of image processing. It distinguishes different types of objects according to the different characteristics they exhibit in an image. The problem faced in this field is that a trained model can only classify images of the classes contained in its training set and cannot correctly classify images of classes outside it. Existing mechanisms for learning classes not contained in the training data must be trained on a large number of labeled images; the labeling work is time-consuming and labor-intensive, and the training consumes a large amount of computing resources and time.

The patent "An image classification training method capable of incremental learning in a big data scene" (patent application number: 201710550339.2, publication number: CN 107358257B), owned by South China University of Technology, provides an image classification training method capable of incremental learning in a big data scene. The method first trains an image classifier on the initial image data; second, when images of a new class appear, it performs incremental training on the initial model to obtain an updated image classifier; finally, it classifies the test data with the incrementally trained image classifier to obtain a classification result. The method can effectively carry out incremental learning and image classification for new-class images. However, it still has the following defect: a large number of labeled images are required to form the incremental training set, which incurs labor and time costs.

Xi'an Jiaotong University, in its patent document "Automatic incremental learning method for image recognition" (patent application No. 201810574578.6, publication No. CN 108805196A), proposes an automatic incremental learning method for image recognition. The method first reads a number of labeled images and trains on them to obtain a pre-trained model, then calculates an evaluation criterion α and classifies the unlabeled data according to an entropy loss function β, and finally feeds the new training data obtained each time back into the current model for training until the iterations are completed.

Disclosure of Invention

The invention aims to provide an image classification method based on linear programming incremental learning, aiming at the defects of the prior art. The method solves the problems that the existing image classification technology cannot correctly classify images of classes not contained in a training set, and the incremental learning process needs a large number of calibrated images of classes and consumes a large number of computing resources and time.

In order to achieve the purpose, the idea of the invention is as follows: the convolutional neural network is divided into a feature extraction module and a classification module, the feature extraction module is fixed in the incremental training process, and only the classification module is updated. And when the classification module is updated, the updated classification module is ensured to correctly classify the characteristics of the initial training class and the characteristics of the incremental training class.

The method comprises the following specific steps:

(1) constructing a convolutional neural network:

(1a) building a 10-layer feature extraction module whose structure is, in sequence: input layer → first convolution layer → second convolution layer → first pooling layer → third convolution layer → fourth convolution layer → second pooling layer → Flatten layer → normalization layer → first fully connected layer;

the parameters of each layer are set as follows: respectively setting the number of convolution kernels in the first convolution layer, the number of convolution kernels in the fourth; the first full connection layer consists of 512 nodes;

(1b) building a classification module consisting of a dot-product layer and an output layer; the weight matrix of the dot-product layer has 512 rows and a number of columns equal to the total number of labeled classes of all the input images, and the activation function of the output layer is softmax;

(1c) connecting the feature extraction module and the classification module in sequence to form a convolutional neural network;

(2) generating an initial training set:

inputting at least 1000 images with labeled categories, wherein all the images at least comprise 3 labeled categories, preprocessing each input image, and forming an initial training set by all the preprocessed images;

(3) initial training of the convolutional neural network:

inputting the initial training set into a convolutional neural network, updating the weight of each layer of the convolutional neural network by using a gradient descent method until the root mean square error value is reduced to be below 5.0, and obtaining the initially trained convolutional neural network;

(4) obtaining a class-average feature vector of an initial training set:

(4a) sequentially inputting each image in the initial training set into an initially trained convolutional neural network, and taking 512-dimensional output vector of each image output by a first full-connection layer in a feature extraction module of the network as a feature vector of the image;

(4b) averaging each element of the feature vectors of all the images of the same labeling category, and forming a category average feature vector of the labeling category by the average values of all the elements;

(5) judging whether the category of the image to be classified belongs to the category in the initial training set, if so, executing the step (10), otherwise, executing the step (6);

(6) generating an incremental training set:

inputting an image with the same labeling category as the image to be classified, preprocessing the image in the same manner as the step (2), and forming an incremental training set by all the preprocessed images;

(7) obtaining a class-average feature vector of an incremental training set;

(7a) sequentially inputting each image in the incremental training set into an initially trained convolutional neural network, and taking 512-dimensional output vector of each image output by a first full-connection layer in a feature extraction module of the network as a feature vector of the image;

(7b) averaging each element of the feature vectors of all the images in the incremental training set, and forming the average value of all the elements into a class-average feature vector of the labeling class;

(8) solving the weight column vector by using a linear programming model:

(8a) maximizing the classification score of the class-average feature vector of the incremental training set, subject to the correct-classification constraints, by using the following formula:

$$\max\ Z = f \cdot W$$

$$\text{s.t.}\quad f_i \cdot W < f_i \cdot W_j$$

$$f \cdot W > f \cdot W_j$$

wherein max represents the maximization operation, $Z$ represents the objective function, $f$ represents the class-average feature vector of the incremental training set, $\cdot$ represents the dot product operation, $W$ represents the weight column vector to be solved, s.t. represents the constraint conditions, $f_i$ represents the class-average feature vector of the $i$-th class in the initial training set, $i = 1, \ldots, n$, $n$ represents the total number of labeled classes in the initial training set, and $W_j$ represents the $j$-th column of the weight matrix of the dot-product layer of the classification module in the initially trained convolutional neural network, wherein the value of $j$ is correspondingly equal to $i$;

(8b) solving the linear model by using any existing linear programming software tool to obtain the weight column vector to be solved; in the embodiment of the invention, the Python sklearn library is used to solve the linear model;

(9) updating the convolutional neural network:

(9a) appending the weight column vector obtained in step (8b) as a new column to the weight matrix of the dot-product layer of the classification module in the initially trained convolutional neural network to obtain an updated weight matrix, and replacing the weight matrix of the dot-product layer with the updated weight matrix to obtain an updated classification module;

(9b) sequentially connecting a feature extraction module of the initially trained convolutional neural network with an updated classification module to form an updated convolutional neural network, and then executing the step (10);

(10) classification with convolutional neural networks:

and inputting the image to be classified into a convolutional neural network, and outputting a classification result.

Compared with the prior art, the invention has the following advantages:

First, because the invention establishes the linear programming model from the class-average feature vector of the incremental training set, it places no requirement on the minimum number of images in the incremental training set. This overcomes the problem of prior-art incremental learning, in which the incremental training set must be formed from a large number of labeled images at considerable labor and time cost. The invention can therefore generate the incremental training set and complete incremental learning with only one labeled image.

Second, obtaining the class-average feature vector of the incremental training set requires only forward propagation, and the incremental process requires solving only one linear programming model. This overcomes the problems of prior-art incremental processes, which are iterative, computationally complex, time-consuming, and slow to respond to incremental classes, so that the invention completes incremental learning with few computing resources and little computing time.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a simulation diagram of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the attached drawing figures.

The specific implementation steps of the present invention are described in further detail with reference to fig. 1.

Step 1, constructing a convolutional neural network.

A 10-layer feature extraction module is built, whose structure is, in sequence: input layer → first convolution layer → second convolution layer → first pooling layer → third convolution layer → fourth convolution layer → second pooling layer → Flatten layer → normalization layer → first fully connected layer. The parameters of each layer are set as follows: respectively setting the number of convolution kernels in the first convolution layer, the number of convolution kernels in the fourth convolution; the first fully connected layer consists of 512 nodes.

A classification module consisting of a dot-product layer and an output layer is built; the weight matrix of the dot-product layer has 512 rows and a number of columns equal to the total number of labeled classes of all the input images, and the activation function of the output layer is softmax.

And the feature extraction module and the classification module are sequentially connected to form a convolutional neural network.
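The forward pass of the classification module described above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the function name `classify`, the random weights, and the choice of eight classes are assumptions made for the example, and the feature extraction module is replaced by random 512-dimensional vectors.

```python
import numpy as np

def classify(features, W):
    """Dot-product layer followed by softmax; returns (probabilities, labels)."""
    scores = np.asarray(features, dtype=float) @ np.asarray(W, dtype=float)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs, probs.argmax(axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 8))   # 512 rows, one column per class (8 is illustrative)
x = rng.normal(size=(5, 512))   # five hypothetical feature vectors
probs, predicted = classify(x, W)
```

Because softmax is monotonic, the predicted label is simply the column of the weight matrix with the largest dot product, which is why the later steps can reason about class scores through linear constraints alone.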

Step 2, generating an initial training set.

Inputting at least 1000 images with labeled categories, wherein all the images at least comprise 3 labeled categories, preprocessing each input image, and if the input image is an optical image, sequentially preprocessing each image by rotating, shearing, stretching, reducing noise, changing brightness and changing contrast; if the input image is a hyperspectral image, performing Principal Component Analysis (PCA) dimensionality reduction and normalization preprocessing on each image in sequence, and forming an initial training set by all preprocessed images.
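For the hyperspectral branch of the preprocessing, one possible reading of "PCA dimensionality reduction and normalization" is sketched below. The text does not state the number of retained components or the normalization scheme, so `n_components` and the per-component min-max scaling to [0, 1] are assumptions of this sketch, as is the function name.

```python
import numpy as np

def pca_reduce_and_normalize(cube, n_components):
    """Project the spectral axis of an (H, W, B) cube onto its leading
    principal components, then min-max scale each component to [0, 1]."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(float)
    pixels -= pixels.mean(axis=0)                      # center each band
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    reduced = pixels @ vt[:n_components].T
    lo, hi = reduced.min(axis=0), reduced.max(axis=0)
    reduced = (reduced - lo) / np.where(hi > lo, hi - lo, 1.0)
    return reduced.reshape(h, w, n_components)

rng = np.random.default_rng(0)
cube = rng.random((20, 15, 103))   # synthetic stand-in for a 103-band image
out = pca_reduce_and_normalize(cube, n_components=10)
```

The same function would have to be applied with identical settings in step 6, since the incremental image must pass through the same preprocessing as the initial training set.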

Step 3, initially training the convolutional neural network.

And inputting the initial training set into the convolutional neural network, and updating the weight of each layer of the convolutional neural network by using a gradient descent method until the root mean square error value is reduced to be below 5.0 to obtain the initially trained convolutional neural network.

Step 4, acquiring the class-average feature vector of the initial training set.

And sequentially inputting each image in the initial training set into an initially trained convolutional neural network, and taking each 512-dimensional output vector of each image output by a first full-connection layer in a feature extraction module of the network as a feature vector of the image.

And averaging each element of the feature vectors of all the images of the same labeling category, and forming the average value of all the elements into a category-average feature vector of the labeling category.
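The element-wise averaging in this step can be sketched as follows. The helper name `class_mean_features` and the toy 4-dimensional features (standing in for the 512-dimensional outputs of the first fully connected layer) are assumptions made for illustration.

```python
import numpy as np

def class_mean_features(features, labels):
    """Return {label: mean feature vector} for a (num_images, dim) array."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

# Toy example with 4-dimensional features instead of 512-dimensional ones.
feats = np.array([[1.0, 0.0, 2.0, 0.0],
                  [3.0, 0.0, 0.0, 0.0],
                  [0.0, 4.0, 0.0, 2.0]])
labels = np.array([0, 0, 1])
means = class_mean_features(feats, labels)
```

With a single image per class, as in the incremental training set of step 7, the class-average feature vector is just that image's feature vector.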

Step 5, manually judging whether the category of the image to be classified belongs to the categories in the initial training set; if so, executing step 10, otherwise executing step 6.

Step 6, generating an incremental training set.

Inputting an image with the same labeling type as the image to be classified, preprocessing the image, and if the input image is an optical image, preprocessing each image in sequence by rotating, shearing, stretching, denoising, changing brightness and changing contrast; if the hyperspectral images are input, performing Principal Component Analysis (PCA) dimensionality reduction and normalization preprocessing on each image in sequence, and forming all preprocessed images into an incremental training set.

Step 7, acquiring the class-average feature vector of the incremental training set.

And sequentially inputting each image in the incremental training set into an initially trained convolutional neural network, and taking each 512-dimensional output vector of each image output by a first full-connection layer in a feature extraction module of the network as a feature vector of the image.

And averaging each element of the feature vectors of all the images in the incremental training set, and forming the average value of all the elements into the class-average feature vector of the labeling category.

Step 8, solving the weight column vector by using a linear programming model.

The classification score of the class-average feature vector of the incremental training set is maximized, subject to the correct-classification constraints, by using the following formula:

$$\max\ Z = f \cdot W$$

$$\text{s.t.}\quad f_i \cdot W < f_i \cdot W_j$$

$$f \cdot W > f \cdot W_j$$

wherein max represents the maximization operation, $Z$ represents the objective function, $f$ represents the class-average feature vector of the incremental training set, $\cdot$ represents the dot product operation, $W$ represents the weight column vector to be solved, s.t. represents the constraint conditions, $f_i$ represents the class-average feature vector of the $i$-th class in the initial training set, $i = 1, \ldots, n$, $n$ represents the total number of labeled classes in the initial training set, and $W_j$ represents the $j$-th column of the weight matrix of the dot-product layer of the classification module in the initially trained convolutional neural network, wherein the value of $j$ is correspondingly equal to $i$.

The linear model is solved by using the Python sklearn library to obtain the weight column vector.
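The linear program of this step can be sketched with an off-the-shelf solver. The example below uses `scipy.optimize.linprog` as a stand-in and is not the embodiment's own code. Two things are assumptions of this sketch and do not appear in the text: a small margin `eps` that turns the strict inequalities into solvable non-strict ones, and a box bound `bound` on every entry of the weight vector, without which the maximization is generally unbounded. The function name and the 3-dimensional toy data are likewise illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def solve_new_class_weights(f_new, old_means, W_old, eps=1e-3, bound=10.0):
    """Solve max f_new.W subject to the correct-classification constraints.

    f_new:     (d,) class-average feature vector of the incremental class
    old_means: (n, d) class-average feature vectors f_i of the initial classes
    W_old:     (d, n) weight matrix of the dot-product layer; column j is W_j
    eps, bound: assumptions of this sketch (strict-inequality margin, box bound)
    """
    f_new = np.asarray(f_new, dtype=float)
    old_means = np.asarray(old_means, dtype=float)
    W_old = np.asarray(W_old, dtype=float)
    n = old_means.shape[0]
    # f_i . W <= f_i . W_i - eps: each old class still prefers its own column.
    own_scores = np.einsum("id,di->i", old_means, W_old)
    # f . W >= f . W_j + eps for every j, written as -f . W <= -(f . W_j) - eps.
    A_new = -np.tile(f_new, (n, 1))
    b_new = -(f_new @ W_old) - eps
    res = linprog(c=-f_new,  # linprog minimizes, so the objective is negated
                  A_ub=np.vstack([old_means, A_new]),
                  b_ub=np.concatenate([own_scores - eps, b_new]),
                  bounds=[(-bound, bound)] * f_new.size,
                  method="highs")
    if not res.success:
        raise ValueError("linear program could not be solved: " + res.message)
    return res.x

# Toy problem: two initial classes, 3-dimensional features.
old_means = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
W_old = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
f_new = np.array([0.0, 0.0, 1.0])
w_new = solve_new_class_weights(f_new, old_means, W_old)
```

In the toy problem the new class is only visible in the third feature, so the solver pushes the third weight to the box bound while keeping the first two weights below the scores the old classes assign to themselves.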

Step 9, updating the convolutional neural network.

The weight column vector obtained in step 8 is appended as a new column to the weight matrix of the dot-product layer of the classification module in the initially trained convolutional neural network to obtain an updated weight matrix, and the weight matrix of the dot-product layer is replaced with the updated weight matrix to obtain an updated classification module.

The feature extraction module of the initially trained convolutional neural network is connected in sequence with the updated classification module to form an updated convolutional neural network, and then step 10 is executed.
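The update of the classification module amounts to adding one column to a matrix, which can be sketched as follows. The helper name `append_class_column` and the placeholder values are assumptions for illustration; the 512 × 8 shape mirrors a network initially trained on eight classes.

```python
import numpy as np

def append_class_column(W_old, w_new):
    """Return the (d, n + 1) matrix with w_new appended as the last column."""
    W_old = np.asarray(W_old, dtype=float)
    w_new = np.asarray(w_new, dtype=float).reshape(-1, 1)
    if w_new.shape[0] != W_old.shape[0]:
        raise ValueError("w_new must have one entry per row of W_old")
    return np.hstack([W_old, w_new])

W_old = np.zeros((512, 8))   # placeholder weights of the initial classes
w_new = np.ones(512)         # placeholder for the solved weight column vector
W_updated = append_class_column(W_old, w_new)
```

The columns of the initial classes are left untouched, so the scores the network assigns to those classes do not change; only a ninth score is added.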

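Step 10, classifying by using the convolutional neural network.

The image to be classified is input into the convolutional neural network, and the classification result is output.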

The effect of the present invention is further explained by combining the simulation experiment as follows:

1. Experimental conditions:

The hardware platform of the simulation experiment of the invention is as follows: the GPU is an NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2, with 20 cores, a main frequency of 2.4 GHz, a memory size of 64 GB, and a video memory size of 20 GB.

The software platform of the simulation experiment of the invention is as follows: the operating system is Ubuntu 18.04 LTS, and the TensorFlow version is 1.2.1.

2. Simulation content:

In the simulation experiment, each image containing ground objects taken from the Pavia University (PaviaU) hyperspectral dataset is classified by the method of the invention and by two prior-art methods (the RR method, which jointly trains on new-class and old-class samples, and the RIC method, which randomly initializes the weight parameters) to obtain the classification results.

The prior-art RR method for collaborative training of new-class and old-class samples refers to the image classification method proposed by W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li in their published paper "Deep convolutional neural networks for hyperspectral image classification" (Journal of Sensors, vol. 2015, pp. 1-12), abbreviated here as the RR method.

The prior-art RIC method with randomly initialized weight parameters refers to the image classification method set forth by Hang Qi et al. in the published paper "Low-shot learning with imprinted weights" (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 5822-5830), abbreviated here as the RIC method.

The input images used in the simulation experiment are the images containing ground objects extracted from the Pavia University (PaviaU) hyperspectral dataset; each image has a size of 11 × 11 × 103 and is stored in .mat format. The PaviaU hyperspectral dataset was acquired in 2003 over the University of Pavia, Italy, by the German Reflective Optics System Imaging Spectrometer (ROSIS-03). The imaging spectrometer continuously images 115 bands in the wavelength range of 0.43-0.86 μm at a spatial resolution of 1.3 m/pixel, and 12 bands are rejected because of noise, so the dataset uses the images formed by the remaining 103 spectral bands. The data has a size of 610 × 340 and comprises 207400 pixels, of which only 42776 contain ground objects; these belong to 9 classes, namely asphalt, meadow, gravel, trees, metal sheet, bare soil, bitumen, brick and shadow, and the remaining pixels are background pixels. An image of 11 × 11 pixels centered on a pixel containing a ground object is taken as one image containing a ground object.
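The extraction of 11 × 11 images around labeled pixels can be sketched as below. The text does not say how pixels near the image border are handled or how background is encoded, so the reflect padding and the convention that label 0 means background are assumptions of this sketch, as are the function name and the small synthetic cube.

```python
import numpy as np

def extract_patches(cube, labels, size=11):
    """Return (patches, patch_labels) for every pixel with a nonzero label."""
    half = size // 2
    # Assumption: reflect-pad so that border pixels also get a full patch.
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    rows, cols = np.nonzero(labels)
    patches = np.stack([padded[r:r + size, c:c + size, :]
                        for r, c in zip(rows, cols)])
    return patches, labels[rows, cols]

rng = np.random.default_rng(0)
cube = rng.random((30, 20, 103))            # synthetic stand-in for the 610 x 340 scene
labels = np.zeros((30, 20), dtype=int)      # 0 is treated as background here
labels[0, 0], labels[10, 5], labels[29, 19] = 1, 2, 9
patches, y = extract_patches(cube, labels)
```

Each patch keeps all 103 bands, giving the 11 × 11 × 103 shape, and the scene size multiplies out as 610 × 340 = 207400 pixels.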

In a simulation experiment, a training set and a test set are selected according to the following proportion:

(1) All images of the eight classes asphalt, meadow, gravel, trees, metal sheet, bare soil, bitumen and brick are taken from the Pavia University (PaviaU) hyperspectral dataset; 0.5% of the images of each class are used to generate the initial training set, and the remaining 99.5% of the images of each class are used for testing. The number of images of each class in the initial training set and the number of images used for testing are shown in Table 1.

(2) All images of the shadow class are taken from the Pavia University (PaviaU) hyperspectral dataset; one of these images is used to generate the incremental training set, and all other images of the shadow class are used for testing. The number of images in the incremental training set and the number of images used for testing are shown in Table 2.

TABLE 1 Number of images of each class in the initial training set and in the test set

Class	Initial training set	Number of test images
Asphalt	33	6598
Meadow	93	18556
Gravel	10	2089
Trees	15	3049
Metal sheet	6	1339
Bare soil	25	5004
Bitumen	6	1324
Brick	18	3664

TABLE 2 Number of images of each class in the incremental training set and in the test set

Class	Incremental training set	Number of test images
Shadow	1	946
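A quick arithmetic check of Tables 1 and 2: every training count in Table 1 equals 0.5% of the class total, rounded down, and the nine class totals add up to the 42776 labeled pixels of the dataset. The rounding rule is inferred from the numbers and is not stated in the text.

```python
# (training images, test images) per row of Table 1, in table order.
table1 = [(33, 6598), (93, 18556), (10, 2089), (15, 3049),
          (6, 1339), (25, 5004), (6, 1324), (18, 3664)]
table2 = [(1, 946)]

class_totals = [train + test for train, test in table1]
# Integer arithmetic for floor(0.005 * total), avoiding float rounding.
floor_half_percent = [total * 5 // 1000 for total in class_totals]
labeled_pixels = sum(class_totals) + sum(a + b for a, b in table2)
```

Here `floor_half_percent` reproduces the training column of Table 1 exactly, and `labeled_pixels` evaluates to 42776.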

The effect of the present invention will be further described with reference to the simulation diagram of fig. 2.

Fig. 2(a) is the ground-truth map of the Pavia University (PaviaU) hyperspectral dataset, which has a size of 610 × 340 pixels. Fig. 2(b) shows the result of classifying each image containing ground objects taken from the PaviaU hyperspectral dataset with the prior-art RR method for collaborative training of new-class and old-class samples. Fig. 2(c) shows the result of classifying the same images with the prior-art RIC method with randomly initialized weight parameters. Fig. 2(d) shows the result of classifying the same images with the method of the invention.

To compare the classification performance of the different methods on each class, the classification results of the three methods are evaluated using classification accuracy. The classification accuracy of each of the nine classes in the simulation experiment is calculated, and the per-class accuracy of each method is listed in Table 3:

TABLE 3 Classification accuracy (%) of each method for each class

Class	RR	RIC	Method of the invention
Asphalt	0.00	99.08	95.65
Meadow	68.05	99.87	98.32
Gravel	5.16	94.64	93.63
Trees	0.07	98.66	96.68
Metal sheet	0.00	99.53	91.64
Bare soil	0.04	89.96	86.21
Bitumen	0.00	86.36	82.44
Brick	0.63	98.91	91.48
Shadow	91.67	0.00	100.00
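The accuracy formula itself is not reproduced in this text. A common definition, assumed here, is the percentage of test images of a class that receive the correct label; the sketch below computes it for toy labels. The function name and the data are illustrative and are not the experiment's outputs.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Percentage of samples of each true class that are predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {int(c): float(100.0 * np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]
acc = per_class_accuracy(y_true, y_pred)
```

In the toy data three of the four class-0 samples and one of the two class-1 samples are correct, giving 75.0 and 50.0.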

As can be seen from Fig. 2(b) in combination with Table 3, the prior-art RR method for collaborative training of new-class and old-class samples has very low classification accuracy for the classes contained in the initial training set (0.00% to 5.16% for seven of the eight classes). This is because the method does not inherit the classification capability for the labeled classes of the initial training set during the incremental process and learns only the labeled class of the incremental training set.

As can be seen from fig. 2(c) in combination with table 3, the weight parameter random initialization RIC method in the prior art has high classification accuracy for classes in the initial training set, but has low classification accuracy for classes in the incremental training set due to failure to effectively learn the classes in the incremental training set.

As can be seen from Fig. 2(d) in combination with Table 3, the method of the invention retains high classification accuracy for the labeled classes of the initial training set (82.44% to 98.32%) and also reaches 100.00% on the incremental class. It is the only one of the three methods with high accuracy on both, and it completes incremental learning on an incremental training set generated from a single image.

As can be seen from fig. 2 in conjunction with table 3, the above simulation experiments show that: the method can utilize the feature extraction module of the convolution neural network which is trained well initially to extract the features of the images in the incremental training set, and on the premise of keeping the capability of identifying the images in the initial training set, the incremental learning is realized quickly and efficiently, so that the method can identify the categories contained in the initial training set and the incremental training set. The image classification method is suitable for various application scenes, has obvious advantages in the application scenes with unbalanced data or strong timeliness, and is an efficient and flexible image classification method.

Claims

1. An image classification method based on linear programming incremental learning is characterized by comprising the following steps of constructing a convolutional neural network formed by connecting a feature extraction module and a classification module, initially training the convolutional neural network, classifying by using the convolutional neural network, if pictures to be classified with the categories not belonging to the categories in an initial training set are encountered, establishing a linear programming model by using the average features of the initial training set and the incremental training set, solving a weight column vector, and updating a classifier, wherein the method comprises the following steps:

(1) constructing a convolutional neural network:

(2) generating an initial training set:

(3) initial training of the convolutional neural network:

(4) obtaining a class-average feature vector of an initial training set:

(4a) sequentially inputting each image in the initial training set into an initially trained convolutional neural network, and taking each image 512-dimensional output vector output by a first full-connection layer in a feature extraction module of the network as a feature vector of the image;

(6) generating an incremental training set:

(7) obtaining a class-average feature vector of an incremental training set;

(7a) sequentially inputting each image in the incremental training set into an initially trained convolutional neural network, and taking each image 512-dimensional output vector output by a first full-connection layer in a feature extraction module of the network as a feature vector of the image;

(8) solving the weight column vector by using a linear programming model:

$$\max\ Z = f \cdot W$$

$$\text{s.t.}\quad f_i \cdot W < f_i \cdot W_j$$

$$f \cdot W > f \cdot W_j$$

wherein max represents the maximization operation, $Z$ represents the objective function, $f$ represents the class-average feature vector of the incremental training set, $\cdot$ represents the dot product operation, $W$ represents the weight column vector to be solved, s.t. represents the constraint conditions, $f_i$ represents the class-average feature vector of the $i$-th class in the initial training set, $i = 1, \ldots, n$, $n$ represents the total number of labeled classes in the initial training set, and $W_j$ represents the $j$-th column of the weight matrix of the dot-product layer of the classification module in the initially trained convolutional neural network, wherein the value of $j$ is correspondingly equal to $i$;

(8b) solving the linear model to obtain a weight column vector;

(9) updating the convolutional neural network:

(10) classification with convolutional neural networks:

2. The image classification method based on linear programming incremental learning of claim 1, characterized in that: the preprocessing in the step (2) and the step (6) is to perform preprocessing of rotating, cutting, stretching, denoising, changing brightness and changing contrast on each image in sequence if the input image is an optical image; and if the input image is a hyperspectral image, sequentially performing Principal Component Analysis (PCA) dimensionality reduction and normalization preprocessing on each image.

3. The image classification method based on linear programming incremental learning of claim 1, characterized in that: the linear model in step (8b) is solved by using any linear programming software tool.