CN111401434A - Image classification method based on unsupervised feature learning


Info

Publication number
CN111401434A
CN111401434A
Authority
CN
China
Prior art keywords
image
matrix
original
feature
feature map
Prior art date
Legal status
Granted
Application number
CN202010173425.8A
Other languages
Chinese (zh)
Other versions
CN111401434B
Inventor
聂飞平
陆继韬
王榕
李学龙
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202010173425.8A
Publication of CN111401434A
Application granted
Publication of CN111401434B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 - Feature extraction based on approximation criteria, e.g. principal component analysis
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image classification method based on unsupervised feature learning. First, a patch matrix is constructed and a filter bank is learned with the PCA algorithm, the feature maps are reduced in dimension, and, taking the output as the next input, this process is repeated to build a deep network, yielding a feature map set after two layers of dimension reduction. The reduced feature maps are then binarized, hash values are computed, and blockwise histograms are counted to obtain the feature embedding of the original image. Next, a classifier is trained with the labels and feature embeddings of the original images. Finally, the feature embeddings of the unlabeled images to be classified are computed and processed with the trained classifier to obtain the final classification result.

Description

Image classification method based on unsupervised feature learning
Technical Field
The invention belongs to the technical field of machine learning and computer vision, and particularly relates to an image classification method based on unsupervised feature learning.
Background
With the rapid increase in computing power and the explosive growth of data, deep learning methods, represented by the convolutional neural network (ConvNet or CNN), have enjoyed tremendous success in a variety of computer vision tasks. It is well known that the feature extraction applied to data has an important influence on the performance of machine learning. One of the main reasons for the success of CNNs is that they extract multi-level semantic information from an image through cascaded convolution filters, which outperforms traditional hand-designed features. However, CNNs are usually trained with supervision, and the free data sets published on the web are not necessarily suitable for the task at hand, so users usually have to pay a high cost to collect data suited to the task to be completed and hire annotators to label it for network training. To overcome this dependence on data labels and reduce the cost of acquiring data, many unsupervised learning methods have been proposed that automatically learn features from large numbers of unlabeled image and video samples. Replacing the currently widespread supervised methods with unsupervised ones has great economic and social value, and this is a hot area of current machine learning research.
The image classification problem is one of the most basic, most important, and most challenging tasks in computer vision. It is the foundation of other 'high-level' tasks such as object detection, semantic segmentation, and pedestrian recognition, so improving image classification performance can indirectly improve the performance of dozens of other computer vision tasks. The task is quite challenging, mainly because images of the same class can exhibit large intra-class differences.
For example, the document "Jain A K, Farrokhnia F. Unsupervised texture segmentation using Gabor filters [J]. Pattern Recognition, 1991, 24(12): 1167-1186." designed features for texture classification; the document "Ahonen T, Hadid A, Pietikäinen M. Face description with local binary patterns: Application to face recognition [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2006, 28(12): 2037-2041." designed features for face recognition; and the document "Dalal N, Triggs B. Histograms of oriented gradients for human detection [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005." designed features for human detection. Although such manually designed features work well for the specific tasks they were designed for, their design requires considerable domain expertise, and they generalize poorly to new tasks.
Learning features from the data of interest is considered the remedy for manually designed features, the typical example being the deep neural network. The core idea of deep neural networks is to learn feature representations at multiple levels, in the expectation that higher-level features represent more abstract semantics of the data and that such abstract representations are more robust to intra-class variation. The document "Bruna J, Mallat S. Invariant scattering convolution networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1872-1886." proposed the wavelet scattering network. Despite using a fixed filter bank, it exhibits superior performance in handwritten digit recognition and texture recognition tasks; however, this structure generalizes insufficiently in tasks with obvious illumination changes, such as face recognition. The document "Chan T H, Jia K, Gao S, et al. PCANet: A simple deep learning baseline for image classification? [J]. IEEE Transactions on Image Processing, 2015, 24(12): 5017-5032." proposed PCANet, which learns its filter banks with principal component analysis. The network does not use labels in the feature learning stage and achieves excellent results in tasks such as handwritten digit recognition and face recognition. However, the dimensionality of the features extracted by the algorithm grows exponentially with the network depth, which makes it difficult to extract richer semantic information by increasing depth.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an image classification method based on unsupervised feature learning. First, a patch matrix is constructed and a filter bank is learned with the PCA algorithm, the feature maps are reduced in dimension, and, taking the output as the next input, this process is repeated to build a deep network, yielding a feature map set after two layers of dimension reduction. The reduced feature maps are then binarized, hash values are computed, and blockwise histograms are counted to obtain the feature embedding of the original image. Next, a classifier is trained with the labels and feature embeddings of the original images. Finally, the feature embeddings of the unlabeled images to be classified are computed and processed with the trained classifier to obtain the final classification result. The method solves the problem of the exponential growth of the feature dimension in the existing PCANet algorithm, and can effectively extract image features for classification tasks with a simple linear PCA operator.
An image classification method based on unsupervised feature learning is characterized by comprising the following steps:
Step 1: for the original i-th input image, i = 1, 2, …, N, where N is the number of input images, extract the dimension-reduced feature map set $\{P_i^j\}_{j=1}^{D_1}$ according to the following process, where $P_i^j$ denotes the j-th dimension-reduced feature map of the original i-th input image, j = 1, 2, …, $D_1$, and $D_1$ is the set target dimension:
Step 1.1: extract the image patch centered at each pixel of the image with a sliding window of step 1, stretch each patch into a one-dimensional vector, remove the mean of each vector, and take each de-meaned vector as a column of a matrix to obtain the matrix X; here the input image has size p × p, the patches have size k × k, and k lies in the range 1 < k < p;
Step 1.2: use the principal component analysis algorithm to solve for an orthogonal matrix V formed by a set of orthonormal bases, satisfying:

$$\min_{V}\ \left\|X - VV^{\top}X\right\|_F^2, \quad \mathrm{s.t.}\ V^{\top}V = I_{L_1}$$

where $L_1$ is the set number of filters, in the range $1 \leq L_1 < k^2$;
transform each column of the matrix V from a vector into a matrix of size k × k and take each resulting matrix as a filter; convolve each filter with the original input image, using zero padding so that the feature map obtained after convolution has the same size as the original image, to obtain $L_1$ feature maps of size p × p;
Step 1.3: stretch each feature map output in step 1.2 into a one-dimensional vector and take it as a row of a matrix to obtain the original feature map matrix $Y \in \mathbb{R}^{L_1 \times p^2}$; then use the principal component analysis algorithm to solve for a projection matrix $U_1$ formed by a set of orthonormal bases, satisfying:

$$\min_{U_1}\ \left\|Y - U_1U_1^{\top}Y\right\|_F^2, \quad \mathrm{s.t.}\ U_1^{\top}U_1 = I_{D_1}$$

where $D_1$ is the set target dimension, in the range $1 \leq D_1 \leq L_1$;

compute the reduced matrix $P = U_1^{\top}Y$ and transform each row vector of P into a matrix of size p × p; each such matrix is a dimension-reduced feature map, and the $D_1$ dimension-reduced feature maps so obtained form the dimension-reduced feature map set of the original input image;
Step 2: for each dimension-reduced feature map set $\{P_i^j\}_{j=1}^{D_1}$ obtained in step 1, take each image $P_i^j$ in it as the input image, j = 1, 2, …, $D_1$, let the number of filters in step 1.2 be $L_2$, in the range $1 \leq L_2 < k^2$, and let the target dimension in step 1.3 be $D_2$, in the range $1 \leq D_2 \leq L_2$; computing according to step 1 then yields the twice dimension-reduced feature map sets $\{Q_i^{j,d}\}_{d=1}^{D_2}$;
Step 3: binarize all the images in each feature map set $\{Q_i^{j,d}\}_{d=1}^{D_2}$ obtained in step 2 so that every pixel value becomes 0 or 1; then extract the pixel values at the same position of the different images to form a binary string of length $D_2$, doing this for the pixels at every position; convert each binary string into a decimal number and take each decimal number as the new pixel value at the position from which the corresponding binary string was extracted, obtaining a new image denoted $T_i^j$; thus, for the original i-th input image, the new image set $\mathcal{T}_i = \{T_i^j\}_{j=1}^{D_1}$ is obtained;
Step 4: for the new image set $\mathcal{T}_i$ of the original i-th input image obtained in step 3, i = 1, 2, …, N, divide each image into blocks with a sliding window of size b × b and step s, where $1 \leq b \leq p$ and $1 \leq s \leq b$; then compute for each block a histogram with $2^{D_2}$ bins and concatenate the histograms of all blocks into one vector; concatenate the vectors obtained in this way from all the images in the set, and take the concatenated vector as the feature embedding of the original i-th input image;
processing the new image set $\mathcal{T}_i$ of every original input image, i = 1, 2, …, N, according to the above process yields the corresponding feature embeddings, giving the N feature embeddings of all N original input images;
Step 5: take the N feature embeddings obtained in step 4 as the input data of a classifier and the labels of the original N input images as its input labels, and train the classifier to obtain a trained classifier; the classifier comprises a nearest neighbor classifier and a support vector machine;
Step 6: take M unlabeled images to be classified as input images, compute their M feature embeddings according to steps 1-4, and input these into the trained classifier obtained in step 5 to obtain the classification results.
The invention has the following beneficial effects: because gradient back-propagation is not used and the multi-stage filter banks are learned unsupervised with PCA, the amount of computation is greatly reduced compared with a convolutional neural network, making the method suitable for fast computation without special hardware; because the dimension-reduced feature maps are used as the input images of the next layer, the problem in PCANet that the number of parameters grows exponentially with the number of layers is solved, making it possible to build deeper networks that extract higher-level semantic information; because the feature extraction process does not depend on labeled data, a large amount of unlabeled data from the web can be added to the training set without additional manual labeling cost, further improving feature extraction performance; and because every step can be parallelized to shorten the running time, the method can be used for training on large-scale data sets on distributed clusters.
Drawings
FIG. 1 is a basic flow chart of an image classification method based on unsupervised feature learning according to the present invention;
FIG. 2 is a schematic diagram of the process of computing a filter bank using a principal component analysis algorithm according to the present invention.
Detailed Description
The present invention will be further described with reference to the drawings and embodiments; the embodiments of the invention include, but are not limited to, the following example.
As shown in Fig. 1, the present invention provides an image classification method based on unsupervised feature learning, implemented as follows:
1. and extracting a feature map set after the dimension reduction of the first layer.
(1) A matrix of tiles is constructed.
Assume the training set $\{I_i\}_{i=1}^{N}$ consists of N images of size p × p. For each image, the patch of size k × k centered at each pixel is extracted with a sliding window of step 1, and the mean of each patch is removed:

$$\bar{x}_{i,j} = x_{i,j} - \frac{1}{k^2}\left(\mathbf{1}^{\top}x_{i,j}\right)\mathbf{1}$$

where $x_{i,j}$ is the column vector of the j-th patch of the i-th image, $\bar{x}_{i,j}$ is the column vector of the j-th patch of the i-th image after mean removal, $\mathbf{1}$ is the vector with all entries 1, i = 1, 2, …, N, j = 1, 2, …, m, and $m = (p-k+1)^2$ is the number of patches extracted from each image.
For the original i-th image, i = 1, 2, …, N, taking each de-meaned patch vector as a column of a matrix gives

$$X_i = \left[\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,m}\right] \in \mathbb{R}^{k^2 \times m}.$$
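As an illustration, this patch-matrix construction can be sketched in a few lines of NumPy; this is a minimal sketch, and the function name patch_matrix and its interface are assumptions for illustration rather than part of the patent:

    import numpy as np

    def patch_matrix(img: np.ndarray, k: int) -> np.ndarray:
        """Build X_i: every k x k patch of a p x p image, vectorized,
        de-meaned, and stacked as a column (shape k^2 x (p-k+1)^2)."""
        p = img.shape[0]
        cols = []
        for r in range(p - k + 1):
            for c in range(p - k + 1):
                v = img[r:r + k, c:c + k].reshape(-1).astype(np.float64)
                cols.append(v - v.mean())  # mean removal per patch
        return np.stack(cols, axis=1)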
(2) Solve for the filter bank and compute the feature maps.
A set of orthonormal bases minimizing the reconstruction error is solved for with the principal component analysis algorithm, namely:

$$\min_{V}\ \left\|X_i - VV^{\top}X_i\right\|_F^2, \quad \mathrm{s.t.}\ V^{\top}V = I_{L_1}$$

where $L_1$ is the set number of filters, in the range $1 \leq L_1 < k^2$. The solution of this optimization problem consists of the eigenvectors corresponding to the $L_1$ largest eigenvalues of the covariance matrix $X_iX_i^{\top}$, which capture the main variation among the training patches. Transforming these $L_1$ vectors into matrices of size k × k yields the filter bank of the first layer.
Each of the first layer's $L_1$ filters is convolved with the original input image to obtain the original output feature maps of the first layer:

$$O_i^l = I_i * W_1^l, \quad l = 1, 2, \ldots, L_1$$

where * denotes the two-dimensional convolution operation, $I_i$ is the original i-th input image, $W_1^l$ is the l-th filter of the first layer, and $O_i^l$ is the feature map generated by convolving the i-th original input image with the l-th filter. Before convolution, the original image $I_i$ is zero-padded around its border so that the feature map output after convolution keeps the same size as the original image.
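Continuing the NumPy sketch above, the filter-bank solution and the zero-padded convolution might look as follows; scipy.signal.correlate2d with mode="same" stands in for the padded convolution, and all names are illustrative assumptions:

    import numpy as np
    from scipy.signal import correlate2d

    def pca_filters(X: np.ndarray, L1: int, k: int) -> np.ndarray:
        """Eigenvectors of X X^T for the L1 largest eigenvalues,
        reshaped into L1 filters of size k x k."""
        _, eigvecs = np.linalg.eigh(X @ X.T)  # eigenvalues ascending
        V = eigvecs[:, ::-1][:, :L1]          # L1 leading directions
        return V.T.reshape(L1, k, k)

    def feature_maps(img: np.ndarray, filters: np.ndarray) -> np.ndarray:
        """O_i^l for l = 1..L1; 'same' mode keeps each map at p x p."""
        return np.stack([correlate2d(img, f, mode="same") for f in filters])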
(3) Reduce the dimension of the feature maps.
If steps (1)-(2) above were repeated directly to build a deep network, that is, if all the feature maps obtained in step (2) were fed back into step (1) and this were iterated several times, the number of output feature maps would grow exponentially. To avoid the number of feature maps growing exponentially with the network depth, the invention adds a dimension-reduction module before the input of the next layer. All the feature maps output in step (2) are stretched into one-dimensional vectors to obtain the original feature map matrix $Y_i \in \mathbb{R}^{L_1 \times p^2}$; then a projection matrix $U_1$ formed by a set of orthonormal bases is solved for with the principal component analysis algorithm, satisfying:

$$\min_{U_1}\ \left\|Y_i - U_1U_1^{\top}Y_i\right\|_F^2, \quad \mathrm{s.t.}\ U_1^{\top}U_1 = I_{D_1}$$

where $D_1$ is the set target dimension, in the range $1 \leq D_1 \leq L_1$. The solution $U_1$ of this optimization problem is the orthogonal matrix formed by the eigenvectors corresponding to the $D_1$ largest eigenvalues of the covariance matrix $Y_iY_i^{\top}$.

The reduced matrix $P_i = U_1^{\top}Y_i$ of size $D_1 \times p^2$ is then computed. Restoring each row vector of $P_i$ to a matrix of size p × p gives a dimension-reduced feature map, so the matrix $P_i$ correspondingly yields $D_1$ dimension-reduced feature maps of size p × p, which form the dimension-reduced feature map set.

Thus, for the original i-th input image $I_i$, the dimension-reduced feature map set $\{P_i^j\}_{j=1}^{D_1}$ is obtained.
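The dimension-reduction module admits an equally short sketch; reduce_maps is an illustrative name, and the input O is the stack of L1 feature maps produced by the previous sketch:

    import numpy as np

    def reduce_maps(O: np.ndarray, D1: int) -> np.ndarray:
        """Project Y (L1 x p^2) onto its D1 leading principal directions
        and reshape the rows of P = U1^T Y back into D1 maps of p x p."""
        L1, p, _ = O.shape
        Y = O.reshape(L1, p * p)
        _, eigvecs = np.linalg.eigh(Y @ Y.T)  # L1 x L1 covariance
        U1 = eigvecs[:, ::-1][:, :D1]
        return (U1.T @ Y).reshape(D1, p, p)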
2. Iteratively construct the deep network.
Each dimension-reduced feature map set $\{P_i^j\}_{j=1}^{D_1}$ obtained in step 1 is taken as the input of the next layer; that is, each image $P_i^j$ in each set is taken as the input image, j = 1, 2, …, $D_1$, and the computation proceeds according to the process above starting from step 1, with the number of filters in step (2) set to $L_2$, in the range $1 \leq L_2 < k^2$, and the target dimension in step (3) set to $D_2$, in the range $1 \leq D_2 \leq L_2$. Thus, for each original image $I_i$, i = 1, 2, …, N, $D_1$ image sets $\{\mathcal{Q}_i^j\}_{j=1}^{D_1}$ are obtained, where each image set $\mathcal{Q}_i^j$ in turn contains $D_2$ images, i.e. $\mathcal{Q}_i^j = \{Q_i^{j,d}\}_{d=1}^{D_2}$.
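Chaining the sketches above gives the two-layer construction; this is again a sketch under the per-image PCA described here, with illustrative names throughout:

    def two_layer_maps(img, k, L1, D1, L2, D2):
        """Layer 1 yields D1 reduced maps; each one is re-fed as an
        input image to layer 2, yielding D1 sets of D2 maps each."""
        maps1 = reduce_maps(
            feature_maps(img, pca_filters(patch_matrix(img, k), L1, k)), D1)
        return [
            reduce_maps(
                feature_maps(pj, pca_filters(patch_matrix(pj, k), L2, k)), D2)
            for pj in maps1
        ]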
3. Hash computation.
Each feature map set $\mathcal{Q}_i^j$ obtained in step 2 is encoded as a new image, processed as follows:
(1) and (6) binarization.
And (4) carrying out binarization processing on all images in the set, wherein if the negative number is 0 and the non-negative number is 1, the pixel values in the images are all changed into 0 or 1.
(2) Binary hashing.
The $D_2$ binary values at the same position of all $D_2$ images in the set are extracted to form a binary string of length $D_2$, and the pixels at all positions are extracted in this way. Each binary string is treated as the binary representation of a decimal number and converted accordingly, so that the $D_2$ real-valued feature maps are transformed into a single integer image:

$$T_i^j = \sum_{d=1}^{D_2} 2^{d-1}\, H\!\left(Q_i^{j,d}\right)$$

where $Q_i^{j,d}$ denotes the d-th image of the set $\mathcal{Q}_i^j$ and $H(\cdot)$ denotes the binarization of an image described in step (1) above. Every pixel of the new image obtained in this way is an integer in the closed interval $[0, 2^{D_2}-1]$.
Thus, for the original i-th input image, the new image set $\mathcal{T}_i = \{T_i^j\}_{j=1}^{D_1}$ is obtained.
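The binarization and hashing of one set reduce to a weighted sum of bit planes, as in this minimal sketch (hash_set is an illustrative name; Q is the (D2, p, p) stack of one set's maps):

    import numpy as np

    def hash_set(Q: np.ndarray) -> np.ndarray:
        """Integer image T with pixel values in [0, 2^D2 - 1]."""
        D2 = Q.shape[0]
        bits = (Q >= 0).astype(np.int64)  # H(.): negative -> 0, else 1
        weights = 2 ** np.arange(D2)      # 2^(d-1) for d = 1..D2
        return np.tensordot(weights, bits, axes=1)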
4. Blockwise histogram statistics.
For the new image set $\mathcal{T}_i$ of the original i-th input image obtained in step 3, each image is divided into blocks with a sliding window of size b × b, where $1 \leq b \leq p$, and sliding step s, in the range $1 \leq s \leq b$; let B be the number of blocks obtained. For each block a histogram with $2^{D_2}$ bins is computed, and the histograms of all B blocks are concatenated into one vector. The vectors obtained by this process from all the images in the set are concatenated, so that for each set $\mathcal{T}_i$ a vector

$$f_i \in \mathbb{R}^{\left(2^{D_2}\right)BD_1}$$

is obtained and taken as the feature embedding of the original i-th input image.
Processing the new image set $\mathcal{T}_i$ of every original input image, i = 1, 2, …, N, according to the above procedure yields the corresponding feature embeddings, giving the N feature embeddings of all N original input images.
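The blockwise histogram and the final concatenation can be sketched as follows, reusing hash_set from above; np.bincount with minlength=2**D2 produces the 2^{D2}-bin histogram of one block, and the names are illustrative:

    import numpy as np

    def block_histograms(T: np.ndarray, D2: int, b: int, s: int) -> np.ndarray:
        """Histograms of all b x b blocks (step s) of one hashed image,
        concatenated into a single vector."""
        p = T.shape[0]
        hists = [
            np.bincount(T[r:r + b, c:c + b].ravel(), minlength=2 ** D2)
            for r in range(0, p - b + 1, s)
            for c in range(0, p - b + 1, s)
        ]
        return np.concatenate(hists)

    def embed_image(sets, D2, b, s):
        """Feature embedding f_i: concatenation over the D1 hashed sets."""
        return np.concatenate(
            [block_histograms(hash_set(Q), D2, b, s) for Q in sets])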
5. Train the classifier using the feature embeddings.
The N feature embeddings obtained in step 4 are taken as the input data of a classifier and the labels of the original N input images as its input labels, and the classifier is trained to obtain a trained classifier. The classifier comprises a nearest neighbor classifier and a support vector machine.
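As one plausible realization of this stage (the experiments below report linear-SVM results), the embeddings can be fed to scikit-learn's LinearSVC; the toy arrays here are stand-ins for the real embeddings and labels:

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    train_f = rng.random((100, 256))         # N feature embeddings (step 4)
    train_y = rng.integers(0, 10, size=100)  # labels of the N input images

    clf = LinearSVC().fit(train_f, train_y)  # step 5: train the classifier
    test_f = rng.random((5, 256))            # embeddings of unlabeled images
    print(clf.predict(test_f))               # step 6: classification results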
6. Classify the images.
The M unlabeled images to be classified are taken as input images; keeping the numbers of filters $L_1$ and $L_2$ set when embedding the training images, together with the resulting projection matrices $U_1$ and $U_2$, the M feature embeddings are computed according to steps 1-4 and input into the trained classifier obtained in step 5 to obtain the classification results.
In this embodiment, simulation experiments were carried out with Python and PyTorch on an Ubuntu 16.04 operating system with an Intel(R) Xeon(R) CPU E5-2680 central processing unit and 512 GB of memory, using the data sets MNIST and CIFAR-10 respectively; the data set information is shown in Table 1. The accuracy obtained by performing feature extraction with steps 1-4 of the invention and classifying with a linear SVM as the classifier is shown in Table 2, and the time taken for feature extraction is shown in Table 3. The method can therefore extract features from large-scale data in a short time and complete the classification task well with the extracted features.
TABLE 1
Data set Number of samples Number of categories
MNIST 60000 10
CIFAR-10 50000 10
TABLE 2
Data set Accuracy
MNIST 99.39%
CIFAR-10 77.09%
TABLE 3
Data set Time (seconds)
MNIST 587.88
CIFAR-10 2564.04

Claims (1)

1. An image classification method based on unsupervised feature learning is characterized by comprising the following steps:
Step 1: for the original i-th input image, i = 1, 2, …, N, where N is the number of input images, extract the dimension-reduced feature map set $\{P_i^j\}_{j=1}^{D_1}$ according to the following process, where $P_i^j$ denotes the j-th dimension-reduced feature map of the original i-th input image, j = 1, 2, …, $D_1$, and $D_1$ is the set target dimension:
Step 1.1: extract the image patch centered at each pixel of the image with a sliding window of step 1, stretch each patch into a one-dimensional vector, remove the mean of each vector, and take each de-meaned vector as a column of a matrix to obtain the matrix X; here the input image has size p × p, the patches have size k × k, and k lies in the range 1 < k < p;
Step 1.2: use the principal component analysis algorithm to solve for an orthogonal matrix V formed by a set of orthonormal bases, satisfying:

$$\min_{V}\ \left\|X - VV^{\top}X\right\|_F^2, \quad \mathrm{s.t.}\ V^{\top}V = I_{L_1}$$

where $L_1$ is the set number of filters, in the range $1 \leq L_1 < k^2$;
transform each column of the matrix V from a vector into a matrix of size k × k and take each resulting matrix as a filter; convolve each filter with the original input image, using zero padding so that the feature map obtained after convolution has the same size as the original image, to obtain $L_1$ feature maps of size p × p;
Step 1.3: stretch each feature map output in step 1.2 into a one-dimensional vector and take it as a row of a matrix to obtain the original feature map matrix $Y \in \mathbb{R}^{L_1 \times p^2}$; then use the principal component analysis algorithm to solve for a projection matrix $U_1$ formed by a set of orthonormal bases, satisfying:

$$\min_{U_1}\ \left\|Y - U_1U_1^{\top}Y\right\|_F^2, \quad \mathrm{s.t.}\ U_1^{\top}U_1 = I_{D_1}$$

where $D_1$ is the set target dimension, in the range $1 \leq D_1 \leq L_1$;

compute the reduced matrix $P = U_1^{\top}Y$ and transform each row vector of P into a matrix of size p × p; each such matrix is a dimension-reduced feature map, and the $D_1$ dimension-reduced feature maps so obtained form the dimension-reduced feature map set of the original input image;
Step 2: for each dimension-reduced feature map set $\{P_i^j\}_{j=1}^{D_1}$ obtained in step 1, take each image $P_i^j$ in it as the input image, j = 1, 2, …, $D_1$, let the number of filters in step 1.2 be $L_2$, in the range $1 \leq L_2 < k^2$, and let the target dimension in step 1.3 be $D_2$, in the range $1 \leq D_2 \leq L_2$; computing according to step 1 then yields the twice dimension-reduced feature map sets $\{Q_i^{j,d}\}_{d=1}^{D_2}$;
Step 3: binarize all the images in each feature map set $\{Q_i^{j,d}\}_{d=1}^{D_2}$ obtained in step 2 so that every pixel value becomes 0 or 1; then extract the pixel values at the same position of the different images to form a binary string of length $D_2$, doing this for the pixels at every position; convert each binary string into a decimal number and take each decimal number as the new pixel value at the position from which the corresponding binary string was extracted, obtaining a new image denoted $T_i^j$; thus, for the original i-th input image, the new image set $\mathcal{T}_i = \{T_i^j\}_{j=1}^{D_1}$ is obtained;
Step 4: for the new image set $\mathcal{T}_i$ of the original i-th input image obtained in step 3, i = 1, 2, …, N, divide each image into blocks with a sliding window of size b × b and step s, where $1 \leq b \leq p$ and $1 \leq s \leq b$; then compute for each block a histogram with $2^{D_2}$ bins and concatenate the histograms of all blocks into one vector; concatenate the vectors obtained in this way from all the images in the set, and take the concatenated vector as the feature embedding of the original i-th input image;
processing the new image set $\mathcal{T}_i$ of every original input image, i = 1, 2, …, N, according to the above process yields the corresponding feature embeddings, giving the N feature embeddings of all N original input images;
Step 5: take the N feature embeddings obtained in step 4 as the input data of a classifier and the labels of the original N input images as its input labels, and train the classifier to obtain a trained classifier; the classifier comprises a nearest neighbor classifier and a support vector machine;
Step 6: take M unlabeled images to be classified as input images, compute their M feature embeddings according to steps 1-4, and input these into the trained classifier obtained in step 5 to obtain the classification results.
CN202010173425.8A 2020-03-12 2020-03-12 Image classification method based on unsupervised feature learning Active CN111401434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010173425.8A CN111401434B (en) 2020-03-12 2020-03-12 Image classification method based on unsupervised feature learning

Publications (2)

Publication Number Publication Date
CN111401434A 2020-07-10
CN111401434B CN111401434B (en) 2024-03-08

Family

ID=71432455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010173425.8A Active CN111401434B (en) 2020-03-12 2020-03-12 Image classification method based on unsupervised feature learning

Country Status (1)

Country Link
CN (1) CN111401434B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090297046A1 (en) * 2008-05-29 2009-12-03 Microsoft Corporation Linear Laplacian Discrimination for Feature Extraction
US20140079297A1 (en) * 2012-09-17 2014-03-20 Saied Tadayon Application of Z-Webs and Z-factors to Analytics, Search Engine, Learning, Recognition, Natural Language, and Other Utilities
CN104881682A (en) * 2015-05-26 2015-09-02 东南大学 Image classification method based on locality preserving mapping and principal component analysis
CN105426919A (en) * 2015-11-23 2016-03-23 河海大学 Significant guidance and unsupervised feature learning based image classification method
CN106845525A (en) * 2016-12-28 2017-06-13 上海电机学院 A kind of depth confidence network image bracket protocol based on bottom fusion feature
CN108460391A (en) * 2018-03-09 2018-08-28 西安电子科技大学 Based on the unsupervised feature extracting method of high spectrum image for generating confrontation network
CN108921029A (en) * 2018-06-04 2018-11-30 浙江大学 A kind of SAR automatic target recognition method merging residual error convolutional neural networks and PCA dimensionality reduction
CN110473140A (en) * 2019-07-18 2019-11-19 清华大学 A kind of image dimension reduction method of the extreme learning machine based on figure insertion

Non-Patent Citations (2)

Title
FENG Shu: "Robust face recognition with unsupervised local feature learning", Journal of Computer Applications, 10 February 2017 (2017-02-10) *
WANG Gaihua et al.: "An image classification model using unsupervised learning algorithms and convolution", Journal of Huaqiao University (Natural Science), 20 January 2018 (2018-01-20) *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111898710A (en) * 2020-07-15 2020-11-06 中国人民解放军火箭军工程大学 Method and system for selecting characteristics of graph
CN111898710B (en) * 2020-07-15 2023-09-29 中国人民解放军火箭军工程大学 Feature selection method and system of graph
CN112200763A (en) * 2020-08-24 2021-01-08 江苏科技大学 Liver function grading method based on liver CT image
CN112766367A (en) * 2021-01-18 2021-05-07 北京工商大学 Image classification method and device based on PCANet

Also Published As

Publication number Publication date
CN111401434B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN106203356B (en) A kind of face identification method based on convolutional network feature extraction
CN111401434B (en) Image classification method based on unsupervised feature learning
CN113674334B (en) Texture recognition method based on depth self-attention network and local feature coding
CN105046272B (en) A kind of image classification method based on succinct non-supervisory formula convolutional network
Sheng et al. Learning-based road crack detection using gradient boost decision tree
CN108932518B (en) Shoe print image feature extraction and retrieval method based on visual bag-of-words model
CN110211127B (en) Image partition method based on bicoherence network
CN114492619B (en) Point cloud data set construction method and device based on statistics and concave-convex performance
CN111079514A (en) Face recognition method based on CLBP and convolutional neural network
CN113076927A (en) Finger vein identification method and system based on multi-source domain migration
Wang et al. A new Gabor based approach for wood recognition
CN115527072A (en) Chip surface defect detection method based on sparse space perception and meta-learning
Huynh et al. Plant identification using new architecture convolutional neural networks combine with replacing the red of color channel image by vein morphology leaf
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
Zhang et al. Multi-path siamese convolution network for offline handwritten signature verification
CN110991554A (en) Improved PCA (principal component analysis) -based deep network image classification method
White et al. Digital fingerprinting of microstructures
CN112070116B (en) Automatic artistic drawing classification system and method based on support vector machine
CN111488797B (en) Pedestrian re-identification method
CN105844299B (en) A kind of image classification method based on bag of words
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
CN109840914B (en) Texture segmentation method based on user interaction
CN117036904A (en) Attention-guided semi-supervised corn hyperspectral image data expansion method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant