CN109978080B - Image identification method based on discrimination matrix variable limited Boltzmann machine - Google Patents


Info

Publication number
CN109978080B
CN109978080B (application CN201910297655.2A)
Authority
CN
China
Prior art keywords: model, matrix, label, variable, probability
Legal status: Active
Application number
CN201910297655.2A
Other languages
Chinese (zh)
Other versions
CN109978080A (en)
Inventor
尹宝才
田鹏宇
李敬华
孔德慧
王立春
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Publication of CN109978080A publication Critical patent/CN109978080A/en
Application granted granted Critical
Publication of CN109978080B publication Critical patent/CN109978080B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate


Abstract

The invention discloses an image recognition method based on a discriminative matrix-variate restricted Boltzmann machine model, in which a discriminative matrix-variate restricted Boltzmann machine, denoted DisMVRBM, is used for two-dimensional image classification. The model operates on images directly, without vectorization, and thus retains the structural information of the original samples. Compared with the MVRBM, a label layer is added, meaning that label information is incorporated while features are extracted, so the extracted features are discriminative and classification performance improves; moreover, because of the added label layer, the model can be used directly as a standalone classifier, without attaching another classifier, eliminating the fine-tuning stage that another classifier would require.

Description

Image recognition method based on a discriminative matrix-variate restricted Boltzmann machine
Technical Field
The invention belongs to the technical field of pattern recognition and particularly relates to an image recognition method based on a discriminative matrix-variate restricted Boltzmann machine model.
Background
An artificial neural network (ANN) is a computational model built by mimicking the structure and function of biological neural networks. A typical ANN is composed of a large number of simple processing nodes (artificial neurons) organized hierarchically and connected to one another in a specified way; some nodes are visible to the outside while others are hidden, and each connection between two nodes carries a weight. Training an ANN model amounts to computing the weights from training data.
A restricted Boltzmann machine (RBM) is a stochastic neural network grounded in statistical mechanics; it can fit any discrete distribution and is often used to build the multilayer structure of a deep belief network (DBN) and to solve various machine-learning problems such as dimensionality reduction, face recognition, collaborative filtering, reconstruction, and denoising. The input and hidden layers of an RBM take vector form, so when the data is a higher-order tensor it usually must be vectorized, and vectorizing higher-order tensor data destroys its spatial structure and loses useful spatial information. To preserve the spatial structure of the data and its intrinsic correlations, Tu et al. proposed a tensor-variate restricted Boltzmann machine, but the hidden layer of that model is still a vector. Qinlei et al. developed the RBM into the matrix-variate restricted Boltzmann machine (MVRBM), a model in which both the input layer and the hidden layer are matrices. Although the matrix form keeps the spatial structure information of the data, the MVRBM is trained unsupervised, just like the RBM; label information is not used when extracting features, so the extracted features lack strong discriminability.
McCallum pointed out that using label information during feature learning is beneficial, and to extract discriminative features many researchers began to use label information in training. Yang et al. studied methods for jointly modeling multimodal data and category information and applied them to video classification. Schmah proposed a discriminative training method for RBMs that trains one RBM per data class, similar in spirit to a Bayesian classifier. Hugo et al. proposed a learning algorithm for the class-constrained restricted Boltzmann machine. Furthermore, inspired by discriminative supervised subspace models, Guo et al. added a supervised-subspace constraint to the RBM hidden layer. All of these models are vector-variate, i.e. the input is vector data; for high-order signals such as images and videos, the high-dimensional data must be stretched into vectors, and this way of processing data inevitably loses its spatial structure information.
The invention improves the MVRBM to address its inability to extract discriminative features: the label information of the data is fully used during training so that the extracted features are discriminative, and the proposed model can be used directly for classification, without an additional classifier to perform the classification task.
Disclosure of Invention
The invention provides an image recognition method based on a discriminative matrix-variate restricted Boltzmann machine model, proposing a discriminative matrix-variate restricted Boltzmann machine for two-dimensional image classification, denoted DisMVRBM. The model operates on images directly, without vectorization, and retains the structural information of the original samples. Compared with the MVRBM, a label layer is added, meaning that label information is incorporated while features are extracted, so the extracted features are discriminative and classification performance improves; moreover, because of the added label layer, the model can be used directly as a standalone classifier, without attaching another classifier, eliminating the fine-tuning stage that another classifier would require.
Drawings
Fig. 1 is a schematic diagram of a DisMVRBM model according to the present invention.
Detailed Description
The invention provides an image recognition method based on a discriminative matrix-variate restricted Boltzmann machine model, comprising the following steps:
Step 1, the discriminative matrix-variate restricted Boltzmann machine model
The energy function of the matrix-variate restricted Boltzmann machine (MVRBM) is defined as:
E(X, H; Θ) = -Σ_{i,j,k,l} x_ij w_ijkl h_kl - Σ_{i,j} b_ij x_ij - Σ_{k,l} c_kl h_kl   (1)

Here the following are defined: X = [x_ij] ∈ ℝ^{I×J} is the visible-layer matrix variable representing the input data, i.e. the input image, each frame of size I × J; H = [h_kl] ∈ ℝ^{K×L} represents the discriminative features of the input data extracted by the model, i.e. the features of the input image, of size K × L; W = [w_ijkl] ∈ ℝ^{I×J×K×L} is the connection weight of X and H, a fourth-order tensor variable representing the nonlinear mapping between the input image and the features extracted by the model; B = [b_ij] ∈ ℝ^{I×J} is the bias matrix variable of the visible layer, representing the offset of the input data; C = [c_kl] ∈ ℝ^{K×L} is the bias matrix variable of the hidden layer, representing the offset of the output features.
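As a concrete illustration of the energy function (1), it can be evaluated directly with `numpy.einsum`; the sizes and random values below are illustrative assumptions, not parameters taken from the patent.

```python
import numpy as np

# Illustrative sketch of the MVRBM energy (1); sizes and values are assumed.
rng = np.random.default_rng(0)
I, J, K, L = 4, 4, 3, 3                              # visible I x J, hidden K x L

X = rng.integers(0, 2, size=(I, J)).astype(float)    # visible matrix (input image)
H = rng.integers(0, 2, size=(K, L)).astype(float)    # hidden feature matrix
W = rng.normal(scale=0.01, size=(I, J, K, L))        # 4th-order weight tensor
B = rng.normal(scale=0.01, size=(I, J))              # visible bias matrix
C = rng.normal(scale=0.01, size=(K, L))              # hidden bias matrix

def energy(X, H, W, B, C):
    """E(X,H;Theta) = -sum_ijkl x_ij w_ijkl h_kl - sum_ij b_ij x_ij - sum_kl c_kl h_kl."""
    return (-np.einsum('ij,ijkl,kl->', X, W, H)
            - np.sum(B * X)
            - np.sum(C * H))

E = energy(X, H, W, B, C)
```

Note that the bias terms tr(X^T B) and tr(H^T C) reduce to element-wise products summed over all entries, which is what `np.sum(B * X)` computes.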
Further, a joint probability distribution of the visible and hidden layers, i.e. the joint probability of the input image and the features fitted by the model, may be defined from the energy function, as in equation (2):

p(X, H; Θ) = exp(-E(X, H; Θ)) / Z(Θ),  Z(Θ) = Σ_{X,H} exp(-E(X, H; Θ))   (2)
and a log-likelihood function is defined from the joint probability distribution:

L(Θ) = Σ_{n=1}^{N} log p(X^(n); Θ) = Σ_{n=1}^{N} log Σ_H p(X^(n), H; Θ)   (3)
Then, taking maximization of the log-likelihood as the goal, the probability of all samples under an optimal parameter set is maximized by learning the model parameters between the visible and hidden layers, yielding an effective representation of the input data.
However, the MVRBM is still an unsupervised generative model: it is expressive and extracts features of the input data well, but for classification tasks it is typically combined with a conventional neural network (NN), the NN being initialized from the MVRBM parameters and fine-tuned by back-propagation before classification.
To avoid the fine-tuning step and the risk of the NN becoming trapped in a local optimum, a discriminative matrix-variate restricted Boltzmann machine, denoted DisMVRBM, is adopted for two-dimensional image classification: a class constraint is added to the original MVRBM model so that the improved MVRBM itself has classification capability, as shown in Fig. 1.
DisMVRBM aims to model, by means of the hidden-layer features H, the joint distribution of the input images D_train = {X^(1), ..., X^(n), ..., X^(N)} and the corresponding class labels Y = [y_zt] ∈ ℝ^{Z×T} with Z = 1; the class-constrained energy function is therefore defined as:

E(X, Y, H; Θ) = -Σ_{i,j,k,l} x_ij w_ijkl h_kl - Σ_{i,j} b_ij x_ij - Σ_{z,t,k,l} y_zt p_ztkl h_kl - Σ_{z,t} d_zt y_zt - Σ_{k,l} c_kl h_kl   (4)
where X, H, W, B and C are as defined above, and the added label-related quantities are defined as follows: Y = [y_zt] ∈ ℝ^{Z×T} is the label matrix variable of the visible layer, identifying the class of the input data, i.e. the label corresponding to the input image; since Z = 1 is a constant, it can be regarded as a vector variable. P = [p_ztkl] ∈ ℝ^{Z×T×K×L} is the connection weight of Y and H, a fourth-order tensor variable representing the nonlinear mapping between the label of the input image and the output features. D = [d_zt] = [d_t] ∈ ℝ^{Z×T} is the bias matrix variable of the label layer, representing the offset of the label, and can likewise be regarded as a vector variable.
the tag layer is a one-bit effective encoding vector, that is, if the tag of the input data is the t-th class, the t-th component of the tag layer vector corresponding to the data is 1, and other components are set to zero.
Because the model weights are fourth-order tensors, the number of parameters is large and the time complexity of the training stage is high. To reduce the model parameters and the computational complexity, the invention assumes that the connection weights between the hidden and visible layers and between the hidden and label layers have a specific structure, which greatly reduces the number of free parameters; specifically, the weight tensors are decomposed as

w_ijkl = u_ki v_lj  and  p_ztkl = q_kz r_lt,

with the matrix forms U = [u_ki] ∈ ℝ^{K×I}, V = [v_lj] ∈ ℝ^{L×J}, Q = [q_kz] ∈ ℝ^{K×Z}, R = [r_lt] ∈ ℝ^{L×T}.
Thus, the energy function of the factorized DisMVRBM is obtained as:

E(X, Y, H; Θ) = -tr(U^T H V X^T) - tr(X^T B) - tr(Q^T H R Y^T) - tr(Y^T D) - tr(H^T C)   (5)

where Θ = {U, V, Q, R, B, C, D} represents all parameters of the model.
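The decomposition can be checked numerically: with w_ijkl = u_ki v_lj and p_ztkl = q_kz r_lt, the trace form (5) must agree with the element-wise sums of (4). The sketch below does exactly that, with illustrative (assumed) sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, L, Z, T = 4, 4, 3, 3, 1, 5

X = rng.integers(0, 2, size=(I, J)).astype(float)    # input image
H = rng.integers(0, 2, size=(K, L)).astype(float)    # hidden features
Y = np.zeros((Z, T)); Y[0, 2] = 1.0                  # one-hot label, class t = 2
U = rng.normal(size=(K, I)); V = rng.normal(size=(L, J))
Q = rng.normal(size=(K, Z)); R = rng.normal(size=(L, T))
B = rng.normal(size=(I, J)); C = rng.normal(size=(K, L)); D = rng.normal(size=(Z, T))

def energy(X, Y, H, U, V, Q, R, B, C, D):
    """E = -tr(U^T H V X^T) - tr(X^T B) - tr(Q^T H R Y^T) - tr(Y^T D) - tr(H^T C)."""
    return (-np.trace(U.T @ H @ V @ X.T) - np.trace(X.T @ B)
            - np.trace(Q.T @ H @ R @ Y.T) - np.trace(Y.T @ D)
            - np.trace(H.T @ C))

E5 = energy(X, Y, H, U, V, Q, R, B, C, D)
```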
Based on the above formula, the joint probability of X, Y and H, i.e. the joint probability of the input image, the features and the corresponding label, is:

p(X, Y, H; Θ) = exp(-E(X, Y, H; Θ)) / Z(Θ)   (6)
the normalization constant Z (Θ) in the above formula is defined as:
Figure BDA0002027160380000061
the probability of a unit of the hidden layer being activated, i.e. the probability of a feature being activated:
Figure BDA0002027160380000062
where σ(a) = 1/(1 + exp(-a)); expressed in matrix form:

p(H = 1 | X, Y; Θ) = σ(C + U X V^T + Q Y R^T)   (9)
equation (8) represents that the probability that each element of the hidden layer H is 1 is calculated one by one, and σ calculation is applied to each corresponding matrix element.
The activation probability of a unit of the visible layer, i.e. the activation probability of an input image pixel, is:

p(x_ij = 1 | H; Θ) = σ(b_ij + Σ_{k,l} u_ki h_kl v_lj)   (10)

In matrix form:

p(X = 1 | H; Θ) = σ(B + U^T H V)   (11)
like equation (8), equation (10) represents calculating the probability that any one element of the visible layer X is 1 one by one, and the σ calculation is applied to each corresponding matrix element.
Similarly, the conditional probability of each component of the label layer is:

p(y_zt = 1 | H; Θ) = exp(d_t + Σ_{k,l} h_kl q_kz r_lt) / Σ_{t*=1}^{T} exp(d_t* + Σ_{k,l} h_kl q_kz r_lt*)   (12)

where y_zt = 1 indicates that the training image datum belongs to the t-th class. In matrix form:

p(y_t = 1 | H; Θ) = exp(d_t + tr(Q^T H R e_t^T)) / Σ_{t*=1}^{T} exp(d_t* + tr(Q^T H R e_t*^T))   (13)

where e_t denotes the one-hot row vector of class t; the subscript t in the numerator indicates the specific class of the label, while t* in the denominator ranges over all possible label classes.
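Since Z = 1, the score of class t in equation (13) reduces to entry t of the row vector Q^T H R + D, so the conditional label distribution is a softmax over those scores. A minimal sketch under that assumption, with illustrative sizes:

```python
import numpy as np

# p(y_t = 1 | H) from equation (13): softmax over per-class scores (Q^T H R + D)_t.
def label_prob(H, Q, R, D):
    scores = (Q.T @ H @ R + D).ravel()        # one score per class t
    e = np.exp(scores - scores.max())         # numerically stabilized softmax
    return e / e.sum()

rng = np.random.default_rng(3)
K, L, T = 3, 3, 5
H = rng.integers(0, 2, size=(K, L)).astype(float)
Q, R, D = rng.normal(size=(K, 1)), rng.normal(size=(L, T)), rng.normal(size=(1, T))
p = label_prob(H, Q, R, D)                    # length-T probability vector
```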
For given parameters Θ and input X, the conditional probability distribution of Y is obtained by summing out H analytically:

p(Y | X; Θ) = exp(tr(Y^T D)) Π_{k,l} (1 + exp(c_kl + (U X V^T)_kl + (Q Y R^T)_kl)) / Σ_{Y*} exp(tr(Y*^T D)) Π_{k,l} (1 + exp(c_kl + (U X V^T)_kl + (Q Y* R^T)_kl))   (14)

where the sum in the denominator runs over all possible one-hot labels Y*.
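The product form of equation (14) follows because each binary hidden unit can be summed out independently, contributing a factor 1 + exp(activation). The sketch below computes p(y | X) this way and is how the model acts as a standalone classifier (predict argmax_t p_t); sizes and values are illustrative assumptions.

```python
import numpy as np

# p(y_t | X) as in equation (14): each hidden unit contributes a factor
# 1 + exp(c_kl + (U X V^T)_kl + (Q y R^T)_kl); work in log space for stability.
def predict_proba(X, U, V, Q, R, C, D):
    T = R.shape[1]
    base = C + U @ X @ V.T                    # label-independent activations (K, L)
    log_scores = np.empty(T)
    for t in range(T):
        y = np.zeros((1, T)); y[0, t] = 1.0
        act = base + Q @ y @ R.T
        log_scores[t] = D[0, t] + np.logaddexp(0.0, act).sum()   # sum of log(1+e^a)
    e = np.exp(log_scores - log_scores.max())
    return e / e.sum()

rng = np.random.default_rng(4)
I, J, K, L, T = 3, 3, 2, 2, 4
X = rng.integers(0, 2, size=(I, J)).astype(float)
U, V = rng.normal(size=(K, I)), rng.normal(size=(L, J))
Q, R = rng.normal(size=(K, 1)), rng.normal(size=(L, T))
C, D = rng.normal(size=(K, L)), rng.normal(size=(1, T))
p = predict_proba(X, U, V, Q, R, C, D)        # length-T distribution over classes
```

With K = L = 2 the hidden layer has only 16 configurations, so the analytic sum can be verified against brute-force enumeration of H.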
Step 2, solving the discriminative matrix-variate restricted Boltzmann machine model
Suppose a training image dataset containing N samples, D_train = {X^(1), ..., X^(n), ..., X^(N)}, is given; the invention estimates the parameters Θ by the maximum-likelihood method with the following conditional probability as the objective function, the likelihood function being

L(Θ) = Σ_{n=1}^{N} log p(y^(n) | X^(n); Θ)   (15)
where N is the number of samples and n indexes the n-th sample; y^(n) is a vector, so here and hereinafter y^(n) is used in place of Y^(n); its t-th component is 1 and the remaining components are all 0, i.e. y_t^(n) denotes the t-th component of y^(n), and y_t^(n) = 1 represents that the class of datum X^(n) is t; Θ denotes all model parameters. The objective function seeks, under the current model parameters, to maximize the probability that input sample X^(n) carries label y^(n).
According to the conditional-probability formula:

p(y^(n) | X^(n); Θ) = p(X^(n), y^(n); Θ) / p(X^(n); Θ) = Σ_H exp(-E(X^(n), y^(n), H; Θ)) / Σ_{y*} Σ_H exp(-E(X^(n), y*, H; Θ))   (16)

The derivative of the objective function with respect to a model parameter θ is:

∂ log p(y^(n) | X^(n); Θ) / ∂θ = -Σ_H p(H | X^(n), y^(n)) ∂E(X^(n), y^(n), H)/∂θ + Σ_{y,H} p(y, H | X^(n)) ∂E(X^(n), y, H)/∂θ   (17)

To evaluate (17), the three quantities on the right-hand side of the second equality must be computed: ∂E/∂θ, p(H | X^(n), y^(n)), and p(y, H | X^(n)).
These three parts are calculated separately below.
Computing ∂E/∂θ: from equation (5),

∂E/∂u_ki = -(H V X^T)_ki   (18.1)
∂E/∂v_lj = -(H^T U X)_lj   (18.2)
∂E/∂q_kz = -(H R Y^T)_kz   (18.3)
∂E/∂r_lt = -(H^T Q Y)_lt   (18.4)
∂E/∂b_ij = -x_ij,  ∂E/∂c_kl = -h_kl   (18.5)
∂E/∂d_t = -y_t   (18.6)

Equations (18.1)-(18.6) give the calculation for each element of the corresponding parameter matrix.
Computing p(H | X^(n), y^(n)): because the derivatives (18.1)-(18.5) all contain h_kl, h_kl being an element of the matrix variable H, what is required here is

p(h_kl = 1 | X^(n), y^(n)) = σ(c_kl + Σ_{i,j} u_ki x_ij^(n) v_lj + Σ_{z,t} q_kz y_zt^(n) r_lt)   (19)

which gives the activation probability of each element of the matrix H.
Computing p(y, H | X^(n)): first simplify:

p(y, H | X^(n)) = p(H | y, X^(n)) p(y | X^(n))   (20)

where y denotes a specific class in the numerator, while the normalization in the denominator traverses all classes; the factor p(H | y, X^(n)) is given by (19), and the factor p(y | X^(n)) by equation (14).
at this point, the partial derivatives of the objective function in equation (17) for each parameter can be obtained and then substituted into the calculation results (17) of (18), (19) and (20).
Owing to the particularity of (18.6), its expectation is given separately:

Σ_{y,H} p(y, H | X^(n)) ∂E/∂d_t = -p(y_t | X^(n))   (21)

where p(y_t | X^(n)) denotes the probability value of the t-th class computed from training datum X^(n).
Finally, the objective function is maximized by optimizing with a gradient-ascent method; each result of formula (17) is substituted into the following update:

θ ← θ + λ ∂L(Θ)/∂θ,  θ ∈ Θ   (22)

where λ is the learning rate. After multiple iterations, the optimized model is obtained.
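The update rule (22), applied uniformly to every parameter in Θ, can be sketched as a generic gradient-ascent step; here `grads` stands in for the bracketed difference of expectations in (17), which is assumed to have been computed from (18)-(21), and the parameter shapes are toy examples.

```python
import numpy as np

# One gradient-ascent step: theta <- theta + lambda * dL/dtheta for every
# parameter in Theta = {U, V, Q, R, B, C, D}, as in equation (22).
def ascent_step(theta, grads, lam=0.01):
    return {name: value + lam * grads[name] for name, value in theta.items()}

theta = {'U': np.zeros((3, 4)), 'D': np.zeros((1, 5))}       # toy parameter set
grads = {'U': np.ones((3, 4)),  'D': -np.ones((1, 5))}       # placeholder gradients
theta = ascent_step(theta, grads, lam=0.1)
```

In a full training loop this step would be repeated over all samples and iterations until the likelihood (15) stops improving.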
Experimental verification:
The effectiveness of the method for image recognition is verified through comparison experiments with related methods. Two types of experiments are designed: the first verifies the superiority of the discriminative matrix-variate restricted Boltzmann machine (DisMVRBM) over the RBM, the MVRBM, their variants, and other unsupervised methods; the second verifies the superiority of DisMVRBM over the discriminative vector-variate restricted Boltzmann machine (DisRBM).
The experimental datasets used in the invention are as follows:
MNIST Database: the MNIST dataset is a handwritten-digit dataset covering the ten digits 0-9, with 60,000 training images and 10,000 test images; each image is a 28 × 28 grayscale image.
ETH-80 Database: the ETH-80 dataset contains 8 object classes (apple, car, cow, cup, dog, horse, pear, tomato); each class contains 10 different objects, and each object is imaged from 41 viewpoints, i.e. 41 frames of image data per object, for a total of 8 × 10 × 41 = 3,280 images. The invention first down-samples each image to 32 × 32 and converts each image to grayscale.
Ballet Database: the entire dataset contains 8 complex ballet actions in 44 video clips cut from a ballet DVD, each clip containing 107 to 506 frames. The invention randomly selects 200 frames from each of the 8 actions as training data; each frame is down-sampled to 32 × 32 and converted to grayscale.
Coil_20: contains 20 object classes, each with 72 images from different viewpoints; each frame is down-sampled to 32 × 32 as training data.
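The grayscale conversion and 32 × 32 down-sampling applied to these datasets can be sketched as follows. The luminance weights (ITU-R BT.601) and the block-averaging strategy are common choices assumed here, not specified in the text, and the input side lengths are assumed to be multiples of 32.

```python
import numpy as np

# Convert an RGB frame to grayscale (BT.601 luminance weights, an assumption)
# and block-average it down to 32 x 32.
def to_gray(rgb):                                   # rgb: (h, w, 3) array
    return rgb @ np.array([0.299, 0.587, 0.114])

def downsample(img, out=32):
    h, w = img.shape
    fh, fw = h // out, w // out                     # integer block sizes
    return img[:out * fh, :out * fw].reshape(out, fh, out, fw).mean(axis=(1, 3))

frame = np.ones((64, 64, 3))                        # dummy white RGB frame
small = downsample(to_gray(frame))                  # 32 x 32 grayscale image
```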
Experiment 1: comparison of DisMVRBM against other unsupervised RBMs and their variants.
Experiment 1 verifies the superiority of the class-constrained discriminative matrix-variate restricted Boltzmann machine over unsupervised methods; the comparison methods are the traditional RBM, IGRBM (a Gaussian restricted Boltzmann machine), MVRBM, and MVIGRBM (a Gaussian matrix-variate restricted Boltzmann machine).
In Experiment 1, the hidden-layer size of both the comparison models and the proposed model is 28 × 28; the learning rate of the weights between the hidden and visible layers is 0.01 with weight decay 10^-3, the learning rate of the weights between the hidden and label layers is 0.01 with weight decay 10^-6, and the visible layer matches the size of the input data. The comparative results of Experiment 1 are shown in Table 1:
TABLE 1. Recognition accuracy of the discriminative MVRBM versus other, non-discriminative methods

            RBM      IGRBM    MVRBM    MVIGRBM   DisMVRBM
MNIST       0.9494   0.9365   0.9658   0.9665    0.9725
Ballet_32   0.3566   0.7063   0.3505   0.9323    0.9509
ETH-80      0.5281   0.8750   0.3319   0.8800    0.9053
Table 1 gives the recognition accuracy of the different models on several datasets; the proposed model outperforms the comparison models RBM, IGRBM, MVRBM and MVIGRBM on MNIST, Ballet_32 and ETH-80.
This is because the four comparison models are all generative models trained unsupervised, without using the label information of the data. To perform the classification task of this experiment, each is combined with a conventional neural network (NN): the NN parameters are initialized from the comparison model's training result, fine-tuned by back-propagation, and finally used for classification. The proposed DisMVRBM incorporates label information, so on the one hand its extracted features are discriminative, which benefits the classification task; on the other hand, because a label layer is added and supervised training is adopted, the model can serve as a standalone classifier and perform the classification task directly.
Experiment 2: comparison of DisMVRBM against DisRBM.
Experiment 2 verifies the recognition accuracy of the discriminative matrix-variate restricted Boltzmann machine relative to the discriminative vector-variate restricted Boltzmann machine. DisMVRBM and DisRBM were therefore tested on three datasets (Ballet_32, ETH-80 and Coil_20), using the same parameter settings as Experiment 1. The comparative results of Experiment 2 are shown in Table 2:
TABLE 2. Recognition accuracy of DisMVRBM versus DisRBM

            DisRBM   DisMVRBM
Ballet_32   0.9114   0.9509
ETH-80      0.5078   0.9053
Coil_20     0.9779   0.9896
Table 2 gives the recognition accuracy of the proposed DisMVRBM relative to the DisRBM on different datasets; the results show that the classification performance of the proposed matrix-variate discriminative model is superior to that of the conventional vector-variate discriminative model, verifying the superiority of the proposed model. The matrix-variate classification model proposed by the invention does not need to stretch a two-dimensional image into a vector when performing an image classification task, so the original spatial structure of the image is not destroyed; accordingly, its classification results are better than those of the one-dimensional comparison model.

Claims (1)

1. An image recognition method based on a discriminative matrix-variate restricted Boltzmann machine model, characterized by comprising the following steps:
step 1, the discriminative matrix-variate restricted Boltzmann machine model
the energy function of the matrix-variate restricted Boltzmann machine model is defined as:
E(X, H; Θ) = -Σ_{i,j,k,l} x_ij w_ijkl h_kl - Σ_{i,j} b_ij x_ij - Σ_{k,l} c_kl h_kl   (1)

wherein X = [x_ij] ∈ ℝ^{I×J} is the visible-layer matrix variable representing the input data, i.e. the input image, each frame of size I × J; H = [h_kl] ∈ ℝ^{K×L} represents the discriminative features of the input data extracted by the model, i.e. the features of the input image, of size K × L; W = [w_ijkl] ∈ ℝ^{I×J×K×L} is the connection weight of X and H, a fourth-order tensor variable representing the nonlinear mapping between the input image and the features extracted by the model; B = [b_ij] ∈ ℝ^{I×J} is the bias matrix variable of the visible layer, representing the offset of the input data; C = [c_kl] ∈ ℝ^{K×L} is the bias matrix variable of the hidden layer, representing the offset of the output features;
a joint probability distribution of the visible and hidden layers, i.e. the joint probability of the input image and the features fitted by the model, may be defined from the energy function, as in equation (2):

p(X, H; Θ) = exp(-E(X, H; Θ)) / Z(Θ),  Z(Θ) = Σ_{X,H} exp(-E(X, H; Θ))   (2)

and a log-likelihood function is defined from the joint probability distribution:

L(Θ) = Σ_{n=1}^{N} log p(X^(n); Θ) = Σ_{n=1}^{N} log Σ_H p(X^(n), H; Θ)   (3)
further, a discriminative matrix-variate restricted Boltzmann machine, denoted DisMVRBM, is adopted for two-dimensional image classification, namely a class constraint is added on the basis of the original MVRBM model so that the improved MVRBM has classification capability, the MVRBM being the matrix-variate restricted Boltzmann machine;
DisMVRBM models, by means of the hidden-layer features H, the joint distribution of the input images D_train = {X^(1), ..., X^(n), ..., X^(N)} and the corresponding class labels Y = [y_zt] ∈ ℝ^{Z×T} with Z = 1; a class-constrained energy function is therefore defined as follows:

E(X, Y, H; Θ) = -Σ_{i,j,k,l} x_ij w_ijkl h_kl - Σ_{i,j} b_ij x_ij - Σ_{z,t,k,l} y_zt p_ztkl h_kl - Σ_{z,t} d_zt y_zt - Σ_{k,l} c_kl h_kl   (4)

wherein Y = [y_zt] ∈ ℝ^{Z×T} is the visible-layer label matrix variable identifying the class of the input data, i.e. the label corresponding to the input image; since Z = 1 is a constant, it can be regarded as a vector variable; P = [p_ztkl] ∈ ℝ^{Z×T×K×L} is the connection weight of Y and H, a fourth-order tensor variable representing the nonlinear mapping between the label of the input image and the output features; D = [d_zt] = [d_t] ∈ ℝ^{Z×T} is the bias matrix variable of the label layer, representing the offset of the label, and can likewise be regarded as a vector variable;
wherein the label layer is a one-hot encoding vector, that is, if the label of the input datum is the t-th class, the t-th component of the corresponding label vector is 1 and all other components are set to zero,
assuming that the connection weights between the hidden and visible layers and between the hidden and label layers have a specific structure, the weight tensors are decomposed as

w_ijkl = u_ki v_lj  and  p_ztkl = q_kz r_lt,

with the matrix forms U = [u_ki] ∈ ℝ^{K×I}, V = [v_lj] ∈ ℝ^{L×J}, Q = [q_kz] ∈ ℝ^{K×Z}, R = [r_lt] ∈ ℝ^{L×T};
thus, the energy function of the factorized DisMVRBM is obtained as follows:

E(X, Y, H; Θ) = -tr(U^T H V X^T) - tr(X^T B) - tr(Q^T H R Y^T) - tr(Y^T D) - tr(H^T C)   (5)

wherein Θ = {U, V, Q, R, B, C, D} represents all parameters of the model,
based on the above formula, the joint probability of X, Y and H, i.e. the joint probability of the input image, the features and the corresponding label, is:

p(X, Y, H; Θ) = exp(-E(X, Y, H; Θ)) / Z(Θ)   (6)

the normalization constant Z(Θ) in the above formula being defined as:

Z(Θ) = Σ_X Σ_Y Σ_H exp(-E(X, Y, H; Θ))   (7)
the probability that a unit of the hidden layer is activated, i.e. that a feature is activated, is:

p(h_kl = 1 | X, Y; Θ) = σ(c_kl + Σ_{i,j} u_ki x_ij v_lj + Σ_{z,t} q_kz y_zt r_lt)   (8)

where σ(a) = 1/(1 + exp(-a)); expressed in matrix form:

p(H = 1 | X, Y; Θ) = σ(C + U X V^T + Q Y R^T)   (9)

equation (8) computes element by element the probability that each element of the hidden layer H is 1, σ being applied to each corresponding matrix element,
the activation probability of a unit of the visible layer, i.e. the activation probability of an input image pixel, is:

p(x_ij = 1 | H; Θ) = σ(b_ij + Σ_{k,l} u_ki h_kl v_lj)   (10)

expressed in matrix form:

p(X = 1 | H; Θ) = σ(B + U^T H V)   (11)

like formula (8), formula (10) computes element by element the probability that any element of the visible layer X is 1, σ being applied to each corresponding matrix element; similarly, the conditional probability of each component of the label layer is as follows:

p(y_zt = 1 | H; Θ) = exp(d_t + Σ_{k,l} h_kl q_kz r_lt) / Σ_{t*=1}^{T} exp(d_t* + Σ_{k,l} h_kl q_kz r_lt*)   (12)
wherein y_zt = 1 indicates that the training image datum belongs to the t-th class;
in matrix form:

p(y_t = 1 | H; Θ) = exp(d_t + tr(Q^T H R e_t^T)) / Σ_{t*=1}^{T} exp(d_t* + tr(Q^T H R e_t*^T))   (13)

here e_t denotes the one-hot row vector of class t; the subscript t in the numerator indicates the specific class of the label, while t* in the denominator ranges over all possible label classes;
for given parameters Θ and input X, the conditional probability distribution of Y is:

p(Y | X; Θ) = exp(tr(Y^T D)) Π_{k,l} (1 + exp(c_kl + (U X V^T)_kl + (Q Y R^T)_kl)) / Σ_{Y*} exp(tr(Y*^T D)) Π_{k,l} (1 + exp(c_kl + (U X V^T)_kl + (Q Y* R^T)_kl))   (14)
step 2, solving the discriminative matrix-variate restricted Boltzmann machine model
suppose a training image dataset containing N samples, D_train = {X^(1), ..., X^(n), ..., X^(N)}, is given; the parameters Θ are estimated by the maximum-likelihood method with the following conditional probability as the objective function, the likelihood function being

L(Θ) = Σ_{n=1}^{N} log p(y^(n) | X^(n); Θ)   (15)

wherein N is the number of samples and n represents the n-th sample; y^(n) is a vector, so here and hereinafter y^(n) is used in place of Y^(n); its t-th component is 1 and the remaining components are all 0, i.e. y_t^(n) denotes the t-th component of y^(n), and y_t^(n) = 1 represents that the class of datum X^(n) is t; Θ denotes all model parameters, and the objective function is intended to maximize, under the current model parameters, the probability that input sample X^(n) carries label y^(n),
obtaining the following according to the conditional-probability formula:

p(y^(n) | X^(n); Θ) = p(X^(n), y^(n); Θ) / p(X^(n); Θ) = Σ_H exp(-E(X^(n), y^(n), H; Θ)) / Σ_{y*} Σ_H exp(-E(X^(n), y*, H; Θ))   (16)

the derivative of the objective function with respect to a model parameter θ is:

∂ log p(y^(n) | X^(n); Θ) / ∂θ = -Σ_H p(H | X^(n), y^(n)) ∂E(X^(n), y^(n), H)/∂θ + Σ_{y,H} p(y, H | X^(n)) ∂E(X^(n), y, H)/∂θ   (17)

to calculate (17), the three parts to the right of the second equality in equation (17) need to be calculated: ∂E/∂θ, p(H | X^(n), y^(n)), and p(y, H | X^(n)),
these three parts are calculated separately below:
calculating ∂E/∂θ: from equation (5),

∂E/∂u_ki = -(H V X^T)_ki   (18.1)
∂E/∂v_lj = -(H^T U X)_lj   (18.2)
∂E/∂q_kz = -(H R Y^T)_kz   (18.3)
∂E/∂r_lt = -(H^T Q Y)_lt   (18.4)
∂E/∂b_ij = -x_ij,  ∂E/∂c_kl = -h_kl   (18.5)
∂E/∂d_t = -y_t   (18.6)

the above (18.1)-(18.6) give the calculation for each element of the corresponding parameter matrix,
calculating p(H | X^(n), y^(n)): because the derivatives (18.1)-(18.5) all contain h_kl, h_kl being an element of the matrix variable H, what is required here is

p(h_kl = 1 | X^(n), y^(n)) = σ(c_kl + Σ_{i,j} u_ki x_ij^(n) v_lj + Σ_{z,t} q_kz y_zt^(n) r_lt)   (19)

the above formula giving the activation probability of each element of the matrix H,
calculating p(y, H | X^(n)): first simplify:

p(y, H | X^(n)) = p(H | y, X^(n)) p(y | X^(n))   (20)

where y denotes a specific class in the numerator, while the normalization in the denominator traverses all classes; the factor p(H | y, X^(n)) is given by (19), and the factor p(y | X^(n)) by equation (14),
at last, the partial derivatives of the objective function in the formula (17) for each parameter can be obtained and the calculation results of (18), (19) and (20) can be substituted into the formula (17),
given the particularity of (18.6), this is given separately:
Figure FDA0003054741640000065
wherein, p (y)t|X(n)) Representation by training data X(n)The probability value of the t-th class is calculated,
Finally, the objective function is maximized by gradient ascent; each result of equation (17) is substituted into the following update rule:

$$\theta \leftarrow \theta + \lambda\,\frac{\partial \log p\left(y^{(n)}\mid X^{(n)};\Theta\right)}{\partial \theta}$$

where θ ∈ Θ and λ is the learning rate. After multiple iterations, the optimized model is obtained.
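The gradient-ascent update above can be sketched for the bias-type parameters under the same assumed bilinear energy (again, grad_step, Wlab, etc. are hypothetical names, not the patent's): the exact gradient of log p(y | X) is the sufficient statistic under the observed label minus its expectation under p(y' | X), matching the positive/negative structure of equation (17).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    return np.logaddexp(0.0, z)          # log(1 + e^z), numerically stable

def grad_step(X, y, U, V, C, Wlab, d, lr=0.01):
    """One exact gradient-ascent step on log p(y | X) for a hypothetical
    bilinear matrix-variate energy; only C, Wlab, and d are updated here
    for brevity.  Gradient = statistic at the observed label y minus its
    expectation over the model posterior p(y' | X)."""
    n_cls = len(d)
    A = U.T @ X @ V                                   # shared bilinear term
    scores = np.array([d[c] + softplus(A + C + Wlab[c]).sum()
                       for c in range(n_cls)])
    p = np.exp(scores - scores.max()); p /= p.sum()   # p(y' | X)
    sig = [sigmoid(A + C + Wlab[c]) for c in range(n_cls)]  # E[H | X, y']
    # positive phase (observed label) minus negative phase (model expectation)
    gC = sig[y] - sum(p[c] * sig[c] for c in range(n_cls))
    C += lr * gC
    for c in range(n_cls):
        Wlab[c] += lr * ((c == y) * sig[y] - p[c] * sig[c])
    d += lr * ((np.arange(n_cls) == y) - p)           # one-hot(y) - p(y'|X)
    return C, Wlab, d
```

Because the hidden units can be summed out in closed form, this gradient is exact rather than sampled, so small steps reliably increase the conditional log-likelihood.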
CN201910297655.2A 2018-04-16 2019-04-15 Image identification method based on discrimination matrix variable limited Boltzmann machine Active CN109978080B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018103366215 2018-04-16
CN201810336621.5A CN108510009A (en) 2018-04-16 2018-04-16 Image recognition method based on a discriminative matrix-variate restricted Boltzmann machine

Publications (2)

Publication Number Publication Date
CN109978080A CN109978080A (en) 2019-07-05
CN109978080B true CN109978080B (en) 2021-06-25

Family

ID=63382358

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201810336621.5A Withdrawn CN108510009A (en) 2018-04-16 2018-04-16 Image recognition method based on a discriminative matrix-variate restricted Boltzmann machine
CN201910297655.2A Active CN109978080B (en) 2018-04-16 2019-04-15 Image identification method based on discrimination matrix variable limited Boltzmann machine


Country Status (1)

Country Link
CN (2) CN108510009A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632466B (en) * 2020-11-26 2024-01-23 江苏科技大学 Bearing fault prediction method based on deep bidirectional long-short-time memory network
CN113643722B (en) * 2021-08-27 2024-04-19 杭州电子科技大学 Urban noise identification method based on multilayer matrix random neural network

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105488563A (en) * 2015-12-16 2016-04-13 重庆大学 Deep learning oriented sparse self-adaptive neural network, algorithm and implementation device
CN107272644A (en) * 2017-06-21 2017-10-20 哈尔滨理工大学 DBN fault diagnosis method for electric submersible reciprocating oil pumping units

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
KR20160112186A (en) * 2015-03-18 2016-09-28 삼성전자주식회사 Method and apparatus for event-based learning in neural network
US10339442B2 (en) * 2015-04-08 2019-07-02 Nec Corporation Corrected mean-covariance RBMs and general high-order semi-RBMs for large-scale collaborative filtering and prediction
CN106886798A (en) * 2017-03-10 2017-06-23 北京工业大学 Image recognition method based on a matrix-variate Gaussian restricted Boltzmann machine


Non-Patent Citations (3)

Title
An Infinite Restricted Boltzmann Machine;Côté M-A, Larochelle H;《Neural Computation》;20161215;pp. 1-24 *
Image classification method based on deep learning coding models (基于深度学习编码模型的图像分类方法);赵永威,李婷,蔺博宇;《工程科学与技术》;20170115;pp. 213-220 *
Multi-class text representation and classification method based on hybrid deep belief networks (基于混合深度信念网络的多类文本表示与分类方法);翟文洁,闫琰,张博文,殷绪成;《情报工程》;20161015;pp. 30-40 *

Also Published As

Publication number Publication date
CN109978080A (en) 2019-07-05
CN108510009A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
Gao et al. Flow contrastive estimation of energy-based models
CN110689086B (en) Semi-supervised high-resolution remote sensing image scene classification method based on generating countermeasure network
Hadji et al. What do we understand about convolutional networks?
Patel et al. A probabilistic theory of deep learning
CN106991372B (en) Dynamic gesture recognition method based on mixed deep learning model
Bodapati et al. Feature extraction and classification using deep convolutional neural networks
US9489568B2 (en) Apparatus and method for video sensor-based human activity and facial expression modeling and recognition
US20190087726A1 (en) Hypercomplex deep learning methods, architectures, and apparatus for multimodal small, medium, and large-scale data representation, analysis, and applications
Li et al. Facial expression recognition using deep neural networks
Bengio et al. Unsupervised feature learning and deep learning: A review and new perspectives
US20150104102A1 (en) Semantic segmentation method with second-order pooling
CN107085704A Fast facial expression recognition method based on ELM autoencoder algorithms
Sun et al. Recognition of SAR target based on multilayer auto-encoder and SNN
Balasubramanian et al. Smooth sparse coding via marginal regression for learning sparse representations
Tariyal et al. Greedy deep dictionary learning
CN109978080B (en) Image identification method based on discrimination matrix variable limited Boltzmann machine
Kanan Fine-grained object recognition with gnostic fields
Cho et al. Tikhonov-type regularization for restricted Boltzmann machines
Yao A compressed deep convolutional neural networks for face recognition
Caroppo et al. Facial Expression Recognition in Older Adults using Deep Machine Learning.
CN107563287B (en) Face recognition method and device
CN109063766B (en) Image classification method based on discriminant prediction sparse decomposition model
CN113887509B (en) Rapid multi-modal video face recognition method based on image set
Tang et al. Analysis dictionary learning for scene classification
CN107341485B (en) Face recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant