CN105631476B - A matrix variable RBM recognition method - Google Patents

A matrix variable RBM recognition method

Info

Publication number
CN105631476B
Authority
CN
China
Prior art keywords
matrix
training
rbm
model
hidden layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510994184.2A
Other languages
Chinese (zh)
Other versions
CN105631476A (en)
Inventor
齐光磊
孙艳丰
胡永利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201510994184.2A priority Critical patent/CN105631476B/en
Publication of CN105631476A publication Critical patent/CN105631476A/en
Application granted granted Critical
Publication of CN105631476B publication Critical patent/CN105631476B/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a matrix variable RBM recognition method that substantially reduces the computational complexity of training and inference, preserves the spatial information of 2-D matrix data during training and testing while achieving a good effect in reconstruction, and can be applied to more complex data structures. The method comprises the steps of: (1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4), $p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$, where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as $Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$, where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y; (2) a classification stage: the hidden layer matrix variables are vectorized, a K-NN method is applied for training, and test images are classified according to the minimum residual error.

Description

Matrix variable RBM recognition method
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a matrix variable RBM recognition method.
Background
The Boltzmann Machine (BM) is an important stochastic neural network proposed by Hinton and Sejnowski in 1985. However, the units of a traditional Boltzmann machine have no constraints on their connections, so the model cannot be applied effectively to machine learning. To construct a model that can be applied in practice, Hinton proposed a model structure called the Restricted Boltzmann Machine, in which connections exist only between visible layer units and hidden layer units.
With this restriction between the hidden layer and visible layer units, an RBM (Restricted Boltzmann Machine) can be regarded as a probabilistic model over binary variables. In recent years, RBMs have been widely used in the fields of pattern recognition and machine learning due to their powerful feature extraction and representation capabilities.
Given some training data, the goal of training an RBM model is to learn the connection weights between the visible and hidden layers so that the probability distribution represented by the RBM fits all the training samples as well as possible. A trained RBM model can provide an effective representation of input data based on the probability distribution learned from the training data.
The classical RBM model mainly describes input data or variables in vector form. However, data produced by modern technology more often has a more general structure. For example, a digital image is a 2-dimensional matrix that contains spatial information. For the classical RBM to be applied to data such as 2-D images, the conventional approach is to vectorize the 2-D data. Unfortunately, this process not only destroys the internal structure of the image, losing the interaction information hidden in that structure, but also increases the number of model parameters because of the full connectivity between the visible and hidden layers.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to overcome the defects of the prior art by greatly reducing the computational complexity of training and inference, preserving the spatial information in 2-D matrix data during training and testing, obtaining a good effect in reconstruction, and being applicable to more complex data structures.
The technical solution of the invention is as follows: the identification method of the matrix variable RBM comprises the following steps:
(1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4):

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y, $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer;
(2) a classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
Fewer model parameters need to be learned than in the classical RBM, so the computational complexity of training and inference is significantly reduced; the visible layer and the hidden layer are both in matrix form, so the spatial information in the 2-D matrix data is preserved during training and testing, and a good effect is obtained in reconstruction; the invention can easily be extended to tensor data of any order, and can therefore be applied to more complex data structures.
Drawings
Fig. 1 shows a classical RBM model.
Fig. 2 shows the RBM model of the present invention.
Fig. 3 shows the classification error rate when the number of iterations and the number of training samples are fixed.
Fig. 4 shows the classification error rates of different methods when the number of training samples is different.
Detailed Description
The identification method of the matrix variable RBM comprises the following steps:
(1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4):

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y, $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer;
(2) a classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
Fewer model parameters need to be learned than in the classical RBM, so the computational complexity of training and inference is significantly reduced; the visible layer and the hidden layer are both in matrix form, so the spatial information in the 2-D matrix data is preserved during training and testing, and a good effect is obtained in reconstruction; the invention can easily be extended to tensor data of any order, and can therefore be applied to more complex data structures.
Preferably, the step (1) comprises the following substeps:
(1.1) define a matrix-type training sample set $\{X_n\}_{n=1}^{N}$, the maximum number of iterations T, the learning rate, the weight regularization term, the number of training samples per group b, and the CD algorithm step count K′;
(1.2) randomly initialize U and V, and set B = C = 0 and the gradients ΔU = ΔV = ΔB = ΔC = 0;
(1.3) for iteration step t = 1 → T;
(1.4) randomly shuffle $\{X_n\}_{n=1}^{N}$ and divide it into M groups, each of size b;
(1.5) for group m = 1 → M;
(1.6) perform Gibbs sampling on all the data in the group under the current model parameters;
(1.7) for k = 0 → K′ − 1;
(1.8) sample $Y^{(k)}$ according to equation (9):
$p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$ (9);
(1.9) sample $X^{(k+1)}$ according to equation (8):
$p(X=1\mid Y;\Theta)=\sigma(U^TYV+B)$ (8);
(1.10) update the gradients according to equation (18);
(1.11) update each model parameter θ ∈ Θ according to the formula θ ← θ + Δθ;
(1.12) end.
Preferably, the maximum number of iterations T is 10000, the learning rate is 0.05, the weight regularization term is 0.01, the number of training samples in each group is 100, and the CD algorithm step count K′ is 1. A minimal code sketch of this training procedure is given below.
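The following Python sketch illustrates substeps (1.1)–(1.12) under stated assumptions: the function and variable names (train_mvrbm, sigmoid) and the 0.01-scaled Gaussian initialization are illustrative, not taken from the patent, and the gradient step follows the CD estimates reconstructed in equations (14)–(17) below rather than a verbatim transcription of equation (18).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mvrbm(data, K, L, T=10000, lr=0.05, weight_decay=0.01, b=100, cd_k=1, seed=0):
    """CD-K' training of the MVRBM. data: binary array of shape (N, I, J)."""
    rng = np.random.default_rng(seed)
    N, I, J = data.shape
    U = 0.01 * rng.standard_normal((K, I))   # weight matrix U
    V = 0.01 * rng.standard_normal((L, J))   # weight matrix V
    B = np.zeros((I, J))                     # visible-layer bias
    C = np.zeros((K, L))                     # hidden-layer bias
    for t in range(T):
        perm = rng.permutation(N)            # substep (1.4): shuffle and group
        for start in range(0, N, b):
            batch = data[perm[start:start + b]]
            dU = np.zeros_like(U); dV = np.zeros_like(V)
            dB = np.zeros_like(B); dC = np.zeros_like(C)
            for X0 in batch:
                Xk = X0
                for _ in range(cd_k):        # substeps (1.7)-(1.9): Gibbs chain
                    Yk = (rng.random((K, L)) < sigmoid(U @ Xk @ V.T + C)).astype(float)  # eq. (9)
                    Xk = (rng.random((I, J)) < sigmoid(U.T @ Yk @ V + B)).astype(float)  # eq. (8)
                H0 = sigmoid(U @ X0 @ V.T + C)   # data expectation of Y
                Hk = sigmoid(U @ Xk @ V.T + C)   # model (CD) expectation of Y
                dU += H0 @ V @ X0.T - Hk @ V @ Xk.T
                dV += H0.T @ U @ X0 - Hk.T @ U @ Xk
                dB += X0 - Xk
                dC += H0 - Hk
            # substep (1.11): parameter update, with weight regularization on U and V
            U += lr * (dU / len(batch) - weight_decay * U)
            V += lr * (dV / len(batch) - weight_decay * V)
            B += lr * dB / len(batch)
            C += lr * dC / len(batch)
    return U, V, B, C
```

For example, on binarized 28 × 28 MNIST images with a 25 × 25 hidden layer one might call train_mvrbm(data, K=25, L=25, T=2000); the defaults mirror the preferred values stated above.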
The invention will now be described in more detail.
1 Model definition
The classical RBM [8, 13] is a binary vector model, with both the input layer and the hidden layer in vector form. As shown in Fig. 1, the visible layer units (cubes) and the hidden layer units (cylinders) are fully connected.
The RBM energy function model is:
$E(\mathbf{x},\mathbf{y};\Theta)=-\mathbf{x}^TW\mathbf{y}-\mathbf{b}^T\mathbf{x}-\mathbf{c}^T\mathbf{y}$ (1)

where $\mathbf{x}\in\{0,1\}^{n_v}$ and $\mathbf{y}\in\{0,1\}^{n_h}$ are the binary visible layer and hidden layer units, $\mathbf{b}\in\mathbb{R}^{n_v}$ and $\mathbf{c}\in\mathbb{R}^{n_h}$ are the biases, and $W\in\mathbb{R}^{n_v\times n_h}$ represents the connection weights between the visible layer and the hidden layer in the neural network. Θ is the model parameter set {W, b, c}.
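For reference, a minimal sketch of evaluating the classical energy in equation (1); the function name and the toy sizes (4 visible, 3 hidden units) are illustrative assumptions:

```python
import numpy as np

def rbm_energy(x, y, W, b, c):
    """Classical RBM energy, eq. (1): E = -x^T W y - b^T x - c^T y."""
    return -(x @ W @ y) - (b @ x) - (c @ y)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 4).astype(float)                  # binary visible vector
y = rng.integers(0, 2, 3).astype(float)                  # binary hidden vector
W = rng.standard_normal((4, 3))                          # connection weights
b = rng.standard_normal(4); c = rng.standard_normal(3)   # biases
print(rbm_energy(x, y, W, b, c))
```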
For the purpose of introducing the MVRBM of the present invention, the following notation is defined. Let $X=(x_{ij})\in\{0,1\}^{I\times J}$ be the binary visible layer matrix variable and $Y=(y_{kl})\in\{0,1\}^{K\times L}$ be the binary hidden layer matrix variable, where the independent random variables $x_{ij}$ and $y_{kl}$ take values in {0, 1}. With a fourth-order tensor parameter $\mathcal{W}=(w_{ijkl})$ and bias matrices $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$, the following energy function is defined:

$E(X,Y)=-\sum_{i,j,k,l}x_{ij}w_{ijkl}y_{kl}-\mathrm{tr}(X^TB)-\mathrm{tr}(Y^TC)$ (2)

where Θ is the model parameter set {$\mathcal{W}$, B, C}. There are a total of I×J×K×L + I×J + K×L free parameters in Θ. Even when I, J, K and L are small, this is a large number, which requires many training samples and a long training time. To reduce the number of free parameters and save computational complexity, it is assumed that the connection weights between the hidden layer units and the visible layer units factorize as $w_{ijkl}=u_{ki}v_{lj}$. By defining two new matrices $U=(u_{ki})\in\mathbb{R}^{K\times I}$ and $V=(v_{lj})\in\mathbb{R}^{L\times J}$, the energy function (2) can be rewritten in the form

$E(X,Y)=-\mathrm{tr}(U^TYVX^T)-\mathrm{tr}(X^TB)-\mathrm{tr}(Y^TC)$ (3)

The matrices U and V together define the connection weights between the input matrix X and the hidden matrix Y, so the number of free parameters in Θ is reduced from that of equation (2) to I×K + L×J + I×J + K×L in equation (3).
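As a concrete illustration (using the MNIST-scale sizes that appear later in this description, I = J = 28 and K = L = 25, which are assumptions for this example):

$I\,J\,K\,L + I\,J + K\,L = 28\cdot28\cdot25\cdot25 + 784 + 625 = 491409$ free parameters for equation (2),

$I\,K + L\,J + I\,J + K\,L = 700 + 700 + 784 + 625 = 2809$ free parameters for equation (3),

a reduction of roughly 175 times.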
Based on equation (3), the following distribution is defined:

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where Θ denotes all model parameters U, V, B and C. The normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$ and $\mathcal{Y}$ denote the binary-valued spaces of X and Y.
The probability model in equation (4) is the matrix variable RBM (MVRBM). The model is shown in Fig. 2.
To facilitate the explanation of the MVRBM learning algorithm, the following lemma gives the conditional probability densities of the visible and hidden units.
Lemma 1. Let the MVRBM model be defined by equations (3) and (4). The conditional probability density of each visible layer unit is

$p(x_{ij}=1\mid Y;\Theta)=\sigma\big((U^TYV)_{ij}+b_{ij}\big)$ (6)

and the conditional probability density of each hidden layer unit is

$p(y_{kl}=1\mid X;\Theta)=\sigma\big((UXV^T)_{kl}+c_{kl}\big)$ (7)

where σ is the sigmoid function $\sigma(x)=1/(1+e^{-x})$.
In matrix notation, the two conditional probabilities can be written as:

$p(X=1\mid Y;\Theta)=\sigma(U^TYV+B)$ (8)

$p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$ (9)
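A minimal sketch of one Gibbs sweep using equations (8) and (9); the function name and the use of NumPy are assumptions, with U, V, B, C as defined above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(X, U, V, B, C, rng):
    """One Gibbs sweep: sample Y from p(Y|X) by eq. (9), then X' from p(X|Y) by eq. (8)."""
    Y = (rng.random(C.shape) < sigmoid(U @ X @ V.T + C)).astype(float)       # eq. (9)
    X_new = (rng.random(B.shape) < sigmoid(U.T @ Y @ V + B)).astype(float)   # eq. (8)
    return X_new, Y
```

Iterating gibbs_step K′ times from a training image yields the sample used by the CD-K′ approximation described next.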
Maximum likelihood function and CD algorithm for the MVRBM
For a given sample set $\mathcal{D}=\{X_1,\ldots,X_N\}$, the log-likelihood function under the joint distribution of equation (4) is defined as

$\ell(\Theta)=\sum_{n=1}^{N}\log p(X_n;\Theta)=\sum_{n=1}^{N}\log\sum_{Y}p(X_n,Y;\Theta)$

For any element θ in Θ, it can be proved that

$\frac{\partial\log p(X_n;\Theta)}{\partial\theta}=\mathbb{E}_{p(Y\mid X_n;\Theta)}\left[-\frac{\partial E(X_n,Y)}{\partial\theta}\right]-\mathbb{E}_{p(X,Y;\Theta)}\left[-\frac{\partial E(X,Y)}{\partial\theta}\right]$ (10)

The first term on the right-hand side of equation (10) is called the data expectation term, and the second term is called the model expectation term.
The most important problem in computing the gradient of the likelihood function is computing the model expectation term, because it requires summing over all states of the visible and hidden layers. However, the CD (Contrastive Divergence) algorithm allows an approximate computation through a short Markov chain. The main idea of the CD algorithm is to take a sample from the sample set as the initial value $X^{(0)}$ of a Gibbs chain; the CD-K′ algorithm then uses the sample obtained after K′ steps to approximate the model expectation term:

$\mathbb{E}_{p(X,Y;\Theta)}\left[-\frac{\partial E(X,Y)}{\partial\theta}\right]\approx\mathbb{E}_{p(Y\mid X^{(K')};\Theta)}\left[-\frac{\partial E(X^{(K')},Y)}{\partial\theta}\right]$ (11)
Substituting (11) into (10) gives an approximation based on the CD algorithm:

$\frac{\partial\log p(X_n;\Theta)}{\partial\theta}\approx\mathbb{E}_{p(Y\mid X_n;\Theta)}\left[-\frac{\partial E(X_n,Y)}{\partial\theta}\right]-\mathbb{E}_{p(Y\mid X_n^{(K')};\Theta)}\left[-\frac{\partial E(X_n^{(K')},Y)}{\partial\theta}\right]$ (12)

Of the four types of MVRBM parameters, only the calculation for U is given as an example; the other parameters are computed analogously. From equation (3) we obtain

$-\frac{\partial E(X,Y)}{\partial U}=YVX^T$

so that equation (12) becomes

$\Delta U=\mathbb{E}_{p(Y\mid X_n;\Theta)}\left[YVX_n^T\right]-\mathbb{E}_{p(Y\mid X_n^{(K')};\Theta)}\left[YV(X_n^{(K')})^T\right]$ (13)

For the binary variable Y, since $\mathbb{E}[Y\mid X;\Theta]=p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$, equation (13) gives

$\Delta U=\sigma(UX_nV^T+C)\,V\,X_n^T-\sigma(UX_n^{(K')}V^T+C)\,V\,(X_n^{(K')})^T$ (14)

Similarly, the other parameters are obtained as

$\Delta V=\sigma(UX_nV^T+C)^T\,U\,X_n-\sigma(UX_n^{(K')}V^T+C)^T\,U\,X_n^{(K')}$ (15)

$\Delta B=X_n-X_n^{(K')}$ (16)

$\Delta C=\sigma(UX_nV^T+C)-\sigma(UX_n^{(K')}V^T+C)$ (17)
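The four CD estimates (14)–(17), as reconstructed above, translate directly into code; a sketch for a single sample, with illustrative names (X0 is the data sample, Xk its K′-step reconstruction):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_gradients(X0, Xk, U, V, C):
    """CD estimates of eqs. (14)-(17); sigmoids replace binary Y by its expectation."""
    H0 = sigmoid(U @ X0 @ V.T + C)          # E[Y | X0], via eq. (9)
    Hk = sigmoid(U @ Xk @ V.T + C)          # E[Y | Xk]
    dU = H0 @ V @ X0.T - Hk @ V @ Xk.T      # eq. (14)
    dV = H0.T @ U @ X0 - Hk.T @ U @ Xk      # eq. (15)
    dB = X0 - Xk                            # eq. (16)
    dC = H0 - Hk                            # eq. (17)
    return dU, dV, dB, dC
```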
To verify the effectiveness of the proposed MVRBM algorithm, the MNIST database is selected for denoising, reconstruction and recognition experiments. These experiments demonstrate that the MVRBM has better feature extraction and reconstruction capabilities than the classical RBM.
The MNIST handwritten digit database contains 70,000 handwritten images, of which 60,000 are conventionally used as training samples and 10,000 as test samples. Each is a 28 × 28 pixel grayscale image, and the database can be downloaded from http://yann.lecun.com/exdb/mnist/.
1.1 Denoising and reconstruction
In the first experiment, the objective is to show that a trained MVRBM can be used to denoise data and to reconstruct it from a low-dimensional representation.
First, it is demonstrated that the MVRBM model can learn information from the data. For this purpose, 5,000 images of the digit 9 were randomly selected from the training samples, the hidden layer matrix variable was set to 15 × 15, and the parameter settings of Algorithm 1 were used. The training process was iterated 3,000 times. A denoising experiment was also performed, in which 10% salt-and-pepper noise was randomly added to test images of the digit 9. The denoising results were very good.
In another experiment, 20,000 training samples were used to train the MVRBM model, again for 3,000 iterations, but with the hidden layer size set to 25 × 25. For the binary MVRBM model, the trained model parameters U and V can be used as filters or as a feature extractor. On image data, the model learns filters that closely approximate Haar filters. The dimensionality reduction and reconstruction capabilities of the trained MVRBM model were also tested experimentally: some original samples are shown, followed by the images reconstructed from their low-dimensional representations. The average reconstruction error is 10.8488.
The experiments show that the model has good denoising, dimensionality reduction and reconstruction capabilities, and can effectively learn the features of the data.
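A sketch of how a trained model could be used in the denoising and reconstruction experiments described above (one pass through the hidden layer; the function names and the noise helper are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruct(X, U, V, B, C):
    """Map a (noisy) binary image to the hidden layer and back, eqs. (9) then (8)."""
    H = sigmoid(U @ X @ V.T + C)      # hidden expectations: the low-dimensional code
    return sigmoid(U.T @ H @ V + B)   # reconstructed image with entries in [0, 1]

def add_salt_pepper(X, ratio=0.10, seed=0):
    """Flip a `ratio` fraction of pixels, as in the 10% salt-and-pepper experiment."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < ratio
    return np.where(mask, 1.0 - X, X)
```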
2 Handwritten character recognition
In this experiment, the MVRBM is evaluated as a feature extractor. In fact, the hidden layer can be regarded as a new feature representation of the visible layer, and a classifier can be trained on these new features. As with most classifiers, the MVRBM hidden layer matrix features are flattened into vectors and then classified with a K-nearest-neighbor classifier (K = 1).
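A sketch of this classification stage under stated assumptions (function names are illustrative; a plain 1-NN on Euclidean residuals stands in for the K-NN method of step (2)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_features(images, U, V, C):
    """Vectorize the hidden-layer expectations of each image (the new features)."""
    return np.stack([sigmoid(U @ X @ V.T + C).ravel() for X in images])

def knn1_classify(test_feats, train_feats, train_labels):
    """Assign each test feature the label of the training feature with minimum residual."""
    preds = []
    for f in test_feats:
        residuals = np.linalg.norm(train_feats - f, axis=1)
        preds.append(train_labels[np.argmin(residuals)])
    return np.array(preds)
```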
First, the hidden layer is fixed at 25 × 25 and the number of iterations at T = 2000, and experiments are performed with different numbers of training samples, ranging from 100 to 20,000. Fig. 3(a) shows the classification error rate.
The experiments show that the more training samples are available, the better the recognition performance.
In another experiment, 10,000 training samples are randomly selected and the number of iterations is varied, ranging from 10 to 3,000. Fig. 3(b) shows the classification error rate at different iteration counts. It can be observed that the MVRBM tends to stabilize once the number of iterations reaches 70. As the number of iterations increases from 300 to 3,000, the classification error rate further decreases from 0.0571 to 0.0520.
Based on these experimental observations, the parameters N = 20,000 and T = 3,000 were chosen for the comparison experiments with other models. In the experiments it can be seen that the accuracy is even higher when the MVRBM is trained on 50,000 samples, with an error rate of only 0.0359. It was also found that an error rate of 0.1387 can be achieved with only 600 training samples.
Finally, a comparison is made with some of the currently most popular machine learning methods, including dropout-based Deep Neural Networks (DNN), Deep Belief Networks (DBN), Convolutional Neural Networks (CNN) and Sparse Auto-Encoders (SAE). Code for these models is available at https://github.com/rasmusbergpalm/DeepLearnToolbox, and the default parameter settings of each model were used. In Fig. 4(a) and (b), the MVRBM is trained with T = 3,000 iterations. The experiments report the results of the MVRBM and the other methods when the number of training samples is sufficient and insufficient (fewer than 10,000), respectively. Because the MVRBM has significantly fewer parameters than the other models, it is less likely to overfit.
Handwritten character recognition with the MVRBM is performed by the following algorithm:
1. a training stage:
Algorithm 1: CD-K′ algorithm for the MVRBM
Input: matrix-type training sample set $\{X_n\}_{n=1}^{N}$; maximum number of iterations T (default 10,000); learning rate (default 0.05); weight regularization term (default 0.01); number of training samples per group b (default 100); CD algorithm step count K′ (default 1)
Output: model parameters Θ = {U, V, B, C}
1. Initialization: randomly initialize U and V; set B = C = 0 and ΔU = ΔV = ΔB = ΔC = 0
2. for iteration step t = 1 → T
3. randomly shuffle $\{X_n\}_{n=1}^{N}$ and divide it into M groups of size b
4. for group m = 1 → M
5. perform Gibbs sampling on all the data in the group under the current model parameters
6. for k = 0 → K′ − 1
7. sample $Y^{(k)}$ according to equation (9)
8. sample $X^{(k+1)}$ according to equation (8)
9. end for
10. update the gradients according to equations (14) to (17)
11. update each model parameter θ ∈ Θ: θ ← θ + Δθ
12. end for
13. end for
2. A classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (3)

1. A method for identifying a matrix variable RBM, characterized in that it comprises the following steps:
(1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4):

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y, $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer;
(2) a classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
2. Method for the identification of a matrix variable RBM according to claim 1, characterised in that said step (1) comprises the sub-steps of:
(1.1) defining a matrix-type training sample set $\{X_n\}_{n=1}^{N}$, a maximum number of iterations T, a learning rate, a weight regularization term, a number of training samples per group b, and a CD algorithm step count K′;
(1.2) randomly initializing U and V, and setting B = C = 0 and the gradients ΔU = ΔV = ΔB = ΔC = 0;
(1.3) for iteration step t = 1 → T;
(1.4) randomly shuffling $\{X_n\}_{n=1}^{N}$ and dividing it into M groups, each of size b;
(1.5) for group m = 1 → M;
(1.6) performing Gibbs sampling on all the data in the group under the current model parameters;
(1.7) for k = 0 → K′ − 1;
(1.8) sampling $Y^{(k)}$ according to equation (9):
$p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$ (9);
(1.9) sampling $X^{(k+1)}$ according to equation (8):
$p(X=1\mid Y;\Theta)=\sigma(U^TYV+B)$ (8);
in equations (8) and (9), σ denotes the sigmoid function $\sigma(x)=1/(1+e^{-x})$;
(1.10) updating the gradients according to equation (18);
(1.11) updating each model parameter θ ∈ Θ according to the formula θ ← θ + Δθ;
(1.12) end.
3. The method for recognizing the matrix variable RBM as claimed in claim 2, wherein the maximum number of iterations T is 10000, the learning rate is 0.05, the weight regularization term is 0.01, the number of training samples in each group is 100, and the CD algorithm step count K′ is 1, wherein $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer.
CN201510994184.2A 2015-12-25 2015-12-25 A matrix variable RBM recognition method Active CN105631476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510994184.2A CN105631476B (en) 2015-12-25 2015-12-25 A matrix variable RBM recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510994184.2A CN105631476B (en) 2015-12-25 2015-12-25 A matrix variable RBM recognition method

Publications (2)

Publication Number Publication Date
CN105631476A CN105631476A (en) 2016-06-01
CN105631476B true CN105631476B (en) 2019-06-21

Family

ID=56046388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510994184.2A Active CN105631476B (en) 2015-12-25 2015-12-25 A matrix variable RBM recognition method

Country Status (1)

Country Link
CN (1) CN105631476B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446117A (en) * 2016-09-18 2017-02-22 西安电子科技大学 Text analysis method based on poisson-gamma belief network
CN106886798A (en) * 2017-03-10 2017-06-23 北京工业大学 The image-recognizing method of the limited Boltzmann machine of the Gaussian Profile based on matrix variables


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1393196A4 (en) * 2001-05-07 2007-02-28 Health Discovery Corp Kernels and methods for selecting kernels for use in learning machines

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814160A (en) * 2010-03-08 2010-08-25 清华大学 RBF neural network modeling method based on feature clustering
CN104361393A (en) * 2014-09-06 2015-02-18 华北电力大学 Method for using improved neural network model based on particle swarm optimization for data prediction
CN104880945A (en) * 2015-03-31 2015-09-02 成都市优艾维机器人科技有限公司 Self-adaptive inverse control method for unmanned rotorcraft based on neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Visual Feature Representation and Learning for Image Classification and Recognition; Yang Zhao; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2014-11-15; I138-21, pp. 6-7 and 66-68

Also Published As

Publication number Publication date
CN105631476A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN106991372B (en) Dynamic gesture recognition method based on mixed deep learning model
CN109754078B (en) Method for optimizing a neural network
Li et al. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions
CN107506712B (en) Human behavior identification method based on 3D deep convolutional network
CN107526785B (en) Text classification method and device
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
Lee et al. Wasserstein introspective neural networks
US20190228268A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
Mao et al. Deep residual pooling network for texture recognition
CN108121975B (en) Face recognition method combining original data and generated data
Oktar et al. A review of sparsity-based clustering methods
Kingma et al. Regularized estimation of image statistics by score matching
CN110969086B (en) Handwritten image recognition method based on multi-scale CNN (CNN) features and quantum flora optimization KELM
CN109389166A (en) The depth migration insertion cluster machine learning method saved based on partial structurtes
Chu et al. Stacked Similarity-Aware Autoencoders.
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
CN111371611A (en) Weighted network community discovery method and device based on deep learning
CN105631476B (en) A kind of recognition methods of matrix variables RBM
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
US20230076290A1 (en) Rounding mechanisms for post-training quantization
CN109978080B (en) Image identification method based on discrimination matrix variable limited Boltzmann machine
Yang et al. Image noise level estimation for rice noise based on extended ELM neural network training algorithm
Newatia et al. Convolutional neural network for ASR
Vepuri Improving facial emotion recognition with image processing and deep learning
Fonseka et al. Data augmentation to improve the performance of a convolutional neural network on image classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant