CN105631476B - A matrix variable RBM recognition method - Google Patents

A matrix variable RBM recognition method

Info

Publication number
CN105631476B
Authority
CN
China
Prior art keywords
matrix
training
rbm
model
hidden layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510994184.2A
Other languages
Chinese (zh)
Other versions
CN105631476A (en)
Inventor
齐光磊
孙艳丰
胡永利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201510994184.2A priority Critical patent/CN105631476B/en
Publication of CN105631476A publication Critical patent/CN105631476A/en
Application granted granted Critical
Publication of CN105631476B publication Critical patent/CN105631476B/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a matrix variable RBM recognition method that substantially reduces the computational complexity of training and inference, preserves the spatial information of 2-D matrix data during training and testing while achieving a good effect in reconstruction, and can be applied to more complex data structures. The method comprises the steps of: (1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4), $p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$, where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as $Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$, where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y; (2) a classification stage: the hidden layer matrix variables are vectorized, a K-NN method is applied for training, and test images are classified according to the minimum residual error.

Description

Matrix variable RBM recognition method
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a matrix variable RBM recognition method.
Background
The Boltzmann Machine (BM) is an important stochastic neural network proposed by Hinton and Sejnowski in 1985. However, the units of a traditional Boltzmann machine have no constraints on their connections, so the model cannot be applied effectively to machine learning. To construct a model that can be applied in practice, Hinton proposed a model structure called the Restricted Boltzmann Machine, in which connections exist only between visible layer units and hidden layer units.
With this restriction between the hidden layer and visible layer units, an RBM (Restricted Boltzmann Machine) can be regarded as a probabilistic model over binary variables. In recent years, RBMs have been widely used in the fields of pattern recognition and machine learning due to their powerful feature extraction and representation capabilities.
Given some training data, the goal of training an RBM model is to learn the connection weights between the visible and hidden layers so that the probability distribution represented by the RBM fits all the training samples as well as possible. A trained RBM model can provide an effective representation of input data based on the probability distribution learned from the training data.
The classical RBM model mainly describes input data or variables in vector form. However, data produced by modern technology more often has a more general structure. For example, a digital image is a 2-dimensional matrix that contains spatial information. For the classical RBM to be applied to data such as 2-D images, the conventional approach is to vectorize the 2-D data. Unfortunately, this process not only destroys the internal structure of the image, losing the interaction information hidden in that structure, but also increases the number of model parameters because of the full connectivity between the visible and hidden layers.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to overcome the defects of the prior art by greatly reducing the computational complexity of training and inference, preserving the spatial information in 2-D matrix data during training and testing, obtaining a good effect in reconstruction, and being applicable to more complex data structures.
The technical solution of the invention is as follows: the identification method of the matrix variable RBM comprises the following steps:
(1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4):

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y, $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer;
(2) a classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
Fewer model parameters need to be learned than in the classical RBM, so the computational complexity of training and inference is significantly reduced; the visible layer and the hidden layer are both in matrix form, so the spatial information in the 2-D matrix data is preserved during training and testing, and a good effect is obtained in reconstruction; the invention can easily be extended to tensor data of any order, and can therefore be applied to more complex data structures.
Drawings
Fig. 1 shows a classical RBM model.
Fig. 2 shows the RBM model of the present invention.
Fig. 3 shows the classification error rate when the number of iterations and the number of training samples are fixed.
Fig. 4 shows the classification error rates of different methods when the number of training samples is different.
Detailed Description
The identification method of the matrix variable RBM comprises the following steps:
(1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4):

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y, $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer;
(2) a classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
Fewer model parameters need to be learned than in the classical RBM, so the computational complexity of training and inference is significantly reduced; the visible layer and the hidden layer are both in matrix form, so the spatial information in the 2-D matrix data is preserved during training and testing, and a good effect is obtained in reconstruction; the invention can easily be extended to tensor data of any order, and can therefore be applied to more complex data structures.
Preferably, the step (1) comprises the following substeps:
(1.1) define a matrix-type training sample set $\{X_n\}_{n=1}^{N}$, the maximum number of iterations T, the learning rate, the weight regularization term, the number of training samples per group b, and the CD algorithm step count K′;
(1.2) randomly initialize U and V, and set B = C = 0 and the gradients ΔU = ΔV = ΔB = ΔC = 0;
(1.3) for iteration step t = 1 → T;
(1.4) randomly shuffle $\{X_n\}_{n=1}^{N}$ and divide it into M groups, each of size b;
(1.5) for group m = 1 → M;
(1.6) perform Gibbs sampling on all the data in the group under the current model parameters;
(1.7) for k = 0 → K′ − 1;
(1.8) sample $Y^{(k)}$ according to equation (9):
$p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$ (9);
(1.9) sample $X^{(k+1)}$ according to equation (8):
$p(X=1\mid Y;\Theta)=\sigma(U^TYV+B)$ (8);
(1.10) update the gradients according to equation (18);
(1.11) update each model parameter θ ∈ Θ according to the formula θ ← θ + Δθ;
(1.12) end.
Preferably, the maximum number of iterations T is 10000, the learning rate is 0.05, the weight regularization term is 0.01, the number of training samples in each group is 100, and the CD algorithm step count K′ is 1. A minimal code sketch of this training procedure is given below.
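The following Python sketch illustrates substeps (1.1)–(1.12) under stated assumptions: the function and variable names (train_mvrbm, sigmoid) and the 0.01-scaled Gaussian initialization are illustrative, not taken from the patent, and the gradient step follows the CD estimates reconstructed in equations (14)–(17) below rather than a verbatim transcription of equation (18).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mvrbm(data, K, L, T=10000, lr=0.05, weight_decay=0.01, b=100, cd_k=1, seed=0):
    """CD-K' training of the MVRBM. data: binary array of shape (N, I, J)."""
    rng = np.random.default_rng(seed)
    N, I, J = data.shape
    U = 0.01 * rng.standard_normal((K, I))   # weight matrix U
    V = 0.01 * rng.standard_normal((L, J))   # weight matrix V
    B = np.zeros((I, J))                     # visible-layer bias
    C = np.zeros((K, L))                     # hidden-layer bias
    for t in range(T):
        perm = rng.permutation(N)            # substep (1.4): shuffle and group
        for start in range(0, N, b):
            batch = data[perm[start:start + b]]
            dU = np.zeros_like(U); dV = np.zeros_like(V)
            dB = np.zeros_like(B); dC = np.zeros_like(C)
            for X0 in batch:
                Xk = X0
                for _ in range(cd_k):        # substeps (1.7)-(1.9): Gibbs chain
                    Yk = (rng.random((K, L)) < sigmoid(U @ Xk @ V.T + C)).astype(float)  # eq. (9)
                    Xk = (rng.random((I, J)) < sigmoid(U.T @ Yk @ V + B)).astype(float)  # eq. (8)
                H0 = sigmoid(U @ X0 @ V.T + C)   # data expectation of Y
                Hk = sigmoid(U @ Xk @ V.T + C)   # model (CD) expectation of Y
                dU += H0 @ V @ X0.T - Hk @ V @ Xk.T
                dV += H0.T @ U @ X0 - Hk.T @ U @ Xk
                dB += X0 - Xk
                dC += H0 - Hk
            # substep (1.11): parameter update, with weight regularization on U and V
            U += lr * (dU / len(batch) - weight_decay * U)
            V += lr * (dV / len(batch) - weight_decay * V)
            B += lr * dB / len(batch)
            C += lr * dC / len(batch)
    return U, V, B, C
```

For example, on binarized 28 × 28 MNIST images with a 25 × 25 hidden layer one might call train_mvrbm(data, K=25, L=25, T=2000); the defaults mirror the preferred values stated above.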
The invention will now be described in more detail.
1 Model definition
The classical RBM [8, 13] is a binary vector model, with both the input layer and the hidden layer in vector form. As shown in Fig. 1, the visible layer units (cubes) and the hidden layer units (cylinders) are fully connected.
The RBM energy function model is:
$E(\mathbf{x},\mathbf{y};\Theta)=-\mathbf{x}^TW\mathbf{y}-\mathbf{b}^T\mathbf{x}-\mathbf{c}^T\mathbf{y}$ (1)

where $\mathbf{x}\in\{0,1\}^{n_v}$ and $\mathbf{y}\in\{0,1\}^{n_h}$ are the binary visible layer and hidden layer units, $\mathbf{b}\in\mathbb{R}^{n_v}$ and $\mathbf{c}\in\mathbb{R}^{n_h}$ are the biases, and $W\in\mathbb{R}^{n_v\times n_h}$ represents the connection weights between the visible layer and the hidden layer in the neural network. Θ is the model parameter set {W, b, c}.
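For reference, a minimal sketch of evaluating the classical energy in equation (1); the function name and the toy sizes (4 visible, 3 hidden units) are illustrative assumptions:

```python
import numpy as np

def rbm_energy(x, y, W, b, c):
    """Classical RBM energy, eq. (1): E = -x^T W y - b^T x - c^T y."""
    return -(x @ W @ y) - (b @ x) - (c @ y)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 4).astype(float)                  # binary visible vector
y = rng.integers(0, 2, 3).astype(float)                  # binary hidden vector
W = rng.standard_normal((4, 3))                          # connection weights
b = rng.standard_normal(4); c = rng.standard_normal(3)   # biases
print(rbm_energy(x, y, W, b, c))
```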
For the purpose of introducing the MVRBM of the present invention, the following notation is defined. Let $X=(x_{ij})\in\{0,1\}^{I\times J}$ be the binary visible layer matrix variable and $Y=(y_{kl})\in\{0,1\}^{K\times L}$ be the binary hidden layer matrix variable, where the independent random variables $x_{ij}$ and $y_{kl}$ take values in {0, 1}. With a fourth-order tensor parameter $\mathcal{W}=(w_{ijkl})$ and bias matrices $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$, the following energy function is defined:

$E(X,Y)=-\sum_{i,j,k,l}x_{ij}w_{ijkl}y_{kl}-\mathrm{tr}(X^TB)-\mathrm{tr}(Y^TC)$ (2)

where Θ is the model parameter set {$\mathcal{W}$, B, C}. There are a total of I×J×K×L + I×J + K×L free parameters in Θ. Even when I, J, K and L are small, this is a large number, which requires many training samples and a long training time. To reduce the number of free parameters and save computational complexity, it is assumed that the connection weights between the hidden layer units and the visible layer units factorize as $w_{ijkl}=u_{ki}v_{lj}$. By defining two new matrices $U=(u_{ki})\in\mathbb{R}^{K\times I}$ and $V=(v_{lj})\in\mathbb{R}^{L\times J}$, the energy function (2) can be rewritten in the form

$E(X,Y)=-\mathrm{tr}(U^TYVX^T)-\mathrm{tr}(X^TB)-\mathrm{tr}(Y^TC)$ (3)

The matrices U and V together define the connection weights between the input matrix X and the hidden matrix Y, so the number of free parameters in Θ is reduced from that of equation (2) to I×K + L×J + I×J + K×L in equation (3).
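As a concrete illustration (using the MNIST-scale sizes that appear later in this description, I = J = 28 and K = L = 25, which are assumptions for this example):

$I\,J\,K\,L + I\,J + K\,L = 28\cdot28\cdot25\cdot25 + 784 + 625 = 491409$ free parameters for equation (2),

$I\,K + L\,J + I\,J + K\,L = 700 + 700 + 784 + 625 = 2809$ free parameters for equation (3),

a reduction of roughly 175 times.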
Based on equation (3), the following distribution is defined:

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where Θ denotes all model parameters U, V, B and C. The normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$ and $\mathcal{Y}$ denote the binary-valued spaces of X and Y.
The probability model in equation (4) is the matrix variable RBM (MVRBM). The model is shown in Fig. 2.
To facilitate the explanation of the MVRBM learning algorithm, the following lemma gives the conditional probability densities of the visible and hidden units.
Lemma 1. Let the MVRBM model be defined by equations (3) and (4). The conditional probability density of each visible layer unit is

$p(x_{ij}=1\mid Y;\Theta)=\sigma\big((U^TYV)_{ij}+b_{ij}\big)$ (6)

and the conditional probability density of each hidden layer unit is

$p(y_{kl}=1\mid X;\Theta)=\sigma\big((UXV^T)_{kl}+c_{kl}\big)$ (7)

where σ is the sigmoid function $\sigma(x)=1/(1+e^{-x})$.
In matrix notation, the two conditional probabilities can be written as:

$p(X=1\mid Y;\Theta)=\sigma(U^TYV+B)$ (8)

$p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$ (9)
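A minimal sketch of one Gibbs sweep using equations (8) and (9); the function name and the use of NumPy are assumptions, with U, V, B, C as defined above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(X, U, V, B, C, rng):
    """One Gibbs sweep: sample Y from p(Y|X) by eq. (9), then X' from p(X|Y) by eq. (8)."""
    Y = (rng.random(C.shape) < sigmoid(U @ X @ V.T + C)).astype(float)       # eq. (9)
    X_new = (rng.random(B.shape) < sigmoid(U.T @ Y @ V + B)).astype(float)   # eq. (8)
    return X_new, Y
```

Iterating gibbs_step K′ times from a training image yields the sample used by the CD-K′ approximation described next.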
Maximum likelihood function and CD algorithm for the MVRBM
For a given sample set $\mathcal{D}=\{X_1,\ldots,X_N\}$, the log-likelihood function under the joint distribution of equation (4) is defined as

$\ell(\Theta)=\sum_{n=1}^{N}\log p(X_n;\Theta)=\sum_{n=1}^{N}\log\sum_{Y}p(X_n,Y;\Theta)$

For any element θ in Θ, it can be proved that

$\frac{\partial\log p(X_n;\Theta)}{\partial\theta}=\mathbb{E}_{p(Y\mid X_n;\Theta)}\left[-\frac{\partial E(X_n,Y)}{\partial\theta}\right]-\mathbb{E}_{p(X,Y;\Theta)}\left[-\frac{\partial E(X,Y)}{\partial\theta}\right]$ (10)

The first term on the right-hand side of equation (10) is called the data expectation term, and the second term is called the model expectation term.
The most important problem in computing the gradient of the likelihood function is computing the model expectation term, because it requires summing over all states of the visible and hidden layers. However, the CD (Contrastive Divergence) algorithm allows an approximate computation through a short Markov chain. The main idea of the CD algorithm is to take a sample from the sample set as the initial value $X^{(0)}$ of a Gibbs chain; the CD-K′ algorithm then uses the sample obtained after K′ steps to approximate the model expectation term:

$\mathbb{E}_{p(X,Y;\Theta)}\left[-\frac{\partial E(X,Y)}{\partial\theta}\right]\approx\mathbb{E}_{p(Y\mid X^{(K')};\Theta)}\left[-\frac{\partial E(X^{(K')},Y)}{\partial\theta}\right]$ (11)
Substituting (11) into (10) gives an approximation based on the CD algorithm:

$\frac{\partial\log p(X_n;\Theta)}{\partial\theta}\approx\mathbb{E}_{p(Y\mid X_n;\Theta)}\left[-\frac{\partial E(X_n,Y)}{\partial\theta}\right]-\mathbb{E}_{p(Y\mid X_n^{(K')};\Theta)}\left[-\frac{\partial E(X_n^{(K')},Y)}{\partial\theta}\right]$ (12)

Of the four types of MVRBM parameters, only the calculation for U is given as an example; the other parameters are computed analogously. From equation (3) we obtain

$-\frac{\partial E(X,Y)}{\partial U}=YVX^T$

so that equation (12) becomes

$\Delta U=\mathbb{E}_{p(Y\mid X_n;\Theta)}\left[YVX_n^T\right]-\mathbb{E}_{p(Y\mid X_n^{(K')};\Theta)}\left[YV(X_n^{(K')})^T\right]$ (13)

For the binary variable Y, since $\mathbb{E}[Y\mid X;\Theta]=p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$, equation (13) gives

$\Delta U=\sigma(UX_nV^T+C)\,V\,X_n^T-\sigma(UX_n^{(K')}V^T+C)\,V\,(X_n^{(K')})^T$ (14)

Similarly, the other parameters are obtained as

$\Delta V=\sigma(UX_nV^T+C)^T\,U\,X_n-\sigma(UX_n^{(K')}V^T+C)^T\,U\,X_n^{(K')}$ (15)

$\Delta B=X_n-X_n^{(K')}$ (16)

$\Delta C=\sigma(UX_nV^T+C)-\sigma(UX_n^{(K')}V^T+C)$ (17)
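The four CD estimates (14)–(17), as reconstructed above, translate directly into code; a sketch for a single sample, with illustrative names (X0 is the data sample, Xk its K′-step reconstruction):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_gradients(X0, Xk, U, V, C):
    """CD estimates of eqs. (14)-(17); sigmoids replace binary Y by its expectation."""
    H0 = sigmoid(U @ X0 @ V.T + C)          # E[Y | X0], via eq. (9)
    Hk = sigmoid(U @ Xk @ V.T + C)          # E[Y | Xk]
    dU = H0 @ V @ X0.T - Hk @ V @ Xk.T      # eq. (14)
    dV = H0.T @ U @ X0 - Hk.T @ U @ Xk      # eq. (15)
    dB = X0 - Xk                            # eq. (16)
    dC = H0 - Hk                            # eq. (17)
    return dU, dV, dB, dC
```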
To verify the effectiveness of the proposed MVRBM algorithm, the MNIST database is selected for denoising, reconstruction and recognition experiments. These experiments demonstrate that the MVRBM has better feature extraction and reconstruction capabilities than the classical RBM.
The MNIST handwritten digit database contains 70,000 handwritten images, of which 60,000 are conventionally used as training samples and 10,000 as test samples. Each is a 28 × 28 pixel grayscale image, and the database can be downloaded from http://yann.lecun.com/exdb/mnist/.
1.1 Denoising and reconstruction
In the first experiment, the objective is to show that a trained MVRBM can be used to denoise data and to reconstruct it from a low-dimensional representation.
First, it is demonstrated that the MVRBM model can learn information from the data. For this purpose, 5,000 images of the digit 9 were randomly selected from the training samples, the hidden layer matrix variable was set to 15 × 15, and the parameter settings of Algorithm 1 were used. The training process was iterated 3,000 times. A denoising experiment was also performed, in which 10% salt-and-pepper noise was randomly added to test images of the digit 9. The denoising results were very good.
In another experiment, 20,000 training samples were used to train the MVRBM model, again for 3,000 iterations, but with the hidden layer size set to 25 × 25. For the binary MVRBM model, the trained model parameters U and V can be used as filters or as a feature extractor. On image data, the model learns filters that closely approximate Haar filters. The dimensionality reduction and reconstruction capabilities of the trained MVRBM model were also tested experimentally: some original samples are shown, followed by the images reconstructed from their low-dimensional representations. The average reconstruction error is 10.8488.
The experiments show that the model has good denoising, dimensionality reduction and reconstruction capabilities, and can effectively learn the features of the data.
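A sketch of how a trained model could be used in the denoising and reconstruction experiments described above (one pass through the hidden layer; the function names and the noise helper are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruct(X, U, V, B, C):
    """Map a (noisy) binary image to the hidden layer and back, eqs. (9) then (8)."""
    H = sigmoid(U @ X @ V.T + C)      # hidden expectations: the low-dimensional code
    return sigmoid(U.T @ H @ V + B)   # reconstructed image with entries in [0, 1]

def add_salt_pepper(X, ratio=0.10, seed=0):
    """Flip a `ratio` fraction of pixels, as in the 10% salt-and-pepper experiment."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < ratio
    return np.where(mask, 1.0 - X, X)
```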
2 Handwritten character recognition
In this experiment, the MVRBM is evaluated as a feature extractor. In fact, the hidden layer can be regarded as a new feature representation of the visible layer, and a classifier can be trained on these new features. As with most classifiers, the MVRBM hidden layer matrix features are flattened into vectors and then classified with a K-nearest-neighbor classifier (K = 1).
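A sketch of this classification stage under stated assumptions (function names are illustrative; a plain 1-NN on Euclidean residuals stands in for the K-NN method of step (2)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_features(images, U, V, C):
    """Vectorize the hidden-layer expectations of each image (the new features)."""
    return np.stack([sigmoid(U @ X @ V.T + C).ravel() for X in images])

def knn1_classify(test_feats, train_feats, train_labels):
    """Assign each test feature the label of the training feature with minimum residual."""
    preds = []
    for f in test_feats:
        residuals = np.linalg.norm(train_feats - f, axis=1)
        preds.append(train_labels[np.argmin(residuals)])
    return np.array(preds)
```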
First, the hidden layer is fixed at 25 × 25 and the number of iterations at T = 2000, and experiments are performed with different numbers of training samples, ranging from 100 to 20,000. Fig. 3(a) shows the classification error rate.
The experiments show that the more training samples are available, the better the recognition performance.
In another experiment, 10,000 training samples are randomly selected and the number of iterations is varied, ranging from 10 to 3,000. Fig. 3(b) shows the classification error rate at different iteration counts. It can be observed that the MVRBM tends to stabilize once the number of iterations reaches 70. As the number of iterations increases from 300 to 3,000, the classification error rate further decreases from 0.0571 to 0.0520.
Based on these experimental observations, the parameters N = 20,000 and T = 3,000 were chosen for the comparison experiments with other models. In the experiments it can be seen that the accuracy is even higher when the MVRBM is trained on 50,000 samples, with an error rate of only 0.0359. It was also found that an error rate of 0.1387 can be achieved with only 600 training samples.
Finally, a comparison is made with some of the currently most popular machine learning methods, including dropout-based Deep Neural Networks (DNN), Deep Belief Networks (DBN), Convolutional Neural Networks (CNN) and Sparse Auto-Encoders (SAE). Code for these models is available at https://github.com/rasmusbergpalm/DeepLearnToolbox, and the default parameter settings of each model were used. In Fig. 4(a) and (b), the MVRBM is trained with T = 3,000 iterations. The experiments report the results of the MVRBM and the other methods when the number of training samples is sufficient and insufficient (fewer than 10,000), respectively. Because the MVRBM has significantly fewer parameters than the other models, it is less likely to overfit.
Handwritten character recognition with the MVRBM is performed by the following algorithm:
1. a training stage:
Algorithm 1: CD-K′ algorithm for the MVRBM
Input: matrix-type training sample set $\{X_n\}_{n=1}^{N}$; maximum number of iterations T (default 10,000); learning rate (default 0.05); weight regularization term (default 0.01); number of training samples per group b (default 100); CD algorithm step count K′ (default 1)
Output: model parameters Θ = {U, V, B, C}
1. Initialization: randomly initialize U and V; set B = C = 0 and ΔU = ΔV = ΔB = ΔC = 0
2. for iteration step t = 1 → T
3. randomly shuffle $\{X_n\}_{n=1}^{N}$ and divide it into M groups of size b
4. for group m = 1 → M
5. perform Gibbs sampling on all the data in the group under the current model parameters
6. for k = 0 → K′ − 1
7. sample $Y^{(k)}$ according to equation (9)
8. sample $X^{(k+1)}$ according to equation (8)
9. end for
10. update the gradients according to equations (14) to (17)
11. update each model parameter θ ∈ Θ: θ ← θ + Δθ
12. end for
13. end for
2. A classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (3)

1. A method for identifying a matrix variable RBM, characterized in that it comprises the following steps:
(1) a training stage: sample training is carried out according to the matrix variable RBM of formula (4):

$p(X,Y;\Theta)=\frac{1}{Z(\Theta)}\exp\{-E(X,Y;\Theta)\}$ (4)

where $X\in\{0,1\}^{I\times J}$ is the binary visible layer matrix variable, $Y\in\{0,1\}^{K\times L}$ is the binary hidden layer matrix variable, Θ denotes all model parameters U, V, B and C, and the normalization constant Z(Θ) is defined as

$Z(\Theta)=\sum_{X\in\mathcal{X}}\sum_{Y\in\mathcal{Y}}\exp\{-E(X,Y;\Theta)\}$

where $\mathcal{X}$, $\mathcal{Y}$ denote the binary-valued spaces of X and Y, $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer;
(2) a classification stage: vectorizing the hidden layer matrix variables, training by applying a K-NN method, and classifying the test images according to the minimum residual error.
2. Method for the identification of a matrix variable RBM according to claim 1, characterised in that said step (1) comprises the sub-steps of:
(1.1) defining a matrix-type training sample set $\{X_n\}_{n=1}^{N}$, a maximum number of iterations T, a learning rate, a weight regularization term, a number of training samples per group b, and a CD algorithm step count K′;
(1.2) randomly initializing U and V, and setting B = C = 0 and the gradients ΔU = ΔV = ΔB = ΔC = 0;
(1.3) for iteration step t = 1 → T;
(1.4) randomly shuffling $\{X_n\}_{n=1}^{N}$ and dividing it into M groups, each of size b;
(1.5) for group m = 1 → M;
(1.6) performing Gibbs sampling on all the data in the group under the current model parameters;
(1.7) for k = 0 → K′ − 1;
(1.8) sampling $Y^{(k)}$ according to equation (9):
$p(Y=1\mid X;\Theta)=\sigma(UXV^T+C)$ (9);
(1.9) sampling $X^{(k+1)}$ according to equation (8):
$p(X=1\mid Y;\Theta)=\sigma(U^TYV+B)$ (8);
in equations (8) and (9), σ denotes the sigmoid function $\sigma(x)=1/(1+e^{-x})$;
(1.10) updating the gradients according to equation (18);
(1.11) updating each model parameter θ ∈ Θ according to the formula θ ← θ + Δθ;
(1.12) end.
3. The method for recognizing the matrix variable RBM as claimed in claim 2, wherein the maximum number of iterations T is 10000, the learning rate is 0.05, the weight regularization term is 0.01, the number of training samples in each group is 100, and the CD algorithm step count K′ is 1, wherein $U\in\mathbb{R}^{K\times I}$ and $V\in\mathbb{R}^{L\times J}$ are the weight matrices of the model, and $B\in\mathbb{R}^{I\times J}$ and $C\in\mathbb{R}^{K\times L}$ are the bias matrices corresponding to the visible layer and the hidden layer.
CN201510994184.2A 2015-12-25 2015-12-25 A matrix variable RBM recognition method Active CN105631476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510994184.2A CN105631476B (en) 2015-12-25 2015-12-25 A matrix variable RBM recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510994184.2A CN105631476B (en) 2015-12-25 2015-12-25 A matrix variable RBM recognition method

Publications (2)

Publication Number Publication Date
CN105631476A CN105631476A (en) 2016-06-01
CN105631476B true CN105631476B (en) 2019-06-21

Family

ID=56046388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510994184.2A Active CN105631476B (en) 2015-12-25 2015-12-25 A matrix variable RBM recognition method

Country Status (1)

Country Link
CN (1) CN105631476B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446117A (en) * 2016-09-18 2017-02-22 西安电子科技大学 Text analysis method based on poisson-gamma belief network
CN106886798A (en) * 2017-03-10 2017-06-23 北京工业大学 The image-recognizing method of the limited Boltzmann machine of the Gaussian Profile based on matrix variables


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1393196A4 (en) * 2001-05-07 2007-02-28 Health Discovery Corp Kernels and methods for selecting kernels for use in learning machines

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814160A (en) * 2010-03-08 2010-08-25 清华大学 RBF neural network modeling method based on feature clustering
CN104361393A (en) * 2014-09-06 2015-02-18 华北电力大学 Method for using improved neural network model based on particle swarm optimization for data prediction
CN104880945A (en) * 2015-03-31 2015-09-02 成都市优艾维机器人科技有限公司 Self-adaptive inverse control method for unmanned rotorcraft based on neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Visual Feature Representation and Learning for Image Classification and Recognition; Yang Zhao; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2014-11-15; I138-21, pp. 6-7 and 66-68

Also Published As

Publication number Publication date
CN105631476A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN106991372B (en) Dynamic gesture recognition method based on mixed deep learning model
CN109754078B (en) Method for optimizing a neural network
Li et al. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions
CN107506712B (en) Human behavior identification method based on 3D deep convolutional network
CN107526785B (en) Text classification method and device
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
Lee et al. Wasserstein introspective neural networks
US20190228268A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
Mao et al. Deep residual pooling network for texture recognition
CN108121975B (en) Face recognition method combining original data and generated data
Oktar et al. A review of sparsity-based clustering methods
Kingma et al. Regularized estimation of image statistics by score matching
CN110969086B (en) Handwritten image recognition method based on multi-scale CNN (CNN) features and quantum flora optimization KELM
CN109389166A (en) The depth migration insertion cluster machine learning method saved based on partial structurtes
Chu et al. Stacked Similarity-Aware Autoencoders.
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
CN111371611A (en) Weighted network community discovery method and device based on deep learning
CN105631476B (en) A kind of recognition methods of matrix variables RBM
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
US20230076290A1 (en) Rounding mechanisms for post-training quantization
CN109978080B (en) Image identification method based on discrimination matrix variable limited Boltzmann machine
Yang et al. Image noise level estimation for rice noise based on extended ELM neural network training algorithm
Newatia et al. Convolutional neural network for ASR
Vepuri Improving facial emotion recognition with image processing and deep learning
Fonseka et al. Data augmentation to improve the performance of a convolutional neural network on image classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant