CN107798349B

CN107798349B - Transfer learning method based on depth sparse self-coding machine

Info

Publication number: CN107798349B
Application number: CN201711069171.XA
Authority: CN
Inventors: 胡学钢; 张玉红; 朱毅; 李培培; 周鹏
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2017-11-03
Filing date: 2017-11-03
Publication date: 2020-07-14
Anticipated expiration: 2037-11-03
Also published as: CN107798349A

Abstract

The invention discloses a transfer learning method based on a deep sparse self-coding machine, comprising the steps of: (1) vectorizing and preprocessing the data set; (2) designing and implementing the model; (3) performing semi-supervised learning on the features extracted by the Stacked RICA algorithm; (4) after feature extraction is finished, training a classifier on the training set using a logistic regression (LR) model; (5) performing classification prediction on the test set using the classifier trained on the training set; and (6) completing the classification of the test set to obtain the final transfer learning result.

Description

Transfer learning method based on depth sparse self-coding machine

Technical Field

The invention relates to the field of feature extraction and transfer learning methods, in particular to a transfer learning method based on a deep sparse self-coding machine.

Background

Traditional machine learning has achieved significant success in many areas. However, many machine learning algorithms assume that the training set and the test set are independent and identically distributed, so when the data distribution changes, most of them must be retrained on a large amount of newly collected training data. In real-world applications the environment changes constantly, and re-collecting data and retraining the model for every new scenario the learning system encounters is costly and impractical. It is desirable for the learning system to adapt automatically to changes in the environment with little retraining data and retraining time. Under these conditions, knowledge obtained in a previous scenario that can be applied to the new scenario helps speed up learning and reduces the cost of collecting new training data; this is the goal of transfer learning. Transfer learning emphasizes the transfer of knowledge across domains, tasks, and distributions that are similar but not identical. For example, learning to recognize apples may help in learning to recognize pears, and learning to play the electronic organ may help in learning the piano. The study of transfer learning is motivated by the fact that people routinely apply existing knowledge to solve new problems more quickly.

In recent years, deep learning has been used to extract features from images, text, audio, and other data. In human perception, the information processing of the visual system is hierarchical: edge features are extracted in the low-level V1 region, shapes or parts of a target in the V2 region, and then the whole target, the behavior of the target, and so on. In other words, high-level features are combinations of low-level features, and features become more abstract and better able to express semantics or intentions as they go from low level to high level.

A sparse self-coding machine (sparse autoencoder) is a method for extracting data features. Its advantage is that a set of linearly independent over-complete bases can be extracted to reconstruct the samples. A general model for extracting feature basis vectors can only ensure that the basis vectors are linearly uncorrelated, which is insufficient in some applications. For example, suppose a recording contains the voices of several people, which are independent of each other, and we want to separate the audio of each person; a model that only decorrelates fails in this case. The invention therefore uses the RICA (Reconstruction Independent Component Analysis) algorithm, whose goal is to learn a set of mutually independent over-complete bases.

The deep sparse self-coding machine follows the idea of deep learning: sparse self-coding machines are stacked as the layers of a model, that is, the output of the sparse self-coding machine at one layer is used as the input of the next layer, forming a multi-layer deep learning structure that extracts more useful features. Semi-supervised learning is then performed on the extracted features to improve the precision and accuracy of the transfer learning.

In research on feature extraction and transfer learning, existing methods are all based on self-coding (autoencoder) models, and very little work uses sparse coding models. Sparse coding is an effective means of dimensionality reduction for images, text, and similar data, but its application to domain adaptation raises several common problems: (a) the feature basis vectors are not linearly independent; (b) the use of labels in the source domain; (c) the bias term of the objective function after stacking. If these problems are not solved well, the accuracy of feature extraction and transfer learning inevitably suffers. The invention provides a solution to these problems.

Disclosure of Invention

The invention aims to provide a transfer learning method based on a deep sparse self-coding machine, so as to solve the problems of the prior art in feature extraction and transfer learning.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

A transfer learning method based on a deep sparse self-coding machine, characterized in that the method comprises the following steps in sequence:

(1) Whitening preprocessing is performed on all images in the image database, as follows:

(1.1) Denote the input data set as $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ and calculate the covariance matrix $\Sigma$ of x:

$$\Sigma = \frac{1}{n}\sum_{i=1}^{n} x^{(i)} \left(x^{(i)}\right)^{T}$$

Then calculate the eigenvectors of the covariance matrix and arrange them as the columns of the matrix U, as shown in the following equation:

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$$

In the matrix U, $u_1$ is the principal eigenvector, corresponding to the largest eigenvalue, $u_2$ is the second eigenvector, and so on; $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote the eigenvalues corresponding to the eigenvectors in U;

(1.2) Represent the input data in the basis given by the calculated matrix U, as shown in the following equation:

$$x_{rot} = U^{T} x$$

where the subscript rot stands for rotation, meaning that $x_{rot}$ is the result of rotating the original data. To give each input feature unit variance, $1/\sqrt{\lambda_i}$ is used as the scaling factor for each feature $x_{rot,i}$, and the resulting PCA-whitened data is given by the following formula:

$$x_{PCAwhite,i} = \frac{x_{rot,i}}{\sqrt{\lambda_i}}$$

(1.3) Let R be any orthogonal matrix, i.e. $RR^{T} = R^{T}R = I$; then $R\,x_{PCAwhite}$ still has unit covariance. To make the whitened input data as close as possible to the original input data among all possible R, let R = U, which gives formula (1):

$$x_{ZCAwhite} = U x_{PCAwhite} \quad (1)$$

$x_{ZCAwhite}$ is the data obtained from the original input data after ZCA whitening;

(2) A deep sparse self-coding machine model is constructed to extract high-level abstract features of the images, as follows:

(2.1) Construct a sparse self-coding machine model, comprising the following steps:

(2.1.1) The sparse self-coding model uses the Reconstruction Independent Component Analysis (RICA) algorithm. The $x_{ZCAwhite}$ obtained with formula (1) is the input data of the RICA algorithm and is substituted into the cost function formula (2):

$$\min_{W}\; \lambda \lVert Wx \rVert_1 + \frac{1}{2}\lVert W^{T}Wx - x \rVert_2^2 \quad (2)$$

In the cost function formula (2), x is the input data, i.e. $x_{ZCAwhite}$, W is the weighting matrix, and λ is the weight attenuation coefficient;

(2.1.2) For the input x, take the partial derivative of the cost function formula (2), wherein, for the first term of formula (2), a surrogate expression is used as the function to be differentiated; the resulting partial derivative is given by formula (3);

(2.1.3) Iteratively calculate the weighting matrix W using the L-BFGS algorithm to obtain the trained sparse self-coding model.

(2.2) Construct the deep sparse self-coding machine model:

Substitute the weighting matrix W obtained in step (2.1) into the cost function formula (2) and record the resulting output, which is the output data obtained after training of the single-layer RICA model is finished. Using this output as input data, repeat step (2.1) to obtain $W^{(i)}$, the weighting matrix obtained after training the stacked sparse self-coding machine, where i is the number of times step (2.1) has been iterated;

(2.3) Extract features with the trained deep sparse self-coding machine model:

Apply square-root pooling to the model and substitute the weighting matrices $W^{(i)}$ obtained in step (2.1) into formula (4) for convolutional feature extraction. The two quantities defined for formula (4) are the input of the l-th layer of the convolutional network and the error term of the (l+1)-th layer for the k-th feature of the convolutional network. The output of formula (4) is denoted $x_{fea}$, the abstract features extracted from the original input data;

(3) Features are optimized by semi-supervised learning:

Semi-supervised learning is performed with the $x_{fea}$ obtained in step (2) as input, giving formula (5), to which are added the KL distance between the source domain distribution and the target domain distribution and a multi-class regression bias term constructed from the source domain class labels. The output obtained after semi-supervised learning is recorded; $W_{SSL}$ denotes the weight matrix in semi-supervised learning, $\xi^{(s)}$ denotes the output of the hidden layer in the source domain, and $\xi^{(t)}$ denotes the output of the hidden layer in the target domain. The terms of formula (5) represent:

the reconstruction error from the original data to the data re-represented after feature extraction;

the KL distance between the source domain distribution and the target domain distribution;

the multi-class regression bias term constructed from the source domain class labels;

the constraint term on the feature parameter matrix $W_{SSL}$;

(4) A classifier is trained and the test image data set is classified, as follows:

(4.1) Train the LR classifier with the training image data set. The LR classifier is defined by formula (6), in which σ is the sigmoid function. The output of step (3) for the training image data set and the known labels y of the training image data set are substituted into formula (6) to train the classifier;

(4.2) Classify the test image data set with the trained classifier: the output of step (3) for the test image data set is substituted into the LR classifier trained with formula (6), and the classification result $T_{test}$ of the test image data set is obtained as shown in formula (7):

$$T_{test} = \arg\max P(x) \quad (7)$$

The invention provides a transfer learning method based on a deep sparse self-coding machine. Starting from deep learning, the method applies a sparse self-coding machine model based on the RICA algorithm to feature extraction from the data set. Following the multi-layer stacking idea of deep learning, a deep sparse self-coding machine is constructed with the Stacked RICA algorithm and trained to extract linearly independent over-complete feature basis vectors. On the basis of these feature basis vectors, a semi-supervised learning method adds the source domain class labels and the multi-class regression bias term, so that the extracted features are further optimized. Finally, a classifier is trained on the extracted features with a support vector machine model to perform classification prediction in the target domain, which achieves the goal of transfer learning. The method can extract more useful features from the data set and improves the classification accuracy in the target domain and therefore the accuracy and precision of the transfer learning.

The invention addresses the practical problem of feature extraction and transfer learning. The results can be applied directly to image classification, text classification, sentiment transfer, and similar applications, and can be extended to fields such as audio, web pages, and video, so the invention has significant application value and the potential for substantial social and economic benefits once put into use.

Compared with the prior art, the invention has the beneficial effects that:

1. The invention studies the feature representation of the extracted data at the level of the independent component analysis model, which improves the robustness of the resulting features compared with traditional feature extraction algorithms (sparse coding or self-coding).

2. Using the hierarchical structure of deep learning and based on an analysis of the data set, the invention proposes the Stacked RICA algorithm, which takes the source domain labels and a multi-class regression objective function into account in the multi-layer structure and applies the source domain labels to the optimization of the objective function. More useful features can thereby be extracted from the data set, which improves the classification accuracy in the target domain and the accuracy of the transfer learning.

3. The invention can be applied to many fields such as images, text, audio, and video, and has significant application value. Moreover, the results of the research based on Stacked RICA can also be applied to many pattern classification fields related to transfer learning, such as image recognition, sentiment classification, topic classification, speech recognition, and robot systems.

Drawings

FIG. 1 is a flowchart of the feature extraction and transfer learning scheme of the present invention.

FIG. 2 is a schematic diagram of the hierarchy of the RICA model.

FIG. 3 is a schematic diagram of an analysis of a Stacked RICA model according to the present invention.

Detailed Description

FIG. 1 is a flow chart of the method of the present invention; the steps shown in FIG. 1 are implemented as follows:

(1) To train better features, the training data set and the test data set are concatenated and vectorized to obtain a vectorized data set.
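The concatenation and vectorization in step (1) can be sketched as follows. This is a minimal illustration, assuming the images are stored as NumPy arrays of shape (count, height, width); the function name `vectorize_and_splice` and the column-per-sample layout are illustrative choices that the source does not specify.

```python
import numpy as np

def vectorize_and_splice(train_images, test_images):
    """Flatten every image to a vector and join both sets column-wise.

    Returns the (d, n_train + n_test) data matrix and n_train, so the two
    sets can be separated again after feature extraction.
    """
    train = np.asarray(train_images, dtype=float)
    test = np.asarray(test_images, dtype=float)
    train_vecs = train.reshape(train.shape[0], -1).T  # (d, n_train)
    test_vecs = test.reshape(test.shape[0], -1).T     # (d, n_test)
    return np.concatenate([train_vecs, test_vecs], axis=1), train_vecs.shape[1]

# Toy data (assumption): five 4x4 training images and three 4x4 test images.
train_images = np.zeros((5, 4, 4))
test_images = np.ones((3, 4, 4))
X, n_train = vectorize_and_splice(train_images, test_images)
```

Keeping `n_train` alongside the joined matrix lets later steps split the extracted features back into source and target parts.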

(2) For the vectorized text data set, a Stacked Reconstruction Independent Component Analysis (Stacked RICA) model is used for feature extraction. The specific process is as follows:

1) whitening data with the ZCA method:

ZCA whitening is a data preprocessing method that maps the data from x to $x_{ZCAwhite}$. It has also been shown to be a rough model of how the biological eye (the retina) processes images. For example, when the eye perceives an image, most adjacent "pixels" are perceived as similar values, since adjacent parts of an image are highly correlated in brightness. It would therefore be very inefficient for the eye to transmit each pixel value separately (via the optic nerve) to the brain. Instead, the retina performs a decorrelation operation similar to that in ZCA, obtaining a less redundant representation of the input image, and transmits that to the brain. In feature extraction, the input is likewise redundant for training purposes because of the strong correlation between adjacent instances or expressions in the data set. The purpose of whitening is to reduce this redundancy; after whitening, the input of the learning algorithm has the following properties: (i) the correlation between features is low; (ii) all features have the same variance. The result of ZCA whitening can be expressed as:

$$x_{ZCAwhite} = U x_{PCAwhite}$$
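The whitening procedure can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: the mean subtraction and the small regularizer `eps` added to the eigenvalues are assumptions commonly made in practice, and the function name `zca_whiten` is hypothetical.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten X of shape (d, n), where each column is one sample."""
    X = X - X.mean(axis=1, keepdims=True)      # assumption: center each feature
    sigma = X @ X.T / X.shape[1]               # covariance matrix
    # eigh is for symmetric matrices; it returns eigenvalues in ascending
    # order, which does not affect the ZCA result.
    eigvals, U = np.linalg.eigh(sigma)
    x_rot = U.T @ X                            # rotate into the eigenbasis
    x_pca_white = x_rot / np.sqrt(eigvals + eps)[:, None]  # unit variance
    return U @ x_pca_white                     # rotate back: ZCA whitening

# Toy data (assumption): correlated 3-dimensional samples.
rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
X_white = zca_whiten(A @ rng.standard_normal((3, 500)))
cov_white = X_white @ X_white.T / X_white.shape[1]
```

After whitening, `cov_white` is close to the identity matrix, which corresponds to properties (i) and (ii): decorrelated features with equal variance.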

2) feature extraction based on Stacked RICA

The method comprises the following specific steps:

① Feature extraction with single-layer RICA

A Reconstruction Independent Component Analysis (RICA) algorithm is designed to extract features according to the idea of FIG. 2. Given an input x, the invention aims to derive a linearly independent set of bases (denoted by W), and the objective function can be expressed as:

$$J(W) = \lVert Wx \rVert_1$$

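In this expression, Wx is the feature representation of the input x. In RICA, to ensure that mutually linearly independent over-complete bases are obtained, the invention solves the following objective function:

$$\min_{W}\; \lambda \lVert Wx \rVert_1 + \frac{1}{2}\lVert W^{T}Wx - x \rVert_2^2$$

where λ is the weight attenuation coefficient, W is the weight matrix, and x is the input data. The objective function is solved in two steps.

The first step requires the derivative of the reconstruction term, i.e.

$$\nabla_W \lVert W^{T}Wx - x \rVert_2^2$$

As shown in FIG. 2, the reconstruction term can be viewed as a four-layer network whose weights and activation functions are as follows: layer 1 has weight W and activation $f(z_i) = z_i$; layer 2 has weight $W^{T}$ and activation $f(z_i) = z_i$; layer 3 has weight I and activation $f(z_i) = z_i - x_i$; layer 4 has no weight and activation $f(z_i) = z_i^2$.

Let $F(x) = \lVert W^{T}Wx - x \rVert_2^2$. To have $J(z^{(4)}) = F(x)$, set $J(z^{(4)}) = \sum_k J(z_k^{(4)})$.

Once F is regarded as the output of this model, the problem is converted into solving for $\nabla_W F$.

Although W appears twice in the model, it can be shown that when W appears multiple times in a neural network, the partial derivative with respect to W is the sum of the partial derivatives with respect to each instance of W in the network, as follows:

$$\nabla_W F = \left.\nabla_{W} F\right|_{\text{layer 1}} + \left(\left.\nabla_{W^{T}} F\right|_{\text{layer 2}}\right)^{T}$$

Accordingly, the invention first derives the partial derivative for each instance of W.

With respect to $W^{T}$:

$$\nabla_{W^{T}} F = 2\,(W^{T}Wx - x)\,(Wx)^{T}$$

With respect to W:

$$\nabla_{W} F = W\,\bigl(2\,(W^{T}Wx - x)\bigr)\,x^{T}$$

The final partial derivative with respect to W is:

$$\nabla_W F = W\,\bigl(2\,(W^{T}Wx - x)\bigr)\,x^{T} + 2\,(Wx)\,(W^{T}Wx - x)^{T}$$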

The second step is to iterate with the L-BFGS method on the cost function of the invention.

The W finally obtained after multiple iterations is a set of linearly independent over-complete bases for the original input x. From this set of bases we obtain a more useful feature representation Wx of the original input data x.
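The cost and gradient that an L-BFGS routine would consume can be sketched as below. The sketch assumes the standard RICA cost $\lambda\lVert Wx\rVert_1 + \tfrac{1}{2}\lVert W^{T}Wx - x\rVert_2^2$ averaged over samples, and replaces the L1 norm with the smooth surrogate $\sqrt{(Wx)^2 + \epsilon}$ so that it is differentiable; the surrogate, the values of `lam` and `eps`, and all function names are assumptions rather than details taken from the source. A finite-difference check confirms that the analytic gradient matches the cost.

```python
import numpy as np

def rica_cost_grad(W, X, lam=0.1, eps=1e-2):
    """Smoothed RICA cost and its gradient.

    W: (k, d) weight matrix, X: (d, n) whitened data, one sample per column.
    """
    n = X.shape[1]
    H = W @ X                     # features Wx
    R = W.T @ H - X               # reconstruction residual W^T W x - x
    smooth = np.sqrt(H ** 2 + eps)            # smooth stand-in for |Wx|
    cost = (lam * smooth.sum() + 0.5 * (R ** 2).sum()) / n
    # W appears twice (as W and as W^T), so the reconstruction gradient is
    # the sum of the gradients for both instances.
    grad = (lam * (H / smooth) @ X.T + W @ R @ X.T + H @ R.T) / n
    return cost, grad

def numerical_grad(W, X, h=1e-6):
    """Central finite differences, used only to verify the analytic gradient."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += h
        Wm[idx] -= h
        G[idx] = (rica_cost_grad(Wp, X)[0] - rica_cost_grad(Wm, X)[0]) / (2 * h)
    return G

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((4, 10))
W_demo = 0.1 * rng.standard_normal((6, 4))   # over-complete: 6 features > 4 inputs
_, analytic = rica_cost_grad(W_demo, X_demo)
max_err = np.abs(analytic - numerical_grad(W_demo, X_demo)).max()
```

A function with this (cost, gradient) signature is what a general-purpose L-BFGS optimizer typically expects after flattening W into a vector.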

② Stacked RICA computation of the feature representation

FIG. 3 is a diagram of the Stacked RICA model of the present invention; the model consists of an input layer, two hidden layers, and an output layer. The Stacked RICA model follows the idea of deep learning and stacks RICA structures: the stronger feature representation z obtained after a single-layer RICA has been trained is used as the input of the next-layer RICA algorithm, and the parameters of each layer are then optimized iteratively against the objective function. The feature representation of the original input data is finally obtained through this multi-layer stacking.
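The layer-by-layer stacking can be sketched as follows. To keep the example dependency-free, plain gradient descent with a backtracking line search is swapped in for L-BFGS; this substitution, the smoothed L1 term, the hyperparameters, and all function names are assumptions made for illustration only.

```python
import numpy as np

def rica_cost_grad(W, X, lam=0.1, eps=1e-2):
    """Smoothed RICA cost and gradient (same assumed form for every layer)."""
    n = X.shape[1]
    H = W @ X
    R = W.T @ H - X
    smooth = np.sqrt(H ** 2 + eps)
    cost = (lam * smooth.sum() + 0.5 * (R ** 2).sum()) / n
    grad = (lam * (H / smooth) @ X.T + W @ R @ X.T + H @ R.T) / n
    return cost, grad

def train_rica_layer(X, n_features, n_iter=50, seed=0):
    """Train one layer; backtracking gradient descent stands in for L-BFGS."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_features, X.shape[0]))
    cost, grad = rica_cost_grad(W, X)
    history = [cost]
    for _ in range(n_iter):
        step = 1.0
        while step > 1e-10:
            W_new = W - step * grad
            new_cost, new_grad = rica_cost_grad(W_new, X)
            # Armijo condition: accept only steps that decrease the cost.
            if new_cost <= cost - 1e-4 * step * (grad ** 2).sum():
                W, cost, grad = W_new, new_cost, new_grad
                break
            step *= 0.5
        history.append(cost)
    return W, history

def train_stacked_rica(X, layer_sizes):
    """Greedy stacking: the output z = Wx of one layer feeds the next."""
    weights, costs, z = [], [], X
    for size in layer_sizes:
        W, history = train_rica_layer(z, size)
        weights.append(W)
        costs.append(history)
        z = W @ z
    return weights, z, costs

rng = np.random.default_rng(1)
X_demo = rng.standard_normal((5, 40))
weights, features, costs = train_stacked_rica(X_demo, layer_sizes=[8, 6])
```

Here two RICA layers are stacked, matching the two hidden layers described for FIG. 3; `features` holds the final representation of the 40 input samples.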

(3) After feature extraction by Stacked RICA is complete, the resulting feature representation is used in place of the original input data x. Semi-supervised learning is performed on this feature representation with additional bias terms, namely the KL distance between the source domain distribution and the target domain distribution and a multi-class regression bias term constructed from the source domain class labels, so that the label information of the source domain is applied to the optimization of the feature representation.

The objective function consists of four terms, which represent:

the reconstruction error from the original data to the data re-represented after feature extraction;

the KL distance between the source domain distribution and the target domain distribution;

the multi-class regression bias term constructed from the source domain class labels;

the constraint term on the feature parameter matrix W.
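One way to compute a KL distance between the source and target hidden-layer outputs is sketched below. The source does not reproduce the exact expression, so this is an assumption: each domain is summarized by its mean hidden activation, normalized to a probability vector, and the two directed KL divergences are added to obtain a symmetric distance. The function name `symmetric_kl` and the requirement that activations be nonnegative are likewise assumptions.

```python
import numpy as np

def symmetric_kl(xi_s, xi_t, eps=1e-12):
    """Symmetric KL distance between source and target hidden outputs.

    xi_s: (k, n_s) and xi_t: (k, n_t), nonnegative hidden-layer outputs.
    """
    p_s = xi_s.mean(axis=1)
    p_t = xi_t.mean(axis=1)
    p_s = p_s / p_s.sum()          # normalize to probability vectors
    p_t = p_t / p_t.sum()
    kl_st = np.sum(p_s * np.log((p_s + eps) / (p_t + eps)))
    kl_ts = np.sum(p_t * np.log((p_t + eps) / (p_s + eps)))
    return kl_st + kl_ts

# Toy hidden outputs (assumption): the target domain has a weaker first unit.
rng = np.random.default_rng(0)
xi_source = rng.uniform(0.1, 0.9, size=(4, 20))
xi_target = rng.uniform(0.1, 0.9, size=(4, 30))
xi_target[0] *= 0.2
d_same = symmetric_kl(xi_source, xi_source)
d_diff = symmetric_kl(xi_source, xi_target)
```

The distance is zero for identical distributions and grows as the two domains diverge, so adding it to the objective pushes the learned features of both domains toward each other.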

(4) After all feature extraction and selection is complete, the resulting feature representation of the source domain is used to train a classifier in the source domain. The classifier may be trained with a support vector machine (SVM), a logistic regression (LR) model, or another classifier module.

(5) Classification prediction is carried out in the target domain using the classifier trained in the source domain, so that the source domain classifier is applied to the target domain.

(6) The final transfer learning result is obtained.
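Steps (4) to (6) can be sketched with a small logistic regression classifier trained on source-domain features and applied to target-domain features. The one-vs-rest sigmoid formulation, the gradient-descent training loop, the learning rate, and the toy data are all assumptions made for illustration; the final prediction follows the form $T_{test} = \arg\max P(x)$.

```python
import numpy as np

def sigmoid(z):
    # Clipping avoids overflow in exp for extreme inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_lr(X, y, n_classes, lr=0.1, n_iter=1000):
    """One-vs-rest logistic regression. X: (n, d) features, y: (n,) labels."""
    Xb = add_bias(X)
    Y = np.eye(n_classes)[y]                  # one-hot targets
    W = np.zeros((Xb.shape[1], n_classes))
    for _ in range(n_iter):
        P = sigmoid(Xb @ W)
        W -= lr * Xb.T @ (P - Y) / Xb.shape[0]
    return W

def predict_lr(W, X):
    """Pick the class with the largest probability, i.e. argmax P(x)."""
    return np.argmax(sigmoid(add_bias(X) @ W), axis=1)

# Toy features (assumption): three classes; the target domain is shifted.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
y_source = np.repeat(np.arange(3), 30)
X_source = centers[y_source] + 0.3 * rng.standard_normal((90, 2))
y_target = np.repeat(np.arange(3), 20)
X_target = centers[y_target] + 0.2 + 0.3 * rng.standard_normal((60, 2))

W_lr = train_lr(X_source, y_source, n_classes=3)
T_test = predict_lr(W_lr, X_target)
accuracy = np.mean(T_test == y_target)
```

In the method described above, `X_source` and `X_target` would be the features produced by the semi-supervised step rather than raw toy points.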

Claims

1. A transfer learning method based on a deep sparse self-coding machine, characterized in that the method comprises the following steps in sequence:

Then calculate the eigenvectors of the covariance matrix and arrange them as the columns of the matrix U, as shown in the following formula:

$$x_{ZCAwhite} = U x_{PCAwhite} \quad (1)$$

(2.1.1) The sparse self-coding model uses the Reconstruction Independent Component Analysis (RICA) algorithm; the $x_{ZCAwhite}$ obtained with formula (1) is the input data of the RICA algorithm and is substituted into the cost function formula (2);

In the cost function formula (2), W is the weighting matrix;

(2.1.2) Take the partial derivative of the cost function formula (2) for $x_{ZCAwhite}$, wherein, for the first term of formula (2), a surrogate expression is used as the function to be differentiated;

(2.1.3) Iteratively calculate the weighting matrix W using the L-BFGS algorithm to obtain the trained sparse self-coding model;

(2.2) constructing a deep sparse self-coding machine model:

Repeat step (2.1) with this output as input data to obtain $W^{(l)}$, the weighting matrix obtained after training the stacked sparse self-coding machine, where l is the number of times step (2.1) has been iterated;

Apply square-root pooling to the model and substitute the weighting matrices $W^{(l)}$ obtained in step (2.1) into formula (4) for convolutional feature extraction. The two quantities defined for formula (4) are the input of the l-th layer of the convolutional network and the error term of the (l+1)-th layer for the k-th feature of the convolutional network. The output of formula (4) is denoted $x_{fea}$, the abstract features extracted from the original input data;

(3) Features are optimized by semi-supervised learning:

The terms of formula (5) represent:

the reconstruction error from the input data of step (3) to the data re-represented after feature extraction;

the KL distance between the source domain distribution and the target domain distribution and between the target domain distribution and the source domain distribution;

the constraint term on the weight matrix $W_{SSL}$;

In formula (6), σ(z′) is the sigmoid function of the input z′. The output of step (3) for the training data set and the known labels y of the training image data set are substituted into formula (6) to train the classifier;

The output of step (3) for the test data set is substituted into the LR classifier trained with formula (6) to obtain the classification result $T_{test}$ of the test image data set, as shown in formula (7):