CN110516718B

CN110516718B - Zero sample learning method based on deep embedding space

Info

Publication number: CN110516718B
Application number: CN201910740748.8A
Authority: CN
Inventors: 魏巍; 张磊; 聂江涛; 王聪; 张艳宁
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2019-08-12
Filing date: 2019-08-12
Publication date: 2023-03-24
Anticipated expiration: 2039-08-12
Also published as: CN110516718A

Abstract

The invention discloses a zero sample learning method based on a deep embedding space, which is used for solving the technical problem that the existing zero sample learning method is poor in generalization capability. The technical scheme includes that an effective deep intermediary embedding space is learned through a deep learning technology, semantic category description and image information description of known categories and unknown categories are mapped into the deep intermediary embedding space through a trained deep network, and finally, features in the embedding space are classified through corresponding classifiers to obtain corresponding prediction labels. In the prediction process, a mapping network self-learning algorithm is adopted, the generalization capability is effectively improved, and the classification accuracy of unknown class samples is improved.

Description

Zero sample learning method based on deep embedding space

Technical Field

The invention relates to a zero sample learning method, in particular to a zero sample learning method based on a deep embedding space.

Background

In recent years, deep neural networks have achieved significant success in many computer vision applications, such as target recognition, detection, and the like. The key point of success is that based on a large number of learning examples with marks, a supervised learning method is utilized, and the extremely strong nonlinear fitting capacity of a deep neural network is fully exerted to mine the complex structural relationship existing between task input and task output. However, in practical applications, because the artificial labeling of the learning samples requires high cost, especially in relatively complex tasks such as semantic segmentation, etc., it is often difficult to obtain sufficient labeled learning samples, and even in many applications, any labeled learning samples cannot be obtained (for example, for newly emerging substances, or unknown environments, etc.), thereby seriously affecting the generalization ability of the deep neural network.

The zero sample learning-based method proposed in the documents "Y.Anandani and S.Biswas.Preserving semiconducting relationships for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7603-7612, 2018" can effectively solve the above problems. Different from the traditional supervised learning, in zero sample learning, each class is associated with a specific semantic description, and the learning aims to realize accurate classification and identification of samples of unknown classes (without any labeled training samples) by mining the relation between the samples in the classes and the corresponding semantic descriptions. The key of zero sample learning lies in learning an effective embedding space, which can accurately establish the structural relationship between the category and the corresponding semantic description, and generalize to the unknown category and the associated semantic description. However, the existing zero sample learning model cannot fully consider the structural characteristics embedded in the space, and thus is generally affected by the problems of hubness and bias towards seconds classes, and the generalization capability is limited.

Disclosure of Invention

In order to overcome the defect that the existing zero sample learning method is poor in generalization capability, the invention provides a zero sample learning method based on a deep embedding space. The method learns an effective deep intermediary embedding space through a deep learning technology, maps semantic class descriptions and image information descriptions of known classes and unknown classes into the deep intermediary embedding space through a trained deep network, and classifies features in the embedding space through a corresponding classifier to obtain a corresponding prediction label. In the prediction process, a mapping network self-learning algorithm is adopted, the generalization capability is effectively improved, and the classification accuracy of unknown class samples is improved.

The technical scheme adopted by the invention for solving the technical problems is as follows: a zero sample learning method based on a deep embedding space is characterized by comprising the following steps:

step one, representing a training set with N samples as

Wherein

Indicating that the ith image sample is b in length and corresponds to a category label of->

And->

Then the label set for all known categories is represented. During testing, the goal of zero sample learning is to predict the new sample x _j Affiliated category label>

Represented is a set of labels of all unknown classes, and +>

In respect of each known class->

Or unknown class->

Has a corresponding semantic description->

Or->

Step two, establishing a two-branch deep embedded network, wherein one branch is an image mapping branch, the branch network is a preprocessed deep convolution network, and the input of the branch network is the extracted image characteristic x _i Then, through a multi-layer perceptron

To learn the image feature x _i A mapping process embedded into the implicit space. The other branch of the two-branch network is a semantic class mapping branch which is also based on a multi-level perceptron->

Based on semantic description information->

Mapping into the same implicit embedding space. The loss function of the two-branch network is defined in the form,

wherein, theta _v And theta _s The parameters of the multi-layer perceptron involved in the two-branch network are shown, while W refers to the parameters of the linear classifier to be learned, otherwise

It refers to classification loss, where a cross-entropy function is chosen as a way to compute the classification loss. To avoid overfitting, l is adopted ₂ The norm limits all parameters and is constrained by η weighting. The loss function is optimized and solved through a back propagation algorithm, and therefore the corresponding network parameter theta is obtained _v And theta _s . At the obtained parameter theta _v And theta _s Thereafter, the test sample is->

The predicted label of (a) is expressed as,

where z represents the semantic description information for tag y.

Step three, giving a test sample, firstly, the rootPredicting the pseudo label of the test sample set according to the embedding space learned in the step two, and then generating the pseudo label and the image-semantic difference, namely

The M test samples in the test sample set which are closest to the pseudo label are selected, M =40, and the selected samples and the pseudo labels assigned thereto are manually combined as new training data into the training set->

In, the expanded training set is obtained>

And step four, after the trained mapping network and the trained classifier are obtained, in order to avoid the phenomenon that the predicted label of the unknown sample is biased to the label of the known sample due to the learned deep embedding space, an adaptive adjustment model is adopted to solve the problem. The new optimization objective function is expressed as

Wherein C represents the number of unknown classes,

indicates the ith selected test sample, <' > is selected>

And &>

Respectively indicate the extended training set->

The corresponding pseudo label in (1) and the semantic description of the category to which it belongs.

The invention has the beneficial effects that: the method learns an effective deep intermediary embedding space through a deep learning technology, maps semantic class descriptions and image information descriptions of known classes and unknown classes into the deep intermediary embedding space through a trained deep network, and classifies features in the embedding space through a corresponding classifier to obtain a corresponding prediction label. In the prediction process, a mapping network self-learning algorithm is adopted, the generalization capability is effectively improved, and the classification accuracy of unknown class samples is improved.

The present invention will be described in detail with reference to the following embodiments.

Detailed Description

The zero sample learning method based on the deep embedding space comprises the following specific steps:

1. and (4) preprocessing data.

Representing the training set with N samples as

Training sample set having a size N, wherein>

Represents the ith image feature vector with length b and the corresponding class label->

A set of tags representing all known categories. During testing, zero sample learning aims at predicting new sample x _j Affiliated category label>

A set of tags representing all unknown classes, and +>

In respect of each known class->

Or unknown class->

There is a corresponding semantic feature vector z that describes the feature of the class, which is/are greater than>

Represents a semantic feature vector in the training set, <' >>

Representing semantic feature vectors in the test set. Taking the AwA data set as an example, the data set comprises 30,745 pictures of 50 different animal species, wherein the semantic feature vector of each semantic category

The performance of this type of animal was characterized in 85 different characteristics. A training set of the data set->

Is taken as an image sample>

The length of the feature vector obtained after ResNet101 processing of the corresponding picture in the data set is 2048, and the sample data x in the corresponding test set _j Have the same shape.

2. And (5) deep embedded network training.

After data preprocessing, the image features and the category semantic features need to be respectively mapped into the same implicit deep embedding space through establishing a deep network, and the space can enable the embedded image features and the category semantic features to meet the intra-class compactness and the inter-class separability. Mapping of image features and category semantic features to an implicit depth embedding space is realized by establishing a two-branch depth embedding network, wherein one branch is image mapping branch, and the other branch is image mapping branchCategory semantic feature mapping branches. The invention respectively learns the mapping process of the two characteristics to the embedding space by a multilayer perceptron. The image mapping branch network can be expressed as

The network combines the image features x _i Mapping to implicit space. Theta _v Mapping parameters of the branched network for the image, and x _i Then the ith image feature vector is represented, and the branched multi-layered perceptron is implemented by a Fully Connected Layer (FC) plus a linearly Rectified Layer (ReLU), where the input and output channel sizes of the Fully Connected Layer are 2048 and 1024, respectively.

Another category semantic feature mapping branch network can be expressed as

The branch network combines the semantic description information>

Mapping into the same implicit embedding space. Wherein theta is _s The category semantic feature maps a parameter of the branch network and->

The class semantic feature vector corresponding to the training sample is represented, and the multi-layer perceptron of the branch is realized by two fully-connected layers and two linear rectification layers. The two fully-connected layers are connected in series, and each fully-connected layer is followed by a linear rectifying layer, wherein the input channel size of the first fully-connected layer is the size of the semantic feature->

For an AwA data set of size 85, the output channel of the fully-connected layer is sized->

And the output channel size of the second fully-connected layer is 10The input channel size is the same as the output of the first fully connected layer 24.

The error function of the deep-embedded network is defined as,

where W is the parameter of the linear classifier learned by the proposed network structure during the training process, W ^T Is a transpose of the classifier parameters. In addition, the

Refers to a classification loss function for calculating the difference between the classification result and the correct result of the linear classifier on the training sample, and a cross entropy function is selected as a method for calculating the classification loss. Lambda is taken as the equilibrium coefficient, the value range is (0.1, 0.3), and l is adopted to avoid overfitting ₂ The norm limits all learnable parameters and is constrained by η weighting. Formula (1) is optimized and solved through a typical back propagation algorithm, so that a corresponding network parameter theta is obtained _v And theta _s . In the training process, the learning rate is set to be 1e-4, and the cycle number is T =50.

After the corresponding network has been learned, the test samples can be classified by

Obtained by

I.e. test sample x _j The predictive tag of (1).

3. And (5) data set expansion.

Calculating the distance between the image feature vector and the category semantic feature vector in the learned depth interpolation space by the formula (2), and dividing M image feature vectors with the minimum distance from the category semantic feature vector into the category and assigning the M image feature vectorsThe pseudo label expands the training data set, the expanded training set

Can be expressed as

M denotes the number of selected pseudo-tagged test samples, C is the number of unknown classes, M =40 in the present invention, C varies from data set to data set and is 10 on the AwA data set.

Indicating the ith sample assigned a pseudo label. Additionally, is>

The corresponding pseudo label of the M samples, i.e. the prediction label obtained according to equation (2) corresponding to the test sample, is represented and used

To represent the semantic description of the category to which the pseudo-tag belongs.

4. And (4) self-adaptive learning of the mapping network.

In most zero sample learning, only samples in known classes can be considered as training samples to learn the embedding space, and therefore, the learned embedding space generates a phenomenon that the prediction labels of unknown samples bias towards the labels of known samples. To better solve this problem, a deep embedding space adaptive tuning model is adopted, which can apply unlabeled test data to the training of the model to improve the classification accuracy.

Training set after obtaining expansion

Then, the objective function of the adaptive tuning model is expressed as

Where C represents the number of unknown classes. Step 4 Using the extended training set

The learned mapping network is adaptively adjusted by the data in (1). After each round of adjustment, the combination is paired>

And (4) updating the data set according to the step 3, wherein the total updating turn is R =10, and the learning rate of the mapping network in the self-adaptive adjustment process is 1e-4. After that, by updating the parameter theta _v And theta _s Substituting into equation (4) can be applied to the corresponding test sample x _j The label of (2) is predicted.

The method of the invention compares the PSR method proposed in the paper "Y.Anandani and S.Biswas.Preserving semiconducting relationships for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7603-7612, 2018" with the RN method proposed in the background art method on the AwA data set, and the experimental result shows that the proposed method has better performance, for example, under the conventional zero-sample learning experiment, the global classification accuracy of the proposed method on the AwA data set for unknown samples is higher than the PSR of the existing best method by 2.7%. In the general zero-sample learning experiment, the classification accuracy for unknown samples on the same AwA dataset was also 5.2% higher than that of the background art method RN.

Claims

1. A zero sample learning method based on a deep embedding space is characterized by comprising the following steps:

step one, representing a training set with N samples as

Wherein

Indicates that the ith image sample has a length of b and a corresponding class label of

While

Then the label set of all known categories is represented; during testing, zero sample learning aims at predicting new sample x _j Class label to which

Representing a set of labels of all unknown classes, an

About each known class

Or unknown class

All have a corresponding semantic description

Or

To learn the image feature x _i A mapping process embedded into an implicit space; the other branch of the two-branch network is a semantic class mapping branch which is also passed through a multi-layer perceptron

Describing information semantically

Mapping into the same implicit embedding space; the loss function of the two-branch network is defined in the form,

Then it refers to classification loss, where a cross entropy function is chosen as the method to compute the classification loss; to avoid overfitting, l is adopted ₂ Norm to limit all parameters and is constrained by η weighting; the loss function is optimized and solved through a back propagation algorithm, and therefore the corresponding network parameter theta is obtained _v And theta _s (ii) a At the acquisition of the parameter theta _v And theta _s Thereafter, the sample is tested

The predicted label of (a) is expressed as,

wherein z represents semantic description information of a label y;

step three, giving a test sample, predicting a pseudo label of the test sample set according to the embedding space learned in the step two, and then predicting the pseudo label of the test sample set according to the generated pseudo label and the image-semantic difference, namely

Selecting M test samples closest to the pseudo label in the test sample set, wherein M =40, and manually combining the selected samples and the pseudo label given to the selected samples into a training set as new training data

In the training set, the extended training set is obtained

After the trained mapping network and classifier are obtained, in order to avoid the phenomenon that the prediction label of the unknown sample is biased to the label of the known sample caused by the learned deep embedding space, a self-adaptive adjustment model is adopted to solve the problem; the new optimization objective function is represented as:

wherein C represents the number of unknown classes,

indicating the ith selected test sample,

and

then respectively is formedRepresenting extended training sets