CN107463996B - Self-paced co-training learning method for person re-identification - Google Patents

Self-paced co-training learning method for person re-identification

Info

Publication number
CN107463996B
CN107463996B (application CN201710413595.7A; pre-grant publication CN107463996A)
Authority
CN
China
Prior art keywords
view
sample
samples
data
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710413595.7A
Other languages
Chinese (zh)
Other versions
CN107463996A (en)
Inventor
孟德宇 (Deyu Meng)
谢琦 (Qi Xie)
马凡 (Fan Ma)
李梓娜 (Zina Li)
赵谦 (Qian Zhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong University
Priority to CN201710413595.7A
Publication of CN107463996A
Application granted
Publication of CN107463996B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A self-paced co-training learning method. First, data are acquired from the two views of the target problem, comprising a small amount of labeled data and a large amount of unlabeled data, and the model is initialized. A corresponding optimization objective is established on each of the two views; a self-paced regularization term is embedded in the loss function of each view to achieve robust learning within that view; the two views are then coupled by a co-regularization term. The result is a multi-view semi-supervised self-paced co-training model that embeds a robust learning mechanism and admits a model-level interpretation. Applying the small amount of labeled data, the large amount of unlabeled data and this semi-supervised multi-view learning model to the target domain yields high-quality labels for the unlabeled data, and at the same time a reliable learner in each of the two views. The invention aims to provide the traditional co-training algorithm with a robust learning model that uses a "replacement" update mode, so that under-labeled data in the target domain can be labeled more accurately and with higher quality.

Description

Self-paced co-training learning method for person re-identification
Technical Field
The invention relates to a multi-view semi-supervised learning model and method, and in particular to a novel self-paced co-training model and learning method.
Background
The internet carries a large amount of continuously generated data such as news, pictures and videos, but most of these data carry only a vague description of the underlying event, and some carry no label information at all. When we want to perform a query or classification task, conventional machine learning algorithms make essentially no use of this unlabeled or weakly labeled portion, so a great deal of available information is lost. Such data are characterized by a large amount of unlabeled data and a very limited amount of labeled data. How to mine the information in unlabeled data has therefore become a prominent topic in machine learning in recent years: on the premise of fully exploiting the labeled data, information is extracted from the unlabeled data as accurately as possible, and a large amount of unlabeled data is then labeled with high quality.
Semi-supervised learning is a family of methods that extract structural information from unlabeled data with the help of the supervision carried by labeled data. Depending on the target task, it can be divided into semi-supervised classification, semi-supervised clustering and semi-supervised regression; many related semi-supervised methods built on different assumptions have been proposed and achieve good results in practical problems. Co-training is a classical multi-view semi-supervised learning method. It applies to data with two views, whose features complement each other in jointly describing a sample. Such data are widespread: for certain picture data, for example, the content of the picture and the link text of the picture can serve as two views describing the picture. The principle is that different views assist each other: a small amount of labeled data is used to train a weak learner in each of the two views; the learner of each single view then attaches pseudo-labels to the unlabeled data, and a portion of the pseudo-labeled data is selected as additional training data for the other view. The learners of the two views thus label for and complement each other, their performance improves step by step, and eventually a strong learner is obtained in each view that can label the unlabeled data with high quality. A minimal sketch of this classic loop is given below.
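For concreteness, the following is a minimal sketch of the classic co-training loop just described, not of the invention's method. The estimator choice (scikit-learn's LogisticRegression) and the helper structure are illustrative assumptions; any probabilistic classifier per view would do.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def co_training(X1_l, X2_l, y_l, X1_u, X2_u, n_rounds=10, n_add=5):
        """Classic co-training: each round, each view's learner pseudo-labels its
        most confident unlabeled samples as extra training data for the other
        view; a pseudo-label is never revisited (the 'no-replacement' mode)."""
        train = [(X1_l.copy(), y_l.copy()), (X2_l.copy(), y_l.copy())]
        X_u = [X1_u, X2_u]
        idx_u = np.arange(len(X1_u))          # indices of still-unlabeled samples
        for _ in range(n_rounds):
            learners = [LogisticRegression().fit(X, y) for X, y in train]
            for j in (0, 1):
                if len(idx_u) == 0:
                    break
                proba = learners[j].predict_proba(X_u[j][idx_u])
                pick = np.argsort(-proba.max(axis=1))[:n_add]   # most confident
                pseudo = learners[j].classes_[proba.argmax(axis=1)][pick]
                other = 1 - j
                train[other] = (np.vstack([train[other][0], X_u[other][idx_u[pick]]]),
                                np.concatenate([train[other][1], pseudo]))
                idx_u = np.delete(idx_u, pick)  # 'no-replacement': never re-labeled
        return learners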
A series of multi-view semi-supervised learning methods derived from the co-training principle fall mainly into two categories: one keeps the iterative training procedure of co-training but adopts different confidence criteria when labeling samples; the other embeds the information of the other view as a regularization term in the objective function of the current view. The conventional co-training algorithm nevertheless has the following problems. First, it is a strongly heuristic algorithm: it must assume in advance something about the pseudo-labeling accuracy of the learning process, namely that wrongly labeled examples can be identified by the learner, or that every label the learner produces is highly reliable. On the basis of such assumptions, most co-training algorithms never re-label unlabeled data once a pseudo-label has been assigned. These subjective assumptions can neither be verified nor, in practice, satisfied: in the actual training process the initial learners are trained on only a small amount of labeled data, so the confidence of the pseudo-labels produced by these weak learners is low, which in turn degrades the labeling precision of the learners. Second, the algorithm adopts a "no-replacement" label update mode: once a sample has been pseudo-labeled, it stays in the learning process with that label. As noted above, however, many pseudo-labels in semi-supervised learning, especially in the early stage, are unreliable and quite likely wrong. A more reasonable update mode is therefore a "replacement" mode, in which the algorithm replaces incorrectly labeled samples in time. Finally, it is important for a machine learning method to have an optimization model that explains its intrinsic meaning; the optimization objective is one of the three basic elements of machine learning (training data, decision function, and performance metric or optimization objective), yet traditional co-training methods largely lack such a model-level interpretation.
Therefore, in order to achieve high-quality labeling of multi-view data, providing a multi-view co-training method that learns robustly and rests on an explicit optimization model is an important problem in machine learning and semi-supervised learning. The invention addresses these shortcomings of current multi-view co-training.
Disclosure of Invention
The invention aims to provide a novel self-paced co-training learning method that achieves high-quality labeling of multi-view data.
In order to achieve this purpose, the invention adopts the following technical scheme:
Step S1: acquiring a labeled data set and an unlabeled data set under the two views of the target domain;
Step S2: determining the optimization objective under each of the two views;
Step S3: embedding a self-paced learning mechanism into the loss function of each view;
Step S4: introducing a self-paced regularization term that couples the two views, based on the similarity of the same sample under the two views;
Step S5: combining steps S2, S3 and S4 to construct a multi-view semi-supervised learning model with an embedded robustness mechanism, called the self-paced co-training model;
Step S6: taking all the data of the two views obtained in step S1 as input, solving the self-paced co-training model constructed in step S5 with an alternating optimization algorithm, and finally obtaining high-quality labels of the unlabeled data together with the final optimized learners.
The labeled data set obtained in step S1 is

D_l = {(x_i^(1), x_i^(2), y_i)}_{i=1}^{l}

and the unlabeled data set is

D_u = {(x_k^(1), x_k^(2))}_{k=l+1}^{l+u},

where x_i^(j) ∈ R^{d_j} is the feature vector of the i-th sample in the j-th view, d_j is the dimension of the feature space of the j-th view, y_i (i = 1, …, l) is the common label of the i-th sample in both views, l is the number of samples of the labeled data set, and u is the number of samples of the unlabeled data set.
The optimization objective of each of the two views in step S2 is expressed as:

min_{w^(j), y_k} Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} ℓ(y_k, g^(j)(x_k^(j), w^(j))),   (1)

where the superscript j denotes the j-th view, g^(j)(x, w) is the learner in this view, w^(j) is a parameter of the learner, ℓ(·,·) is a loss function, x_i^(j) is the feature vector of the i-th sample in the j-th view, y_i (i = 1, …, l) is the common label of the i-th sample in both views, and y_k (k = l+1, …, l+u) is the pseudo-label of an unlabeled sample.
The objective function with the embedded self-paced learning mechanism in step S3 is:

min_{w^(j), v^(j), y_k} Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) + f(v^(j), λ^(j)),

where the superscript j denotes the j-th view, v_k^(j) ∈ {0,1} is the weight of the unlabeled sample x_k^(j) (k = l+1, …, l+u) under this view; v_k^(j) = 1 indicates that the unlabeled sample x_k^(j) is selected as a training sample when training the learner of the j-th view, and otherwise the sample is not selected into the training data set. f(v, λ) = -λ Σ_k v_k is the "hard" form of the self-paced regularization term, and λ is the self-paced regularization parameter: a larger value indicates that more complex samples are selected. A selection rule implementing this regularizer is sketched below.
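For illustration, a minimal NumPy sketch of the minimizer induced by this "hard" regularizer, for fixed per-sample losses (the function name is illustrative):

    import numpy as np

    def hard_self_paced_weights(losses, lam):
        """Minimize sum_k v_k * loss_k - lam * sum_k v_k over v in {0,1}^u:
        the terms decouple, so v_k = 1 exactly when loss_k < lam."""
        return (np.asarray(losses) < lam).astype(float)

As λ grows, the threshold rises and progressively harder (higher-loss) samples enter the training set, which is exactly the self-paced "easy samples first" schedule.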
In step S4 the correlation between the two views is imposed by the regularization term -γ(V^(1))^T V^(2), where V^(1), V^(2) are u-dimensional vectors representing the weights of the unlabeled samples in the two views, the i-th element being v_{l+i}^(j). The two views are thereby made consistent: a sample selected as trustworthy in one view is also encouraged to be trusted in the other view.
In step S5, combining steps S2-S4 yields the final self-paced co-training model:

min_{w^(j), v^(j), y_k; j=1,2}  Σ_{j=1,2} [ Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) ] - γ (V^(1))^T V^(2),

where γ is a parameter controlling the degree of correlation between the views: a larger value indicates a stronger correlation, i.e. an unlabeled sample selected as training data in one view is more likely to be selected in the other view.
Solving the self-paced co-training model of step S5 with an alternating optimization algorithm in step S6 comprises the following steps:

S1) Initialization

First set V^(1) and V^(2) to the zero vector of R^u, set λ^(1), λ^(2) to small values, so that only a small number of unlabeled samples are selected as training samples in the first iteration, and set γ = 1;

the two learners are trained simultaneously on the labeled samples of their respective views and predict label values for the unlabeled samples; to obtain reliable predictions of the unlabeled samples, the label value of an unlabeled sample is the average of the predicted values under the two views, from which the loss values of the unlabeled data under the different views are obtained;

S2) Updating the optimization variables by alternating optimization

Within one iteration, for j = 1, 2, the following optimization sequence is adopted (a sketch of one outer iteration follows this list):

V^(j)* = argmin_{V^(j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(j) ℓ_k^(j) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) - γ (V^(j))^T V^(3-j),   (2)

V^(3-j)* = argmin_{V^(3-j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ_k^(3-j) - λ^(3-j) Σ_{k=l+1}^{l+u} v_k^(3-j) - γ (V^(j))^T V^(3-j),   (3)

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))),   (4)

y_k* = argmin_{y_k} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))),   (5)

where k = l+1, …, l+u indexes the unlabeled samples and ℓ_k^(j) denotes the loss of sample x_k^(j) under the j-th view;
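The following sketch shows how one outer iteration of (2)-(5) can be organized. The helpers `losses`, `fit_view` and `relabel` are hypothetical stand-ins for computing per-sample losses under a view, retraining a view's learner on the currently selected samples, and solving (5), respectively.

    def outer_iteration(v, lam, gamma, losses, fit_view, relabel):
        """One pass of the alternating scheme (2)-(5); v = [v1, v2] holds the
        0/1 selection vectors of the two views as NumPy arrays."""
        for j in (0, 1):                 # j = 1, 2 in the text
            other = 1 - j
            # (2): select samples in view j, guided by the other view's choices
            v[j] = (losses(j) <= lam[j] + gamma * v[other]).astype(float)
            # (3): select samples in the other view, guided by view j
            v[other] = (losses(other) <= lam[other] + gamma * v[j]).astype(float)
            # (4): retrain the learner of the other view on labeled + selected data
            fit_view(other, v[other])
            # (5): refresh all pseudo-labels - the 'replacement' update
            relabel(v[0], v[1])
        return v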
S2.1) Updating V^(j)

The purpose of this step is to select samples in the j-th view and to provide guidance for the selection of unlabeled samples in the (3-j)-th view.

Equation (2) is equivalent to solving the following optimization problem:

min_{V^(j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(j) ℓ_k^(j) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) - γ (V^(j))^T V^(3-j),   (6)

where ℓ_k^(j) = ℓ(y_k, g^(j)(x_k^(j), w^(j))) is the loss value of sample x_k^(j) under the j-th view;
taking the partial derivative of (6) with respect to v_k^(j) gives

∂/∂v_k^(j) = ℓ_k^(j) - λ^(j) - γ v_k^(3-j),

so the update formula for v_k^(j) is

v_k^(j) = 1 if ℓ_k^(j) ≤ λ^(j) + γ v_k^(3-j), and v_k^(j) = 0 otherwise.   (7)

Reliable samples are selected from the unlabeled data set in the j-th view according to equation (7), and the weights of the reliable samples are set to v_k^(j) = 1. The higher the credibility of a pseudo-label, the more easily the corresponding sample is selected in this step;
in the first iteration with j = 1, all v_k^(2) are 0 from the initialization step, so sample selection is based only on the loss information in the first view: a sample whose loss value is less than λ^(1) is regarded as a trustworthy sample. In later updates, selection follows both the loss under the current view and the guidance information of the other view, as the sketch below makes explicit;
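A sketch of the closed-form rule (7), under the stated assumptions; note that with v^(3-j) all zero (the first iteration) it reduces to the plain threshold test against λ^(j):

    import numpy as np

    def update_view_weights(losses_j, v_other, lam_j, gamma):
        """Closed-form solution (7): a sample enters view j's training set when
        its loss is below lam_j, the threshold being relaxed by gamma for
        samples already trusted by the other view."""
        losses_j = np.asarray(losses_j)
        v_other = np.asarray(v_other)
        return (losses_j <= lam_j + gamma * v_other).astype(float)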
S2.2) Updating V^(3-j)

The purpose of this step is to update the training data set of the (3-j)-th view.

Equation (3) is equivalent to solving the following optimization problem:

min_{V^(3-j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ_k^(3-j) - λ^(3-j) Σ_{k=l+1}^{l+u} v_k^(3-j) - γ (V^(j))^T V^(3-j),   (8)

where ℓ_k^(3-j) = ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))) is the loss value of sample x_k^(3-j) under the (3-j)-th view;
taking the partial derivative of (8) with respect to v_k^(3-j) gives

∂/∂v_k^(3-j) = ℓ_k^(3-j) - λ^(3-j) - γ v_k^(j),

so the update formula for v_k^(3-j) is

v_k^(3-j) = 1 if ℓ_k^(3-j) ≤ λ^(3-j) + γ v_k^(j), and v_k^(3-j) = 0 otherwise.   (9)

Reliable samples are selected from the unlabeled data set in the (3-j)-th view according to equation (9), their weights are set to v_k^(3-j) = 1, and the selected samples are used directly for training the learner of that view;
S2.3) Updating w^(3-j)

Equation (4) is equivalent to solving the following optimization problem:

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))),

i.e. the classifier of this view is retrained on the labeled data together with the pseudo-labeled data selected in the previous step;
S2.4) Updating y_k

The purpose of this step is to update the pseudo-labels of the unlabeled samples.

Equation (5) is equivalent to the following optimization problem:

y_k* = argmin_{y_k} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))),

which has a global optimal solution: for the k-th unlabeled sample, the label value y_k is obtained from a weighted combination of the learners' predicted values under the two views;
S2.5) Increasing λ^(j)

The number of trustworthy samples grows in each loop by controlling the number of selected samples: assuming the numbers of positive and negative samples selected in the initialization step are a and b respectively, then after the k-th execution of S2.4) the numbers of selected positive and negative samples are a·k and b·k respectively (one way to realize this schedule is sketched below);
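One way to realize this count-based schedule, under the assumption that λ is set implicitly by choosing the loss threshold that admits the desired number of samples per class (helper name and details are illustrative):

    import numpy as np

    def lambda_for_counts(losses_pos, losses_neg, a, b, k):
        """Pick per-class thresholds so that about a*k positive and b*k negative
        pseudo-labeled samples fall below them after the k-th pass (a, b >= 1)."""
        lam_pos = np.sort(losses_pos)[min(a * k, len(losses_pos)) - 1]
        lam_neg = np.sort(losses_neg)[min(b * k, len(losses_neg)) - 1]
        return lam_pos, lam_neg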
when all unlabeled samples have been selected into the training data set or the preset maximum number of iterations is reached, the algorithm stops, yielding high-quality labels of all unlabeled data and the two final optimized learners.
On the one hand, the invention embeds a self-paced learning mechanism into each view, achieving robust learning within a single view; on the other hand, it couples the learning processes of the two views so that they guide each other, achieving efficient learning of the target task.
Compared with existing co-training methods, the invention has the following main advantages: 1) it provides an explicit optimization model, which makes it convenient to explore the internal mechanism of the co-training algorithm; 2) during the algorithm loop, the pseudo-labels of unlabeled samples are updated in "replacement" mode, which avoids the damage to learner performance caused by pseudo-labels produced by the weak initial learners under the traditional no-replacement mode; 3) the co-training rationale embodied by the model can be explained directly through the model itself, without any subjective theoretical assumption; it is easy to understand, which helps popularize the co-training mechanism and method among general users.
Drawings
The invention is further illustrated by the attached drawings, which however do not limit the invention in any way.
FIG. 1 is a diagram showing the mechanism of model construction according to the present invention.
FIG. 2 is a flow chart of an alternative optimization algorithm of a corresponding model of the present invention.
Detailed Description
The invention is further described with reference to the following examples.
Example 1
Table 1 describes the six text data sets.
Table 1: example 1 Experimental data
[Table 1 appears as an image in the source and is not reproduced here.]
Using the six text data sets shown in Table 1 as experimental subjects, all samples were manually divided into two views. Each data set has two categories, whose structural features are described in Table 1.
Table 2 reports the classification precision of seven semi-supervised methods, including the proposed one, on the six text data sets.
[Table 2 appears as an image in the source and is not reproduced here.]
Referring to FIG. 1, step S1 reads the text data. For the first data set in Table 1, 2^k positive-class and 3·2^k negative-class samples are selected as labeled samples, and the remaining samples are treated as unlabeled samples. For the second, third and fourth data sets in Table 1, 2^k positive-class and 6·2^k negative-class examples are selected as labeled samples. For the last two data sets in Table 1, 2^(k+1) positive-class and 2·2^k negative-class samples are selected as labeled samples;
Step S2: determining the optimization objectives of the two views;
the optimization objective in the j-th view is:

min_{w^(j), y_k} Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} ℓ(y_k, g^(j)(x_k^(j), w^(j))),   (1)

where g^(j)(x, w) is the prediction function in this view, taken here to be a linear function, i.e. g^(j)(x, w) = x^T w. Since text classification in this example is a binary problem, the hinge loss can be chosen, i.e. ℓ(y, g^(j)(x, w)) = max(0, 1 - y(x^T w)); a sketch of this per-sample loss follows. For descriptive convenience the general notation ℓ(y, g^(j)(x, w)) is used without writing out a specific form of the loss function;
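A minimal sketch of this per-sample hinge loss for the linear learner (labels in {-1, +1}, NumPy assumed):

    import numpy as np

    def hinge_losses(X, y, w):
        """Per-sample hinge loss max(0, 1 - y * (x^T w)) of a linear learner;
        these are the loss values used to score (pseudo-)labeled samples."""
        return np.maximum(0.0, 1.0 - y * (X @ w))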
Step S3: embedding a self-paced learning mechanism into the loss function of each view, choosing an appropriate self-paced regularization term as required;

here the "hard" self-paced regularization term f(v, λ) = -λ Σ_k v_k is chosen, so the objective function with the embedded self-paced learning mechanism under each view is:

min_{w^(j), v^(j), y_k} Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j),

where the superscript j denotes the j-th view, v_k^(j) is the weight of the k-th sample in this view, v_k^(j) = 1 indicates that sample x_k^(j) is selected into the training data set, and v_k^(j) = 0 indicates that it is not selected;
Step S4: introducing a regular term associated with the two views according to the similarity of the same sample under the two views;
the regularization term here is applied to the weight vector of the sample, having the form:
-(V(1))TV(2)
Step S5: combining steps S2, S3 and S4 to construct the self-paced co-training model;

the objective function is:

min_{w^(j), v^(j), y_k; j=1,2} Σ_{j=1,2} [ Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) ] - γ (V^(1))^T V^(2),

where γ is the parameter of the view-coupling term; a larger value indicates a stronger correlation between the two views;
Referring to FIG. 2, step S6: taking all the data of the two views obtained in step S1 as input and applying the self-paced co-training model of step S5 to obtain high-quality labels of the unlabeled data and the final optimized learners;

the specific steps are as follows:

S1) Initialization

V^(1) and V^(2) are set to the zero vector of R^u. λ^(1), λ^(2) are first set to small values, so that only a small number of unlabeled samples are selected as training samples in the first iteration, and γ is set to 1;

the two learners are trained on the labeled samples, from which the loss value of each unlabeled sample can be obtained; to obtain a reliable prediction for an unlabeled sample, its label value is the average of the label values under the two views;

S2) Solving the self-paced co-training model of step S5 by the alternating optimization algorithm

In one loop, for j = 1, 2, the solution process takes the following iterative format:

V^(j)* = argmin_{V^(j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(j) ℓ_k^(j) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) - γ (V^(j))^T V^(3-j),   (2)

V^(3-j)* = argmin_{V^(3-j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ_k^(3-j) - λ^(3-j) Σ_{k=l+1}^{l+u} v_k^(3-j) - γ (V^(j))^T V^(3-j),   (3)

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))),   (4)

y_k* = argmin_{y_k} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))),   (5)

where k = l+1, …, l+u indexes the unlabeled samples;
S2.1) Updating V^(j)

Equation (2) has the explicit solution:

v_k^(j) = 1 if ℓ_k^(j) ≤ λ^(j) + γ v_k^(3-j), and v_k^(j) = 0 otherwise.   (6)

Reliable samples are selected from the unlabeled data set in the j-th view according to equation (6), and their weights are set to v_k^(j) = 1;
S2.2) Updating V^(3-j)

Equation (3) has the explicit solution:

v_k^(3-j) = 1 if ℓ_k^(3-j) ≤ λ^(3-j) + γ v_k^(j), and v_k^(3-j) = 0 otherwise.   (7)

Reliable samples are selected from the unlabeled data set in the (3-j)-th view according to equation (7), their weights are set to v_k^(3-j) = 1, and the selected samples are used directly for training the learner of that view;
S2.3) Updating w^(3-j)

Equation (4) is equivalent to solving the following optimization problem:

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))).

With the hinge loss and the linear prediction function, this objective is a standard SVM optimization problem, so an existing SVM toolkit can be used to solve it and obtain the updated learner of this view;
S2.4) Updating y_k

Equation (5) is equivalent to the following optimization problem:

y_k* = argmin_{y_k} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))).

Directly comparing the loss at the two candidate label values of y_k solves this problem optimally, after which all unlabeled samples are re-labeled; a sketch of this comparison follows;
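A sketch of this direct comparison; `loss_at_label(j, c)` is a hypothetical helper returning the loss vector of view j (j = 0, 1 for views 1, 2) when every unlabeled sample is tentatively given label value c:

    import numpy as np

    def relabel_by_loss(loss_at_label, v1, v2, labels=(0, 1)):
        """Step S2.4: give each unlabeled sample the candidate label value whose
        view-weighted total loss v1*l1 + v2*l2 is smallest."""
        totals = np.stack([v1 * loss_at_label(0, c) + v2 * loss_at_label(1, c)
                           for c in labels])
        return np.asarray(labels)[np.argmin(totals, axis=0)]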
S2.5) Increasing λ^(j)

The number of trustworthy samples grows in each loop by controlling the number of selected samples: if the numbers of positive and negative samples selected at the beginning are a and b respectively, then after the q-th execution of step S2.4) the numbers of selected positive and negative samples are a·q and b·q respectively;

when all unlabeled samples have been selected into the training data set or the preset maximum number of iterations is reached, the algorithm stops, yielding high-quality labels of all unlabeled data and the final optimized classifiers.
Example 2
Table 3 reports the precision of person re-identification (Person re-ID) on the Market-1501 data set for three multi-view semi-supervised methods, including the proposed one.
[Table 3 appears as an image in the source and is not reproduced here.]
The Market-1501 data set is used in this example for the person re-identification (Person re-ID) task. Person re-ID refers to the task of deciding, for a person captured by one camera, whether the same person is captured by other cameras. The Market-1501 data set contains 32,668 photographs of 1,501 persons. Each person is captured by at most six and at least two cameras. Here 12,936 cropped photographs of 751 persons are selected as the training data set and 19,732 cropped photographs of 750 persons as the test data set.
Features are extracted from the training and test data sets with different networks, CaffeNet, GoogLeNet and VGGNet, to obtain different feature sets; the features extracted by different networks serve as different views. Two combinations are used here: CaffeNet with GoogLeNet, and GoogLeNet with VGGNet. For a person whose identity is to be recognized, the pictures containing that person are labeled as the positive class and all others as the negative class, which yields the multi-view labeled data required here. In each class, 20% of the data are randomly selected as labeled data and the rest serve as unlabeled data.
Step S1, reading the multi-view semi-supervised data as input data;
step S2: determining optimization targets of the two visual fields;
since the discriminant function chosen in this example is implemented by a neural network, the cross-entropy function is chosen as the loss function, giving the objective:

min_{w^(j), y_k} - Σ_{i=1}^{l} [ y_i log p_i^(j) + (1 - y_i) log(1 - p_i^(j)) ] - Σ_{k=l+1}^{l+u} [ y_k log p_k^(j) + (1 - y_k) log(1 - p_k^(j)) ],

where the superscript j denotes the j-th view, p_i^(j) = p(y_i = 1 | x_i^(j), w^(j)) is the probability, as judged by the neural network of this view, that sample x_i^(j) belongs to the positive class, y_i ∈ {0, 1}, and w^(j) are the network parameters. For descriptive convenience the general notation ℓ(y_i, g^(j)(x_i^(j), w^(j))) is used below for the loss of sample x_i^(j); a per-sample computation is sketched after this paragraph;
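A minimal sketch of this per-sample cross-entropy loss (p is the network's predicted positive-class probability; clipping is added only for numerical safety):

    import numpy as np

    def cross_entropy_losses(p, y, eps=1e-12):
        """Per-sample binary cross-entropy -[y log p + (1 - y) log(1 - p)]
        for (pseudo-)labels y in {0, 1}."""
        p = np.clip(p, eps, 1.0 - eps)
        return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))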
Step S3: embedding a self-paced learning mechanism into the loss function of each view, choosing an appropriate self-paced regularization term as required;

here the "hard" self-paced regularization term f(v, λ) = -λ Σ_k v_k is chosen, so the objective function with the embedded self-paced learning mechanism under each view is:

min_{w^(j), v^(j), y_k} Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j),

where v_k^(j) denotes the weight of the k-th sample in the j-th view, v_k^(j) = 1 indicates that sample x_k^(j) is selected into the training data set, and v_k^(j) = 0 indicates that it is not selected;
Step S4: introducing a regular term associated with the two views according to the similarity of the same sample under the two views;
the regularization term here is applied to the weight vector of the sample, having the form:
-(V(1))TV(2)
Step S5: combining steps S2, S3 and S4 to construct the self-paced co-training model:

min_{w^(j), v^(j), y_k; j=1,2} Σ_{j=1,2} [ Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) ] - γ (V^(1))^T V^(2),

where γ is the parameter of the view-coupling term; a larger value indicates a stronger correlation between the two views;
Step S6: taking all the data of the two views obtained in step S1 as input and applying the robust learning model of step S5 to obtain high-quality labels of the unlabeled data and the final optimized learners;

the specific steps are as follows:

S1) Initialization

V^(1) and V^(2) are set to the zero vector of R^u, and λ^(1), λ^(2) are set to small values, so that only a small number of unlabeled samples are selected as training samples in the first iteration; γ is set to 1;

the two learners are trained on the labeled samples to obtain the loss value of each unlabeled sample; to obtain a reliable prediction for an unlabeled sample, its label value is the average of the label values under the two views;

S2) Solving the self-paced co-training model of step S5 by the alternating optimization algorithm: in one loop, for j = 1, 2, the solution process adopts the following iterative format:

V^(j)* = argmin_{V^(j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(j) ℓ_k^(j) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) - γ (V^(j))^T V^(3-j),   (10)

V^(3-j)* = argmin_{V^(3-j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ_k^(3-j) - λ^(3-j) Σ_{k=l+1}^{l+u} v_k^(3-j) - γ (V^(j))^T V^(3-j),   (11)

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))),   (12)

y_k* = argmin_{y_k} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))),   (13)

where k = l+1, …, l+u indexes the unlabeled samples.
S2.1) Updating V^(j)

Equation (10) has the explicit solution:

v_k^(j) = 1 if ℓ_k^(j) ≤ λ^(j) + γ v_k^(3-j), and v_k^(j) = 0 otherwise.   (14)

Reliable samples are selected from the unlabeled data set in the j-th view according to equation (14), and their weights are set to v_k^(j) = 1.
S2.2) Updating V^(3-j)

Equation (11) has the explicit solution:

v_k^(3-j) = 1 if ℓ_k^(3-j) ≤ λ^(3-j) + γ v_k^(j), and v_k^(3-j) = 0 otherwise.   (15)

Reliable samples are selected from the unlabeled data set in the (3-j)-th view according to equation (15), their weights are set to v_k^(3-j) = 1, and the samples selected in this step are used directly for training the network of that view;
S2.3) Updating w^(3-j)

Equation (12) is equivalent to solving the following optimization problem:

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))).   (16)

Since w^(3-j) are the parameters of the learner network under the (3-j)-th view, the network parameters are solved by the back-propagation (BP) algorithm;
S2.4) Updating y_k

Equation (13) is equivalent to the following optimization problem:

y_k* = argmin_{y_k ∈ {0,1}} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))),

whose global optimal solution, with the cross-entropy loss, is

y_k = 1 if Σ_{j=1,2} v_k^(j) log( p_k^(j) / (1 - p_k^(j)) ) > 0, and y_k = 0 otherwise.   (17)

All unlabeled samples are re-pseudo-labeled according to equation (17); one reading of this closed form is sketched below;
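One reading of the closed form (17), sketched under the assumption that ℓ is the cross-entropy loss above: minimizing v1·ℓ(y, p1) + v2·ℓ(y, p2) over y ∈ {0, 1} amounts to thresholding the view-weighted log-odds at zero.

    import numpy as np

    def relabel_cross_entropy(p1, p2, v1, v2, eps=1e-12):
        """Global minimizer of the weighted two-view cross-entropy over y in
        {0,1}; samples selected in neither view (v1 = v2 = 0) are tied and
        default to 0 here."""
        p1 = np.clip(p1, eps, 1.0 - eps)
        p2 = np.clip(p2, eps, 1.0 - eps)
        logit = v1 * np.log(p1 / (1.0 - p1)) + v2 * np.log(p2 / (1.0 - p2))
        return (logit > 0).astype(int)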
S2.5) Increasing λ^(j)

The control of unlabeled-sample selection is based on two considerations: to guarantee enough training samples, the number of unlabeled samples selected in each iteration must not be too small, so its lower bound is set to 1000; and to limit the influence of excessive noise on the training effect, the upper bound on the number of unlabeled samples is set to 2000 (a sketch of this bounded selection follows);
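A sketch of this bounded selection rule (the bounds are as stated in the text; the helper name and threshold handling are illustrative):

    import numpy as np

    def bounded_selection(losses, threshold, lo=1000, hi=2000):
        """Keep the lowest-loss unlabeled samples, forcing the count implied by
        `threshold` into [lo, hi] (and never beyond the number of samples)."""
        losses = np.asarray(losses)
        n = int(np.sum(losses <= threshold))
        n = min(max(n, lo), hi, len(losses))
        v = np.zeros(len(losses))
        v[np.argsort(losses)[:n]] = 1.0
        return v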
when all unlabeled samples have been selected into the training data set or the preset maximum number of iterations is reached, the algorithm stops, yielding high-quality labels of all unlabeled data and the final optimized networks.

Claims (5)

1. A self-paced co-training learning method for person re-identification on the Market-1501 data set, the Market-1501 data set comprising 32,668 photographs of 1,501 persons, each person being captured by at most six and at least two cameras, 12,936 cropped photographs of 751 persons being selected as a training data set and 19,732 cropped photographs of 750 persons being selected as a test data set, person re-identification meaning that, for a person captured by one camera, it is determined whether that person is captured by other cameras, the method comprising the following steps:
Step S1: acquiring a labeled data set and an unlabeled data set under the two views of the target domain, wherein, for a person whose identity is to be labeled, pictures containing that person are labeled as the positive class and all others as the negative class, giving multi-view labeled data; 20% of the data in each class are randomly selected as labeled data, and the remaining data serve as unlabeled data;
Step S2: determining the optimization objective under each of the two views;
Step S3: embedding a self-paced learning mechanism into the loss function of each view;
Step S4: introducing a self-paced regularization term that couples the two views, based on the similarity of the same sample under the two views;
Step S5: combining steps S2, S3 and S4 to construct a multi-view semi-supervised learning model with an embedded robustness mechanism, called the self-paced co-training model;
Step S6: taking all the data of the two views obtained in step S1 as input, solving the self-paced co-training model constructed in step S5 with an alternating optimization algorithm, and finally obtaining high-quality labels of the unlabeled data together with the final optimized learners;
in step S5, combining steps S2-S4 yields the final self-paced co-training model:

min_{w^(j), v^(j), y_k; j=1,2} Σ_{j=1,2} [ Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) ] - γ (V^(1))^T V^(2),

wherein γ is a parameter controlling the degree of correlation between the views; a larger value indicates a stronger correlation, i.e. an unlabeled sample selected as training data in one view is more likely to be selected in the other view;
the step S6 of solving the self-paced co-training model of step S5 with the alternating optimization algorithm comprises the following steps:

S1) Initialization

first setting V^(1) and V^(2) to the zero vector of R^u and setting λ^(1), λ^(2) to small values, so that only a small number of unlabeled samples are selected as training samples in the first iteration, and setting γ = 1;

the two learners are trained simultaneously on the labeled samples of their respective views and predict label values for the unlabeled samples; to obtain reliable predictions of the unlabeled samples, the label value of an unlabeled sample is the average of the predicted values under the two views, from which the loss values of the unlabeled data under the different views are obtained;

S2) Updating the optimization variables by the alternating optimization algorithm

within one iteration, for j = 1, 2, the following optimization sequence is adopted:

V^(j)* = argmin_{V^(j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(j) ℓ_k^(j) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) - γ (V^(j))^T V^(3-j),   (2)

V^(3-j)* = argmin_{V^(3-j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ_k^(3-j) - λ^(3-j) Σ_{k=l+1}^{l+u} v_k^(3-j) - γ (V^(j))^T V^(3-j),   (3)

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))),   (4)

y_k* = argmin_{y_k} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))),   (5)

where k = l+1, …, l+u indexes the unlabeled samples and ℓ_k^(j) denotes the loss of sample x_k^(j) under the j-th view;
S2.1) Updating V^(j)

the purpose of this step is to select samples in the j-th view and to provide guidance for the selection of unlabeled samples in the (3-j)-th view;

equation (2) is equivalent to solving the following optimization problem:

min_{V^(j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(j) ℓ_k^(j) - λ^(j) Σ_{k=l+1}^{l+u} v_k^(j) - γ (V^(j))^T V^(3-j),   (6)

where ℓ_k^(j) = ℓ(y_k, g^(j)(x_k^(j), w^(j))) is the loss value of sample x_k^(j) under the j-th view;
taking the partial derivative of (6) with respect to v_k^(j) gives

∂/∂v_k^(j) = ℓ_k^(j) - λ^(j) - γ v_k^(3-j),

so the update formula for v_k^(j) is

v_k^(j) = 1 if ℓ_k^(j) ≤ λ^(j) + γ v_k^(3-j), and v_k^(j) = 0 otherwise;   (7)

reliable samples are selected from the unlabeled data set in the j-th view according to equation (7) and their weights are set to v_k^(j) = 1; the higher the credibility of a pseudo-label, the more easily the corresponding sample is selected in this step;
in the first iteration with j = 1, all v_k^(2) are 0 from the initialization step, so sample selection is based only on the loss information in the first view, i.e. a sample whose loss value is less than λ^(1) is regarded as a trustworthy sample; afterwards, selection follows both the loss under the current view and the guidance information of the other view;
S2.2) Updating V^(3-j)

the purpose of this step is to update the training data set of the (3-j)-th view;

equation (3) is equivalent to solving the following optimization problem:

min_{V^(3-j) ∈ {0,1}^u} Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ_k^(3-j) - λ^(3-j) Σ_{k=l+1}^{l+u} v_k^(3-j) - γ (V^(j))^T V^(3-j),   (8)

where ℓ_k^(3-j) = ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))) is the loss value of sample x_k^(3-j) under the (3-j)-th view;
taking the partial derivative of (8) with respect to v_k^(3-j) gives

∂/∂v_k^(3-j) = ℓ_k^(3-j) - λ^(3-j) - γ v_k^(j),

so the update formula for v_k^(3-j) is

v_k^(3-j) = 1 if ℓ_k^(3-j) ≤ λ^(3-j) + γ v_k^(j), and v_k^(3-j) = 0 otherwise;   (9)

reliable samples are selected from the unlabeled data set in the (3-j)-th view according to equation (9), their weights are set to v_k^(3-j) = 1, and the selected samples are used directly for training the learner of that view;
S2.3) Updating w^(3-j)

equation (4) is equivalent to solving the following optimization problem:

w^(3-j)* = argmin_{w^(3-j)} Σ_{i=1}^{l} ℓ(y_i, g^(3-j)(x_i^(3-j), w^(3-j))) + Σ_{k=l+1}^{l+u} v_k^(3-j) ℓ(y_k, g^(3-j)(x_k^(3-j), w^(3-j))),

i.e. the classifier of this view is retrained on the labeled data together with the pseudo-labeled data selected in the previous step;
S2.4) Updating y_k

the purpose of this step is to update the pseudo-labels of the unlabeled samples;

equation (5) is equivalent to the following optimization problem:

y_k* = argmin_{y_k} Σ_{j=1,2} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))),

which has a global optimal solution: for the k-th unlabeled sample, the label value y_k is obtained from a weighted combination of the learners' predicted values under the two views;
S2.5) Increasing λ^(j)

the number of trustworthy samples grows in each loop by controlling the number of selected samples: assuming the numbers of positive and negative samples selected in the initialization step are a and b respectively, after the k-th execution of S2.4) the numbers of selected positive and negative samples are a·k and b·k respectively;

when all unlabeled samples have been selected into the training data set or the preset maximum number of iterations is reached, the algorithm stops, yielding high-quality labels of all unlabeled data and the two final optimized learners;
wherein w^(j) is a parameter of the learner; v_k^(j) = 1 indicates that unlabeled sample x_k^(j) is selected as a training sample when training the learner of the j-th view, and v_k^(j) = 0 indicates that the sample is not selected into the training data set; λ is the self-paced regularization parameter, and λ^(j) is the self-paced regularization parameter controlling sample selection under the j-th view; g^(j)(x, w) is the learner of that view and ℓ(·,·) is a loss function; x_i^(j) is the feature vector of the i-th sample in the j-th view, where i = 1, …, l+u and j = 1, 2; V^(1), V^(2) are u-dimensional vectors representing the weights of the unlabeled samples in the two views; λ^(1) and λ^(2) are the self-paced regularization parameters controlling sample selection under the first and second views respectively; v_k^(3-j) is the weight of a sample under the (3-j)-th view; w^(3-j) are the parameters of the learner of the (3-j)-th view; γ is the parameter controlling the degree of view correlation; y_k is the pseudo-label of an unlabeled sample; l is the number of samples of the labeled data set and u is the number of samples of the unlabeled data set.
2. The self-paced co-training learning method for person re-identification on the Market-1501 data set according to claim 1, wherein the labeled data set obtained in step S1 is

D_l = {(x_i^(1), x_i^(2), y_i)}_{i=1}^{l}

and the unlabeled data set is

D_u = {(x_k^(1), x_k^(2))}_{k=l+1}^{l+u},

where x_i^(j) ∈ R^{d_j} is the feature vector of the i-th sample in the j-th view, with i = 1, …, l+u and j = 1, 2; d_j is the dimension of the feature space under the j-th view; y_i is the common label of the i-th sample in both views, with i = 1, …, l; l is the number of samples of the labeled data set and u is the number of samples of the unlabeled data set.
3. The self-paced co-training learning method for person re-identification on the Market-1501 data set according to claim 1, wherein the optimization objective of each of the two views in step S2 is expressed as:

min_{w^(j), y_k} Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} ℓ(y_k, g^(j)(x_k^(j), w^(j))),

where the superscript j denotes the j-th view, g^(j)(x, w) is the learner of that view, w^(j) is a parameter of the learner, ℓ(·,·) is a loss function, x_i^(j) is the feature vector of the i-th sample in the j-th view, with i = 1, …, l+u and j = 1, 2; y_i is the common label of the i-th sample in both views, with i = 1, …, l; y_k is the pseudo-label of an unlabeled sample, with k = l+1, …, l+u; l is the number of samples of the labeled data set and u is the number of samples of the unlabeled data set.
4. The self-paced co-training learning method for person re-identification on the Market-1501 data set according to claim 3, wherein the objective function with the embedded self-paced learning mechanism in step S3 is:

min_{w^(j), v^(j), y_k} Σ_{i=1}^{l} ℓ(y_i, g^(j)(x_i^(j), w^(j))) + Σ_{k=l+1}^{l+u} v_k^(j) ℓ(y_k, g^(j)(x_k^(j), w^(j))) + f(v^(j), λ^(j)),

where the superscript j denotes the j-th view, v_k^(j) is the weight of the unlabeled sample x_k^(j) under this view, with k = l+1, …, l+u; v_k^(j) = 1 indicates that the unlabeled sample x_k^(j) is selected as a training sample when training the learner of the j-th view, and otherwise the sample is not selected into the training data set; f(v, λ) = -λ Σ_k v_k is the "hard" form of the self-paced regularization term, and λ is the self-paced regularization parameter, a larger value indicating that more complex samples are selected.
5. The self-paced co-training learning method for person re-identification on the Market-1501 data set according to claim 1, wherein in step S4 the correlation between the two views is imposed by the regularization term -γ(V^(1))^T V^(2), where γ is the parameter controlling the degree of view correlation and V^(1), V^(2) are u-dimensional vectors representing the weights of the unlabeled samples in the two views, the i-th element being v_{l+i}^(j); the two views are thereby made consistent: a sample selected as trustworthy in one view is also encouraged to be trusted in the other view.
CN201710413595.7A 2017-06-05 2017-06-05 Self-paced co-training learning method for person re-identification Expired - Fee Related CN107463996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710413595.7A CN107463996B (en) 2017-06-05 2017-06-05 Self-paced co-training learning method for person re-identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710413595.7A CN107463996B (en) 2017-06-05 2017-06-05 Self-paced co-training learning method for person re-identification

Publications (2)

Publication Number Publication Date
CN107463996A CN107463996A (en) 2017-12-12
CN107463996B (en) 2021-11-16

Family

ID=60546084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710413595.7A Expired - Fee Related CN107463996B (en) 2017-06-05 2017-06-05 Self-walking-collaborative training learning method for people re-marking

Country Status (1)

Country Link
CN (1) CN107463996B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 An image classification method based on semi-supervised self-paced learning of cross-task deep networks
CN108805208B (en) * 2018-06-13 2021-12-31 哈尔滨工业大学 Co-training method based on consistency judgment of unlabeled samples
CN109190676B (en) * 2018-08-06 2022-11-08 百度在线网络技术(北京)有限公司 Model training method, device, equipment and storage medium for image recognition
CN111523673B (en) 2019-02-01 2021-07-27 创新先进技术有限公司 Model training method, device and system
CN110147547A (en) * 2019-04-09 2019-08-20 苏宁易购集团股份有限公司 An intelligent auxiliary labeling method and system based on iterative learning
CN112101574B (en) * 2020-11-20 2021-03-02 成都数联铭品科技有限公司 Machine learning supervised model interpretation method, system and equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8612364B2 (en) * 2009-10-29 2013-12-17 Xerox Corporation Method for categorizing linked documents by co-trained label expansion
CN104463208A (en) * 2014-12-09 2015-03-25 北京工商大学 Multi-view semi-supervised collaboration classification algorithm with combination of agreement and disagreement label rules
CN106446927B (en) * 2016-07-07 2019-05-28 浙江大学 A self-paced enhanced image classification method and system
CN106709425B (en) * 2016-11-25 2020-07-14 西北大学 Golden monkey face detection method based on incremental self-paced learning and regional color quantization

Also Published As

Publication number Publication date
CN107463996A (en) 2017-12-12

Similar Documents

Publication Publication Date Title
CN107463996B (en) Self-paced co-training learning method for person re-identification
US11222196B2 (en) Simultaneous recognition of facial attributes and identity in organizing photo albums
US9436895B1 (en) Method for determining similarity of objects represented in images
Nakajima et al. Full-body person recognition system
CN105095870B (en) Pedestrian re-identification method based on transfer learning
CN109614921B (en) Cell segmentation method based on semi-supervised learning of confrontation generation network
CN110377710A (en) A visual question answering enhancement method based on multi-modal fusion
Ling et al. Improving person re-identification by multi-task learning
CN110490236B (en) Automatic image annotation method, system, device and medium based on neural network
CN108427740B (en) Image emotion classification and retrieval algorithm based on deep metric learning
CN110598018B (en) Sketch image retrieval method based on cooperative attention
CN112115993B (en) Zero sample and small sample evidence photo anomaly detection method based on meta-learning
Tian et al. Aligned dynamic-preserving embedding for zero-shot action recognition
CN111126464A (en) Image classification method based on unsupervised adversarial domain adaptation
CN111832573A (en) Image emotion classification method based on class activation mapping and visual saliency
KR20200095254A (en) Medical image tagging and categorization system and method using Multi-label classification
CN114529900A (en) Semi-supervised domain adaptive semantic segmentation method and system based on feature prototype
CN113177612A (en) Agricultural pest image identification method based on CNN few samples
CN107578015B (en) First impression recognition and feedback system and method based on deep learning
Deng et al. MINet: Meta-learning instance identifiers for video object detection
Zhao et al. Prompting visual-language models for dynamic facial expression recognition
CN110069655B (en) Face searching method for private photo album
Zhu et al. Unsupervised voice-face representation learning by cross-modal prototype contrast
Lee et al. Property-specific aesthetic assessment with unsupervised aesthetic property discovery
He et al. Global and local fusion ensemble network for facial expression recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211116