CN115861625A - Self-label modifying method for processing noise label - Google Patents

Self-label modifying method for processing noise label

Info

Publication number
CN115861625A
Authority
CN
China
Prior art keywords
data
data samples
clean
label
noisy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211554141.9A
Other languages
Chinese (zh)
Inventor
张宇
林凡
米思娅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202211554141.9A
Publication of CN115861625A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a self-label modification method for processing noisy labels. A small batch of data samples is randomly selected and data enhancement is applied to obtain different views, which are used as the input of a pseudo-twin neural network that outputs the predicted probabilities of the sample classes. The JS divergence between the label distribution of each data sample and the predictions of the different networks on the different views is calculated and used to judge how likely the sample is to be a clean data sample. According to a given judgment threshold, the batch is divided into clean data samples and noisy data samples; the labels of the clean data samples are only smoothed, while the noisy data samples are dynamically weighted according to the model prediction and the sample label so as to give them reliable labels. Finally, the model is updated with a classification loss function and a consistency loss function. The method is used to solve the image classification task under label noise and achieves a good performance effect.

Description

Self-label modifying method for processing noise label
Technical Field
The invention belongs to the technical field of computer vision and mainly relates to a self-label modification method for processing noisy labels.
Background
Deep neural networks have made tremendous progress in various computer vision tasks, progress that cannot be separated from large-scale datasets with reliable annotations such as ImageNet. However, collecting well-annotated datasets is very expensive in labor, material, and time, especially in domains requiring expertise (e.g., fine-grained classification). The high cost of acquiring large-scale, well-labeled data constitutes a bottleneck for the use of deep neural networks in real-world scenarios. To alleviate this problem, data annotation companies resort to crowdsourcing for data collection and annotation, data are crawled from the web, only one or a few annotators label the data because of limited budget, or alternative methods such as online queries are adopted to improve labeling efficiency. Unfortunately, although these ways of obtaining data are cheaper and easier to implement, they inevitably produce noisy labels, because annotation from non-experts or error-prone automated labeling systems is unreliable, the number of annotators is limited, and repeated checks cannot be performed.
Due to the complexity of the network structure, deep networks have a very strong ability to overfit noisy labels, and noisy labeled samples are inevitably fitted by the deep neural network, which degrades the performance of the model. Research on robust learning methods against noisy labels is therefore urgent.
Existing research on the noisy label problem mainly follows these directions: (1) estimating the underlying noise transition matrix; the main difficulty is that the noise transition matrix must be estimated accurately, which requires good prior knowledge. (2) Designing noise-tolerant loss functions and correcting the loss according to the predictions of the deep neural network; however, such methods tend to fail when the dataset is large. (3) Training the deep neural network with selected or re-weighted training samples; the main challenge is to design a proper criterion to identify clean data samples, and how to select reliable clean data samples must be considered. (4) Modifying the labels of data samples, mainly by combining the output of a prediction network to modify the labels of samples considered noisy; however, how much confidence to give to the network prediction must be considered. (5) Studying the noisy label problem in the semi-supervised learning setting; the accuracy of many semi-supervised learning classifiers can drop significantly in the presence of label noise.
Disclosure of Invention
Aiming at the problem of model performance degradation caused by noisy labels in the prior art, the invention provides a self-label modification method for processing noisy labels. A small batch of data samples is randomly selected and data enhancement is applied to obtain different views, which are used as the input of a pseudo-twin neural network that outputs the predicted probabilities of the sample classes; the JS divergence between the data sample labels and the predictions of the different networks on the different views is calculated and used to judge how likely each sample is to be a clean data sample. According to a given judgment threshold, the batch is divided into clean data samples and noisy data samples; the labels of the clean data samples are smoothed, while the noisy data samples are dynamically weighted according to the model predictions and the sample labels so as to give them reliable labels, and the model is updated with the proposed classification loss function and consistency loss function. The method achieves a good performance effect both on artificially synthesized noisy datasets and on large-scale noisy datasets from real scenes, and converges faster during training.
In order to achieve the above purpose, the invention adopts the following technical scheme: a self-label modification method for processing noisy labels, comprising the following steps:
S1, in the process of training the model with a data set, randomly select a small batch of data samples
Figure BDA0003982576160000021
and process each data sample X with two data enhancement modes, scaling and cropping, to obtain different views V and V';
S2, take the different views V and V' obtained in step S1 as the input of two pseudo-twin neural networks, and pass the outputs of the two networks through softmax layers to obtain the final predicted outputs P_1, P'_1, P_2, P'_2, where P_1 and P'_1 are generated from the output of network one through its softmax layer with inputs V and V', and P_2 and P'_2 are generated from the output of network two through its softmax layer with inputs V and V';
S3, calculate the difference between the pseudo-twin neural network outputs of step S2 and the label distribution given for the sample, specifically
d_i = D_JS(P_i || Y_i) = (1/2) D_KL(P_i || (P_i + Y_i)/2) + (1/2) D_KL(Y_i || (P_i + Y_i)/2)
where P_i = [P_i^1, P_i^2, ..., P_i^C] is the predicted probability distribution of data sample x_i, the difference being measured with the Jensen-Shannon (JS) divergence; Y_i = [Y_i^1, Y_i^2, ..., Y_i^C] is the real label distribution given for data sample x_i; D_KL(·||·) denotes the Kullback-Leibler (KL) divergence function;
the label distribution of a data sample is a 0-1 distribution in which only the class to which the sample belongs is marked 1 and the rest are 0; to prevent the true number of the logarithm from being 0 during the calculation, the distribution is converted into the following formula for calculation:
Figure BDA0003982576160000033
where the given label is l_i ∈ {1, 2, 3, ..., C} and ε is a hyper-parameter used to control the smoothness of the label distribution;
S4, use the distribution difference d_i obtained in step S3 to calculate the probability that data sample x_i is a clean sample, expressed as
1 - d_i
where 1 - d_i represents the consistency between p_i and y_i;
S5, calculate the threshold for clean data sample selection according to the training round; after the threshold τ_clean is determined, data sample x_i can be preliminarily judged to be a clean data sample if it satisfies the following condition:
1 - d_i ≥ τ_clean
S6, select clean data samples according to the outputs of the two pseudo-twin neural networks; a data sample participates in subsequent model updating only when both neural networks judge it to be clean, and the selected sample set is expressed as:
D_clean = D_clean^(1) ∩ D_clean^(2)
where D_clean^(1) and D_clean^(2) are the results of judging the data samples using the output predictions of the two neural networks, respectively;
S7, divide the training data into two subsets through the judgment and selection of the data samples: a clean data sample set D_clean and a noisy data sample set D_noisy;
S8, process the sample labels in the clean data sample set D_clean with the smoothed label distribution; the expression is as follows:
Figure BDA0003982576160000044
S9, process the sample labels in the noisy data sample set with the help of the pseudo-twin neural networks of step S2; the expression is as follows:
ŷ_i = (1 - ∈) · Ỹ_i + ∈ · p_i
where Ỹ_i is the label smoothed in step S3; p_i is the prediction result output by the pseudo-twin neural network of step S2, one of the two networks being selected; ∈ is the weight given to the model output;
S10, compute the cross-entropy loss between the label distributions modified in steps S8 and S9 and the probability distributions predicted by the model, and calculate the classification loss function; the classification loss is expressed as follows:
Figure BDA0003982576160000047
where data sample x_i yields two different views v_i and v_i' after different data enhancement processing, and the predicted probability distributions output by the two networks for these inputs are denoted p_i1, p'_i1, p_i2, p'_i2; ŷ_i is the modified label distribution, obtained in step S8 for a selected clean data sample and in step S9 for a data sample deemed noisy; N is the number of data samples processed;
S11, calculate the consistency loss function, specifically:
Figure BDA00039825761600000411
where D_KL(·||·) denotes the Kullback-Leibler (KL) divergence function, and p_i1, p'_i1, p_i2, p'_i2 and N have the same meanings as in the classification loss function;
S12, integrate the classification loss function obtained in step S10 and the consistency loss function obtained in step S11, and calculate the overall loss function, expressed as follows:
Figure BDA0003982576160000051
where α is a hyper-parameter used to adjust the weights of the two losses;
S13, calculate the gradient with the overall loss function and update the parameters of the model to obtain the optimal model for handling noisy labels:
Figure BDA0003982576160000052
where θ = {θ_1, θ_2}, and θ_1, θ_2 represent the parameters of the two networks respectively; the training process is repeated after updating; if the set number of iterations has not been reached, step S1 is executed; otherwise the current training round ends and the next training round is executed until training is finished.
Compared with the prior art, the invention has the following beneficial effects: the self-label modification method for processing noisy labels is used to solve the image classification task under label noise and can achieve a higher performance effect on the given datasets. The training data are fully utilized during training, and training relies only on the model itself without depending on an additional auxiliary model; the method achieves a good performance effect both on artificially synthesized noisy datasets and on large-scale noisy datasets from real scenes, and converges faster during training.
Drawings
FIG. 1 is a block diagram of the method of the present invention;
FIG. 2 is a comparison of the classification performance of the invention on the Clothing1M dataset with existing mainstream methods, including Decoupling, Co-training+, JoCoR, and Jo-SRC, where Standard denotes training the network directly on the noisy dataset;
FIG. 3 is a comparison of the classification performance of the invention on the Food101N dataset with existing mainstream methods, including CleanNet and DeepSelf, where Standard denotes training the network directly on the noisy dataset;
FIG. 4 is a comparison of the classification performance of the invention on noisy datasets artificially synthesized from the CIFAR100 dataset with existing mainstream methods, including Decoupling, Co-training+, JoCoR, and Jo-SRC, where Standard denotes training the network directly on the noisy dataset.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention.
Example 1
A self-label modification method for processing noisy labels is provided; the framework of the method is shown in FIG. 1. The invention uses a pseudo-twin neural network to strictly judge whether a data sample is a clean sample. For data samples considered noisy, the method performs self-label modification; it does not depend on an additional auxiliary network, but relies only on the pseudo-twin neural network within the method. Meanwhile, a dynamic weight is given to the output prediction of the pseudo-twin neural network, so that the confidence placed in it becomes more reasonable as training progresses. Finally, the proposed consistency loss and classification loss are used to update the model. The method specifically comprises the following steps:
step S1: in the process of training the model by using the data set, randomly selecting a small batch of data samples
Figure BDA0003982576160000061
For each data sample X, processing the data sample by using a data enhancement technology to obtain different views V and V', and specifically for the same data sample, processing the data sample by using two data enhancement modes of scaling and cropping to obtain two views.
Step S2: the different views V and V' for each sample are taken as inputs to network one and network two. Wherein network one and network two are pseudo-twin neural networks, which can predict labels separately, with different parameters, but updated simultaneously by the same loss function. The output of the network obtains the final predicted output P through the soft-max layer 1 ,P 1 ’,P 2 ,P 2 ', wherein P 1 And P 1 'is generated by the output of network one through soft-max layer, the inputs are V and V', P 2 And P 2 'the output from network two is generated by the soft-max layer, with inputs V and V', respectively. The network architectures of the first network and the second network are the same, but do not share parameters, and the two networks are updated simultaneously by using a random gradient descent method by using the same loss function.
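A minimal PyTorch-style sketch of steps S1-S2 may help make the two-view, two-network structure concrete. The backbone (ResNet-18), image size, crop scale, and class count below are illustrative assumptions; the patent only specifies scaling and cropping as the two enhancement modes and two networks with identical architecture but unshared parameters.

```python
# Hedged sketch of steps S1-S2: two augmented views fed to a pseudo-twin
# (pseudo-siamese) pair of networks. Backbone, image size, and augmentation
# parameters are assumptions for illustration only.
import torch.nn.functional as F
from torchvision import models, transforms

# Two views via scaling and cropping (assumed RandomResizedCrop settings);
# V and V' come from applying the augmentation twice to the same image:
#   v, v_prime = augment(img), augment(img)
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.ToTensor(),
])

# Two networks with identical architecture but independent parameters.
net1 = models.resnet18(num_classes=100)
net2 = models.resnet18(num_classes=100)

def forward_views(v, v_prime):
    """Return softmax predictions P1, P1', P2, P2' for the two views."""
    p1, p1_prime = F.softmax(net1(v), dim=1), F.softmax(net1(v_prime), dim=1)
    p2, p2_prime = F.softmax(net2(v), dim=1), F.softmax(net2(v_prime), dim=1)
    return p1, p1_prime, p2, p2_prime
```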
Step S3: calculate the difference between the pseudo-twin neural network outputs of step S2 and the label distribution given for the sample. For data sample x_i, the method uses the Jensen-Shannon (JS) divergence to measure the difference between the predicted probability distribution P_i = [P_i^1, P_i^2, ..., P_i^C] and the given real label distribution Y_i = [Y_i^1, Y_i^2, ..., Y_i^C]. It is expressed as follows:
d_i = D_JS(P_i || Y_i) = (1/2) D_KL(P_i || (P_i + Y_i)/2) + (1/2) D_KL(Y_i || (P_i + Y_i)/2)   (1)
where D_KL(·||·) denotes the Kullback-Leibler (KL) divergence function.
The label distribution of a data sample is a 0-1 distribution: only the class to which the sample belongs is marked 1 and the rest are 0. To prevent the true number of the logarithm from being 0 during the calculation, the smoothed label distribution is used in formula (1), converting the calculation into formula (2):
Figure BDA0003982576160000072
where the given label is l_i ∈ {1, 2, 3, ..., C} and ε is a hyper-parameter used to control the smoothness of the label distribution. After tuning, 0.7 is selected as the smoothing hyper-parameter ε.
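The following sketch illustrates the smoothed label distribution and the JS divergence of step S3. The exact smoothing formula (2) is not reproduced in the text; the code assumes the true class receives weight ε and the remaining classes share 1 − ε, which is consistent with the reported choice ε = 0.7, and uses base-2 logarithms so that the divergence lies in [0, 1] as stated in step S4.

```python
# Hedged sketch of step S3: smoothed labels and JS divergence between the
# prediction and the label distribution. The smoothing form is an assumption
# consistent with epsilon = 0.7; the formula images are not reproduced here.
import torch

def smooth_labels(labels, num_classes, epsilon=0.7):
    """labels: LongTensor of shape (N,) with values in {0, ..., C-1}."""
    y = torch.full((labels.size(0), num_classes), (1.0 - epsilon) / (num_classes - 1))
    y.scatter_(1, labels.unsqueeze(1), epsilon)   # assumed: true class gets weight epsilon
    return y

def js_divergence(p, y, eps=1e-12):
    """Jensen-Shannon divergence between rows of p and y, in [0, 1] with log base 2."""
    m = 0.5 * (p + y)
    kl_pm = (p * ((p + eps) / (m + eps)).log2()).sum(dim=1)
    kl_ym = (y * ((y + eps) / (m + eps)).log2()).sum(dim=1)
    return 0.5 * kl_pm + 0.5 * kl_ym
```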
Step S4: calculate the probability that data sample x_i is a clean sample. The JS divergence measures the dissimilarity between two probability distributions and lies between 0 and 1. Intuitively, d_i can therefore be used to measure the probability that data sample x_i is a clean sample, which is expressed as
1 - d_i   (3)
In fact, 1 - d_i represents the consistency between p_i and y_i.
Step S5: calculate the threshold for clean data sample selection according to the training round. The invention dynamically adjusts the threshold used to judge whether a data sample is clean in the following manner:
Figure BDA0003982576160000075
where t denotes the training round, Δτ = τ_m - τ_c, τ_c is a hyper-parameter, and τ_m is a user-defined constant. The threshold τ_clean is handled in two stages. In the first stage, 1 ≤ t ≤ t_w, only clean data samples are selected, and the model is updated with the selected data samples without modifying the labels. In the second stage, t_w ≤ t ≤ t_max, the model has acquired a certain prediction capability, and label modification is performed on the data samples judged to be noisy in order to use the data more effectively. The threshold τ_clean changes linearly with training in both stages.
Given a suitable threshold τ_clean, data sample x_i can be preliminarily judged to be a clean data sample if it satisfies the following condition:
1 - d_i ≥ τ_clean   (5)
The proportion of the two training stages is not fixed; during training, the first stage should last long enough for the training performance to approach saturation within that stage, and the second stage needs sufficient training to reach higher performance. The threshold for clean data sample selection during the initial phase should be low enough to prevent too few samples from being selected.
Step S6: select clean data samples by combining the output predictions of the two networks. Because the two networks in the framework have different learning capabilities, they can filter errors caused by different types of noisy labels; to improve the reliability of sample selection, the invention uses the dual-model structure to strengthen the screening of clean labels. A data sample participates in subsequent model updating only when both networks judge it to be clean, and the selected sample set is expressed as:
D_clean = D_clean^(1) ∩ D_clean^(2)   (6)
where D_clean^(1) and D_clean^(2) are the results of judging the data samples using the output predictions of the two networks, respectively.
Although the two networks can filter different types of noise, the thresholds by which they determine whether a data sample is clean are the same at the same stage.
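In code, the dual-network agreement of step S6 reduces to requiring that the clean probability 1 − d_i exceed the shared threshold under both networks' predictions. The sketch below reuses the js_divergence and smooth_labels helpers from the earlier sketch; it is a sketch of the selection rule as rewritten above, not of the original formula image.

```python
# Hedged sketch of step S6: a sample is kept as "clean" only when both
# networks judge it clean under the shared threshold tau_clean.
def select_clean(p1, p2, y_smooth, tau_clean):
    """p1, p2: softmax predictions of network one and two; y_smooth: smoothed labels."""
    w1 = 1.0 - js_divergence(p1, y_smooth)   # clean probability under network one
    w2 = 1.0 - js_divergence(p2, y_smooth)   # clean probability under network two
    clean_mask = (w1 >= tau_clean) & (w2 >= tau_clean)
    return clean_mask                        # True = clean, False = noisy
```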
Step S7: partition the training data. Through the judgment and selection of the data samples, the training data are divided into two subsets: a clean data sample set D_clean and a noisy data sample set D_noisy.
Step S8: process the sample labels of the clean data sample set D_clean. The invention keeps their labels essentially unchanged, but to improve generalization performance and to prevent the true number of the logarithm from being 0 when calculating the cross entropy, the smoothed label distribution is adopted; the expression is as follows:
Figure BDA0003982576160000088
The labels of the samples judged to be clean thus remain consistent with the smoothing used in the previous step when calculating the difference between the network output and the given label distribution of the sample.
Step S9: process the sample labels of the noisy data sample set D_noisy. The invention adopts a self-label modification scheme and relies only on the pseudo-twin neural networks of step S2 to process the sample labels in the noisy data sample set. When a data sample is predicted to be noisy, the given label and the model prediction are in conflict; since it cannot be completely determined whether its label is wrong or correct, the sample's own label and the model prediction output should be given different weights, expressed as follows:
ŷ_i = (1 - ∈) · Ỹ_i + ∈ · p_i   (8)
where Ỹ_i is the label smoothed by formula (2), p_i is the prediction result output by the pseudo-twin neural network of step S2 (one of the two networks is selected), and ∈ is the weight given to the model output; ∈ determines how much the label distribution predicted by the model should be trusted. Considering that the model should become more reliable as training progresses, ∈ should be dynamic and increase as training progresses, while also taking into account whether the predicted output distribution is reasonable. For this purpose ∈ is defined as:
∈=g(t)×l(p) (9)
where g(t) determines how much the learner can be trusted; it is data-independent, and its expression is as follows:
Figure BDA0003982576160000094
where Γ represents the total number of training iteration rounds and t represents the current training round.
l(p) determines how much the predicted label distribution is trusted; it is data-dependent, and its expression is as follows:
l(p) = 1 - H(p)/H(u)   (11)
where H(p) represents the information entropy of the prediction output by the model, and the expression for H(u) is as follows:
H(u) = -log(1/C)   (12)
The model output p_i used here is obtained by weighting the prediction outputs of the two networks, each with a weight of 0.5.
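Step S9's dynamic weight and label modification can be sketched as follows. The form of g(t) (formula (10)) is not reproduced in the text and is assumed here to be the linear ramp t/Γ; l(p) = 1 − H(p)/H(u) and H(u) = −log(1/C) follow the formulas above, and the 0.5/0.5 averaging of the two networks' predictions follows the preceding sentence.

```python
# Hedged sketch of step S9: dynamic weight eps = g(t) * l(p) and self-label
# modification for samples judged noisy. g(t) = t / Gamma is an assumption.
import math
import torch

def dynamic_weight(p, t, total_rounds, num_classes):
    """eps = g(t) * l(p): trust placed in the model prediction, per sample."""
    g_t = t / total_rounds                                  # assumed form of g(t)
    h_p = -(p * (p + 1e-12).log()).sum(dim=1)               # entropy of prediction H(p)
    h_u = -math.log(1.0 / num_classes)                      # uniform entropy H(u) = log C
    l_p = 1.0 - h_p / h_u                                   # l(p) = 1 - H(p)/H(u)
    return g_t * l_p                                        # shape (N,)

def modify_noisy_labels(y_smooth, p1, p2, t, total_rounds, num_classes):
    """Convex combination of the smoothed label and the (averaged) model prediction."""
    p = 0.5 * (p1 + p2)                                     # 0.5/0.5 weighting of the two networks
    eps = dynamic_weight(p, t, total_rounds, num_classes).unsqueeze(1)
    return (1.0 - eps) * y_smooth + eps * p
```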
Step S10: calculate the classification loss function. Cross-entropy losses are computed between the modified label distributions and the probability distributions predicted by the model; the classification loss is expressed as follows:
Figure BDA0003982576160000097
where data sample x_i yields two different views v_i and v_i' after different data enhancement processing, and the predicted probability distributions output by the two networks for these inputs are denoted p_i1, p'_i1, p_i2, p'_i2; ŷ_i is the modified label distribution, derived from formula (7) for a selected clean data sample and from formula (8) for a data sample deemed noisy; N is the number of processed data samples. In the first stage of the dynamic threshold, in order to give the model a certain prediction capability as soon as possible, only clean data sample selection is performed and no label processing is applied to noisy data samples; the classification loss is then:
Figure BDA0003982576160000104
where the sample set in the summation is obtained according to formula (6), i.e. it contains the samples that both networks judge to be clean.
In the second stage of the dynamic threshold, the model has a certain prediction capability and performs label modification on the data judged to be noisy, where N is the number of data samples in the small batch.
The labels used for calculating the classification loss are thus processed differently in the two stages: in the first stage the labels are only smoothed, while in the second stage clean data samples and noisy data samples are processed differently through the steps above.
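A sketch of the classification loss of step S10, assuming a soft cross-entropy between the modified label distribution and each of the four predicted distributions (two networks × two views), averaged over the batch; the exact form of formula (13) is not reproduced in the text, so the averaging scheme is an assumption.

```python
# Hedged sketch of step S10: soft cross-entropy between the modified labels
# and the four predictions. The averaging over the four predictions is assumed.
import torch

def soft_cross_entropy(pred, target, eps=1e-12):
    """Cross entropy with a soft target distribution, averaged over the batch."""
    return -(target * (pred + eps).log()).sum(dim=1).mean()

def classification_loss(p11, p11_prime, p21, p21_prime, y_modified):
    """p11, p11': network one on views V, V'; p21, p21': network two on views V, V'."""
    preds = [p11, p11_prime, p21, p21_prime]
    return sum(soft_cross_entropy(p, y_modified) for p in preds) / len(preds)
```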
Step S11: calculate the consistency loss function. The invention designs a consistency loss that maximizes the consistency between the two classifiers and the consistency between the output predictions of the same network for different views of the input. The expression is as follows:
Figure BDA0003982576160000106
where D_KL(·||·) denotes the Kullback-Leibler (KL) divergence function, and p_i1, p'_i1, p_i2, p'_i2 and N have the same meanings as in the classification loss function.
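A sketch of the consistency loss of step S11; the exact combination of KL terms in formula (15) is not reproduced in the text, so the symmetric cross-network and cross-view terms below are an assumption consistent with the stated goal of maximizing consistency between the two classifiers and between the predictions of the same network on different views.

```python
# Hedged sketch of step S11: consistency built from KL divergences between
# the two networks and between the two views of the same network.
import torch

def kl(p, q, eps=1e-12):
    """KL(p || q) for row-wise probability distributions, averaged over the batch."""
    return (p * ((p + eps) / (q + eps)).log()).sum(dim=1).mean()

def consistency_loss(p11, p11_prime, p21, p21_prime):
    cross_network = kl(p11, p21) + kl(p21, p11)            # agreement between the two networks
    cross_view = kl(p11, p11_prime) + kl(p21, p21_prime)   # agreement across views within a network
    return cross_network + cross_view
```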
Step S12: calculate the total loss function. The classification loss function and the consistency loss function are combined; the overall loss function is expressed as follows:
Figure BDA0003982576160000107
where α is a hyper-parameter used to adjust the weights of the two losses.
Step S13: compute the gradient of the overall loss function and update the parameters of the model:
Figure BDA0003982576160000111
where θ = {θ_1, θ_2} and θ_1, θ_2 represent the parameters of the two networks, respectively. The training process is repeated after updating; if the set number of iterations has not been reached, step S1 is executed again; otherwise the current training round ends and the next training round is executed until training is finished.
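Steps S12-S13 can be sketched as a single joint update of both networks. The combination L = L_cls + α·L_con and the optimizer settings are assumptions (the patent only states that α balances the two losses and that both networks are updated simultaneously by stochastic gradient descent); net1 and net2 refer to the networks from the earlier sketch.

```python
# Hedged sketch of steps S12-S13: total loss and a joint SGD update of both
# networks. The loss combination and optimizer hyper-parameters are assumed.
import itertools
import torch

alpha = 1.0  # assumed weighting hyper-parameter for the consistency loss
optimizer = torch.optim.SGD(
    itertools.chain(net1.parameters(), net2.parameters()),
    lr=0.01, momentum=0.9,
)

def train_step(loss_cls, loss_con):
    loss = loss_cls + alpha * loss_con   # assumed form of the overall loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```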
Test example
The classification performance of the method of the invention on the Clothing1M dataset, the Food101N dataset, and the CIFAR100 dataset is compared with existing advanced methods in the field of noisy label processing. The methods compared differ for each dataset; the specific methods are listed in the description of the drawings, and the comparison results are shown in FIGS. 2-4.
FIG. 2 shows the comparison of the classification performance of the invention with existing advanced methods in the field of noisy label processing on the Clothing1M dataset, including Decoupling, Co-training+, JoCoR, Jo-SRC, and Standard, where Standard trains the network directly on the noisy dataset. It can be seen that the invention obtains the best result, with a performance about 0.2% higher than that of the previously best-performing method, Jo-SRC. However, the training process of Jo-SRC requires predictions from a teacher model and therefore relies on an accurate auxiliary model to generate predictions. The invention makes the network model lighter during training and obtains a higher performance effect on the Clothing1M dataset.
FIG. 3 shows the comparison of the invention with existing advanced methods in the field of noisy label processing on the Food101N dataset, including CleanNet, DeepSelf, and Standard, where Standard trains the network directly on the noisy dataset. The method slightly exceeds the performance of the previously best-performing method, Jo-SRC, which also verifies its effectiveness in handling real-world noise. Moreover, Jo-SRC uses a teacher model during training, which again shows that the present method obtains a better performance effect without depending on an additional auxiliary model.
FIG. 4 shows the comparison of the method with existing advanced methods in the field of noisy label processing on noisy datasets synthesized from the CIFAR100 dataset, including Decoupling, Co-training+, JoCoR, Jo-SRC, and Standard, where Standard trains the network directly on the noisy dataset. The noise types include the "symmetric" and "asymmetric" types; the noise ratio under the symmetric type is set to 0.2, 0.4, and 0.8, and the noise ratio under the asymmetric type is set to 0.4. As shown, the method is consistently superior to the existing advanced methods in the field of noisy label processing.
In summary, a simple and effective method is proposed to solve the performance degradation caused by noisy labels in image classification. Aiming at the lack of reliability of existing methods when judging whether a data sample is clean, the method adopts a dual-model structure to filter the errors caused by different types of noisy labels and maximizes the consistency between the prediction outputs of the two models. For clean data samples, the method smooths their labels to improve the generalization performance of the model and to prevent the true number of the logarithm from being 0 when calculating the cross entropy. For noisy data samples, the method determines their labels from both the model prediction and the originally annotated label, giving a dynamic weight between the prediction and the label; this modification does not depend on other models but only on the model of the framework itself. In addition, a classification loss function and a consistency loss function are proposed to update the model; experiments on synthesized noisy datasets and on a large-scale noisy dataset from a real scene obtain good performance effects and demonstrate the effectiveness of the method.
It should be noted that the above-mentioned contents only illustrate the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and it will be apparent to those skilled in the art that several modifications and embellishments can be made without departing from the principle of the present invention, and these modifications and embellishments fall within the protection scope of the claims of the present invention.

Claims (7)

1. A self-label modification method for processing noisy labels, characterized in that: a small batch of data samples is randomly selected and data enhancement processing is applied to the data samples to obtain different views, which are used as the input of a pseudo-twin neural network that outputs the predicted probabilities of the data sample classes; the JS divergence between the data sample label distribution and the predictions of the different networks on the different views is calculated and used to judge how likely each data sample is to be a clean data sample; according to a given judgment threshold, the batch of data samples is divided into clean data samples and noisy data samples, the labels of the clean data samples are smoothed, the noisy data samples are dynamically weighted according to the model prediction and the sample labels so as to give them reliable labels, and the model is updated with a classification loss function and a consistency loss function.
2. The self-label modification method for processing noisy labels as claimed in claim 1, characterized by comprising the following steps:
S1, in the process of training the model with a data set, randomly select a small batch of data samples
Figure FDA0003982576150000014
and process each data sample X with two data enhancement modes, scaling and cropping, to obtain different views V and V';
S2, take the different views V and V' obtained in step S1 as the input of two pseudo-twin neural networks, and pass the outputs of the two networks through softmax layers to obtain the final predicted outputs P_1, P'_1, P_2, P'_2, where P_1 and P'_1 are generated from the output of network one through its softmax layer with inputs V and V', and P_2 and P'_2 are generated from the output of network two through its softmax layer with inputs V and V';
S3, calculate the difference between the pseudo-twin neural network outputs of step S2 and the label distribution given for the sample, specifically:
d_i = D_JS(P_i || Y_i) = (1/2) D_KL(P_i || (P_i + Y_i)/2) + (1/2) D_KL(Y_i || (P_i + Y_i)/2)
where P_i = [P_i^1, P_i^2, ..., P_i^C] is the predicted probability distribution of data sample x_i, the difference being measured with the Jensen-Shannon (JS) divergence; Y_i = [Y_i^1, Y_i^2, ..., Y_i^C] is the real label distribution given for data sample x_i; D_KL(·||·) denotes the Kullback-Leibler (KL) divergence function;
the label distribution of a data sample is a 0-1 distribution in which only the class to which the sample belongs is marked 1 and the rest are 0; to prevent the true number of the logarithm from being 0 during the calculation, the distribution is converted into the following formula for calculation:
Figure FDA0003982576150000021
where the given label is l_i ∈ {1, 2, 3, ..., C} and ε is a hyper-parameter used to control the smoothness of the label distribution;
S4, use the distribution difference d_i obtained in step S3 to calculate the probability that data sample x_i is a clean sample, expressed as
1 - d_i
where 1 - d_i represents the consistency between p_i and y_i;
S5, calculate the threshold for clean data sample selection according to the training round; after the threshold τ_clean is determined, data sample x_i can be preliminarily judged to be a clean data sample if it satisfies the following condition:
1 - d_i ≥ τ_clean
S6, select clean data samples according to the outputs of the two pseudo-twin neural networks; a data sample participates in subsequent model updating only when both neural networks judge it to be clean, and the selected sample set is expressed as:
D_clean = D_clean^(1) ∩ D_clean^(2)
where D_clean^(1) and D_clean^(2) are the results of judging the data samples using the output predictions of the two neural networks, respectively;
S7, divide the training data into two subsets through the judgment and selection of the data samples: a clean data sample set D_clean and a noisy data sample set D_noisy;
S8, process the sample labels in the clean data sample set D_clean with the smoothed label distribution; the expression is as follows:
Figure FDA0003982576150000034
S9, process the sample labels in the noisy data sample set D_noisy with the help of the pseudo-twin neural networks of step S2; the expression is as follows:
ŷ_i = (1 - ∈) · Ỹ_i + ∈ · p_i
where Ỹ_i is the label smoothed in step S3; p_i is the prediction result output by the pseudo-twin neural network of step S2; ∈ is the weight given to the model output;
S10, compute the cross-entropy loss between the label distributions modified in steps S8 and S9 and the probability distributions predicted by the model, and calculate the classification loss function; the classification loss is expressed as follows:
Figure FDA0003982576150000038
where data sample x_i yields two different views v_i and v_i' after different data enhancement processing, and the predicted probability distributions output by the two networks for these inputs are denoted p_i1, p'_i1, p_i2, p'_i2; ŷ_i is the modified label distribution, obtained in step S8 for a selected clean data sample and in step S9 for a data sample deemed noisy; N is the number of data samples processed;
S11, calculate the consistency loss function, specifically:
Figure FDA00039825761500000312
where D_KL(·||·) denotes the Kullback-Leibler (KL) divergence function, and p_i1, p'_i1, p_i2, p'_i2 and N have the same meanings as in the classification loss function;
S12, integrate the classification loss function obtained in step S10 and the consistency loss function obtained in step S11, and calculate the overall loss function, expressed as follows:
Figure FDA0003982576150000041
where α is a hyper-parameter used to adjust the weights of the two losses;
S13, calculate the gradient with the overall loss function and update the parameters of the model to obtain the optimal model for handling noisy labels:
Figure FDA0003982576150000042
where θ = {θ_1, θ_2}, and θ_1, θ_2 represent the parameters of the two networks respectively; the training process is repeated after updating; if the set number of iterations has not been reached, step S1 is executed; otherwise the current training round ends and the next training round is executed until training is finished.
3. The self-label modification method for processing noisy labels as claimed in claim 2, characterized in that: in step S2, the two pseudo-twin neural networks have the same network structure but do not share parameters, and the two neural networks are updated simultaneously with the same loss function using stochastic gradient descent.
4. The self-label modification method for processing noisy labels as claimed in claim 2, characterized in that: in step S5, the threshold for judging whether a data sample is clean is dynamically adjusted in the following manner:
Figure FDA0003982576150000043
where t denotes the training round; Δτ = τ_m - τ_c, τ_c is a hyper-parameter; τ_m is a user-defined constant; the threshold τ_clean is handled in two stages: in the first stage, 1 ≤ t ≤ t_w, only clean data samples are selected, and the model is updated with the selected data samples without modifying the labels; in the second stage, t_w ≤ t ≤ t_max, the labels of the data samples judged to be noisy are modified; the threshold τ_clean changes linearly with training in both stages.
5. The self-label modification method for processing noisy labels as claimed in claim 2, characterized in that: in step S6, the thresholds by which the two neural networks judge whether a data sample is a clean data sample are the same.
6. The self-label modification method for processing noisy labels as claimed in claim 4 or 5, characterized in that: in step S9, the weight ∈ of the model output is dynamic and increases continuously as training progresses, i.e. ∈ can be defined as:
∈ = g(t) × l(p)
Figure FDA0003982576150000051
l(p) = 1 - H(p)/H(u)
H(u) = -log(1/C)
where Γ represents the total number of training iteration rounds; t represents the current training round; H(p) represents the information entropy of the prediction output by the model.
7. The self-label modification method for processing noisy labels as claimed in claim 6, characterized in that: in step S10, in the first stage of the dynamic threshold, only clean data sample selection is performed and no label processing is applied to the noisy data samples; the classification loss is then:
Figure FDA0003982576150000053
where the data sample set in the loss is the set of samples that both networks judge to be clean;
in the second stage of the dynamic threshold, the model performs label modification on the data judged to be noisy, where N is the number of data samples in the small batch.
CN202211554141.9A 2022-12-06 2022-12-06 Self-label modifying method for processing noise label Pending CN115861625A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211554141.9A CN115861625A (en) 2022-12-06 2022-12-06 Self-label modifying method for processing noise label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211554141.9A CN115861625A (en) 2022-12-06 2022-12-06 Self-label modifying method for processing noise label

Publications (1)

Publication Number Publication Date
CN115861625A true CN115861625A (en) 2023-03-28

Family

ID=85670166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211554141.9A Pending CN115861625A (en) 2022-12-06 2022-12-06 Self-label modifying method for processing noise label

Country Status (1)

Country Link
CN (1) CN115861625A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274682A (en) * 2023-09-14 2023-12-22 电子科技大学 Label-containing noise data classification method based on asynchronous co-training


Similar Documents

Publication Publication Date Title
WO2021155706A1 (en) Method and device for training business prediction model by using unbalanced positive and negative samples
CN109741318B (en) Real-time detection method of single-stage multi-scale specific target based on effective receptive field
US11804074B2 (en) Method for recognizing facial expressions based on adversarial elimination
CN109800778A (en) A kind of Faster RCNN object detection method for dividing sample to excavate based on hardly possible
CN110458084B (en) Face age estimation method based on inverted residual error network
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN113806546B (en) Graph neural network countermeasure method and system based on collaborative training
CN111382686B (en) Lane line detection method based on semi-supervised generation confrontation network
CN107945210B (en) Target tracking method based on deep learning and environment self-adaption
CN110532880B (en) Sample screening and expression recognition method, neural network, device and storage medium
CN112651998B (en) Human body tracking algorithm based on attention mechanism and double-flow multi-domain convolutional neural network
Lin et al. Fairgrape: Fairness-aware gradient pruning method for face attribute classification
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN111145145B (en) Image surface defect detection method based on MobileNet
CN113537630A (en) Training method and device of business prediction model
CN112927266A (en) Weak supervision time domain action positioning method and system based on uncertainty guide training
Duan et al. Age estimation using aging/rejuvenation features with device-edge synergy
CN116343080A (en) Dynamic sparse key frame video target detection method, device and storage medium
CN115861625A (en) Self-label modifying method for processing noise label
CN113989256A (en) Detection model optimization method, detection method and detection device for remote sensing image building
CN113179276A (en) Intelligent intrusion detection method and system based on explicit and implicit feature learning
CN116935054A (en) Semi-supervised medical image segmentation method based on hybrid-decoupling training
CN116884071A (en) Face detection method and device, electronic equipment and storage medium
CN116645562A (en) Detection method for fine-grained fake image and model training method thereof
JP7073171B2 (en) Learning equipment, learning methods and programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination