CN115618935B - Robustness loss function searching method and system for classification task tag noise - Google Patents

Robustness loss function searching method and system for classification task tag noise

Info

Publication number
CN115618935B
CN115618935B · CN202211645114.2A · CN202211645114A
Authority
CN
China
Prior art keywords: loss function, function, model, parameters, noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211645114.2A
Other languages
Chinese (zh)
Other versions
CN115618935A (en)
Inventor
邓岳
杜金阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202211645114.2A priority Critical patent/CN115618935B/en
Publication of CN115618935A publication Critical patent/CN115618935A/en
Application granted granted Critical
Publication of CN115618935B publication Critical patent/CN115618935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a robustness loss function searching method and system for classification task label noise, comprising a deep neural network model selection module, a loss function parameterization module, a classification task data set dividing module, a self-step learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module. A deep neural network model is selected for the classification task, and parameterized loss functions with different expansion orders are constructed based on Taylor expansion. The main body of a double-layer optimization algorithm is constructed and, based on a validation set sample self-selection strategy combined with self-paced learning, the group of hyperparameters that performs best on the validation set is output by combining the implicit function theorem or the penalty function idea. The new loss function composed of the obtained hyperparameters serves as the finally searched loss function that is robust to label noise and guides the retraining of the model. The method realizes fast, automatic search of a loss function robust to label noise, and is simple and portable.

Description

Robustness loss function searching method and system for classification task tag noise
Technical Field
The invention relates to the technical field of deep learning, in particular to a robustness loss function searching method and system for classification task-oriented label noise.
Background
Currently, label noise is prevalent in real-world data sets. When noisy labels are present, a deep neural network tends to overfit them, which degrades the target task; in the medical diagnosis field, for example, overfitting noisy labels can seriously mislead a physician's judgment.
Traditional loss functions either lack robustness to label noise or, when robust, suffer from slow convergence and poor training effect. The conventional Categorical Cross Entropy (CCE) contributes larger gradients to misclassified samples; when the given label is wrong, CCE therefore tends to fit the noisy label and is not robust to label noise. The Mean Absolute Error (MAE) loss contributes gradients uniformly across all samples, so the resulting model is robust to label noise, but it cannot provide effective guidance during training, converges slowly, and yields a poorly trained model.
Manually designed robust loss functions are typically based on good prior knowledge and introduce hyperparameters that must be tuned by hand. Generalized Cross Entropy (GCE) combines the advantages of MAE and CCE by introducing a hyperparameter q: as q tends to 0, GCE is equivalent to CCE, and when q = 1 it reduces to MAE. Symmetric Cross Entropy (SCE), motivated from the standpoint of KL divergence, defines a Reverse Cross Entropy (RCE) based on the model predictions and combines it linearly with CCE through coefficients α and β to obtain robustness to label noise. However, these loss function designs rely on strong prior knowledge, are costly to design, and the introduced hyperparameters make their application to different tasks inflexible.
Traditional automatic loss function search methods based on genetic programming and the like suffer from high computational cost, slow search, discrete search spaces, and the need for a validation set free of label noise. Based on the idea of AutoML, many studies aim to automatically search for a loss function that is robust to label noise. The AutoLoss-Zero method splits a loss function into polynomial combinations of different operators, defines a search space formed by basic operators, represents a loss function with a computation graph, constructs the loss function from basic operations on the graph, and finally searches for the loss function with an evolutionary algorithm. Another common method, noting that existing loss functions already have certain advantages, introduces an adjustable parameter group and searches on a clean validation set with a genetic algorithm for a set of parameters that resists label noise well. However, because the search space is discrete, such methods cannot exploit fast gradient-descent-based search, and the conventional grid search and genetic programming approaches are computationally expensive and slow.
Therefore, how to provide an efficient and fast method and system for searching a loss function that is robust to classification task label noise is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a method and a system for searching a robust loss function for classification task tag noise, which solve the problems of existing approaches to designing label-noise-robust loss functions: high design cost, manually tuned hyperparameters, discrete search spaces, slow search, and the need for a noise-free labeled validation set.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the task tag noise-oriented robustness loss function searching method comprises the following steps:
s1, selecting a deep neural network model for learning different difficulty classification tasks;
s2, constructing parameterized classification task loss functions with different expansion orders according to a Taylor expansion method based on a deep neural network model;
s3, dividing the data set of the classification task into a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing;
s4, setting a verification set sample self-selection strategy combined with self-learning;
s5, constructing a double-layer optimized algorithm main body, and outputting a group of super parameters with the best effect on the verification set by combining hidden function theorem or penalty function thought based on a verification set sample self-selection strategy;
s6, forming a new loss function by the obtained super parameters to serve as a loss function with robustness to the tag noise, which is obtained through final search;
s7, retraining the deep neural network, and guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function to finish the retraining process of the deep neural network.
In practical application, the classification tasks of different difficulty include simple classification tasks such as handwritten digit recognition and complex classification tasks such as large-scale multi-class image classification.
Preferably, the parameterized classification task loss function L_Θ in S2 is:

L_Θ(P_t) = Σ_{j=1}^{N} θ_j · (1 − P_t)^j

wherein P_t is the probability that the model predicts the true class, Θ = {θ_1, …, θ_N} are learnable hyperparameters whose initial values are the cross entropy loss expansion coefficients (θ_j = 1/j), and N is the order of the selected Taylor expansion of the cross entropy loss.
Preferably, the specific content of S4 includes: setting a degradation weight γ in combination with self-paced learning; in each iteration of the outer layer optimization process, the validation samples whose loss does not exceed the self-paced threshold determined by γ, i.e. the samples that the model classifies with high confidence, are screened out as samples whose classification labels are presumed correct, and the number of validation set samples selected according to the loss gradually increases as the number of iterations grows.
Preferably, the goal of the inner layer optimization in S5 is, under the condition of given loss function parameters Θ, to obtain the optimal response ω*(Θ) of the model parameters on the training set D_train; the goal of the outer layer optimization is, based on the current optimal response ω*(Θ), to obtain the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the validation set D_val is best, specifically:

Θ* = argmin_Θ M(ω*(Θ); D_val),  s.t.  ω*(Θ) = argmin_ω L_Θ(ω; D_train)

The inner and outer layers of the double-layer optimization are alternated to obtain the best loss function parameters:

ω_{t+1} = ω_t − η_ω · ∇_ω L_{Θ_t}(ω_t; D_train)

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M(ω_{t+1}(Θ_t); D_val)

wherein η_ω is the learning rate for updating the model parameters, η_Θ is the learning rate for updating the hyperparameters, ω denotes the model parameters, Θ denotes the loss function parameters, and ∇_Θ M is the hypergradient.
Preferably, S5 comprises:
S51, setting a warm-up stage, training the deep neural network model under the guidance of the initialized loss function, and ensuring that the deep neural network learns the ability to correctly classify simple samples;
S52, evaluating, in the outer layer optimization, the performance on the verification set of the model obtained by training in the presence of noise, using a metric function that satisfies the differentiability condition;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, training the model on the training set according to the newly obtained loss function;
S55, repeating steps S52 to S54 until the iterations end or the algorithm converges;
S56, when the double-layer optimization algorithm finishes, outputting the stored set of hyperparameters that best guides the model to overcome the influence of classification label noise.
Preferably, the specific content of S53 includes:
(1) Solving the hypergradient by combining the implicit function theorem:
According to the chain rule of differentiation, the computation of the hypergradient can be converted into the following form:

∇_Θ M(ω_{t+1}(Θ); D_val) = ∂M/∂Θ + (∂M/∂ω_{t+1}) · (∂ω_{t+1}/∂Θ)

wherein ω_{t+1} is the optimal response of the model parameters to the hyperparameters Θ at time t in the inner layer optimization task, ∂ω_{t+1}/∂Θ is the gradient of the inner-layer optimal response with respect to the hyperparameters, M is the metric function evaluating how robust a model trained under noisy conditions is to classification label noise, and D_val is the validation set; the first term on the right of the equation is identically zero and can be ignored.
Assuming the loss function L_Θ has second derivatives with respect to the model parameters ω, then according to the implicit function theorem:

∂ω*/∂Θ = − [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

wherein ∂²L_Θ/∂ω∂ωᵀ is the Hessian matrix, whose inverse is efficiently approximated using a Neumann series:

[∂²L_Θ/∂ω∂ωᵀ]⁻¹ ≈ Σ_{k=0}^{K} (I − ∂²L_Θ/∂ω∂ωᵀ)^k

The hypergradient calculation formula combined with the implicit function theorem is therefore:

∇_Θ M = − (∂M/∂ω) · [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

The loss function parameters are updated from the obtained hypergradient:

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M
(2) Constructing an auxiliary function for the inequality-constrained optimization problem by combining the penalty function idea:

min_Θ f(Θ, ω)   s.t.   m(Θ, ω) ≤ ε

wherein f(Θ, ω) is the objective function and m(Θ, ω) is the constraint function; the auxiliary function augments the objective with a penalty term, weighted by σ, that becomes active only when the constraint is violated, wherein σ and ε are both adjustable hyperparameters. Since the loss function parameters Θ and the model parameters ω appear explicitly in the auxiliary function, when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to Θ is computed.
preferably, the specific content of S54 is: after obtaining the new loss function parameters from S53, iterating on the training set according to the new loss function
Figure 485439DEST_PATH_IMAGE036
Updating model parameters by batch data to obtain optimal response of the model parameters to new loss function parameters>
Figure 916421DEST_PATH_IMAGE011
Preferably, the specific content of S55 is: a fixed number of training rounds T is set; the parameter search process ends when the model has been trained on the training set for T rounds, or the loss function parameter search process ends when the average loss of the model on the validation set rises for T_stop consecutive rounds.
The robustness loss function search system for the classified task tag noise comprises a deep neural network model selection module, a loss function parameterization module, a classified task data set division module, a self-step learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty level of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders based on a Taylor expansion method according to the deep neural network model;
the classification task data set dividing module is used for outputting a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing based on the classification task data set;
the self-step learning module outputs a set verification set sample self-selection strategy combined with self-step learning;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the verification set sample self-selection strategy combined with self-paced learning, and outputting the group of hyperparameters that performs best on the verification set by combining the implicit function theorem or the penalty function idea;
the robust loss function construction module is used for composing a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to the classification task label noise;
and the retraining module is used for guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function, and finishing the retraining process of the deep neural network.
Preferably, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner layer optimization unit for outputting, under the condition of given loss function parameters, the optimal response ω* of the model parameters obtained on the training set;
an outer layer optimization unit for, based on the current optimal response ω*, invoking the verification set sample self-selection strategy combined with self-paced learning and outputting the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the verification set is best.
Compared with the prior art, the invention discloses a robust loss function searching method and system for classification task tag noise. Taylor expansion is adopted to construct a continuous search space that is parameterized by a modest number of variables yet can represent a sufficiently broad family of functions, creating the conditions for gradient-descent-based loss function search. The method combines self-paced learning with gradient-descent-based double-layer optimization and searches the loss function in this continuous space by gradient descent on a validation set that itself contains label noise, finally realizing fast, automatic search of a loss function robust to label noise. Owing to the simplicity and portability of the search algorithm, it can easily be deployed on different classification tasks, is applicable and flexible, and provides a new idea for overcoming label noise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a robust loss function search method according to the present invention;
fig. 2 is a schematic diagram of a double-layer optimization structure for realizing gradient descent-based search loss function parameters.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a robustness loss function searching method for classification task tag noise, which comprises the following steps:
S1, selecting a deep neural network model for learning classification tasks of different difficulty;
S2, constructing parameterized classification task loss functions with different expansion orders according to a Taylor expansion method based on the deep neural network model;
S3, dividing the data set of the classification task into a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing;
S4, setting a verification set sample self-selection strategy combined with self-paced learning;
S5, constructing the main body of the double-layer optimization algorithm and, based on the verification set sample self-selection strategy, outputting the group of hyperparameters that performs best on the verification set by combining the implicit function theorem or the penalty function idea;
S6, composing a new loss function from the obtained hyperparameters to serve as the finally searched loss function that is robust to label noise;
S7, retraining the deep neural network, and guiding the training of a new base model on the training set added with noise and the verification set not added with noise according to the searched loss function, to finish the retraining process of the deep neural network.
In practical application, for image classification tasks, residual neural networks (ResNet) of different depths can be selected according to the learning difficulty of the given image data; for easy-to-learn image classification data sets such as MNIST, a simple convolutional neural network (CNN) with a few layers can be defined as the base model.
In this embodiment, the parameterization of the loss function is implemented by performing a Taylor expansion of the conventional categorical cross entropy. For classification problems, sample labels typically use one-hot coding; on this basis, the cross entropy loss is expanded into a Taylor polynomial of the following form:

CE(P_t) = −log(P_t) = Σ_{j=1}^{∞} (1 − P_t)^j / j

The coefficient 1/j of the j-th term of the expanded cross entropy loss cancels the power of j produced when the polynomial is differentiated with respect to P_t, and this is not the optimal set of coefficients in the presence of label noise. Therefore, the combination of the first N terms of the expansion polynomial is taken as the loss function and the coefficients are set as learnable parameters Θ, yielding the parameterized loss function L_Θ.
In order to further implement the above technical solution, the parameterized classification task loss function L_Θ in S2 is:

L_Θ(P_t) = Σ_{j=1}^{N} θ_j · (1 − P_t)^j

wherein P_t is the probability that the model predicts the true class, Θ = {θ_1, …, θ_N} are learnable parameters whose initial values are the cross entropy loss expansion coefficients (θ_j = 1/j), and N is the order of the selected Taylor expansion of the cross entropy loss.
In this embodiment, each coefficient obtained after the Taylor expansion is a constant, and each coefficient cancels the influence of the corresponding power after differentiation; the lack of robustness of cross entropy to classification label noise stems from this unreasonable distribution of the coefficients' contributions to the final gradient. Therefore, each coefficient of the Taylor expansion is treated as a learnable parameter in this patent in order to find a set of coefficients that is robust to label noise.
The first term of the expansion is the mean absolute error (MAE) loss, which is robust to label noise; this shows that existing robust loss functions can appear as special cases within the search space. At the same time, since polynomials can fit arbitrary functions, the search space constructed by Taylor expansion can represent a sufficiently broad family of functions.
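As a concrete illustration of this parameterization, the following PyTorch-style sketch implements the truncated Taylor-expansion loss with learnable coefficients. It is a minimal sketch under stated assumptions: the class name, the softmax/gather handling, and the PyTorch framing are illustrative choices, not the patent's own code; only the formula L_Θ(P_t) = Σ_j θ_j (1 − P_t)^j and the initial values θ_j = 1/j come from the text.

```python
import torch
import torch.nn as nn

class TaylorCrossEntropyLoss(nn.Module):
    """Truncated Taylor expansion of cross entropy with learnable coefficients.
    A sketch for illustration, not the patent's reference implementation."""
    def __init__(self, num_terms: int = 5):
        super().__init__()
        # Initial values 1/j reproduce -log(p_t) = sum_j (1 - p_t)^j / j (truncated at N terms).
        init = torch.tensor([1.0 / j for j in range(1, num_terms + 1)])
        self.theta = nn.Parameter(init)  # learnable loss-function parameters Θ

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = torch.softmax(logits, dim=-1)
        # p_t: predicted probability of the (possibly noisy) target class
        p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        base = 1.0 - p_t
        # L_Θ(p_t) = Σ_j θ_j (1 - p_t)^j
        powers = torch.stack([base ** j for j in range(1, self.theta.numel() + 1)], dim=1)
        return (powers * self.theta).sum(dim=1).mean()
```

With θ_j fixed at 1/j this loss approximates CCE; with θ_1 = 1 and the remaining θ_j = 0 it reduces to an MAE-like loss, illustrating that known robust losses lie inside the search space.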
In order to further implement the above technical solution, the specific content of S4 includes: setting a degradation weight γ in combination with self-paced learning; in each iteration of the outer layer optimization process, the validation samples whose loss under the model parameters ω to be evaluated does not exceed the self-paced threshold, i.e. the samples that the model classifies with high confidence, are screened out as samples whose classification labels are presumed correct, providing supervision information for the outer layer optimization problem; as the number of iterations increases, the value of γ is gradually reduced, and as γ approaches 0 the number of validation set samples selected according to the loss gradually increases.
In order to further implement the above technical solution, the goal of the inner layer optimization in S5 is, under the condition of given loss function parameters Θ, to obtain the optimal response ω*(Θ) of the model parameters on the training set D_train; the goal of the outer layer optimization is, based on the current optimal response ω*(Θ), to obtain the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the validation set D_val is best, specifically:

Θ* = argmin_Θ M(ω*(Θ); D_val),  s.t.  ω*(Θ) = argmin_ω L_Θ(ω; D_train)

The inner and outer layers of the double-layer optimization are alternated to obtain the best loss function parameters:

ω_{t+1} = ω_t − η_ω · ∇_ω L_{Θ_t}(ω_t; D_train)

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M(ω_{t+1}(Θ_t); D_val)

wherein η_ω is the learning rate for updating the model parameters, η_Θ is the learning rate for updating the hyperparameters, ω denotes the model parameters, Θ denotes the loss function parameters, and ∇_Θ M is the hypergradient.
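The alternating scheme above can be organized as in the following sketch: an inner step updates the model parameters ω under the current loss L_Θ, and an outer step updates Θ from a hypergradient estimated on the selected validation samples. This is a hedged sketch: `hypergradient_fn` stands for either of the two estimators described in S53 and is passed in as a callable returning a tensor shaped like the loss coefficients; all other names, learning rates, and the loop structure are illustrative assumptions.

```python
import torch

def bilevel_search(model, loss_fn, train_loader, val_loader, hypergradient_fn,
                   eta_w=0.1, eta_theta=0.01, rounds=100):
    """Alternate inner (ω) and outer (Θ) updates; a sketch, not the patent's exact procedure."""
    w_opt = torch.optim.SGD(model.parameters(), lr=eta_w)
    for t in range(rounds):
        # Inner step: ω_{t+1} = ω_t − η_ω ∇_ω L_Θ(ω_t; D_train)
        for x, y in train_loader:
            w_opt.zero_grad()
            loss_fn.zero_grad()
            loss_fn(model(x), y).backward()
            w_opt.step()
        # Outer step: Θ_{t+1} = Θ_t − η_Θ ∇_Θ M(ω_{t+1}(Θ); D_val)
        hyper_grad = hypergradient_fn(model, loss_fn, train_loader, val_loader)
        with torch.no_grad():
            loss_fn.theta -= eta_theta * hyper_grad
    return loss_fn
```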
In order to further implement the above technical solution, S5 includes:
S51, setting a warm-up stage, training the deep neural network model under the guidance of the initialized loss function, and ensuring that the deep neural network learns the ability to correctly classify simple samples;
In this embodiment, at the initial stage of training the model has not yet fitted the samples with wrong labels and therefore has good generalization ability; meanwhile, in order to ensure that the model already has the classification ability to distinguish simple samples when sample data are selected on the validation set by the self-paced learning method, a warm-up stage is set in the double-layer optimization process, and during the first rounds of training the initialized loss function (with the initial Taylor coefficients) is used to guide and train the network model normally;
S52, evaluating, in the outer layer optimization, the performance on the validation set of the model obtained by training in the presence of noise, using a metric function that satisfies the differentiability condition;
In this embodiment, based on the validation set samples selected in combination with self-paced learning, the invention chooses the cross entropy, which works well for classification problems, as the metric function M, and takes the cross entropy loss of the current model on the selected samples as the validation set metric;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, training the model on the training set according to the newly obtained loss function;
S55, repeating steps S52 to S54 until the iterations end or the algorithm converges;
S56, when the double-layer optimization algorithm finishes, outputting the stored set of hyperparameters that best guides the model to overcome the influence of classification label noise.
In order to further implement the above technical solution, the specific contents of S53 include:
(1) Solving the hypergradient by combining the implicit function theorem:
According to the chain rule of differentiation, the computation of the hypergradient can be converted into the following form:

∇_Θ M(ω_{t+1}(Θ); D_val) = ∂M/∂Θ + (∂M/∂ω_{t+1}) · (∂ω_{t+1}/∂Θ)

wherein ω_{t+1} is the optimal response of the model parameters to the hyperparameters Θ at time t in the inner layer optimization task, ∂ω_{t+1}/∂Θ is the gradient of the inner-layer optimal response with respect to the hyperparameters, M is the metric function evaluating how robust a model trained under noisy conditions is to classification label noise, and D_val is the validation set;
In this embodiment, the model parameters ω and the loss function parameters Θ are both directly related to the gradient of the (scalar) training set loss with respect to the model parameters, which can be regarded as a function g(ω, Θ). When the parameters ω reach the optimal response in the inner layer optimization, the first-order optimality condition holds, i.e. g(ω*, Θ) = ∂L_Θ(ω*; D_train)/∂ω = 0, which satisfies the condition of the implicit function theorem; then:
Assuming the loss function L_Θ has second derivatives with respect to the model parameters ω, it follows from the implicit function theorem that:

∂ω*/∂Θ = − [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

wherein ∂²L_Θ/∂ω∂ωᵀ is the Hessian matrix, whose inverse is efficiently approximated using a Neumann series:

[∂²L_Θ/∂ω∂ωᵀ]⁻¹ ≈ Σ_{k=0}^{K} (I − ∂²L_Θ/∂ω∂ωᵀ)^k

The hypergradient calculation formula combined with the implicit function theorem is therefore:

∇_Θ M = − (∂M/∂ω) · [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

The loss function parameters are updated from the obtained hypergradient:

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M
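The Neumann-series approximation above can be realized with Hessian-vector products so that the Hessian is never materialized. The following sketch, written in the spirit of implicit-differentiation hyperparameter optimization, computes an approximate hypergradient of the validation metric with respect to the loss coefficients; it assumes the parameterized loss module with learnable attribute `theta` sketched earlier, and the number of Neumann terms K and the scaling α are assumed tuning choices rather than values given in the patent.

```python
import torch

def ift_hypergradient(model, loss_fn, train_batch, val_batch, metric_fn, K=5, alpha=0.1):
    """Approximate ∇_Θ M via the implicit function theorem with a Neumann series.
    A sketch under the stated assumptions, not the patent's exact implementation."""
    w = [p for p in model.parameters() if p.requires_grad]
    x_tr, y_tr = train_batch
    x_val, y_val = val_batch

    train_loss = loss_fn(model(x_tr), y_tr)
    dL_dw = torch.autograd.grad(train_loss, w, create_graph=True)   # ∂L_Θ/∂ω (keeps graph)

    val_metric = metric_fn(model(x_val), y_val)
    v = torch.autograd.grad(val_metric, w)                          # ∂M/∂ω

    # p ≈ (∂M/∂ω) · [∂²L/∂ω∂ωᵀ]⁻¹ using α·Σ_k (I − αH)^k via Hessian-vector products
    cur = [vi.clone() for vi in v]
    p = [alpha * vi for vi in v]
    for _ in range(K):
        hvp = torch.autograd.grad(dL_dw, w, grad_outputs=cur, retain_graph=True)
        cur = [ci - alpha * hi for ci, hi in zip(cur, hvp)]
        p = [pi + alpha * ci for pi, ci in zip(p, cur)]

    # hypergradient = − p · ∂²L/∂ω∂Θ, obtained as a vector-Jacobian product w.r.t. Θ
    hyper, = torch.autograd.grad(dL_dw, [loss_fn.theta], grad_outputs=p)
    return -hyper
```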
(2) Constructing an auxiliary function by combining the penalty function idea:
In this embodiment, when the model parameters ω reach the optimal response, the first-order optimality condition is satisfied, so the inner layer optimization objective is constructed as a soft constraint in combination with the penalty function idea: the model parameters ω achieve the optimal response when the loss of the model on the training set is sufficiently small, i.e. L_Θ(ω; D_train) ≤ ε, while the outer layer optimization objective is to obtain on the validation set the parameters Θ that are optimal under the metric, i.e. min_Θ M(ω; D_val). The double-layer optimization problem is thereby formed as an inequality-constrained optimization problem:

min_Θ f(Θ, ω)   s.t.   m(Θ, ω) ≤ ε

and an auxiliary function is constructed that augments the objective with a penalty term, weighted by σ, which becomes active only when the constraint is violated;
wherein the objective function f(Θ, ω) is instantiated as the cross entropy loss function, i.e. the model guided by the searched loss function should obtain the best robustness to classification label noise on the validation set, and the constraint function m(Θ, ω) ensures that the model's loss on the training set is sufficiently small; σ and ε are both adjustable hyperparameters. Since the loss function parameters Θ and the model parameters ω appear explicitly in the auxiliary function, when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to Θ is computed.
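Because the exact form of the auxiliary function is a design choice, the sketch below assumes a simple exterior quadratic penalty F(Θ, ω) = M(ω; D_val) + σ · max(0, L_Θ(ω; D_train) − ε)², which is one standard way to realize the soft constraint described above; σ, ε, the function names, and this particular penalty form are assumptions, not the patent's stated formula.

```python
import torch

def penalty_step(model, loss_fn, train_batch, val_batch, metric_fn,
                 sigma=10.0, eps=0.05, eta_theta=0.01):
    """One gradient step on the loss coefficients Θ using an assumed auxiliary function
    F(Θ, ω) = M(ω; D_val) + σ · max(0, L_Θ(ω; D_train) − ε)²   (illustrative penalty form)."""
    x_tr, y_tr = train_batch
    x_val, y_val = val_batch

    f = metric_fn(model(x_val), y_val)      # outer objective on the selected validation samples
    m = loss_fn(model(x_tr), y_tr)          # soft constraint: training loss should stay below ε
    F = f + sigma * torch.clamp(m - eps, min=0.0) ** 2

    grad_theta, = torch.autograd.grad(F, loss_fn.theta, allow_unused=True)
    if grad_theta is not None:              # defensive: None only if Θ does not enter the graph
        with torch.no_grad():
            loss_fn.theta -= eta_theta * grad_theta
```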
in order to further implement the above technical solution, the specific content of S54 is: after obtaining the new loss function parameters from S53, iterating on the training set according to the new loss function
Figure 44191DEST_PATH_IMAGE036
Updating model parameters by batch data to obtain optimal response of the model parameters to new loss function parameters>
Figure 62002DEST_PATH_IMAGE011
In order to further implement the above technical solution, the specific content of S55 is: a fixed number of training rounds T is set; the parameter search process ends when the model has been trained on the training set for T rounds, or the loss function parameter search process ends when the average loss of the model on the validation set rises for T_stop consecutive rounds.
Specific examples:
a loss function with robustness to uniformly distributed tag noise for real object classification of aircraft (airland), birds (bird), cats (cat), dogs (dog), horses (horse), ships (ship), trucks (truck), automobiles (automatic), deer (deer), and frogs (frog) is searched based on the CIFAR10 dataset.
S1, aiming at the real object classification problem based on the CIFAR10 data set, a common 18-layer or 32-layer residual neural network (ResNet 18 and ResNet 32) is selected as a basic model of classification prediction.
S2, performing Taylor expansion on the cross entropy loss, which guides classification well on a clean data set, intercepting the first N = 5 terms as the loss function, and taking the coefficients as learnable parameters:

L_Θ(P_t) = Σ_{j=1}^{5} θ_j · (1 − P_t)^j

wherein Θ = {θ_1, …, θ_5} are the learnable parameters, whose initial values are the cross entropy expansion coefficients θ_j = 1/j, i.e. Θ_0 = {1, 1/2, 1/3, 1/4, 1/5}.
S3, dividing the 50000 training data samples of the CIFAR10 dataset into a training set D_train containing 45000 samples and a validation set D_val containing 5000 samples. Meanwhile, since the original CIFAR10 dataset contains no label noise, uniformly distributed noise with percentage p is artificially added to the training set and the validation set, i.e. sample labels are randomly converted into labels of other categories with probability p.
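The uniform (symmetric) label noise described here can be injected as in the following sketch: with probability p each label is replaced by a label drawn uniformly from the other classes. The function and variable names are illustrative assumptions.

```python
import numpy as np

def add_uniform_label_noise(labels: np.ndarray, p: float, num_classes: int = 10,
                            seed: int = 0) -> np.ndarray:
    """Flip each label to a uniformly chosen different class with probability p."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p
    # Draw a random offset in [1, num_classes - 1] so the new label always differs.
    offsets = rng.integers(1, num_classes, size=flip.sum())
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy

# Example usage: corrupt 40% of the CIFAR10 training labels.
# noisy_train_labels = add_uniform_label_noise(train_labels, p=0.4)
```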
S4, setting the degradation weight γ combined with self-paced learning; in the outer layer optimization of the iterative process, the number of validation set samples selected according to the loss gradually increases as the number of iterations grows.
S5, constructing the main body of the double-layer optimization algorithm and setting the total number of training rounds T.
S51, setting the first several rounds of training as the warm-up phase, during which the parameters of the loss function remain unchanged.
S52, starting the outer layer optimization after the warm-up phase ends. In each training round, after the model parameters ω have passed through a number of batches of training data, the validation set samples whose loss, computed with the current ω and the loss L_Θ, does not exceed the self-paced threshold are selected to form the selected validation subset. Several batches of validation data are then sampled from this subset, and the metric function M is used to evaluate the current model on the selected samples, obtaining the outer-layer metric value.
S53, sampling several batches of training data from the training set and computing the loss of the current model on them; combining this with the outer-layer metric obtained in S52, the parameters of the current loss function are updated by gradient descent using the loss function parameter update algorithm based on the implicit function theorem or the penalty function idea, obtaining a new set of loss function parameters.
S54, under the new loss function, guiding the model to train on I batches of training data.
S55, repeating steps S52 to S54 until the T rounds of training are completed, or ending the loss function parameter search process when, during training, the average loss of the model on the validation set rises for T_stop consecutive rounds.
S56, when the double-layer optimization algorithm ends, outputting the saved set of hyperparameters Θ* with the best effect, and taking the loss function L_Θ* composed of this set of parameters as the finally searched loss function that is robust to label noise.
S6, according to the searched loss function, guiding training of a new basic model on the training set added with noise and the verification set not added with noise, and finishing the retraining process of the deep neural network. The classification model obtained by the process has good robustness to the label noise.
The robustness loss function search system for the classified task tag noise comprises a deep neural network model selection module, a loss function parameterization module, a classified task data set division module, a self-step learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty level of the classification task;
specifically, according to the difficulty level of the classification task, base models with different adaptation degrees are selected as the base depth neural network model of the classification task, the input of the base model is the original data of a sample, and the output is the classification result of the model on the sample;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders based on a Taylor expansion method according to the deep neural network model;
specifically, taylor expansion is carried out on the traditional loss functions of different classification tasks, the first N items of the Taylor expansion polynomial are intercepted to serve as the loss functions, the coefficients of the items are used as the learnable parameters, the current value is used as the initial value of the parameters, and the parameterization of the loss functions is realized;
the classification task data set dividing module is used for outputting a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing based on the classification task data set;
the self-step learning module outputs a set verification set sample self-selection strategy combined with self-step learning;
specifically, when the metric of the model on the verification set is calculated in the double-layer optimization algorithm, the number of small-loss samples selected by the model itself gradually increases as the number of training rounds grows, so that self-paced learning from simple samples to difficult samples is achieved on the verification set;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the verification set sample self-selection strategy combined with self-paced learning, and outputting the group of hyperparameters that performs best on the verification set by combining the implicit function theorem or the penalty function idea;
specifically, the implicit functional relation of the model parameters with respect to the loss function parameters is treated as an implicit function, the derivative of the model parameters with respect to the loss function parameters is solved with the implicit function theorem, and the gradient of the model's metric on the verification set with respect to the loss function parameters is then computed; alternatively, an unconstrained auxiliary optimization function is constructed from the constraint conditions, the optimization equation of the metric on the verification set with respect to the loss function parameters is obtained explicitly, and the gradient of the metric with respect to the loss function parameters is computed;
the robust loss function construction module is used for composing a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to the classification task label noise;
and the retraining module is used for guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function, and finishing the retraining process of the deep neural network.
In order to further implement the technical scheme, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner layer optimization unit for outputting, under the condition of given loss function parameters, the optimal response ω* of the model parameters obtained on the training set;
an outer layer optimization unit for, based on the current optimal response ω*, invoking the verification set sample self-selection strategy combined with self-paced learning and outputting the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the verification set is best.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The robustness loss function searching method for classifying task tag noise is characterized by comprising the following steps of:
S1, selecting a deep neural network model for learning classification tasks of different difficulty;
S2, constructing parameterized classification task loss functions with different expansion orders according to a Taylor expansion method based on the deep neural network model;
S3, dividing the data set of the classification task into a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing;
S4, setting a verification set sample self-selection strategy combined with self-paced learning;
S5, constructing the main body of the double-layer optimization algorithm and, based on the verification set sample self-selection strategy, outputting a group of hyperparameters on the verification set by combining the implicit function theorem or the penalty function idea;
S6, composing a new loss function from the obtained hyperparameters to serve as the finally searched loss function that is robust to label noise;
S7, retraining the deep neural network, and guiding the training of a new base model on the training set added with noise and the verification set not added with noise according to the searched loss function, to finish the retraining process of the deep neural network.
2. The method for searching for a robust loss function for classification task tag noise of claim 1, wherein the parameterized classification task loss function L_Θ in S2 is:

L_Θ(P_t) = Σ_{j=1}^{N} θ_j · (1 − P_t)^j

wherein P_t is the probability that the model predicts the true class, Θ = {θ_1, …, θ_N} are the learnable parameters whose initial values are the cross entropy loss expansion coefficients, and N is the order of the selected Taylor expansion of the cross entropy loss.
3. The method for searching for a robust loss function for classifying task tag noise according to claim 1, wherein the specific content of S4 comprises: setting a degradation weight γ combined with self-paced learning; in each iteration of the outer layer optimization process, selecting the samples whose loss does not exceed the self-paced threshold determined by γ, i.e. screening out the samples that the model classifies with high confidence as samples whose classification labels are correct; the number of validation set samples selected according to the loss gradually increases as the number of iterations grows.
4. The method for searching for a robust loss function for classification task tag noise according to claim 1, wherein the goal of the inner layer optimization in S5 is, under the condition of given loss function parameters Θ, to obtain the response ω*(Θ) of the model parameters on the training set D_train; the goal of the outer layer optimization is, based on the current response ω*(Θ), to obtain the parameters Θ* on the validation set D_val, specifically:

Θ* = argmin_Θ M(ω*(Θ); D_val),  s.t.  ω*(Θ) = argmin_ω L_Θ(ω; D_train)

The loss function parameters are obtained by alternately performing the inner and outer layer optimizations:

ω_{t+1} = ω_t − η_ω · ∇_ω L_{Θ_t}(ω_t; D_train)

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M(ω_{t+1}(Θ_t); D_val)

wherein η_ω is the learning rate for updating the model parameters, η_Θ is the learning rate for updating the hyperparameters, ω denotes the model parameters, Θ denotes the loss function parameters, and ∇_Θ M is the hypergradient.
5. The method for searching for a robustness loss function for classification task tag noise of claim 1, wherein S5 comprises:
S51, setting a warm-up stage, training the deep neural network model under the guidance of the initialized loss function, and ensuring that the deep neural network learns the ability to correctly classify simple samples;
S52, evaluating, in the outer layer optimization, the performance on the verification set of the model obtained by training in the presence of noise, using a metric function that satisfies the differentiability condition;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, training the model on the training set according to the newly obtained loss function;
S55, repeating steps S52 to S54 until the iterations end or the algorithm converges;
S56, when the double-layer optimization algorithm finishes, outputting the stored set of hyperparameters used to guide the model.
6. The method for searching for a robustness loss function for classifying task tag noise according to claim 5, wherein the specific contents of S53 include:
(1) Solving the hypergradient by combining the implicit function theorem:
According to the chain rule of differentiation, the computation of the hypergradient can be converted into the following form:

∇_Θ M(ω_{t+1}(Θ); D_val) = ∂M/∂Θ + (∂M/∂ω_{t+1}) · (∂ω_{t+1}/∂Θ)

wherein ω_{t+1} is the response of the model parameters in the inner layer optimization task to the hyperparameters Θ at time t, ∂ω_{t+1}/∂Θ is the gradient of the inner-layer response with respect to the hyperparameters, M is the metric function, and D_val is the validation set;
assuming the loss function L_Θ has second derivatives with respect to the model parameters ω, then according to the implicit function theorem:

∂ω*/∂Θ = − [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

wherein ∂²L_Θ/∂ω∂ωᵀ is the Hessian matrix, whose inverse is efficiently approximated using a Neumann series:

[∂²L_Θ/∂ω∂ωᵀ]⁻¹ ≈ Σ_{k=0}^{K} (I − ∂²L_Θ/∂ω∂ωᵀ)^k

the hypergradient calculation formula combined with the implicit function theorem is:

∇_Θ M = − (∂M/∂ω) · [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

and the loss function parameters are updated from the obtained hypergradient:

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M

(2) Constructing an auxiliary function for the inequality-constrained optimization problem by combining the penalty function idea:

min_Θ f(Θ, ω)   s.t.   m(Θ, ω) ≤ ε

wherein f(Θ, ω) is the objective function and m(Θ, ω) is the constraint function; the auxiliary function augments the objective with a penalty term, weighted by σ, that becomes active only when the constraint is violated, wherein σ and ε are both adjustable hyperparameters; since the loss function parameters Θ and the model parameters ω appear explicitly, when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameters Θ is computed.
7. The method for searching for a robust loss function for classification task tag noise of claim 5, wherein the specific content of S54 is: after obtaining the new loss function parameters in S53, iterating over I batches of data on the training set to update the model parameters according to the new loss function, so as to obtain the response ω* of the model parameters to the new loss function parameters.
8. The method for searching for a robust loss function for classification task tag noise according to claim 5, wherein the specific content of S55 is: setting a fixed number of training rounds T; the parameter search process ends when the model has been trained for T rounds on the training set, or the loss function parameter search process ends when the average loss of the model on the validation set rises for T_stop consecutive rounds.
9. The robust loss function searching system for classifying task tag noise is based on the robust loss function searching method for classifying task tag noise according to any one of claims 1-8, and is characterized by comprising a deep neural network model selection module, a loss function parameterization module, a classifying task data set division module, a self-step learning module, a double-layer optimization module, a robust loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty level of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders based on a Taylor expansion method according to the deep neural network model;
the classification task data set dividing module is used for outputting a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing based on the classification task data set;
the self-step learning module outputs a set verification set sample self-selection strategy combined with self-step learning;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the verification set sample self-selection strategy combined with self-paced learning, and outputting a group of hyperparameters on the verification set by combining the implicit function theorem or the penalty function idea;
the robust loss function construction module is used for composing a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to the classification task label noise;
and the retraining module is used for guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function, and finishing the retraining process of the deep neural network.
10. The classification task-oriented label noise robustness loss function search system of claim 9, wherein the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner layer optimization unit for outputting, under the condition of given loss function parameters, the response ω* of the model parameters obtained on the training set;
an outer layer optimization unit for, based on the current response ω*, invoking the verification set sample self-selection strategy combined with self-paced learning and outputting the parameters Θ* obtained on the verification set.
CN202211645114.2A 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise Active CN115618935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211645114.2A CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211645114.2A CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Publications (2)

Publication Number Publication Date
CN115618935A CN115618935A (en) 2023-01-17
CN115618935B true CN115618935B (en) 2023-05-05

Family

ID=84879818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211645114.2A Active CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Country Status (1)

Country Link
CN (1) CN115618935B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446927B (en) * 2016-07-07 2019-05-28 浙江大学 It is a kind of to enhance image classification method and system from step
WO2019207581A1 (en) * 2018-04-22 2019-10-31 Technion Research & Development Foundation Limited System and method for emulating quantization noise for a neural network
CN109242028A (en) * 2018-09-19 2019-01-18 西安电子科技大学 SAR image classification method based on 2D-PCA and convolutional neural networks
CN110110780B (en) * 2019-04-30 2023-04-07 南开大学 Image classification method based on antagonistic neural network and massive noise data
CN112101328A (en) * 2020-11-19 2020-12-18 四川新网银行股份有限公司 Method for identifying and processing label noise in deep learning
CN113537389B (en) * 2021-08-05 2023-11-07 京东科技信息技术有限公司 Robust image classification method and device based on model embedding
CN114445662A (en) * 2022-01-25 2022-05-06 南京理工大学 Robust image classification method and system based on label embedding
CN114201632B (en) * 2022-02-18 2022-05-06 南京航空航天大学 Label noisy data set amplification method for multi-label target detection task

Also Published As

Publication number Publication date
CN115618935A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
US11816183B2 (en) Methods and systems for mining minority-class data samples for training a neural network
CN110914839B (en) Selective training of error decorrelation
US20210201107A1 (en) Neural architecture search based on synaptic connectivity graphs
US20230229891A1 (en) Reservoir computing neural networks based on synaptic connectivity graphs
US11593627B2 (en) Artificial neural network architectures based on synaptic connectivity graphs
US11625611B2 (en) Training artificial neural networks based on synaptic connectivity graphs
CN103577694B (en) Aquaculture water quality short-time combination forecast method on basis of multi-scale analysis
CN111985601A (en) Data identification method for incremental learning
US11631000B2 (en) Training artificial neural networks based on synaptic connectivity graphs
CN103559537B (en) Based on the template matching method of error back propagation in a kind of out of order data stream
CN116596044B (en) Power generation load prediction model training method and device based on multi-source data
CN109558898B (en) Multi-choice learning method with high confidence based on deep neural network
CN112215412B (en) Dissolved oxygen prediction method and device
CN116542382A (en) Sewage treatment dissolved oxygen concentration prediction method based on mixed optimization algorithm
Moldovan et al. Chicken swarm optimization and deep learning for manufacturing processes
CN112508177A (en) Network structure searching method and device, electronic equipment and storage medium
CN115618935B (en) Robustness loss function searching method and system for classification task tag noise
CN116882539A (en) Water quality data prediction method based on improved Re-GCN model
Hao et al. A Model-Agnostic approach for learning with noisy labels of arbitrary distributions
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
Abusnaina et al. Enhanced MWO training algorithm to improve classification accuracy of artificial neural networks
CN115423091A (en) Conditional antagonistic neural network training method, scene generation method and system
Naik et al. Rainfall prediction based on deep neural network: a review
CN115346084A (en) Sample processing method, sample processing apparatus, electronic device, storage medium, and program product
CN114139783A (en) Wind power short-term power prediction method and device based on nonlinear weighted combination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant