CN115618935B - Robustness loss function searching method and system for classification task tag noise - Google Patents

Robustness loss function searching method and system for classification task tag noise

Info

Publication number
CN115618935B
CN115618935B · CN202211645114.2A · CN202211645114A
Authority
CN
China
Prior art keywords: loss function, function, model, parameters, noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211645114.2A
Other languages
Chinese (zh)
Other versions
CN115618935A (en)
Inventor
邓岳
杜金阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202211645114.2A priority Critical patent/CN115618935B/en
Publication of CN115618935A publication Critical patent/CN115618935A/en
Application granted granted Critical
Publication of CN115618935B publication Critical patent/CN115618935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a robustness loss function searching method and system for classification task label noise, comprising a deep neural network model selection module, a loss function parameterization module, a classification task data set dividing module, a self-step learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module. A deep neural network model is selected for the classification task, and parameterized loss functions with different expansion orders are constructed based on Taylor expansion. The main body of a double-layer optimization algorithm is constructed and, based on a validation set sample self-selection strategy combined with self-paced learning, the group of hyperparameters that performs best on the validation set is output by combining the implicit function theorem or the penalty function idea. The new loss function composed of the obtained hyperparameters serves as the finally searched loss function that is robust to label noise and guides the retraining of the model. The method realizes fast, automatic search of a loss function robust to label noise, and is simple and portable.

Description

Robustness loss function searching method and system for classification task tag noise
Technical Field
The invention relates to the technical field of deep learning, in particular to a robustness loss function searching method and system for classification task-oriented label noise.
Background
Currently, label noise is prevalent in real-world data sets. When noisy labels are present, a deep neural network tends to overfit them, which degrades the target task; in the medical diagnosis field, for example, overfitting noisy labels can seriously mislead a physician's judgment.
Traditional loss functions either lack robustness to label noise or, when robust, suffer from slow convergence and poor training effect. The conventional Categorical Cross Entropy (CCE) contributes larger gradients to misclassified samples; when the given label is wrong, CCE therefore tends to fit the noisy label and is not robust to label noise. The Mean Absolute Error (MAE) loss contributes gradients uniformly across all samples, so the resulting model is robust to label noise, but it cannot provide effective guidance during training, converges slowly, and yields a poorly trained model.
Manually designed robust loss functions are typically based on good prior knowledge and introduce hyperparameters that must be tuned by hand. Generalized Cross Entropy (GCE) combines the advantages of MAE and CCE by introducing a hyperparameter q: as q tends to 0, GCE is equivalent to CCE, and when q = 1 it reduces to MAE. Symmetric Cross Entropy (SCE), motivated from the standpoint of KL divergence, defines a Reverse Cross Entropy (RCE) based on the model predictions and combines it linearly with CCE through coefficients α and β to obtain robustness to label noise. However, these loss function designs rely on strong prior knowledge, are costly to design, and the introduced hyperparameters make their application to different tasks inflexible.
Traditional automatic loss function search methods based on genetic programming and the like suffer from high computational cost, slow search, discrete search spaces, and the need for a validation set free of label noise. Based on the idea of AutoML, many studies aim to automatically search for a loss function that is robust to label noise. The AutoLoss-Zero method splits a loss function into polynomial combinations of different operators, defines a search space formed by basic operators, represents a loss function with a computation graph, constructs the loss function from basic operations on the graph, and finally searches for the loss function with an evolutionary algorithm. Another common method, noting that existing loss functions already have certain advantages, introduces an adjustable parameter group and searches on a clean validation set with a genetic algorithm for a set of parameters that resists label noise well. However, because the search space is discrete, such methods cannot exploit fast gradient-descent-based search, and the conventional grid search and genetic programming approaches are computationally expensive and slow.
Therefore, how to provide an efficient and fast method and system for searching a loss function that is robust to classification task label noise is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a method and a system for searching a robust loss function for classification task tag noise, which solve the problems of existing approaches to designing label-noise-robust loss functions: high design cost, manually tuned hyperparameters, discrete search spaces, slow search, and the need for a noise-free labeled validation set.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the task tag noise-oriented robustness loss function searching method comprises the following steps:
s1, selecting a deep neural network model for learning different difficulty classification tasks;
s2, constructing parameterized classification task loss functions with different expansion orders according to a Taylor expansion method based on a deep neural network model;
s3, dividing the data set of the classification task into a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing;
s4, setting a verification set sample self-selection strategy combined with self-learning;
s5, constructing a double-layer optimized algorithm main body, and outputting a group of super parameters with the best effect on the verification set by combining hidden function theorem or penalty function thought based on a verification set sample self-selection strategy;
s6, forming a new loss function by the obtained super parameters to serve as a loss function with robustness to the tag noise, which is obtained through final search;
s7, retraining the deep neural network, and guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function to finish the retraining process of the deep neural network.
In practical application, the classification tasks of different difficulty include simple classification tasks such as handwritten digit recognition and complex classification tasks such as large-scale multi-class image classification.
Preferably, the parameterized classification task loss function L_Θ in S2 is:

L_Θ(P_t) = Σ_{j=1}^{N} θ_j · (1 − P_t)^j

wherein P_t is the probability that the model predicts the true class, Θ = {θ_1, …, θ_N} are learnable hyperparameters whose initial values are the cross entropy loss expansion coefficients (θ_j = 1/j), and N is the order of the selected Taylor expansion of the cross entropy loss.
Preferably, the specific content of S4 includes: setting a degradation weight γ in combination with self-paced learning; in each iteration of the outer layer optimization process, the validation samples whose loss does not exceed the self-paced threshold determined by γ, i.e. the samples that the model classifies with high confidence, are screened out as samples whose classification labels are presumed correct, and the number of validation set samples selected according to the loss gradually increases as the number of iterations grows.
Preferably, the goal of the inner layer optimization in S5 is, under the condition of given loss function parameters Θ, to obtain the optimal response ω*(Θ) of the model parameters on the training set D_train; the goal of the outer layer optimization is, based on the current optimal response ω*(Θ), to obtain the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the validation set D_val is best, specifically:

Θ* = argmin_Θ M(ω*(Θ); D_val),  s.t.  ω*(Θ) = argmin_ω L_Θ(ω; D_train)

The inner and outer layers of the double-layer optimization are alternated to obtain the best loss function parameters:

ω_{t+1} = ω_t − η_ω · ∇_ω L_{Θ_t}(ω_t; D_train)

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M(ω_{t+1}(Θ_t); D_val)

wherein η_ω is the learning rate for updating the model parameters, η_Θ is the learning rate for updating the hyperparameters, ω denotes the model parameters, Θ denotes the loss function parameters, and ∇_Θ M is the hypergradient.
Preferably, S5 comprises:
S51, setting a warm-up stage, training the deep neural network model under the guidance of the initialized loss function, and ensuring that the deep neural network learns the ability to correctly classify simple samples;
S52, evaluating, in the outer layer optimization, the performance on the verification set of the model obtained by training in the presence of noise, using a metric function that satisfies the differentiability condition;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, training the model on the training set according to the newly obtained loss function;
S55, repeating steps S52 to S54 until the iterations end or the algorithm converges;
S56, when the double-layer optimization algorithm finishes, outputting the stored set of hyperparameters that best guides the model to overcome the influence of classification label noise.
Preferably, the specific content of S53 includes:
(1) Solving the hypergradient by combining the implicit function theorem:
According to the chain rule of differentiation, the computation of the hypergradient can be converted into the following form:

∇_Θ M(ω_{t+1}(Θ); D_val) = ∂M/∂Θ + (∂M/∂ω_{t+1}) · (∂ω_{t+1}/∂Θ)

wherein ω_{t+1} is the optimal response of the model parameters to the hyperparameters Θ at time t in the inner layer optimization task, ∂ω_{t+1}/∂Θ is the gradient of the inner-layer optimal response with respect to the hyperparameters, M is the metric function evaluating how robust a model trained under noisy conditions is to classification label noise, and D_val is the validation set; the first term on the right of the equation is identically zero and can be ignored.
Assuming the loss function L_Θ has second derivatives with respect to the model parameters ω, then according to the implicit function theorem:

∂ω*/∂Θ = − [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

wherein ∂²L_Θ/∂ω∂ωᵀ is the Hessian matrix, whose inverse is efficiently approximated using a Neumann series:

[∂²L_Θ/∂ω∂ωᵀ]⁻¹ ≈ Σ_{k=0}^{K} (I − ∂²L_Θ/∂ω∂ωᵀ)^k

The hypergradient calculation formula combined with the implicit function theorem is therefore:

∇_Θ M = − (∂M/∂ω) · [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

The loss function parameters are updated from the obtained hypergradient:

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M
(2) Constructing an auxiliary function for the inequality-constrained optimization problem by combining the penalty function idea:

min_Θ f(Θ, ω)   s.t.   m(Θ, ω) ≤ ε

wherein f(Θ, ω) is the objective function and m(Θ, ω) is the constraint function; the auxiliary function augments the objective with a penalty term, weighted by σ, that becomes active only when the constraint is violated, wherein σ and ε are both adjustable hyperparameters. Since the loss function parameters Θ and the model parameters ω appear explicitly in the auxiliary function, when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to Θ is computed.
preferably, the specific content of S54 is: after obtaining the new loss function parameters from S53, iterating on the training set according to the new loss function
Figure 485439DEST_PATH_IMAGE036
Updating model parameters by batch data to obtain optimal response of the model parameters to new loss function parameters>
Figure 916421DEST_PATH_IMAGE011
Preferably, the specific content of S55 is: a fixed number of training rounds T is set; the parameter search process ends when the model has been trained on the training set for T rounds, or the loss function parameter search process ends when the average loss of the model on the validation set rises for T_stop consecutive rounds.
The robustness loss function search system for the classified task tag noise comprises a deep neural network model selection module, a loss function parameterization module, a classified task data set division module, a self-step learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty level of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders based on a Taylor expansion method according to the deep neural network model;
the classification task data set dividing module is used for outputting a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing based on the classification task data set;
the self-step learning module outputs a set verification set sample self-selection strategy combined with self-step learning;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the verification set sample self-selection strategy combined with self-paced learning, and outputting the group of hyperparameters that performs best on the verification set by combining the implicit function theorem or the penalty function idea;
the robust loss function construction module is used for composing a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to the classification task label noise;
and the retraining module is used for guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function, and finishing the retraining process of the deep neural network.
Preferably, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner layer optimization unit for outputting, under the condition of given loss function parameters, the optimal response ω* of the model parameters obtained on the training set;
an outer layer optimization unit for, based on the current optimal response ω*, invoking the verification set sample self-selection strategy combined with self-paced learning and outputting the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the verification set is best.
Compared with the prior art, the invention discloses a robust loss function searching method and system for classification task tag noise. Taylor expansion is adopted to construct a continuous search space that is parameterized by a modest number of variables yet can represent a sufficiently broad family of functions, creating the conditions for gradient-descent-based loss function search. The method combines self-paced learning with gradient-descent-based double-layer optimization and searches the loss function in this continuous space by gradient descent on a validation set that itself contains label noise, finally realizing fast, automatic search of a loss function robust to label noise. Owing to the simplicity and portability of the search algorithm, it can easily be deployed on different classification tasks, is applicable and flexible, and provides a new idea for overcoming label noise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a robust loss function search method according to the present invention;
fig. 2 is a schematic diagram of a double-layer optimization structure for realizing gradient descent-based search loss function parameters.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a robustness loss function searching method for classification task tag noise, which comprises the following steps:
S1, selecting a deep neural network model for learning classification tasks of different difficulty;
S2, constructing parameterized classification task loss functions with different expansion orders according to a Taylor expansion method based on the deep neural network model;
S3, dividing the data set of the classification task into a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing;
S4, setting a verification set sample self-selection strategy combined with self-paced learning;
S5, constructing the main body of the double-layer optimization algorithm and, based on the verification set sample self-selection strategy, outputting the group of hyperparameters that performs best on the verification set by combining the implicit function theorem or the penalty function idea;
S6, composing a new loss function from the obtained hyperparameters to serve as the finally searched loss function that is robust to label noise;
S7, retraining the deep neural network, and guiding the training of a new base model on the training set added with noise and the verification set not added with noise according to the searched loss function, to finish the retraining process of the deep neural network.
In practical application, for image classification tasks, residual neural networks (ResNet) of different depths can be selected according to the learning difficulty of the given image data; for easy-to-learn image classification data sets such as MNIST, a simple convolutional neural network (CNN) with a few layers can be defined as the base model.
In this embodiment, the parameterization of the loss function is implemented by performing a Taylor expansion of the conventional categorical cross entropy. For classification problems, sample labels typically use one-hot coding; on this basis, the cross entropy loss is expanded into a Taylor polynomial of the following form:

CE(P_t) = −log(P_t) = Σ_{j=1}^{∞} (1 − P_t)^j / j

The coefficient 1/j of the j-th term of the expanded cross entropy loss cancels the power of j produced when the polynomial is differentiated with respect to P_t, and this is not the optimal set of coefficients in the presence of label noise. Therefore, the combination of the first N terms of the expansion polynomial is taken as the loss function and the coefficients are set as learnable parameters Θ, yielding the parameterized loss function L_Θ.
In order to further implement the above technical solution, the parameterized classification task loss function L_Θ in S2 is:

L_Θ(P_t) = Σ_{j=1}^{N} θ_j · (1 − P_t)^j

wherein P_t is the probability that the model predicts the true class, Θ = {θ_1, …, θ_N} are learnable parameters whose initial values are the cross entropy loss expansion coefficients (θ_j = 1/j), and N is the order of the selected Taylor expansion of the cross entropy loss.
In this embodiment, each coefficient obtained after the Taylor expansion is a constant, and each coefficient cancels the influence of the corresponding power after differentiation; the lack of robustness of cross entropy to classification label noise stems from this unreasonable distribution of the coefficients' contributions to the final gradient. Therefore, each coefficient of the Taylor expansion is treated as a learnable parameter in this patent in order to find a set of coefficients that is robust to label noise.
The first term of the expansion is the mean absolute error (MAE) loss, which is robust to label noise; this shows that existing robust loss functions can appear as special cases within the search space. At the same time, since polynomials can fit arbitrary functions, the search space constructed by Taylor expansion can represent a sufficiently broad family of functions.
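As a concrete illustration of this parameterization, the following PyTorch-style sketch implements the truncated Taylor-expansion loss with learnable coefficients. It is a minimal sketch under stated assumptions: the class name, the softmax/gather handling, and the PyTorch framing are illustrative choices, not the patent's own code; only the formula L_Θ(P_t) = Σ_j θ_j (1 − P_t)^j and the initial values θ_j = 1/j come from the text.

```python
import torch
import torch.nn as nn

class TaylorCrossEntropyLoss(nn.Module):
    """Truncated Taylor expansion of cross entropy with learnable coefficients.
    A sketch for illustration, not the patent's reference implementation."""
    def __init__(self, num_terms: int = 5):
        super().__init__()
        # Initial values 1/j reproduce -log(p_t) = sum_j (1 - p_t)^j / j (truncated at N terms).
        init = torch.tensor([1.0 / j for j in range(1, num_terms + 1)])
        self.theta = nn.Parameter(init)  # learnable loss-function parameters Θ

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = torch.softmax(logits, dim=-1)
        # p_t: predicted probability of the (possibly noisy) target class
        p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        base = 1.0 - p_t
        # L_Θ(p_t) = Σ_j θ_j (1 - p_t)^j
        powers = torch.stack([base ** j for j in range(1, self.theta.numel() + 1)], dim=1)
        return (powers * self.theta).sum(dim=1).mean()
```

With θ_j fixed at 1/j this loss approximates CCE; with θ_1 = 1 and the remaining θ_j = 0 it reduces to an MAE-like loss, illustrating that known robust losses lie inside the search space.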
In order to further implement the above technical solution, the specific content of S4 includes: setting a degradation weight γ in combination with self-paced learning; in each iteration of the outer layer optimization process, the validation samples whose loss under the model parameters ω to be evaluated does not exceed the self-paced threshold, i.e. the samples that the model classifies with high confidence, are screened out as samples whose classification labels are presumed correct, providing supervision information for the outer layer optimization problem; as the number of iterations increases, the value of γ is gradually reduced, and as γ approaches 0 the number of validation set samples selected according to the loss gradually increases.
In order to further implement the above technical solution, the goal of the inner layer optimization in S5 is, under the condition of given loss function parameters Θ, to obtain the optimal response ω*(Θ) of the model parameters on the training set D_train; the goal of the outer layer optimization is, based on the current optimal response ω*(Θ), to obtain the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the validation set D_val is best, specifically:

Θ* = argmin_Θ M(ω*(Θ); D_val),  s.t.  ω*(Θ) = argmin_ω L_Θ(ω; D_train)

The inner and outer layers of the double-layer optimization are alternated to obtain the best loss function parameters:

ω_{t+1} = ω_t − η_ω · ∇_ω L_{Θ_t}(ω_t; D_train)

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M(ω_{t+1}(Θ_t); D_val)

wherein η_ω is the learning rate for updating the model parameters, η_Θ is the learning rate for updating the hyperparameters, ω denotes the model parameters, Θ denotes the loss function parameters, and ∇_Θ M is the hypergradient.
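The alternating scheme above can be organized as in the following sketch: an inner step updates the model parameters ω under the current loss L_Θ, and an outer step updates Θ from a hypergradient estimated on the selected validation samples. This is a hedged sketch: `hypergradient_fn` stands for either of the two estimators described in S53 and is passed in as a callable returning a tensor shaped like the loss coefficients; all other names, learning rates, and the loop structure are illustrative assumptions.

```python
import torch

def bilevel_search(model, loss_fn, train_loader, val_loader, hypergradient_fn,
                   eta_w=0.1, eta_theta=0.01, rounds=100):
    """Alternate inner (ω) and outer (Θ) updates; a sketch, not the patent's exact procedure."""
    w_opt = torch.optim.SGD(model.parameters(), lr=eta_w)
    for t in range(rounds):
        # Inner step: ω_{t+1} = ω_t − η_ω ∇_ω L_Θ(ω_t; D_train)
        for x, y in train_loader:
            w_opt.zero_grad()
            loss_fn.zero_grad()
            loss_fn(model(x), y).backward()
            w_opt.step()
        # Outer step: Θ_{t+1} = Θ_t − η_Θ ∇_Θ M(ω_{t+1}(Θ); D_val)
        hyper_grad = hypergradient_fn(model, loss_fn, train_loader, val_loader)
        with torch.no_grad():
            loss_fn.theta -= eta_theta * hyper_grad
    return loss_fn
```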
In order to further implement the above technical solution, S5 includes:
S51, setting a warm-up stage, training the deep neural network model under the guidance of the initialized loss function, and ensuring that the deep neural network learns the ability to correctly classify simple samples;
In this embodiment, at the initial stage of training the model has not yet fitted the samples with wrong labels and therefore has good generalization ability; meanwhile, in order to ensure that the model already has the classification ability to distinguish simple samples when sample data are selected on the validation set by the self-paced learning method, a warm-up stage is set in the double-layer optimization process, and during the first rounds of training the initialized loss function (with the initial Taylor coefficients) is used to guide and train the network model normally;
S52, evaluating, in the outer layer optimization, the performance on the validation set of the model obtained by training in the presence of noise, using a metric function that satisfies the differentiability condition;
In this embodiment, based on the validation set samples selected in combination with self-paced learning, the invention chooses the cross entropy, which works well for classification problems, as the metric function M, and takes the cross entropy loss of the current model on the selected samples as the validation set metric;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, training the model on the training set according to the newly obtained loss function;
S55, repeating steps S52 to S54 until the iterations end or the algorithm converges;
S56, when the double-layer optimization algorithm finishes, outputting the stored set of hyperparameters that best guides the model to overcome the influence of classification label noise.
In order to further implement the above technical solution, the specific contents of S53 include:
(1) Solving the hypergradient by combining the implicit function theorem:
According to the chain rule of differentiation, the computation of the hypergradient can be converted into the following form:

∇_Θ M(ω_{t+1}(Θ); D_val) = ∂M/∂Θ + (∂M/∂ω_{t+1}) · (∂ω_{t+1}/∂Θ)

wherein ω_{t+1} is the optimal response of the model parameters to the hyperparameters Θ at time t in the inner layer optimization task, ∂ω_{t+1}/∂Θ is the gradient of the inner-layer optimal response with respect to the hyperparameters, M is the metric function evaluating how robust a model trained under noisy conditions is to classification label noise, and D_val is the validation set;
In this embodiment, the model parameters ω and the loss function parameters Θ are both directly related to the gradient of the (scalar) training set loss with respect to the model parameters, which can be regarded as a function g(ω, Θ). When the parameters ω reach the optimal response in the inner layer optimization, the first-order optimality condition holds, i.e. g(ω*, Θ) = ∂L_Θ(ω*; D_train)/∂ω = 0, which satisfies the condition of the implicit function theorem; then:
Assuming the loss function L_Θ has second derivatives with respect to the model parameters ω, it follows from the implicit function theorem that:

∂ω*/∂Θ = − [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

wherein ∂²L_Θ/∂ω∂ωᵀ is the Hessian matrix, whose inverse is efficiently approximated using a Neumann series:

[∂²L_Θ/∂ω∂ωᵀ]⁻¹ ≈ Σ_{k=0}^{K} (I − ∂²L_Θ/∂ω∂ωᵀ)^k

The hypergradient calculation formula combined with the implicit function theorem is therefore:

∇_Θ M = − (∂M/∂ω) · [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

The loss function parameters are updated from the obtained hypergradient:

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M
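The Neumann-series approximation above can be realized with Hessian-vector products so that the Hessian is never materialized. The following sketch, written in the spirit of implicit-differentiation hyperparameter optimization, computes an approximate hypergradient of the validation metric with respect to the loss coefficients; it assumes the parameterized loss module with learnable attribute `theta` sketched earlier, and the number of Neumann terms K and the scaling α are assumed tuning choices rather than values given in the patent.

```python
import torch

def ift_hypergradient(model, loss_fn, train_batch, val_batch, metric_fn, K=5, alpha=0.1):
    """Approximate ∇_Θ M via the implicit function theorem with a Neumann series.
    A sketch under the stated assumptions, not the patent's exact implementation."""
    w = [p for p in model.parameters() if p.requires_grad]
    x_tr, y_tr = train_batch
    x_val, y_val = val_batch

    train_loss = loss_fn(model(x_tr), y_tr)
    dL_dw = torch.autograd.grad(train_loss, w, create_graph=True)   # ∂L_Θ/∂ω (keeps graph)

    val_metric = metric_fn(model(x_val), y_val)
    v = torch.autograd.grad(val_metric, w)                          # ∂M/∂ω

    # p ≈ (∂M/∂ω) · [∂²L/∂ω∂ωᵀ]⁻¹ using α·Σ_k (I − αH)^k via Hessian-vector products
    cur = [vi.clone() for vi in v]
    p = [alpha * vi for vi in v]
    for _ in range(K):
        hvp = torch.autograd.grad(dL_dw, w, grad_outputs=cur, retain_graph=True)
        cur = [ci - alpha * hi for ci, hi in zip(cur, hvp)]
        p = [pi + alpha * ci for pi, ci in zip(p, cur)]

    # hypergradient = − p · ∂²L/∂ω∂Θ, obtained as a vector-Jacobian product w.r.t. Θ
    hyper, = torch.autograd.grad(dL_dw, [loss_fn.theta], grad_outputs=p)
    return -hyper
```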
(2) Constructing an auxiliary function by combining the penalty function idea:
In this embodiment, when the model parameters ω reach the optimal response, the first-order optimality condition is satisfied, so the inner layer optimization objective is constructed as a soft constraint in combination with the penalty function idea: the model parameters ω achieve the optimal response when the loss of the model on the training set is sufficiently small, i.e. L_Θ(ω; D_train) ≤ ε, while the outer layer optimization objective is to obtain on the validation set the parameters Θ that are optimal under the metric, i.e. min_Θ M(ω; D_val). The double-layer optimization problem is thereby formed as an inequality-constrained optimization problem:

min_Θ f(Θ, ω)   s.t.   m(Θ, ω) ≤ ε

and an auxiliary function is constructed that augments the objective with a penalty term, weighted by σ, which becomes active only when the constraint is violated;
wherein the objective function f(Θ, ω) is instantiated as the cross entropy loss function, i.e. the model guided by the searched loss function should obtain the best robustness to classification label noise on the validation set, and the constraint function m(Θ, ω) ensures that the model's loss on the training set is sufficiently small; σ and ε are both adjustable hyperparameters. Since the loss function parameters Θ and the model parameters ω appear explicitly in the auxiliary function, when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to Θ is computed.
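Because the exact form of the auxiliary function is a design choice, the sketch below assumes a simple exterior quadratic penalty F(Θ, ω) = M(ω; D_val) + σ · max(0, L_Θ(ω; D_train) − ε)², which is one standard way to realize the soft constraint described above; σ, ε, the function names, and this particular penalty form are assumptions, not the patent's stated formula.

```python
import torch

def penalty_step(model, loss_fn, train_batch, val_batch, metric_fn,
                 sigma=10.0, eps=0.05, eta_theta=0.01):
    """One gradient step on the loss coefficients Θ using an assumed auxiliary function
    F(Θ, ω) = M(ω; D_val) + σ · max(0, L_Θ(ω; D_train) − ε)²   (illustrative penalty form)."""
    x_tr, y_tr = train_batch
    x_val, y_val = val_batch

    f = metric_fn(model(x_val), y_val)      # outer objective on the selected validation samples
    m = loss_fn(model(x_tr), y_tr)          # soft constraint: training loss should stay below ε
    F = f + sigma * torch.clamp(m - eps, min=0.0) ** 2

    grad_theta, = torch.autograd.grad(F, loss_fn.theta, allow_unused=True)
    if grad_theta is not None:              # defensive: None only if Θ does not enter the graph
        with torch.no_grad():
            loss_fn.theta -= eta_theta * grad_theta
```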
in order to further implement the above technical solution, the specific content of S54 is: after obtaining the new loss function parameters from S53, iterating on the training set according to the new loss function
Figure 44191DEST_PATH_IMAGE036
Updating model parameters by batch data to obtain optimal response of the model parameters to new loss function parameters>
Figure 62002DEST_PATH_IMAGE011
In order to further implement the above technical solution, the specific content of S55 is: a fixed number of training rounds T is set; the parameter search process ends when the model has been trained on the training set for T rounds, or the loss function parameter search process ends when the average loss of the model on the validation set rises for T_stop consecutive rounds.
Specific examples:
a loss function with robustness to uniformly distributed tag noise for real object classification of aircraft (airland), birds (bird), cats (cat), dogs (dog), horses (horse), ships (ship), trucks (truck), automobiles (automatic), deer (deer), and frogs (frog) is searched based on the CIFAR10 dataset.
S1, aiming at the real object classification problem based on the CIFAR10 data set, a common 18-layer or 32-layer residual neural network (ResNet 18 and ResNet 32) is selected as a basic model of classification prediction.
S2, performing Taylor expansion on the cross entropy loss, which guides classification well on a clean data set, intercepting the first N = 5 terms as the loss function, and taking the coefficients as learnable parameters:

L_Θ(P_t) = Σ_{j=1}^{5} θ_j · (1 − P_t)^j

wherein Θ = {θ_1, …, θ_5} are the learnable parameters, whose initial values are the cross entropy expansion coefficients θ_j = 1/j, i.e. Θ_0 = {1, 1/2, 1/3, 1/4, 1/5}.
S3, dividing the 50000 training data samples of the CIFAR10 dataset into a training set D_train containing 45000 samples and a validation set D_val containing 5000 samples. Meanwhile, since the original CIFAR10 dataset contains no label noise, uniformly distributed noise with percentage p is artificially added to the training set and the validation set, i.e. sample labels are randomly converted into labels of other categories with probability p.
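The uniform (symmetric) label noise described here can be injected as in the following sketch: with probability p each label is replaced by a label drawn uniformly from the other classes. The function and variable names are illustrative assumptions.

```python
import numpy as np

def add_uniform_label_noise(labels: np.ndarray, p: float, num_classes: int = 10,
                            seed: int = 0) -> np.ndarray:
    """Flip each label to a uniformly chosen different class with probability p."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p
    # Draw a random offset in [1, num_classes - 1] so the new label always differs.
    offsets = rng.integers(1, num_classes, size=flip.sum())
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy

# Example usage: corrupt 40% of the CIFAR10 training labels.
# noisy_train_labels = add_uniform_label_noise(train_labels, p=0.4)
```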
S4, setting the degradation weight γ combined with self-paced learning; in the outer layer optimization of the iterative process, the number of validation set samples selected according to the loss gradually increases as the number of iterations grows.
S5, constructing the main body of the double-layer optimization algorithm and setting the total number of training rounds T.
S51, setting the first several rounds of training as the warm-up phase, during which the parameters of the loss function remain unchanged.
S52, starting the outer layer optimization after the warm-up phase ends. In each training round, after the model parameters ω have passed through a number of batches of training data, the validation set samples whose loss, computed with the current ω and the loss L_Θ, does not exceed the self-paced threshold are selected to form the selected validation subset. Several batches of validation data are then sampled from this subset, and the metric function M is used to evaluate the current model on the selected samples, obtaining the outer-layer metric value.
S53, sampling several batches of training data from the training set and computing the loss of the current model on them; combining this with the outer-layer metric obtained in S52, the parameters of the current loss function are updated by gradient descent using the loss function parameter update algorithm based on the implicit function theorem or the penalty function idea, obtaining a new set of loss function parameters.
S54, under the new loss function, guiding the model to train on I batches of training data.
S55, repeating steps S52 to S54 until the T rounds of training are completed, or ending the loss function parameter search process when, during training, the average loss of the model on the validation set rises for T_stop consecutive rounds.
S56, when the double-layer optimization algorithm ends, outputting the saved set of hyperparameters Θ* with the best effect, and taking the loss function L_Θ* composed of this set of parameters as the finally searched loss function that is robust to label noise.
S6, according to the searched loss function, guiding training of a new basic model on the training set added with noise and the verification set not added with noise, and finishing the retraining process of the deep neural network. The classification model obtained by the process has good robustness to the label noise.
The robustness loss function search system for the classified task tag noise comprises a deep neural network model selection module, a loss function parameterization module, a classified task data set division module, a self-step learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty level of the classification task;
specifically, according to the difficulty level of the classification task, base models with different adaptation degrees are selected as the base depth neural network model of the classification task, the input of the base model is the original data of a sample, and the output is the classification result of the model on the sample;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders based on a Taylor expansion method according to the deep neural network model;
specifically, taylor expansion is carried out on the traditional loss functions of different classification tasks, the first N items of the Taylor expansion polynomial are intercepted to serve as the loss functions, the coefficients of the items are used as the learnable parameters, the current value is used as the initial value of the parameters, and the parameterization of the loss functions is realized;
the classification task data set dividing module is used for outputting a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing based on the classification task data set;
the self-step learning module outputs a set verification set sample self-selection strategy combined with self-step learning;
specifically, when the metric of the model on the verification set is calculated in the double-layer optimization algorithm, the number of small-loss samples selected by the model itself gradually increases as the number of training rounds grows, so that self-paced learning from simple samples to difficult samples is achieved on the verification set;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the verification set sample self-selection strategy combined with self-paced learning, and outputting the group of hyperparameters that performs best on the verification set by combining the implicit function theorem or the penalty function idea;
specifically, the implicit functional relation of the model parameters with respect to the loss function parameters is treated as an implicit function, the derivative of the model parameters with respect to the loss function parameters is solved with the implicit function theorem, and the gradient of the model's metric on the verification set with respect to the loss function parameters is then computed; alternatively, an unconstrained auxiliary optimization function is constructed from the constraint conditions, the optimization equation of the metric on the verification set with respect to the loss function parameters is obtained explicitly, and the gradient of the metric with respect to the loss function parameters is computed;
the robust loss function construction module is used for composing a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to the classification task label noise;
and the retraining module is used for guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function, and finishing the retraining process of the deep neural network.
In order to further implement the technical scheme, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner layer optimization unit for outputting, under the condition of given loss function parameters, the optimal response ω* of the model parameters obtained on the training set;
an outer layer optimization unit for, based on the current optimal response ω*, invoking the verification set sample self-selection strategy combined with self-paced learning and outputting the optimal parameters Θ* for which the metric M evaluating the classification performance of the model on the verification set is best.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The robustness loss function searching method for classifying task tag noise is characterized by comprising the following steps of:
S1, selecting a deep neural network model for learning classification tasks of different difficulty;
S2, constructing parameterized classification task loss functions with different expansion orders according to a Taylor expansion method based on the deep neural network model;
S3, dividing the data set of the classification task into a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing;
S4, setting a verification set sample self-selection strategy combined with self-paced learning;
S5, constructing the main body of the double-layer optimization algorithm and, based on the verification set sample self-selection strategy, outputting a group of hyperparameters on the verification set by combining the implicit function theorem or the penalty function idea;
S6, composing a new loss function from the obtained hyperparameters to serve as the finally searched loss function that is robust to label noise;
S7, retraining the deep neural network, and guiding the training of a new base model on the training set added with noise and the verification set not added with noise according to the searched loss function, to finish the retraining process of the deep neural network.
2. The method for searching for a robust loss function for classification task tag noise of claim 1, wherein the parameterized classification task loss function L_Θ in S2 is:

L_Θ(P_t) = Σ_{j=1}^{N} θ_j · (1 − P_t)^j

wherein P_t is the probability that the model predicts the true class, Θ = {θ_1, …, θ_N} are the learnable parameters whose initial values are the cross entropy loss expansion coefficients, and N is the order of the selected Taylor expansion of the cross entropy loss.
3. The method for searching for a robust loss function for classifying task tag noise according to claim 1, wherein the specific content of S4 comprises: setting a degradation weight γ combined with self-paced learning; in each iteration of the outer layer optimization process, selecting the samples whose loss does not exceed the self-paced threshold determined by γ, i.e. screening out the samples that the model classifies with high confidence as samples whose classification labels are correct; the number of validation set samples selected according to the loss gradually increases as the number of iterations grows.
4. The method for searching for a robust loss function for classification task tag noise according to claim 1, wherein the goal of the inner layer optimization in S5 is, under the condition of given loss function parameters Θ, to obtain the response ω*(Θ) of the model parameters on the training set D_train; the goal of the outer layer optimization is, based on the current response ω*(Θ), to obtain the parameters Θ* on the validation set D_val, specifically:

Θ* = argmin_Θ M(ω*(Θ); D_val),  s.t.  ω*(Θ) = argmin_ω L_Θ(ω; D_train)

The loss function parameters are obtained by alternately performing the inner and outer layer optimizations:

ω_{t+1} = ω_t − η_ω · ∇_ω L_{Θ_t}(ω_t; D_train)

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M(ω_{t+1}(Θ_t); D_val)

wherein η_ω is the learning rate for updating the model parameters, η_Θ is the learning rate for updating the hyperparameters, ω denotes the model parameters, Θ denotes the loss function parameters, and ∇_Θ M is the hypergradient.
5. The method for searching for a robustness loss function for classification task tag noise of claim 1, wherein S5 comprises:
S51, setting a warm-up stage, training the deep neural network model under the guidance of the initialized loss function, and ensuring that the deep neural network learns the ability to correctly classify simple samples;
S52, evaluating, in the outer layer optimization, the performance on the verification set of the model obtained by training in the presence of noise, using a metric function that satisfies the differentiability condition;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, training the model on the training set according to the newly obtained loss function;
S55, repeating steps S52 to S54 until the iterations end or the algorithm converges;
S56, when the double-layer optimization algorithm finishes, outputting the stored set of hyperparameters used to guide the model.
6. The method for searching for a robustness loss function for classifying task tag noise according to claim 5, wherein the specific contents of S53 include:
(1) Solving the hypergradient by combining the implicit function theorem:
According to the chain rule of differentiation, the computation of the hypergradient can be converted into the following form:

∇_Θ M(ω_{t+1}(Θ); D_val) = ∂M/∂Θ + (∂M/∂ω_{t+1}) · (∂ω_{t+1}/∂Θ)

wherein ω_{t+1} is the response of the model parameters in the inner layer optimization task to the hyperparameters Θ at time t, ∂ω_{t+1}/∂Θ is the gradient of the inner-layer response with respect to the hyperparameters, M is the metric function, and D_val is the validation set;
assuming the loss function L_Θ has second derivatives with respect to the model parameters ω, then according to the implicit function theorem:

∂ω*/∂Θ = − [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

wherein ∂²L_Θ/∂ω∂ωᵀ is the Hessian matrix, whose inverse is efficiently approximated using a Neumann series:

[∂²L_Θ/∂ω∂ωᵀ]⁻¹ ≈ Σ_{k=0}^{K} (I − ∂²L_Θ/∂ω∂ωᵀ)^k

the hypergradient calculation formula combined with the implicit function theorem is:

∇_Θ M = − (∂M/∂ω) · [∂²L_Θ/∂ω∂ωᵀ]⁻¹ · ∂²L_Θ/∂ω∂Θ

and the loss function parameters are updated from the obtained hypergradient:

Θ_{t+1} = Θ_t − η_Θ · ∇_Θ M

(2) Constructing an auxiliary function for the inequality-constrained optimization problem by combining the penalty function idea:

min_Θ f(Θ, ω)   s.t.   m(Θ, ω) ≤ ε

wherein f(Θ, ω) is the objective function and m(Θ, ω) is the constraint function; the auxiliary function augments the objective with a penalty term, weighted by σ, that becomes active only when the constraint is violated, wherein σ and ε are both adjustable hyperparameters; since the loss function parameters Θ and the model parameters ω appear explicitly, when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameters Θ is computed.
7. The method for searching for a robust loss function for classification task tag noise of claim 5, wherein the specific content of S54 is: after obtaining the new loss function parameters in S53, iterating over I batches of data on the training set to update the model parameters according to the new loss function, so as to obtain the response ω* of the model parameters to the new loss function parameters.
8. The method for searching for a robust loss function for classification task tag noise according to claim 5, wherein the specific content of S55 is: setting a fixed number of training rounds T; the parameter search process ends when the model has been trained for T rounds on the training set, or the loss function parameter search process ends when the average loss of the model on the validation set rises for T_stop consecutive rounds.
9. The robust loss function searching system for classifying task tag noise is based on the robust loss function searching method for classifying task tag noise according to any one of claims 1-8, and is characterized by comprising a deep neural network model selection module, a loss function parameterization module, a classifying task data set division module, a self-step learning module, a double-layer optimization module, a robust loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty level of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders based on a Taylor expansion method according to the deep neural network model;
the classification task data set dividing module is used for outputting a noisy training set for inner layer optimization, a noisy verification set for outer layer optimization and a clean test set for testing based on the classification task data set;
the self-step learning module outputs a set verification set sample self-selection strategy combined with self-step learning;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the verification set sample self-selection strategy combined with self-paced learning, and outputting a group of hyperparameters on the verification set by combining the implicit function theorem or the penalty function idea;
the robust loss function construction module is used for composing a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to the classification task label noise;
and the retraining module is used for guiding training of a new basic model on the training set added with noise and the verification set not added with noise according to the searched loss function, and finishing the retraining process of the deep neural network.
10. The classification task-oriented label noise robustness loss function search system of claim 9, wherein the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner layer optimization unit for outputting, under the condition of given loss function parameters, the response ω* of the model parameters obtained on the training set;
an outer layer optimization unit for, based on the current response ω*, invoking the verification set sample self-selection strategy combined with self-paced learning and outputting the parameters Θ* obtained on the verification set.
CN202211645114.2A 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise Active CN115618935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211645114.2A CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211645114.2A CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Publications (2)

Publication Number Publication Date
CN115618935A CN115618935A (en) 2023-01-17
CN115618935B true CN115618935B (en) 2023-05-05

Family

ID=84879818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211645114.2A Active CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Country Status (1)

Country Link
CN (1) CN115618935B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446927B (en) * 2016-07-07 2019-05-28 浙江大学 It is a kind of to enhance image classification method and system from step
WO2019207581A1 (en) * 2018-04-22 2019-10-31 Technion Research & Development Foundation Limited System and method for emulating quantization noise for a neural network
CN109242028A (en) * 2018-09-19 2019-01-18 西安电子科技大学 SAR image classification method based on 2D-PCA and convolutional neural networks
CN110110780B (en) * 2019-04-30 2023-04-07 南开大学 Image classification method based on antagonistic neural network and massive noise data
CN112101328A (en) * 2020-11-19 2020-12-18 四川新网银行股份有限公司 Method for identifying and processing label noise in deep learning
CN113537389B (en) * 2021-08-05 2023-11-07 京东科技信息技术有限公司 Robust image classification method and device based on model embedding
CN114445662A (en) * 2022-01-25 2022-05-06 南京理工大学 Robust image classification method and system based on label embedding
CN114201632B (en) * 2022-02-18 2022-05-06 南京航空航天大学 Label noisy data set amplification method for multi-label target detection task

Also Published As

Publication number Publication date
CN115618935A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
US11816183B2 (en) Methods and systems for mining minority-class data samples for training a neural network
CN110914839B (en) Selective training of error decorrelation
US20210201107A1 (en) Neural architecture search based on synaptic connectivity graphs
US20230229891A1 (en) Reservoir computing neural networks based on synaptic connectivity graphs
US11593627B2 (en) Artificial neural network architectures based on synaptic connectivity graphs
US11625611B2 (en) Training artificial neural networks based on synaptic connectivity graphs
CN103577694B (en) Aquaculture water quality short-time combination forecast method on basis of multi-scale analysis
CN111985601A (en) Data identification method for incremental learning
US11631000B2 (en) Training artificial neural networks based on synaptic connectivity graphs
CN103559537B (en) Based on the template matching method of error back propagation in a kind of out of order data stream
CN116596044B (en) Power generation load prediction model training method and device based on multi-source data
CN109558898B (en) Multi-choice learning method with high confidence based on deep neural network
CN112215412B (en) Dissolved oxygen prediction method and device
CN116542382A (en) Sewage treatment dissolved oxygen concentration prediction method based on mixed optimization algorithm
Moldovan et al. Chicken swarm optimization and deep learning for manufacturing processes
CN112508177A (en) Network structure searching method and device, electronic equipment and storage medium
CN115618935B (en) Robustness loss function searching method and system for classification task tag noise
CN116882539A (en) Water quality data prediction method based on improved Re-GCN model
Hao et al. A Model-Agnostic approach for learning with noisy labels of arbitrary distributions
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
Abusnaina et al. Enhanced MWO training algorithm to improve classification accuracy of artificial neural networks
CN115423091A (en) Conditional antagonistic neural network training method, scene generation method and system
Naik et al. Rainfall prediction based on deep neural network: a review
CN115346084A (en) Sample processing method, sample processing apparatus, electronic device, storage medium, and program product
CN114139783A (en) Wind power short-term power prediction method and device based on nonlinear weighted combination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant