CN114090952A - Method, apparatus, device and storage medium for loss function dynamic weighting - Google Patents

Method, apparatus, device and storage medium for loss function dynamic weighting

Info

Publication number
CN114090952A
CN114090952A
Authority
CN
China
Prior art keywords
category
difficulty
loss function
prior probability
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111242467.3A
Other languages
Chinese (zh)
Inventor
周开龙
陈颖辉
王范萍
张玥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiaoling Network Technology Co., Ltd.
Original Assignee
Shanghai Xiaoling Network Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xiaoling Network Technology Co., Ltd.
Priority to CN202111242467.3A
Publication of CN114090952A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for dynamic weighting of a loss function. The method comprises: obtaining the prior probability of each category in a sample set; determining the difficulty weight of each category according to the prior probability and the prediction probability of each category output in the current model training; and updating the weight variable in a preset loss function based on the difficulty weights to obtain an updated loss function. This dynamically adjusts the weight variable of the loss function and avoids the poor performance that manually assigned difficulty weights exhibit in the selected loss function.

Description

Method, apparatus, device and storage medium for loss function dynamic weighting
Technical Field
The invention belongs to the technical field of machine learning, and in particular relates to a method, an apparatus, a device and a storage medium for dynamic weighting of a loss function.
Background
In recent years, with the rise of the artificial intelligence wave, artificial intelligence and machine learning have become increasingly important in both academia and industry.
In conventional model training, a loss function is typically used to compute the difference between the predicted distribution and the true distribution as a loss value, and this loss value is used to adjust the model so that it converges.
In a real environment, the learning difficulty differs across categories: some categories follow simple, fixed patterns and are easy to train on, while the samples of other categories have complex, variable patterns that the model finds hard to learn. For such hard categories, a more desirable weighting scheme is to assign a larger weight to hard categories and a smaller weight to easy categories, thereby increasing the loss share of the hard categories.
However, this approach requires knowing the difficulty of each category in advance. Category difficulty is a relative concept: the categories within any training set differ in relative difficulty, but the size of those differences may vary, so the same loss function with the same weighting scheme can achieve a good effect on one category and a poor effect on another. The difficulty of each category in the training set is therefore hard to estimate accurately and quantitatively before model training, and the model training effect suffers.
Disclosure of Invention
The main object of the present invention is to provide a method, an apparatus, a device and a storage medium for dynamic weighting of a loss function, so as to solve the technical problem in the prior art that the difficulty of each category in a training set is hard to estimate accurately and quantitatively before model training, which leads to poor model training results.
In view of the above problem, the present invention provides a method for dynamically weighting a loss function, including:
obtaining the prior probability of each category in a sample set;
determining the difficulty weight value of each category according to the prior probability and the prediction probability of each category output in the current model training;
and updating the weight variable in a preset loss function based on the difficulty weight of each category to obtain an updated loss function, and determining the loss value of the current model training by using the updated loss function.
Further, in the above method for dynamic weighting of a loss function, obtaining the prior probability of each category in the sample set includes:
acquiring the prior probability of each category in the sample set by using a preset prior probability calculation formula;
the prior probability calculation formula is as follows:
p(c_i) = n_i^γ / Σ_{k=1}^K n_k^γ

where p(c_i) denotes the prior probability, c_i denotes the i-th category, n_i denotes the number of samples in category c_i, K denotes the number of categories in the sample set, n_k denotes the number of samples in the k-th category, and γ denotes a category difficulty control parameter, which is a constant.
Further, in the above method for dynamically weighting a loss function, determining a difficulty weight of each class according to the prior probability and a prediction probability of each class output in the current model training includes:
substituting the prior probability and the prediction probability of each category output in the current model training into a preset difficulty weight calculation formula to calculate the difficulty weight of each category;
the calculation formula of the difficulty weight is as follows:
W_d = (p(c_i) - p)^2

where W_d denotes the difficulty weight, p(c_i) denotes the prior probability, and p denotes the prediction probability.
Further, in the above method for dynamic weighting of a loss function, updating the weight variable in the preset loss function based on the difficulty weight of each category to obtain an updated loss function includes:
substituting the difficulty weight of each category into a preset weight variable update formula to obtain the updated loss function;
the weight variable updating calculation formula is as follows:
Loss = -W_d · log p

where Loss denotes the updated loss function, W_d denotes the difficulty weight, and -log p denotes the preset loss function.
The invention also provides a device for dynamically weighting the loss function, which comprises:
the acquisition module is used for acquiring the prior probability of each category in the sample set;
the determining module is used for determining the difficulty weight of each category according to the prior probability and the output prediction probability of each category in the current model training;
and the updating module is used for updating the weight variables in the preset loss function based on the difficulty weight of each category to obtain the loss function after the difficulty weight is updated, so that the loss value of the current model training is determined by using the updated loss function.
Further, in the above apparatus for dynamically weighting a loss function, the obtaining module is specifically configured to:
acquiring the prior probability of each category in the sample set by using a preset prior probability calculation formula;
the prior probability calculation formula is as follows:
p(c_i) = n_i^γ / Σ_{k=1}^K n_k^γ

where p(c_i) denotes the prior probability, c_i denotes the i-th category, n_i denotes the number of samples in category c_i, K denotes the number of categories in the sample set, n_k denotes the number of samples in the k-th category, and γ denotes a category difficulty control parameter, which is a constant.
Further, in the above apparatus for dynamically weighting a loss function, the determining module is specifically configured to:
substituting the prior probability and the prediction probability of each category output in the current model training into a preset difficulty weight calculation formula to calculate the difficulty weight of each category;
the calculation formula of the difficulty weight is as follows:
W_d = (p(c_i) - p)^2

where W_d denotes the difficulty weight, p(c_i) denotes the prior probability, and p denotes the prediction probability.
Further, in the above apparatus for dynamically weighting a loss function, the update module is specifically configured to:
substituting the difficulty weight value of each category into a preset weight variable updating calculation formula to obtain a loss function after updating the difficulty weight value;
the weight variable updating calculation formula is as follows:
Loss = -W_d · log p

where Loss denotes the updated loss function, W_d denotes the difficulty weight, and -log p denotes the preset loss function.
The invention also provides a device for dynamically weighting the loss function, which comprises a memory and a controller;
the memory has stored thereon a computer program which, when executed by a controller, carries out the steps of the method of dynamic weighting of a loss function as defined in any one of the above.
The invention also provides a storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for dynamic weighting of loss functions as defined in any one of the preceding claims.
Compared with the prior art, one or more embodiments in the above scheme can have the following advantages or beneficial effects:
the method, the device, the equipment and the storage medium for the dynamic weighting of the loss function acquire the prior probability of each category in a sample set; determining the difficulty weight value of each category according to the prior probability and the prediction probability of each category output in the current model training; updating the weight variable in the preset loss function based on the difficulty weight value of each category to obtain the loss function after the difficulty weight value is updated, dynamically adjusting the weight variable of the loss function, and avoiding the phenomenon that the artificially divided difficulty weight value has poor effect in the selected loss function.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of an embodiment of a method for dynamic weighting of loss functions in accordance with the present invention;
FIG. 2 is a schematic structural diagram of an apparatus for dynamic weighting of loss functions according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an embodiment of the device for dynamically weighting the loss function according to the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.
Current loss functions mainly include the cross entropy loss function, the focal loss function, and the like. In object detection applications they compute the difference between the predicted distribution and the true distribution as the loss. Focal loss is an improvement on cross entropy: it not only handles the imbalance between foreground and background, but also distinguishes hard samples, increasing the loss share of hard samples while reducing that of easy samples.
The expression of the cross entropy loss function may be formula (1):
CE(p, y) = -Σ_i y_i · log(p_i)    (1)

where p represents the prediction probability and y represents the true probability.
Since one-hot coding is used, the value at the position corresponding to the true class is 1 and the rest are all 0, so formula (1) can be abbreviated as formula (2):
CE(p) = -log(p_t)    (2)
the following two parameters α and γ are added to the following formula (2) for the following formula (3):
FL(pt)=-a(1-pt)γlog(pt) (3)
wherein a is used for controlling the class imbalance, gamma is a class difficulty control parameter, and gamma is a constant.
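For concreteness, here is a minimal NumPy sketch of formulas (2) and (3); the α = 0.25 and γ = 2.0 defaults are common illustrative choices, not values fixed by this disclosure:

```python
import numpy as np

def cross_entropy(p_t):
    # Formula (2): CE(p) = -log(p_t), with p_t the predicted
    # probability of the true class under one-hot targets.
    return -np.log(p_t)

def focal_loss(p_t, alpha=0.25, gamma=2.0):
    # Formula (3): FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).
    # alpha counteracts class imbalance; the (1 - p_t)^gamma factor
    # shrinks the loss of easy samples (p_t near 1), so hard samples
    # take a larger share of the total loss.
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy sample (p_t = 0.9) is damped far more than a hard one (p_t = 0.1):
print(cross_entropy(0.9), focal_loss(0.9))  # ~0.105 vs ~0.00026
print(cross_entropy(0.1), focal_loss(0.1))  # ~2.303 vs ~0.467
```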
In practical applications, the chief concern is distinguishing hard categories from easy ones. Category difficulty is a relative concept: the categories within any training set differ in relative difficulty, but the size of those differences may vary, so the same loss function with the same weighting scheme can achieve a good effect on one category and a poor effect on another. The difficulty of each category in the training set is therefore hard to estimate accurately and quantitatively before model training, and the model training effect suffers.
In order to solve the technical problems, the invention provides the following technical scheme:
Fig. 1 is a flowchart of an embodiment of the method for dynamic weighting of a loss function according to the present invention. As shown in fig. 1, the method of this embodiment may specifically include the following steps:
100. obtaining the prior probability of each category in a sample set;
in one implementation, the classes may be assumed to follow a prior distribution in which each class has a corresponding prior probability p (c)i). In one particular implementation, the prior probability p (c) for each class can be calculated based on, but not limited to, the number of times each class appears in the sample seti). A priori probability p (c) for each classi) To some extent, how much the category should be predicted.
Specifically, a preset prior probability calculation formula can be used for obtaining the prior probability of each category in the sample set;
the prior probability calculation formula (4) is:
p(c_i) = n_i^γ / Σ_{k=1}^K n_k^γ    (4)

where p(c_i) denotes the prior probability, c_i denotes the i-th category, n_i denotes the number of samples in category c_i, K denotes the number of categories in the sample set, n_k denotes the number of samples in the k-th category, and γ denotes a category difficulty control parameter, which is a constant.
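The following short Python sketch illustrates formula (4); the function name and the example counts are illustrative assumptions, not part of this disclosure:

```python
import numpy as np

def class_priors(sample_counts, gamma=1.0):
    # Formula (4): p(c_i) = n_i^gamma / sum_k n_k^gamma, where
    # sample_counts[i] is n_i, the number of samples in category c_i,
    # and gamma is the category difficulty control constant.
    counts = np.asarray(sample_counts, dtype=float)
    powered = counts ** gamma
    return powered / powered.sum()

# Three categories with 700, 200 and 100 samples:
print(class_priors([700, 200, 100]))        # [0.7 0.2 0.1]
print(class_priors([700, 200, 100], 0.5))   # gamma < 1 softens the prior
```

With γ = 1 the prior is simply the empirical class frequency; other values of γ sharpen or soften it.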
101. Determining the difficulty weight value of each category according to the prior probability and the prediction probability of each category output in the current model training;
in a specific implementation process, the prior probability and the prediction probability of each category output in the current model training can be substituted into a preset difficulty weight calculation formula to calculate the difficulty weight of each category;
the difficulty weight value calculation formula (5) is:
W_d = (p(c_i) - p)^2    (5)

where W_d denotes the difficulty weight, p(c_i) denotes the prior probability, and p denotes the prediction probability.
In one embodiment, the larger the difference between the predicted probability p and the prior probability p(c_i), the greater the probability that the sample is a hard sample and the larger the value of W_d; conversely, the smaller the difference between p and p(c_i), the less likely the sample is a hard sample and the smaller the value of W_d.
In a specific implementation, the difficulty weight W_d of each category changes at every training step, so there is no need to divide categories into hard and easy manually before training, which avoids the poor performance of manually assigned difficulty weights in the selected loss function.
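A minimal sketch of formula (5), computed per sample from the prior of its true category and the model's current predicted probability for that category (the names here are illustrative):

```python
import numpy as np

def difficulty_weights(priors, predicted):
    # Formula (5): W_d = (p(c_i) - p)^2. priors holds p(c_i) for each
    # sample's true category; predicted holds the model's current
    # probability p for that category.
    return (np.asarray(priors) - np.asarray(predicted)) ** 2

# A prediction far from the prior flags a hard sample:
print(difficulty_weights([0.7, 0.1], [0.65, 0.9]))  # [0.0025 0.64]
```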
102. Updating the weight variable in the preset loss function based on the difficulty weight of each category to obtain an updated loss function, and determining the loss value of the current model training by using the updated loss function.
In a specific implementation process, the difficulty weight values of each category can be substituted into a preset weight value variable updating calculation formula to obtain a loss function after the difficulty weight values are updated;
the weight variable update calculation formula (6) is:
Loss = -W_d · log p    (6)

where Loss denotes the updated loss function, W_d denotes the difficulty weight, and -log p denotes the preset loss function.
After the updated loss function is obtained, the loss value of the current model training can be determined with it. The loss value is then passed to the optimizer as a signal; the optimizer computes new weights from this signal, and through repeated iteration the model weights are updated toward the optimum until the loss value no longer decreases. Compared with fixing the weights of the loss function, the weights here are determined by the training results themselves, and during this dynamic adjustment the training of the model fits the data more closely: the harder a category, the larger its loss share, and the easier a category, the smaller its loss share, which improves the effect of the model. In this way, the model can focus on hard categories with large loss values in the next training round, alleviating the problem of poor training on hard categories. In addition, because the loss function obtained in this way is based on a prior distribution, training does not start from zero, so the best effect is reached more quickly and the model converges faster.
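Putting the three steps together, the following PyTorch sketch shows one training step under this scheme; treating W_d as a constant weight (detached from the gradient), the mean reduction, and all names here are illustrative assumptions rather than details fixed by this disclosure:

```python
import torch

def dynamic_weighted_loss(logits, targets, priors):
    # Formula (6): Loss = -W_d * log p, with W_d recomputed from the
    # current predictions at every step, so the weighting tracks how
    # hard each category actually is as training proceeds.
    probs = torch.softmax(logits, dim=1)
    p = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # p of the true class
    prior = priors[targets]                               # p(c_i) per sample
    w_d = (prior - p).detach() ** 2                       # formula (5), held constant
    return (-w_d * torch.log(p)).mean()

# One iteration (model, optimizer, x, y and priors are placeholders):
# loss = dynamic_weighted_loss(model(x), y, priors)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```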
It should be noted that selecting different loss functions will affect the model weight and thus affect the overall model.
In the method for dynamic weighting of a loss function of this embodiment, the prior probability of each category in the sample set is obtained; the difficulty weight of each category is determined according to the prior probability and the prediction probability of each category output in the current model training; and the weight variable in the preset loss function is updated based on the difficulty weights to obtain an updated loss function, dynamically adjusting the weight variable of the loss function and avoiding the poor performance of manually assigned difficulty weights in the selected loss function.
It should be noted that the method of the embodiment of the present invention may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In the case of such a distributed scenario, one of the multiple devices may only perform one or more steps of the method of the embodiment of the present invention, and the multiple devices may interact with each other to complete the method.
Fig. 2 is a schematic structural diagram of an embodiment of the device for dynamically weighting a loss function according to the present invention, and as shown in fig. 2, the device for dynamically weighting a loss function according to this embodiment may include an obtaining module 20, a determining module 21, and an updating module 22.
An obtaining module 20, configured to obtain a prior probability of each category in the sample set;
specifically, the obtaining module 20 may obtain the prior probability of each category in the category set by using a preset prior probability calculation formula;
the prior probability calculation formula may refer to the calculation formula (4), and will not be described herein.
A determining module 21, configured to determine a difficulty weight of each category according to the prior probability and a prediction probability of each category output in the current model training;
specifically, the prior probability and the prediction probability of each category output in the current model training are substituted into a preset difficulty weight calculation formula, and the difficulty weight of each category is calculated;
the calculation formula of the difficult-to-easy weight value may refer to the calculation formula (5), and will not be described herein.
And the updating module 22 is configured to update the weight variable in the preset loss function based on the difficulty weight value of each category to obtain a loss function with an updated difficulty weight value, so as to determine the loss value of the current model training by using the updated loss function.
Specifically, the difficulty weight value of each category may be substituted into a preset weight value variable update calculation formula to obtain a loss function after updating the difficulty weight value;
the weight variable update calculation formula may refer to the calculation formula (6), and will not be described herein.
The apparatus for dynamic weighting of a loss function of this embodiment obtains the prior probability of each category in the sample set; determines the difficulty weight of each category according to the prior probability and the prediction probability of each category output in the current model training; and updates the weight variable in the preset loss function based on the difficulty weights to obtain an updated loss function, dynamically adjusting the weight variable of the loss function and avoiding the poor performance of manually assigned difficulty weights in the selected loss function.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and specific implementation schemes thereof may refer to the method described in the foregoing embodiment and relevant descriptions in the method embodiment, and have beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 3 is a schematic structural diagram of an embodiment of the apparatus for dynamically weighting loss functions according to the present invention, and as shown in fig. 3, the apparatus for dynamically weighting loss functions according to the present embodiment may include a memory 30 and a controller 31;
the memory 30 has stored thereon a computer program which, when executed by the controller 31, implements the steps of the method of dynamic weighting of loss functions of the above-described embodiments.
An embodiment of the present invention provides a storage medium, where a computer program is stored on the storage medium of this embodiment, and the computer program, when executed by a controller, implements the steps of the method for dynamically weighting a loss function according to the foregoing embodiment.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for dynamic weighting of a loss function, comprising:
obtaining the prior probability of each category in a sample set;
determining the difficulty weight value of each category according to the prior probability and the prediction probability of each category output in the current model training;
and updating the weight variable in the preset loss function based on the difficulty weight value of each category to obtain the loss function after the difficulty weight value is updated, and determining the loss value of the current model training by using the updated loss function.
2. The method of claim 1, wherein obtaining the prior probability of each category in the sample set comprises:
acquiring the prior probability of each category in the sample set by using a preset prior probability calculation formula;
the prior probability calculation formula is as follows:
p(c_i) = n_i^γ / Σ_{k=1}^K n_k^γ

wherein p(c_i) denotes the prior probability, c_i denotes the i-th category, n_i denotes the number of samples in category c_i, K denotes the number of categories in the sample set, n_k denotes the number of samples in the k-th category, and γ denotes a category difficulty control parameter, γ being a constant.
3. The method of claim 1, wherein determining the difficulty weight of each class according to the prior probability and the predicted probability of each class output in the current model training comprises:
substituting the prior probability and the prediction probability of each category output in the current model training into a preset difficulty weight calculation formula to calculate the difficulty weight of each category;
the calculation formula of the difficulty weight is as follows:
W_d = (p(c_i) - p)^2

wherein W_d denotes the difficulty weight, p(c_i) denotes the prior probability, and p denotes the prediction probability.
4. The method according to claim 1, wherein the step of updating the weight variables in the preset loss function based on the difficulty weight value of each category to obtain the loss function with updated difficulty weight value comprises:
substituting the difficulty weight value of each category into a preset weight value variable updating calculation formula to obtain a loss function after the difficulty weight value is updated;
the weight variable updating calculation formula is as follows:
Loss = -W_d · log p

wherein Loss denotes the updated loss function, W_d denotes the difficulty weight, and -log p denotes the preset loss function.
5. An apparatus for dynamic weighting of loss functions, comprising:
the acquisition module is used for acquiring the prior probability of each category in the sample set;
the determining module is used for determining the difficulty weight of each category according to the prior probability and the prediction probability of each category output in the current model training;
and the updating module is used for updating the weight variable in the preset loss function based on the difficulty weight value of each category to obtain the loss function after the difficulty weight value is updated, so that the loss value of the current model training is determined by using the updated loss function.
6. The apparatus for dynamic weighting of loss functions as claimed in claim 5, wherein the obtaining module is specifically configured to:
acquiring the prior probability of each category in the sample set by using a preset prior probability calculation formula;
the prior probability calculation formula is as follows:
p(c_i) = n_i^γ / Σ_{k=1}^K n_k^γ

wherein p(c_i) denotes the prior probability, c_i denotes the i-th category, n_i denotes the number of samples in category c_i, K denotes the number of categories in the sample set, n_k denotes the number of samples in the k-th category, and γ denotes a category difficulty control parameter, γ being a constant.
7. The apparatus for dynamic weighting of loss functions of claim 5, wherein the determining module is specifically configured to:
substituting the prior probability and the prediction probability of each category output in the current model training into a preset difficulty weight calculation formula to calculate the difficulty weight of each category;
the calculation formula of the difficulty weight is as follows:
W_d = (p(c_i) - p)^2

wherein W_d denotes the difficulty weight, p(c_i) denotes the prior probability, and p denotes the prediction probability.
8. The apparatus for dynamic weighting of loss functions as claimed in claim 5, wherein the update module is specifically configured to:
substituting the difficulty weight value of each category into a preset weight value variable updating calculation formula to obtain a loss function after the difficulty weight value is updated;
the weight variable updating calculation formula is as follows:
Loss = -W_d · log p

wherein Loss denotes the updated loss function, W_d denotes the difficulty weight, and -log p denotes the preset loss function.
9. An apparatus for dynamic weighting of loss functions, comprising a memory and a controller;
the memory has stored thereon a computer program which, when being executed by the controller, carries out the steps of the method for dynamic weighting of a loss function as claimed in any one of claims 1 to 4.
10. A storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method for dynamic weighting of a loss function as claimed in any one of claims 1 to 4.
CN202111242467.3A 2021-10-25 2021-10-25 Method, apparatus, device and storage medium for loss function dynamic weighting Pending CN114090952A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111242467.3A CN114090952A (en) 2021-10-25 2021-10-25 Method, apparatus, device and storage medium for loss function dynamic weighting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111242467.3A CN114090952A (en) 2021-10-25 2021-10-25 Method, apparatus, device and storage medium for loss function dynamic weighting

Publications (1)

Publication Number Publication Date
CN114090952A true CN114090952A (en) 2022-02-25

Family

ID=80297816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111242467.3A Pending CN114090952A (en) 2021-10-25 2021-10-25 Method, apparatus, device and storage medium for loss function dynamic weighting

Country Status (1)

Country Link
CN (1) CN114090952A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965823A (en) * 2023-02-13 2023-04-14 山东锋士信息技术有限公司 On-line difficult sample mining method and system based on Focal loss function
CN116304811A (en) * 2023-02-28 2023-06-23 王宇轩 Dynamic sample weight adjustment method and system based on focus loss function
CN116304811B (en) * 2023-02-28 2024-01-16 王宇轩 Dynamic sample weight adjustment method and system based on focus loss function

Similar Documents

Publication Publication Date Title
CN114090952A (en) Method, apparatus, device and storage medium for loss function dynamic weighting
EP3729857A1 (en) Radio coverage map generation
CN108229675B (en) Neural network training method, object detection method, device and electronic equipment
CN112906294A (en) Quantization method and quantization device for deep learning model
CN110969200A (en) Image target detection model training method and device based on consistency negative sample
CN111222553B (en) Training data processing method and device of machine learning model and computer equipment
US20190317814A1 (en) Resource management of resource-controlled system
CN111476082B (en) Method and device for on-line batch normalization, on-line learning and continuous learning
CN114528924B (en) Image classification model reasoning method, device, equipment and medium
CN109976153B (en) Method and device for controlling unmanned equipment and model training and electronic equipment
CN115019128A (en) Image generation model training method, image generation method and related device
CN114882307A (en) Classification model training and image feature extraction method and device
CN112949519B (en) Target detection method, device, equipment and storage medium
CN112906883A (en) Hybrid precision quantization strategy determination method and system for deep neural network
CN110866866B (en) Image color imitation processing method and device, electronic equipment and storage medium
CN115690100B (en) Semi-supervised signal point detection model training method, signal point detection method and device
CN111985439A (en) Face detection method, device, equipment and storage medium
CN116884071A (en) Face detection method and device, electronic equipment and storage medium
CN114882713B (en) Signal control method, system, equipment and storage medium based on multiple scenes
CN112633407B (en) Classification model training method and device, electronic equipment and storage medium
US20230419172A1 (en) Managing training of a machine learning model
CN110134575B (en) Method and device for calculating service capacity of server cluster
CN112785000A (en) Machine learning model training method and system for large-scale machine learning system
CN112116076A (en) Optimization method and optimization device for activation function
CN113408692A (en) Network structure searching method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination