CN117115551A - Image detection model training method and device, electronic equipment and storage medium - Google Patents

Image detection model training method and device, electronic equipment and storage medium

Info

Publication number
CN117115551A
Authority
CN
China
Prior art keywords
current
moment estimation
moment
learning rate
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311155994.XA
Other languages
Chinese (zh)
Inventor
彭莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ezviz Network Co Ltd
Original Assignee
Hangzhou Ezviz Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ezviz Network Co Ltd filed Critical Hangzhou Ezviz Network Co Ltd
Priority to CN202311155994.XA priority Critical patent/CN117115551A/en
Publication of CN117115551A publication Critical patent/CN117115551A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application provides an image detection model training method and device, electronic equipment, and a storage medium. A current parameter optimization model and a current learning rate calculation function are determined according to the current iteration number of the image detection model, the current learning rate and the current weight parameters are calculated from them, and the parameters of the image detection model are adjusted accordingly. Subsequently, a sample image carrying labeling information is acquired and input into the image detection model to obtain an image detection result. The loss of the image detection model is determined according to the labeling information of the sample image and the image detection result, and the trained image detection model is obtained when the loss converges. Because a suitable current parameter optimization model and current learning rate calculation function are determined according to the current iteration number before the parameters of the image detection model are adjusted, the training efficiency of the image detection model can be improved.

Description

Image detection model training method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and apparatus for training an image detection model, an electronic device, and a storage medium.
Background
In the process of training a machine learning model, a loss function is optimized through an optimization algorithm to obtain a trained model. Gradient descent is the optimization approach most commonly used to solve for the parameters of a machine learning model, and commonly used gradient descent optimization algorithms include the ADAM (Adaptive Moment Estimation) algorithm and the SGDM (Stochastic Gradient Descent with Momentum) algorithm. The ADAM algorithm converges quickly, but it may fail to find the globally optimal solution. The SGDM algorithm can find the optimal solution, but its gradient descends more slowly. Therefore, the ADAM algorithm is preferably used in the early stage of training to exploit its rapid convergence, and the SGDM algorithm is used in the later stage of training to find the optimal solution. That is, during training, the optimization of the loss function switches between the ADAM and SGDM algorithms. How to select the parameter optimization model is therefore of great significance for improving the training efficiency of the image detection model.
Disclosure of Invention
The embodiment of the application aims to provide an image detection model training method, an image detection model training device, electronic equipment and a storage medium, so as to improve training efficiency of an image detection model. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application provides an image detection model training method, where the method includes:
determining a current parameter optimization model and a current learning rate calculation function according to the current iteration times of the image detection model;
calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function;
carrying out parameter adjustment on the image detection model according to the current learning rate and the current weight parameter;
acquiring a sample image, and inputting the sample image into an image detection model to obtain an image detection result; wherein the sample image corresponds to labeling information;
determining the loss of an image detection model according to the labeling information of the sample image and the image detection result;
and when the loss of the image detection model is converged, obtaining the trained image detection model.
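As a rough illustration only, the six steps above can be arranged into a training loop along the following lines. This is a minimal sketch in PyTorch-style Python; the update_fn callback, the convergence tolerance, and all names are assumptions for illustration rather than part of the claimed method:

```python
import torch

def train_image_detector(model, data_loader, loss_fn, update_fn, t_adam, t_switch, tol=1e-6):
    """Skeleton of the claimed loop: pick the optimizer phase from the iteration count,
    adjust parameters, then evaluate the detection loss on a labelled sample image.
    `update_fn(model, phase, t, grads)` stands in for the per-phase update sketches below."""
    prev_loss, grads = float("inf"), None
    for t, (sample, label) in enumerate(data_loader, start=1):
        # 1) current parameter optimization model / learning rate function from t
        phase = "adam" if t < t_adam else ("transition" if t < t_switch else "sgdm")
        # 2)-3) compute current learning rate and weight parameters, apply them to the model
        if grads is not None:
            update_fn(model, phase, t, grads)
        # 4) run the labelled sample image through the image detection model
        loss = loss_fn(model(sample), label)
        # 5)-6) stop once the loss of the image detection model has converged
        if abs(prev_loss - loss.item()) < tol:
            break
        prev_loss = loss.item()
        model.zero_grad()
        loss.backward()
        grads = [p.grad.detach().clone() for p in model.parameters() if p.grad is not None]
    return model
```

The per-phase update here corresponds to the ADAM-phase, transition-phase and SGDM-phase sketches given later in this section.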
In one possible embodiment, the determining the current parameter optimization model and the current learning rate calculation function according to the current iteration number of the image detection model includes:
acquiring the iteration number for entering a transition interval and the iteration number for ending the transition; the iteration number for ending the transition is larger than the iteration number for entering the transition interval;
Under the condition that the current iteration times are smaller than the iteration times entering the transition interval, determining that the current parameter optimization model is an adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a first learning rate calculation function;
under the condition that the current iteration times are not less than the iteration times of entering the transition interval and are less than the iteration times of ending the transition interval, determining that the current parameter optimization model is a self-adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a second learning rate calculation function;
under the condition that the current iteration times are not less than the ending transition iteration times, determining that the current parameter optimization model is a random gradient descent optimization model, and the current learning rate calculation function is a third learning rate calculation function;
the convergence speed of the image detection model under the first learning rate calculation function is greater than the convergence speed of the image detection model under the second learning rate calculation function, and the convergence speed of the image detection model under the second learning rate calculation function is greater than the convergence speed of the image detection model under the third learning rate calculation function;
the improvement of the accuracy of the image detection model under the first learning rate calculation function and the improvement of the accuracy of the image detection model under the third learning rate calculation function are smaller than the improvement of the accuracy of the image detection model under the second learning rate calculation function.
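For illustration, the three branches above can be written as a small dispatch helper; this is a sketch, and the string labels are placeholders for the optimization models and learning rate calculation functions named in the embodiment:

```python
def select_phase(t, t_enter, t_end):
    """Map the current iteration count to the parameter optimization model and
    learning rate calculation function to use (string labels are placeholders)."""
    if t < t_enter:                                    # before the transition interval
        return ("adaptive_lr_model", "first_lr_function")
    if t < t_end:                                      # inside the transition interval
        return ("adaptive_lr_model", "second_lr_function")
    return ("sgd_momentum_model", "third_lr_function")  # after the transition interval
```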
In a possible embodiment, in a case that the current iteration number is smaller than the entering transition interval iteration number, the current parameter optimization model is an adaptive moment estimation ADAM optimization model;
the calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function comprises the following steps:
calculating the current learning rate according to the learning rates corresponding to the current iteration times and the last iteration times;
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring an exponential decay rate of the first moment estimation and an exponential decay rate of the second moment estimation;
calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
calculating current first-order moment deviation correction according to the current iteration times, the current first-order moment estimation and the exponential decay rate of the first-order moment estimation;
calculating current second moment deviation correction according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation;
And calculating the current weight parameter according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction.
In one possible embodiment, the calculating the current learning rate according to the learning rates corresponding to the current iteration number and the last iteration number includes:
according to the learning rate corresponding to the current iteration times and the last iteration times, the current learning rate is calculated by using the following formula:
where α_t is the current learning rate, α_{t-1} is the learning rate corresponding to the previous iteration, and t is the current iteration number;
correspondingly, the calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number comprises the following steps:
according to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
g_t = ∇f(w_{t-1})
where g_t is the current gradient, f(w_{t-1}) is the objective function, and w_{t-1} is the weight parameter corresponding to the previous iteration;
correspondingly, the calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation comprises the following steps:
According to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1·m_{t-1} + (1 - β_1)·g_t
where m_t is the current first moment estimate, β_1 is the exponential decay rate of the first moment estimate, m_{t-1} is the first moment estimate corresponding to the previous iteration, and g_t is the current gradient;
correspondingly, the calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation comprises the following steps:
according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, the current second moment estimation is calculated by using the following formula:
v_t = β_2·v_{t-1} + (1 - β_2)·g_t^2
where v_t is the current second moment estimate, β_2 is the exponential decay rate of the second moment estimate, v_{t-1} is the second moment estimate corresponding to the previous iteration, and g_t^2 is the square of the current gradient;
correspondingly, the calculating the current first-moment deviation correction according to the current iteration times, the current first-moment estimation and the exponential decay rate of the first-moment estimation comprises the following steps:
according to the current iteration times, the current first moment estimation and the exponential decay rate of the first moment estimation, calculating the current first moment deviation correction by using the following formula:
m̂_t = m_t / (1 - β_1^t)
where m̂_t is the current first-moment deviation correction, m_t is the current first moment estimate, β_1^t is the t-th power of the exponential decay rate of the first moment estimate, and t is the current iteration number;
correspondingly, the calculating the current second moment deviation correction according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation comprises the following steps:
according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation, calculating the current second moment deviation correction by using the following formula:
v̂_t = v_t / (1 - β_2^t)
where v̂_t is the current second-moment deviation correction, v_t is the current second moment estimate, β_2^t is the t-th power of the exponential decay rate of the second moment estimate, and t is the current iteration number;
correspondingly, the calculating the current weight parameter according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction comprises the following steps:
according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction, the current weight parameter is calculated by using the following formula:
where w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the previous iteration, α_t is the current learning rate, β_2^t is the t-th power of the exponential decay rate of the second moment estimate, β_1^t is the t-th power of the exponential decay rate of the first moment estimate, m̂_t is the current first-moment deviation correction, v̂_t is the current second-moment deviation correction, ε is a constant, and t is the current iteration number.
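A compact sketch of this ADAM-phase update for a single parameter array is given below. The moment recursions and bias corrections follow the standard ADAM form that the definitions above describe; the learning-rate schedule and the exact weight-update expression are not reproduced in the text, so the standard forms used here are assumptions:

```python
import numpy as np

def adam_step(w_prev, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM-phase update for a NumPy parameter array `w_prev` with gradient `grad`.
    `state` persists the running moments and the iteration counter across calls."""
    t = state["t"] = state.get("t", 0) + 1
    # first/second moment estimates (exponential moving averages of g and g^2)
    m = state["m"] = beta1 * state.get("m", 0.0) + (1.0 - beta1) * grad
    v = state["v"] = beta2 * state.get("v", 0.0) + (1.0 - beta2) * grad ** 2
    # bias corrections for the zero-initialised moments
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # assumed standard ADAM weight update
    return w_prev - lr * m_hat / (np.sqrt(v_hat) + eps)
```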
In a possible embodiment, in a case that a current iteration number is not less than the number of iterations of the entering transition interval and is less than the number of iterations of the ending transition interval, the current parameter optimization model is an ADAM optimization model;
the calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function comprises the following steps:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring the radius of a trust interval, the exponential decay rate of first moment estimation and the exponential decay rate of second moment estimation;
calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
Calculating a current confidence interval according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration times;
calculating the current learning rate according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration times;
and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval.
In a possible embodiment, the calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number includes:
according to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
g_t = ∇f(w_{t-1})
where g_t is the current gradient, f(w_{t-1}) is the objective function, and w_{t-1} is the weight parameter corresponding to the previous iteration;
correspondingly, the calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation comprises the following steps:
according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1·m_{t-1} + (1 - β_1)·g_t
where m_t is the current first moment estimate, β_1 is the exponential decay rate of the first moment estimate, m_{t-1} is the first moment estimate corresponding to the previous iteration, and g_t is the current gradient;
correspondingly, the calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation comprises the following steps:
according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, the current second moment estimation is calculated by using the following formula:
v_t = β_2·v_{t-1} + (1 - β_2)·g_t^2
where v_t is the current second moment estimate, β_2 is the exponential decay rate of the second moment estimate, v_{t-1} is the second moment estimate corresponding to the previous iteration, and g_t^2 is the square of the current gradient;
correspondingly, the calculating the current trust interval according to the current first moment estimation, the radius of the trust interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration number comprises the following steps:
according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration number, the current confidence interval is calculated by using the following formula:
where z_t is the confidence interval corresponding to the current iteration number, sign(m_t) is the sign function, min(·) is the minimum function, m_t is the current first moment estimate, δ is the radius of the confidence interval, v_t is the current second moment estimate, β_2^t is the t-th power of the exponential decay rate of the second moment estimate, and t is the current iteration number;
correspondingly, the calculating the current learning rate according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration number comprises the following steps:
according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration number, the current learning rate is calculated by using the following formula:
α_t = α_{t-1}·δ / (δ + ||m_t|| + ε)
where α_t is the current learning rate, α_{t-1} is the learning rate corresponding to the previous iteration, δ is the radius of the confidence interval, m_t is the current first moment estimate, and ε is a constant;
correspondingly, the calculating the current weight parameter according to the weight parameter, the current learning rate, the current first moment estimation, the current second moment estimation and the current confidence interval corresponding to the last iteration number comprises the following steps:
according to the weight parameter, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval corresponding to the last iteration number, the current weight parameter is calculated by using the following formula:
where w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the previous iteration, α_t is the current learning rate, m_t is the current first moment estimate, z_t is the current confidence interval, v_t is the current second moment estimate, and ε is a constant.
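The transition-phase update can be sketched as follows. Only the moment recursions and the learning-rate decay α_t = α_{t-1}·δ/(δ + ||m_t|| + ε) are stated explicitly above; the confidence-interval term z_t and the final weight update are not reproduced in the text, so the clipped step below is an assumption for illustration only:

```python
import numpy as np

def transition_step(w_prev, grad, state, delta, beta1=0.9, beta2=0.999, eps=1e-8, lr0=1e-3):
    """Transition-phase update on NumPy arrays: ADAM-style moments plus a learning rate
    that decays with ||m_t||; the trust-interval clipping of the step is an assumed form."""
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = beta1 * state.get("m", 0.0) + (1.0 - beta1) * grad
    v = state["v"] = beta2 * state.get("v", 0.0) + (1.0 - beta2) * grad ** 2
    # alpha_t = alpha_{t-1} * delta / (delta + ||m_t|| + eps), as stated above
    lr = state["lr"] = state.get("lr", lr0) * delta / (delta + np.linalg.norm(m) + eps)
    # assumed trust-interval step: bias-corrected ADAM direction, clipped to +/- delta
    step = lr * m / (np.sqrt(v / (1.0 - beta2 ** t)) + eps)
    return w_prev - np.sign(step) * np.minimum(np.abs(step), delta)
```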
In a possible embodiment, in a case that the current iteration number is not less than the ending transition iteration number, the current parameter optimization model is a momentum random gradient descent SGDM optimization model;
the calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function comprises the following steps:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring a current momentum parameter;
calculating a current learning rate according to the current momentum parameter and the current gradient;
and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter.
In a possible embodiment, the calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number includes:
according to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
g_t = ∇f(w_{t-1})
where g_t is the current gradient, f(w_{t-1}) is the objective function, and w_{t-1} is the weight parameter corresponding to the previous iteration;
correspondingly, the calculating the current learning rate according to the current momentum parameter and the current gradient comprises the following steps:
according to the momentum parameter and the current gradient, the current learning rate is calculated by using the following formula:
α_t = g_t / (1 - β)
where α_t is the current learning rate, β is the current momentum parameter, and g_t is the current gradient;
correspondingly, the calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter comprises the following steps:
according to the weight parameter, the current learning rate and the momentum parameter corresponding to the previous iteration number, the current weight parameter is calculated by using the following formula:
w_t = w_{t-1} - (1 - β)·α_t
where w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the previous iteration, α_t is the current learning rate, and β is the current momentum parameter.
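Transcribing the two SGDM-phase formulas above directly gives the following sketch. Note that, as written, the two expressions compose to w_t = w_{t-1} - g_t (the momentum factor cancels), so this reproduces the formulas as stated rather than a reference SGDM implementation:

```python
def sgdm_step(w_prev, grad, beta=0.9):
    """SGDM-phase update transcribed directly from the two formulas above.
    As written they compose to w_t = w_{t-1} - g_t, i.e. the momentum factor cancels."""
    lr = grad / (1.0 - beta)              # alpha_t = g_t / (1 - beta)
    return w_prev - (1.0 - beta) * lr     # w_t = w_{t-1} - (1 - beta) * alpha_t
```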
In a second aspect, an embodiment of the present application provides an image detection model training apparatus, including:
the first determining module is used for determining a current parameter optimization model and a current learning rate calculation function according to the current iteration times of the image detection model;
the calculation module is used for calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function;
The adjusting module is used for carrying out parameter adjustment on the image detection model according to the current learning rate and the current weight parameter;
the acquisition module is used for acquiring a sample image, inputting the sample image into the image detection model and obtaining an image detection result; wherein the sample image corresponds to labeling information;
the second determining module is used for determining the loss of the image detection model according to the labeling information of the sample image and the image detection result;
and the training module is used for obtaining the trained image detection model when the loss of the image detection model is converged.
In a possible embodiment, the first determining module is configured to:
acquiring the iteration number for entering a transition interval and the iteration number for ending the transition; the iteration number for ending the transition is larger than the iteration number for entering the transition interval;
under the condition that the current iteration times are smaller than the iteration times entering the transition interval, determining that the current parameter optimization model is an adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a first learning rate calculation function;
under the condition that the current iteration times are not less than the iteration times of entering the transition interval and are less than the iteration times of ending the transition interval, determining that the current parameter optimization model is a self-adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a second learning rate calculation function;
Under the condition that the current iteration times are not less than the ending transition iteration times, determining that the current parameter optimization model is a random gradient descent optimization model, and the current learning rate calculation function is a third learning rate calculation function;
the convergence speed of the image detection model under the first learning rate calculation function is greater than the convergence speed of the image detection model under the second learning rate calculation function, and the convergence speed of the image detection model under the second learning rate calculation function is greater than the convergence speed of the image detection model under the third learning rate calculation function;
the improvement of the accuracy of the image detection model under the first learning rate calculation function and the improvement of the accuracy of the image detection model under the third learning rate calculation function are smaller than the improvement of the accuracy of the image detection model under the second learning rate calculation function.
In a possible embodiment, in a case that the current iteration number is smaller than the entering transition interval iteration number, the current parameter optimization model is an adaptive moment estimation ADAM optimization model; the computing module is used for:
calculating the current learning rate according to the learning rates corresponding to the current iteration times and the last iteration times;
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
Acquiring an exponential decay rate of the first moment estimation and an exponential decay rate of the second moment estimation;
calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
calculating current first-order moment deviation correction according to the current iteration times, the current first-order moment estimation and the exponential decay rate of the first-order moment estimation;
calculating current second moment deviation correction according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation;
and calculating the current weight parameter according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction.
In a possible embodiment, the computing module is configured to:
according to the learning rate corresponding to the current iteration times and the last iteration times, the current learning rate is calculated by using the following formula:
where α_t is the current learning rate, α_{t-1} is the learning rate corresponding to the previous iteration, and t is the current iteration number;
correspondingly, according to the weight parameters corresponding to the objective function and the last iteration number, the current gradient is calculated by using the following formula:
g_t = ∇f(w_{t-1})
where g_t is the current gradient, f(w_{t-1}) is the objective function, and w_{t-1} is the weight parameter corresponding to the previous iteration;
correspondingly, according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1·m_{t-1} + (1 - β_1)·g_t
where m_t is the current first moment estimate, β_1 is the exponential decay rate of the first moment estimate, m_{t-1} is the first moment estimate corresponding to the previous iteration, and g_t is the current gradient;
correspondingly, according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, the current second moment estimation is calculated by using the following formula:
v_t = β_2·v_{t-1} + (1 - β_2)·g_t^2
where v_t is the current second moment estimate, β_2 is the exponential decay rate of the second moment estimate, v_{t-1} is the second moment estimate corresponding to the previous iteration, and g_t^2 is the square of the current gradient;
correspondingly, according to the current iteration times, the current first moment estimation and the exponential decay rate of the first moment estimation, calculating the current first moment deviation correction by using the following formula:
m̂_t = m_t / (1 - β_1^t)
where m̂_t is the current first-moment deviation correction, m_t is the current first moment estimate, β_1^t is the t-th power of the exponential decay rate of the first moment estimate, and t is the current iteration number;
correspondingly, according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation, calculating the current second moment deviation correction by using the following formula:
v̂_t = v_t / (1 - β_2^t)
where v̂_t is the current second-moment deviation correction, v_t is the current second moment estimate, β_2^t is the t-th power of the exponential decay rate of the second moment estimate, and t is the current iteration number;
correspondingly, according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction, the current weight parameter is calculated by using the following formula:
where w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the previous iteration, α_t is the current learning rate, β_2^t is the t-th power of the exponential decay rate of the second moment estimate, β_1^t is the t-th power of the exponential decay rate of the first moment estimate, m̂_t is the current first-moment deviation correction, v̂_t is the current second-moment deviation correction, ε is a constant, and t is the current iteration number.
In a possible embodiment, in a case that a current iteration number is not less than the number of iterations of the entering transition interval and is less than the number of iterations of the ending transition interval, the current parameter optimization model is an ADAM optimization model;
the computing module is used for:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring the radius of a trust interval, the exponential decay rate of first moment estimation and the exponential decay rate of second moment estimation;
calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
calculating a current confidence interval according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration times;
calculating the current learning rate according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration times;
and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval.
In a possible embodiment, the computing module is configured to:
According to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
g_t = ∇f(w_{t-1})
where g_t is the current gradient, f(w_{t-1}) is the objective function, and w_{t-1} is the weight parameter corresponding to the previous iteration;
correspondingly, according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1·m_{t-1} + (1 - β_1)·g_t
where m_t is the current first moment estimate, β_1 is the exponential decay rate of the first moment estimate, m_{t-1} is the first moment estimate corresponding to the previous iteration, and g_t is the current gradient;
correspondingly, according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, the current second moment estimation is calculated by using the following formula:
v_t = β_2·v_{t-1} + (1 - β_2)·g_t^2
where v_t is the current second moment estimate, β_2 is the exponential decay rate of the second moment estimate, v_{t-1} is the second moment estimate corresponding to the previous iteration, and g_t^2 is the square of the current gradient;
correspondingly, according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration number, the current confidence interval is calculated by using the following formula:
where z_t is the confidence interval corresponding to the current iteration number, sign(m_t) is the sign function, min(·) is the minimum function, m_t is the current first moment estimate, δ is the radius of the confidence interval, v_t is the current second moment estimate, β_2^t is the t-th power of the exponential decay rate of the second moment estimate, and t is the current iteration number;
correspondingly, according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration number, the current learning rate is calculated by using the following formula:
α_t = α_{t-1}·δ / (δ + ||m_t|| + ε)
where α_t is the current learning rate, α_{t-1} is the learning rate corresponding to the previous iteration, δ is the radius of the confidence interval, m_t is the current first moment estimate, and ε is a constant;
correspondingly, according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval, the current weight parameter is calculated by using the following formula:
where w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the previous iteration, α_t is the current learning rate, m_t is the current first moment estimate, z_t is the current confidence interval, v_t is the current second moment estimate, and ε is a constant.
In a possible embodiment, in a case that the current iteration number is not less than the ending transition iteration number, the current parameter optimization model is a momentum random gradient descent SGDM optimization model;
The computing module is used for:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring a current momentum parameter;
calculating a current learning rate according to the current momentum parameter and the current gradient;
and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter.
In a possible embodiment, the computing module is configured to:
according to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
g_t = ∇f(w_{t-1})
where g_t is the current gradient, f(w_{t-1}) is the objective function, and w_{t-1} is the weight parameter corresponding to the previous iteration;
correspondingly, according to the momentum parameter and the current gradient, the current learning rate is calculated by using the following formula:
α_t = g_t / (1 - β)
where α_t is the current learning rate, β is the current momentum parameter, and g_t is the current gradient;
correspondingly, according to the weight parameter, the current learning rate and the momentum parameter corresponding to the previous iteration number, the current weight parameter is calculated by using the following formula:
w_t = w_{t-1} - (1 - β)·α_t
where w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the previous iteration, α_t is the current learning rate, and β is the current momentum parameter.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory;
the memory is used for storing a computer program;
the processor is used for realizing the image detection model training method when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the image detection model training method described in the present application.
The embodiment of the application has the beneficial effects that:
According to the image detection model training method and device, the electronic equipment and the storage medium provided by the embodiments of the application, a current parameter optimization model and a current learning rate calculation function are determined according to the current iteration number of the image detection model, the current learning rate and the current weight parameters are calculated based on them, and the image detection model is adjusted with the calculated information. Subsequently, a sample image carrying labeling information is acquired and input into the image detection model to obtain an image detection result. The loss of the image detection model is determined according to the labeling information of the sample image and the image detection result, and the trained image detection model is obtained when the loss converges. Because a suitable current parameter optimization model and current learning rate calculation function are determined according to the current iteration number before the parameters of the image detection model are adjusted, the training efficiency of the image detection model can be improved. Of course, it is not necessary for any product or method implementing the application to achieve all of the advantages described above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the application, and other drawings may be obtained by those skilled in the art according to these drawings.
FIG. 1 is a first schematic diagram of an image detection model training method according to an embodiment of the present application;
FIG. 2 is a second schematic diagram of an image detection model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram showing a comparison of training effects of training a model using an ADAM model, using an SGDM model, and using an ADAM model and an SGDM model according to an embodiment of the present application;
FIG. 4 is a third schematic diagram of an image detection model training method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an image detection model training apparatus according to an embodiment of the present application;
fig. 6 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application fall within the scope of protection of the present application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between associated objects, indicating that three relationships may exist. For example, "A and/or B" may represent: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
First, the technical terms in the present application will be explained.
Early stop method: a regularization method for avoiding model overfitting. The sample data is divided into a training set and a test set, the model is evaluated on the test set after each training epoch (or every few epochs), and the best test accuracy so far is recorded. As the number of epochs increases, if the test error on the test set starts to rise, training is stopped, and the weight parameters from the epoch with the highest test accuracy are used as the final parameters of the model.
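A minimal sketch of this early-stop rule is given below; the patience value and the exact improvement test are assumptions:

```python
def early_stop_epoch(test_accuracies, patience=3):
    """Return the index of the epoch whose weights should be kept once the test
    accuracy has not improved for `patience` consecutive epochs, else None."""
    if not test_accuracies:
        return None
    best_epoch = max(range(len(test_accuracies)), key=test_accuracies.__getitem__)
    if len(test_accuracies) - 1 - best_epoch >= patience:
        return best_epoch   # roll back to the best weights recorded so far
    return None
```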
In the related art, a switching-threshold condition parameter is set empirically and used as the switching condition, and a scaling rule is used to determine the learning rate and gradient for the optimization algorithm after switching. However, an empirically set switching-threshold parameter cannot adapt to the data set and the deep network model, which may cause the model to fail to converge. If it is set improperly, overfitting may occur, the generalization ability of the model is affected, and the training effect is poor. In addition, using a scaling rule to determine the learning rate and gradient after switching can leave the model insufficiently accurate, increase the time and computation cost required for training, and reduce the convergence rate of the model.
In order to improve training efficiency of an image detection model, an embodiment of the present application provides an image detection model training method, referring to fig. 1, the method includes:
s101, determining a current parameter optimization model and a current learning rate calculation function according to the current iteration times of the image detection model.
A machine learning algorithm essentially builds an optimization model and optimizes the objective function through an optimization algorithm so as to train the best model. The optimization algorithm is directly related to the performance of the model. Common gradient descent optimization algorithms include adaptive adjustment learning rate optimization models and random gradient descent optimization models. The adaptive adjustment learning rate optimization model may be an ADAM (Adaptive Moment Estimation) optimization model, and the random gradient descent optimization model may be an SGDM (Stochastic Gradient Descent with Momentum) optimization model. The ADAM optimization algorithm converges quickly, but it may fail to find the globally optimal solution. The SGDM optimization algorithm can find the optimal solution, but its gradient descends more slowly. Therefore, the ADAM optimization algorithm can be used in the early stage of training to exploit its rapid convergence, and the SGDM optimization algorithm can be used in the later stage of training to find the optimal solution. That is, in the training process for image detection, the optimization of the objective function (loss function) switches between the ADAM optimization algorithm and the SGDM optimization algorithm.
The embodiment of the application provides a gradient descent optimization algorithm based on a hybrid strategy, which realizes a gradual and smooth transition from the ADAM optimization algorithm to the SGDM optimization algorithm, so that the image detection model has the same training speed as the ADAM optimization model in the initial stage and the same stable learning rate as the SGDM optimization model in the later stage, improving both the parameter calculation speed of the image detection model and its robustness.
In one possible embodiment, as shown in fig. 2, the step S101 may specifically include the following steps:
s1011, acquiring the iteration times of entering a transition section and ending the transition iteration times; the number of iteration times of ending transition is larger than that of entering a transition interval;
s1012, under the condition that the current iteration times are smaller than the iteration times entering a transition interval, determining that the current parameter optimization model is a self-adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a first learning rate calculation function;
s1013, under the condition that the current iteration times are not less than the iteration times of entering a transition interval and are less than the iteration times of finishing the transition interval, determining that the current parameter optimization model is a self-adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a second learning rate calculation function;
S1014, under the condition that the current iteration times are not less than the ending transition iteration times, determining that the current parameter optimization model is a random gradient descent optimization model, and the current learning rate calculation function is a third learning rate calculation function;
the convergence speed of the image detection model under the first learning rate calculation function is greater than the convergence speed of the image detection model under the second learning rate calculation function, and the convergence speed of the image detection model under the second learning rate calculation function is greater than the convergence speed of the image detection model under the third learning rate calculation function; the improvement of the accuracy of the image detection model under the first learning rate calculation function and the improvement of the accuracy of the image detection model under the third learning rate calculation function are smaller than the improvement of the accuracy of the image detection model under the second learning rate calculation function.
In one possible embodiment, the early stop method may be used to set the number of iterations into the transition interval, for example, to set the number of iterations into the transition interval as the number of iterations when the value of the loss function no longer drops. By setting the iteration number entering the transition interval as the iteration number of the early-stop method, the image detection model can be converged more quickly in the training process, and overfitting is avoided. In addition, the early-stopping method can reduce the waste of computing resources and improve the training efficiency.
In one possible embodiment, the length of the transition interval may be set empirically, for example to 10, in which case the iteration number for ending the transition is the iteration number for entering the transition interval plus 10.
For example, when the adaptive parameter optimization model is an ADAM optimization model and the random gradient descent optimization model is an SGDM optimization model, the current parameter optimization model may be determined using equation (1), equation (2), and equation (3):
where f(t) is the weight of the current parameter optimization model, t_adam is the iteration number for entering the transition interval, t is the current iteration number, and t_switch is the iteration number for ending the transition.
α_t = f(t)·α_sgdm + (1 - f(t))·α_adam    (2)
where α_t is the current learning rate, α_sgdm is the current learning rate obtained when the objective function is optimized using the SGDM optimization model, and α_adam is the current learning rate obtained when the objective function is optimized using the ADAM optimization model.
β_t = f(t)·β_sgdm + (1 - f(t))·β_adam    (3)
where β_t is the current momentum parameter, β_sgdm is the current momentum parameter obtained when the objective function is optimized using the SGDM optimization model, and β_adam is the current momentum parameter obtained when the objective function is optimized using the ADAM optimization model.
Specifically, it can be seen from formulas (1), (2) and (3) that when t < t_adam and when t_adam ≤ t < t_switch, f(t) = 0, so α_t = α_adam and β_t = β_adam. That is, in both cases the current parameter optimization model is determined to be the ADAM optimization model, and the current learning rate α_t and the current momentum parameter β_t are the current learning rate and momentum parameter obtained when the objective function is optimized using the ADAM optimization model. When t ≥ t_switch, f(t) = 1, so α_t = α_SGDM and β_t = β_SGDM. That is, in this case the current parameter optimization model is determined to be the SGDM optimization model, and the current learning rate α_t and the current momentum parameter β_t are the current learning rate and momentum parameter obtained when the objective function is optimized using the SGDM optimization model.
In one possible embodiment, the first learning rate calculation function and the second learning rate calculation function may be calculation functions of a current learning rate under an ADAM optimization model, and the third learning rate calculation function may be calculation functions of a current learning rate under an SGDM optimization model.
In the embodiment of the application, the cosine linear interpolation is used to enable the switching process from the ADAM optimization model to the SGDM optimization model to be smoother, and the oscillation phenomenon in the transition period is avoided. Therefore, the stability of the model in the switching process can be ensured, and the accuracy of the model is improved.
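A small sketch of the blending in formulas (2) and (3) is given below; f(t) follows the piecewise values described in the text, and the optional cosine ramp inside the transition interval is an assumption reflecting the cosine linear interpolation mentioned above, since formula (1) itself is not reproduced here:

```python
import math

def blend_factor(t, t_adam, t_switch, cosine_ramp=False):
    """f(t): 0 before t_switch and 1 afterwards, as stated for formulas (1)-(3) above.
    With cosine_ramp=True, an assumed cosine interpolation is used inside the
    transition interval [t_adam, t_switch)."""
    if t < t_adam:
        return 0.0
    if t < t_switch:
        if cosine_ramp:
            return 0.5 * (1.0 - math.cos(math.pi * (t - t_adam) / (t_switch - t_adam)))
        return 0.0
    return 1.0

def blend_hyperparams(t, t_adam, t_switch, lr_adam, lr_sgdm, mom_adam, mom_sgdm):
    """Formulas (2) and (3): blend the ADAM- and SGDM-phase learning rate and momentum."""
    f = blend_factor(t, t_adam, t_switch)
    return (f * lr_sgdm + (1.0 - f) * lr_adam,     # alpha_t
            f * mom_sgdm + (1.0 - f) * mom_adam)   # beta_t
```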
S102, calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function.
In one possible embodiment, calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function includes:
When the current iteration number is smaller than the iteration number entering the transition interval and the current parameter optimization model is an ADAM optimization model, calculating a current learning rate according to the learning rates corresponding to the current iteration number and the last iteration number; calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number; acquiring an exponential decay rate of the first moment estimation and an exponential decay rate of the second moment estimation; calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation; calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation; calculating current first-order moment deviation correction according to the current iteration times, the current first-order moment estimation and the exponential decay rate of the first-order moment estimation; calculating current second moment deviation correction according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation; calculating a current weight parameter according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction;
When the current iteration number is not less than the iteration number entering the transition interval and is less than the transition iteration number ending the transition interval and the current parameter optimization model is an ADAM optimization model, calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number; acquiring the radius of a trust interval, the exponential decay rate of first moment estimation and the exponential decay rate of second moment estimation; calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation; calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation; calculating a current confidence interval according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration times; calculating the current learning rate according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration times; calculating a current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval;
When the current iteration number is not less than the ending transition iteration number and the current parameter optimization model is a momentum random gradient descent SGDM optimization model, calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number; acquiring a current momentum parameter; calculating a current learning rate according to the current momentum parameter and the current gradient; and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter.
In one possible embodiment, before calculating the current learning rate and current weight parameters according to the current parameter optimization model and the current learning rate calculation function, some parameters need to be initialized so that the current learning rate and current weight parameters can be calculated subsequently based on these initialized parameters.
Specifically, the learning rate is initialized to a very small value to avoid skipping over the optimal point of the loss function during parameter updating. When the parameter optimization model is an ADAM optimization model, the current learning rate α_t may be initialized to a value between 0.001 and 0.1, for example 10^-3. The current momentum parameter β_t includes the exponential decay rate β_1 of the first moment estimation, the exponential decay rate β_2 of the second moment estimation, and a constant ε. The exponential decay rates β_1 and β_2 may be initialized to values near 1, for example β_1 is set to 0.9 and β_2 is set to 0.999. The constant ε is normally initialized to a number close to 0 to prevent a zero denominator, for example 10^-9. When the parameter optimization model is an SGDM optimization model, β_t includes β, which may be initialized to 0.9. The initial optimizer is usually the ADAM model; the iteration number t is initialized to 0, the first moment estimation m_t to 0, the second moment estimation v_t to 0, the cumulative variable λ_t to 0, and the confidence interval radius δ to 1.0; that is, t←0, m_t←0, v_t←0, λ_t←0, δ←1.0, and the current learning rate α_t = α_0 = 10^-3.
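A minimal sketch of this initialization, assuming the model weights are stored as a flat NumPy array; the dictionary layout and variable names are illustrative only.

```python
import numpy as np

def init_optimizer_state(num_params):
    # Initial values as described above; ADAM is the starting optimizer.
    return {
        "t": 0,                     # iteration counter
        "alpha": 1e-3,              # current learning rate alpha_0
        "beta1": 0.9,               # exponential decay rate of the first moment estimation
        "beta2": 0.999,             # exponential decay rate of the second moment estimation
        "eps": 1e-9,                # small constant to avoid a zero denominator
        "beta_sgdm": 0.9,           # momentum parameter used once SGDM takes over
        "m": np.zeros(num_params),  # first moment estimation m_t
        "v": np.zeros(num_params),  # second moment estimation v_t
        "lam": 0.0,                 # cumulative variable lambda_t
        "delta": 1.0,               # confidence interval radius
    }
```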
Illustratively, when t < t_adam, the current learning rate is calculated using formula (4) according to the learning rates corresponding to the current iteration number and the last iteration number:
wherein α_t is the current learning rate, α_(t-1) is the learning rate corresponding to the last iteration number, and t is the current iteration number;
the current gradient is calculated using formula (5) according to the objective function and the weight parameter corresponding to the last iteration number:
wherein g_t is the current gradient, f(w_(t-1)) is the objective function, and w_(t-1) is the weight parameter corresponding to the last iteration number;
calculating the current first moment estimation by using a formula (6) according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation:
m_t = β_1 m_(t-1) + (1 - β_1) g_t (6)
wherein m_t is the current first moment estimation, β_1 is the exponential decay rate of the first moment estimation, m_(t-1) is the first moment estimation corresponding to the last iteration number, and g_t is the current gradient;
calculating the current second moment estimation by using a formula (7) according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation:
wherein v_t is the current second moment estimation, β_2 is the exponential decay rate of the second moment estimation, v_(t-1) is the second moment estimation corresponding to the last iteration number, and g_t² is the square of the current gradient;
calculating the current first-order moment deviation correction by using a formula (8) according to the current iteration times, the current first-order moment estimation and the exponential decay rate of the first-order moment estimation:
wherein m̂_t is the current first moment deviation correction, m_t is the current first moment estimation, β_1^t is the t-th power of the exponential decay rate of the first moment estimation, and t is the current iteration number;
calculating the current second moment deviation correction by using a formula (9) according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation:
wherein v̂_t is the current second moment deviation correction, v_t is the current second moment estimation, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, and t is the current iteration number;
calculating a current weight parameter by using a formula (10) according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction:
wherein w_t is the current weight parameter, w_(t-1) is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, β_1^t is the t-th power of the exponential decay rate of the first moment estimation, m̂_t is the current first moment deviation correction, v̂_t is the current second moment deviation correction, ε is a constant, and t is the current iteration number.
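Since formulas (4), (5) and (7) to (10) are not reproduced above, the following sketch implements the textbook ADAM update that the term definitions describe; it is meant as a reference implementation under that assumption, not necessarily the exact formulas of the embodiment. Here t starts at 1 so that the bias-correction denominators 1 - β_1^t and 1 - β_2^t are non-zero.

```python
import numpy as np

def adam_step(w_prev, grad, m_prev, v_prev, t, alpha, beta1=0.9, beta2=0.999, eps=1e-9):
    # Textbook ADAM update matching the quantities defined above.
    m_t = beta1 * m_prev + (1.0 - beta1) * grad        # first moment estimation (formula (6))
    v_t = beta2 * v_prev + (1.0 - beta2) * grad ** 2   # second moment estimation
    m_hat = m_t / (1.0 - beta1 ** t)                   # first moment deviation correction
    v_hat = v_t / (1.0 - beta2 ** t)                   # second moment deviation correction
    w_t = w_prev - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w_t, m_t, v_t
```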
Illustratively, when t_adam ≤ t < t_switch, the current gradient is calculated using formula (5) according to the objective function and the weight parameter corresponding to the last iteration number; the current first moment estimation is calculated using formula (6) according to the current gradient, the first moment estimation corresponding to the last iteration number, and the exponential decay rate of the first moment estimation; the current second moment estimation is calculated using formula (7) according to the current gradient, the second moment estimation corresponding to the last iteration number, and the exponential decay rate of the second moment estimation;
Calculating a current confidence interval by using a formula (11) according to the current first moment estimation, the confidence interval radius, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration times:
wherein z_t is the confidence interval corresponding to the current iteration number, sign(m_t) is the sign function, min(·,·) is the minimum function, m_t is the current first moment estimation, δ is the confidence interval radius, v_t is the current second moment estimation, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, and t is the current iteration number;
according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration number, calculating the current learning rate by using a formula (12):
α_t = α_(t-1) δ/(δ + ‖m_t‖ + ε) (12)
wherein α_t is the current learning rate, α_(t-1) is the learning rate corresponding to the last iteration number, δ is the confidence interval radius, m_t is the current first moment estimation, and ε is a constant;
according to the weight parameter, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval corresponding to the last iteration number, calculating the current weight parameter by using a formula (13):
wherein w_t is the current weight parameter, w_(t-1) is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, m_t is the current first moment estimation, z_t is the current confidence interval, v_t is the current second moment estimation, and ε is a constant.
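For the transition interval, only formula (12) is reproduced above; the trust interval of formula (11) and the weight update of formula (13) are therefore reconstructed below as plausible assumptions (a sign-clipped, bias-corrected step of radius δ), and the sketch should be read as an approximation rather than the embodiment's exact update.

```python
import numpy as np

def transition_step(w_prev, m_t, v_t, lr_prev, beta2, t, delta=1.0, eps=1e-9):
    # Formula (12) as written: shrink the learning rate as the first moment grows.
    lr_t = lr_prev * delta / (delta + np.linalg.norm(m_t) + eps)

    # Assumed formula (11): clip the per-parameter step into a trust interval of radius delta.
    v_hat = v_t / (1.0 - beta2 ** t)
    z_t = np.sign(m_t) * np.minimum(delta, np.abs(m_t) / (np.sqrt(v_hat) + eps))

    # Assumed formula (13): apply the clipped, scaled step.
    w_t = w_prev - lr_t * z_t
    return w_t, lr_t
```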
When t ≥ t_switch, the current gradient is calculated using formula (5) according to the objective function and the weight parameter corresponding to the last iteration number;
calculating a current learning rate from the current momentum parameter and the current gradient using equation (14):
α_t = g_t/(1 - β) (14)
wherein α_t is the current learning rate, β is the current momentum parameter, and g_t is the current gradient;
according to the weight parameter, the current learning rate and the current momentum parameter corresponding to the previous iteration number, calculating the current weight parameter by using a formula (15):
w_t = w_(t-1) - (1 - β) α_t (15)
wherein w_t is the current weight parameter, w_(t-1) is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, and β is the current momentum parameter.
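A direct transcription of formulas (14) and (15) into Python; note that with these formulas the factor (1 - β) cancels, so the weight update reduces to subtracting the current gradient. The function name is illustrative.

```python
def sgdm_step(w_prev, grad, beta=0.9):
    # Post-transition SGDM-style update as written in formulas (14) and (15);
    # alpha_t here acts as a momentum-scaled step rather than a fixed learning rate.
    alpha_t = grad / (1.0 - beta)           # formula (14)
    w_t = w_prev - (1.0 - beta) * alpha_t   # formula (15), equivalent to w_prev - grad
    return w_t, alpha_t
```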
S103, parameter adjustment is carried out on the image detection model according to the current learning rate and the current weight parameters.
The essence of image detection model training is that, after the mathematical form of the model is designed, suitable parameters (weight parameters) are searched for, so that the difference between the evaluation or classification result of the image detection model on a specified data set and the real situation is minimized.
Based on the steps, after the current weight parameters are obtained, the weight parameters of the image detection model can be adjusted to the current weight parameters, and the image detection model with the adjusted parameters is obtained. The sample image can be input into the image detection model after parameter adjustment, and the image detection model after parameter adjustment is trained to obtain the trained image detection model.
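As a minimal sketch of this adjustment step, assuming PyTorch is used and the computed weight parameters are held in a flat tensor; the helper name is illustrative only.

```python
import torch

@torch.no_grad()
def apply_weights(model, new_flat_weights):
    # Copy a flat tensor of current weight parameters back into the model's parameters.
    offset = 0
    for p in model.parameters():
        n = p.numel()
        p.copy_(new_flat_weights[offset:offset + n].view_as(p))
        offset += n
```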
S104, acquiring a sample image, and inputting the sample image into an image detection model to obtain an image detection result; wherein the sample image corresponds to the annotation information.
The sample image can be obtained from a public image-algorithm data set or a preset image library. For example, as shown in Table 1, 6000 images with an image size of 32×32 were obtained as sample images from the industry-mainstream image dataset CIFAR-10, of which 5000 images were used to train the image detection model and 1000 images were used to test it. The image detection model is a convolutional neural network model; for example, as shown in Table 2, it may be a ResNet-32 model with 470,218 parameters trained for 100 iterations. Fig. 3 is a schematic comparison of training effects when the ResNet-32 model is trained using the ADAM model alone, using the SGDM model alone, and using the ADAM model combined with the SGDM model, and illustrates how the validation accuracy (Validation Accuracy) of the ResNet-32 model varies with the current iteration number t in the three cases. The horizontal axis represents the current iteration number t, with scales of 0, 20, 40, 60, 80 and 100; the vertical axis represents the accuracy (Accuracy), with scales of 0, 0.5, 0.6, 0.7, 0.8 and 0.9. The labeling information of a sample image represents the set of pixels of the labeled area in the sample image, and can be obtained by manual labeling or from a related open-source database. A data-loading sketch for this setup follows Table 2.
TABLE 1
Data set | Image size | Training data set | Test data set
CIFAR-10 | 32*32 | 5000 | 1000
TABLE 2
Network structure | Quantity of parameters | Iteration number (t)
Residual-Network-32 | 470,218 | 100
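The setup of Tables 1 and 2 could be reproduced roughly as follows; this sketch assumes PyTorch and torchvision are available, and the choice of the first 5000/1000 images as the training/test subsets is illustrative.

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# CIFAR-10 images are 32x32; take 5000 training and 1000 test images as in Table 1.
train_full = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_full = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

train_set = Subset(train_full, range(5000))
test_set = Subset(test_full, range(1000))

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```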
S105, determining the loss of the image detection model according to the labeling information of the sample image and the image detection result.
When training the image detection model, the loss function is used as the objective function for evaluating the difference between the image detection result of the image detection model (the model predicted value) and the labeling information of the sample image (the target value), so that the image detection model is driven to predict toward the true value. The smaller the value of the loss function, the closer the model predicted value is to the target value and the higher the accuracy of the model; the larger the value of the loss function, the farther the model predicted value is from the target value and the lower the accuracy of the model.
In one possible embodiment, common loss functions include mean square error MSE, cross entropy loss, KL divergence, etc.
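For example, with PyTorch (used here purely for illustration) the cross entropy loss can serve as the objective function between the image detection result and the labeling information:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()       # cross entropy loss for classification
logits = torch.randn(4, 10)             # model output for a batch of 4 images, 10 classes
labels = torch.tensor([3, 0, 7, 1])     # labeling information given as class indices
loss = criterion(logits, labels)        # scalar loss used as the objective function
```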
And S106, obtaining a trained image detection model when the loss of the image detection model is converged.
In one possible embodiment, a trained image detection model is obtained when the loss of the image detection model converges. When the loss of the image detection model has not converged, the above steps continue to be executed to adjust the parameters of the image detection model and to train the adjusted model until the loss of the image detection model converges.
In the embodiment of the application, a current parameter optimization model and a current learning rate calculation function are determined according to the current iteration times of the image detection model, the current learning rate and the current weight parameters are calculated based on the current parameter optimization model and the current learning rate calculation function, and the calculated information is used for adjusting the image detection model. And subsequently, acquiring a sample image with the labeling information, and inputting the sample image into an image detection model to obtain an image detection result. And determining the loss of the image detection model according to the labeling information of the sample image and the image detection result, and obtaining the trained image detection model when the loss is converged. The current learning rate and the current weight parameters are determined according to the current iteration parameters, the parameter optimization model and the current learning rate calculation function, so that the parameter optimization model can be smoothly switched, the stability of the image detection model in the switching process is ensured, and the accuracy of the trained image detection model is further improved.
The embodiment of the application also provides an image detection model training method, which comprises the following steps (a minimal end-to-end sketch is given after the step list):
step one: starting;
step two: initializing model parameters;
step three: training a model by using an ADAM algorithm, and recording a loss function value of the current iteration times;
Step four: judging whether the switching function is met, if not, returning to the step three, and if so, executing the step five;
step five: entering a transition interval, and calculating the current learning rate and the current momentum parameter each time;
step six: if the transition interval ending condition is not met, returning to the step five, and if the transition interval ending condition is met, executing the step seven;
step seven: ending the transition, and using an SGDM training model;
step eight: and stopping training when the loss function value on the verification set is not reduced any more.
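The following self-contained sketch walks through steps one to eight on a toy quadratic objective instead of an image detection model; the thresholds, the transition length of 10, and the early-stop patience are illustrative, and the ADAM/transition/SGDM updates follow the formulas discussed above (with the unreproduced ones approximated).

```python
import numpy as np

def objective(w, w_star):
    return 0.5 * float(np.sum((w - w_star) ** 2))

def gradient(w, w_star):
    return w - w_star

rng = np.random.default_rng(0)
w_star = rng.normal(size=10)
w = np.zeros(10)

# step two: initialize model parameters and optimizer state
alpha, beta1, beta2, eps, delta, beta_sgdm = 1e-1, 0.9, 0.999, 1e-9, 1.0, 0.9
m, v = np.zeros_like(w), np.zeros_like(w)
t_adam, t_switch = None, None            # set once the switching condition is met
patience, best_loss, stall = 5, float("inf"), 0

for t in range(1, 501):
    g = gradient(w, w_star)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2

    if t_switch is not None and t >= t_switch:
        # step seven: after the transition, use the SGDM-style update (formulas (14)-(15))
        alpha_t = g / (1 - beta_sgdm)
        w = w - (1 - beta_sgdm) * alpha_t
    elif t_adam is not None:
        # steps five/six: transition interval, shrink the learning rate (formula (12))
        alpha = alpha * delta / (delta + np.linalg.norm(m) + eps)
        w = w - alpha * m / (np.sqrt(v) + eps)
    else:
        # step three: plain ADAM training
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)

    # step four: switching condition, here an early-stop style check on the loss
    loss = objective(w, w_star)
    if loss < best_loss - 1e-8:
        best_loss, stall = loss, 0
    else:
        stall += 1
    if t_adam is None and stall >= patience:
        t_adam, t_switch = t, t + 10     # transition interval length of 10, as above
        best_loss, stall = loss, 0       # restart the early-stop counter for the new phase
    # step eight: stop when the loss stops improving after the switch
    if t_switch is not None and t > t_switch and stall >= patience:
        break

print(f"stopped at t={t}, loss={objective(w, w_star):.6f}")
```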
In the embodiment of the application, the switching process from the ADAM optimization model to the SGDM optimization model is integrated into the adaptive learning rate and momentum optimization algorithm, which improves the robustness and generalization capability of the model, accelerates the training process, and saves computing resources. Whether to transition is determined according to the performance of the loss function on the verification set, so that the model retains good generalization capability during the transition period, and the accuracy, robustness and generalization capability of the model are further improved over the original precision. An experimental result is finally given.
The embodiment of the application also provides an image detection model training device, referring to fig. 5, the device comprises:
A first determining module 501, configured to determine a current parameter optimization model and a current learning rate calculation function according to a current iteration number of the image detection model;
the calculating module 502 is configured to calculate a current learning rate and a current weight parameter according to the current parameter optimization model and a current learning rate calculating function;
an adjustment module 503, configured to perform parameter adjustment on the image detection model according to the current learning rate and the current weight parameter;
the obtaining module 504 is configured to obtain a sample image, input the sample image into the image detection model, and obtain an image detection result; wherein the sample image corresponds to the labeling information;
a second determining module 505, configured to determine a loss of the image detection model according to the labeling information of the sample image and the image detection result;
the training module 506 is configured to obtain a trained image detection model when the loss of the image detection model converges.
In a possible embodiment, the first determining module is configured to:
acquiring the iteration number entering a transition interval and ending the transition iteration number; the number of iteration times of ending transition is larger than that of entering a transition interval;
under the condition that the current iteration times are smaller than the iteration times entering the transition interval, determining that the current parameter optimization model is a self-adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a first learning rate calculation function;
Under the condition that the current iteration times are not less than the iteration times entering the transition interval and are less than the transition iteration times ending, determining that the current parameter optimization model is a self-adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a second learning rate calculation function;
under the condition that the current iteration times are not less than the ending transition iteration times, determining that the current parameter optimization model is a random gradient descent optimization model, and the current learning rate calculation function is a third learning rate calculation function;
the convergence speed of the image detection model under the first learning rate calculation function is greater than the convergence speed of the image detection model under the second learning rate calculation function, and the convergence speed of the image detection model under the second learning rate calculation function is greater than the convergence speed of the image detection model under the third learning rate calculation function;
the improvement of the accuracy of the image detection model under the first learning rate calculation function and the improvement of the accuracy of the image detection model under the third learning rate calculation function are smaller than the improvement of the accuracy of the image detection model under the second learning rate calculation function.
In one possible embodiment, the current parameter optimization model is an adaptive moment estimation ADAM optimization model in the case that the current iteration number is smaller than the number of iterations entering the transition interval; a calculation module for:
Calculating the current learning rate according to the learning rates corresponding to the current iteration times and the last iteration times;
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring an exponential decay rate of the first moment estimation and an exponential decay rate of the second moment estimation;
calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
calculating current first-order moment deviation correction according to the current iteration times, the current first-order moment estimation and the exponential decay rate of the first-order moment estimation;
calculating current second moment deviation correction according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation;
and calculating the current weight parameter according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction.
In one possible embodiment, the computing module is configured to:
According to the learning rate corresponding to the current iteration times and the last iteration times, the current learning rate is calculated by using the following formula:
wherein α_t is the current learning rate, α_(t-1) is the learning rate corresponding to the last iteration number, and t is the current iteration number;
correspondingly, according to the weight parameters corresponding to the objective function and the last iteration number, the current gradient is calculated by using the following formula:
wherein g_t is the current gradient, f(w_(t-1)) is the objective function, and w_(t-1) is the weight parameter corresponding to the last iteration number;
correspondingly, according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1 m_(t-1) + (1 - β_1) g_t
wherein m_t is the current first moment estimation, β_1 is the exponential decay rate of the first moment estimation, m_(t-1) is the first moment estimation corresponding to the last iteration number, and g_t is the current gradient;
correspondingly, according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, the current second moment estimation is calculated by using the following formula:
wherein v_t is the current second moment estimation, β_2 is the exponential decay rate of the second moment estimation, v_(t-1) is the second moment estimation corresponding to the last iteration number, and g_t² is the square of the current gradient;
correspondingly, according to the current iteration times, the current first moment estimation and the exponential decay rate of the first moment estimation, calculating the current first moment deviation correction by using the following formula:
wherein m̂_t is the current first moment deviation correction, m_t is the current first moment estimation, β_1^t is the t-th power of the exponential decay rate of the first moment estimation, and t is the current iteration number;
correspondingly, according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation, calculating the current second moment deviation correction by using the following formula:
wherein v̂_t is the current second moment deviation correction, v_t is the current second moment estimation, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, and t is the current iteration number;
correspondingly, according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction, the current weight parameter is calculated by using the following formula:
wherein w_t is the current weight parameter, w_(t-1) is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, β_1^t is the t-th power of the exponential decay rate of the first moment estimation, m̂_t is the current first moment deviation correction, v̂_t is the current second moment deviation correction, ε is a constant, and t is the current iteration number.
In one possible embodiment, the current parameter optimization model is an ADAM optimization model when the current iteration number is not less than the number of iterations entering the transition interval and is less than the number of iterations ending the transition interval;
a calculation module for:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring the radius of a trust interval, the exponential decay rate of first moment estimation and the exponential decay rate of second moment estimation;
calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
calculating a current confidence interval according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration times;
calculating the current learning rate according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration times;
And calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval.
In a possible embodiment, the computing module is configured to:
According to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
wherein g_t is the current gradient, f(w_(t-1)) is the objective function, and w_(t-1) is the weight parameter corresponding to the last iteration number;
correspondingly, according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1 m_(t-1) + (1 - β_1) g_t
wherein m_t is the current first moment estimation, β_1 is the exponential decay rate of the first moment estimation, m_(t-1) is the first moment estimation corresponding to the last iteration number, and g_t is the current gradient;
correspondingly, according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, the current second moment estimation is calculated by using the following formula:
wherein v_t is the current second moment estimation, β_2 is the exponential decay rate of the second moment estimation, v_(t-1) is the second moment estimation corresponding to the last iteration number, and g_t² is the square of the current gradient;
correspondingly, according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration number, the current confidence interval is calculated by using the following formula:
wherein z_t is the confidence interval corresponding to the current iteration number, sign(m_t) is the sign function, min(·,·) is the minimum function, m_t is the current first moment estimation, δ is the confidence interval radius, v_t is the current second moment estimation, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, and t is the current iteration number;
correspondingly, according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration number, the current learning rate is calculated by using the following formula:
α_t = α_(t-1) δ/(δ + ‖m_t‖ + ε)
wherein α_t is the current learning rate, α_(t-1) is the learning rate corresponding to the last iteration number, δ is the confidence interval radius, m_t is the current first moment estimation, and ε is a constant;
correspondingly, according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval, the current weight parameter is calculated by using the following formula:
wherein w_t is the current weight parameter, w_(t-1) is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, m_t is the current first moment estimation, z_t is the current confidence interval, v_t is the current second moment estimation, and ε is a constant.
In one possible embodiment, in the case that the current iteration number is not less than the ending transition iteration number, the current parameter optimization model is a momentum random gradient descent SGDM optimization model;
a calculation module for:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring a current momentum parameter;
calculating a current learning rate according to the current momentum parameter and the current gradient;
and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter.
In one possible embodiment, the computing module is configured to:
according to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
wherein g_t is the current gradient, f(w_(t-1)) is the objective function, and w_(t-1) is the weight parameter corresponding to the last iteration number;
correspondingly, according to the momentum parameter and the current gradient, the current learning rate is calculated by using the following formula:
α_t = g_t/(1 - β)
wherein α_t is the current learning rate, β is the current momentum parameter, and g_t is the current gradient;
correspondingly, according to the weight parameter, the current learning rate and the momentum parameter corresponding to the previous iteration number, the current weight parameter is calculated by using the following formula:
w_t = w_(t-1) - (1 - β) α_t
wherein w_t is the current weight parameter, w_(t-1) is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, and β is the current momentum parameter.
The embodiment of the application also provides an electronic device, as shown in fig. 6, including:
a memory 601 for storing a computer program;
a processor 602, configured to execute a program stored in the memory 601, and implement the following steps:
determining a current parameter optimization model and a current learning rate calculation function according to the current iteration times of the image detection model;
calculating a current learning rate and a current weight parameter according to the current parameter optimization model and a current learning rate calculation function;
carrying out parameter adjustment on the image detection model according to the current learning rate and the current weight parameter;
acquiring a sample image, and inputting the sample image into an image detection model to obtain an image detection result; wherein the sample image corresponds to the labeling information;
determining the loss of an image detection model according to the labeling information of the sample image and the image detection result;
And when the loss of the image detection model is converged, obtaining the trained image detection model.
The electronic device may further comprise a communication bus and/or a communication interface, through which the processor 602, the communication interface, and the memory 601 communicate with each other.
The communication bus mentioned above for the electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. In a possible embodiment, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present application, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor implements the steps of any of the image detection model training methods described above.
In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the image detection model training methods of the above embodiments.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.) means from one website, computer, server, or data center. Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a Solid State Disk (SSD), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, electronic device and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and references to the parts of the description of the method embodiments are only needed.
The foregoing is merely a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (11)

1. A method of training an image detection model, the method comprising:
determining a current parameter optimization model and a current learning rate calculation function according to the current iteration times of the image detection model;
calculating a current learning rate and a current weight parameter according to the current parameter optimization model and a current learning rate calculation function;
carrying out parameter adjustment on the image detection model according to the current learning rate and the current weight parameter;
acquiring a sample image, and inputting the sample image into an image detection model to obtain an image detection result; wherein the sample image corresponds to labeling information;
determining the loss of an image detection model according to the labeling information of the sample image and the image detection result;
and when the loss of the image detection model is converged, obtaining the trained image detection model.
2. The method of claim 1, wherein determining a current parameter optimization model and a current learning rate calculation function based on a current number of iterations of the image detection model comprises:
Acquiring the iteration number entering a transition interval and ending the transition iteration number; the iteration number of ending transition is larger than the iteration number entering a transition interval;
under the condition that the current iteration times are smaller than the iteration times entering the transition interval, determining that the current parameter optimization model is an adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a first learning rate calculation function;
under the condition that the current iteration times are not less than the iteration times of entering the transition interval and are less than the iteration times of ending the transition interval, determining that the current parameter optimization model is a self-adaptive adjustment learning rate optimization model, and the current learning rate calculation function is a second learning rate calculation function;
under the condition that the current iteration times are not less than the ending transition iteration times, determining that the current parameter optimization model is a random gradient descent optimization model, and the current learning rate calculation function is a third learning rate calculation function;
the convergence speed of the image detection model under the first learning rate calculation function is greater than the convergence speed of the image detection model under the second learning rate calculation function, and the convergence speed of the image detection model under the second learning rate calculation function is greater than the convergence speed of the image detection model under the third learning rate calculation function;
The improvement of the accuracy of the image detection model under the first learning rate calculation function and the improvement of the accuracy of the image detection model under the third learning rate calculation function are smaller than the improvement of the accuracy of the image detection model under the second learning rate calculation function.
3. The method according to claim 2, wherein the current parameter optimization model is an adaptive moment estimation ADAM optimization model in case the current number of iterations is smaller than the number of iterations entering the transition interval;
the calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function comprises the following steps:
calculating the current learning rate according to the learning rates corresponding to the current iteration times and the last iteration times;
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring an exponential decay rate of the first moment estimation and an exponential decay rate of the second moment estimation;
calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
Calculating current first-order moment deviation correction according to the current iteration times, the current first-order moment estimation and the exponential decay rate of the first-order moment estimation;
calculating current second moment deviation correction according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation;
and calculating the current weight parameter according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction.
4. A method according to claim 3, wherein calculating the current learning rate from the learning rates corresponding to the current iteration number and the last iteration number comprises:
according to the learning rate corresponding to the current iteration times and the last iteration times, the current learning rate is calculated by using the following formula:
wherein α_t is the current learning rate, α_(t-1) is the learning rate corresponding to the last iteration number, and t is the current iteration number;
correspondingly, the calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number comprises the following steps:
according to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
wherein g_t is the current gradient, f(w_(t-1)) is the objective function, and w_(t-1) is the weight parameter corresponding to the last iteration number;
correspondingly, the calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation comprises the following steps:
according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1 m_(t-1) + (1 - β_1) g_t
wherein m_t is the current first moment estimation, β_1 is the exponential decay rate of the first moment estimation, m_(t-1) is the first moment estimation corresponding to the last iteration number, and g_t is the current gradient;
correspondingly, the calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation comprises the following steps:
according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, the current second moment estimation is calculated by using the following formula:
wherein v_t is the current second moment estimation, β_2 is the exponential decay rate of the second moment estimation, v_(t-1) is the second moment estimation corresponding to the last iteration number, and g_t² is the square of the current gradient;
Correspondingly, the calculating the current first-moment deviation correction according to the current iteration times, the current first-moment estimation and the exponential decay rate of the first-moment estimation comprises the following steps:
according to the current iteration times, the current first moment estimation and the exponential decay rate of the first moment estimation, calculating the current first moment deviation correction by using the following formula:
wherein m̂_t is the current first moment deviation correction, m_t is the current first moment estimation, β_1^t is the t-th power of the exponential decay rate of the first moment estimation, and t is the current iteration number;
correspondingly, the calculating the current second moment deviation correction according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation comprises the following steps:
according to the current iteration times, the current second moment estimation and the exponential decay rate of the second moment estimation, calculating the current second moment deviation correction by using the following formula:
wherein v̂_t is the current second moment deviation correction, v_t is the current second moment estimation, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, and t is the current iteration number;
correspondingly, the calculating the current weight parameter according to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction comprises the following steps:
According to the weight parameter corresponding to the previous iteration number, the current learning rate, the exponential decay rate of the first moment estimation, the exponential decay rate of the second moment estimation, the current first moment deviation correction and the current second moment deviation correction, the current weight parameter is calculated by using the following formula:
wherein w_t is the current weight parameter, w_(t-1) is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, β_1^t is the t-th power of the exponential decay rate of the first moment estimation, m̂_t is the current first moment deviation correction, v̂_t is the current second moment deviation correction, ε is a constant, and t is the current iteration number.
5. The method according to claim 2, wherein the current parameter optimization model is an ADAM optimization model in the case that a current iteration number is not less than the number of iterations of the entering transition interval and less than the number of iterations of the ending transition interval;
the calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function comprises the following steps:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring the radius of a trust interval, the exponential decay rate of first moment estimation and the exponential decay rate of second moment estimation;
Calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration times and the exponential decay rate of the first moment estimation;
calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration times and the exponential decay rate of the second moment estimation;
calculating a current confidence interval according to the current first moment estimation, the radius of the confidence interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration times;
calculating the current learning rate according to the learning rate, the confidence interval radius and the current first moment estimation corresponding to the last iteration times;
and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval.
6. The method of claim 5, wherein calculating the current gradient from the objective function and the weight parameter corresponding to the last iteration number comprises:
according to the weight parameters corresponding to the objective function and the last iteration times, calculating the current gradient by using the following formula:
wherein g_t is the current gradient, f(w_(t-1)) is the objective function, and w_(t-1) is the weight parameter corresponding to the last iteration number;
Correspondingly, the calculating the current first moment estimation according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation comprises the following steps:
according to the current gradient, the first moment estimation corresponding to the last iteration number and the exponential decay rate of the first moment estimation, the current first moment estimation is calculated by using the following formula:
m_t = β_1 m_(t-1) + (1 - β_1) g_t
wherein m_t is the current first moment estimation, β_1 is the exponential decay rate of the first moment estimation, m_(t-1) is the first moment estimation corresponding to the last iteration number, and g_t is the current gradient;
correspondingly, the calculating the current second moment estimation according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation comprises:
according to the current gradient, the second moment estimation corresponding to the last iteration number and the exponential decay rate of the second moment estimation, calculating the current second moment estimation by using the following formula:

v_t = β_2·v_{t-1} + (1 − β_2)·g_t²

wherein v_t is the current second moment estimation, β_2 is the exponential decay rate of the second moment estimation, v_{t-1} is the second moment estimation corresponding to the last iteration number, and g_t² is the square of the current gradient;
correspondingly, the calculating the current trust interval according to the current first moment estimation, the radius of the trust interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration number comprises:
according to the current first moment estimation, the radius of the trust interval, the current second moment estimation, the exponential decay rate of the second moment estimation and the current iteration number, calculating the current trust interval by using the following formula:

wherein z_t is the trust interval corresponding to the current iteration number, sign(m_t) is the sign function, min(·,·) is the minimum function, m_t is the current first moment estimation, δ is the radius of the trust interval, v_t is the current second moment estimation, β_2^t is the t-th power of the exponential decay rate of the second moment estimation, and t is the current iteration number;
correspondingly, the calculating the current learning rate according to the learning rate corresponding to the last iteration number, the radius of the trust interval and the current first moment estimation comprises:
according to the learning rate corresponding to the last iteration number, the radius of the trust interval and the current first moment estimation, calculating the current learning rate by using the following formula:

α_t = α_{t-1}·δ / (δ + ‖m_t‖ + ε)

wherein α_t is the current learning rate, α_{t-1} is the learning rate corresponding to the last iteration number, δ is the radius of the trust interval, m_t is the current first moment estimation, and ε is a constant;
correspondingly, the calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval comprises:
according to the weight parameter corresponding to the last iteration number, the current learning rate, the current first moment estimation, the current second moment estimation and the current trust interval, calculating the current weight parameter by using the following formula:

wherein w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, m_t is the current first moment estimation, z_t is the current trust interval, v_t is the current second moment estimation, and ε is a constant.
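As an illustration of the transition-phase step of claims 5-6, a NumPy sketch follows. The gradient, moment, and learning-rate updates follow the formulas reproduced above; the trust-interval value z_t and the final weight update, which are not reproduced in the claim text, are written here as hypothetical placeholders:

```python
import numpy as np

def trust_interval_adam_step(w_prev, m, v, grad, alpha_prev, t,
                             delta=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One transition-phase step sketched from claims 5-6.

    The trust-interval value z_t and the final weight update are hypothetical
    placeholders; the moment and learning-rate updates follow the claim text.
    """
    m = beta1 * m + (1.0 - beta1) * grad            # current first moment estimation m_t
    v = beta2 * v + (1.0 - beta2) * grad ** 2       # current second moment estimation v_t
    v_hat = v / (1.0 - beta2 ** t)                  # uses the t-th power of beta2, as in the claim
    # Hypothetical trust interval: per-element step clipped to radius delta.
    z = np.sign(m) * np.minimum(delta, np.abs(m) / (np.sqrt(v_hat) + eps))
    # Learning-rate update as given: alpha_t = alpha_{t-1} * delta / (delta + ||m_t|| + eps)
    alpha = alpha_prev * delta / (delta + np.linalg.norm(m) + eps)
    # Hypothetical weight update using the clipped step.
    w = w_prev - alpha * z
    return w, m, v, alpha
```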
7. The method according to claim 2, wherein, in the case that the current iteration number is not less than the iteration number at which the transition interval ends, the current parameter optimization model is a stochastic gradient descent with momentum (SGDM) optimization model;
the calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function comprises the following steps:
calculating the current gradient according to the objective function and the weight parameter corresponding to the last iteration number;
acquiring a current momentum parameter;
calculating a current learning rate according to the current momentum parameter and the current gradient;
and calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter.
8. The method of claim 7, wherein calculating the current gradient from the objective function and the weight parameter corresponding to the last iteration number comprises:
according to the objective function and the weight parameter corresponding to the last iteration number, calculating the current gradient by using the following formula:

g_t = ∇f(w_{t-1})

wherein g_t is the current gradient, f(w_{t-1}) is the objective function evaluated at w_{t-1}, and w_{t-1} is the weight parameter corresponding to the last iteration number;
correspondingly, the calculating the current learning rate according to the current momentum parameter and the current gradient comprises:
according to the current momentum parameter and the current gradient, calculating the current learning rate by using the following formula:

α_t = g_t / (1 − β)

wherein α_t is the current learning rate, β is the current momentum parameter, and g_t is the current gradient;
correspondingly, the calculating the current weight parameter according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter comprises:
according to the weight parameter corresponding to the last iteration number, the current learning rate and the current momentum parameter, calculating the current weight parameter by using the following formula:

w_t = w_{t-1} − (1 − β)·α_t

wherein w_t is the current weight parameter, w_{t-1} is the weight parameter corresponding to the last iteration number, α_t is the current learning rate, and β is the current momentum parameter.
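Read literally, the formulas of claims 7-8 combine into the step below; note that the (1 − β) factors cancel, so the transcribed update reduces to subtracting the raw gradient. This is a transcription of the claim text only, not a conventional momentum-buffer SGDM implementation:

```python
def sgdm_phase_step(w_prev, grad, beta=0.9):
    """Literal transcription of the claimed SGDM-phase formulas.

    alpha_t = g_t / (1 - beta) and w_t = w_{t-1} - (1 - beta) * alpha_t;
    as written, the (1 - beta) factors cancel, so the step equals -grad.
    """
    alpha = grad / (1.0 - beta)           # claimed "current learning rate"
    w = w_prev - (1.0 - beta) * alpha     # claimed weight update
    return w, alpha
```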
9. An image detection model training apparatus, the apparatus comprising:
the first determining module is used for determining a current parameter optimization model and a current learning rate calculation function according to the current iteration times of the image detection model;
The calculation module is used for calculating the current learning rate and the current weight parameters according to the current parameter optimization model and the current learning rate calculation function;
the adjusting module is used for carrying out parameter adjustment on the image detection model according to the current learning rate and the current weight parameter;
the acquisition module is used for acquiring a sample image, inputting the sample image into the image detection model and obtaining an image detection result; wherein the sample image corresponds to labeling information;
the second determining module is used for determining the loss of the image detection model according to the labeling information of the sample image and the image detection result;
and the training module is used for obtaining the trained image detection model when the loss of the image detection model is converged.
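Claim 9 mirrors the method claims as an apparatus: one module selects the current parameter optimization model from the iteration count, another computes the learning rate and weight parameters, and the remaining modules adjust the image detection model and train it until the loss converges. The sketch below ties the three update sketches given after claims 4, 6 and 8 into one such loop; the phase thresholds, helper names, and loss/gradient interface are assumptions for illustration only:

```python
def train(w, data_loader, loss_and_grad,
          enter_transition=1000, end_transition=5000, max_iters=20000):
    """Schematic loop matching the modules of claim 9: pick the update rule
    by iteration count, step the weights, and stop once training is done.

    `loss_and_grad(w, batch)` is an assumed helper returning the loss and its
    gradient; the iteration thresholds and the stop test are illustrative.
    """
    m = v = 0.0
    alpha = 1e-3
    for t, batch in enumerate(data_loader, start=1):
        loss, grad = loss_and_grad(w, batch)     # detection result + loss on the sample batch
        if t < enter_transition:                 # before the transition interval: ADAM-style step
            w, m, v = adam_update(w, m, v, grad, t, alpha)
        elif t < end_transition:                 # inside the transition interval: trust-interval variant
            w, m, v, alpha = trust_interval_adam_step(w, m, v, grad, alpha, t)
        else:                                    # after the transition interval: SGDM-style rule
            w, _ = sgdm_phase_step(w, grad)
        if t >= max_iters:                       # placeholder for the loss-convergence check
            break
    return w
```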
10. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method of any of claims 1-8 when executing a program stored on a memory.
11. A computer-readable storage medium storing machine-executable instructions which, when executed by a processor, cause the processor to implement the method of any one of claims 1-8.
CN202311155994.XA 2023-09-07 2023-09-07 Image detection model training method and device, electronic equipment and storage medium Pending CN117115551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311155994.XA CN117115551A (en) 2023-09-07 2023-09-07 Image detection model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311155994.XA CN117115551A (en) 2023-09-07 2023-09-07 Image detection model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117115551A true CN117115551A (en) 2023-11-24

Family

ID=88803785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311155994.XA Pending CN117115551A (en) 2023-09-07 2023-09-07 Image detection model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117115551A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830708A (en) * 2023-12-20 2024-04-05 北京斯年智驾科技有限公司 Training method, system, device and storage medium of target detection model

Similar Documents

Publication Publication Date Title
WO2021007812A1 (en) Deep neural network hyperparameter optimization method, electronic device and storage medium
CN112101530B (en) Neural network training method, device, equipment and storage medium
McLeod et al. Optimization, fast and slow: optimally switching between local and Bayesian optimization
Saide et al. Nonlinear adaptive filtering using kernel‐based algorithms with dictionary adaptation
Guo et al. A fully-pipelined expectation-maximization engine for Gaussian mixture models
Mlakar et al. Comparing solutions under uncertainty in multiobjective optimization
CN107292323B (en) Method and apparatus for training a hybrid model
US8589852B1 (en) Statistical corner extraction using worst-case distance
CN112990958A (en) Data processing method, data processing device, storage medium and computer equipment
WO2020082595A1 (en) Image classification method, terminal device and non-volatile computer readable storage medium
CN117057443B (en) Prompt learning method of visual language model and electronic equipment
CN109783769B (en) Matrix decomposition method and device based on user project scoring
CN117115551A (en) Image detection model training method and device, electronic equipment and storage medium
CN117151195A (en) Model optimization method, device, equipment and medium based on inversion normalization
Ito et al. Adaptive initial step size selection for Simultaneous Perturbation Stochastic Approximation
CN113505859B (en) Model training method and device, and image recognition method and device
JP6954346B2 (en) Parameter estimator, parameter estimator, and program
JP6233432B2 (en) Method and apparatus for selecting mixed model
CN113112092A (en) Short-term probability density load prediction method, device, equipment and storage medium
CN112183283A (en) Age estimation method, device, equipment and storage medium based on image
CN111931422A (en) Adam-based optimization method, system and terminal
JP6172316B2 (en) Initialization method and initialization system for mixed modeling
CN117152588B (en) Data optimization method, system, device and medium
CN113591781B (en) Image processing method and system based on service robot cloud platform
US12026619B1 (en) Slimmable neural network architecture search optimization

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination