CN110458287B - Parameter updating method, device, terminal and storage medium of neural network optimizer - Google Patents

Parameter updating method, device, terminal and storage medium of neural network optimizer

Info

Publication number
CN110458287B
CN110458287B (application CN201910117536.4A)
Authority
CN
China
Prior art keywords
neural network
optimizer
network model
learning rate
gradient
Prior art date
Legal status
Active
Application number
CN201910117536.4A
Other languages
Chinese (zh)
Other versions
CN110458287A (en)
Inventor
金戈
徐亮
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910117536.4A
Publication of CN110458287A
Application granted
Publication of CN110458287B
Active (current legal status)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The invention is applied to the field of deep learning and particularly discloses a parameter updating method, apparatus, terminal and storage medium for a neural network optimizer. The method comprises the following steps: when the neural network model is optimized using the optimizer, setting the initial value of the learning rate in the optimizer as the reference learning rate, and obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, where the learning rate decreases as the gradient decreases; calculating the ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate; and increasing the attenuation rate according to the ratio k, then re-executing the step of obtaining the gradient during the optimizer's stochastic gradient descent optimization when the neural network model is optimized using the optimizer, so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized. The invention achieves dynamic updating of the parameters of the neural network optimizer and improves the performance of the classification model.

Description

Parameter updating method, device, terminal and storage medium of neural network optimizer
Technical Field
The present invention relates to the field of computers, and in particular, to a method, an apparatus, a terminal, and a computer readable storage medium for updating parameters of a neural network optimizer.
Background
At present, neural network models based on deep learning are optimized with SGD (stochastic gradient descent), which achieves good precision, i.e. a low error rate, to a certain extent. However, because SGD samples the training data randomly, there is a certain probability that the model gets trapped in a locally optimal solution or at a saddle point, so that the resulting model performs poorly.
Disclosure of Invention
The main purpose of the present invention is to provide a parameter updating method, apparatus, terminal and computer-readable storage medium for a neural network optimizer, aiming to solve the problem that a deep learning model performs poorly when it is optimized by an optimizer based on the SGD method.
In order to achieve the above object, the present invention provides a method for updating parameters of a neural network optimizer, the method comprising the steps of:
When an optimizer based on stochastic gradient descent combined with the momentum method is used to optimize a neural network model, setting an initial value of a learning rate in the optimizer as a reference learning rate, and obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up a corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
calculating the ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate;
and increasing the attenuation rate according to the ratio k, and re-executing the step of, when the optimizer is used to optimize the neural network model, obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized, wherein k ∈ (0, 1).
Optionally, the step of increasing the attenuation rate according to the ratio k includes:
increasing the attenuation rate according to a formula relating γ1, γ0 and k, wherein γ1 is the attenuation rate after adjustment, γ0 is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate.
Optionally, after the step of searching for the corresponding learning rate according to the gradient, the method further includes:
monitoring whether the optimization of the neural network model by the optimizer has reached a preset stage;
when the optimization of the neural network model by the optimizer has reached a preset stage, executing the step of calculating the ratio k of the found learning rate to the reference learning rate;
and when the optimization of the neural network model by the optimizer has not reached a preset stage, returning to the step of, when the optimizer is used to optimize the neural network model, obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized.
Optionally, the step of monitoring whether the optimization of the neural network model by the optimizer reaches a preset stage includes:
when the optimizer completes one optimization of the neural network model and a batch of training sample data selected according to a preset criterion is input into the optimized neural network model, monitoring whether the accumulated amount of training sample data input to the neural network model equals a preset stage threshold; and when the accumulated amount of training sample data input to the neural network model equals the preset stage threshold, determining that the optimization of the neural network model by the optimizer has reached a preset stage.
Optionally, the preset criterion is that the batch size increases as the number of times the optimizer optimizes the neural network model increases.
Optionally, the preset stage threshold is N times a preset value, where N is an integer greater than or equal to 1.
Optionally, the step of optimizing the neural network model using an optimizer based on stochastic gradient descent combined with the momentum method includes:
optimizing the neural network model according to an optimizer comprising the formula θ_n = θ_(n-1) - v_t, where v_t = γ·v_(t-1) + ε·∇, θ_n is the currently optimized neural network model parameter, θ_(n-1) is the previously optimized neural network model parameter, v_t is the momentum of the current optimization of the neural network model, γ is the attenuation rate, v_(t-1) is the momentum of the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the number of optimizations.
In order to achieve the above object, the present invention further provides a parameter updating apparatus of a neural network optimizer, the apparatus comprising:
The searching module is used for, when the neural network model is optimized using an optimizer based on stochastic gradient descent combined with the momentum method, setting the initial value of the learning rate in the optimizer as the reference learning rate, and obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
The calculation module is used for calculating the ratio k of the found learning rate to the reference learning rate and updating the found learning rate into the reference learning rate;
and the lifting module is used for increasing the attenuation rate according to the ratio k, and re-executing the step of obtaining the gradient during the optimizer's stochastic gradient descent optimization when the optimizer optimizes the neural network model, so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized, wherein k ∈ (0, 1).
To achieve the above object, the present invention also provides a terminal including: a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the method for updating parameters of a neural network optimizer as described above.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the parameter updating method of the neural network optimizer as described above.
When an optimizer based on stochastic gradient descent combined with the momentum method is used to optimize a neural network model, the initial value of the learning rate in the optimizer is set as the reference learning rate, and the gradient during the optimizer's stochastic gradient descent optimization is obtained so that the corresponding learning rate is looked up according to the gradient, where the learning rate decreases as the gradient decreases; the ratio k of the found learning rate to the reference learning rate is calculated, and the found learning rate is updated to be the reference learning rate; the attenuation rate is increased according to the ratio k, and the step of obtaining the gradient during the optimizer's stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, is re-executed until the neural network model is fully optimized, where k ∈ (0, 1). In this way, dynamic adjustment and updating of the optimizer parameters, including the learning rate and the attenuation rate, is achieved; the neural network model is optimized with an optimizer that combines stochastic gradient descent, the momentum method and dynamic adjustment of the optimizer parameters, and the performance of the model is improved. In addition, since the attenuation rate is generally defined as a constant in the prior art, increasing the attenuation rate according to the change of the learning rate accelerates the convergence of the neural network model in actual operation compared with the prior art.
Drawings
Fig. 1 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of a method for updating parameters of a neural network optimizer of the present invention;
FIG. 3 is a flowchart of another embodiment of a method for updating parameters of a neural network optimizer of the present invention;
Fig. 4 is a schematic diagram of a functional module of a parameter updating device of the neural network optimizer of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of the hardware structure of a terminal according to the present invention. The terminal may be a server or a computer and includes the memory 10 and the processor 20. In the terminal, the processor 20 is connected to the memory 10, and the memory 10 stores a computer program which, when executed by the processor 20, implements the steps of the methods in the embodiments described below.
The memory 10 may be used to store software programs as well as various data. The memory 10 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as optimizing a neural network model using an optimizer); the data storage area may include a database and may store data or information created according to the use of the terminal, etc. In addition, the memory 10 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 20, which is a control center of the terminal, connects various parts of the entire terminal using various interfaces and lines, performs various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory 10, and calling data stored in the memory 10, thereby performing overall monitoring of the terminal. Processor 20 may include one or more processing units; alternatively, the processor 20 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 20.
Although not shown in fig. 1, the terminal may further include a circuit control module for connecting to a power source to ensure normal operation of other components. The terminal may further include a display module, configured to extract the data in the memory 10, and display the data as a front display interface of the terminal and an operation result of the neural network model when the neural network model is applied to classification. The terminal may further include a communication module for connecting with an external communication device through a network. The communication module can receive a request sent by external communication equipment and can also send the request, the instruction and the information to the external communication equipment.
It will be appreciated by those skilled in the art that the terminal structure shown in fig. 1 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
Based on the above hardware structure, various embodiments of the method of the present invention are presented.
Referring to fig. 2, in an embodiment of a method for updating parameters of a neural network optimizer of the present invention, the method includes:
Step S10, when an optimizer based on stochastic gradient descent combined with the momentum method is used to optimize a neural network model, setting an initial value of a learning rate in the optimizer as a reference learning rate, and obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up a corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
The neural network model in this embodiment is a classification model that can be used for text classification, image classification and so on; the procedure is briefly described below for the text classification application. After the training text is preprocessed (e.g. word segmentation), the text tokens are converted into corresponding word vectors through a pre-trained dictionary. The word vectors are input into a feature extraction neural network to obtain output word vectors, which are then input into a preset classifier. The classifier has a plurality of output classes, and when the computation finishes it outputs, for each class, the classification probability that the input word vectors correspond to that class. It can be understood that the classification probabilities of all classes sum to 1, and by default the program may select the class with the highest classification probability as the classification result corresponding to the input word vectors.
It should be noted that the feature extraction neural network into which the word vectors are input performs feature extraction engineering: it preserves the main features of the word vectors so that the output word vectors encapsulate enough information for classification and have strong feature expression capability. The classifier can be run using technical means commonly used in the field; for example, the Softmax function can be used to obtain the probability corresponding to each class. In addition, it should be noted that the classifier and the feature extraction neural network together form a complete neural network model and may in fact be connected as layers of one network; they are described separately here only to clarify their functions. Of course, the classifier and the feature extraction neural network may also be provided separately.
In this process, the parameters and weights in the neural network model need to be optimized so that the classification results output by the text classification model (i.e. the neural network model) fit reality, where the text classification model may be at least one of TextCNN (Text Convolutional Neural Network), TextRNN (Text Recurrent Neural Network) or TextRCNN (Text Recurrent Convolutional Neural Network).
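The following is a minimal sketch, not taken from the patent, of the pipeline just described: token ids are mapped to word vectors through an embedding (standing in for the pre-trained dictionary), passed through a small feature extraction network, and fed to a classifier whose per-class probabilities sum to 1. All layer sizes, the TextCNN-style convolution and the class count are illustrative assumptions.

    # Minimal sketch (assumed sizes and classes; PyTorch used for illustration).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextClassifier(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=128, num_classes=5):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)            # dictionary -> word vectors
            self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)  # feature extraction network
            self.fc = nn.Linear(64, num_classes)                            # classifier: one output per class

        def forward(self, token_ids):                    # token_ids: (batch, seq_len)
            x = self.embedding(token_ids)                # (batch, seq_len, embed_dim)
            x = F.relu(self.conv(x.transpose(1, 2)))     # (batch, 64, seq_len)
            x = x.max(dim=2).values                      # keep the dominant feature per channel
            return F.softmax(self.fc(x), dim=1)          # class probabilities, summing to 1

    # probs.argmax(dim=1) then gives the class with the highest probability,
    # which the description above takes as the classification result.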
In this embodiment, the process of optimizing the neural network model by the optimizer refers to a complete training and optimization of the neural network model by an optimizer built from SGD combined with the momentum method, which comprises multiple optimization passes until training of the neural network model is complete. The optimizer involves all parameters related to stochastic gradient descent and momentum, and these parameters have corresponding initial values at the first optimization; the initial value of the learning rate can be set as the reference learning rate. The learning rate represents how fast the adjusted parameters approach their optimum and determines the performance of the neural network model run by the computer. In order to optimize the neural network model, the processor can control the learning rate so that it gradually decreases as the gradient in the stochastic gradient descent method decreases; the learning rate thus falls gradually during iterative optimization of the model, slowing the adjustment and ensuring accuracy.
For adjustment of the learning rate, an association between the gradient and the learning rate can be established in memory in advance; the corresponding learning rate is then found, for adjustment, by obtaining the gradient during stochastic gradient descent optimization, where the overall trend of the association is that the learning rate decreases as the gradient decreases. Further, the gradient obtained when the optimizer performs stochastic gradient descent is the derivative of the loss function calculated by the neural network model; in text classification, the loss function can be obtained by computing the cross entropy between the output classification probability of each class and the true class of the text, which is not described in detail here.
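As a minimal sketch of the lookup described above: the patent only states that an association between gradient and learning rate is stored in memory in advance and that the learning rate falls as the gradient falls, so the table below (thresholds and learning-rate values) is purely an illustrative assumption.

    # Minimal sketch: an assumed gradient-norm -> learning-rate table, both decreasing.
    def lookup_learning_rate(grad_norm,
                             table=((1e-1, 2e-2), (1e-2, 2e-3), (1e-3, 2e-4))):
        """table: (gradient threshold, learning rate) pairs in decreasing order."""
        for threshold, lr in table:
            if grad_norm >= threshold:
                return lr
        return table[-1][1]   # once the gradient is very small, use the smallest learning rate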
Step S20, calculating the ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate;
In the whole process of optimizing the neural network model by the optimizer, the learning rate can be adjusted once at each optimization pass; the ratio of the adjusted learning rate to the reference learning rate can then be calculated, after which the adjusted learning rate is updated to be the new reference learning rate. The ratio is thus updated in real time as the number of optimization passes of the neural network model by the optimizer grows.
Step S30, increasing the attenuation rate according to the ratio k, and, when the optimizer is used to optimize the neural network model, obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized, wherein k ∈ (0, 1).
In existing optimizers the attenuation rate generally remains constant while the neural network model is optimized; here the attenuation rate is defined as a variable, and the ratio of the learning rates before and after an optimization is determined from the change of the learning rate so as to adjust the attenuation rate. It can be understood that the found learning rate decreases as the gradient decreases, and the learning rate is usually adjusted by an order of magnitude, so the ratio lies between 0 and 1 and the actual attenuation rate grows larger and larger. After the attenuation rate and the learning rate have been adjusted, the neural network model can be optimized again by the optimizer until it is fully optimized.
Optionally, the attenuation rate can be increased before and after the neural network model is optimized according to a formula relating γ1, γ0 and k, where γ1 is the attenuation rate after adjustment, γ0 is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate. For example, if the learning rate is adjusted from 0.02 to 0.0002, then k equals 0.0002/0.02 = 0.01, and the increased attenuation rate can be calculated from k together with the attenuation rate before adjustment.
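A minimal sketch of this step follows. The patent gives the exact update formula only as an image that is not reproduced in this text, so the rule gamma1 = gamma0 ** k used below is an assumed illustrative choice, not the patent's formula: for gamma0 and k in (0, 1) it increases the attenuation rate, and a sharper drop of the learning rate (smaller k) increases it more, which matches the worked example above.

    # Minimal sketch of the ratio-and-update step (the decay-rate formula is assumed).
    def update_reference_and_k(found_lr, reference_lr):
        k = found_lr / reference_lr        # e.g. 0.0002 / 0.02 = 0.01, so k lies in (0, 1)
        return k, found_lr                 # the found learning rate becomes the new reference

    def raise_attenuation_rate(gamma0, k):
        # Assumed illustrative rule, NOT the patent's image formula:
        # for gamma0 in (0, 1) and k in (0, 1), gamma0 ** k > gamma0.
        return gamma0 ** k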
In this embodiment, when an optimizer based on stochastic gradient descent combined with the momentum method is used to optimize a neural network model, the initial value of the learning rate in the optimizer is set as the reference learning rate, and the gradient during the optimizer's stochastic gradient descent optimization is obtained so that the corresponding learning rate is looked up according to the gradient, where the learning rate decreases as the gradient decreases; the ratio k of the found learning rate to the reference learning rate is calculated, and the found learning rate is updated to be the reference learning rate; the attenuation rate is increased according to the ratio k, and the step of obtaining the gradient during the optimizer's stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, is re-executed until the neural network model is fully optimized, where k ∈ (0, 1). In this way, dynamic adjustment and updating of the optimizer parameters, including the learning rate and the attenuation rate, is achieved; the neural network model is optimized using an optimizer that combines stochastic gradient descent, the momentum method and dynamic adjustment of the optimizer parameters, and the performance of the model is improved. In addition, since the attenuation rate is generally defined as a constant in the prior art, increasing the attenuation rate in the optimizer according to the change of the learning rate accelerates convergence of the neural network model during the optimizer's actual operation on the network, so that the optimum is reached as soon as possible.
Further, in other embodiments, the process of optimizing the neural network model by stochastic gradient descent combined with the momentum method may be to optimize the neural network model according to an optimizer comprising the formula θ_n = θ_(n-1) - v_t, where v_t = γ·v_(t-1) + ε·∇, θ_n is the currently optimized neural network model parameter, θ_(n-1) is the previously optimized neural network model parameter, v_t is the momentum of the current optimization of the neural network model, γ is the attenuation rate, v_(t-1) is the momentum of the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the number of optimizations. The neural network model parameters may refer to the weight coefficients of the neural network model and the like. Combining momentum and stochastic gradient descent with the updating and adjustment of the learning rate and attenuation rate parameters in the optimizer reduces oscillation when the parameters in the neural network model are optimized and updated by a computer program, greatly improves computational efficiency, accelerates convergence of the neural network model and speeds up training, with better results.
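A minimal sketch of this momentum update follows, with the dynamically adjusted learning rate ε and attenuation rate γ passed in as arguments; the surrounding toy loss and all variable names are assumptions for illustration, not part of the patent.

    # Minimal sketch of theta_n = theta_(n-1) - v_t with v_t = gamma * v_(t-1) + epsilon * grad.
    import numpy as np

    def momentum_step(theta, v_prev, grad, epsilon, gamma):
        v_t = gamma * v_prev + epsilon * grad   # momentum: damped history of gradients
        return theta - v_t, v_t                 # updated parameters (e.g. weights) and momentum

    # Usage with an assumed toy loss 0.5 * ||theta||^2, whose gradient is theta itself:
    theta, v = np.array([1.0, -2.0]), np.zeros(2)
    for _ in range(5):
        theta, v = momentum_step(theta, v, grad=theta, epsilon=0.02, gamma=0.9)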
Further, referring to fig. 3, in another embodiment, after the step S10, the method further includes:
step S40, monitoring whether the optimization of the neural network model by the optimizer reaches a preset stage; if yes, go to step S20; if not, executing step S50;
Step S50, returning to the step of, when the optimizer is used to optimize the neural network model, obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized.
In this embodiment, the number of adjustments of the attenuation rate is limited on the basis of the foregoing embodiment, mainly because an adjustment of the learning rate is usually a change of an order of magnitude; if the attenuation rate were adjusted every time the learning rate is adjusted, the actual change of the attenuation rate could be small and its influence on the convergence rate of the neural network model limited. After the learning rate has been adjusted by each lookup operation, whether the optimization has reached a preset stage determines whether the attenuation rate is adjusted. This reduces the number of adjustments of the attenuation rate, ensures that each update of the attenuation rate has a substantial influence on the optimization of the neural network model, and indirectly shortens the time the terminal needs to update the optimizer parameters.
Optionally, in this embodiment, whether the optimization of the neural network model by the optimizer has reached a preset stage may be determined from the amount of batch-size data that has been fed in so far. When the optimizer completes one optimization of the neural network model and a batch of training sample data selected according to a preset criterion is input into the optimized neural network model, whether the accumulated amount of training sample data input to the neural network model equals a preset stage threshold is monitored; when the accumulated amount equals the preset stage threshold, the optimization of the neural network model by the optimizer is determined to have reached a preset stage. Otherwise, when the accumulated amount of training sample data input to the neural network model does not equal the preset stage threshold, it is determined that the optimization has not reached a preset stage. Further, the preset stage threshold may be set according to actual needs: for example, thresholds corresponding to different stages are set as the accumulated amount of data input to the neural network model grows, and the differences between adjacent thresholds may be equal or unequal. Taking equal differences between adjacent stage thresholds as an example, all the preset stage thresholds are N times a certain preset value, where N is an integer greater than or equal to 1.
In addition, whether the preset stage has been reached can also be determined from the number of times the optimizer has optimized the neural network model; for example, the attenuation rate is adjusted once for every Q adjustments of the learning rate, where Q may, for example, equal 10.
Throughout the process of optimizing the neural network model by the optimizer, a batch of data of a given batch size (batch_size) is input into the optimized neural network model after each optimization pass; in text classification, the batch-size data is a batch of input word vectors. It should be noted that, during optimization and training of the neural network model, the training text needs to be input into the network; one pass of the complete training sample data set through the neural network and back is called an EPOCH, but when the data set is very large an EPOCH has to be divided into several batch-size inputs. The size of each batch determines the descent direction: within a reasonable range, the larger the batch size, the more accurately it determines the direction of gradient descent and the less the training oscillates. In this embodiment, the batch size may be increased as the number of optimization passes increases, or may be adjusted dynamically according to the output results.
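A minimal sketch of the staged monitoring and the growing batch size described above; the concrete preset value, the comparison against N times that value, and the batch-size growth rule are illustrative assumptions rather than values from the patent.

    # Minimal sketch of stage monitoring and a growing batch size (all constants assumed).
    class StageMonitor:
        def __init__(self, stage_unit=10000):
            self.stage_unit = stage_unit       # preset value; a stage threshold is N * stage_unit
            self.accumulated = 0
            self.next_stage = stage_unit       # N starts at 1

        def feed(self, batch_size):
            """Call after each optimization pass; True when a preset stage has been reached."""
            self.accumulated += batch_size
            if self.accumulated >= self.next_stage:
                self.next_stage += self.stage_unit   # move on to stage N + 1
                return True                          # adjust the attenuation rate now
            return False                             # otherwise keep adjusting only the learning rate

    def next_batch_size(optimization_count, base=32, step=8):
        # the batch size grows with the number of times the optimizer has run
        return base + step * optimization_count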
Referring to fig. 4, the present invention further proposes a parameter updating apparatus of a neural network optimizer, which may be a computer or a server, including:
The searching module 10 is configured to, when the neural network model is optimized using an optimizer based on stochastic gradient descent combined with the momentum method, set the initial value of the learning rate in the optimizer as the reference learning rate, and obtain the gradient during the optimizer's stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
The calculating module 20 is configured to calculate a ratio k of the found learning rate to the reference learning rate, and update the found learning rate to the reference learning rate;
and the lifting module 30 is configured to increase the attenuation rate according to the ratio k, and re-execute the step of obtaining the gradient during the optimizer's stochastic gradient descent optimization when the optimizer optimizes the neural network model, so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized, where k ∈ (0, 1).
Further, in another embodiment, the lifting module is further configured to increase the attenuation rate according to a formula relating γ1, γ0 and k, where γ1 is the attenuation rate after adjustment, γ0 is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate.
Further, in yet another embodiment, the apparatus further comprises a monitoring module; wherein,
The monitoring module is used for monitoring whether the optimization of the neural network model by the optimizer has reached a preset stage; triggering the calculation module to execute the step of calculating the ratio k of the found learning rate to the reference learning rate when the optimization of the neural network model by the optimizer has reached a preset stage; and, when the optimization of the neural network model by the optimizer has not reached a preset stage, returning to the step of, when the optimizer is used to optimize the neural network model, obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized.
Further, in yet another embodiment, the monitoring module is further configured to, when the optimizer completes one optimization of the neural network model and a batch of training sample data selected according to a preset criterion is input into the optimized neural network model, monitor whether the accumulated amount of training sample data input to the neural network model equals a preset stage threshold; and when the accumulated amount of training sample data input to the neural network model equals the preset stage threshold, determine that the optimization of the neural network model by the optimizer has reached a preset stage.
Further, in yet another embodiment, the preset criterion is that the batch size increases as the number of times the optimizer optimizes the neural network model increases.
Further, in yet another embodiment, the preset stage threshold is N times a preset value, where N is an integer greater than or equal to 1.
Further, in yet another embodiment, the apparatus further comprises:
An optimization module, configured to optimize the neural network model according to an optimizer comprising the formula θ_n = θ_(n-1) - v_t, where v_t = γ·v_(t-1) + ε·∇, θ_n is the currently optimized neural network model parameter, θ_(n-1) is the previously optimized neural network model parameter, v_t is the momentum of the current optimization of the neural network model, γ is the attenuation rate, v_(t-1) is the momentum of the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the number of optimizations.
The present invention also proposes a computer-readable storage medium on which a computer program is stored. The computer readable storage medium may be the Memory 10 in the terminal of fig. 1, or may be at least one of ROM (Read-Only Memory)/RAM (Random Access Memory ), magnetic disk, or optical disk, and the computer readable storage medium includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a terminal, or a network device) having a processor to perform the methods according to the embodiments of the present invention.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or server that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or server. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or server that comprises that element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware, although in many cases the former is the preferred implementation.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (8)

1. A method for updating parameters of a neural network optimizer, the method comprising the steps of:
When an optimizer based on stochastic gradient descent combined with the momentum method is used to optimize a neural network model, setting an initial value of a learning rate in the optimizer as a reference learning rate, and obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up a corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases, and the neural network model is a text classification model;
calculating the ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate;
According to the ratio of Increasing the attenuation rate, and re-executing the steps to obtain the gradient when the optimizer performs random gradient descent optimization when the optimizer is used for optimizing the neural network model, so as to find the corresponding learning rate according to the gradient until the neural network model is optimized, wherein/>∈(0,1);
performing a word segmentation preprocessing operation on the training text to obtain text tokens; converting the text tokens into corresponding word vectors through a pre-trained dictionary; inputting the word vectors into a feature extraction neural network to obtain a plurality of output word vectors; inputting the plurality of output word vectors into a preset text classification model to obtain classification probabilities corresponding to the plurality of output word vectors; and taking the class with the highest classification probability as the classification result corresponding to the word vectors;
wherein the step of increasing the attenuation rate according to the ratio k comprises:
increasing the attenuation rate according to a formula relating γ1, γ0 and k, wherein γ1 is the attenuation rate after adjustment, γ0 is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate;
wherein the step of optimizing the neural network model using an optimizer based on stochastic gradient descent combined with the momentum method comprises:
optimizing the neural network model according to an optimizer comprising the formula θ_n = θ_(n-1) - v_t, where v_t = γ·v_(t-1) + ε·∇, θ_n is the currently optimized neural network model parameter, θ_(n-1) is the previously optimized neural network model parameter, v_t is the momentum of the current optimization of the neural network model, γ is the attenuation rate, v_(t-1) is the momentum of the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the number of optimizations.
2. The method for updating parameters of a neural network optimizer according to claim 1 wherein after the step of searching for the corresponding learning rate according to the gradient, further comprising:
Monitoring whether the optimization of the optimizer on the neural network model reaches a preset stage or not;
when the optimization of the neural network model by the optimizer has reached a preset stage, executing the step of calculating the ratio k of the found learning rate to the reference learning rate;
and when the optimization of the neural network model by the optimizer has not reached a preset stage, returning to the step of, when the optimizer is used to optimize the neural network model, obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized.
3. The method for updating parameters of a neural network optimizer according to claim 2 wherein the step of monitoring whether the optimization of the neural network model by the optimizer reaches a preset stage comprises:
when the optimizer completes one optimization of the neural network model and a batch of training sample data selected according to a preset criterion is input into the optimized neural network model, monitoring whether the accumulated amount of training sample data input to the neural network model equals a preset stage threshold; and when the accumulated amount of training sample data input to the neural network model equals the preset stage threshold, determining that the optimization of the neural network model by the optimizer has reached a preset stage.
4. The method for updating parameters of a neural network optimizer as claimed in claim 3, wherein the preset criterion is that the batch size increases as the number of times the optimizer optimizes the neural network model increases.
5. The method for updating parameters of a neural network optimizer as claimed in claim 3, wherein the preset stage threshold is N times a preset value, where N is an integer greater than or equal to 1.
6. A parameter updating apparatus of a neural network optimizer, the apparatus comprising:
The searching module is used for, when the neural network model is optimized using an optimizer based on stochastic gradient descent combined with the momentum method, setting an initial value of a learning rate in the optimizer as a reference learning rate, and obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up a corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases, and the neural network model is a text classification model;
a calculation module for calculating the ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate;
a lifting module for increasing the attenuation rate according to the ratio k, and re-executing the step of, when the optimizer is used to optimize the neural network model, obtaining the gradient during the optimizer's stochastic gradient descent optimization so as to look up the corresponding learning rate according to the gradient, until the neural network model is fully optimized, wherein k ∈ (0, 1);
performing a word segmentation preprocessing operation on the training text to obtain text tokens; converting the text tokens into corresponding word vectors through a pre-trained dictionary; inputting the word vectors into a feature extraction neural network to obtain a plurality of output word vectors; inputting the plurality of output word vectors into a preset text classification model to obtain classification probabilities corresponding to the plurality of output word vectors; and taking the class with the highest classification probability as the classification result corresponding to the word vectors;
wherein the lifting module is further configured to increase the attenuation rate according to a formula relating γ1, γ0 and k, wherein γ1 is the attenuation rate after adjustment, γ0 is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate;
wherein the apparatus further comprises an optimization module for optimizing the neural network model according to an optimizer comprising the formula θ_n = θ_(n-1) - v_t, where v_t = γ·v_(t-1) + ε·∇, θ_n is the currently optimized neural network model parameter, θ_(n-1) is the previously optimized neural network model parameter, v_t is the momentum of the current optimization of the neural network model, γ is the attenuation rate, v_(t-1) is the momentum of the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the number of optimizations.
7. A terminal, the terminal comprising: memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, performs the steps of the method for updating parameters of a neural network optimizer as claimed in any one of claims 1 to 5.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method for updating parameters of a neural network optimizer according to any one of claims 1 to 5.
CN201910117536.4A 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer Active CN110458287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910117536.4A CN110458287B (en) 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910117536.4A CN110458287B (en) 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer

Publications (2)

Publication Number Publication Date
CN110458287A CN110458287A (en) 2019-11-15
CN110458287B true CN110458287B (en) 2024-05-07

Family

ID=68480590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910117536.4A Active CN110458287B (en) 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer

Country Status (1)

Country Link
CN (1) CN110458287B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143148B (en) * 2019-12-30 2023-09-12 北京奇艺世纪科技有限公司 Model parameter determining method, device and storage medium
CN112616230A (en) * 2020-12-21 2021-04-06 江苏恒通照明集团有限公司 Remote operation and maintenance control system for intelligent street lamp
CN114266324B (en) * 2021-12-30 2023-04-07 智慧眼科技股份有限公司 Model visualization modeling method and device, computer equipment and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127248A (en) * 2016-06-24 2016-11-16 平安科技(深圳)有限公司 License plate classification method and system based on deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010010861A1 (en) * 2008-07-23 2010-01-28 Matsumura Takatoshi Management device for digital data on recollections and program
CN108073075A (en) * 2017-12-21 2018-05-25 苏州大学 Silicon micro-accelerometer temperature compensation method and system based on GA-optimized BP neural network
CN109165724A (en) * 2018-08-06 2019-01-08 哈工大大数据(哈尔滨)智能科技有限公司 Neural-network-based method and device for predicting the number of gradient descent iterations
CN109117953A (en) * 2018-09-11 2019-01-01 北京迈格威科技有限公司 Network parameter training method and system, server, client and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of an improved BP neural network to roadbed settlement prediction; 郭亚宇; 孙立功; 苏兆仁; Site Investigation Science and Technology; 2010-10-31 (No. 05); pp. 28-31 *

Also Published As

Publication number Publication date
CN110458287A (en) 2019-11-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant