CN111860834B

CN111860834B - Neural network tuning method, system, terminal and storage medium

Info

Publication number: CN111860834B
Application number: CN202010657269.2A
Authority: CN
Inventors: 赵宝新; 须成忠; 赵娟娟
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2024-05-24
Anticipated expiration: 2040-07-09
Also published as: WO2022007349A1; CN111860834A

Abstract

The application relates to a neural network tuning method, a neural network tuning system, a terminal and a storage medium. The method comprises the following steps: adding intra-class pitch regularization loss into a loss function of the neural network to obtain a loss function based on intra-class pitch regularization; the intra-class pitch regularization loss represents intra-class pitch proximity between feature graphs of the same class of data output by the neural network; and in the neural network training process, inserting the loss function based on intra-class pitch regularization in the iteration times of a set proportion, and performing iterative training on the neural network by adopting a mode of overlapping training of a network optimization algorithm and the intra-class pitch regularization algorithm to obtain an optimal neural network. According to the embodiment of the application, the intra-class interval regularization algorithm is inserted into a certain proportion of iteration times to carry out iterative training on the neural network, so that the generalization capability and the anti-interference performance of the network structure are improved under the condition that the neural network structure is not changed, and the additional time expenditure caused by the generalization capability and the anti-interference performance is small.

Description

Neural network tuning method, system, terminal and storage medium

Technical Field

The application belongs to the technical field of artificial intelligence, and particularly relates to a neural network optimization method, a system, a terminal and a storage medium.

Background

With the development of artificial intelligence, deep neural networks have achieved dramatic results in a number of fields with their strong fitting capabilities. However, the strong fitting capability of the neural network still does not have strong theoretical support, and the capability of the deep neural network still has strong development space. In order to obtain better generalization capability of the neural network, the existing network structure becomes more and more complex, the parameter number of the network also presents an explosive growth trend, the training process of the complex neural network becomes very slow, and the energy consumption caused by training the network also increases exponentially. In addition, the neural network has the problems of poor anti-interference performance, sensitivity to input data and the like. In order to improve the above problems, there is a need to improve the ability of neural networks to generalize.

At present, the main methods for improving the generalization capability of the neural network are divided into three categories, specifically:

1. From the network structure, a better network structure is designed. The method is mainly used for correcting the network structure from three angles of depth, width and breadth of the deep neural network, and the complex network structure can bring better performance benefit, but the method also can cause huge parameter quantity of the network, longer training time and larger energy consumption.

2. More training data is provided so that the neural network can recognize more data types. This method requires more data of good quality and requires very much manpower and computational effort to process the data.

3. From the training process, a better super-parameter or optimization method is found. The method can improve the network performance under the condition of not changing the network structure. However, the super-parametric search also requires a network training process that is iterated a very large number of times, which is very computationally and energy intensive.

Disclosure of Invention

The application provides a neural network tuning method, a neural network tuning system, a neural network tuning terminal and a neural network storage medium, which aim to solve one of the technical problems in the prior art at least to a certain extent.

In order to solve the problems, the application provides the following technical scheme:

a neural network tuning method, comprising the steps of:

Adding intra-class pitch regularization loss into a loss function of the neural network to obtain a loss function based on intra-class pitch regularization; the intra-class pitch regularization loss represents intra-class pitch proximity between feature graphs of the same class of data output by the neural network; given a neural network structure z (x _i, ω), ω is a parameter of the neural network, x _i is input data, the intra-class pitch regularization based loss function is:

Where y _i is the label corresponding to input data x _i, L (z (x _i,ω),y_i) is the empirical loss function, lambda is a hyper-parameter, For intra-class pitch regularization loss,/>Is a feature map set of all classes;

And in the neural network training process, inserting the loss function based on intra-class pitch regularization in the iteration times of a set proportion, and performing iterative training on the neural network by adopting a mode of overlapping training of a network optimization algorithm and the intra-class pitch regularization algorithm to obtain an optimal neural network.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the loss function based on intra-class distance regularization is used for extracting class centers of feature graphs of data of each class, and the class centers are calculated in the following way:

expanding the loss function based on intra-class pitch regularization to obtain:

In the above-mentioned formula(s), Representing a characteristic diagram obtained by the input data x _i through a neural network, the label of the data x _i is c and is marked as l (x _i)∈c,fm_c is the category center of the characteristic diagram of all the data with the label of c)/>The distance between the feature map representing data x _i and the center of the category to which it belongs;

the calculation formula of the feature map of the class center is as follows:

In the above expression, |x _c | represents the number of samples belonging to the category c among all samples X.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the iterative training of the neural network by the network optimization algorithm in a mode of overlapping training of the network optimization algorithm and the intra-class interval regularization algorithm comprises the following steps:

The method comprises the steps that one complete training of a neural network is recorded as one epoch, T epochs are shared in the whole training process, firstly, iterative training is conducted through a network optimization algorithm, when the iteration number of the network optimization algorithm reaches a set number threshold, the iteration number which is 10% of the set number threshold is inserted after the epoch which is trained through the network optimization algorithm for the last time, iterative training is conducted through an intra-class interval regularization algorithm, and iterative training is conducted through the network optimization algorithm again after the epoch which is trained through the intra-class interval regularization algorithm for the last time; this is cycled until T epochs are completed.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the set frequency threshold value of the network optimization algorithm iteration frequency is 20 times.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the loss function of the intra-class pitch regularization algorithm is:

The technical scheme adopted by the embodiment of the application further comprises the following steps: the network optimization algorithm is a random gradient descent method.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the loss function of the random gradient descent method is as follows:

the embodiment of the application adopts another technical scheme that: a neural network tuning system, comprising:

Loss function optimizing module: the method comprises the steps of adding intra-class pitch regularization loss into a loss function of a neural network to obtain a loss function based on intra-class pitch regularization; the intra-class pitch regularization loss represents intra-class pitch proximity between feature graphs of the same class of data output by the neural network; given a neural network structure z (x _i, ω), ω is a parameter of the neural network, x _i is input data, the intra-class pitch regularization based loss function is:

Model training module: and in the neural network training process, inserting the loss function based on intra-class pitch regularization in the iteration times of a set proportion, and performing iterative training on the neural network by adopting a mode of overlapping training of a network optimization algorithm and the intra-class pitch regularization algorithm to obtain an optimal neural network.

The embodiment of the application adopts the following technical scheme: a terminal comprising a processor, a memory coupled to the processor, wherein,

The memory stores program instructions for implementing the neural network tuning method;

the processor is configured to execute the program instructions stored by the memory to control neural network tuning.

The embodiment of the application adopts the following technical scheme: a storage medium storing program instructions executable by a processor for performing the neural network tuning method.

Compared with the prior art, the embodiment of the application has the beneficial effects that: according to the neural network optimization method, system, terminal and storage medium, the intra-class interval regularization loss is added into the loss function of the neural network, in the network training process, the intra-class interval regularization algorithm is inserted into a certain proportion of iteration times to carry out iterative training on the neural network, the intra-class interval regularization algorithm only needs to extract the class center of the feature map of the sample data of each class, and other iteration processes are consistent with the training method of the original network, so that the generalization capability and anti-interference performance of the network structure are improved under the condition that the neural network structure is not changed, and the additional time expenditure brought by the intra-class interval regularization algorithm is small.

Drawings

FIG. 1 is a flow chart of a neural network tuning method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a neural network architecture;

FIG. 3 is a neural network training schematic diagram according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a neural network tuning system according to an embodiment of the present application;

fig. 5 is a schematic diagram of a terminal structure according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

In order to solve the defects in the prior art, the neural network tuning method of the embodiment of the application adds intra-class interval regularization loss into the loss function of the neural network, inserts an intra-class interval regularization algorithm into a certain proportion of iteration times in the process of network training to carry out iterative training on the neural network, and the rest iteration processes are consistent with the training method of the original network, so that the generalization capability of the network structure is improved under the condition of not changing the neural network structure, and the additional time expenditure brought by the method is small.

Referring to fig. 1, a flowchart of a neural network tuning method according to an embodiment of the application is shown. The neural network tuning method of the embodiment of the application comprises the following steps:

step 100: adding intra-class pitch regularization loss into a loss function of the neural network to obtain a loss function based on intra-class pitch regularization;

In step 100, the neural network generally includes a convolutional layer and a fully-connected layer, and specifically, as shown in fig. 2, a schematic structural diagram of the neural network is shown. Wherein the trapezoid is a convolution layer, and the triangle is a full connection layer. Taking a picture classification task as an example, after a picture is input, firstly, feature extraction is performed through a plurality of convolution layers, a Feature Map (Feature Map) is output, and then an output picture classification result is obtained through a plurality of full-connection layers.

Let the input data be denoted as x= { X ₁,x₂,…,x_n }, where n is the number of samples; for each input x _i, the signature output is denoted FM _i, the labels of the input data are denoted y= {1,2, …, C }, and the classification categories of the data together comprise class C. In order to enable the neural network to accurately identify and classify input data, the output feature graphs of the neural network are assumed to have a very good clustering effect, namely the intra-class spacing between the feature graphs of the data in the same class is as small as possible, and on the basis of the assumption, the intra-class spacing regularization loss is proposed by the embodiment of the application.

Given a neural network structure z (x _i, ω), ω is a parameter of the neural network, x _i is input data, whose loss functions include empirical and regularized loss functions, the loss functions can be generalized as:

in equation (1), y _i is a label corresponding to the input data x _i, L (z (x _i,ω),y_i) is an empirical loss function, Ω (ω) is a regularized loss function, and λ is a super parameter for balancing the empirical loss function and the regularized loss function.

Based on the above equation, adding intra-class pitch regularization loss into the loss function to obtain a loss function based on intra-class pitch regularization:

In the formula (2), the amino acid sequence of the compound, Representing intra-class pitch regularization loss,/>Is a feature map set of all classes. Unfolding it into:

In the formula (3), the amino acid sequence of the compound, Representing the feature map obtained by the input data x _i through the neural network, the label of the data x _i is c, and is marked as l (x _i)∈c,fm_c is the clustering center of the feature map of all samples with the label of c, namely the category center,Representing the distance between the feature map of sample x _i and the center of the belonging class.

The feature map of the category center is obtained through self-learning:

In the formula (4), X _c represents the number of samples belonging to the category c among all the samples X.

Step 200: in the training process of the neural network, a loss function (ICR) based on intra-class interval regularization is inserted in iteration times of a set proportion, and the neural network is subjected to iterative training in a mode of overlapping training of a traditional network optimization algorithm and the intra-class interval regularization algorithm to obtain an optimal neural network;

In step 200, during the training of the neural network, a complete training of the model using all data samples of the training set is denoted as one epoch (epoch), the entire training process has a total of T epochs, and the index of all epochs is denoted as the sequence [0,1,2, …, T ]. Traditional network optimization algorithms include, but are not limited to, random gradient descent (Stochastic GRADIENT DESCENT, SGD), small batch gradient descent (Mini-batch GRADIENT DESCENT, MBGD), and the like. Taking SGD as an example, the loss function of the random gradient descent method is:

The loss function of the intra-class pitch regularization algorithm is:

In the embodiment of the application, the overlapping mode of the random gradient descent method and the intra-class pitch regularization method is as follows: in the neural network training process, firstly, adopting SGD to carry out iterative training, and inserting 10% of iteration times of the set time threshold after the epoch of the last SGD training to carry out ICR iterative training when the iteration times of the SGD reach the set time threshold, and then adopting SGD to carry out iterative training again after the epoch of the last ICR training. This is cycled until T epochs are completed. Specifically, fig. 3 is a schematic diagram of neural network training according to an embodiment of the present application. Wherein gray represents the training method of SGD, black represents the training method of ICR, and SGD and ICR overlap training. Preferably, through experimental comparison, when the number of times of SGD iterative training is 20, the performance improvement effect of the neural network is the best. Namely: two ICR exercises are inserted after each 20 SGD trained epoch interval, it being understood that the number of times threshold and the ratio of ICR method inserts may be adjusted according to actual practice.

Based on the above, in the embodiment of the application, by inserting the intra-class interval regularization method with a smaller proportion in the training process, the intra-class interval regularization algorithm only needs to extract the class center of the feature map of the sample data of each class, so that the generalization capability and the anti-interference performance of the neural network are improved under the condition that the structure of the neural network is not changed, and the additional burden brought to the training time of the neural network can be ignored.

To verify the feasibility and effectiveness of embodiments of the present application, experiments were performed using a common dataset CIFAR for picture recognition. Experimental results show that the embodiment of the application has very good performance improvement and anti-interference capability on white noise in VGG (Visual Geometry Group, deep convolutional neural Network), resNet (Residual Network) and the like, and the additional time cost caused by the training process is less than 5% of the original training time.

Please refer to fig. 4, which is a schematic diagram illustrating a neural network tuning system according to an embodiment of the present application. The neural network tuning system of the embodiment of the application comprises:

Loss function optimizing module: the method comprises the steps of adding intra-class pitch regularization loss into a loss function of a neural network to obtain a loss function based on intra-class pitch regularization; given a neural network structure z (x _i, ω), ω being a parameter of the neural network, x _i being input data, the loss function includes an empirical loss function and a regularized loss function, the loss function can be generalized to:

The feature map of the category center is obtained through self-learning:

Model training module: in the training process of the neural network, a loss function (ICR) based on intra-class pitch regularization is inserted in the iteration times of a set proportion, and the neural network is subjected to iterative training in a mode of overlapping training of a traditional network optimization algorithm and the intra-class pitch regularization algorithm to obtain an optimal neural network;

Specifically, in the neural network training process, a complete training of the model using all data samples of the training set is denoted as one epoch (period), the entire training process has a total of T epochs, and the indexes of all epochs are denoted as sequences [0,1,2, …, T ]. Traditional network optimization algorithms include, but are not limited to, random gradient descent (Stochastic GRADIENT DESCENT, SGD), small batch gradient descent (Mini-batch GRADIENT DESCENT, MBGD), and the like. Taking SGD as an example, the loss function of the random gradient descent method is:

The loss function of the intra-class pitch regularization algorithm is:

In the embodiment of the application, the overlapping mode of the random gradient descent method and the intra-class pitch regularization method is as follows: in the neural network training process, firstly, adopting SGD to carry out iterative training, and inserting 10% of iteration times of the set time threshold after the epoch of the last SGD training to carry out ICR iterative training when the iteration times of the SGD reach the set time threshold, and then adopting SGD to carry out iterative training again after the epoch of the last ICR training. This is cycled until T epochs are completed. Preferably, through experimental comparison, when the number of times of SGD iterative training is 20, the performance improvement effect of the neural network is the best. Namely: two ICR exercises are inserted after each 20 SGD trained epoch interval, it being understood that the number of times threshold and the ratio of ICR method inserts may be adjusted according to actual practice.

Fig. 5 is a schematic diagram of a terminal structure according to an embodiment of the application. The terminal 50 includes a processor 51, a memory 52 coupled to the processor 51.

The memory 52 stores program instructions for implementing the neural network tuning method described above.

The processor 51 is configured to execute program instructions stored in the memory 52 to control the tuning of the neural network.

The processor 51 may also be referred to as a CPU (Central Processing Unit ). The processor 51 may be an integrated circuit chip with signal processing capabilities. Processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the application. The storage medium of the embodiment of the present application stores a program file 61 capable of implementing all the methods described above, where the program file 61 may be stored in the storage medium in the form of a software product, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes, or a terminal device such as a computer, a server, a mobile phone, a tablet, or the like.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. The neural network optimizing method is applied to picture classification, after a picture is input, firstly, feature extraction is carried out through a plurality of convolution layers, a feature image is output, and then an output picture classification result is obtained through a plurality of full-connection layers, and is characterized by comprising the following steps:

In the neural network training process, inserting the loss function based on intra-class pitch regularization in iteration times of a set proportion, and performing iterative training on the neural network by adopting a mode of overlapping training of a network optimization algorithm and the intra-class pitch regularization algorithm to obtain an optimal neural network;

the class center calculation method is that:

the calculation formula of the feature map of the class center is as follows:

2. The neural network tuning method of claim 1, wherein the iterative training of the neural network by the network optimization algorithm in a manner that the network optimization algorithm overlaps with an intra-class pitch regularization algorithm comprises:

3. The neural network tuning method of claim 2, wherein the set number of iterations of the network optimization algorithm is 20.

4. A neural network tuning method according to any one of claims 1 to 3, wherein the intra-class pitch regularization algorithm has a loss function of:

5. A neural network tuning method according to any one of claims 1 to 3, wherein the network optimization algorithm is a random gradient descent method.

6. The neural network tuning method of claim 5, wherein the random gradient descent method has a loss function of:

7. The utility model provides a neural network accent system, is applied to the picture classification, when an input picture, carries out the feature extraction through a plurality of layers convolution layer at first to output the feature map, then obtains the picture classification result of output through a plurality of full-connection layer, its characterized in that includes:

Model training module: the method comprises the steps of inserting the loss function based on intra-class pitch regularization in iteration times of a set proportion in the neural network training process, and carrying out iterative training on the neural network by adopting a mode of overlapping training of a network optimization algorithm and the intra-class pitch regularization algorithm to obtain an optimal neural network;

the class center calculation method is that:

the calculation formula of the feature map of the class center is as follows:

8. A terminal comprising a processor, a memory coupled to the processor, wherein,

The memory stores program instructions for implementing the neural network tuning method of any one of claims 1-6;

9. A storage medium storing program instructions executable by a processor for performing the neural network tuning method of any one of claims 1 to 6.