CN114692815A - Method for optimizing low-bit model training - Google Patents

Method for optimizing low-bit model training Download PDF

Info

Publication number
CN114692815A
Authority
CN
China
Prior art keywords
training
model
data
network
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011617715.3A
Other languages
Chinese (zh)
Inventor
Zhang Dong (张东)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Ingenic Technology Co ltd
Original Assignee
Hefei Ingenic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Ingenic Technology Co ltd filed Critical Hefei Ingenic Technology Co ltd
Priority to CN202011617715.3A priority Critical patent/CN114692815A/en
Publication of CN114692815A publication Critical patent/CN114692815A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for optimizing the training of a low-bit model, aiming to overcome the shortcomings of the prior art and to solve the problems of severe precision loss and difficult convergence that existing 2-bit models suffer during training. The method comprises the following steps: S1, training a full-precision model: a full-precision model is trained on a data set; S2, training a low-bit model: a 4-bit model and then a 2-bit model are trained in turn, with different weight-decay coefficients and optimizers adopted at different bit widths.

Description

Method for optimizing low-bit model training
Technical Field
The invention relates to the technical field of image processing, in particular to a method for optimizing low-bit model training.
Background
In recent years, with the rapid development of science and technology, the era of big data has arrived. Deep learning, which uses deep neural networks (DNNs) as its models, has achieved remarkable results in many key areas of machine intelligence, such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN) is a typical DNN structure that can effectively extract hidden-layer features from an image and classify it accurately, and in recent years it has been widely applied to image recognition and detection.
In the prior art, the ReLU function is usually adopted when training a full-precision model; because full-precision numbers cover a wide range of real values, they can satisfy the numerical range required during training. When training at low bit widths, however, the limited bit width restricts the representable range, so the model cannot converge effectively during training and the precision of the final model is unsatisfactory.
Commonly used terms in the art include:
Convolutional neural network (CNN): a type of feedforward neural network that contains convolution computations and has a deep structure.
Quantization: the process of approximating the continuous values of a signal (or a large set of possible discrete values) by a finite (smaller) set of discrete values.
Low bit: quantizing data to a bit width of 8, 4, or 2 bits; at 2 bits, for example, only 2^2 = 4 distinct levels can be represented, which is why 2-bit training is particularly prone to precision loss.
Disclosure of Invention
To address these issues, the method aims to overcome the shortcomings of the prior art and to solve the problems of severe precision loss and difficult convergence that existing 2-bit models suffer during training.
The low-bit model is fine-tuned on the basis of a full-precision model: first, a full-precision model is trained on a data set until it reaches the target precision, and then a low-bit model is fine-tuned starting from that full-precision model.
Specifically, the invention provides a method for optimizing low-bit model training, which comprises the following steps:
S1, training a full-precision model: training a full-precision model based on the data set;
S2, training a low-bit model: a 4-bit model and then a 2-bit model are trained in turn, with different weight-decay coefficients and optimizers adopted at different bit widths.
The step S1 further includes:
S1.1, training data:
the data set used to train the model is ImageNet1000, a subset of the ImageNet data set with about 1.2 million training images, 50,000 validation images, and 150,000 test images, covering 1,000 classes;
S1.2, establishing the model:
the basic neural network model adopted for training is MobileNet V1, a network built on depthwise separable convolutions;
S1.3, training the network:
the basic procedure for training the network is: set the weight-decay coefficient to 0.0005, first train 60 epochs with the Adam optimizer, and then switch to the SGD optimizer until training is finished (a training-schedule sketch follows this list);
S1.4, testing the network: the trained network is evaluated on the test set.
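For illustration only, the following PyTorch-style sketch shows the S1.3 training schedule. The 60-epoch Adam phase, the 0.0005 weight-decay coefficient, and the switch to SGD come from the steps above, while the learning rates, the total epoch count, and the hypothetical model and train_loader arguments are placeholder assumptions, not part of the patented method.

    import torch

    def train_full_precision(model, train_loader, total_epochs=120):
        """Sketch of the S1.3 schedule: weight decay 0.0005, Adam for the first
        60 epochs, then SGD until training finishes (learning rates and
        total_epochs are illustrative placeholders)."""
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
        for epoch in range(total_epochs):
            if epoch == 60:  # after 60 epochs, switch from Adam to SGD as in S1.3
                optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                                            momentum=0.9, weight_decay=5e-4)
            for images, labels in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        return model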
The step S2 further includes:
S2.1, data quantization: quantizing the data to be quantized to obtain low-bit data;
S2.2, carrying out low-bit model training:
S2.2.1, training the 4-bit model;
S2.2.2, training the 2-bit model;
S2.2.3, testing the network;
S2.2.4, outputting the network.
Step S2.1, quantization is performed according to formula 1:
[Equation 1 appears as an image in the original publication and is not reproduced here.]
Variable description: Wf is the full-precision data (an array), Wq is the simulated quantized data, maxw is the maximum value in the full-precision data Wf, minw is the minimum value in the full-precision data Wf, and b is the quantized bit width.
In step S2.2.1, training the 4-bit model: during training, the weights and activations are quantized to 4 bits, and the weight-decay coefficient is set to 0, i.e., no weight decay is used; the model is trained with the Adam optimizer until convergence.
In step S2.2.2, training the 2-bit model: step S2.2.1 yields a model whose weights and activations are quantized to 4 bits; starting from that model, a model with weights and activations quantized to 2 bits is trained. When training the 2-bit model, the weight-decay coefficient is set to 0.00005, the Adam optimizer is used for the first 60 epochs, then the learning rate is reduced to 0.0001 and the SGD optimizer is used to train the model until convergence.
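As an illustrative sketch only, the code below strings steps S2.2.1 and S2.2.2 together. The weight-decay values, the 60-epoch Adam phase, and the 0.0001 SGD learning rate are taken from the text, while the hypothetical quantize_model quantization-aware-training wrapper, the Adam learning rate, and the epoch counts are assumptions.

    import torch

    def run_epochs(model, train_loader, optimizer, num_epochs):
        criterion = torch.nn.CrossEntropyLoss()
        for _ in range(num_epochs):
            for images, labels in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()

    def train_low_bit(fp_model, train_loader, quantize_model, epochs=120):
        """Sketch of S2.2: fine-tune a 4-bit model from the full-precision model,
        then a 2-bit model from the 4-bit model."""
        # S2.2.1: quantize weights and activations to 4 bits, no weight decay, Adam.
        model_4bit = quantize_model(fp_model, bits=4)   # hypothetical QAT wrapper
        opt = torch.optim.Adam(model_4bit.parameters(), lr=1e-3, weight_decay=0.0)
        run_epochs(model_4bit, train_loader, opt, epochs)

        # S2.2.2: initialise the 2-bit model from the 4-bit model, weight decay 0.00005,
        # Adam for the first 60 epochs, then SGD with learning rate 0.0001.
        model_2bit = quantize_model(model_4bit, bits=2)
        opt = torch.optim.Adam(model_2bit.parameters(), lr=1e-3, weight_decay=5e-5)
        run_epochs(model_2bit, train_loader, opt, 60)
        opt = torch.optim.SGD(model_2bit.parameters(), lr=1e-4,
                              momentum=0.9, weight_decay=5e-5)
        run_epochs(model_2bit, train_loader, opt, epochs - 60)
        return model_2bit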
Thus, the advantages of the present application are: the method is simple, and it improves the precision of the quantized convolutional neural network model by training a full-precision model on a data set, then training a 4-bit model and a 2-bit model in turn, and adopting different weight-decay coefficients and optimizers at different bit widths.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a schematic diagram of the training process of the low bit model of the present invention.
Detailed Description
In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the present invention relates to a method for optimizing low-bit model training, comprising the steps of:
S1, training a full-precision model: training a full-precision model based on the data set;
S2, training a low-bit model: a 4-bit model and then a 2-bit model are trained in turn, with different weight-decay coefficients and optimizers adopted at different bit widths.
Specifically, the method comprises the following steps:
1. Full-precision model training:
1) Training data:
the data set used to train the model is ImageNet1000, a subset of the ImageNet data set with about 1.2 million training images, 50,000 validation images, and 150,000 test images, covering 1,000 classes.
2) Model:
the basic neural network model adopted in training is MobileNet V1, which is built on depthwise separable convolutions (a block sketch follows this list).
3) Training the network:
the basic procedure for training the network is: set the weight-decay coefficient to 0.0005, first train 60 epochs with the Adam optimizer, and then switch to the SGD optimizer until training is finished.
4) Testing the network:
the trained network is evaluated on the test set.
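For background on item 2), a depthwise separable convolution factorizes a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. The PyTorch sketch below shows this basic MobileNet V1-style building block; the layer sizes are placeholders, and the patent does not specify its exact implementation.

    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Illustrative MobileNet V1-style block: depthwise conv followed by a
        pointwise 1x1 conv, each with batch normalization and ReLU."""
        def __init__(self, in_channels, out_channels, stride=1):
            super().__init__()
            self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                       stride=stride, padding=1,
                                       groups=in_channels, bias=False)  # per-channel convolution
            self.bn1 = nn.BatchNorm2d(in_channels)
            self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
            self.bn2 = nn.BatchNorm2d(out_channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            x = self.relu(self.bn1(self.depthwise(x)))
            x = self.relu(self.bn2(self.pointwise(x)))
            return x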
2. Low-bit model training:
Data quantization: the data to be quantized is quantized according to the following formula (Equation 1) to obtain low-bit data.
[Equation 1 appears as an image in the original publication and is not reproduced here; see the reconstruction sketched after the variable description above.]
Variable description: Wf is the full-precision data (an array), Wq is the simulated quantized data, maxw is the maximum value in the full-precision data Wf, minw is the minimum value in the full-precision data Wf, and b is the quantized bit width.
The low-bit model training process is shown in FIG. 2 and is divided into two main steps:
1) Training the 4-bit model:
during training, the weights and activations are quantized to 4 bits, and the weight-decay coefficient is set to 0, i.e., no weight decay is used. The model is trained with the Adam optimizer until convergence.
2) Training the 2-bit model:
the first training step yields a model whose weights and activations are quantized to 4 bits; starting from that model, a model with weights and activations quantized to 2 bits is trained. When training the 2-bit model, the weight-decay coefficient is set to 0.00005, the Adam optimizer is used for the first 60 epochs, then the learning rate is reduced to 0.0001 and the SGD optimizer is used to train the model until convergence.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for optimizing the training of a low-bit model, the method comprising the steps of:
S1, training a full-precision model: training a full-precision model based on a data set;
S2, training a low-bit model: training a 4-bit model and then a 2-bit model in turn, and adopting different weight-decay coefficients and optimizers at different bit widths.
2. The method for optimizing low-bit model training of claim 1, wherein said step S1 further comprises:
S1.1, training data:
the data set used to train the model is ImageNet1000, a subset of the ImageNet data set with about 1.2 million training images, 50,000 validation images, and 150,000 test images, covering 1,000 classes;
S1.2, establishing the model:
the basic neural network model adopted for training is MobileNet V1, a network built on depthwise separable convolutions;
S1.3, training the network:
the basic procedure for training the network is: set the weight-decay coefficient to 0.0005, first train 60 epochs with the Adam optimizer, and then use the SGD optimizer until training is finished;
S1.4, testing the network: the trained network is evaluated on the test set.
3. The method for optimizing low-bit model training of claim 1, wherein said step S2 further comprises:
S2.1, data quantization: quantizing the data to be quantized to obtain low-bit data;
S2.2, carrying out low-bit model training:
S2.2.1, training the 4-bit model;
S2.2.2, training the 2-bit model;
S2.2.3, testing the network;
S2.2.4, outputting the network.
4. The method for optimizing low-bit model training as claimed in claim 3, wherein in said step S2.1, quantization is performed according to Equation 1:
[Equation 1 appears as an image in the original publication and is not reproduced here.]
Variable description: Wf is the full-precision data (an array), Wq is the simulated quantized data, maxw is the maximum value in the full-precision data Wf, minw is the minimum value in the full-precision data Wf, and b is the quantized bit width.
5. The method for optimizing low-bit model training as claimed in claim 3, wherein in said step S2.2.1, training the 4-bit model: during training, the weights and activations are quantized to 4 bits, and the weight-decay coefficient is set to 0, i.e., no weight decay is used; the model is trained with the Adam optimizer until convergence.
6. The method for optimizing low-bit model training as claimed in claim 5, wherein in said step S2.2.2, training the 2-bit model: step S2.2.1 yields a model whose weights and activations are quantized to 4 bits; starting from that model, a model with weights and activations quantized to 2 bits is trained; when training the 2-bit model, the weight-decay coefficient is set to 0.00005, the Adam optimizer is used for the first 60 epochs, then the learning rate is reduced to 0.0001 and the SGD optimizer is used to train the model until convergence.
CN202011617715.3A 2020-12-31 2020-12-31 Method for optimizing low-bit model training Pending CN114692815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011617715.3A CN114692815A (en) 2020-12-31 2020-12-31 Method for optimizing low-bit model training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011617715.3A CN114692815A (en) 2020-12-31 2020-12-31 Method for optimizing low-bit model training

Publications (1)

Publication Number Publication Date
CN114692815A true CN114692815A (en) 2022-07-01

Family

ID=82134849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011617715.3A Pending CN114692815A (en) 2020-12-31 2020-12-31 Method for optimizing low-bit model training

Country Status (1)

Country Link
CN (1) CN114692815A (en)

Similar Documents

Publication Publication Date Title
CN112116030B (en) Image classification method based on vector standardization and knowledge distillation
CN108900346B (en) Wireless network flow prediction method based on LSTM network
CN111564160B (en) Voice noise reduction method based on AEWGAN
CN105096955B (en) A kind of speaker's method for quickly identifying and system based on model growth cluster
CN108847223A (en) A kind of audio recognition method based on depth residual error neural network
CN111161744B (en) Speaker clustering method for simultaneously optimizing deep characterization learning and speaker identification estimation
CN111429947A (en) Speech emotion recognition method based on multi-stage residual convolutional neural network
CN113177558B (en) Radiation source individual identification method based on small sample feature fusion
CN113140220A (en) Lightweight end-to-end speech recognition method based on convolution self-attention transformation network
CN115392285A (en) Deep learning signal individual recognition model defense method based on multiple modes
WO2020253692A1 (en) Quantification method for deep learning network parameters
CN114299995A (en) Language emotion recognition method for emotion assessment
CN113206808B (en) Channel coding blind identification method based on one-dimensional multi-input convolutional neural network
CN114692815A (en) Method for optimizing low-bit model training
CN110619886A (en) End-to-end voice enhancement method for low-resource Tujia language
CN113762500B (en) Training method for improving model precision during quantization of convolutional neural network
CN113762497B (en) Low-bit reasoning optimization method for convolutional neural network model
CN116248202A (en) Method for realizing radio frequency channel calibration based on deep learning
CN112434716B (en) Underwater target data amplification method and system based on condition countermeasure neural network
CN113220892A (en) BERT-based self-adaptive text classification method and device
CN114692814A (en) Quantification method for optimizing neural network model activation
CN108563639B (en) Mongolian language model based on recurrent neural network
CN113593551B (en) Objective evaluation method for interference effect of voice communication based on command word recognition
CN113762495A (en) Method for improving precision of low bit quantization model of convolutional neural network model
Avila et al. Low-bit shift network for end-to-end spoken language understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination