CN112990420A - Pruning method for convolutional neural network model - Google Patents

Pruning method for convolutional neural network model

Info

Publication number
CN112990420A
Authority
CN
China
Prior art keywords
pruning
model
network
sparse
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911212375.3A
Other languages
Chinese (zh)
Inventor
乐国庆
刘振
陈渊博
苏帅
元润一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huahang Radio Measurement Research Institute
Original Assignee
Beijing Huahang Radio Measurement Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huahang Radio Measurement Research Institute filed Critical Beijing Huahang Radio Measurement Research Institute
Priority to CN201911212375.3A
Publication of CN112990420A
Legal status (current): Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a pruning method for a convolutional neural network model. The scaling factor γ in the BN layer is used as the criterion for judging the importance of network connections, an objective function for sparse training is designed on this basis, the trained sparse network is pruned according to a set pruning rate, and a compressed model is finally obtained without reducing performance. The invention can greatly reduce the storage footprint of the model and improve the real-time performance of model prediction.

Description

Pruning method for convolutional neural network model
Technical Field
The invention relates to the technical field of computer vision, in particular to a pruning method applied to convolutional neural network model compression.
Background
In recent years, the field of artificial intelligence has developed rapidly, and target detection and recognition algorithms based on deep learning emerge constantly, such as convolutional neural network models with a BN (batch normalization) layer. In a deep neural network, the input of a layer is the output of the previous layer, so a change in the parameters of the previous layer causes the distribution of its input to shift considerably. When the network is trained with stochastic gradient descent, every parameter update changes the input distribution of each intermediate layer, and the change becomes more pronounced in the deeper layers. To solve this problem, most current deep neural networks add a batch normalization (BN) layer at design time, so that the inputs of each layer keep the same distribution during training and network convergence is accelerated. The input and output of the BN layer are related as follows:
z_{out} = \gamma \cdot \frac{z_{in} - \mu}{\sqrt{\sigma}} + \beta
wherein z_{in} and z_{out} are the input and output of the BN layer for a channel, μ and σ are respectively the mean and variance of the data of the corresponding convolutional-layer channel over a batch (mini-batch data), and γ and β are respectively the scaling factor and offset (learnable parameters that restore the representational capacity the network had before normalization).
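For illustration only (this sketch is not part of the patent text; PyTorch is assumed purely as an example framework, and the layer sizes are arbitrary), the scaling factor γ and offset β of a BN layer correspond to the learnable per-channel weight and bias of the normalization module:

```python
import torch
import torch.nn as nn

# A convolution followed by batch normalization, as in most modern CNNs.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(num_features=64)

x = torch.randn(8, 3, 32, 32)   # a mini-batch of 8 three-channel 32x32 images
y = bn(conv(x))                 # normalize per channel, then scale by gamma and shift by beta

gamma = bn.weight               # the 64 per-channel scaling factors (gamma)
beta = bn.bias                  # the 64 per-channel offsets (beta)
print(gamma.shape, beta.shape)  # torch.Size([64]) torch.Size([64])
```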
However, in a convolutional neural network model with a BN (batch normalization) layer, parameter redundancy means that the algorithm usually consumes a large amount of computing resources. When such a model is applied in engineering products, tasks generally have to run in real time under extremely limited computing and storage resources, which places high demands on the computational complexity of the algorithm and the compactness of the model. For a trained network, pruning unimportant connections, boundary-node weights, or convolution-kernel parameters by means of an effective judgment criterion can reduce the redundancy of the model and its memory overhead. In this way, while the performance of the model is maintained, the model parameters and the amount of computation are compressed as far as possible, and the real-time performance of the model is greatly improved.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: prune a convolutional neural network model with a BN layer so as to improve the real-time performance of algorithm prediction and reduce the size of the model without reducing the performance of the algorithm.
In order to solve the technical problem, the invention provides a pruning method for a convolutional neural network model, which comprises the following steps:
step S1, preparing a data set and a deep convolution neural network model;
step S2, training a sparse network: for the deep convolutional neural network, an L1 constraint on the scaling factor γ is added to the loss function and a sparse network model is trained, specifically as shown in the following formula,

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|

wherein \sum_{(x,y)} l(f(x, W), y) is the basic loss of the original network over the training pairs (x, y) with trainable weights W, \sum_{\gamma \in \Gamma} |\gamma| is the L1 regularization constraint term on the scaling factors γ of the BN layers (Γ denoting the set of all such factors), introduced to induce sparsification of the BN layers, and λ is the sparsity factor;
step S3, network pruning: pruning the trained sparse network according to a set pruning rate to obtain a pruned network model;
step S4, network fine-tuning: performing fine-tuning training on the training set using the pruned network structure and model, to finally obtain the pruned network model.
Further, the step S2 includes the following sub-steps:
step S201, analyzing the deep neural network structure and marking the convolutional layers that are followed by a BN layer;
step S202, setting the sparsity factor λ, taking the deep convolutional network model prepared in step S1 as a pre-training model, and performing sparse training on the training set;
step S203, observing the downward trend of the loss and the trend of the average performance MAP on the validation set during training, and adjusting the sparsity factor λ in time so that the model performance does not degrade during sparse training.
Further, the step S3 includes the following sub-steps:
step S301, for the sparse network model obtained in step S2, traversing the BN layers and recording the maximum γ value of each layer, and setting the minimum of these per-layer maxima as the upper limit of the threshold; then setting a pruning rate P, sorting the γ values of all channels in all BN layers in descending order, and taking the γ value at the index corresponding to the pruning rate as the pruning threshold Thr, wherein the threshold Thr cannot be greater than the upper limit of the threshold;
step S302, for each convolutional layer, first sorting the channels according to the γ values of its BN layer, then obtaining the number of remaining channels C of the layer according to the threshold Thr, and adjusting the channel number C toward a multiple of 32;
step S303, determining the network structure after pruning and saving a preliminary pruned model.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a pruning method for a deep convolutional neural network with a BN (boron nitride) layer, wherein a scaling factor gamma in the BN layer is innovatively used as a means for judging the importance of network connection, a target function of sparse training is designed based on the means, the method is suitable for different data sets and algorithms, meanwhile, when unimportant network connection is finally cut, the consideration on parallelization of a computing platform is increased, the network structure after pruning is set to be normalized connection (the number of channels is 32 times), finally, a compression model can be obtained on the premise of not reducing the performance, the space occupation rate of model storage can be greatly reduced, and the model prediction real-time performance is improved.
Drawings
FIG. 1 is a flow chart of a pruning method provided by an embodiment of the present invention;
FIG. 2 shows the variation trend of loss and MAP during the sparse training process according to the embodiment of the present invention;
fig. 3 is a comparison of the number of channels before and after pruning with a BN layer in a network according to an embodiment of the present invention.
Detailed Description
For a deep neural network with BN layers, each channel of every convolutional layer corresponds to a different scaling factor γ, and these values are learned continuously during network training. Therefore, the importance of a channel can be represented by the scaling factor γ of the BN layer, the importance of network connections can be judged accordingly, and the network can be pruned on this basis. The embodiments of the present invention are further described below with reference to the drawings and examples.
As shown in fig. 1, an embodiment of the present invention provides a pruning method for a convolutional neural network model, including the following steps:
Step S1, preparation of the data set and the deep network model. The data set is divided into a training set, a validation set and a test set, and a network model is trained according to the original network.
Step S2, training a sparse network.
Step S201, analyzing the deep neural network structure and marking the convolutional layers that are followed by a BN layer;
Step S202, for the deep convolutional neural network, the scaling factor γ in each BN layer is used to measure the importance of the corresponding channel during training: an L1 constraint on the scaling factors γ is added to the objective function and a sparse network model is trained, as shown in the following formula,

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|

wherein \sum_{(x,y)} l(f(x, W), y) is the basic loss of the original network over the training pairs (x, y) with trainable weights W, \sum_{\gamma \in \Gamma} |\gamma| is the L1 regularization constraint term on the scaling factors γ of the BN layers (Γ denoting the set of all such factors), introduced to induce sparsification of the BN layers, and λ is the sparsity factor;
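As an illustrative sketch only (assuming a PyTorch implementation; the function name sparsity_loss, the training-step names and the value of λ below are hypothetical and not part of the patent), the L1 constraint on the BN scaling factors can be added to the basic loss as follows:

```python
import torch
import torch.nn as nn

def sparsity_loss(model: nn.Module, basic_loss: torch.Tensor, lam: float) -> torch.Tensor:
    """Add the L1 penalty on all BN scaling factors (gamma) to the basic loss."""
    l1_gamma = torch.zeros((), device=basic_loss.device)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            l1_gamma = l1_gamma + m.weight.abs().sum()  # gamma is stored in m.weight
    return basic_loss + lam * l1_gamma

# Hypothetical use inside one training step:
#   basic = criterion(model(images), targets)      # basic loss of the original network
#   loss = sparsity_loss(model, basic, lam=1e-4)   # lam: sparsity factor (value is an assumption)
#   loss.backward()
#   optimizer.step()
```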
and S203, observing the loss descending trend and the average performance MAP changing trend in the verification set in the training process, and adjusting the sparse rate lambda in time to ensure that the model performance is not reduced in the sparse training process.
Step S3, network pruning. The trained sparse network is pruned according to the manually set pruning rate to obtain a pruned network model.
Step S301, for the sparse network model obtained in step S2, in order to avoid pruning away all channels of any convolutional layer, the maximum γ value of each BN layer is recorded while traversing the layers, and the minimum of these per-layer maxima is set as the upper limit of the threshold. A pruning rate P is then set (the default initial value is 0.5; the value of P is afterwards fine-tuned according to the model performance so that the pruning rate reaches its maximum without changing the performance). The γ values of the channels in all BN layers are sorted in descending order, and the γ value at the index corresponding to the pruning rate is taken as the pruning threshold Thr, where the threshold Thr cannot be greater than the upper limit of the threshold.
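A minimal sketch of step S301 under the same assumptions (PyTorch; function and variable names are illustrative; an ascending sort is used, which is equivalent to letting the fraction P of channels with the smallest γ fall below Thr):

```python
import torch
import torch.nn as nn

def pruning_threshold(model: nn.Module, prune_rate: float = 0.5) -> float:
    """Global pruning threshold Thr computed from the BN scaling factors."""
    gammas, layer_maxima = [], []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            g = m.weight.detach().abs()
            gammas.append(g)
            layer_maxima.append(g.max())
    all_gammas = torch.cat(gammas)
    # Upper limit of the threshold: the smallest per-layer maximum of gamma,
    # so that no convolutional layer has all of its channels pruned.
    upper_limit = torch.stack(layer_maxima).min()
    # The gamma value at the index given by the pruning rate P.
    sorted_gammas, _ = torch.sort(all_gammas)
    idx = min(int(prune_rate * all_gammas.numel()), all_gammas.numel() - 1)
    thr = sorted_gammas[idx]
    return float(torch.minimum(thr, upper_limit))
```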
Step S302, during forward prediction, when the number of channels of the model is a multiple of 32, both CPU and GPU operations can reach maximum parallel efficiency. Therefore, after the threshold Thr is obtained, for each convolutional layer the channels are first sorted according to the γ values of its BN layer, the number of remaining channels C of the layer is then obtained according to Thr, and the channel number C is adjusted to a multiple of 32.
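A possible sketch of step S302 under the same assumptions (the exact rule for adjusting C toward a multiple of 32 is not specified here; rounding to the nearest multiple with a floor of 32 channels is an assumption):

```python
import torch

def remaining_channels(gamma: torch.Tensor, thr: float, align: int = 32) -> int:
    """Number of channels kept in one convolutional layer, adjusted toward a multiple of `align`."""
    kept = int((gamma.detach().abs() > thr).sum())
    # Assumed rule: round to the nearest multiple of `align`, keep at least `align` channels,
    # and never exceed the original number of channels of the layer.
    rounded = max(align, int(round(kept / align)) * align)
    return min(rounded, gamma.numel())
```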
Step S303, determining the network structure after pruning and saving a preliminary pruned model.
Step S4, fine-tuning of the network. Fine-tuning training is performed on the training set using the pruned network structure and model, finally obtaining the pruned model.
The following application embodiment of the present invention discloses the pruning method applied to the YOLOv3 target detection algorithm, comprising the following steps:
Step S1, preparing a data set, dividing it into a training set, a validation set and a test set, and training the original YOLOv3 network to obtain an original model.
Step S2, analyzing the original model to obtain the indices of the BN layers, then setting the sparsity factor λ and starting sparse training; the loss and MAP changes during training are observed and the sparsity factor is adjusted in time, as shown in FIG. 2.
Step S3, setting a pruning rate P for the model obtained by sparse training to obtain a pruning threshold Thr, traversing the convolutional layers with BN layers, and modifying the number of channels remaining after pruning; the original number of channels, the remaining number of channels and the modified remaining number of channels of each convolutional layer are shown in FIG. 3; the pruned network structure and a preliminary pruned network model are thus obtained.
Step S4, fine-tuning the pruned network structure on the training set, taking the preliminary pruned network model obtained in the previous step as the starting model, and training until the loss is stable to obtain the final pruned model. Table 1 compares the model before and after pruning (test environment: Ubuntu 16.04, GPU: GTX 1080 Ti).
TABLE 1 Comparison of the model before and after pruning

                 Model size   MAP        Speed
Original model   248 MB       0.812563   0.0120 s
Pruned model     38.9 MB      0.815629   0.0066 s
Those skilled in the art will appreciate that all or part of the flow of the method in the above embodiments may be implemented by a computer program instructing related hardware, the computer program being stored in a computer-readable storage medium. The computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (3)

1. A pruning method for a convolutional neural network model, comprising the steps of:
step S1, preparing a data set and a deep convolution neural network model;
step S2, training a sparse network: for the deep convolutional neural network, an L1 constraint on the scaling factor γ is added to the loss function and a sparse network model is trained, as shown in the following formula,

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|

wherein \sum_{(x,y)} l(f(x, W), y) is the basic loss of the original network over the training pairs (x, y) with trainable weights W, \sum_{\gamma \in \Gamma} |\gamma| is the L1 regularization constraint term on the scaling factors γ of the BN layers (Γ denoting the set of all such factors), introduced to induce sparsification of the BN layers, and λ is the sparsity factor;
step S3, network pruning: pruning the trained sparse network according to a set pruning rate to obtain a pruned network model;
step S4, network fine-tuning: performing fine-tuning training on the training set using the pruned network model, to finally obtain the pruned model.
2. The pruning method for the convolutional neural network model as set forth in claim 1, wherein the step S2 comprises the sub-steps of:
step S201, analyzing the deep neural network structure and marking the convolutional layers that are followed by a BN layer;
step S202, setting the sparsity factor λ, taking the deep convolutional network model prepared in step S1 as a pre-training model, and performing sparse training on the training set;
step S203, observing the downward trend of the loss and the trend of the average performance MAP on the validation set during training, and adjusting the sparsity factor λ in time so that the model performance does not degrade during sparse training.
3. The pruning method for the convolutional neural network model as set forth in claim 1, wherein the step S3 comprises the sub-steps of:
step S301, for the sparse network model obtained in step S2, traversing the BN layers and recording the maximum γ value of each layer, and setting the minimum of these per-layer maxima as the upper limit of the threshold; then setting a pruning rate P, sorting the γ values of all channels in all BN layers in descending order, and taking the γ value at the index corresponding to the pruning rate as the pruning threshold Thr, wherein the threshold Thr cannot be greater than the upper limit of the threshold;
step S302, for each convolutional layer, first sorting the channels according to the γ values of its BN layer, then obtaining the number of remaining channels C of the layer according to the threshold Thr, and adjusting the channel number C toward a multiple of 32;
step S303, determining the network structure after pruning and saving a preliminary pruned model.
CN201911212375.3A 2019-12-02 2019-12-02 Pruning method for convolutional neural network model Pending CN112990420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911212375.3A CN112990420A (en) 2019-12-02 2019-12-02 Pruning method for convolutional neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911212375.3A CN112990420A (en) 2019-12-02 2019-12-02 Pruning method for convolutional neural network model

Publications (1)

Publication Number Publication Date
CN112990420A true CN112990420A (en) 2021-06-18

Family

ID=76331192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911212375.3A Pending CN112990420A (en) 2019-12-02 2019-12-02 Pruning method for convolutional neural network model

Country Status (1)

Country Link
CN (1) CN112990420A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361697A (en) * 2021-07-14 2021-09-07 深圳思悦创新有限公司 Convolution network model compression method, system and storage medium
CN113554169A (en) * 2021-07-28 2021-10-26 杭州海康威视数字技术股份有限公司 Model optimization method and device, electronic equipment and readable storage medium
CN113554169B (en) * 2021-07-28 2023-10-27 杭州海康威视数字技术股份有限公司 Model optimization method, device, electronic equipment and readable storage medium
CN113935484A (en) * 2021-10-19 2022-01-14 上海交通大学 Compression method and device of convolutional neural network model
CN114155602A (en) * 2021-12-02 2022-03-08 青岛大学 Human body posture estimation model sparse pruning method
CN114155602B (en) * 2021-12-02 2024-04-26 青岛大学 Sparse pruning method for human body posture estimation model
WO2023191879A1 (en) * 2022-03-29 2023-10-05 Microsoft Technology Licensing, Llc Sparsity masking methods for neural network training

Similar Documents

Publication Publication Date Title
CN112990420A (en) Pruning method for convolutional neural network model
CN107506865B (en) Load prediction method and system based on LSSVM optimization
WO2020224297A1 (en) Method and device for determining computer-executable integrated model
CN107798379B (en) Method for improving quantum particle swarm optimization algorithm and application based on improved algorithm
CN111079899A (en) Neural network model compression method, system, device and medium
CN112700060B (en) Station terminal load prediction method and prediction device
JP2023523029A (en) Image recognition model generation method, apparatus, computer equipment and storage medium
CN106971238A (en) The Short-Term Load Forecasting Method of Elman neutral nets is obscured based on T S
CN110110380B (en) Piezoelectric actuator hysteresis nonlinear modeling method and application
CN116050540B (en) Self-adaptive federal edge learning method based on joint bi-dimensional user scheduling
JPWO2019146189A1 (en) Neural network rank optimizer and optimization method
CN115578248A (en) Generalized enhanced image classification algorithm based on style guidance
KR20210032140A (en) Method and apparatus for performing pruning of neural network
CN111626328B (en) Image recognition method and device based on lightweight deep neural network
CN111985845A (en) Node priority tuning method for heterogeneous Spark cluster
CN112733458A (en) Engineering structure signal processing method based on self-adaptive variational modal decomposition
CN110263917B (en) Neural network compression method and device
CN110826692B (en) Automatic model compression method, device, equipment and storage medium
CN117290721A (en) Digital twin modeling method, device, equipment and medium
CN115115113A (en) Equipment fault prediction method and system based on graph attention network relation embedding
CN109034372B (en) Neural network pruning method based on probability
CN114417095A (en) Data set partitioning method and device
CN113761026A (en) Feature selection method, device, equipment and storage medium based on conditional mutual information
US20210271932A1 (en) Method, device, and program product for determining model compression rate
CN107995027B (en) Improved quantum particle swarm optimization algorithm and method applied to predicting network flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210618)