CN112990420A - Pruning method for convolutional neural network model - Google Patents

Pruning method for convolutional neural network model

Info

Publication number
CN112990420A
Authority
CN
China
Prior art keywords
pruning
model
network
sparse
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911212375.3A
Other languages
Chinese (zh)
Inventor
乐国庆
刘振
陈渊博
苏帅
元润一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huahang Radio Measurement Research Institute
Original Assignee
Beijing Huahang Radio Measurement Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huahang Radio Measurement Research Institute filed Critical Beijing Huahang Radio Measurement Research Institute
Priority to CN201911212375.3A
Publication of CN112990420A
Legal status (current): Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a pruning method for a convolutional neural network model. The scaling factor γ in the BN layer is used as the criterion for judging the importance of network connections, an objective function for sparse training is designed on this basis, the trained sparse network is pruned according to a set pruning rate, and a compressed model is finally obtained without reducing performance. The invention can greatly reduce the storage footprint of the model and improve the real-time performance of model prediction.

Description

Pruning method for convolutional neural network model
Technical Field
The invention relates to the technical field of computer vision, in particular to a pruning method applied to convolutional neural network model compression.
Background
In recent years, the field of artificial intelligence has developed rapidly, and target detection and recognition algorithms based on deep learning emerge constantly, such as convolutional neural network models with a BN (batch normalization) layer. In a deep neural network, the input of a layer is the output of the previous layer, so a change in the parameters of the previous layer causes the distribution of its input to shift considerably. When the network is trained with stochastic gradient descent, every parameter update changes the input distribution of each intermediate layer, and the change becomes more pronounced in the deeper layers. To solve this problem, most current deep neural networks add a batch normalization (BN) layer at design time, so that the inputs of each layer keep the same distribution during training and network convergence is accelerated. The input and output of the BN layer are related as follows:
z_{out} = \gamma \cdot \frac{z_{in} - \mu}{\sqrt{\sigma}} + \beta
wherein z_{in} and z_{out} are the input and output of the BN layer for a channel, μ and σ are respectively the mean and variance of the data of the corresponding convolutional-layer channel over a batch (mini-batch data), and γ and β are respectively the scaling factor and offset (learnable parameters that restore the representational capacity the network had before normalization).
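For illustration only (this sketch is not part of the patent text; PyTorch is assumed purely as an example framework, and the layer sizes are arbitrary), the scaling factor γ and offset β of a BN layer correspond to the learnable per-channel weight and bias of the normalization module:

```python
import torch
import torch.nn as nn

# A convolution followed by batch normalization, as in most modern CNNs.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(num_features=64)

x = torch.randn(8, 3, 32, 32)   # a mini-batch of 8 three-channel 32x32 images
y = bn(conv(x))                 # normalize per channel, then scale by gamma and shift by beta

gamma = bn.weight               # the 64 per-channel scaling factors (gamma)
beta = bn.bias                  # the 64 per-channel offsets (beta)
print(gamma.shape, beta.shape)  # torch.Size([64]) torch.Size([64])
```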
However, in a convolutional neural network model with a BN (batch normalization) layer, parameter redundancy means that the algorithm usually consumes a large amount of computing resources. When such a model is applied in engineering products, tasks generally have to run in real time under extremely limited computing and storage resources, which places high demands on the computational complexity of the algorithm and the compactness of the model. For a trained network, pruning unimportant connections, boundary-node weights, or convolution-kernel parameters by means of an effective judgment criterion can reduce the redundancy of the model and its memory overhead. In this way, while the performance of the model is maintained, the model parameters and the amount of computation are compressed as far as possible, and the real-time performance of the model is greatly improved.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: prune a convolutional neural network model with a BN layer so as to improve the real-time performance of algorithm prediction and reduce the size of the model without reducing the performance of the algorithm.
In order to solve the technical problem, the invention provides a pruning method for a convolutional neural network model, which comprises the following steps:
step S1, preparing a data set and a deep convolution neural network model;
step S2, training a sparse network: for the deep convolutional neural network, an L1 constraint on the scaling factor γ is added to the loss function and a sparse network model is trained, specifically as shown in the following formula,

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|

wherein \sum_{(x,y)} l(f(x, W), y) is the basic loss of the original network over the training pairs (x, y) with trainable weights W, \sum_{\gamma \in \Gamma} |\gamma| is the L1 regularization constraint term on the scaling factors γ of the BN layers (Γ denoting the set of all such factors), introduced to induce sparsification of the BN layers, and λ is the sparsity factor;
step S3, network pruning: pruning the trained sparse network according to a set pruning rate to obtain a pruned network model;
step S4, network fine-tuning: performing fine-tuning training on the training set using the pruned network structure and model, to finally obtain the pruned network model.
Further, the step S2 includes the following sub-steps:
step S201, analyzing the deep neural network structure and marking the convolutional layers that are followed by a BN layer;
step S202, setting the sparsity factor λ, taking the deep convolutional network model prepared in step S1 as a pre-training model, and performing sparse training on the training set;
step S203, observing the downward trend of the loss and the trend of the average performance MAP on the validation set during training, and adjusting the sparsity factor λ in time so that the model performance does not degrade during sparse training.
Further, the step S3 includes the following sub-steps:
step S301, for the sparse network model obtained in step S2, traversing the BN layers and recording the maximum γ value of each layer, and setting the minimum of these per-layer maxima as the upper limit of the threshold; then setting a pruning rate P, sorting the γ values of all channels in all BN layers in descending order, and taking the γ value at the index corresponding to the pruning rate as the pruning threshold Thr, wherein the threshold Thr cannot be greater than the upper limit of the threshold;
step S302, for each convolutional layer, first sorting the channels according to the γ values of its BN layer, then obtaining the number of remaining channels C of the layer according to the threshold Thr, and adjusting the channel number C toward a multiple of 32;
step S303, determining the network structure after pruning and saving a preliminary pruned model.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a pruning method for a deep convolutional neural network with a BN (boron nitride) layer, wherein a scaling factor gamma in the BN layer is innovatively used as a means for judging the importance of network connection, a target function of sparse training is designed based on the means, the method is suitable for different data sets and algorithms, meanwhile, when unimportant network connection is finally cut, the consideration on parallelization of a computing platform is increased, the network structure after pruning is set to be normalized connection (the number of channels is 32 times), finally, a compression model can be obtained on the premise of not reducing the performance, the space occupation rate of model storage can be greatly reduced, and the model prediction real-time performance is improved.
Drawings
FIG. 1 is a flow chart of a pruning method provided by an embodiment of the present invention;
FIG. 2 shows the variation trend of loss and MAP during the sparse training process according to the embodiment of the present invention;
fig. 3 is a comparison of the number of channels before and after pruning with a BN layer in a network according to an embodiment of the present invention.
Detailed Description
For a deep neural network with BN layers, each channel of every convolutional layer corresponds to a different scaling factor γ, and these values are learned continuously during network training. Therefore, the importance of a channel can be represented by the scaling factor γ of the BN layer, the importance of network connections can be judged accordingly, and the network can be pruned on this basis. The embodiments of the present invention are further described below with reference to the drawings and examples.
As shown in fig. 1, an embodiment of the present invention provides a pruning method for a convolutional neural network model, including the following steps:
Step S1, preparation of the data set and the deep network model. The data set is divided into a training set, a validation set and a test set, and a network model is trained according to the original network.
Step S2, training a sparse network.
Step S201, analyzing the deep neural network structure and marking the convolutional layers that are followed by a BN layer;
Step S202, for the deep convolutional neural network, the scaling factor γ in each BN layer is used to measure the importance of the corresponding channel during training: an L1 constraint on the scaling factors γ is added to the objective function and a sparse network model is trained, as shown in the following formula,

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|

wherein \sum_{(x,y)} l(f(x, W), y) is the basic loss of the original network over the training pairs (x, y) with trainable weights W, \sum_{\gamma \in \Gamma} |\gamma| is the L1 regularization constraint term on the scaling factors γ of the BN layers (Γ denoting the set of all such factors), introduced to induce sparsification of the BN layers, and λ is the sparsity factor;
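As an illustrative sketch only (assuming a PyTorch implementation; the function name sparsity_loss, the training-step names and the value of λ below are hypothetical and not part of the patent), the L1 constraint on the BN scaling factors can be added to the basic loss as follows:

```python
import torch
import torch.nn as nn

def sparsity_loss(model: nn.Module, basic_loss: torch.Tensor, lam: float) -> torch.Tensor:
    """Add the L1 penalty on all BN scaling factors (gamma) to the basic loss."""
    l1_gamma = torch.zeros((), device=basic_loss.device)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            l1_gamma = l1_gamma + m.weight.abs().sum()  # gamma is stored in m.weight
    return basic_loss + lam * l1_gamma

# Hypothetical use inside one training step:
#   basic = criterion(model(images), targets)      # basic loss of the original network
#   loss = sparsity_loss(model, basic, lam=1e-4)   # lam: sparsity factor (value is an assumption)
#   loss.backward()
#   optimizer.step()
```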
and S203, observing the loss descending trend and the average performance MAP changing trend in the verification set in the training process, and adjusting the sparse rate lambda in time to ensure that the model performance is not reduced in the sparse training process.
Step S3, network pruning. The trained sparse network is pruned according to the manually set pruning rate to obtain a pruned network model.
Step S301, for the sparse network model obtained in step S2, in order to avoid pruning away all channels of any convolutional layer, the maximum γ value of each BN layer is recorded while traversing the layers, and the minimum of these per-layer maxima is set as the upper limit of the threshold. A pruning rate P is then set (the default initial value is 0.5; the value of P is afterwards fine-tuned according to the model performance so that the pruning rate reaches its maximum without changing the performance). The γ values of the channels in all BN layers are sorted in descending order, and the γ value at the index corresponding to the pruning rate is taken as the pruning threshold Thr, where the threshold Thr cannot be greater than the upper limit of the threshold.
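A minimal sketch of step S301 under the same assumptions (PyTorch; function and variable names are illustrative; an ascending sort is used, which is equivalent to letting the fraction P of channels with the smallest γ fall below Thr):

```python
import torch
import torch.nn as nn

def pruning_threshold(model: nn.Module, prune_rate: float = 0.5) -> float:
    """Global pruning threshold Thr computed from the BN scaling factors."""
    gammas, layer_maxima = [], []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            g = m.weight.detach().abs()
            gammas.append(g)
            layer_maxima.append(g.max())
    all_gammas = torch.cat(gammas)
    # Upper limit of the threshold: the smallest per-layer maximum of gamma,
    # so that no convolutional layer has all of its channels pruned.
    upper_limit = torch.stack(layer_maxima).min()
    # The gamma value at the index given by the pruning rate P.
    sorted_gammas, _ = torch.sort(all_gammas)
    idx = min(int(prune_rate * all_gammas.numel()), all_gammas.numel() - 1)
    thr = sorted_gammas[idx]
    return float(torch.minimum(thr, upper_limit))
```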
Step S302, during forward prediction, when the number of channels of the model is a multiple of 32, both CPU and GPU operations can reach maximum parallel efficiency. Therefore, after the threshold Thr is obtained, for each convolutional layer the channels are first sorted according to the γ values of its BN layer, the number of remaining channels C of the layer is then obtained according to Thr, and the channel number C is adjusted to a multiple of 32.
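A possible sketch of step S302 under the same assumptions (the exact rule for adjusting C toward a multiple of 32 is not specified here; rounding to the nearest multiple with a floor of 32 channels is an assumption):

```python
import torch

def remaining_channels(gamma: torch.Tensor, thr: float, align: int = 32) -> int:
    """Number of channels kept in one convolutional layer, adjusted toward a multiple of `align`."""
    kept = int((gamma.detach().abs() > thr).sum())
    # Assumed rule: round to the nearest multiple of `align`, keep at least `align` channels,
    # and never exceed the original number of channels of the layer.
    rounded = max(align, int(round(kept / align)) * align)
    return min(rounded, gamma.numel())
```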
Step S303, determining the network structure after pruning and saving a preliminary pruned model.
Step S4, fine-tuning of the network. Fine-tuning training is performed on the training set using the pruned network structure and model, finally obtaining the pruned model.
The following application embodiment of the present invention discloses the pruning method applied to the YOLOv3 target detection algorithm, comprising the following steps:
Step S1, preparing a data set, dividing it into a training set, a validation set and a test set, and training the original YOLOv3 network to obtain an original model.
Step S2, analyzing the original model to obtain the indices of the BN layers, then setting the sparsity factor λ and starting sparse training; the loss and MAP changes during training are observed and the sparsity factor is adjusted in time, as shown in FIG. 2.
Step S3, setting a pruning rate P for the model obtained by sparse training to obtain a pruning threshold Thr, traversing the convolutional layers with BN layers, and modifying the number of channels remaining after pruning; the original number of channels, the remaining number of channels and the modified remaining number of channels of each convolutional layer are shown in FIG. 3; the pruned network structure and a preliminary pruned network model are thus obtained.
Step S4, fine-tuning the pruned network structure on the training set, taking the preliminary pruned network model obtained in the previous step as the starting model, and training until the loss is stable to obtain the final pruned model. Table 1 compares the model before and after pruning (test environment: Ubuntu 16.04, GPU: GTX 1080 Ti).
TABLE 1 Comparison of the model before and after pruning

                 Model size   MAP        Speed
Original model   248 MB       0.812563   0.0120 s
Pruned model     38.9 MB      0.815629   0.0066 s
Those skilled in the art will appreciate that all or part of the flow of the method in the above embodiments may be implemented by a computer program instructing related hardware, the computer program being stored in a computer-readable storage medium. The computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (3)

1. A pruning method for a convolutional neural network model, comprising the steps of:
step S1, preparing a data set and a deep convolution neural network model;
step S2, training a sparse network: for the deep convolutional neural network, an L1 constraint on the scaling factor γ is added to the loss function and a sparse network model is trained, as shown in the following formula,

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|

wherein \sum_{(x,y)} l(f(x, W), y) is the basic loss of the original network over the training pairs (x, y) with trainable weights W, \sum_{\gamma \in \Gamma} |\gamma| is the L1 regularization constraint term on the scaling factors γ of the BN layers (Γ denoting the set of all such factors), introduced to induce sparsification of the BN layers, and λ is the sparsity factor;
step S3, network pruning: pruning the trained sparse network according to a set pruning rate to obtain a pruned network model;
step S4, network fine-tuning: performing fine-tuning training on the training set using the pruned network model, to finally obtain the pruned model.
2. The pruning method for the convolutional neural network model as set forth in claim 1, wherein the step S2 comprises the sub-steps of:
step S201, analyzing the deep neural network structure and marking the convolutional layers that are followed by a BN layer;
step S202, setting the sparsity factor λ, taking the deep convolutional network model prepared in step S1 as a pre-training model, and performing sparse training on the training set;
step S203, observing the downward trend of the loss and the trend of the average performance MAP on the validation set during training, and adjusting the sparsity factor λ in time so that the model performance does not degrade during sparse training.
3. The pruning method for the convolutional neural network model as set forth in claim 1, wherein the step S3 comprises the sub-steps of:
step S301, for the sparse network model obtained in step S2, traversing the BN layers and recording the maximum γ value of each layer, and setting the minimum of these per-layer maxima as the upper limit of the threshold; then setting a pruning rate P, sorting the γ values of all channels in all BN layers in descending order, and taking the γ value at the index corresponding to the pruning rate as the pruning threshold Thr, wherein the threshold Thr cannot be greater than the upper limit of the threshold;
step S302, for each convolutional layer, first sorting the channels according to the γ values of its BN layer, then obtaining the number of remaining channels C of the layer according to the threshold Thr, and adjusting the channel number C toward a multiple of 32;
step S303, determining the network structure after pruning and saving a preliminary pruned model.
CN201911212375.3A 2019-12-02 2019-12-02 Pruning method for convolutional neural network model Pending CN112990420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911212375.3A CN112990420A (en) 2019-12-02 2019-12-02 Pruning method for convolutional neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911212375.3A CN112990420A (en) 2019-12-02 2019-12-02 Pruning method for convolutional neural network model

Publications (1)

Publication Number Publication Date
CN112990420A true CN112990420A (en) 2021-06-18

Family

ID=76331192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911212375.3A Pending CN112990420A (en) 2019-12-02 2019-12-02 Pruning method for convolutional neural network model

Country Status (1)

Country Link
CN (1) CN112990420A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361697A (en) * 2021-07-14 2021-09-07 深圳思悦创新有限公司 Convolution network model compression method, system and storage medium
CN113554169A (en) * 2021-07-28 2021-10-26 杭州海康威视数字技术股份有限公司 Model optimization method and device, electronic equipment and readable storage medium
CN113554169B (en) * 2021-07-28 2023-10-27 杭州海康威视数字技术股份有限公司 Model optimization method, device, electronic equipment and readable storage medium
CN113935484A (en) * 2021-10-19 2022-01-14 上海交通大学 Compression method and device of convolutional neural network model
CN114155602A (en) * 2021-12-02 2022-03-08 青岛大学 Human body posture estimation model sparse pruning method
CN114155602B (en) * 2021-12-02 2024-04-26 青岛大学 Sparse pruning method for human body posture estimation model
WO2023191879A1 (en) * 2022-03-29 2023-10-05 Microsoft Technology Licensing, Llc Sparsity masking methods for neural network training

Similar Documents

Publication Publication Date Title
CN112990420A (en) Pruning method for convolutional neural network model
CN107506865B (en) Load prediction method and system based on LSSVM optimization
WO2020224297A1 (en) Method and device for determining computer-executable integrated model
CN107798379B (en) Method for improving quantum particle swarm optimization algorithm and application based on improved algorithm
CN111079899A (en) Neural network model compression method, system, device and medium
CN112700060B (en) Station terminal load prediction method and prediction device
JP2023523029A (en) Image recognition model generation method, apparatus, computer equipment and storage medium
CN106971238A (en) The Short-Term Load Forecasting Method of Elman neutral nets is obscured based on T S
CN110110380B (en) Piezoelectric actuator hysteresis nonlinear modeling method and application
CN116050540B (en) Self-adaptive federal edge learning method based on joint bi-dimensional user scheduling
JPWO2019146189A1 (en) Neural network rank optimizer and optimization method
CN115578248A (en) Generalized enhanced image classification algorithm based on style guidance
KR20210032140A (en) Method and apparatus for performing pruning of neural network
CN111626328B (en) Image recognition method and device based on lightweight deep neural network
CN111985845A (en) Node priority tuning method for heterogeneous Spark cluster
CN112733458A (en) Engineering structure signal processing method based on self-adaptive variational modal decomposition
CN110263917B (en) Neural network compression method and device
CN110826692B (en) Automatic model compression method, device, equipment and storage medium
CN117290721A (en) Digital twin modeling method, device, equipment and medium
CN115115113A (en) Equipment fault prediction method and system based on graph attention network relation embedding
CN109034372B (en) Neural network pruning method based on probability
CN114417095A (en) Data set partitioning method and device
CN113761026A (en) Feature selection method, device, equipment and storage medium based on conditional mutual information
US20210271932A1 (en) Method, device, and program product for determining model compression rate
CN107995027B (en) Improved quantum particle swarm optimization algorithm and method applied to predicting network flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210618)