CN112990420A - Pruning method for convolutional neural network model - Google Patents
- Publication number: CN112990420A (application CN201911212375.3A)
- Authority: CN (China)
- Prior art keywords: pruning, model, network, sparse, layer
- Legal status: Pending
Classifications
- G06N3/045 (Neural networks; Architecture, e.g. interconnection topology; Combinations of networks)
- G06N3/082 (Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections)
Abstract
The invention provides a pruning method for a convolutional neural network model. The scaling factor γ in the BN layer is used as the criterion for judging the importance of network connections, an objective function for sparse training is designed on that basis, and the trained sparse network is clipped according to a set pruning rate, finally yielding a compressed model without reducing performance. The invention can greatly reduce the storage footprint of the model and improve the real-time performance of model prediction.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a pruning method applied to convolutional neural network model compression.
Background
In recent years, technology in the field of artificial intelligence has developed rapidly, and target detection and recognition algorithms based on deep learning have emerged in large numbers, such as convolutional neural network models with a BN (batch normalization) layer. In a deep neural network, the input of a layer is the output of the previous layer, so a change in the parameters of the previous layer causes the distribution of that input to shift; when the network is trained with stochastic gradient descent, every parameter update changes the input distribution of each intermediate layer, and the shift is more pronounced in deeper layers. To address this problem, most current deep neural networks insert a batch normalization (BN) layer at design time, so that the inputs of each layer keep the same distribution during training and network convergence is accelerated. The input and output of the layer are as follows:

ẑ = (z − μ) / √(σ² + ε),  z_out = γ · ẑ + β

where μ and σ² are the mean and variance of the batch data on the corresponding convolutional channel, ε is a small constant for numerical stability, and γ and β are the scaling factor and offset, respectively (learnable variables that preserve the features the network could learn before normalization).
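As an illustration only (the patent specifies no implementation, and the function and variable names below are assumptions), the per-channel BN transform above can be written in NumPy as:

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Per-channel batch normalization for a batch of feature maps.

    z: array of shape (N, C, H, W); gamma, beta: arrays of shape (C,).
    Illustrative sketch; training-time running statistics are omitted.
    """
    # mean and variance over the batch and spatial dimensions, per channel
    mu = z.mean(axis=(0, 2, 3), keepdims=True)
    var = z.var(axis=(0, 2, 3), keepdims=True)
    z_hat = (z - mu) / np.sqrt(var + eps)
    # gamma scales and beta shifts; gamma later serves as the channel-importance score
    return gamma.reshape(1, -1, 1, 1) * z_hat + beta.reshape(1, -1, 1, 1)
```

With γ = 1 and β = 0 each channel of the output is approximately zero-mean and unit-variance, which is what makes γ a direct measure of how much a channel contributes downstream.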
However, in a convolutional neural network model with a BN (batch normalization) layer, parameter redundancy means the algorithm usually consumes a large amount of computing resources. When applied in engineering products, tasks generally must run in real time under extremely limited computing and storage resources, which places high demands on the computational complexity of the algorithm and the compactness of the model. Therefore, for a trained network, clipping unimportant connections, boundary-node weights, or convolution-kernel parameters with an effective judgment criterion can reduce model redundancy and memory overhead. In this way, while model performance is maintained, the model parameters and the amount of computation are compressed to the greatest extent, greatly improving the real-time performance of the model.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: pruning is carried out on the convolutional neural network model with the BN layer, and the real-time performance of algorithm prediction is improved and the size of the model is reduced on the premise that the performance of the algorithm is not reduced.
In order to solve the technical problem, the invention provides a pruning method for a convolutional neural network model, which comprises the following steps:
step S1, preparing a data set and a deep convolution neural network model;
step S2, training a sparse network: for the deep convolutional neural network, the scaling factor γ is added to the loss function and a sparse network model is trained, specifically as shown in the following formula,

L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) |γ|

where the first term is the basic loss of the original network, the second term is the L1 regularization constraint introduced on the scaling factor γ coefficients of the BN layers, used to induce sparsification of the BN layers, and λ is the sparsity factor;
step S3, network pruning: network cutting is carried out on the trained sparse network according to a set pruning rate, and a pruned network model is obtained;
step S4, network fine adjustment: and performing fine tuning training in a training set by using the network structure and the model after pruning to finally obtain the pruning network model.
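The objective of step S2 above can be sketched in NumPy as follows. This is illustrative only: the patent gives no implementation, the per-example network loss is abstracted into a scalar `base_loss`, and all names are assumptions.

```python
import numpy as np

def sparse_training_loss(base_loss, bn_gammas, lam):
    """Sparse-training objective: basic network loss plus an L1 penalty
    on the scaling factors gamma of every BN layer.

    base_loss: scalar basic loss of the original network (abstracted here);
    bn_gammas: one 1-D array of gamma values per BN layer;
    lam: the sparsity factor lambda.
    """
    l1_penalty = sum(np.abs(g).sum() for g in bn_gammas)
    return base_loss + lam * l1_penalty

def l1_subgradient(gamma, lam):
    """Contribution of the L1 term to gamma's gradient during training:
    lam * sign(gamma), which pushes unimportant gammas toward zero."""
    return lam * np.sign(gamma)
```

During stochastic gradient descent the L1 term adds λ·sign(γ) to each γ's gradient, so channels that do not reduce the basic loss are driven toward zero, which is what later makes them safe to prune.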
Further, the step S2 includes the following sub-steps:
step S201, analyzing a deep neural network structure, and marking the convolution layer with the BN layer;
s202, setting a sparse rate lambda, taking the deep convolutional network model prepared in the step S1 as a pre-training model, and performing sparse training in a training set;
and S203, observing the loss descending trend and the average performance MAP changing trend in the verification set in the training process, and adjusting the sparse rate lambda in time to ensure that the model performance is not reduced in the sparse training process.
Further, the step S3 includes the following sub-steps:
step S301, for the sparse network model obtained in step S2, traversing the maximum value of recorded gamma in each BN layer, setting the minimum value as the upper limit of a threshold value, then setting a pruning rate P, arranging the gamma values of all channels in all BN layers in a descending order, and taking the gamma value of the index corresponding to the pruning rate as a pruning threshold value Thr, wherein the threshold value Thr cannot be greater than the upper limit of the threshold value;
step S302, for each convolutional layer, first sorting the channels according to the γ values of the BN layer, then obtaining the remaining channel number C of the layer according to the threshold Thr, and adjusting C toward a multiple of 32;
and S303, determining the network structure after pruning, and storing a preliminary pruning model.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a pruning method for a deep convolutional neural network with a BN (boron nitride) layer, wherein a scaling factor gamma in the BN layer is innovatively used as a means for judging the importance of network connection, a target function of sparse training is designed based on the means, the method is suitable for different data sets and algorithms, meanwhile, when unimportant network connection is finally cut, the consideration on parallelization of a computing platform is increased, the network structure after pruning is set to be normalized connection (the number of channels is 32 times), finally, a compression model can be obtained on the premise of not reducing the performance, the space occupation rate of model storage can be greatly reduced, and the model prediction real-time performance is improved.
Drawings
FIG. 1 is a flow chart of a pruning method provided by an embodiment of the present invention;
FIG. 2 shows the variation trend of loss and MAP during the sparse training process according to the embodiment of the present invention;
fig. 3 is a comparison of the number of channels before and after pruning with a BN layer in a network according to an embodiment of the present invention.
Detailed Description
For a deep neural network with BN layers, each channel of each convolutional layer corresponds to a different scaling factor γ, and these values are continuously learned during network training. Therefore, the importance of a channel can be represented by the scaling factor γ of its BN layer, the importance of network connections can be judged accordingly, and the network can be clipped. The following further describes the embodiments of the present invention with reference to the drawings and examples.
As shown in fig. 1, an embodiment of the present invention provides a pruning method for a convolutional neural network model, including the following steps:
step S1, preparation of data sets and deep network models. The data set is divided into: training a network model according to an original network by using a training set, a verification set and a test set.
And step S2, training a sparse network.
Step S201, analyzing a deep neural network structure, and marking the convolution layer with the BN layer;
step S202, for the deep convolutional neural network, using the scaling factor γ in the BN layer to measure channel importance during training, adding the scaling factor γ to the objective function, and training a sparse network model, as shown in the following formula,

L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) |γ|

where the first term is the basic loss of the original network, the second term is the L1 regularization constraint introduced on the scaling factor γ coefficients of the BN layers, used to induce sparsification of the BN layers, and λ is the sparsity factor;
and S203, observing the loss descending trend and the average performance MAP changing trend in the verification set in the training process, and adjusting the sparse rate lambda in time to ensure that the model performance is not reduced in the sparse training process.
And step S3, network pruning. And performing network cutting on the trained sparse network according to the artificially set pruning rate to obtain a pruned network model.
Step S301, for the sparse network model obtained in step S2, in order to avoid pruning all channels of the convolutional layer, traverse the maximum value of γ recorded in each BN layer, and set the minimum value therein as the upper threshold limit, then set the pruning rate P (the default initial value is set to 0.5, and then fine-tune the P value according to the model performance after setting the pruning rate, so as to ensure that the pruning rate reaches the maximum value under the premise of not changing the performance), sort the γ values of the channels in all BN layers in descending order, and use the γ value of the index corresponding to the pruning rate as the pruning threshold Thr, where the threshold Thr cannot be greater than the upper threshold limit.
Step S302: in forward prediction, when the number of channels in the model is a multiple of 32, both CPU and GPU operation can reach maximum parallel efficiency. Therefore, after the threshold Thr is obtained, for each convolutional layer the channels are first sorted by the γ value of the BN layer, then the remaining channel number C of the layer is obtained according to Thr, and C is adjusted toward a multiple of 32.
And S303, determining the network structure after pruning, and storing a preliminary pruning model.
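Steps S301 and S302 above can be sketched as follows. This is an illustrative NumPy reading of the patent text, not the authors' implementation: the function names, the tie-breaking at the threshold, and the exact rounding rule for the multiple-of-32 constraint are all assumptions.

```python
import numpy as np

def pruning_threshold(bn_gammas, prune_rate=0.5):
    """Step S301 (sketch): global pruning threshold Thr from BN scaling factors.

    bn_gammas: one 1-D array of gamma values per BN layer.
    """
    # upper limit: smallest per-layer maximum, so no layer loses every channel
    upper = min(np.abs(g).max() for g in bn_gammas)
    all_g = np.sort(np.concatenate([np.abs(g) for g in bn_gammas]))  # ascending
    # the value at the index corresponding to the pruning rate P
    thr = all_g[min(int(len(all_g) * prune_rate), len(all_g) - 1)]
    return min(thr, upper)  # Thr must not exceed the upper limit

def rounded_remaining_channels(gamma, thr, multiple=32):
    """Step S302 (sketch): channels of one layer surviving the threshold,
    rounded toward a multiple of `multiple` for parallel efficiency.
    The rounding rule is one plausible reading of the patent text."""
    kept = int((np.abs(gamma) >= thr).sum())
    rounded = int(round(kept / multiple)) * multiple
    return int(np.clip(rounded, 1, len(gamma)))  # keep at least one channel
```

Capping Thr at the smallest per-layer maximum is what prevents the degenerate case of an entire convolutional layer being pruned away.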
And step S4, fine adjustment of the network. And performing fine tuning training in a training set by using the network structure and the model after pruning to finally obtain a pruning model.
The following is an application embodiment of the present invention, and discloses a pruning method applied to the YOLOv3 target detection algorithm, which includes the following steps:
and S1, preparing a data set, dividing the data set into a training set, a verification set and a test set, and training a YOLOv3 original network to obtain an original model.
And step S2, analyzing the original model to obtain a BN layer subscript index, then setting a sparse rate lambda, starting sparse training, observing loss and MAP change in the training process, and adjusting the sparse rate in time as shown in FIG. 2.
Step S3, setting a pruning rate P for the model obtained by sparse training to obtain a pruning threshold Thr, traversing the convolutional layer with the BN layer, and modifying the number of channels left after pruning; the corresponding original number of channels, the remaining number of channels and the modified remaining number of channels of the convolutional layer are shown in fig. 3; and obtaining the network structure after pruning and a preliminary pruning network model.
And step S4, fine-tuning the pruned network structure on the training set, taking the preliminary pruning network model obtained in the previous step as the initial model, and training until the loss is stable, giving the final pruning model. Table 1 compares the model before and after pruning. (Test environment: Ubuntu 16.04, GPU: GTX 1080 Ti.)
TABLE 1 comparison of models before and after pruning
| | Model size | MAP | Inference time |
|---|---|---|---|
| Original model | 248MB | 0.812563 | 0.0120s |
| Pruning model | 38.9MB | 0.815629 | 0.0066s |
Those skilled in the art will appreciate that all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware, the program being stored in a computer-readable storage medium. The computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random-access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (3)
1. A pruning method for a convolutional neural network model, comprising the steps of:
step S1, preparing a data set and a deep convolution neural network model;
step S2, training a sparse network: for the deep convolutional neural network, a scaling factor γ is added to the loss function and a sparse network model is trained, as shown in the following formula,

L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) |γ|

where the first term is the basic loss of the original network, the second term is the L1 regularization constraint introduced on the scaling factor γ coefficients of the BN layers, used to induce sparsification of the BN layers, and λ is the sparsity factor;
step S3, network pruning: network cutting is carried out on the trained sparse network according to a set pruning rate, and a pruned network model is obtained;
step S4, network fine adjustment: and performing fine tuning training in the training set by using the network model after pruning to finally obtain the pruning model.
2. The pruning method for the convolutional neural network model as set forth in claim 1, wherein the step S2 comprises the sub-steps of:
step S201, analyzing a deep neural network structure, and marking the convolution layer with the BN layer;
s202, setting a sparse rate lambda, taking the deep convolutional network model prepared in the step S1 as a pre-training model, and performing sparse training in a training set;
and S203, observing the loss descending trend and the average performance MAP changing trend in the verification set in the training process, and adjusting the sparse rate lambda in time to ensure that the model performance is not reduced in the sparse training process.
3. The pruning method for the convolutional neural network model as set forth in claim 1, wherein the step S3 comprises the sub-steps of:
step S301, for the sparse network model obtained in step S2, traversing the maximum value of recorded gamma in each BN layer, setting the minimum value as the upper limit of a threshold value, then setting a pruning rate P, arranging the gamma values of all channels in all BN layers in a descending order, and taking the gamma value of the index corresponding to the pruning rate as a pruning threshold value Thr, wherein the threshold value Thr cannot be greater than the upper limit of the threshold value;
step S302, for each convolutional layer, first sorting the channels according to the γ values of the BN layer, then obtaining the remaining channel number C of the layer according to the threshold Thr, and adjusting C toward a multiple of 32;
and S303, determining the network structure after pruning, and storing a preliminary pruning model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911212375.3A CN112990420A (en) | 2019-12-02 | 2019-12-02 | Pruning method for convolutional neural network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112990420A true CN112990420A (en) | 2021-06-18 |
Family
ID=76331192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911212375.3A Pending CN112990420A (en) | 2019-12-02 | 2019-12-02 | Pruning method for convolutional neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112990420A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113361697A (en) * | 2021-07-14 | 2021-09-07 | 深圳思悦创新有限公司 | Convolution network model compression method, system and storage medium |
CN113554169A (en) * | 2021-07-28 | 2021-10-26 | 杭州海康威视数字技术股份有限公司 | Model optimization method and device, electronic equipment and readable storage medium |
CN113554169B (en) * | 2021-07-28 | 2023-10-27 | 杭州海康威视数字技术股份有限公司 | Model optimization method, device, electronic equipment and readable storage medium |
CN113935484A (en) * | 2021-10-19 | 2022-01-14 | 上海交通大学 | Compression method and device of convolutional neural network model |
CN114155602A (en) * | 2021-12-02 | 2022-03-08 | 青岛大学 | Human body posture estimation model sparse pruning method |
CN114155602B (en) * | 2021-12-02 | 2024-04-26 | 青岛大学 | Sparse pruning method for human body posture estimation model |
WO2023191879A1 (en) * | 2022-03-29 | 2023-10-05 | Microsoft Technology Licensing, Llc | Sparsity masking methods for neural network training |
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210618 |