CN117521763A - Artificial intelligence model compression method integrating group regularized pruning and importance pruning - Google Patents

Artificial intelligence model compression method integrating group regularized pruning and importance pruning Download PDF

Info

Publication number
CN117521763A
CN117521763A
Authority
CN
China
Prior art keywords
network
pruning
loss function
group
importance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410009132.4A
Other languages
Chinese (zh)
Inventor
李锋
李博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University
Priority to CN202410009132.4A
Publication of CN117521763A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an artificial intelligence model compression method integrating group regularized pruning and importance pruning, comprising the following steps: initializing the weights and biases in a deep convolutional neural network, and initializing the group regularization penalty factor; adding the cross entropy loss function and the group regularization loss function to obtain the total loss function for network training; performing back propagation according to the total loss function, calculating the gradients of the loss function, and updating the weights and biases of the network to minimize the loss function; continuously optimizing the weights and biases over multiple training iterations to obtain well-trained network parameters; after training, pruning the network and compressing the network model; and performing fine-tuning training on the pruned network. In this method, the penalty factors in group regularized pruning are adaptively adjusted according to the importance of each weight group; with these adaptively adjusted penalty factors the network can be pruned more reasonably, better preserving the weight groups that contribute significantly to network performance.

Description

Artificial intelligence model compression method integrating group regularized pruning and importance pruning
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an artificial intelligence model compression method integrating group regularized pruning and importance pruning.
Background
Artificial intelligence has been successfully applied to many different practical tasks, and deep neural networks are among the most successful artificial intelligence models, achieving state-of-the-art performance in numerous application areas. Deep neural networks can use large-scale architectures to extract useful features from large amounts of data, which is a major factor in their success. However, it also means that deep neural networks typically carry large-scale parameters and complex computation, which hinders their deployment on mobile and embedded devices. To address this problem, various network compression methods have been developed.
Network pruning is an important artificial intelligence model compression method, often used to compress deep neural network models: by removing unimportant parts of the network, it yields a more compact network without affecting the original accuracy. In particular, structured pruning can remove entire redundant structures from the network, which helps realize both model compression and acceleration.
Existing structured pruning methods mainly fall into importance-based methods and regularization-based methods. Importance-based methods rank network structures by some importance criterion and prune the structures of lower importance. This kind of pruning requires substantial retraining to recover the accuracy lost to pruning, which adds significant computational cost. Regularization-based methods penalize the network weights during training, compressing the weights of redundant structures to zero so that they can be removed from the network. Since these weights are small and contribute little to the network's output, their removal does not cause a large accuracy loss, and good accuracy can be reached without much retraining; this approach therefore does not increase the computational cost.
Among structured pruning methods, group-regularization-based structured pruning can effectively remove redundant structures from a network model, achieving model compression and model acceleration at the same time, and its effectiveness for structured pruning has been verified by a large body of work. However, existing group regularization methods typically use a fixed penalty factor to compress the weights to zero, which carries the unreasonable assumption that all weights are equally important. This assumption contradicts the actual situation: the weights of a neural network tend to have different importance. Some weights contribute more to the performance of the network, while others have less impact on it. Consequently, using the same penalty factor and simply compressing all weights to zero with the same penalty strength degrades network performance. In addition, the penalty factor is a hyperparameter whose value lacks general theoretical guidance; it usually has to be tuned for each problem, consuming a great deal of time and computational cost.
Disclosure of Invention
In view of the above technical problems, the invention provides a model compression method integrating group regularized pruning and importance pruning. In this method, the penalty factors in group regularized pruning are adaptively adjusted according to the importance of each weight group. With these adaptively adjusted penalty factors, the network can be pruned more reasonably, and the weight groups that contribute significantly to network performance are better preserved.
The invention adopts the following technical means:
an artificial intelligence model compression method integrating group regularized pruning and importance pruning, comprising:
S1, initializing the weights and biases in a deep convolutional neural network, and initializing the group regularization penalty factor;
S2, adding the cross entropy loss function and the group regularization loss function to obtain the total loss function for network training;
S3, performing back propagation according to the total loss function, calculating the gradients of the loss function, and updating the weights and biases of the network to minimize the loss function;
S4, continuously optimizing the weights and biases over multiple training iterations, finally obtaining well-trained network parameters;
S5, after training, pruning the network and compressing the network model;
S6, performing fine-tuning training on the pruned network to help it adapt to the pruned structure.
Further, the step S1 specifically includes:
S11, randomly sampling the weights and biases from a normal distribution, initializing them to small values;
S12, initializing the group regularization penalty factor to 0.
Further, the step S2 specifically includes:
S21, the cross entropy loss function is expressed as

$E_D(W) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C} y_{n,c}\,\log\hat{y}_{n,c}$

where $N$ is the number of training samples, $C$ the number of classes, $y_{n,c}$ the label, and $\hat{y}_{n,c}$ the predicted probability;
S22, the group regularization loss function is expressed as

$\lambda \cdot R(W) + \lambda_g \cdot \sum_{l=1}^{L} R_g\big(W^{(l)}\big)$

S23, adding the cross entropy loss function and the group regularization loss function gives the total loss function for network training:

$E(W) = E_D(W) + \lambda \cdot R(W) + \lambda_g \cdot \sum_{l=1}^{L} R_g\big(W^{(l)}\big)$

wherein $W$ denotes the set of all weights in the network, $E_D(W)$ is the prediction loss function, i.e. the cross entropy loss function, $R(W)$ is the non-structured regularization applied to every weight, a norm operation serving as weight decay, $R_g\big(W^{(l)}\big)$ is the structured sparsity regularization term on the weight groups of layer $l$ of the $L$ layers, and $G^{(l)}$ is the number of weight groups in layer $l$;
S24, applying Group Lasso to $R_g\big(W^{(l)}\big)$, the Group Lasso regularization term is expressed as:

$R_g\big(W^{(l)}\big) = \sum_{g=1}^{G^{(l)}} \big\|W^{(l)}_g\big\|_2, \quad \big\|W^{(l)}_g\big\|_2 = \sqrt{\sum\nolimits_i \big(w^{(l)}_{g,i}\big)^2}$
Further, the step S3 specifically includes:
S31, differentiating the loss function with respect to the network output to obtain the gradient of the loss with respect to the output;
S32, calculating the gradient of each layer in turn according to the chain rule until the gradient of the loss function with respect to each parameter (weights and biases) is obtained;
S33, once the gradient of the loss function with respect to each parameter has been calculated, updating the parameters of the network using an optimization algorithm.
Further, in the step S33, the optimization algorithm used for updating the parameters of the network is:
new parameter = old parameter - learning rate × parameter gradient
where the learning rate is a hyperparameter that controls the step size of each parameter update.
Further, the step S5 specifically includes:
S51, performing an L2-norm operation on the overall parameters of each convolution kernel;
S52, setting a threshold of 0.0001 and removing the convolution kernels whose norm is smaller than the threshold.
Further, the step S6 specifically includes:
training the pruned network and slightly adjusting the parameters of the network model with a small learning rate.
Further, the method can impose more reasonable penalty factors on different weight groups, wherein:
for important weight groups, a smaller penalty factor is applied, so the penalty strength during updates is weaker;
for unimportant weight groups, a larger penalty factor is applied, so the penalty strength during updates is stronger.
Compared with the prior art, the invention has the following advantages:
1. The artificial intelligence model compression method integrating group regularized pruning and importance pruning can significantly improve the performance and accuracy of a network: taking VGG16 as an example, the final accuracy improves by about 0.34% under the same number of training iterations; taking ResNet18 as an example, the final accuracy improves by about 0.28% under the same number of training iterations.
2. The method can reduce the computational complexity of a network and improve its computational efficiency: taking VGG16 as an example, the number of parameters can be reduced by 87% compared with the standard VGG16 network; taking ResNet18 as an example, the number of parameters can be reduced by 81%.
3. The method can automatically identify and adjust the magnitude of the penalty factors without manual tuning, which reduces the workload of manual parameter adjustment and ensures that the network is always subject to appropriate regularization constraints during training.
4. The method can be conveniently applied to existing deep convolutional neural network training and pruning pipelines without large-scale modification or redesign, saving time and resources and making the method easier to implement and popularize.
In summary, the method of adaptively adjusting the group regularization penalty factor provided by the invention can effectively improve the performance and accuracy of deep convolutional neural networks, reduce the number of parameters, and simplify the workflow. The method is simple to operate and fits easily into existing network training and pruning pipelines. These improvements bring broader application prospects and commercial value to the fields of neural network training and artificial intelligence model compression.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is the overall flowchart of network pruning provided by the present invention.
FIG. 2 is a schematic diagram of group regularization provided by the present invention.
FIG. 3 compares the number of parameters and the runtime speedup ratio of different networks under the same conditions, as provided by an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in FIG. 1, the invention provides an artificial intelligence model compression method integrating group regularized pruning and importance pruning, comprising the following steps:
S1, initializing the weights and biases in a deep convolutional neural network, and initializing the group regularization penalty factor;
S2, adding the cross entropy loss function and the group regularization loss function to obtain the total loss function for network training; in this embodiment, in each training iteration the group regularization loss function is added on top of the prediction loss computed from the network outputs, which ensures that every weight group of the network is subject to an appropriate regularization constraint.
S3, performing back propagation according to the total loss function, calculating the gradients of the loss function, and updating the weights and biases of the network to minimize the loss function; in this embodiment, dynamically adjusting the group regularization penalty factor allows weight groups that contribute significantly to network performance to be retained more accurately, and weight groups that have little impact on network performance to be pruned more accurately.
S4, continuously optimizing the weights and biases over multiple training iterations, finally obtaining well-trained network parameters; in this way, the number of network parameters can be further reduced and the computational efficiency of the network improved.
S5, after training, pruning the network and compressing the network model;
S6, performing fine-tuning training on the pruned network; fine-tuning helps the network adapt to the post-pruning structure and improves its generalization ability, further improving performance and accuracy.
In specific implementation, as a preferred embodiment of the present invention, the step S1 specifically includes:
S11, randomly sampling the weights and biases from a normal distribution, initializing them to small values;
S12, initializing the group regularization penalty factor to 0.
In specific implementation, as a preferred embodiment of the present invention, the step S2 specifically includes:
S21, the cross entropy loss function is expressed as

$E_D(W) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C} y_{n,c}\,\log\hat{y}_{n,c}$

where $N$ is the number of training samples, $C$ the number of classes, $y_{n,c}$ the label, and $\hat{y}_{n,c}$ the predicted probability;
S22, the group regularization loss function is expressed as

$\lambda \cdot R(W) + \lambda_g \cdot \sum_{l=1}^{L} R_g\big(W^{(l)}\big)$

S23, adding the cross entropy loss function and the group regularization loss function gives the total loss function for network training:

$E(W) = E_D(W) + \lambda \cdot R(W) + \lambda_g \cdot \sum_{l=1}^{L} R_g\big(W^{(l)}\big)$

wherein $W$ denotes the set of all weights in the network, $E_D(W)$ is the prediction loss function, i.e. the cross entropy loss function, $R(W)$ is the non-structured regularization applied to every weight, a norm operation serving as weight decay, $R_g\big(W^{(l)}\big)$ is the structured sparsity regularization term on the weight groups of layer $l$ of the $L$ layers, and $G^{(l)}$ is the number of weight groups in layer $l$;
S24, applying Group Lasso to $R_g\big(W^{(l)}\big)$, the Group Lasso regularization term is expressed as:

$R_g\big(W^{(l)}\big) = \sum_{g=1}^{G^{(l)}} \big\|W^{(l)}_g\big\|_2, \quad \big\|W^{(l)}_g\big\|_2 = \sqrt{\sum\nolimits_i \big(w^{(l)}_{g,i}\big)^2}$

In this embodiment, the parameters of the network model are handled in two parts: one part performs data fitting and produces the cross entropy loss function, while the other penalizes the weight groups corresponding to the network filters and compresses the weights of redundant structures to zero, as shown in Fig. 2, where $\lambda_g$ is the adaptively adjustable penalty factor that produces the group regularization loss function.
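As an illustration of steps S21 to S24, the following is a minimal PyTorch-style sketch of the total loss, assuming that each convolution filter (output channel) forms one weight group; the function names and the values of lam and lam_g are illustrative assumptions, not values fixed by the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F

def group_lasso(conv_weight: torch.Tensor) -> torch.Tensor:
    # conv_weight has shape (out_channels, in_channels, kH, kW); each filter
    # (output channel) is one weight group, and the Group Lasso term is the
    # sum of the L2 norms of the groups.
    return conv_weight.flatten(1).norm(p=2, dim=1).sum()

def total_loss(model: nn.Module, logits: torch.Tensor, targets: torch.Tensor,
               lam: float = 1e-4, lam_g: float = 1e-4) -> torch.Tensor:
    # E(W) = E_D(W) + lam * R(W) + lam_g * sum over layers of R_g(W^(l))
    ce = F.cross_entropy(logits, targets)                  # E_D(W)
    l2 = sum(p.pow(2).sum() for p in model.parameters())   # R(W), weight decay
    gl = sum(group_lasso(m.weight) for m in model.modules()
             if isinstance(m, nn.Conv2d))                  # structured term
    return ce + lam * l2 + lam_g * gl

Note that this sketch uses a single fixed lam_g; the adaptive per-group penalty factor described further below replaces it with one factor per weight group.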
In specific implementation, as a preferred embodiment of the present invention, the step S3 specifically includes:
S31, differentiating the loss function with respect to the network output to obtain the gradient of the loss with respect to the output;
S32, calculating the gradient of each layer in turn according to the chain rule until the gradient of the loss function with respect to each parameter (weights and biases) is obtained;
S33, once the gradient of the loss function with respect to each parameter has been calculated, updating the parameters of the network using an optimization algorithm. The optimization algorithm used to update the parameters of the network is:
new parameter = old parameter - learning rate × parameter gradient
where the learning rate is a hyperparameter that controls the step size of each parameter update. In this example, the initial learning rate is 0.1 and is multiplied by 0.1 at 50% and 75% of the total number of training iterations, as sketched below.
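A minimal sketch of steps S31 to S33 together with this learning-rate schedule (initial rate 0.1, multiplied by 0.1 at 50% and 75% of training), assuming PyTorch; the tiny stand-in network and the total epoch count are illustrative assumptions only.

import torch
import torch.nn as nn

# Hypothetical small CNN standing in for VGG16/ResNet18.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16),
                      nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(16, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # initial rate 0.1
epochs = 160  # assumed; the text does not state the total iteration count
# Multiply the learning rate by 0.1 at 50% and 75% of training;
# scheduler.step() would be called once per epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)

# One update step: new parameter = old parameter - lr * gradient (S33).
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()    # S31-S32: gradients of the loss via the chain rule
optimizer.step()   # S33: apply the update rule
optimizer.zero_grad()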
In this embodiment, the scaling factor of the network's batch normalization (BN) layer is used as the importance criterion to represent the importance of a weight group, and an adaptively adjustable penalty factor is constructed from it: a small penalty factor is applied to filters of high importance, reinforcing their retention, while a large penalty factor is applied to filters of low importance, compressing them to zero (a sketch of one possible construction follows the list below). Using the scaling factor of the BN layer as the importance criterion has notable advantages:
(1) The scaling factor of the BN layer is an inherent parameter of network training, so no additional parameter needs to be constructed as an importance criterion, which saves computational cost;
(2) The scaling factor of the BN layer is tied to the feature maps output by its filter, so it represents the importance of the filter more accurately;
(3) The scaling factor of the BN layer is itself updated during training, which automatically updates the penalty factor constructed from it, so no separate penalty-factor update rule needs to be designed.
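The passage above does not spell out the exact mapping from BN scaling factors to penalty factors, so the following sketch assumes one plausible form: normalize the absolute scaling factors to [0, 1] as an importance score, then scale a base factor by (1 - importance), so that important filters receive a small penalty and unimportant ones a large penalty.

import torch
import torch.nn as nn

def adaptive_penalty_factors(bn: nn.BatchNorm2d,
                             base_lam_g: float = 1e-4) -> torch.Tensor:
    # One penalty factor per filter, built from the BN scaling factors
    # (bn.weight is gamma).  Assumed mapping: filters with large |gamma|
    # (high importance) get a small factor, and vice versa.
    gamma = bn.weight.detach().abs()
    importance = gamma / (gamma.max() + 1e-12)   # normalize to [0, 1]
    return base_lam_g * (1.0 - importance)

def adaptive_group_lasso(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> torch.Tensor:
    # Group Lasso over filters, each group weighted by its own adaptively
    # adjusted penalty factor instead of a single fixed lambda_g.
    lam = adaptive_penalty_factors(bn)
    group_norms = conv.weight.flatten(1).norm(p=2, dim=1)
    return (lam * group_norms).sum()

Because bn.weight is updated by training itself, the factors returned by adaptive_penalty_factors change automatically from iteration to iteration, matching advantage (3) above.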
In specific implementation, as a preferred embodiment of the present invention, the step S5 specifically includes:
S51, performing an L2-norm operation on the overall parameters of each convolution kernel;
S52, setting a threshold of 0.0001 and removing the convolution kernels whose norm is smaller than the threshold.
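A minimal sketch of S51 and S52, under the assumption that pruning is realized by zeroing the filters whose overall L2 norm falls below the 0.0001 threshold; a full implementation would instead rebuild each layer with fewer output channels and adjust the next layer's input channels accordingly.

import torch
import torch.nn as nn

@torch.no_grad()
def prune_filters(conv: nn.Conv2d, threshold: float = 1e-4) -> int:
    # S51: L2 norm of the overall parameters of each convolution kernel.
    norms = conv.weight.flatten(1).norm(p=2, dim=1)
    # S52: remove (here, zero out) kernels whose norm is below the threshold.
    mask = norms < threshold
    conv.weight[mask] = 0.0
    if conv.bias is not None:
        conv.bias[mask] = 0.0
    return int(mask.sum())   # number of pruned filters

conv = nn.Conv2d(3, 16, 3)
print(f"pruned {prune_filters(conv)} of {conv.out_channels} filters")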
In specific implementation, as a preferred embodiment of the present invention, the step S6 specifically includes:
training the pruned network and slightly adjusting the parameters of the network model with a small learning rate (0.0005), as sketched below.
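A minimal fine-tuning sketch using the small learning rate (0.0005) given above; the stand-in pruned model and the random batch are illustrative assumptions.

import torch
import torch.nn as nn

pruned_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in
optimizer = torch.optim.SGD(pruned_model.parameters(), lr=0.0005)       # small lr
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for _ in range(5):   # a few gentle steps to adapt to the pruned structure
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(pruned_model(images), labels)
    loss.backward()
    optimizer.step()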
In specific implementation, as a preferred embodiment of the present invention, the method can impose more reasonable penalty factors on different weight groups, wherein:
for important weight groups, a smaller penalty factor is applied, so the penalty strength during updates is weaker;
for unimportant weight groups, a larger penalty factor is applied, so the penalty strength during updates is stronger.
Examples
VGG and ResNet in the following tables are the standard networks; SSL applies a fixed group regularization factor to the same model; DRFGR applies the adaptively adjusted penalty factor to the same model.
Table I. Results of VGG16 and VGG19 on different metrics under the same conditions with different methods
Table II. Results of ResNet18 and ResNet50 on different metrics under the same conditions with different methods
In summary, the group regularization method with adaptively adjusted penalty factors can prune the network more reasonably, effectively compressing and pruning the network model while maintaining its performance and accuracy. The improvements of the invention bring gains in performance, accuracy, efficiency, simplicity, generalization ability, and extensibility; these improvements in technical effect and key performance indicators give the method broad application prospects and commercial value in the field of artificial intelligence model compression. The method of adaptively adjusting the group regularization penalty factor can effectively improve the performance and accuracy of deep convolutional neural networks, reduce the number of parameters, and simplify the workflow, while remaining simple to operate and easy to apply in existing network training and pruning pipelines.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. An artificial intelligence model compression method integrating group regularized pruning and importance pruning, characterized by comprising the following steps:
S1, initializing the weights and biases in a deep convolutional neural network, and initializing the group regularization penalty factor;
S2, adding the cross entropy loss function and the group regularization loss function to obtain the total loss function for network training;
S3, performing back propagation according to the total loss function, calculating the gradients of the loss function, and updating the weights and biases of the network to minimize the loss function;
S4, continuously optimizing the weights and biases over multiple training iterations, finally obtaining well-trained network parameters;
S5, after training, pruning the network and compressing the network model;
S6, performing fine-tuning training on the pruned network to help it adapt to the pruned structure.
2. The artificial intelligence model compression method integrating group regularized pruning and importance pruning according to claim 1, wherein the step S1 specifically comprises:
S11, randomly sampling the weights and biases from a normal distribution, initializing them to small values;
S12, initializing the group regularization penalty factor to 0.
3. The artificial intelligence model compression method integrating group regularized pruning and importance pruning according to claim 1, wherein the step S2 specifically comprises:
S21, the cross entropy loss function is expressed as

$E_D(W) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C} y_{n,c}\,\log\hat{y}_{n,c}$

S22, the group regularization loss function is expressed as

$\lambda \cdot R(W) + \lambda_g \cdot \sum_{l=1}^{L} R_g\big(W^{(l)}\big)$

S23, adding the cross entropy loss function and the group regularization loss function gives the total loss function for network training:

$E(W) = E_D(W) + \lambda \cdot R(W) + \lambda_g \cdot \sum_{l=1}^{L} R_g\big(W^{(l)}\big)$

wherein $W$ denotes the set of all weights in the network, $E_D(W)$ is the prediction loss function, i.e. the cross entropy loss function, $R(W)$ is the non-structured regularization applied to every weight, a norm operation serving as weight decay, $R_g\big(W^{(l)}\big)$ is the structured sparsity regularization term on the weight groups of layer $l$ of the $L$ layers, and $G^{(l)}$ is the number of weight groups in layer $l$;
S24, applying Group Lasso to $R_g\big(W^{(l)}\big)$, the Group Lasso regularization term is expressed as:

$R_g\big(W^{(l)}\big) = \sum_{g=1}^{G^{(l)}} \big\|W^{(l)}_g\big\|_2, \quad \big\|W^{(l)}_g\big\|_2 = \sqrt{\sum\nolimits_i \big(w^{(l)}_{g,i}\big)^2}$
4. The artificial intelligence model compression method integrating group regularized pruning and importance pruning according to claim 1, wherein the step S3 specifically comprises:
S31, differentiating the loss function with respect to the network output to obtain the gradient of the loss with respect to the output;
S32, calculating the gradient of each layer in turn according to the chain rule until the gradient of the loss function with respect to each parameter (weights and biases) is obtained;
S33, once the gradient of the loss function with respect to each parameter has been calculated, updating the parameters of the network using an optimization algorithm.
5. The artificial intelligence model compression method integrating group regularized pruning and importance pruning according to claim 1, wherein in the step S33, the optimization algorithm used to update the parameters of the network is:
new parameter = old parameter - learning rate × parameter gradient
where the learning rate is a hyperparameter that controls the step size of each parameter update.
6. The artificial intelligence model compression method integrating group regularized pruning and importance pruning according to claim 1, wherein the step S5 specifically comprises:
S51, performing an L2-norm operation on the overall parameters of each convolution kernel;
S52, setting a threshold of 0.0001 and removing the convolution kernels whose norm is smaller than the threshold.
7. The artificial intelligence model compression method integrating group regularized pruning and importance pruning according to claim 1, wherein the step S6 specifically comprises:
training the pruned network and slightly adjusting the parameters of the network model with a small learning rate.
8. The artificial intelligence model compression method integrating group regularized pruning and importance pruning according to claim 1, wherein the method is capable of imposing more reasonable penalty factors on different weight groups, wherein:
for important weight groups, a smaller penalty factor is applied, so the penalty strength during updates is weaker;
for unimportant weight groups, a larger penalty factor is applied, so the penalty strength during updates is stronger.
CN202410009132.4A 2024-01-04 2024-01-04 Artificial intelligence model compression method integrating group regularized pruning and importance pruning Pending CN117521763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410009132.4A CN117521763A (en) 2024-01-04 Artificial intelligence model compression method integrating group regularized pruning and importance pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410009132.4A CN117521763A (en) 2024-01-04 Artificial intelligence model compression method integrating group regularized pruning and importance pruning

Publications (1)

Publication Number Publication Date
CN117521763A true CN117521763A (en) 2024-02-06

Family

ID=89751579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410009132.4A Pending CN117521763A (en) Artificial intelligence model compression method integrating group regularized pruning and importance pruning

Country Status (1)

Country Link
CN (1) CN117521763A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117890978A (en) * 2024-03-18 2024-04-16 大连海事大学 Seismic velocity image generation method based on visual transducer
CN117890978B (en) * 2024-03-18 2024-05-10 大连海事大学 Seismic velocity image generation method based on visual transducer

Similar Documents

Publication Publication Date Title
WO2021004366A1 (en) Neural network accelerator based on structured pruning and low-bit quantization, and method
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
CN110175628A A neural network pruning compression algorithm based on automatic search and knowledge distillation
TWI785227B (en) Method for batch normalization layer pruning in deep neural networks
CN109102064A A high-precision neural network quantization compression method
CN117521763A (en) Artificial intelligence model compression method integrating group regularized pruning and importance pruning
CN111105035A (en) Neural network pruning method based on combination of sparse learning and genetic algorithm
CN113610227B (en) Deep convolutional neural network pruning method for image classification
CN113657421A (en) Convolutional neural network compression method and device and image classification method and device
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN112488304A (en) Heuristic filter pruning method and system in convolutional neural network
CN112381218A (en) Local updating method for distributed deep learning training
CN110188877A A neural network compression method and device
CN111382581B (en) One-time pruning compression method in machine translation
CN114970853A (en) Cross-range quantization convolutional neural network compression method
CN108805844B (en) Lightweight regression network construction method based on prior filtering
CN112132062B (en) Remote sensing image classification method based on pruning compression neural network
CN114004327A (en) Adaptive quantization method of neural network accelerator suitable for running on FPGA
Liu et al. Improvement of pruning method for convolution neural network compression
CN115564043B (en) Image classification model pruning method and device, electronic equipment and storage medium
CN116187401A (en) Compression method and device for neural network, electronic equipment and storage medium
CN109389221A A neural network compression method
CN111210009A (en) Information entropy-based multi-model adaptive deep neural network filter grafting method, device and system and storage medium
CN113592085A (en) Nuclear pruning method, device, equipment and medium based on high-rank convolution graph
Sarkar et al. An incremental pruning strategy for fast training of CNN models

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination