CN111222629B - Neural network model pruning method and system based on adaptive batch normalization


Info

Publication number: CN111222629B
Application number: CN201911423105.7A
Authority: CN (China)
Prior art keywords: pruning, model, neural network, network model, layer
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN111222629A
Inventors: 李百林, 苏江
Current assignee: DMAI Guangzhou Co Ltd
Original assignee: DMAI Guangzhou Co Ltd
Application filed 2019-12-31 by DMAI Guangzhou Co Ltd; priority to CN201911423105.7A; granted as CN111222629B.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Feedback Control In General (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a neural network model pruning method and system based on adaptive batch normalization. Randomly sampled floating-point numbers are taken as the pruning rate of each layer, and a pruning-rate vector (r_1, r_2, …, r_L) is generated under a preset computing-resource constraint as a pruning strategy; the model is pruned according to each pruning strategy to form a candidate set of pruned models. For each pruned model in the candidate set, an adaptive batch normalization method is used to update the statistical parameters of the batch normalization layer. The classification accuracy of the statistics-updated models is then evaluated, and the model with the highest classification accuracy is fine-tuned on the training set until convergence to obtain the final pruned model. By recalibrating the batch normalization layer, the invention evaluates candidate sub-networks quickly and accurately, and fine-tunes only the pruning strategy that wins this fast evaluation to obtain the parameters of the final pruned network; the huge time cost of fine-tuning all pruned networks is avoided while accuracy remains advantageous.

Description

Neural network model pruning method and system based on adaptive batch normalization
Technical Field
The invention relates to the technical field of neural network model pruning, and in particular to a neural network model pruning method and system based on adaptive batch normalization.
Background
Neural network pruning aims to reduce the computational redundancy of a neural network without losing much accuracy. A pruned model generally has lower energy consumption and hardware load, and is therefore of great significance for deployment on embedded devices. However, finding the least important parts of the network, so as to minimize the loss of accuracy after pruning, is a critical issue. The pruning problem of a neural network can be viewed as a search problem whose search space is the set of all pruned sub-networks; finding the sub-network with the highest accuracy in this space is the core of the pruning problem. A sub-network evaluation step is common to existing pruning methods: it reveals the potential accuracy of each sub-network, and the sub-network with the highest potential accuracy is then fine-tuned to obtain the optimal neural network model. In prior-art pruning methods, as shown in Fig. 1, all pruned networks typically need to be fine-tuned in order to judge the final convergence accuracy achievable by the different pruning strategies; but fine-tuning is in essence training for several epochs, which is very time-consuming.
Disclosure of Invention
Therefore, the present invention provides a neural network model pruning method and system based on adaptive batch normalization, which overcome the heavy time cost of prior-art neural network model pruning methods.
In a first aspect, an embodiment of the present invention provides a neural network model pruning method based on adaptive batch normalization, including the following steps: for an L-layer neural network model, randomly sampling L floating-point numbers from [0, R] (0 < R < 1) as the pruning rate of each layer, and generating a pruning-rate vector (r_1, r_2, …, r_L) satisfying a preset computing-resource constraint as a pruning strategy; pruning the neural network model according to each pruning strategy to generate a candidate set of pruned models; updating the statistical parameters of the batch normalization layer of each pruned model in the candidate set using an adaptive batch normalization method; and evaluating the classification accuracy of the statistics-updated models, and fine-tuning the model with the highest classification accuracy on a training set until convergence to serve as the final pruned model.
In an embodiment, the preset computing-resource constraint includes at least one of a preset operation-count limit, a preset parameter-count limit, and a preset computation-latency limit.
In an embodiment, the process of pruning the neural network model based on the pruning strategies to generate the candidate set of pruned models includes: for each pruning strategy, sorting the convolution kernels of each layer by norm in descending order, and removing the last M convolution kernels in the ordering, where M = ceil(r_l * c_l), ceil denotes rounding up, c_l is the number of convolution kernels in layer l, and r_l is the pruning rate of layer l; and forming the candidate set of pruned models from the pruned models with the last M convolution kernels removed.
In an embodiment, the process of updating the statistical parameters of the batch normalization layer of each pruned model in the candidate set using an adaptive batch normalization method includes: fixing all learnable parameters of each pruned model in the candidate set, iterating over a preset number of training samples, and updating the running mean and running variance statistics of the batch normalization layer.
In one embodiment, the running mean and running variance of the parameters are updated by the following formulas:

μ_t = m μ_{t-1} + (1 - m) μ_β,
σ_t² = m σ_{t-1}² + (1 - m) σ_β²,

where μ_t denotes the running mean, σ_t² denotes the running variance, m denotes the weight of the historical value in the running average, t denotes the iteration index, and β denotes the current sample batch (with batch mean μ_β and batch variance σ_β²).
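By way of illustration only (not part of the original patent text), a single update step of these formulas with made-up numbers, assuming a momentum weight m = 0.9:

```python
import numpy as np

m = 0.9                                  # weight of the historical value
mu_prev, var_prev = 0.50, 1.20           # running stats from iteration t-1 (made-up)
batch = np.array([0.2, 0.8, 1.4, 0.6])   # activations of the current batch beta

mu_beta, var_beta = batch.mean(), batch.var()   # 0.75 and 0.1875
mu_t = m * mu_prev + (1 - m) * mu_beta          # 0.9*0.50 + 0.1*0.75   = 0.525
var_t = m * var_prev + (1 - m) * var_beta       # 0.9*1.20 + 0.1*0.1875 = 1.09875
```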
In one embodiment, the process of evaluating the classification accuracy of the neural network models with updated statistical parameters and taking the model with the highest classification accuracy as the final pruned model includes: obtaining the classification accuracy of each statistics-updated neural network model, computed on a validation set, as the potential accuracy of the model; and taking the neural network models whose potential accuracy ranks in the top N as candidate models, fine-tuning the candidate models on a training set until convergence, and taking the model achieving the highest classification accuracy on the validation set as the final pruned model.
In a second aspect, an embodiment of the present invention provides a neural network model pruning system based on adaptive batch normalization, including: a pruning strategy generation module, configured to randomly sample, for an L-layer neural network model, L floating-point numbers from [0, R] (0 < R < 1) as the pruning rate of each layer, and to generate a pruning-rate vector (r_1, r_2, …, r_L) satisfying a preset computing-resource constraint as a pruning strategy; a pruned-model candidate set generation module, configured to prune the neural network model according to each pruning strategy and generate a candidate set of pruned models; a batch normalization layer statistics updating module, configured to update the statistical parameters of the batch normalization layer of each pruned model in the candidate set using an adaptive batch normalization method; and a pruned-model output module, configured to evaluate the classification accuracy of the statistics-updated models and to take the model with the highest classification accuracy, after fine-tuning on a training set until convergence, as the final pruned model.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the neural network model pruning method based on adaptive batch normalization according to the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions that cause at least one processor to execute the neural network model pruning method based on adaptive batch normalization according to the first aspect of the present invention.
The technical scheme of the invention has the following advantages:
1. The embodiment of the invention provides a neural network model pruning method and system based on adaptive batch normalization. For a neural network model, randomly sampled floating-point numbers are taken as the pruning rate of each layer, and a pruning-rate vector (r_1, r_2, …, r_L) satisfying a preset computing-resource constraint is generated as a pruning strategy; the neural network model is pruned according to each pruning strategy, yielding a candidate set of pruned models; the statistical parameters of the batch normalization layer of each pruned model in the candidate set are updated with an adaptive batch normalization method; the classification accuracy of the statistics-updated models is evaluated, and the model with the highest classification accuracy is fine-tuned on the training set until convergence to obtain the final pruned model. By recalibrating the batch normalization layer, the invention evaluates the candidate sub-networks quickly and accurately, and fine-tunes only the pruning strategy that wins this fast evaluation to obtain the parameters of the final pruned network, avoiding the huge time cost of fine-tuning all pruned networks while remaining advantageous in accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a specific example of a pruning algorithm in the prior art according to an embodiment of the present invention;
FIG. 2 is a flowchart of an example of a neural network model pruning method based on adaptive batch normalization according to an embodiment of the present invention;
FIG. 3 is a flowchart of a specific example of a neural network model pruning method based on adaptive batch normalization according to an embodiment of the present invention;
FIG. 4 is a block diagram of a specific example of a neural network model pruning system based on adaptive batch normalization according to an embodiment of the present invention;
fig. 5 is a composition diagram of a specific example of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings; obviously, the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the protection scope of the present invention.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Example 1
The neural network model pruning method based on adaptive batch normalization provided by the embodiment of the invention, as shown in Fig. 2, comprises the following steps:
step S1: for an L-layer neural network model, L [0, R are randomly sampled](0<R<1) The floating point number in the tree is used as the pruning rate of each layer, and a pruning rate vector (r) is generated under the condition that the limit of preset computing resources is met 1 ,r 2 ,…,r L ) As a pruning strategy.
In this embodiment, all convolution layers of MobileNetV1 with kernel size 3×3 are taken as prunable layers (L = 14); 14 floating-point numbers are randomly sampled from [0, R] (0 < R < 1) as the per-layer pruning rates, forming a vector (r_1, r_2, …, r_L) as a pruning strategy. Each element r_l of the vector represents the proportion of convolution kernels to be removed from layer l, i.e., the pruning rate of that layer. Every retained pruning strategy must satisfy the preset computing-resource constraint, which includes at least one of a preset operation-count limit, a preset parameter-count limit, and a preset computation-latency limit, e.g., that the number of operations must not exceed 283M.
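As an illustrative sketch of this sampling step in Python (not the patent's reference implementation): the flops_of estimator, the per-layer base FLOPs, and the budget value below are hypothetical stand-ins for whatever resource model an implementation uses.

```python
import random

def sample_strategy(num_layers, R, flops_of, budget, max_tries=10000):
    """Sample per-layer pruning rates from [0, R] until the pruned
    network satisfies the preset computing-resource constraint."""
    for _ in range(max_tries):
        rates = [random.uniform(0.0, R) for _ in range(num_layers)]
        if flops_of(rates) <= budget:
            return rates                     # valid strategy (r_1, ..., r_L)
    raise RuntimeError("no strategy met the budget")

# Toy FLOPs estimator (an assumption for illustration): each prunable
# layer contributes its base FLOPs scaled by the fraction of kernels kept.
base_flops = [30e6] * 14   # 14 prunable 3x3 conv layers, as in the MobileNetV1 example
toy_flops = lambda rates: sum(f * (1.0 - r) for f, r in zip(base_flops, rates))

strategy = sample_strategy(num_layers=14, R=0.5, flops_of=toy_flops, budget=283e6)
```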
Step S2: prune the neural network model according to each pruning strategy to generate a candidate set of pruned models.
In this embodiment, the pruning strategies generated in step S1 are applied one by one to a MobileNetV1 model trained to convergence. Specifically, for each pruning strategy (r_1, r_2, …, r_L), the convolution kernels of each layer l are first sorted in descending order of their L1 norms, and the last M kernels in the ordering are removed from the original network, where M = ceil(r_l * c_l), ceil denotes rounding up, c_l is the number of convolution kernels in layer l, and r_l is the pruning rate of layer l; this reduces the parameter count of the neural network and the required computation. Applying the different pruning strategies in the candidate set to the same trained model yields a candidate set of pruned models (1000 models in this embodiment, which is merely an example and not a limitation). Note that sorting the kernels of each layer by the L1 norm is only illustrative; in other embodiments, the ordering may be obtained from other norm types.
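The kernel-selection rule of step S2 might look as follows in PyTorch (a minimal sketch under the assumption of a single convolution weight tensor; wiring the removal through adjacent layers and batch normalization is omitted):

```python
import math
import torch

def kernels_to_keep(weight: torch.Tensor, r_l: float) -> torch.Tensor:
    """weight: conv kernel tensor of shape (c_l, in_channels, kh, kw).
    Sorts kernels by L1 norm in descending order and drops the last
    M = ceil(r_l * c_l) of the ordering; returns indices to keep."""
    c_l = weight.shape[0]
    M = math.ceil(r_l * c_l)
    l1 = weight.abs().sum(dim=(1, 2, 3))        # L1 norm of each kernel
    order = torch.argsort(l1, descending=True)  # large -> small
    keep = order[: c_l - M]                     # remove the last M kernels
    return torch.sort(keep).values              # restore original kernel order

# Example: prune 30% of a layer with 64 kernels -> ceil(0.3 * 64) = 20 removed
w = torch.randn(64, 32, 3, 3)
pruned_w = w[kernels_to_keep(w, r_l=0.3)]       # shape (44, 32, 3, 3)
```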
Step S3: update the statistical parameters of the batch normalization layer of each pruned model in the candidate set using the adaptive batch normalization method.
In this embodiment, the adaptive batch normalization method is applied to the models one by one to recalculate the statistical parameters of the batch normalization layer (internationally, the Batch Normalization layer), namely the running mean μ_t and the running variance σ_t². Specifically, for each pruned model in the candidate set, all learnable parameters in the neural network are first fixed; the model is then iterated over a small number of training samples, and the batch normalization statistics μ_t and σ_t² are updated as follows:

μ_t = m μ_{t-1} + (1 - m) μ_β,
σ_t² = m σ_{t-1}² + (1 - m) σ_β²,

where m represents the weight of the historical value in the running average, t represents the iteration index, and β represents the current sample batch. The number of training samples in this embodiment is 5000, which is merely exemplary and not a limitation.
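A minimal PyTorch sketch of this recalibration (an assumed implementation, not code from the patent): in train mode, PyTorch's BatchNorm layers update running_mean and running_var with exactly the momentum rule above (PyTorch's momentum argument corresponds to 1 - m in the patent's notation), so it suffices to freeze the learnable parameters and forward a small number of training samples without any backward pass:

```python
import torch
from torch import nn

@torch.no_grad()   # no gradients: all learnable parameters stay fixed
def adaptive_bn_recalibrate(model: nn.Module, train_loader, num_samples: int = 5000):
    """Re-estimate the BN running mean/variance of a pruned model by
    forwarding a small number of training samples in train mode."""
    model.train()                                # BN updates its running stats
    for mod in model.modules():
        if isinstance(mod, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            mod.reset_running_stats()            # start the moving averages afresh
    seen = 0
    for images, _ in train_loader:
        model(images)
        seen += images.shape[0]
        if seen >= num_samples:
            break
    model.eval()
    return model
```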
Step S4: evaluate the classification accuracy of the neural network models with updated statistical parameters, and take the model with the highest classification accuracy as the final pruned model.
In this embodiment, the classification accuracy of each statistics-updated model, computed on the validation set, is taken as the model's potential accuracy; the models whose potential accuracy ranks in the top N are taken as candidate models, each candidate is fine-tuned on the training set until convergence, and the model achieving the highest classification accuracy on the validation set is selected as the final pruned model.
In a specific embodiment, as shown in Fig. 3, the classification accuracy of each candidate model is computed on a sub-validation set of 10,000 samples randomly drawn from the training set. In this implementation, the highest potential accuracy is 14.33%, and the pruning strategy corresponding to that candidate model is (0.40, 0.26, 0.29, 0.33, 0.39, 0.14, 0.28, 0.38, 0.39, 0.23, 0.36, 0.09, 0.02, 0.28). Finally, the candidate sub-network obtained by pruning with this strategy is fine-tuned until convergence, and the classification accuracy of the final pruned model reaches 70.9%.
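Putting steps S3 and S4 together, the evaluate-then-fine-tune selection could be orchestrated as below; evaluate_on and fine_tune are hypothetical helpers (accuracy evaluation on a dataset, and training to convergence) that an implementation would supply:

```python
def select_final_model(candidates, sub_val_set, train_set, val_set, top_n=1):
    """candidates: pruned models whose BN statistics have already been
    recalibrated with the adaptive batch normalization step.
    evaluate_on / fine_tune are hypothetical helper functions."""
    # Rank candidates by potential accuracy on the sub-validation set
    ranked = sorted(candidates,
                    key=lambda mdl: evaluate_on(mdl, sub_val_set),
                    reverse=True)
    # Fine-tune only the top-N winners to convergence, then pick the best
    finalists = [fine_tune(mdl, train_set) for mdl in ranked[:top_n]]
    return max(finalists, key=lambda mdl: evaluate_on(mdl, val_set))
```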
In order to verify the effectiveness of the invention, the embodiment selects the public ImageNet dataset for evaluation. Effectiveness is verified by the superiority of post-pruning model accuracy: picture classification accuracy at a given number of floating-point operations, the evaluation metric commonly adopted for this dataset, is used for comparison. Table 1 shows the pruning effect of the present invention and other methods on ResNet-50, and Table 2 shows the pruning effect of the present invention and other methods on MobileNetV1.
TABLE 1
(The contents of Table 1, comparing pruning results on ResNet-50, are rendered as images in the original document and are not recoverable here.)
TABLE 2
Method | Floating-point operations (M) | Test accuracy (%)
0.5×MobileNetV1 [4] | 325 | 68.4
AMC | 285 | 70.5
NetAdapt [2] | 284 | 69.1
Meta-Pruning [3] | 281 | 70.6
Present invention | 284 | 70.9
Compared with other pruning methods, the pruned model obtained by the method of the embodiment of the invention has a clear accuracy advantage at the same pruning rate.
According to the neural network model pruning method based on adaptive batch normalization provided by the embodiment of the invention, candidate sub-networks are evaluated quickly and accurately by recalibrating the batch normalization layer, and only the pruning strategy that wins this fast evaluation is fine-tuned to obtain the parameters of the final pruned network. This avoids the huge time cost of fine-tuning all pruned networks while remaining advantageous in accuracy.
Example 2
The embodiment of the invention provides a neural network model pruning system based on adaptive batch normalization, as shown in Fig. 4, comprising:
A pruning strategy generation module 1, configured to randomly sample, for an L-layer neural network model, L floating-point numbers from [0, R] (0 < R < 1) as the pruning rate of each layer, and to generate a pruning-rate vector (r_1, r_2, …, r_L) satisfying the preset computing-resource constraint as a pruning strategy. This module performs the method described in step S1 of Embodiment 1 and is not described again here.
A pruned-model candidate set generation module 2, configured to prune the neural network model based on each pruning strategy and generate a candidate set of pruned models. This module performs the method described in step S2 of Embodiment 1 and is not described again here.
A batch normalization layer statistics updating module, configured to update the statistical parameters of the batch normalization layer of each pruned model in the candidate set using an adaptive batch normalization method. This module performs the method described in step S3 of Embodiment 1 and is not described again here.
A pruned-model output module, configured to evaluate the classification accuracy of the statistics-updated models and to take the model with the highest classification accuracy, after fine-tuning on the training set until convergence, as the final pruned model. This module performs the method described in step S4 of Embodiment 1 and is not described again here.
According to the neural network model pruning system based on adaptive batch normalization provided by the embodiment of the invention, candidate sub-networks are evaluated quickly and accurately by recalibrating the batch normalization layer, and only the pruning strategy that wins this fast evaluation is fine-tuned to obtain the parameters of the final pruned network, avoiding the huge time cost of fine-tuning all pruned networks while remaining advantageous in accuracy.
Example 3
An embodiment of the present invention provides a computer device, as shown in Fig. 5, including: at least one processor 401, such as a CPU (Central Processing Unit); at least one communication interface 403; a memory 404; and at least one communication bus 402. The communication bus 402 is used to enable connected communication between these components. The communication interface 403 may include a display (Display) and a keyboard (Keyboard), and optionally may further include a standard wired interface and a wireless interface. The memory 404 may be a high-speed RAM (Random Access Memory, volatile random-access memory) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 404 may also be at least one storage device located remotely from the processor 401. The processor 401 may perform the neural network model pruning method based on adaptive batch normalization of Embodiment 1. A set of program codes is stored in the memory 404, and the processor 401 calls the program codes stored in the memory 404 to execute the neural network model pruning method based on adaptive batch normalization of Embodiment 1. The communication bus 402 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, among others. The communication bus 402 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in Fig. 5, but this does not mean there is only one bus or one type of bus.
The memory 404 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 404 may also include a combination of the above types of memory.
The processor 401 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor 401 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Optionally, the memory 404 is also used to store program instructions, and the processor 401 may invoke these program instructions to implement the neural network model pruning method based on adaptive batch normalization of Embodiment 1.
The embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions that can execute the neural network model pruning method based on adaptive batch normalization of Embodiment 1. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random-access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also include a combination of the above types of memory.
It is apparent that the above embodiments are merely examples given for clarity of illustration and are not a limitation on the embodiments. Other variations or modifications of the above description will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaust all embodiments here. Obvious variations or modifications derived therefrom remain within the protection scope of the present invention.

Claims (6)

1. A neural network model pruning method based on adaptive batch normalization, characterized by comprising the following steps:
for an L-layer neural network model, randomly sampling L floating-point numbers from [0, R] (0 < R < 1) as the pruning rate of each layer, and generating a pruning-rate vector (r_1, r_2, …, r_L) satisfying a preset computing-resource constraint as a pruning strategy;
pruning the neural network model based on each pruning strategy to generate a candidate set of pruned models, which comprises:
for each pruning strategy, sorting the convolution kernels of each layer by norm in descending order, and removing the last M convolution kernels in the ordering, where M = ceil(r_l * c_l), ceil denotes rounding up, c_l is the number of convolution kernels in layer l, and r_l is the pruning rate of layer l;
forming the candidate set of pruned models from the pruned models with the last M convolution kernels removed;
updating the statistical parameters of the batch normalization layer of each pruned model in the candidate set using an adaptive batch normalization method, which comprises: for each pruned model in the candidate set, fixing all learnable parameters, iterating over a preset number of training samples, and updating the running mean and running variance statistics of the batch normalization layer, the running mean μ_t and running variance σ_t² being updated by the following formulas:
μ_t = m μ_{t-1} + (1 - m) μ_β,
σ_t² = m σ_{t-1}² + (1 - m) σ_β²,
where μ_t denotes the running mean, σ_t² denotes the running variance, m denotes the weight of the historical value in the running average, t denotes the iteration index, and β denotes the sample batch;
and evaluating the classification accuracy of the neural network models with updated statistical parameters, and fine-tuning the model with the highest classification accuracy on a training set until convergence to obtain the final pruned model.
2. The neural network model pruning method based on adaptive batch normalization according to claim 1, wherein the preset computing-resource constraint includes at least one of a preset operation-count limit, a preset parameter-count limit, and a preset computation-latency limit.
3. The neural network model pruning method based on adaptive batch normalization according to claim 1, wherein the process of evaluating the classification accuracy of the neural network models with updated statistical parameters and taking the model with the highest classification accuracy as the final pruned model comprises:
obtaining the classification accuracy of each statistics-updated neural network model, computed on a validation set, as the potential accuracy of the model;
and taking the neural network models whose potential accuracy ranks in the top N as candidate models, fine-tuning the candidate models on a training set until convergence, and taking the model achieving the highest classification accuracy on the validation set as the final pruned model.
4. A neural network model pruning system based on adaptive batch normalization, comprising:
a pruning strategy generation module, configured to randomly sample, for an L-layer neural network model, L floating-point numbers from [0, R] (0 < R < 1) as the pruning rate of each layer, and to generate a pruning-rate vector (r_1, r_2, …, r_L) satisfying a preset computing-resource constraint as a pruning strategy;
a pruned-model candidate set generation module, configured to prune the neural network model based on each pruning strategy to generate a candidate set of pruned models, which comprises:
for each pruning strategy, sorting the convolution kernels of each layer by norm in descending order, and removing the last M convolution kernels in the ordering, where M = ceil(r_l * c_l), ceil denotes rounding up, c_l is the number of convolution kernels in layer l, and r_l is the pruning rate of layer l;
forming the candidate set of pruned models from the pruned models with the last M convolution kernels removed;
a batch normalization layer statistical parameter updating module, configured to update the statistical parameters of the batch normalization layer of each pruned model in the candidate set using an adaptive batch normalization method, which comprises: for each pruned model in the candidate set, fixing all learnable parameters, iterating over a preset number of training samples, and updating the running mean and running variance statistics of the batch normalization layer, the running mean μ_t and running variance σ_t² being updated by the following formulas:
μ_t = m μ_{t-1} + (1 - m) μ_β,
σ_t² = m σ_{t-1}² + (1 - m) σ_β²,
where μ_t denotes the running mean, σ_t² denotes the running variance, m denotes the weight of the historical value in the running average, t denotes the iteration index, and β denotes the sample batch;
and a pruned-model output module, configured to evaluate the classification accuracy of the neural network models with updated statistical parameters, and to take the model with the highest classification accuracy, after fine-tuning on the training set until convergence, as the final pruned model.
5. A computer device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the adaptive batch normalization based neural network model pruning method of any one of claims 1-3.
6. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the neural network model pruning method based on adaptive batch normalization of any one of claims 1-3.
CN201911423105.7A (priority date 2019-12-31, filing date 2019-12-31): Neural network model pruning method and system based on adaptive batch normalization. Status: Active. Granted as CN111222629B.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911423105.7A | 2019-12-31 | 2019-12-31 | Neural network model pruning method and system based on adaptive batch normalization (granted as CN111222629B)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911423105.7A | 2019-12-31 | 2019-12-31 | Neural network model pruning method and system based on adaptive batch normalization (granted as CN111222629B)

Publications (2)

Publication Number | Publication Date
CN111222629A | 2020-06-02
CN111222629B | 2023-05-05

Family

ID=70829217

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911423105.7A | Neural network model pruning method and system based on adaptive batch normalization (CN111222629B, Active) | 2019-12-31 | 2019-12-31

Country Status (1)

Country | Link
CN | CN111222629B

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN111553169B * | 2020-06-25 | 2023-08-25 | 北京百度网讯科技有限公司 | Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111967491A * | 2020-06-29 | 2020-11-20 | 北京百度网讯科技有限公司 | Model offline quantization method and device, electronic equipment and storage medium
CN112241786B * | 2020-10-23 | 2024-02-20 | 北京百度网讯科技有限公司 | Determination method and device for model super-parameters, computing device and medium
CN112149829B * | 2020-10-23 | 2024-05-14 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for determining pruning strategy of network model
CN112396100B * | 2020-11-16 | 2024-05-24 | 中保车服科技服务股份有限公司 | Optimization method, system and related device for fine-grained classification model
CN112734036B * | 2021-01-14 | 2023-06-02 | 西安电子科技大学 | Target detection method based on pruned convolutional neural network
CN114861910B * | 2022-05-19 | 2023-07-04 | 北京百度网讯科技有限公司 | Compression method, device, equipment and medium of neural network model
CN115935263B * | 2023-02-22 | 2023-06-16 | 和普威视光电股份有限公司 | Side chip detection and classification method and system based on yolov5 pruning

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN102981854A * | 2012-11-16 | 2013-03-20 | 天津市天祥世联网络科技有限公司 | Neural network optimization method based on a floating-point operation inline function library
CN107340993B * | 2016-04-28 | 2021-07-16 | 中科寒武纪科技股份有限公司 | Arithmetic device and method
US20180075347A1 * | 2016-09-15 | 2018-03-15 | Microsoft Technology Licensing, LLC | Efficient training of neural networks
US11631004B2 * | 2018-03-28 | 2023-04-18 | Intel Corporation | Channel pruning of a convolutional network based on gradient descent optimization
CN109657780A * | 2018-06-15 | 2019-04-19 | 清华大学 | A model compression method based on pruning-order active learning
CN109101999B * | 2018-07-16 | 2021-06-25 | 华东师范大学 | Support-vector-machine-based cooperative neural network credible decision method
CN109376859A * | 2018-09-27 | 2019-02-22 | 东南大学 | A neural network pruning method based on diamond convolution
CN109902808B * | 2019-02-28 | 2023-09-26 | 华南理工大学 | Method for optimizing a convolutional neural network based on a floating-point mutation genetic algorithm
CN110232436A * | 2019-05-08 | 2019-09-13 | 华为技术有限公司 | Pruning method, device and storage medium for convolutional neural networks
CN110222820A * | 2019-05-28 | 2019-09-10 | 东南大学 | Convolutional neural network compression method based on weight pruning and quantization
CN110210620A * | 2019-06-04 | 2019-09-06 | 北京邮电大学 | A channel pruning method for deep neural networks
CN110222841A * | 2019-06-17 | 2019-09-10 | 苏州思必驰信息科技有限公司 | Neural network training method and device based on a spacing (margin) loss function
CN110443359A * | 2019-07-03 | 2019-11-12 | 中国石油大学(华东) | Neural network compression algorithm based on adaptive combined pruning and quantization

Also Published As

Publication Number | Publication Date
CN111222629A | 2020-06-02

Similar Documents

Publication | Title
CN111222629B | Neural network model pruning method and system based on adaptive batch normalization
JP6969637B2 | Causality analysis methods and electronic devices
TWI769754B | Method and device for determining target business model based on privacy protection
WO2021208151A1 | Model compression method, image processing method and device
KR101904518B1 | Method and system for identifying rare-event failure rates
WO2021129086A1 | Traffic prediction method, device, and storage medium
JP6831347B2 | Learning equipment, learning methods and learning programs
Jiang et al. | Two step composite quantile regression for single-index models
CN111581909B | SRAM yield evaluation method based on improved adaptive importance sampling algorithm
WO2023098544A1 | Structured pruning method and apparatus based on local sparsity constraints
Oh et al. | On score vector- and residual-based CUSUM tests in ARMA-GARCH models
US8813009B1 | Computing device mismatch variation contributions
Cocucci et al. | Model error covariance estimation in particle and ensemble Kalman filters using an online expectation-maximization algorithm
JP7320705B2 | Learning data evaluation method, program, learning data generation method, trained model generation method, and learning data evaluation system
CN114186518A | Integrated circuit yield estimation method and memory
CN111612648B | Training method and device for photovoltaic power generation prediction model and computer equipment
CN115907775A | Personal credit assessment rating method based on deep learning and application thereof
EP4007173A1 | Data storage method, and data acquisition method and apparatus therefor
CN113688191B | Feature data generation method, electronic device, and storage medium
CN112396100B | Optimization method, system and related device for fine-grained classification model
CN115238641A | Defect root cause determination method, defect root cause determination device and storage medium
Ridder | Asymptotic optimality of the cross-entropy method for Markov chain problems
Yin et al. | An efficient reference-point based surrogate-assisted multi-objective differential evolution for analog/RF circuit synthesis
CN112419098A | Power grid safety and stability simulation sample screening and expanding method based on safety information entropy
Yuenyong | Fast and effective tuning of echo state network reservoir parameters using evolutionary algorithms and template matrices

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant