CN111860779A

CN111860779A - Rapid automatic compression method for deep convolutional neural network

Info

Publication number: CN111860779A
Application number: CN202010659862.0A
Authority: CN
Inventors: 唐文婷; 韦星星; 王越; 李波
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2020-10-30

Abstract

The invention discloses a method for quickly and automatically compressing a deep convolutional neural network. Finding an optimal model quantization scheme aiming at an optimization target and a model to be optimized: initializing a reinforcement learning agent and an environment, and outputting an optimal model when the specified number of search rounds is achieved; otherwise, single round searching is carried out. In a single-round search, determining a zero-bit channel index and a channel minimum bit number of each quantization layer, and calculating a quantization scheme of the layer of the current model and a size of the quantized model. If the current quantization layer is the last layer and does not meet the optimization goal, adjusting the model quantization scheme until the optimization goal is met or the model quantization scheme cannot be adjusted; otherwise, evaluating the quantization scheme and storing the environmental parameters, and if the current quantization layer is not the last layer, continuing the single-round search; and otherwise, if the current model quantization scheme is optimal, the quantization scheme is saved as the optimal model quantization scheme. A large amount of artificial parameter adjustment is avoided, network branch reduction and network quantification are not needed to meet hardware requirements, and a compressed network can be obtained quickly.

Description

Rapid automatic compression method for deep convolutional neural network

Technical Field

The invention relates to the field of deep neural network lightweight, in particular to a method for quickly and automatically compressing a deep convolutional neural network for a target recognition task.

Background

With the development of semiconductor technology and hardware energy, hardware devices have been able to support higher concurrency and high throughput computing tasks. Meanwhile, in the last decade, the computer vision technology is rapidly developed, deep neural networks with different characteristics are produced, and human-like performance is obtained under application environments such as target detection and identification and scene understanding aiming at remote sensing image data. Therefore, in order to meet the development requirement of hardware and fully exert the advantages of the deep neural network on the tasks of target detection and identification (such as detection and identification of airplanes, ships and ground targets), the network lightweight has important research significance in the military and civil fields. Because the number of parameters of the deep convolutional neural network is huge and different hardware platforms exist, how to keep the network performance to be adapted to different hardware platforms on the premise of meeting hardware requirements (energy consumption and model size) is a difficult problem to be solved urgently.

Network pruning and network quantization are not only common network compression means but also the focus of current research. In 1989, the academia proposed an optimization method for neural network structure, namely, a reduction in the amount and size of network computation was achieved by reducing unnecessary connections in the network within the range of degradation allowed by the network performance. Since then, the network pruning process is modeled as a layer-by-layer path selection or network-by-network path selection problem. In industry, because the problem of modeling network pruning into network-by-network path selection not only increases a large amount of computation and storage overhead but also makes the compression process more complicated (the model needs to be finely adjusted or even retrained), for the trained model, a network pruning algorithm based on weight is generally used for channel selection layer by layer. Since layer-by-layer channel selection requires a large amount of human labor to determine the appropriate layer sparsity (0 bit element number in each layer) for each compressible layer, research on the method of adaptively adjusting the layer sparsity gradually draws attention from the industry and academia. Because the problem of determining the sparsity of each layer in the network to be compressed can be abstracted into a scheduling problem of dynamic programming, and reinforcement learning is a favorable means for solving the dynamic programming, the task of scheduling and compressing by using reinforcement learning is the key point of research in recent years, for example, an automatic branch reduction method by using reinforcement learning is adopted, and the method not only adapts to different industrial platforms, but also can determine the sparsity of each layer in a self-adaptive manner. On the other hand, in order to adapt to different hardware platforms, the storage precision of the model should be adjusted from model to model or even with smaller granularity, such as layer by layer. And adjusting the storage precision by model to 8 bits by using a K mean value method or using a check set to adjust the weight value, wherein the model performance can be nearly lossless. With the development of chip technology, the manufacturers or designs of one-line chips such as apple, great britain, and high-pass have issued chips supporting mixed precision operation. In order to adapt to different hardware platforms and further exert hardware advantages, adjusting model storage precision layer by layer is also one of the hot spots of research in recent years. Some methods may adjust the model storage accuracy of each layer by layer using reinforcement learning. However, due to the explosive increase in problem space size, these approaches do not support lower granularity such as per-channel model storage refinement.

The industry protocol (MPEG protocol ISO/IEC JTC 1/SC 29/WG 11 under N18575) states that the network performance to measure lightweighting will be from: the performance loss of the model, the size reduction of the model, the reduction of the calculated amount of the model, the compression time of the model and the decompression time of the model are measured in 5 aspects. Thus, considering the first 3, the lightweight framework that is now in common use is the "depth compression" framework. The framework firstly uses a network branch reduction technology to reduce the calculated amount, and then uses a network quantification technology to reduce the size of the model. The cascade design ensures compatibility with different compression methods, and has wide application in industry. However, this approach also emphasizes preservation of network performance. Therefore, to better balance hardware requirements and model performance, the framework may require repeating network pruning and network quantization tasks to search for a better compressed model, but this also causes additional compression time overhead. In industrial applications, lightweight approaches tailored to different hardware platforms are necessary. At this time, the network compression using the framework will generate a non-negligible time overhead. Therefore, considering model compression, performance preservation and compression time, the method is always the key point of network lightweight research and application.

The problems faced by this study are: (1) existing compression methods make compression decisions based on local information, i.e. only certain hardware requirements are considered and subsequent compression means are not considered. This not only increases the compression scheme that does not satisfy the compression condition in the compression and causes the compression to be performed again, but also outputs a sub-optimal solution in the middle of the compression, thereby causing the final output compression solution to be sub-optimal. (2) The existing compression framework only organizes a compression method based on a compression purpose, and does not schedule a compression task based on a hardware platform. In the general "depth compression" framework, network pruning is applied before network quantization. The method aims to reduce the calculated amount as much as possible by network branch reduction and reduce the size of the model as much as possible by network quantization. However, the reduction in the amount of calculation leads to a reduction in the performance of the model, and a framework designed only for pursuing a more lightweight model does not facilitate the full exploitation of the hardware performance in practice. (3) Finally, the "deep compression" framework achieves a potentially better performance-hardware balance point at the expense of time overhead. The root cause of these time overheads is caused by the step structure of the "deep compression" framework and the scheduling of compression tasks based on local information.

Therefore, how to provide a method for quickly and automatically reducing the weight of the deep convolutional neural network is a problem that needs to be solved urgently by those skilled in the art.

Disclosure of Invention

In view of the above, the present invention provides a method for fast and automatically compressing a deep convolutional neural network, which not only avoids a large amount of artificial parameter adjustment, but also does not need to perform network pruning and network quantization separately to meet hardware requirements, and automatically predicts a reasonable bit number of each channel by using channel importance, thereby fast and automatically obtaining a compressed network.

In order to achieve the above purpose, the invention provides the following technical scheme:

a method for quickly and automatically compressing a deep convolutional neural network comprises the following steps: aiming at an input optimization target and a model to be optimized, searching to find an optimal model quantization scheme, wherein the model quantization scheme searching is a multi-round searching process. Firstly, initializing a reinforcement learning agent and an environment according to a model to be optimized and an optimization target, and then searching an optimal model quantization scheme, wherein the specific search process is as follows:

s1, single-round searching process:

s11: for each compressible layer, calculating a zero-bit channel index and a channel minimum bit number according to the channel importance;

S12: preliminarily calculating a current model quantization scheme according to a zero-bit channel index;

s13: calculating the size of a quantized model according to the quantization scheme of the current model;

s14: if the current compressible layer is the last layer and the current model quantization scheme does not meet the optimization target, circularly adjusting the current model quantization scheme according to the channel minimum bit number until the optimization target is met or quantization cannot be performed;

if the current compressible layer is not the last layer or the current model quantization scheme meets the optimization target, evaluating the current layer quantization scheme and storing relevant environmental parameters;

s15: continuing the search condition, judging whether to finish the current round of search, and further judging whether to finish the current round of search and whether to update the optimal model quantization scheme based on the current layer quantization scheme evaluation result;

s2, multi-round searching process: and S1 is repeatedly executed, after the number of current search rounds reaches the required number of search rounds, the model quantization scheme search is finished, and the optimal compression model aiming at the optimization target is output.

Further, the input model to be optimized includes N sets of quantifiable layers N ═ L₁，...，L_i，...，L_nWhere I is 1 … n, a set of quantisable layer input channel numbers I is { I } ₁，...，I_nSet of storage space required by the quantizeable layer LS ═ LS₁，...，LS_n}；

The optimization target describes the requirements of the size, the search time and the like of the compressed model, and comprises the maximum compressed bit number bit_maxNumber of search roundsepiscodes, model size compression ratio sc ∈ (0, 1)]Initializing TOP-5 accuracy acc (N) of a model to be optimized; optimum evaluation result R_bestInitialized to- ∞, optimal model quantization scheme P_bestIs phi;

when the reinforcement learning environment is initialized, the reinforcement learning state s_iIs defined as (idx, t, out, in, w, h, stride, k, rededFLOPs, resFLOPs, rededSize, restSize, a)_i-1) Wherein idx is a layer index; t is a layer type, and comprises a convolution layer and a full-connection layer; out is the number of output channels, and in is the number of input channels; w and h are the width and height of the input feature vector; stride and k are the step length of convolution layer convolution operation and the side length of convolution kernel, and stride and k are both 1 for a full connection layer; redundedfrops is the amount of computation reduced by the current compression strategy, initialized to 0; restFLOPs are model residual calculated quantities, initialized to model calculated quantity N_FLOPs(ii) a rededsize is the size of the model reduced by the current compression strategy and is initialized to 0; restSize is the model residual size, initialized to model size N _size；a_i-1Is the previous compressible layer sparsity, initialized to 0;

the reinforcement learning agent comprises an actor network theta, an evaluator network mu and environmental noise sigma;

the current search round number curreps is initialized to 0.

Further, determining a zero-bit channel index according to the channel importance, the specific steps include:

s111, creating a channel index sequence CI ═ p ═ 1, and I for the I-th layer according to the quantization layer input channel number set_i}，I_iInputting the number of channels for the ith quantizer layer;

s112, according to the channel index CI, for each channel c of the ith layer, according to the channel importance

Sequencing the index sequence CI in ascending order to obtain a post-sequencing index sequence CI',

the importance of the ith channel is defined as

Wherein, I_i+1The number of input channels for the i +1 th quantisable layer,

is the ith filter weight, which is the ith filter weight

Obtaining;

for channel importance of i-th layer, imp ═ impⁱI 1.. n } is the importance of the model channel, and n is the quantifiable layer number;

s113, inputting the channel number I according to the quantization layer_iCalculating the number of the zero bit channels of the layer,

wherein min () and max () are functions for solving the minimum value and the maximum value in the input sequence, and the layer sparsity

Wherein s is_iIn the reinforcement learning state, θ is the actor network, μ is the evaluator network, σ is the environmental noise, a _iIs in the value range of (0, 1)]；

S114, calculating the zero-bit channel index based on the ascending ordered index sequence CI' and the layer zero-bit channel number cp:

ZCI＝topk(CI′，cp) (3)

wherein the topk (a, k) function returns the first k elements in the input sequence a.

Further, the channel minimum bit number minBW is calculated:

minBW＝{minBWⁱ|i＝1，...，n} (4)

wherein, minBWⁱIndicates the minimum bit number, minBW, of the i-th layer channelⁱ _cRepresents the minimum channel bit number, bit, of the ith layer of the c channel_maxFor the maximum number of bits of the model after compression,

the ith channel importance.

Further, the specific step of S12 is as follows:

s121, initializing the quantization scheme P of the layerⁱ＝{bit_max×1^k|k＝1,...,I_i}，bit_maxFor the maximum number of bits of the compressed model, I_iThe number of input channels of the ith layer is;

s122, updating the layer of channel level quantization scheme Pⁱ；

For the ith channel of the ith layer in P, the quantization bit number of the channel

Is defined as the formula

Wherein ZCI is zero-bit channel index, and c has a value range of [1, I_i]，I_iThe number of input channels of the ith layer.

S123, the updated current layer quantization scheme PⁱSave to model channel level quantization scheme P ═ { P^k|k＝1，...，i}。

Further, the model size N 'after quantization of S13'_sizeThe specific calculation formula is as follows:

wherein the content of the first and second substances,

for the c-th filter weight of the k-th layer, according to the k-th quantized layer weight

Obtaining;

for the channel quantization bit number of the c channel at the kth layer in the channel quantization scheme P, for the layers which are not quantized, i.e., the (i + 1) th layer to the nth layer, the channel quantization bit numbers of all the channels are determined by the model to be optimized, and are generally 32; i is_iThe number of the input channels of the ith layer is the number of the input channels of the ith layer.

Further, if the current compressible layer is the last layer and the current model quantization scheme does not satisfy the optimization, i.e., i ═ N and N'_size＞sc×N_sizeWhere N is the quantifiable number of layers, N'_sizeFor the current model size, sc is the model size compression ratio, N_sizeIf the model size is to be compressed, the current model quantization scheme is adjusted, and the specific adjustment steps and termination conditions are as follows:

s141, determining a layer i for quantization scheme adjustment, wherein the adjustment layer i is defined as the following formula

Wherein LS_iFor the ith layer size of the model to be optimized, the sequence of layers QL may be initialized to QL ═ { 1., N }, if QL ═ Φ or N'_size≤sc×N_sizeIf the quantization cannot be performed or the optimization target is met, stopping adjusting the current model quantization scheme;

s142, calculating the quantization channel index set of the ith layer

Wherein

For the channel quantization bit number of the c-th channel of the k-th layer in the model channel quantization scheme P,

is the minimum bit number of the channel of the ith layer of the c channel, I _iThe number of input channels of the ith layer is; if QC is empty, remove i from QL, carry on S141 again, QL is the above-mentioned adjustable layer sequence;

s143, determining an adjusting channel c in the ith layer for quantization scheme adjustment, wherein the adjusting channel c is defined as the following formula

Where QC is the ith layer quantization index set,

the ith channel importance;

s144, adjusting the channel quantization bit number of the ith channel in the ith layer of the c channel in the model channel level quantization scheme P

The adjustment is defined as the following formula

Wherein the content of the first and second substances,

the minimum bit number of the c channel of the ith layer is adjusted

Saving to the model channel level quantization scheme P;

s145, updating the current model size N 'according to the post-adjustment model channel level quantization scheme P and the formula (8)'_size。

Further, the specific process of evaluating the current layer quantization scheme and storing the relevant environmental parameters is as follows:

the compression ratio sc is reduced by the size of the model, and the size N of the model to be compressed_sizeNumber of layers N can be quantized, previous model N quantized, current model size N'_sizeAnd, knowing the quantized model N', evaluating to obtain an evaluation result r of the quantization scheme of the current layer, wherein the evaluation result r is defined by the following formula:

wherein, acc (N) is the top-5 accuracy of the model to be optimized on the data of the verification set, and acc (N') is the top-5 accuracy of the quantized model on the data of the same verification set;

Calculating the current environment state s according to the reinforcement learning environment state_iPreservation of r, s_i；

Updating the parameters of the reinforcement learning agent actor network theta and the evaluator network mu.

Further, the process of continuously searching the conditions and outputting the optimal model quantization scheme comprises the following steps:

s151: judging whether the current compressible layer is the last layer or not, if the current layer is not the last layer, i is not equal to n, entering the next layer, i is i +1, and the searching is not finished in the round, wherein i is the current layer number, and n is the quantifiable layer number of the model;

s152: on the contrary, if the current layer quantization scheme evaluation result is optimal, namely R is more than or equal to R_bestThen the current model quantization scheme is saved as the optimal model quantization scheme, i.e. P_bestP, and ending the current search round, i.e. the current search round number curreps is currpes + 1; otherwise, directly ending the search in the current round; wherein R is_bestFor the optimal evaluation result, r is the evaluation result of the quantization scheme of the current layer model, P_bestFor the optimal compression scheme, P is the current model quantization scheme.

Further, the multi-round searching process is as follows:

according to the search round number episodes, the current search round number currs does not reach the required search round number, namely currs < episodes, and S1 is repeatedly executed, otherwise, the model quantization scheme Finishing the search; finally, according to the obtained optimal compression scheme P_bestAnd outputting the optimal compression model for the optimization target.

According to the technical scheme, compared with the prior art, the invention discloses a method for quickly and automatically compressing the deep convolutional neural network, which takes a trained model to be optimized (namely the deep convolutional neural network) and an optimization target (namely the performance requirement of a hardware platform) as input, performs network compression through a lightweight method, and outputs a compressed network which can maximally keep the performance of the original network under the condition of not violating the hardware requirement.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only embodiments of the invention, and that for a person skilled in the art, other drawings can be obtained from the provided drawings without inventive effort.

FIG. 1 is a flow chart of the method for fast and automatic compression of deep convolutional neural network according to the present invention.

FIG. 2 is a flow chart of an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a method for quickly and automatically compressing a network of a deep convolutional neural network, which is characterized in that as shown in figure 1, the method for quickly and automatically compressing the deep convolutional neural network performs compression task scheduling under the control of global information by combining network branch reduction and network quantization into one compression step.

Referring to fig. 2, the network to be compressed in this embodiment is mobilene _ v1, the hardware requirement is to compress the size of the model to be below the original 1/5 and keep the accuracy of the model top-5 within the loss range of 2%, the maximum bit number of the quantized model is 8, and the mixing precision is supported, that is, the bit number of each reserved channel can be between 1 and 8. The verification dataset is ImageNet, the hardware is GPU GTX1080 pieces, and Python language and PyTorch development framework are used.

Step 1: obtaining N quantized layer sets N ═ L according to input network to be compressed₁，...，L_i，...，L_nWhere I is 1 … n, the number of input channels I of the n quantifiable layers I₁，...，I_nIn the same layer, the size of each layer LS is { LS ═ LS }₁，...，LS_n}. According to the hardware requirement, obtaining the maximum compressed bit number bit_maxSearching the round numbers episcodes, constructing a reinforcement learning environment, and initializing an actor network theta and an evaluator network mu in the reinforcement learning agent. Here, bit_max8, epsilon 100; theta and mu are neural networks comprising 2 hidden layers, each hidden layer comprises 300 neurons, the learning rate of theta is 0.001, and the learning rate of mu is 0.0001; the warm-up rounds of reinforcement learning agents are 40 rounds, the sampling size is 64, the buffer size of each layer is 10, and the discount factor is 1.0.

Step 2: and (3) judging whether the search is finished or not by using the total search round number episodies obtained in the step (1) and knowing the current exploration times currep. If curve is larger than or equal to episodes, searching is completed, and the optimal compressed network N ' ═ L ' is output '₁，...，L′_i，...，L′_nWhere i is 1, …, n; otherwise, step 3 is carried out. n is the number of quantifiable layers in the network to be compressed, and the method does not delete the whole quantifiable layer.

And step 3: predicting the sparsity a of the current layer according to a formula (1) by using the actor network theta and the evaluator network mu in the reinforcement learning agent initialized in the step 1 _i。

Wherein s is_iIn the reinforcement learning state, σ is the environmental noise in the reinforcement learning in the search environment, and is initialized to 0.95 and gradually decays to 0. a is_iIs in the value range of (0, 1)]. In this example, a is the input layer and the convolution layer having a convolution kernel size of 3_iIs 1.0.

And 4, step 4: the number I of the (I + 1) th layer input channels obtained in the step 1_i+1Then the weight value L of the layer_iCan be written

Where F is the layer filter, k is the filter kernel size and k is 1 when the i-th layer is a fully connected layer. The model channel importance is defined as imp ═ { impi | i ═ 1.., n }, then the ith layer channel importance is

Where the c channel importance imp_cCalculated using equation (2).

Wherein the content of the first and second substances,

and the filter weight is the v-th filter weight corresponding to the c-th channel.

And 5: using the maximum bit number bit of the compressed model obtained in the step 1 and the step 4_maxAnd ith layer channel importance impⁱDefining the minimum bit number of the model channel as minBW ═ minBWⁱI 1., n }, then it can be calculated

Wherein, the minimum bit number of the channel of the ith layer of the c channel

Calculated using equation (3).

Wherein

Is an integer, the value range in this example is [1, 8 ].

Step 6: the ith layer setting is computed as the zero-bit channel index.

(I) Using the number of channels I in the layer obtained in step 1 _iCreating a channel index sequence

CI＝{p|p＝1，...，I_i}；

(II) using the layer channel importance imp obtained in step 4 and step 6(I)ⁱAnd the channel index sequence CI, and the index sequence CI is sorted according to the ascending order to obtain a sorted ascending index sequence CI';

(III) Using the number of channels I in the layer obtained in step 1 and step 3_iAnd layer sparsity a_iThe layer zero bit lane number cp is calculated using equation 4.

Wherein min (), max () are functions for solving the minimum value and the maximum value in the input sequence;

(IV) using the ascending ordered index sequence CI' obtained in steps 6(II) and 6(III) and the layered zero-bit lane number cp, the zero-bit lane index ZCI is obtained using equation (5).

ZCI＝topk(CI′，cp) (5)

Where the topk (a, k) function returns the first k elements in the input sequence a.

And 7: a current layer channel quantization scheme is calculated.

(I) Using the maximum bit number bit of the compressed model obtained in the step 1_maxAnd the number of input channels I of the layer_iInitializing the layer channel level quantization scheme Pⁱ＝{bit_max×1^k|k＝1，...，Iⁱ}. Bit in this embodiment_maxIs 8.

(II) Using the zero-bit channel index ZCI and the channel-level quantization scheme P obtained in step 6 and step 7(I)ⁱUpdating the layer channel-level quantization scheme P according to equation (6)ⁱ；

(III) quantizing scheme P of current layer after updatingⁱSave to model channel level quantization scheme

P＝{Pⁱ|i＝1，...，n}。

And 8: quantizing scheme P for current layer channel level obtained in step 7ⁱUpdating and maintaining an ambient state s_iWherein s is_iThe tuple definition described by equation (7) is used. If the current layer is not the last layer, the evaluation result r is counted to be 0, and the step 3 is skipped; otherwise, entering step 9;

(idx，t，out，in，w，h，stride，k，reducedFLOPs，resFLOPs，reducedSize，restSize，a_i-1) (7)

in the formula (7), idx is a layer index, t is a layer type, and there are two types, namely a convolutional layer and a fully-connected layer. out is the output channel and in is the input channel. w and h are the width and height of the input feature vector. stride and k are the step length of convolution layer convolution operation and the side length of convolution kernel, and stride and k are both 1 for the fully connected layer. reducedFLOPs is the amount of computation that the current compression strategy reduces, restFLOPs is the amount of model residual computation. rededsize is the model size that the current compression strategy reduces, and restSize is the model residual size. a is_i-1Is the previous compressible layer sparsity.

And step 9: according to the quantization layer set N obtained in the step 1 and the step 7, the number I of input channels of each layer and the quantization scheme P of the current model channel level, the ith quantization layer is known

Where F is the layer filter. Then use equation (8) according toP update Current model size N'_size。

Wherein, I_kThe number of channels in the k-th layer,

for the bit number of the c-th channel of the k-th layer in the model channel-level quantization scheme P,

Is the kth quantization layer L^kThe c-th filter weight.

Step 10: according to the hardware condition requirements of the model size compression ratio sc in the step 1 and the step 9, the size N of the model to be compressed_sizeCurrent model size N'_sizeAnd judging whether the current model channel-level quantization scheme needs to be adjusted or not. N 'if the hardware condition is satisfied'_size≤sc×N_sizeThen, r is estimated using equation (9), and the environmental state s is preserved_iUpdating parameters of the reinforcement learning agent actor network theta and the evaluator network mu, and skipping to the step 2; otherwise, entering step 11;

wherein, acc (M) is top-5 accuracy rate which can be obtained by inputting the model M on the verification set. In this embodiment, the verification data set is ImageNet, 20000 pictures 224 × 224, and the Batch size is 60.

Step 11: the model channel-level quantization scheme is greedily adjusted.

(I) Determining a layer i for quantization scheme adjustment according to the size LS of each layer of the model obtained in step 1 by using formula (10):

(II) according to the step 1,number of I-th layer channels I in step 5 and step 7(I)_iThe minimum channel bit number of the ith channel

And channel level quantization scheme PⁱObtaining a set of quantized channel indices

(III) calculating the importance imp of the ith layer channel according to the step 4 and the step 11 ⁱAnd the quantization index set QC of the layer, and the channel c of the ith layer for bit number adjustment is determined by using formula 11.

(IV) calculating the minimum bit number minBW of the i-th layer channel according to the step 5ⁱAnd (4) updating the layer quantization scheme by using the formula (12) and saving the layer quantization scheme to the layer P, the layer I and the channel c selected in the step 11(I) and the step II and the current quantization scheme P.

(V) jumping to step 9 to update the current model size N 'according to P'_size。

Step 12: for each compressible layer, obtaining a channel needing to be set to a zero bit; the minimum number of bits to which the channel can be quantized without being set to zero is then calculated. And finally, saving the current layer state and counting the middle layer evaluation result to be 0. When the last layer is processed, if the current channel quantization scheme meets the hardware requirement, evaluating the current channel compression scheme; if the current channel quantization scheme does not meet the hardware requirement, greedily adjusting the current channel quantization scheme to enable the current channel quantization scheme to meet the hardware requirement; after the evaluation is finished, adjusting the parameters of the reinforcement learning agent according to the evaluation result and the state; and training the reinforcement learning agent to correctly adjust the bit number of each channel through multi-round layer-by-layer search, and finally obtaining the optimal compression network.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for quickly and automatically compressing a deep convolutional neural network is characterized by comprising the steps of inputting an optimization target and a model to be optimized, initializing a reinforcement learning agent and an environment according to the model to be optimized and the optimization target, and searching an optimal model quantization scheme, wherein the specific search process comprises the following steps:

s1 single round search procedure:

s11: calculating a zero-bit channel index and a channel minimum bit number according to the channel importance;

s12: determining a current model quantization scheme according to the zero-bit channel index;

s13: calculating a post-quantization model size based on the current model quantization scheme;

If the current compressible layer is not the last layer or the current model quantization scheme meets the optimization target, evaluating the current layer quantization scheme based on the quantized model size and storing relevant environmental parameters;

s2 multiple search process: and S1 is repeatedly executed until the current search round number reaches the required search round number, the model quantization scheme search is finished, and the optimal compression model is output.

2. The method according to claim 1, wherein the model to be optimized comprises a quantization layer set N ═ { L ═ L₁，...，L_i，...，L_nWhere I is 1 … n, n denotes the number of quantisable layers, and the set of quantisable layer input channel numbers I is { I }₁，...，I_nThe required storage space set LS for quantized layers₁，...，LS_n}；

The optimization objective comprises a compressed model maximum bit rate bit_maxSearch round number episcodes and model size compression ratio sc ∈ (0, 1)]Initializing TOP-5 accuracy acc (N) of a model to be optimized and optimal evaluation result R_bestAnd an optimal model quantization scheme P _best；

When the reinforcement learning environment is initialized, the reinforcement learning state s_iIs defined as (idx, t, out, in, w, h, stride, k, rededFLOPs, resFLOPs, rededSize, restSize, a)_i-1) Wherein idx is a layer index, t is a layer type including a convolutional layer and a fully-connected layer, out is the number of output channels, in is the number of input channels, w and h are the width and height of input feature vectors, stride and k are the step length of convolutional layer convolution operation and the side length of a convolutional kernel, stride and k are both 1 in the fully-connected layer, rededFLOPs are the calculated quantity reduced by the current compression strategy and are initialized to 0, restFLOPs are the model residual calculated quantity and are initialized to the model calculated quantity N_FLOPsreducedSize is the model size reduced by the current compression strategy, initialized to 0, restSize isModel residual size, initialized to model size N_size，a_i-1For the previous compressible layer sparsity, initialize to 0;

the reinforcement learning agent includes an actor network θ, an evaluator network μ, and an environmental noise σ.

3. The method of claim 2, wherein the zero-bit channel index is determined according to channel importance, and the method comprises the following specific steps:

s111, creating a channel index sequence CI ═ p ═ 1, and I for the I-th layer according to the quantization layer input channel number set _i}，I_iInputting the channel number for the ith quantized layer;

s112, according to the importance of each channel c of the ith layer

Sequencing the channel index sequences CI in an ascending order to obtain an index sequence CI' after the ascending order;

wherein the ith layer is the importance of the c channel

In the formula I_i+1The number of input channels for the i +1 th quantisable layer,

is the ith filter weight, which is the ith filter weight

Obtaining;

s113, inputting the channel number I according to the quantization layer_iCalculating the channel number cp of the layer zero bit,

s_iIn the reinforcement learning state, θ is the actor network, μ is the evaluator network, σ is the environmental noise, a_iIs in the value range of (0, 1)]；

S114, calculating a zero-bit channel index ZCI based on the ascending-ordered index sequence CI' and the zero-bit channel number cp:

ZCI＝topk(CI′，cp) (3)。

4. the method according to claim 3, wherein the minimum number of channel bits minBW is calculated by the following formula:

minBW＝{minBWⁱ|i＝1，...，n} (4)

wherein, minBWⁱIndicates the minimum bit number, minBW, of the i-th layer channel ⁱ _cRepresents the minimum channel bit number, bit, of the ith layer of the c channel_maxFor the maximum number of bits of the model after compression,

the ith channel importance.

5. The method for fast and automatically compressing the deep convolutional neural network as claimed in claim 4, wherein the step S12 is as follows:

s121, initializing a channel level quantization scheme Pⁱ＝{bit_max×1^k|k＝1,...,I_i}，bit_maxFor the maximum number of bits of the compressed model, I_iThe number of input channels of the ith layer is;

s122, updating the channel quantization bit number according to the zero-bit channel index

Further obtaining the updated current layer quantization scheme PⁱThe calculation formula is as follows:

wherein ZCI is zero-bit channel index, and c has a value range of [1, I_i]，I_iThe number of input channels of the ith layer is;

s123, saving the updated current-layer quantization scheme Pi to the model channel-level quantization scheme P ═ P^k1.,. i }.

6. The method of claim 5, wherein the model size N 'after S13 quantization'_sizeThe specific calculation formula is as follows:

wherein the content of the first and second substances,

So as to obtain the compound with the characteristics of,

for the channel quantization bit number of the c channel at the k layer in the model channel level quantization scheme P, the (I + 1) th layer to the n layer are layers which are not quantized, the channel quantization bit numbers of all the channels are determined by the model to be optimized and take the value of 32, I _iThe number of input channels of the ith layer.

7. The method of claim 6, wherein if the current compressible layer is the last layer and the current model quantization scheme does not satisfy the optimization, i ═ N and N'_size＞sc×N_sizeN is the quantifiable number of layers, N'_sizeFor quantifying the model size, sc is the model size compression ratio, N_sizeIf the model size to be compressed is, adjusting the current model quantization scheme, wherein the specific adjustment steps and termination conditions are as follows:

s141, determining an adjustment layer for quantization schemei：

Wherein LS_iFor the ith layer size of the model to be optimized, the sequence of layers QL may be initialized to QL ═ { 1., N }, if QL ═ Φ or N'_size≤sc×N_sizeIf the quantization can not be carried out or the optimization target is met, stopping adjusting the current model quantization scheme;

s142, calculating a quantization channel index set of the adjustment layer i

Wherein

For the channel quantization bit number of the c-th channel of the k-th layer in the model channel-level quantization scheme P,

is the minimum bit number of the channel of the ith layer of the c channel, I_iThe number of input channels of the ith layer,

if QC is empty, remove the adjustable layer i from QL, carry on S141 again, QL is the adjustable layer sequence;

s143, determining an adjustment channel c in the ith layer for quantization scheme adjustment, where the adjustment channel c is defined as the following formula:

Where QC is the ith layer quantization index set,

the ith channel importance;

The adjustment is defined as the following formula:

wherein the content of the first and second substances,

the minimum bit number of the c channel of the ith layer is adjusted

Saving to the model channel level quantization scheme P;

s145, updating the size N 'of the current model according to the post-adjustment model channel level quantization scheme P'_size。

8. The method of claim 7, wherein the specific process of evaluating the quantization scheme of the current layer and saving the relevant environmental parameters comprises:

according to the compression ratio sc of the size of the model, the size N of the model to be compressed_sizeThe number of layers N, the model N to be optimized, the current model size N 'can be quantized'_sizeAnd a known quantized model N' is evaluated to obtain an evaluation result r of the quantization scheme of the current layer, and the specific calculation formula is as follows:

9. The method according to claim 8, wherein the S15 specifically includes:

s151: judging whether the current compressible layer is the last layer or not, if the current layer is not the last layer, i is not equal to n, entering the next layer i which is i +1, and the searching is not finished in the round, wherein i is the current layer number, and n is the model quantifiable layer number;

s152: otherwise, judging whether the evaluation result of the quantization scheme of the current layer is optimal or not, if so, R is more than or equal to R_bestSaving the current model quantization scheme as an optimal model quantization scheme, P_bestIf the search result is P, the search of the current round is ended, and if the search result is not P, the search of the current round is ended directly; wherein R is_bestFor the optimal evaluation result, r is the evaluation result of the quantization scheme of the current layer model, P_bestFor the optimal compression scheme, P is the current model quantization scheme.

10. The method of claim 9, wherein the multiple rounds of search processes are as follows:

judging whether the current search round number curreps meets curreps < epsilon foods or not according to the search round number epsilon foods or not, and not meeting the requirement of repeatedly executing the S1, otherwise, finishing the model quantization scheme search and according to the optimal compression scheme P _bestAnd outputting the optimal compression model aiming at the optimization target.