CN111860779A - Rapid automatic compression method for deep convolutional neural network - Google Patents
- Publication number: CN111860779A
- Application number: CN202010659862.0A
- Authority: CN (China)
- Prior art keywords: layer, model, channel, current, quantization scheme
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a method for the rapid automatic compression of a deep convolutional neural network. Given an optimization target and a model to be optimized, the method searches for an optimal model quantization scheme: a reinforcement learning agent and environment are initialized, and the optimal model is output once the specified number of search rounds has been completed; otherwise a single-round search is performed. In a single-round search, the zero-bit channel index and the minimum channel bit number of each quantizable layer are determined, and the layer's quantization scheme in the current model and the resulting quantized model size are computed. If the current quantizable layer is the last layer and the optimization target is not met, the model quantization scheme is adjusted until the target is met or no further adjustment is possible; otherwise the quantization scheme is evaluated and the environment parameters are stored, and if the current quantizable layer is not the last layer, the single-round search continues. If the current model quantization scheme is the best so far, it is saved as the optimal model quantization scheme. The method avoids extensive manual parameter tuning, does not require separate network pruning and network quantization passes to meet hardware requirements, and obtains a compressed network quickly.
Description
Technical Field
The invention relates to the field of lightweight deep neural networks, and in particular to a method for the rapid automatic compression of deep convolutional neural networks for target recognition tasks.
Background
With the development of semiconductor technology and hardware performance, hardware devices can now support highly concurrent, high-throughput computing tasks. Meanwhile, over the last decade computer vision has developed rapidly, deep neural networks with different characteristics have emerged, and human-level performance has been achieved in applications such as target detection and recognition and scene understanding on remote sensing image data. Therefore, to keep pace with hardware development and fully exploit the advantages of deep neural networks in target detection and recognition tasks (such as detection and recognition of aircraft, ships, and ground targets), network lightweighting has important research significance in both military and civil fields. Because deep convolutional neural networks have a huge number of parameters and hardware platforms differ, how to preserve network performance while adapting to different hardware platforms under hardware constraints (energy consumption and model size) is a difficult problem that urgently needs to be solved.
Network pruning and network quantization are not only common network compression means but also the focus of current research. In 1989, academia proposed an optimization method for neural network structure: the computation and size of a network are reduced by removing unnecessary connections, within the range of degradation the network performance allows. Since then, the network pruning process has been modeled as a layer-by-layer or whole-network path selection problem. In industry, because modeling network pruning as whole-network path selection not only adds considerable computation and storage overhead but also complicates the compression process (the model must be fine-tuned or even retrained), weight-based network pruning algorithms are generally used for layer-by-layer channel selection on trained models. Since layer-by-layer channel selection requires a large amount of human labor to determine an appropriate sparsity (the number of zero-bit elements in each layer) for every compressible layer, research on adaptively adjusting layer sparsity has gradually drawn attention from both industry and academia. Because determining the per-layer sparsity of a network to be compressed can be abstracted as a dynamic programming scheduling problem, and reinforcement learning is an effective means of solving dynamic programming, scheduling compression with reinforcement learning has been a research focus in recent years; for example, automatic pruning methods based on reinforcement learning not only adapt to different industrial platforms but can also determine each layer's sparsity adaptively.
On the other hand, to adapt to different hardware platforms, the storage precision of a model should be adjusted per model, or even at a finer granularity such as per layer. Adjusting storage precision to 8 bits per model with a K-means method, or adjusting the weights with a calibration set, can keep model performance nearly lossless. With the development of chip technology, first-tier chip manufacturers and designers such as Apple, NVIDIA, and Qualcomm have released chips supporting mixed-precision arithmetic. To adapt to different hardware platforms and further exploit hardware advantages, adjusting model storage precision layer by layer has also become a research hotspot in recent years. Some methods adjust each layer's storage precision using reinforcement learning. However, due to the explosive growth of the problem space, these approaches do not support finer granularity such as per-channel storage precision.
An industry document (MPEG document N18575 of ISO/IEC JTC 1/SC 29/WG 11) states that the performance of a lightweighted network is to be measured in 5 aspects: the performance loss of the model, the reduction in model size, the reduction in model computation, the model compression time, and the model decompression time. Considering the first 3, the lightweight framework now in common use is the "deep compression" framework. The framework first uses network pruning to reduce computation, and then uses network quantization to reduce model size. This cascade design is compatible with different compression methods and is widely applied in industry. However, this approach also emphasizes preserving network performance. Therefore, to better balance hardware requirements against model performance, the framework may need to repeat the network pruning and quantization tasks to search for a better compressed model, which incurs additional compression time. In industrial applications, lightweighting tailored to different hardware platforms is necessary, and network compression with this framework then produces a non-negligible time overhead. Hence, jointly considering model compression, performance preservation, and compression time has always been the key point of lightweight network research and application.
The problems this work faces are: (1) Existing compression methods make compression decisions based on local information, i.e., only certain hardware requirements are considered and subsequent compression steps are not. This not only produces intermediate compression schemes that violate the compression conditions and force compression to be redone, but also yields sub-optimal intermediate solutions, so that the final output compression scheme is sub-optimal. (2) Existing compression frameworks only organize compression methods by compression purpose and do not schedule compression tasks based on the hardware platform. In the common "deep compression" framework, network pruning is applied before network quantization, aiming to reduce computation as much as possible by pruning and model size as much as possible by quantization. However, the reduction in computation degrades model performance, and a framework designed purely for the most lightweight model does not, in practice, help exploit the hardware's full capability. (3) Finally, the "deep compression" framework reaches a potentially better performance-hardware balance point at the expense of time overhead. The root cause of this overhead is the staged structure of the "deep compression" framework and the scheduling of compression tasks based on local information.
Therefore, how to provide a method for the rapid automatic lightweighting of deep convolutional neural networks is an urgent problem for those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a method for the rapid automatic compression of a deep convolutional neural network, which not only avoids a large amount of manual parameter tuning, but also does not need to perform network pruning and network quantization as separate passes to meet hardware requirements; it automatically predicts a reasonable bit number for each channel from channel importance, thereby obtaining a compressed network quickly and automatically.
In order to achieve the above purpose, the invention provides the following technical scheme:
A method for the rapid automatic compression of a deep convolutional neural network comprises the following steps: for an input optimization target and a model to be optimized, search for an optimal model quantization scheme, where the scheme search is a multi-round process. First, a reinforcement learning agent and environment are initialized according to the model to be optimized and the optimization target; then the optimal model quantization scheme is searched for, as follows:
S1, single-round search process:
S11: for each compressible layer, calculate the zero-bit channel index and the channel minimum bit number according to the channel importance;
S12: preliminarily calculate the current model quantization scheme from the zero-bit channel index;
S13: calculate the quantized model size from the current model quantization scheme;
S14: if the current compressible layer is the last layer and the current model quantization scheme does not meet the optimization target, iteratively adjust the current model quantization scheme according to the channel minimum bit numbers until the target is met or no further adjustment is possible;
if the current compressible layer is not the last layer, or the current model quantization scheme meets the optimization target, evaluate the current layer quantization scheme and store the relevant environment parameters;
S15: check the continue-search condition to decide whether the current round of search ends and, based on the current layer quantization scheme evaluation result, whether the optimal model quantization scheme is updated;
S2, multi-round search process: S1 is executed repeatedly; once the current number of search rounds reaches the required number, the model quantization scheme search ends and the optimal compressed model for the optimization target is output.
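The S1/S2 control flow above can be sketched as a plain search loop. The following Python sketch is purely illustrative: all helper functions (quantize_layer, adjust, meets_target, evaluate) are stand-ins for steps S11-S15 and are not part of the claimed method, and the reinforcement learning agent and environment updates are omitted.

```python
def search(layers, episodes, quantize_layer, adjust, meets_target, evaluate):
    """Skeleton of the multi-round (S2) / single-round (S1) scheme search."""
    best_r, best_scheme = float("-inf"), None
    for _ in range(episodes):                      # S2: multi-round search
        scheme = []
        for i, layer in enumerate(layers):         # S1: layer-by-layer round
            scheme.append(quantize_layer(layer))   # S11-S13
            # S14: on the last layer, adjust until the target is met
            if i == len(layers) - 1 and not meets_target(scheme):
                scheme = adjust(scheme)
        r = evaluate(scheme)                       # S15: keep the best round
        if r >= best_r:
            best_r, best_scheme = r, list(scheme)
    return best_scheme
```

For example, with a toy "8 bits per layer" quantizer, a target of at most 12 total bits, and an adjuster that halves every layer's bits, the search settles on 4 bits per layer.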
Further, the input model to be optimized includes n quantizable layers N = {L_1, ..., L_i, ..., L_n}, i = 1...n, the set of quantizable-layer input channel numbers I = {I_1, ..., I_n}, and the set of storage space required by the quantizable layers LS = {LS_1, ..., LS_n};
The optimization target describes requirements such as compressed model size and search time, and includes the maximum compressed bit number bit_max, the number of search rounds episodes, and the model size compression ratio sc ∈ (0, 1]; the TOP-5 accuracy acc(N) of the model to be optimized is recorded at initialization; the optimal evaluation result R_best is initialized to −∞ and the optimal model quantization scheme P_best to ∅;
When the reinforcement learning environment is initialized, the reinforcement learning state s_i is defined as (idx, t, out, in, w, h, stride, k, reducedFLOPs, restFLOPs, reducedSize, restSize, a_{i-1}), where idx is the layer index; t is the layer type, either convolutional or fully connected; out is the number of output channels and in the number of input channels; w and h are the width and height of the input feature map; stride and k are the stride of the convolution operation and the side length of the convolution kernel, both 1 for a fully connected layer; reducedFLOPs is the computation removed by the current compression strategy, initialized to 0; restFLOPs is the remaining model computation, initialized to the model computation N_FLOPs; reducedSize is the model size removed by the current compression strategy, initialized to 0; restSize is the remaining model size, initialized to the model size N_size; a_{i-1} is the previous compressible layer's sparsity, initialized to 0;
The reinforcement learning agent comprises an actor network θ, an evaluator network μ, and environment noise σ;
The current search round number currEps is initialized to 0.
Further, the zero-bit channel index is determined from the channel importance; the specific steps are:
S111: for the i-th layer, create the channel index sequence CI = {p | p = 1, ..., I_i} from the set of quantizable-layer input channel numbers, where I_i is the number of input channels of the i-th quantizable layer;
S112: sort the index sequence CI in ascending order of channel importance over the channels c of the i-th layer to obtain the sorted index sequence CI',
where I_{i+1} is the number of input channels of the (i+1)-th quantizable layer; the channel importance of the i-th layer is obtained from the i-th layer's filter weights; imp = {imp_i | i = 1, ..., n} is the model channel importance, and n is the number of quantizable layers;
S113: calculate the number of zero-bit channels cp of the layer from the quantizable-layer input channel number I_i,
where min() and max() are functions returning the minimum and maximum of the input sequence, and the layer sparsity a_i is predicted from the reinforcement learning state s_i by the actor network θ and evaluator network μ under environment noise σ; the range of a_i is (0, 1];
S114, calculating the zero-bit channel index based on the ascending ordered index sequence CI' and the layer zero-bit channel number cp:
ZCI=topk(CI′,cp) (3)
wherein the topk (a, k) function returns the first k elements in the input sequence a.
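Steps S111-S114 can be illustrated as follows. Since formula (2) is not reproduced in the text, this sketch assumes channel importance is the L2 norm of the weights attached to each input channel, and that the zero-bit channel count cp rounds a_i × I_i; both choices are assumptions for illustration only.

```python
import numpy as np

def zero_bit_channel_index(weight, sparsity):
    """weight: (out_channels, in_channels, k, k) conv weights; sparsity: a_i in (0, 1]."""
    # Channel importance as the L2 norm over each input channel's weights
    # (stand-in for the patent's formula (2), which is an image).
    imp = np.sqrt((weight ** 2).sum(axis=(0, 2, 3)))
    ci_sorted = np.argsort(imp, kind="stable")    # CI': ascending importance
    cp = int(round(sparsity * weight.shape[1]))   # zero-bit count (assumed rounding)
    return ci_sorted[:cp]                         # ZCI = topk(CI', cp)
```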
Further, the channel minimum bit numbers minBW are calculated:
minBW = {minBW_i | i = 1, ..., n} (4)
where minBW_i denotes the minimum channel bit numbers of the i-th layer, minBW_i^c denotes the minimum bit number of channel c in the i-th layer, bit_max is the maximum bit number of the compressed model, and imp_i^c is the importance of channel c in the i-th layer.
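Because the per-channel expression on the right-hand side of formula (4) is not reproduced in the text, the sketch below shows one plausible instantiation only: it assumes a channel's minimum bit number scales with its importance normalized by the layer's maximum importance, capped at bit_max and floored at 1 bit. This is an assumption, not the patent's formula.

```python
import math

def min_bit_widths(importance, bit_max):
    """Illustrative minBW_i: importance-proportional bits, in [1, bit_max]."""
    top = max(importance)
    return [max(1, math.ceil(bit_max * imp / top)) for imp in importance]
```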
Further, the specific steps of S12 are as follows:
S121: initialize the layer quantization scheme P_i = {bit_max × 1_k | k = 1, ..., I_i}, where bit_max is the maximum bit number of the compressed model and I_i is the number of input channels of the i-th layer;
S122: update the layer's channel-level quantization scheme P_i;
for the c-th channel of the i-th layer in P, the channel quantization bit number P_i^c is defined by the formula,
where ZCI is the zero-bit channel index, c ranges over [1, I_i], and I_i is the number of input channels of the i-th layer;
S123: save the updated current layer quantization scheme P_i into the model channel-level quantization scheme P = {P_k | k = 1, ..., i}.
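Steps S121-S123 amount to initializing every channel at bit_max and zeroing the channels listed in ZCI. A minimal sketch (function name and the list representation are illustrative):

```python
def layer_quantization_scheme(num_channels, bit_max, zci):
    """P_i: one bit number per input channel of layer i."""
    scheme = [bit_max] * num_channels   # S121: all channels start at bit_max
    for c in zci:                       # S122: zero-bit channels get 0 bits
        scheme[c] = 0
    return scheme                       # S123: caller stores it into P
```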
Further, the quantized model size N'_size of S13 is calculated as follows:
where F_c^k is the c-th filter weight of the k-th layer, obtained from the k-th quantized layer's weights; P_k^c is the channel quantization bit number of channel c in layer k under the channel-level quantization scheme P; for layers not yet quantized, i.e., layers i+1 through n, the channel quantization bit numbers of all channels are those of the model to be optimized, typically 32; I_i is the number of input channels of the i-th layer.
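The size computation of S13 can be sketched as follows, assuming each input channel of a layer contributes (number of weights in that channel) × (its bit number), with not-yet-quantized layers kept at 32 bits per weight. The function name and data layout are illustrative.

```python
import numpy as np

def model_size_bits(weights, schemes, full_precision=32):
    """weights: list of (out, in, k, k) arrays; schemes: per-layer lists of
    per-input-channel bit numbers, or None for layers not yet quantized."""
    total = 0
    for w, bits in zip(weights, schemes):
        if bits is None:                # layers i+1..n stay at full precision
            total += w.size * full_precision
        else:
            per_channel = w[:, 0].size  # weights attached to one input channel
            total += per_channel * sum(bits)
    return total
```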
Further, if the current compressible layer is the last layer and the current model quantization scheme does not satisfy the optimization target, i.e., i = n and N'_size > sc × N_size, where n is the number of quantizable layers, N'_size the current model size, sc the model size compression ratio, and N_size the size of the model to be compressed, then the current model quantization scheme is adjusted. The specific adjustment steps and termination conditions are as follows:
S141: determine the layer i for quantization scheme adjustment, where the adjustment layer i is defined by the formula,
where LS_i is the size of the i-th layer of the model to be optimized and the adjustable-layer sequence QL is initialized to QL = {1, ..., n}; if QL = ∅ or N'_size ≤ sc × N_size, i.e., no further quantization is possible or the optimization target is met, stop adjusting the current model quantization scheme;
S142: calculate the adjustable channel index set QC of the i-th layer, where P_i^c is the channel quantization bit number of channel c of the i-th layer in the model channel quantization scheme P, minBW_i^c is the minimum bit number of channel c of the i-th layer, and I_i is the number of input channels of the i-th layer; if QC is empty, remove i from QL and repeat S141, where QL is the adjustable-layer sequence above;
S143: determine the adjustment channel c in the i-th layer for quantization scheme adjustment, where the adjustment channel c is defined by the formula,
S144: adjust the channel quantization bit number P_i^c of channel c in the i-th layer of the model channel-level quantization scheme P; the adjustment is defined by the formula,
where minBW_i^c is the minimum bit number of channel c of the i-th layer; the adjusted P_i^c is saved into the model channel-level quantization scheme P;
S145: update the current model size N'_size according to the adjusted model channel-level quantization scheme P and formula (8).
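The adjustment loop S141-S145 can be sketched as below. Since formulas (10) and (11) are not reproduced in the text, this sketch assumes the layer choice picks the largest remaining adjustable layer and the channel choice picks the channel with the smallest minimum bit number; both selection rules are assumptions for illustration.

```python
def adjust_scheme(scheme, min_bits, layer_sizes, size_of, target):
    """Greedily lower per-channel bits until size_of(scheme) <= target
    or no channel can be lowered further (S141-S145 sketch)."""
    adjustable = set(range(len(scheme)))               # QL
    while size_of(scheme) > target and adjustable:
        i = max(adjustable, key=lambda j: layer_sizes[j])   # S141 (assumed)
        # S142: channels still above their minimum bit number
        qc = [c for c, b in enumerate(scheme[i]) if b > min_bits[i][c]]
        if not qc:
            adjustable.discard(i)                      # remove i from QL
            continue
        c = min(qc, key=lambda ch: min_bits[i][ch])    # S143 (assumed rule)
        scheme[i][c] = min_bits[i][c]                  # S144: drop to minimum
    return scheme                                      # S145: caller recomputes size
```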
Further, the specific process of evaluating the current layer quantization scheme and storing the relevant environment parameters is:
Given the model size compression ratio sc, the size N_size of the model to be compressed, the number of quantizable layers n, the model N before quantization, the current model size N'_size, and the quantized model N', the evaluation result r of the current layer quantization scheme is obtained, defined by the following formula:
where acc(N) is the top-5 accuracy of the model to be optimized on the validation set, and acc(N') is the top-5 accuracy of the quantized model on the same validation data;
The current environment state s_i is calculated according to the reinforcement learning environment state, and r and s_i are saved;
The parameters of the reinforcement learning agent's actor network θ and evaluator network μ are updated.
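Since formula (9) is not reproduced in the text, the following stand-in reward only illustrates the evaluation logic: the reward reflects the retained top-5 accuracy and is granted only once the size target N'_size ≤ sc × N_size is met. The exact expression and argument names are assumptions.

```python
def reward(acc_orig, acc_quant, size_quant, size_orig, sc):
    """Illustrative evaluation r of a quantization scheme (not formula (9))."""
    if size_quant > sc * size_orig:
        return float("-inf")         # scheme violates the size target
    return acc_quant - acc_orig      # penalize the top-5 accuracy drop
```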
Further, the process of checking the continue-search condition and outputting the optimal model quantization scheme comprises the following steps:
S151: judge whether the current compressible layer is the last layer; if not, i ≠ n, move on to the next layer, i = i + 1, and the current round of search does not end, where i is the current layer index and n is the number of quantizable layers of the model;
S152: otherwise, if the current layer quantization scheme evaluation result is optimal, i.e., r ≥ R_best, the current model quantization scheme is saved as the optimal model quantization scheme, P_best = P, and the current search round ends, i.e., the current search round number currEps = currEps + 1; otherwise the current round of search simply ends; where R_best is the optimal evaluation result, r the evaluation result of the current layer model quantization scheme, P_best the optimal compression scheme, and P the current model quantization scheme.
Further, the multi-round search process is as follows:
Given the number of search rounds episodes: while the current search round number currEps has not reached the required number, i.e., currEps < episodes, S1 is executed repeatedly; otherwise the model quantization scheme search ends. Finally, the optimal compressed model for the optimization target is output according to the obtained optimal compression scheme P_best.
Compared with the prior art, the technical scheme above shows that the invention discloses a method for the rapid automatic compression of a deep convolutional neural network, which takes a trained model to be optimized (i.e., a deep convolutional neural network) and an optimization target (i.e., the performance requirements of a hardware platform) as input, performs network compression through a lightweight method, and outputs a compressed network that maximally preserves the original network's performance without violating the hardware requirements.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only embodiments of the invention, and that for a person skilled in the art, other drawings can be obtained from the provided drawings without inventive effort.
FIG. 1 is a flow chart of the method for fast and automatic compression of deep convolutional neural network according to the present invention.
FIG. 2 is a flow chart of an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method for the rapid automatic compression of a deep convolutional neural network. As shown in FIG. 1, the method merges network pruning and network quantization into one compression step and schedules the compression task under the control of global information.
Referring to FIG. 2, the network to be compressed in this embodiment is MobileNet_v1; the hardware requirement is to compress the model size to below 1/5 of the original while keeping the model's top-5 accuracy within a 2% loss; the maximum bit number of the quantized model is 8, and mixed precision is supported, i.e., the bit number of each retained channel can be between 1 and 8. The validation dataset is ImageNet, the hardware is a GTX 1080 GPU, and the Python language and the PyTorch framework are used.
Step 1: according to the input network to be compressed, obtain the set of n quantizable layers N = {L_1, ..., L_i, ..., L_n}, i = 1...n, the input channel numbers I = {I_1, ..., I_n} of the n quantizable layers, and likewise the size of each layer LS = {LS_1, ..., LS_n}. According to the hardware requirements, obtain the maximum compressed bit number bit_max and the number of search rounds episodes, construct the reinforcement learning environment, and initialize the actor network θ and the evaluator network μ in the reinforcement learning agent. Here bit_max = 8 and episodes = 100; θ and μ are neural networks with 2 hidden layers of 300 neurons each; the learning rate of θ is 0.001 and that of μ is 0.0001; the agent's warm-up is 40 rounds, the sample size is 64, the per-layer buffer size is 10, and the discount factor is 1.0.
Step 2: using the total number of search rounds episodes obtained in step 1 and the known current exploration count currEps, judge whether the search is finished. If currEps ≥ episodes, the search is complete and the optimal compressed network N' = {L'_1, ..., L'_i, ..., L'_n}, i = 1, ..., n, is output; otherwise go to step 3. n is the number of quantizable layers in the network to be compressed; the method never deletes an entire quantizable layer.
Step 3: predict the sparsity a_i of the current layer according to formula (1), using the actor network θ and the evaluator network μ in the reinforcement learning agent initialized in step 1,
where s_i is the reinforcement learning state and σ is the environment noise of the search environment, initialized to 0.95 and gradually decayed to 0. The range of a_i is (0, 1]. In this embodiment, a_i is 1.0 for the input layer and for convolution layers with a kernel size of 3.
Step 4: with the number I_{i+1} of input channels of the (i+1)-th layer obtained in step 1, the weights of the i-th layer can be written in terms of its filters F, where k is the filter kernel size and k = 1 when the i-th layer is fully connected. The model channel importance is defined as imp = {imp_i | i = 1, ..., n}; the i-th layer channel importance is then imp_i = {imp_i^c | c = 1, ..., I_i}, where the importance imp^c of channel c is calculated with formula (2),
where F_v^c is the v-th filter weight corresponding to the c-th channel.
Step 5: using the maximum compressed-model bit number bit_max obtained in step 1 and the i-th layer channel importance imp_i from step 4, define the model channel minimum bit numbers minBW = {minBW_i | i = 1, ..., n}; then minBW_i = {minBW_i^c | c = 1, ..., I_i} can be calculated, where the minimum bit number minBW_i^c of channel c of the i-th layer is calculated with formula (3).
Step 6: compute the zero-bit channel index of the i-th layer.
(I) Using the layer channel number I_i obtained in step 1, create the channel index sequence CI = {p | p = 1, ..., I_i};
(II) using the layer channel importance imp_i obtained in step 4 and the channel index sequence CI from step 6(I), sort the index sequence CI in ascending order of importance to obtain the sorted ascending index sequence CI';
(III) using the layer channel number I_i obtained in step 1 and the layer sparsity a_i from step 3, calculate the layer zero-bit channel number cp with formula (4),
where min() and max() are functions returning the minimum and maximum of the input sequence;
(IV) using the ascending index sequence CI' from step 6(II) and the layer zero-bit channel number cp from step 6(III), obtain the zero-bit channel index ZCI with formula (5):
ZCI = topk(CI', cp) (5)
where the topk(a, k) function returns the first k elements of the input sequence a.
Step 7: calculate the current layer channel quantization scheme.
(I) Using the maximum compressed-model bit number bit_max obtained in step 1 and the layer's input channel number I_i, initialize the layer channel-level quantization scheme P_i = {bit_max × 1_k | k = 1, ..., I_i}. In this embodiment bit_max is 8.
(II) Using the zero-bit channel index ZCI obtained in step 6 and the channel-level quantization scheme P_i from step 7(I), update the layer channel-level quantization scheme P_i according to formula (6);
(III) save the updated current layer quantization scheme P_i into the model channel-level quantization scheme P = {P_i | i = 1, ..., n}.
Step 8: for the current layer channel-level quantization scheme P_i obtained in step 7, update and maintain the environment state s_i, where s_i uses the tuple defined by formula (7). If the current layer is not the last layer, record the evaluation result r = 0 and jump to step 3; otherwise go to step 9;
(idx, t, out, in, w, h, stride, k, reducedFLOPs, restFLOPs, reducedSize, restSize, a_{i-1}) (7)
In formula (7), idx is the layer index and t is the layer type, of which there are two, convolutional and fully connected. out is the output channel count and in the input channel count. w and h are the width and height of the input feature map. stride and k are the stride of the convolution operation and the side length of the convolution kernel, both 1 for a fully connected layer. reducedFLOPs is the computation removed by the current compression strategy, and restFLOPs the remaining model computation. reducedSize is the model size removed by the current compression strategy, and restSize the remaining model size. a_{i-1} is the previous compressible layer's sparsity.
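The 13-element state tuple of formula (7) can be written out as a NamedTuple for readability; `in` is a Python keyword, so the input-channel field is renamed `in_ch` here (an adaptation, not the patent's notation), and the defaults follow the stated initial values.

```python
from typing import NamedTuple

class State(NamedTuple):
    """Reinforcement learning state s_i from formula (7)."""
    idx: int                     # layer index
    t: str                       # layer type: convolutional or fully connected
    out: int                     # output channel count
    in_ch: int                   # input channel count ("in" in the patent)
    w: int                       # input feature map width
    h: int                       # input feature map height
    stride: int                  # conv stride (1 for fully connected layers)
    k: int                       # kernel side length (1 for fully connected layers)
    reducedFLOPs: float = 0.0    # computation removed so far
    restFLOPs: float = 0.0       # remaining computation (init: model FLOPs)
    reducedSize: float = 0.0     # model size removed so far
    restSize: float = 0.0        # remaining size (init: model size)
    a_prev: float = 0.0          # previous layer's sparsity a_{i-1}
```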
Step 9: According to the quantization layer set N obtained in step 1, the number of input channels I of each layer, and the current model channel-level quantization scheme P obtained in step 7, the i-th quantized layer L_i = {F_c^i | c = 1, ..., I_i} is known, where F_c^i is a layer filter. The current model size N'_size is then updated according to P using equation (8):

N'_size = Σ_{k=1}^{n} Σ_{c=1}^{I_k} bit_c^k × size(F_c^k) (8)
where I_k is the number of channels of the k-th layer, bit_c^k is the bit number of the c-th channel of the k-th layer in the model channel-level quantization scheme P, and F_c^k is the c-th filter weight of the k-th quantized layer L_k.
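Equation (8) amounts to a weighted sum of filter sizes by channel bit number. A minimal sketch, assuming (as claim 6 states) that layers not yet quantized count at 32 bits per weight:

```python
def model_size_bits(filters_per_layer, scheme, current_layer, full_bits=32):
    """Equation (8): quantized model size N'_size in bits.

    filters_per_layer[k][c]: number of weights in filter c of layer k.
    scheme[k][c]:            bit number of channel c of layer k in P.
    Layers after current_layer are not yet quantized and count at full_bits.
    """
    total = 0
    for k, layer in enumerate(filters_per_layer):
        for c, n_weights in enumerate(layer):
            bits = scheme[k][c] if k <= current_layer else full_bits
            total += bits * n_weights
    return total

# Two layers, two filters of 10 weights each; only layer 0 is quantized so far
sizes = [[10, 10], [10, 10]]
p = [[8, 0], [32, 32]]
print(model_size_bits(sizes, p, current_layer=0))  # 8*10 + 0*10 + 32*20 = 720
```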
Step 10: According to the hardware condition requirement, i.e. the model size compression ratio sc from step 1, the size N_size of the model to be compressed, and the current model size N'_size from step 9, judge whether the current model channel-level quantization scheme needs to be adjusted. If the hardware condition N'_size ≤ sc × N_size is satisfied, evaluate r using equation (9), save the environment state s_i, update the parameters of the reinforcement learning agent's actor network θ and evaluator network μ, and jump to step 2; otherwise, go to step 11;
where acc(M) is the top-5 accuracy obtained by the model M on the validation set. In this embodiment, the validation data set is ImageNet with 20000 pictures of size 224 × 224, and the batch size is 60.
Step 11: the model channel-level quantization scheme is greedily adjusted.
(I) According to the size LS of each layer of the model obtained in step 1, determine the layer i for quantization scheme adjustment using formula (10):
(II) According to the number of channels I_i of the i-th layer from steps 1, 5 and 7(I), the minimum channel bit number minBW_c^i of each channel, and the channel-level quantization scheme P_i, obtain the set of quantized channel indices QC;
(III) According to the channel importance imp_i of the i-th layer from step 4 and the quantized channel index set QC of that layer from step 11(II), determine the channel c of the i-th layer whose bit number is to be adjusted using formula (11).
(IV) According to the minimum channel bit number minBW_i of the i-th layer from step 5, the layer i and channel c selected in steps 11(I) and 11(III), and the current quantization scheme P, update the layer quantization scheme using formula (12) and save it to P.
(V) Jump to step 9 to update the current model size N'_size according to P.
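Steps 11(I)–(V) form a greedy loop. The sketch below makes three assumptions where the patent's formulas (10)–(12) are equation images not reproduced in this text: the adjustment layer is the largest layer still adjustable, the adjustment channel is the least-important channel whose bit number is still above its minimum, and each adjustment lowers that channel by one bit down to its minimum.

```python
def greedy_adjust(scheme, layer_sizes, min_bw, importance, target_bits, size_fn):
    """Step 11 sketch: shrink the model until size_fn(scheme) <= target_bits
    or no channel can be reduced further.

    scheme[i][c]:     current bit number of channel c of layer i (the scheme P).
    layer_sizes[i]:   storage size LS_i of layer i.
    min_bw[i][c]:     minimum bit number of channel c of layer i.
    importance[i][c]: importance imp of channel c of layer i.
    size_fn:          callback computing the current model size from scheme.
    """
    ql = list(range(len(scheme)))                      # adjustable layer sequence QL
    while ql and size_fn(scheme) > target_bits:
        i = max(ql, key=lambda j: layer_sizes[j])      # assumed form of formula (10)
        qc = [c for c, b in enumerate(scheme[i]) if b > min_bw[i][c]]
        if not qc:                                     # layer exhausted: drop from QL
            ql.remove(i)
            continue
        c = min(qc, key=lambda ch: importance[i][ch])  # assumed form of formula (11)
        scheme[i][c] = max(scheme[i][c] - 1, min_bw[i][c])  # assumed form of (12)
    return scheme
```

For example, a single layer of two one-weight channels at [8, 8] bits with a 12-bit budget is reduced on the less important channel until the budget is met.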
Step 12: For each compressible layer, obtain the channels that need to be set to zero bits, then calculate the minimum number of bits to which each remaining channel can be quantized without being set to zero. Finally, save the current layer state and record the intermediate-layer evaluation result as 0. When the last layer is processed, if the current channel quantization scheme meets the hardware requirement, evaluate the current channel compression scheme; if it does not, greedily adjust it until it does. After the evaluation, adjust the parameters of the reinforcement learning agent according to the evaluation result and the state. Through multiple rounds of layer-by-layer search, the reinforcement learning agent is trained to correctly adjust the bit number of each channel, finally yielding the optimal compressed network.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments can be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for quickly and automatically compressing a deep convolutional neural network is characterized by comprising the steps of inputting an optimization target and a model to be optimized, initializing a reinforcement learning agent and an environment according to the model to be optimized and the optimization target, and searching an optimal model quantization scheme, wherein the specific search process comprises the following steps:
s1 single round search procedure:
s11: calculating a zero-bit channel index and a channel minimum bit number according to the channel importance;
s12: determining a current model quantization scheme according to the zero-bit channel index;
s13: calculating a post-quantization model size based on the current model quantization scheme;
s14: if the current compressible layer is the last layer and the current model quantization scheme does not meet the optimization target, circularly adjusting the current model quantization scheme according to the channel minimum bit number until the optimization target is met or quantization cannot be performed;
If the current compressible layer is not the last layer or the current model quantization scheme meets the optimization target, evaluating the current layer quantization scheme based on the quantized model size and storing relevant environmental parameters;
S15: Based on the search-continuation condition and the evaluation result of the current-layer quantization scheme, judge whether to end the current round of search and whether to update the optimal model quantization scheme;
s2 multiple search process: and S1 is repeatedly executed until the current search round number reaches the required search round number, the model quantization scheme search is finished, and the optimal compression model is output.
2. The method according to claim 1, wherein the model to be optimized comprises a quantizable layer set N = {L_1, ..., L_i, ..., L_n}, where i = 1 ... n and n denotes the number of quantizable layers; a set of quantizable-layer input channel numbers I = {I_1, ..., I_n}; and a set of quantized-layer storage space requirements LS = {LS_1, ..., LS_n};
The optimization objective comprises the maximum bit number bit_max of the compressed model, the search round number episodes, and the model size compression ratio sc ∈ (0, 1]; the top-5 accuracy acc(N) of the model to be optimized, the optimal evaluation result R_best, and the optimal model quantization scheme P_best are initialized;
When the reinforcement learning environment is initialized, the reinforcement learning state s_i is defined as (idx, t, out, in, w, h, stride, k, reducedFLOPs, restFLOPs, reducedSize, restSize, a_{i-1}), where idx is the layer index; t is the layer type, comprising convolutional layer and fully connected layer; out is the number of output channels; in is the number of input channels; w and h are the width and height of the input feature vector; stride and k are the stride of the convolution operation and the side length of the convolution kernel, both 1 for a fully connected layer; reducedFLOPs is the computation reduced by the current compression strategy, initialized to 0; restFLOPs is the model's remaining computation, initialized to the model computation N_FLOPs; reducedSize is the model size reduced by the current compression strategy, initialized to 0; restSize is the model's remaining size, initialized to the model size N_size; and a_{i-1} is the sparsity of the previous compressible layer, initialized to 0;
the reinforcement learning agent includes an actor network θ, an evaluator network μ, and an environmental noise σ.
3. The method of claim 2, wherein the zero-bit channel index is determined according to channel importance, and the method comprises the following specific steps:
S111: For the i-th layer, create a channel index sequence CI = {p | p = 1, ..., I_i} according to the quantizable-layer input channel number set, where I_i is the number of input channels of the i-th quantized layer;
S112: According to the importance imp_c^i of each channel c of the i-th layer, sort the channel index sequence CI in ascending order to obtain the ascending-ordered index sequence CI';
In the formula, I_{i+1} is the number of input channels of the (i+1)-th quantizable layer, and F_c^i is the c-th filter weight, obtained from the weights of the i-th quantized layer; imp_i = {imp_c^i | c = 1, ..., I_i} is the channel importance of the i-th layer, imp = {imp_i | i = 1, ..., n} is the model channel importance, and n is the number of quantizable layers;
S113: Calculate the layer zero-bit channel number cp according to the quantized-layer input channel number I_i,
where min() and max() are functions returning the minimum and maximum values of the input sequence; the layer sparsity a_i is produced by the reinforcement learning agent from the state s_i, where θ is the actor network, μ is the evaluator network, and σ is the environment noise; the value range of a_i is (0, 1];
S114, calculating a zero-bit channel index ZCI based on the ascending-ordered index sequence CI' and the zero-bit channel number cp:
ZCI=topk(CI′,cp) (3)。
4. The method according to claim 3, wherein the minimum channel bit number minBW is calculated by the following formula:
minBW = {minBW_i | i = 1, ..., n} (4)
5. The method according to claim 4, wherein step S12 is specifically as follows:
S121: Initialize the channel-level quantization scheme P_i = {bit_max × 1_k | k = 1, ..., I_i}, where bit_max is the maximum bit number of the compressed model and I_i is the number of input channels of the i-th layer;
S122: Update the channel quantization bit number bit_c^i according to the zero-bit channel index, obtaining the updated current-layer quantization scheme P_i, calculated by the following formula:
where ZCI is the zero-bit channel index, c takes values in [1, I_i], and I_i is the number of input channels of the i-th layer;
S123: Save the updated current-layer quantization scheme P_i to the model channel-level quantization scheme P = {P_k | k = 1, ..., i}.
6. The method according to claim 5, wherein in S13 the quantized model size N'_size is specifically calculated by the following formula:
where F_c^k is the c-th filter weight of the k-th layer, obtained from the k-th quantized-layer weights; bit_c^k is the channel quantization bit number of the c-th channel of the k-th layer in the model channel-level quantization scheme P; layers i+1 through n are layers not yet quantized, whose channel quantization bit numbers are determined by the model to be optimized and take the value 32; and I_i is the number of input channels of the i-th layer.
7. The method according to claim 6, wherein if the current compressible layer is the last layer and the current model quantization scheme does not satisfy the optimization target, i.e., i = n and N'_size > sc × N_size, where n is the number of quantizable layers, N'_size is the quantized model size, sc is the model size compression ratio, and N_size is the size of the model to be compressed, the current model quantization scheme is adjusted; the specific adjustment steps and termination conditions are as follows:
S141: Determine the adjustment layer i for the quantization scheme:
where LS_i is the size of the i-th layer of the model to be optimized and the adjustable layer sequence QL is initialized to QL = {1, ..., n}; if QL = ∅ or N'_size ≤ sc × N_size, i.e., no further quantization is possible or the optimization target is met, stop adjusting the current model quantization scheme;
S142: Calculate the quantized channel index set QC of the adjustment layer i, where bit_c^i is the channel quantization bit number of the c-th channel of the i-th layer in the model channel-level quantization scheme P, minBW_c^i is the minimum channel bit number of the c-th channel of the i-th layer, and I_i is the number of input channels of the i-th layer;
if QC is empty, remove the adjustment layer i from the adjustable layer sequence QL and perform S141 again;
S143: Determine the adjustment channel c in the i-th layer for quantization scheme adjustment, where the adjustment channel c is defined by the following formula:
S144: Adjust the channel quantization bit number bit_c^i of the c-th channel of the i-th layer in the model channel-level quantization scheme P, the adjustment being defined by the following formula:
where minBW_c^i is the minimum bit number of the c-th channel of the i-th layer; the adjusted bit number is saved to the model channel-level quantization scheme P;
S145: Update the current model size N'_size according to the adjusted model channel-level quantization scheme P.
8. The method of claim 7, wherein the specific process of evaluating the quantization scheme of the current layer and saving the relevant environmental parameters comprises:
According to the model size compression ratio sc, the size N_size of the model to be compressed, the number n of quantizable layers, the model N to be optimized, the current model size N'_size, and the quantized model N', the evaluation result r of the current-layer quantization scheme is obtained, specifically calculated by the following formula:
where acc(N) is the top-5 accuracy of the model to be optimized on the validation set data, and acc(N') is the top-5 accuracy of the quantized model on the same validation set data;
the current environment state s_i is calculated according to the reinforcement learning environment state, and r and s_i are saved;
Updating the parameters of the reinforcement learning agent actor network theta and the evaluator network mu.
9. The method according to claim 8, wherein the S15 specifically includes:
S151: Judge whether the current compressible layer is the last layer; if it is not, i.e., i ≠ n, proceed to the next layer i = i + 1 and the current round of search does not end, where i is the current layer number and n is the number of quantizable layers of the model;
S152: Otherwise, judge whether the evaluation result of the current-layer quantization scheme is optimal; if r ≥ R_best, save the current model quantization scheme as the optimal model quantization scheme, P_best = P, and end the current round of search; if not, end the current round of search directly; where R_best is the optimal evaluation result, r is the evaluation result of the current-layer model quantization scheme, P_best is the optimal compression scheme, and P is the current model quantization scheme.
10. The method of claim 9, wherein the multiple rounds of search processes are as follows:
According to the search round number episodes, judge whether the current search round number curreps satisfies curreps < episodes; if it does, repeatedly execute S1; otherwise, end the model quantization scheme search and output the optimal compression model for the optimization target according to the optimal compression scheme P_best.
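The multi-round search of S2 and claim 10 reduces to the following sketch, where `run_single_round` is a hypothetical callback standing in for one execution of the S1 procedure (it returns an evaluation result r and the corresponding scheme P):

```python
def search(episodes, run_single_round):
    """Claim 10 sketch: repeat the single-round search S1 until the round
    counter curreps reaches episodes, then return the best scheme found."""
    best_r, best_p = float("-inf"), None
    curreps = 0
    while curreps < episodes:
        r, p = run_single_round()
        if r >= best_r:          # S152: r >= R_best updates the best scheme
            best_r, best_p = r, p
        curreps += 1
    return best_p
```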
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010659862.0A CN111860779A (en) | 2020-07-09 | 2020-07-09 | Rapid automatic compression method for deep convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111860779A true CN111860779A (en) | 2020-10-30 |
Family
ID=73153670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010659862.0A Pending CN111860779A (en) | 2020-07-09 | 2020-07-09 | Rapid automatic compression method for deep convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860779A (en) |
Cited By (8)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329923A (en) * | 2020-11-24 | 2021-02-05 | 杭州海康威视数字技术股份有限公司 | Model compression method and device, electronic equipment and readable storage medium |
CN112329923B (en) * | 2020-11-24 | 2024-05-28 | 杭州海康威视数字技术股份有限公司 | Model compression method and device, electronic equipment and readable storage medium |
CN113282535A (en) * | 2021-05-25 | 2021-08-20 | 北京市商汤科技开发有限公司 | Quantization processing method and device and quantization processing chip |
CN113282535B (en) * | 2021-05-25 | 2022-11-25 | 北京市商汤科技开发有限公司 | Quantization processing method and device and quantization processing chip |
CN113657592A (en) * | 2021-07-29 | 2021-11-16 | 中国科学院软件研究所 | Software-defined satellite self-adaptive pruning model compression method |
CN113657592B (en) * | 2021-07-29 | 2024-03-05 | 中国科学院软件研究所 | Software-defined satellite self-adaptive pruning model compression method |
CN113627593A (en) * | 2021-08-04 | 2021-11-09 | 西北工业大学 | Automatic quantification method of target detection model Faster R-CNN |
CN113627593B (en) * | 2021-08-04 | 2024-06-04 | 西北工业大学 | Automatic quantization method for target detection model Faster R-CNN |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201030 |