CN111860779A - Rapid automatic compression method for deep convolutional neural network - Google Patents

Rapid automatic compression method for deep convolutional neural network Download PDF

Info

Publication number
CN111860779A
CN111860779A CN202010659862.0A CN202010659862A CN111860779A CN 111860779 A CN111860779 A CN 111860779A CN 202010659862 A CN202010659862 A CN 202010659862A CN 111860779 A CN111860779 A CN 111860779A
Authority
CN
China
Prior art keywords
layer
model
channel
current
quantization scheme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010659862.0A
Other languages
Chinese (zh)
Inventor
唐文婷
韦星星
王越
李波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010659862.0A priority Critical patent/CN111860779A/en
Publication of CN111860779A publication Critical patent/CN111860779A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for quickly and automatically compressing a deep convolutional neural network. Finding an optimal model quantization scheme aiming at an optimization target and a model to be optimized: initializing a reinforcement learning agent and an environment, and outputting an optimal model when the specified number of search rounds is achieved; otherwise, single round searching is carried out. In a single-round search, determining a zero-bit channel index and a channel minimum bit number of each quantization layer, and calculating a quantization scheme of the layer of the current model and a size of the quantized model. If the current quantization layer is the last layer and does not meet the optimization goal, adjusting the model quantization scheme until the optimization goal is met or the model quantization scheme cannot be adjusted; otherwise, evaluating the quantization scheme and storing the environmental parameters, and if the current quantization layer is not the last layer, continuing the single-round search; and otherwise, if the current model quantization scheme is optimal, the quantization scheme is saved as the optimal model quantization scheme. A large amount of artificial parameter adjustment is avoided, network branch reduction and network quantification are not needed to meet hardware requirements, and a compressed network can be obtained quickly.

Description

Rapid automatic compression method for deep convolutional neural network
Technical Field
The invention relates to the field of deep neural network lightweight, in particular to a method for quickly and automatically compressing a deep convolutional neural network for a target recognition task.
Background
With the development of semiconductor technology and hardware energy, hardware devices have been able to support higher concurrency and high throughput computing tasks. Meanwhile, in the last decade, the computer vision technology is rapidly developed, deep neural networks with different characteristics are produced, and human-like performance is obtained under application environments such as target detection and identification and scene understanding aiming at remote sensing image data. Therefore, in order to meet the development requirement of hardware and fully exert the advantages of the deep neural network on the tasks of target detection and identification (such as detection and identification of airplanes, ships and ground targets), the network lightweight has important research significance in the military and civil fields. Because the number of parameters of the deep convolutional neural network is huge and different hardware platforms exist, how to keep the network performance to be adapted to different hardware platforms on the premise of meeting hardware requirements (energy consumption and model size) is a difficult problem to be solved urgently.
Network pruning and network quantization are not only common network compression means but also the focus of current research. In 1989, the academia proposed an optimization method for neural network structure, namely, a reduction in the amount and size of network computation was achieved by reducing unnecessary connections in the network within the range of degradation allowed by the network performance. Since then, the network pruning process is modeled as a layer-by-layer path selection or network-by-network path selection problem. In industry, because the problem of modeling network pruning into network-by-network path selection not only increases a large amount of computation and storage overhead but also makes the compression process more complicated (the model needs to be finely adjusted or even retrained), for the trained model, a network pruning algorithm based on weight is generally used for channel selection layer by layer. Since layer-by-layer channel selection requires a large amount of human labor to determine the appropriate layer sparsity (0 bit element number in each layer) for each compressible layer, research on the method of adaptively adjusting the layer sparsity gradually draws attention from the industry and academia. Because the problem of determining the sparsity of each layer in the network to be compressed can be abstracted into a scheduling problem of dynamic programming, and reinforcement learning is a favorable means for solving the dynamic programming, the task of scheduling and compressing by using reinforcement learning is the key point of research in recent years, for example, an automatic branch reduction method by using reinforcement learning is adopted, and the method not only adapts to different industrial platforms, but also can determine the sparsity of each layer in a self-adaptive manner. On the other hand, in order to adapt to different hardware platforms, the storage precision of the model should be adjusted from model to model or even with smaller granularity, such as layer by layer. And adjusting the storage precision by model to 8 bits by using a K mean value method or using a check set to adjust the weight value, wherein the model performance can be nearly lossless. With the development of chip technology, the manufacturers or designs of one-line chips such as apple, great britain, and high-pass have issued chips supporting mixed precision operation. In order to adapt to different hardware platforms and further exert hardware advantages, adjusting model storage precision layer by layer is also one of the hot spots of research in recent years. Some methods may adjust the model storage accuracy of each layer by layer using reinforcement learning. However, due to the explosive increase in problem space size, these approaches do not support lower granularity such as per-channel model storage refinement.
The industry protocol (MPEG protocol ISO/IEC JTC 1/SC 29/WG 11 under N18575) states that the network performance to measure lightweighting will be from: the performance loss of the model, the size reduction of the model, the reduction of the calculated amount of the model, the compression time of the model and the decompression time of the model are measured in 5 aspects. Thus, considering the first 3, the lightweight framework that is now in common use is the "depth compression" framework. The framework firstly uses a network branch reduction technology to reduce the calculated amount, and then uses a network quantification technology to reduce the size of the model. The cascade design ensures compatibility with different compression methods, and has wide application in industry. However, this approach also emphasizes preservation of network performance. Therefore, to better balance hardware requirements and model performance, the framework may require repeating network pruning and network quantization tasks to search for a better compressed model, but this also causes additional compression time overhead. In industrial applications, lightweight approaches tailored to different hardware platforms are necessary. At this time, the network compression using the framework will generate a non-negligible time overhead. Therefore, considering model compression, performance preservation and compression time, the method is always the key point of network lightweight research and application.
The problems faced by this study are: (1) existing compression methods make compression decisions based on local information, i.e. only certain hardware requirements are considered and subsequent compression means are not considered. This not only increases the compression scheme that does not satisfy the compression condition in the compression and causes the compression to be performed again, but also outputs a sub-optimal solution in the middle of the compression, thereby causing the final output compression solution to be sub-optimal. (2) The existing compression framework only organizes a compression method based on a compression purpose, and does not schedule a compression task based on a hardware platform. In the general "depth compression" framework, network pruning is applied before network quantization. The method aims to reduce the calculated amount as much as possible by network branch reduction and reduce the size of the model as much as possible by network quantization. However, the reduction in the amount of calculation leads to a reduction in the performance of the model, and a framework designed only for pursuing a more lightweight model does not facilitate the full exploitation of the hardware performance in practice. (3) Finally, the "deep compression" framework achieves a potentially better performance-hardware balance point at the expense of time overhead. The root cause of these time overheads is caused by the step structure of the "deep compression" framework and the scheduling of compression tasks based on local information.
Therefore, how to provide a method for quickly and automatically reducing the weight of the deep convolutional neural network is a problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a method for fast and automatically compressing a deep convolutional neural network, which not only avoids a large amount of artificial parameter adjustment, but also does not need to perform network pruning and network quantization separately to meet hardware requirements, and automatically predicts a reasonable bit number of each channel by using channel importance, thereby fast and automatically obtaining a compressed network.
In order to achieve the above purpose, the invention provides the following technical scheme:
a method for quickly and automatically compressing a deep convolutional neural network comprises the following steps: aiming at an input optimization target and a model to be optimized, searching to find an optimal model quantization scheme, wherein the model quantization scheme searching is a multi-round searching process. Firstly, initializing a reinforcement learning agent and an environment according to a model to be optimized and an optimization target, and then searching an optimal model quantization scheme, wherein the specific search process is as follows:
s1, single-round searching process:
s11: for each compressible layer, calculating a zero-bit channel index and a channel minimum bit number according to the channel importance;
S12: preliminarily calculating a current model quantization scheme according to a zero-bit channel index;
s13: calculating the size of a quantized model according to the quantization scheme of the current model;
s14: if the current compressible layer is the last layer and the current model quantization scheme does not meet the optimization target, circularly adjusting the current model quantization scheme according to the channel minimum bit number until the optimization target is met or quantization cannot be performed;
if the current compressible layer is not the last layer or the current model quantization scheme meets the optimization target, evaluating the current layer quantization scheme and storing relevant environmental parameters;
s15: continuing the search condition, judging whether to finish the current round of search, and further judging whether to finish the current round of search and whether to update the optimal model quantization scheme based on the current layer quantization scheme evaluation result;
s2, multi-round searching process: and S1 is repeatedly executed, after the number of current search rounds reaches the required number of search rounds, the model quantization scheme search is finished, and the optimal compression model aiming at the optimization target is output.
Further, the input model to be optimized includes N sets of quantifiable layers N ═ L1,...,Li,...,LnWhere I is 1 … n, a set of quantisable layer input channel numbers I is { I } 1,...,InSet of storage space required by the quantizeable layer LS ═ LS1,...,LSn};
The optimization target describes the requirements of the size, the search time and the like of the compressed model, and comprises the maximum compressed bit number bitmaxNumber of search roundsepiscodes, model size compression ratio sc ∈ (0, 1)]Initializing TOP-5 accuracy acc (N) of a model to be optimized; optimum evaluation result RbestInitialized to- ∞, optimal model quantization scheme PbestIs phi;
when the reinforcement learning environment is initialized, the reinforcement learning state siIs defined as (idx, t, out, in, w, h, stride, k, rededFLOPs, resFLOPs, rededSize, restSize, a)i-1) Wherein idx is a layer index; t is a layer type, and comprises a convolution layer and a full-connection layer; out is the number of output channels, and in is the number of input channels; w and h are the width and height of the input feature vector; stride and k are the step length of convolution layer convolution operation and the side length of convolution kernel, and stride and k are both 1 for a full connection layer; redundedfrops is the amount of computation reduced by the current compression strategy, initialized to 0; restFLOPs are model residual calculated quantities, initialized to model calculated quantity NFLOPs(ii) a rededsize is the size of the model reduced by the current compression strategy and is initialized to 0; restSize is the model residual size, initialized to model size N size;ai-1Is the previous compressible layer sparsity, initialized to 0;
the reinforcement learning agent comprises an actor network theta, an evaluator network mu and environmental noise sigma;
the current search round number curreps is initialized to 0.
Further, determining a zero-bit channel index according to the channel importance, the specific steps include:
s111, creating a channel index sequence CI ═ p ═ 1, and I for the I-th layer according to the quantization layer input channel number seti},IiInputting the number of channels for the ith quantizer layer;
s112, according to the channel index CI, for each channel c of the ith layer, according to the channel importance
Figure BDA0002577587620000051
Sequencing the index sequence CI in ascending order to obtain a post-sequencing index sequence CI',
the importance of the ith channel is defined as
Figure BDA0002577587620000052
Wherein, Ii+1The number of input channels for the i +1 th quantisable layer,
Figure BDA0002577587620000053
is the ith filter weight, which is the ith filter weight
Figure BDA0002577587620000054
Obtaining;
Figure BDA0002577587620000055
for channel importance of i-th layer, imp ═ impiI 1.. n } is the importance of the model channel, and n is the quantifiable layer number;
s113, inputting the channel number I according to the quantization layeriCalculating the number of the zero bit channels of the layer,
Figure BDA0002577587620000057
wherein min () and max () are functions for solving the minimum value and the maximum value in the input sequence, and the layer sparsity
Figure BDA0002577587620000056
Wherein s isiIn the reinforcement learning state, θ is the actor network, μ is the evaluator network, σ is the environmental noise, a iIs in the value range of (0, 1)];
S114, calculating the zero-bit channel index based on the ascending ordered index sequence CI' and the layer zero-bit channel number cp:
ZCI=topk(CI′,cp) (3)
wherein the topk (a, k) function returns the first k elements in the input sequence a.
Further, the channel minimum bit number minBW is calculated:
minBW={minBWi|i=1,...,n} (4)
Figure BDA0002577587620000061
Figure BDA0002577587620000062
wherein, minBWiIndicates the minimum bit number, minBW, of the i-th layer channeli cRepresents the minimum channel bit number, bit, of the ith layer of the c channelmaxFor the maximum number of bits of the model after compression,
Figure BDA0002577587620000063
the ith channel importance.
Further, the specific step of S12 is as follows:
s121, initializing the quantization scheme P of the layeri={bitmax×1k|k=1,...,Ii},bitmaxFor the maximum number of bits of the compressed model, IiThe number of input channels of the ith layer is;
s122, updating the layer of channel level quantization scheme Pi
For the ith channel of the ith layer in P, the quantization bit number of the channel
Figure BDA0002577587620000064
Is defined as the formula
Figure BDA0002577587620000065
Wherein ZCI is zero-bit channel index, and c has a value range of [1, Ii],IiThe number of input channels of the ith layer.
S123, the updated current layer quantization scheme PiSave to model channel level quantization scheme P ═ { Pk|k=1,...,i}。
Further, the model size N 'after quantization of S13'sizeThe specific calculation formula is as follows:
Figure BDA0002577587620000071
wherein the content of the first and second substances,
Figure BDA0002577587620000072
for the c-th filter weight of the k-th layer, according to the k-th quantized layer weight
Figure BDA0002577587620000073
Obtaining;
Figure BDA0002577587620000074
for the channel quantization bit number of the c channel at the kth layer in the channel quantization scheme P, for the layers which are not quantized, i.e., the (i + 1) th layer to the nth layer, the channel quantization bit numbers of all the channels are determined by the model to be optimized, and are generally 32; i isiThe number of the input channels of the ith layer is the number of the input channels of the ith layer.
Further, if the current compressible layer is the last layer and the current model quantization scheme does not satisfy the optimization, i.e., i ═ N and N'size>sc×NsizeWhere N is the quantifiable number of layers, N'sizeFor the current model size, sc is the model size compression ratio, NsizeIf the model size is to be compressed, the current model quantization scheme is adjusted, and the specific adjustment steps and termination conditions are as follows:
s141, determining a layer i for quantization scheme adjustment, wherein the adjustment layer i is defined as the following formula
Figure BDA0002577587620000075
Wherein LSiFor the ith layer size of the model to be optimized, the sequence of layers QL may be initialized to QL ═ { 1., N }, if QL ═ Φ or N'size≤sc×NsizeIf the quantization cannot be performed or the optimization target is met, stopping adjusting the current model quantization scheme;
s142, calculating the quantization channel index set of the ith layer
Figure BDA0002577587620000076
Wherein
Figure BDA0002577587620000077
For the channel quantization bit number of the c-th channel of the k-th layer in the model channel quantization scheme P,
Figure BDA0002577587620000078
is the minimum bit number of the channel of the ith layer of the c channel, I iThe number of input channels of the ith layer is; if QC is empty, remove i from QL, carry on S141 again, QL is the above-mentioned adjustable layer sequence;
s143, determining an adjusting channel c in the ith layer for quantization scheme adjustment, wherein the adjusting channel c is defined as the following formula
Figure BDA0002577587620000079
Where QC is the ith layer quantization index set,
Figure BDA0002577587620000081
the ith channel importance;
s144, adjusting the channel quantization bit number of the ith channel in the ith layer of the c channel in the model channel level quantization scheme P
Figure BDA0002577587620000082
The adjustment is defined as the following formula
Figure BDA0002577587620000083
Wherein the content of the first and second substances,
Figure BDA0002577587620000084
the minimum bit number of the c channel of the ith layer is adjusted
Figure BDA0002577587620000085
Saving to the model channel level quantization scheme P;
s145, updating the current model size N 'according to the post-adjustment model channel level quantization scheme P and the formula (8)'size
Further, the specific process of evaluating the current layer quantization scheme and storing the relevant environmental parameters is as follows:
the compression ratio sc is reduced by the size of the model, and the size N of the model to be compressedsizeNumber of layers N can be quantized, previous model N quantized, current model size N'sizeAnd, knowing the quantized model N', evaluating to obtain an evaluation result r of the quantization scheme of the current layer, wherein the evaluation result r is defined by the following formula:
Figure BDA0002577587620000086
wherein, acc (N) is the top-5 accuracy of the model to be optimized on the data of the verification set, and acc (N') is the top-5 accuracy of the quantized model on the data of the same verification set;
Calculating the current environment state s according to the reinforcement learning environment stateiPreservation of r, si
Updating the parameters of the reinforcement learning agent actor network theta and the evaluator network mu.
Further, the process of continuously searching the conditions and outputting the optimal model quantization scheme comprises the following steps:
s151: judging whether the current compressible layer is the last layer or not, if the current layer is not the last layer, i is not equal to n, entering the next layer, i is i +1, and the searching is not finished in the round, wherein i is the current layer number, and n is the quantifiable layer number of the model;
s152: on the contrary, if the current layer quantization scheme evaluation result is optimal, namely R is more than or equal to RbestThen the current model quantization scheme is saved as the optimal model quantization scheme, i.e. PbestP, and ending the current search round, i.e. the current search round number curreps is currpes + 1; otherwise, directly ending the search in the current round; wherein R isbestFor the optimal evaluation result, r is the evaluation result of the quantization scheme of the current layer model, PbestFor the optimal compression scheme, P is the current model quantization scheme.
Further, the multi-round searching process is as follows:
according to the search round number episodes, the current search round number currs does not reach the required search round number, namely currs < episodes, and S1 is repeatedly executed, otherwise, the model quantization scheme Finishing the search; finally, according to the obtained optimal compression scheme PbestAnd outputting the optimal compression model for the optimization target.
According to the technical scheme, compared with the prior art, the invention discloses a method for quickly and automatically compressing the deep convolutional neural network, which takes a trained model to be optimized (namely the deep convolutional neural network) and an optimization target (namely the performance requirement of a hardware platform) as input, performs network compression through a lightweight method, and outputs a compressed network which can maximally keep the performance of the original network under the condition of not violating the hardware requirement.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only embodiments of the invention, and that for a person skilled in the art, other drawings can be obtained from the provided drawings without inventive effort.
FIG. 1 is a flow chart of the method for fast and automatic compression of deep convolutional neural network according to the present invention.
FIG. 2 is a flow chart of an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method for quickly and automatically compressing a network of a deep convolutional neural network, which is characterized in that as shown in figure 1, the method for quickly and automatically compressing the deep convolutional neural network performs compression task scheduling under the control of global information by combining network branch reduction and network quantization into one compression step.
Referring to fig. 2, the network to be compressed in this embodiment is mobilene _ v1, the hardware requirement is to compress the size of the model to be below the original 1/5 and keep the accuracy of the model top-5 within the loss range of 2%, the maximum bit number of the quantized model is 8, and the mixing precision is supported, that is, the bit number of each reserved channel can be between 1 and 8. The verification dataset is ImageNet, the hardware is GPU GTX1080 pieces, and Python language and PyTorch development framework are used.
Step 1: obtaining N quantized layer sets N ═ L according to input network to be compressed1,...,Li,...,LnWhere I is 1 … n, the number of input channels I of the n quantifiable layers I1,...,InIn the same layer, the size of each layer LS is { LS ═ LS }1,...,LSn}. According to the hardware requirement, obtaining the maximum compressed bit number bitmaxSearching the round numbers episcodes, constructing a reinforcement learning environment, and initializing an actor network theta and an evaluator network mu in the reinforcement learning agent. Here, bitmax8, epsilon 100; theta and mu are neural networks comprising 2 hidden layers, each hidden layer comprises 300 neurons, the learning rate of theta is 0.001, and the learning rate of mu is 0.0001; the warm-up rounds of reinforcement learning agents are 40 rounds, the sampling size is 64, the buffer size of each layer is 10, and the discount factor is 1.0.
Step 2: and (3) judging whether the search is finished or not by using the total search round number episodies obtained in the step (1) and knowing the current exploration times currep. If curve is larger than or equal to episodes, searching is completed, and the optimal compressed network N ' ═ L ' is output '1,...,L′i,...,L′nWhere i is 1, …, n; otherwise, step 3 is carried out. n is the number of quantifiable layers in the network to be compressed, and the method does not delete the whole quantifiable layer.
And step 3: predicting the sparsity a of the current layer according to a formula (1) by using the actor network theta and the evaluator network mu in the reinforcement learning agent initialized in the step 1 i
Figure BDA0002577587620000111
Wherein s isiIn the reinforcement learning state, σ is the environmental noise in the reinforcement learning in the search environment, and is initialized to 0.95 and gradually decays to 0. a isiIs in the value range of (0, 1)]. In this example, a is the input layer and the convolution layer having a convolution kernel size of 3iIs 1.0.
And 4, step 4: the number I of the (I + 1) th layer input channels obtained in the step 1i+1Then the weight value L of the layeriCan be written
Figure BDA0002577587620000112
Where F is the layer filter, k is the filter kernel size and k is 1 when the i-th layer is a fully connected layer. The model channel importance is defined as imp ═ { impi | i ═ 1.., n }, then the ith layer channel importance is
Figure BDA0002577587620000113
Where the c channel importance impcCalculated using equation (2).
Figure BDA0002577587620000114
Wherein the content of the first and second substances,
Figure BDA0002577587620000115
and the filter weight is the v-th filter weight corresponding to the c-th channel.
And 5: using the maximum bit number bit of the compressed model obtained in the step 1 and the step 4maxAnd ith layer channel importance impiDefining the minimum bit number of the model channel as minBW ═ minBWiI 1., n }, then it can be calculated
Figure BDA0002577587620000116
Wherein, the minimum bit number of the channel of the ith layer of the c channel
Figure BDA0002577587620000117
Calculated using equation (3).
Figure BDA0002577587620000118
Wherein
Figure BDA0002577587620000119
Is an integer, the value range in this example is [1, 8 ].
Step 6: the ith layer setting is computed as the zero-bit channel index.
(I) Using the number of channels I in the layer obtained in step 1 iCreating a channel index sequence
CI={p|p=1,...,Ii};
(II) using the layer channel importance imp obtained in step 4 and step 6(I)iAnd the channel index sequence CI, and the index sequence CI is sorted according to the ascending order to obtain a sorted ascending index sequence CI';
(III) Using the number of channels I in the layer obtained in step 1 and step 3iAnd layer sparsity aiThe layer zero bit lane number cp is calculated using equation 4.
Figure BDA0002577587620000122
Wherein min (), max () are functions for solving the minimum value and the maximum value in the input sequence;
(IV) using the ascending ordered index sequence CI' obtained in steps 6(II) and 6(III) and the layered zero-bit lane number cp, the zero-bit lane index ZCI is obtained using equation (5).
ZCI=topk(CI′,cp) (5)
Where the topk (a, k) function returns the first k elements in the input sequence a.
And 7: a current layer channel quantization scheme is calculated.
(I) Using the maximum bit number bit of the compressed model obtained in the step 1maxAnd the number of input channels I of the layeriInitializing the layer channel level quantization scheme Pi={bitmax×1k|k=1,...,Ii}. Bit in this embodimentmaxIs 8.
(II) Using the zero-bit channel index ZCI and the channel-level quantization scheme P obtained in step 6 and step 7(I)iUpdating the layer channel-level quantization scheme P according to equation (6)i
Figure BDA0002577587620000121
(III) quantizing scheme P of current layer after updatingiSave to model channel level quantization scheme
P={Pi|i=1,...,n}。
And 8: quantizing scheme P for current layer channel level obtained in step 7iUpdating and maintaining an ambient state siWherein s isiThe tuple definition described by equation (7) is used. If the current layer is not the last layer, the evaluation result r is counted to be 0, and the step 3 is skipped; otherwise, entering step 9;
(idx,t,out,in,w,h,stride,k,reducedFLOPs,resFLOPs,reducedSize,restSize,ai-1) (7)
in the formula (7), idx is a layer index, t is a layer type, and there are two types, namely a convolutional layer and a fully-connected layer. out is the output channel and in is the input channel. w and h are the width and height of the input feature vector. stride and k are the step length of convolution layer convolution operation and the side length of convolution kernel, and stride and k are both 1 for the fully connected layer. reducedFLOPs is the amount of computation that the current compression strategy reduces, restFLOPs is the amount of model residual computation. rededsize is the model size that the current compression strategy reduces, and restSize is the model residual size. a isi-1Is the previous compressible layer sparsity.
And step 9: according to the quantization layer set N obtained in the step 1 and the step 7, the number I of input channels of each layer and the quantization scheme P of the current model channel level, the ith quantization layer is known
Figure BDA0002577587620000131
Where F is the layer filter. Then use equation (8) according toP update Current model size N'size
Figure BDA0002577587620000132
Wherein, IkThe number of channels in the k-th layer,
Figure BDA0002577587620000133
for the bit number of the c-th channel of the k-th layer in the model channel-level quantization scheme P,
Figure BDA0002577587620000134
Is the kth quantization layer LkThe c-th filter weight.
Step 10: according to the hardware condition requirements of the model size compression ratio sc in the step 1 and the step 9, the size N of the model to be compressedsizeCurrent model size N'sizeAnd judging whether the current model channel-level quantization scheme needs to be adjusted or not. N 'if the hardware condition is satisfied'size≤sc×NsizeThen, r is estimated using equation (9), and the environmental state s is preservediUpdating parameters of the reinforcement learning agent actor network theta and the evaluator network mu, and skipping to the step 2; otherwise, entering step 11;
Figure BDA0002577587620000135
wherein, acc (M) is top-5 accuracy rate which can be obtained by inputting the model M on the verification set. In this embodiment, the verification data set is ImageNet, 20000 pictures 224 × 224, and the Batch size is 60.
Step 11: the model channel-level quantization scheme is greedily adjusted.
(I) Determining a layer i for quantization scheme adjustment according to the size LS of each layer of the model obtained in step 1 by using formula (10):
Figure BDA0002577587620000141
(II) according to the step 1,number of I-th layer channels I in step 5 and step 7(I)iThe minimum channel bit number of the ith channel
Figure BDA0002577587620000142
And channel level quantization scheme PiObtaining a set of quantized channel indices
Figure BDA0002577587620000143
(III) calculating the importance imp of the ith layer channel according to the step 4 and the step 11 iAnd the quantization index set QC of the layer, and the channel c of the ith layer for bit number adjustment is determined by using formula 11.
Figure BDA0002577587620000144
(IV) calculating the minimum bit number minBW of the i-th layer channel according to the step 5iAnd (4) updating the layer quantization scheme by using the formula (12) and saving the layer quantization scheme to the layer P, the layer I and the channel c selected in the step 11(I) and the step II and the current quantization scheme P.
Figure BDA0002577587620000145
(V) jumping to step 9 to update the current model size N 'according to P'size
Step 12: for each compressible layer, obtaining a channel needing to be set to a zero bit; the minimum number of bits to which the channel can be quantized without being set to zero is then calculated. And finally, saving the current layer state and counting the middle layer evaluation result to be 0. When the last layer is processed, if the current channel quantization scheme meets the hardware requirement, evaluating the current channel compression scheme; if the current channel quantization scheme does not meet the hardware requirement, greedily adjusting the current channel quantization scheme to enable the current channel quantization scheme to meet the hardware requirement; after the evaluation is finished, adjusting the parameters of the reinforcement learning agent according to the evaluation result and the state; and training the reinforcement learning agent to correctly adjust the bit number of each channel through multi-round layer-by-layer search, and finally obtaining the optimal compression network.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for quickly and automatically compressing a deep convolutional neural network is characterized by comprising the steps of inputting an optimization target and a model to be optimized, initializing a reinforcement learning agent and an environment according to the model to be optimized and the optimization target, and searching an optimal model quantization scheme, wherein the specific search process comprises the following steps:
s1 single round search procedure:
s11: calculating a zero-bit channel index and a channel minimum bit number according to the channel importance;
s12: determining a current model quantization scheme according to the zero-bit channel index;
s13: calculating a post-quantization model size based on the current model quantization scheme;
s14: if the current compressible layer is the last layer and the current model quantization scheme does not meet the optimization target, circularly adjusting the current model quantization scheme according to the channel minimum bit number until the optimization target is met or quantization cannot be performed;
If the current compressible layer is not the last layer or the current model quantization scheme meets the optimization target, evaluating the current layer quantization scheme based on the quantized model size and storing relevant environmental parameters;
s15: continuing the search condition, judging whether to finish the current round of search, and further judging whether to finish the current round of search and whether to update the optimal model quantization scheme based on the current layer quantization scheme evaluation result;
s2 multiple search process: and S1 is repeatedly executed until the current search round number reaches the required search round number, the model quantization scheme search is finished, and the optimal compression model is output.
2. The method according to claim 1, wherein the model to be optimized comprises a quantization layer set N ═ { L ═ L1,...,Li,...,LnWhere I is 1 … n, n denotes the number of quantisable layers, and the set of quantisable layer input channel numbers I is { I }1,...,InThe required storage space set LS for quantized layers1,...,LSn};
The optimization objective comprises a compressed model maximum bit rate bitmaxSearch round number episcodes and model size compression ratio sc ∈ (0, 1)]Initializing TOP-5 accuracy acc (N) of a model to be optimized and optimal evaluation result RbestAnd an optimal model quantization scheme P best
When the reinforcement learning environment is initialized, the reinforcement learning state siIs defined as (idx, t, out, in, w, h, stride, k, rededFLOPs, resFLOPs, rededSize, restSize, a)i-1) Wherein idx is a layer index, t is a layer type including a convolutional layer and a fully-connected layer, out is the number of output channels, in is the number of input channels, w and h are the width and height of input feature vectors, stride and k are the step length of convolutional layer convolution operation and the side length of a convolutional kernel, stride and k are both 1 in the fully-connected layer, rededFLOPs are the calculated quantity reduced by the current compression strategy and are initialized to 0, restFLOPs are the model residual calculated quantity and are initialized to the model calculated quantity NFLOPsreducedSize is the model size reduced by the current compression strategy, initialized to 0, restSize isModel residual size, initialized to model size Nsize,ai-1For the previous compressible layer sparsity, initialize to 0;
the reinforcement learning agent includes an actor network θ, an evaluator network μ, and an environmental noise σ.
3. The method of claim 2, wherein the zero-bit channel index is determined according to channel importance, and the method comprises the following specific steps:
s111, creating a channel index sequence CI ═ p ═ 1, and I for the I-th layer according to the quantization layer input channel number set i},IiInputting the channel number for the ith quantized layer;
s112, according to the importance of each channel c of the ith layer
Figure FDA0002577587610000021
Sequencing the channel index sequences CI in an ascending order to obtain an index sequence CI' after the ascending order;
wherein the ith layer is the importance of the c channel
Figure FDA0002577587610000022
In the formula Ii+1The number of input channels for the i +1 th quantisable layer,
Figure FDA0002577587610000023
is the ith filter weight, which is the ith filter weight
Figure FDA0002577587610000024
Obtaining;
Figure FDA0002577587610000025
for channel importance of i-th layer, imp ═ impiI 1.. n } is the importance of the model channel, and n is the quantifiable layer number;
s113, inputting the channel number I according to the quantization layeriCalculating the channel number cp of the layer zero bit,
Figure FDA0002577587610000035
wherein min () and max () are functions for solving the minimum value and the maximum value in the input sequence, and the layer sparsity
Figure FDA0002577587610000031
siIn the reinforcement learning state, θ is the actor network, μ is the evaluator network, σ is the environmental noise, aiIs in the value range of (0, 1)];
S114, calculating a zero-bit channel index ZCI based on the ascending-ordered index sequence CI' and the zero-bit channel number cp:
ZCI=topk(CI′,cp) (3)。
4. the method according to claim 3, wherein the minimum number of channel bits minBW is calculated by the following formula:
minBW={minBWi|i=1,...,n} (4)
Figure FDA0002577587610000032
Figure FDA0002577587610000033
wherein, minBWiIndicates the minimum bit number, minBW, of the i-th layer channel i cRepresents the minimum channel bit number, bit, of the ith layer of the c channelmaxFor the maximum number of bits of the model after compression,
Figure FDA0002577587610000034
the ith channel importance.
5. The method for fast and automatically compressing the deep convolutional neural network as claimed in claim 4, wherein the step S12 is as follows:
s121, initializing a channel level quantization scheme Pi={bitmax×1k|k=1,...,Ii},bitmaxFor the maximum number of bits of the compressed model, IiThe number of input channels of the ith layer is;
s122, updating the channel quantization bit number according to the zero-bit channel index
Figure FDA0002577587610000041
Further obtaining the updated current layer quantization scheme PiThe calculation formula is as follows:
Figure FDA0002577587610000042
wherein ZCI is zero-bit channel index, and c has a value range of [1, Ii],IiThe number of input channels of the ith layer is;
s123, saving the updated current-layer quantization scheme Pi to the model channel-level quantization scheme P ═ Pk1.,. i }.
6. The method of claim 5, wherein the model size N 'after S13 quantization'sizeThe specific calculation formula is as follows:
Figure FDA0002577587610000043
wherein the content of the first and second substances,
Figure FDA0002577587610000044
for the c-th filter weight of the k-th layer, according to the k-th quantized layer weight
Figure FDA0002577587610000045
So as to obtain the compound with the characteristics of,
Figure FDA0002577587610000046
for the channel quantization bit number of the c channel at the k layer in the model channel level quantization scheme P, the (I + 1) th layer to the n layer are layers which are not quantized, the channel quantization bit numbers of all the channels are determined by the model to be optimized and take the value of 32, I iThe number of input channels of the ith layer.
7. The method of claim 6, wherein if the current compressible layer is the last layer and the current model quantization scheme does not satisfy the optimization, i ═ N and N'size>sc×NsizeN is the quantifiable number of layers, N'sizeFor quantifying the model size, sc is the model size compression ratio, NsizeIf the model size to be compressed is, adjusting the current model quantization scheme, wherein the specific adjustment steps and termination conditions are as follows:
s141, determining an adjustment layer for quantization schemei
Figure FDA0002577587610000051
Wherein LSiFor the ith layer size of the model to be optimized, the sequence of layers QL may be initialized to QL ═ { 1., N }, if QL ═ Φ or N'size≤sc×NsizeIf the quantization can not be carried out or the optimization target is met, stopping adjusting the current model quantization scheme;
s142, calculating a quantization channel index set of the adjustment layer i
Figure FDA0002577587610000052
Wherein
Figure FDA0002577587610000053
For the channel quantization bit number of the c-th channel of the k-th layer in the model channel-level quantization scheme P,
Figure FDA0002577587610000054
is the minimum bit number of the channel of the ith layer of the c channel, IiThe number of input channels of the ith layer,
if QC is empty, remove the adjustable layer i from QL, carry on S141 again, QL is the adjustable layer sequence;
s143, determining an adjustment channel c in the ith layer for quantization scheme adjustment, where the adjustment channel c is defined as the following formula:
Figure FDA0002577587610000055
Where QC is the ith layer quantization index set,
Figure FDA0002577587610000056
the ith channel importance;
s144, adjusting the channel quantization bit number of the ith channel in the ith layer of the c channel in the model channel level quantization scheme P
Figure FDA0002577587610000057
The adjustment is defined as the following formula:
Figure FDA0002577587610000058
wherein the content of the first and second substances,
Figure FDA00025775876100000510
the minimum bit number of the c channel of the ith layer is adjusted
Figure FDA0002577587610000059
Saving to the model channel level quantization scheme P;
s145, updating the size N 'of the current model according to the post-adjustment model channel level quantization scheme P'size
8. The method of claim 7, wherein the specific process of evaluating the quantization scheme of the current layer and saving the relevant environmental parameters comprises:
according to the compression ratio sc of the size of the model, the size N of the model to be compressedsizeThe number of layers N, the model N to be optimized, the current model size N 'can be quantized'sizeAnd a known quantized model N' is evaluated to obtain an evaluation result r of the quantization scheme of the current layer, and the specific calculation formula is as follows:
Figure FDA0002577587610000061
wherein, acc (N) is the top-5 accuracy of the model to be optimized on the data of the verification set, and acc (N') is the top-5 accuracy of the quantized model on the data of the same verification set;
calculating the current environment state s according to the reinforcement learning environment stateiPreservation of r, si
Updating the parameters of the reinforcement learning agent actor network theta and the evaluator network mu.
9. The method according to claim 8, wherein the S15 specifically includes:
s151: judging whether the current compressible layer is the last layer or not, if the current layer is not the last layer, i is not equal to n, entering the next layer i which is i +1, and the searching is not finished in the round, wherein i is the current layer number, and n is the model quantifiable layer number;
s152: otherwise, judging whether the evaluation result of the quantization scheme of the current layer is optimal or not, if so, R is more than or equal to RbestSaving the current model quantization scheme as an optimal model quantization scheme, PbestIf the search result is P, the search of the current round is ended, and if the search result is not P, the search of the current round is ended directly; wherein R isbestFor the optimal evaluation result, r is the evaluation result of the quantization scheme of the current layer model, PbestFor the optimal compression scheme, P is the current model quantization scheme.
10. The method of claim 9, wherein the multiple rounds of search processes are as follows:
judging whether the current search round number curreps meets curreps < epsilon foods or not according to the search round number epsilon foods or not, and not meeting the requirement of repeatedly executing the S1, otherwise, finishing the model quantization scheme search and according to the optimal compression scheme P bestAnd outputting the optimal compression model aiming at the optimization target.
CN202010659862.0A 2020-07-09 2020-07-09 Rapid automatic compression method for deep convolutional neural network Pending CN111860779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010659862.0A CN111860779A (en) 2020-07-09 2020-07-09 Rapid automatic compression method for deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010659862.0A CN111860779A (en) 2020-07-09 2020-07-09 Rapid automatic compression method for deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN111860779A true CN111860779A (en) 2020-10-30

Family

ID=73153670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010659862.0A Pending CN111860779A (en) 2020-07-09 2020-07-09 Rapid automatic compression method for deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN111860779A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329923A (en) * 2020-11-24 2021-02-05 杭州海康威视数字技术股份有限公司 Model compression method and device, electronic equipment and readable storage medium
CN113282535A (en) * 2021-05-25 2021-08-20 北京市商汤科技开发有限公司 Quantization processing method and device and quantization processing chip
CN113627593A (en) * 2021-08-04 2021-11-09 西北工业大学 Automatic quantification method of target detection model fast R-CNN
CN113657592A (en) * 2021-07-29 2021-11-16 中国科学院软件研究所 Software-defined satellite self-adaptive pruning model compression method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329923A (en) * 2020-11-24 2021-02-05 杭州海康威视数字技术股份有限公司 Model compression method and device, electronic equipment and readable storage medium
CN112329923B (en) * 2020-11-24 2024-05-28 杭州海康威视数字技术股份有限公司 Model compression method and device, electronic equipment and readable storage medium
CN113282535A (en) * 2021-05-25 2021-08-20 北京市商汤科技开发有限公司 Quantization processing method and device and quantization processing chip
CN113282535B (en) * 2021-05-25 2022-11-25 北京市商汤科技开发有限公司 Quantization processing method and device and quantization processing chip
CN113657592A (en) * 2021-07-29 2021-11-16 中国科学院软件研究所 Software-defined satellite self-adaptive pruning model compression method
CN113657592B (en) * 2021-07-29 2024-03-05 中国科学院软件研究所 Software-defined satellite self-adaptive pruning model compression method
CN113627593A (en) * 2021-08-04 2021-11-09 西北工业大学 Automatic quantification method of target detection model fast R-CNN
CN113627593B (en) * 2021-08-04 2024-06-04 西北工业大学 Automatic quantization method for target detection model Faster R-CNN

Similar Documents

Publication Publication Date Title
CN111860779A (en) Rapid automatic compression method for deep convolutional neural network
CN109002889B (en) Adaptive iterative convolution neural network model compression method
WO2021185125A1 (en) Fixed-point method and apparatus for neural network
CN112052951B (en) Pruning neural network method, system, equipment and readable storage medium
CN112733964B (en) Convolutional neural network quantization method for reinforcement learning automatic perception weight distribution
CN111489364B (en) Medical image segmentation method based on lightweight full convolution neural network
CN112446491B (en) Real-time automatic quantification method and real-time automatic quantification system for neural network model
CN110969251A (en) Neural network model quantification method and device based on label-free data
CN113132723B (en) Image compression method and device
TWI761813B (en) Video analysis method and related model training methods, electronic device and storage medium thereof
CN113269312B (en) Model compression method and system combining quantization and pruning search
CN110647990A (en) Cutting method of deep convolutional neural network model based on grey correlation analysis
CN114943335A (en) Layer-by-layer optimization method of ternary neural network
CN112819050A (en) Knowledge distillation and image processing method, device, electronic equipment and storage medium
CN114154626B (en) Filter pruning method for image classification task
KR102454420B1 (en) Method and apparatus processing weight of artificial neural network for super resolution
CN114239799A (en) Efficient target detection method, device, medium and system
CN114140641A (en) Image classification-oriented multi-parameter self-adaptive heterogeneous parallel computing method
US6813390B2 (en) Scalable expandable system and method for optimizing a random system of algorithms for image quality
CN116306879A (en) Data processing method, device, electronic equipment and storage medium
CN113177627B (en) Optimization system, retraining system, method thereof, processor and readable medium
CN114118357A (en) Retraining method and system for replacing activation function in computer visual neural network
CN114548360A (en) Method for updating artificial neural network
CN112529350A (en) Developer recommendation method for cold start task
Wang et al. Exploring quantization in few-shot learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030

RJ01 Rejection of invention patent application after publication