CN110020718A - Layer-by-layer neural network pruning method and system based on variational inference - Google Patents

Layer-by-layer neural network pruning method and system based on variational inference

Info

Publication number
CN110020718A
CN110020718A CN201910195272.4A
Authority
CN
China
Prior art keywords
neural network
noise
layer
training
indicate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910195272.4A
Other languages
Chinese (zh)
Inventor
王延峰
周越夫
张娅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201910195272.4A priority Critical patent/CN110020718A/en
Publication of CN110020718A publication Critical patent/CN110020718A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a layer-by-layer neural network pruning method and system based on variational inference. The method comprises: injecting noise into a neural network by sampling, to obtain a noise-injected neural network; training the weights of the noise-injected neural network according to a preset objective function, to obtain the trained neural network weights and the trained neural network; training the noise parameters corresponding to the injected multiplicative Gaussian noise according to the variational lower bound obtained by variational inference, to obtain the trained noise parameters; and, based on the trained noise parameters and the trained neural network weights, deleting the corresponding neurons or convolution kernels in the trained neural network layer by layer through a threshold function. In the method of the present invention, the injected noise incorporates the hierarchical relationships of the neural network during training, so that the dependencies between layers are fully considered during pruning, which ensures that the output of the neural network remains robust under heavy pruning.

Description

Layer-by-layer neural network pruning method and system based on variational inference
Technical field
The present invention relates to the field of deep learning, and in particular to a layer-by-layer neural network pruning method and system based on variational inference.
Background art
In the field of deep learning, the powerful function-fitting ability of neural networks enables them to perform well on tasks such as image classification, object detection, and speech recognition. However, when the number of weights in a neural network is very large, substantial storage and computation costs are typically incurred, which makes such networks difficult to deploy on devices with limited computing resources, such as mobile terminals and embedded boards. To compress and accelerate neural networks, practitioners have proposed schemes such as pruning, quantization, and low-rank decomposition.
Pruning is currently the mainstream approach to neural network compression and acceleration. Its main idea is to delete the neurons or convolution kernels that have the least influence on the output of the network. Algorithms based on this idea typically locate and delete neurons or convolution kernels using regularization terms, the Hessian matrix, and so on. However, achieving large-scale pruning while preserving the accuracy of the neural network remains a research challenge. Recently, some pruning algorithms have achieved a good balance between accuracy and compression ratio, but most of them are heuristic designs that lack theoretical support. With the gradual development of the Bayesian framework in deep learning, variational inference, as a form of approximate Bayesian inference, has begun to attract attention. Behind its good performance on particular tasks lies a complete mathematical foundation. To give pruning algorithms this theoretical basis, some of the latest research has attempted to achieve heavy pruning with low accuracy loss by injecting optimizable sparsity-inducing noise into the neural network.
However, to reduce computational difficulty, the above methods often ignore the relationships between layers of the neural network, which in turn limits the achievable compression ratio and accuracy.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a layer-by-layer neural network pruning method and system based on variational inference.
In a first aspect, an embodiment of the present invention provides a layer-by-layer neural network pruning method based on variational inference, comprising:
injecting noise into a neural network by sampling, to obtain a noise-injected neural network;
training the weights of the noise-injected neural network according to a preset objective function, to obtain the trained neural network weights and the trained neural network;
training the noise parameters corresponding to the injected multiplicative Gaussian noise according to the variational lower bound obtained by variational inference, to obtain the trained noise parameters;
based on the trained noise parameters and the trained neural network weights, deleting the corresponding neurons or convolution kernels in the trained neural network layer by layer through a threshold function.
Optionally, injecting noise into the neural network by sampling to obtain the noise-injected neural network comprises:
The input in the j-th dimension of layer l of the neural network is multiplied by the injected noise, where the noise injected in the j-th dimension of layer l follows the Gaussian distribution:
ε follows the standard Gaussian distribution:
where r_j^l is the noise parameter corresponding to the injected noise. The noise injected in different dimensions is mutually independent. Optionally, if the input is a multi-channel feature map from a convolutional layer, the j-th dimension can be defined as the feature map of the j-th channel.
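For illustration, the following is a minimal PyTorch sketch of the multiplicative noise injection described above. The exact Gaussian parameterization appears only in the original figures, so the choice of a learnable mean r and log-variance per dimension, as well as the module and parameter names, are assumptions made for this sketch.

```python
import torch
import torch.nn as nn


class MultiplicativeGaussianNoise(nn.Module):
    """Multiplicative Gaussian noise injected into one layer's input.

    Assumption: the noise in dimension j is N(r_j, var_j) with a learnable
    mean r_j and log-variance; the patent's exact parameterization is given
    in figures not reproduced in the text.
    """

    def __init__(self, num_features):
        super().__init__()
        self.r = nn.Parameter(torch.ones(num_features))                  # noise mean per dimension (assumed)
        self.log_var = nn.Parameter(torch.full((num_features,), -4.0))   # noise log-variance per dimension (assumed)

    def forward(self, x):
        # epsilon follows the standard Gaussian distribution N(0, 1)
        eps = torch.randn_like(self.r)
        noise = self.r + self.log_var.exp().sqrt() * eps                 # reparameterization trick
        if x.dim() == 4:                                                 # conv input: one noise value per channel
            noise = noise.view(1, -1, 1, 1)
        return x * noise
```

A layer input x with num_features dimensions or channels would then be wrapped as MultiplicativeGaussianNoise(num_features)(x) before the layer's own computation.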
Optionally, training the weights of the noise-injected neural network according to the preset objective function to obtain the trained neural network weights and the trained neural network comprises:
Assuming that the weights of the neural network are W and the noise injected into layer l of the neural network is θ_l, the optimal solution W* for the neural network weights is computed as follows:
where p(y | x, W, θ_l) denotes the probability that the neural network outputs y when the input data is x, the weights are W, and the injected noise is θ_l; D denotes the training data set, in which x is the input data and y is the label; the remaining symbols denote the output value of the L-layer neural network in dimension y, and the output value of the L-layer neural network in dimension y', where y' ranges over all dimensions of the L-layer neural network.
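The weight-training formula itself is given only as a figure in the original; read together with the symbol definitions above (an output value in dimension y normalized over all dimensions y'), it matches the usual softmax likelihood, so a hedged training-step sketch could look as follows. The model, data_loader, and optimizer names are placeholders, not part of the patent.

```python
import torch
import torch.nn.functional as F


def train_weights(model, data_loader, optimizer, device="cpu"):
    """One epoch of weight training for the noise-injected network.

    Assumes (the standard reading of the symbol definitions above) that
    p(y | x, W, theta_l) is the softmax of the network outputs, so maximizing
    the log-likelihood over the training set D is ordinary cross-entropy
    minimization with noise sampled during the forward pass.
    """
    model.train()
    for x, y in data_loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)                    # forward pass samples the injected noise
        loss = F.cross_entropy(logits, y)    # = -log softmax probability of the label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```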
Optionally, training the noise parameters corresponding to the injected multiplicative Gaussian noise according to the variational lower bound obtained by variational inference, to obtain the trained noise parameters, comprises:
The variational lower bound to be maximized is computed as follows:
where the left-hand side denotes the variational lower bound to be maximized, r^l denotes the noise parameters of layer l, J_l denotes the input dimension of layer l, and the Kullback-Leibler divergence is taken from the approximate posterior of the noise to its prior; the remaining symbols denote, respectively, the probability density function of the injected noise, the prior over the noise of layer l in the j-th dimension, and the noise injected in the j-th dimension of layer l;
The Gaussian prior distribution that induces sparsity is set as follows:
where ℝ denotes the set of real numbers and σ denotes the standard deviation;
The trained noise parameters are obtained as follows:
where the left-hand side denotes the optimal solution of r^l, whose components run from its first element to its last element, and arg max denotes the maximization operation.
After training is complete, the learned noise distribution is the approximate posterior of the noise. To ensure that sparsity is successfully induced in this distribution, a very small variance is chosen for the designed prior distribution, for example σ² = 0.01.
To facilitate the computation of the variational lower bound, the Kullback-Leibler divergence can be simplified as follows:
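As a rough illustration of the noise-parameter objective, the sketch below combines the data log-likelihood with a Kullback-Leibler penalty from the noise posterior to the sparsity-inducing prior N(0, σ²) with σ² = 0.01 as stated above. The closed form used here is the standard KL between two univariate Gaussians and is an assumption; the patent's own simplified expression appears only in a figure and may differ.

```python
import math

import torch
import torch.nn.functional as F


def kl_to_sparse_prior(mu_q, log_var_q, prior_var=0.01):
    """KL( N(mu_q, var_q) || N(0, prior_var) ), element-wise.

    Standard closed form for two univariate Gaussians (an assumption; the
    patent's simplified expression is shown only in a figure).
    """
    var_q = log_var_q.exp()
    return (0.5 * math.log(prior_var) - 0.5 * log_var_q
            + (var_q + mu_q ** 2) / (2.0 * prior_var) - 0.5)


def noise_parameter_loss(model, noise_layer, x, y):
    """Negative variational lower bound for one mini-batch (to be minimized).

    `noise_layer` is assumed to expose the learnable noise mean `r` and
    log-variance `log_var` of the current layer, as in the earlier sketch.
    In practice the KL term would be weighted against the data-set size.
    """
    logits = model(x)
    nll = F.cross_entropy(logits, y)                                  # -E_q[ log p(y | x, W, theta_l) ]
    kl = kl_to_sparse_prior(noise_layer.r, noise_layer.log_var).sum()
    return nll + kl
```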
Optionally, deleting the corresponding neurons or convolution kernels in the trained neural network layer by layer through the threshold function, based on the trained noise parameters and the trained neural network weights, comprises:
After the weights and noise parameters of layer l of the neural network have been trained, the injected noise is modified by the following threshold function:
where the modified term denotes the noise injected in the j-th dimension of layer l, and r_j^l denotes the j-th element of the noise parameters of layer l;
Pruning is then applied to the neurons or convolution kernels of layer l using the modified injected noise, computed as follows:
where the left-hand side of the equation is the j-th neuron or convolution kernel of layer l after sparsification, and the right-hand side denotes the j-th neuron or convolution kernel before sparsification.
Through the above sparsification, part of the weights of the neural network are set to 0, and the corresponding inputs are discarded and no longer participate in the computation of the neural network, thereby realizing the deletion of neurons or convolution kernels, i.e., the effect of pruning. The procedure starts from the first layer of the neural network and ends at the last layer, and finally outputs the pruned neural network. Because of the designed layer-by-layer form of training and pruning, the injected noise incorporates the hierarchical relationships of the neural network during training, so that the dependencies between layers are fully considered during pruning, which ensures that the output of the neural network remains robust under heavy pruning.
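The threshold function itself is given only as a figure; the sketch below therefore assumes a rule consistent with the embodiment text: noise dimensions whose trained parameter magnitude falls below a threshold are zeroed (deleting the corresponding neuron or channel), while the surviving noise values are folded into the layer weights so that no sampling is needed afterwards. The function name and threshold value are illustrative, not defined by the patent.

```python
import torch


@torch.no_grad()
def prune_layer(weight, noise_layer, threshold=1e-2):
    """Threshold-based pruning of one layer using its trained noise parameters.

    Assumed rule: noise dimensions with |r_j| below the threshold are set to
    0, deleting the corresponding neuron or channel; surviving values of r_j
    are folded into the weights, so the noise no longer needs to be sampled.
    """
    r = noise_layer.r.detach().clone()
    keep = r.abs() >= threshold
    r[~keep] = 0.0

    # For an nn.Linear weight of shape (out_features, in_features), input
    # dimension j scales column j (the patent states the equivalent rule in
    # terms of the j-th row under its own weight convention).
    sparsified = weight * r.unsqueeze(0)

    kept_idx = keep.nonzero(as_tuple=True)[0]    # indices of surviving inputs / channels
    return sparsified, kept_idx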
In a second aspect, an embodiment of the present invention provides a layer-by-layer neural network pruning system based on variational inference, comprising a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is invoked by the processor, the processor executes the layer-by-layer neural network pruning method based on variational inference according to any one of the first aspect.
Compared with the prior art, the present invention has the following beneficial effects:
The layer-by-layer neural network pruning method and system based on variational inference provided by the present invention adopt a layer-by-layer mode of training and pruning, which exploits the dependencies between layers of the neural network. The relationships between layers are incorporated into the noise parameters when training is complete, so that redundant neurons or convolution kernels are located more precisely, which ensures that the output of the neural network remains robust under heavy pruning.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a schematic diagram of the layer-by-layer neural network pruning method based on variational inference provided by the present invention;
Fig. 2 is a structural schematic diagram of the layer-by-layer neural network pruning device based on variational inference provided by the present invention.
Specific embodiment
The present invention is described in detail below in combination with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be pointed out that, for those of ordinary skill in the art, several changes and improvements can also be made without departing from the inventive concept; these all belong to the protection scope of the present invention.
As shown in Fig. 1, which is a flow chart of an embodiment of the layer-by-layer neural network pruning method based on variational inference of the present invention, the method injects the designed Gaussian noise into the neural network in a noise injection step, trains the network weights and the noise parameters for the particular task in a neural network weight training step and a noise parameter training step, respectively, and finally, in a pruning step, realizes pruning through a threshold function using the trained noise parameters. The above process is repeated layer by layer, starting from the first layer of the neural network and ending at the last layer.
The present invention can realize heavy pruning of a neural network with low accuracy loss. Because it is based on variational inference, the present invention inherits the complete theoretical foundation of the Bayesian framework; at the same time, because of the designed layer-by-layer mode of training and pruning, it exploits, on this theoretical basis, the dependencies between layers of the neural network. These inter-layer relationships are incorporated into the noise parameters when training is complete, so that redundant neurons or convolution kernels can be located more precisely.
Specifically, referring to Fig. 1, the method comprises the following steps:
Noise injection step: a multiplicative Gaussian noise is designed and injected into the neural network by sampling;
Neural network weight training step: the neural network weights are trained according to the task-specific objective function;
Noise parameter training step: the parameters of the injected noise are trained according to the variational lower bound obtained by variational inference;
Pruning step: according to the trained noise parameters, the corresponding neurons or convolution kernels in the trained neural network are deleted through a threshold function.
Corresponding to the above method, and referring to Fig. 2, a layer-by-layer neural network pruning device based on variational inference comprises:
Noise injection module: a multiplicative Gaussian noise is designed and injected into the neural network by sampling;
Neural network weight training module: the neural network weights are trained according to the task-specific objective function;
Noise parameter training module: the parameters of the injected noise are trained according to the variational lower bound obtained by variational inference;
Pruning module: according to the trained noise parameters, the corresponding neurons or convolution kernels in the trained neural network are deleted through a threshold function.
The specific implementation of each of the above steps and modules is described in detail below, to clarify the technical solution of the present invention.
In some embodiments of the present invention, in the noise injection step: for any given layer of the neural network, a multiplicative Gaussian noise with the same dimension as the input of that layer is designed; when the neural network performs its computation, this noise is sampled and injected into the input of that layer by multiplication. To facilitate sampling, the designed noise can be converted into an equivalent parameterization.
In some embodiments of the present invention, in the neural network weight training step: for any training data set, the training objective of the neural network weights is to maximize the probability of outputting the corresponding label given the known input data.
In some embodiments of the present invention, in the noise parameter training step: the variational lower bound is computed using variational inference, and the training objective of the noise parameters is to maximize this variational lower bound. The trained noise is an approximate solution of its posterior distribution.
In some embodiments of the present invention, the pruning step is specifically as follows: the trained noise is input into the threshold function to obtain a new noise, and the neural network weights of the corresponding layer are then sparsified by the new noise in the form of multiplication, realizing the effect of pruning. After the sparsification operation, the input of that layer no longer needs to be multiplied by noise.
As shown in Fig. 2, the pruning system consists of a noise injection module, a neural network weight training module, a noise parameter training module, and a pruning module, and the whole system framework can be trained end to end.
In the system framework of the embodiment shown in Fig. 2, for layer l of the neural network, the Gaussian noise is sampled and then multiplied by the j-th neuron or input channel of that layer, with the noise distributed as
where r_j^l is the noise parameter, and the noise terms are mutually independent for different j.
In the system framework of the embodiment shown in Fig. 2, the objective function for training the noise parameters is
where D is the training data set, x is the input data, y is the label, W are the neural network weights, θ_l is the set of noise terms of layer l, J_l is the input dimension or channel number of layer l, and the Kullback-Leibler divergence is computed as
The training objective of the noise parameters is then
In the system framework of the embodiment shown in Fig. 2, the objective function for training the neural network weights is the maximum likelihood of the training data, computed as
where the corresponding symbol is the output value of the trained L-layer neural network for category y. The training objective of the neural network weights is then
The training of the noise parameters starts from the first layer of the neural network. When the objective function converges, the update of the noise parameters of the current layer is terminated, and the parameters are passed through the following threshold function:
Further, since the noise assigned by the above threshold function is no longer a random variable, sampling is not required in the subsequent training process. In particular, in the case where the noise is directly set to 0, the corresponding neuron or channel is considered redundant and is explicitly removed, and the convolution kernels that produce these channels can also be removed directly, thereby realizing the pruning effect on this layer of the neural network; in the other case, to reduce the overhead of storing parameters and to facilitate the computation of the neural network, the input of this layer is no longer multiplied by noise, and the weights of this layer are instead directly sparsified by the noise, computed as follows:
where the corresponding symbol is the j-th row of the weights of layer l of the neural network.
In the system framework of the embodiment shown in Fig. 2, after the noise parameter training of the current layer ends and the network weights of the current round are retrained and pruned, the entire training and pruning process is applied to the next layer of the neural network, until the last layer of the neural network. This layer-by-layer form of training and pruning allows the injected noise to incorporate the hierarchical relationships of the neural network during training. A rough orchestration of this loop is sketched below.
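Putting the previous sketches together, the layer-by-layer procedure could be orchestrated roughly as follows; train_weights, noise_parameter_loss, and prune_layer are the hypothetical helpers introduced above, and all argument names are assumptions made for illustration, not functions or interfaces defined by the patent.

```python
def layerwise_prune(model, noise_layers, weights, data_loader,
                    weight_optimizer, noise_optimizer,
                    num_epochs=5, threshold=1e-2):
    """Layer-by-layer training and pruning loop (sketch).

    `noise_layers[l]` and `weights[l]` are the noise module and weight tensor
    of layer l; in practice `noise_optimizer` would be rebuilt per layer over
    that layer's noise parameters only.
    """
    for l, noise_layer in enumerate(noise_layers):
        # 1. Train the weights of the noise-injected network for the task.
        for _ in range(num_epochs):
            train_weights(model, data_loader, weight_optimizer)

        # 2. Train the noise parameters of layer l by maximizing the
        #    variational lower bound (minimizing its negative) until convergence.
        for x, y in data_loader:
            loss = noise_parameter_loss(model, noise_layer, x, y)
            noise_optimizer.zero_grad()
            loss.backward()
            noise_optimizer.step()

        # 3. Prune layer l with the threshold function and fold the surviving
        #    noise into the weights; subsequent layers then see the pruned layer.
        weights[l].data, _ = prune_layer(weights[l].data, noise_layer, threshold)
    return model
```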
In summary, the present invention uses the injected noise as an indicator of redundancy: the approximate posterior distribution of the noise is obtained by variational inference, after which the redundant neurons or convolution kernels are located through the threshold function and pruned. At the same time, since training and pruning are carried out layer by layer, the injected noise incorporates the hierarchical relationships of the neural network during training, so that the dependencies between layers are fully considered during pruning, which ensures that the output of the neural network remains robust under heavy pruning.
It should be noted that the steps in the layer-by-layer neural network pruning method based on variational inference provided by the present invention can be implemented by the corresponding modules, devices, units, etc. in the layer-by-layer neural network pruning system based on variational inference; those skilled in the art may refer to the technical solution of the system to implement the steps of the method, i.e., the embodiments of the system can be regarded as preferred examples for implementing the method, which will not be described in detail here.
Those skilled in the art will appreciate that, in addition to implementing the system provided by the present invention and its devices purely as computer-readable program code, the method steps can be logically programmed so that the system provided by the present invention and its devices achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system provided by the present invention and its devices can be regarded as a kind of hardware component, and the devices included therein for realizing various functions can also be regarded as structures within the hardware component; the devices for realizing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the above specific embodiments, and those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substantive content of the present invention. In case of no conflict, the features in the embodiments of the present application may be combined with one another in any manner.

Claims (6)

1. A layer-by-layer neural network pruning method based on variational inference, characterized by comprising:
injecting noise into a neural network by sampling, to obtain a noise-injected neural network;
training the weights of the noise-injected neural network according to a preset objective function, to obtain the trained neural network weights and the trained neural network;
training the noise parameters corresponding to the injected multiplicative Gaussian noise according to the variational lower bound obtained by variational inference, to obtain the trained noise parameters;
based on the trained noise parameters and the trained neural network weights, deleting the corresponding neurons or convolution kernels in the trained neural network layer by layer through a threshold function.
2. The layer-by-layer neural network pruning method based on variational inference according to claim 1, characterized in that injecting noise into the neural network by sampling to obtain the noise-injected neural network comprises:
the input in the j-th dimension of layer l of the neural network is multiplied by the injected noise, where the noise injected in the j-th dimension of layer l follows the Gaussian distribution:
ε follows the standard Gaussian distribution:
where r_j^l is the noise parameter corresponding to the injected noise.
3. The layer-by-layer neural network pruning method based on variational inference according to claim 1, characterized in that training the weights of the noise-injected neural network according to the preset objective function to obtain the trained neural network weights and the trained neural network comprises:
assuming that the weights of the neural network are W and the noise injected into layer l of the neural network is θ_l, the optimal solution W* for the neural network weights is computed as follows:
where p(y | x, W, θ_l) denotes the probability that the neural network outputs y when the input data is x, the weights are W, and the injected noise is θ_l; D denotes the training data set, in which x is the input data and y is the label; the remaining symbols denote the output value of the L-layer neural network in dimension y, and the output value of the L-layer neural network in dimension y', where y' ranges over all dimensions of the L-layer neural network.
4. The layer-by-layer neural network pruning method based on variational inference according to claim 3, characterized in that training the noise parameters corresponding to the injected multiplicative Gaussian noise according to the variational lower bound obtained by variational inference, to obtain the trained noise parameters, comprises:
the variational lower bound to be maximized is computed as follows:
where the left-hand side denotes the variational lower bound to be maximized, r^l denotes the noise parameters of layer l, J_l denotes the input dimension of layer l, and the Kullback-Leibler divergence is taken from the approximate posterior of the noise to its prior; the remaining symbols denote, respectively, the probability density function of the injected noise, the prior over the noise of layer l in the j-th dimension, and the noise injected in the j-th dimension of layer l;
the Gaussian prior distribution that induces sparsity is set as follows:
where ℝ denotes the set of real numbers and σ denotes the standard deviation;
the trained noise parameters are then obtained as follows:
where the left-hand side denotes the optimal solution of r^l, whose components run from its first element to its last element, and arg max denotes the maximization operation.
5. The layer-by-layer neural network pruning method based on variational inference according to any one of claims 1 to 4, characterized in that deleting the corresponding neurons or convolution kernels in the trained neural network layer by layer through the threshold function, based on the trained noise parameters and the trained neural network weights, comprises:
after the weights and noise parameters of layer l of the neural network have been trained, the injected noise is modified by the following threshold function:
where the modified term denotes the noise injected in the j-th dimension of layer l, and r_j^l denotes the j-th element of the noise parameters of layer l;
pruning is then applied to the neurons or convolution kernels of layer l using the modified injected noise, computed as follows:
where the left-hand side of the equation is the j-th neuron or convolution kernel of layer l after sparsification, and the right-hand side denotes the j-th neuron or convolution kernel before sparsification.
6. A layer-by-layer neural network pruning system based on variational inference, characterized by comprising a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is invoked by the processor, the processor executes the layer-by-layer neural network pruning method based on variational inference according to any one of claims 1 to 5.
CN201910195272.4A 2019-03-14 2019-03-14 The layer-by-layer neural networks pruning method and system inferred based on variation Pending CN110020718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910195272.4A CN110020718A (en) 2019-03-14 2019-03-14 The layer-by-layer neural networks pruning method and system inferred based on variation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910195272.4A CN110020718A (en) 2019-03-14 2019-03-14 The layer-by-layer neural networks pruning method and system inferred based on variation

Publications (1)

Publication Number Publication Date
CN110020718A true CN110020718A (en) 2019-07-16

Family

ID=67189619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910195272.4A Pending CN110020718A (en) 2019-03-14 2019-03-14 The layer-by-layer neural networks pruning method and system inferred based on variation

Country Status (1)

Country Link
CN (1) CN110020718A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766044A (en) * 2019-09-11 2020-02-07 浙江大学 Neural network training method based on Gaussian process prior guidance
CN110826713A (en) * 2019-10-25 2020-02-21 广州思德医疗科技有限公司 Method and device for acquiring special convolution kernel
CN111242287A (en) * 2020-01-15 2020-06-05 东南大学 Neural network compression method based on channel L1 norm pruning
CN111582446A (en) * 2020-04-28 2020-08-25 北京达佳互联信息技术有限公司 System for neural network pruning and neural network pruning processing method
CN112215353A (en) * 2020-09-29 2021-01-12 电子科技大学 Channel pruning method based on variational structure optimization network
WO2021114859A1 (en) * 2019-12-09 2021-06-17 清华大学 Method and device for implementing bayesian neural network by using memristor intrinsic noise

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766044A (en) * 2019-09-11 2020-02-07 浙江大学 Neural network training method based on Gaussian process prior guidance
CN110766044B (en) * 2019-09-11 2021-10-26 浙江大学 Neural network training method based on Gaussian process prior guidance
CN110826713A (en) * 2019-10-25 2020-02-21 广州思德医疗科技有限公司 Method and device for acquiring special convolution kernel
CN110826713B (en) * 2019-10-25 2022-06-10 广州思德医疗科技有限公司 Method and device for acquiring special convolution kernel
WO2021114859A1 (en) * 2019-12-09 2021-06-17 清华大学 Method and device for implementing bayesian neural network by using memristor intrinsic noise
CN111242287A (en) * 2020-01-15 2020-06-05 东南大学 Neural network compression method based on channel L1 norm pruning
CN111582446A (en) * 2020-04-28 2020-08-25 北京达佳互联信息技术有限公司 System for neural network pruning and neural network pruning processing method
CN111582446B (en) * 2020-04-28 2022-12-06 北京达佳互联信息技术有限公司 System for neural network pruning and neural network pruning processing method
CN112215353A (en) * 2020-09-29 2021-01-12 电子科技大学 Channel pruning method based on variational structure optimization network
CN112215353B (en) * 2020-09-29 2023-09-01 电子科技大学 Channel pruning method based on variational structure optimization network

Similar Documents

Publication Publication Date Title
CN110020718A (en) The layer-by-layer neural networks pruning method and system inferred based on variation
US20190354895A1 (en) Learning data augmentation policies
US11854248B2 (en) Image classification method, apparatus and training method, apparatus thereof, device and medium
CN106709565A (en) Optimization method and device for neural network
CN108009594B (en) A kind of image-recognizing method based on change grouping convolution
CN112163601B (en) Image classification method, system, computer device and storage medium
CN110458084B (en) Face age estimation method based on inverted residual error network
CN109086768A (en) The semantic image dividing method of convolutional neural networks
CN111079899A (en) Neural network model compression method, system, device and medium
DE112020002042T5 (en) GENERATION OF AN INTENT RECOGNITION MODEL BASED ON RANDOMIZED INTENT VECTOR APPROACHES
CN109754359A (en) A kind of method and system that the pondization applied to convolutional neural networks is handled
CN110516537A (en) A kind of face age estimation method based on from step study
CN106991999A (en) Audio recognition method and device
CN106803092B (en) Method and device for determining standard problem data
CN109460813A (en) Accelerated method, device, equipment and the storage medium that convolutional neural networks calculate
CN106297778A (en) The neutral net acoustic model method of cutting out based on singular value decomposition of data-driven
CN116775843A (en) Question-answer pair evaluation data generation method, question-answer pair evaluation data generation device, computer equipment and storage medium
CN106709566A (en) Deep learning-based data missing value refilling method
CN111079691A (en) Pruning method based on double-flow network
CN117350373B (en) Personalized federal aggregation algorithm based on local self-attention mechanism
CN110222817A (en) Convolutional neural networks compression method, system and medium based on learning automaton
CN109359542A (en) The determination method and terminal device of vehicle damage rank neural network based
CN116384471A (en) Model pruning method, device, computer equipment, storage medium and program product
CN115936108A (en) Knowledge distillation-based neural network compression method for multivariate time series prediction graph
CN115983366A (en) Model pruning method and system for federal learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190716