CN108090564A - Redundant weight removal method based on the difference between the initial and final states of network weights - Google Patents

Redundant weight removal method based on the difference between the initial and final states of network weights

Info

Publication number
CN108090564A
CN108090564A (application CN201711385134.XA)
Authority
CN
China
Prior art keywords
network
weight
value
network weight
pruning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711385134.XA
Other languages
Chinese (zh)
Inventor
胡永健
黎***
刘琲贝
郑浩聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201711385134.XA
Publication of CN108090564A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a redundant weight removal method based on the difference between the initial and final states of network weights. The method first performs histogram statistics on the initial state and the final state of each layer's network weights and fits a curve to each histogram; according to the difference between the initial-state and final-state histograms of each layer's weights, a pruning interval is determined for that layer; after the weights lying in the pruning interval have been removed, the resulting network model is retrained; if the retrained model fails to reach a preset standard relative to the accuracy of the original network, the weights are further screened by the magnitude of their change during training. Compared with the prior art, the invention effectively solves the problem of selecting the network pruning threshold by analysing the difference between the initial-state and final-state weight histograms, and can further remove redundant network weights by optimizing the weight selection according to the magnitude of weight change.

Description

Redundant weight removal method based on the difference between the initial and final states of network weights
Technical field
The present invention relates to the field of pattern recognition and artificial intelligence, and in particular to the acceleration and compression of deep neural networks.
Background technology
In recent years, deep neural networks (Deep Neural Network, DNN) have achieved remarkable results in multiple fields such as computer vision, speech recognition and natural language processing, and have been widely applied. Research shows that DNNs perform outstandingly at feature extraction: the deeper the network, the stronger its representation ability. However, deepening the network requires more computing power and storage resources, which makes such networks difficult to deploy on hardware with limited storage and computing resources, especially mobile devices.
The parameters of a network model comprise network weights and biases, and the network models obtained with current training methods usually contain a large number of redundant parameters. There has been much research on how to efficiently solve the redundancy problem of DNN parameters, which can mainly be divided into three classes: the first class replaces the original weights with quantized values or approximations of the network weights; the second class transforms the network structure, for example replacing fully connected layers with global average pooling layers to reduce the number of network parameters; the third class comprises network pruning methods.
In the paper "Optimal brain damage" published by LeCun et al. at the Advances in Neural Information Processing Systems conference in 1990, and the paper "Second order derivatives for network pruning: Optimal brain surgeon" published by Hassibi et al. at the Portuguese Conference on Artificial Intelligence in 1993, it was proposed to use the information of second derivatives, or of the Hessian matrix, to weigh the network complexity against the training-set error. However, with the enlargement and growing complexity of networks, it becomes difficult to compute the second derivative of every parameter in the Hessian matrix.
In 1997, Castellano et al. proposed an iterative pruning method in the paper "An iterative pruning algorithm for feedforward neural networks" published in the journal IEEE Transactions on Neural Networks; this method selects the hidden node with the smallest output energy over all patterns as the node to delete. Notably, in 2015 Han et al. proposed, in the paper "Learning both weights and connections for efficient neural network" published at the Advances in Neural Information Processing Systems conference, to decide whether to remove connections between neurons according to the importance of the connections between layers, while ensuring that the performance of the original network hardly declines. However, Han et al. did not provide an effective solution to the problem of choosing the pruning threshold that determines the important connection weights.
Summary of the invention
The object of the present invention is to overcome the deficiencies of existing network pruning methods in determining the pruning threshold and selecting the important connection weights, and to propose a new, effective method for determining the pruning threshold and selecting the important connection weights.
The redundant weight removal method based on the difference between the initial and final states of network weights according to the present invention is a method that selects critical network connection weights and removes redundant weights based on the difference between the initial and final states of the network weights.
The purpose of the present invention is achieved through the following technical solutions:
The redundant weight removal method based on the difference between the initial and final states of network weights comprises the following steps:
Step 1: before training the network model, first save the network weights of its initial state; after training, save the network weights of its final state; for each layer, perform histogram statistics on the distributions of the initial-state and final-state weights and fit a curve to each histogram, obtaining for every layer an initial-weight histogram-fitting curve q_l and a final-weight histogram-fitting curve q_l', where the subscript l denotes the l-th layer of the network and the range of l depends on the total number of convolutional and fully connected layers of the network;
Step 2: according to the difference between the initial-weight histogram-fitting curve q_l and the final-weight histogram-fitting curve q_l', determine for each layer the positive threshold interval containing the positive pruning threshold T_l^+ and the negative threshold interval containing the negative pruning threshold T_l^-, and from the difference between q_l and q_l' obtain the difference curve f_l;
Step 3: compute the slope of the difference curve f_l on the positive threshold interval and on the negative threshold interval; choose the points of maximum slope within the positive threshold interval as candidate values of the positive pruning threshold T_l^+ and add them to the positive candidate set R_l; choose the points of minimum slope within the negative threshold interval as candidate values of the negative threshold T_l^- and add them to the negative candidate set S_l;
Step 4: select the points of the positive candidate set R_l in order of decreasing slope, taking the abscissa of each point as a positive temporary value T_l^+(i) of the positive threshold T_l^+; at the same time select the points of the negative candidate set S_l in order of increasing slope, taking the abscissa of each point as a negative temporary value T_l^-(i) of T_l^-, where i denotes the i-th selection of a positive or negative temporary value; continue until the points of the positive candidate set R_l or of the negative candidate set S_l have all been traversed;
Step 5: set to zero the weights of layer l of the final network model that lie in the interval (T_l^-(i), T_l^+(i)) and carry out the i-th accuracy test of the network model on the test data set, until the points of the positive candidate set R_l or the negative candidate set S_l have all been traversed; choose the pair for which the network model achieves the highest accuracy on the test data set as the pruning interval (T_l^-, T_l^+);
Step 6: repeat the operations of steps 1 to 5 to choose the pruning intervals of the network layers other than layer l, until the pruning intervals of all network layers have been chosen;
Step 7: set to zero the weights of the final network model that lie in the pruning interval (T_l^-, T_l^+) of each layer, thereby pruning every layer and obtaining the pruned network model;
Step 8: retrain the pruned network model while keeping the pruned weights at 0 throughout retraining, obtaining the pruned and retrained network model;
Step 9: if the accuracy of the pruned and retrained network model on the test data set is lower than that of the final network model by less than a preset value, stop and retain the pruned and retrained network model; otherwise, proceed to the next step;
Step 10: compute for each layer the change magnitude between the final and initial network weights; by comparing the weight change magnitude with α×T_l^- or α×T_l^+, where α denotes the weight-change-magnitude coefficient, set the weights of each layer of the final network model to zero again while retaining part of the weights pruned in step 7; retrain the newly pruned network, keeping the pruned weights at 0 during retraining; after retraining, return to step 9.
The initial state of the network model is the initialized state of the network model before its first training on the data; the network model at this point is called the initial network model, and the weights of this state are generated by the corresponding weight-generating function and have not been trained or learned on any data set. The network model after the first training is called the final network model, and the weights of the final state are the weights obtained by continual updating and adjustment during training on the data set. The initial network model and the final network model have the same network structure; only the connection parameters between layers differ.
Further, to realize the object of the invention, preferably, in step 1 the network model is the LeNet-5 network model in the deep learning framework Caffe.
Preferably, in step 1, the network weights of the initial state are values generated at random by the corresponding weight-generating function, representing the initial values of the network weights before the network model is trained.
Preferably, in step 1, the network weights of the final state are the weights generated by the network model through continual updating and adjustment during training on the data set. In steps 5, 7 and 9, the final network model is the network model obtained after the training in step 1, and its weights are the final-state weights of step 1. The final-state weights are related to the information in the data set and are mainly reflected in the feature-extraction and classification ability of the network model on the data.
Preferably, in step 1, the curve fitting is performed, on the basis of the histogram statistics of the distributions of the initial-state and final-state weights of each layer, by fitting the profile of the histogram with a kernel density estimation method, yielding the corresponding fitted curve.
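By way of illustration, the histogram fitting described above could be implemented with kernel density estimation along the following lines (a minimal Python sketch assuming NumPy and SciPy; the function name, grid size and sign convention of the difference curve are illustrative, not specified by the patent):

    import numpy as np
    from scipy.stats import gaussian_kde

    def fit_weight_curve(weights, grid=None, num_points=512):
        # Flatten the layer's weights and fit a Gaussian KDE to their distribution;
        # the sampled KDE plays the role of the histogram-fitting curve q_l or q_l'.
        w = np.asarray(weights, dtype=np.float64).ravel()
        if grid is None:
            grid = np.linspace(w.min(), w.max(), num_points)
        kde = gaussian_kde(w)
        return grid, kde(grid)

    # Usage: fit initial and final weights on a common grid, then form the
    # difference curve f_l between the two fitted curves (one possible sign convention).
    # grid, q_init  = fit_weight_curve(w_initial)
    # _,    q_final = fit_weight_curve(w_final, grid=grid)
    # f_l = q_final - q_init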
Preferably, in step 8, the retraining means training the already trained network model again on the training data set, updating and adjusting the connections between layers in the network model once more, where the parameters of the connections between layers include the network weights and the network biases.
Preferably, in step 8, keeping the pruned weights at 0 throughout retraining is realized by setting a weight-removal identifier Mask_l of the same size as the number of weights of layer l. Mask_l is a set of flags recording whether each weight of the layer has been removed; by multiplying it element-wise with the corresponding positions of the layer's weights, the removal of the weights of each layer is realized.
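A minimal sketch of this mask mechanism using the pycaffe interface (the layer names, solver file name and iteration count are illustrative assumptions, not values given by the patent):

    import caffe

    def retrain_with_masks(solver_prototxt, masks, num_iters=10000):
        # masks: dict mapping layer name -> 0/1 array shaped like that layer's weights (Mask_l).
        solver = caffe.SGDSolver(solver_prototxt)
        for _ in range(num_iters):
            solver.step(1)  # one training iteration
            for name, mask in masks.items():
                # multiply the layer weights by Mask_l so pruned weights stay exactly 0
                solver.net.params[name][0].data[...] *= mask
        return solver.net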
Preferably, in step 9, the value range of the preset value is [0.5%, 2%]; the preset value is preferably 1%.
Preferably, in step 10, the comparison of the weight change magnitude with α×T_l^- or α×T_l^+ is realized as follows. Let w_lk be the initial value of the k-th weight of layer l, w'_lk its final value, and Δw_lk = w'_lk - w_lk the change magnitude of the weight. The selection standard based on the weight change magnitude is any one of the following four cases:
(1) w'_lk < α×T_l^- and w'_lk × w_lk < 0: when w'_lk lies in the interval (T_l^-, α×T_l^-) and w_lk changed from a positive to a negative value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk < α×T_l^-;
(2) w'_lk > α×T_l^+ and w'_lk × w_lk < 0: when w'_lk lies in the interval (α×T_l^+, T_l^+) and w_lk changed from a negative to a positive value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk > α×T_l^+;
(3) Δw_lk < α×T_l^- and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller negative value to a larger negative value and the change magnitude satisfies Δw_lk < α×T_l^-, the weight w_lk is retained;
(4) Δw_lk > α×T_l^+ and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller positive value to a larger positive value and the change magnitude satisfies Δw_lk > α×T_l^+, the weight w_lk is retained.
Preferably, the weight-change-magnitude coefficient α in step 10 is chosen as follows: first set α = mean(abs(w'_l))/std(w'_l), where mean(abs(w'_l)) and std(w'_l) denote respectively the mean of the absolute values of the layer's final weights and their standard deviation; judge whether the pruned network reaches the preset standard, and if so, directly save the pruned network model; otherwise, adjust the weight-change-magnitude coefficient α from 1 towards 0, decreasing it gradually in steps of β, until the pruned network reaches the preset standard, where the value range of the step β is (0, 1) and β is preferably 0.1.
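The choice of α could be organised as in the following sketch (Python; prune_retrain_and_test and target_accuracy are assumed helpers and values not specified by the patent):

    import numpy as np

    def choose_alpha(w_final, prune_retrain_and_test, target_accuracy, beta=0.1):
        # Initial guess for the change-magnitude coefficient alpha.
        alpha = np.mean(np.abs(w_final)) / np.std(w_final)
        if prune_retrain_and_test(alpha) >= target_accuracy:
            return alpha
        # Otherwise decrease alpha from 1 toward 0 in steps of beta until the
        # pruned network reaches the preset standard again.
        alpha = 1.0
        while alpha > 0.0:
            if prune_retrain_and_test(alpha) >= target_accuracy:
                return alpha
            alpha -= beta
        return 0.0  # alpha = 0 corresponds to no pruning by this criterion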
The network model of the present invention is a deep neural network model, with more layers than traditional neural network models. It mainly consists of an input layer, an output layer, several different hidden layers and the connections between layers; the input layer feeds the data into the network model, the output layer outputs the computation result of the network model, and the hidden layers are the network layers other than the input and output layers, with different types of hidden layers performing different computations on the data. The parameters of the connections between layers include the network weights and the network biases. The network model is thus composed of the network structure built from the different network layers together with the connection parameters between layers.
Compared with existing network pruning techniques, the present invention has the following advantages and effects. The invention innovatively uses the difference between the histograms of the initial and final states of the network weights to determine the network pruning threshold, and on this basis proposes to optimize the selection by the magnitude of the weight change. The invention makes full use of the property that the importance of a network connection is related to the weight magnitude and to the weight change magnitude; under the condition of essentially preserving the original accuracy of the network, it realizes the selection of important connection weights and the removal of redundant weights, which benefits the subsequent compression and acceleration of deep neural networks and alleviates the problem that deep learning networks demand high computing power and memory and are therefore difficult to deploy on hardware with limited storage and computing resources, especially mobile devices.
Description of the drawings
Fig. 1 is a flow block diagram of the redundant weight removal method based on the difference between the initial and final states of network weights according to the present invention.
Fig. 2 shows the weight histogram and its fitted curve of the initial LeNet-5 network at the conv2 layer.
Fig. 3 shows the weight histogram and its fitted curve of the final LeNet-5 network at the conv2 layer.
Fig. 4 shows the fitted weight-histogram curves of the initial and final LeNet-5 networks at the conv2 layer.
Fig. 5 shows the difference curve between the fitted conv2 weight-histogram curves of the initial and final trained LeNet-5 networks.
Specific embodiment
For a better understanding of the present invention, the invention is further described below with reference to an embodiment and the accompanying drawings, but the embodiments of the invention are not limited thereto.
A detailed introduction and explanation is given below taking as an example the LeNet-5 network model in the deep learning framework Caffe developed by Berkeley BVLC. The LeNet-5 network model is a convolutional neural network for recognizing the handwritten digits 0 to 9 and is one of the most representative early convolutional neural network models; it reaches 99.20% accuracy on the MNIST data set. The LeNet-5 network model has 7 layers in total (not counting the input layer), each layer containing a different number of connection parameters. LeNet-5 mainly consists of 2 convolutional layers, 2 subsampling (pooling) layers, 2 fully connected layers and 1 output layer, among which only the connection parameters of the convolutional layers and the fully connected layers need to be updated and adjusted during training; the remaining connection parameters do not change. The two convolutional layers in the network model are named conv1 and conv2 and are mainly used to extract data features; convolutional layer conv1 has 20 convolution kernels and convolutional layer conv2 has 50 convolution kernels, each kernel has size 5 and sliding step 1, and the weight initialization mode is xavier. The two fully connected layers in the network model are named ip1 and ip2; fully connected layer ip1 has 500 neurons and fully connected layer ip2 has 10 neurons, and their weight initialization mode is also set to xavier; the weights generated by this initialization mode obey a uniform distribution. All of the above operations can be carried out directly with the deep learning framework Caffe.
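For orientation, the layer sizes mentioned above can be inspected directly through the pycaffe interface (a sketch; the prototxt file name is an illustrative assumption):

    import caffe

    # Load the network definition only (no trained weights needed to inspect shapes).
    net = caffe.Net('lenet_train_test.prototxt', caffe.TEST)
    for name in ('conv1', 'conv2', 'ip1', 'ip2'):
        w = net.params[name][0].data
        print(name, 'weight shape:', w.shape)  # e.g. conv2 -> (50, 20, 5, 5)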
The redundant weight removal method based on the difference between the initial and final states of network weights comprises the following steps:
The specific implementation flow of the present invention is shown in Fig. 1. First step: before the first training of the LeNet-5 network model, save the network model in its initial state, referred to as the initial network model of LeNet-5; after the first training, save the network model in its final state, referred to as the final network model of LeNet-5. Since the present invention removes redundant network weights according to the difference between the initial and final states of the weights, the first training mainly serves to save the initial and final network models of LeNet-5 in order to obtain the weights of both states; the first training of the network model can be carried out directly with Caffe. The initial-state weights of the LeNet-5 network model are the weights of the initialized state before the first training on the data; they are generated by the weight initialization mode xavier and have not been trained or learned on any data set. Since a deep neural network has the ability to learn automatically, it only needs to be given data and left to learn the regularities in the data autonomously: specifically, after being trained with correctly labelled data, it keeps updating and adjusting the connections between layers and its performance improves greatly, until the predictions output by the network model agree as far as possible with the correct labels. The final-state weights of the LeNet-5 network model are therefore the weights generated by continual updating and adjustment during training on the MNIST data set. The network layers of the initial and final LeNet-5 network models are identical; only the connection parameters between layers differ.
Histogram statistics and curve fitting are carried out separately for the distributions of the initial-state and final-state weights of each layer of the LeNet-5 network model. According to the flow block diagram of Fig. 1, the same operation is performed for every layer of weights in the network, so the detailed operation is explained below only for the conv2 layer of the LeNet-5 network model. The conv2 layer is the second convolutional layer of LeNet-5; its weight initialization mode is xavier, i.e. the initial weights of this layer obey a uniform distribution, while the final weights of this layer are the state obtained after the network model has been fully trained. The histograms of the initial and final conv2 weights are shown in the initial-weight histogram of Fig. 2 and the final-weight histogram of Fig. 3 respectively. Curves are fitted to the histograms of the initial and final weights; in this embodiment the curve-fitting method is kernel density estimation, giving the fitted curve q_2 of the initial conv2 weight histogram and the fitted curve q_2' of the final weight histogram, shown in Fig. 2 and Fig. 3 respectively. The comparison of the initial-state fitted curve q_2 and the final-state fitted curve q_2' of the conv2 weights is shown in Fig. 4. The four points A, B, C and D in Fig. 4 are the intersection points of the initial-state fitted curve q_2 and the final-state fitted curve q_2', ordered from small to large along the horizontal axis; they represent the points where the initial and final weight histograms are equal, and a, b, c, d denote the abscissas of the intersections A, B, C, D respectively.
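A sketch of how these two weight states could be pulled out of Caffe and histogrammed (pycaffe; the model file names are illustrative assumptions):

    import caffe
    import numpy as np
    from scipy.stats import gaussian_kde

    # Load the saved initial-state and final-state models (same prototxt, different weights).
    net_init  = caffe.Net('lenet_train_test.prototxt', 'lenet_initial.caffemodel', caffe.TEST)
    net_final = caffe.Net('lenet_train_test.prototxt', 'lenet_final.caffemodel', caffe.TEST)

    w_init  = net_init.params['conv2'][0].data.ravel()
    w_final = net_final.params['conv2'][0].data.ravel()

    # Histogram both states on a common grid and fit each profile with a KDE.
    grid = np.linspace(min(w_init.min(), w_final.min()),
                       max(w_init.max(), w_final.max()), 512)
    q2       = gaussian_kde(w_init)(grid)   # fitted curve of the initial conv2 weights
    q2_final = gaussian_kde(w_final)(grid)  # fitted curve of the final conv2 weights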
Second step: according to the divergence between the initial-state and final-state weight histograms of each layer of the LeNet-5 network model over different weight magnitude ranges, the distributions of the important and unimportant connection weights of each layer are judged, so as to determine for each layer the positive threshold interval containing the positive pruning threshold T_l^+ and the negative threshold interval containing the negative pruning threshold T_l^-, where the subscript l denotes the l-th layer of the network. Taking the conv2 layer of the LeNet-5 network model as an example, as can be seen from Fig. 4, by comparing the heights of the two weight histograms at different abscissas, the final-network histogram of conv2 is larger than the initial-network histogram on the intervals (-∞, a], [d, +∞) and (b, c); that is, the final network has more connection weights whose values lie in (-∞, a], [d, +∞) and (b, c) than the initial network. On the intervals (a, b) and (c, d) the final-network weight histogram of conv2 is smaller than the initial-network weight histogram; that is, the final network has fewer connection weights whose values lie in (a, b) and (c, d) than the initial network. According to the relation that network connections with larger absolute final weights are more important, together with the relation between the final and initial weight histograms, the regions where the final weight histogram exceeds the initial weight histogram and the absolute weight value is large are taken as the intervals of important connection weights, while the regions where the final weight histogram exceeds the initial weight histogram but the absolute weight value is small are taken as the intervals of unimportant connection weights. As can be seen from Fig. 4, the important connection weights of the conv2 layer are mainly located in the intervals (-∞, a] and [d, +∞), while the unimportant connection weights are mainly concentrated in the interval (b, c). Accordingly, the positive threshold interval containing the positive pruning threshold T_2^+ and the negative threshold interval containing the negative pruning threshold T_2^- of the conv2 layer can be determined as (c, d) and (a, b) respectively.
Third step: compute the difference curve f_l between the fitted curves of the initial-state and final-state weight histograms of each layer. The difference curve f_2 between the initial-state fitted curve q_2 and the final-state fitted curve q_2' of the conv2 weights is shown in Fig. 5.
Fourth step: for the difference curve f_l of each layer, compute the slope on the corresponding positive threshold interval and negative threshold interval; within the positive threshold interval, take the points where the slope of the difference curve f_l is maximal as candidate values of the positive pruning threshold and add them to the positive candidate set R_l; within the negative threshold interval, take the points where the slope of the difference curve f_l is minimal as candidate values of the negative pruning threshold and add them to the negative candidate set S_l.
Fifth step: select the points of the positive candidate set R_l in order of decreasing slope and take the abscissa of each point as a positive temporary value T_l^+(i) of the positive pruning threshold T_l^+; at the same time select the points of the negative candidate set S_l in order of increasing slope and take the abscissa of each point as a negative temporary value T_l^-(i) of the negative pruning threshold T_l^-. Then set to zero the weights of layer l of the final network model that lie in the interval (T_l^-(i), T_l^+(i)) and carry out the i-th network test, until the points of the positive candidate set R_l or the negative candidate set S_l have all been traversed; finally choose the pair for which the network model reaches the highest accuracy on the test data set as the pruning interval (T_l^-, T_l^+), where the accuracy of a network model is the frequency with which the prediction of the network model on a sample of the data set coincides with the actual result.
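The candidate-threshold search of the fourth and fifth steps might look as follows (a Python sketch; the evaluate helper, which zeros the weights in a trial interval and measures test accuracy, is assumed rather than given by the patent):

    import numpy as np

    def threshold_candidates(grid, f_l, pos_interval, neg_interval):
        # Slope of the difference curve f_l, evaluated on the fitting grid.
        slope = np.gradient(f_l, grid)
        in_pos = (grid > pos_interval[0]) & (grid < pos_interval[1])
        in_neg = (grid > neg_interval[0]) & (grid < neg_interval[1])
        # Positive candidates R_l sorted by decreasing slope, negative candidates S_l by increasing slope.
        R_l = grid[in_pos][np.argsort(slope[in_pos])[::-1]]
        S_l = grid[in_neg][np.argsort(slope[in_neg])]
        return R_l, S_l

    def best_pruning_interval(R_l, S_l, evaluate):
        # Try the i-th pair of temporary thresholds until one candidate set is exhausted,
        # keeping the pair that gives the highest test accuracy.
        best_pair, best_acc = None, -1.0
        for t_pos, t_neg in zip(R_l, S_l):
            acc = evaluate(t_neg, t_pos)  # zero weights in (t_neg, t_pos), then test
            if acc > best_acc:
                best_pair, best_acc = (t_neg, t_pos), acc
        return best_pair, best_acc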
Sixth step: repeat the above steps 1 to 5 to determine the pruning intervals of the other network layers in turn, until the pruning intervals of all network layers have been determined.
Seventh step: ensure that, in the final LeNet-5 network model, the weights of each layer that lie in the pruning interval (T_l^-, T_l^+) remain zero throughout retraining. The final LeNet-5 network model is the network model composed of the network structure and of the connection parameters between layers saved after the first training of the LeNet-5 network model; the design of the network structure is completed before the training of the network model. The purpose of retraining is to continually update and adjust the connections between layers while the network model again automatically learns the sample information of the data set, so as to effectively improve the feature-extraction and classification abilities; retraining can be carried out directly with the deep learning framework Caffe, and Caffe saves the finally generated connection parameters after the network model has been trained. In order to keep the removed weights at 0 during retraining, a weight-removal identifier Mask_l of the same size as the number of weights of layer l is set up as a record of whether each weight of the layer has been removed: the positions of Mask_l corresponding to the layer-l weights that lie in the pruning interval (T_l^-, T_l^+) are set to 0, and the other positions are set to 1, as shown in the initialization part of Algorithm 1, where Mask_lk denotes the flag recording whether the k-th weight w_lk of layer l is removed.
Algorithm 1: Network pruning and retraining based on thresholds
Initialization:
    if T_l^- < w_lk < T_l^+ then
        Mask_lk = 0
    else
        Mask_lk = 1
    end if
Pruning and retraining:
    retrain the network; after every weight update, multiply the layer-l weights element-wise by Mask_l so that the pruned weights remain 0.
Eighth step: retrain the pruned network; the concrete operation is as shown in the pruning-and-retraining part of Algorithm 1. The learning rate for retraining the pruned network is set to 1/10 of the original learning rate, and the remaining network parameters are left unchanged. Table 1 gives the weight change results of each layer of the LeNet-5 network model after pruning. The first and second columns of Table 1 are the existing structural parameters of the network model itself: the first column gives the name of each network layer (Layer) and the second column the number of weights of each layer or of the whole network. The third and fourth columns are respectively the negative pruning threshold T_l^- and the positive pruning threshold T_l^+ of each layer, obtained by the above steps 1 to 6. The fifth column gives the percentage of weights retained after pruning in each layer relative to the number of weights of that layer, or of the whole network relative to the total number of weights; these figures can be obtained from the counts of the entries of Mask_l of each layer in step 7, divided by the per-layer weight numbers in the second column of Table 1. As can be seen from Table 1, the network pruning method of this embodiment based on Algorithm 1 can reduce the number of network weights to 7.68% of the original total.
Table 1. Weight change results of each layer of the LeNet-5 network model after pruning based on Algorithm 1
Ninth step: test the accuracy of the final LeNet-5 network model and of its pruned and retrained network model on the test set of the MNIST data set, and compare the accuracies obtained by the two. The pruned and retrained network model of the LeNet-5 network is the network model obtained by pruning and retraining its final network model; the two have the same network structure but different connection parameters. The accuracy of a network model is the frequency with which its predictions on the samples of the data set coincide with the actual results; for example, the final LeNet-5 model predicts the content of a test sample of the MNIST data set, the prediction is compared with the actual label of the sample, and the frequency of agreement is counted. After all test samples have been processed, the accuracy of the network model is obtained. This procedure and its result can be carried out and obtained directly with the deep learning framework Caffe, and the concrete results are shown in Table 2. As can be seen from Table 2, the accuracy of the pruned and retrained LeNet-5 network model on the MNIST data set reaches 99.28%, which is 0.06 percentage points higher than the 99.22% of the final network model, showing that this network model achieves the removal of redundant network weights while essentially maintaining the accuracy of the original network; the operation therefore stops and the network model is retained.
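The accuracy comparison of this step can be run directly from pycaffe; a minimal sketch (assuming the test prototxt exposes an Accuracy layer named 'accuracy'; file names and batch count are illustrative):

    import caffe

    def test_accuracy(prototxt, caffemodel, num_batches=100):
        # Average the net's accuracy output over a number of test batches.
        net = caffe.Net(prototxt, caffemodel, caffe.TEST)
        total = 0.0
        for _ in range(num_batches):
            total += float(net.forward()['accuracy'])
        return total / num_batches

    # acc_final  = test_accuracy('lenet_train_test.prototxt', 'lenet_final.caffemodel')
    # acc_pruned = test_accuracy('lenet_train_test.prototxt', 'lenet_pruned_retrained.caffemodel')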
When the accuracy of the network model after pruning and retraining drops below that of the final network by more than the preset value (the preset value of this example is set to 1%), this embodiment further optimizes the selection of important connection weights according to the weight change magnitude and continues with the next step.
Table 2. Accuracy comparison of the final LeNet-5 network model and of the model pruned and retrained based on Algorithm 1
Tenth step: compute for each layer the change magnitude between the final and initial network weights; by comparing the weight change magnitude with α×T_l^- or α×T_l^+, where α denotes the weight-change-magnitude coefficient, set the weights of each layer of the final network to zero again while retaining part of the weights pruned in step 7; then retrain the newly pruned network, keeping the pruned weights at 0 during retraining, as shown in Algorithm 2. Algorithm 2 is designed according to the fact that the importance of a network connection is related to the magnitude of the final weight and to the magnitude of the weight change: case 1 of the initialization step of Algorithm 2 reflects that connections with larger final weights are more important, and cases 2 to 5 reflect that the importance of a connection is related to the change magnitude between the final and initial weights.
In the initialization step of Algorithm 2, case 1 means that when w'_lk is less than T_l^- or greater than T_l^+, the weight w_lk is retained. Case 2 means that when w'_lk lies in the interval (T_l^-, α×T_l^-) and w_lk changed from a positive to a negative value during weight learning, the weight is retained; the change magnitude then satisfies Δw_lk < α×T_l^-. Case 3 means that when w'_lk lies in the interval (α×T_l^+, T_l^+) and w_lk changed from a negative to a positive value during weight learning, the weight is retained; the change magnitude then satisfies Δw_lk > α×T_l^+. Case 4 means that when w'_lk is greater than T_l^- and less than T_l^+, but w_lk changed from a smaller negative value to a larger negative value during weight learning and the change magnitude satisfies Δw_lk < α×T_l^-, the weight w_lk is retained. Case 5 means that when w'_lk is greater than T_l^- and less than T_l^+, but w_lk changed from a smaller positive value to a larger positive value during weight learning and the change magnitude satisfies Δw_lk > α×T_l^+, the weight w_lk is retained.
In this embodiment α is first set to α = mean(abs(w'_l))/std(w'_l), where mean(abs(w'_l)) and std(w'_l) denote respectively the mean of the absolute values of the layer's final weights and their standard deviation. It is then judged whether the pruned network essentially maintains the accuracy of the final network; if so, the pruned model is saved directly, otherwise α is gradually decreased from 1 towards 0 in steps of β = 0.1. When α = 1, Algorithm 2 is equivalent to Algorithm 1; when α = 0, Algorithm 2 performs no pruning on the network. The concrete retraining of the pruned network is the same as the pruning-and-retraining part of Algorithm 1 described in the eighth step.
Algorithm 2: Network pruning and retraining based on thresholds and weight change magnitude
Initialization: case w'_lk of
    case 1: w'_lk < T_l^- or w'_lk > T_l^+:
        Mask_lk = 1
    case 2: T_l^- < w'_lk < α×T_l^- and w'_lk × w_lk < 0:
        Mask_lk = 1
    case 3: α×T_l^+ < w'_lk < T_l^+ and w'_lk × w_lk < 0:
        Mask_lk = 1
    case 4: Δw_lk < α×T_l^- and Δw_lk × w_lk > 0:
        Mask_lk = 1
    case 5: Δw_lk > α×T_l^+ and Δw_lk × w_lk > 0:
        Mask_lk = 1
    else:
        Mask_lk = 0
    end
Pruning and retraining:
    same as the pruning-and-retraining part of Algorithm 1.
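For concreteness, the initialization of Algorithm 2 could be written as the following NumPy sketch (variable names are illustrative; the pruning-and-retraining part is the same masked retraining as in Algorithm 1):

    import numpy as np

    def init_mask_algorithm2(w_init, w_final, t_neg, t_pos, alpha):
        # Keep a weight (Mask_lk = 1) if any of cases 1-5 holds, otherwise prune it (Mask_lk = 0).
        dw = w_final - w_init
        keep = (w_final < t_neg) | (w_final > t_pos)                # case 1: outside (T_l^-, T_l^+)
        keep |= (w_final < alpha * t_neg) & (w_final * w_init < 0)  # case 2: turned negative with a large change
        keep |= (w_final > alpha * t_pos) & (w_final * w_init < 0)  # case 3: turned positive with a large change
        keep |= (dw < alpha * t_neg) & (dw * w_init > 0)            # case 4: negative weight grew more negative
        keep |= (dw > alpha * t_pos) & (dw * w_init > 0)            # case 5: positive weight grew more positive
        return keep.astype(np.float32)                              # Mask_l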

Claims (10)

1. A redundant weight removal method based on the difference between the initial and final states of network weights, characterized by comprising the following steps:
Step 1: before training the network model, first save the network weights of its initial state; after training, save the network weights of its final state; for each layer, perform histogram statistics on the distributions of the initial-state and final-state weights and fit a curve to each histogram, obtaining for every layer an initial-weight histogram-fitting curve q_l and a final-weight histogram-fitting curve q_l', where the subscript l denotes the l-th layer of the network and the range of l depends on the total number of convolutional and fully connected layers of the network;
Step 2: according to the difference between the initial-weight histogram-fitting curve q_l and the final-weight histogram-fitting curve q_l', determine for each layer the positive threshold interval containing the positive pruning threshold T_l^+ and the negative threshold interval containing the negative pruning threshold T_l^-, and from the difference between q_l and q_l' obtain the difference curve f_l;
Step 3: compute the slope of the difference curve f_l on the positive threshold interval and on the negative threshold interval; choose the points of maximum slope within the positive threshold interval as candidate values of the positive pruning threshold T_l^+ and add them to the positive candidate set R_l; choose the points of minimum slope within the negative threshold interval as candidate values of the negative threshold T_l^- and add them to the negative candidate set S_l;
Step 4: select the points of the positive candidate set R_l in order of decreasing slope, taking the abscissa of each point as a positive temporary value T_l^+(i) of the positive threshold T_l^+; at the same time select the points of the negative candidate set S_l in order of increasing slope, taking the abscissa of each point as a negative temporary value T_l^-(i) of T_l^-, where i denotes the i-th selection of a positive or negative temporary value; continue until the points of the positive candidate set R_l or of the negative candidate set S_l have all been traversed;
Step 5: set to zero the weights of layer l of the final network model that lie in the interval (T_l^-(i), T_l^+(i)) and carry out the i-th accuracy test of the network model on the test data set, until the points of the positive candidate set R_l or the negative candidate set S_l have all been traversed; choose the pair for which the network model achieves the highest accuracy on the test data set as the pruning interval (T_l^-, T_l^+);
Step 6: repeat the operations of steps 1 to 5 to choose the pruning intervals (T_l^-, T_l^+) of the network layers other than layer l, until the pruning intervals (T_l^-, T_l^+) of all network layers have been chosen;
Step 7: set to zero the weights of the final network model that lie in the pruning intervals (T_l^-, T_l^+), thereby pruning every layer and obtaining the pruned network model;
Step 8: retrain the pruned network model while keeping the pruned weights at 0 throughout retraining, obtaining the pruned and retrained network model;
Step 9: if the accuracy of the pruned and retrained network model on the test data set is lower than that of the final network model by less than a preset value, stop and retain the pruned and retrained network model; otherwise, proceed to the next step;
Step 10: compute for each layer the change magnitude between the final and initial network weights; by comparing the weight change magnitude with α×T_l^- or α×T_l^+, where α denotes the weight-change-magnitude coefficient, set the weights of each layer of the final network model to zero again while retaining part of the weights pruned in step 7; retrain the newly pruned network, keeping the pruned weights at 0 during retraining; after retraining, return to step 9.
2. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the network model is the LeNet-5 network model in the deep learning framework Caffe.
3. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the network weights of the initial state are values generated at random by the corresponding weight-generating function, representing the initial values of the network weights before the network model is trained.
4. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the network weights of the final state are the weights generated by the network model through continual updating and adjustment during training on the data set; in steps 5, 7 and 9, the final network model is the network model obtained after the training in step 1, and the weights of the final network model are the final-state weights of step 1.
5. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the curve fitting is performed, on the basis of the histogram statistics of the distributions of the initial-state and final-state weights of each layer, by fitting the profile of the histogram with a kernel density estimation method, yielding the corresponding fitted curve.
6. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 8, the retraining means training the already trained network model again on the training data set, updating and adjusting the connections between layers in the network model once more, where the parameters of the connections between layers include the network weights and the network biases.
7. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 8, keeping the pruned weights at 0 throughout retraining is realized by setting a weight-removal identifier Mask_l of the same size as the number of weights of layer l; Mask_l is a set of flags recording whether each weight of the layer has been removed, and by multiplying it element-wise with the corresponding positions of the layer's weights, the removal of the weights of each layer is realized.
8. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 9, the value of the preset value lies in [0.5%, 2%].
9. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 10, the comparison of the weight change magnitude with α×T_l^- or α×T_l^+ is realized by the following method: let w_lk be the initial value of the k-th weight of layer l, w'_lk its final value, and Δw_lk = w'_lk - w_lk the change magnitude of the weight; the selection standard based on the weight change magnitude is any one of the following four cases:
(1) w'_lk < α×T_l^- and w'_lk × w_lk < 0: when w'_lk lies in the interval (T_l^-, α×T_l^-) and w_lk changed from a positive to a negative value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk < α×T_l^-;
(2) w'_lk > α×T_l^+ and w'_lk × w_lk < 0: when w'_lk lies in the interval (α×T_l^+, T_l^+) and w_lk changed from a negative to a positive value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk > α×T_l^+;
(3) Δw_lk < α×T_l^- and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller negative value to a larger negative value and the change magnitude satisfies Δw_lk < α×T_l^-, the weight w_lk is retained;
(4) Δw_lk > α×T_l^+ and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller positive value to a larger positive value and the change magnitude satisfies Δw_lk > α×T_l^+, the weight w_lk is retained.
10. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: the weight-change-magnitude coefficient α in step 10 is chosen by first setting α = mean(abs(w'_l))/std(w'_l), where mean(abs(w'_l)) and std(w'_l) denote respectively the mean of the absolute values of the layer's final weights and their standard deviation; it is judged whether the pruned network reaches the preset standard, and if so, the pruned network model is saved directly; otherwise, the weight-change-magnitude coefficient α is decreased gradually from 1 towards 0 in steps of β until the pruned network reaches the preset standard, where the value range of the step β is (0, 1).
CN201711385134.XA 2017-12-20 2017-12-20 Redundant weight removal method based on the difference between the initial and final states of network weights Pending CN108090564A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711385134.XA CN108090564A (en) 2017-12-20 2017-12-20 Redundant weight removal method based on the difference between the initial and final states of network weights

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711385134.XA CN108090564A (en) 2017-12-20 2017-12-20 Redundant weight removal method based on the difference between the initial and final states of network weights

Publications (1)

Publication Number Publication Date
CN108090564A true CN108090564A (en) 2018-05-29

Family

ID=62176226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711385134.XA Pending CN108090564A (en) Redundant weight removal method based on the difference between the initial and final states of network weights

Country Status (1)

Country Link
CN (1) CN108090564A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635935A (en) * 2018-12-29 2019-04-16 北京航空航天大学 Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould
CN109635935B (en) * 2018-12-29 2022-10-14 北京航空航天大学 Model adaptive quantization method of deep convolutional neural network based on modular length clustering
CN111582456A (en) * 2020-05-11 2020-08-25 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating network model information
CN111582456B (en) * 2020-05-11 2023-12-15 抖音视界有限公司 Method, apparatus, device and medium for generating network model information
CN111967591A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network automatic pruning method and device and electronic equipment
CN111967591B (en) * 2020-06-29 2024-07-02 上饶市纯白数字科技有限公司 Automatic pruning method and device for neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20180529)