CN108090564A - Redundant weight removal method based on the difference between the initial and final states of network weights - Google Patents

Redundant weight removal method based on the difference between the initial and final states of network weights

Info

Publication number
CN108090564A
CN108090564A (application CN201711385134.XA)
Authority
CN
China
Prior art keywords
network
weight
value
network weight
pruning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711385134.XA
Other languages
Chinese (zh)
Inventor
胡永健
黎***
刘琲贝
郑浩聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201711385134.XA
Publication of CN108090564A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a redundant weight removal method based on the difference between the initial and final states of network weights. The method first performs histogram statistics on the initial state and the final state of each layer's network weights and fits a curve to each histogram; according to the difference between the initial-state and final-state histograms of each layer's weights, a pruning interval is determined for that layer; after the weights lying in the pruning interval have been removed, the resulting network model is retrained; if the retrained model fails to reach a preset standard relative to the accuracy of the original network, the weights are further screened by the magnitude of their change during training. Compared with the prior art, the invention effectively solves the problem of selecting the network pruning threshold by analysing the difference between the initial-state and final-state weight histograms, and can further remove redundant network weights by optimizing the weight selection according to the magnitude of weight change.

Description

Redundant weight removal method based on the difference between the initial and final states of network weights
Technical field
The present invention relates to the field of pattern recognition and artificial intelligence, and in particular to the acceleration and compression of deep neural networks.
Background technology
In recent years, deep neural networks (Deep Neural Network, DNN) have achieved remarkable results in multiple fields such as computer vision, speech recognition and natural language processing, and have been widely applied. Research shows that DNNs perform outstandingly at feature extraction: the deeper the network, the stronger its representation ability. However, deepening the network requires more computing power and storage resources, which makes such networks difficult to deploy on hardware with limited storage and computing resources, especially mobile devices.
The parameters of a network model comprise network weights and biases, and the network models obtained with current training methods usually contain a large number of redundant parameters. There has been much research on how to efficiently solve the redundancy problem of DNN parameters, which can mainly be divided into three classes: the first class replaces the original weights with quantized values or approximations of the network weights; the second class transforms the network structure, for example replacing fully connected layers with global average pooling layers to reduce the number of network parameters; the third class comprises network pruning methods.
In the paper "Optimal brain damage" published by LeCun et al. at the Advances in Neural Information Processing Systems conference in 1990, and the paper "Second order derivatives for network pruning: Optimal brain surgeon" published by Hassibi et al. at the Portuguese Conference on Artificial Intelligence in 1993, it was proposed to use the information of second derivatives, or of the Hessian matrix, to weigh the network complexity against the training-set error. However, with the enlargement and growing complexity of networks, it becomes difficult to compute the second derivative of every parameter in the Hessian matrix.
In 1997, Castellano et al. proposed an iterative pruning method in the paper "An iterative pruning algorithm for feedforward neural networks" published in the journal IEEE Transactions on Neural Networks; this method selects the hidden node with the smallest output energy over all patterns as the node to delete. Notably, in 2015 Han et al. proposed, in the paper "Learning both weights and connections for efficient neural network" published at the Advances in Neural Information Processing Systems conference, to decide whether to remove connections between neurons according to the importance of the connections between layers, while ensuring that the performance of the original network hardly declines. However, Han et al. did not provide an effective solution to the problem of choosing the pruning threshold that determines the important connection weights.
Summary of the invention
The object of the present invention is to overcome the deficiencies of existing network pruning methods in determining the pruning threshold and selecting the important connection weights, and to propose a new, effective method for determining the pruning threshold and selecting the important connection weights.
The redundant weight removal method based on the difference between the initial and final states of network weights according to the present invention is a method that selects critical network connection weights and removes redundant weights based on the difference between the initial and final states of the network weights.
The purpose of the present invention is achieved through the following technical solutions:
The redundant weight removal method based on the difference between the initial and final states of network weights comprises the following steps:
Step 1: before training the network model, first save the network weights of its initial state; after training, save the network weights of its final state; for each layer, perform histogram statistics on the distributions of the initial-state and final-state weights and fit a curve to each histogram, obtaining for every layer an initial-weight histogram-fitting curve q_l and a final-weight histogram-fitting curve q_l', where the subscript l denotes the l-th layer of the network and the range of l depends on the total number of convolutional and fully connected layers of the network;
Step 2: according to the difference between the initial-weight histogram-fitting curve q_l and the final-weight histogram-fitting curve q_l', determine for each layer the positive threshold interval containing the positive pruning threshold T_l^+ and the negative threshold interval containing the negative pruning threshold T_l^-, and from the difference between q_l and q_l' obtain the difference curve f_l;
Step 3: compute the slope of the difference curve f_l on the positive threshold interval and on the negative threshold interval; choose the points of maximum slope within the positive threshold interval as candidate values of the positive pruning threshold T_l^+ and add them to the positive candidate set R_l; choose the points of minimum slope within the negative threshold interval as candidate values of the negative threshold T_l^- and add them to the negative candidate set S_l;
Step 4: select the points of the positive candidate set R_l in order of decreasing slope, taking the abscissa of each point as a positive temporary value T_l^+(i) of the positive threshold T_l^+; at the same time select the points of the negative candidate set S_l in order of increasing slope, taking the abscissa of each point as a negative temporary value T_l^-(i) of T_l^-, where i denotes the i-th selection of a positive or negative temporary value; continue until the points of the positive candidate set R_l or of the negative candidate set S_l have all been traversed;
Step 5: set to zero the weights of layer l of the final network model that lie in the interval (T_l^-(i), T_l^+(i)) and carry out the i-th accuracy test of the network model on the test data set, until the points of the positive candidate set R_l or the negative candidate set S_l have all been traversed; choose the pair for which the network model achieves the highest accuracy on the test data set as the pruning interval (T_l^-, T_l^+);
Step 6: repeat the operations of steps 1 to 5 to choose the pruning intervals of the network layers other than layer l, until the pruning intervals of all network layers have been chosen;
Step 7: set to zero the weights of the final network model that lie in the pruning interval (T_l^-, T_l^+) of each layer, thereby pruning every layer and obtaining the pruned network model;
Step 8: retrain the pruned network model while keeping the pruned weights at 0 throughout retraining, obtaining the pruned and retrained network model;
Step 9: if the accuracy of the pruned and retrained network model on the test data set is lower than that of the final network model by less than a preset value, stop and retain the pruned and retrained network model; otherwise, proceed to the next step;
Step 10: compute for each layer the change magnitude between the final and initial network weights; by comparing the weight change magnitude with α×T_l^- or α×T_l^+, where α denotes the weight-change-magnitude coefficient, set the weights of each layer of the final network model to zero again while retaining part of the weights pruned in step 7; retrain the newly pruned network, keeping the pruned weights at 0 during retraining; after retraining, return to step 9.
The initial state of the network model is the initialized state of the network model before its first training on the data; the network model at this point is called the initial network model, and the weights of this state are generated by the corresponding weight-generating function and have not been trained or learned on any data set. The network model after the first training is called the final network model, and the weights of the final state are the weights obtained by continual updating and adjustment during training on the data set. The initial network model and the final network model have the same network structure; only the connection parameters between layers differ.
Further, to realize the object of the invention, preferably, in step 1 the network model is the LeNet-5 network model in the deep learning framework Caffe.
Preferably, in step 1, the network weights of the initial state are values generated at random by the corresponding weight-generating function, representing the initial values of the network weights before the network model is trained.
Preferably, in step 1, the network weights of the final state are the weights generated by the network model through continual updating and adjustment during training on the data set. In steps 5, 7 and 9, the final network model is the network model obtained after the training in step 1, and its weights are the final-state weights of step 1. The final-state weights are related to the information in the data set and are mainly reflected in the feature-extraction and classification ability of the network model on the data.
Preferably, in step 1, the curve fitting is performed, on the basis of the histogram statistics of the distributions of the initial-state and final-state weights of each layer, by fitting the profile of the histogram with a kernel density estimation method, yielding the corresponding fitted curve.
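By way of illustration, the histogram fitting described above could be implemented with kernel density estimation along the following lines (a minimal Python sketch assuming NumPy and SciPy; the function name, grid size and sign convention of the difference curve are illustrative, not specified by the patent):

    import numpy as np
    from scipy.stats import gaussian_kde

    def fit_weight_curve(weights, grid=None, num_points=512):
        # Flatten the layer's weights and fit a Gaussian KDE to their distribution;
        # the sampled KDE plays the role of the histogram-fitting curve q_l or q_l'.
        w = np.asarray(weights, dtype=np.float64).ravel()
        if grid is None:
            grid = np.linspace(w.min(), w.max(), num_points)
        kde = gaussian_kde(w)
        return grid, kde(grid)

    # Usage: fit initial and final weights on a common grid, then form the
    # difference curve f_l between the two fitted curves (one possible sign convention).
    # grid, q_init  = fit_weight_curve(w_initial)
    # _,    q_final = fit_weight_curve(w_final, grid=grid)
    # f_l = q_final - q_init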
Preferably, in step 8, the retraining means training the already trained network model again on the training data set, updating and adjusting the connections between layers in the network model once more, where the parameters of the connections between layers include the network weights and the network biases.
Preferably, in step 8, keeping the pruned weights at 0 throughout retraining is realized by setting a weight-removal identifier Mask_l of the same size as the number of weights of layer l. Mask_l is a set of flags recording whether each weight of the layer has been removed; by multiplying it element-wise with the corresponding positions of the layer's weights, the removal of the weights of each layer is realized.
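A minimal sketch of this mask mechanism using the pycaffe interface (the layer names, solver file name and iteration count are illustrative assumptions, not values given by the patent):

    import caffe

    def retrain_with_masks(solver_prototxt, masks, num_iters=10000):
        # masks: dict mapping layer name -> 0/1 array shaped like that layer's weights (Mask_l).
        solver = caffe.SGDSolver(solver_prototxt)
        for _ in range(num_iters):
            solver.step(1)  # one training iteration
            for name, mask in masks.items():
                # multiply the layer weights by Mask_l so pruned weights stay exactly 0
                solver.net.params[name][0].data[...] *= mask
        return solver.net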
Preferably, in step 9, the value range of the preset value is [0.5%, 2%]; the preset value is preferably 1%.
Preferably, in step 10, the comparison of the weight change magnitude with α×T_l^- or α×T_l^+ is realized as follows. Let w_lk be the initial value of the k-th weight of layer l, w'_lk its final value, and Δw_lk = w'_lk - w_lk the change magnitude of the weight. The selection standard based on the weight change magnitude is any one of the following four cases:
(1) w'_lk < α×T_l^- and w'_lk × w_lk < 0: when w'_lk lies in the interval (T_l^-, α×T_l^-) and w_lk changed from a positive to a negative value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk < α×T_l^-;
(2) w'_lk > α×T_l^+ and w'_lk × w_lk < 0: when w'_lk lies in the interval (α×T_l^+, T_l^+) and w_lk changed from a negative to a positive value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk > α×T_l^+;
(3) Δw_lk < α×T_l^- and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller negative value to a larger negative value and the change magnitude satisfies Δw_lk < α×T_l^-, the weight w_lk is retained;
(4) Δw_lk > α×T_l^+ and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller positive value to a larger positive value and the change magnitude satisfies Δw_lk > α×T_l^+, the weight w_lk is retained.
Preferably, the weight-change-magnitude coefficient α in step 10 is chosen as follows: first set α = mean(abs(w'_l))/std(w'_l), where mean(abs(w'_l)) and std(w'_l) denote respectively the mean of the absolute values of the layer's final weights and their standard deviation; judge whether the pruned network reaches the preset standard, and if so, directly save the pruned network model; otherwise, adjust the weight-change-magnitude coefficient α from 1 towards 0, decreasing it gradually in steps of β, until the pruned network reaches the preset standard, where the value range of the step β is (0, 1) and β is preferably 0.1.
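The choice of α could be organised as in the following sketch (Python; prune_retrain_and_test and target_accuracy are assumed helpers and values not specified by the patent):

    import numpy as np

    def choose_alpha(w_final, prune_retrain_and_test, target_accuracy, beta=0.1):
        # Initial guess for the change-magnitude coefficient alpha.
        alpha = np.mean(np.abs(w_final)) / np.std(w_final)
        if prune_retrain_and_test(alpha) >= target_accuracy:
            return alpha
        # Otherwise decrease alpha from 1 toward 0 in steps of beta until the
        # pruned network reaches the preset standard again.
        alpha = 1.0
        while alpha > 0.0:
            if prune_retrain_and_test(alpha) >= target_accuracy:
                return alpha
            alpha -= beta
        return 0.0  # alpha = 0 corresponds to no pruning by this criterion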
The network model of the present invention is a deep neural network model, with more layers than traditional neural network models. It mainly consists of an input layer, an output layer, several different hidden layers and the connections between layers; the input layer feeds the data into the network model, the output layer outputs the computation result of the network model, and the hidden layers are the network layers other than the input and output layers, with different types of hidden layers performing different computations on the data. The parameters of the connections between layers include the network weights and the network biases. The network model is thus composed of the network structure built from the different network layers together with the connection parameters between layers.
Compared with existing network pruning techniques, the present invention has the following advantages and effects. The invention innovatively uses the difference between the histograms of the initial and final states of the network weights to determine the network pruning threshold, and on this basis proposes to optimize the selection by the magnitude of the weight change. The invention makes full use of the property that the importance of a network connection is related to the weight magnitude and to the weight change magnitude; under the condition of essentially preserving the original accuracy of the network, it realizes the selection of important connection weights and the removal of redundant weights, which benefits the subsequent compression and acceleration of deep neural networks and alleviates the problem that deep learning networks demand high computing power and memory and are therefore difficult to deploy on hardware with limited storage and computing resources, especially mobile devices.
Description of the drawings
Fig. 1 is a flow block diagram of the redundant weight removal method based on the difference between the initial and final states of network weights according to the present invention.
Fig. 2 shows the weight histogram and its fitted curve of the initial LeNet-5 network at the conv2 layer.
Fig. 3 shows the weight histogram and its fitted curve of the final LeNet-5 network at the conv2 layer.
Fig. 4 shows the fitted weight-histogram curves of the initial and final LeNet-5 networks at the conv2 layer.
Fig. 5 shows the difference curve between the fitted conv2 weight-histogram curves of the initial and final trained LeNet-5 networks.
Specific embodiment
For a better understanding of the present invention, the invention is further described below with reference to an embodiment and the accompanying drawings, but the embodiments of the invention are not limited thereto.
A detailed introduction and explanation is given below taking as an example the LeNet-5 network model in the deep learning framework Caffe developed by Berkeley BVLC. The LeNet-5 network model is a convolutional neural network for recognizing the handwritten digits 0 to 9 and is one of the most representative early convolutional neural network models; it reaches 99.20% accuracy on the MNIST data set. The LeNet-5 network model has 7 layers in total (not counting the input layer), each layer containing a different number of connection parameters. LeNet-5 mainly consists of 2 convolutional layers, 2 subsampling (pooling) layers, 2 fully connected layers and 1 output layer, among which only the connection parameters of the convolutional layers and the fully connected layers need to be updated and adjusted during training; the remaining connection parameters do not change. The two convolutional layers in the network model are named conv1 and conv2 and are mainly used to extract data features; convolutional layer conv1 has 20 convolution kernels and convolutional layer conv2 has 50 convolution kernels, each kernel has size 5 and sliding step 1, and the weight initialization mode is xavier. The two fully connected layers in the network model are named ip1 and ip2; fully connected layer ip1 has 500 neurons and fully connected layer ip2 has 10 neurons, and their weight initialization mode is also set to xavier; the weights generated by this initialization mode obey a uniform distribution. All of the above operations can be carried out directly with the deep learning framework Caffe.
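For orientation, the layer sizes mentioned above can be inspected directly through the pycaffe interface (a sketch; the prototxt file name is an illustrative assumption):

    import caffe

    # Load the network definition only (no trained weights needed to inspect shapes).
    net = caffe.Net('lenet_train_test.prototxt', caffe.TEST)
    for name in ('conv1', 'conv2', 'ip1', 'ip2'):
        w = net.params[name][0].data
        print(name, 'weight shape:', w.shape)  # e.g. conv2 -> (50, 20, 5, 5)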
The redundant weight removal method based on the difference between the initial and final states of network weights comprises the following steps:
The specific implementation flow of the present invention is shown in Fig. 1. First step: before the first training of the LeNet-5 network model, save the network model in its initial state, referred to as the initial network model of LeNet-5; after the first training, save the network model in its final state, referred to as the final network model of LeNet-5. Since the present invention removes redundant network weights according to the difference between the initial and final states of the weights, the first training mainly serves to save the initial and final network models of LeNet-5 in order to obtain the weights of both states; the first training of the network model can be carried out directly with Caffe. The initial-state weights of the LeNet-5 network model are the weights of the initialized state before the first training on the data; they are generated by the weight initialization mode xavier and have not been trained or learned on any data set. Since a deep neural network has the ability to learn automatically, it only needs to be given data and left to learn the regularities in the data autonomously: specifically, after being trained with correctly labelled data, it keeps updating and adjusting the connections between layers and its performance improves greatly, until the predictions output by the network model agree as far as possible with the correct labels. The final-state weights of the LeNet-5 network model are therefore the weights generated by continual updating and adjustment during training on the MNIST data set. The network layers of the initial and final LeNet-5 network models are identical; only the connection parameters between layers differ.
Histogram statistics and curve fitting are carried out separately for the distributions of the initial-state and final-state weights of each layer of the LeNet-5 network model. According to the flow block diagram of Fig. 1, the same operation is performed for every layer of weights in the network, so the detailed operation is explained below only for the conv2 layer of the LeNet-5 network model. The conv2 layer is the second convolutional layer of LeNet-5; its weight initialization mode is xavier, i.e. the initial weights of this layer obey a uniform distribution, while the final weights of this layer are the state obtained after the network model has been fully trained. The histograms of the initial and final conv2 weights are shown in the initial-weight histogram of Fig. 2 and the final-weight histogram of Fig. 3 respectively. Curves are fitted to the histograms of the initial and final weights; in this embodiment the curve-fitting method is kernel density estimation, giving the fitted curve q_2 of the initial conv2 weight histogram and the fitted curve q_2' of the final weight histogram, shown in Fig. 2 and Fig. 3 respectively. The comparison of the initial-state fitted curve q_2 and the final-state fitted curve q_2' of the conv2 weights is shown in Fig. 4. The four points A, B, C and D in Fig. 4 are the intersection points of the initial-state fitted curve q_2 and the final-state fitted curve q_2', ordered from small to large along the horizontal axis; they represent the points where the initial and final weight histograms are equal, and a, b, c, d denote the abscissas of the intersections A, B, C, D respectively.
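A sketch of how these two weight states could be pulled out of Caffe and histogrammed (pycaffe; the model file names are illustrative assumptions):

    import caffe
    import numpy as np
    from scipy.stats import gaussian_kde

    # Load the saved initial-state and final-state models (same prototxt, different weights).
    net_init  = caffe.Net('lenet_train_test.prototxt', 'lenet_initial.caffemodel', caffe.TEST)
    net_final = caffe.Net('lenet_train_test.prototxt', 'lenet_final.caffemodel', caffe.TEST)

    w_init  = net_init.params['conv2'][0].data.ravel()
    w_final = net_final.params['conv2'][0].data.ravel()

    # Histogram both states on a common grid and fit each profile with a KDE.
    grid = np.linspace(min(w_init.min(), w_final.min()),
                       max(w_init.max(), w_final.max()), 512)
    q2       = gaussian_kde(w_init)(grid)   # fitted curve of the initial conv2 weights
    q2_final = gaussian_kde(w_final)(grid)  # fitted curve of the final conv2 weights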
Second step: according to the divergence between the initial-state and final-state weight histograms of each layer of the LeNet-5 network model over different weight magnitude ranges, the distributions of the important and unimportant connection weights of each layer are judged, so as to determine for each layer the positive threshold interval containing the positive pruning threshold T_l^+ and the negative threshold interval containing the negative pruning threshold T_l^-, where the subscript l denotes the l-th layer of the network. Taking the conv2 layer of the LeNet-5 network model as an example, as can be seen from Fig. 4, by comparing the heights of the two weight histograms at different abscissas, the final-network histogram of conv2 is larger than the initial-network histogram on the intervals (-∞, a], [d, +∞) and (b, c); that is, the final network has more connection weights whose values lie in (-∞, a], [d, +∞) and (b, c) than the initial network. On the intervals (a, b) and (c, d) the final-network weight histogram of conv2 is smaller than the initial-network weight histogram; that is, the final network has fewer connection weights whose values lie in (a, b) and (c, d) than the initial network. According to the relation that network connections with larger absolute final weights are more important, together with the relation between the final and initial weight histograms, the regions where the final weight histogram exceeds the initial weight histogram and the absolute weight value is large are taken as the intervals of important connection weights, while the regions where the final weight histogram exceeds the initial weight histogram but the absolute weight value is small are taken as the intervals of unimportant connection weights. As can be seen from Fig. 4, the important connection weights of the conv2 layer are mainly located in the intervals (-∞, a] and [d, +∞), while the unimportant connection weights are mainly concentrated in the interval (b, c). Accordingly, the positive threshold interval containing the positive pruning threshold T_2^+ and the negative threshold interval containing the negative pruning threshold T_2^- of the conv2 layer can be determined as (c, d) and (a, b) respectively.
Third step: compute the difference curve f_l between the fitted curves of the initial-state and final-state weight histograms of each layer. The difference curve f_2 between the initial-state fitted curve q_2 and the final-state fitted curve q_2' of the conv2 weights is shown in Fig. 5.
Fourth step: for the difference curve f_l of each layer, compute the slope on the corresponding positive threshold interval and negative threshold interval; within the positive threshold interval, take the points where the slope of the difference curve f_l is maximal as candidate values of the positive pruning threshold and add them to the positive candidate set R_l; within the negative threshold interval, take the points where the slope of the difference curve f_l is minimal as candidate values of the negative pruning threshold and add them to the negative candidate set S_l.
Fifth step: select the points of the positive candidate set R_l in order of decreasing slope and take the abscissa of each point as a positive temporary value T_l^+(i) of the positive pruning threshold T_l^+; at the same time select the points of the negative candidate set S_l in order of increasing slope and take the abscissa of each point as a negative temporary value T_l^-(i) of the negative pruning threshold T_l^-. Then set to zero the weights of layer l of the final network model that lie in the interval (T_l^-(i), T_l^+(i)) and carry out the i-th network test, until the points of the positive candidate set R_l or the negative candidate set S_l have all been traversed; finally choose the pair for which the network model reaches the highest accuracy on the test data set as the pruning interval (T_l^-, T_l^+), where the accuracy of a network model is the frequency with which the prediction of the network model on a sample of the data set coincides with the actual result.
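The candidate-threshold search of the fourth and fifth steps might look as follows (a Python sketch; the evaluate helper, which zeros the weights in a trial interval and measures test accuracy, is assumed rather than given by the patent):

    import numpy as np

    def threshold_candidates(grid, f_l, pos_interval, neg_interval):
        # Slope of the difference curve f_l, evaluated on the fitting grid.
        slope = np.gradient(f_l, grid)
        in_pos = (grid > pos_interval[0]) & (grid < pos_interval[1])
        in_neg = (grid > neg_interval[0]) & (grid < neg_interval[1])
        # Positive candidates R_l sorted by decreasing slope, negative candidates S_l by increasing slope.
        R_l = grid[in_pos][np.argsort(slope[in_pos])[::-1]]
        S_l = grid[in_neg][np.argsort(slope[in_neg])]
        return R_l, S_l

    def best_pruning_interval(R_l, S_l, evaluate):
        # Try the i-th pair of temporary thresholds until one candidate set is exhausted,
        # keeping the pair that gives the highest test accuracy.
        best_pair, best_acc = None, -1.0
        for t_pos, t_neg in zip(R_l, S_l):
            acc = evaluate(t_neg, t_pos)  # zero weights in (t_neg, t_pos), then test
            if acc > best_acc:
                best_pair, best_acc = (t_neg, t_pos), acc
        return best_pair, best_acc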
Sixth step: repeat the above steps 1 to 5 to determine the pruning intervals of the other network layers in turn, until the pruning intervals of all network layers have been determined.
Seventh step: ensure that, in the final LeNet-5 network model, the weights of each layer that lie in the pruning interval (T_l^-, T_l^+) remain zero throughout retraining. The final LeNet-5 network model is the network model composed of the network structure and of the connection parameters between layers saved after the first training of the LeNet-5 network model; the design of the network structure is completed before the training of the network model. The purpose of retraining is to continually update and adjust the connections between layers while the network model again automatically learns the sample information of the data set, so as to effectively improve the feature-extraction and classification abilities; retraining can be carried out directly with the deep learning framework Caffe, and Caffe saves the finally generated connection parameters after the network model has been trained. In order to keep the removed weights at 0 during retraining, a weight-removal identifier Mask_l of the same size as the number of weights of layer l is set up as a record of whether each weight of the layer has been removed: the positions of Mask_l corresponding to the layer-l weights that lie in the pruning interval (T_l^-, T_l^+) are set to 0, and the other positions are set to 1, as shown in the initialization part of Algorithm 1, where Mask_lk denotes the flag recording whether the k-th weight w_lk of layer l is removed.
Algorithm 1: Network pruning and retraining based on thresholds
Initialization:
    if T_l^- < w_lk < T_l^+ then
        Mask_lk = 0
    else
        Mask_lk = 1
    end if
Pruning and retraining:
    retrain the network; after every weight update, multiply the layer-l weights element-wise by Mask_l so that the pruned weights remain 0.
Eighth step: retrain the pruned network; the concrete operation is as shown in the pruning-and-retraining part of Algorithm 1. The learning rate for retraining the pruned network is set to 1/10 of the original learning rate, and the remaining network parameters are left unchanged. Table 1 gives the weight change results of each layer of the LeNet-5 network model after pruning. The first and second columns of Table 1 are the existing structural parameters of the network model itself: the first column gives the name of each network layer (Layer) and the second column the number of weights of each layer or of the whole network. The third and fourth columns are respectively the negative pruning threshold T_l^- and the positive pruning threshold T_l^+ of each layer, obtained by the above steps 1 to 6. The fifth column gives the percentage of weights retained after pruning in each layer relative to the number of weights of that layer, or of the whole network relative to the total number of weights; these figures can be obtained from the counts of the entries of Mask_l of each layer in step 7, divided by the per-layer weight numbers in the second column of Table 1. As can be seen from Table 1, the network pruning method of this embodiment based on Algorithm 1 can reduce the number of network weights to 7.68% of the original total.
Table 1. Weight change results of each layer of the LeNet-5 network model after pruning based on Algorithm 1
Ninth step: test the accuracy of the final LeNet-5 network model and of its pruned and retrained network model on the test set of the MNIST data set, and compare the accuracies obtained by the two. The pruned and retrained network model of the LeNet-5 network is the network model obtained by pruning and retraining its final network model; the two have the same network structure but different connection parameters. The accuracy of a network model is the frequency with which its predictions on the samples of the data set coincide with the actual results; for example, the final LeNet-5 model predicts the content of a test sample of the MNIST data set, the prediction is compared with the actual label of the sample, and the frequency of agreement is counted. After all test samples have been processed, the accuracy of the network model is obtained. This procedure and its result can be carried out and obtained directly with the deep learning framework Caffe, and the concrete results are shown in Table 2. As can be seen from Table 2, the accuracy of the pruned and retrained LeNet-5 network model on the MNIST data set reaches 99.28%, which is 0.06 percentage points higher than the 99.22% of the final network model, showing that this network model achieves the removal of redundant network weights while essentially maintaining the accuracy of the original network; the operation therefore stops and the network model is retained.
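The accuracy comparison of this step can be run directly from pycaffe; a minimal sketch (assuming the test prototxt exposes an Accuracy layer named 'accuracy'; file names and batch count are illustrative):

    import caffe

    def test_accuracy(prototxt, caffemodel, num_batches=100):
        # Average the net's accuracy output over a number of test batches.
        net = caffe.Net(prototxt, caffemodel, caffe.TEST)
        total = 0.0
        for _ in range(num_batches):
            total += float(net.forward()['accuracy'])
        return total / num_batches

    # acc_final  = test_accuracy('lenet_train_test.prototxt', 'lenet_final.caffemodel')
    # acc_pruned = test_accuracy('lenet_train_test.prototxt', 'lenet_pruned_retrained.caffemodel')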
When the accuracy of the network model after pruning and retraining drops below that of the final network by more than the preset value (the preset value of this example is set to 1%), this embodiment further optimizes the selection of important connection weights according to the weight change magnitude and continues with the next step.
Table 2. Accuracy comparison of the final LeNet-5 network model and of the model pruned and retrained based on Algorithm 1
Tenth step: compute for each layer the change magnitude between the final and initial network weights; by comparing the weight change magnitude with α×T_l^- or α×T_l^+, where α denotes the weight-change-magnitude coefficient, set the weights of each layer of the final network to zero again while retaining part of the weights pruned in step 7; then retrain the newly pruned network, keeping the pruned weights at 0 during retraining, as shown in Algorithm 2. Algorithm 2 is designed according to the fact that the importance of a network connection is related to the magnitude of the final weight and to the magnitude of the weight change: case 1 of the initialization step of Algorithm 2 reflects that connections with larger final weights are more important, and cases 2 to 5 reflect that the importance of a connection is related to the change magnitude between the final and initial weights.
In the initialization step of Algorithm 2, case 1 means that when w'_lk is less than T_l^- or greater than T_l^+, the weight w_lk is retained. Case 2 means that when w'_lk lies in the interval (T_l^-, α×T_l^-) and w_lk changed from a positive to a negative value during weight learning, the weight is retained; the change magnitude then satisfies Δw_lk < α×T_l^-. Case 3 means that when w'_lk lies in the interval (α×T_l^+, T_l^+) and w_lk changed from a negative to a positive value during weight learning, the weight is retained; the change magnitude then satisfies Δw_lk > α×T_l^+. Case 4 means that when w'_lk is greater than T_l^- and less than T_l^+, but w_lk changed from a smaller negative value to a larger negative value during weight learning and the change magnitude satisfies Δw_lk < α×T_l^-, the weight w_lk is retained. Case 5 means that when w'_lk is greater than T_l^- and less than T_l^+, but w_lk changed from a smaller positive value to a larger positive value during weight learning and the change magnitude satisfies Δw_lk > α×T_l^+, the weight w_lk is retained.
In this embodiment α is first set to α = mean(abs(w'_l))/std(w'_l), where mean(abs(w'_l)) and std(w'_l) denote respectively the mean of the absolute values of the layer's final weights and their standard deviation. It is then judged whether the pruned network essentially maintains the accuracy of the final network; if so, the pruned model is saved directly, otherwise α is gradually decreased from 1 towards 0 in steps of β = 0.1. When α = 1, Algorithm 2 is equivalent to Algorithm 1; when α = 0, Algorithm 2 performs no pruning on the network. The concrete retraining of the pruned network is the same as the pruning-and-retraining part of Algorithm 1 described in the eighth step.
Algorithm 2: Network pruning and retraining based on thresholds and weight change magnitude
Initialization: case w'_lk of
    case 1: w'_lk < T_l^- or w'_lk > T_l^+:
        Mask_lk = 1
    case 2: T_l^- < w'_lk < α×T_l^- and w'_lk × w_lk < 0:
        Mask_lk = 1
    case 3: α×T_l^+ < w'_lk < T_l^+ and w'_lk × w_lk < 0:
        Mask_lk = 1
    case 4: Δw_lk < α×T_l^- and Δw_lk × w_lk > 0:
        Mask_lk = 1
    case 5: Δw_lk > α×T_l^+ and Δw_lk × w_lk > 0:
        Mask_lk = 1
    else:
        Mask_lk = 0
    end
Pruning and retraining:
    same as the pruning-and-retraining part of Algorithm 1.
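For concreteness, the initialization of Algorithm 2 could be written as the following NumPy sketch (variable names are illustrative; the pruning-and-retraining part is the same masked retraining as in Algorithm 1):

    import numpy as np

    def init_mask_algorithm2(w_init, w_final, t_neg, t_pos, alpha):
        # Keep a weight (Mask_lk = 1) if any of cases 1-5 holds, otherwise prune it (Mask_lk = 0).
        dw = w_final - w_init
        keep = (w_final < t_neg) | (w_final > t_pos)                # case 1: outside (T_l^-, T_l^+)
        keep |= (w_final < alpha * t_neg) & (w_final * w_init < 0)  # case 2: turned negative with a large change
        keep |= (w_final > alpha * t_pos) & (w_final * w_init < 0)  # case 3: turned positive with a large change
        keep |= (dw < alpha * t_neg) & (dw * w_init > 0)            # case 4: negative weight grew more negative
        keep |= (dw > alpha * t_pos) & (dw * w_init > 0)            # case 5: positive weight grew more positive
        return keep.astype(np.float32)                              # Mask_l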

Claims (10)

1. A redundant weight removal method based on the difference between the initial and final states of network weights, characterized by comprising the following steps:
Step 1: before training the network model, first save the network weights of its initial state; after training, save the network weights of its final state; for each layer, perform histogram statistics on the distributions of the initial-state and final-state weights and fit a curve to each histogram, obtaining for every layer an initial-weight histogram-fitting curve q_l and a final-weight histogram-fitting curve q_l', where the subscript l denotes the l-th layer of the network and the range of l depends on the total number of convolutional and fully connected layers of the network;
Step 2: according to the difference between the initial-weight histogram-fitting curve q_l and the final-weight histogram-fitting curve q_l', determine for each layer the positive threshold interval containing the positive pruning threshold T_l^+ and the negative threshold interval containing the negative pruning threshold T_l^-, and from the difference between q_l and q_l' obtain the difference curve f_l;
Step 3: compute the slope of the difference curve f_l on the positive threshold interval and on the negative threshold interval; choose the points of maximum slope within the positive threshold interval as candidate values of the positive pruning threshold T_l^+ and add them to the positive candidate set R_l; choose the points of minimum slope within the negative threshold interval as candidate values of the negative threshold T_l^- and add them to the negative candidate set S_l;
Step 4: select the points of the positive candidate set R_l in order of decreasing slope, taking the abscissa of each point as a positive temporary value T_l^+(i) of the positive threshold T_l^+; at the same time select the points of the negative candidate set S_l in order of increasing slope, taking the abscissa of each point as a negative temporary value T_l^-(i) of T_l^-, where i denotes the i-th selection of a positive or negative temporary value; continue until the points of the positive candidate set R_l or of the negative candidate set S_l have all been traversed;
Step 5: set to zero the weights of layer l of the final network model that lie in the interval (T_l^-(i), T_l^+(i)) and carry out the i-th accuracy test of the network model on the test data set, until the points of the positive candidate set R_l or the negative candidate set S_l have all been traversed; choose the pair for which the network model achieves the highest accuracy on the test data set as the pruning interval (T_l^-, T_l^+);
Step 6: repeat the operations of steps 1 to 5 to choose the pruning intervals (T_l^-, T_l^+) of the network layers other than layer l, until the pruning intervals (T_l^-, T_l^+) of all network layers have been chosen;
Step 7: set to zero the weights of the final network model that lie in the pruning intervals (T_l^-, T_l^+), thereby pruning every layer and obtaining the pruned network model;
Step 8: retrain the pruned network model while keeping the pruned weights at 0 throughout retraining, obtaining the pruned and retrained network model;
Step 9: if the accuracy of the pruned and retrained network model on the test data set is lower than that of the final network model by less than a preset value, stop and retain the pruned and retrained network model; otherwise, proceed to the next step;
Step 10: compute for each layer the change magnitude between the final and initial network weights; by comparing the weight change magnitude with α×T_l^- or α×T_l^+, where α denotes the weight-change-magnitude coefficient, set the weights of each layer of the final network model to zero again while retaining part of the weights pruned in step 7; retrain the newly pruned network, keeping the pruned weights at 0 during retraining; after retraining, return to step 9.
2. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the network model is the LeNet-5 network model in the deep learning framework Caffe.
3. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the network weights of the initial state are values generated at random by the corresponding weight-generating function, representing the initial values of the network weights before the network model is trained.
4. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the network weights of the final state are the weights generated by the network model through continual updating and adjustment during training on the data set; in steps 5, 7 and 9, the final network model is the network model obtained after the training in step 1, and the weights of the final network model are the final-state weights of step 1.
5. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 1, the curve fitting is performed, on the basis of the histogram statistics of the distributions of the initial-state and final-state weights of each layer, by fitting the profile of the histogram with a kernel density estimation method, yielding the corresponding fitted curve.
6. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 8, the retraining means training the already trained network model again on the training data set, updating and adjusting the connections between layers in the network model once more, where the parameters of the connections between layers include the network weights and the network biases.
7. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 8, keeping the pruned weights at 0 throughout retraining is realized by setting a weight-removal identifier Mask_l of the same size as the number of weights of layer l; Mask_l is a set of flags recording whether each weight of the layer has been removed, and by multiplying it element-wise with the corresponding positions of the layer's weights, the removal of the weights of each layer is realized.
8. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 9, the value of the preset value lies in [0.5%, 2%].
9. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: in step 10, the comparison of the weight change magnitude with α×T_l^- or α×T_l^+ is realized by the following method: let w_lk be the initial value of the k-th weight of layer l, w'_lk its final value, and Δw_lk = w'_lk - w_lk the change magnitude of the weight; the selection standard based on the weight change magnitude is any one of the following four cases:
(1) w'_lk < α×T_l^- and w'_lk × w_lk < 0: when w'_lk lies in the interval (T_l^-, α×T_l^-) and w_lk changed from a positive to a negative value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk < α×T_l^-;
(2) w'_lk > α×T_l^+ and w'_lk × w_lk < 0: when w'_lk lies in the interval (α×T_l^+, T_l^+) and w_lk changed from a negative to a positive value during the first training, the weight is retained; the change magnitude then satisfies Δw_lk > α×T_l^+;
(3) Δw_lk < α×T_l^- and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller negative value to a larger negative value and the change magnitude satisfies Δw_lk < α×T_l^-, the weight w_lk is retained;
(4) Δw_lk > α×T_l^+ and Δw_lk × w_lk > 0: when w'_lk is greater than T_l^- and less than T_l^+, but during the first training w_lk changed from a smaller positive value to a larger positive value and the change magnitude satisfies Δw_lk > α×T_l^+, the weight w_lk is retained.
10. The redundant weight removal method based on the difference between the initial and final states of network weights according to claim 1, characterized in that: the weight-change-magnitude coefficient α in step 10 is chosen by first setting α = mean(abs(w'_l))/std(w'_l), where mean(abs(w'_l)) and std(w'_l) denote respectively the mean of the absolute values of the layer's final weights and their standard deviation; it is judged whether the pruned network reaches the preset standard, and if so, the pruned network model is saved directly; otherwise, the weight-change-magnitude coefficient α is decreased gradually from 1 towards 0 in steps of β until the pruned network reaches the preset standard, where the value range of the step β is (0, 1).
CN201711385134.XA 2017-12-20 2017-12-20 Redundant weight removal method based on the difference between the initial and final states of network weights Pending CN108090564A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711385134.XA CN108090564A (en) 2017-12-20 2017-12-20 Redundant weight removal method based on the difference between the initial and final states of network weights

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711385134.XA CN108090564A (en) 2017-12-20 2017-12-20 Redundant weight removal method based on the difference between the initial and final states of network weights

Publications (1)

Publication Number Publication Date
CN108090564A true CN108090564A (en) 2018-05-29

Family

ID=62176226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711385134.XA Pending CN108090564A (en) Redundant weight removal method based on the difference between the initial and final states of network weights

Country Status (1)

Country Link
CN (1) CN108090564A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635935A (en) * 2018-12-29 2019-04-16 北京航空航天大学 Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould
CN109635935B (en) * 2018-12-29 2022-10-14 北京航空航天大学 Model adaptive quantization method of deep convolutional neural network based on modular length clustering
CN111582456A (en) * 2020-05-11 2020-08-25 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating network model information
CN111582456B (en) * 2020-05-11 2023-12-15 抖音视界有限公司 Method, apparatus, device and medium for generating network model information
CN111967591A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network automatic pruning method and device and electronic equipment
CN111967591B (en) * 2020-06-29 2024-07-02 上饶市纯白数字科技有限公司 Automatic pruning method and device for neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20180529)