CN109523016A - Multi-value quantization deep neural network compression method and system for embedded systems - Google Patents

Multi-value quantization deep neural network compression method and system for embedded systems

Info

Publication number
CN109523016A
CN109523016A
Authority
CN
China
Prior art keywords
weight
section
update
shared
embedded system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811390683.0A
Other languages
Chinese (zh)
Other versions
CN109523016B (en)
Inventor
Guo Qingbei (郭庆北)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan
Priority to CN201811390683.0A
Publication of CN109523016A
Application granted
Publication of CN109523016B
Legal status: Expired - Fee Related
Anticipated expiration

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a multi-value quantization deep neural network compression method and system for embedded systems, comprising: determining the quantization level M according to the available storage resources of the embedded system; dividing the weights of each convolutional layer of a convolutional neural network into M intervals; constraining the weights of the current layer according to the interval boundaries of each convolutional layer; retraining the constrained convolutional neural network to update the weights; performing interval moving and interval shrinking to update the intervals; repeating the constraining, retraining and interval-update steps until every interval aggregates, i.e. the weights become shared; and, after weight sharing, continuing retraining to obtain higher performance. The compressed network can be deployed on embedded systems and mobile platforms; it reduces the storage footprint of the network while preserving its recognition accuracy.

Description

Multi-value quantization deep neural network compression method and system for embedded systems
Technical field
This disclosure relates to the field of computer technology, and more particularly to a multi-value quantization deep neural network compression method and system for embedded systems.
Background technique
In recent years, deep convolutional neural networks have made enormous progress on a variety of computer vision tasks. This progress stems mainly from three developments: large-scale datasets, increasingly deep network architectures, and a variety of deep learning methods. However, such deep architectures demand considerable compute and memory. For applications on embedded and mobile devices, compute and memory limits inevitably must be considered, and these limits hinder the wider deployment of deep neural networks. Moreover, although a deep neural network can be trained on a high-performance host, its heavy compute and memory load means that even inference can hardly be run on such devices. Compressing and accelerating deep neural networks is therefore of great importance for their practical application and adoption.
Much research has set out to reduce network size and to accelerate network models, with little or no loss of recognition accuracy. Quantization is an effective way to compress and accelerate deep neural networks simultaneously. BinaryConnect constrains all weights to +1 or -1 by a sign-based binarization. Similarly, Binarized Neural Networks (BNN) binarize all weights with the same two-valued function. Binary Weight Networks (BWN) approximate the full-precision weight matrix with a binary weight matrix and a positive scaling factor, thereby compressing deep convolutional neural networks, and accelerate computation by converting convolutions into additions and subtractions. Ternary Weight Networks (TWN) use three weight values, +1, -1 and 0, to better approximate the full-precision weights, providing stronger expressive power and achieving higher recognition accuracy. Trained Ternary Quantization (TTQ) uses two asymmetric trainable scaling factors to approximate the full-precision weight matrix more accurately, achieving still higher accuracy and even surpassing TWN on some vision tasks.
The inventor found in the course of this work that the above quantization methods share the following characteristics. (1) Although these binary and ternary networks retain full-precision weights during the parameter-update stage to better perform stochastic gradient descent (SGD), quantizing all weights directly necessarily causes a large reconstruction error, and hence a large loss of accuracy. (2) These methods pursue extreme compression so as to fit as many embedded and mobile devices as possible. However, different devices possess different amounts of resources; in particular, with advances in microchip technology, such devices are gaining more resources. Ideally, devices with different resources should each be matched with an optimal quantization level, a customized quantization that trades off model size against recognition accuracy; binary and ternary quantization methods cannot achieve this.
Summary of the invention
To address the deficiencies of the prior art, the present disclosure provides a multi-value quantization deep neural network compression method for embedded systems, which allows the compressed network to be deployed on embedded systems and mobile platforms. The method reduces the storage footprint of the network while preserving its recognition accuracy.
To achieve the above goals, the application adopts the following technical scheme:
A multi-value quantization deep neural network compression method for embedded systems, comprising:
determining the quantization level M according to the available storage resources of the embedded system, i.e. the weights of every layer are to be quantized to M weight values;
given a pre-trained L-layer convolutional neural network;
partition initialization: dividing the weights of each convolutional layer of the convolutional neural network into M intervals and determining the partition to which each weight belongs;
constraining the weights belonging to each interval according to the interval boundaries of each convolutional layer;
retraining the weight-constrained convolutional neural network to update the weights;
performing interval moving and interval shrinking to update each interval;
repeating the constraining, retraining and interval-update steps until every interval aggregates, i.e. the weights in each interval are finally quantized to the interval's aggregated value, realizing weight sharing;
after weight sharing, continuing to retrain for higher performance, i.e. computing the gradients of the shared weights and updating each shared weight. Moreover, a weight-shared convolutional neural network only needs to store the shared weights and the corresponding indices, which greatly reduces storage.
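The storage saving from weight sharing can be illustrated with a small sketch (names and the bit-accounting convention are illustrative, not from the patent): each weight is stored as a log2(M)-bit index into a codebook of M float32 shared values, instead of one float32 per weight.

```python
import numpy as np

def shared_storage_bits(n_weights, M):
    """Bits needed for a weight-shared layer: per-weight indices + float32 codebook."""
    index_bits = int(np.ceil(np.log2(M)))
    return n_weights * index_bits + M * 32

n = 1_000_000                          # weights in one convolutional layer
full_bits = n * 32                     # full-precision float32 storage
shared_bits = shared_storage_bits(n, M=8)   # 8-level multi-value quantization
compression = full_bits / shared_bits  # roughly 32/3 for M = 8
```

For M = 8 each index needs 3 bits, so the layer compresses by a factor of about 10.7 versus float32 storage.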
In a further technical solution, regarding partition initialization: for the l-th convolutional layer in the t-th period, all of its weights are divided by value into M non-overlapping partitions, denoted P_l^t = {P_{l,1}^t, ..., P_{l,M}^t}, where P_{l,i}^t denotes the i-th partition (0 < i ≤ M) and each partition contains a portion of the weights. Writing w_{l,j}^t for the j-th weight value of W_l^t, each partition consists of a weight set W_{l,i}^t together with a corresponding weight-index set I_{l,i}^t, these being the weight set and the weight-index set of the i-th partition respectively; if j ∈ I_{l,i}^t then w_{l,j}^t ∈ W_{l,i}^t.
In a further technical solution, when dividing the weights of each convolutional layer into M intervals, a k-means initialization determines the partition each weight belongs to; k-means minimizes the within-class sum of squares:

min Σ_{i=1}^{M} Σ_{w ∈ W_{l,i}^t} (w − c_{l,i}^t)²

where c_{l,i}^t is the cluster center corresponding to partition W_{l,i}^t.
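The k-means initialization above operates on scalar weight values, so a minimal 1-D Lloyd's iteration suffices. This is a sketch under that assumption; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def kmeans_1d(weights, M, iters=50, seed=0):
    """Partition scalar weights into M clusters by minimizing within-class sum of squares."""
    rng = np.random.default_rng(seed)
    centers = np.sort(rng.choice(weights, size=M, replace=False))
    for _ in range(iters):
        # assign each weight to its nearest center
        labels = np.argmin(np.abs(weights[:, None] - centers[None, :]), axis=1)
        for i in range(M):
            members = weights[labels == i]
            if members.size:
                centers[i] = members.mean()   # recenter on the cluster mean
    centers = np.sort(centers)
    labels = np.argmin(np.abs(weights[:, None] - centers[None, :]), axis=1)
    return centers, labels

w = np.random.default_rng(1).normal(0.0, 0.1, size=10_000)
centers, labels = kmeans_1d(w, M=4)
```

In 1-D the resulting clusters are contiguous value ranges, which is exactly the non-overlapping-interval structure the method needs.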
In a further technical solution, the intervals are updated by interval moving and interval shrinking according to the following interval-update principle:

Let P_{l,i}^t denote the i-th partition (0 < i ≤ M) of the l-th convolutional layer in the t-th period, with lower bound a_{l,i}^t and upper bound b_{l,i}^t; τ_t is the mean-moving rate, and the shrink sizes of the two sides of the interval determine the step sizes of mean moving and interval shrinking respectively; d_{l,i}^t denotes the mean-moving distance, i.e. the distance between the current region mean and the updated region mean, where the current and updated region means are the averages of the partition's weights before and after the weight update.

When an update would make an interval's width smaller than zero, the two boundaries of the interval would exchange positions. Since the goal is for each interval to aggregate to a single shared weight value, in that case the update rule is rewritten so that the interval collapses to one value.
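The interval-update rule can be sketched as follows. Since the patent's formulas for the update are not reproduced in this text, the concrete update used here (move by a fraction τ of the mean shift, shrink each side by a fraction of the width, and collapse to a single value when the boundaries would cross) is an assumption consistent with the description, not the patent's exact equations.

```python
def update_interval(lo, hi, new_mean, tau=0.5, r_lo=0.1, r_hi=0.1):
    """One interval update: move toward new_mean, shrink both sides, collapse on inversion."""
    width = hi - lo
    shift = tau * (new_mean - (lo + hi) / 2.0)   # mean moving
    lo2 = lo + shift + r_lo * width              # interval shrinking, lower side
    hi2 = hi + shift - r_hi * width              # interval shrinking, upper side
    if lo2 > hi2:                                # boundaries would swap: aggregate
        lo2 = hi2 = new_mean
    return lo2, hi2

lo, hi = -1.0, 1.0
for _ in range(100):                             # repeated updates drive aggregation
    lo, hi = update_interval(lo, hi, new_mean=0.1)
```

After repeated updates the interval converges on the target mean with vanishing width, which is the aggregation (weight-sharing) behavior the method relies on.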
In a further technical solution, when constraining the weights belonging to each interval according to the interval boundaries of each convolutional layer, each weight that falls outside its interval is clamped to the nearest interval boundary.
In a further technical solution, when retraining the constrained network to update the weights, the following update is used:

w_l^{t,s+1} = w_l^{t,s} − λ_t · ∂C/∂w_l^{t,s}

where w_l^{t,s} denotes the weight after the s-th iteration of the t-th period during weight quantization, w_l^{t,s+1} is the weight after the next iteration, λ_t is the current learning rate, and C is the cost function.
In a further technical solution, the shared weights are updated. The gradient of a shared weight is computed as

∂C/∂α_{l,i} = (1/|W_{l,i}|) · Σ_{w_j ∈ W_{l,i}} ∂C/∂w_j

where C denotes the cost function; the update gradient of C with respect to an interval's shared value equals the mean of the gradients of the weights in that interval.

Accordingly, optimizing by stochastic gradient descent, the shared-weight update is computed as

α_{l,i}^{t,s+1} = α_{l,i}^{t,s} − λ_t · ∂C/∂α_{l,i}^{t,s}

where α_{l,i}^{t,s} denotes the shared weight after the s-th iteration of the t-th period during weight sharing, α_{l,i}^{t,s+1} is the shared weight after the next iteration, and λ_t is the current learning rate.
Another aspect of the present disclosure is a computer-readable storage medium.
To achieve the above goal, the present invention adopts the following technical solution:
A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device and to execute the above multi-value quantization deep neural network compression method for embedded systems.
A third aspect of the present invention provides a multi-value quantization deep neural network compression system based on an embedded system.
To achieve the above goal, the present invention adopts the following technical solution:
A multi-value quantization deep neural network compression system based on an embedded system, comprising:
a quantization-level determination unit, which determines the quantization level M according to the available storage resources of the embedded system, i.e. the weights of every layer are to be quantized to M weight values;
a weight division unit, which, given a pre-trained L-layer convolutional neural network, divides the weights of each convolutional layer into M intervals;
a weight constraint unit, which constrains the weights of the current layer according to the interval boundaries of each convolutional layer;
a weight update unit, which retrains the constrained convolutional neural network to update the weights;
an interval update unit, which performs interval moving and interval shrinking to update each interval;
a weight sharing unit, which repeats the constraining, retraining and interval-update steps until every interval aggregates, i.e. the weights become shared;
a shared-weight update unit, which, after weight sharing, continues retraining for higher performance by first computing the shared-weight gradients and then updating each shared weight.
Compared with the prior art, the beneficial effects of the disclosure are:
Extensive experimental results show that the technical solution of the disclosure can provide embedded and mobile devices with limited resources a customized compression service with an optimal compression ratio and performance.
At the same quantization levels as binary and ternary networks, the multi-value quantization method of the disclosed solution meets the same requirements. Higher quantization levels, which previous binary and ternary networks cannot reach, the disclosed solution not only realizes but realizes with higher performance.
The disclosure reduces the storage footprint of the network while preserving its recognition accuracy; through multi-value quantization it can provide customized compression services for embedded systems and mobile platforms with different resource sizes, realizing optimal recognition capability for the available resources.
Detailed description of the invention
The accompanying drawings, which constitute a part of this application, are provided for further understanding of the application; the illustrative embodiments of the application and their descriptions serve to explain the application and do not constitute an undue limitation on it.
Fig. 1(a)-Fig. 1(d) are schematic diagrams of the quantization compression process of the multi-value quantization method in some embodiments of the application;
Fig. 2-Fig. 3 are schematic diagrams of the classification accuracy of the multi-value quantization model of some embodiments of the application under multiple quantization levels;
wherein Δ represents the quantization starting point, ● represents the point where all intervals first aggregate (the first weight-sharing point), and ○ represents the best result point after aggregation.
Specific embodiment
It is noted that the following detailed description is illustrative and intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the illustrative embodiments of the application. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well; in addition, it should be understood that the terms "comprising" and/or "including" used in this specification indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
As described in the background, deep neural networks have reached a high level on various recognition tasks, relying on deep network architectures and a large number of parameters. However, their heavy demand for compute and memory hinders their wider application on embedded and mobile devices with limited resources.
Thus, although deep neural networks now reach state-of-the-art performance, the limited resources of embedded systems and the storage demand of the networks' huge number of parameters mean that these neural network models cannot be directly deployed on embedded systems and mobile platforms. To address this problem, the technical solution of the disclosure provides a multi-value quantization compression method for compressing deep convolutional neural networks so that they meet the deployment conditions of resource-limited devices.
In view of the above problems, the application discloses a novel multi-value quantization method (abbreviated MVQ in the specific embodiments) that explicitly matches the device's available resources to an optimal quantization level, balancing compression and performance. The disclosed method first divides each layer's weights into multiple intervals, and then forces weight sharing within each interval by mean moving and interval shrinking. In theory, this multi-value quantization can be converted into a special ternary quantization with multiple scaling factors, which means the disclosed method provides greater expressive power than existing binary and ternary networks.
The disclosed novel multi-value quantization method, based on mean moving and interval shrinking, compresses deep neural networks so that the compressed networks can be applied to resource-limited embedded systems.
The technical solution of the disclosure matches an optimal quantization level to the device's available resources and provides a customized compression service for deep neural networks, trading off model size against recognition accuracy.
The specific embodiments of the application involve the definition of the multi-value quantization network and the multi-value quantization method.
The multi-value quantized convolutional neural network is introduced first:
For an L-layer convolutional neural network, in order to trade off model size against recognition accuracy, the disclosure seeks an optimal quantization level that minimizes the reconstruction error of the full-precision weights. For the weight matrix W_l of the l-th convolutional layer, the disclosure divides the weights by value into multiple approximating terms α_{li} T_{li} (0 < i ≤ M). This optimization problem can be formulated as:

min_{α,T} ‖ W_l − Σ_{i=1}^{M} α_{li} T_{li} ‖²

where α_{li} ∈ R+ is a positive real scaling factor and T_{li} ∈ {+1,0}^{c×w×h} or {−1,0}^{c×w×h}; that is, +1 and −1 never occur simultaneously in T_{li}, so it is a special non-negative or non-positive ternary weight matrix. Here c, w and h denote channels, width and height respectively. In other words, each weight matrix can be decomposed into a weighted sum of multiple special ternary weight matrices, where M is the number of partitions.
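The decomposition W ≈ Σ_i α_i T_i can be sketched for the aggregated case, where each partition has already collapsed to one shared value α_i: T_i marks the members of partition i with sign(α_i) and is zero elsewhere, so each T_i is non-negative or non-positive as required. Function and variable names are illustrative.

```python
import numpy as np

def decompose(alphas, labels):
    """Build the terms |alpha_i| * T_i of the multi-value decomposition."""
    terms = []
    for i, a in enumerate(alphas):
        # T_i holds sign(a) where a weight belongs to partition i, else 0
        T = np.where(labels == i, np.sign(a), 0.0)
        terms.append(abs(a) * T)
    return terms

alphas = np.array([-0.3, -0.05, 0.08, 0.4])   # shared value per partition
labels = np.array([0, 3, 2, 1, 3, 0])         # partition index of each weight
terms = decompose(alphas, labels)
quantized = sum(terms)                        # reconstructs alphas[labels]
```

Summing the terms recovers each weight's shared value, confirming that the multi-value quantization is expressible as a sum of scaled special ternary matrices.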
Using this approximation of the weight matrix, embodiments of the application approximate the convolution of each layer with the multi-value weights as follows:

Z_l = I_l * W_l ≈ Σ_{i=1}^{M} α_{li} (I_l ⊕ T_{li})

where I_l is the input tensor of layer l, Z_l is the activation tensor of layer l, * denotes the convolution operation with multiplications, and ⊕ denotes the convolution operation without multiplications.
The multi-value quantization method of the specific embodiments of the application is as follows:
In previous binary and ternary methods, the binary and ternary weight matrices are obtained approximately by threshold-based binary and ternary functions, and the scaling factors are obtained by solving an optimization problem via derivatives. The multi-value quantization method of the disclosure, by contrast, obtains the ternary weights and the multiple scaling factors simultaneously through training. Note that +1 and −1 never occur simultaneously in a ternary weight matrix T_{li}, so it is a special non-negative or non-positive ternary weight matrix.
The general idea of the specific embodiments is shown in Fig. 1(a)-Fig. 1(d), which illustrate the quantization compression process of the disclosed multi-value quantization method. First, for a pre-trained deep neural network, the disclosure divides each layer's weight values into multiple intervals via k-means, so that each weight belongs to one weight interval; the number of intervals represents the quantization level of the deep neural network. Then, after each training iteration, all weights are optimized by SGD or Adam, which changes the mean of each interval.
The disclosure moves each interval in the direction of the new mean, while shrinking each interval to a smaller one. Those weights that belonged to an interval but fall outside the new interval due to mean moving and interval shrinking are constrained to the nearest boundary value of their new interval, minimizing the weight reconstruction error. Conversely, the weights that still fall inside the new interval continue to train with the original forward- and back-propagation. Finally, through gradual moving and shrinking, each interval aggregates to a single value, which is exactly the scaling factor of the quantized weights; the weights belonging to the same interval share this weight value. Moreover, this shared weight can continue to be optimized using the gradient mean of each interval to obtain better recognition performance. A quantized weight only needs its index stored in an index table, which greatly reduces the storage demand of the deep neural network.
In the specific embodiments, regarding partition initialization:
Partition initialization is a crucial first step, greatly influencing the quantization result of the deep neural network. For the l-th convolutional layer in the t-th period, the disclosure divides all of its weights by value into M non-overlapping partitions, denoted P_l^t = {P_{l,1}^t, ..., P_{l,M}^t}, where P_{l,i}^t denotes the i-th partition (0 < i ≤ M) and each partition contains a portion of the weights. Writing w_{l,j}^t for the j-th weight value of W_l^t, each partition consists of a weight set W_{l,i}^t together with a corresponding weight-index set I_{l,i}^t, these being the weight set and the weight-index set of the i-th partition respectively; if j ∈ I_{l,i}^t then w_{l,j}^t ∈ W_{l,i}^t.
The disclosure determines the partition each weight belongs to using k-means initialization. K-means minimizes the within-class sum of squares, as follows:

min Σ_{i=1}^{M} Σ_{w ∈ W_{l,i}^t} (w − c_{l,i}^t)²

where c_{l,i}^t is the cluster center corresponding to partition W_{l,i}^t.
In addition, knowledge of the weight distribution is another important factor influencing quantization ability. For a deep neural network, the parameters are optimized by gradient-descent updates, and the magnitude of the parameter updates gradually decreases. Moreover, reducing the learning rate in a planned manner, e.g. with exponential learning-rate decay, further reduces this variation. This means different training stages have different weight distributions, which determine the partition each weight belongs to. A pre-trained deep neural network has a more stable parameter distribution, which gives each partition relatively fixed weights. The disclosure therefore quantizes a pre-trained deep neural network and starts quantization from a small learning rate to reach a better quantization result.
In the specific embodiments, regarding partition updates:
A partition update consists of two important operations: mean moving and interval shrinking. After a weight update, the weights change slightly in the direction that minimizes the objective function, which inevitably changes the mean of each interval. The direction in which each interval's mean changes represents an optimization direction, along which a local or global minimum can be reached. The disclosure therefore moves each interval toward the new mean of the interval's weights; this operation is named mean moving. The current and updated region means are the averages of the partition's weights before and after the weight update, computed as:

μ_{l,i}^t = (1/|W_{l,i}^t|) · Σ_{w ∈ W_{l,i}^t} w
Interval shrinking is the other important partition-update operation. It shrinks the two boundaries of the interval toward the interval's new mean, thereby constraining each interval to a smaller one. Since clipping is a practicable way to realize regularization, interval shrinking also acts as a regularizer. From these two basic operations, the disclosure derives the partition-update principle, in which a_{l,i}^t and b_{l,i}^t denote the lower and upper bounds of P_{l,i}^t, τ_t is the mean-moving rate, and the shrink rates of the two sides of the interval determine the step sizes of mean moving and interval shrinking respectively; d_{l,i}^t denotes the mean-moving distance.
In particular, when an update would make an interval's width smaller than zero, the two boundaries of the interval would exchange positions. However, since the purpose of the disclosure is for each interval to aggregate to one shared weight value, in this case the update rule is rewritten so that the interval collapses to a single value.
However, as shrinking proceeds, more and more weight values gather at the two sides of each interval, which easily causes subsequent shrinking to produce a large reconstruction error. To mitigate this, the disclosure uses an exponential shrink-decay rate to ensure a smaller and smaller shrink rate and to let the network model train in a controlled manner; it is defined in terms of a starting shrink rate r_s, a final shrink rate r_f and a total number of contractions N.
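A shrink-rate schedule of this kind can be sketched as follows. The patent's exact decay formula is not reproduced in this text, so the geometric interpolation below, decaying from r_s to r_f over N contractions, is an assumed plausible form, not the patent's equation.

```python
def shrink_rate(n, r_s=0.2, r_f=0.01, N=100):
    """Geometric interpolation from the starting rate r_s down to the final rate r_f."""
    return r_s * (r_f / r_s) ** (n / N)

rates = [shrink_rate(n) for n in range(101)]   # strictly decreasing schedule
```

The schedule starts at r_s, ends at r_f, and decreases monotonically, which matches the stated goal of ever-smaller shrink rates for controlled training.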
Furthermore, considering the non-uniform distribution of each layer's weights, the disclosure updates the shrink rates of the two sides of an interval proportionally: the two sides' shrink rates are scaled by the ratios of the mean distances of the interval P_{l,i}^t's weights to its two boundaries, relative to the shrink rate of the previous (t−1)-th period.
It should be noted that choosing a suitable shrink rate is essential to obtaining a good quantization result. Too small a shrink rate makes quantization slow, while too large a shrink rate easily causes a large weight reconstruction error and degrades precision. Moreover, to reduce the negative effect of interval shrinking on recognition accuracy, L2 regularization is used to promote the shrinkage of all weights toward zero.
In the specific embodiments, regarding weight constraining:
For each layer, previous binary and ternary methods constrain the full-precision weights to the fixed interval [−1, 1]. By contrast, the multi-value quantization method of the disclosure constrains all weights to dynamically shrinking intervals. At the first iteration, no interval needs any change. Consider a convolutional layer whose intervals have changed due to previous moving and shrinking. Specifically, after all weights are updated, only a small portion of the weights leave their own interval range as a result of the weight update, mean moving and interval shrinking. Since constraining weights is in practice a useful regularization method, the disclosure also constrains all updated weights to their intervals. To minimize the weight reconstruction error, the disclosure approximates each weight within its updated interval. Since each interval is one-dimensional, the optimal value for a weight outside its interval must be the nearer of the two boundary values; the constraint is therefore equivalent to clamping each out-of-interval weight to the nearest boundary of its interval.
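The constraint step reduces to a clamp: in one dimension, the nearest point of an interval to an outside weight is the closer boundary, which is exactly what `numpy.clip` computes. This is a minimal sketch with illustrative names.

```python
import numpy as np

def constrain(weights, lo, hi):
    """Clamp each weight to its interval; out-of-interval weights land on the nearest boundary."""
    return np.clip(weights, lo, hi)

w = np.array([-0.9, -0.2, 0.15, 0.7])
constrained = constrain(w, -0.25, 0.3)   # -> [-0.25, -0.2, 0.15, 0.3]
```

Weights already inside the interval pass through unchanged, so only the few weights that drifted out are modified, consistent with the small-reconstruction-error argument above.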
Previous binary and ternary quantization methods use binary and full-precision weights respectively in the quantization and update processes. Unlike them, the disclosed method uses only full-precision weights throughout training. In addition, because of the small parameter changes and small learning steps, very few weights fall outside the updated intervals. The disclosed method therefore approximates the original weights more accurately than previous methods.
After constraining, the disclosure maintains the recognition accuracy of the quantized network by retraining it with the standard training procedure. The weight update during retraining is computed as:

w_l^{t,s+1} = w_l^{t,s} − λ_t · ∂C/∂w_l^{t,s}

where w_l^{t,s} denotes the weight after the s-th iteration of the t-th period during weight quantization, w_l^{t,s+1} is the weight after the next iteration, λ_t is the current learning rate, and C is the cost function.
In the specific embodiments, regarding updating the shared weights:
Since each section uses range shortening, its size is incrementally decreased.Finally, subregion updates rule and guarantees each Subregion is polymerized to a shared weighted value, it is exactly the zoom factor of quantization weight, and the ownership in the same subregion Identical weight is shared again.The disclosure reaches out for three special value weight matrix and multiple zoom factors to assess full precision Weight.Different from work before, they obtain two-value and three value weight matrix by the function based on threshold value, by asking Solution optimization problem acquires zoom factor.And the three special value weight matrix of the disclosure and multiple zoom factors are can to train one It tracks down and recovers.Moreover, these zoom factors can constantly be trained to obtain better optimum results.These zoom factors It is updated using the gradient of shared weight, the gradient calculating for sharing weight is as follows:
∂C/∂w̃_i = (1/|P_i|) Σ_{W_j ∈ P_i} ∂C/∂W_j

where C represents the cost function; the gradient of C with respect to the shared weight of section P_i equals the mean of the gradients of the weights in P_i.
Therefore, optimizing by stochastic gradient descent, the shared-weight update is computed as follows:
w̃_i^{t,s+1} = w̃_i^{t,s} − λ_t · ∂C/∂w̃_i^{t,s}

where w̃_i^{t,s} denotes the shared weight after the s-th iteration of the t-th period during weight sharing, w̃_i^{t,s+1} is the shared weight after the next iteration, and λ_t represents the current learning rate.
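The shared-weight update described above, a mean over the member gradients followed by one SGD step, can be sketched as follows (array layout and symbol names are our assumptions):

```python
import numpy as np

def update_shared_weight(shared_w, member_grads, lr):
    """The gradient for a shared weight is the mean of the gradients of all
    weights in its partition; the shared weight then takes one SGD step."""
    return shared_w - lr * np.mean(member_grads)

grads = np.array([0.2, 0.4, 0.6])   # gradients of the weights in one partition
print(update_shared_weight(0.5, grads, lr=0.1))   # 0.5 - 0.1*0.4 = 0.46
```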
In a typical embodiment of the present application, a multi-value quantization deep neural network compression method for embedded systems is provided, comprising:
Step 1: determine the quantization level M according to the available storage resources of the embedded system; that is, all weights of each layer need to be quantized to M weight values.
Step 2: take a given pre-trained L-layer convolutional neural network.
Step 3: divide the weights of each convolutional layer into M sections using k-means.
Step 4: according to formula (12), constrain the weights of the current layer by the current size of each section of each convolutional layer (each section here may have undergone multiple section moves and range shrinks).
Step 5: retrain the constrained network and update the weights according to formula (13).
Step 6: perform section moving and range shrinking according to formulas (6) and (7) to update each section.
Step 7: repeat steps 4, 5, and 6 until every section aggregates, i.e., the weights are shared.
Step 8: after weight sharing, continue retraining to obtain higher performance: first compute the gradient of each shared weight, then update each shared weight according to formula (15).
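A compact sketch of steps 3 to 8 for a single layer, under stated assumptions: 1-D k-means partitioning, a clamp-style constraint, and a fixed shrink factor standing in for the patent's formulas (6), (7), (12), and (13). The retraining step (5) is elided, so this is an illustration of the control flow, not the patent's implementation:

```python
import numpy as np

def kmeans_1d(w, M, iters=20):
    """Plain 1-D k-means (Lloyd's algorithm) over a weight vector."""
    centers = np.quantile(w, np.linspace(0, 1, M))
    labels = np.zeros(len(w), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
        for i in range(M):
            if np.any(labels == i):
                centers[i] = w[labels == i].mean()
    return centers, labels

def quantize_layer(w, M, shrink=0.9, tol=1e-4, max_rounds=300):
    """Partition by k-means, then repeatedly constrain the weights and
    shrink every section toward its midpoint until each section collapses
    to a single shared value (weight sharing)."""
    centers, labels = kmeans_1d(w, M)
    lo = np.array([w[labels == i].min() if np.any(labels == i) else centers[i]
                   for i in range(M)])
    hi = np.array([w[labels == i].max() if np.any(labels == i) else centers[i]
                   for i in range(M)])
    for _ in range(max_rounds):
        w = np.clip(w, lo[labels], hi[labels])   # step 4: constrain weights
        # step 5: one retraining epoch would update w here
        mid = 0.5 * (lo + hi)                    # step 6: move + shrink sections
        lo = mid - shrink * (mid - lo)
        hi = mid + shrink * (hi - mid)
        if np.all(hi - lo < tol):                # step 7: all sections aggregated
            break
    shared = 0.5 * (lo + hi)                     # step 8: the shared weight values
    return shared[labels], shared

w = np.random.RandomState(0).randn(1000)
qw, shared = quantize_layer(w, M=4)
print(np.unique(np.round(qw, 6)).size)           # at most 4 distinct values
```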
In another specific embodiment, the disclosure provides a computer-readable storage medium storing a plurality of instructions; the instructions are adapted to be loaded by a processor of a terminal device and to execute the following processing:
determining the quantization level M according to the available storage resources of the embedded system, that is, quantizing all weights of each layer to M weight values;
taking a given pre-trained L-layer convolutional neural network;
dividing the weights of each convolutional layer of the convolutional neural network into M sections;
constraining the weights of the current layer according to the current size of each section of each convolutional layer;
retraining the constrained convolutional neural network and updating the weights;
performing section moving and range shrinking to update each section;
repeating the constraining, retraining, and section-updating steps until every section aggregates, i.e., the weights are shared;
after weight sharing, continuing to retrain to obtain higher performance: first computing the gradients of the shared weights, then updating each shared weight.
In another specific embodiment, the disclosure provides a multi-value quantization deep neural network compression system based on an embedded system, comprising:
a quantization-level determination unit, which determines the quantization level M according to the available storage resources of the embedded system, that is, all weights of each layer need to be quantized to M weight values;
a weight division unit, which takes a given pre-trained L-layer convolutional neural network and divides the weights of each convolutional layer into M sections;
a weight constraint unit, which constrains the weights of the current layer according to the current size of each section of each convolutional layer;
a weight update unit, which retrains the constrained convolutional neural network and updates the weights;
a section update unit, which performs section moving and range shrinking to update each section;
a weight sharing unit, which repeats the constraining, retraining, and section-updating steps until every section aggregates, i.e., the weights are shared;
a shared-weight update unit, which, after weight sharing, continues retraining to obtain higher performance: first computing the gradients of the shared weights, then updating each shared weight.
To better demonstrate the technical effect of the present application, specific simulation experiments are disclosed below:
MNIST data set: MNIST is a very popular and widely used image data set. It consists of a training set and a test set, containing 60,000 and 10,000 28×28 grayscale images respectively. Each sample represents a handwritten digit from 0 to 9. The disclosure increases the resolution of each sample to 32×32 by padding 2 pixels on each side.
Network architecture: the disclosure uses a variant of LeNet-5 as its baseline, defined as follows:
[C5(S1P0)@32-BN-MP2(S2)]-[C5(S1P0)@64-BN-MP2(S2)]-FC512-FC10. Here C5 is a 5×5 convolutional layer, BN is a batch normalization layer followed by a ReLU activation function, MP2 is a 2×2 max-pooling layer, and FC is a fully connected layer. The disclosure uses a manually decayed learning rate: at epochs 15 and 25, the learning rate is divided by 10. After weight sharing, the learning rate is divided by 10 once more. The hyperparameter settings are shown in Table 1.
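Reading the baseline string: a small helper (an illustration, not the patent's code) traces the feature-map sizes, confirming that a 32×32 MNIST input reaches the FC512 layer as a 5×5×64 tensor:

```python
def out_size(hw, k, s=1, p=0):
    """Output spatial size of a conv/pool layer: floor((hw + 2p - k)/s) + 1."""
    return (hw + 2 * p - k) // s + 1

hw = 32                       # MNIST sample padded to 32x32
hw = out_size(hw, 5)          # C5(S1P0)@32 -> 28
hw = out_size(hw, 2, s=2)     # MP2(S2)     -> 14
hw = out_size(hw, 5)          # C5(S1P0)@64 -> 10
hw = out_size(hw, 2, s=2)     # MP2(S2)     -> 5
flat = hw * hw * 64           # 1600 features feed FC512 -> FC10
print(hw, flat)               # 5 1600
```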
CIFAR10 data set: CIFAR10 is another very popular and widely used image data set. It consists of a training set and a test set, containing 50,000 and 10,000 32×32 RGB images respectively. The samples are divided into 10 classes. The disclosure augments the samples by padding 4 pixels on each side, randomly cropping 32×32 patches, and applying horizontal flips.
Network architecture: the disclosure uses VGG7, a variant of VGG16, as its baseline, defined as follows:
[C3(S1P1)@128-BN-C3(S1P1)@128-MP2(S2)-BN]-[C3(S1P1)@256-BN-C3(S1P1)@256-MP2(S2)-BN]-[C3(S1P1)@512-BN-C3(S1P1)@512-MP2(S2)-BN]-(2×FC1024)-FC10. The disclosure uses a manually decayed learning rate: at epochs 40, 80, and 120, the learning rate is divided by 10. After weight sharing, the learning rate is divided by 10 once more. The hyperparameter settings are shown in Table 1.
Table 1 Hyperparameter settings
Fig. 2 and Fig. 3 illustrate the classification accuracy of the disclosed multi-value quantization model under various quantization levels. As shown in Fig. 2 and Fig. 3, for all quantization levels the best accuracy is obtained when quantization completes. The higher the quantization level, the closer the classification accuracy is to full precision. Moreover, from the starting point of quantization onward, the disclosed model converges more smoothly than the full-precision model, which reveals that range shrinking has a certain regularizing effect on model training.
Table 2 summarizes all experimental results. On the MNIST data set, the disclosed multi-value quantization model at the binary quantization level reaches the current state of the art for that level. On the CIFAR10 data set, it is slightly below the current state of the art at the binary level. For higher quantization levels, the disclosed multi-value quantization even surpasses the other quantization methods. In terms of compression ratio, binary quantization reaches about 31× compression, and quantization levels 4, 8, and 16 reach about 16×, 10×, and 8× respectively. The disclosed method therefore improves recognition accuracy as the quantization level increases, which helps provide tailored compression services for embedded systems and mobile platforms. Some implementation examples of the application are compared with other binary and ternary quantization compression algorithms, as shown in Table 2.
Table 2 Comparison of implementation examples of the application with other binary and ternary quantization compression algorithms
Model          MNIST   CIFAR10   Compression ratio
MVQ-2          99.39   89.61     ≈31
MVQ-4          99.42   92.81     ≈16
MVQ-8          99.43   93.45     ≈10
MVQ-16         99.45   93.50     ≈8
BinaryConnect  98.82   91.73     ≈31
BNN            88.6    89.86     ≈31
TWN            99.35   92.56     ≈16
TTQ            -       91.13     ≈16
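The compression ratios quoted above follow from replacing 32-bit weights with log2(M)-bit indices into a small codebook of shared values. A rough back-of-the-envelope check (codebook and index overhead are ignored, which is why the patent reports about 31× rather than 32× for the binary case):

```python
import math

# Index bits per weight for M shared values, vs. 32-bit full precision.
for M in (2, 4, 8, 16):
    bits = math.log2(M)
    print(f"M={M}: {32 / bits:.1f}x")   # 32.0x, 16.0x, 10.7x, 8.0x
```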
The foregoing are merely preferred embodiments of the present application and are not intended to limit it; for those skilled in the art, various changes and modifications of this application are possible. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included within its scope of protection.

Claims (10)

1. A multi-value quantization deep neural network compression method for embedded systems, characterized by comprising:
determining the quantization level M according to the available storage resources of the embedded system, that is, quantizing all weights of each layer to M weight values;
taking a given pre-trained L-layer convolutional neural network;
partition initialization: dividing the weights of each convolutional layer of the convolutional neural network into M sections and determining the partition to which each weight belongs;
constraining the weights belonging to each section according to the current size of each section of each convolutional layer;
retraining the constrained convolutional neural network and updating the weights;
performing section moving and range shrinking to update each section;
repeating the constraining, retraining, and section-updating steps until every section aggregates, that is, the weights of each section are finally quantized to the section's aggregated value, thereby realizing weight sharing;
after weight sharing, continuing to retrain to obtain higher performance, namely computing the gradients of the shared weights and updating each shared weight; moreover, the convolutional neural network with shared weights only needs to store the shared weights and the corresponding indices.
2. The multi-value quantization deep neural network compression method for embedded systems according to claim 1, characterized in that, regarding partition initialization, for the l-th convolutional layer in the t-th period, all weights are divided by weight value into M non-overlapping partitions, expressed as {P_1, P_2, ..., P_M}, where P_i denotes the i-th partition, 0 < i ≤ M, and each partition contains a portion of the weights; W_j is the j-th weight value; the weight set and the corresponding weight index set of the i-th partition are expressed as W_i and I_i; and if W_j belongs to partition P_i, then W_j ∈ W_i and j ∈ I_i.
3. The multi-value quantization deep neural network compression method for embedded systems according to claim 2, characterized in that, when the weights of each convolutional layer are divided into M sections, the partition to which each weight belongs is determined using the k-means initialization method, where k-means minimizes the within-class sum of squares:

min Σ_{i=1}^{M} Σ_{W_j ∈ P_i} (W_j − c_i)²

where c_i represents the cluster center corresponding to partition P_i.
4. The multi-value quantization deep neural network compression method for embedded systems according to claim 1, characterized in that, when section moving and range shrinking are performed to update each section, the section update rule is:

as given by formulas (6) and (7), where P_i^t denotes the i-th partition of the l-th convolutional layer at the t-th period, 0 < i ≤ M; b_low and b_up denote the lower and upper bounds of P_i^t; τ_t is the moving and shrinking rate; Δ_low and Δ_up are the shrink sizes of the two sides of the section, which determine the step sizes of section moving and range shrinking respectively; d_i represents the moving distance of the section mean, where μ_i and μ_i' denote the current and updated section means respectively.
5. The multi-value quantization deep neural network compression method for embedded systems according to claim 4, characterized in that, when a section becomes smaller than the shrink step, meaning that another update would cause the two boundaries of the section to exchange positions, the section aggregates to a single shared weight value, and the update rule is rewritten as follows:
6. The multi-value quantization deep neural network compression method for embedded systems according to claim 2, characterized in that, when the weights belonging to each section are constrained according to the current size of each section of each convolutional layer, the formula used is:
7. The multi-value quantization deep neural network compression method for embedded systems according to claim 1, characterized in that, when the constrained network is retrained and the weights are updated, the formula used is:

W^{t,s+1} = W^{t,s} − λ_t · ∂C/∂W^{t,s}

where W^{t,s} denotes the weight after the s-th iteration of the t-th period during weight quantization, W^{t,s+1} is the weight after the next iteration, and λ_t is the current learning rate.
8. The multi-value quantization deep neural network compression method for embedded systems according to claim 1, characterized in that the shared weights are updated as follows; the gradient of a shared weight is computed as:

∂C/∂w̃_i = (1/|P_i|) Σ_{W_j ∈ P_i} ∂C/∂W_j

where C represents the cost function, and the gradient of C with respect to the shared weight of section P_i equals the mean of the gradients of the weights in P_i;

therefore, optimizing by stochastic gradient descent, the shared-weight update is computed as:

w̃_i^{t,s+1} = w̃_i^{t,s} − λ_t · ∂C/∂w̃_i^{t,s}

where w̃_i^{t,s} denotes the shared weight after the s-th iteration of the t-th period during weight sharing, w̃_i^{t,s+1} is the shared weight after the next iteration, and λ_t represents the current learning rate.
9. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by a processor of a terminal device and to perform the multi-value quantization deep neural network compression method for embedded systems according to any one of claims 1 to 7.
10. A multi-value quantization deep neural network compression system based on an embedded system, characterized by comprising:
a quantization-level determination unit, which determines the quantization level M according to the available storage resources of the embedded system, that is, all weights of each layer need to be quantized to M weight values;
a weight division unit, which takes a given pre-trained L-layer convolutional neural network and divides the weights of each convolutional layer into M sections;
a weight constraint unit, which constrains the weights of the current layer according to the current size of each section of each convolutional layer;
a weight update unit, which retrains the constrained convolutional neural network and updates the weights;
a section update unit, which performs section moving and range shrinking to update each section;
a weight sharing unit, which repeats the constraining, retraining, and section-updating steps until every section aggregates, that is, the weights are shared;
a shared-weight update unit, which, after weight sharing, continues retraining to obtain higher performance: first computing the gradients of the shared weights, then updating each shared weight.
CN201811390683.0A 2018-11-21 2018-11-21 Multi-valued quantization depth neural network compression method and system for embedded system Expired - Fee Related CN109523016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811390683.0A CN109523016B (en) 2018-11-21 2018-11-21 Multi-valued quantization depth neural network compression method and system for embedded system


Publications (2)

Publication Number Publication Date
CN109523016A true CN109523016A (en) 2019-03-26
CN109523016B CN109523016B (en) 2020-09-01

Family

ID=65776651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811390683.0A Expired - Fee Related CN109523016B (en) 2018-11-21 2018-11-21 Multi-valued quantization depth neural network compression method and system for embedded system

Country Status (1)

Country Link
CN (1) CN109523016B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020238237A1 (en) * 2019-05-27 2020-12-03 东南大学 Power exponent quantization-based neural network compression method
CN112308233A (en) * 2019-08-02 2021-02-02 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for processing data
WO2021062029A1 (en) * 2019-09-24 2021-04-01 Qualcomm Incorporated Joint pruning and quantization scheme for deep neural networks
CN113537759A (en) * 2021-07-13 2021-10-22 上海银行股份有限公司 User experience measurement model based on weight self-adaptation

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184362B (en) * 2015-08-21 2018-02-02 中国科学院自动化研究所 The acceleration of the depth convolutional neural networks quantified based on parameter and compression method
CN107688855B (en) * 2016-08-12 2021-04-13 赛灵思公司 Hierarchical quantization method and device for complex neural network
US11321609B2 (en) * 2016-10-19 2022-05-03 Samsung Electronics Co., Ltd Method and apparatus for neural network quantization
CN106845640A (en) * 2017-01-12 2017-06-13 南京大学 It is heterogeneous in layer based on depth convolutional neural networks to pinpoint quantization method at equal intervals
CN108734264A (en) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Deep neural network model compression method and device, storage medium, terminal
CN108734287A (en) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Compression method and device, terminal, the storage medium of deep neural network model
CN108304928A (en) * 2018-01-26 2018-07-20 西安理工大学 Compression method based on the deep neural network for improving cluster
CN108229663A (en) * 2018-01-29 2018-06-29 百度在线网络技术(北京)有限公司 For generating the method and apparatus of convolutional neural networks
CN108764458B (en) * 2018-05-15 2021-03-02 武汉环宇智行科技有限公司 Method and system for reducing storage space consumption and calculation amount of mobile equipment


Also Published As

Publication number Publication date
CN109523016B (en) 2020-09-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200901

Termination date: 20201121