CN109388779A - Neural network weight quantization method and neural network weight quantization device - Google Patents
- Publication number: CN109388779A
- Application number: CN201710656027.XA
- Authority: CN (China)
- Prior art keywords: quantization, weight, compensation, quantified, value
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The invention discloses a neural network weight quantization method and device. The method comprises: step 1, obtaining a matrix set to be quantized; step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized and thereby a quantized matrix set; step 3, compensating each weight quantized value by quantizing the difference between it and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value and thereby a compensation quantized matrix set; step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use. The compensation quantization superimposes a shift term on the more important weights, making the quantization sampling intervals denser and thereby reducing the performance loss of the neural network model caused by quantization error.
Description
Technical field
The present invention relates to the field of artificial neural network technology, and in particular to a neural network weight quantization method and a neural network weight quantization device.
Background technique
In recent years, with the rapid development of artificial intelligence technology, deep learning neural networks have achieved great success in pattern recognition tasks such as image classification, object detection, image segmentation, speech recognition and machine translation. In these fields, the performance of deep learning models far exceeds that of traditional shallow models, and in some respects has even reached human level. However, neural networks with better performance usually have larger model parameters, which makes their computational complexity higher. This complexity manifests itself both in space (huge model storage volume and memory occupation at run time) and in time (tens of billions of floating-point operations needed for a single inference). Compressing and accelerating neural networks has therefore become particularly important, especially for applications that run on embedded devices, integrated hardware, or large-scale data processing centers.
Quantizing the weights, converting them into fixed-point numbers, or building a quantized-weight codebook to realize weight sharing can effectively compress the model and reduce the storage requirements of the neural network. How to design an effective quantization method, and an efficient hardware structure for that method, is thus an urgent problem to be solved in the field of neural network technology.
Summary of the invention
The present invention provides a neural network weight quantization method and a neural network weight quantization device, to solve the problems of high computational complexity and low efficiency of neural network weight quantization in the prior art.
According to one aspect of the present invention, a neural network weight quantization method is provided, comprising:
Step 1, obtaining a matrix set to be quantized;
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these weight quantized values obtaining a quantized matrix set;
Step 3, compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these compensation quantized values obtaining a compensation quantized matrix set;
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
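As an illustration, the four steps above can be sketched in Python. This is a minimal sketch under the assumption, consistent with the base-2 logarithmic transformation described later in the specification, that log-space quantization means rounding each weight to the nearest signed power of two; the function names are illustrative, not from the patent.

```python
import math

def quantize_pow2(w):
    """Quantize a nonzero weight to the nearest power of two, preserving sign."""
    if w == 0.0:
        return 0.0
    e = round(math.log2(abs(w)))          # exponent found in log space
    return math.copysign(2.0 ** e, w)

def quantize_with_compensation(weights):
    """Steps 2-3: log-space quantization plus a compensation (residual) term."""
    base = [quantize_pow2(w) for w in weights]                    # quantized matrix
    comp = [quantize_pow2(w - b) for w, b in zip(weights, base)]  # compensation matrix
    return base, comp

weights = [0.9, -0.3, 0.05, 1.7]          # a toy matrix set to be quantized
base, comp = quantize_with_compensation(weights)
# Step 4: store (base, comp); the effective dequantized weight is base + comp
recon = [b + c for b, c in zip(base, comp)]
```

Storing two power-of-two terms per weight instead of one is what the specification calls superimposing a shift term: each weight still needs only exponents, but the representable grid becomes denser.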
The present invention also provides a neural network weight quantization device, comprising:
an obtaining module, for obtaining a matrix set to be quantized;
a weight quantization module, for quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set;
a compensation quantization module, for compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these obtaining a compensation quantized matrix set;
a quantization result storage module, for storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
In the neural network weight quantization method provided by the present invention, the matrix set to be quantized is first quantized into log space to obtain a quantized matrix set, and the quantization difference of this initial quantized matrix is then itself quantized to obtain a quantization compensation matrix; the quantized matrix and the quantization compensation matrix together form the quantization result of the matrix to be quantized. By compensating the initial quantized matrix, that is, further quantizing the difference introduced by quantization, the compensation quantization superimposes a shift term on the more important weights and makes the quantization sampling intervals denser, thereby reducing the performance loss of the neural network model caused by quantization error.
The above description is only an overview of the technical scheme of the present invention. In order that the technical means of the present invention may be more clearly understood and implemented in accordance with the contents of the specification, and that the above and other objects, features and advantages of the present invention may become more apparent, specific embodiments of the present invention are set out below.
Detailed description of the invention
Various other advantages and benefits will become clear to those of ordinary skill in the art by reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 is the flow chart of the neural network weight quantization method in method embodiment 1 of the present invention;
Fig. 2 is the flow chart of the neural network weight quantization method in method embodiment 2 of the present invention;
Fig. 3 is the flow chart of the neural network weight quantization method in method embodiment 3 of the present invention;
Fig. 4 is the flow chart of the neural network weight quantization method in method embodiment 4 of the present invention;
Fig. 5 is the flow chart of the neural network weight quantization method in method embodiment 5 of the present invention;
Fig. 6 is the flow chart of the neural network weight quantization method in method embodiment 6 of the present invention;
Fig. 7 is a schematic diagram of weight power-of-two quantization sampling;
Fig. 8 is a schematic diagram of weight power-of-two quantization sampling with compensation;
Fig. 9 is a structural schematic diagram of the neural network weight quantization device in embodiment 7 of the present invention;
Fig. 10 is a structural schematic diagram of the neural network weight quantization device in embodiment 8 of the present invention.
Specific embodiment
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments are shown in the drawings, it should be understood that the present invention may be realized in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided to facilitate a more thorough understanding of the present invention and to convey the scope of the present invention completely to those skilled in the art.
Fig. 1 is the flow chart of the neural network weight quantization method in method embodiment 1 of the present invention. As shown in Fig. 1, the neural network weight quantization method in method embodiment 1 comprises:
Step 1, obtaining a matrix set to be quantized.
Specifically, the triggering of the neural network weight quantization is determined first by the network training state, where the network training state includes at least: the progress of the network's current training and the stability of the network's current training. For example, the network's current training progress may be that 25% of the training workload has been completed, and the network is in a stable state (not yet converged).
A neural network is composed of multiple layers, and each layer's weight parameters (weight values) can be expressed as a matrix. When quantizing the weights of a neural network, all weight values of all layers may be composed into one matrix set to be quantized and quantized at once; alternatively, all weight values of one or several layers, or part of the weight values of all layers or of some layers, may be selected to form the matrix set to be quantized, flexibly set according to demand.
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set.
Specifically, after the matrix set to be quantized has been determined, all weight values to be quantized in that set are quantized at once, obtaining the quantized matrix set.
Step 3, compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these obtaining a compensation quantized matrix set.
Specifically, suppose a weight value to be quantized is a real number A and its weight quantized value is A*; then the difference between A and A* is diff_A, and each weight quantized value is compensated according to its corresponding diff_A. After compensation quantization, the quantization sampling interval is denser, so the quantized weight matrix set fits the original weight matrix set to be quantized more closely, which preserves the computational performance of the neural network.
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
Specifically, the quantized matrix set and the compensation quantized matrix set together are the quantization result of the matrix set to be quantized.
In the neural network weight quantization method provided by this embodiment, the matrix set to be quantized is first quantized into log space to obtain a quantized matrix set, and the quantization difference of the initial quantized matrix is then itself quantized to obtain a quantization compensation matrix; the quantized matrix and the quantization compensation matrix form the quantization result of the matrix to be quantized. By compensating the initial quantized matrix, that is, further quantizing the difference introduced by quantization, the compensation quantization superimposes a shift term on the more important weights and makes the quantization sampling intervals denser, thereby reducing the performance loss of the neural network model caused by quantization error.
Fig. 2 is the flow chart of the neural network weight quantization method in method embodiment 2 of the present invention. As shown in Fig. 2, the neural network weight quantization method in method embodiment 2 comprises:
Step 1A, obtaining a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter and a partial quantization target.
Specifically, the importance parameter may be the modulus of a weight, the cumulative contribution of a weight to the activation values, the cumulative contribution of a weight to the output values, and so on. The subsequent quantization steps are carried out according to the importance parameter, and different importance parameters lead to different quantization results.
The partial quantization target is used to divide the matrix set to be quantized into at least two parts that are quantized step by step. It includes the range of weights covered by each round of quantization, which can be expressed as a preset weight value range or a preset weight threshold: weight parameters larger than the preset weight threshold, or weight parameters within a certain preset weight value range, are the part quantized in this round of partial quantization.
Step 21, determining, according to the importance parameter and the partial quantization target, the current partial set of weight values to be quantized within the matrix set to be quantized.
Specifically, the weight values to be quantized in the matrix set are sorted according to the importance parameter, and the partial set of weight values to be quantized in this round is then determined according to the partial quantization target. Since different importance parameters give different orderings, the same partial quantization target can yield different partial sets of weight values to be quantized.
For example, taking the modulus of the weights as the importance parameter and the largest 20% of the weight set to be quantized as the preset quantization range, all weight values to be quantized in the matrix set are sorted by modulus in descending order, and the top 20% by modulus are selected as this round's partial quantization target.
Step 22, quantizing the partial set of weight values to be quantized into log space, obtaining their weight quantized values, and from these obtaining a quantized matrix set.
Specifically, the weight values with the largest 20% of moduli, selected above, are quantized into log space, giving the quantized result of those weight values, that is, the partial weight quantized values, and a partial quantized matrix set. In actual use, preferentially quantizing the weight values with relatively large moduli (the more important ones) and leaving the weights with relatively small moduli to later rounds effectively avoids a decline in neural network performance in the later rounds of quantization (where there is no retraining process).
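A minimal sketch of this selection step, assuming the modulus-based importance parameter and the 20% partial quantization target of the example; `select_partial` is an illustrative name, not from the patent.

```python
def select_partial(weights, fraction=0.20):
    """Rank weights by modulus (the importance parameter) and pick the
    top `fraction` as this round's partial quantization target."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    k = max(1, int(len(weights) * fraction))
    return set(order[:k])

ws = [0.9, -0.05, 0.4, -1.3, 0.02, 0.7, -0.6, 0.1, 0.3, -0.2]
idx = select_partial(ws)  # indices of the top 20% of weights by modulus
```

A threshold-based partial quantization target would replace the top-k cut with `abs(w) > threshold`; both are ways of expressing the covered range described above.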
Step 3, compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these obtaining a compensation quantized matrix set.
Specifically, this is the same as step 3 of embodiment 1.
Step 3A, performing predetermined training on the neural network according to the partial quantized matrix set and the compensation quantized matrix set, and updating the matrix set to be quantized according to the result of that training.
Specifically, since only part of the weights have been quantized, the weights that have not yet been quantized need to be updated, to guarantee the recovery of the model performance of the whole neural network.
Step 3B, returning to step 21 until all weight values to be quantized in the matrix set have been quantized.
Specifically, with the updated matrix set to be quantized, after returning to step 21 another partial set of weight values to be quantized is determined in the matrix set according to the importance parameter and the partial quantization target, and the subsequent steps are executed, until the whole matrix set to be quantized has been quantized.
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
Specifically, this is the same as step 4 of embodiment 1. Since step-by-step quantization is used, the quantized matrix set obtained in the final round, together with the compensation quantized matrix sets obtained in every round of the step-by-step quantization, form the quantization result of the matrix set to be quantized.
In the neural network weight quantization method provided by this embodiment, the matrix set to be quantized is quantized into log space step by step according to the preset importance parameter and the preset partial quantization target. After the quantized matrix set is obtained, the quantization difference of the initial quantized matrix is itself quantized to obtain a quantization compensation matrix; the matrix set to be quantized is then updated according to the quantized matrix set and the quantization compensation matrix set, the neural network is trained according to the importance parameter and the partial quantization target, and the weight values not yet quantized are updated. The quantized matrix set finally obtained, together with the compensation quantized matrix sets, are the quantization result of the matrix to be quantized. Because a step-by-step method is used, quantizing one part first and then gradually expanding the quantization scale, the precision of quantization is higher and the final performance of the neural network is better.
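The step-by-step procedure of this embodiment can be sketched as a loop. The sketch assumes modulus-based importance, power-of-two log-space quantization, and an optional caller-supplied retraining callback standing in for the predetermined training (which is network-specific and left abstract here); all names are illustrative.

```python
import math

def q2(w):
    """Round w to the nearest signed power of two (0 stays 0)."""
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def progressive_quantize(weights, fraction=0.5, retrain=None):
    """Quantize in rounds: freeze the most important weights each round,
    optionally retraining the still-unquantized remainder in between."""
    ws = list(weights)
    frozen = set()                       # indices already quantized
    while len(frozen) < len(ws):
        free = [i for i in range(len(ws)) if i not in frozen]
        free.sort(key=lambda i: abs(ws[i]), reverse=True)
        k = max(1, int(len(ws) * fraction))
        for i in free[:k]:
            ws[i] = q2(ws[i])            # quantize this round's partial target
            frozen.add(i)
        if retrain and len(frozen) < len(ws):
            ws = retrain(ws, frozen)     # update only the unquantized weights
    return ws

out = progressive_quantize([0.9, -0.3, 0.05, 1.7], fraction=0.5)
```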
In one of the embodiments, training the neural network according to the partial quantized matrix set and the compensation quantized matrix set, and updating the matrix set to be quantized according to the training result, comprises: performing a forward propagation operation in the neural network according to the partial quantized matrix set and the compensation quantized matrix set, obtaining forward propagation values; performing, according to the forward propagation values, a back-propagation operation in the neural network, obtaining weight update values for the unquantized part of the matrix set to be quantized; and updating the matrix set to be quantized according to the weight update values, the partial quantized matrix set and the compensation quantized matrix set.
Specifically, since each weight quantized value is a power weight value obtained through a logarithmic transformation (with base 2), the multiplications of the convolution operations in the forward propagation can be replaced by bit shifts, greatly accelerating the forward propagation. In the back-propagation operation, gradient descent is used: for the weights of the unquantized part, the partial derivative of the loss function with respect to each weight is calculated to obtain its update value.
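How a power-of-two weight turns a multiplication into a shift can be shown on fixed-point values. This is an illustrative sketch of the principle, not the patent's hardware design; the fixed-point scale is an assumption.

```python
def shift_mul(x, exponent):
    """Multiply integer activation x by 2**exponent using shifts only."""
    return x << exponent if exponent >= 0 else x >> -exponent

# a weight of 0.25 is stored only as its exponent -2;
# the activation 200.0 is held in Q8 fixed point (scale 2**8)
act = 200 << 8
res = shift_mul(act, -2)   # 200 * 0.25 = 50, with no multiplier circuit
```

A compensated weight (base term plus shift term) costs one extra shift and an add, which is why the scheme stays hardware-friendly.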
In this embodiment, during the training of the neural network, the use of power-of-two weights gives the weight quantization result good compression and acceleration effects while making the hardware implementation more efficient and compact, with high economic and practical value.
Fig. 3 is the flow chart of the neural network weight quantization method in method embodiment 3 of the present invention. As shown in Fig. 3, the neural network weight quantization method in method embodiment 3 comprises:
Step 1B, obtaining a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter and a compensation quantization target.
Specifically, the importance parameter is as set out in embodiment 2. The compensation quantization target is used to compensate only a part of the quantized weight values in the quantized matrix set, rather than all of them. The compensation quantization target includes the range of quantized weight values covered by each round of compensation quantization, which can be expressed as a preset quantized-weight value range or a preset quantized-weight threshold: quantized weight parameters larger than the preset quantized-weight threshold, or within a certain preset quantized-weight value range, are the ones that undergo compensation quantization.
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set.
Specifically, quantizing the weight values to be quantized into log space to obtain their weight quantized values may be completed in one step as in embodiment 1, or step by step as in embodiment 2, without limitation.
Step 31, determining, according to the importance parameter and the compensation quantization target, the weight quantized values to be compensated among the weight quantized values.
Specifically, if the weight values to be quantized were quantized into log space step by step in step 2, then the importance parameter used when determining the weight quantized values to be compensated is the same as the importance parameter used when determining the partial sets of weight values to be quantized.
The importance parameter and the compensation quantization target are used together to determine the weight quantized values to be compensated, in a process similar to step 21 of embodiment 2, which is not repeated here.
Step 32, compensating the weight quantized values to be compensated according to the difference between each such value and its corresponding weight value to be quantized, obtaining their compensation quantized values, and from these obtaining a compensation quantized matrix set.
Specifically, only the selected weight quantized values to be compensated undergo compensation quantization.
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
Specifically, if the weight values to be quantized were quantized into log space step by step in step 2, then, as in embodiment 2, steps 3A and 3B follow step 3.
In this embodiment, by compensating only a selected part of the weight quantized values, the range of quantized weight values in the quantized matrix set that undergo compensation quantization can be flexibly controlled through the preset importance parameter and compensation quantization target. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
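A minimal sketch of threshold-based selective compensation, under the same power-of-two quantization assumption as before; the function names and the threshold value are illustrative, not from the patent.

```python
import math

def q2(w):
    """Round w to the nearest signed power of two (0 stays 0)."""
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def compensate_selected(weights, base, threshold):
    """Add a compensation term only for quantized values whose modulus
    exceeds `threshold` (the compensation quantization target);
    less important weights keep a single power-of-two term."""
    return [q2(w - b) if abs(b) > threshold else 0.0
            for w, b in zip(weights, base)]

ws = [0.9, -0.3, 0.05]
base = [q2(w) for w in ws]
comp = compensate_selected(ws, base, threshold=0.1)
```

Skipping compensation for the small weights saves storage (a higher compression ratio) at the cost of slightly coarser sampling where it matters least.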
Fig. 4 is the flow chart of the neural network weight quantization method in method embodiment 4 of the present invention. As shown in Fig. 4, the neural network weight quantization method in method embodiment 4 comprises:
Step 1C, obtaining a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter, the number of iterations of compensation quantization, and the iterative compensation ratio of each iteration.
Specifically, the importance parameter is as set out in embodiment 2. The number of iterations of the compensation quantization and the iterative compensation ratio of each iteration are used to carry out the compensation quantization calculation repeatedly, so that the result of compensation quantization is more accurate. The number of iterations of the compensation quantization is the number of times compensation quantization is performed, and can be set arbitrarily according to demand. The iterative compensation ratio of each iteration is the range or threshold by which, in each iteration, the values to be compensated next are chosen from the result of the previous round of compensation quantization; the ratio can be the same every time, or each iteration's ratio can be smaller than that of the previous iteration.
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set.
Specifically, quantizing the weight values to be quantized into log space to obtain their weight quantized values may be completed in one step as in embodiment 1, or step by step as in embodiment 2, without limitation.
Step 3': according to the importance parameter, the number of iterations of compensation quantization, the iterative compensation ratio of each iteration, and the difference between each weight quantization value and its corresponding weight value to be quantized, perform iterative compensation quantization on the weight quantization values, obtain iterative compensation quantization values, and obtain an iterative compensation quantization matrix set according to the iterative compensation quantization values.
Specifically, the weight values to be quantized in each iteration of compensation quantization are chosen according to the importance parameter, the preset number of iterations of compensation quantization, and the iterative compensation ratio of each iteration; the process is similar to step 21 in embodiment 2 and is not repeated here.
After compensation quantization has been performed the preset number of times, multiple compensation quantization matrix sets are obtained, which serve as the iterative compensation quantization matrix set.
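The iterative scheme above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: `pow2_quant`, the selection of the largest residuals, and the example shrinking ratio schedule are all choices of this sketch; the patent only requires that each iteration compensate a chosen portion of the previous result.

```python
import math

def pow2_quant(x):
    """Nearest signed power of two; 0.0 stays 0.0."""
    if x == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(x))), x)

def iterative_compensation(weights, iterations=3, ratios=(1.0, 0.5, 0.25)):
    """After a first log-space quantization, each iteration quantizes the
    residuals of the previous pass, compensating only the fraction of
    weights given by that iteration's compensation ratio."""
    terms = [[pow2_quant(w)] for w in weights]       # step 2 result
    for it in range(iterations):
        residuals = [w - sum(t) for w, t in zip(weights, terms)]
        k = max(1, int(len(weights) * ratios[it]))   # how many to compensate
        order = sorted(range(len(weights)), key=lambda i: -abs(residuals[i]))
        for i in order[:k]:                          # largest residuals first
            terms[i].append(pow2_quant(residuals[i]))
    return terms

terms = iterative_compensation([1.1, -0.6, 0.3, 0.04],
                               iterations=2, ratios=(0.5, 0.25))
```

Each weight ends up as a short sum of power-of-two terms; weights compensated in more iterations are approximated more finely.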
Step 4: store the quantization matrix set and the compensation quantization matrix sets in the neural network, as the quantization result of the matrix set to be quantized, for later use.
Specifically, if the process of quantizing the weight values to log space in step 2 uses the gradual mode, then, as in embodiment 2, steps 3A and 3B follow step 3.
In the present embodiment, the weight quantization values undergo multiple rounds of compensation quantization governed by the preset number of iterations and the iterative compensation ratio of each iteration, yielding multiple compensation quantization matrix sets. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
Fig. 5 is a flow chart of the neural network weight quantization method in method embodiment 5 of the present invention. As shown in Fig. 5, the method comprises:
Step 1D: obtain a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter and a compensation quantization target value.
Specifically, the importance parameter is as elaborated in embodiment 2. The compensation quantization target value is the preset target that compensation quantization needs to achieve.
Step 2: quantize the weight values to be quantized in the matrix set to be quantized to log space, obtain the weight quantization values of the weight values to be quantized, and obtain a quantization matrix set according to the weight quantization values.
Specifically, this process may use the one-step mode of embodiment 1 or the gradual mode of embodiment 2, without limitation.
Step 3: according to the difference between each weight quantization value and its corresponding weight value to be quantized, perform compensation quantization on the weight quantization values, obtain the compensation quantization values of the weight quantization values, and obtain a compensation quantization matrix set according to the compensation quantization values.
Specifically, this is the same as step 3 in embodiment 1.
Step 3C: according to the difference between the compensation quantization target value and the compensation quantization values, continue compensation quantization on the compensation quantization values until compensation quantization values that meet the compensation quantization target value are obtained, and obtain the compensation quantization matrix set according to those compensation quantization values.
Specifically, whether to continue compensation quantization is determined according to the quantization difference between the preset compensation quantization target value and the compensation quantization values in the compensation quantization matrix set. By setting the compensation quantization target value according to demand, the precision of the compensation quantization matrix set is controlled; after one or more rounds of compensation quantization, a compensation quantization matrix set that meets the compensation quantization target value is obtained.
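A target-driven loop of this kind can be sketched as follows. The patent leaves the form of the target value open; interpreting it as an absolute-residual threshold, and capping the number of rounds, are both assumptions of this sketch.

```python
import math

def pow2_quant(x):
    """Nearest signed power of two; 0.0 stays 0.0."""
    if x == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(x))), x)

def compensate_to_target(weight, target=0.01, max_terms=8):
    """Keep appending power-of-two compensation terms until the residual
    meets the preset compensation quantization target value (here: an
    assumed absolute-error threshold)."""
    terms = [pow2_quant(weight)]                    # first-pass quantization
    while abs(weight - sum(terms)) > target and len(terms) < max_terms:
        terms.append(pow2_quant(weight - sum(terms)))  # compensate residual
    return terms

terms = compensate_to_target(0.9, target=0.01)
```

A looser target value stops earlier (fewer stored terms, coarser result); a tighter one trades storage for precision, which is exactly the flexibility the embodiment claims.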
Step 4: store the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
In the present embodiment, after one or more rounds of compensation quantization governed by the preset compensation quantization target value, a compensation quantization matrix set meeting that target value is obtained. Since the preset compensation quantization target value can be set flexibly according to demand, the precision of compensation quantization can be controlled conveniently. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
Fig. 6 is a flow chart of the neural network weight quantization method in method embodiment 6 of the present invention. As shown in Fig. 6, the method comprises:
Step 1E: obtain a matrix set to be quantized.
Specifically, this is the same as step 1 in embodiment 1.
Step 20E: judge whether the sign of each weight value to be quantized is positive; if positive, proceed to step 21E; if not, proceed to step 22E.
Specifically, before a weight value to be quantized is quantized, it is judged whether its sign is positive or negative, and the corresponding processing flow is entered.
Step 21E: quantize the weight values to be quantized whose sign is positive in the matrix set to be quantized to log space, obtain positive weight quantization values, and obtain a positive weight matrix set according to the positive weight quantization values. Proceed to step 31E.
Step 22E: quantize the weight values to be quantized whose sign is negative in the matrix set to be quantized to log space, obtain negative weight quantization values, and obtain a negative weight matrix set according to the negative weight quantization values. Proceed to step 32E.
Specifically, steps 21E and 22E obtain the LOG parameter sets L1 and L2 according to the sign of each weight. In L1, the elements at the positions of positive weight values are the intermediate parameters obtained by applying the LOG transform to their moduli, while the elements at the positions of negative weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Conversely, in L2, the elements at the positions of negative weight values are the intermediate parameters obtained by applying the LOG transform to their moduli, while the elements at the positions of positive weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Steps 21E and 22E thereby yield two power-of-two weight matrices L1 and L2 with sparse structure, representing the positive and negative weight values respectively. Mapping modes include but are not limited to:
1) Apply the LOG transform to each element xi of the weight set N to be quantized according to formula 1:
yi = round(log2(|xi|))    (1)
where yi is the power-of-two weight quantization value, yi ∈ L.
2) According to the statistical information of the weight set N to be quantized, determine the quantization codebook C1 of each learnable layer in the network; the codebook partitions log space by powers of two (2^n). Each weight in the set N is then mapped to the codebook value of the corresponding layer.
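Mapping mode 2) can be sketched as follows. The codebook range and the nearest-value mapping rule are assumptions of this sketch; in practice the exponent range would come from the layer's weight statistics, as the text says.

```python
def build_codebook(exp_min, exp_max):
    """Per-layer quantization codebook C1 on log space, partitioned by
    powers of two (2^n).  The exponent range here is a fixed assumption."""
    return [2.0 ** n for n in range(exp_min, exp_max + 1)]

def map_to_codebook(modulus, codebook):
    """Map a weight's modulus to the nearest codebook value."""
    return min(codebook, key=lambda c: abs(c - modulus))

c1 = build_codebook(-4, 0)
q = [map_to_codebook(abs(w), c1) for w in (0.9, -0.3, 0.07)]
```

The sign is handled separately by the L1/L2 split of steps 21E and 22E, so only moduli are mapped here.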
The two power-of-two weight matrices L1 and L2 are then sparsified. Steps 21E and 22E apply the logarithmic transform to the positive and negative weights separately, and weights whose moduli are close to 0 after the transform are quantized to 0. Even without considering the sparsity of the pre-trained weight matrix, the LOG parameter sets L1 and L2 therefore have an apparent sparse structure, and can be stored and operated on in sparse-matrix form.
The sparse-matrix representation described in the present embodiment compresses the model's weight storage severalfold, and sparse matrix multiplication accelerates the model's forward inference; the higher the compression ratio, the more obvious the acceleration.
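The Compressed Sparse Row format named later in this embodiment stores only the nonzero elements plus index arrays. A minimal sketch (the example matrix and the use of 0 as the empty marker are illustrative; the patent instead marks empty positions with a large negative exponent that decodes to approximately 0):

```python
def to_csr(dense):
    """Compress a dense matrix into CSR form: nonzero values, their
    column indices, and per-row pointers into the value array."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # one pointer per finished row
    return values, col_idx, row_ptr

# An L1-like exponent matrix: entries only where the weight was positive.
l1 = [[0, -2, 0, 0],
      [0, 0, 1, 0],
      [-3, 0, 0, 0]]
values, col_idx, row_ptr = to_csr(l1)
```

For a matrix with few nonzeros, the three CSR arrays are far smaller than the dense matrix, which is the storage saving the embodiment claims.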
Step 31E: perform compensation quantization on the differences between the positive weight quantization values in the positive weight matrix set and their corresponding weight values to be quantized, obtain positive weight offset values, and obtain a positive weight offset matrix set according to the positive weight offset values.
Step 32E: perform compensation quantization on the differences between the negative weight quantization values in the negative weight matrix set and their corresponding weight values to be quantized, obtain negative weight offset values, and obtain a negative weight offset matrix set according to the negative weight offset values.
Specifically, for the more important weights (for example, large-modulus weights, whose quantization sampling intervals are relatively coarse), the quantization error produced by the logarithmic transform affects model performance particularly strongly. The present embodiment therefore performs compensation quantization on this more important weight set, i.e., applies a further logarithmic transform to the quantization difference (formula 2), and obtains the actual quantized modulus of the weight by formula 3.
According to the sign of each weight, the LOG parameter sets L1' and L2' are obtained. In L1', the elements at the positions of positive weight values are the intermediate parameters obtained by applying the LOG transform to the quantization differences, while the elements at the positions of negative weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Conversely, in L2', the elements at the positions of negative weight values are the intermediate parameters obtained by applying the LOG transform to the quantization differences, while the elements at the positions of positive weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Two power-of-two weight offset-term matrices of even higher sparsity, representing the positive and negative weight values respectively, are thereby obtained.
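Formulas 2 and 3 themselves do not survive in this text. A plausible reconstruction consistent with the surrounding description — formula 2 log-quantizes the quantization difference the same way formula 1 log-quantizes the weight, and formula 3 sums the two power-of-two terms — can be sketched as follows; both forms are assumptions of this sketch, not the patent's verbatim equations.

```python
import math

def log_exp(x):
    """round(log2(|x|)) -- the LOG intermediate parameter of formula 1."""
    return round(math.log2(abs(x)))

w = 0.9                        # an important weight to be quantized
y = log_exp(w)                 # formula 1: first-pass exponent
diff = abs(w) - 2.0 ** y       # quantization difference
y2 = log_exp(diff)             # assumed formula 2: log-quantize the difference
# Assumed formula 3: actual quantized modulus = sum of the two power terms.
modulus = 2.0 ** y + math.copysign(2.0 ** y2, diff)
```

Here the single-term approximation 2^0 = 1.0 of the weight 0.9 is refined by the offset term -2^-3, giving 0.875 and roughly quartering the error.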
For weights that do not need compensation quantization, the compensation quantization value is set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding compensation weight values decode to approximately 0. The weight set P that needs compensation quantization is generated by a compensation quantization importance selection module. Generation methods include but are not limited to: from the quantization codebook C1 determined for each learnable layer in the network (sampling intervals illustrated in Fig. 7), discard a small portion of codes at both ends of the codebook and generate the compensation quantization codebook C2 (C2 ⊆ C1). Suppose the codebook C1 contains n1 codes and the codebook C2 contains n2 codes (n2 < n1); superimposing C1 and C2 in all combinations produces a codebook C with n1 * n2 codes, whose sampling intervals are denser near the larger sampled values (sampling intervals illustrated in Fig. 8). Quantization with respect to the codebook C is, in effect, quantization with respect to C1, with an optional superimposed quantization with respect to C2.
In the present embodiment, the number of codes n2 in the compensation quantization codebook is smaller than the number of codes n1 in the codebook used for the first quantization; accordingly, the preset quantizing bit number for compensation quantization is smaller than that of the first quantization. For example, if the first quantization is preset to 4 bits, the compensation quantization may need only 2 bits to meet the requirement.
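The superposition of C1 and C2 can be illustrated directly. The concrete codebooks below are assumptions for the sketch; the point is only that every pair (a, b), a in C1, b in C2, yields an effective code a + b, so n1 codes and n2 codes combine into up to n1 * n2 codes, and indexing C2 needs fewer bits than indexing C1.

```python
def combined_codebook(c1, c2):
    """All pairwise sums of a first-pass codebook C1 and a (smaller)
    compensation codebook C2 -- the permuted superposition described
    in the embodiment."""
    return sorted({a + b for a in c1 for b in c2})

c1 = [2.0 ** n for n in range(-3, 1)]   # n1 = 4 first-pass codes
c2 = [0.0, 2.0 ** -4]                   # n2 = 2: 'no offset' or one offset
codes = combined_codebook(c1, c2)
```

With these values every sum is distinct, so the combined codebook reaches the full n1 * n2 = 8 codes while the compensation index costs only a single extra bit per compensated weight.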
Step 4E: store the positive weight matrix set, the negative weight matrix set, the positive weight offset matrix set, and the negative weight offset matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
In the present embodiment, model weight quantization maps the positive and negative weights to their respective power-of-two weight matrices, yielding two sparse weight matrices. Quantizing the positive and negative weights separately effectively avoids the per-weight sign test during multiplication (realized in the present invention by shift operations), thereby accelerating computation. Meanwhile, the present invention stores the nonzero elements of the sparse matrices in Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) format, saving a large amount of storage space. Through compensation quantization, two weight offset-term matrices of even higher sparsity are obtained. By superimposing offset terms on the more important weights, compensation quantization makes their quantization sampling intervals denser, thereby reducing the loss of model performance caused by quantization error. The compressed DCNN obtained through the present invention is applicable to compute- and storage-constrained high-technology fields such as mobile terminals, embedded devices, and robots, and has high economic and practical value.
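The shift realization mentioned above rests on a simple fact: multiplying by a power-of-two weight 2^e is a bit shift. A fixed-point sketch (the 4-fractional-bit scale is an illustrative assumption; the sign needs no per-weight test because positive and negative weights live in separate matrices):

```python
def shift_multiply(activation, exponent):
    """Multiply a fixed-point activation by the power-of-two weight
    2**exponent using only a shift -- no multiplier needed."""
    if exponent >= 0:
        return activation << exponent
    return activation >> -exponent

# With 4 fractional bits, the integer 16 represents 1.0.
quarter = shift_multiply(16, -2)   # 1.0 * 2**-2, i.e. 0.25 -> 4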
Fig. 9 is a structural schematic diagram of the neural network weight quantization device in embodiment 7 of the present invention. As shown in Fig. 9, the device comprises:
an obtaining module 10, for obtaining a matrix set to be quantized;
a weight quantization module 20, for quantizing the weight values to be quantized in the matrix set to be quantized to log space, obtaining the weight quantization values of the weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
a compensation quantization module 30, for performing compensation quantization on the weight quantization values according to the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values, and obtaining a compensation quantization matrix set according to the compensation quantization values;
a quantization result storage module 40, for storing the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
The neural network weight quantization device provided by the present embodiment first quantizes the matrix set to be quantized to log space to obtain a quantization matrix set, then performs compensation quantization on the quantization differences of the initial quantization matrices to obtain quantization compensation matrices; the quantization matrices and the quantization compensation matrices together form the quantization result of the matrices to be quantized. By compensation-quantizing the initial quantization matrices, the quantization differences caused by quantization are quantized further; by superimposing offset terms on the more important weights, compensation quantization makes their quantization sampling intervals denser, thereby reducing the performance loss of the neural network model caused by quantization error.
Fig. 10 is a structural schematic diagram of the neural network weight quantization device in embodiment 8 of the present invention. As shown in Fig. 10, the device comprises:
an obtaining module 10, for obtaining a matrix set to be quantized, and also for obtaining weight quantization information, the weight quantization information including an importance parameter and a partial quantization target;
a partial quantization weight value obtaining unit 21, for determining partial weight values to be quantized in the matrix set to be quantized according to the importance parameter and the partial quantization target;
a partial quantization unit 22, for quantizing the partial weight values to be quantized to log space, obtaining the weight quantization values of the partial weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
a compensation quantization module 30, for performing compensation quantization on the weight quantization values according to the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values, and obtaining a compensation quantization matrix set according to the compensation quantization values;
a training module 30A, for performing predetermined training on the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and updating the matrix set to be quantized according to the training result of the predetermined training;
a training end module 30B, for returning to the partial quantization weight value obtaining unit until the weight values to be quantized in the matrix set to be quantized are all quantized;
a quantization result storage module 40, for storing the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
The neural network weight quantization device provided by the present embodiment first quantizes the matrix set to be quantized to log space gradually, according to the preset importance parameter and the preset partial quantization target, obtaining a quantization matrix set; it then performs compensation quantization on the quantization differences of the initial quantization matrices to obtain quantization compensation matrices, updates the matrix set to be quantized according to the quantization matrix set and the quantization compensation matrix set, trains the neural network according to the importance parameter and the partial quantization target, and updates the not-yet-quantized weight values. The quantization matrix set and quantization compensation matrix set finally obtained are the quantization result of the matrices to be quantized. Because quantization proceeds gradually, quantizing a part first and then expanding the quantization proportion step by step, the quantization precision is higher, and the final performance of the neural network is better.
In one of the embodiments, the training module 30A includes: a forward propagation operation unit, for performing forward propagation operations in the neural network according to the partial quantization matrix set and the compensation quantization matrix set, obtaining forward propagation operation values; a backward propagation operation unit, for performing backward propagation operations on the neural network according to the forward propagation operation values, obtaining the weight update values of the not-yet-quantized part of the matrix set to be quantized; and a matrix-set-to-be-quantized updating unit, for updating the matrix set to be quantized according to the weight update values, the partial quantization matrix set, and the compensation quantization matrix set.
In the present embodiment, the power-of-two weights allow the result of weight quantization to achieve good compression and acceleration during neural network training, and the hardware realization is more efficient and concise, with high economic and practical value.
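The update rule of training module 30A — quantized entries are frozen, only the not-yet-quantized entries receive gradient updates — can be sketched as follows. The `grad_fn` callback stands in for real back-propagation and, like the toy loss in the example, is an assumption of this sketch.

```python
def training_round(weights, quantized_mask, grad_fn, lr=0.1):
    """One forward/backward round on partially quantized weights:
    entries marked quantized stay fixed; the rest take a gradient step."""
    grads = grad_fn(weights)            # stand-in for back-propagation
    return [w if frozen else w - lr * g
            for w, g, frozen in zip(weights, grads, quantized_mask)]

# Toy loss 0.5 * sum(w^2), whose gradient is w itself.
updated = training_round([0.5, -0.25, 0.8, -0.9],
                         quantized_mask=[True, True, False, False],
                         grad_fn=lambda ws: list(ws))
```

Repeating this round and then quantizing a further portion of the updated weights is exactly the loop closed by training end module 30B.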
In one of the embodiments, the compensation quantization module includes: a weight-quantization-value-to-be-compensated obtaining unit, for determining weight quantization values to be compensated among the weight quantization values according to the importance parameter and the compensation quantization target; and a first compensation quantization unit, for performing compensation quantization on the weight quantization values to be compensated according to the differences between the weight quantization values to be compensated and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values to be compensated, and obtaining a compensation quantization matrix set according to the compensation quantization values.
In the present embodiment, by compensating only a selected part of the weight quantization values, the range of compensation quantization applied to the quantization weight values in the quantization matrix set can be controlled flexibly according to the preset importance parameter and compensation quantization target. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
In one of the embodiments, the weight quantization information includes an importance parameter, the number of iterations of compensation quantization, and an iterative compensation ratio for each iteration; the compensation quantization module comprises an iterative compensation unit, for performing iterative compensation quantization on the weight quantization values according to the importance parameter, the number of iterations of compensation quantization, the iterative compensation ratio of each iteration, and the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining iterative compensation quantization values, and obtaining an iterative compensation quantization matrix set according to the iterative compensation quantization values.
In the present embodiment, the weight quantization values undergo multiple rounds of compensation quantization governed by the preset number of iterations and the iterative compensation ratio of each iteration, yielding multiple compensation quantization matrix sets. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
In one of the embodiments, the weight quantization information includes a compensation quantization target value; the neural network weight quantization device further includes a target value compensation quantization module, for continuing compensation quantization on the compensation quantization values according to the difference between the compensation quantization target value and the compensation quantization values, until compensation quantization values meeting the compensation quantization target value are obtained, and obtaining a compensation quantization matrix set according to the compensation quantization values.
In the present embodiment, after one or more rounds of compensation quantization governed by the preset compensation quantization target value, a compensation quantization matrix set meeting that target value is obtained. Since the preset compensation quantization target value can be set flexibly according to demand, the precision of compensation quantization can be controlled conveniently. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
In one of the embodiments, the weight quantization module 20 comprises a positive-and-negative weight matrix set obtaining unit, for quantizing the weight values to be quantized in the matrix set to be quantized to log space according to their signs, obtaining positive weight quantization values and negative weight quantization values respectively, obtaining a positive weight matrix set according to the positive weight quantization values, and obtaining a negative weight matrix set according to the negative weight quantization values. The compensation quantization module 30 comprises a positive-and-negative weight offset matrix set obtaining unit, for performing compensation quantization on the differences between the positive weight quantization values in the positive weight matrix set and their corresponding weight values to be quantized, obtaining positive weight offset values, and obtaining a positive weight offset matrix set according to the positive weight offset values; and for performing compensation quantization on the differences between the negative weight quantization values in the negative weight matrix set and their corresponding weight values to be quantized, obtaining negative weight offset values, and obtaining a negative weight offset matrix set according to the negative weight offset values.
In the present embodiment, model weight quantization maps the positive and negative weights to their respective power-of-two weight matrices, yielding two sparse weight matrices. Quantizing the positive and negative weights separately effectively avoids the per-weight sign test during multiplication (realized in the present invention by shift operations), thereby accelerating computation. Meanwhile, the present invention stores the nonzero elements of the sparse matrices in Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) format, saving a large amount of storage space. Through compensation quantization, two weight offset-term matrices of even higher sparsity are obtained. By superimposing offset terms on the more important weights, compensation quantization makes their quantization sampling intervals denser, thereby reducing the loss of model performance caused by quantization error. The compressed DCNN obtained through the present invention is applicable to compute- and storage-constrained high-technology fields such as mobile terminals, embedded devices, and robots, and has high economic and practical value.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, which may include ROM, RAM, a magnetic disk, an optical disc, etc.
In short, the foregoing is merely illustrative of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (14)
1. A neural network weight quantization method, characterized in that the method comprises:
step 1: obtaining a matrix set to be quantized;
step 2: quantizing the weight values to be quantized in the matrix set to be quantized to log space, obtaining the weight quantization values of the weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
step 3: performing compensation quantization on the weight quantization values according to the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values, and obtaining a compensation quantization matrix set according to the compensation quantization values;
step 4: storing the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
2. The neural network weight quantization method of claim 1, characterized in that before step 2 the method further comprises:
obtaining weight quantization information, the weight quantization information including an importance parameter and a partial quantization target;
step 2 comprises:
step 21: determining partial weight values to be quantized in the matrix set to be quantized according to the importance parameter and the partial quantization target;
step 22: quantizing the partial weight values to be quantized to log space, obtaining the weight quantization values of the partial weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
and after step 3 the method further comprises:
step 3A: performing predetermined training on the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and updating the matrix set to be quantized according to the training result of the predetermined training;
step 3B: returning to step 21 until the weight values to be quantized in the matrix set to be quantized are all quantized.
3. The neural network weight quantization method of claim 2, characterized in that training the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and updating the matrix set to be quantized according to the training result, comprises:
performing forward propagation operations in the neural network according to the partial quantization matrix set and the compensation quantization matrix set, obtaining forward propagation operation values;
performing backward propagation operations on the neural network according to the forward propagation operation values, obtaining the weight update values of the not-yet-quantized part of the matrix set to be quantized;
updating the matrix set to be quantized according to the weight update values, the partial quantization matrix set, and the compensation quantization matrix set.
4. The neural network weight quantization method as described in claim 1, characterized in that:
the weight quantization information comprises: an importance parameter and a compensation quantization goal;
step 3 comprises:
determining the weight quantization values to be compensated among the weight quantization values according to the importance parameter and the compensation quantization goal;
performing compensation quantization on the weight quantization values to be compensated according to the difference between each weight quantization value to be compensated and its corresponding weight value to be quantified, obtaining the compensation quantization values of the weight quantization values to be compensated, and obtaining a compensation quantization matrix set from the compensation quantization values.
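One round of this compensation quantization amounts to quantizing the residual between a weight and its log-space quantization as a second power-of-two term. A minimal sketch; `compensation_quantize` and the threshold-style importance test are illustrative assumptions, not the patented selection rule:

```python
import math

def log_quantize(w):
    """Nearest-power-of-two quantization, preserving sign."""
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def compensation_quantize(original, importance, threshold):
    """For weights deemed important enough, quantize the residual
    (original - quantized) to log space as a compensation term."""
    quantized = [log_quantize(w) for w in original]
    compensation = [log_quantize(w - q) if imp >= threshold else 0.0
                    for w, q, imp in zip(original, quantized, importance)]
    return quantized, compensation
```

The sum of the two power-of-two terms approximates the original weight more closely than the first term alone, while both terms remain cheap to store and multiply.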
5. The neural network weight quantization method as described in claim 1, characterized in that:
the weight quantization information comprises: an importance parameter, the number of iterations of the compensation quantization, and an iterative compensation ratio for each iteration;
step 3 comprises:
performing iterative compensation quantization on the weight quantization values according to the importance parameter, the number of iterations of the compensation quantization, the iterative compensation ratio of each iteration, and the difference between each weight quantization value and its corresponding weight value to be quantified, obtaining iterative compensation quantization values, and obtaining an iterative compensation quantization matrix set from the iterative compensation quantization values.
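The iterative variant can be sketched as repeated residual quantization, where each pass contributes one power-of-two term. The `ratios` list standing in for the claim's per-iteration compensation ratio is an assumption, as is interpreting the ratio as a scale on the residual each pass tries to absorb:

```python
import math

def log_quantize(w):
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def iterative_compensate(w, num_iterations, ratios):
    """Approximate w as a sum of power-of-two terms over several passes.
    ratios[i] scales how much of the remaining residual pass i absorbs."""
    terms, residual = [], w
    for i in range(num_iterations):
        term = log_quantize(residual * ratios[i])
        terms.append(term)
        residual -= term
    return terms
```

Each pass shrinks the residual, so the summed terms converge toward the original weight as iterations are added.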
6. The neural network weight quantization method as described in claim 1, characterized in that:
the weight quantization information comprises: a compensation quantization target value;
after step 3, the method further comprises:
continuing to perform compensation quantization on the compensation quantization values according to the difference between the compensation quantization target value and the compensation quantization values, until compensation quantization values satisfying the compensation quantization target value are obtained, and obtaining a compensation quantization matrix set from the compensation quantization values.
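The target-driven variant keeps adding compensation terms until the remaining error satisfies the target. A sketch, with the compensation quantization target value interpreted (as an assumption) as an absolute-error bound on the approximation:

```python
import math

def log_quantize(w):
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def compensate_to_target(w, target_error):
    """Keep adding power-of-two compensation terms until the remaining
    absolute error is within target_error."""
    terms, approx = [], 0.0
    while abs(w - approx) > target_error:
        term = log_quantize(w - approx)   # quantize the current residual
        terms.append(term)
        approx += term
    return terms
```

Because each nearest-power-of-two step removes a fixed fraction of the residual's magnitude, the loop terminates for any positive target.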
7. The neural network weight quantization method as described in claim 1, characterized in that:
step 2 comprises:
quantizing the weight values to be quantified in the set of matrices to be quantified to log space according to their signs, obtaining positive weight quantization values and negative weight quantization values respectively, obtaining a positive weight matrix set from the positive weight quantization values, and obtaining a negative weight matrix set from the negative weight quantization values;
step 3 comprises:
performing compensation quantization on the difference between each positive weight quantization value in the positive weight matrix set and its corresponding weight value to be quantified, obtaining positive weight offset values, and obtaining a positive weight offset matrix set from the positive weight offset values; and
performing compensation quantization on the difference between each negative weight quantization value in the negative weight matrix set and its corresponding weight value to be quantified, obtaining negative weight offset values, and obtaining a negative weight offset matrix set from the negative weight offset values.
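The sign-separated quantization of claim 7 can be sketched by splitting the weights by sign before log-space quantization and recording per-weight offsets (residuals) for later compensation. The dict-based index bookkeeping is an illustrative assumption:

```python
import math

def log_quantize(w):
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def sign_separated_quantize(weights):
    """Split weights by sign, quantize each group to log space, and keep
    original indices so offsets can be matched back to their weights."""
    pos = {i: log_quantize(w) for i, w in enumerate(weights) if w > 0}
    neg = {i: log_quantize(w) for i, w in enumerate(weights) if w < 0}
    offsets_pos = {i: weights[i] - q for i, q in pos.items()}  # residuals to compensate
    offsets_neg = {i: weights[i] - q for i, q in neg.items()}
    return pos, neg, offsets_pos, offsets_neg
```

Keeping the positive and negative groups separate means every stored exponent describes a magnitude only, and the sign is implied by which matrix set the entry belongs to.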
8. A neural network weight quantization device, characterized by comprising:
an obtaining module, configured to obtain a set of matrices to be quantified;
a weight quantization module, configured to quantize the weight values to be quantified in the set of matrices to be quantified to log space, obtain the weight quantization values of the weight values to be quantified, and obtain a quantization matrix set from the weight quantization values;
a compensation quantization module, configured to perform compensation quantization on the weight quantization values according to the difference between each weight quantization value and its corresponding weight value to be quantified, obtain the compensation quantization values of the weight quantization values, and obtain a compensation quantization matrix set from the compensation quantization values;
a quantization result storage module, configured to store the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the set of matrices to be quantified, for later use.
9. The neural network weight quantization device as claimed in claim 8, characterized in that:
the obtaining module is further configured to obtain weight quantization information, the weight quantization information comprising an importance parameter and a partial quantization goal;
the weight quantization module comprises:
a partial quantization weight value obtaining unit, configured to determine the partial weight values to be quantified in the set of matrices to be quantified according to the importance parameter and the partial quantization goal;
a partial quantization unit, configured to quantize the partial weight values to be quantified to log space, obtain the weight quantization values of the partial weight values to be quantified, and obtain a quantization matrix set from the weight quantization values;
the neural network weight quantization device further comprises:
a training module, configured to perform predetermined training of the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and update the set of matrices to be quantified according to the result of the predetermined training;
a training end module, configured to return to the partial quantization weight value obtaining unit until all weight values to be quantified in the set of matrices to be quantified have been quantized.
10. The neural network weight quantization device as claimed in claim 9, characterized in that the training module comprises:
a forward-propagation operation unit, configured to perform a forward-propagation operation in the neural network according to the partial quantization matrix set and the compensation quantization matrix set, obtaining forward-propagation operation values;
a back-propagation operation unit, configured to perform a back-propagation operation on the neural network according to the forward-propagation operation values, obtaining weight update values for the non-quantized part of the set of matrices to be quantified;
a to-be-quantified matrix set updating unit, configured to update the set of matrices to be quantified according to the weight update values, the partial quantization matrix set, and the compensation quantization matrix set.
11. The neural network weight quantization device as claimed in claim 8, characterized in that:
the weight quantization information comprises: an importance parameter and a compensation quantization goal;
the compensation quantization module comprises:
a to-be-compensated weight quantization value obtaining unit, configured to determine the weight quantization values to be compensated among the weight quantization values according to the importance parameter and the compensation quantization goal;
a first compensation quantization unit, configured to perform compensation quantization on the weight quantization values to be compensated according to the difference between each weight quantization value to be compensated and its corresponding weight value to be quantified, obtain the compensation quantization values of the weight quantization values to be compensated, and obtain a compensation quantization matrix set from the compensation quantization values.
12. The neural network weight quantization device as claimed in claim 8, characterized in that:
the weight quantization information comprises: an importance parameter, the number of iterations of the compensation quantization, and an iterative compensation ratio for each iteration;
the compensation quantization module comprises:
an iterative compensation unit, configured to perform iterative compensation quantization on the weight quantization values according to the importance parameter, the number of iterations of the compensation quantization, the iterative compensation ratio of each iteration, and the difference between each weight quantization value and its corresponding weight value to be quantified, obtain iterative compensation quantization values, and obtain an iterative compensation quantization matrix set from the iterative compensation quantization values.
13. The neural network weight quantization device as claimed in claim 8, characterized in that:
the weight quantization information comprises: a compensation quantization target value;
the neural network weight quantization device further comprises:
a target value compensation quantization module, configured to continue performing compensation quantization on the compensation quantization values according to the difference between the compensation quantization target value and the compensation quantization values, until compensation quantization values satisfying the compensation quantization target value are obtained, and obtain a compensation quantization matrix set from the compensation quantization values.
14. The neural network weight quantization device as claimed in claim 8, characterized in that the weight quantization module comprises:
a positive and negative weight matrix set obtaining unit, configured to quantize the weight values to be quantified in the set of matrices to be quantified to log space according to their signs, obtain positive weight quantization values and negative weight quantization values respectively, obtain a positive weight matrix set from the positive weight quantization values, and obtain a negative weight matrix set from the negative weight quantization values;
the compensation quantization module comprises:
a positive and negative weight offset matrix set obtaining unit, configured to perform compensation quantization on the difference between each positive weight quantization value in the positive weight matrix set and its corresponding weight value to be quantified, obtain positive weight offset values, and obtain a positive weight offset matrix set from the positive weight offset values; and to perform compensation quantization on the difference between each negative weight quantization value in the negative weight matrix set and its corresponding weight value to be quantified, obtain negative weight offset values, and obtain a negative weight offset matrix set from the negative weight offset values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710656027.XA CN109388779A (en) | 2017-08-03 | 2017-08-03 | A kind of neural network weight quantization method and neural network weight quantization device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710656027.XA CN109388779A (en) | 2017-08-03 | 2017-08-03 | A kind of neural network weight quantization method and neural network weight quantization device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109388779A (en) | 2019-02-26 |
Family
ID=65412375
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710656027.XA Pending CN109388779A (en) | 2017-08-03 | 2017-08-03 | A kind of neural network weight quantization method and neural network weight quantization device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109388779A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909667A (en) * | 2019-11-20 | 2020-03-24 | Beijing University of Chemical Technology | Lightweight design method for multi-angle SAR target recognition network |
CN112085185A (en) * | 2019-06-12 | 2020-12-15 | Shanghai Cambricon Information Technology Co., Ltd. | Quantization parameter adjusting method and device and related product |
CN112132024A (en) * | 2020-09-22 | 2020-12-25 | China Agricultural University | Underwater target recognition network optimization method and device |
CN112749782A (en) * | 2019-10-31 | 2021-05-04 | Shanghai SenseTime Intelligent Technology Co., Ltd. | Data processing method and related product |
CN113112009A (en) * | 2020-01-13 | 2021-07-13 | Cambricon Technologies Corporation Limited | Method, apparatus and computer-readable storage medium for neural network data quantization |
WO2021164750A1 (en) * | 2020-02-21 | 2021-08-26 | Huawei Technologies Co., Ltd. | Method and apparatus for convolutional layer quantization |
CN113678465A (en) * | 2019-06-04 | 2021-11-19 | Google LLC | Quantization constrained neural image compilation |
WO2022027442A1 (en) * | 2020-08-06 | 2022-02-10 | Huawei Technologies Co., Ltd. | Input preprocessing method and apparatus of image processing network, and output postprocessing method and apparatus of image processing network |
WO2023000898A1 (en) * | 2021-07-20 | 2023-01-26 | Tencent Technology (Shenzhen) Co., Ltd. | Image segmentation model quantization method and apparatus, computer device, and storage medium |
WO2023163419A1 (en) * | 2022-02-22 | 2023-08-31 | Samsung Electronics Co., Ltd. | Data processing method and data processing device using supplemented neural network quantization operation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109388779A (en) | A kind of neural network weight quantization method and neural network weight quantization device | |
CN110880036B (en) | Neural network compression method, device, computer equipment and storage medium | |
CN108510067B (en) | Convolutional neural network quantification method based on engineering realization | |
CN106650813B (en) | A kind of image understanding method based on depth residual error network and LSTM | |
CN110175386B (en) | Method for predicting temperature of electrical equipment of transformer substation | |
CN112396181A (en) | Automatic pruning method and platform for general compression architecture of convolutional neural network | |
CN112241455B (en) | Automatic compression method and platform based on multi-level knowledge distillation pre-training language model | |
CN109754063A (en) | For learning the method and device of low precision neural network | |
JP2021505993A5 (en) | ||
CN108334945B (en) | Acceleration and compression method and device of deep neural network | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN110276451A (en) | One kind being based on the normalized deep neural network compression method of weight | |
CN108734266A (en) | Compression method and device, terminal, the storage medium of deep neural network model | |
WO2022126683A1 (en) | Method and platform for automatically compressing multi-task-oriented pre-training language model | |
US20220198276A1 (en) | Method and platform for pre-trained language model automatic compression based on multilevel knowledge distillation | |
CN108734264A (en) | Deep neural network model compression method and device, storage medium, terminal | |
CN116316591A (en) | Short-term photovoltaic power prediction method and system based on hybrid bidirectional gating cycle | |
CN116523079A (en) | Reinforced learning-based federal learning optimization method and system | |
CN115829024B (en) | Model training method, device, equipment and storage medium | |
CN109523016B (en) | Multi-valued quantization depth neural network compression method and system for embedded system | |
CN110796233A (en) | Self-adaptive compression method of deep residual convolution neural network based on transfer learning | |
CN108734268A (en) | Compression method and device, terminal, the storage medium of deep neural network model | |
CN114418105B (en) | Method and device for processing quantum application problem based on quantum circuit | |
WO2023207039A1 (en) | Data processing method and apparatus, and device and storage medium | |
CN111695687A (en) | Method and apparatus for training neural network for image recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190226 |