CN109388779A - Neural network weight quantization method and neural network weight quantization device - Google Patents
- Publication number: CN109388779A
- Application number: CN201710656027.XA
- Authority: CN (China)
- Prior art keywords: quantization, weight, compensation, quantified, value
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The invention discloses a neural network weight quantization method and device. The method comprises: step 1, obtaining a matrix set to be quantized; step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized and thereby a quantized matrix set; step 3, compensating each weight quantized value by quantizing the difference between it and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value and thereby a compensation quantized matrix set; step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use. The compensation quantization superimposes a shift term on the more important weights, making the quantization sampling intervals denser and thereby reducing the performance loss of the neural network model caused by quantization error.
Description
Technical field
The present invention relates to the field of artificial neural network technology, and in particular to a neural network weight quantization method and a neural network weight quantization device.
Background technique
In recent years, with the rapid development of artificial intelligence technology, deep learning neural networks have achieved great success in pattern recognition tasks such as image classification, object detection, image segmentation, speech recognition and machine translation. In these fields, the performance of deep learning models far exceeds that of traditional shallow models, and in some respects has even reached human level. However, neural networks with better performance usually have larger model parameters, which makes their computational complexity higher. This complexity manifests itself both in space (huge model storage volume and memory occupation at run time) and in time (tens of billions of floating-point operations needed for a single inference). Compressing and accelerating neural networks has therefore become particularly important, especially for applications that run on embedded devices, integrated hardware, or large-scale data processing centers.
Quantizing the weights, converting them into fixed-point numbers, or building a quantized-weight codebook to realize weight sharing can effectively compress the model and reduce the storage requirements of the neural network. How to design an effective quantization method, and an efficient hardware structure for that method, is thus an urgent problem to be solved in the field of neural network technology.
Summary of the invention
The present invention provides a neural network weight quantization method and a neural network weight quantization device, to solve the problems of high computational complexity and low efficiency of neural network weight quantization in the prior art.
According to one aspect of the present invention, a neural network weight quantization method is provided, comprising:
Step 1, obtaining a matrix set to be quantized;
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these weight quantized values obtaining a quantized matrix set;
Step 3, compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these compensation quantized values obtaining a compensation quantized matrix set;
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
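As an illustration, the four steps above can be sketched in Python. This is a minimal sketch under the assumption, consistent with the base-2 logarithmic transformation described later in the specification, that log-space quantization means rounding each weight to the nearest signed power of two; the function names are illustrative, not from the patent.

```python
import math

def quantize_pow2(w):
    """Quantize a nonzero weight to the nearest power of two, preserving sign."""
    if w == 0.0:
        return 0.0
    e = round(math.log2(abs(w)))          # exponent found in log space
    return math.copysign(2.0 ** e, w)

def quantize_with_compensation(weights):
    """Steps 2-3: log-space quantization plus a compensation (residual) term."""
    base = [quantize_pow2(w) for w in weights]                    # quantized matrix
    comp = [quantize_pow2(w - b) for w, b in zip(weights, base)]  # compensation matrix
    return base, comp

weights = [0.9, -0.3, 0.05, 1.7]          # a toy matrix set to be quantized
base, comp = quantize_with_compensation(weights)
# Step 4: store (base, comp); the effective dequantized weight is base + comp
recon = [b + c for b, c in zip(base, comp)]
```

Storing two power-of-two terms per weight instead of one is what the specification calls superimposing a shift term: each weight still needs only exponents, but the representable grid becomes denser.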
The present invention also provides a neural network weight quantization device, comprising:
an obtaining module, for obtaining a matrix set to be quantized;
a weight quantization module, for quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set;
a compensation quantization module, for compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these obtaining a compensation quantized matrix set;
a quantization result storage module, for storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
In the neural network weight quantization method provided by the present invention, the matrix set to be quantized is first quantized into log space to obtain a quantized matrix set, and the quantization difference of this initial quantized matrix is then itself quantized to obtain a quantization compensation matrix; the quantized matrix and the quantization compensation matrix together form the quantization result of the matrix to be quantized. By compensating the initial quantized matrix, that is, further quantizing the difference introduced by quantization, the compensation quantization superimposes a shift term on the more important weights and makes the quantization sampling intervals denser, thereby reducing the performance loss of the neural network model caused by quantization error.
The above description is only an overview of the technical scheme of the present invention. In order that the technical means of the present invention may be more clearly understood and implemented in accordance with the contents of the specification, and that the above and other objects, features and advantages of the present invention may become more apparent, specific embodiments of the present invention are set out below.
Detailed description of the invention
Various other advantages and benefits will become clear to those of ordinary skill in the art by reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 is the flow chart of the neural network weight quantization method in method embodiment 1 of the present invention;
Fig. 2 is the flow chart of the neural network weight quantization method in method embodiment 2 of the present invention;
Fig. 3 is the flow chart of the neural network weight quantization method in method embodiment 3 of the present invention;
Fig. 4 is the flow chart of the neural network weight quantization method in method embodiment 4 of the present invention;
Fig. 5 is the flow chart of the neural network weight quantization method in method embodiment 5 of the present invention;
Fig. 6 is the flow chart of the neural network weight quantization method in method embodiment 6 of the present invention;
Fig. 7 is a schematic diagram of weight power-of-two quantization sampling;
Fig. 8 is a schematic diagram of weight power-of-two quantization sampling with compensation;
Fig. 9 is a structural schematic diagram of the neural network weight quantization device in embodiment 7 of the present invention;
Fig. 10 is a structural schematic diagram of the neural network weight quantization device in embodiment 8 of the present invention.
Specific embodiment
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments are shown in the drawings, it should be understood that the present invention may be realized in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided to facilitate a more thorough understanding of the present invention and to convey the scope of the present invention completely to those skilled in the art.
Fig. 1 is the flow chart of the neural network weight quantization method in method embodiment 1 of the present invention. As shown in Fig. 1, the neural network weight quantization method in method embodiment 1 comprises:
Step 1, obtaining a matrix set to be quantized.
Specifically, the triggering of the neural network weight quantization is determined first by the network training state, where the network training state includes at least: the progress of the network's current training and the stability of the network's current training. For example, the network's current training progress may be that 25% of the training workload has been completed, and the network is in a stable state (not yet converged).
A neural network is composed of multiple layers, and each layer's weight parameters (weight values) can be expressed as a matrix. When quantizing the weights of a neural network, all weight values of all layers may be composed into one matrix set to be quantized and quantized at once; alternatively, all weight values of one or several layers, or part of the weight values of all layers or of some layers, may be selected to form the matrix set to be quantized, flexibly set according to demand.
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set.
Specifically, after the matrix set to be quantized has been determined, all weight values to be quantized in that set are quantized at once, obtaining the quantized matrix set.
Step 3, compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these obtaining a compensation quantized matrix set.
Specifically, suppose a weight value to be quantized is a real number A and its weight quantized value is A*; then the difference between A and A* is diff_A, and each weight quantized value is compensated according to its corresponding diff_A. After compensation quantization, the quantization sampling interval is denser, so the quantized weight matrix set fits the original weight matrix set to be quantized more closely, which preserves the computational performance of the neural network.
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
Specifically, the quantized matrix set and the compensation quantized matrix set together are the quantization result of the matrix set to be quantized.
In the neural network weight quantization method provided by this embodiment, the matrix set to be quantized is first quantized into log space to obtain a quantized matrix set, and the quantization difference of the initial quantized matrix is then itself quantized to obtain a quantization compensation matrix; the quantized matrix and the quantization compensation matrix form the quantization result of the matrix to be quantized. By compensating the initial quantized matrix, that is, further quantizing the difference introduced by quantization, the compensation quantization superimposes a shift term on the more important weights and makes the quantization sampling intervals denser, thereby reducing the performance loss of the neural network model caused by quantization error.
Fig. 2 is the flow chart of the neural network weight quantization method in method embodiment 2 of the present invention. As shown in Fig. 2, the neural network weight quantization method in method embodiment 2 comprises:
Step 1A, obtaining a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter and a partial quantization target.
Specifically, the importance parameter may be the modulus of a weight, the cumulative contribution of a weight to the activation values, the cumulative contribution of a weight to the output values, and so on. The subsequent quantization steps are carried out according to the importance parameter, and different importance parameters lead to different quantization results.
The partial quantization target is used to divide the matrix set to be quantized into at least two parts that are quantized step by step. It includes the range of weights covered by each round of quantization, which can be expressed as a preset weight value range or a preset weight threshold: weight parameters larger than the preset weight threshold, or weight parameters within a certain preset weight value range, are the part quantized in this round of partial quantization.
Step 21, determining, according to the importance parameter and the partial quantization target, the current partial set of weight values to be quantized within the matrix set to be quantized.
Specifically, the weight values to be quantized in the matrix set are sorted according to the importance parameter, and the partial set of weight values to be quantized in this round is then determined according to the partial quantization target. Since different importance parameters give different orderings, the same partial quantization target can yield different partial sets of weight values to be quantized.
For example, taking the modulus of the weights as the importance parameter and the largest 20% of the weight set to be quantized as the preset quantization range, all weight values to be quantized in the matrix set are sorted by modulus in descending order, and the top 20% by modulus are selected as this round's partial quantization target.
Step 22, quantizing the partial set of weight values to be quantized into log space, obtaining their weight quantized values, and from these obtaining a quantized matrix set.
Specifically, the weight values with the largest 20% of moduli, selected above, are quantized into log space, giving the quantized result of those weight values, that is, the partial weight quantized values, and a partial quantized matrix set. In actual use, preferentially quantizing the weight values with relatively large moduli (the more important ones) and leaving the weights with relatively small moduli to later rounds effectively avoids a decline in neural network performance in the later rounds of quantization (where there is no retraining process).
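A minimal sketch of this selection step, assuming the modulus-based importance parameter and the 20% partial quantization target of the example; `select_partial` is an illustrative name, not from the patent.

```python
def select_partial(weights, fraction=0.20):
    """Rank weights by modulus (the importance parameter) and pick the
    top `fraction` as this round's partial quantization target."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    k = max(1, int(len(weights) * fraction))
    return set(order[:k])

ws = [0.9, -0.05, 0.4, -1.3, 0.02, 0.7, -0.6, 0.1, 0.3, -0.2]
idx = select_partial(ws)  # indices of the top 20% of weights by modulus
```

A threshold-based partial quantization target would replace the top-k cut with `abs(w) > threshold`; both are ways of expressing the covered range described above.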
Step 3, compensating each weight quantized value according to the difference between the weight quantized value and its corresponding weight value to be quantized, obtaining a compensation quantized value for each weight quantized value, and from these obtaining a compensation quantized matrix set.
Specifically, this is the same as step 3 of embodiment 1.
Step 3A, performing predetermined training on the neural network according to the partial quantized matrix set and the compensation quantized matrix set, and updating the matrix set to be quantized according to the result of that training.
Specifically, since only part of the weights have been quantized, the weights that have not yet been quantized need to be updated, to guarantee the recovery of the model performance of the whole neural network.
Step 3B, returning to step 21 until all weight values to be quantized in the matrix set have been quantized.
Specifically, with the updated matrix set to be quantized, after returning to step 21 another partial set of weight values to be quantized is determined in the matrix set according to the importance parameter and the partial quantization target, and the subsequent steps are executed, until the whole matrix set to be quantized has been quantized.
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
Specifically, this is the same as step 4 of embodiment 1. Since step-by-step quantization is used, the quantized matrix set obtained in the final round, together with the compensation quantized matrix sets obtained in every round of the step-by-step quantization, form the quantization result of the matrix set to be quantized.
In the neural network weight quantization method provided by this embodiment, the matrix set to be quantized is quantized into log space step by step according to the preset importance parameter and the preset partial quantization target. After the quantized matrix set is obtained, the quantization difference of the initial quantized matrix is itself quantized to obtain a quantization compensation matrix; the matrix set to be quantized is then updated according to the quantized matrix set and the quantization compensation matrix set, the neural network is trained according to the importance parameter and the partial quantization target, and the weight values not yet quantized are updated. The quantized matrix set finally obtained, together with the compensation quantized matrix sets, are the quantization result of the matrix to be quantized. Because a step-by-step method is used, quantizing one part first and then gradually expanding the quantization scale, the precision of quantization is higher and the final performance of the neural network is better.
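The step-by-step procedure of this embodiment can be sketched as a loop. The sketch assumes modulus-based importance, power-of-two log-space quantization, and an optional caller-supplied retraining callback standing in for the predetermined training (which is network-specific and left abstract here); all names are illustrative.

```python
import math

def q2(w):
    """Round w to the nearest signed power of two (0 stays 0)."""
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def progressive_quantize(weights, fraction=0.5, retrain=None):
    """Quantize in rounds: freeze the most important weights each round,
    optionally retraining the still-unquantized remainder in between."""
    ws = list(weights)
    frozen = set()                       # indices already quantized
    while len(frozen) < len(ws):
        free = [i for i in range(len(ws)) if i not in frozen]
        free.sort(key=lambda i: abs(ws[i]), reverse=True)
        k = max(1, int(len(ws) * fraction))
        for i in free[:k]:
            ws[i] = q2(ws[i])            # quantize this round's partial target
            frozen.add(i)
        if retrain and len(frozen) < len(ws):
            ws = retrain(ws, frozen)     # update only the unquantized weights
    return ws

out = progressive_quantize([0.9, -0.3, 0.05, 1.7], fraction=0.5)
```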
In one of the embodiments, training the neural network according to the partial quantized matrix set and the compensation quantized matrix set, and updating the matrix set to be quantized according to the training result, comprises: performing a forward propagation operation in the neural network according to the partial quantized matrix set and the compensation quantized matrix set, obtaining forward propagation values; performing, according to the forward propagation values, a back-propagation operation in the neural network, obtaining weight update values for the unquantized part of the matrix set to be quantized; and updating the matrix set to be quantized according to the weight update values, the partial quantized matrix set and the compensation quantized matrix set.
Specifically, since each weight quantized value is a power weight value obtained through a logarithmic transformation (with base 2), the multiplications of the convolution operations in the forward propagation can be replaced by bit shifts, greatly accelerating the forward propagation. In the back-propagation operation, gradient descent is used: for the weights of the unquantized part, the partial derivative of the loss function with respect to each weight is calculated to obtain its update value.
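How a power-of-two weight turns a multiplication into a shift can be shown on fixed-point values. This is an illustrative sketch of the principle, not the patent's hardware design; the fixed-point scale is an assumption.

```python
def shift_mul(x, exponent):
    """Multiply integer activation x by 2**exponent using shifts only."""
    return x << exponent if exponent >= 0 else x >> -exponent

# a weight of 0.25 is stored only as its exponent -2;
# the activation 200.0 is held in Q8 fixed point (scale 2**8)
act = 200 << 8
res = shift_mul(act, -2)   # 200 * 0.25 = 50, with no multiplier circuit
```

A compensated weight (base term plus shift term) costs one extra shift and an add, which is why the scheme stays hardware-friendly.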
In this embodiment, during the training of the neural network, the use of power-of-two weights gives the weight quantization result good compression and acceleration effects while making the hardware implementation more efficient and compact, with high economic and practical value.
Fig. 3 is the flow chart of the neural network weight quantization method in method embodiment 3 of the present invention. As shown in Fig. 3, the neural network weight quantization method in method embodiment 3 comprises:
Step 1B, obtaining a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter and a compensation quantization target.
Specifically, the importance parameter is as set out in embodiment 2. The compensation quantization target is used to compensate only a part of the quantized weight values in the quantized matrix set, rather than all of them. The compensation quantization target includes the range of quantized weight values covered by each round of compensation quantization, which can be expressed as a preset quantized-weight value range or a preset quantized-weight threshold: quantized weight parameters larger than the preset quantized-weight threshold, or within a certain preset quantized-weight value range, are the ones that undergo compensation quantization.
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set.
Specifically, quantizing the weight values to be quantized into log space to obtain their weight quantized values may be completed in one step as in embodiment 1, or step by step as in embodiment 2, without limitation.
Step 31, determining, according to the importance parameter and the compensation quantization target, the weight quantized values to be compensated among the weight quantized values.
Specifically, if the weight values to be quantized were quantized into log space step by step in step 2, then the importance parameter used when determining the weight quantized values to be compensated is the same as the importance parameter used when determining the partial sets of weight values to be quantized.
The importance parameter and the compensation quantization target are used together to determine the weight quantized values to be compensated, in a process similar to step 21 of embodiment 2, which is not repeated here.
Step 32, compensating the weight quantized values to be compensated according to the difference between each such value and its corresponding weight value to be quantized, obtaining their compensation quantized values, and from these obtaining a compensation quantized matrix set.
Specifically, only the selected weight quantized values to be compensated undergo compensation quantization.
Step 4, storing the quantized matrix set and the compensation quantized matrix set in the neural network as the quantization result of the matrix set to be quantized, for later use.
Specifically, if the weight values to be quantized were quantized into log space step by step in step 2, then, as in embodiment 2, steps 3A and 3B follow step 3.
In this embodiment, by compensating only a selected part of the weight quantized values, the range of quantized weight values in the quantized matrix set that undergo compensation quantization can be flexibly controlled through the preset importance parameter and compensation quantization target. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
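A minimal sketch of threshold-based selective compensation, under the same power-of-two quantization assumption as before; the function names and the threshold value are illustrative, not from the patent.

```python
import math

def q2(w):
    """Round w to the nearest signed power of two (0 stays 0)."""
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def compensate_selected(weights, base, threshold):
    """Add a compensation term only for quantized values whose modulus
    exceeds `threshold` (the compensation quantization target);
    less important weights keep a single power-of-two term."""
    return [q2(w - b) if abs(b) > threshold else 0.0
            for w, b in zip(weights, base)]

ws = [0.9, -0.3, 0.05]
base = [q2(w) for w in ws]
comp = compensate_selected(ws, base, threshold=0.1)
```

Skipping compensation for the small weights saves storage (a higher compression ratio) at the cost of slightly coarser sampling where it matters least.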
Fig. 4 is the flow chart of the neural network weight quantization method in method embodiment 4 of the present invention. As shown in Fig. 4, the neural network weight quantization method in method embodiment 4 comprises:
Step 1C, obtaining a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter, the number of iterations of compensation quantization, and the iterative compensation ratio of each iteration.
Specifically, the importance parameter is as set out in embodiment 2. The number of iterations of the compensation quantization and the iterative compensation ratio of each iteration are used to carry out the compensation quantization calculation repeatedly, so that the result of compensation quantization is more accurate. The number of iterations of the compensation quantization is the number of times compensation quantization is performed, and can be set arbitrarily according to demand. The iterative compensation ratio of each iteration is the range or threshold by which, in each iteration, the values to be compensated next are chosen from the result of the previous round of compensation quantization; the ratio can be the same every time, or each iteration's ratio can be smaller than that of the previous iteration.
Step 2, quantizing the weight values to be quantized in the matrix set into log space, obtaining the weight quantized value of each weight value to be quantized, and from these obtaining a quantized matrix set.
Specifically, quantizing the weight values to be quantized into log space to obtain their weight quantized values may be completed in one step as in embodiment 1, or step by step as in embodiment 2, without limitation.
Step 3': according to the importance parameter, the number of iterations of compensation quantization, the iterative compensation ratio of each iteration, and the difference between each weight quantization value and its corresponding weight value to be quantized, perform iterative compensation quantization on the weight quantization values, obtain iterative compensation quantization values, and obtain an iterative compensation quantization matrix set according to the iterative compensation quantization values.
Specifically, the weight values to be quantized in each iteration of compensation quantization are chosen according to the importance parameter, the preset number of iterations of compensation quantization, and the iterative compensation ratio of each iteration; the process is similar to step 21 in embodiment 2 and is not repeated here.
After compensation quantization has been performed the preset number of times, multiple compensation quantization matrix sets are obtained, which serve as the iterative compensation quantization matrix set.
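The iterative scheme above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: `pow2_quant`, the selection of the largest residuals, and the example shrinking ratio schedule are all choices of this sketch; the patent only requires that each iteration compensate a chosen portion of the previous result.

```python
import math

def pow2_quant(x):
    """Nearest signed power of two; 0.0 stays 0.0."""
    if x == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(x))), x)

def iterative_compensation(weights, iterations=3, ratios=(1.0, 0.5, 0.25)):
    """After a first log-space quantization, each iteration quantizes the
    residuals of the previous pass, compensating only the fraction of
    weights given by that iteration's compensation ratio."""
    terms = [[pow2_quant(w)] for w in weights]       # step 2 result
    for it in range(iterations):
        residuals = [w - sum(t) for w, t in zip(weights, terms)]
        k = max(1, int(len(weights) * ratios[it]))   # how many to compensate
        order = sorted(range(len(weights)), key=lambda i: -abs(residuals[i]))
        for i in order[:k]:                          # largest residuals first
            terms[i].append(pow2_quant(residuals[i]))
    return terms

terms = iterative_compensation([1.1, -0.6, 0.3, 0.04],
                               iterations=2, ratios=(0.5, 0.25))
```

Each weight ends up as a short sum of power-of-two terms; weights compensated in more iterations are approximated more finely.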
Step 4: store the quantization matrix set and the compensation quantization matrix sets in the neural network, as the quantization result of the matrix set to be quantized, for later use.
Specifically, if the process of quantizing the weight values to log space in step 2 uses the gradual mode, then, as in embodiment 2, steps 3A and 3B follow step 3.
In the present embodiment, the weight quantization values undergo multiple rounds of compensation quantization governed by the preset number of iterations and the iterative compensation ratio of each iteration, yielding multiple compensation quantization matrix sets. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
Fig. 5 is a flow chart of the neural network weight quantization method in method embodiment 5 of the present invention. As shown in Fig. 5, the method comprises:
Step 1D: obtain a matrix set to be quantized and weight quantization information, the weight quantization information including an importance parameter and a compensation quantization target value.
Specifically, the importance parameter is as elaborated in embodiment 2. The compensation quantization target value is the preset target that compensation quantization needs to achieve.
Step 2: quantize the weight values to be quantized in the matrix set to be quantized to log space, obtain the weight quantization values of the weight values to be quantized, and obtain a quantization matrix set according to the weight quantization values.
Specifically, this process may use the one-step mode of embodiment 1 or the gradual mode of embodiment 2, without limitation.
Step 3: according to the difference between each weight quantization value and its corresponding weight value to be quantized, perform compensation quantization on the weight quantization values, obtain the compensation quantization values of the weight quantization values, and obtain a compensation quantization matrix set according to the compensation quantization values.
Specifically, this is the same as step 3 in embodiment 1.
Step 3C: according to the difference between the compensation quantization target value and the compensation quantization values, continue compensation quantization on the compensation quantization values until compensation quantization values that meet the compensation quantization target value are obtained, and obtain the compensation quantization matrix set according to those compensation quantization values.
Specifically, whether to continue compensation quantization is determined according to the quantization difference between the preset compensation quantization target value and the compensation quantization values in the compensation quantization matrix set. By setting the compensation quantization target value according to demand, the precision of the compensation quantization matrix set is controlled; after one or more rounds of compensation quantization, a compensation quantization matrix set that meets the compensation quantization target value is obtained.
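A target-driven loop of this kind can be sketched as follows. The patent leaves the form of the target value open; interpreting it as an absolute-residual threshold, and capping the number of rounds, are both assumptions of this sketch.

```python
import math

def pow2_quant(x):
    """Nearest signed power of two; 0.0 stays 0.0."""
    if x == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(x))), x)

def compensate_to_target(weight, target=0.01, max_terms=8):
    """Keep appending power-of-two compensation terms until the residual
    meets the preset compensation quantization target value (here: an
    assumed absolute-error threshold)."""
    terms = [pow2_quant(weight)]                    # first-pass quantization
    while abs(weight - sum(terms)) > target and len(terms) < max_terms:
        terms.append(pow2_quant(weight - sum(terms)))  # compensate residual
    return terms

terms = compensate_to_target(0.9, target=0.01)
```

A looser target value stops earlier (fewer stored terms, coarser result); a tighter one trades storage for precision, which is exactly the flexibility the embodiment claims.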
Step 4: store the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
In the present embodiment, after one or more rounds of compensation quantization governed by the preset compensation quantization target value, a compensation quantization matrix set meeting that target value is obtained. Since the preset compensation quantization target value can be set flexibly according to demand, the precision of compensation quantization can be controlled conveniently. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
Fig. 6 is a flow chart of the neural network weight quantization method in method embodiment 6 of the present invention. As shown in Fig. 6, the method comprises:
Step 1E: obtain a matrix set to be quantized.
Specifically, this is the same as step 1 in embodiment 1.
Step 20E: judge whether the sign of each weight value to be quantized is positive; if positive, proceed to step 21E; if not, proceed to step 22E.
Specifically, before a weight value to be quantized is quantized, it is judged whether its sign is positive or negative, and the corresponding processing flow is entered.
Step 21E: quantize the weight values to be quantized whose sign is positive in the matrix set to be quantized to log space, obtain positive weight quantization values, and obtain a positive weight matrix set according to the positive weight quantization values. Proceed to step 31E.
Step 22E: quantize the weight values to be quantized whose sign is negative in the matrix set to be quantized to log space, obtain negative weight quantization values, and obtain a negative weight matrix set according to the negative weight quantization values. Proceed to step 32E.
Specifically, steps 21E and 22E obtain the LOG parameter sets L1 and L2 according to the sign of each weight. In L1, the elements at the positions of positive weight values are the intermediate parameters obtained by applying the LOG transform to their moduli, while the elements at the positions of negative weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Conversely, in L2, the elements at the positions of negative weight values are the intermediate parameters obtained by applying the LOG transform to their moduli, while the elements at the positions of positive weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Steps 21E and 22E thereby yield two power-of-two weight matrices L1 and L2 with sparse structure, representing the positive and negative weight values respectively. Mapping modes include but are not limited to:
1) Apply the LOG transform to each element xi of the weight set N to be quantized according to formula 1:
yi = round(log2(|xi|))    (1)
where yi is the power-of-two weight quantization value, yi ∈ L.
2) According to the statistical information of the weight set N to be quantized, determine the quantization codebook C1 of each learnable layer in the network; the codebook partitions log space by powers of two (2^n). Each weight in the set N is then mapped to the codebook value of the corresponding layer.
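Mapping mode 2) can be sketched as follows. The codebook range and the nearest-value mapping rule are assumptions of this sketch; in practice the exponent range would come from the layer's weight statistics, as the text says.

```python
def build_codebook(exp_min, exp_max):
    """Per-layer quantization codebook C1 on log space, partitioned by
    powers of two (2^n).  The exponent range here is a fixed assumption."""
    return [2.0 ** n for n in range(exp_min, exp_max + 1)]

def map_to_codebook(modulus, codebook):
    """Map a weight's modulus to the nearest codebook value."""
    return min(codebook, key=lambda c: abs(c - modulus))

c1 = build_codebook(-4, 0)
q = [map_to_codebook(abs(w), c1) for w in (0.9, -0.3, 0.07)]
```

The sign is handled separately by the L1/L2 split of steps 21E and 22E, so only moduli are mapped here.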
The two power-of-two weight matrices L1 and L2 are then sparsified. Steps 21E and 22E apply the logarithmic transform to the positive and negative weights separately, and weights whose moduli are close to 0 after the transform are quantized to 0. Even without considering the sparsity of the pre-trained weight matrix, the LOG parameter sets L1 and L2 therefore have an apparent sparse structure, and can be stored and operated on in sparse-matrix form.
The sparse-matrix representation described in the present embodiment compresses the model's weight storage severalfold, and sparse matrix multiplication accelerates the model's forward inference; the higher the compression ratio, the more obvious the acceleration.
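The Compressed Sparse Row format named later in this embodiment stores only the nonzero elements plus index arrays. A minimal sketch (the example matrix and the use of 0 as the empty marker are illustrative; the patent instead marks empty positions with a large negative exponent that decodes to approximately 0):

```python
def to_csr(dense):
    """Compress a dense matrix into CSR form: nonzero values, their
    column indices, and per-row pointers into the value array."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # one pointer per finished row
    return values, col_idx, row_ptr

# An L1-like exponent matrix: entries only where the weight was positive.
l1 = [[0, -2, 0, 0],
      [0, 0, 1, 0],
      [-3, 0, 0, 0]]
values, col_idx, row_ptr = to_csr(l1)
```

For a matrix with few nonzeros, the three CSR arrays are far smaller than the dense matrix, which is the storage saving the embodiment claims.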
Step 31E: perform compensation quantization on the differences between the positive weight quantization values in the positive weight matrix set and their corresponding weight values to be quantized, obtain positive weight offset values, and obtain a positive weight offset matrix set according to the positive weight offset values.
Step 32E: perform compensation quantization on the differences between the negative weight quantization values in the negative weight matrix set and their corresponding weight values to be quantized, obtain negative weight offset values, and obtain a negative weight offset matrix set according to the negative weight offset values.
Specifically, for the more important weights (for example, large-modulus weights, whose quantization sampling intervals are relatively coarse), the quantization error produced by the logarithmic transform affects model performance particularly strongly. The present embodiment therefore performs compensation quantization on this more important weight set, i.e., applies a further logarithmic transform to the quantization difference (formula 2), and obtains the actual quantized modulus of the weight by formula 3.
According to the sign of each weight, the LOG parameter sets L1' and L2' are obtained. In L1', the elements at the positions of positive weight values are the intermediate parameters obtained by applying the LOG transform to the quantization differences, while the elements at the positions of negative weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Conversely, in L2', the elements at the positions of negative weight values are the intermediate parameters obtained by applying the LOG transform to the quantization differences, while the elements at the positions of positive weight values are set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding weight values decode to approximately 0. Two power-of-two weight offset-term matrices of even higher sparsity, representing the positive and negative weight values respectively, are thereby obtained.
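Formulas 2 and 3 themselves do not survive in this text. A plausible reconstruction consistent with the surrounding description — formula 2 log-quantizes the quantization difference the same way formula 1 log-quantizes the weight, and formula 3 sums the two power-of-two terms — can be sketched as follows; both forms are assumptions of this sketch, not the patent's verbatim equations.

```python
import math

def log_exp(x):
    """round(log2(|x|)) -- the LOG intermediate parameter of formula 1."""
    return round(math.log2(abs(x)))

w = 0.9                        # an important weight to be quantized
y = log_exp(w)                 # formula 1: first-pass exponent
diff = abs(w) - 2.0 ** y       # quantization difference
y2 = log_exp(diff)             # assumed formula 2: log-quantize the difference
# Assumed formula 3: actual quantized modulus = sum of the two power terms.
modulus = 2.0 ** y + math.copysign(2.0 ** y2, diff)
```

Here the single-term approximation 2^0 = 1.0 of the weight 0.9 is refined by the offset term -2^-3, giving 0.875 and roughly quartering the error.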
For weights that do not need compensation quantization, the compensation quantization value is set to the maximum negative number representable by the preset quantizing bit number, so that the corresponding compensation weight values decode to approximately 0. The weight set P that needs compensation quantization is generated by a compensation quantization importance selection module. Generation methods include but are not limited to: from the quantization codebook C1 determined for each learnable layer in the network (sampling intervals illustrated in Fig. 7), discard a small portion of codes at both ends of the codebook and generate the compensation quantization codebook C2 (C2 ⊆ C1). Suppose the codebook C1 contains n1 codes and the codebook C2 contains n2 codes (n2 < n1); superimposing C1 and C2 in all combinations produces a codebook C with n1 * n2 codes, whose sampling intervals are denser near the larger sampled values (sampling intervals illustrated in Fig. 8). Quantization with respect to the codebook C is, in effect, quantization with respect to C1, with an optional superimposed quantization with respect to C2.
In the present embodiment, the number of codes n2 in the compensation quantization codebook is smaller than the number of codes n1 in the codebook used for the first quantization; accordingly, the preset quantizing bit number for compensation quantization is smaller than that of the first quantization. For example, if the first quantization is preset to 4 bits, the compensation quantization may need only 2 bits to meet the requirement.
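The superposition of C1 and C2 can be illustrated directly. The concrete codebooks below are assumptions for the sketch; the point is only that every pair (a, b), a in C1, b in C2, yields an effective code a + b, so n1 codes and n2 codes combine into up to n1 * n2 codes, and indexing C2 needs fewer bits than indexing C1.

```python
def combined_codebook(c1, c2):
    """All pairwise sums of a first-pass codebook C1 and a (smaller)
    compensation codebook C2 -- the permuted superposition described
    in the embodiment."""
    return sorted({a + b for a in c1 for b in c2})

c1 = [2.0 ** n for n in range(-3, 1)]   # n1 = 4 first-pass codes
c2 = [0.0, 2.0 ** -4]                   # n2 = 2: 'no offset' or one offset
codes = combined_codebook(c1, c2)
```

With these values every sum is distinct, so the combined codebook reaches the full n1 * n2 = 8 codes while the compensation index costs only a single extra bit per compensated weight.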
Step 4E: store the positive weight matrix set, the negative weight matrix set, the positive weight offset matrix set, and the negative weight offset matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
In the present embodiment, model weight quantization maps the positive and negative weights to their respective power-of-two weight matrices, yielding two sparse weight matrices. Quantizing the positive and negative weights separately effectively avoids the per-weight sign test during multiplication (realized in the present invention by shift operations), thereby accelerating computation. Meanwhile, the present invention stores the nonzero elements of the sparse matrices in Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) format, saving a large amount of storage space. Through compensation quantization, two weight offset-term matrices of even higher sparsity are obtained. By superimposing offset terms on the more important weights, compensation quantization makes their quantization sampling intervals denser, thereby reducing the loss of model performance caused by quantization error. The compressed DCNN obtained through the present invention is applicable to compute- and storage-constrained high-technology fields such as mobile terminals, embedded devices, and robots, and has high economic and practical value.
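The shift realization mentioned above rests on a simple fact: multiplying by a power-of-two weight 2^e is a bit shift. A fixed-point sketch (the 4-fractional-bit scale is an illustrative assumption; the sign needs no per-weight test because positive and negative weights live in separate matrices):

```python
def shift_multiply(activation, exponent):
    """Multiply a fixed-point activation by the power-of-two weight
    2**exponent using only a shift -- no multiplier needed."""
    if exponent >= 0:
        return activation << exponent
    return activation >> -exponent

# With 4 fractional bits, the integer 16 represents 1.0.
quarter = shift_multiply(16, -2)   # 1.0 * 2**-2, i.e. 0.25 -> 4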
Fig. 9 is a structural schematic diagram of the neural network weight quantization device in embodiment 7 of the present invention. As shown in Fig. 9, the device comprises:
an obtaining module 10, for obtaining a matrix set to be quantized;
a weight quantization module 20, for quantizing the weight values to be quantized in the matrix set to be quantized to log space, obtaining the weight quantization values of the weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
a compensation quantization module 30, for performing compensation quantization on the weight quantization values according to the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values, and obtaining a compensation quantization matrix set according to the compensation quantization values;
a quantization result storage module 40, for storing the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
The neural network weight quantization device provided by the present embodiment first quantizes the matrix set to be quantized to log space to obtain a quantization matrix set, then performs compensation quantization on the quantization differences of the initial quantization matrices to obtain quantization compensation matrices; the quantization matrices and the quantization compensation matrices together form the quantization result of the matrices to be quantized. By compensation-quantizing the initial quantization matrices, the quantization differences caused by quantization are quantized further; by superimposing offset terms on the more important weights, compensation quantization makes their quantization sampling intervals denser, thereby reducing the performance loss of the neural network model caused by quantization error.
Fig. 10 is a structural schematic diagram of the neural network weight quantization device in embodiment 8 of the present invention. As shown in Fig. 10, the device comprises:
an obtaining module 10, for obtaining a matrix set to be quantized, and also for obtaining weight quantization information, the weight quantization information including an importance parameter and a partial quantization target;
a partial quantization weight value obtaining unit 21, for determining partial weight values to be quantized in the matrix set to be quantized according to the importance parameter and the partial quantization target;
a partial quantization unit 22, for quantizing the partial weight values to be quantized to log space, obtaining the weight quantization values of the partial weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
a compensation quantization module 30, for performing compensation quantization on the weight quantization values according to the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values, and obtaining a compensation quantization matrix set according to the compensation quantization values;
a training module 30A, for performing predetermined training on the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and updating the matrix set to be quantized according to the training result of the predetermined training;
a training end module 30B, for returning to the partial quantization weight value obtaining unit until the weight values to be quantized in the matrix set to be quantized are all quantized;
a quantization result storage module 40, for storing the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
The neural network weight quantization device provided by the present embodiment first quantizes the matrix set to be quantized to log space gradually, according to the preset importance parameter and the preset partial quantization target, obtaining a quantization matrix set; it then performs compensation quantization on the quantization differences of the initial quantization matrices to obtain quantization compensation matrices, updates the matrix set to be quantized according to the quantization matrix set and the quantization compensation matrix set, trains the neural network according to the importance parameter and the partial quantization target, and updates the not-yet-quantized weight values. The quantization matrix set and quantization compensation matrix set finally obtained are the quantization result of the matrices to be quantized. Because quantization proceeds gradually, quantizing a part first and then expanding the quantization proportion step by step, the quantization precision is higher, and the final performance of the neural network is better.
In one of the embodiments, the training module 30A includes: a forward propagation operation unit, for performing forward propagation operations in the neural network according to the partial quantization matrix set and the compensation quantization matrix set, obtaining forward propagation operation values; a backward propagation operation unit, for performing backward propagation operations on the neural network according to the forward propagation operation values, obtaining the weight update values of the not-yet-quantized part of the matrix set to be quantized; and a matrix-set-to-be-quantized updating unit, for updating the matrix set to be quantized according to the weight update values, the partial quantization matrix set, and the compensation quantization matrix set.
In the present embodiment, the power-of-two weights allow the result of weight quantization to achieve good compression and acceleration during neural network training, and the hardware realization is more efficient and concise, with high economic and practical value.
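The update rule of training module 30A — quantized entries are frozen, only the not-yet-quantized entries receive gradient updates — can be sketched as follows. The `grad_fn` callback stands in for real back-propagation and, like the toy loss in the example, is an assumption of this sketch.

```python
def training_round(weights, quantized_mask, grad_fn, lr=0.1):
    """One forward/backward round on partially quantized weights:
    entries marked quantized stay fixed; the rest take a gradient step."""
    grads = grad_fn(weights)            # stand-in for back-propagation
    return [w if frozen else w - lr * g
            for w, g, frozen in zip(weights, grads, quantized_mask)]

# Toy loss 0.5 * sum(w^2), whose gradient is w itself.
updated = training_round([0.5, -0.25, 0.8, -0.9],
                         quantized_mask=[True, True, False, False],
                         grad_fn=lambda ws: list(ws))
```

Repeating this round and then quantizing a further portion of the updated weights is exactly the loop closed by training end module 30B.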
In one of the embodiments, the compensation quantization module includes: a weight-quantization-value-to-be-compensated obtaining unit, for determining weight quantization values to be compensated among the weight quantization values according to the importance parameter and the compensation quantization target; and a first compensation quantization unit, for performing compensation quantization on the weight quantization values to be compensated according to the differences between the weight quantization values to be compensated and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values to be compensated, and obtaining a compensation quantization matrix set according to the compensation quantization values.
In the present embodiment, by compensating only a selected part of the weight quantization values, the range of compensation quantization applied to the quantization weight values in the quantization matrix set can be controlled flexibly according to the preset importance parameter and compensation quantization target. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
In one of the embodiments, the weight quantization information includes an importance parameter, the number of iterations of compensation quantization, and an iterative compensation ratio for each iteration; the compensation quantization module comprises an iterative compensation unit, for performing iterative compensation quantization on the weight quantization values according to the importance parameter, the number of iterations of compensation quantization, the iterative compensation ratio of each iteration, and the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining iterative compensation quantization values, and obtaining an iterative compensation quantization matrix set according to the iterative compensation quantization values.
In the present embodiment, the weight quantization values undergo multiple rounds of compensation quantization governed by the preset number of iterations and the iterative compensation ratio of each iteration, yielding multiple compensation quantization matrix sets. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
In one of the embodiments, the weight quantization information includes a compensation quantization target value; the neural network weight quantization device further includes a target value compensation quantization module, for continuing compensation quantization on the compensation quantization values according to the difference between the compensation quantization target value and the compensation quantization values, until compensation quantization values meeting the compensation quantization target value are obtained, and obtaining a compensation quantization matrix set according to the compensation quantization values.
In the present embodiment, after one or more rounds of compensation quantization governed by the preset compensation quantization target value, a compensation quantization matrix set meeting that target value is obtained. Since the preset compensation quantization target value can be set flexibly according to demand, the precision of compensation quantization can be controlled conveniently. This further improves the flexibility of the neural network weight quantization provided by the present invention, improving the compression ratio and computational efficiency while guaranteeing performance.
In one of the embodiments, the weight quantization module 20 comprises a positive-and-negative weight matrix set obtaining unit, for quantizing the weight values to be quantized in the matrix set to be quantized to log space according to their signs, obtaining positive weight quantization values and negative weight quantization values respectively, obtaining a positive weight matrix set according to the positive weight quantization values, and obtaining a negative weight matrix set according to the negative weight quantization values. The compensation quantization module 30 comprises a positive-and-negative weight offset matrix set obtaining unit, for performing compensation quantization on the differences between the positive weight quantization values in the positive weight matrix set and their corresponding weight values to be quantized, obtaining positive weight offset values, and obtaining a positive weight offset matrix set according to the positive weight offset values; and for performing compensation quantization on the differences between the negative weight quantization values in the negative weight matrix set and their corresponding weight values to be quantized, obtaining negative weight offset values, and obtaining a negative weight offset matrix set according to the negative weight offset values.
In the present embodiment, model weight quantization maps the positive and negative weights to their respective power-of-two weight matrices, yielding two sparse weight matrices. Quantizing the positive and negative weights separately effectively avoids the per-weight sign test during multiplication (realized in the present invention by shift operations), thereby accelerating computation. Meanwhile, the present invention stores the nonzero elements of the sparse matrices in Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) format, saving a large amount of storage space. Through compensation quantization, two weight offset-term matrices of even higher sparsity are obtained. By superimposing offset terms on the more important weights, compensation quantization makes their quantization sampling intervals denser, thereby reducing the loss of model performance caused by quantization error. The compressed DCNN obtained through the present invention is applicable to compute- and storage-constrained high-technology fields such as mobile terminals, embedded devices, and robots, and has high economic and practical value.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, which may include ROM, RAM, a magnetic disk, an optical disc, etc.
In short, the foregoing is merely illustrative of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (14)
1. A neural network weight quantization method, characterized in that the method comprises:
step 1: obtaining a matrix set to be quantized;
step 2: quantizing the weight values to be quantized in the matrix set to be quantized to log space, obtaining the weight quantization values of the weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
step 3: performing compensation quantization on the weight quantization values according to the differences between the weight quantization values and their corresponding weight values to be quantized, obtaining the compensation quantization values of the weight quantization values, and obtaining a compensation quantization matrix set according to the compensation quantization values;
step 4: storing the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the matrix set to be quantized, for later use.
2. The neural network weight quantization method of claim 1, characterized in that before step 2 the method further comprises:
obtaining weight quantization information, the weight quantization information including an importance parameter and a partial quantization target;
step 2 comprises:
step 21: determining partial weight values to be quantized in the matrix set to be quantized according to the importance parameter and the partial quantization target;
step 22: quantizing the partial weight values to be quantized to log space, obtaining the weight quantization values of the partial weight values to be quantized, and obtaining a quantization matrix set according to the weight quantization values;
and after step 3 the method further comprises:
step 3A: performing predetermined training on the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and updating the matrix set to be quantized according to the training result of the predetermined training;
step 3B: returning to step 21 until the weight values to be quantized in the matrix set to be quantized are all quantized.
3. The neural network weight quantization method of claim 2, characterized in that training the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and updating the matrix set to be quantized according to the training result, comprises:
performing forward propagation operations in the neural network according to the partial quantization matrix set and the compensation quantization matrix set, obtaining forward propagation operation values;
performing backward propagation operations on the neural network according to the forward propagation operation values, obtaining the weight update values of the not-yet-quantized part of the matrix set to be quantized;
updating the matrix set to be quantized according to the weight update values, the partial quantization matrix set, and the compensation quantization matrix set.
4. The neural network weight quantization method as described in claim 1, characterized in that:
the weight quantization information comprises: an importance parameter and a compensation quantization goal;
step 3 comprises:
determining the weight quantization values to be compensated among the weight quantization values according to the importance parameter and the compensation quantization goal;
performing compensation quantization on the weight quantization values to be compensated according to the difference between each weight quantization value to be compensated and its corresponding weight value to be quantified, obtaining the compensation quantization values of the weight quantization values to be compensated, and obtaining a compensation quantization matrix set from the compensation quantization values.
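One round of this compensation quantization amounts to quantizing the residual between a weight and its log-space quantization as a second power-of-two term. A minimal sketch; `compensation_quantize` and the threshold-style importance test are illustrative assumptions, not the patented selection rule:

```python
import math

def log_quantize(w):
    """Nearest-power-of-two quantization, preserving sign."""
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def compensation_quantize(original, importance, threshold):
    """For weights deemed important enough, quantize the residual
    (original - quantized) to log space as a compensation term."""
    quantized = [log_quantize(w) for w in original]
    compensation = [log_quantize(w - q) if imp >= threshold else 0.0
                    for w, q, imp in zip(original, quantized, importance)]
    return quantized, compensation
```

The sum of the two power-of-two terms approximates the original weight more closely than the first term alone, while both terms remain cheap to store and multiply.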
5. The neural network weight quantization method as described in claim 1, characterized in that:
the weight quantization information comprises: an importance parameter, the number of iterations of the compensation quantization, and an iterative compensation ratio for each iteration;
step 3 comprises:
performing iterative compensation quantization on the weight quantization values according to the importance parameter, the number of iterations of the compensation quantization, the iterative compensation ratio of each iteration, and the difference between each weight quantization value and its corresponding weight value to be quantified, obtaining iterative compensation quantization values, and obtaining an iterative compensation quantization matrix set from the iterative compensation quantization values.
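The iterative variant can be sketched as repeated residual quantization, where each pass contributes one power-of-two term. The `ratios` list standing in for the claim's per-iteration compensation ratio is an assumption, as is interpreting the ratio as a scale on the residual each pass tries to absorb:

```python
import math

def log_quantize(w):
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def iterative_compensate(w, num_iterations, ratios):
    """Approximate w as a sum of power-of-two terms over several passes.
    ratios[i] scales how much of the remaining residual pass i absorbs."""
    terms, residual = [], w
    for i in range(num_iterations):
        term = log_quantize(residual * ratios[i])
        terms.append(term)
        residual -= term
    return terms
```

Each pass shrinks the residual, so the summed terms converge toward the original weight as iterations are added.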
6. The neural network weight quantization method as described in claim 1, characterized in that:
the weight quantization information comprises: a compensation quantization target value;
after step 3, the method further comprises:
continuing to perform compensation quantization on the compensation quantization values according to the difference between the compensation quantization target value and the compensation quantization values, until compensation quantization values satisfying the compensation quantization target value are obtained, and obtaining a compensation quantization matrix set from the compensation quantization values.
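The target-driven variant keeps adding compensation terms until the remaining error satisfies the target. A sketch, with the compensation quantization target value interpreted (as an assumption) as an absolute-error bound on the approximation:

```python
import math

def log_quantize(w):
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def compensate_to_target(w, target_error):
    """Keep adding power-of-two compensation terms until the remaining
    absolute error is within target_error."""
    terms, approx = [], 0.0
    while abs(w - approx) > target_error:
        term = log_quantize(w - approx)   # quantize the current residual
        terms.append(term)
        approx += term
    return terms
```

Because each nearest-power-of-two step removes a fixed fraction of the residual's magnitude, the loop terminates for any positive target.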
7. The neural network weight quantization method as described in claim 1, characterized in that:
step 2 comprises:
quantizing the weight values to be quantified in the set of matrices to be quantified to log space according to their signs, obtaining positive weight quantization values and negative weight quantization values respectively, obtaining a positive weight matrix set from the positive weight quantization values, and obtaining a negative weight matrix set from the negative weight quantization values;
step 3 comprises:
performing compensation quantization on the difference between each positive weight quantization value in the positive weight matrix set and its corresponding weight value to be quantified, obtaining positive weight offset values, and obtaining a positive weight offset matrix set from the positive weight offset values; and
performing compensation quantization on the difference between each negative weight quantization value in the negative weight matrix set and its corresponding weight value to be quantified, obtaining negative weight offset values, and obtaining a negative weight offset matrix set from the negative weight offset values.
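The sign-separated quantization of claim 7 can be sketched by splitting the weights by sign before log-space quantization and recording per-weight offsets (residuals) for later compensation. The dict-based index bookkeeping is an illustrative assumption:

```python
import math

def log_quantize(w):
    if w == 0.0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def sign_separated_quantize(weights):
    """Split weights by sign, quantize each group to log space, and keep
    original indices so offsets can be matched back to their weights."""
    pos = {i: log_quantize(w) for i, w in enumerate(weights) if w > 0}
    neg = {i: log_quantize(w) for i, w in enumerate(weights) if w < 0}
    offsets_pos = {i: weights[i] - q for i, q in pos.items()}  # residuals to compensate
    offsets_neg = {i: weights[i] - q for i, q in neg.items()}
    return pos, neg, offsets_pos, offsets_neg
```

Keeping the positive and negative groups separate means every stored exponent describes a magnitude only, and the sign is implied by which matrix set the entry belongs to.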
8. A neural network weight quantization device, characterized by comprising:
an obtaining module, configured to obtain a set of matrices to be quantified;
a weight quantization module, configured to quantize the weight values to be quantified in the set of matrices to be quantified to log space, obtain the weight quantization values of the weight values to be quantified, and obtain a quantization matrix set from the weight quantization values;
a compensation quantization module, configured to perform compensation quantization on the weight quantization values according to the difference between each weight quantization value and its corresponding weight value to be quantified, obtain the compensation quantization values of the weight quantization values, and obtain a compensation quantization matrix set from the compensation quantization values;
a quantization result storage module, configured to store the quantization matrix set and the compensation quantization matrix set in the neural network, as the quantization result of the set of matrices to be quantified, for later use.
9. The neural network weight quantization device as claimed in claim 8, characterized in that:
the obtaining module is further configured to obtain weight quantization information, the weight quantization information comprising an importance parameter and a partial quantization goal;
the weight quantization module comprises:
a partial quantization weight value obtaining unit, configured to determine the partial weight values to be quantified in the set of matrices to be quantified according to the importance parameter and the partial quantization goal;
a partial quantization unit, configured to quantize the partial weight values to be quantified to log space, obtain the weight quantization values of the partial weight values to be quantified, and obtain a quantization matrix set from the weight quantization values;
the neural network weight quantization device further comprises:
a training module, configured to perform predetermined training of the neural network according to the partial quantization matrix set and the compensation quantization matrix set, and update the set of matrices to be quantified according to the result of the predetermined training;
a training end module, configured to return to the partial quantization weight value obtaining unit until all weight values to be quantified in the set of matrices to be quantified have been quantized.
10. The neural network weight quantization device as claimed in claim 9, characterized in that the training module comprises:
a forward-propagation operation unit, configured to perform a forward-propagation operation in the neural network according to the partial quantization matrix set and the compensation quantization matrix set, obtaining forward-propagation operation values;
a back-propagation operation unit, configured to perform a back-propagation operation on the neural network according to the forward-propagation operation values, obtaining weight update values for the non-quantized part of the set of matrices to be quantified;
a to-be-quantified matrix set updating unit, configured to update the set of matrices to be quantified according to the weight update values, the partial quantization matrix set, and the compensation quantization matrix set.
11. The neural network weight quantization device as claimed in claim 8, characterized in that:
the weight quantization information comprises: an importance parameter and a compensation quantization goal;
the compensation quantization module comprises:
a to-be-compensated weight quantization value obtaining unit, configured to determine the weight quantization values to be compensated among the weight quantization values according to the importance parameter and the compensation quantization goal;
a first compensation quantization unit, configured to perform compensation quantization on the weight quantization values to be compensated according to the difference between each weight quantization value to be compensated and its corresponding weight value to be quantified, obtain the compensation quantization values of the weight quantization values to be compensated, and obtain a compensation quantization matrix set from the compensation quantization values.
12. The neural network weight quantization device as claimed in claim 8, characterized in that:
the weight quantization information comprises: an importance parameter, the number of iterations of the compensation quantization, and an iterative compensation ratio for each iteration;
the compensation quantization module comprises:
an iterative compensation unit, configured to perform iterative compensation quantization on the weight quantization values according to the importance parameter, the number of iterations of the compensation quantization, the iterative compensation ratio of each iteration, and the difference between each weight quantization value and its corresponding weight value to be quantified, obtain iterative compensation quantization values, and obtain an iterative compensation quantization matrix set from the iterative compensation quantization values.
13. The neural network weight quantization device as claimed in claim 8, characterized in that:
the weight quantization information comprises: a compensation quantization target value;
the neural network weight quantization device further comprises:
a target value compensation quantization module, configured to continue performing compensation quantization on the compensation quantization values according to the difference between the compensation quantization target value and the compensation quantization values, until compensation quantization values satisfying the compensation quantization target value are obtained, and obtain a compensation quantization matrix set from the compensation quantization values.
14. The neural network weight quantization device as claimed in claim 8, characterized in that the weight quantization module comprises:
a positive and negative weight matrix set obtaining unit, configured to quantize the weight values to be quantified in the set of matrices to be quantified to log space according to their signs, obtain positive weight quantization values and negative weight quantization values respectively, obtain a positive weight matrix set from the positive weight quantization values, and obtain a negative weight matrix set from the negative weight quantization values;
the compensation quantization module comprises:
a positive and negative weight offset matrix set obtaining unit, configured to perform compensation quantization on the difference between each positive weight quantization value in the positive weight matrix set and its corresponding weight value to be quantified, obtain positive weight offset values, and obtain a positive weight offset matrix set from the positive weight offset values; and to perform compensation quantization on the difference between each negative weight quantization value in the negative weight matrix set and its corresponding weight value to be quantified, obtain negative weight offset values, and obtain a negative weight offset matrix set from the negative weight offset values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710656027.XA CN109388779A (en) | 2017-08-03 | 2017-08-03 | A kind of neural network weight quantization method and neural network weight quantization device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710656027.XA CN109388779A (en) | 2017-08-03 | 2017-08-03 | A kind of neural network weight quantization method and neural network weight quantization device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109388779A (en) | 2019-02-26 |
Family
ID=65412375
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710656027.XA Pending CN109388779A (en) | 2017-08-03 | 2017-08-03 | A kind of neural network weight quantization method and neural network weight quantization device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109388779A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909667A (en) * | 2019-11-20 | 2020-03-24 | Beijing University of Chemical Technology | Lightweight design method for multi-angle SAR target recognition network |
CN112085185A (en) * | 2019-06-12 | 2020-12-15 | Shanghai Cambricon Information Technology Co., Ltd. | Quantization parameter adjusting method and device and related product |
CN112132024A (en) * | 2020-09-22 | 2020-12-25 | China Agricultural University | Underwater target recognition network optimization method and device |
CN112749782A (en) * | 2019-10-31 | 2021-05-04 | Shanghai SenseTime Intelligent Technology Co., Ltd. | Data processing method and related product |
CN113112009A (en) * | 2020-01-13 | 2021-07-13 | Cambricon Technologies Corporation Limited | Method, apparatus and computer-readable storage medium for neural network data quantization |
WO2021164750A1 (en) * | 2020-02-21 | 2021-08-26 | Huawei Technologies Co., Ltd. | Method and apparatus for convolutional layer quantization |
CN113678465A (en) * | 2019-06-04 | 2021-11-19 | Google LLC | Quantization constrained neural image compilation |
WO2022027442A1 (en) * | 2020-08-06 | 2022-02-10 | Huawei Technologies Co., Ltd. | Input preprocessing method and apparatus of image processing network, and output postprocessing method and apparatus of image processing network |
WO2023000898A1 (en) * | 2021-07-20 | 2023-01-26 | Tencent Technology (Shenzhen) Co., Ltd. | Image segmentation model quantization method and apparatus, computer device, and storage medium |
WO2023163419A1 (en) * | 2022-02-22 | 2023-08-31 | Samsung Electronics Co., Ltd. | Data processing method and data processing device using supplemented neural network quantization operation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109388779A (en) | A kind of neural network weight quantization method and neural network weight quantization device | |
CN110880036B (en) | Neural network compression method, device, computer equipment and storage medium | |
CN108510067B (en) | Convolutional neural network quantification method based on engineering realization | |
CN106650813B (en) | A kind of image understanding method based on depth residual error network and LSTM | |
CN110175386B (en) | Method for predicting temperature of electrical equipment of transformer substation | |
CN112396181A (en) | Automatic pruning method and platform for general compression architecture of convolutional neural network | |
CN112241455B (en) | Automatic compression method and platform based on multi-level knowledge distillation pre-training language model | |
CN109754063A (en) | For learning the method and device of low precision neural network | |
JP2021505993A5 (en) | ||
CN108334945B (en) | Acceleration and compression method and device of deep neural network | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN110276451A (en) | One kind being based on the normalized deep neural network compression method of weight | |
CN108734266A (en) | Compression method and device, terminal, the storage medium of deep neural network model | |
WO2022126683A1 (en) | Method and platform for automatically compressing multi-task-oriented pre-training language model | |
US20220198276A1 (en) | Method and platform for pre-trained language model automatic compression based on multilevel knowledge distillation | |
CN108734264A (en) | Deep neural network model compression method and device, storage medium, terminal | |
CN116316591A (en) | Short-term photovoltaic power prediction method and system based on hybrid bidirectional gating cycle | |
CN116523079A (en) | Reinforced learning-based federal learning optimization method and system | |
CN115829024B (en) | Model training method, device, equipment and storage medium | |
CN109523016B (en) | Multi-valued quantization depth neural network compression method and system for embedded system | |
CN110796233A (en) | Self-adaptive compression method of deep residual convolution neural network based on transfer learning | |
CN108734268A (en) | Compression method and device, terminal, the storage medium of deep neural network model | |
CN114418105B (en) | Method and device for processing quantum application problem based on quantum circuit | |
WO2023207039A1 (en) | Data processing method and apparatus, and device and storage medium | |
CN111695687A (en) | Method and apparatus for training neural network for image recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190226 |