CN114692814A - Quantization method for optimizing neural network model activation - Google Patents

Quantization method for optimizing neural network model activation

Info

Publication number
CN114692814A
CN114692814A (application number CN202011617711.5A)
Authority
CN
China
Prior art keywords
model
training
value
bit
maxvalue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011617711.5A
Other languages
Chinese (zh)
Inventor
张东 (Zhang Dong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Ingenic Technology Co ltd
Original Assignee
Hefei Ingenic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Ingenic Technology Co ltd filed Critical Hefei Ingenic Technology Co ltd
Priority to CN202011617711.5A priority Critical patent/CN114692814A/en
Publication of CN114692814A publication Critical patent/CN114692814A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/045Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a quantization method for optimizing neural network model activation, which aims to overcome the defects of the prior art, in which existing low-bit (for example, 2-bit) models suffer severe precision loss and converge poorly during training. The method fine-tunes a low-bit model from a full-precision model: a full-precision model is first trained on a data set until it reaches the target precision, and the low-bit model is then trained by fine-tuning from it. Specifically, a 4-bit model is trained from the full-precision model and a 2-bit model is then trained from the 4-bit model; while training the 4-bit and 2-bit low-bit models, the maximum value of the feature map is tracked with a moving average, and this statistic is recomputed in each training stage rather than reused from the previous model.

Description

Quantization method for optimizing neural network model activation
Technical Field
The invention relates to the technical field of convolutional neural network acceleration, in particular to a quantization method for optimizing neural network model activation.
Background
In recent years, with the rapid development of science and technology, the big-data era has arrived. Deep learning, with the deep neural network (DNN) as its model, has achieved remarkable results in many key areas of artificial intelligence, such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN) is a typical DNN structure that can effectively extract the hidden-layer features of an image and classify it accurately, and it has been widely applied to image recognition and detection in recent years.
However, in the prior art, the Relu function is mostly used when training a full-precision model. Because full-precision numbers cover a very wide range of real values, they can represent the range of values needed during training; when training at low bit widths, however, the representable range is limited by the bit width, the model cannot converge effectively during training, and the precision of the final model is unsatisfactory.
Technical terms commonly used in the prior art include:
Convolutional Neural Network (CNN): a type of feedforward neural network that contains convolution computations and has a deep structure.
Quantization: the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number of (or fewer) discrete values.
Low bit: quantizing data to a bit width of 8, 4, or 2 bits.
BN (Batch Normalization): the batch normalization operation can be viewed as a special neural network layer that is added before the nonlinear activation function of each layer.
Relu (Rectified Linear Unit) operation: the rectified linear unit is a commonly used activation function in artificial neural networks, usually referring to the nonlinear functions represented by the ramp function and its variants.
Disclosure of Invention
To solve the above problems, the present method aims to overcome the defects of the prior art, namely that existing low-bit (e.g. 2-bit) models suffer severe precision loss and are difficult to converge during training.
The method fine-tunes a low-bit model from a full-precision model: a full-precision model is first trained on a data set until it reaches the target precision, and the low-bit model is then trained by fine-tuning from it.
Specifically, the invention provides a quantization method for optimizing neural network model activation: based on the full-precision model, a 4-bit model is trained first, and a 2-bit model is then trained from the 4-bit model; while training the 4-bit and 2-bit low-bit models, the maximum value of the feature map is tracked with a moving average, and this maximum is re-estimated in each training stage rather than taken from the previous model's statistics.
The method comprises the following steps:
S1, training a full-precision model to the target precision based on the data set;
S2, based on the full-precision model, training a model whose weights and activations are quantized to 4 bits, and tracking the maximum value of the feature map during training;
S3, based on the 4-bit model trained in step S2, training a model whose weights and activations are quantized to 2 bits, and re-estimating the maximum value of the feature map.
The step S1 further includes:
S1.1, determining the training data:
the data set for training the model is ImageNet1000, a subset of the ImageNet data set, comprising about 1.2 million training images, 50,000 validation images, 150,000 test images, and 1,000 classes;
S1.2, determining the training model:
the basic neural network model used for training is MobileNetV1, a network based on depthwise separable convolutions;
S1.3, selecting the activation function:
the MobileNetV1 model adds batch normalization (BN) and a Relu activation after each convolution layer;
S1.4, training the network:
the basic steps for training the network are: first train 60 epochs with the Adam optimizer, then use the SGD optimizer until training finishes;
S1.5, testing the network effect:
test the trained network on the test set.
In step S1.3, because the trained model is to be quantized to low bit widths and the feature map is later quantized to 2 bits, the Relu activation function may also be converted to ReluX during training, as shown in Equation 1:
ReluX(x) = min(max(x, 0), X)    (Equation 1)
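A minimal NumPy sketch of this clipped activation follows. Equation 1 is rendered as an image in the original, so the min/max form above is reconstructed from the surrounding description; the function name and signature here are illustrative, not the patent's code.

```python
import numpy as np

def relu_x(x: np.ndarray, x_cap: float) -> np.ndarray:
    """ReluX: clip the activation to the range [0, x_cap].

    x_cap plays the role of X in Equation 1; during low-bit training the
    text sets X to the moving-average maxValue statistic.
    """
    return np.minimum(np.maximum(x, 0.0), x_cap)
```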
the step S2 further includes:
s2.1, data quantization: quantizing the data to be quantized according to a formula 2 to obtain low-bit data, and quantizing the weight sum activation to 4 bits during training:
Figure BDA0002875309090000032
description of variables: wfFor full-precision data being an array, WqTo simulate the quantized data, maxwFull precision data WfMedian maximum value, minwFull precision data WfB is the bit width after quantization;
s2.2, acquiring the maximum value of feature map while training the model, then counting the maximum value by a moving average method, and expressing the maximum value by maxValue;
s2.3, updating the parameter maxValue obtained by each layer of activation function by a moving average method, as shown in formula 3:
vt=β·vt-1+(1-β)·(θt) Equation 3
Description of the variables: v. oftIs the value of the variable v at time t, beta is a weighting factor, thetatIs the value of the variable v at time t, vt-1Is the value of the variable v at time t-1.
The step S2.2 obtains the maximum value through the following processing steps:
1: assign ReluX(v) to v: v = ReluX(v);
2: getChannelNum(v) obtains the number of channels of the feature map and assigns it to the parameter channels: channels = getChannelNum(v);
3: getBatchNum(v) obtains the batch size of the feature map and assigns it to the parameter batchNum: batchNum = getBatchNum(v);
4: initialize the parameter maxValue: maxValue = 0.0;
5: for tag from 0 to batchNum: for tag = 0 to batchNum do;
6: vValueTag = v[tag];
7: getChannelMax(vValueTag) obtains the maximum value on each channel of the feature map and assigns it to the parameter perChannelMax: perChannelMax = getChannelMax(vValueTag);
8: reduceSum(perChannelMax) computes the sum of perChannelMax, which is divided by channels and added to maxValue: maxValue = maxValue + reduceSum(perChannelMax)/channels;
9: end for;
10: divide maxValue by batchNum and assign it back to maxValue: maxValue = maxValue/batchNum.
In step S2.3, the value of the weighting coefficient β in the moving average is set to 0.996 and the initial value to 3.0, where the initial value is v_t in Equation 3 at t = 0; the X value used during training is maxValue, maxValue is recorded, and the recorded statistic is used during inference without recomputing the maximum of the feature map.
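As a small sketch of this statistic, the following moving-average tracker implements Equation 3 with the values stated for the 4-bit stage (β = 0.996, initial value 3.0); the class and method names are illustrative, not from the patent.

```python
class MaxValueEMA:
    """Running maximum of a feature map tracked by an exponential moving
    average, v_t = beta * v_{t-1} + (1 - beta) * theta_t (Equation 3)."""

    def __init__(self, beta: float = 0.996, init_value: float = 3.0):
        self.beta = beta
        self.value = init_value  # v_0, the initial value at t = 0

    def update(self, observed_max: float) -> float:
        # theta_t is the maxValue measured on the current batch.
        self.value = self.beta * self.value + (1.0 - self.beta) * observed_max
        return self.value
```

During training, ReluX uses X = tracker.value; at inference the recorded statistic is reused instead of re-measuring the feature map.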
The step S3 includes:
S3.1, data quantization: quantizing the data to be quantized according to Equation 2 to obtain low-bit data, the weights and activations being quantized to 2 bits during training:
[Equation 2, rendered as an image in the original: the quantization formula mapping the full-precision data W_f to the simulated quantized data W_q using max_w, min_w, and the bit width b.]
Variable description: W_f is the full-precision data array, W_q is the simulated quantized data, max_w is the maximum value in W_f, min_w is the minimum value in W_f, and b is the quantized bit width;
S3.2, while training the model, obtaining the maximum value of the feature map, tracking it with a moving average, and denoting it maxValue;
S3.3, updating the parameter maxValue obtained from each layer's activation function with a moving average, as shown in Equation 3:
v_t = β·v_{t−1} + (1 − β)·θ_t    (Equation 3)
Variable description: v_t is the value of the variable v at time t, β is a weighting coefficient, θ_t is the value observed at time t, and v_{t−1} is the value of v at time t−1.
When training the 2-bit model, maxValue needs to be re-estimated; the value of the weighting coefficient β in the moving average is set to 0.998 and the initial value to 2.0, where the initial value is v_t in Equation 3 at t = 0.
The method may further comprise:
s4, testing the network effect;
and S5, outputting the network.
Thus, the present application has the advantages that:
(1) first training a full-precision model on the data set and then training the 4-bit and 2-bit models reduces the training difficulty and improves the convergence speed and final performance of the model;
(2) because the distribution of the feature map changes each time the bit width changes, the feature-map statistic is re-estimated whenever the bit width changes, so that each layer obtains the maximum value best suited to its actual distribution at the current bit width.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention.
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is a flow diagram of BN and Relu operations.
Fig. 3 is a schematic flow chart of converting the Relu activation function into ReluX during training.
Fig. 4 is a flow chart diagram of low bit model training.
Fig. 5 is a schematic diagram of the processing steps of acquiring the maximum value of feature map, and then counting the maximum value by the moving average method.
Fig. 6 is a schematic diagram of the entire training process.
Detailed Description
In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the present invention relates to a quantization method for optimizing neural network model activation, the method comprising the steps of:
S1, training a full-precision model to the target precision based on the data set;
S2, based on the full-precision model, training a model whose weights and activations are quantized to 4 bits, and tracking the maximum value of the feature map during training;
S3, based on the 4-bit model trained in step S2, training a model whose weights and activations are quantized to 2 bits, and re-estimating the maximum value of the feature map.
Specifically, the following are included:
1, full-precision model training:
1) training data:
the data set for training the model is ImageNet1000, a subset of the ImageNet data set, with about 1.2 million training images, 50,000 validation images, 150,000 test images, and 1,000 classes.
2) Model:
the basic neural network model used in this training is MobileNetV1, a model based on depthwise separable convolutions, to which the modifications described herein are applied.
3) Selecting an activation function:
the MobileNetV1 model adds BN and Relu operations after each convolution layer, as shown in fig. 2: Conv2D convolution is performed first, then BN, and then the Relu operation. The BN (batch normalization) layer is defined such that, when using batch gradient descent (or mini-batches), the output of the previous layer is normalized along the batch dimension, i.e.
x̂_i = (x_i − μ) / √(σ² + ε),
where
μ = (1/n) Σᵢ x_i and σ² = (1/n) Σᵢ (x_i − μ)²,
n is the number of inputs in the batch, x_i is the i-th element of the previous layer's output batch, and ε is a small constant added to avoid division by zero.
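For reference, a NumPy sketch of the normalization above (training-time statistics only; the learned scale and shift of a full BN layer are omitted, and the function name is illustrative):

```python
import numpy as np

def batch_normalize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize along the batch (first) dimension:
    x_hat_i = (x_i - mu) / sqrt(sigma^2 + eps)."""
    mu = x.mean(axis=0)    # mu = (1/n) * sum_i x_i
    var = x.var(axis=0)    # sigma^2 = (1/n) * sum_i (x_i - mu)^2
    return (x - mu) / np.sqrt(var + eps)
```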
However, since the trained model needs to be quantized to low bit widths, directly adopting the Relu activation function leaves the maximum value of each layer's activation output unknown and unbounded, which is unfavorable for low-bit quantization; and since the feature map is later quantized to 2 bits, the Relu activation function is changed to ReluX during training:
ReluX(x) = min(max(x, 0), X)    (Equation 1)
fig. 3 shows a flow of converting the Relu activation function into ReluX during training in fig. 2.
4) Training a network:
the basic steps for training the network are: first train 60 epochs with the Adam optimizer, then use the SGD optimizer until training finishes.
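A sketch of this two-stage optimizer schedule follows. The patent does not name a framework, so PyTorch is assumed here, and the learning rates, momentum, and model/loader objects are placeholders.

```python
import torch

def train_full_precision(model, train_loader, loss_fn, total_epochs, device="cuda"):
    """Train with Adam for the first 60 epochs, then switch to SGD until
    training finishes, as described in step S1.4 (hyperparameters illustrative)."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(total_epochs):
        if epoch == 60:  # switch optimizers after 60 epochs
            optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```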
5) Testing the network effect:
and testing the network result by using the test set.
2. Training a low bit model:
As shown in fig. 4, the low-bit model training process is: first train the 4-bit model, then train the 2-bit model, then test the network effect, and finally output the network.
1) Training a 4-bit model:
data quantization: the data to be quantized is quantized according to the following formula to obtain low-bit data.
[Equation 2, rendered as an image in the original: the quantization formula mapping the full-precision data W_f to the simulated quantized data W_q using max_w, min_w, and the bit width b.]
Variable description: W_f is the full-precision data array, W_q is the simulated quantized data, max_w is the maximum value in W_f, min_w is the minimum value in W_f, and b is the quantized bit width.
In the first training step, the weights and activations are quantized to 4 bits. While training the model, the maximum value of the feature map is obtained through the following processing and tracked with a moving average; the processing steps are as follows:
1: v = ReluX(v)
2: channels = getChannelNum(v)
3: batchNum = getBatchNum(v)
4: maxValue = 0.0
5: for tag = 0 to batchNum do
6:     vValueTag = v[tag]
7:     perChannelMax = getChannelMax(vValueTag)
8:     maxValue = maxValue + reduceSum(perChannelMax)/channels
9: end for
10: maxValue = maxValue/batchNum
11: maxValue = max(maxValue, 0.0)
12: maxValue = min(maxValue, 3.0)
Function description: getChannelNum(v) obtains the number of channels of the feature map, getBatchNum(v) obtains the batch size of the feature map, getChannelMax(v) obtains the maximum value of each channel of the feature map, and reduceSum(v) sums the elements of v.
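A direct Python rendering of steps 1–12 above, assuming the feature map is laid out as [batch, channels, height, width] and using the helper semantics from the function description; any deviation from the patent's internal helpers is an assumption.

```python
import numpy as np

def batch_max_value(v: np.ndarray, relux_cap: float, clamp_max: float = 3.0) -> float:
    """Per-batch maxValue statistic, following pseudocode steps 1-12."""
    v = np.minimum(np.maximum(v, 0.0), relux_cap)   # 1: v = ReluX(v), X = running maxValue
    channels = v.shape[1]                           # 2: getChannelNum(v)
    batch_num = v.shape[0]                          # 3: getBatchNum(v)
    max_value = 0.0                                 # 4: initialize maxValue
    for tag in range(batch_num):                    # 5: for tag = 0 to batchNum do
        v_value_tag = v[tag]                        # 6
        # 7: getChannelMax - maximum over each channel of the feature map
        per_channel_max = v_value_tag.reshape(channels, -1).max(axis=1)
        # 8: reduceSum(perChannelMax) / channels, accumulated into maxValue
        max_value += per_channel_max.sum() / channels
    max_value /= batch_num                          # 10: average over the batch
    max_value = max(max_value, 0.0)                 # 11: clamp below at 0.0
    max_value = min(max_value, clamp_max)           # 12: clamp above (3.0 in the 4-bit stage)
    return max_value
```

In the flow of fig. 5, relux_cap would correspond to the running maxValue maintained by the moving average, and the returned value is the observation θ_t fed into Equation 3.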
3. The maxValue obtained for each layer's activation function is updated by the moving-average method, with the formula:
v_t = β·v_{t−1} + (1 − β)·θ_t    (Equation 3)
Variable description: v_t is the value of the variable v at time t, β is a weighting coefficient, θ_t is the value observed at time t, and v_{t−1} is the value of v at time t−1.
The value of the weighting coefficient β in the moving average is set to 0.996 and the initial value to 3.0. The X value used during training is maxValue; maxValue is recorded, and the recorded statistic is used during inference without recomputing the maximum of the feature map. The flow is shown in fig. 5: after the BN operation, the maximum value is calculated (computeMaxValue), and the function ReluX(x = maxValue) is applied, where the parameter of the function is the maximum value maxValue. Fig. 6 shows the whole training process: the weight is computed and quantize(weight) is calculated from it; taking the output of the previous layer (PreLayer), convolution and batch normalization are performed with the quantized weights; ReluX(x = maxValue) is then applied; and finally Quantize(x) quantizes the activation.
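Pulling the earlier sketches together (relu_x, fake_quantize, batch_max_value, MaxValueEMA), the following schematic of one layer's forward pass follows the fig. 6 flow: quantize(weight), convolution, BN, ReluX(x = maxValue), quantize(x). The conv_fn and bn_fn callables stand in for the framework's convolution and batch-normalization operations, and none of the names here come from the patent.

```python
def low_bit_layer_forward(x, weight, conv_fn, bn_fn, max_tracker,
                          weight_bits=4, act_bits=4):
    """One layer of the low-bit training flow sketched in fig. 6."""
    w_q = fake_quantize(weight, weight_bits)            # quantize(weight)
    y = bn_fn(conv_fn(x, w_q))                          # convolution + batch normalization
    # Measure this batch's maxValue and fold it into the moving average (Equation 3).
    theta_t = batch_max_value(y, relux_cap=max_tracker.value)
    x_cap = max_tracker.update(theta_t)
    y = relu_x(y, x_cap)                                # ReluX(x = maxValue)
    return fake_quantize(y, act_bits)                   # quantize(x): quantize the activation
```

For the 2-bit stage described next, the same flow would be reused with weight_bits = act_bits = 2 and a fresh tracker (β = 0.998, initial value 2.0).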
2) Training the 2-bit model
After the first training step, a model whose weights and activations are both quantized to 4 bits is obtained. A model with weights and activations quantized to 2 bits is then trained from it; the procedure is the same as for the 4-bit model, except that maxValue is re-estimated when training the 2-bit model, the value of the weighting coefficient β in the moving average is set to 0.998, and the initial value is set to 2.0.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A quantization method for optimizing activation of a neural network model, characterized in that a 4-bit model is trained based on a full-precision model, a 2-bit model is trained based on the 4-bit model, the maximum value of the feature map is tracked with a moving average when training the 4-bit and 2-bit low-bit models, and the maximum value is re-estimated in each training stage rather than taken from the statistics of the previous model.
2. The quantization method for optimizing neural network model activation according to claim 1, wherein the method comprises the following steps:
S1, training a full-precision model to the target precision based on the data set;
S2, based on the full-precision model, training a model whose weights and activations are quantized to 4 bits, and tracking the maximum value of the feature map during training;
S3, based on the 4-bit model trained in step S2, training a model whose weights and activations are quantized to 2 bits, and re-estimating the maximum value of the feature map.
3. The quantization method for optimizing neural network model activation according to claim 1, wherein the step S1 further comprises:
S1.1, determining the training data:
the data set for training the model is ImageNet1000, a subset of the ImageNet data set, comprising about 1.2 million training images, 50,000 validation images, 150,000 test images, and 1,000 classes;
S1.2, determining the training model:
the basic neural network model used for training is MobileNetV1, a network based on depthwise separable convolutions;
S1.3, selecting the activation function:
the MobileNetV1 model adds batch normalization (BN) and a Relu activation after each convolution layer;
S1.4, training the network:
the basic steps for training the network are: first train 60 epochs with the Adam optimizer, then use the SGD optimizer until training finishes;
S1.5, testing the network effect:
test the trained network on the test set.
4. The quantization method for optimizing neural network model activation according to claim 3, wherein in step S1.3, since the trained model is to be quantized to low bit widths and the feature map is later quantized to 2 bits, the Relu activation function is further converted to ReluX during training, as shown in Equation 1:
ReluX(x) = min(max(x, 0), X)    (Equation 1)
5. The quantization method for optimizing neural network model activation according to claim 2, wherein the step S2 further comprises:
S2.1, data quantization: quantizing the data to be quantized according to Equation 2 to obtain low-bit data, the weights and activations being quantized to 4 bits during training:
[Equation 2, rendered as an image in the original: the quantization formula mapping the full-precision data W_f to the simulated quantized data W_q using max_w, min_w, and the bit width b.]
Variable description: W_f is the full-precision data array, W_q is the simulated quantized data, max_w is the maximum value in W_f, min_w is the minimum value in W_f, and b is the quantized bit width;
S2.2, while training the model, obtaining the maximum value of the feature map, tracking it with a moving average, and denoting it maxValue;
S2.3, updating the parameter maxValue obtained from each layer's activation function with a moving average, as shown in Equation 3:
v_t = β·v_{t−1} + (1 − β)·θ_t    (Equation 3)
Variable description: v_t is the value of the variable v at time t, β is a weighting coefficient, θ_t is the value observed at time t, and v_{t−1} is the value of v at time t−1.
6. The quantization method for optimizing neural network model activation as claimed in claim 5, wherein the step S2.2 obtains the maximum value through the following processing steps:
1: assign ReluX(v) to v: v = ReluX(v);
2: getChannelNum(v) obtains the number of channels of the feature map and assigns it to the parameter channels: channels = getChannelNum(v);
3: getBatchNum(v) obtains the batch size of the feature map and assigns it to the parameter batchNum: batchNum = getBatchNum(v);
4: initialize the parameter maxValue: maxValue = 0.0;
5: for tag from 0 to batchNum: for tag = 0 to batchNum do;
6: vValueTag = v[tag];
7: getChannelMax(vValueTag) obtains the maximum value on each channel of the feature map and assigns it to the parameter perChannelMax: perChannelMax = getChannelMax(vValueTag);
8: reduceSum(perChannelMax) computes the sum of perChannelMax, which is divided by channels and added to maxValue: maxValue = maxValue + reduceSum(perChannelMax)/channels;
9: end for;
10: divide maxValue by batchNum and assign it back to maxValue: maxValue = maxValue/batchNum.
7. The quantization method for optimizing neural network model activation according to claim 5, wherein in step S2.3 the value of the weighting coefficient β in the moving average is set to 0.996 and the initial value is set to 3.0, where the initial value is v_t in Equation 3 at t = 0; the X value used during training is maxValue, maxValue is recorded, and the recorded statistic is used during inference without recomputing the maximum of the feature map.
8. The quantization method for optimizing neural network model activation according to claim 2, wherein the step S3 includes:
S3.1, data quantization: quantizing the data to be quantized according to Equation 2 to obtain low-bit data, the weights and activations being quantized to 2 bits during training:
[Equation 2, rendered as an image in the original: the quantization formula mapping the full-precision data W_f to the simulated quantized data W_q using max_w, min_w, and the bit width b.]
Variable description: W_f is the full-precision data array, W_q is the simulated quantized data, max_w is the maximum value in W_f, min_w is the minimum value in W_f, and b is the quantized bit width;
S3.2, while training the model, obtaining the maximum value of the feature map, tracking it with a moving average, and denoting it maxValue;
S3.3, updating the parameter maxValue obtained from each layer's activation function with a moving average, as shown in Equation 3:
v_t = β·v_{t−1} + (1 − β)·θ_t    (Equation 3)
Variable description: v_t is the value of the variable v at time t, β is a weighting coefficient, θ_t is the value observed at time t, and v_{t−1} is the value of v at time t−1.
9. The quantization method for optimizing neural network model activation according to claim 8, wherein maxValue is re-estimated when training the 2-bit model, the value of the weighting coefficient β in the moving average is set to 0.998, and the initial value is set to 2.0, where the initial value is v_t in Equation 3 at t = 0.
10. The quantization method for optimizing neural network model activation of claim 1, wherein the method further comprises:
s4, testing the network effect;
and S5, outputting the network.
CN202011617711.5A 2020-12-31 2020-12-31 Quantization method for optimizing neural network model activation Pending CN114692814A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011617711.5A CN114692814A (en) Quantization method for optimizing neural network model activation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011617711.5A CN114692814A (en) Quantization method for optimizing neural network model activation

Publications (1)

Publication Number Publication Date
CN114692814A true CN114692814A (en) 2022-07-01

Family

ID=82133556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011617711.5A Pending CN114692814A (en) 2020-12-31 2020-12-31 Quantification method for optimizing neural network model activation

Country Status (1)

Country Link
CN (1) CN114692814A (en)

Similar Documents

Publication Publication Date Title
US20210089922A1 (en) Joint pruning and quantization scheme for deep neural networks
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN112116030A (en) Image classification method based on vector standardization and knowledge distillation
Hu et al. Opq: Compressing deep neural networks with one-shot pruning-quantization
CN112183742B (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN111008694B (en) Depth convolution countermeasure generation network-based data model quantization compression method
CN108596890B (en) Full-reference image quality objective evaluation method based on vision measurement rate adaptive fusion
CN110276451A (en) One kind being based on the normalized deep neural network compression method of weight
CN113657491A (en) Neural network design method for signal modulation type recognition
CN111860779A (en) Rapid automatic compression method for deep convolutional neural network
CN113282926B (en) Malicious software classification method based on three-channel image
CN113206808B (en) Channel coding blind identification method based on one-dimensional multi-input convolutional neural network
CN112085668B (en) Image tone mapping method based on region self-adaptive self-supervision learning
CN114692814A (en) Quantification method for optimizing neural network model activation
CN112906883A (en) Hybrid precision quantization strategy determination method and system for deep neural network
CN110288002B (en) Image classification method based on sparse orthogonal neural network
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
CN113762500B (en) Training method for improving model precision during quantization of convolutional neural network
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN114565080A (en) Neural network compression method and device, computer readable medium and electronic equipment
CN113762499B (en) Method for quantizing weights by using multiple channels
CN108805944B (en) Online image set compression method with maintained classification precision
CN113762497B (en) Low-bit reasoning optimization method for convolutional neural network model
CN114580605A (en) Convolutional neural network, operation optimization method, device, electronic device and medium
Zhou et al. Optimizing information theory based bitwise bottlenecks for efficient mixed-precision activation quantization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination