CN111783957B - Model quantization training method and device, machine-readable storage medium and electronic equipment - Google Patents

Model quantization training method and device, machine-readable storage medium and electronic equipment

Info

Publication number
CN111783957B
Authority
CN
China
Prior art keywords
quantized
layer
maximum value
model
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010634312.3A
Other languages
Chinese (zh)
Other versions
CN111783957A (en)
Inventor
刘岩
曲晓超
姜浩
杨思远
万鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN202010634312.3A priority Critical patent/CN111783957B/en
Publication of CN111783957A publication Critical patent/CN111783957A/en
Application granted granted Critical
Publication of CN111783957B publication Critical patent/CN111783957B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a model quantization training method and device, a machine-readable storage medium, and electronic equipment. The method comprises the following steps: for each layer to be quantized in the model to be quantized, obtaining the network parameters of that layer, where the network parameters are the weight values or activation values in the network; judging whether the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold; if the maximum value and the minimum value are equal, or the maximum value is smaller than the parameter threshold, taking the sum of the maximum value and a preset value as the new maximum value; and carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain the target model. This scheme allows the model to be quantized and trained normally even when, during quantization training, all values of a layer are 0, all values of a layer are equal, or the values of a layer are very close to 0.

Description

Model quantization training method and device, machine-readable storage medium and electronic equipment
Technical Field
The application relates to the technical field of neural networks, in particular to a model quantization training method and device, a machine-readable storage medium and electronic equipment.
Background
With the rapid development of deep learning, the accuracy of deep learning models keeps improving. However, higher-accuracy models typically rely on GPUs for high performance, consume huge hardware resources when deployed, and are therefore unsuitable for mobile terminals and similar devices. At present, to apply a high-accuracy deep learning model on a mobile terminal, the model is usually quantized to obtain a version that can run on the mobile terminal. However, with current model quantization methods, when the network parameters of a certain layer in the model are all equal (for example, all equal to 0), or when the parameters of a certain layer are very close to 0, the model may fail to be quantized.
Disclosure of Invention
To overcome at least the above-mentioned shortcomings in the prior art, one object of the present application is to provide a model quantization training method, the method comprising:
for each layer to be quantized in the model to be quantized, obtaining the network parameters of the layer to be quantized, wherein the network parameters are weight values or activation values in the network;
judging whether the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold;
if the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, taking the sum of the maximum value and a preset value as the new maximum value;
and carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
Optionally, the process of performing network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain the target model includes:
calculating the slope of the mapping according to the maximum value and the minimum value;
Mapping the network parameters of the layer to be quantized into integers in a preset numerical interval according to the slope;
and inversely mapping the integer into a floating point number according to the slope to obtain a target model.
Optionally, the slope of the mapping is calculated from the maximum value and the minimum value as:
s(a, b, n) := (b - a) / (n - 1)
wherein a is the minimum value of the network parameters in the layer to be quantized, b is the maximum value of the network parameters in the layer to be quantized, n is the number of integers in the preset numerical interval, and s(a, b, n) is the slope of the mapping.
Optionally, when the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, the parameters of the layer to be quantized are mapped to an integer q(r; a, b, n) within the preset numerical interval according to the slope as follows:
clamp(r; a, b) := min(max(r, a), b)
b = a + m
q(r; a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
When the maximum value and the minimum value of the network parameters of the layer to be quantized are unequal and the maximum value is greater than or equal to the parameter threshold, the parameters of the layer to be quantized are mapped to integers within the preset numerical interval according to the slope as follows:
clamp(r; a, b) := min(max(r, a), b)
q(r; a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
wherein r is a network parameter in the layer to be quantized, clamp(r; a, b) is the truncation function, q(r; a, b, n) is the value of r after mapping to the preset numerical interval, n is the number of integers in the preset numerical interval, and s(a, b, n) is the slope of the mapping.
Optionally, the method further comprises:
acquiring a plurality of training samples;
and inputting the training samples into a pre-trained original network model to perform network training, so as to obtain the model to be quantized.
Optionally, the method further comprises:
acquiring data to be processed;
And inputting the data to be processed into the target model for data processing to obtain a processing result of the data to be processed.
Another object of the present application is to provide a model quantization training apparatus, the apparatus comprising:
The acquisition module is used for acquiring network parameters of each layer to be quantized in the model to be quantized;
The judging module is used for judging whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or not, or whether the maximum value is smaller than a preset parameter threshold value or not;
the adjusting module is used for taking the sum of the maximum value and a preset value as a new maximum value when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold value;
And the quantization module is used for carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
Optionally, the quantization module is specifically configured to:
calculating the slope of the mapping according to the maximum value and the minimum value;
Mapping the network parameters of the layer to be quantized into integers in a preset numerical interval according to the slope;
and inversely mapping the integer into a floating point number according to the slope to obtain a target model.
It is a further object of the present application to provide a machine-readable storage medium storing an executable program which, when executed, causes a processor to implement any one of the methods of the present application.
Another object of the present application is to provide an electronic device comprising a memory and a processor that are electrically connected, the memory storing an executable program, wherein the processor implements any one of the methods of the present application when executing the executable program.
Compared with the prior art, the application has the following beneficial effects:
In this embodiment, when the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, the sum of the maximum value and the preset value is used as the new maximum value, so that the difference between the maximum value and the minimum value of the layer's network parameters becomes larger. This makes the layer quantizable and solves the problem that quantization training fails when all values of a layer are 0, all values of a layer are equal, or the values of a layer are very close to 0, so that the model to be quantized can be trained normally.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic block diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a first flowchart of a model quantization training method according to an embodiment of the present application;
FIG. 3 is a second flowchart of a model quantization training method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of model training in a model quantization training method according to an embodiment of the present application;
Fig. 5 is a schematic block diagram of a model quantization training device according to an embodiment of the present application.
Reference numerals: 100 - electronic device; 110 - model quantization training device; 111 - acquisition module; 112 - judging module; 113 - adjustment module; 114 - quantization module; 120 - memory; 130 - processor.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
In recent years, deep learning has developed rapidly. In 2012, Krizhevsky et al. used a deep learning algorithm to win the ImageNet image classification competition, improving accuracy by roughly 10% over the second-place entry, which used a conventional hand-crafted feature method. Computer vision competitions thereafter were dominated by a wide variety of deep learning models. These high-accuracy deep learning models rely on deep networks with millions or even billions of parameters; conventional CPUs cannot cope with such huge networks, and only GPUs with high computing power can train them reasonably quickly. In addition, these deep learning models consume a great deal of hardware when used.
For the mobile terminal, hardware computing capability and available storage space are limited, and most deployed models are required to run in real time, so the convolutional neural networks that currently perform best in deep learning are not well suited to deployment on mobile terminals. To apply a deep learning model trained with a convolutional neural network on a mobile terminal, one common approach is to prune the model, reducing the number of network layers or the number of channels per layer, thereby reducing the model's size and computation; however, this degrades the deep learning model's performance and yields poor results. Another approach is to quantize the deep learning model, that is, to convert the float-type parameters in the model into the int8 type. This reduces the size of the model, and at the same time, when the mobile terminal runs the model, float computation is replaced by int8 computation, achieving acceleration.
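As a rough illustration of the storage savings (a back-of-the-envelope figure, not taken from the patent): a model with 10 million parameters stored as 32-bit floats occupies about 40 MB, while the same parameters stored as int8 occupy about 10 MB, a 4x reduction before any speedup from integer arithmetic is counted.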
However, when quantizing a deep learning model, if all parameters of a certain layer are equal (for example, all 0), or all parameters of a certain layer are very close to 0, the deep learning model cannot be quantized.
In order to solve the problem that the deep learning model cannot be quantized in the prior art, the embodiment of the application provides an electronic device 100.
Referring to fig. 1, fig. 1 is a schematic block diagram of an electronic device 100 according to an embodiment of the present application, where the electronic device 100 includes a model quantization training device 110, a memory 120 and a processor 130, and the memory 120 and the processor 130 are electrically connected directly or indirectly to each other for implementing data interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. Model quantization training device 110 includes at least one software functional module that may be stored in memory 120 in the form of software or Firmware (Firmware) or cured in an Operating System (OS) of electronic device 100. The processor 130 is configured to execute executable modules stored in the memory 120, such as software functional modules and computer programs included in the model quantization training device 110.
The electronic device 100 in this embodiment may be a mobile terminal, such as a mobile phone.
In order to solve the problem that the deep learning model cannot be quantized in the prior art, the embodiment of the application further provides a model quantization training method applicable to the electronic device 100, please refer to fig. 2, fig. 2 is a flow chart of the model quantization training method provided in the embodiment of the application. The method comprises steps S110-S140.
Step S110, for each layer to be quantized in the model to be quantized, acquiring network parameters of the layer to be quantized, wherein the network parameters are weight values or activation values in the network.
Step S120, judging whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or not, or whether the maximum value is smaller than a preset parameter threshold.
In step S130, if the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold, the sum of the maximum value and the preset value is taken as a new maximum value.
For example, when the minimum value a is equal to the maximum value b, the new maximum value is b = a + m, where m is the preset value.
And step S140, carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized, and obtaining the target model.
In this embodiment, the model to be quantized may be a convolutional neural network model, where the layer to be quantized may be, but is not limited to, a convolution layer conv or an activation layer (such as a ReLU6 layer); the layer to be quantized may also be other network layers that cannot be fused.
In this embodiment, when the maximum value and the minimum value of the network parameters of the layer to be quantized are equal (including the case where both are 0), or the maximum value is smaller than the parameter threshold, the sum of the maximum value and the preset value is taken as the new maximum value, so that the difference between the maximum value and the minimum value of the layer's network parameters becomes larger. This makes the layer quantizable and solves the problem that quantization training fails when all values of a layer are equal, for example all 0, or the values of a layer are very close to 0, so that the model to be quantized can undergo quantization training normally.
In addition, this embodiment enables a model with a certain amount of redundancy to undergo simulated quantization training and to use an Int8 fixed-point forward inference framework.
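As a concrete illustration of steps S110 to S140, the following Python sketch computes the quantization range for one layer under the scheme above. It is a sketch, not the patent's implementation: quantization_range, PARAM_THRESHOLD and M are illustrative names, and the default threshold is an assumed value (the patent leaves the threshold and m to be chosen according to actual needs, suggesting m = 1 as an example).

    from typing import Tuple
    import numpy as np

    PARAM_THRESHOLD = 1e-6  # assumed preset parameter threshold (not specified in the patent)
    M = 1.0                 # preset value m; the patent gives m = 1 as an example

    def quantization_range(params: np.ndarray,
                           threshold: float = PARAM_THRESHOLD,
                           m: float = M) -> Tuple[float, float]:
        """Return the (min, max) range used to quantize one layer (steps S120/S130).

        If the layer's maximum equals its minimum (e.g. an all-zero layer), or the
        maximum is below the preset threshold, take max + m as the new maximum so
        that the slope of the mapping is well defined.
        """
        a, b = float(params.min()), float(params.max())
        if a == b or b < threshold:
            b = b + m  # step S130: the sum of the maximum and the preset value
        return a, b

For an all-zero layer this returns (0.0, 1.0) when m = 1, matching the b = a + m example above.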
Referring to fig. 3, in the present embodiment, step S140 includes sub-steps S141-S143.
Step S141, calculating the slope of the mapping according to the maximum value and the minimum value.
Step S142, the network parameters of the layer to be quantized are mapped into integers within a preset value interval according to the slope.
Specifically, in this embodiment, the floating-point network parameters of the layer to be quantized may first be mapped to values within the preset numerical interval, and the mapped network parameters may then be rounded, for example by rounding to the nearest integer. In this embodiment, a floating-point number (floating-point data) is turned into an integer (integer data) and then back into a floating-point number; this process simulates the quantization loss incurred during forward computation, so the quantization loss of the Int8 fixed-point forward inference framework can be reduced.
Step S143, reversely mapping the integer into the floating point number according to the slope, and obtaining the target model.
The process of inversely mapping the integer to a floating-point number according to the slope is the reverse of the process of mapping the floating-point number into the preset numerical interval (before rounding), and is not repeated here. The slope of the mapping is the mapping ratio, that is, the ratio between a network parameter's corresponding value in the mapped interval and the network parameter itself. In this embodiment, the slope of the mapping is calculated from the maximum value and the minimum value, and the maximum value is adjusted when the maximum and minimum values of the layer's network parameters are equal (including the case where both are 0) or the maximum value is smaller than the parameter threshold. In those cases the slope can therefore be calculated from the minimum value and the adjusted maximum value, so network quantization can proceed with the calculated slope, the model to be quantized can undergo normal quantization training, and the error introduced is negligible.
In this embodiment, the preset value m may be set according to actual needs, so that the slope for mapping can be calculated, for example, m may be set to 1.
For example, in this embodiment, an Int8 fixed-point forward framework may be employed to convert floating-point arithmetic operations into integer operations. In this case the number of integers in the preset interval is 256; for example, the preset numerical interval may be [0, 255], in which case n = 256.
Optionally, in this embodiment, the slope of the mapping is calculated from the maximum value and the minimum value as:
s(a, b, n) := (b - a) / (n - 1)
wherein a is the minimum value of the network parameters in the layer to be quantized, b is the maximum value of the network parameters in the layer to be quantized, n is the number of integers in the preset numerical interval, and s(a, b, n) is the slope of the mapping.
When the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, then b = a + m, so s(a, b, n) is calculated as:
s(a, b, n) = m / (n - 1)
For example, when m = 1, the calculation formula of s(a, b, n) is:
s(a, b, n) = 1 / (n - 1)
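As a worked check of the degenerate case (using the formulas reconstructed above, which are assumptions consistent with the clamp notation below): for an all-zero layer, a = b = 0; with m = 1 the adjusted maximum is b = 1, and with n = 256 the slope is s = (1 - 0)/(256 - 1) = 1/255, roughly 0.0039. Without the adjustment, b - a = 0 would give a zero slope and the mapping to integers would be undefined.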
Optionally, in this embodiment, the mapping of the parameters of the layer to be quantized to an integer q(r; a, a + m, n) within the preset numerical interval according to the slope is divided into two cases.
When the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, q(r; a, b, n) is calculated as follows:
clamp(r; a, b) := min(max(r, a), b)
b = a + m
q(r; a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
In an embodiment where m = 1, the calculation formula of q(r; a, b, n) is:
clamp(r; a, a + 1) := min(max(r, a), a + 1)
q(r; a, a + 1, n) := round((clamp(r; a, a + 1) - a) / s(a, a + 1, n))
When the maximum value and the minimum value of the network parameters of the layer to be quantized are not equal and the maximum value is greater than or equal to the parameter threshold, q(r; a, b, n) is calculated as follows:
clamp(r; a, b) := min(max(r, a), b)
q(r; a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
wherein r is a network parameter in the layer to be quantized, clamp(r; a, b) is the truncation function, q(r; a, b, n) is the value of r after mapping to the preset numerical interval, and s(a, b, n) is the slope of the mapping.
In this embodiment, during model quantization the mapped network parameters are rounded, and the rounded network parameters are then inversely mapped back into floating-point numbers. The quantization loss can thus be simulated during training, achieving the purpose of reducing the quantization loss of the forward inference framework.
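The round trip just described can be captured in a few lines. This is a minimal sketch assuming the reconstructed formulas s = (b - a)/(n - 1) and q = round((clamp(r; a, b) - a)/s); fake_quantize is an illustrative name, not an identifier from the patent.

    import numpy as np

    def fake_quantize(r: np.ndarray, a: float, b: float, n: int = 256) -> np.ndarray:
        """Simulate quantization loss: float -> integer -> float (steps S142/S143)."""
        s = (b - a) / (n - 1)                      # slope of the mapping
        clamped = np.minimum(np.maximum(r, a), b)  # clamp(r; a, b)
        q = np.round((clamped - a) / s)            # integers in [0, n - 1]
        return q * s + a                           # inverse mapping back to floats

    # Example: a layer whose parameters are all 0, with its range widened to
    # [0, 1] (m = 1). The round trip reproduces the zeros exactly.
    weights = np.zeros((3, 3), dtype=np.float32)
    print(fake_quantize(weights, a=0.0, b=1.0))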
In this embodiment, when simulating quantization of the model's activation values, a is the minimum activation value in the activation layer and b is the maximum activation value in the activation layer. When simulating quantization of the model's weight values, a is the minimum weight value in the layer to be quantized and b is the maximum weight value in the layer to be quantized.
Referring to fig. 4, take as an example a model to be quantized that contains a convolution operation, an addition operation, and a ReLU6 operation, where the model's input is input, its output is output, and the deviation is bias. The weights need simulated quantization, so a simulated quantization operation (wt quant) is added after weight. The activation value output by the convolution conv needs simulated quantization, so a simulated quantization operation (act quant) is added after the conv operation (corresponding to conv quant in fig. 4). The addition layer and the ReLU6 layer may be fused into one layer, and the activation value output by the addition layer is not simulated-quantized; instead, a simulated quantization operation is added to the activation value output by ReLU6, such as the act quant following ReLU6 in fig. 4.
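The placement of the simulated-quantization operations in fig. 4 can be sketched as follows, reusing the quantization_range and fake_quantize functions from the earlier sketches. conv2d and relu6 are placeholder callables standing in for the real layer implementations, and none of these names come from the patent.

    def simulated_quant_forward(x, weight, bias, conv2d, relu6, n=256):
        # wt quant: simulated quantization of the weights before the convolution
        w = fake_quantize(weight, *quantization_range(weight), n)
        # act quant on the activation output of conv (conv quant in fig. 4)
        y = conv2d(x, w)
        y = fake_quantize(y, *quantization_range(y), n)
        # the addition and ReLU6 layers are fused, so no simulated quantization
        # is applied between them; only the ReLU6 output gets an act quant
        y = relu6(y + bias)
        return fake_quantize(y, *quantization_range(y), n)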
Optionally, in this embodiment, the method further includes obtaining a plurality of training samples, and inputting the plurality of training samples into a pre-trained original network model for network training to obtain the model to be quantized.
In this embodiment, a training sample is used to train the original network model, so as to obtain a model to be quantized.
Optionally, in this embodiment, the method further includes acquiring data to be processed; and inputting the data to be processed into the target model for data processing to obtain a processing result of the data to be processed.
In this embodiment, the data to be processed is processed with the trained target model. Because the target model is a quantized deep learning model, it consumes few hardware resources and can produce the processing result for the data to be processed quickly and accurately, making it suitable for devices such as mobile terminals.
Referring to fig. 5, the embodiment of the application further provides a model quantization training device 110, which includes an obtaining module 111, a judging module 112, an adjusting module 113 and a quantization module 114. Model quantization training device 110 includes a software function module that may be stored in memory 120 in the form of software or firmware or cured in an Operating System (OS) of electronic device 100.
The obtaining module 111 is configured to obtain, for each layer to be quantized in the model to be quantized, a network parameter of the layer to be quantized.
The acquisition module 111 in the present embodiment is configured to perform step S110, and a specific description of the acquisition module 111 may refer to the description of step S110.
The judging module 112 is configured to judge whether a maximum value and a minimum value in the network parameters of the layer to be quantized are equal or whether the maximum value is smaller than a preset parameter threshold.
The judgment module 112 in the present embodiment is configured to execute step S120, and a specific description of the judgment module 112 may refer to the description of step S120.
And the adjusting module 113 is configured to take the sum of the maximum value and a preset value as a new maximum value when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold.
The adjustment module 113 in the present embodiment is used for executing step S130, and a specific description of the adjustment module 113 may refer to the description of step S130.
The quantization module 114 is configured to perform network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized, so as to obtain a target model.
The quantization module 114 in this embodiment is configured to perform step S140, and a specific description of the quantization module 114 may refer to the description of step S140.
Optionally, in this embodiment, the quantization module 114 is specifically configured to calculate the slope of the mapping from the maximum value and the minimum value; map the network parameters of the layer to be quantized into integers within the preset numerical interval according to the slope; and inversely map the integers back into floating-point numbers according to the slope to obtain the target model.
Embodiments of the present application also provide a machine-readable storage medium, in which an executable program is stored, which when executed by the processor 130 implements a method as in any of the embodiments.
The above is merely various embodiments of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the protection scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A model quantization training method, the method comprising:
for each layer to be quantized in the model to be quantized, obtaining the network parameters of the layer to be quantized, wherein the network parameters are weight values or activation values in the network, the model to be quantized is obtained by training a pre-trained original network model on training samples, and the training samples comprise image data or video data;
judging whether the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold;
if the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, taking the sum of the maximum value and a preset value as the new maximum value;
and carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model, wherein the target model is used for processing data to be processed, and the data to be processed comprises image data or video data.
2. The method of claim 1, wherein the process of performing network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain the target model comprises:
calculating the slope of the mapping according to the maximum value and the minimum value;
Mapping the network parameters of the layer to be quantized into integers in a preset numerical interval according to the slope;
and inversely mapping the integer into a floating point number according to the slope to obtain a target model.
3. The method of claim 2, wherein the slope of the mapping is calculated from the maximum value and the minimum value as:
s(a, b, n) := (b - a) / (n - 1)
wherein a is the minimum value of the network parameters in the layer to be quantized, b is the maximum value of the network parameters in the layer to be quantized, n is the number of integers in the preset numerical interval, and s(a, b, n) is the slope of the mapping.
4. The method according to claim 3, wherein when the maximum value and the minimum value of the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, the parameters of the layer to be quantized are mapped to an integer q(r; a, b, n) within the preset numerical interval according to the slope as follows:
clamp(r; a, b) := min(max(r, a), b)
b = a + m
q(r; a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
When the maximum value and the minimum value of the network parameters of the layer to be quantized are not equal and the maximum value is greater than or equal to the parameter threshold, the parameters of the layer to be quantized are mapped to integers within the preset numerical interval according to the slope as follows:
clamp(r; a, b) := min(max(r, a), b)
q(r; a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
wherein r is a network parameter in the layer to be quantized, clamp(r; a, b) is the truncation function, q(r; a, b, n) is the value of r after mapping to the preset numerical interval, n is the number of integers in the preset numerical interval, and s(a, b, n) is the slope of the mapping.
5. The method according to any one of claims 1-4, further comprising:
acquiring a plurality of training samples;
and inputting the training samples into a pre-trained original network model to perform network training, so as to obtain the model to be quantized.
6. The method according to any one of claims 1-4, further comprising:
acquiring data to be processed;
And inputting the data to be processed into the target model for data processing to obtain a processing result of the data to be processed.
7. A model quantization training device, the device comprising:
The acquisition module is used for acquiring network parameters of each layer to be quantized in the model to be quantized; the model to be quantized is obtained through training of a pre-trained original network model based on training samples, and the training samples comprise image data or video data;
The judging module is used for judging whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or not, or whether the maximum value is smaller than a preset parameter threshold value or not;
the adjusting module is used for taking the sum of the maximum value and a preset value as a new maximum value when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold value;
And the quantization module is used for carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
8. The apparatus of claim 7, wherein the quantization module is specifically configured to:
calculating the slope of the mapping according to the maximum value and the minimum value;
Mapping the network parameters of the layer to be quantized into integers in a preset numerical interval according to the slope;
and inversely mapping the integer into a floating point number according to the slope to obtain a target model.
9. A machine readable storage medium storing an executable program which when executed by a processor implements the method of any of claims 1-6.
10. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory and the processor being electrically connected, the memory storing an executable program, the processor implementing the method according to any of claims 1-6 when executing the executable program.
CN202010634312.3A 2020-07-02 2020-07-02 Model quantization training method and device, machine-readable storage medium and electronic equipment Active CN111783957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010634312.3A CN111783957B (en) 2020-07-02 2020-07-02 Model quantization training method and device, machine-readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010634312.3A CN111783957B (en) 2020-07-02 2020-07-02 Model quantization training method and device, machine-readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111783957A CN111783957A (en) 2020-10-16
CN111783957B true CN111783957B (en) 2024-05-03

Family

ID=72758914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010634312.3A Active CN111783957B (en) 2020-07-02 2020-07-02 Model quantization training method and device, machine-readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111783957B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238883A (en) * 2021-04-23 2022-10-25 Oppo广东移动通信有限公司 Neural network model training method, device, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628807A (en) * 2017-03-20 2018-10-09 北京百度网讯科技有限公司 Processing method, device, equipment and the computer readable storage medium of floating-point matrix number
CN109102064A (en) * 2018-06-26 2018-12-28 杭州雄迈集成电路技术有限公司 A kind of high-precision neural network quantization compression method
CN109472353A (en) * 2018-11-22 2019-03-15 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks sample circuit and quantization method
CN109583561A (en) * 2017-09-28 2019-04-05 杭州海康威视数字技术股份有限公司 A kind of the activation amount quantization method and device of deep neural network
WO2019120114A1 (en) * 2017-12-21 2019-06-27 深圳励飞科技有限公司 Data fixed point processing method, device, electronic apparatus and computer storage medium
CN110245753A (en) * 2019-05-27 2019-09-17 东南大学 A kind of neural network compression method based on power exponent quantization
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN110363297A (en) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural metwork training and image processing method, device, equipment and medium
CN110610237A (en) * 2019-09-17 2019-12-24 普联技术有限公司 Quantitative training method and device of model and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222263B2 (en) * 2016-07-28 2022-01-11 Samsung Electronics Co., Ltd. Neural network method and apparatus

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628807A (en) * 2017-03-20 2018-10-09 北京百度网讯科技有限公司 Processing method, device, equipment and the computer readable storage medium of floating-point matrix number
CN109583561A (en) * 2017-09-28 2019-04-05 杭州海康威视数字技术股份有限公司 A kind of the activation amount quantization method and device of deep neural network
WO2019120114A1 (en) * 2017-12-21 2019-06-27 深圳励飞科技有限公司 Data fixed point processing method, device, electronic apparatus and computer storage medium
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN109102064A (en) * 2018-06-26 2018-12-28 杭州雄迈集成电路技术有限公司 A kind of high-precision neural network quantization compression method
CN109472353A (en) * 2018-11-22 2019-03-15 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks sample circuit and quantization method
CN110245753A (en) * 2019-05-27 2019-09-17 东南大学 A kind of neural network compression method based on power exponent quantization
CN110363297A (en) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural metwork training and image processing method, device, equipment and medium
CN110610237A (en) * 2019-09-17 2019-12-24 普联技术有限公司 Quantitative training method and device of model and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Choi, J., et al. "Accurate and efficient 2-bit quantized neural networks." Proceedings of Machine Learning and Systems, vol. 1, 2019, pp. 348-359. *
Fang, J., et al. "Post-training piecewise linear quantization for deep neural networks." Computer Vision - ECCV 2020: 16th European Conference, 2020, pp. 69-86. *
"Compression and Acceleration of Convolutional Neural Networks" (卷积神经网络的压缩和加速). SIGAI learning platform, https://cloud.tencent.com/developer/article/1152191, 2018-06-26, pp. 1-26. *

Also Published As

Publication number Publication date
CN111783957A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
CN108304921B (en) Convolutional neural network training method and image processing method and device
CN111950723B (en) Neural network model training method, image processing method, device and terminal equipment
CN110929865B (en) Network quantification method, service processing method and related product
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN110413812B (en) Neural network model training method and device, electronic equipment and storage medium
US20210192349A1 (en) Method and apparatus for quantizing neural network model in device
CN110458084B (en) Face age estimation method based on inverted residual error network
CN110175641B (en) Image recognition method, device, equipment and storage medium
CN108885787B (en) Method for training image restoration model, image restoration method, device, medium, and apparatus
CN112508125A (en) Efficient full-integer quantization method of image detection model
CN111460999A (en) Low-altitude aerial image target tracking method based on FPGA
CN112053308B (en) Image deblurring method and device, computer equipment and storage medium
CN111860276B (en) Human body key point detection method, device, network equipment and storage medium
CN111783957B (en) Model quantization training method and device, machine-readable storage medium and electronic equipment
CN114418121A (en) Model training method, object processing method and device, electronic device and medium
CN113780549A (en) Quantitative model training method, device, medium and terminal equipment for overflow perception
CN109034384B (en) Data processing method and device
CN114429208A (en) Model compression method, device, equipment and medium based on residual structure pruning
CN115564975A (en) Image matching method and device, terminal equipment and storage medium
CN117894038A (en) Method and device for generating object gesture in image
CN115238883A (en) Neural network model training method, device, equipment and storage medium
CN111860557A (en) Image processing method and device, electronic equipment and computer storage medium
CN112734673B (en) Low-illumination image enhancement method and system based on multi-expression fusion
CN113962332A (en) Salient target identification method based on self-optimization fusion feedback
CN114444688A (en) Neural network quantization method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant