WO2022062828A1 - Image model training method, image processing method, chip, device and medium - Google Patents


Info

Publication number
WO2022062828A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
preset
output
network
quantization
Prior art date
Application number
PCT/CN2021/114801
Other languages
French (fr)
Chinese (zh)
Inventor
尹长生
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2022062828A1 publication Critical patent/WO2022062828A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Definitions

  • The present invention relates to the field of intelligent chips, and in particular to an image model training method, an image processing method, a chip, a device, and a medium.
  • The parameters of the neural network model in a typical smart chip are float32 operators, but as the parameters of the neural network model in the smart chip grow, smart chips have to use low-bit operator calculations, for example, int4/int8/int16/float16 in place of the original float32 operators, so the model on the smart chip needs to be quantized.
  • The smaller the number of bits used for quantization, the less memory the smart chip requires, the faster the calculation, and the lower the power consumption; however, too few bits inevitably reduce the accuracy of the entire neural network.
  • Determining the number of quantization bits required for each processing step in a neural network so as to balance performance and accuracy is a complex undertaking, so existing smart chips lack neural networks with effective quantization methods.
  • Embodiments of the present invention provide an image model training method, an image processing method, a chip, a device, and a medium to solve the problem that the number of quantization bits in a neural network cannot balance the performance and accuracy of a smart chip.
  • An image model training method comprising:
  • The sample image set includes at least one sample image; each sample image is associated with an original network output;
  • According to the first cosine similarity, the preset similarity threshold, and the preset mixed-precision quantization method, the preset neural network model is optimized, and the preset network output of the optimized preset neural network model is obtained;
  • An image processing method comprising:
  • the to-be-processed image is input into an image processing model, and an image output result corresponding to the to-be-processed image is obtained; the image processing model is obtained according to the above-mentioned image model training method.
  • An intelligent chip comprising a storage module, a processing module, and an image processing model stored in the storage module and running on the processing module, wherein the image processing model is obtained according to the above-mentioned image model training method;
  • the processing module is configured to execute the above image processing method through the image processing model.
  • A computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the above-mentioned image model training method when executing the computer program, or implements the above-mentioned image processing method when executing the computer program.
  • A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-mentioned image model training method or the above-mentioned image processing method.
  • The method determines the first cosine similarity between the first quantized network output after symmetric quantization processing and the original network output, and, according to the first cosine similarity, the preset similarity threshold, and the preset mixed-precision quantization method, further optimizes the preset neural network model to ensure that the loss value between the preset network output of the preset neural network model and the original network output reaches the preset convergence condition.
  • The trained neural network model is recorded as an image processing model, so that through the mixed-precision quantization method and the continuous adjustment of the initial parameters, the loss value between the network output of the final image processing model and the original network output is kept small, and the parameters of the preset neural network are reduced during mixed-precision quantization, thereby improving the inference rate of the finally trained image processing model; thus, when the image processing model is deployed on a smart chip, both the processing performance and the processing accuracy of the smart chip can be guaranteed.
  • FIG. 1 is a flowchart of an image model training method in an embodiment of the present invention
  • FIG. 2 is a flowchart of step S20 in the image model training method according to an embodiment of the present invention
  • FIG. 3 is a flowchart of step S40 in the image model training method according to an embodiment of the present invention;
  • FIG. 4 is another flowchart of step S40 in the image model training method in an embodiment of the present invention;
  • FIG. 5 is another flowchart of step S40 in the image model training method according to an embodiment of the present invention;
  • FIG. 6 is another flowchart of step S40 in the image model training method according to an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of a computer device in an embodiment of the present invention.
  • An image model training method is provided, which includes the following steps:
  • S10 Acquire a sample image set, the sample image set includes at least one sample image, and each sample image is associated with an original network output.
  • the sample images in the sample image set may be images collected in various scenarios.
  • the sample images may be surveillance images captured by surveillance cameras, or may be images downloaded through the Internet.
  • The original network output may be an output obtained by processing the sample image through other neural network models (i.e., models other than the preset neural network model including preset initial parameters described below). It is understandable that these other neural network models process the sample images differently from the preset neural network model including initial parameters in the following steps.
  • S20 Input the sample image into a preset neural network model including initial parameters, and after performing symmetric quantization processing on the sample image through the preset neural network model, obtain the first quantized network output produced by the preset neural network model.
  • the preset neural network model is obtained by improving a network model such as a CNN (Convolutional Neural Networks, convolutional neural network), and the preset neural network model includes a neuron processing unit and a vector processing unit.
  • In step S20, that is, inputting the sample image into a preset neural network model including initial parameters and performing symmetric quantization processing on the sample image through the preset neural network model to obtain the first quantized network output produced by the preset neural network model, the steps include:
  • S201 Acquire a first precision operator and a second precision operator corresponding to the sample image; the precision of the second precision operator is higher than that of the first precision operator.
  • The first precision operator may be Convolution (convolution operator), Dense (dense operator), Add (superposition operator), Relu (activation operator), Pool (pooling operator), and other operators.
  • The second precision operator may be SoftMax (classification operator), Sigmoid (activation operator), or Normalize (normalization operator). It should be noted that the first precision operator and the second precision operator are both float type operators, and the precision of the second precision operator is higher than that of the first precision operator.
  • the symmetric quantization process refers to a process of quantizing the first precision operator from a float type to an INT type operator, and after performing the calculation with the INT type operator, inversely quantizing the INT type operator into a float type operator.
  • Step S202, that is, performing symmetric quantization processing on the first precision operator and obtaining the first network output, includes the following steps:
  • After the first precision operator and the second precision operator corresponding to the sample image are acquired, the quantization parameters in the symmetric quantization model are obtained; according to the quantization parameters and the preset rounding method, rounding processing is performed on the first precision operator to obtain a rounding operator corresponding to the first precision operator, from which the quantization operator is obtained.
  • The quantization operator can be obtained by the following expressions:

    scale = threshold / (2^(N-1) - 1)
    x_int = ceil(x / scale)
    x_q = clip(x_int, -2^(N-1), 2^(N-1) - 1)

  • where scale is the shrink factor; threshold is the quantization threshold; N is the quantization bit width (N can be 4, 8, 16, etc.); x_int is the rounding operator; ceil() is the round-up function; x is the first precision operator; x_q is the quantization operator; and clip() is the interception function.
  • The first precision operator is a float type operator and must be converted into an INT type operator before it can be calculated by the neuron processing unit; for the first precision operator, using 8-bit quantization can ensure its accuracy while reducing its parameters.
  • The symmetric quantization model uses 8-bit per-channel quantization for the first precision operator by default, that is, the value of N in the above expression is 8; for the symmetric quantization models of different channels, the corresponding scales also differ. The quantization objects corresponding to the first precision operator include Activation (the activation output of the operator) and weight (the operator parameters), so when 8-bit per-channel quantization is applied to the first precision operator, both its Activation and weight are represented by INT8.
  • For the threshold in the above expression: when the quantization object of the first precision operator is Activation, it is obtained through calibration statistics; when the quantization object of the first precision operator is weight, the value of the weight is taken directly.
  • The clip() in the above expression is the interception function, where -2^(N-1) is the minimum allowed value, that is, any operator value smaller than this is discarded, and 2^(N-1)-1 is the maximum allowed value, that is, any operator value larger than this is discarded; in other words, after the first precision operator is quantized, each operator value is limited to the range -2^(N-1) to 2^(N-1)-1.
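The symmetric quantization steps above can be sketched in Python. The function name and the derivation of scale as threshold / (2^(N-1) - 1) are assumptions for illustration, since the text defines the variables but does not reproduce the exact expression:

```python
import math

def symmetric_quantize(x, threshold, n_bits=8):
    # Shrink factor: map the quantization threshold onto the top of the
    # signed integer range (an assumed, common convention).
    scale = threshold / (2 ** (n_bits - 1) - 1)
    # Rounding operator via the round-up function ceil(), as described.
    x_int = math.ceil(x / scale)
    # Interception (clip) to the allowed range [-2^(N-1), 2^(N-1)-1].
    q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    x_q = max(q_min, min(q_max, x_int))
    return x_q, scale
```

With N = 8, quantized values always fall within -128 to 127 regardless of the input magnitude, matching the clip range described above.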
  • After the quantization operator is calculated by the neuron processing unit, the vector processing unit performs inverse quantization processing on the calculated quantization operator to obtain the first network output.
  • the preset neural network model further includes a neuron processing unit.
  • the neuron processing unit handles intensive low-precision operators.
  • the neuron processing unit mainly supports INT8 and INT4 operators.
  • the inverse quantization process refers to inverse quantization of the calculated quantization operator from an INT type operator to a float type operator.
  • the vector processing unit inversely quantizes the calculated quantization operator to obtain the first network output.
  • The inverse quantization model can be expressed as x_f = x_q × scale, where x_f is the first network output, x_q is the quantization operator, and scale is the shrink factor.
  • The inverse quantization model is stored in the vector processing unit; that is, after the quantization operator is calculated by the neuron processing unit, the calculated quantization operator is input into the inverse quantization model in the vector processing unit, which performs inverse quantization processing on it, converting the INT type quantization operator back into a float type operator.
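The inverse quantization relation is simple enough to sketch directly; the function name is illustrative:

```python
def dequantize(x_q, scale):
    # Inverse quantization: map the INT quantization operator back to a
    # float value via x_f = x_q * scale, as described above.
    return x_q * scale
```

A round trip through quantization and inverse quantization reproduces the original value only up to rounding error; that residual error is what the cosine-similarity check in step S30 measures.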
  • Since 8-bit per-channel quantization is generally used by default for the first precision operator, that is, INT8 is used to represent Activation and weight, and the neuron processing unit supports INT8 and mainly processes intensive low-precision operators, the quantization operator is calculated by the neuron processing unit.
  • S203 Acquire a second network output corresponding to the second precision operator.
  • the second network output is obtained by calculating the second precision operator by the vector processing unit.
  • the preset neural network model also includes a vector processing unit.
  • the vector processing unit processes operators with higher precision than the first precision operator, such as the second precision operator.
  • The vector processing unit mainly supports INT8, INT16, and Float16 operators.
  • The symmetric quantization model is not used to quantize the second precision operator in this embodiment, because the second precision operator does not need to be converted into an INT type operator before calculation; therefore, in this embodiment, the Float16 operator can be used directly to represent the second precision operator. And since the vector processing unit supports the Float16 operator but the neuron processing unit does not, the second precision operator is calculated by the vector processing unit to obtain the second network output.
  • S204 Record the first network output and the second network output as the first quantized network output of the preset neural network model.
  • In this embodiment, the first precision operator and the second precision operator corresponding to the sample image are acquired; after symmetric quantization processing is performed on the first precision operator, the first network output corresponding to the symmetrically quantized first precision operator is acquired; the second network output corresponding to the second precision operator is acquired (that is, the second precision operator is not symmetrically quantized, but is calculated directly to obtain the second network output); and the first network output and the second network output are recorded as the first quantized network output of the preset neural network model.
  • The first network output and the second network output are recorded as the first quantized network output of the preset neural network model. Understandably, since the quantization operator has been inversely quantized, its data is float, and the second network output is represented by Float16 operators, so it is also float data; the first quantized network output is therefore float type data, and the cosine similarity calculation with the original network output can then be performed in step S30.
  • the first cosine similarity is used to represent the similarity between the first quantized network output and the original network output.
  • In this embodiment, the original network output may be obtained by processing the first precision operator and the second precision operator on other computer equipment (such as a personal computer), and the first cosine similarity between the first quantized network output and the original network output is then determined according to the following expression:
  • S = (Σ_{i=1}^{n} X_i · Y_i) / (√(Σ_{i=1}^{n} X_i²) · √(Σ_{i=1}^{n} Y_i²))
  • where S is the first cosine similarity; X_i is the i-th operator in the first quantized network output; Y_i is the i-th operator in the original network output; and n is the total number of operators in the first quantized network output (or in the original network output).
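The expression above is the standard cosine similarity; a minimal Python sketch over two equal-length output vectors:

```python
import math

def cosine_similarity(x, y):
    # S = sum(X_i * Y_i) / (sqrt(sum(X_i^2)) * sqrt(sum(Y_i^2)))
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den
```

Identical outputs give a similarity of 1 and orthogonal outputs give 0, which is why thresholds such as 0.97 or 0.98 indicate a close match between the quantized and original network outputs.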
  • S40 Perform optimization processing on the preset neural network model according to the first cosine similarity, the preset similarity threshold, and the preset mixed-precision quantization method, and obtain the preset network output of the optimized preset neural network model.
  • The value range of the first cosine similarity is -1 to 1, and the larger the value of the first cosine similarity, the higher the similarity between the first quantized network output and the original network output; therefore, in this embodiment, the preset similarity threshold can be set to 0.97, 0.98, etc.
  • The preset mixed-precision quantization method refers to a method of optimizing the preset neural network model by substituting operators of different precisions according to the magnitude relationship between the first cosine similarity and the preset similarity threshold.
  • the preset network output refers to the final output result of the preset neural network model after the preset neural network model is optimized.
  • S50 Determine a loss value between the preset network output and the original network output, and when the loss value does not reach a preset convergence condition, iteratively update the initial parameters of the preset neural network model until the loss value reaches the preset convergence condition, at which point the preset neural network model is recorded as the image processing model.
  • the loss value refers to the similarity loss between the preset network output and the original network output.
  • The similarity between the preset network output and the original network output is determined, the difference between this similarity and the preset similarity threshold is computed, and that difference is recorded as the loss value.
  • The preset convergence condition can be that the loss value is less than 0.01, that is, the difference between the similarity of the preset network output to the original network output and the preset similarity threshold must be less than 0.01. This ensures that the output of the optimized preset neural network model meets the cosine similarity requirement, while the parameter count of the optimized model is reduced and its inference rate is improved.
  • The convergence condition can be that the loss value is less than a set threshold, that is, training stops when the loss value falls below the set threshold; the convergence condition can also be that the loss value after 10,000 calculations is very small and will not decrease further, that is, training stops when the loss value is small and no longer decreasing after 10,000 calculations, and the converged preset neural network model is recorded as the image processing model.
  • When the loss value does not reach the preset convergence condition, the initial parameters of the preset neural network model are adjusted according to the loss value, and the sample image is re-input into the preset neural network model with the adjusted initial parameters. When the loss value corresponding to the sample image reaches the preset convergence condition, another sample image in the sample image set is selected and steps S30-S40 are executed to obtain the loss value corresponding to that sample image; when that loss value does not reach the preset convergence condition, the initial parameters are adjusted again according to the loss value, so that the loss value corresponding to that sample image also reaches the preset convergence condition.
  • In this way, the output of the preset neural network model can continuously approach the accurate result, until the loss values corresponding to all the sample images reach the preset convergence condition.
  • the preset neural network model after convergence is recorded as an image processing model.
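The iterate-until-convergence logic described above can be sketched as a simple loop; model_step is a hypothetical callable standing in for one forward pass plus parameter update, returning the current loss value:

```python
def train_until_converged(model_step, max_iters=10000, threshold=0.01):
    # Iterate parameter updates until the loss value reaches the preset
    # convergence condition (loss below threshold), or until the
    # calculation budget (e.g. 10,000 updates) is exhausted.
    loss = float("inf")
    for i in range(1, max_iters + 1):
        loss = model_step()
        if loss < threshold:
            return loss, i  # converged: record the model at this point
    return loss, max_iters  # budget exhausted; loss treated as converged
```

The names and the single-number loss interface are assumptions; the patent only specifies the two stopping conditions, not how the update is performed.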
  • Since fixed-point computing performance is much higher than floating-point computing performance, and the neuron processing unit in the preset neural network model only supports fixed-point computing, in this embodiment the first precision operator (originally a float operator) can be computed as fixed-point after symmetric quantization (at which point it has been converted to an INT operator). This makes the parameters in the preset neural network model smaller after symmetric quantization, thereby reducing the storage space of the model; and by performing mixed-precision quantization processing on the preset neural network model according to the first cosine similarity and the preset similarity threshold, the calculation amount of the model can be further reduced and its calculation rate increased.
  • Step S40, that is, optimizing the preset neural network model according to the first cosine similarity, the preset similarity threshold, and the preset mixed-precision quantization method, includes:
  • S401 Detect whether the first cosine similarity is greater than or equal to the preset similarity threshold.
  • The first integer type operator refers to a signed integer operator of various widths (e.g., 4-bit, 2-bit, etc.).
  • The first preset calculation amount may be determined according to the total number of operators in the first network output. According to the description in the above embodiment, 8-bit per-channel quantization (that is, INT8 type operators) is used by default to quantize and calculate the first precision operator; therefore, as an option, the first integer operator in this embodiment can be an INT4 type operator.
  • The operators in the first network output that are calculated by the neuron processing unit after being represented as INT8 operators are sorted by calculation amount from large to small, or from small to large (understandably, as described above, when the first network output is symmetrically quantized, INT8 is used to represent the Activation and weight of the first precision operator, so the first network output is represented by INT8 and then calculated by the neuron processing unit).
  • S403 Calculate the first substitution operator by the neuron processing unit to obtain a third network output, and record the third network output and the second network output as a second quantization network output.
  • The first replacement operator includes INT8 operators and the first integer operator, and the above description points out that the neuron processing unit mainly supports INT8 and INT4; therefore, at this point the first replacement operator should be calculated by the neuron processing unit, and the calculated first replacement operator should be inversely quantized to obtain the third network output corresponding to the first replacement operator; the third network output and the second network output are then recorded as the second quantized network output.
  • S405 When the second cosine similarity is greater than or equal to the preset similarity threshold, use an optimized query method to perform query substitution processing on the second quantized network output to obtain the preset network output.
  • the value range of the second cosine similarity is -1 to 1, and the larger the value of the second cosine similarity, the higher the similarity between the output of the second quantization network and the output of the original network.
  • The optimization query method may be a dynamic programming algorithm, a backtracking method, a divide-and-conquer method, a greedy algorithm, etc.; in this embodiment, a greedy algorithm is used as the optimization query method, and query substitution processing refers to finding, in the third network output within the second quantized network output, continuous operators that can be replaced by the first integer operator.
  • Specifically, the second cosine similarity between the second quantized network output and the original network output is determined, and whether it is greater than or equal to the preset similarity threshold is detected. When the second cosine similarity is greater than or equal to the preset similarity threshold, a greedy algorithm is used to find continuous operators in the third network output that can be replaced by INT4 (that is, among the remaining operators represented by INT8, continuous operators that can be replaced by the first integer operator are sought), and a maximum search range (such as 100 operators) is set at the same time to prevent the search time from being too long; after such continuous operators are found and replaced by INT4, the resulting quantized network output (that is, including the second network output and the replaced third network output) is taken as the output to be quantized.
  • the preset network output is obtained.
  • Otherwise, the number of operators replaced by the first integer operator in step S402 is reduced, and step S405 is performed again; if the number of operators replaced by the first integer operator in step S402 is reduced to 1 (after a reduction to 0, the third network output is unchanged, and similarly step S405 is performed to check whether the cosine similarity is greater than or equal to the preset similarity threshold) and the cosine similarity at this time is still less than the preset similarity threshold, then the output of the current preset neural network model is used as the preset network output.
  • In this embodiment, the first integer operator is used to replace the operators in the first network output that exceed the first preset calculation amount, thereby further reducing the calculation amount of the preset neural network model and improving its calculation rate; and when querying with the optimized query method, a maximum search range is set to prevent the search time from being too long, shortening the quantization process of the preset neural network model.
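The greedy query with a capped search range can be sketched as follows; can_replace is a hypothetical predicate standing in for the cosine-similarity check applied after a trial low-precision substitution, and all names are illustrative:

```python
def greedy_find_replaceable_run(ops, can_replace, max_range=100):
    # Greedy query: from each starting operator, extend a continuous run
    # for as long as the replacement check passes, capped at max_range
    # operators to keep the search time bounded. Returns the longest run.
    best = []
    for start in range(len(ops)):
        run = []
        for end in range(start, min(start + max_range, len(ops))):
            candidate = ops[start:end + 1]
            if not can_replace(candidate):
                break
            run = candidate
        if len(run) > len(best):
            best = run
    return best
```

The cap on the window length mirrors the maximum search range (such as 100 operators) described above for bounding query time.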
  • In an embodiment, the preset neural network model further includes a special processing unit. As shown in FIG. 4, after step S401, that is, after detecting whether the first cosine similarity is greater than or equal to the preset similarity threshold, the method also includes:
  • The floating-point operator in this embodiment refers to a 32-bit floating-point (Float32) operator.
  • the second preset calculation amount may be determined according to the total number of operators output by the second network.
  • The operators in the second network output that are calculated by the vector processing unit after being represented as Float16 operators are sorted by calculation amount from small to large, or from large to small (understandably, as described above, the second network output is calculated by the vector processing unit after the second precision operator is represented by Float16 operators); the operators in the second network output whose calculation amount is lower than the second preset calculation amount are then replaced.
  • S407 Calculate the second substitution operator by the special processing unit to obtain a fourth network output, and record the first network output and the fourth network output as a third quantization network output.
  • The special processing unit may be a DSP (Digital Signal Processor) or a CPU (Central Processing Unit); it can process operators not supported by the neuron processing unit or the vector processing unit, or some second precision operators (e.g., Float32, Float16, etc.).
  • The second replacement operator includes Float16 and floating-point operators (that is, the Float32 operators in the above description), but the vector processing unit does not support processing the Float32 operator, so the special processing unit is needed to calculate the second replacement operator, obtain the fourth network output corresponding to the second replacement operator, and record the first network output and the fourth network output as the third quantized network output (since only the operators in the second network output are quantized at this point, the first network output does not change).
  • S408 Use a preset query method to query whether a first minimum operator combination exists in the third quantized network output; the first minimum operator combination refers to, among all operator combinations in the third quantized network output whose first minimum similarity is greater than or equal to the preset similarity threshold, the combination with the smallest number of operators; the first minimum similarity refers to the cosine similarity between an operator combination in the third quantized network output and the corresponding preset operator combination in the original network output.
  • The preset query method can be the dichotomy (binary search) method, or a superposition method (that is, searching from a single operator: if the single operator is not the first minimum operator combination, the range expands to two operators, four operators, and so on, until the first minimum operator combination is found or the query ends when all operator combinations have been queried); in this embodiment, the dichotomy method is preferably selected as the preset query method.
  • the cosine similarity between the third quantization network output and the original network output is determined by the expression in step S30. It can be understood that this expression computes the cosine similarity operator by operator between the third quantization network output and the original network output (for example, when i is 2, it is the similarity between X1, X2 and Y1, Y2). Binary search can therefore be used to find the minimum operator combination whose cosine similarity is greater than the preset threshold: after the operators in the third quantization network output and the original network output are sorted in a certain order (such as by calculation amount from large to small), the binary search starts from the operator in the middle of the ordering.
  • binary search is an algorithm that halves the query range at each step, using the middle operator as the boundary.
  • if another operator combination whose cosine similarity is greater than the preset similarity threshold is found, that combination becomes the new first minimum operator combination, and so on; when the query is completed, the first minimum operator combination is determined.
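The binary search described above can be sketched in Python. This is an illustrative sketch, not the patent's exact procedure: it assumes the operators have already been sorted and that the cosine similarity of a candidate operator combination (taken here as a prefix of the sorted list) changes monotonically with the combination size, so that the smallest qualifying combination can be found by bisection.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity between two flattened output tensors."""
    x, y = np.ravel(x), np.ravel(y)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def find_min_operator_combination(quant_outputs, orig_outputs, threshold):
    """Bisect over the (sorted) operator prefix length to find the smallest
    combination whose similarity to the original output meets the threshold.
    Returns the prefix length, or None if no combination qualifies."""
    lo, hi, best = 1, len(quant_outputs), None
    while lo <= hi:
        mid = (lo + hi) // 2  # start from the operator in the middle position
        q = np.concatenate([np.ravel(o) for o in quant_outputs[:mid]])
        r = np.concatenate([np.ravel(o) for o in orig_outputs[:mid]])
        if cosine_similarity(q, r) >= threshold:
            best = mid        # combination qualifies: try an even smaller one
            hi = mid - 1
        else:
            lo = mid + 1      # too dissimilar: the combination must grow
    return best
```

Each failed comparison halves the remaining range, matching the "query the half with the middle operator as the boundary" description above.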
  • the activations (Activation) in the first network output are all represented by INT8 operators and then calculated by the neuron processing unit.
  • the operators in the first network output are sorted by calculation amount from large to small (or from small to large), and each operator in the first network output whose calculation amount exceeds the first preset calculation amount is replaced with a first integer operator (preferably an INT4 operator); that is, the activations and/or weights of operators exceeding the first preset calculation amount are represented in INT4 (understandably, these operators were originally converted into 8-bit signed integers for calculation and are now calculated as 4-bit signed integers instead). The first network output after replacement is recorded as the first replacement operator.
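The replacement rule above (operators whose calculation amount exceeds the first preset amount become INT4, the rest stay INT8) can be sketched as follows; the operator names, the FLOP counts, and the `assign_precisions` helper are illustrative assumptions, not part of the patent:

```python
def assign_precisions(op_flops, first_preset_amount):
    """Sort operators by calculation amount (large to small) and mark every
    operator exceeding the first preset calculation amount for INT4
    representation of its activations/weights; the rest stay INT8."""
    plan = {}
    for name, flops in sorted(op_flops.items(), key=lambda kv: kv[1], reverse=True):
        plan[name] = "INT4" if flops > first_preset_amount else "INT8"
    return plan

# Hypothetical per-operator calculation amounts (MFLOPs) and threshold.
plan = assign_precisions({"conv1": 900, "conv2": 450, "fc": 20},
                         first_preset_amount=400)
# conv1 and conv2 exceed the threshold -> INT4; fc stays INT8
```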
  • S410 Calculate the first substitution operator by the neuron processing unit to obtain the third network output, and record the third network output and the fourth network output as the fourth quantization network output .
  • the INT4 operator is used to replace the operators in the first network output whose calculation amount exceeds the first preset calculation amount, and the first network output after replacement is recorded as the first replacement operator, so the first replacement operator includes both INT8 and INT4 operators.
  • the neuron processing unit mainly supports INT8 and INT4 operators; therefore, the first replacement operator should be calculated by the neuron processing unit at this point, and the calculated result should be inversely quantized to obtain the third network output corresponding to the first replacement operator. The third network output, together with the fourth network output, is recorded as the fourth quantization network output.
  • S411 Determine a third cosine similarity between the fourth quantized network output and the original network output.
  • the value range of the third cosine similarity is -1 to 1; the larger the third cosine similarity, the higher the similarity between the fourth quantization network output and the original network output.
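As a quick numeric illustration of the stated range, a minimal sketch that treats each network output as a flattened vector:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity of two flattened network outputs; always in [-1, 1]."""
    x, y = np.ravel(x), np.ravel(y)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))                           # identical outputs -> 1.0
print(cosine_similarity(a, -a))                          # opposite outputs -> -1.0
print(cosine_similarity(a, np.array([-2.0, 1.0, 0.0])))  # orthogonal -> 0.0
```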
  • a third cosine similarity between the fourth quantization network output and the original network output is determined; whether the third cosine similarity is greater than or equal to the preset similarity threshold is then detected. When the third cosine similarity is greater than or equal to the preset similarity threshold, an optimized query method is used to find consecutive operators in the third network output that can be represented in INT4 (for example, finding consecutive operators among the remaining INT8-represented operators that can be replaced by the first integer operator), and the quantization network output after this consecutive INT4 replacement (that is, the fourth network output together with the replaced third network output) is used as the preset network output.
  • when the first cosine similarity does not reach the preset similarity threshold, the floating-point operator is used to replace the operators in the second network output whose calculation amount is lower than the second preset calculation amount, so as to improve the cosine similarity between the preset neural network model's network output and the original network output; then, under the premise that the cosine similarity meets the preset similarity threshold, the calculation amount of the preset neural network model is further reduced.
  • operator replacement is performed on both the first-precision operator part and the second-precision operator part of the network output, so as to improve the cosine similarity between the preset neural network model's network output and the original network output.
  • after step S408, that is, after using the preset query method to determine whether a first minimum operator combination exists in the third quantization network output, the method further includes:
  • the second integer operator may be an INT16 operator, that is, a 16-bit signed integer operator.
  • the preset activation output threshold may be determined according to the activation of each operator in the output of the first network (that is, the activation output corresponding to each operator).
  • if the first minimum operator combination is not found, that is, the cosine similarity between each operator combination in the third quantization network output and the corresponding combination in the original network output is less than the preset similarity threshold, it means that even after all Float16 operators in the second network output have been replaced by Float32, the first minimum operator combination still cannot be found.
  • the replacement of the Float16 operators in the second network output in step S406 is abandoned, that is, the third quantization network output is discarded; the first network output is represented by INT8 operators and calculated by the neuron processing unit.
  • the operators are sorted by their Activation values from large to small (or from small to large), and the operators in the first network output whose Activation is lower than the preset activation output threshold are replaced with INT16 operators.
  • that is, the Activation of each operator below the preset activation output threshold is represented in INT16 (understandably, these operators were originally converted into 8-bit signed integers for calculation and are now calculated as 16-bit signed integers instead), and the first network output after replacement is recorded as the third replacement operator.
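The activation-based widening above (steps S412–S413) can be sketched in the same style; the activation values and operator names here are hypothetical:

```python
def widen_low_activation_ops(op_activation, preset_activation_threshold):
    """Sort operators by activation value (large to small) and replace every
    operator whose activation falls below the preset activation output
    threshold with INT16; weights are left untouched, since 8-bit per-channel
    weight quantization is assumed to be sufficient."""
    ordered = sorted(op_activation.items(), key=lambda kv: kv[1], reverse=True)
    return {name: ("INT16" if act < preset_activation_threshold else "INT8")
            for name, act in ordered}

# Hypothetical activation magnitudes per operator.
plan = widen_low_activation_ops({"conv1": 6.0, "conv2": 0.2, "fc": 0.05},
                                preset_activation_threshold=0.5)
# conv2 and fc fall below the threshold -> INT16
```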
  • in step S412 it is not necessary to consider the weight corresponding to each operator, because in most cases quantizing the weights of the first-precision operators with 8-bit per-channel quantization is sufficient; therefore step S412 does not take the weight of each operator in the first network output into account, and apart from the operators below the preset activation output threshold that need replacement, the other operators in the first network output are not replaced.
  • S414 Calculate the third substitution operator by the vector processing unit to obtain a fifth network output, and record the fifth network output and the second network output as the fifth quantization network output.
  • the third quantization network output is discarded, and the second integer operator is used to replace the low-activation operators in the first network output; the first network output after replacement is recorded as the third replacement operator, which therefore includes INT8 and second integer (that is, INT16) operators. As pointed out in the description of the above embodiment, the vector processing unit mainly supports INT8, INT16 and Float16 operators, while the neuron processing unit mainly supports INT8 and INT4 operators, so the vector processing unit should be used to calculate the third replacement operator to obtain the fifth network output. The fifth network output and the second network output are then recorded as the fifth quantization network output (since step S411 points out that the third quantization network output is discarded, that is, the replacement of the second network output in step S406 is discarded, the pair at this point should be the second network output and the fifth network output).
  • S415 Use a preset query method to query whether a second minimum operator combination exists in the fifth quantization network output; the second minimum operator combination is, among all operator combinations whose second minimum similarity is greater than or equal to the preset similarity threshold, the combination with the smallest number of operators; the second minimum similarity refers to the cosine similarity between an operator combination in the fifth quantization network output and the corresponding operator combination in the original network output.
  • S416 Obtain a preset network output when the second minimum operator combination is queried by using the preset query method.
  • the cosine similarity between the fifth quantization network output and the original network output is determined through the expression in step S30. It can be understood that this expression computes the cosine similarity operator by operator between the fifth quantization network output and the original network output (for example, when i is 2, it is the similarity between X1, X2 and Y1, Y2). Binary search can therefore be used to find the minimum operator combination whose cosine similarity is greater than the preset threshold: after the operators in the fifth quantization network output and the original network output are sorted in a certain order (such as by calculation amount from large to small), the binary search starts from the operator in the middle of the ordering.
  • binary search is an algorithm that halves the query range at each step, using the middle operator as the boundary.
  • if another operator combination whose cosine similarity is greater than the preset similarity threshold is found, that combination becomes the new second minimum operator combination, and so on; when the query is completed, the second minimum operator combination is determined and the current preset network output of the preset neural network model is obtained.
  • the quantization processing of the preset neural network model is changed, that is, the second integer operator is used for replacement, to ensure the accuracy of the preset neural network model; after replacement, a second minimum operator combination meeting the preset similarity threshold exists in the preset neural network model, and the calculation amount of the preset neural network model is reduced.
  • after step S415, that is, after using the preset query method to determine whether a second minimum operator combination exists in the fifth quantization network output, the method further includes:
  • after the fifth network output and the second network output are recorded as the fifth quantization network output, if the second minimum operator combination is not found by the preset query method, that is, the cosine similarity between each operator combination in the fifth quantization network output and the corresponding combination in the original network output is less than the preset similarity threshold, then all operators in the fifth network output are replaced with the second integer operator.
  • the fifth network output obtained after the replacement in S412 contains INT8 and INT16 operators, so all remaining INT8 operators are now replaced by INT16 operators; that is, the 8-bit signed integer operators in the original fifth network output are all replaced with 16-bit signed integer operators, and the fifth quantization network output after this replacement is recorded as the fourth replacement operator.
  • each operator in the fourth replacement operator is an INT16 operator, that is, a 16-bit signed integer operator.
  • S418 Calculate the fourth substitution operator by the vector processing unit to obtain a sixth network output, and record the sixth network output and the second network output as the sixth quantization network output.
  • the second integer operator is used to replace all the operators in the fifth network output, and the fifth quantization network output after replacement is recorded as the fourth replacement operator. Each operator in the fourth replacement operator is the second integer operator (that is, an INT16 operator), so the fourth replacement operator needs to be calculated by the vector processing unit to obtain the sixth network output; the sixth network output and the second network output are then recorded as the sixth quantization network output.
  • the second network output is expressed by the Float16 operator and then calculated by the vector processing unit.
  • the second network output is calculated by the vector processing unit after its second-precision operators are represented by Float16 operators. The operators in the second network output whose calculation amount is lower than the second preset calculation amount are then replaced with a floating-point operator (preferably a Float32 operator); that is, such operators are represented in Float32 (understandably, the second-precision operators in the second network output were originally converted into 16-bit floating-point form for calculation by the vector processing unit, and the operators below the second preset calculation amount are now converted into 32-bit floating-point form before calculation by the vector processing unit). The second network output after replacement is recorded as the second replacement operator.
  • S421 Calculate the second substitution operator by the special processing unit to obtain a fourth network output, and record the sixth network output and the fourth network output as a seventh quantization network output.
  • the second replacement operator includes Float16 and Float32 operators, and the vector processing unit does not support processing the Float32 operator, so a special processing unit is required to calculate the second replacement operator and obtain the fourth network output corresponding to it; the fourth network output and the sixth network output are recorded as the seventh quantization network output (since only operators in the second network output are quantized at this point, the sixth network output does not change).
  • S422 Use a preset query method to query whether a third minimum operator combination exists in the seventh quantization network output; the third minimum operator combination is, among all operator combinations whose third minimum similarity is greater than or equal to the preset similarity threshold, the combination with the smallest number of operators; the third minimum similarity refers to the cosine similarity between an operator combination in the seventh quantization network output and the corresponding operator combination in the original network output.
  • S423 Obtain a preset network output when the third minimum operator combination is queried by using the preset query method.
  • the cosine similarity between the seventh quantization network output and the original network output is determined by the expression in step S30. It can be understood that this expression computes the cosine similarity operator by operator between the seventh quantization network output and the original network output (for example, when i is 2, it is the similarity between X1, X2 and Y1, Y2). Binary search can therefore be used to find the minimum operator combination whose cosine similarity is greater than the preset threshold: the operators in the seventh quantization network output and the original network output are sorted in a certain order (such as by calculation amount from large to small), the binary search starts from the operator in the middle of the ordering, and an operator combination whose cosine similarity is greater than the preset similarity threshold is found.
  • that combination is recorded as the third minimum operator combination and the binary search continues (the number of operators considered is half of the number when the third minimum operator combination was last found, so binary search halves the query range using the middle operator as the boundary); if another operator combination whose cosine similarity is greater than the preset similarity threshold is found, it becomes the new third minimum operator combination, and so on; when the query is completed, the third minimum operator combination is determined and the preset network output of the current preset neural network model is obtained.
  • after step S422, that is, after using the preset query method to query whether a third minimum operator combination exists in the seventh quantization network output, the method further includes:
  • after the fourth network output and the sixth network output are recorded as the seventh quantization network output, if the third minimum operator combination is not found by the preset query method, it means that there is an error in the training of the current preset neural network model: it cannot be quantized further, otherwise the cosine similarity requirement will not be met and the result will deviate greatly from the accuracy requirement. At this point, an error in the training of the preset neural network model is prompted.
  • after the preset neural network model is recorded as an image processing model, the method further includes:
  • the accuracy of the image processing model is checked, and when the image processing model is determined not to meet the preset accuracy requirements, the relevant personnel are prompted to adjust the parameters manually.
  • the preset accuracy requirements are set according to specific application scenarios and specific calculation requirements of the preset neural network model.
  • the initial parameters of the preset neural network model are iteratively updated until the loss value reaches the preset convergence condition, at which point the preset neural network model is recorded.
  • the cosine similarity between the preset network output of the image processing model and the original network output reaching the preset similarity threshold reduces the calculation amount of the preset neural network model, but does not necessarily yield the optimal quantization result.
  • the quantization result not being optimal may manifest in the following two aspects:
  • the first aspect: the image processing model may not meet the preset accuracy requirements.
  • operators in the image processing model that may cause precision degradation can be manually marked as second-precision operators, for example, replacing INT8 operators with INT16 operators.
  • the second aspect: the utilization rate of the image processing model does not meet the preset utilization requirement, and the calculation amount of the image processing model needs to be reduced further.
  • the computation-intensive operators in the image processing model can be manually marked as low-precision operators.
  • in the present invention, by adopting symmetric quantization and mixed-precision-based quantization processing, combined with manual parameter adjustment, quantization can be completed quickly, in a short time and with a small amount of calculation, compared with prior-art methods that search the full space of quantization bit widths for each operator. It can be ensured that the accuracy of the preset neural network model differs little from that of the original network output, while the calculation amount and the parameters of the preset neural network model are reduced, thereby improving the calculation rate of the preset neural network model.
  • a smart chip is provided, including a storage module, a processing module, and an image processing model stored in the storage module and runnable on the processing module; the image processing model is obtained according to the image model training method described above, and the processing module is configured to execute the above image processing method through the image processing model.
  • a computer device is provided, and the computer device can be a server, and its internal structure diagram can be as shown in FIG. 7 .
  • the computer device includes a processor, memory, a network interface, and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program and a database.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store the data used by the above-mentioned image model training method or image processing method.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to implement an image model training method or an image processing method.
  • a computer device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the image model training method in the above embodiment or the image processing method in the above embodiment is implemented.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the image model training method in the above embodiment or the image processing method in the above embodiment is implemented.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

An image model training method, an image processing method, a chip, a device and a medium. The image model training method comprises: acquiring a sample image set, one sample image being associated with one original network output (S10); inputting each sample image to a preset neural network model comprising initial parameters, and performing symmetric quantization processing on the sample image, so as to obtain a first quantization network output (S20); determining a first cosine similarity between the first quantization network output and the original network output (S30); according to the first cosine similarity, a preset similarity threshold and a preset mixed precision quantization method, performing optimization processing on the preset neural network model, and acquiring a preset network output of the preset neural network model subjected to the optimization processing (S40); and determining a loss value between the preset network output and the original network output, and if the loss value reaches a preset convergence condition, recording the preset neural network model as an image processing model (S50). The present invention decreases parameters of a preset neural network model, and increases the calculation speed of the preset neural network model.

Description

Image model training method, image processing method, chip, device and medium
This application claims priority to the Chinese patent application filed with the China Patent Office on September 23, 2020, with application number 202011005967.0 and invention title "Image Model Training, Image Processing Method, Chip, Device and Medium", the entire content of which is incorporated into this application by reference.
Technical Field

The present invention relates to the field of intelligent chips, and in particular to an image model training method, an image processing method, a chip, a device and a medium.
Background

With the development of science and technology, artificial intelligence technology has been applied in various scenarios, such as intelligent robots, smart homes and intelligent security. In these scenarios, different functions are often realized through several integrated smart chips or a single smart chip, and the requirements for smart chips differ from scenario to scenario.

The neural network model parameters in a typical smart chip are all Float32 operators. However, as neural network models in smart chips carry more and more parameters, a smart chip has to adopt low-bit operator calculation to obtain higher performance and a better energy-consumption ratio, for example using INT4/INT8/INT16/Float16 to replace the original Float32 operators, so the model on the smart chip needs to be quantized. Although fewer quantization bits mean less memory, faster calculation and lower power consumption for the smart chip, too few bits will inevitably reduce the accuracy of the entire neural network. Deciding how many bits each processing step of the neural network should be quantized to, so as to balance performance and accuracy, is a complex engineering problem, and existing smart chips therefore lack neural networks with effective quantization schemes.
SUMMARY OF THE INVENTION

Embodiments of the present invention provide an image model training method, an image processing method, a chip, a device and a medium, to solve the problem that the number of quantization bits of a neural network cannot balance the performance and accuracy of a smart chip.
An image model training method includes:

acquiring a sample image set, where the sample image set includes at least one sample image, and each sample image is associated with an original network output;

inputting the sample image into a preset neural network model containing initial parameters, and performing symmetric quantization processing on the sample image through the preset neural network model, to obtain a first quantization network output of the preset neural network model;
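The symmetric quantization step above can be sketched as a minimal per-tensor INT8 example; the scale rule (maximum absolute value mapped to 127) is a common convention assumed here, since the claim does not fix the exact formula:

```python
import numpy as np

def symmetric_quantize(x, bits=8):
    """Symmetrically map a float tensor onto signed integers around zero."""
    qmax = 2 ** (bits - 1) - 1            # 127 for INT8
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:                      # all-zero tensor: avoid dividing by 0
        scale = 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 0.9, 0.3], dtype=np.float32)
q, scale = symmetric_quantize(x)
x_hat = dequantize(q, scale)  # close to x; the residual gap is what the
                              # cosine-similarity check in step S30 measures
```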
determining a first cosine similarity between the first quantization network output and the original network output;

optimizing the preset neural network model according to the first cosine similarity, a preset similarity threshold and a preset mixed-precision quantization method, and obtaining a preset network output of the optimized preset neural network model;

determining a loss value between the preset network output and the original network output; when the loss value does not reach a preset convergence condition, iteratively updating the initial parameters of the preset neural network model until the loss value reaches the preset convergence condition, and then recording the preset neural network model as an image processing model.
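The loss-driven parameter update above can be sketched at a high level; the mean-squared-error loss, the finite-difference gradient, and the scalar parameter here are placeholder assumptions standing in for the real network and its convergence condition:

```python
import numpy as np

def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

def train(param, forward, samples, targets, lr=0.01, tol=1e-4, max_iters=1000):
    """Iteratively update the initial parameter until the loss value reaches
    the preset convergence condition (here: loss <= tol)."""
    loss = mse_loss(forward(param, samples), targets)
    for _ in range(max_iters):
        if loss <= tol:                   # convergence condition reached
            break
        eps = 1e-6                        # finite-difference gradient estimate
        grad = (mse_loss(forward(param + eps, samples), targets) - loss) / eps
        param -= lr * grad
        loss = mse_loss(forward(param, samples), targets)
    return param, loss

# Toy stand-in for the model: learn a scale w so that w * x matches 2 * x.
x = np.array([1.0, 2.0, 3.0])
w, final_loss = train(0.0, lambda w, s: w * s, x, 2 * x)
```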
An image processing method includes:

acquiring an image to be processed;

inputting the image to be processed into an image processing model to obtain an image output result corresponding to the image to be processed, where the image processing model is obtained according to the above image model training method.
一种智能芯片,包括存储模块、处理模块以及存储在所述存储模块中并可在所述处理模块上运行的图像处理模型,所述图像处理模型是根据上述图像处理模型训练方法得到的;所述处理模块用于通过所述图像处理模型执行上述图像处理方法。An intelligent chip, comprising a storage module, a processing module and an image processing model stored in the storage module and running on the processing module, the image processing model is obtained according to the above-mentioned image processing model training method; The processing module is configured to execute the above image processing method through the image processing model.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the image model training method described above, or implements the image processing method described above.
A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the image model training method described above, or implements the image processing method described above.
In the image model training method, image processing method, chip, device, and medium described above, the method determines the first cosine similarity between the first quantized network output obtained after symmetric quantization and the original network output, and further optimizes the preset neural network model according to the first cosine similarity, the preset similarity threshold, and the preset mixed-precision quantization method. When the loss value between the preset network output of the preset neural network model and the original network output satisfies the preset convergence condition, the trained neural network model is recorded as the image processing model. Through the mixed-precision quantization and the continuous adjustment of the initial parameters, the loss value between the network output of the resulting image processing model and the original network output is kept small, and the number of parameters of the preset neural network is reduced during mixed-precision quantization, thereby increasing the inference speed of the finally trained image processing model. As a result, when the image processing model is deployed on an intelligent chip, both the processing performance and the processing accuracy of the chip can be guaranteed.
Description of the Drawings
FIG. 1 is a flowchart of an image model training method according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S20 of the image model training method according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S40 of the image model training method according to an embodiment of the present invention;
FIG. 4 is another flowchart of step S40 of the image model training method according to an embodiment of the present invention;
FIG. 5 is another flowchart of step S40 of the image model training method according to an embodiment of the present invention;
FIG. 6 is another flowchart of step S40 of the image model training method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In one embodiment, as shown in FIG. 1, an image model training method is provided, comprising the following steps:
S10: Acquire a sample image set, where the sample image set contains at least one sample image, and each sample image is associated with an original network output.
The sample images in the sample image set may be images collected in various scenarios; for example, a sample image may be a surveillance image captured by a surveillance camera, or an image downloaded from the Internet. The original network output may be an output obtained by processing the sample image with another neural network model (that is, a model other than the preset neural network model containing the preset initial parameters described below). It can be understood that the way the other neural network model processes the sample image differs from the processing performed by the preset neural network model containing the initial parameters in the following steps.
S20: Input the sample image into a preset neural network model containing initial parameters, perform symmetric quantization processing on the sample image through the preset neural network model, and then obtain a first quantized network output of the preset neural network model.
The preset neural network model is obtained by improving a network model such as a CNN (Convolutional Neural Network) model, and contains a neuron processing unit and a vector processing unit.
Specifically, as shown in FIG. 2, step S20, i.e., inputting the sample image into the preset neural network model containing the initial parameters and performing symmetric quantization processing on the sample image through the preset neural network model to obtain the first quantized network output of the preset neural network model, comprises the following steps:
S201: Acquire a first precision operator and a second precision operator corresponding to the sample image, where the precision of the second precision operator is higher than that of the first precision operator.
For example, the first precision operator may be an operator such as Convolution, Dense, Add, Relu, or Pool. For example, the second precision operator may be SoftMax, Sigmoid, or Normalize. It should be noted that both the first precision operator and the second precision operator are float-type operators, and the precision of the second precision operator is higher than that of the first precision operator.
S202: Perform symmetric quantization processing on the first precision operator and obtain a first network output.
Here, symmetric quantization refers to the process of quantizing the first precision operator from a float-type operator into an INT-type operator and, after performing the computation with the INT-type operator, dequantizing the INT-type operator back into a float-type operator.
Specifically, step S202, i.e., performing symmetric quantization processing on the first precision operator and obtaining the first network output, comprises the following steps:
(1) Quantize the first precision operator with a symmetric quantization model to obtain a quantization operator corresponding to the first precision operator.
Specifically, after the first precision operator and the second precision operator corresponding to the sample image are acquired, the quantization parameters of the symmetric quantization model are acquired; the first precision operator is rounded according to the quantization parameters and a preset rounding method to obtain a rounded operator corresponding to the first precision operator; and the rounded operator is clipped according to the quantization parameters and a preset clipping method to obtain the quantization operator.
Further, the quantization operator can be obtained by the following expressions:

scale = threshold / (2^(N-1) - 1)

x_int = ceil(x / scale)

x_q = clip(x_int, -2^(N-1), 2^(N-1) - 1)

where scale is the shrink factor; threshold is the quantization threshold; N is the quantization bit width (N may take values such as 4, 8, or 16); x_int is the rounded operator; ceil() is the round-up function; x is the first precision operator; x_q is the quantization operator; and clip() is the clipping function.
Further, the first precision operator is a float-type operator and must be converted into an INT-type operator before it can be computed by the neuron processing unit. For the first precision operator, 8-bit quantization preserves its precision while reducing its parameters, so the symmetric quantization model quantizes the first precision operator with 8-bit per-channel quantization by default; that is, N in the above expressions takes the value 8, and the symmetric quantization models of different channels therefore have different scales. The quantization objects of the first precision operator include the Activation (the activation output of the operator) and the weight (the operator parameters); quantizing the first precision operator with 8-bit per-channel quantization thus means representing the Activation and weight of the first precision operator with INT8.
Further, when the quantization object of the first precision operator is the Activation, the threshold in the above expressions is obtained statistically by calibration; when the quantization object is the weight, the maximum of the absolute values of the weight is taken directly. The clip() in the above expressions is the clipping function, where -2^(N-1) is the minimum permitted value (any operator value below it is truncated) and 2^(N-1)-1 is the maximum permitted value (any operator value above it is truncated); that is, after the first precision operator is quantized, each operator value is limited to the range from -2^(N-1) to 2^(N-1)-1.
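As an illustrative sketch (not part of the patent text), the quantization expressions above can be written in Python; the function name and sample values are hypothetical:

```python
import math

def symmetric_quantize(values, threshold, n_bits=8):
    """Symmetric quantization per the expressions above:
    scale = threshold / (2^(N-1) - 1), x_int = ceil(x / scale),
    x_q = clip(x_int, -2^(N-1), 2^(N-1) - 1)."""
    q_max = 2 ** (n_bits - 1) - 1        # 127 for INT8
    q_min = -(2 ** (n_bits - 1))         # -128 for INT8
    scale = threshold / q_max            # shrink factor
    quantized = [max(q_min, min(q_max, math.ceil(v / scale))) for v in values]
    return quantized, scale

# For a weight channel, threshold is the maximum absolute weight value.
weights = [63.5, -127.0, 32.0]
threshold = max(abs(w) for w in weights)   # 127.0, so scale is 1.0 here
w_q, scale = symmetric_quantize(weights, threshold)
# w_q == [64, -127, 32]
```

Per-channel quantization repeats this computation with one threshold (and hence one scale) per channel, which is why different channels end up with different scales.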
(2) After the quantization operator is computed by the neuron processing unit, the computed quantization operator is dequantized by the vector processing unit to obtain the first network output.
The preset neural network model further includes a neuron processing unit, which handles dense low-precision operators and mainly supports INT8 and INT4 operators. Dequantization refers to converting the computed quantization operator from an INT-type operator back into a float-type operator.
Specifically, after the first precision operator is quantized with the symmetric quantization model to obtain the quantization operator corresponding to the first precision operator, the computed quantization operator is dequantized by the vector processing unit according to the quantization parameters and the quantization operator, so as to obtain the first network output.
Specifically, the following dequantization model is used to dequantize the computed quantization operator:

x_f = x_q · scale

where x_f is the first network output, x_q is the quantization operator, and scale is the shrink factor.
It should be noted that the dequantization model is stored in the vector processing unit; that is, after the quantization operator has been computed by the neuron processing unit, the computed quantization operator is input into the dequantization model in the vector processing unit, which dequantizes it and thereby converts the INT-type quantization operator back into a float-type operator.
As indicated in the description of the above steps, the first precision operator is by default quantized with 8-bit per-channel quantization, i.e., the Activation and weight are represented with INT8; since the neuron processing unit supports INT8 and mainly handles dense low-precision operators, the quantization operator is computed by the neuron processing unit.
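A matching sketch (illustrative only; the helper name is hypothetical) of the dequantization model x_f = x_q · scale described above:

```python
def dequantize(quantized, scale):
    """Dequantization model: x_f = x_q * scale, mapping the INT-type
    result computed by the neuron processing unit back to float."""
    return [q * scale for q in quantized]

# Round trip with some INT8 values: the reconstruction error of the
# full quantize/dequantize cycle is at most one step (one 'scale').
scale = 0.5
x_q = [64, -127, 32]
x_f = dequantize(x_q, scale)
# x_f == [32.0, -63.5, 16.0]
```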
S203: Acquire a second network output corresponding to the second precision operator.
That is, the second precision operator is computed by the vector processing unit to obtain the second network output.
The preset neural network model further includes a vector processing unit, which handles operators of higher precision than the first precision operator, such as the second precision operator, and mainly supports INT8, INT16, and Float16 operators.
Specifically, in this embodiment the second precision operator is not quantized with the symmetric quantization model, because it does not need to be converted into an INT-type operator before computation; instead, the second precision operator can be represented directly as a Float16 operator. Since the vector processing unit supports Float16 operators while the neuron processing unit does not, the second precision operator is computed by the vector processing unit to obtain the second network output.
S204: Record the first network output and the second network output as the first quantized network output of the preset neural network model.
Specifically, after the first precision operator and the second precision operator corresponding to the sample image are acquired, symmetric quantization processing is performed on the first precision operator, and the first network output corresponding to the symmetrically quantized first precision operator is acquired; the second network output corresponding to the second precision operator is acquired (i.e., the second precision operator is not symmetrically quantized but is computed directly to obtain the second network output); and the first network output and the second network output are recorded as the first quantized network output of the preset neural network model.
Further, after the first network output and the second network output are obtained, they are recorded as the first quantized network output of the preset neural network model. It can be understood that, because the quantization operator becomes float-type data after dequantization, and the second network output is represented with Float16 operators and is therefore also float-type data, the first quantized network output consists of float-type data, which is what makes the cosine similarity computation with the original network output in step S30 possible.
S30: Determine a first cosine similarity between the first quantized network output and the original network output.
The first cosine similarity characterizes the degree of similarity between the first quantized network output and the original network output.
Specifically, after the first network output and the second network output are recorded as the first quantized network output of the preset neural network model, the original network output obtained by computing the first precision operator and the second precision operator on other computer equipment (such as a personal computer) is acquired, and the first cosine similarity between the first quantized network output and the original network output is determined according to the following expression:

α = ( Σ_{i=1}^{n} X_i · Y_i ) / ( sqrt(Σ_{i=1}^{n} X_i²) · sqrt(Σ_{i=1}^{n} Y_i²) )

where α is the first cosine similarity; X_i is the i-th operator in the first quantized network output; Y_i is the i-th operator in the original network output; and n is the total number of operators in the first quantized network output or the original network output.
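For illustration (not part of the patent text), the cosine-similarity expression above can be computed as follows:

```python
import math

def cosine_similarity(x, y):
    """alpha = sum(X_i * Y_i) / (sqrt(sum X_i^2) * sqrt(sum Y_i^2)),
    the expression used above for the first cosine similarity."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Identical outputs give 1.0; in the method above the model is accepted
# when alpha reaches a preset threshold such as 0.97 or 0.98.
alpha = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```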
S40: Optimize the preset neural network model according to the first cosine similarity, the preset similarity threshold, and the preset mixed-precision quantization method, and obtain the preset network output of the optimized preset neural network model.
The first cosine similarity ranges from -1 to 1, and a larger value indicates a higher degree of similarity between the first quantized network output and the original network output; in this embodiment the preset similarity threshold may therefore be set to, for example, 0.97 or 0.98. The preset mixed-precision quantization method refers to substituting operators of different precisions into the preset neural network model for quantization, according to the magnitude relationship between the first cosine similarity and the preset similarity threshold, thereby optimizing the preset neural network model. The preset network output refers to the final output of the preset neural network model after the optimization.
S50: Determine a loss value between the preset network output and the original network output; when the loss value does not satisfy a preset convergence condition, iteratively update the initial parameters of the preset neural network model until the loss value satisfies the preset convergence condition, and then record the preset neural network model as an image processing model.
The loss value refers to the similarity loss between the preset network output and the original network output.
Specifically, after the preset network output of the optimized preset neural network model is obtained, the similarity between the preset network output and the original network output is determined, and the difference between this similarity and the preset similarity threshold may be recorded as the loss value. The preset convergence condition may then require the loss value to be less than 0.01; that is, the difference between the similarity of the preset network output to the original network output and the preset similarity threshold must be less than 0.01. This ensures that the output of the optimized preset neural network model satisfies the cosine similarity requirement while the number of parameters of the optimized model is reduced, thereby increasing the inference speed of the preset neural network model.
It can be understood that the convergence condition may be that the loss value is less than a set threshold, i.e., training stops when the loss value falls below the set threshold; the convergence condition may also be that after 10,000 computations the loss value is very small and no longer decreases, in which case training stops and the converged preset neural network model is recorded as the image processing model.
Further, after the loss value is determined from the preset network output and the original network output, and when the loss value does not satisfy the preset convergence condition, the initial parameters of the preset neural network model are adjusted according to the loss value, and the sample image is re-input into the preset neural network model with the adjusted initial parameters. When the loss value corresponding to that sample image satisfies the preset convergence condition, another sample image in the sample image set is selected and steps S30-S40 are executed to obtain the loss value corresponding to that sample image; if this loss value does not satisfy the preset convergence condition, the initial parameters of the preset neural network model are adjusted again according to it, so that the loss value corresponding to this sample image also satisfies the preset convergence condition.
In this way, after the preset neural network model has been trained with all the sample images in the sample image set, its output continuously approaches the accurate result; when the loss values corresponding to all sample images satisfy the preset convergence condition, the converged preset neural network model is recorded as the image processing model.
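The per-sample convergence loop above can be sketched as follows. This is a toy illustration only: the model class, its update rule, and the sample values are hypothetical, and the real method would backpropagate through the quantized network.

```python
class ToyModel:
    """Hypothetical stand-in for the preset neural network model:
    a single scalar parameter w, with loss = |w * x - y|."""
    def __init__(self):
        self.w = 0.0

    def forward(self, x):
        return self.w * x

    def loss(self, output, original_output):
        return abs(output - original_output)

    def update(self, loss):
        self.w += 0.05              # crude stand-in for a parameter update

def train_until_converged(model, samples, max_iters=10000, tol=0.01):
    """Sketch of the loop above: for each (sample, original output) pair,
    keep adjusting the initial parameters until the loss value satisfies
    the preset convergence condition (here, loss < 0.01)."""
    for sample, original_output in samples:
        for _ in range(max_iters):
            output = model.forward(sample)
            if model.loss(output, original_output) < tol:
                break               # preset convergence condition met
            model.update(model.loss(output, original_output))
    return model

model = train_until_converged(ToyModel(), [(2.0, 3.0)])
```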
In this embodiment, since fixed-point computation of the first precision operator performs far better than floating-point computation, and the neuron processing unit in the preset neural network model supports only fixed-point computation, the first precision operator (originally a float-type operator) is symmetrically quantized (i.e., converted into an INT-type operator) and then computed in fixed point. This makes the parameters in the preset neural network model smaller after symmetric quantization, reducing the storage space of the model; furthermore, performing mixed-precision quantization on the preset neural network model according to the first cosine similarity and the preset similarity threshold further reduces its computation amount and increases its computation speed.
In a specific embodiment, as shown in FIG. 3, step S40, i.e., optimizing the preset neural network model according to the first cosine similarity, the preset similarity threshold, and the preset mixed-precision quantization method, comprises:
S401: Detect whether the first cosine similarity is greater than or equal to the preset similarity threshold.
S402: When the first cosine similarity is greater than or equal to the preset similarity threshold, replace the operators in the first network output whose computation amount exceeds a first preset computation amount with a first integer operator, and record the first network output after replacement as a first substitute operator.
The first integer operator refers to a signed integer operator of a reduced width (e.g., 4-bit or 2-bit). The first preset computation amount may be determined according to the total number of operators in the first network output. As explained in the above embodiment, the first precision operator is by default quantized with 8-bit per-channel quantization (i.e., as INT8 operators) before computation; therefore, in this embodiment the first integer operator is preferably an INT4 operator.
Specifically, after the first cosine similarity between the first quantized network output and the original network output is determined, it is detected whether the first cosine similarity is greater than or equal to the preset similarity threshold. When it is, the operators in the first network output that are represented with INT8 and computed by the neuron processing unit are sorted by computation amount in descending (or ascending) order (it can be understood that, since symmetric quantization represents the Activation and weight of the first precision operator with INT8, all operators in the first network output are represented with INT8 and computed by the neuron processing unit). The operators in the first network output whose computation amount exceeds the first preset computation amount are then replaced with the first integer operator, i.e., their Activation and/or weight are represented with INT4 instead (it can be understood that the operators exceeding the first preset computation amount were previously converted to 8-bit signed integers for computation and are now converted to 4-bit signed integers for computation), and the first network output after the replacement is recorded as the first substitute operator. It can be understood that neither the operators in the first network output other than those whose computation amount exceeds the first preset computation amount, nor the operators in the second network output, are replaced.
S403: Compute the first substitute operator by the neuron processing unit to obtain a third network output, and record the third network output and the second network output as a second quantized network output.
Specifically, when the first cosine similarity is greater than or equal to the preset similarity threshold, the operators in the first network output whose computation amount exceeds the first preset computation amount are replaced with the first integer operator, and the first network output after replacement is recorded as the first substitute operator; the first substitute operator then contains both INT8 operators and first integer operators. As noted above, the neuron processing unit mainly supports INT8 and INT4 operators, so the first substitute operator is computed by the neuron processing unit, and the computed first substitute operator is dequantized to obtain the third network output corresponding to the first substitute operator; the third network output and the second network output are recorded as the second quantized network output.
S404: Determine a second cosine similarity between the second quantized network output and the original network output.
S405: When the second cosine similarity is greater than or equal to the preset similarity threshold, perform query-and-replace processing on the second quantized network output with an optimized query method to obtain the preset network output.
其中,第二余弦相似度的取值范围为-1~1,第二余弦相似度的值越大,表明第二量化网络输出与原始网络输出之间的相似程度越高。优化查询方法可以为动态回归算法、回溯法、分治法以及贪心算法等;本实施例中采用贪心算法作为优化查询方法,查询替代处理指的是从第二量化网络输出中的第三网络输出中找出连续且可以采用第一整型算子代替的算子。The value range of the second cosine similarity is -1 to 1, and the larger the value of the second cosine similarity, the higher the similarity between the output of the second quantization network and the output of the original network. The optimization query method may be a dynamic regression algorithm, a backtracking method, a divide-and-conquer method, a greedy algorithm, etc.; in this embodiment, a greedy algorithm is used as the optimization query method, and the query substitution processing refers to the third network output from the second quantitative network output Find an operator that is continuous and can be replaced by the first integer operator.
Specifically, after the third network output and the second network output are recorded as the second quantized network output, the second cosine similarity between the second quantized network output and the original network output is determined, and whether the second cosine similarity is greater than or equal to the preset similarity threshold is checked. When the second cosine similarity is greater than or equal to the preset similarity threshold, a greedy algorithm searches the third network output for contiguous operators that can be replaced by INT4, while a maximum search range (e.g., 100 operators) is set to prevent the search from taking too long. When contiguous operators replaceable by the first integer operator are found in the third network output (e.g., by searching among the remaining INT8-represented operators for contiguous operators that can be replaced by the first integer operator), the quantized network after this contiguous INT4 replacement (i.e., comprising the second network output and the replaced third network output) is taken as the result of completing the training of the image model to be quantized, yielding the preset network output.
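The greedy search just described may be sketched as follows, under the assumption that each operator carries a flag indicating whether it can individually be represented in INT4, with the maximum search range (100 operators in this embodiment) bounding the scan:

```python
MAX_SEARCH_RANGE = 100  # maximum search range from this embodiment

def find_contiguous_replaceable(can_use_int4, max_range=MAX_SEARCH_RANGE):
    # Greedily scan at most max_range operators and return (start, length)
    # of the longest contiguous run that can be replaced by INT4;
    # (None, 0) when no operator qualifies.
    best_start, best_len = None, 0
    run_start, run_len = 0, 0
    for i, ok in enumerate(can_use_int4[:max_range]):
        if ok:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

# Operators 2..4 form the longest replaceable run in this example.
flags = [True, False, True, True, True, False, True]
print(find_contiguous_replaceable(flags))  # → (2, 3)
```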
Further, when the second cosine similarity is less than the preset similarity threshold, the current result may be discarded and the procedure returns to step S402, reducing the number of operators replaced by the first integer operator in step S402 (understandably, if the number of operators first replaced by INT4 was 5, it may now be reduced to 4). The relationship between the resulting cosine similarity and the preset similarity threshold is then determined again; when the cosine similarity is greater than or equal to the preset similarity threshold, step S405 is executed. If, after the number of operators replaced by the first integer operator in step S402 has been reduced to 1 (reducing it to 0 would make the output identical to the third network output, for which executing step S405 already implies the cosine similarity is greater than or equal to the preset similarity threshold), the cosine similarity is still less than the preset similarity threshold, the current output of the preset neural network model is taken as the preset network output.
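The fallback described above — reducing the number of INT4-replaced operators until the similarity threshold is met, and keeping the current model output when even that fails — may be sketched as follows (the similarity curve is hypothetical):

```python
def choose_replacement_count(initial_count, similarity_of, threshold):
    # Reduce the number of INT4-replaced operators one at a time until
    # the resulting cosine similarity meets the threshold; a return of 0
    # means no replacement qualifies and the current model output is
    # kept as the preset network output.
    for count in range(initial_count, 0, -1):
        if similarity_of(count) >= threshold:
            return count
    return 0

# Hypothetical curve: more replacements -> lower similarity.
sims = {5: 0.90, 4: 0.93, 3: 0.96, 2: 0.98, 1: 0.99}
print(choose_replacement_count(5, lambda n: sims[n], 0.95))  # → 3
```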
In this embodiment, on the premise that the preset similarity threshold requirement is met, the first integer operator replaces the operators in the first network output that exceed the first preset computation amount, further reducing the computation of the preset neural network model and further increasing its computation speed; in addition, when querying with the optimized query method, a maximum search range is set to prevent the search from taking too long, shortening the time needed to quantize the preset neural network model.

In one embodiment, the preset neural network model further includes a special processing unit. As shown in FIG. 4, after step S401, that is, after detecting whether the first cosine similarity is greater than or equal to the preset similarity threshold, the method further includes:

S406: When the first cosine similarity is less than the preset similarity threshold, replace the operators in the second network output whose computation amount is lower than a second preset computation amount with floating-point operators, and record the second network output after replacement as a second substitution operator.

Here, a floating-point operator refers to any floating-point-type operator. Preferably, since the Float16-type operator was chosen to represent the second-precision operator in the above embodiment, the floating-point operator in this embodiment is a 32-bit floating-point operator. The second preset computation amount may be determined according to the total number of operators in the second network output.
Specifically, after determining the first cosine similarity between the first quantized network output and the original network output, whether the first cosine similarity is greater than or equal to the preset similarity threshold is checked. When the first cosine similarity is less than the preset similarity threshold, the operators in the second network output that are represented by Float16 operators and computed by the vector processing unit are sorted by computation amount, either ascending or descending (understandably, in the above description the second network output is computed by the vector processing unit after the second-precision operators are represented by Float16 operators). The operators in the second network output whose computation amount is lower than the second preset computation amount are then replaced with Float32 operators, that is, represented with Float32 (understandably, the second-precision operators in the original second network output were converted to 16-bit floating-point form and computed by the vector processing unit; now the operators below the second preset computation amount are converted to 32-bit floating-point form and computed by the vector processing unit). The second network output after replacement is recorded as the second substitution operator.

Understandably, the operators in the second network output other than those whose computation amount is lower than the second preset computation amount, as well as all operators in the first network output, are not replaced.
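The sorting and replacement of step S406 may be sketched as follows — an illustrative reading, assuming each operator is a (name, computation amount, dtype) triple; the names and threshold are hypothetical:

```python
def promote_low_compute_ops(ops, second_preset_amount):
    # ops: list of (name, compute_amount, dtype) triples. Operators
    # whose compute amount is below the preset amount are promoted
    # from 'float16' to 'float32'; the rest keep their dtype.
    ranked = sorted(ops, key=lambda op: op[1])  # ascending by compute
    return [
        (name, amt, 'float32' if amt < second_preset_amount else dtype)
        for name, amt, dtype in ranked
    ]

ops = [('conv1', 50, 'float16'), ('fc', 5, 'float16'), ('norm', 2, 'float16')]
print(promote_low_compute_ops(ops, 10))
```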
S407: Compute the second substitution operator by the special processing unit to obtain a fourth network output, and record the first network output and the fourth network output as a third quantized network output.

Here, the special processing unit may be a DSP (Digital Signal Processor) or a CPU (Central Processing Unit); the special processing unit can handle operators that the neuron processing unit and the vector processing unit do not support, or some second-precision operators (e.g., Float32, Float16).

Specifically, when the first cosine similarity is less than the preset similarity threshold, floating-point operators replace the operators in the second network output whose computation amount is lower than the second preset computation amount, and the second network output after replacement is recorded as the second substitution operator. The second substitution operator then contains Float16 operators as well as the floating-point operators (i.e., the Float32 operators described above); since the vector processing unit does not support Float32 operators, the special processing unit must compute the second substitution operator to obtain the fourth network output corresponding to it, and the first network output and the fourth network output are recorded as the third quantized network output (since only the operators of the second network output are quantized at this point, the first network output is unchanged).
S408: Query, using a preset query method, whether a first minimum operator combination exists in the third quantized network output. The first minimum operator combination is the combination with the fewest operators among all operator combinations in the third quantized network output whose first minimum similarity is greater than or equal to the preset similarity threshold; the first minimum similarity refers to the cosine similarity between an operator combination in the third quantized network output and the preset operator combination in the original network output.

S409: When the first minimum operator combination is found by the preset query method, replace the operators with larger computation amounts in the first network output with the first integer operator to obtain the first substitution operator.

Here, the preset query method may be a binary search (dichotomy), or an incremental method (i.e., starting the search from a single operator; if a single operator is not the first minimum operator combination, the range is expanded to two operators, then four, and so on, until the first minimum operator combination is found or all operator combinations have been examined). In this embodiment, the binary search is preferably chosen as the preset query method.
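The incremental alternative described above may be sketched as follows, assuming a predicate that reports whether a candidate combination size is the first minimum operator combination (the doubling schedule is one possible reading of "two operators, four operators, and so on"):

```python
def incremental_search(is_min_combination, total_ops):
    # Start from a single operator and double the candidate size
    # (1, 2, 4, ...) until the first minimum operator combination
    # is found or all sizes have been examined; None means the
    # query finished without finding a qualifying combination.
    size = 1
    while size <= total_ops:
        if is_min_combination(size):
            return size
        size *= 2
    return None

# Hypothetical predicate: combinations of 4 or more operators qualify.
print(incremental_search(lambda k: k >= 4, 10))  # → 4
```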
Specifically, after the first network output and the fourth network output are recorded as the third quantized network output, the cosine similarity between the third quantized network output and the original network output is determined by the expression in step S30. Understandably, that expression applies to the cosine similarity between each operator in the third quantized network output and the original network output (for example, when i is 2, it is the similarity between X1, X2 and Y1, Y2). A binary search can then be used to find the minimum operator combination whose cosine similarity is greater than the preset threshold: after sorting the operators in the third quantized network output and the original network output in a certain order (e.g., by computation amount from large to small), the binary search starts from the operator at the middle position. When an operator combination whose cosine similarity is greater than the preset similarity threshold is found, it is first recorded as the first minimum operator combination, and the binary search continues (the total number of operators at this point is half of the total at the last time the first minimum operator combination was found; the binary search is thus an algorithm that queries halves bounded by the middle operator). If another operator combination whose cosine similarity is greater than the preset similarity threshold is found, that combination is updated as the first minimum operator combination, and so on, until the query is completed and the first minimum operator combination is determined.
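One hedged reading of the binary search just described — assuming the per-combination cosine similarity is monotone in the number of operators kept, which this disclosure does not state explicitly — is a binary search for the smallest qualifying combination size:

```python
def binary_search_min_combination(n_ops, similarity_at, threshold):
    # Find the smallest combination size k in 1..n_ops whose cosine
    # similarity meets the threshold, halving the candidate range
    # around the middle operator each step; None if no size qualifies.
    lo, hi, best = 1, n_ops, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if similarity_at(mid) >= threshold:
            best = mid        # record it, then search the lower half
            hi = mid - 1
        else:
            lo = mid + 1
    return best

# Hypothetical: similarity grows with combination size.
sims = [0.80, 0.90, 0.94, 0.97, 0.99]  # sizes 1..5
print(binary_search_min_combination(5, lambda k: sims[k - 1], 0.95))  # → 4
```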
Further, when the first minimum operator combination is found by the binary search: since the first network output within the third quantized network output has not been quantized further — that is, the Activations in the first network output are all represented by INT8 operators and computed by the neuron processing unit — the operators in the first network output are sorted by computation amount from large to small (or from small to large), and those whose computation amount exceeds the first preset computation amount are replaced with the first integer operator (preferably the INT4 operator); that is, their Activation and/or weight is represented with INT4 (understandably, operators exceeding the first preset computation amount were previously converted to 8-bit signed integers for computation and are now converted to 4-bit signed integers for computation). The first network output after replacement is recorded as the first substitution operator.
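The INT8-to-INT4 replacement of activations and/or weights may be sketched as symmetric signed quantization to a given bit width — a minimal illustration, not necessarily this disclosure's exact scheme:

```python
def quantize_symmetric(values, bits):
    # Symmetric signed quantization to the given bit width:
    # INT8 uses codes in [-128, 127], INT4 uses codes in [-8, 7].
    # Returns integer codes plus the scale needed for dequantization
    # (as in the dequantize step after computation).
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return codes, scale

codes, scale = quantize_symmetric([0.5, -1.0, 0.25], 4)
restored = [c * scale for c in codes]  # dequantized approximation
```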
S410: Compute the first substitution operator by the neuron processing unit to obtain the third network output, and record the third network output and the fourth network output as a fourth quantized network output.

Specifically, when the first cosine similarity is greater than or equal to the preset similarity threshold, the INT4 operator replaces the operators in the first network output whose computation amount exceeds the first preset computation amount, and the first network output after replacement is recorded as the first substitution operator. The first substitution operator then contains INT8 and INT4 operators; as noted above, the neuron processing unit mainly supports operators in INT8 and INT4 form, so at this point the neuron processing unit computes the first substitution operator, and the computed first substitution operator is dequantized to obtain the third network output corresponding to the first substitution operator. The third network output and the fourth network output are recorded as the fourth quantized network output.
S411: Determine a third cosine similarity between the fourth quantized network output and the original network output.

S412: When the third cosine similarity is greater than or equal to the preset similarity threshold, apply the optimized query method to perform query-and-replace processing on the fourth quantized network output to obtain the preset network output.

Here, the third cosine similarity takes values in the range -1 to 1; the larger its value, the more similar the fourth quantized network output is to the original network output.

Specifically, after the third network output and the fourth network output are recorded as the fourth quantized network output, the third cosine similarity between the fourth quantized network output and the original network output is determined, and whether the third cosine similarity is greater than or equal to the preset similarity threshold is checked. When it is, the optimized query method searches the third network output for contiguous operators that can be replaced by INT4 (e.g., searching among the remaining INT8-represented operators for contiguous operators replaceable by the first integer operator), and the quantized network output after this contiguous INT4 replacement (i.e., comprising the fourth network output and the replaced third network output) is taken as the preset network output.

In this embodiment, when the first cosine similarity does not reach the preset similarity threshold, floating-point operators replace the operators in the second network output whose computation amount is lower than the second preset computation amount, raising the cosine similarity between the network output of the preset neural network model and the original network output. On the premise that the cosine similarity satisfies the preset similarity threshold, the computation of the preset neural network model is thereby reduced, and operator replacement is applied both to the network output of the first-precision-operator part and to that of the second-precision-operator part, improving the cosine similarity between the resulting network output and the original network output.
In one embodiment, as shown in FIG. 5, after step S408, that is, after querying with the preset query method whether the first minimum operator combination exists in the third quantized network output, the method further includes:

S413: When the first minimum operator combination is not found by the preset query method, discard the third quantized network output, replace the operators in the first network output whose activation output is lower than a preset activation output threshold with a second integer operator, and record the first network output after replacement as a third substitution operator.

Here, the second integer operator may be an INT16 operator, that is, a 16-bit signed integer operator. The preset activation output threshold may be determined according to the Activation of each operator in the first network output (that is, the activation output corresponding to each operator).
Specifically, after the first network output and the fourth network output are recorded as the third quantized network output, if the first minimum operator combination is not found by the binary search — that is, the cosine similarities between the third quantized network output and every operator combination in the original network output are all less than the preset similarity threshold, which understandably means that even after all Float16-type operators in the second network output are replaced with Float32, the first minimum operator combination is still not found — then the replacement of the Float16 operators in the second network output in step S406 is abandoned, i.e., the third quantized network output is discarded. The operators in the first network output that are represented by INT8 operators and computed by the neuron processing unit are then sorted by their Activation values from large to small (or from small to large), and those whose Activation is lower than the preset activation output threshold are replaced with INT16 operators, that is, their Activation is represented with INT16 (understandably, operators below the preset activation output threshold were previously converted to 8-bit signed integers for computation and are now converted to 16-bit signed integers for computation). The first network output after replacement is recorded as the third substitution operator.
Further, in step S413 there is no need to consider the weight corresponding to each operator, because in most cases quantizing the weights of the first-precision operators with 8-bit per-channel quantization is sufficient; the weight of each operator in the first network output therefore need not be taken into account in step S413, and apart from the operators below the preset activation output threshold described above, no other operators in the first network output are replaced.
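The 8-bit per-channel weight quantization mentioned above may be sketched as follows — an illustrative sketch in which each output channel receives its own scale; the tensor layout is assumed, not specified by this disclosure:

```python
def quantize_weights_per_channel(weight_rows):
    # 8-bit per-channel weight quantization: each output channel gets
    # its own scale from that channel's largest-magnitude weight, and
    # its weights map to signed 8-bit codes in [-128, 127].
    quantized = []
    for row in weight_rows:
        peak = max(abs(w) for w in row)
        scale = peak / 127 if peak else 1.0
        codes = [max(-128, min(127, round(w / scale))) for w in row]
        quantized.append((codes, scale))
    return quantized

channels = [[0.2, -0.4], [1.0, 0.5]]  # hypothetical 2-channel weights
q = quantize_weights_per_channel(channels)
# each channel's largest-magnitude weight maps to +/-127
```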
S414: Compute the third substitution operator by the vector processing unit to obtain a fifth network output, and record the fifth network output and the second network output as a fifth quantized network output.
Specifically, when the first minimum operator combination is not found by the preset query method, the third quantized network output is discarded, the operators in the first network output whose activation output is lower than the preset activation output threshold are replaced with the second integer operator, and the first network output after replacement is recorded as the third substitution operator. The third substitution operator then contains INT8 operators and the second integer operator (i.e., INT16 operators). As noted in the above embodiment, the vector processing unit mainly supports INT8, INT16, and Float16 operators, while the neuron processing unit mainly supports INT8 and INT4 operators; therefore the vector processing unit computes the third substitution operator, obtaining the fifth network output, and the fifth network output and the second network output are recorded as the fifth quantized network output (since step S413 discards the third quantized network output, i.e., abandons the replacement of the second network output performed in step S406, the outputs here are the second network output and the fifth network output).
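The routing of operators to processing units implied by the supported-operator lists may be sketched as a simple lookup — the unit names and support sets below paraphrase this embodiment's description (neuron processing unit: INT8/INT4; vector processing unit: INT8/INT16/Float16; special processing unit: everything else):

```python
SUPPORTED = {
    'neuron_unit': {'int8', 'int4'},
    'vector_unit': {'int8', 'int16', 'float16'},
}

def dispatch(op_dtype):
    # Pick a processing unit for an operator dtype; anything neither
    # unit supports (e.g. float32) falls back to the special unit.
    # Where both units support a dtype (INT8), the first match wins
    # in this sketch.
    for unit, dtypes in SUPPORTED.items():
        if op_dtype in dtypes:
            return unit
    return 'special_unit'

print(dispatch('int4'))     # → neuron_unit
print(dispatch('float32'))  # → special_unit
```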
S415: Query, using the preset query method, whether a second minimum operator combination exists in the fifth quantized network output. The second minimum operator combination is the combination with the fewest operators among all operator combinations in the fifth quantized network output whose second minimum similarity is greater than or equal to the preset similarity threshold; the second minimum similarity refers to the cosine similarity between an operator combination in the fifth quantized network output and the preset operator combination in the original network output.

S416: When the second minimum operator combination is found by the preset query method, obtain the preset network output.
Specifically, after the fifth network output and the second network output are recorded as the fifth quantized network output, the cosine similarity between the fifth quantized network output and the original network output is determined by the expression in step S30. Understandably, that expression applies to the cosine similarity between each operator in the fifth quantized network output and the original network output (for example, when i is 2, it is the similarity between X1, X2 and Y1, Y2). A binary search can then be used to find the minimum operator combination whose cosine similarity is greater than the preset threshold: after sorting the operators in the fifth quantized network output and the original network output in a certain order (e.g., by computation amount from large to small), the binary search starts from the operator at the middle position. When an operator combination whose cosine similarity is greater than the preset similarity threshold is found, it is first recorded as the second minimum operator combination, and the binary search continues (the total number of operators at this point is half of the total at the last time the second minimum operator combination was found; the binary search is thus an algorithm that queries halves bounded by the middle operator). If another operator combination whose cosine similarity is greater than the preset similarity threshold is found, that combination is updated as the second minimum operator combination, and so on, until the query is completed, the second minimum operator combination is determined, and the current preset network output of the preset neural network model is obtained.
In this embodiment, when the binary search fails to find the first minimum operator combination, the quantization applied to the preset neural network model is changed, that is, the second integer operator is substituted instead, which preserves the accuracy of the preset neural network model, ensures that after replacement a second minimum operator combination satisfying the preset similarity threshold exists in the model, and reduces the computation of the preset neural network model.

In one embodiment, as shown in FIG. 6, after step S415, that is, after querying with the preset query method whether the second minimum operator combination exists in the fifth quantized network output, the method further includes:

S417: When the second minimum operator combination is not found by the preset query method, replace all operators in the fifth network output with the second integer operator, and record the fifth quantized network output after replacement as a fourth substitution operator.
Specifically, after the fifth network output and the second network output are recorded as the fifth quantized network output, if the second minimum operator combination is not found by the preset query method — that is, the cosine similarities between the fifth quantized network output and every operator combination in the original network output are all less than the preset similarity threshold — then all operators in the fifth network output are replaced with the second integer operator. Understandably, the fifth network output obtained after the replacement in step S413 contains INT8 and INT16 operators, so at this point all remaining INT8 operators are replaced with INT16 operators; that is, the 8-bit signed integer operators present in the original fifth network output are all replaced with 16-bit signed integer operators, and the fifth quantized network output after replacement is recorded as the fourth substitution operator. Understandably, every operator in the fourth substitution operator is an INT16 operator, that is, a 16-bit signed integer operator.
S418: Compute the fourth substitution operator by the vector processing unit to obtain a sixth network output, and record the sixth network output and the second network output as a sixth quantized network output.

Specifically, when the second minimum operator combination is not found by the binary search, the second integer operator replaces all operators in the fifth network output, and the fifth quantized network output after replacement is recorded as the fourth substitution operator. Each operator in the fourth substitution operator is then the second integer operator (i.e., an INT16 operator), so the vector processing unit must compute the fourth substitution operator to obtain the sixth network output, and the sixth network output and the second network output are recorded as the sixth quantized network output.
S419:确定所述第六量化网络输出与所述原始网络输出之间的第四余弦相似度。S419: Determine a fourth cosine similarity between the sixth quantized network output and the original network output.
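The fourth cosine similarity follows the same expression as step S30; a minimal sketch is given below. The function and variable names are illustrative assumptions, and the outputs are treated as flattened vectors:

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity between two flattened network outputs."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

sixth_quantized = [0.9, 1.1, 2.0]   # illustrative quantized output
original = [1.0, 1.0, 2.0]          # illustrative original output
sim = cosine_similarity(sixth_quantized, original)
# Compare against a preset similarity threshold, e.g. 0.99.
meets_threshold = sim >= 0.99
```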
S420:在所述第四余弦相似度小于所述预设相似度阈值时,采用浮点算子代替所述第二网络输出中计算量低于第二预设计算量的算子,将替代之后的所述第二网络输出记录为所述第二替代算子。S420: When the fourth cosine similarity is less than the preset similarity threshold, a floating-point operator is used to replace the operators in the second network output whose computation amount is lower than the second preset computation amount, and the second network output after the substitution is recorded as the second substitution operator.
具体地,在将第六网络输出以及第二网络输出记录为第六量化网络输出之后,确定所述第六量化网络输出与所述原始网络输出之间的第四余弦相似度;检测第四余弦相似度是否大于或等于预设相似度阈值。在第四余弦相似度小于所述预设相似度阈值时,将第二网络输出中通过Float16算子表示后通过矢量处理单元计算的算子按照计算量从小到大排序,亦或者从大到小排序(可以理解地,在上述说明中第二网络输出均是通过Float16算子表示第二精度算子后通过矢量处理单元进行计算的),进而将第二网络输出中计算量低于第二预设计算量的算子替换成浮点算子(优选为Float32算子),也即将低于第二预设计算量的算子替换成采用Float32进行表示(可以理解地,原来第二网络输出中的第二精度算子是转换成16位浮点型形式后通过矢量处理单元进行计算的,现在将低于第二预设计算量的算子替换成32位浮点型形式后通过矢量处理单元进行计算),并将替换之后的第二网络输出记录为第二替代算子。Specifically, after the sixth network output and the second network output are recorded as the sixth quantized network output, the fourth cosine similarity between the sixth quantized network output and the original network output is determined, and it is detected whether the fourth cosine similarity is greater than or equal to the preset similarity threshold. When the fourth cosine similarity is less than the preset similarity threshold, the operators in the second network output that are represented by Float16 operators and calculated by the vector processing unit are sorted by computation amount from small to large, or from large to small (understandably, in the above description the second network output is obtained by representing the second precision operators as Float16 operators and calculating them with the vector processing unit). The operators in the second network output whose computation amount is lower than the second preset computation amount are then replaced with floating-point operators (preferably Float32 operators), that is, represented with Float32 instead (understandably, the second precision operators in the second network output were originally converted into 16-bit floating-point form and calculated by the vector processing unit; now the operators below the second preset computation amount are converted into 32-bit floating-point form for calculation), and the second network output after the replacement is recorded as the second substitution operator.
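The replacement just described — sort the operators by computation amount and promote the cheapest Float16 operators to Float32 — might look like the following sketch. The operator records, field names, and threshold value are illustrative assumptions:

```python
# Hypothetical sketch: promote Float16 operators whose computation amount
# falls below a preset threshold to Float32.
def promote_cheap_float16(ops, second_preset_flops):
    """ops: list of dicts with 'name', 'dtype', and 'flops' keys."""
    promoted = []
    for op in sorted(ops, key=lambda o: o["flops"]):  # small to large
        dtype = op["dtype"]
        if dtype == "float16" and op["flops"] < second_preset_flops:
            dtype = "float32"  # recomputed in 32-bit floating-point form
        promoted.append({**op, "dtype": dtype})
    return promoted

ops = [
    {"name": "softmax", "dtype": "float16", "flops": 1_000},
    {"name": "matmul",  "dtype": "float16", "flops": 900_000},
]
second_substitution = promote_cheap_float16(ops, second_preset_flops=10_000)
```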
S421:通过所述特殊处理单元对所述第二替代算子进行计算,得到第四网络输出,并将所述第六网络输出以及所述第四网络输出记录为第七量化网络输出。S421: Calculate the second substitution operator by the special processing unit to obtain a fourth network output, and record the sixth network output and the fourth network output as a seventh quantization network output.
具体地,在所述第四余弦相似度小于所述预设相似度阈值时,采用浮点算子代替所述第二网络输出中计算量低于第二预设计算量的算子,将替换之后的所述第二网络输出记录为第二替代算子之后,第二替代算子中包括Float16以及Float32算子,而矢量处理单元并不支持处理Float32算子,因此需要采用特殊处理单元对第二替代算子进行计算,得到与第二替代算子对应的第四网络输出,并将第四网络输出以及第六网络输出记录为第七量化网络输出(由于此时仅对第二网络输出中的算子进行量化,因此第六网络输出并没有产生变化)。Specifically, when the fourth cosine similarity is less than the preset similarity threshold, a floating-point operator replaces the operators in the second network output whose computation amount is lower than the second preset computation amount, and the second network output after the replacement is recorded as the second substitution operator. The second substitution operator then includes both Float16 and Float32 operators, and since the vector processing unit does not support Float32 operators, the special processing unit is needed to calculate the second substitution operator, obtaining the fourth network output corresponding to the second substitution operator. The fourth network output and the sixth network output are recorded as the seventh quantized network output (since only the operators of the second network output are quantized here, the sixth network output does not change).
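The routing constraint above — the vector processing unit handles Float16 but not Float32, which must go to the special processing unit — can be expressed as a small dispatch sketch. The function and unit names are illustrative assumptions, not the patent's interfaces:

```python
# Hypothetical sketch: route each floating-point operator to the unit that
# can compute it. The vector processing unit handles Float16, while the
# special processing unit is required for Float32, which the vector
# processing unit does not support.
def dispatch_float_op(dtype):
    if dtype == "float16":
        return "vector_processing_unit"
    if dtype == "float32":
        return "special_processing_unit"
    raise ValueError(f"not a floating-point operator dtype: {dtype}")

second_substitution = ["float16", "float32", "float16"]
units = [dispatch_float_op(d) for d in second_substitution]
```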
S422:采用预设查询方法查询所述第七量化网络输出中是否存在第三最小算子组合;所述第三最小算子组合指的是第七量化网络输出中的第三最小相似度大于或等于所述预设相似度阈值的所有算子组合中算子数量最少的算子组合;所述第三最小相似度指的是第七量化网络输出中的算子组合与原始网络输出中的预设算子组合之间的余弦相似度。S422: Use the preset query method to query whether a third minimum operator combination exists in the seventh quantized network output; the third minimum operator combination refers to the operator combination with the fewest operators among all operator combinations in the seventh quantized network output whose third minimum similarity is greater than or equal to the preset similarity threshold; the third minimum similarity refers to the cosine similarity between an operator combination in the seventh quantized network output and the preset operator combination in the original network output.
S423:在采用预设查询方法查询到第三最小算子组合时,获取预设网络输出。S423: Obtain the preset network output when the third minimum operator combination is found by the preset query method.
具体地,在将第四网络输出以及第六网络输出记录为第七量化网络输出之后,通过步骤S30中的表达式确定第七量化网络输出与原始网络输出之间的余弦相似度。可以理解地,在该表达式中,是针对于第七量化网络输出以及原始网络输出中每一算子之间的余弦相似度(例如i为2时,则是X 1和X 2与Y 1和Y 2之间的相似度)。此时可以采用二分法寻找余弦相似度大于预设相似度阈值的最小算子组合,也即按照一定顺序(如计算量从大到小的顺序)对第七量化网络输出以及原始网络输出中各算子进行排序之后,以排序在中间位置的算子作为界限开始进行二分查找;在查找到余弦相似度大于预设相似度阈值的算子组合时,先将该算子组合记录为第三最小算子组合,并继续采用二分法进行查找(此时的算子总数为上一次查询到第三最小算子组合时的算子总数的一半,因此二分法是以中间算子为界限对半进行查询的算法);若仍找到余弦相似度大于预设相似度阈值的另一算子组合,则将该算子组合更新为第三最小算子组合,以此类推,直至查询完毕,确定第三最小算子组合,获取当前预设神经网络模型的预设网络输出。Specifically, after the fourth network output and the sixth network output are recorded as the seventh quantized network output, the cosine similarity between the seventh quantized network output and the original network output is determined by the expression in step S30. Understandably, this expression concerns the cosine similarity between each operator of the seventh quantized network output and the original network output (for example, when i is 2, it is the similarity between X 1 and X 2 and Y 1 and Y 2). At this time, bisection can be used to find the minimum operator combination whose cosine similarity is greater than the preset similarity threshold: after sorting the operators of the seventh quantized network output and the original network output in a certain order (such as computation amount from large to small), a binary search is started with the operator in the middle position as the boundary. When an operator combination whose cosine similarity is greater than the preset similarity threshold is found, it is first recorded as the third minimum operator combination, and the bisection search continues (the total number of operators at this point is half of the total number when the third minimum operator combination was last found, so bisection is an algorithm that queries by halving at the middle operator). If another operator combination whose cosine similarity is greater than the preset similarity threshold is found, it is updated as the third minimum operator combination, and so on, until the query is completed, the third minimum operator combination is determined, and the preset network output of the current preset neural network model is obtained.
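The bisection query can be sketched generically as follows. This is an illustrative reduction under stated assumptions, not the patent's implementation: the candidate combinations are assumed to be sorted, and `meets_threshold(k)` stands for "the combination of the first k sorted operators reaches the preset similarity threshold" and is assumed to be monotone (once true, it stays true as k grows):

```python
# Hypothetical sketch of the bisection query: find the smallest operator
# count k whose combination still meets the similarity threshold.
def find_min_operator_combination(num_ops, meets_threshold):
    """Binary search for the smallest k in [1, num_ops] with
    meets_threshold(k) == True; returns None if no k qualifies."""
    lo, hi, best = 1, num_ops, None
    while lo <= hi:
        mid = (lo + hi) // 2  # the middle operator acts as the boundary
        if meets_threshold(mid):
            best = mid        # record as the current minimum combination
            hi = mid - 1      # keep halving toward fewer operators
        else:
            lo = mid + 1
    return best

# Illustrative monotone predicate: combinations of 5+ operators qualify.
result = find_min_operator_combination(8, lambda k: k >= 5)
# result == 5
```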
在本实施例中,在采用预设查询方法未查询到第二最小算子组合时,通过将采用INT8算子表示的所有第一精度算子替换为采用INT16算子表示,以提高预设网络输出与原始网络输出之间的余弦相似度,使其满足预设相似度阈值的要求,从而在满足余弦相似度要求的前提下,减小预设神经网络模型的计算量。In this embodiment, when the second minimum operator combination is not found by the preset query method, all first precision operators represented by INT8 operators are replaced with INT16 operators, so as to improve the cosine similarity between the preset network output and the original network output and make it meet the preset similarity threshold, thereby reducing the computation amount of the preset neural network model while satisfying the cosine similarity requirement.
在另一实施例中,步骤S422之后,也即采用预设查询方法查询所述第七量化网络输出中是否存在第三最小算子组合之后,还包括:In another embodiment, after step S422, that is, after using the preset query method to query whether a third minimum operator combination exists in the seventh quantized network output, the method further includes:
在采用预设查询方法未查询到所述第三最小算子组合时,提示所述预设神经网络模型训练出现错误。When the third minimum operator combination is not queried by using the preset query method, it is prompted that an error occurs in the training of the preset neural network model.
具体地,在将第四网络输出以及第六网络输出记录为第七量化网络输出之后,若采用预设查询方法未查询到第三最小算子组合,则表征当前预设神经网络模型训练出现错误,无法对其继续进行量化,否则会在不满足余弦相似度要求的同时与精度要求偏离较大,此时提示预设神经网络模型训练出现错误。Specifically, after the fourth network output and the sixth network output are recorded as the seventh quantized network output, if no third minimum operator combination is found by the preset query method, it indicates that an error has occurred in training the current preset neural network model, and quantization cannot continue; otherwise the cosine similarity requirement would not be met while the accuracy would also deviate significantly from the requirement. At this point, an error in training the preset neural network model is prompted.
在另一具体实施方式中,步骤S50之后,也即将所述预设神经网络模型记录为图像处理模型,还包括:In another specific embodiment, after step S50, the preset neural network model is also recorded as an image processing model, further comprising:
对图像处理模型进行精度检测,在确定图像处理模型不满足预设精度要求时,提示相关人员对其进行手动调参处理。The accuracy of the image processing model is detected, and when it is determined that the image processing model does not meet the preset accuracy requirements, the relevant personnel are prompted to manually adjust the parameters.
其中,预设精度要求根据具体应用场景以及具体地预设神经网络模型的计算要求进行设定。The preset accuracy requirements are set according to specific application scenarios and specific calculation requirements of the preset neural network model.
具体地,在所述损失值未达到预设收敛条件时,迭代更新所述预设神经网络模型的初始参数,直至所述损失值达到预设收敛条件时,将所述预设神经网络模型记录为图像处理模型。此后,在表征该图像处理模型的预设网络输出与原始网络输出之间的余弦相似度达到预设相似度阈值的前提下,减少了预设神经网络模型的计算量,但这不一定是最优的量化结果。其中,不一定是最优的量化结果可能表现在如下两个方面:Specifically, when the loss value does not reach the preset convergence condition, the initial parameters of the preset neural network model are iteratively updated until the loss value reaches the preset convergence condition, at which point the preset neural network model is recorded as the image processing model. At this point, on the premise that the cosine similarity between the preset network output of the image processing model and the original network output reaches the preset similarity threshold, the computation amount of the preset neural network model has been reduced, but this is not necessarily the optimal quantization result. The possibly non-optimal quantization result may manifest in the following two aspects:
第一个方面:图像处理模型可能不满足预设精度要求。The first aspect: the image processing model may not meet the preset accuracy requirements.
针对第一个方面,可以通过手动将图像处理模型中,可能引起精度下降的算子标注为第二精度算子,例如将INT8算子替换成INT16算子。For the first aspect, operators in the image processing model that may cause precision degradation can be manually marked as second-precision operators, for example, replacing INT8 operators with INT16 operators.
第二个方面:图像处理模型利用率没有达到预设利用率要求,需要继续减少图像处理模型的计算量。The second aspect: the utilization rate of the image processing model does not meet the preset utilization requirement, and it is necessary to continue to reduce the calculation amount of the image processing model.
针对第二个方面,可以通过手动将图像处理模型中,密集型算子标注为低精度算子。For the second aspect, the intensive operators in the image processing model can be manually marked as low-precision operators.
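The manual parameter tuning described in the two aspects above amounts to overriding the automatically assigned precision of named operators. A minimal sketch, with hypothetical operator names and mapping structure:

```python
# Hypothetical sketch of manual parameter tuning: override the dtype of
# named operators, e.g. promote precision-sensitive ones to INT16 (first
# aspect) or demote compute-intensive ones to a lower precision (second
# aspect). Names and structure are illustrative assumptions.
def apply_manual_overrides(op_dtypes, overrides):
    updated = dict(op_dtypes)
    updated.update(overrides)   # manual annotations take priority
    return updated

model_dtypes = {"conv1": "int8", "attention": "int8", "fc": "float16"}
tuned = apply_manual_overrides(model_dtypes, {"attention": "int16"})
```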
在本发明中,通过采用对称量化以及基于混合精度的量化处理方式,并结合手动调参的方式,与现有技术中通过全空间搜索每个算子的量化比特数的方法相比,本发明可以在较短的时间内快速量化出计算量较少的模型,并且可以保证预设神经网络模型的精度与原始网络输出的精度相差较小,同时减少预设神经网络模型的计算量以及参数量,从而提高了预设神经网络模型的计算速率。In the present invention, by adopting symmetric quantization and mixed-precision quantization processing combined with manual parameter tuning, and compared with the prior-art method of searching the quantization bit width of every operator over the full search space, the present invention can quickly quantize a model with less computation in a shorter time, can ensure that the accuracy of the preset neural network model deviates only slightly from that of the original network output, and at the same time reduces the computation amount and the parameters of the preset neural network model, thereby improving its computation speed.
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本发明实施例的实施过程构成任何限定。It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the sequence of execution, and the execution sequence of each process should be determined by its functions and internal logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
在一实施例中,提供一种智能芯片,包括存储模块、处理模块以及存储在所述存储模块中并可在所述处理模块上运行的图像处理模型,所述图像处理模型是根据上述图像处理模型训练方法得到的;所述处理模块用于通过所述图像处理模型执行上述图像处理方法。In one embodiment, a smart chip is provided, including a storage module, a processing module, and an image processing model stored in the storage module and runnable on the processing module, where the image processing model is obtained according to the above image processing model training method; the processing module is configured to execute the above image processing method through the image processing model.
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图7所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储上述图像模型训练方法或者图像处理方法所使用到的数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种图像模型训练方法或者图像处理方法。In one embodiment, a computer device is provided; the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the data used by the above image model training method or image processing method. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements an image model training method or an image processing method.
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,处理器执行计算机程序时实现上述实施例中的图像模型训练方法,或者处理器执行计算机程序时实现上述实施例中的图像处理方法。In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the image model training method in the above embodiment when executing the computer program, or implements the image processing method in the above embodiment when executing the computer program.
在一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现上述实施例中的图像模型训练方法,或者计算机程序被处理器执行时实现上述实施例中的图像处理方法。In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the image model training method in the above embodiment is implemented, or the image processing method in the above embodiment is implemented.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program, and the computer program can be stored in a non-volatile computer-readable storage medium; when executed, the computer program may include the processes of the above method embodiments. Any reference to memory, storage, databases, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed, that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above.

Claims (15)

  1. 一种图像模型训练方法,其特征在于,包括:An image model training method, comprising:
    获取样本图像集,所述样本图像集中包含至少一个样本图像;一个所述样本图像与一个原始网络输出关联;obtaining a sample image set, the sample image set includes at least one sample image; one of the sample images is associated with an original network output;
    将所述样本图像输入至包含初始参数的预设神经网络模型中,通过所述预设神经网络模型对所述样本图像进行对称量化处理,以获取所述预设神经网络模型输出的第一量化网络输出;Inputting the sample image into a preset neural network model including initial parameters, and performing symmetrical quantization processing on the sample image through the preset neural network model to obtain a first quantization output from the preset neural network model network output;
    确定所述第一量化网络输出与所述原始网络输出之间的第一余弦相似度;determining a first cosine similarity between the first quantized network output and the original network output;
    根据所述第一余弦相似度、预设相似度阈值以及预设混合精度量化方法,对所述预设神经网络模型进行优化处理,获取优化处理后的预设神经网络模型的预设网络输出;According to the first cosine similarity, the preset similarity threshold and the preset mixed precision quantization method, the preset neural network model is optimized, and the preset network output of the optimized preset neural network model is obtained. ;
    确定所述预设网络输出与所述原始网络输出之间的损失值,在所述损失值未达到预设收敛条件时,迭代更新所述预设神经网络模型的初始参数,直至所述损失值达到预设收敛条件时,将所述预设神经网络模型记录为图像处理模型。Determine the loss value between the preset network output and the original network output, and when the loss value does not reach the preset convergence condition, iteratively update the initial parameters of the preset neural network model until the loss value When the preset convergence condition is reached, the preset neural network model is recorded as an image processing model.
  2. 如权利要求1所述的图像模型训练方法,其特征在于,所述将所述样本图像输入至包含初始参数的预设神经网络模型中,对所述样本图像进行对称量化处理,得到第一量化网络输出,包括:The image model training method according to claim 1, wherein the sample image is input into a preset neural network model including initial parameters, and the sample image is symmetrically quantized to obtain the first quantization Network output, including:
    获取与所述样本图像对应的第一精度算子以及第二精度算子;所述第二精度算子的精度高于所述第一精度算子的精度;acquiring a first precision operator and a second precision operator corresponding to the sample image; the precision of the second precision operator is higher than that of the first precision operator;
    对所述第一精度算子进行对称量化处理并获取第一网络输出;Symmetrical quantization processing is performed on the first precision operator and a first network output is obtained;
    获取与所述第二精度算子对应的第二网络输出;obtaining a second network output corresponding to the second precision operator;
    将所述第一网络输出以及第二网络输出记录为所述预设神经网络模型的第一量化网络输出。The first network output and the second network output are recorded as the first quantitative network output of the preset neural network model.
  3. 如权利要求2所述的图像模型训练方法,其特征在于,所述预设神经网络模型中包括神经元处理单元以及矢量处理单元;The image model training method according to claim 2, wherein the preset neural network model includes a neuron processing unit and a vector processing unit;
    所述对所述第一精度算子进行对称量化处理并获取第一网络输出,包括:The performing symmetrical quantization processing on the first precision operator and obtaining the first network output includes:
    采用对称量化模型对所述第一精度算子进行量化,得到与所述第一精度算子对应的量化算子;Using a symmetric quantization model to quantize the first precision operator to obtain a quantization operator corresponding to the first precision operator;
    通过所述神经元处理单元对所述量化算子进行计算之后,通过所述矢量处理单元对计算后的所述量化算子进行反量化处理,得到所述第一网络输出。After the quantization operator is calculated by the neuron processing unit, the vector processing unit performs inverse quantization processing on the calculated quantization operator to obtain the first network output.
  4. 如权利要求3所述的图像模型训练方法,其特征在于,所述采用对称量化模型对所述第一精度算子进行量化,得到与所述第一精度算子对应的量化算子,包括:The image model training method according to claim 3, wherein the quantization of the first precision operator by using a symmetric quantization model to obtain a quantization operator corresponding to the first precision operator, comprising:
    获取所述对称量化模型中的量化参数;obtaining quantization parameters in the symmetric quantization model;
    根据所述量化参数以及预设取整方法,对所述第一精度算子进行取整处理,得到与所述第一精度算子对应的取整算子;performing rounding processing on the first precision operator according to the quantization parameter and a preset rounding method to obtain a rounding operator corresponding to the first precision operator;
    根据所述量化参数以及预设截取方法,对所述取整算子进行截取处理,得到所述量化算子。According to the quantization parameter and the preset truncation method, truncate the rounding operator to obtain the quantization operator.
  5. 如权利要求4所述的图像模型训练方法,其特征在于,所述通过所述矢量处理单元对计算后的所述量化算子反量化处理,得到所述第一网络输出,包括:The image model training method according to claim 4, wherein the inverse quantization processing of the calculated quantization operator by the vector processing unit to obtain the first network output comprises:
    根据所述量化参数以及所述量化算子,通过所述矢量处理单元对计算后的所述量化算子反量化处理,得到所述第一网络输出。According to the quantization parameter and the quantization operator, the vector processing unit performs inverse quantization processing on the calculated quantization operator to obtain the first network output.
  6. 如权利要求3所述的图像模型训练方法,其特征在于,所述获取与所述第二精度算子对应的第二网络输出,包括:The image model training method according to claim 3, wherein the obtaining the second network output corresponding to the second precision operator comprises:
    通过所述矢量处理单元对所述第二精度算子进行计算,得到所述第二网络输出。The second network output is obtained by calculating the second precision operator by the vector processing unit.
  7. 如权利要求3所述的图像模型训练方法,其特征在于,根据所述第一余弦相似度、预设相似度阈值以及预设混合精度量化方法,对所述预设神经网络模型进行优化处理之后,获取优化处理后的预设神经网络模型的预设网络输出,包括:The image model training method according to claim 3, wherein the preset neural network model is optimized according to the first cosine similarity, a preset similarity threshold and a preset mixed precision quantization method After that, obtain the preset network output of the optimized preset neural network model, including:
    检测所述第一余弦相似度是否大于或等于所述预设相似度阈值;Detecting whether the first cosine similarity is greater than or equal to the preset similarity threshold;
    在所述第一余弦相似度大于或等于所述预设相似度阈值时,采用第一整型算子代替所述第一网络输出中计算量超过第一预设计算量的算子,将替换之后的所述第一网络输出记录为第一替代算子;When the first cosine similarity is greater than or equal to the preset similarity threshold, the first integer operator is used to replace the operators in the first network output whose calculation amount exceeds the first preset calculation amount, and the first network output after the replacement is recorded as the first substitution operator;
    通过所述神经元处理单元对所述第一替代算子进行计算,得到第三网络输出,并将所述第三网络输出以及第二网络输出记录为第二量化网络输出;The first substitution operator is calculated by the neuron processing unit to obtain a third network output, and the third network output and the second network output are recorded as the second quantization network output;
    确定所述第二量化网络输出与所述原始网络输出之间的第二余弦相似度;determining a second cosine similarity between the second quantized network output and the original network output;
    在所述第二余弦相似度大于或等于所述预设相似度阈值时,采用优化查询方法对所述第二量化网络输出进行查询替代处理,以获取所述预设网络输出。When the second cosine similarity is greater than or equal to the preset similarity threshold, an optimized query method is used to perform query substitution processing on the second quantized network output to obtain the preset network output.
  8. 如权利要求7所述的图像模型训练方法,其特征在于,所述预设神经网络模型中还包括特殊处理单元;The image model training method according to claim 7, wherein the preset neural network model further comprises a special processing unit;
    所述检测所述第一余弦相似度是否大于或等于所述预设相似度阈值之后,还包括:After the detecting whether the first cosine similarity is greater than or equal to the preset similarity threshold, the method further includes:
    在所述第一余弦相似度小于所述预设相似度阈值时,采用浮点算子代替所述第二网络输出中计算量低于第二预设计算量的算子,将替换之后的所述第二网络输出记录为第二替代算子;When the first cosine similarity is less than the preset similarity threshold, a floating-point operator is used to replace the operators in the second network output whose calculation amount is lower than the second preset calculation amount, and the second network output after the replacement is recorded as the second substitution operator;
    通过所述特殊处理单元对所述第二替代算子进行计算,得到第四网络输出,并将所述第一网络输出以及所述第四网络输出记录为第三量化网络输出;The second substitution operator is calculated by the special processing unit to obtain the fourth network output, and the first network output and the fourth network output are recorded as the third quantization network output;
    采用预设查询方法查询所述第三量化网络输出中是否存在第一最小算子组合;所述第一最小算子组合指的是第三量化网络输出中的第一最小相似度大于或等于所述预设相似度阈值的所有算子组合中算子数量最少的算子组合;所述第一最小相似度指的是第三量化网络输出中的算子组合与原始网络输出中的预设算子组合之间的余弦相似度;A preset query method is used to query whether a first minimum operator combination exists in the third quantization network output; the first minimum operator combination refers to the operator combination with the fewest operators among all operator combinations whose first minimum similarity is greater than or equal to the preset similarity threshold; the first minimum similarity refers to the cosine similarity between an operator combination in the third quantization network output and the preset operator combination in the original network output;
    在采用预设查询方法查询到第一最小算子组合时,采用第一整型算子代替所述第一网络输出中计算量较大的算子,得到所述第一替代算子;通过所述神经元处理单元对所述第一替代算子进行计算,得到所述第三网络输出,并将所述第三网络输出以及所述第四网络输出记录为第四量化网络输出;When the first minimum operator combination is queried by using the preset query method, the first integer operator is used to replace the operator with a relatively large amount of calculation in the first network output to obtain the first replacement operator; The neuron processing unit calculates the first substitution operator to obtain the third network output, and records the third network output and the fourth network output as the fourth quantization network output;
    确定所述第四量化网络输出与所述原始网络输出之间的第三余弦相似度;determining a third cosine similarity between the fourth quantized network output and the original network output;
    在所述第三余弦相似度大于或等于所述预设相似度阈值时,采用优化查询方法对所述第四量化网络输出进行查询替代处理,以获取所述预设网络输出。When the third cosine similarity is greater than or equal to the preset similarity threshold, an optimized query method is used to perform query substitution processing on the fourth quantized network output to obtain the preset network output.
  9. 如权利要求8所述的图像模型训练方法,其特征在于,所述采用预设查询方法查询所述第三量化网络输出中是否存在第一最小算子组合之后,还包括:The image model training method according to claim 8, wherein after using a preset query method to query whether there is a first minimum operator combination in the output of the third quantization network, the method further comprises:
    在采用预设查询方法未查询到所述第一最小算子组合时,舍弃所述第三量化网络输出,并采用第二整型算子代替所述第一网络输出中激活输出低于预设激活输出阈值的算子,将替换之后的所述第一网络输出记录为第三替代算子;When the first minimum operator combination is not found by the preset query method, the third quantization network output is discarded, and a second integer operator is used to replace the operators in the first network output whose activation output is lower than the preset activation output threshold, and the first network output after the replacement is recorded as the third substitution operator;
    通过所述矢量处理单元对所述第三替代算子进行计算,得到第五网络输出,并将所述第五网络输出以及所述第二网络输出记录为第五量化网络输出;The third substitution operator is calculated by the vector processing unit to obtain the fifth network output, and the fifth network output and the second network output are recorded as the fifth quantization network output;
    采用预设查询方法查询所述第五量化网络输出中是否存在第二最小算子组合;所述第二最小算子组合指的是第五量化网络输出中的第二最小相似度大于或等于所述预设相似度阈值的所有算子组合中算子数量最少的算子组合;所述第二最小相似度指的是第五量化网络输出中的算子组合与原始网络输出中的预设算子组合之间的余弦相似度;A preset query method is used to query whether a second minimum operator combination exists in the fifth quantization network output; the second minimum operator combination refers to the operator combination with the fewest operators among all operator combinations whose second minimum similarity is greater than or equal to the preset similarity threshold; the second minimum similarity refers to the cosine similarity between an operator combination in the fifth quantization network output and the preset operator combination in the original network output;
    在采用预设查询方法查询到第二最小算子组合时,获取所述预设网络输出。When the second minimum operator combination is queried by using the preset query method, the preset network output is acquired.
  10. 如权利要求9所述的图像模型训练方法,其特征在于,所述采用预设查询方法查询所述第五量化网络输出中是否存在第二最小算子组合之后,还包括:The image model training method according to claim 9, wherein after using a preset query method to query whether there is a second minimum operator combination in the output of the fifth quantization network, the method further comprises:
    在采用二分法未查询到所述第二最小算子组合时,采用第二整型算子代替所述第五网络输出中的所有算子,将替代之后的第五量化网络输出记录为第四替代算子;When the second minimum operator combination is not found by using the bisection method, the second integer operator is used to replace all operators in the fifth network output, and the fifth quantization network output after the substitution is recorded as the fourth substitution operator;
    通过所述矢量处理单元对所述第四替代算子进行计算,得到第六网络输出,并将所述第六网络输出以及所述第二网络输出记录为第六量化网络输出;The fourth substitution operator is calculated by the vector processing unit to obtain the sixth network output, and the sixth network output and the second network output are recorded as the sixth quantization network output;
    确定所述第六量化网络输出与所述原始网络输出之间的第四余弦相似度;determining a fourth cosine similarity between the sixth quantized network output and the original network output;
    在所述第四余弦相似度小于所述预设相似度阈值时,采用浮点算子代替所述第二网络输出中计算量低于第二预设计算量的算子,将替代之后的所述第二网络输出记录为所述第二替代算子;When the fourth cosine similarity is smaller than the preset similarity threshold, a floating-point operator is used to replace the operators in the second network output whose calculation amount is lower than the second preset calculation amount, and the second network output after the substitution is recorded as the second substitution operator;
    通过所述特殊处理单元对所述第二替代算子进行计算,得到第四网络输出,并将所述第六网络输出以及所述第四网络输出记录为第七量化网络输出;The second substitution operator is calculated by the special processing unit to obtain the fourth network output, and the sixth network output and the fourth network output are recorded as the seventh quantization network output;
    采用预设查询方法查询所述第七量化网络输出中是否存在第三最小算子组合;Using a preset query method to query whether there is a third minimum operator combination in the output of the seventh quantization network;
    在采用预设查询方法查询到第三最小算子组合时,获取所述预设网络输出;所述第三最小算子组合指的是第七量化网络输出中的第三最小相似度大于或等于所述预设相似度阈值的所有算子组合中算子数量最少的算子组合;所述第三最小相似度指的是第七量化网络输出中的算子组合与原始网络输出中的预设算子组合之间的余弦相似度。When the third minimum operator combination is queried by using the preset query method, the preset network output is obtained; the third minimum operator combination refers to that the third minimum similarity in the output of the seventh quantization network is greater than or equal to The operator combination with the smallest number of operators among all the operator combinations of the preset similarity threshold; the third minimum similarity refers to the operator combination in the output of the seventh quantization network and the preset in the output of the original network Cosine similarity between operator combinations.
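One way to read the "minimum operator combination" query in claims 9–11 is a smallest-first search over operator combinations that returns the first combination whose quantized output clears the similarity threshold, and reports failure (the error branch of claim 11) when none does. The following sketch assumes that reading; `evaluate`, `find_minimal_combination`, and the toy operator names are our illustrative inventions, not terms from the patent:

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||) for two equal-length output vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def find_minimal_combination(operators, evaluate, original_output, threshold):
    """Return the combination with the fewest operators whose output, as
    produced by `evaluate`, has cosine similarity >= threshold with the
    original network output; return None when no combination qualifies."""
    for size in range(1, len(operators) + 1):      # fewest operators first
        for combo in combinations(operators, size):
            if cosine_similarity(evaluate(combo), original_output) >= threshold:
                return combo
    return None                                    # the "not found" branch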
  11. The image model training method according to claim 10, wherein the querying, by using the preset query method, whether the third minimum operator combination exists in the seventh quantization network output further comprises:
    when the third minimum operator combination is not found by using the preset query method, prompting that an error has occurred in the training of the preset neural network model.
  12. An image processing method, comprising:
    acquiring an image to be processed;
    inputting the image to be processed into an image processing model to obtain an image output result corresponding to the image to be processed, wherein the image processing model is obtained by the image model training method according to any one of claims 1 to 11.
  13. An intelligent chip, comprising a storage module, a processing module, and an image processing model that is stored in the storage module and operable on the processing module, wherein the image processing model is obtained by the image model training method according to any one of claims 1 to 11, and the processing module is configured to execute the image processing method according to claim 12 through the image processing model.
  14. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the image model training method according to any one of claims 1 to 11, or implements the image processing method according to claim 12.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the image model training method according to any one of claims 1 to 11, or implements the image processing method according to claim 12.
PCT/CN2021/114801 2020-09-23 2021-08-26 Image model training method, image processing method, chip, device and medium WO2022062828A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011005967.0A CN112287968A (en) 2020-09-23 2020-09-23 Image model training method, image processing method, chip, device and medium
CN202011005967.0 2020-09-23

Publications (1)

Publication Number Publication Date
WO2022062828A1

Family

ID=74422168

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114801 WO2022062828A1 (en) 2020-09-23 2021-08-26 Image model training method, image processing method, chip, device and medium

Country Status (2)

Country Link
CN (1) CN112287968A (en)
WO (1) WO2022062828A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287968A (en) * 2020-09-23 2021-01-29 深圳云天励飞技术股份有限公司 Image model training method, image processing method, chip, device and medium
CN112906883A (en) * 2021-02-04 2021-06-04 云从科技集团股份有限公司 Hybrid precision quantization strategy determination method and system for deep neural network
CN113505774B (en) * 2021-07-14 2023-11-10 众淼创新科技(青岛)股份有限公司 Policy identification model size compression method
CN113657289B (en) * 2021-08-19 2023-08-08 北京百度网讯科技有限公司 Training method and device of threshold estimation model and electronic equipment
CN113780523B (en) * 2021-08-27 2024-03-29 深圳云天励飞技术股份有限公司 Image processing method, device, terminal equipment and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
CN108510083A (en) * 2018-03-29 2018-09-07 国信优易数据有限公司 A kind of neural network model compression method and device
CN110069715A (en) * 2019-04-29 2019-07-30 腾讯科技(深圳)有限公司 A kind of method of information recommendation model training, the method and device of information recommendation
CN110348562A (en) * 2019-06-19 2019-10-18 北京迈格威科技有限公司 The quantization strategy of neural network determines method, image-recognizing method and device
CN110414630A (en) * 2019-08-12 2019-11-05 上海商汤临港智能科技有限公司 The training method of neural network, the accelerated method of convolutional calculation, device and equipment
CN111047049A (en) * 2019-12-05 2020-04-21 北京小米移动软件有限公司 Method, apparatus and medium for processing multimedia data based on machine learning model
WO2020160787A1 (en) * 2019-02-08 2020-08-13 Huawei Technologies Co., Ltd. Neural network quantization method using multiple refined quantized kernels for constrained hardware deployment
CN112287968A (en) * 2020-09-23 2021-01-29 深圳云天励飞技术股份有限公司 Image model training method, image processing method, chip, device and medium

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN115760609A (en) * 2022-11-14 2023-03-07 王育新 Image optimization method and system
CN116543419A (en) * 2023-07-06 2023-08-04 浙江大学金华研究院 Hotel health personnel wearing detection method and system based on embedded platform
CN116543419B (en) * 2023-07-06 2023-11-07 浙江大学金华研究院 Hotel health personnel wearing detection method and system based on embedded platform
CN117893975A (en) * 2024-03-18 2024-04-16 南京邮电大学 Multi-precision residual error quantization method in power monitoring and identification scene
CN117893975B (en) * 2024-03-18 2024-05-28 南京邮电大学 Multi-precision residual error quantization method in power monitoring and identification scene

Also Published As

Publication number Publication date
CN112287968A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
WO2022062828A1 (en) Image model training method, image processing method, chip, device and medium
CN110348562B (en) Neural network quantization strategy determination method, image identification method and device
TWI791610B (en) Method and apparatus for quantizing artificial neural network and floating-point neural network
TWI806922B (en) Method and apparatus for quantizing artificial neural network, and method of quantizing floating-point neural network
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
JP2022541359A (en) Model compression method, image processing method and apparatus
CN109583561B (en) Activation quantity quantification method and device for deep neural network
CN113723589A (en) Hybrid precision neural network
WO2020134819A1 (en) Method for searching face, and related device
CN110751175A (en) Method and device for optimizing loss function, computer equipment and storage medium
US20230385645A1 (en) Method for automatic hybrid quantization of deep artificial neural networks
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
CN115952855A (en) Neural network quantization method, device and equipment
CN113239697B (en) Entity recognition model training method and device, computer equipment and storage medium
CN114239799A (en) Efficient target detection method, device, medium and system
US11507782B2 (en) Method, device, and program product for determining model compression rate
CN111178162A (en) Image recognition method and device, computer equipment and storage medium
CN115438590B (en) Precipitation prediction correction method and system based on BP neural network
CN112052194A (en) Ternary content addressable memory and method of operating the same
US20230075932A1 (en) Dynamic variable quantization of machine learning parameters
CN113610709B (en) Model quantization method, apparatus, electronic device, and computer-readable storage medium
CN117348837A (en) Quantization method and device for floating point precision model, electronic equipment and storage medium
CN113449863A (en) Neural network quantization method based on table lookup
US20210216867A1 (en) Information processing apparatus, neural network computation program, and neural network computation method
CN113177627B (en) Optimization system, retraining system, method thereof, processor and readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21871201

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21871201

Country of ref document: EP

Kind code of ref document: A1