CN113159318A - Neural network quantification method and device, electronic equipment and storage medium - Google Patents

Neural network quantification method and device, electronic equipment and storage medium

Info

Publication number
CN113159318A
CN113159318A (application CN202110444570.XA)
Authority
CN
China
Prior art keywords
model
quantization step
gradient
initial
training sample
Prior art date
Legal status
Pending
Application number
CN202110444570.XA
Other languages
Chinese (zh)
Inventor
刘理
许明恺
杨超
刘凌志
王东
许柯
吕君环
Current Assignee
Beijing Jiaotong University
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Jiaotong University
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University and Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110444570.XA
Publication of CN113159318A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a quantization method, apparatus, electronic device and storage medium for a neural network, and relates to the field of computer technology. The method comprises: acquiring a training sample image and the type of the training sample image, and inputting the training sample image into a first model to obtain a prediction classification result; determining a first gradient according to the prediction classification result, the type of the training sample image, the initial weight and the initial quantization step; updating the initial quantization step based on the first gradient, the initial quantization step and the learning rate corresponding to the first model; and adjusting the weight in the first model based on the updated quantization step to obtain a quantized model. With the method and apparatus, the electronic device can determine the quantized model reasonably and effectively, that is, the prediction accuracy of the quantized model is improved while the quantized model remains small in size.

Description

Neural network quantification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a quantization method and apparatus for a neural network, an electronic device, and a storage medium.
Background
At present, Deep Convolutional Neural Networks (DCNNs) have been widely used in fields such as image classification, speech recognition and target detection. For example, a picture may be input into a neural network model to obtain the category corresponding to the picture.
Because a neural network model occupies large storage resources and requires large computational resources, if the neural network model needs to be deployed (or used) on a mobile device (e.g., a terminal), the neural network model generally needs to be compressed to reduce its size. Specifically, the 32-bit floating-point weights in the original neural network model may be quantized to 8-bit or 4-bit weights; for example, each weight value included in the original neural network may be quantized to one of the two values -1 or 1, so that a quantized neural network model containing only these two values is obtained. Further, an input picture can be predicted by the quantized neural network model.
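The compression described above can be sketched as plain uniform weight quantization: the model keeps small integer codes plus a single floating-point step size. The range-based step size below is an illustrative assumption, not the learned step size of this disclosure:

```python
import numpy as np

def quantize_weights(w, n_bits=4):
    """Uniformly quantize float32 weights to n_bits signed integer codes.

    A generic sketch of low-bit weight quantization; the step size is
    derived from the weight range here, whereas this disclosure learns it.
    """
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4-bit
    qmin = -(2 ** (n_bits - 1))           # e.g. -8 for 4-bit
    step = float(np.abs(w).max()) / qmax  # range-based quantization step
    codes = np.clip(np.round(w / step), qmin, qmax).astype(np.int8)
    return codes, step                    # integer codes + one float

w = np.array([0.31, -0.12, 0.05, -0.44], dtype=np.float32)
codes, step = quantize_weights(w)
w_hat = codes * step                      # dequantized approximation of w
```

Storing the `int8` codes plus one float is what shrinks the model; `w_hat` shows the approximation error introduced by the chosen step size.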
However, such a method of quantizing the original neural network model may not be reasonable enough; that is, the quantized neural network model may not be determined reasonably and effectively, and the prediction result may consequently be inaccurate.
Disclosure of Invention
The disclosure provides a quantization method and apparatus for a neural network, an electronic device and a storage medium, to solve the technical problem in the prior art that the quantization mode is unreasonable and the prediction result of the quantized model is therefore inaccurate.
The technical scheme of the embodiment of the disclosure is as follows:
According to a first aspect of embodiments of the present disclosure, a method for quantizing a neural network is provided. The method can comprise the following steps: acquiring a training sample image and the type of the training sample image, and inputting the training sample image into a first model to obtain a prediction classification result, wherein the initial weight of a network layer in the first model is determined based on an initial quantization step and the weight of the network layer in a reference model; determining a first gradient according to the prediction classification result, the type of the training sample image, the initial weight and the initial quantization step; updating the initial quantization step based on the first gradient, the initial quantization step and the learning rate corresponding to the first model; and adjusting the weight in the first model based on the updated quantization step to obtain a quantized model.
Optionally, the quantization method of the neural network further includes: executing a first operation until the execution times of the first operation reach an operation threshold value to obtain a target model; the first operation includes: acquiring a target image and the type of the target image; respectively inputting the target image into a current model to be processed and the reference model to respectively obtain a first classification result and a second classification result, wherein when the first operation is executed for the first time, the current model to be processed is the quantized model, and when the first operation is not executed for the first time, the current model to be processed is the current model to be processed after the last first operation is executed; determining a second gradient according to the first classification result, the second classification result, the type of the target image, the weight corresponding to the current model to be processed and the quantization step corresponding to the current model to be processed; updating the quantization step corresponding to the current model to be processed based on the second gradient, the quantization step corresponding to the current model to be processed and the learning rate corresponding to the current model to be processed; and adjusting the weight corresponding to the current model to be processed based on the updated quantization step.
Optionally, the determining a first gradient according to the prediction classification result, the type of the training sample image, the initial weight, and the initial quantization step specifically includes: determining a first loss according to the prediction classification result and the type of the training sample image, wherein the first loss is used for representing the difference between the prediction classification result and the type of the training sample image; determining the first gradient according to the first loss, the initial weight and the initial quantization step.
Optionally, the determining the first gradient according to the first loss, the initial weight, and the initial quantization step specifically includes: determining a third gradient and a fourth gradient, wherein the third gradient is the gradient of the first loss to the initial weight, and the fourth gradient is the gradient of the initial weight to the initial quantization step; determining a product of the third gradient and the fourth gradient as the first gradient.
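The product of the third and fourth gradients is simply the chain rule, dL/ds = (dL/dw) · (dw/ds). A toy sketch with a one-weight setup (the squared-error loss, the fixed integer code k, and the linear quantizer w = k · s are illustrative assumptions, not the disclosure's network):

```python
def first_gradient(w, s, k, t):
    """Chain rule dL/ds = (dL/dw) * (dw/ds) for a toy quantizer.

    Assumed setup: quantized weight w = k * s (k a fixed integer code),
    loss L = (w - t)^2 against a target t.
    """
    third = 2.0 * (w - t)   # third gradient: dL/dw for the squared loss
    fourth = float(k)       # fourth gradient: dw/ds when w = k * s
    return third * fourth   # first gradient: dL/ds
```

For k = 3, s = 0.5 (so w = 1.5) and t = 1.0, the analytic derivative of L(s) = (3s - 1)^2 at s = 0.5 is 6(3 · 0.5 - 1) = 3, which the product reproduces.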
Optionally, the quantization method of the neural network further includes: obtaining the reference model; and determining the initial weight according to the initial quantization step and the weight of the network layer in the reference model.
Optionally, the determining the initial weight according to the initial quantization step and the weight of the network layer in the reference model specifically includes: determining that the initial weight satisfies the following formula:
w2 = clamp(round(w1 / s1) · s1, min, max)

wherein w2 represents the initial weight, w1 represents the weight of the network layer in the reference model, s1 represents the initial quantization step, min represents the minimum value of the initial weight, and max represents the maximum value of the initial weight.
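In code, one reading of the formula above looks like the following (the original equation is an image, so the exact clamp placement is an assumption):

```python
import numpy as np

def initial_weight(w1, s1, w_min, w_max):
    """Quantize the reference-model weight w1 onto the grid defined by
    the initial quantization step s1, then clamp the result into
    [w_min, w_max], the minimum and maximum of the initial weight."""
    return float(np.clip(np.round(w1 / s1) * s1, w_min, w_max))

# round(0.37 / 0.1) * 0.1 = 0.4, clamped down to the maximum 0.3
w2 = initial_weight(0.37, 0.1, -0.3, 0.3)
```

The clamp keeps the quantized value inside the representable range, so an out-of-grid reference weight cannot push the initial weight past min or max.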
Optionally, the determining the first loss according to the prediction classification result and the type of the training sample image specifically includes: determining that the first loss satisfies the following equation:
L1 = -(1/n) · Σ_{i=1}^{n} Σ_{c=1}^{C} q_c(X_i′) · log p_c(X_i)

wherein L1 represents the first loss, n represents the number of training sample images, C represents the number of types of the training sample images, p_c(X_i) represents the prediction classification result corresponding to the i-th training sample image, q_c(X_i′) represents the type corresponding to the i-th training sample image, and i ≥ 1.
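The formula is a batch-averaged cross-entropy; a sketch assuming p holds predicted probability distributions and q one-hot true types (both shapes n x C):

```python
import numpy as np

def first_loss(p, q):
    """Cross-entropy averaged over the batch:
    L1 = -(1/n) * sum_i sum_c q_c(X_i') * log p_c(X_i)."""
    n = p.shape[0]
    return float(-np.sum(q * np.log(p)) / n)

p = np.array([[0.9, 0.1], [0.2, 0.8]])  # prediction classification results
q = np.array([[1.0, 0.0], [0.0, 1.0]])  # true types, one-hot
loss = first_loss(p, q)
```

The loss shrinks toward zero as the predicted probability mass concentrates on each image's true type, which is exactly the "difference" the first loss characterizes.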
Optionally, the updating the initial quantization step based on the first gradient, the initial quantization step, and the learning rate corresponding to the first model specifically includes: determining that the updated quantization step size satisfies the following equation:
s2 = s1 - η1 · (∂L1/∂s1)

wherein s2 represents the updated quantization step, s1 represents the initial quantization step, ∂L1/∂s1 represents the first gradient, and η1 represents the learning rate corresponding to the first model.
According to a second aspect of the embodiments of the present disclosure, there is provided a quantization apparatus of a neural network. The apparatus may include: the device comprises an acquisition module, a processing module and a determination module; the acquisition module is configured to acquire a training sample image and a type of the training sample image; the processing module is configured to input the training sample image into a first model, and obtain a prediction classification result, wherein the initial weight of a network layer in the first model is determined based on an initial quantization step and the weight of the network layer in a reference model; the determining module is configured to determine a first gradient according to the prediction classification result, the type of the training sample image, the initial weight and the initial quantization step; the processing module is further configured to update the initial quantization step based on the first gradient, the initial quantization step and a learning rate corresponding to the first model; the processing module is further configured to adjust weights in the first model based on the updated quantization step size to obtain a quantized model.
Optionally, the processing module is further configured to execute a first operation until the number of execution times of the first operation reaches an operation threshold, so as to obtain a target model; the first operation includes the following S1-S5: s1, acquiring a target image and the type of the target image; s2, inputting the target image into a current model to be processed and the reference model respectively to obtain a first classification result and a second classification result respectively, wherein when the first operation is executed for the first time, the current model to be processed is the quantized model, and when the first operation is not executed for the first time, the current model to be processed is the current model to be processed after the last first operation is executed; s3, determining a second gradient according to the first classification result, the second classification result, the type of the target image, the weight corresponding to the current model to be processed and the quantization step size corresponding to the current model to be processed; s4, updating the quantization step corresponding to the current model to be processed based on the second gradient, the quantization step corresponding to the current model to be processed and the learning rate corresponding to the current model to be processed; and S5, adjusting the weight corresponding to the current model to be processed based on the updated quantization step size.
Optionally, the determining module is further configured to determine a first loss according to the prediction classification result and the type of the training sample image, wherein the first loss is used for characterizing a difference between the prediction classification result and the type of the training sample image; the determination module is further configured to determine the first gradient according to the first loss, the initial weight, and the initial quantization step.
Optionally, the determining module is further configured to determine a third gradient and a fourth gradient, the third gradient being a gradient of the first loss to the initial weight, the fourth gradient being a gradient of the initial weight to the initial quantization step; the determination module is further configured to determine a product of the third gradient and the fourth gradient as the first gradient.
Optionally, the obtaining module is further configured to obtain the reference model; the determination module is further configured to determine the initial weight according to the initial quantization step and a weight of a network layer in the reference model.
Optionally, the determining module is further configured to determine that the initial weight satisfies the following formula:
w2 = clamp(round(w1 / s1) · s1, min, max)

wherein w2 represents the initial weight, w1 represents the weight of the network layer in the reference model, s1 represents the initial quantization step, min represents the minimum value of the initial weight, and max represents the maximum value of the initial weight.
Optionally, the determining module is further configured to determine that the first loss satisfies the following formula:
L1 = -(1/n) · Σ_{i=1}^{n} Σ_{c=1}^{C} q_c(X_i′) · log p_c(X_i)

wherein L1 represents the first loss, n represents the number of training sample images, C represents the number of types of the training sample images, p_c(X_i) represents the prediction classification result corresponding to the i-th training sample image, q_c(X_i′) represents the type corresponding to the i-th training sample image, and i ≥ 1.
Optionally, the determining module is further configured to determine that the updated quantization step size satisfies the following formula:
s2 = s1 - η1 · (∂L1/∂s1)

wherein s2 represents the updated quantization step, s1 represents the initial quantization step, ∂L1/∂s1 represents the first gradient, and η1 represents the learning rate corresponding to the first model.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, which may include: a processor and a memory configured to store processor-executable instructions; wherein the processor is configured to execute the instructions to implement the quantization method of a neural network according to the first aspect or any optional implementation thereof.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having instructions stored thereon, which, when executed by an electronic device, enable the electronic device to perform the quantization method of a neural network according to the first aspect or any optional implementation thereof.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the quantization method of a neural network according to the first aspect or any optional implementation thereof.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
based on any one of the above aspects, in the present disclosure, the electronic device may obtain a training sample image and a type of the training sample image, and input the training sample image to the first model to obtain a prediction classification result; then, the electronic device determines a first gradient based on the prediction classification result, the type of the training sample image, the initial weight, and the initial quantization step; and the electronic device updates the initial quantization step based on the first gradient, the initial quantization step and the learning rate corresponding to the first model, and adjusts the weight in the first model based on the updated quantization step to obtain a quantized model. In the embodiment of the disclosure, the electronic device may automatically update the initial quantization step size based on the classification result (i.e., the predicted classification result) of the first model and the real classification result (i.e., the type of the training sample image), and then adjust (or update) the weight in the first model to obtain the quantized model, which can reasonably and effectively determine the quantized model, i.e., improve the prediction accuracy of the quantized model while ensuring that the quantized model has a small volume.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a schematic flow chart illustrating a quantization method of a neural network according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart illustrating a quantization method of a neural network provided by an embodiment of the present disclosure;
fig. 3 is a schematic flow chart diagram illustrating a quantization method of a neural network provided by an embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a quantization method of a neural network provided by an embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a quantization method of a neural network according to an embodiment of the present disclosure;
fig. 6 is a flowchart illustrating a quantization method of a neural network according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram illustrating a quantization apparatus of a neural network according to an embodiment of the present disclosure;
fig. 8 shows a schematic structural diagram of a quantization apparatus of a neural network provided in an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.
The data to which the present disclosure relates may be data that is authorized by a user or sufficiently authorized by parties.
As described in the background, in the prior art, quantization of the original neural network model may not determine the quantized neural network model accurately and reasonably, and thus the prediction result (of the quantized neural network model) may be inaccurate.
Based on this, the embodiment of the present disclosure provides a quantization method for a neural network, where an electronic device may obtain a training sample image and a type of the training sample image, and input the training sample image to a first model to obtain a prediction classification result; then, the electronic device determines a first gradient based on the prediction classification result, the type of the training sample image, the initial weight, and the initial quantization step; and the electronic device updates the initial quantization step based on the first gradient, the initial quantization step and the learning rate corresponding to the first model, and adjusts the weight in the first model based on the updated quantization step to obtain a quantized model. In the embodiment of the disclosure, the electronic device may automatically update the initial quantization step size based on the classification result (i.e., the predicted classification result) of the first model and the real classification result (i.e., the type of the training sample image), and then adjust (or update) the weight in the first model to obtain the quantized model, which can reasonably and effectively determine the quantized model, i.e., improve the prediction accuracy of the quantized model while ensuring that the quantized model has a small volume.
The neural network quantization method, the neural network quantization device, the electronic device and the storage medium provided by the embodiment of the disclosure are applied to a scene in which a certain neural network model needs to be quantized. When the electronic device acquires the first model, the training sample image, and the type of the training sample image, a quantized model may be obtained according to the method provided by the embodiment of the present disclosure.
The quantization method of the neural network provided by the embodiment of the present disclosure is exemplarily described below with reference to the accompanying drawings:
it is understood that the electronic device executing the quantization method of the neural network provided by the embodiment of the present disclosure may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR), a Virtual Reality (VR) device, and the like, which may install and use a content community application (e.g., a fast hand), and the present disclosure does not particularly limit the specific form of the electronic device. The system can be used for man-machine interaction with a user through one or more modes of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction or handwriting equipment and the like.
As shown in fig. 1, a quantization method of a neural network provided by an embodiment of the present disclosure may include S101-S104.
S101, the electronic device acquires a training sample image and the type of the training sample image, inputs the training sample image into a first model, and obtains a prediction classification result.
Wherein the initial weight of the network layer in the first model is determined based on the initial quantization step and the weight of the network layer in the reference model.
It should be understood that there may be a plurality of training sample images. In the embodiment of the present disclosure, the electronic device may obtain the plurality of training sample images and the respective types of the plurality of training sample images; that is, different training sample images may correspond to different types, and the type of a training sample image may be understood as the real classification result corresponding to that training sample image.
It is understood that the above reference model is a neural network model which is trained and has high prediction precision (or accuracy). The initial quantization step is a quantization step that quantizes the weights of the network layers in the reference model to the initial weights. The reference model may include a plurality of weights, and the weight of the network layer is one of the plurality of weights, and in the embodiment of the disclosure, the electronic device may determine an initial weight based on the one of the plurality of weights and the initial quantization step, the initial weight being one of the plurality of weights included in the first model.
In the embodiment of the present disclosure, the first model may classify an input image, and specifically, when the electronic device inputs a training sample image to the first model, a classification result of the training sample image in the first model may be obtained.
S102, the electronic device determines a first gradient according to the prediction classification result, the type of the training sample image, the initial weight and the initial quantization step.
In combination with the description of the above embodiments, it should be understood that the predicted classification result is a classification result of a training sample image in a first model, the type of the training sample image is a real classification result corresponding to the training sample image, the initial weight is a weight included in the first model, and the initial quantization step is a quantization step for quantizing a weight of a network layer in a reference model into the initial weight. In an embodiment of the disclosure, the electronic device may determine the first gradient according to the prediction classification result, the type of the training sample image, the initial weight, and the initial quantization step.
S103, the electronic device updates the initial quantization step size based on the first gradient, the initial quantization step size and the learning rate corresponding to the first model.
It should be understood that, after determining the first gradient, the electronic device needs to update the initial quantization step size by combining the initial quantization step size and the learning rate corresponding to the first model to obtain an updated quantization step size.
S104, the electronic device adjusts the weight in the first model based on the updated quantization step size to obtain a quantized model.
In connection with the above description of the embodiments, it is to be understood that a plurality of weights may be included in the first model, the initial weight being one of the plurality of weights. In the embodiments of the present disclosure, the electronic device may adjust a weight in the first model based on the updated quantization step, for example, may adjust (or update) the initial weight to a weight included in the quantized model. Furthermore, the electronic device may determine each weight included in the quantized model, i.e., obtain the quantized model.
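Putting S101-S104 together on a deliberately tiny one-weight "model" makes the flow concrete. The squared-error loss, the straight-through-style gradient and all names below are illustrative assumptions, not the disclosure's exact network:

```python
import numpy as np

def quantize(w1, s):
    # Derive the working weight from the reference weight and step size
    return np.round(w1 / s) * s

def train_step(w1, s, target, lr):
    """One pass of S101-S104 on a one-parameter toy model:
    the 'prediction' is the quantized weight, the loss is squared error."""
    w = quantize(w1, s)                 # S101: forward with the initial weight
    loss = (w - target) ** 2            # S102: loss from prediction vs. type
    dL_dw = 2.0 * (w - target)          # S102: third gradient (loss w.r.t. weight)
    dw_ds = np.round(w1 / s)            # S102: fourth gradient (weight w.r.t. step)
    dL_ds = dL_dw * dw_ds               # S102: first gradient (their product)
    s_new = s - lr * dL_ds              # S103: gradient-descent step-size update
    w_new = quantize(w1, s_new)         # S104: re-quantize with the updated step
    return s_new, w_new, loss

s_new, w_new, loss = train_step(w1=0.37, s=0.1, target=0.3, lr=0.01)
```

Starting from step 0.1, the quantized weight 0.4 overshoots the target 0.3, so the step shrinks to 0.092 and the re-quantized weight 0.368 moves closer to the target.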
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S101-S104, the electronic device can acquire a training sample image and the type of the training sample image, and input the training sample image into a first model to obtain a prediction classification result; then, the electronic device determines a first gradient based on the prediction classification result, the type of the training sample image, the initial weight, and the initial quantization step; and the electronic device updates the initial quantization step based on the first gradient, the initial quantization step and the learning rate corresponding to the first model, and adjusts the weight in the first model based on the updated quantization step to obtain a quantized model. In the embodiment of the disclosure, the electronic device may automatically update the initial quantization step based on the classification result of the first model (i.e., the predicted classification result) and the real classification result (i.e., the type of the training sample image), and then adjust (or update) the weight in the first model to obtain the quantized model. This determines the quantized model reasonably and effectively, i.e., improves the prediction accuracy of the quantized model while keeping the quantized model small in size.
With reference to fig. 1, as shown in fig. 2, in an implementation manner of the embodiment of the present disclosure, the S102 specifically includes S1021-S1022.
And S1021, the electronic equipment determines a first loss according to the prediction classification result and the type of the training sample image.
Wherein the first loss is used to characterize a difference between the prediction classification result and the type of the training sample image.
It should be understood that there may be some difference between the classification result of the training sample image in the first model and the real classification result of the training sample image. In the embodiment of the disclosure, the electronic device may determine this difference (i.e., the first loss) based on the corresponding classification result in the first model and the real classification result.
S1022, the electronic device determines a first gradient according to the first loss, the initial weight, and the initial quantization step.
It is to be understood that, after determining the first loss, the electronic device may determine a first gradient by combining the initial weight and the initial quantization step, where the first gradient may be understood as a gradient of the first loss to the initial quantization step.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1021-S1022, the electronic device may determine a first loss according to the prediction classification result and the type of the training sample image, and determine a first gradient according to the first loss, the initial weight, and the initial quantization step. In the embodiment of the disclosure, the electronic device may determine a difference (i.e., a first loss) between a classification result of a training sample image in a first model and a real classification result of the training sample image, and then determine a first gradient based on the first loss, an initial weight, and an initial quantization step size, and may accurately determine the first gradient, thereby improving an update efficiency of the initial quantization step size.
In an implementation manner of the embodiment of the present disclosure, the S1021 specifically includes:
determining that the first loss satisfies the following equation:
L_1 = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c} q_c(X_i')\log p_c(X_i)

wherein L_1 represents the first loss, n represents the number of training sample images, the summation over c runs over the types of the training sample images, p_c(X_i) represents the prediction classification result corresponding to the i-th training sample image included in the prediction classification result, q_c(X_i') represents the type corresponding to the i-th training sample image in the types of the training sample images, and i ≥ 1.
It should be understood that the training sample images may be divided into types, and each (e.g., the first) training sample image may correspond to a predictive classification result (i.e., the corresponding classification result in the first model) and a type (i.e., the true classification result), respectively.
Alternatively, the above p_c(X_i) and q_c(X_i') can be output in softmax form, i.e.

p_c(X_i) = \frac{\exp(X_i)}{\sum_{n}\exp(X_n)}

wherein X_i can be understood as the prediction classification result corresponding to the i-th training sample image included in the prediction classification result, p_c(X_i) represents the softmax output value corresponding to the prediction classification result, \exp(X_i) represents e raised to the power X_i, and \sum_{n}\exp(X_n) represents e^{X_1} + e^{X_2} + \ldots + e^{X_n}, with n ≥ 1. Similarly,

q_c(X_i') = \frac{\exp(X_i')}{\sum_{n}\exp(X_n')}
the technical scheme provided by the embodiment can at least bring the following beneficial effects: the electronic device may determine the first loss according to a classification result and a real classification result of each training sample image in the plurality of training sample images in the first model. The determination efficiency of the first loss can be improved, and the determination efficiency of the first gradient is further improved.
With reference to fig. 2, as shown in fig. 3, in an implementation manner of the embodiment of the present disclosure, the S1022 specifically includes S1022a-S1022 b.
S1022a, the electronic device determines a third gradient and a fourth gradient.
Wherein the third gradient is a gradient of the first loss to the initial weight, and the fourth gradient is a gradient of the initial weight to the initial quantization step.
S1022b, the electronics determine a product of the third gradient and the fourth gradient as the first gradient.
In connection with the above description of the embodiments, it is to be understood that the first gradient is a gradient of the first loss versus the initial quantization step.
In one implementation of the disclosed embodiment, the electronic device may determine the first gradient through a gradient estimator, such as a Straight Through Estimator (STE).
Specifically, the first gradient may satisfy the following formula:
\frac{\partial L_1}{\partial s_1} = \frac{\partial L_1}{\partial w_2} \cdot \frac{\partial w_2}{\partial s_1}

wherein L_1 denotes the first loss, s_1 denotes the initial quantization step, w_2 denotes the initial weight, w_1 denotes the weight of the network layer in the reference model, \partial L_1/\partial s_1 denotes the first gradient, \partial L_1/\partial w_2 denotes the third gradient, and \partial w_2/\partial s_1 denotes the fourth gradient.
It should be understood that the rounding in the quantized network model (including the first model) is not differentiable. In the disclosed embodiments, the STE is therefore introduced, i.e., the gradient taken with respect to the weight of the reference model is used to approximate the third gradient \partial L_1/\partial w_2 and the fourth gradient \partial w_2/\partial s_1, and the first gradient is then determined.
In one case, the fourth gradient satisfies the following equation:
\frac{\partial w_2}{\partial s_1} =
\begin{cases}
-\dfrac{w_1}{s_1} + \left\lfloor \dfrac{w_1}{s_1} \right\rceil, & \text{min}_1 < \dfrac{w_1}{s_1} < \text{max}_1 \\[4pt]
\text{min}_1, & \dfrac{w_1}{s_1} \le \text{min}_1 \\[4pt]
\text{max}_1, & \dfrac{w_1}{s_1} \ge \text{max}_1
\end{cases}

It should be understood that the notation ⌊·⌉ denotes rounding to the nearest integer; for example, a value of 3.4 is rounded to 3 and a value of 3.6 is rounded to 4. The above min_1 and max_1 respectively represent the minimum and maximum values of w_1/s_1.
In another case, since the first loss (i.e., L_1) and the weight of the network layer in the above reference model (i.e., w_1) are known, the third gradient may be determined (or approximated) based on the first loss and the weight, and the product of the third gradient and the fourth gradient may be determined as the first gradient.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: from S1022a-S1022b, the electronic device may determine a third gradient and a fourth gradient, and determine the product of the third gradient and the fourth gradient as the first gradient. In the embodiment of the disclosure, the electronic device may determine the gradient of the first loss to the initial quantization step based on the gradient of the first loss to the initial weight and the gradient of the initial weight to the initial quantization step, and may accurately and reasonably determine the first gradient, thereby improving the quantization efficiency of the neural network model.
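The piecewise fourth gradient with its saturation at min_1 and max_1 can be sketched as follows (a hedged NumPy illustration; `step_size_gradient`, `qmin`, and `qmax` are hypothetical names standing in for the fourth gradient, min_1, and max_1):

```python
import numpy as np

def step_size_gradient(w1, s1, qmin, qmax):
    # Gradient of the quantized weight w2 with respect to the step size s1
    # (the "fourth gradient"), computed elementwise per weight. Inside the
    # range it is -w1/s1 + round(w1/s1); at saturation it is qmin or qmax.
    v = w1 / s1
    return np.where(v <= qmin, qmin,
           np.where(v >= qmax, qmax,
                    -v + np.round(v)))

w1 = np.array([-3.0, 0.4, 2.6, 9.0])
g = step_size_gradient(w1, s1=1.0, qmin=-4, qmax=3)
```

Note that the in-range term is bounded by ±0.5, so most of the gradient signal comes from weights near the clipping boundaries.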
With reference to fig. 1, as shown in fig. 4, in an implementation manner of the embodiment of the present disclosure, S105-S106 may be further included before S101.
S105, the electronic equipment acquires a reference model.
In connection with the above description of the embodiments, it should be understood that the reference model is a neural network model that has been trained to be highly accurate (or accurate) in prediction.
S106, the electronic equipment determines an initial weight according to the initial quantization step and the weight of the network layer in the reference model.
It should be understood that, when the electronic device obtains the reference model, that is, multiple weights included in the reference model may be obtained, and then the initial weight may be determined according to a certain weight and the initial quantization step size, and similarly, other weights included in the first model may also be determined based on the same manner.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S105-S106, the electronic device may obtain a reference model and determine an initial weight based on the initial quantization step size and the weights of the network layers in the reference model. In the embodiment of the disclosure, the electronic device may determine the initial weight based on the weights included in the trained neural network model with higher prediction accuracy and the corresponding quantization step (for example, the initial quantization step), and may further determine all the weights included in the first model, that is, may obtain the first model, and may improve the determination efficiency of the first model, and further improve the determination efficiency of the quantized model, and improve user experience.
In an implementation manner of the embodiment of the present disclosure, the step S106 specifically includes:
determining that the initial weight satisfies the following formula:
w_2 = \left\lfloor \mathrm{clip}\!\left(\frac{w_1}{s_1}, \text{min}, \text{max}\right) \right\rceil \times s_1

wherein w_2 represents the initial weight, w_1 represents the weight of the network layer in the reference model, s_1 represents the initial quantization step, min represents the minimum value of the initial weight, and max represents the maximum value of the initial weight.
It should be understood that clip() (i.e., the clip function) is used to limit w_1/s_1 to the range [min, max], and round() (i.e., the round function) has the same effect as the notation ⌊·⌉ above, i.e., rounding to the nearest integer.
Optionally, the electronic device may also determine (or quantify) an activation value (or input) of the model, i.e., quantify the training sample image, based on the above formula for determining the initial weight. Specifically, in S101, the electronic device may further quantize the training sample image, and input the quantized training sample image to the first model to obtain the prediction classification result.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: the electronic equipment can determine the initial weight based on the initial quantization step length and the weight of the network layer in the reference model and by combining the clip function and the round function, the initial weight can be reasonably and accurately determined, and then all the weights in the first model are determined, and the accuracy of determining the quantized model is further improved.
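A minimal sketch of the initial-weight formula, assuming NumPy; the names `quantize_weight`, `wmin`, and `wmax` are hypothetical stand-ins for min and max:

```python
import numpy as np

def quantize_weight(w1, s1, wmin, wmax):
    # w2 = round(clip(w1 / s1, min, max)) * s1: scale by the step size,
    # clamp to the quantization range, round to the nearest integer level,
    # and scale back.
    return np.round(np.clip(w1 / s1, wmin, wmax)) * s1

w1 = np.array([-1.3, 0.24, 0.9, 5.0])
w2 = quantize_weight(w1, s1=0.25, wmin=-4, wmax=3)
```

Every entry of the result is an integer multiple of the step size, which is what allows the quantized model to store each weight as a small integer plus one shared scale.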
In an implementation manner of the embodiment of the present disclosure, the step S103 specifically includes:
determining that the updated quantization step size satisfies the following formula:
s_2 = s_1 - \eta_1 \times \frac{\partial L_1}{\partial s_1}

wherein s_2 represents the updated quantization step, s_1 represents the initial quantization step, \partial L_1/\partial s_1 represents the first gradient, and \eta_1 represents the learning rate corresponding to the first model.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: the electronic device can determine an updated quantization step based on the initial quantization step, the learning rate corresponding to the first model, the first gradient and the formula, and can reasonably and effectively update the related quantization step, thereby reasonably adjusting the weight in the first model.
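The update rule in S103 is plain gradient descent on the step size; a one-line sketch with a hypothetical function name:

```python
def update_step_size(s1, grad, lr):
    # s2 = s1 - eta1 * (dL1/ds1): gradient-descent update of the step size.
    return s1 - lr * grad

s2 = update_step_size(s1=0.25, grad=0.4, lr=0.01)  # 0.25 - 0.01 * 0.4
```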
With reference to fig. 1, as shown in fig. 5, in an implementation manner of the embodiment of the present disclosure, after S104 described above, the method for quantizing a neural network provided by the embodiment of the present disclosure may further include S107.
S107, the electronic equipment executes the first operation until the execution times of the first operation reach an operation threshold value, and a target model is obtained.
Specifically, as shown in FIG. 6, the first operation includes S1-S5.
S1, the electronic equipment acquires the target image and the type of the target image.
It should be understood that the type of the target image is the real classification result corresponding to the target image.
And S2, the electronic equipment respectively inputs the target image into the current model to be processed and the reference model to respectively obtain a first classification result and a second classification result.
When the first operation is executed for the first time, the current model to be processed is the quantized model. When the first operation is not executed for the first time, the current model to be processed is the current model to be processed after the last first operation is executed.
It is understood that the current model to be processed may be the quantized model in S104, or may be a model obtained by continuously quantizing the quantized model (i.e., a quantized model obtained after performing at least one first operation). The first classification result is a prediction result corresponding to the target image in the current model to be processed, the second classification result is a prediction result corresponding to the target image in the reference model, and the reference model is a trained neural network model with higher prediction precision (or accuracy).
And S3, the electronic equipment determines a second gradient according to the first classification result, the second classification result, the type of the target image, the weight corresponding to the current model to be processed and the quantization step corresponding to the current model to be processed.
In an implementation manner of the embodiment of the present disclosure, the step S3 may specifically include a step a to a step D.
And step A, the electronic equipment determines a second loss according to the first classification result and the type of the target image.
Wherein the second loss is used to characterize a difference between the first classification result and the type of the target image.
And step B, the electronic equipment determines a third loss according to the first classification result and the second classification result.
Wherein the third penalty is used to characterize a difference between the first classification result and the second classification result.
It should be noted that the process of determining the second loss and the third loss by the electronic device is the same as or similar to the process of determining the first loss, and for the description of determining the second loss and the third loss, reference may be made to the explanation of determining the first loss, and details are not repeated here.
And step C, the electronic equipment determines a target loss based on the second loss and the third loss.
In one implementation of the disclosed embodiment, the target loss satisfies the following equation:
L = w_3 × L_2 + w_4 × L_3

wherein L represents the target loss, w_3 represents the weight parameter of the second loss, L_2 represents the second loss, w_4 represents the weight parameter of the third loss, L_3 represents the third loss, w_3 > 0, and w_4 > 0.
It is to be understood that the electronic device may assign a weight parameter to each of the second loss and the third loss, and then determine the target loss based on the two losses and the two weight parameters.
Alternatively, both w_3 and w_4 may be 0.5.
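A sketch of the target-loss combination with the optional equal weights (the function name is an assumption):

```python
def target_loss(l2, l3, w3=0.5, w4=0.5):
    # L = w3 * L2 + w4 * L3, with both weight parameters required positive.
    assert w3 > 0 and w4 > 0
    return w3 * l2 + w4 * l3

loss = target_loss(l2=0.8, l3=0.2)  # equal weighting of both losses
```

With equal weights, the classification loss against the true type and the distillation-style loss against the reference model's output contribute equally to the gradient.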
And D, the electronic equipment determines a second gradient according to the target loss, the weight corresponding to the current model to be processed and the quantization step corresponding to the current model to be processed.
It should be noted that the process of determining the second gradient by the electronic device is the same as or similar to the process of determining the first gradient, and for explanation of determining the second gradient by the electronic device, reference may be made to the description of determining the first gradient by the electronic device, and details are not described here again.
And S4, the electronic device updates the quantization step corresponding to the current model to be processed based on the second gradient, the quantization step corresponding to the current model to be processed and the learning rate corresponding to the current model to be processed.
And S5, the electronic equipment adjusts the weight corresponding to the current model to be processed based on the updated quantization step size.
It is understood that the processes in S4-S5 are the same as or similar to the processes in S103-S104, and are not described in detail herein.
It should be understood that, the electronic device performs the above-mentioned first operation once, that is, updates the quantization step corresponding to the current model to be processed once, and adjusts the weight corresponding to the current model to be processed once, so as to complete the quantization process of the current model to be processed once. When the number of times of execution of the first operation reaches the operation threshold (for example, 100 times), it is indicated that the current model to be processed is quantized (or iterated) 100 times from the quantization of the first time, and thus, the target quantization model, i.e., the target model, can be obtained.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as known from S107, the electronic device may execute the first operation until the number of times of executing the first operation reaches the operation threshold, so as to obtain the target model. The first operation comprises: the electronic equipment acquires a target image and the type of the target image, and inputs the target image into a current model to be processed and a reference model respectively to obtain a first classification result and a second classification result respectively; the electronic equipment determines a second gradient according to the first classification result, the second classification result, the type of the target image, the weight corresponding to the current model to be processed and the quantization step corresponding to the current model to be processed; then, the electronic equipment updates the quantization step corresponding to the current model to be processed based on the second gradient, the quantization step corresponding to the current model to be processed and the learning rate corresponding to the current model to be processed; and adjusts the weight corresponding to the current model to be processed based on the updated quantization step.
In the embodiment of the disclosure, the electronic device may automatically update the quantization step corresponding to the current model to be processed based on the prediction result of the target image in the model to be processed (i.e., the first classification result), the prediction result of the target image in the reference model (i.e., the second classification result), the true classification result corresponding to the target image (i.e., the type of the target image), and the like, and adjust (or update) the weight corresponding to the current model to be processed, thereby completing the relevant quantization process when the first operation is executed (or iterated) to the operation threshold, and obtaining the target model. The quantized target model can be accurately determined, namely the volume of the model can be reasonably and effectively reduced, and the prediction precision of the target model is improved.
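For illustration only, the repeated first operation can be sketched end-to-end on a toy problem (NumPy; a squared-error loss against the reference weights stands in for the classification and distillation losses described above, and all names are hypothetical):

```python
import numpy as np

def run_first_operations(w_ref, s, lr=0.01, threshold=100, qmin=-4, qmax=3):
    # Repeat the first operation until the operation threshold is reached:
    # quantize, measure a stand-in loss against the reference weights,
    # update the quantization step, and re-quantize the weights.
    for _ in range(threshold):
        v = np.clip(w_ref / s, qmin, qmax)
        w_q = np.round(v) * s                       # current quantized weights
        d_loss_dwq = 2.0 * (w_q - w_ref)            # gradient of the toy loss
        d_wq_ds = np.where(v <= qmin, qmin,         # fourth-gradient analogue
                  np.where(v >= qmax, qmax, -v + np.round(v)))
        s = s - lr * np.sum(d_loss_dwq * d_wq_ds)   # S4: update the step size
    w_q = np.round(np.clip(w_ref / s, qmin, qmax)) * s  # S5: final weights
    return s, w_q

w_ref = np.array([-0.9, 0.3, 0.55, 1.2])
s_final, w_final = run_first_operations(w_ref, s=0.5)
```

In the patent's setting the loss gradient would come from the target image's classification results rather than from the weights directly, but the loop structure — one step-size update and one weight adjustment per operation, repeated up to the operation threshold — is the same.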
It is understood that, in practical implementation, the electronic device according to the embodiments of the present disclosure may include one or more hardware structures and/or software modules for implementing the quantization method of the corresponding neural network, and these hardware structures and/or software modules may constitute an electronic device. Those of skill in the art will readily appreciate that the present disclosure can be implemented in hardware or a combination of hardware and computer software for implementing the exemplary algorithm steps described in connection with the embodiments disclosed herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
Based on such understanding, the embodiment of the present disclosure further provides a quantization apparatus of a neural network, and fig. 7 illustrates a schematic structural diagram of the quantization apparatus of the neural network provided by the embodiment of the present disclosure. As shown in fig. 7, the quantization apparatus 10 of the neural network may include: an acquisition module 101, a processing module 102 and a determination module 103.
An acquisition module 101 configured to acquire a training sample image and a type of the training sample image.
And the processing module 102 is configured to input the training sample image into a first model, and obtain a prediction classification result, wherein the initial weight of the network layer in the first model is determined based on the initial quantization step size and the weight of the network layer in the reference model.
A determining module 103 configured to determine a first gradient according to the prediction classification result, the type of the training sample image, the initial weight, and the initial quantization step.
The processing module 102 is further configured to update the initial quantization step based on the first gradient, the initial quantization step, and a learning rate corresponding to the first model.
The processing module 102 is further configured to adjust weights in the first model based on the updated quantization step to obtain a quantized model.
Optionally, the processing module 102 is further configured to execute the first operation until the number of times of executing the first operation reaches an operation threshold, so as to obtain the target model. The first operation includes the following S1-S5:
s1, acquiring the target image and the type of the target image.
S2, inputting the target image into the current to-be-processed model and the reference model respectively, and obtaining a first classification result and a second classification result respectively, where the current to-be-processed model is the quantized model when the first operation is performed for the first time, and the current to-be-processed model is the current to-be-processed model after the last first operation is performed when the first operation is not performed for the first time.
S3, determining a second gradient according to the first classification result, the second classification result, the type of the target image, the weight corresponding to the current model to be processed and the quantization step corresponding to the current model to be processed.
S4, updating the quantization step corresponding to the current model to be processed based on the second gradient, the quantization step corresponding to the current model to be processed, and the learning rate corresponding to the current model to be processed.
And S5, adjusting the weight corresponding to the current model to be processed based on the updated quantization step size.
Optionally, the determining module 103 is further configured to determine a first loss according to the prediction classification result and the type of the training sample image, wherein the first loss is used for characterizing a difference between the prediction classification result and the type of the training sample image.
A determining module 103, further configured to determine the first gradient according to the first loss, the initial weight and the initial quantization step.
Optionally, the determining module 103 is further configured to determine a third gradient and a fourth gradient, the third gradient being a gradient of the first loss to the initial weight, the fourth gradient being a gradient of the initial weight to the initial quantization step.
A determination module 103 further configured to determine a product of the third gradient and the fourth gradient as the first gradient.
Optionally, the obtaining module 101 is further configured to obtain the reference model.
The determining module 103 is further configured to determine the initial weight according to the initial quantization step and the weight of the network layer in the reference model.
Optionally, the determining module 103 is further configured to determine that the initial weight satisfies the following formula:
w_2 = \left\lfloor \mathrm{clip}\!\left(\frac{w_1}{s_1}, \text{min}, \text{max}\right) \right\rceil \times s_1

wherein w_2 represents the initial weight, w_1 represents the weight of the network layer in the reference model, s_1 represents the initial quantization step, min represents the minimum value of the initial weight, and max represents the maximum value of the initial weight.
Optionally, the determining module 103 is further configured to determine that the first loss satisfies the following formula:
L_1 = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c} q_c(X_i')\log p_c(X_i)

wherein L_1 represents the first loss, n represents the number of training sample images, the summation over c runs over the types of the training sample images, p_c(X_i) represents the prediction classification result corresponding to the i-th training sample image included in the prediction classification result, q_c(X_i') represents the type corresponding to the i-th training sample image in the types of the training sample images, and i ≥ 1.
Optionally, the determining module 103 is further configured to determine that the updated quantization step size satisfies the following formula:
s_2 = s_1 - \eta_1 \times \frac{\partial L_1}{\partial s_1}

wherein s_2 represents the updated quantization step, s_1 represents the initial quantization step, \partial L_1/\partial s_1 represents the first gradient, and \eta_1 represents the learning rate corresponding to the first model.
As described above, the present disclosure may perform functional block division on a quantization apparatus of a neural network according to the above method example. The integrated module can be realized in a hardware form, and can also be realized in a software functional module form. In addition, it should be further noted that the division of the modules in the embodiments of the present disclosure is schematic, and is only a logic function division, and there may be another division manner in actual implementation. For example, the functional blocks may be divided for the respective functions, or two or more functions may be integrated into one processing block.
The specific manner in which each module performs the operation and the beneficial effects of the quantization apparatus for neural network in the foregoing embodiments have been described in detail in the foregoing method embodiments, and are not described again here.
Fig. 8 is a schematic structural diagram of another quantization apparatus of a neural network provided by the present disclosure. As shown in fig. 8, the quantization apparatus 20 of the neural network may include at least one processor 201 and a memory 203 for storing processor-executable instructions. Wherein the processor 201 is configured to execute the instructions in the memory 203 to implement the quantization method of the neural network in the above-described embodiments.
In addition, the quantization device 20 of the neural network may further include a communication bus 202 and at least one communication interface 204.
The processor 201 may be a Central Processing Unit (CPU), a micro-processing unit, an ASIC, or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication bus 202 may include a path that conveys information between the aforementioned components.
The communication interface 204 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The memory 203 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may be self-contained and connected to the processing unit by a bus. The memory may also be integrated with the processing unit.
The memory 203 is used for storing instructions for executing the disclosed solution, and is controlled by the processor 201. The processor 201 is configured to execute instructions stored in the memory 203 to implement the functions of the disclosed method.
In particular implementations, as an embodiment, the processor 201 may include one or more CPUs, such as CPU0 and CPU1 in fig. 8.
In a specific implementation, the quantization apparatus 20 of the neural network may include a plurality of processors, such as the processor 201 and the processor 207 in fig. 8, as an embodiment. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a specific implementation, the quantization apparatus 20 of the neural network may further include an output device 205 and an input device 206, as an embodiment. The output device 205 is in communication with the processor 201 and may display information in a variety of ways. For example, the output device 205 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 206 is in communication with the processor 201 and can accept user input in a variety of ways. For example, the input device 206 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
Those skilled in the art will appreciate that the architecture shown in fig. 8 does not constitute a definition of the quantification means of the neural network, and may include more or fewer components than those shown, or combine certain components, or adopt a different arrangement of components.
In addition, the present disclosure also provides a computer-readable storage medium including instructions that, when executed by a processor, cause the processor to perform the quantization method of a neural network as provided in the above embodiments.
Additionally, the present disclosure also provides a computer program product comprising instructions which, when executed by a processor, cause the processor to perform the quantization method of the neural network as provided in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method of quantizing a neural network, comprising:
acquiring a training sample image and the type of the training sample image, and inputting the training sample image into a first model to obtain a predicted classification result, wherein an initial weight of a network layer in the first model is determined based on an initial quantization step size and a weight of the network layer in a reference model;
determining a first gradient according to the predicted classification result, the type of the training sample image, the initial weight, and the initial quantization step size;
updating the initial quantization step size based on the first gradient, the initial quantization step size, and a learning rate corresponding to the first model;
and adjusting the weights in the first model based on the updated quantization step size to obtain a quantized model.
2. The method of quantizing a neural network of claim 1, further comprising:
performing a first operation repeatedly until the number of executions of the first operation reaches an operation threshold, to obtain a target model; the first operation includes:
acquiring a target image and the type of the target image;
inputting the target image into a current model to be processed and into the reference model, respectively, to obtain a first classification result and a second classification result, wherein when the first operation is performed for the first time, the current model to be processed is the quantized model, and when the first operation is not performed for the first time, the current model to be processed is the model resulting from the previous execution of the first operation;
determining a second gradient according to the first classification result, the second classification result, the type of the target image, the weight corresponding to the current model to be processed, and the quantization step size corresponding to the current model to be processed;
updating the quantization step size corresponding to the current model to be processed based on the second gradient, the quantization step size corresponding to the current model to be processed, and the learning rate corresponding to the current model to be processed;
and adjusting the weight corresponding to the current model to be processed based on the updated quantization step size.
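Claims 1 and 2 together describe an iterative procedure: the quantization step size is repeatedly updated from a gradient and a learning rate until an execution threshold is reached. The following is a minimal sketch of that loop only, not the patent's actual implementation; `grad_fn` is an illustrative stand-in for the gradient the claims derive from the two classification results and the current weights.

```python
def step_update(s, grad, lr):
    # Update the quantization step size from its gradient and a learning rate
    # (a plain gradient-descent reading of the claimed update).
    return s - lr * grad


def run_first_operation(num_iters, s_init, lr, grad_fn):
    """Repeat the claimed 'first operation' until the execution count
    reaches the operation threshold (num_iters), returning the final
    quantization step size."""
    s = s_init
    for _ in range(num_iters):
        g = grad_fn(s)       # gradient for the current model to be processed
        s = step_update(s, g, lr)
    return s


# Toy usage: with grad_fn(s) = 2*s (gradient of s^2), each iteration
# multiplies s by (1 - 2*lr).
final_s = run_first_operation(3, 1.0, 0.1, lambda s: 2 * s)
```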
3. The method of claim 1, wherein determining a first gradient according to the predicted classification result, the type of the training sample image, the initial weight, and the initial quantization step size comprises:
determining a first loss according to the predicted classification result and the type of the training sample image, wherein the first loss represents the difference between the predicted classification result and the type of the training sample image;
determining the first gradient according to the first loss, the initial weight, and the initial quantization step size.
4. The method of claim 3, wherein determining the first gradient according to the first loss, the initial weight, and the initial quantization step size comprises:
determining a third gradient and a fourth gradient, wherein the third gradient is the gradient of the first loss with respect to the initial weight, and the fourth gradient is the gradient of the initial weight with respect to the initial quantization step size;
determining a product of the third gradient and the fourth gradient as the first gradient.
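Claim 4's first gradient is a chain-rule product: the gradient of the loss with respect to the quantized weight (third gradient) times the gradient of the quantized weight with respect to the step size (fourth gradient). A hedged numeric sketch follows, assuming an LSQ-style quantizer w2 = clip(round(w1/s1), qmin, qmax) * s1 and its commonly used straight-through derivative; the patent itself does not spell out this estimator, so the details below are an assumption.

```python
import numpy as np


def dw2_ds1(w1, s1, q_min, q_max):
    # Straight-through estimate of d(w2)/d(s1) for an LSQ-style quantizer:
    #   round(v) - v  when v = w1/s1 lies inside [q_min, q_max],
    #   q_min / q_max when the value is clipped at a bound.
    v = w1 / s1
    return np.where(v <= q_min, q_min,
                    np.where(v >= q_max, q_max, np.round(v) - v))


def first_gradient(dL_dw2, w1, s1, q_min, q_max):
    # Claim 4: first gradient = third gradient (dL/dw2) * fourth gradient
    # (dw2/ds1), summed over weights because s1 is a single scalar step size.
    return float(np.sum(dL_dw2 * dw2_ds1(w1, s1, q_min, q_max)))


# Toy usage with two weights, one of which is clipped at the lower bound.
g = first_gradient(np.array([1.0, 0.5]), np.array([0.37, -1.2]),
                   0.25, -4, 4)
```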
5. The method of quantizing a neural network according to any one of claims 1 to 4, further comprising:
acquiring the reference model;
and determining the initial weight according to the initial quantization step size and the weight of the network layer in the reference model.
6. The method of quantizing a neural network according to claim 5, wherein determining the initial weight according to the initial quantization step size and the weight of the network layer in the reference model comprises:
determining the initial weight such that it satisfies the following formula:
w2 = clip(round(w1 / s1), min, max) × s1
wherein w2 represents the initial weight, w1 represents the weight of the network layer in the reference model, s1 represents the initial quantization step size, min represents the minimum value of the initial weight, and max represents the maximum value of the initial weight.
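The original formula of claim 6 appears only as an image in the patent; read as an LSQ-style quantizer consistent with the variable definitions above, it maps reference-model weights to quantized initial weights by scaling, rounding, and clipping. The sketch below is that assumed reading, not a verified transcription of the patent's figure.

```python
import numpy as np


def quantize_weights(w1, s1, q_min, q_max):
    """Assumed claim-6 mapping: w2 = clip(round(w1 / s1), min, max) * s1,
    where w1 are the reference-model weights, s1 the initial quantization
    step size, and [q_min, q_max] the clipping bounds."""
    return np.clip(np.round(w1 / s1), q_min, q_max) * s1


# Toy usage: a step size of 0.25 with 4-bit-style bounds [-4, 4];
# the value 2.40 is clipped to the top quantization level.
w1 = np.array([0.37, -1.20, 0.05, 2.40])
w2 = quantize_weights(w1, s1=0.25, q_min=-4, q_max=4)
```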
7. An apparatus for quantizing a neural network, comprising: an acquisition module, a processing module, and a determination module;
the acquisition module is configured to acquire a training sample image and the type of the training sample image;
the processing module is configured to input the training sample image into a first model to obtain a predicted classification result, wherein an initial weight of a network layer in the first model is determined based on an initial quantization step size and a weight of the network layer in a reference model;
the determination module is configured to determine a first gradient according to the predicted classification result, the type of the training sample image, the initial weight, and the initial quantization step size;
the processing module is further configured to update the initial quantization step size based on the first gradient, the initial quantization step size, and a learning rate corresponding to the first model;
the processing module is further configured to adjust the weights in the first model based on the updated quantization step size to obtain a quantized model.
8. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory configured to store instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the method of quantizing a neural network of any one of claims 1 to 6.
9. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by an electronic device, enable the electronic device to perform the method of quantizing a neural network of any one of claims 1 to 6.
10. A computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of quantizing a neural network of any one of claims 1 to 6.
CN202110444570.XA 2021-04-23 2021-04-23 Neural network quantification method and device, electronic equipment and storage medium Pending CN113159318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110444570.XA CN113159318A (en) 2021-04-23 2021-04-23 Neural network quantification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113159318A true CN113159318A (en) 2021-07-23

Family

ID=76870178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110444570.XA Pending CN113159318A (en) 2021-04-23 2021-04-23 Neural network quantification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113159318A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009101A (en) * 2019-04-11 2019-07-12 北京字节跳动网络技术有限公司 Method and apparatus for generating quantization neural network
CN110443165A (en) * 2019-07-23 2019-11-12 北京迈格威科技有限公司 Neural network quantization method, image-recognizing method, device and computer equipment
JP2020123329A (en) * 2020-01-10 2020-08-13 フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ Allocation of relevance score of artificial neural network
US20200311551A1 (en) * 2019-03-25 2020-10-01 Nokia Technologies Oy Compressing weight updates for decoder-side neural networks
US20210089922A1 (en) * 2019-09-24 2021-03-25 Qualcomm Incorporated Joint pruning and quantization scheme for deep neural networks


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CODY BLAKENEY et al.: "Parallel blockwise knowledge distillation for deep neural network compression", IEEE, 28 February 2021 (2021-02-28) *
孙建辉; 方向忠: "Research on mixed-precision quantization techniques for convolutional neural networks" (卷积神经网络的混合精度量化技术研究), Information Technology (信息技术), no. 06, 16 June 2020 (2020-06-16) *
郭娜; 路梅; 赵向军: "Correlation analysis of exercises and their vectorized representation" (习题的关联分析及其向量化表示方法), Computer Engineering and Science (计算机工程与科学), no. 10, 15 October 2017 (2017-10-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496200A (en) * 2022-09-05 2022-12-20 中国科学院半导体研究所 Neural network quantitative model training method, device and equipment
CN115496200B (en) * 2022-09-05 2023-09-22 中国科学院半导体研究所 Neural network quantization model training method, device and equipment

Similar Documents

Publication Publication Date Title
US20200293838A1 (en) Scheduling computation graphs using neural networks
RU2722473C1 (en) Fast calculation of convolutional neural network
CN110852421B (en) Model generation method and device
CN111401550A (en) Neural network model quantification method and device and electronic equipment
CN110852438A (en) Model generation method and device
WO2023050707A1 (en) Network model quantization method and apparatus, and computer device and storage medium
CN109766800B (en) Construction method of mobile terminal flower recognition model
CN111178258B (en) Image identification method, system, equipment and readable storage medium
CN111178514A (en) Neural network quantification method and system
WO2022021834A1 (en) Neural network model determination method and apparatus, and electronic device, and medium, and product
CN112884146B (en) Method and system for training model based on data quantization and hardware acceleration
EP4343616A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN113642710B (en) Quantification method, device, equipment and storage medium of network model
CN113159318A (en) Neural network quantification method and device, electronic equipment and storage medium
CN116762080A (en) Neural network generation device, neural network operation device, edge device, neural network control method, and software generation program
CN112399177A (en) Video coding method and device, computer equipment and storage medium
CN116884398A (en) Speech recognition method, device, equipment and medium
CN114830137A (en) Method and system for generating a predictive model
CN113157453B (en) Task complexity-based high-energy-efficiency target detection task dynamic scheduling method
US11861452B1 (en) Quantized softmax layer for neural networks
CN112200275B (en) Artificial neural network quantification method and device
CN112561050B (en) Neural network model training method and device
CN108537322A (en) Neural network interlayer activation value quantization method and device
CN114065913A (en) Model quantization method and device and terminal equipment
Liu et al. Block-Wise Dynamic-Precision Neural Network Training Acceleration via Online Quantization Sensitivity Analytics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination