CN117951605A - Quantization method and device for diffusion model, computer equipment and storage medium - Google Patents

Quantization method and device for diffusion model, computer equipment and storage medium

Info

Publication number: CN117951605A (granted as CN117951605B)
Authority: CN (China)
Application number: CN202410347877.1A
Other languages: Chinese (zh)
Inventors: 朱克峰, 黄伟, 戴钰桀, 李兵兵, 宿栋栋, 王彦伟
Applicant/Assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Legal status: Granted; Active


Abstract

The invention relates to the technical field of artificial intelligence, and discloses a quantization method and device for a diffusion model, computer equipment and a storage medium. The method comprises the following steps: performing quantization processing on the original diffusion model to generate a corresponding basic quantization model; executing an iterative loop on the diffusion model and the basic quantization model, and determining the single-step quantization error corresponding to each iteration step; determining the number of samples corresponding to each iteration step according to the single-step quantization error; sampling according to that number to obtain the calibration data of each iteration step; and executing the iterative loop again on the diffusion model and the basic quantization model with the calibration data of each iteration step as input data, optimizing the weights of the basic quantization model with the objective of minimizing the error between the two models, and generating a weight-calibrated quantization model. The method has a small computational cost, reduces the quantization error introduced by each iteration step, and preserves the generation performance of the weight-calibrated quantization model.

Description

Quantization method and device for diffusion model, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a quantization method, a quantization device, computer equipment and a storage medium of a diffusion model.
Background
The diffusion model has achieved remarkable results in generation quality, training stability and other respects, and has become the new industry mainstream beyond the GAN (Generative Adversarial Network). As a flexible generative model, the diffusion model shows excellent performance across a variety of application scenarios, including image super-resolution, painting, graph generation, image-to-image translation and the like; for example, a diffusion model can generate images with high diversity and high fidelity.
However, diffusion models also face challenges. Owing to its iterative nature, the generation process of a diffusion model requires a long sequence of noise-estimation iterations, often 50-1000 steps, so its generation speed is slow.
Model quantization can accelerate model computation; however, after a diffusion model is quantized, its generation quality often deteriorates, which affects the performance of the diffusion model.
Disclosure of Invention
In view of the above, the present invention provides a quantization method and device for a diffusion model, computer equipment and a storage medium, so as to solve the problem of poor generation quality after a diffusion model is quantized.
In a first aspect, the present invention provides a quantization method for a diffusion model, including:
performing quantization processing on the original diffusion model to generate a corresponding basic quantization model;
executing an iterative loop on the diffusion model and the basic quantization model, and determining the single-step quantization error corresponding to each iteration step, the single-step quantization error being the quantization error between the diffusion model and the basic quantization model in a single iteration step;
determining the number of samples corresponding to each iteration step according to the single-step quantization error, the number of samples being positively correlated with the single-step quantization error;
sampling each iteration step according to the number of samples to obtain the calibration data of each iteration step;
and taking the calibration data of each iteration step as the input data of the corresponding iteration step, executing the iterative loop again on the diffusion model and the basic quantization model, optimizing the weights of the basic quantization model with the objective of minimizing the error between the diffusion model and the basic quantization model, and generating a weight-calibrated quantization model.
In some optional embodiments, the performing an iterative loop on the diffusion model and the basic quantization model to determine a single-step quantization error corresponding to each iteration step includes:
performing an iterative loop on the diffusion model and the basic quantization model, and determining the accumulated quantization error corresponding to each iteration step, the accumulated quantization error being the total quantization error between the diffusion model and the basic quantization model up to the corresponding iteration step;
and taking the difference between the accumulated quantization errors of two adjacent iteration steps as the single-step quantization error of the corresponding iteration step.
In some optional embodiments, the performing an iterative loop on the diffusion model and the basic quantization model to determine a cumulative quantization error corresponding to each iteration step includes:
performing an iterative loop on the diffusion model and the basic quantization model, and determining a first loss value of the diffusion model and a second loss value of the basic quantization model when iterating to each iteration step;
And taking the error between the second loss value and the first loss value as the accumulated quantization error of the corresponding iteration step.
In some alternative embodiments, the accumulated quantization error satisfies:

Δ_t = MSE( L_t(Ŵ), L_t(W) ), t = 1, 2, …, T;

wherein Δ_t represents the accumulated quantization error up to the t-th iteration step, L_t(W) represents the first loss value up to the t-th iteration step, and L_t(Ŵ) represents the second loss value up to the t-th iteration step; W represents the weights of the diffusion model, Ŵ represents the weights of the basic quantization model, L(·) represents the loss function, MSE(·) represents the cumulative mean square error function, and T is the total number of iteration steps.
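As an illustrative sketch (not the patented implementation itself), the accumulated error Δ_t and the single-step error δ_t can be computed from per-step loss values of the two models; the names `loss_fp` and `loss_q` are assumed stand-ins for the first and second loss values collected during the iterative loop:

```python
import numpy as np

def quantization_errors(loss_fp, loss_q):
    """Accumulated quantization error Delta_t (cumulative squared error
    between the quantized model's and the full-precision model's loss
    values) and the single-step error delta_t, recovered as the difference
    between the accumulated errors of two adjacent iteration steps."""
    loss_fp = np.asarray(loss_fp, dtype=float)
    loss_q = np.asarray(loss_q, dtype=float)
    step_sq_err = (loss_q - loss_fp) ** 2         # squared error at each step
    delta_cum = np.cumsum(step_sq_err)            # Delta_t, t = 1..T
    delta_step = np.diff(delta_cum, prepend=0.0)  # delta_t = Delta_t - Delta_{t-1}
    return delta_cum, delta_step

# toy example with T = 4 iteration steps
cum, step = quantization_errors([0.9, 0.8, 0.7, 0.6], [1.0, 1.0, 1.0, 1.0])
```

Because the single-step errors are differences of adjacent accumulated errors, they telescope: their sum equals the final accumulated error Δ_T.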
In some optional embodiments, the determining the number of samples corresponding to each iteration step according to the single-step quantization error includes:
Normalizing the single-step quantization error to determine the sampling weight corresponding to each iteration step;
and determining the sampling quantity of the corresponding iteration step according to the sampling weight of each iteration step.
In some alternative embodiments, the sampling weights satisfy:

p_t = δ_t / Δ_T = (Δ_t − Δ_{t−1}) / Δ_T , t = 1, 2, …, T;

wherein p_t represents the sampling weight of the t-th iteration step, δ_t represents the single-step quantization error of the t-th iteration step, and Δ_T represents the accumulated quantization error up to the last (T-th) iteration step, T being the total number of iteration steps; the difference between the accumulated quantization errors of two adjacent iteration steps is the single-step quantization error of the corresponding iteration step.
In some alternative embodiments, the determining the number of samples of each iteration step according to the sampling weight of each iteration step includes:
presetting a sample set size N for the calibration data sample set;
determining the number of samples of each iteration step as:

n_t = Round( p_t · N ), t = 1, 2, …, T;

where n_t represents the number of samples of the t-th iteration step, p_t represents the sampling weight of the t-th iteration step, Round(·) is the rounding function, and T is the total number of iteration steps.
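A minimal sketch of this allocation, assuming the single-step errors have already been computed (the function name `allocate_samples` is illustrative):

```python
import numpy as np

def allocate_samples(delta_step, total_samples):
    """Normalize the single-step errors into sampling weights p_t (they sum
    to 1 because the single-step errors telescope to the final accumulated
    error Delta_T), then allocate n_t = Round(p_t * N) samples per step."""
    delta_step = np.asarray(delta_step, dtype=float)
    weights = delta_step / delta_step.sum()                # p_t = delta_t / Delta_T
    counts = np.rint(weights * total_samples).astype(int)  # n_t
    return weights, counts

w, n = allocate_samples([0.01, 0.04, 0.09, 0.16], total_samples=300)
```

Steps with larger single-step error receive proportionally more calibration data, matching the positive correlation required above.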
In some optional embodiments, the sampling each iteration step according to the sampling number to obtain calibration data of each iteration step includes:
sampling the intermediate input data of each iteration step, randomly drawing n_t data samples at the t-th iteration step, wherein n_t represents the number of samples of the t-th iteration step;
and taking the n_t data samples as the calibration data of the t-th iteration step.
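The per-step random sampling can be sketched as follows; `step_inputs` is an assumed name for a list holding the intermediate input data of each iteration step:

```python
import numpy as np

def sample_calibration(step_inputs, counts, seed=0):
    """Randomly draw n_t calibration samples (without replacement) from the
    intermediate input data of each iteration step; step_inputs[t] is an
    array of shape [num_samples, ...] for the t-th step."""
    rng = np.random.default_rng(seed)
    calibration = []
    for x, n_t in zip(step_inputs, counts):
        idx = rng.choice(len(x), size=min(n_t, len(x)), replace=False)
        calibration.append(x[idx])
    return calibration

# 3 iteration steps, each with 10 two-dimensional input samples
inputs = [np.arange(20.0).reshape(10, 2) for _ in range(3)]
calib = sample_calibration(inputs, counts=[2, 3, 5])
```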
In some alternative embodiments, the optimizing the weights of the base quantization model with the objective of minimizing the error between the diffusion model and the base quantization model includes:
with the objective of minimizing the error between the diffusion model and the basic quantization model, determining the optimized weights of the basic quantization model by actively rounding the weights of the diffusion model.
In some optional embodiments, the determining the optimized weight of the basic quantization model by actively rounding the weight of the diffusion model with the objective of minimizing the error between the diffusion model and the basic quantization model includes:
introducing a continuous variable to be optimized into the rounding mode of the weights of the diffusion model, and expressing the conversion relationship between the weights of the diffusion model and the pending weights of the quantization model in terms of the continuous variable;
determining the continuous variable based on a first objective function whose objective is to minimize the product of the weight error and the input data, the weight error being the difference between the weights of the diffusion model and the pending weights of the quantization model;
And determining the optimized weight of the basic quantization model according to the determined continuous variable and the conversion relation.
In some alternative embodiments, the conversion relationship satisfies:

Ŵ = clamp( ⌊W/s⌋ + h(V), q_min, q_max );

wherein W represents the weights of the diffusion model, Ŵ represents the pending weights of the quantization model, V is the continuous variable to be optimized, s is the quantization scale, clamp(x, a, b) is the function limiting the variable x between a minimum lower limit a and a maximum upper limit b, ⌊·⌋ is the rounding function, h(·) is a monotonic function mapping the variable to the desired range, q_min represents the minimum truncation threshold of the quantization range, and q_max represents the maximum truncation threshold of the quantization range.
In some of the alternative embodiments of the present invention, ⌊·⌋ is the round-down (floor) function and h(·) is a monotonic function mapping the variable to [0, 1]; or ⌊·⌋ is the round-up (ceiling) function and h(·) is a monotonic function mapping the variable to [−1, 0].
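A hedged sketch of the conversion relationship, assuming the floor-rounding variant with h(·) chosen as a rectified sigmoid mapping into [0, 1] (a common choice in the adaptive-rounding literature; the function names are illustrative, not from the patent):

```python
import numpy as np

def h(v, zeta=1.1, gamma=-0.1):
    """Rectified sigmoid: a monotonic map of the continuous variable V into
    [0, 1]; it saturates exactly at 0 and 1 for large-magnitude V."""
    return np.clip(1.0 / (1.0 + np.exp(-v)) * (zeta - gamma) + gamma, 0.0, 1.0)

def pending_weight(w, v, s, q_min, q_max):
    """Conversion relationship: W_hat = clamp(floor(W/s) + h(V), q_min, q_max)."""
    return np.clip(np.floor(w / s) + h(v), q_min, q_max)

w = np.array([0.34, -0.52, 0.11])
v = np.zeros_like(w)          # h(0) = 0.5: the rounding direction is undecided
w_hat = pending_weight(w, v, s=0.1, q_min=-8, q_max=7)
```

As V is optimized, h(V) is driven toward 0 or 1, so each weight ends up rounded down or up, whichever minimizes the objective below.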
In some alternative embodiments, the first objective function satisfies:

argmin_V ‖ X·W − X·Ŵ ‖² + λ·f_reg(V);

wherein X represents the input data of the diffusion model, W represents the weights of the diffusion model, Ŵ represents the pending weights of the quantization model, V is the continuous variable to be optimized, ‖·‖² represents the second-order norm, f_reg(V) represents the regularization term forcing the continuous variable V to converge to 0 or 1, and λ represents the regularization term coefficient.
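The first objective can be sketched as follows, reusing the floor-plus-h(V) conversion; the regularizer form 1 − |2·h(V) − 1|^β is an assumption borrowed from the adaptive-rounding literature and is not stated in this document:

```python
import numpy as np

def h(v, zeta=1.1, gamma=-0.1):
    """Monotonic map of V into [0, 1] (rectified sigmoid)."""
    return np.clip(1.0 / (1.0 + np.exp(-v)) * (zeta - gamma) + gamma, 0.0, 1.0)

def f_reg(v, beta=2.0):
    """Regularization term: equals 0 exactly when every entry of h(V) has
    converged to 0 or 1 (assumed functional form)."""
    return float(np.sum(1.0 - np.abs(2.0 * h(v) - 1.0) ** beta))

def first_objective(x, w, v, s, q_min, q_max, lam=0.01):
    """|| X*W - X*(s * clamp(floor(W/s) + h(V), q_min, q_max)) ||^2
    + lambda * f_reg(V); multiplying the pending weight back by s makes
    the two products comparable on the same scale."""
    w_hat = np.clip(np.floor(w / s) + h(v), q_min, q_max)
    err = x @ w - x @ (s * w_hat)
    return float(np.sum(err ** 2)) + lam * f_reg(v)

x = np.eye(2)
w = np.array([[0.5], [-0.25]])
v = np.zeros_like(w)                       # rounding still undecided
loss = first_objective(x, w, v, s=0.25, q_min=-8, q_max=7)
```

In practice V would be updated by gradient descent on this objective; the sketch only evaluates it for a given V.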
In some alternative embodiments, after the generating the weight-calibrated quantization model, the method further comprises:
and calibrating an activation function in the weight calibrated quantization model to generate an activation calibrated quantization model.
In some optional embodiments, the calibrating the activation function in the weight calibrated quantization model includes:
Optimizing the continuous variable again based on a second objective function; the second objective function is an objective function which aims at minimizing the difference between the first activation item and the second activation item; the first activation term is a result of nonlinear transformation of a product between a weight of the diffusion model and input data by using an activation function, and the second activation term is a result of nonlinear transformation of a product between a pending weight of the quantization model and the input data by using an activation function;
and determining the weight of the weight-calibrated quantization model after optimization according to the redetermined continuous variable and the conversion relation.
In some alternative embodiments, the second objective function satisfies:

argmin_V ‖ σ(X·W) − σ(X̂·Ŵ) ‖² + λ·f_reg(V);

wherein X represents the input data of the diffusion model, W represents the weights of the diffusion model, X̂ represents the input data of the quantization model, Ŵ represents the pending weights of the quantization model, σ(·) is the activation function, V is the continuous variable to be optimized, ‖·‖² represents the second-order norm, f_reg(V) represents the regularization term forcing the continuous variable V to converge to 0 or 1, and λ represents the regularization term coefficient.
In a second aspect, the present invention provides a quantization apparatus of a diffusion model, including:
the quantization module is used for carrying out quantization processing on the original diffusion model to generate a corresponding basic quantization model;
the error determining module is used for executing an iteration loop on the diffusion model and the basic quantization model and determining a single-step quantization error corresponding to each iteration step; the single-step quantization error is a quantization error between the diffusion model and the base quantization model in a single iteration step;
the quantity determining module is used for determining the number of samples corresponding to each iteration step according to the single-step quantization error, the number of samples being positively correlated with the single-step quantization error;
The sampling module is used for sampling each iteration step according to the sampling quantity to obtain the calibration data of each iteration step;
And the weight calibration module is used for taking the calibration data of each iteration step as the input data of the corresponding iteration step, executing the iteration loop again on the diffusion model and the basic quantization model, aiming at minimizing the error between the diffusion model and the basic quantization model, optimizing the weight of the basic quantization model, and generating a weight-calibrated quantization model.
In a third aspect, the present invention provides a computer device comprising: the device comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions, so that the quantization method of the diffusion model of the first aspect or any corresponding implementation mode is executed.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the quantization method for a diffusion model according to the first aspect or any of the embodiments corresponding thereto.
In a fifth aspect, the present invention provides a computer program product comprising computer instructions for causing a computer to perform the quantization method for a diffusion model of the first aspect or any of its corresponding embodiments described above.
According to the invention, a basic quantization model corresponding to the diffusion model is first constructed, the single-step quantization error introduced by each iteration step is determined through an iterative loop, calibration data in a quantity matched to the single-step quantization error are then obtained by sampling, and the weights of the basic quantization model are calibrated based on the calibration data of each iteration step. The computational cost is therefore small, the quantization error introduced by each iteration step can be reduced, the weight-calibrated quantization model is closer to the original diffusion model, and its generation performance can be ensured.
By first determining the accumulated quantization error of each iteration step, the single-step quantization error introduced by each iteration step alone can be obtained simply and conveniently. Taking the accumulated quantization error of the last iteration step as the denominator, the single-step quantization errors are normalized, so that the sampling weights of all iteration steps sum to 1, which makes it convenient to keep the total number of sampled calibration data consistent with the sample set size. Calibration can be realized relatively simply through the active-rounding approach; and the first objective function, which introduces a continuous variable, reduces the complexity of the optimization, so that the final optimized quantization model can be obtained more quickly.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the related art, the drawings that are required to be used in the description of the embodiments or the related art will be briefly described, and it is apparent that the drawings in the description below are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 is a flow chart of a method for quantifying a diffusion model according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for quantization of a diffusion model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a cumulative quantization error distribution according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a normalized quantization error distribution according to an embodiment of the present invention;
FIG. 5 is a flow chart of a quantization method of a diffusion model according to an embodiment of the present invention;
Fig. 6 is a block diagram of a quantization apparatus of a diffusion model according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The diffusion model achieves a better generation effect than the GAN model, but its generation speed is slower. For example, a GAN model often takes less than 1 second to generate an image, while a diffusion model often takes several seconds to generate a picture. Therefore, how to increase the generation speed becomes a bottleneck and a challenge for expanding the application of diffusion models.
Most existing related work reduces the number of iteration steps. However, this approach ignores an important factor: each step of noise estimation is itself computationally and memory intensive. This factor is orthogonal to the number of iteration steps; compared with reducing the number of iteration steps, it affects not only the inference (generation) speed of the diffusion model but also brings a larger memory footprint.
Compressing the model is another path to accelerating computation, and compression is generally achieved through model quantization. Model quantization refers to the process of approximating the continuous values (floating-point numbers) in a deep learning model by a finite number of discrete values (fixed-point numbers). Post-training quantization (PTQ, Post-Training Quantization), for example, is a type of quantization method that can be applied directly without retraining. Post-training quantization techniques have been applied to classification models, object detection models and the like; because they require little training data and can be directly applied and deployed on real hardware devices, they can be regarded as an indispensable technique in model compression.
Currently, post-training quantization techniques (e.g., PTQ4DM) have achieved quantization of models to 8-bit discrete values, but they are limited to small datasets and low resolutions. For iterative (sequentially executed) diffusion models, the structure of the diffusion model poses additional challenges for post-training quantization.
Specifically, applying an existing model quantization strategy to a diffusion model yields an obvious reduction in computation and can effectively reduce the single-step inference cost of the diffusion model. However, existing training-free quantization methods are mainly aimed at models such as convolutional neural networks; applying them directly to a diffusion model does not yield an ideal generation effect.
Analysis shows that this is mainly due to the accumulation of quantization error caused by the deep iteration of the diffusion model. Specifically, quantization errors are continuously superimposed during the iterative process, so the accumulated quantization error keeps growing; in particular, as the quantization intensity increases, when the diffusion model is quantized to the int4 type, the accumulated quantization error grows significantly as the iteration proceeds, directly causing a rapid drop in generation quality. Therefore, estimating and reducing the quantization error in each iteration step of the diffusion model is important for maintaining the performance of the quantized model.
The embodiment of the invention provides a quantization method for a diffusion model, which determines the quantization error independently introduced between the diffusion model and the quantization model in each iteration step, samples a corresponding quantity of calibration data based on that error, and realizes weight calibration at each iteration step; it thus reduces the quantization error introduced by each iteration step and ensures the generation performance of the quantized model, while requiring only a small amount of computation.
According to an embodiment of the present invention, there is provided a quantization method embodiment of a diffusion model, it being noted that the steps shown in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that herein.
The embodiment provides a quantization method for a diffusion model, which can be applied to equipment requiring model training, such as FPGA-related cloud platform products, accelerator cards and the like. Fig. 1 is a flowchart of a quantization method of a diffusion model according to an embodiment of the present invention; as shown in fig. 1, the flow includes the following steps.
Step S101, carrying out quantization processing on an original diffusion model to generate a corresponding basic quantization model.
In this embodiment, in any scene with a generation requirement, a corresponding diffusion model may be constructed, where the diffusion model is an original pre-trained model whose weights are all floating-point (such as FP32); for example, the diffusion model may be used to generate images. The diffusion model may then be converted into a corresponding quantization model, i.e., the basic quantization model, based on a model quantization technique. For example, the weights of the basic quantization model may be obtained by linearly quantizing the weights of the diffusion model; for example, the weights of the basic quantization model may be int8 discrete values.
Specifically, based on model quantization, the continuous values (floating-point numbers) in the original diffusion model can be approximated by a finite number of discrete values (fixed-point numbers). A typical quantization process is shown in formula (1):

Ŵ = clamp( Round(W / s), q_min, q_max )    (1)

wherein W represents the weights of the diffusion model, Ŵ represents the weights of the basic quantization model, s is the quantization scale (also called the scaling factor), and clamp(x, a, b) is the function limiting the variable x between a minimum lower limit a and a maximum upper limit b; specifically, clamp(x, a, b) = min(max(x, a), b). Round(·) is the rounding function, specifically an up-rounding (ceiling) function, a down-rounding (floor) function, a nearest-rounding function, or the like; q_min represents the minimum truncation threshold of the quantization range, and q_max represents the maximum truncation threshold of the quantization range.
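A minimal numeric sketch of formula (1) with nearest rounding and an int8 range (the specific values are illustrative):

```python
import numpy as np

def quantize(w, s, q_min, q_max):
    """Formula (1): W_hat = clamp(Round(W / s), q_min, q_max), using
    nearest rounding and returning fixed-point int8 codes."""
    return np.clip(np.rint(w / s), q_min, q_max).astype(np.int8)

def dequantize(w_hat, s):
    """Map the fixed-point codes back to floating point for inference."""
    return w_hat.astype(np.float32) * s

w = np.array([0.50, -1.27, 3.00], dtype=np.float32)
codes = quantize(w, s=0.01, q_min=-128, q_max=127)  # 3.00 saturates at the threshold
```

The third weight illustrates the truncation trade-off discussed below: any value beyond s·q_max is clipped to the threshold, trading truncation error for a tighter grid elsewhere.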
The quantization range is determined by the minimum truncation threshold q_min and the maximum truncation threshold q_max; the key trade-off in setting the quantization range is between truncation error and rounding error, and their impact on the final quantization loss. Model weights can typically be quantized directly; however, determining the quantization parameters of the activations usually requires some calibration data.
For example, the truncation thresholds q_min and q_max may be determined by the simple minimum-maximum (Min-Max) method, which effectively avoids truncation error; for example, q_min = min(W/s) and q_max = max(W/s). However, this approach is sensitive to outlier data, which ultimately tends to result in larger rounding errors.
In addition, the cut-off threshold value can be determined in a mode of minimizing errors, and rounding errors caused by abnormal data can be effectively relieved. Alternatively, the quantized truncation threshold may be determined by a moving average maximum and minimum method, a KL divergence sampling method, or the like, and the method for determining the truncation threshold is not limited in this embodiment.
For example, KL divergence is generally used to measure the similarity between two distributions; the KL divergence sampling method can be expressed as:

D_KL(P‖Q) = Σ_i P(i) · log( P(i) / Q(i) )

where the distributions P and Q represent the pre-quantization weight distribution (e.g., FP32 data) and the post-quantization weight distribution (e.g., int8 data). Note that the above formula requires the two statistical histograms P and Q to have the same length.
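The KL-divergence comparison of the two weight histograms can be sketched as follows (the small `eps` guard is an implementation assumption to avoid taking the log of zero):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)) over two statistical
    histograms of equal length; both are normalized to sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("the two histograms must have the same length")
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

identical = kl_divergence([1, 2, 3, 4], [1, 2, 3, 4])  # ~0 for identical histograms
```

In threshold search, candidate (q_min, q_max) pairs would be scored by this divergence and the pair with the smallest value selected.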
In the present embodiment, both W and Ŵ are in the form of matrices comprising a plurality of weight values. Based on the above formula (1), the weights W of the diffusion model are compressed to determine the weights Ŵ of the quantization model, thereby constructing the quantization model corresponding to the diffusion model, i.e., the basic quantization model.
Step S102, performing iterative loop on the diffusion model and the basic quantization model, and determining a single-step quantization error corresponding to each iterative step; the single-step quantization error is the quantization error between the diffusion model and the base quantization model in a single iteration step.
In this embodiment, the basic quantization model is a compressed diffusion model, i.e. it is also a diffusion model in nature. Since there is a difference in weight between the diffusion model and the basic quantization model, and the basic quantization model is obtained by quantizing (compressing) the diffusion model, quantization errors between the diffusion model and the basic quantization model may be caused by quantizing the original diffusion model. To reduce quantization errors introduced by iteration, the present embodiment determines quantization errors introduced in separate iteration steps, i.e., a single step quantization error for each iteration step needs to be determined.
Specifically, in the process of executing an iterative loop by the diffusion model and the basic quantization model, in a separate t-th iteration step, a certain quantization error exists between the diffusion model and the basic quantization model due to quantization processing, and the quantization error is taken as a single-step quantization error of the t-th iteration step. Wherein, a certain quantization error is introduced in each iteration step, and no intersection exists between single-step quantization errors introduced by different iteration steps.
The original diffusion model and the basic quantization model differ only in their weights, and their total numbers of iteration steps are the same; for example, if the total number of iteration steps of both models is T, both undergo T rounds of inference iteration to finally generate the required data. Accordingly, the single-step quantization error introduced by each of the T iteration steps may be determined, wherein t = 1, 2, …, T.
Step S103, determining the sampling number corresponding to each iteration step according to the single-step quantization error; the number of samples is positively correlated with the single step quantization error.
In this embodiment, for the t-th iteration step, the quantity of calibration data required, i.e., the number of samples, needs to be determined. If the single-step quantization error independently introduced at the t-th iteration step is larger, the loss caused by the quantization error of that step is larger; therefore, in order to effectively reduce the error accumulated by the diffusion model over multiple steps of computation, relatively more calibration data are allocated to the t-th iteration step.
Specifically, for any iteration step, the number of samples and the single-step quantization error are in positive correlation, i.e. the larger the single-step quantization error is, the larger the number of samples is determined.
Step S104, each iteration step is sampled according to the sampling quantity, and calibration data of each iteration step are obtained.
In this embodiment, after determining the number of samples in each step, the corresponding iteration step may be sampled to obtain the calibration data required by the iteration step. Wherein the calibration data is data that can represent a true input distribution, which can be used to calibrate the weights in the underlying quantization model. Since the calibration data is part of the input data, subsequent calibrations based on the calibration data can reduce the amount of computation required for single step reasoning.
For example, during an iterative loop of the diffusion model, the output data of each iteration step is also the input data of the next iteration step (the input data of the first iteration step is the initial input data at the beginning); the input data of the t iteration step may be sampled based on the number of samples of the t iteration step to obtain calibration data for the number of samples.
Step S105, the calibration data of each iteration step is used as the input data of the corresponding iteration step, the iteration loop is executed again on the diffusion model and the basic quantization model, the error between the diffusion model and the basic quantization model is minimized, the weight of the basic quantization model is optimized, and the quantization model with the calibrated weight is generated.
In this embodiment, after determining the calibration data required for each iteration step, an iteration loop needs to be executed again on the diffusion model and the basic quantization model, which is different from the iteration loop executed in the step S102 described above in that in the iteration loop of the step S102, the initial input data is the complete initial input data, and in the iteration loop of the step S105, the input data of each iteration step is the calibration data obtained by sampling the corresponding iteration step, so that the amount of calculation required for single-step reasoning can be reduced.
During each iteration step, the weight of the basic quantization model is optimized with the objective of minimizing the error between the diffusion model and the basic quantization model; the weight of the basic quantization model at which this error is minimal can thus be determined. The model carrying this optimized weight is the quantization model obtained by calibrating the original weight of the basic quantization model, i.e., the weight-calibrated quantization model.
Based on the above equation (1), it can be seen that the factor affecting the original weight W_q of the basic quantization model is mainly the rounding mode used. Specifically, equation (1) is a conventional quantization method, and the rounding function used in it is typically fixed, such as the nearest (round-to-nearest) rounding function. Nearest rounding accords with intuition and reduces the per-weight rounding error; however, it is not necessarily the optimal choice in actual quantization. To illustrate this problem, assume a neural network model whose weights are denoted W. If a small weight perturbation ΔW is introduced for the weights, the influence of this perturbation on the loss function L of the neural network can be expressed as:
E[L(x, y, W + ΔW)] − E[L(x, y, W)] ≈ E[ΔWᵀ·g^(W) + ½·ΔWᵀ·H^(W)·ΔW]   (2)

where x is the input data and y is the corresponding label; equation (2) is a Taylor expansion and approximation of the loss function. In the final result, the two terms g^(W) and H^(W) represent the gradient with respect to the weights and the Hessian matrix, respectively, E[·] denotes the expectation, and: g^(W) = E[∂L/∂W], H^(W) = E[∂²L/∂W²].
The Hessian matrix H^(W) characterizes the joint contribution of different weight perturbations to the loss function. Based on this, it can be determined that the nearest rounding operation is not the optimal solution in some cases; for convenience of description, this is illustrated by the following simple example.
Assume that g^(W) = 0 (the model has converged) and H^(W) = [[1, 0.5], [0.5, 1]], where Δw₁ and Δw₂ respectively denote the two components of the weight perturbation ΔW.
With the gradient term zero, the increase of the loss function in equation (2) caused by the weight perturbation is proportional to:

Δw₁² + Δw₂² + Δw₁·Δw₂   (3)
Based on equation (3), the first two terms on the right of the equals sign correspond to the diagonal elements of the Hessian matrix; if the Hessian contained only diagonal elements, the nearest rounding operation would be optimal. In the above example, however, the off-diagonal elements are non-zero and positive, so the cross term Δw₁·Δw₂ makes a positive, non-negligible contribution to the loss whenever Δw₁ and Δw₂ have the same sign. Therefore, when the off-diagonal elements of the Hessian are non-zero, quantization based on nearest rounding is not an optimal method. It has been verified that during quantization of the convolutional neural network ResNet, nearest rounding is suboptimal for about 50% of the rounding operations, thereby reducing the accuracy of the model. Consequently, how to choose the optimal rounding is critical to the performance of the quantized model.
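This effect can be checked numerically. The sketch below (illustrative only; the perturbation values are assumed) uses the 2x2 Hessian H = [[1, 0.5], [0.5, 1]] from the example above and compares the quadratic loss increase ΔWᵀ·H·ΔW for two perturbations of equal per-weight magnitude but different sign patterns:

```python
import numpy as np

# Hessian from the example: positive off-diagonal elements.
H = np.array([[1.0, 0.5],
              [0.5, 1.0]])

def loss_increase(dw, H):
    """Quadratic term of the Taylor expansion: dw^T H dw."""
    dw = np.asarray(dw)
    return float(dw @ H @ dw)

# Same per-weight magnitude, different sign patterns.
same_sign = loss_increase([0.1, 0.1], H)   # 0.01 + 0.01 + 0.01 = 0.03
opposite  = loss_increase([0.1, -0.1], H)  # 0.01 + 0.01 - 0.01 = 0.01

# With only diagonal elements both perturbations would cost the same;
# the positive cross term makes the same-sign rounding strictly worse.
assert opposite < same_sign
```

This is why choosing the rounding direction per weight (rather than always rounding to the nearest integer) can lower the overall loss.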
For the weights W of the diffusion model, whatever rounding mode is used, i.e., whatever the rounding function is, linearly quantizing W based on equation (1) yields a basic-quantization-model weight W_q that must be either the minimum integer value W_floor or the maximum integer value W_ceil, where W_floor is obtained by rounding W/s down (i.e., the rounding function is the floor function ⌊·⌋) and W_ceil is obtained by rounding W/s up (the ceiling function ⌈·⌉). The value range of the basic quantization model's weight W_q is therefore: W_q ∈ {W_floor, W_ceil}.
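In other words, after dividing by the scale, the quantized integer can only be one of the two integer neighbors. A minimal sketch (the scale s and the weight value are assumed):

```python
import math

s = 0.25          # assumed quantization scale
w = 0.62          # an assumed full-precision weight

w_floor = math.floor(w / s)   # minimum integer candidate
w_ceil = math.ceil(w / s)     # maximum integer candidate

# Whatever rounding function is chosen, the quantized integer is one of the two
# neighbors of w/s, so the de-quantized weight is s*w_floor or s*w_ceil.
assert w_floor == 2 and w_ceil == 3
assert round(w / s) in (w_floor, w_ceil)   # nearest rounding picks one of them
```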
At this time, the objective function of minimizing the error between the diffusion model and the basic quantization model in step S105 described above may be expressed as:
argmin_{ΔW} E[L(x, y, W + ΔW) − L(x, y, W)]   (4)

where the weight perturbation ΔW is: ΔW = Ŵ − W, with Ŵ denoting the (de-quantized) weight of the basic quantization model; x is the input data of the diffusion model and y is the corresponding label.
Using equation (4) as the optimization objective function, the weight perturbation ΔW at which the error between the diffusion model and the basic quantization model is minimal can be determined; the weights in the basic quantization model are then calibrated accordingly, yielding the weight-calibrated quantization model.
According to the quantization method of the diffusion model provided above, the basic quantization model corresponding to the diffusion model is first constructed, the single-step quantization error introduced by each iteration step is determined through an iteration loop, calibration data whose quantity matches the single-step quantization error are then obtained by sampling, and the weights of the basic quantization model are calibrated based on the calibration data of each iteration step. The computation required is small, the quantization error introduced by each iteration step can be reduced, and the weight-calibrated quantization model more closely approximates the original diffusion model, so that the generation performance of the weight-calibrated quantization model can be guaranteed.
The embodiment provides a quantization method of a diffusion model, which can be applied to equipment needing model training, such as cloud platform products, acceleration cards and the like related to an FPGA. Fig. 2 is a flowchart of a quantization method of a diffusion model according to an embodiment of the present invention, as shown in fig. 2, the flowchart including the following steps.
Step S201, carrying out quantization processing on the original diffusion model to generate a corresponding basic quantization model.
Please refer to step S101 in the embodiment shown in fig. 1 in detail, which is not described herein.
Step S202, performing iterative loop on the diffusion model and the basic quantization model, and determining a single-step quantization error corresponding to each iterative step; the single-step quantization error is the quantization error between the diffusion model and the base quantization model in a single iteration step.
Specifically, the step S202 "performs an iterative loop on the diffusion model and the basic quantization model, and the determination of the single-step quantization error corresponding to each iterative step" includes the following steps S2021 to S2022.
Step S2021, performing an iterative loop on the diffusion model and the basic quantization model, and determining an accumulated quantization error corresponding to each iterative step; the accumulated quantization error is all quantization errors between the diffusion model and the base quantization model when performing the corresponding iteration step.
Step S2022, the difference between the accumulated quantization errors of the adjacent two iteration steps is taken as the single-step quantization error of the corresponding iteration step.
In this embodiment, the single-step quantization error is the quantization error introduced in a single iteration step; it is not easy to determine directly. When the diffusion model and the basic quantization model execute an iteration loop, the quantization error between the two models is the accumulation of the errors of all previous iteration steps, so the difference between the accumulated quantization errors of two adjacent iteration steps can be used as the single-step quantization error of the corresponding iteration step.
Specifically, when the t-th iteration step is executed, the quantization error between the diffusion model and the basic quantization model can be determined; this error includes the quantization errors introduced at the previous iteration steps (e.g., the (t−1)-th iteration step, etc.), so it is the accumulated quantization error of the t-th iteration step. Denote the accumulated quantization error of the t-th iteration step as AE_t and, correspondingly, the accumulated quantization error of the (t−1)-th iteration step as AE_{t−1}; it will be appreciated that the difference between AE_t and AE_{t−1} is the quantization error introduced separately at the t-th iteration step, i.e., the single-step quantization error E_t of the t-th iteration step. Thus E_t = AE_t − AE_{t−1}.
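The relation between accumulated and single-step errors is a simple first difference. A minimal sketch (the accumulated-error values are assumed for illustration):

```python
import numpy as np

# Assumed accumulated quantization errors AE_t for T = 5 iteration steps.
AE = np.array([0.02, 0.05, 0.11, 0.20, 0.32])

# Single-step errors: E_1 = AE_1, and E_t = AE_t - AE_{t-1} for t > 1.
E = np.diff(AE, prepend=0.0)

# The single-step errors must sum back to the final accumulated error.
assert np.isclose(E.sum(), AE[-1])
```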
In some alternative embodiments, the step S2021 "performs an iterative loop on the diffusion model and the basic quantization model, and determining the accumulated quantization error corresponding to each iterative step" may include the following steps A1 to A2.
Step A1, performing iteration loop on a diffusion model and a basic quantization model, and determining a first loss value of the diffusion model and a second loss value of the basic quantization model when iteration is performed to each iteration step;
and step A2, taking the error between the second loss value and the first loss value as the accumulated quantization error of the corresponding iteration step.
In this embodiment, in the process of performing an iterative loop on the diffusion model and the basic quantization model, the performance of the two models may be recorded at each iteration step; wherein the performance of the model is represented based on a loss value determined by a loss function of the model.
Specifically, the input data of the diffusion model is x, and the label corresponding to the input data x is y. If the weight of the diffusion model is W, the loss value of the diffusion model determined based on the loss function L can be expressed as L(x, y, W); this loss value represents the loss (error) between the output of the diffusion model, when the input data x is fed to it under the weight W, and the true label y. It will be appreciated that the smaller the loss value, the better the performance of the model (the better the generation effect). For convenience of description, the loss value L(x, y, W) of the diffusion model is referred to as the first loss value; the first loss value can be determined at each iteration step, and the first loss value of the t-th iteration step is denoted L_t(x, y, W).
Similarly, the loss value L(x, y, Ŵ) of the basic quantization model, where Ŵ is the weight of the basic quantization model, is referred to as the second loss value; the second loss value can likewise be determined at each iteration step, and the second loss value of the t-th iteration step is denoted L_t(x, y, Ŵ).
After the first loss value L_t(x, y, W) and the second loss value L_t(x, y, Ŵ) of each iteration step are determined, the error between the two loss values, i.e., the accumulated quantization error of the t-th iteration step, can be determined.
Optionally, the quantization error is expressed as a mean square error of the loss values; accordingly, the accumulated quantization error is an accumulated mean square error. Specifically, the accumulated quantization error may be expressed as:

AE_t = MSE(L_t(x, y, W), L_t(x, y, Ŵ)),  t = 1, 2, …, T   (5)

where AE_t denotes the accumulated quantization error up to the t-th iteration step, L_t(x, y, W) denotes the first loss value at the t-th iteration step, and L_t(x, y, Ŵ) denotes the second loss value at the t-th iteration step; W denotes the weight of the diffusion model, Ŵ denotes the weight of the basic quantization model, L denotes the loss function, MSE denotes the mean square error function, and T is the total number of iteration steps.
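A minimal sketch of this computation: per-step loss values for the full-precision and quantized models are recorded during the shared iteration loop, and the accumulated error at step t is the mean squared difference between them over a batch. The toy loss values below are assumptions for illustration only; in practice they come from running the two models in lock-step on the same inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
T, batch = 4, 8

# Assumed per-step, per-sample loss values for the two models.
loss_fp = rng.random((T, batch))                          # first loss values  L_t(x, y, W)
loss_q = loss_fp + 0.01 * np.arange(1, T + 1)[:, None]    # second loss values, drifting apart

# AE_t = MSE between the two loss values at step t (over the batch).
AE = ((loss_q - loss_fp) ** 2).mean(axis=1)

# The gap grows with t here, so the accumulated error is non-decreasing,
# matching the rising trend shown in Fig. 3.
assert np.all(np.diff(AE) >= 0)
```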
In this embodiment, by recording the performance of the diffusion model and the basic quantization model in each iteration step, the cumulative process of performance loss of the basic quantization model in the whole process compared with the original diffusion model can be obtained, i.e., the cumulative quantization error of each iteration step can be determined. Taking the diffusion model as an example for 100 iteration steps, i.e. t=100, by comparing the errors of the two model loss values, the cumulative quantization error distribution as shown in fig. 3 can be obtained. In fig. 3, the abscissa is the iteration step of the reasoning process, and the ordinate is the accumulated quantization error.
As can be seen from fig. 3, the accumulated quantization error increases as the inference iteration proceeds. It should be noted that, the abscissa in fig. 3 is from 1 to 100, and the iterative steps are sequentially executed according to the time sequence in the iterative reasoning process; however, in the basic concept of the diffusion model, step 0 represents the final generated image, i.e. the number of iterative steps in the diffusion model is from 100 to 1, wherein step 100 is the initial step, representing the initialized white noise. The number of iteration steps in the diffusion model is not adopted in the embodiment, but is named as 1 st iteration step, 2 nd iteration step and the like in sequence according to the execution sequence.
To further analyze the contribution of each iteration step to the error accumulation, the accumulated quantization error is converted in this embodiment into a discrete per-step quantization error distribution. Specifically, after the accumulated quantization error of each iteration step is determined, the difference between the accumulated quantization errors of two adjacent iteration steps can be used as the single-step quantization error of the corresponding iteration step. Moreover, as shown in fig. 3, trend analysis of the quantization error accumulation reveals that different iteration steps contribute differently to the quantization error, i.e., the single-step quantization errors of different iteration steps may differ; the corresponding number of samples can then be determined based on the single-step quantization error of each iteration step.
In this embodiment, the accumulated quantization error of each iteration step may be determined first, so as to obtain a single-step quantization error introduced by each iteration step alone.
Step S203, determining the sampling number corresponding to each iteration step according to the single-step quantization error; the number of samples is positively correlated with the single step quantization error.
Please refer to step S103 in the embodiment shown in fig. 1 in detail, which is not described herein.
In some alternative embodiments, the step S203 "determining the number of samples corresponding to each iteration step according to the single-step quantization error" includes the following steps B1 to B2.
And step B1, carrying out normalization processing on the single-step quantization error, and determining the sampling weight corresponding to each iteration step.
In this embodiment, in order to determine a suitable number of samples based on the single-step quantization error, the overall distribution of the single-step quantization error is normalized, and the normalization result is taken as the sampling weight of the corresponding iteration step, so that the sum of all the sampling weights is 1.
For example, taking the cumulative quantization error shown in fig. 3 as an example, the single-step quantization error thus determined is normalized, and the determined normalization result can be shown in fig. 4, which can represent the normalized quantization error distribution as a whole. The abscissa in fig. 4 is the iteration step of the reasoning process, and the ordinate is the normalization result of the single-step quantization error.
Optionally, the sampling weight satisfies:

w_t = E_t / AE_T,  with Σ_{t=1}^{T} w_t = 1

where w_t denotes the sampling weight of the t-th iteration step, E_t denotes the single-step quantization error of the t-th iteration step, AE_T denotes the accumulated quantization error up to the last (T-th) iteration step, and T is the total number of iteration steps; the difference between the accumulated quantization errors of two adjacent iteration steps is the single-step quantization error of the corresponding iteration step.
For example, if the accumulated quantization error is determined based on equation (5), the sampling weight w_t of the t-th iteration step can be expressed as:

w_t = (AE_t − AE_{t−1}) / AE_T   (6)

where AE_{t−1} denotes the accumulated quantization error of the (t−1)-th iteration step, so the numerator of equation (6) is the single-step quantization error E_t; AE_T denotes the accumulated quantization error of the T-th (i.e., last) iteration step.

Using the accumulated quantization error AE_T of the last iteration step as the denominator normalizes the single-step quantization errors, so that the sampling weights of all iteration steps sum to 1, i.e., Σ_{t=1}^{T} w_t = 1.
And step B2, determining the sampling quantity of the corresponding iteration step according to the sampling weight of each iteration step.
In this embodiment, the sampling weight w_t weights the number of samples n_t of the corresponding iteration step, and w_t is positively correlated with n_t: the larger the sampling weight w_t, the larger the number of samples n_t, where n_t denotes the number of samples of the t-th iteration step. Determining the corresponding number of samples from the sampling weights achieves error-aware weighted sampling of the calibration data.
Optionally, the step B2 "determines the number of samples of each iteration step according to the sampling weight of each iteration step" includes the following steps B21 to B22.
And step B21, presetting a sample set size N of the calibration data sample set.
Step B22, determining the number of samples in each iteration step:

n_t = round(w_t · N),  t = 1, 2, …, T

where n_t denotes the number of samples of the t-th iteration step, w_t denotes the sampling weight of the t-th iteration step, round(·) is the rounding function, and T is the total number of iteration steps.
In this embodiment, based on the normalized quantization error distribution, the sampling weight of the calibration data may be constructed, for example, the normalized quantization error distribution may be used as a probability distribution, and based on a given overall sample number (i.e., the sample set size N), the sampling of the calibration data may be directly performed on each iteration step according to the probability distribution.
The global size of the calibration data sample set, i.e., the sample set size N, is preset; the sample set size N may be a hyper-parameter. Based on the previously obtained sampling weights w_t, each iteration step can be assigned the corresponding number n_t of calibration data samples it requires. An iteration step that contributes more to the loss error has a larger single-step quantization error and therefore a larger sampling weight w_t, so that step is allocated relatively more calibration data, i.e., its number of samples n_t is also larger. In the subsequent weight calibration process, this more effectively reduces the accumulated quantization error of the diffusion model caused by multi-step computation.
In particular, n_t = round(w_t · N), where round(·) is a rounding function; in general, the nearest rounding function is chosen, i.e., the number of samples n_t of each iteration step is determined by rounding to the nearest integer. It will be appreciated that the total number of calibration data sampled over the T iteration steps is then substantially consistent with the sample set size N.
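Steps B1 and B22 together can be sketched as follows (the single-step error values are assumed; note that because of rounding, Σ n_t can deviate from N by a few samples):

```python
import numpy as np

N = 100                                        # preset calibration sample-set size
E = np.array([0.02, 0.03, 0.06, 0.09, 0.12])   # assumed single-step errors E_t

w = E / E.sum()                     # sampling weights, eq. (6); they sum to 1
n = np.rint(w * N).astype(int)      # per-step sample counts, nearest rounding

assert np.isclose(w.sum(), 1.0)
# Larger single-step error -> more calibration samples for that step.
assert np.all(np.diff(n) >= 0)
```

With these values the counts come out to [6, 9, 19, 28, 38], which happens to sum exactly to N = 100.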
Step S204, each iteration step is sampled according to the sampling quantity, and calibration data of each iteration step are obtained.
Please refer to step S104 in the embodiment shown in fig. 1 in detail, which is not described herein.
In some alternative embodiments, the step S204 "sampling each iteration step according to the number of samples to obtain the calibration data of each iteration step" includes the following steps C1 to C2.
Step C1, sampling the intermediate input data of each iteration step, randomly sampling n_t data samples at the t-th iteration step; where n_t represents the number of samples of the t-th iteration step.
And C2, taking n t data samples as calibration data of the t iteration step.
In this embodiment, sampling may begin after the number of samples n_t of each iteration step has been determined; for example, the rounding function is applied to the weighted result to obtain the number of samples n_t corresponding to each iteration step, and the sampling process then starts.
Specifically, random input data can be provided to the original diffusion model and iterative inference computation started. For each iteration step passed in this process, the intermediate input data of that step is sampled; for the t-th iteration step, n_t samples are collected in total, and all obtained data samples x_t^(i) are added to the calibration data sample set, where x_t^(i) denotes the i-th data sample collected at the t-th iteration step, i = 1, 2, …, n_t. After sampling of all iteration steps from 1 to T is completed, all N data samples are obtained, constituting the required calibration data sample set.
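The sampling loop of steps C1 to C2 can be sketched as below. The one-step update `f` is a stand-in for a real denoising iteration, and the shapes, counts, and batch size are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5
n = [6, 9, 19, 28, 38]        # per-step sample counts n_t (assumed)
batch, dim = 64, 16           # assumed batch size and data dimension

def f(x, t):
    """Stand-in for one denoising iteration of the diffusion model."""
    return 0.9 * x + 0.1 * rng.standard_normal(x.shape)

calib = {}                                 # t -> array of n_t sampled inputs
x = rng.standard_normal((batch, dim))      # initial input (white noise)
for t in range(1, T + 1):
    idx = rng.choice(batch, size=n[t - 1], replace=False)
    calib[t] = x[idx].copy()   # x_t^(i): intermediate *input* of step t
    x = f(x, t)                # output of step t is the input of step t+1

total = sum(len(v) for v in calib.values())
assert total == sum(n)         # calibration set size matches the sum of n_t
```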
Step S205, the calibration data of each iteration step is used as the input data of the corresponding iteration step, the iteration loop is executed again on the diffusion model and the basic quantization model, the error between the diffusion model and the basic quantization model is minimized, the weight of the basic quantization model is optimized, and the quantization model with the calibrated weight is generated.
Please refer to step S105 in the embodiment shown in fig. 1 in detail, which is not described herein.
In some alternative embodiments, the step S105 "optimizing the weights of the base quantization model with the objective of minimizing the error between the diffusion model and the base quantization model" specifically includes the following step D1.
And D1, taking the error between the minimum diffusion model and the basic quantization model as a target, and determining the optimized weight of the basic quantization model by actively rounding the weight of the diffusion model.
In this embodiment, based on the above equations (2), (3), etc., it is known that the weight calibration can be achieved by optimizing the rounding function used in the quantization process. For example, the weights of the diffusion model may be actively rounded based on the objective function determined by the above equation (4), that is, a suitable rounding mode is selected, so that more suitable quantization weights may be determined.
Optionally, the step D1 "of determining the optimized weight of the basic quantization model by actively rounding the weight of the diffusion model with the objective of minimizing the error between the diffusion model and the basic quantization model includes the following steps D11 to D13.
And D11, adding continuous variables to be optimized for the rounding mode of the weight of the diffusion model, and representing the conversion relation between the weight of the diffusion model and the undetermined weight of the quantization model according to the continuous variables.
Step D12, determining corresponding continuous variables based on a first objective function aiming at minimizing the product of the weight error and the input data; the weight error is the difference between the weight of the diffusion model and the undetermined weight of the quantization model;
And step D13, determining the optimized weight of the basic quantization model according to the determined continuous variable and the conversion relation.
For the objective function shown in equation (4) above, every new weight perturbation ΔW would have to be evaluated through a set of input data and a forward computation. To reduce repeated computation, similarly to equation (2), the approximation can be completed using a second-order Taylor expansion. Thus, for the weights of each layer, the rounding optimization problem is further rewritten as:
argmin_{ΔW^(l)} E[g^(W^(l))ᵀ·ΔW^(l) + ½·ΔW^(l)ᵀ·H^(W^(l))·ΔW^(l)]   (7)

where W^(l) denotes the weight matrix of the l-th layer, ΔW^(l) denotes the weight-perturbation matrix of the l-th layer, and the two terms g^(W^(l)) and H^(W^(l)) respectively denote the gradient and the Hessian matrix with respect to the weight W^(l); the non-zero blocks of the Hessian matrix each correspond to one layer. For a given (converged) diffusion model, the gradient term (the first term) in the above expression is zero, so the problem can be reduced to:
argmin_{ΔW^(l)} E[ΔW^(l)ᵀ·H^(W^(l))·ΔW^(l)]   (8)
Solving this optimization problem faces two difficulties: on the one hand, the Hessian matrix requires considerable computation and storage; on the other hand, the complexity of the objective-function optimization problem in equation (8) increases nonlinearly with the dimension. A better approach is needed to address both issues.
To address the computational complexity of the Hessian matrix, the problem can be further simplified and approximated for each layer into a local optimization problem that minimizes the mean square error of each layer's output activation. The optimization then no longer involves any layer after the current one; it concerns only the current layer, and no global Hessian matrix needs to be computed, which resolves the first difficulty. Specifically, equation (8) is approximated as follows:
argmin_{ΔW_k^(l)} E[(ΔW_k^(l)·x^(l))²]   (9)

where W_k^(l) denotes the k-th row of the weight matrix of the l-th layer, and x^(l) denotes the input data of the l-th layer.
To solve the problem of optimization solution complexity, the above optimization problem can be further approximated, i.e., the above equation (9) is further simplified to be converted into a continuous optimization problem.
Specifically, in this embodiment, a continuous variable to be optimized is added to a rounding mode of the weight of the diffusion model, and the conversion relationship between the weight of the diffusion model and the undetermined weight of the quantization model is represented according to the continuous variable.
Optionally, the conversion relation satisfies:
Ŵ = s · clip(⌊W/s⌋ + h(V), q_min, q_max)   (10)

where W denotes the weight of the diffusion model, Ŵ denotes the pending weight of the quantization model, V is the continuous variable to be optimized, s is the quantization scale, clip(x, a, b) is the function limiting the variable x to the lower limit a and the upper limit b, ⌊·⌋ is the rounding function, h(·) is a monotonic function mapping a variable to the desired range, q_min denotes the minimum cutoff threshold of the quantization range, and q_max denotes the maximum cutoff threshold of the quantization range.
When ⌊·⌋ is the floor (round-down) function, h(·) is a monotonic function mapping variables to the range 0 to 1; alternatively, when ⌊·⌋ is the ceiling (round-up) function, h(·) maps variables to the range −1 to 0. In general, ⌊·⌋ is taken as the floor function, and h(·) may adopt a rectified sigmoid function or the like to map the continuous variable V to the value range 0 to 1.
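One common concrete choice for h(·) (an assumption here; the text only requires a monotonic map onto the range 0 to 1) is the stretched, clipped sigmoid used in adaptive-rounding methods, h(V) = clip(sigmoid(V)·(ζ − γ) + γ, 0, 1) with assumed stretch parameters γ = −0.1 and ζ = 1.1, which lets h saturate exactly at 0 and 1:

```python
import numpy as np

GAMMA, ZETA = -0.1, 1.1   # stretch parameters (assumed, AdaRound-style)

def h(V):
    """Rectified sigmoid: monotonic map of the continuous variable V to [0, 1]."""
    sig = 1.0 / (1.0 + np.exp(-V))
    return np.clip(sig * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

V = np.array([-10.0, 0.0, 10.0])
out = h(V)
assert out[0] == 0.0 and out[2] == 1.0   # saturates exactly at 0 and 1
assert 0.0 < out[1] < 1.0                # interior values stay in (0, 1)
assert np.all(np.diff(h(np.linspace(-6, 6, 50))) >= 0)   # monotonic
```

Exact saturation matters because after optimization each h(V) element must commit to 0 (round down) or 1 (round up); a plain sigmoid only approaches those values asymptotically.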
Alternatively, approximating equation (9) above, it may be converted into the following objective function:
argmin_V ‖W·x − Ŵ·x‖₂² + λ·f_reg(V)   (11)

where x denotes the input data of the diffusion model, W denotes the weight of the diffusion model, Ŵ denotes the pending weight of the quantization model, V is the continuous variable to be optimized, ‖·‖₂² denotes the squared second-order norm (2-norm), f_reg(V) denotes a regularization term that drives the continuous variable V to converge to 0 or 1, and λ denotes the regularization coefficient, which may be set according to practical conditions. The continuous variable V is a two-dimensional matrix comprising multiple rows and columns of elements.
For convenience of description, the objective function shown in equation (11) above is referred to as the first objective function.
Specifically, after the calibration data sample set is determined, an iteration loop may be executed again on the diffusion model and the basic quantization model, with the previously determined calibration data used as the input data of the corresponding iteration step, i.e., as the input data x in equation (11) above. At each iteration step, the first objective function is solved by methods such as gradient back-propagation, finally yielding the optimal continuous variable V, i.e., each element V_ij of the continuous variable V can be determined. After the continuous variable V is determined, the pending weight Ŵ corresponding to the diffusion-model weight W can be determined based on equation (10); the determined pending weight Ŵ is the weight obtained after optimizing the weights of the basic quantization model. After the T iteration steps end, the optimization of all weights is completed, and the weight-calibrated quantization model is finally obtained.
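The per-layer solve can be illustrated with a brute-force stand-in: instead of optimizing V by gradient descent, the tiny example below enumerates the floor/ceil choice for each weight and keeps the combination minimizing ‖W·x − Ŵ·x‖², showing that output-aware rounding can only match or beat plain nearest rounding. Dimensions, scale, and data are all assumed:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
s = 0.5                               # assumed quantization scale
W = rng.uniform(-1, 1, size=3)        # one output row of a layer's weights
X = rng.standard_normal((3, 32))      # assumed calibration inputs (dim x samples)

def dequant(choice):
    """choice[i] in {0, 1}: add 0 (floor) or 1 (ceil) to floor(W/s)."""
    return s * (np.floor(W / s) + np.asarray(choice))

def out_err(Wq):
    """Output reconstruction error ||W x - W_hat x||^2 over the calibration set."""
    return float(np.sum((W @ X - Wq @ X) ** 2))

nearest = s * np.rint(W / s)          # plain nearest rounding
best = min((dequant(c) for c in itertools.product([0, 1], repeat=3)),
           key=out_err)

# Nearest rounding is one of the enumerated candidates, so the output-aware
# choice is never worse and is often strictly better.
assert out_err(best) <= out_err(nearest) + 1e-12
```

In the method described above, this exhaustive search is replaced by the continuous relaxation of equation (11), which scales to full weight matrices.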
In some alternative embodiments, the activation distribution also differs at each iteration step of diffusion-model generation. Post-training quantization methods for convolutional neural networks use data calibration: so-called calibration data serve as sample data that can represent the true input distribution, and the activation distribution in the model is calibrated based on these data. For the iterative process of a diffusion model, however, the activation distribution of each step changes continuously, and the differences between steps that are far apart are quite significant. This makes quantizing the diffusion model more difficult: if the model noise of only a few iteration steps is estimated and used for calibration, the characteristics of all activation layers over the whole iterative process cannot be reflected, so the diffusion-model generation process would merely follow the activation distribution represented by those few calibration data, would generalize poorly, and the generation effect would suffer.
In this embodiment, after the step S205 of generating the weight-calibrated quantization model, the method further includes an activation calibration process, which specifically includes the following step E1.
And E1, calibrating an activation function in the weight calibrated quantization model to generate an activated calibrated quantization model.
In this embodiment, the optimization may be performed layer by layer based on the first objective function described above, however, the effect of the activation function on the layer-by-layer accumulation of quantization errors is not considered at this time. To solve this problem, in the present embodiment, further activation calibration is performed on the quantization model to reduce the influence due to the activation function.
Optionally, the step E1 "calibrate the activation function in the weight-calibrated quantization model" includes the following steps E11 to E12.
E11, optimizing the continuous variable again based on the second objective function; the second objective function is an objective function which aims at minimizing the difference between the first activation term and the second activation term; the first activation term is a result of nonlinear transformation of a product between a weight of the diffusion model and the input data using the activation function, and the second activation term is a result of nonlinear transformation of a product between a pending weight of the quantization model and the input data using the activation function.
And E12, determining the optimized weight of the weight-calibrated quantization model according to the redetermined continuous variable and the conversion relation.
In this embodiment, similar to the weight calibration process described above, a continuous variable to be optimized is still introduced for the rounding mode of the weights, and the corresponding conversion relationship is determined; the specific process can be seen in step D11 above. Specifically, the conversion relationship is as shown in formula (10); at this time, the pending weight Ŵ of the quantization model is specifically the weight of the quantization model that needs to undergo weight calibration. Thereafter, the continuous variable may be optimized again based on the second objective function, thereby determining the optimized continuous variable.
Wherein, the activation function f_a is used to nonlinearly transform the product between the weight W of the diffusion model and the input data x, forming the first activation term f_a(Wx); similarly, the activation function f_a is used to nonlinearly transform the product between the pending weight Ŵ of the quantization model and the input data x̃, obtaining the second activation term f_a(Ŵx̃). The input data x̃ is specifically the input data of the quantization model; the input data x and the input data x̃ may be identical. Alternatively, after the weight-calibrated quantization model is determined, the input data of each iteration step in the weight-calibrated quantization model may be sampled based on the previously determined sampling number n_t of each iteration step, so that the calibration data corresponding to each iteration step can be determined and used as the input data x̃ of the corresponding iteration step.
Optionally, the second objective function satisfies:

$\underset{V}{\arg\min}\ \left\| f_a(Wx) - f_a(\hat{W}\tilde{x}) \right\|^2 + \lambda f_{reg}(V)$ (12)

Wherein, x represents the input data of the diffusion model, W represents the weight of the diffusion model, x̃ represents the input data of the quantization model, Ŵ represents the pending weight of the quantization model, f_a is the activation function, V is the continuous variable to be optimized, ‖·‖² represents the second-order norm, f_reg(V) represents a regularization term that makes the continuous variable V converge to 0 or 1, and λ represents the regularization term coefficient, which may be the same as the coefficient λ in formula (11).
In this embodiment, after the weight calibration, the activation quantization update may be further performed. At this time, the iterative reasoning of the two models is restarted, and for each iteration step, based on the original diffusion model, the quantized basic quantization model and the calibration data of that layer, the continuous variable V may be determined again based on the above second objective function; further, automatic rounding can be realized by substituting h(V) into formula (10), and the weight Ŵ determined at this time is the activation-calibrated weight. The process finishes after the T-step loop, finally realizing the optimization of the basic quantization model. The activation-calibrated quantization model finally obtained has undergone both weight calibration and activation calibration, and performs better than the basic quantization model.
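A minimal sketch of the second-objective loss for one scalar weight, assuming SiLU as the (otherwise generic) activation function f_a and a sigmoid form of h; all parameter choices here are illustrative, not the patent's:

```python
import math

def silu(z):
    # example activation; the embodiment leaves the activation function f_a generic
    return z / (1.0 + math.exp(-z))

def h(v):
    # monotonic map of the continuous variable into (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

def quantize(w, v, s=0.1, qmin=-128.0, qmax=127.0):
    # conversion relation of formula (10)
    return s * min(max(math.floor(w / s) + h(v), qmin), qmax)

def activation_loss(v, w, x_fp, x_q, s=0.1, lam=0.01):
    """Second objective for one scalar weight: compare the two activation
    terms f_a(w * x) and f_a(w_hat * x~) instead of the raw products."""
    data = (silu(x_fp * w) - silu(x_q * quantize(w, v, s))) ** 2
    reg = lam * (1.0 - abs(2.0 * h(v) - 1.0))
    return data + reg
```

Optimizing v against this loss (by any gradient method) then yields the activation-calibrated rounding direction for that weight.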
According to this embodiment, for the quantization of a diffusion model with its characteristic deep sequential iteration, the error accumulation and the differing activation distribution of each iteration step are fully considered, the quantization error of each iteration step can be dynamically perceived and weighted, and the quantization calibration process of the diffusion model is effectively improved.
The embodiment provides a quantization method of a diffusion model, which can be applied to equipment needing model training, such as cloud platform products, acceleration cards and the like related to an FPGA. Fig. 5 is a flowchart of a quantization method of a diffusion model according to an embodiment of the present invention, as shown in fig. 5, the flowchart including the following steps.
Step S501, quantization processing is performed on the original diffusion model, and a corresponding basic quantization model is generated.
The depth of the diffusion model and the depth of the basic quantization model are both T, namely the total number of iteration steps is T; for example t=100.
Step S502, performing an iterative reasoning loop on the original diffusion model, where the intermediate iteration step is denoted t and loops from 1 to T, and obtaining the corresponding first loss value L_t^f in each iteration step.
Step S503, performing an iterative reasoning loop on the basic quantization model, where the intermediate iteration step is denoted t and also loops from 1 to T, and obtaining the corresponding second loss value L_t^q in each iteration step.
Step S504, for each iteration step, determines the cumulative quantization error AE t for each iteration step based on the loss values determined in step S502 and step S503.
In step S505, based on the obtained accumulated quantization errors AE t of all the iteration steps, a single-step quantization error of each iteration step may be determined, and further, the sampling weight λ t of each iteration step may be determined, and the sampling number n t may be calculated.
Wherein, the sampling weight λ_t can be determined based on equation (6) above, and the number of samples n_t of each iteration step is determined accordingly; for example, n_t = round(λ_t · N), where N is the preset size of the calibration data set.
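The error-aware sampling plan of steps S504 and S505 can be sketched as follows (variable names are illustrative, and the accumulated errors AE_1..AE_T are assumed to be already available):

```python
def sampling_plan(accumulated_errors, total_samples):
    """accumulated_errors[t-1] holds AE_t; returns (weights, counts), i.e. the
    sampling weight lambda_t and the sample count n_t for each iteration step."""
    ae_T = accumulated_errors[-1]       # accumulated error up to the last step
    weights, counts = [], []
    prev = 0.0
    for ae_t in accumulated_errors:
        # single-step error: difference of adjacent accumulated errors (AE_0 = 0)
        single_step = ae_t - prev
        prev = ae_t
        lam = single_step / ae_T        # normalized weight; all weights sum to 1
        weights.append(lam)
        counts.append(round(lam * total_samples))   # n_t = round(lambda_t * N)
    return weights, counts
```

Steps with larger single-step quantization error thus receive proportionally more calibration samples.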
Step S506, sampling is carried out according to the sampling quantity of each iteration step, and the calibration data of each iteration step are extracted to form a calibration data set.
For example, an iterative inference loop may be performed again with the diffusion model: a random input is provided to the diffusion model and the iteration is started; for each intermediate step t, based on the number of samples n_t for that step determined in step S505, n_t of the input data of that intermediate layer are obtained, forming the calibration data D_t.
Wherein, the size N of the data set can be preset (e.g., N=2000); the calibration data set is denoted D and initialized to be empty. After the calibration data D_t of each iteration step is determined, it can be added to the calibration data set D. All steps from 1 to T are executed in a loop, all N calibration data samples are acquired, and the construction of the calibration data set is completed.
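A small sketch of this calibration-set construction (illustrative names; `step_inputs` stands in for the intermediate inputs observed while re-running the diffusion model):

```python
import random

def build_calibration_set(step_inputs, sample_counts, seed=0):
    """step_inputs[t] is the list of intermediate inputs observed at step t+1;
    sample_counts[t] is n_{t+1}. Returns the calibration set D as a dict
    mapping the iteration step t to its randomly drawn calibration data D_t."""
    rng = random.Random(seed)
    calib = {}
    for t, (inputs, n_t) in enumerate(zip(step_inputs, sample_counts), start=1):
        # draw n_t samples without replacement (capped by what was observed)
        calib[t] = rng.sample(inputs, min(n_t, len(inputs)))
    return calib
```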
Step S507, weight calibration.
Specifically, an iterative loop is again performed on the diffusion model and the basic quantization model, the intermediate iteration step being denoted t and looping from 1 to T. For each iteration step t, the n_t calibration data D_t corresponding to that step are obtained from the calibration data set D and taken as the input data x, and the quantization calibration of the weights of the basic quantization model in this iteration step is completed based on formulas (10) and (11), so that the corresponding values in the weight W are updated to Ŵ. All steps from 1 to T are executed in a loop to complete the weight calibration update of the basic quantization model.
Step S508, calibration is activated.
Specifically, an iterative loop is again performed on the diffusion model and the basic quantization model, the intermediate iteration step being denoted t and looping from 1 to T. For each iteration step t, the weight Ŵ and the n_t calibration data D_t obtained for that step from the calibration data set D are taken as input, and active rounding is implemented based on formulas (10) and (12) above, thereby implementing activation calibration for this iteration step. All steps from 1 to T are executed in a loop to complete the activation calibration update of the weight-calibrated quantization model, finally obtaining the optimized quantization model. In this way, the performance loss of the diffusion model is effectively reduced while its inference is effectively compressed and accelerated.
In this embodiment, for the diffusion model quantization scenario, the error propagation caused by the deep iteration of the diffusion model and the difficulty of estimating the activation distribution are fully considered. The error-aware weighted diffusion model quantization method provided by this embodiment performs error statistical analysis on the iterative process of the diffusion model, constructs the accumulated error distribution, and applies this error distribution to perception-weighted quantization calibration of the further iterative process. Based on the constructed error distribution of the iterative process, a weighted calibration data sampling mode aware of the iterative process is formed, so that the corresponding calibration data set can be constructed. Finally, active rounding is performed based on the constructed error-aware weighted sampling calibration data set, so that the quantizers corresponding to all iteration steps of the diffusion model can be calibrated and updated, completing the quantization calibration of the diffusion model. The method reduces the computation required for single-step inference of the diffusion model, can effectively solve the accumulation of quantization errors peculiar to the diffusion model, and ensures the generation performance of the diffusion model while achieving accelerated inference.
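Condensing steps S501 to S506 into one runnable sketch (the loss traces and intermediate inputs below are toy stand-ins for real model runs, and all names are illustrative assumptions):

```python
import random

def calibrate_pipeline(loss_fp, loss_q, step_inputs, N=100, seed=0):
    """loss_fp / loss_q: per-step loss traces of the diffusion and quantized
    models (steps S502-S503); step_inputs[t]: intermediate inputs observed at
    step t+1 (step S506); N: preset calibration-set size."""
    # S504: accumulated quantization errors (cumulative MSE of the loss gap)
    ae, sq = [], 0.0
    for t, (lf, lq) in enumerate(zip(loss_fp, loss_q), start=1):
        sq += (lq - lf) ** 2
        ae.append(sq / t)
    # S505: single-step errors -> sampling weights -> sample counts n_t
    prev, counts = 0.0, []
    for a in ae:
        counts.append(round((a - prev) / ae[-1] * N))
        prev = a
    # S506: draw n_t calibration samples per iteration step
    rng = random.Random(seed)
    calib = {t + 1: rng.sample(xs, min(n, len(xs)))
             for t, (xs, n) in enumerate(zip(step_inputs, counts))}
    return ae, counts, calib
```

The returned calibration set then drives the weight calibration (S507) and activation calibration (S508) loops.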
The embodiment also provides a quantization device of the diffusion model, which is used for implementing the foregoing embodiments and preferred embodiments, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a quantization apparatus of a diffusion model, as shown in fig. 6, including:
The quantization module 601 is configured to perform quantization processing on an original diffusion model, and generate a corresponding basic quantization model;
the error determining module 602 is configured to perform an iterative loop on the diffusion model and the basic quantization model, and determine a single-step quantization error corresponding to each iterative step; the single-step quantization error is a quantization error between the diffusion model and the base quantization model in a single iteration step;
A number determining module 603, configured to determine, according to the single-step quantization error, the number of samples corresponding to each iteration step; the number of samples is positively correlated with the single-step quantization error;
the sampling module 604 is configured to sample each iteration step according to the sampling number, so as to obtain calibration data of each iteration step;
And the weight calibration module 605 is configured to take the calibration data of each iteration step as input data of a corresponding iteration step, execute an iterative loop again on the diffusion model and the basic quantization model, and optimize the weight of the basic quantization model with the objective of minimizing the error between the diffusion model and the basic quantization model, so as to generate a weight-calibrated quantization model.
In some alternative embodiments, the error determining module 602 performs an iterative loop on the diffusion model and the basic quantization model, and determines a single-step quantization error corresponding to each iterative step, including:
Performing an iterative loop on the diffusion model and the basic quantization model, and determining an accumulated quantization error corresponding to each iteration step; the accumulated quantization error comprises all quantization errors between the diffusion model and the basic quantization model accumulated from the start of execution up to the corresponding iteration step;
and taking the difference value between the accumulated quantization errors of two adjacent iteration steps as a single-step quantization error of the corresponding iteration step.
In some alternative embodiments, the error determining module 602 performs an iterative loop on the diffusion model and the basic quantization model, and determines a cumulative quantization error corresponding to each iterative step, including:
performing an iterative loop on the diffusion model and the basic quantization model, and determining a first loss value of the diffusion model and a second loss value of the basic quantization model when iterating to each iteration step;
And taking the error between the second loss value and the first loss value as the accumulated quantization error of the corresponding iteration step.
In some alternative embodiments, the accumulated quantization error satisfies:

$AE_t = \mathrm{MSE}\left( L_i^{q} - L_i^{f} \right)_{i=1}^{t}, \quad t = 1, \dots, T$

Wherein, AE_t represents the accumulated quantization error from iteration up to the t-th iteration step, L_t^f represents the first loss value at the t-th iteration step, and L_t^q represents the second loss value at the t-th iteration step; W represents the weight of the diffusion model, Ŵ represents the weight of the basic quantization model, L(·) represents the loss function, so that L_t^f = L(x; W) and L_t^q = L(x; Ŵ); MSE(·) represents the cumulative mean square error function, and T is the total number of iteration steps.
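A short sketch of this computation, reading the "cumulative mean square error" as the running mean of squared loss gaps up to step t (an interpretive assumption; the loss traces are assumed to be recorded during the two inference loops):

```python
def accumulated_errors(loss_fp, loss_q):
    """Return AE_1..AE_T: the cumulative MSE between the quantized-model and
    diffusion-model loss values, evaluated up to each iteration step t."""
    ae, sq_sum = [], 0.0
    for t, (lf, lq) in enumerate(zip(loss_fp, loss_q), start=1):
        sq_sum += (lq - lf) ** 2    # squared gap between the two loss values
        ae.append(sq_sum / t)       # cumulative MSE up to step t
    return ae
```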
In some alternative embodiments, the number determining module 603 determines the number of samples corresponding to each iteration step according to the single-step quantization error, including:
Normalizing the single-step quantization error to determine the sampling weight corresponding to each iteration step;
and determining the sampling quantity of the corresponding iteration step according to the sampling weight of each iteration step.
In some alternative embodiments, the sampling weights satisfy:

$\lambda_t = \frac{\Delta AE_t}{AE_T} = \frac{AE_t - AE_{t-1}}{AE_T}, \quad t = 1, \dots, T$

Wherein, λ_t represents the sampling weight of the t-th iteration step, ΔAE_t represents the single-step quantization error of the t-th iteration step (with AE_0 = 0), AE_T represents the accumulated quantization error from iteration up to the last, T-th, iteration step, and T is the total number of iteration steps; and the difference between the accumulated quantization errors of two adjacent iteration steps is the single-step quantization error of the corresponding iteration step.
In some alternative embodiments, the number determining module 603 determines the number of samples of each iteration step according to the sampling weight of the corresponding iteration step, including:
presetting the sample set size N of the calibration data sample set;

determining the number of samples of each iteration step:

$n_t = \mathrm{round}(\lambda_t \cdot N), \quad t = 1, \dots, T$

Where n_t represents the number of samples of the t-th iteration step, λ_t represents the sampling weight of the t-th iteration step, round(·) is the rounding function, and T is the total number of iteration steps.
In some alternative embodiments, the sampling module 604 samples each iteration step according to the number of samples to obtain calibration data for each iteration step, including:
Sampling the middle input data of each iteration step, and randomly sampling n t data samples in the t iteration step; wherein n t represents the number of samples of the t-th iteration step;
And taking the n t data samples as calibration data of the t iteration step.
In some alternative embodiments, the weight calibration module 605 optimizes the weights of the base quantization model with the goal of minimizing the error between the diffusion model and the base quantization model, including:
And taking the minimization of the error between the diffusion model and the basic quantization model as a target, and determining the optimized weight of the basic quantization model by actively rounding the weight of the diffusion model.
In some alternative embodiments, the weight calibration module 605 may determine the optimized weights of the base quantization model by actively rounding the weights of the diffusion model with the goal of minimizing the error between the diffusion model and the base quantization model, including:
adding continuous variables to be optimized for the rounding mode of the weight of the diffusion model, and representing the conversion relation between the weight of the diffusion model and the undetermined weight of the quantization model according to the continuous variables;
determining a corresponding continuous variable based on a first objective function targeting minimizing the product of the weight error and the input data; the weight error is the difference between the weight of the diffusion model and the undetermined weight of the quantization model;
And determining the optimized weight of the basic quantization model according to the determined continuous variable and the conversion relation.
In some alternative embodiments, the conversion relationship satisfies:

$\hat{W} = s \cdot \mathrm{clip}\left( \left\lfloor \frac{W}{s} \right\rfloor + h(V),\ q_{min},\ q_{max} \right)$

Wherein, W represents the weight of the diffusion model, Ŵ represents the pending weight of the quantization model, V is the continuous variable to be optimized, s is the quantization scale, clip(x; a, b) is the function limiting the variable x to the minimum lower limit value a and the maximum upper limit value b, ⌊·⌋ is the rounding function, h(·) is a monotonic function mapping the variable to the desired range, q_min represents the minimum cutoff threshold of the quantization range, and q_max represents the maximum cutoff threshold of the quantization range.
In some of the alternative embodiments of the present invention,As a round-down function,/>A monotonic function mapping variables to 0 to 1; or/>As a round-up function,/>To map variables to monotonic functions of-1 to 0.
In some alternative embodiments, the first objective function satisfies:

$\underset{V}{\arg\min}\ \left\| Wx - \hat{W}x \right\|^2 + \lambda f_{reg}(V)$

Wherein, x represents the input data of the diffusion model, W represents the weight of the diffusion model, Ŵ represents the pending weight of the quantization model, V is the continuous variable to be optimized, ‖·‖² represents the second-order norm, f_reg(V) represents a regularization term that makes the continuous variable V converge to 0 or 1, and λ represents the regularization term coefficient.
In some alternative embodiments, the apparatus further comprises:

an activation calibration module, configured to, after the generation of the weight-calibrated quantization model: calibrate the activation function in the weight-calibrated quantization model to generate an activation-calibrated quantization model.
In some alternative embodiments, the activation calibration module calibrates activation functions in the weight calibrated quantization model, including:
Optimizing the continuous variable again based on a second objective function; the second objective function is an objective function which aims at minimizing the difference between the first activation item and the second activation item; the first activation term is a result of nonlinear transformation of a product between a weight of the diffusion model and input data by using an activation function, and the second activation term is a result of nonlinear transformation of a product between a pending weight of the quantization model and the input data by using an activation function;
and determining the weight of the weight-calibrated quantization model after optimization according to the redetermined continuous variable and the conversion relation.
In some alternative embodiments, the second objective function satisfies:

$\underset{V}{\arg\min}\ \left\| f_a(Wx) - f_a(\hat{W}\tilde{x}) \right\|^2 + \lambda f_{reg}(V)$

Wherein, x represents the input data of the diffusion model, W represents the weight of the diffusion model, x̃ represents the input data of the quantization model, Ŵ represents the pending weight of the quantization model, f_a is the activation function, V is the continuous variable to be optimized, ‖·‖² represents the second-order norm, f_reg(V) represents a regularization term that makes the continuous variable V converge to 0 or 1, and λ represents the regularization term coefficient.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The quantization apparatus of the diffusion model in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or fixed programs, and/or other devices that can provide the above functions.
The embodiment of the invention also provides computer equipment, which is provided with the quantization device of the diffusion model shown in the figure 6.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, as shown in fig. 7, the computer device includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the computer device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 7.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that may be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and stored on a local storage medium, so that the method described herein may be processed by such software on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present invention by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (20)

1. A method for quantifying a diffusion model, the method comprising:
carrying out quantization treatment on the original diffusion model to generate a corresponding basic quantization model;
Performing iterative loop on the diffusion model and the basic quantization model, and determining a single-step quantization error corresponding to each iterative step; the single-step quantization error is a quantization error between the diffusion model and the base quantization model in a single iteration step;
Determining the sampling number corresponding to each iteration step according to the single-step quantization error; the number of samples is positively correlated with the single-step quantization error;
Sampling each iteration step according to the sampling quantity to obtain calibration data of each iteration step;
and taking the calibration data of each iteration step as input data of the corresponding iteration step, executing an iteration loop again on the diffusion model and the basic quantization model, and optimizing the weight of the basic quantization model with the aim of minimizing the error between the diffusion model and the basic quantization model to generate a weight-calibrated quantization model.
2. The method of claim 1, wherein performing an iterative loop on the diffusion model and the base quantization model to determine a single-step quantization error corresponding to each iterative step comprises:
Performing an iterative loop on the diffusion model and the basic quantization model, and determining an accumulated quantization error corresponding to each iteration step; the accumulated quantization error comprises all quantization errors between the diffusion model and the basic quantization model accumulated from the start of execution up to the corresponding iteration step;
and taking the difference value between the accumulated quantization errors of two adjacent iteration steps as a single-step quantization error of the corresponding iteration step.
3. The method of claim 2, wherein performing an iterative loop on the diffusion model and the base quantization model to determine a cumulative quantization error corresponding to each iteration step comprises:
performing an iterative loop on the diffusion model and the basic quantization model, and determining a first loss value of the diffusion model and a second loss value of the basic quantization model when iterating to each iteration step;
And taking the error between the second loss value and the first loss value as the accumulated quantization error of the corresponding iteration step.
4. A method according to claim 3, characterized in that the accumulated quantization error satisfies:

$AE_t = \mathrm{MSE}\left( L_i^{q} - L_i^{f} \right)_{i=1}^{t}, \quad t = 1, \dots, T$

Wherein, AE_t represents the accumulated quantization error from iteration up to the t-th iteration step, L_t^f represents the first loss value at the t-th iteration step, and L_t^q represents the second loss value at the t-th iteration step; W represents the weight of the diffusion model, Ŵ represents the weight of the basic quantization model, L(·) represents the loss function, so that L_t^f = L(x; W) and L_t^q = L(x; Ŵ); MSE(·) represents the cumulative mean square error function, and T is the total number of iteration steps.
5. The method of claim 1, wherein determining the number of samples corresponding to each iteration step based on the single-step quantization error comprises:
Normalizing the single-step quantization error to determine the sampling weight corresponding to each iteration step;
and determining the sampling quantity of the corresponding iteration step according to the sampling weight of each iteration step.
6. The method of claim 5, wherein the sampling weights satisfy:

$\lambda_t = \frac{\Delta AE_t}{AE_T} = \frac{AE_t - AE_{t-1}}{AE_T}, \quad t = 1, \dots, T$

Wherein, λ_t represents the sampling weight of the t-th iteration step, ΔAE_t represents the single-step quantization error of the t-th iteration step (with AE_0 = 0), AE_T represents the accumulated quantization error from iteration up to the last, T-th, iteration step, and T is the total number of iteration steps; and the difference between the accumulated quantization errors of two adjacent iteration steps is the single-step quantization error of the corresponding iteration step.
7. The method of claim 5, wherein determining the number of samples for each iteration step based on the sampling weight for the corresponding iteration step comprises:
presetting a sample set size N of a calibration data sample set;
determining the sampling number of each iteration step: n_t = round(w_t · N) ;
wherein n_t represents the sampling number of the t-th iteration step, w_t represents the sampling weight of the t-th iteration step, round(·) is the rounding function, and T is the total number of iteration steps.
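The allocation of claims 5 to 7 can be sketched as follows; this is a minimal illustration (names are hypothetical), assuming the single-step errors are recovered as differences of adjacent accumulated errors and normalized by the final accumulated error:

```python
import numpy as np

def sample_counts(cum_errors, total_samples):
    """Allocate calibration samples per iteration step (claims 5-7 sketch).

    cum_errors[t]: accumulated quantization error up to step t+1.
    Returns the per-step sampling weights w_t and sample counts n_t.
    """
    cum = np.asarray(cum_errors, dtype=float)
    # delta_t = Delta_t - Delta_{t-1}: single-step errors
    single = np.diff(cum, prepend=0.0)
    # w_t = delta_t / Delta_T: normalize by the last accumulated error
    weights = single / cum[-1]
    # n_t = round(w_t * N)
    counts = np.rint(weights * total_samples).astype(int)
    return weights, counts
```

Note that independent rounding of each n_t may make the counts sum to slightly more or less than N; the claim does not specify how such a remainder is handled.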
8. The method of claim 1, wherein sampling each iteration step according to the number of samples to obtain calibration data for each iteration step comprises:
sampling the intermediate input data of each iteration step, randomly sampling n_t data samples at the t-th iteration step, wherein n_t represents the sampling number of the t-th iteration step;
and taking the n_t data samples as the calibration data of the t-th iteration step.
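A minimal sketch of the random sampling in claim 8, assuming each step's intermediate inputs are available as a sequence (all names are illustrative):

```python
import random

def sample_calibration_data(step_inputs, counts, seed=0):
    """Draw n_t calibration samples per step (claim 8 sketch).

    step_inputs[t]: intermediate input data observed at step t.
    counts[t]:      sampling number n_t for step t.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    calib = []
    for inputs, n in zip(step_inputs, counts):
        n = min(n, len(inputs))  # cannot draw more than is available
        calib.append(rng.sample(list(inputs), n))
    return calib
```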
9. The method of claim 1, wherein optimizing weights of the base quantization model with the goal of minimizing errors between the diffusion model and the base quantization model comprises:
taking minimizing the error between the diffusion model and the basic quantization model as a target, and determining the optimized weight of the basic quantization model by actively rounding the weight of the diffusion model.
10. The method of claim 9, wherein said determining the weights of the base quantization model optimized by actively rounding the weights of the diffusion model with the goal of minimizing the error between the diffusion model and the base quantization model, comprises:
adding continuous variables to be optimized for the rounding mode of the weight of the diffusion model, and representing the conversion relation between the weight of the diffusion model and the undetermined weight of the quantization model according to the continuous variables;
determining a corresponding continuous variable based on a first objective function targeting minimizing the product of the weight error and the input data; the weight error is the difference between the weight of the diffusion model and the undetermined weight of the quantization model;
And determining the optimized weight of the basic quantization model according to the determined continuous variable and the conversion relation.
11. The method of claim 10, wherein the transformation relationship satisfies:
Ŵ = s · clip( ⌊W/s⌋ + h(V), q_min, q_max ) ;
wherein W represents the weight of the diffusion model, Ŵ represents the pending weight of the quantization model, V is the continuous variable to be optimized, s is the quantization scale, clip(x, a, b) is a function limiting the variable x to a minimum lower limit value a and a maximum upper limit value b, ⌊·⌋ is a rounding function, h(·) is a monotonic function mapping the variable to a desired range, q_min represents the minimum cutoff threshold of the quantization range, and q_max represents the maximum cutoff threshold of the quantization range.
12. The method of claim 11, wherein the rounding function is a round-down function and h(·) is a monotonic function mapping the variable to the range 0 to 1;
or alternatively, the rounding function is a round-up function and h(·) is a monotonic function mapping the variable to the range −1 to 0.
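The conversion relation of claims 11 and 12 can be sketched in the round-down variant; here h is assumed to be a sigmoid, which is one common choice of monotonic function onto (0, 1), and all names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quantize_with_learned_rounding(w, v, s, q_min, q_max):
    """Claim 11/12 conversion sketch (round-down variant).

    w: full-precision weight, v: continuous rounding variable,
    s: quantization scale, [q_min, q_max]: cutoff thresholds.
    """
    # floor(W/s) plus a learned offset h(V) in (0, 1)
    q = np.floor(w / s) + sigmoid(v)
    # clip to the quantization range, then rescale
    q = np.clip(q, q_min, q_max)
    return s * q
```

As v grows large the offset saturates at 1 (round up); as v becomes very negative it saturates at 0 (round down), so optimizing v selects a rounding direction per weight.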
13. The method of claim 10, wherein the first objective function satisfies:
min_V ‖ W·x − Ŵ·x ‖₂² + λ·f_reg(V) ;
wherein x represents the input data of the diffusion model, W represents the weight of the diffusion model, Ŵ represents the pending weight of the quantization model, V is the continuous variable to be optimized, ‖·‖₂ represents the second-order norm, f_reg(V) represents a regularization term that drives the continuous variable V to converge to 0 or 1, and λ represents the regularization term coefficient.
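The first objective function of claim 13 can be evaluated as below. The claim does not give the exact regularizer; the form used here, which pushes h(V) toward 0 or 1, is an assumption borrowed from common adaptive-rounding practice, and all names are hypothetical:

```python
import numpy as np

def first_objective(x, w_fp, w_q, v, lam, beta=2.0):
    """Claim 13 sketch: ||W x - What x||_2^2 + lambda * f_reg(V).

    x: calibration input, w_fp: full-precision weight,
    w_q: pending quantized weight, v: rounding variable,
    lam: regularization coefficient.
    """
    # squared second-order norm of the output difference
    recon = np.sum((w_fp @ x - w_q @ x) ** 2)
    # assumed regularizer: zero when h(V) reaches exactly 0 or 1
    h = 1.0 / (1.0 + np.exp(-v))
    f_reg = np.sum(1.0 - np.abs(2.0 * h - 1.0) ** beta)
    return recon + lam * f_reg
```

Minimizing this over v trades off layer-output reconstruction error against forcing every rounding decision to a hard 0/1 choice.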
14. The method of claim 10, further comprising, after the generating the weight-calibrated quantization model:
and calibrating an activation function in the weight calibrated quantization model to generate an activation calibrated quantization model.
15. The method of claim 14, wherein calibrating the activation function in the weight-calibrated quantization model comprises:
Optimizing the continuous variable again based on a second objective function; the second objective function is an objective function which aims at minimizing the difference between the first activation item and the second activation item; the first activation term is a result of nonlinear transformation of a product between a weight of the diffusion model and input data by using an activation function, and the second activation term is a result of nonlinear transformation of a product between a pending weight of the quantization model and the input data by using an activation function;
and determining the weight of the weight-calibrated quantization model after optimization according to the redetermined continuous variable and the conversion relation.
16. The method of claim 15, wherein the second objective function satisfies:
min_V ‖ a(W·x) − a(Ŵ·x̂) ‖₂² + λ·f_reg(V) ;
wherein x represents the input data of the diffusion model, W represents the weight of the diffusion model, x̂ represents the input data of the quantization model, Ŵ represents the pending weight of the quantization model, a(·) is the activation function, V is the continuous variable to be optimized, ‖·‖₂ represents the second-order norm, f_reg(V) represents a regularization term that drives the continuous variable V to converge to 0 or 1, and λ represents the regularization term coefficient.
17. A quantization apparatus of a diffusion model, the apparatus comprising:
the quantization module is used for carrying out quantization processing on the original diffusion model to generate a corresponding basic quantization model;
the error determining module is used for executing an iteration loop on the diffusion model and the basic quantization model and determining a single-step quantization error corresponding to each iteration step; the single-step quantization error is a quantization error between the diffusion model and the base quantization model in a single iteration step;
The quantity determining module is used for determining the sampling quantity corresponding to each iteration step according to the single-step quantization error; the sampling quantity is positively correlated with the single-step quantization error;
The sampling module is used for sampling each iteration step according to the sampling quantity to obtain the calibration data of each iteration step;
And the weight calibration module is used for taking the calibration data of each iteration step as the input data of the corresponding iteration step, executing the iteration loop again on the diffusion model and the basic quantization model, aiming at minimizing the error between the diffusion model and the basic quantization model, optimizing the weight of the basic quantization model, and generating a weight-calibrated quantization model.
18. A computer device, comprising:
A memory and a processor in communication with each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of quantifying a diffusion model according to any of claims 1 to 16.
19. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of quantifying a diffusion model according to any of claims 1 to 16.
20. A computer program product comprising computer instructions for causing a computer to perform the method of quantifying a diffusion model according to any of claims 1 to 16.
CN202410347877.1A 2024-03-26 2024-03-26 Quantization method and device for diffusion model, computer equipment and storage medium Active CN117951605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410347877.1A CN117951605B (en) 2024-03-26 2024-03-26 Quantization method and device for diffusion model, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN117951605A true CN117951605A (en) 2024-04-30
CN117951605B CN117951605B (en) 2024-06-07

Family

ID=90799710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410347877.1A Active CN117951605B (en) 2024-03-26 2024-03-26 Quantization method and device for diffusion model, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117951605B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210341624A1 (en) * 2020-05-01 2021-11-04 Deere & Company Navigation Apparatus and Method in Which Measurement Quantization Errors are Modeled as States
CN116432608A (en) * 2023-04-06 2023-07-14 平安科技(深圳)有限公司 Text generation method and device based on artificial intelligence, computer equipment and medium
CN116630457A (en) * 2023-05-29 2023-08-22 平安科技(深圳)有限公司 Training method and device for picture generation model, electronic equipment and storage medium
CN116701692A (en) * 2023-08-04 2023-09-05 浪潮(北京)电子信息产业有限公司 Image generation method, device, equipment and medium
CN116863012A (en) * 2023-05-24 2023-10-10 阿里云计算有限公司 Graph generation task reasoning acceleration method and system based on diffusion model
CN117437152A (en) * 2023-12-21 2024-01-23 之江实验室 PET iterative reconstruction method and system based on diffusion model


Also Published As

Publication number Publication date
CN117951605B (en) 2024-06-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant