WO2022006919A1 - Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network - Google Patents

Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network

Info

Publication number
WO2022006919A1
WO2022006919A1 (PCT application PCT/CN2020/101550)
Authority
WO
WIPO (PCT)
Prior art keywords
quantization
weight
activation
point
fixed
Prior art date
Application number
PCT/CN2020/101550
Other languages
French (fr)
Chinese (zh)
Inventor
王培松
程健
Original Assignee
中国科学院自动化研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Publication of WO2022006919A1 publication Critical patent/WO2022006919A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Definitions

  • The invention belongs to the field of data processing, and particularly relates to a post-training quantization method and system for a convolutional neural network based on activation fixed-point fitting.
  • Deep convolutional neural networks have made breakthrough progress in many fields such as image processing, computer vision, and deep learning. In tasks such as image classification, object detection, and face recognition, they have approached or even surpassed human recognition accuracy, and they have therefore been widely applied in assisted driving, e-commerce, video surveillance, and other industries.
  • Low-bit fixed-point quantization is one of the main approaches to compressing deep convolutional neural networks. Because fixed-point quantization applies to a wide variety of network structures and is very hardware-friendly, it has become an important method in the field of network compression. However, low-bit fixed-point quantization of a deep convolutional neural network currently tends to require the original training data and extensive retraining of the network to reach relatively high accuracy, i.e., training-aware quantization.
  • The present invention provides a post-training quantization method for a convolutional neural network based on activation fixed-point fitting. The quantization method includes:
  • Step S10, network weight matrix fitting: obtain the weight matrix of each layer of the original convolutional neural network, and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network;
  • Step S20, first network activation matrix fitting: obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining a weight fixed-point quantized convolutional neural network;
  • Step S30, second network activation matrix fitting: based on the set of verification data and the weight fixed-point quantized convolutional neural network, solve for the activation quantization scale factor, obtaining the weight-activation fixed-point quantized convolutional neural network.
  • step S10 "respectively perform low-to-specific point quantization of the weight matrix of each layer to obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network", and the method is:
  • Step S11 dividing the weight matrix of the current layer into weight vectors by row, and dividing the maximum absolute value of the weight vector by the maximum value of the fixed-point weight quantization as the initial weight quantization scale factor of each row;
  • Step S12 constructing the weight quantization error function between the weight vector, the weight quantization fixed-point number and the initial weight quantization scale factor
  • Step S13 based on the weight quantization error function, iteratively perform the solution of the weight quantization fixed-point number and the weight quantization scale factor, until the weight quantization error function value is lower than the set threshold, and obtain the fixed-point weight of the current layer.
  • Matrix and weight quantization scale factor
  • step S14 the fixed-point weight matrix and the weight quantization scale factor of each layer of the network are obtained respectively by the method of step S11-step S13.
  • The weight quantization error function between the weight vector, the weight quantization fixed-point numbers, and the initial weight quantization scale factor is min_{Q_i, Λ_ii} ‖W_i − Λ_ii·Q_i‖₂², where W is a C_out × K two-dimensional floating-point matrix, W_i is the weight vector of the i-th row of the matrix, and C_out is the number of output channels of the current layer; for a convolutional layer, K = C_in·K_h·K_w, where K_h and K_w are the height and width of the convolution kernel; for a fully connected layer, K = C_in, where C_in is the number of input channels of the current layer; Q_i denotes the weight quantization fixed-point numbers; Λ_ii denotes the initial weight quantization scale factor; and ‖·‖₂² denotes the squared 2-norm of a vector.
  • Step S20 includes:
  • Step S21, input the obtained set of verification data into the original convolutional neural network, and obtain the input activation and output activation of the current layer;
  • Step S22, based on the input activation and output activation of the current layer, construct a linear least squares optimization objective function under the fixed-point constraint of the current layer;
  • Step S23, split the linear least squares optimization objective function by the rows of the fixed-point weight matrix, and iteratively optimize the weight quantization scale factor and the fixed-point weights until the output activation quantization error is less than a set threshold, obtaining the optimized fixed-point weight matrix and weight quantization scale factor of the current layer;
  • Step S24, obtain the optimized fixed-point weight matrix and weight quantization scale factor of each layer of the network by applying steps S21-S23 to each layer, obtaining the weight fixed-point quantized convolutional neural network.
  • The linear least squares optimization objective function is min_{Q, Λ} ‖Y − ΛQX‖_F², where X and Y denote the input activation and output activation of the current layer respectively, Q is the fixed-point weight matrix of the current layer, Λ is the weight quantization scale factor of the current layer, and ‖·‖_F² denotes the squared Frobenius norm of a matrix.
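  • The sketch below illustrates one way to optimize this objective row by row, as described in step S23. The scale update is the exact scalar least-squares solution; the update of the fixed-point row (rounding a least-squares target into range) is an assumed heuristic, since the extract does not spell out the patent's exact rule.

```python
import numpy as np

def fit_output_activation(X, Y, Q, Lam, n_bits=4, n_iters=10):
    """Sketch of step S20: refine (Q, Lam) toward min ||Y - Lam @ Q @ X||_F^2.

    X: (K, N) input activations, Y: (C_out, N) output activations,
    Q: (C_out, K) fixed-point weights from step S10, Lam: (C_out,) row scales.
    """
    Q_max = 2 ** (n_bits - 1) - 1
    Q_min = -Q_max
    # Unconstrained least-squares weights: the real-valued target to round.
    W_ls = Y @ X.T @ np.linalg.pinv(X @ X.T)
    Q = Q.astype(np.float64).copy()
    for _ in range(n_iters):
        for i in range(Y.shape[0]):          # the objective splits by row
            # fix Lam_ii, update Q_i: round the scaled target into range
            Q[i] = np.clip(np.round(W_ls[i] / max(Lam[i], 1e-12)), Q_min, Q_max)
            # fix Q_i, update Lam_ii: closed-form scalar least squares
            z = Q[i] @ X                     # (N,) response of the quantized row
            Lam[i] = float(Y[i] @ z) / max(float(z @ z), 1e-12)
    return Q.astype(np.int32), Lam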
  • The set of verification data is a small amount of the original convolutional neural network's training data, or a small amount of other data whose distribution is similar to that training data, or artificially generated simulation data, or randomly generated data.
  • Step S30 includes:
  • Step S31, input the set of verification data into the weight fixed-point quantized convolutional neural network, and obtain the output activations of each layer to form an output activation vector;
  • Step S32, take the maximum absolute value of the output activation vector divided by the maximum value of the activation quantization fixed-point range as the initial activation quantization scale factor;
  • Step S33, construct the activation quantization error function between the output activation vector, the activation quantization function, and the initial activation quantization scale factor;
  • Step S34, based on the activation quantization error function, iteratively solve for the fixed-point activation vector and the activation quantization scale factor until the activation quantization error function value falls below a set threshold, obtaining the optimized fixed-point activation vector and activation quantization scale factor and thus the weight-activation fixed-point quantized convolutional neural network.
  • The activation quantization function is x̂_i = α·clip(round(x_i/α), q_min, q_max), where q_min and q_max denote the minimum and maximum values of the activation quantization fixed-point range respectively, α denotes the activation quantization scale factor, x_i denotes the output activation of the i-th layer, round(·) denotes the rounding operation, and clip(·) denotes the threshold truncation operation.
  • The activation quantization error function is min_α ‖x − x̂(x; α)‖₂², where α denotes the activation quantization scale factor, x denotes the output activation vector, x̂(·; α) denotes the activation quantization function, and ‖·‖₂² denotes the squared 2-norm of a vector.
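  • A minimal sketch of steps S32-S34 follows. It assumes an unsigned fixed-point range, as would suit post-ReLU activations; signed activations would use a symmetric range instead.

```python
import numpy as np

def fit_activation_scale(x, n_bits=4, n_iters=20, tol=1e-9):
    """Sketch of steps S32-S34: solve the activation quantization scale factor.

    x: 1-D array of output activations collected on the verification data.
    Returns the scale alpha and the fixed-point activation vector.
    """
    q_max = 2 ** n_bits - 1                       # e.g. 15 for 4-bit unsigned
    alpha = max(np.abs(x).max() / q_max, 1e-12)   # initial scale, step S32
    prev_err = np.inf
    for _ in range(n_iters):
        # activation quantization function: round, clip into [0, q_max]
        x_q = np.clip(np.round(x / alpha), 0, q_max)
        # fix the fixed-point activations, update the scale in closed form
        alpha = float(x @ x_q) / max(float(x_q @ x_q), 1e-12)
        err = float(np.sum((x - alpha * x_q) ** 2))   # quantization error
        if prev_err - err < tol:
            break
        prev_err = err
    return alpha, x_q.astype(np.int32)
```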
  • A post-training quantization system for a convolutional neural network based on activation fixed-point fitting is also proposed.
  • The quantization system includes a network weight matrix fitting module, a first network activation matrix fitting module, a second network activation matrix fitting module, and an output module.
  • The network weight matrix fitting module is configured to obtain the weight matrix of each layer of the original convolutional neural network and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network;
  • The first network activation matrix fitting module is configured to obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining a weight fixed-point quantized convolutional neural network;
  • The second network activation matrix fitting module is configured to solve for the activation quantization scale factor based on the set of verification data and the weight fixed-point quantized convolutional neural network, obtaining the weight-activation fixed-point quantized convolutional neural network;
  • The output module is configured to output the obtained weight-activation fixed-point quantized convolutional neural network.
  • A storage device is provided in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above post-training quantization method for a convolutional neural network based on activation fixed-point fitting.
  • A processing device is provided, including a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above post-training quantization method for a convolutional neural network based on activation fixed-point fitting.
  • With the post-training quantization method of the present invention, the weight parameter matrices and input activations of the convolutional and fully connected layers of a convolutional neural network are quantized to fixed point, fixed-point storage replaces the original floating-point storage, and fixed-point arithmetic replaces the original floating-point arithmetic, so that optimized acceleration and compression of a deep convolutional neural network can be achieved after training.
  • With the post-training quantization method of the present invention, the quantization process requires no retraining with data, needs few computing resources, has a low barrier to use, and is fast.
  • FIG. 1 is a schematic flowchart of the post-training quantization method for a convolutional neural network based on activation fixed-point fitting of the present invention;
  • FIG. 2 is a schematic diagram of the image classification process of a convolutional neural network according to an embodiment of the post-training quantization method of the present invention;
  • FIG. 3 is a schematic diagram of the convolution operation of a convolutional neural network in the image classification process according to an embodiment of the post-training quantization method of the present invention.
  • A post-training quantization method for a convolutional neural network based on activation fixed-point fitting of the present invention includes:
  • Step S10, network weight matrix fitting: obtain the weight matrix of each layer of the original convolutional neural network, and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network;
  • Step S20, first network activation matrix fitting: obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining a weight fixed-point quantized convolutional neural network;
  • Step S30, second network activation matrix fitting: based on the set of verification data and the weight fixed-point quantized convolutional neural network, solve for the activation quantization scale factor, obtaining the weight-activation fixed-point quantized convolutional neural network.
  • FIG. 2 is a schematic diagram of the image classification process of a convolutional neural network according to an embodiment of the method, in which the convolutional neural network contains multiple convolutional layers and multiple fully connected layers, and the input image is processed by the convolutional and fully connected layers to obtain the classification result.
  • Each convolutional layer has a group of convolution kernels, which together form the weight tensor of that layer.
  • For example, the convolution kernels can be set to 3×3. A convolutional layer processes its input feature map by convolving it with these kernels, i.e., at each position of the input feature map, multiplying each kernel elementwise with the corresponding convolution window and summing, to obtain the output feature map of that layer.
  • The post-training quantization method for a convolutional neural network based on activation fixed-point fitting includes steps S10 to S30, each described in detail as follows.
  • Step S10, network weight matrix fitting: obtain the weight matrix of each layer of the original convolutional neural network, and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network.
  • The weight quantization scale factor to be solved is Λ, a C_out × C_out floating-point diagonal matrix; Q and Λ are obtained by minimizing the quantization error between W, Q, and Λ, that is, by solving min_{Q, Λ} ‖W − ΛQ‖_F².
  • In step S11, the weight matrix of the current layer is split into weight vectors by row, and the maximum absolute value of each weight vector divided by the maximum value of the weight quantization fixed-point range is taken as the initial weight quantization scale factor of that row, as shown in formula (1): Λ_ii = max(|W_i|) / Q_max, where Λ_ii is the initial weight quantization scale factor, W_i is the weight vector of the i-th row obtained by splitting the weight matrix of the current layer by rows, Q_max is the maximum value of the weight quantization fixed-point range, and max(|W_i|) denotes the maximum absolute value of W_i.
  • Step S12, construct the weight quantization error function among the weight vector, the weight quantization fixed-point numbers, and the initial weight quantization scale factor, as shown in formula (2): min_{Q_i, Λ_ii} ‖W_i − Λ_ii·Q_i‖₂², where W is a C_out × K two-dimensional floating-point matrix, W_i is the weight vector of the i-th row of the matrix, and C_out is the number of output channels of the current layer; for a convolutional layer, K = C_in·K_h·K_w, where K_h and K_w are the height and width of the convolution kernel; for a fully connected layer, K = C_in, where C_in is the number of input channels of the current layer; Q_i denotes the weight quantization fixed-point numbers; Λ_ii denotes the initial weight quantization scale factor; and ‖·‖₂² denotes the squared 2-norm of a vector.
  • Step S13, based on the weight quantization error function, iteratively solve for the weight quantization fixed-point numbers and the weight quantization scale factor until the weight quantization error function value falls below the set threshold, obtaining the fixed-point weight matrix and weight quantization scale factor of the current layer. Fixing Λ_ii, Q_i is solved as Q_i = clip(round(W_i / Λ_ii), Q_min, Q_max), where Q_min and Q_max denote the minimum and maximum values of the weight quantization fixed-point range respectively, round(·) denotes the rounding operation, and clip(·) denotes the threshold truncation operation.
  • Step S14, obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network by applying steps S11-S13 to each layer.
  • Step S20, first network activation matrix fitting: obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining the weight fixed-point quantized convolutional neural network.
  • A fixed-point function-fitting optimization objective from input activation to output activation is constructed, and the fixed-point weight matrix and weight quantization scale factor are further optimized, yielding the weight fixed-point quantized convolutional neural network.
  • Step S21, input the obtained set of verification data into the original convolutional neural network, and obtain the input activation X and output activation Y of the current layer.
  • Step S22, based on the input activation X and output activation Y of the current layer, construct a linear least squares optimization objective function under the fixed-point constraint of the current layer, as shown in formula (5): min_{Q, Λ} ‖Y − ΛQX‖_F², where X and Y denote the input activation and output activation of the current layer respectively, Q is the fixed-point weight matrix of the current layer, and Λ is the weight quantization scale factor of the current layer.
  • Step S23, split the linear least squares optimization objective function by the rows of the fixed-point weight matrix, and iteratively optimize the weight quantization scale factor and the fixed-point weights until the output activation quantization error is less than the set threshold, obtaining the optimized fixed-point weight matrix and weight quantization scale factor of the current layer.
  • The optimized fixed-point weight matrix and weight quantization scale factor obtained in step S10 are used as the initialization values of q and Λ; then q is fixed and Λ is solved, as shown in formula (8); with q fixed, each Λ_ii has the closed-form scalar least-squares solution Λ_ii = Y_i(Q_iX)ᵀ / ((Q_iX)(Q_iX)ᵀ).
  • An approximate solution of Λ and q is obtained by iterating the above process of solving Λ and q.
  • Step S24, obtain the optimized fixed-point weight matrix and weight quantization scale factor of each layer of the network by applying steps S21-S23 to each layer, obtaining the weight fixed-point quantized convolutional neural network.
  • Step S30, second network activation matrix fitting: based on the set of verification data and the weight fixed-point quantized convolutional neural network, solve for the activation quantization scale factor, obtaining the weight-activation fixed-point quantized convolutional neural network.
  • Step S31, input the set of verification data into the weight fixed-point quantized convolutional neural network, and obtain the output activations of each layer to form an output activation vector.
  • Step S32, take the maximum absolute value of the output activation vector divided by the maximum value of the activation quantization fixed-point range as the initial activation quantization scale factor, as shown in formula (11): α = max(|x|) / q_max, where α is the initial activation quantization scale factor, x is the output activation vector, q_max is the maximum value of the activation quantization fixed-point range, and max(|x|) denotes the maximum absolute value of x.
  • Step S33, construct the activation quantization error function between the output activation vector, the activation quantization function, and the initial activation quantization scale factor, as shown in formula (12): min_α ‖x − x̂(x; α)‖₂², where α denotes the activation quantization scale factor, x denotes the output activation vector, and x̂(·; α) denotes the activation quantization function.
  • The activation quantization function is shown in formula (13): x̂_i = α·clip(round(x_i/α), q_min, q_max), where q_min and q_max denote the minimum and maximum values of the activation quantization fixed-point range respectively, α denotes the activation quantization scale factor, x_i denotes the output activation of the i-th layer, round(·) denotes the rounding operation, and clip(·) denotes the threshold truncation operation.
  • Step S34, based on the activation quantization error function, iteratively solve for the fixed-point activation vector and the activation quantization scale factor until the activation quantization error function value falls below the set threshold, obtaining the optimized fixed-point activation vector and activation quantization scale factor and thus the weight-activation fixed-point quantized convolutional neural network.
  • The verification data used in steps S20 and S30 is a small amount of the original convolutional neural network's training data, or a small amount of other data whose distribution is similar to that training data, or artificially generated simulation data, or randomly generated data.
  • After quantization, the floating-point matrix multiplications of the original convolutional neural network can be converted into fixed-point matrix multiplications, and the parameter matrices can be stored as fixed-point matrices, which significantly reduces computation and storage overhead and increases running speed.
  • The invention quantizes the weights and activations of a deep convolutional neural network after training, converts the weights and activations from 32-bit floating-point numbers to low-bit integer values, and stores the weights in low-bit fixed-point format to achieve compression.
  • The convolution operation is likewise converted from the original floating-point multiply-accumulate operations to low-bit fixed-point operations, accelerating the forward inference of the network.
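  • To make this conversion concrete, the sketch below contrasts a floating-point layer with its quantized counterpart: the heavy matrix product runs on low-bit integers held in integer containers, and only the per-row weight scales Λ and the activation scale α remain floating-point. The shapes and dtypes are illustrative assumptions, not the patent's specification.

```python
import numpy as np

def layer_float(W, X):
    return W @ X                                      # original float32 matmul

def layer_quantized(Q, Lam, X_q, alpha):
    """Q: (C_out, K) int fixed-point weights; Lam: (C_out,) per-row scales
    (the diagonal of Lambda); X_q: (K, N) int fixed-point activations;
    alpha: activation scale. Approximates W @ X since W ~ Lam * Q and
    X ~ alpha * X_q."""
    acc = Q.astype(np.int32) @ X_q.astype(np.int32)   # low-bit integer matmul
    return (Lam[:, None] * acc) * alpha               # dequantize with the scales
```

  • Because Λ is diagonal, applying it is just a per-row multiply, which is why the scales can be folded outside the integer product.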
  • The present invention is mainly oriented to post-training quantization, i.e., it is not necessary to retrain or fine-tune the quantized network with training data; the invention can also be readily generalized to quantization during network training.
  • The method can accelerate and compress a deep convolutional neural network, and it differs from previous post-training quantization methods, in which weights and activations are treated separately and their quantization scale factors are determined independently through manually selected criteria.
  • One advantage of the method is that it provides a post-training quantization method and computation scheme based on low-bit activation fixed-point fitting.
  • The present invention directly learns the low-bit mapping function from input activation to output activation, which keeps the convolution output similar before and after weight quantization; the accuracy of models quantized by the method is therefore much higher than that of previous post-training quantization schemes.
  • In one embodiment, the weight and activation quantization bit-widths are both set to 4 bits, and the method of the present invention is used to quantize a trained ResNet18 deep convolutional neural network, obtaining a weight-activation fixed-point quantized ResNet18 network.
  • The storage space occupied by the weight-activation fixed-point quantized ResNet18 obtained with the method of the present invention is reduced to at most a quarter of the original, and computation is converted from the original 32-bit floating-point operations to 4-bit operations.
  • Its test accuracy on ImageNet, a large-scale image classification task, is also the highest among known post-training quantized networks.
  • The post-training quantization system for a convolutional neural network based on activation fixed-point fitting is based on the above post-training quantization method; the quantization system includes a network weight matrix fitting module, a first network activation matrix fitting module, a second network activation matrix fitting module, and an output module;
  • The network weight matrix fitting module is configured to obtain the weight matrix of each layer of the original convolutional neural network and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network;
  • The first network activation matrix fitting module is configured to obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining a weight fixed-point quantized convolutional neural network;
  • The second network activation matrix fitting module is configured to solve for the activation quantization scale factor based on the set of verification data and the weight fixed-point quantized convolutional neural network, obtaining the weight-activation fixed-point quantized convolutional neural network;
  • The output module is configured to output the obtained weight-activation fixed-point quantized convolutional neural network.
  • The post-training quantization system provided by the above embodiments is only illustrated by the division of the above functional modules. In practical applications, the above functions can be allocated to different functional modules as required; that is, the modules or steps of the embodiments of the present invention may be decomposed or combined. For example, the modules of the above embodiments may be combined into one module, or further split into multiple sub-modules, to accomplish all or part of the functions described above.
  • The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps and should not be regarded as improperly limiting the present invention.
  • A storage device stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above post-training quantization method for a convolutional neural network based on activation fixed-point fitting.
  • A processing device includes a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above post-training quantization method for a convolutional neural network based on activation fixed-point fitting.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An activation fixed-point fitting-based method and system for the post-training quantization of a convolutional neural network, aiming to solve the problem that the prior art cannot implement post-training quantization of a convolutional neural network by means of a more efficient low-bit quantization method. The quantization method comprises: performing low-bit fixed-point quantization on the weight matrix of each layer of an original convolutional neural network; obtaining a group of verification data, constructing an optimization objective function from input activation to output activation, and iteratively optimizing the fixed-point weight matrix and weight quantization scale factor to obtain a weight fixed-point quantized convolutional neural network; and, on the basis of the verification data and the weight fixed-point quantized convolutional neural network, solving the activation quantization scale factor to obtain a weight-activation fixed-point quantized convolutional neural network. By directly learning the low-bit mapping function from input activation to output activation, the convolution output is kept similar before and after weight quantization, the accuracy of the quantized model is high, and the quantization process does not require retraining with data.

Description

Post-training quantization method and system for a convolutional neural network based on activation fixed-point fitting

Technical Field

The invention belongs to the field of data processing, and particularly relates to a post-training quantization method and system for a convolutional neural network based on activation fixed-point fitting.
Background Art

In recent years, deep convolutional neural networks have made breakthrough progress in many fields such as image processing, computer vision, and deep learning. In tasks such as image classification, object detection, and face recognition, they have approached or even surpassed human recognition accuracy. Deep convolutional neural networks have therefore been widely applied in assisted driving, e-commerce, video surveillance, and other industries.

As the performance of deep convolutional neural networks keeps improving, the width and depth of the networks also keep growing, so the storage and computation costs of the networks increase substantially. During deployment, deep neural networks need to run on a wide variety of embedded devices, whose storage, memory, computing power, and battery capacity are very limited; such large-scale deep convolutional neural networks are therefore difficult to deploy on these low-end embedded devices. At the same time, tasks such as autonomous driving and video surveillance have very strict real-time requirements, which the inference time of current deep neural networks can hardly meet. Accelerated compression of deep convolutional neural networks has therefore received increasing attention.

Low-bit fixed-point quantization is one of the main approaches to compressing deep convolutional neural networks. Because fixed-point quantization applies to a wide variety of network structures and is very hardware-friendly, it has become an important method in the field of network compression. However, low-bit fixed-point quantization of a deep convolutional neural network currently tends to require the original training data and extensive network retraining to reach relatively high accuracy, i.e., training-aware quantization. In practice, on the one hand, training data is often hard to obtain for reasons such as privacy; on the other hand, even when all training data is available, retraining or fine-tuning the quantized network consumes large amounts of computing resources and raises the barrier to use. There are also some post-training quantization methods that quantize a network without retraining on data, but most current post-training quantization methods only work well for high-bit fixed-point quantization.
Summary of the Invention

To solve the above problem in the prior art, namely that the prior art cannot achieve post-training quantization of a convolutional neural network with a more efficient low-bit quantization method, the present invention provides a post-training quantization method for a convolutional neural network based on activation fixed-point fitting. The quantization method includes:

Step S10, network weight matrix fitting: obtain the weight matrix of each layer of the original convolutional neural network, and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network;

Step S20, first network activation matrix fitting: obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining a weight fixed-point quantized convolutional neural network;

Step S30, second network activation matrix fitting: based on the set of verification data and the weight fixed-point quantized convolutional neural network, solve for the activation quantization scale factor, obtaining the weight-activation fixed-point quantized convolutional neural network.
In some preferred embodiments, in step S10, "perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network" is carried out as follows:

Step S11, split the weight matrix of the current layer into weight vectors by row, and take the maximum absolute value of each weight vector divided by the maximum value of the weight quantization fixed-point range as the initial weight quantization scale factor of that row;

Step S12, construct the weight quantization error function between the weight vector, the weight quantization fixed-point numbers, and the initial weight quantization scale factor;

Step S13, based on the weight quantization error function, iteratively solve for the weight quantization fixed-point numbers and the weight quantization scale factor until the weight quantization error function value falls below a set threshold, obtaining the fixed-point weight matrix and weight quantization scale factor of the current layer;

Step S14, obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network by applying steps S11-S13 to each layer.
In some preferred embodiments, the weight quantization error function between the weight vector, the weight quantization fixed-point numbers, and the initial weight quantization scale factor is:

    min_{Q_i, Λ_ii} ‖W_i − Λ_ii·Q_i‖₂²

where W is a C_out × K two-dimensional floating-point matrix, W_i is the weight vector of the i-th row of the matrix, and C_out is the number of output channels of the current layer; when the current layer is a convolutional layer, K = C_in·K_h·K_w, where K_h and K_w are the height and width of the convolution kernel; when the current layer is a fully connected layer, K = C_in, where C_in is the number of input channels of the current layer; Q_i denotes the weight quantization fixed-point numbers; Λ_ii denotes the initial weight quantization scale factor; and ‖·‖₂² denotes the squared 2-norm of a vector.
In some preferred embodiments, step S20 includes:

Step S21, input the obtained set of verification data into the original convolutional neural network, and obtain the input activation and output activation of the current layer;

Step S22, based on the input activation and output activation of the current layer, construct a linear least squares optimization objective function under the fixed-point constraint of the current layer;

Step S23, split the linear least squares optimization objective function by the rows of the fixed-point weight matrix, and iteratively optimize the weight quantization scale factor and the fixed-point weights until the output activation quantization error is less than a set threshold, obtaining the optimized fixed-point weight matrix and weight quantization scale factor of the current layer;

Step S24, obtain the optimized fixed-point weight matrix and weight quantization scale factor of each layer of the network by applying steps S21-S23 to each layer, obtaining the weight fixed-point quantized convolutional neural network.
In some preferred embodiments, the linear least squares optimization objective function is:

    min_{Q, Λ} ‖Y − ΛQX‖_F²

where X and Y denote the input activation and output activation of the current layer respectively, Q is the fixed-point weight matrix of the current layer, Λ is the weight quantization scale factor of the current layer, and ‖·‖_F² denotes the squared Frobenius norm of a matrix.
In some preferred embodiments, the set of verification data is a small amount of the original convolutional neural network's training data, or a small amount of other data whose distribution is similar to that training data, or artificially generated simulation data, or randomly generated data.
In some preferred embodiments, step S30 includes:

Step S31, input the set of verification data into the weight fixed-point quantized convolutional neural network, and obtain the output activations of each layer to form an output activation vector;

Step S32, take the maximum absolute value of the output activation vector divided by the maximum value of the activation quantization fixed-point range as the initial activation quantization scale factor;

Step S33, construct the activation quantization error function between the output activation vector, the activation quantization function, and the initial activation quantization scale factor;

Step S34, based on the activation quantization error function, iteratively solve for the fixed-point activation vector and the activation quantization scale factor until the activation quantization error function value falls below a set threshold, obtaining the optimized fixed-point activation vector and activation quantization scale factor and thus the weight-activation fixed-point quantized convolutional neural network.
In some preferred embodiments, the activation quantization function is:

    x̂_i = α·clip(round(x_i/α), q_min, q_max)

where q_min and q_max denote the minimum and maximum values of the activation quantization fixed-point range respectively, α denotes the activation quantization scale factor, x_i denotes the output activation of the i-th layer, round(·) denotes the rounding operation, and clip(·) denotes the threshold truncation operation.
In some preferred embodiments, the activation quantization error function is:

    min_α ‖x − x̂(x; α)‖₂²

where α denotes the activation quantization scale factor, x denotes the output activation vector, x̂(·; α) denotes the activation quantization function, and ‖·‖₂² denotes the squared 2-norm of a vector.
In another aspect of the present invention, a post-training quantization system for a convolutional neural network based on activation fixed-point fitting is proposed, based on the above post-training quantization method. The quantization system includes a network weight matrix fitting module, a first network activation matrix fitting module, a second network activation matrix fitting module, and an output module.

The network weight matrix fitting module is configured to obtain the weight matrix of each layer of the original convolutional neural network and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network;

The first network activation matrix fitting module is configured to obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining a weight fixed-point quantized convolutional neural network;

The second network activation matrix fitting module is configured to solve for the activation quantization scale factor based on the set of verification data and the weight fixed-point quantized convolutional neural network, obtaining the weight-activation fixed-point quantized convolutional neural network;

The output module is configured to output the obtained weight-activation fixed-point quantized convolutional neural network.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above post-training quantization method for a convolutional neural network based on activation fixed-point fitting.

In a fourth aspect of the present invention, a processing device is provided, including a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above post-training quantization method for a convolutional neural network based on activation fixed-point fitting.

Beneficial effects of the present invention:

(1) With the post-training quantization method of the present invention, the weight parameter matrices and input activations of the convolutional and fully connected layers of a convolutional neural network are quantized to fixed point, fixed-point storage replaces the original floating-point storage, and fixed-point arithmetic replaces the original floating-point arithmetic, achieving optimized acceleration and compression of a deep convolutional neural network after training.

(2) With the post-training quantization method of the present invention, the quantization process requires no retraining with data, needs few computing resources, has a low barrier to use, and is fast.
Description of the Drawings

Other features, objects, and advantages of the present application will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 is a schematic flowchart of the post-training quantization method for a convolutional neural network based on activation fixed-point fitting of the present invention;

FIG. 2 is a schematic diagram of the image classification process of a convolutional neural network according to an embodiment of the post-training quantization method of the present invention;

FIG. 3 is a schematic diagram of the convolution operation of a convolutional neural network in the image classification process according to an embodiment of the post-training quantization method of the present invention.
Detailed Description

The present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. For convenience of description, only the parts related to the invention are shown in the drawings.

It should be noted that, where no conflict arises, the embodiments in the present application and the features of the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
A post-training quantization method for a convolutional neural network based on activation fixed-point fitting of the present invention includes:

Step S10, network weight matrix fitting: obtain the weight matrix of each layer of the original convolutional neural network, and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network;

Step S20, first network activation matrix fitting: obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrix and weight quantization scale factor of each layer of the network, and iteratively optimize the fixed-point weight matrix and the weight quantization scale factor, obtaining a weight fixed-point quantized convolutional neural network;

Step S30, second network activation matrix fitting: based on the set of verification data and the weight fixed-point quantized convolutional neural network, solve for the activation quantization scale factor, obtaining the weight-activation fixed-point quantized convolutional neural network.
To explain the post-training quantization method of the present invention more clearly, each step of the embodiment of the present invention is described in detail below with reference to FIG. 1.

As shown in FIG. 2, which is a schematic diagram of the image classification process of a convolutional neural network according to an embodiment of the method, the convolutional neural network contains multiple convolutional layers and multiple fully connected layers, and the input image is processed by the convolutional and fully connected layers to obtain the classification result.

As shown in FIG. 3, which is a schematic diagram of the convolution operation of the convolutional neural network in the image classification process, each convolutional layer has a group of convolution kernels that together form the weight tensor of that layer; for example, the convolution kernels can be set to 3×3. The convolutional layer processes its input feature map by convolving it with these kernels, i.e., at each position of the input feature map, multiplying each kernel elementwise with the corresponding convolution window and summing, to obtain the output feature map of the layer.
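The quantization steps below treat each convolutional layer's weights as a C_out × K matrix with K = C_in·K_h·K_w. The sketch below shows the standard im2col view that turns the convolution of FIG. 3 into an ordinary matrix product; this reformulation is a common assumption for exposition and is not quoted from the patent.

```python
import numpy as np

def conv_as_matmul(W, X, kh, kw):
    """W: (C_out, C_in, kh, kw) kernels; X: (C_in, H, Wd) input feature map.
    Returns the (C_out, H_out * W_out) output for stride 1, no padding."""
    c_in, H, Wd = X.shape
    H_out, W_out = H - kh + 1, Wd - kw + 1
    # im2col: each kh x kw window becomes one column of length C_in*kh*kw.
    cols = np.stack([X[:, i:i + kh, j:j + kw].reshape(-1)
                     for i in range(H_out) for j in range(W_out)], axis=1)
    W2d = W.reshape(W.shape[0], -1)       # the (C_out, K) weight matrix
    return W2d @ cols                     # the convolution as one matmul
```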
The post-training quantization method for a convolutional neural network based on activation fixed-point fitting of one embodiment of the present invention includes steps S10 to S30, each described in detail as follows.

Step S10, network weight matrix fitting: obtain the weight matrix of each layer of the original convolutional neural network, and perform low-bit fixed-point quantization of each layer's weight matrix separately, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network.
The weight parameter matrix W of each layer of the network is a C_out × K two-dimensional floating-point matrix, where K = C_in·K_h·K_w for a convolutional layer and K = C_in for a fully connected layer; C_in and C_out are the numbers of input and output channels of the current layer, and K_h and K_w are the width and height of the convolution kernel of the convolutional layer. Let the fixed-point weight matrix to be solved be Q, with the same dimensions as W, and let the weight quantization scale factor to be solved be Λ, a C_out × C_out floating-point diagonal matrix. Q and Λ are obtained by minimizing the quantization error between W, Q, and Λ, that is, by solving

    min_{Q, Λ} ‖W − ΛQ‖_F²
Step S11: split the weight matrix of the current layer into weight vectors by row, and take the maximum absolute value of each weight vector divided by the maximum value of the weight quantization fixed-point range as the initial weight quantization scale factor of that row, as shown in Eq. (1):

$$\Lambda_{ii}=\frac{\max(|W_i|)}{Q_{max}} \tag{1}$$

where Λ_ii is the initial weight quantization scale factor, W_i is the weight vector of the i-th row obtained by splitting the weight matrix of the current layer by rows, Q_max is the maximum value of the weight quantization fixed-point range, and max(|W_i|) denotes the maximum absolute value of W_i.
Step S12: construct the weight quantization error function between the weight vector, the weight quantization fixed-point vector, and the initial weight quantization scale factor.
When minimizing the quantization error function between W, Q, and Λ, the above optimization objective can be split into C_out sub-problems: each row W_i of the weight matrix W is quantized to Q_i through the scale factor Λ_ii, solved as the optimization problem of minimizing the quantization error function between W_i, Q_i, and Λ_ii, as shown in Eq. (2):

$$\min_{\Lambda_{ii},\,Q_i}\;\lVert W_i-\Lambda_{ii}Q_i\rVert_2^2 \tag{2}$$

where W is a C_out×K two-dimensional floating-point matrix, W_i is the weight vector of the i-th row of the matrix, and C_out is the number of output channels of the current layer; when the current layer is a convolutional layer, K = C_in·K_h·K_w, where K_h and K_w are the height and width of the convolution kernel, respectively; when the current layer is a fully connected layer, K = C_in, where C_in is the number of input channels of the current layer; Q_i denotes the weight quantization fixed-point vector; Λ_ii denotes the initial weight quantization scale factor; ‖·‖₂² denotes the square of the 2-norm of a vector.
Step S13: based on the weight quantization error function, iteratively solve for the weight quantization fixed-point vector and the weight quantization scale factor until the value of the weight quantization error function is lower than a set threshold, to obtain the fixed-point weight matrix and weight quantization scale factor of the current layer.
Fixing Λ_ii, solve for Q_i, as shown in Eq. (3):

$$Q_i=\mathrm{clip}\!\left(\mathrm{round}\!\left(\frac{W_i}{\Lambda_{ii}}\right),\,Q_{min},\,Q_{max}\right) \tag{3}$$

where Q_min and Q_max denote the minimum and maximum values of the weight quantization fixed-point range, respectively, round(·) denotes the rounding operation, and clip(·) denotes the threshold truncation operation.
Fixing Q_i, solve for Λ_ii, as shown in Eq. (4):

$$\Lambda_{ii}=\frac{W_iQ_i^{T}}{Q_iQ_i^{T}} \tag{4}$$

where T denotes the transpose.
By iterating Eqs. (3) and (4), the optimal solutions of Λ_ii and Q_i are obtained, i.e., the fixed-point weight matrix and weight quantization scale factor of the current layer.
Step S14: obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network by the method of steps S11-S13.
Each layer of the network is traversed to obtain its fixed-point weight matrix and weight quantization scale factor.
There are many methods for performing low-bit fixed-point quantization of the weight matrices of the network layers. The above is a preferred quantization process of the present invention; in other embodiments, other methods may also be selected for the low-bit fixed-point quantization of each layer's weight matrix, which the present invention does not detail one by one here.
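For illustration, the preferred per-row procedure of steps S11-S13 can be sketched in NumPy as follows. This is a minimal sketch assuming signed symmetric quantization and a simple error-decrease stopping rule; the function name, bit-width, and iteration limits are illustrative assumptions rather than part of the embodiment:

    import numpy as np

    def quantize_weight_rows(W, bits=4, iters=20, tol=1e-8):
        # Per-row alternating optimization of the fixed-point weights Q
        # and the diagonal scale factors Lambda (steps S11-S13).
        Qmax = 2 ** (bits - 1) - 1        # e.g. 7 for 4-bit signed
        Qmin = -Qmax                      # symmetric range assumed
        C_out, K = W.shape
        Q = np.zeros_like(W)
        scales = np.zeros(C_out)
        for i in range(C_out):
            w = W[i]
            lam = np.max(np.abs(w)) / Qmax            # Eq. (1)
            if lam == 0:                              # all-zero row
                continue
            prev_err = np.inf
            for _ in range(iters):
                q = np.clip(np.round(w / lam), Qmin, Qmax)   # Eq. (3)
                denom = q @ q
                if denom == 0:
                    break
                lam = (w @ q) / denom                        # Eq. (4)
                err = np.sum((w - lam * q) ** 2)             # Eq. (2)
                if prev_err - err < tol:   # stop once below threshold
                    break
                prev_err = err
            Q[i], scales[i] = q, lam
        return Q, scales

In use, each layer's weight tensor would first be reshaped to (C_out, K); the returned Q and scales then correspond to the fixed-point weight matrix and the diagonal of Λ.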
Step S20, first activation matrix fitting of the network: obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrices and weight quantization scale factors of the network layers, and iteratively optimize the fixed-point weight matrices and weight quantization scale factors, to obtain a weight fixed-point quantized convolutional neural network.
Based on a batch of verification data, a fixed-point function fitting optimization objective from input activation to output activation is constructed, and the fixed-point weight matrix and weight quantization scale factor are further optimized, obtaining the weight fixed-point convolutional neural network.
Step S21: input the obtained set of verification data into the original convolutional neural network, and obtain the input activation X and output activation Y of the current layer.
Step S22: based on the input activation X and output activation Y of the current layer, construct the linear least-squares optimization objective function under the fixed-point constraint of the current layer, as shown in Eq. (5):

$$\min_{\Lambda,\,Q}\;\lVert Y-X(\Lambda Q)^{T}\rVert_F^2 \tag{5}$$

where X and Y denote the input activation and output activation of the current layer, respectively, Q is the fixed-point weight matrix of the current layer, Λ is the weight quantization scale factor of the current layer, and ‖·‖_F² denotes the square of the F-norm of a matrix.
Step S23: split the linear least-squares optimization objective function by the rows of the fixed-point weight matrix, and iteratively optimize the weight quantization scale factor and the fixed-point weights until the output activation quantization error is smaller than a set threshold, to obtain the optimized fixed-point weight matrix and weight quantization scale factor of the current layer.
Solving the optimization objective function can likewise be split into C_out sub-problems, as shown in Eq. (6):

$$\min_{\lambda,\,q}\;\lVert y-\lambda Xq\rVert_2^2 \tag{6}$$
where q is an M-bit fixed-point vector of length K, λ is the corresponding floating-point scale factor, and y is the column of the output activation Y corresponding to the current output channel. Splitting q into M-1 ternary vectors q_1, …, q_{M-1} (so that q = Σ_{m=1}^{M-1} 2^{m-1} q_m), the optimization objective function can be converted into Eq. (7):

$$\min_{\lambda,\,q_1,\dots,q_{M-1}}\;\Bigl\lVert y-\lambda\sum_{m=1}^{M-1}2^{m-1}Xq_m\Bigr\rVert_2^2 \tag{7}$$
The optimized fixed-point weight matrix and weight quantization scale factor obtained in step S10 are used as the initialization values of q and λ. Fixing q, solve for λ, as shown in Eq. (8):

$$\lambda=\frac{(Xq)^{T}y}{(Xq)^{T}(Xq)} \tag{8}$$
Fixing λ, solve for q by iteratively solving for q_1, …, q_{M-1} in turn. Suppose q_m is being optimized, and denote the residual

$$r=y-\lambda\sum_{j\neq m}2^{j-1}Xq_j$$

then the optimization objective function is transformed into Eq. (9):

$$\min_{q_m}\;\lVert r-\lambda\,2^{m-1}Xq_m\rVert_2^2 \tag{9}$$
When optimizing q_m with the above optimization objective function, a bitwise iterative optimization method can be adopted: since q_m is a ternary vector of length K, the other K-1 entries can be fixed and only the k-th entry optimized. Denote

$$r_k=r-\lambda\,2^{m-1}\sum_{j\neq k}q_{m,j}X_j$$

then Eq. (10) is obtained:

$$q_{m,k}=\mathop{\arg\min}_{t\in\{-1,0,1\}}\;\lVert r_k-\lambda\,2^{m-1}\,t\,X_k\rVert_2^2 \tag{10}$$

where X_j denotes the j-th column of X and q_{m,j} denotes the j-th entry of q_m.
By iterating the above process of solving for λ and q, approximate solutions of λ and q are obtained.
Step S24: obtain the optimized fixed-point weight matrix and weight quantization scale factor of each layer of the network by the method of steps S21-S23, to obtain the weight fixed-point quantized convolutional neural network.
Each layer of the network is traversed to obtain its optimized fixed-point weight matrix and weight quantization scale factor; these replace the corresponding parameters in the original convolutional neural network, yielding the weight fixed-point quantized convolutional neural network.
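To make the alternation of Eqs. (6)-(10) concrete, the following is a simplified single-output-channel sketch in NumPy. It assumes the digit decomposition q = Σ_m 2^m d_m with ternary digit vectors d_m (indexed from 0 here, versus weights 2^{m-1} with m from 1 in Eq. (7)), a greedy decomposition for initialization, and a fixed number of outer iterations; the naive coordinate loop and all names are illustrative assumptions, not the prescribed implementation:

    import numpy as np

    def to_ternary_digits(q, M):
        # Greedy decomposition q ~ sum_m 2**m * d[m], d[m] in {-1,0,1}^K.
        digits, r = [], q.astype(np.float64).copy()
        for m in reversed(range(M - 1)):     # most significant digit first
            d = np.clip(np.round(r / 2 ** m), -1, 1)
            r -= d * 2 ** m
            digits.append(d)
        return digits[::-1]                  # digits[m] weighted by 2**m

    def fit_output_channel(X, y, q0, lam0, M=4, outer_iters=5):
        # Alternately solve the scale factor (Eq. 8) and the ternary
        # digits of q (Eqs. 9-10) so that lam * X @ q fits y; q0 and lam0
        # come from the step S10 result for this output channel.
        digits, lam = to_ternary_digits(q0, M), lam0
        for _ in range(outer_iters):
            q = sum((2 ** m) * d for m, d in enumerate(digits))
            Xq = X @ q
            denom = Xq @ Xq
            if denom > 0:
                lam = (Xq @ y) / denom                       # Eq. (8)
            for m, d in enumerate(digits):
                q_rest = sum((2 ** j) * digits[j]
                             for j in range(M - 1) if j != m)
                r = y - lam * (X @ q_rest)                   # Eq. (9)
                scale = lam * 2 ** m
                for k in range(len(d)):                      # Eq. (10)
                    r_k = r - scale * (X @ d - d[k] * X[:, k])
                    d[k] = min((-1.0, 0.0, 1.0),
                               key=lambda t: float(np.sum(
                                   (r_k - scale * t * X[:, k]) ** 2)))
        q = sum((2 ** m) * d for m, d in enumerate(digits))
        return q, lam

The inner loop recomputes X @ d for every coordinate, which keeps the sketch short but is quadratic in K; a practical implementation would cache and update these products incrementally.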
Step S30, second activation matrix fitting of the network: based on the set of verification data and the weight fixed-point quantized convolutional neural network, solve for the activation quantization scale factors, to obtain the quantized weight-activation fixed-point quantized convolutional neural network.
Step S31: input the set of verification data into the weight fixed-point convolutional neural network, and obtain the output activations of each layer to form an output activation vector.
The verification data are fed into the weight fixed-point convolutional neural network, and the recorded output activation values of a layer form the output activation vector x.
Step S32: take the maximum absolute value of the output activation vector divided by the maximum value of the activation quantization fixed-point range as the initial activation quantization scale factor, as shown in Eq. (11):

$$\alpha=\frac{\max(|x|)}{q_{max}} \tag{11}$$

where α is the initial activation quantization scale factor, x is the output activation vector, q_max is the maximum value of the activation quantization fixed-point range, and max(|x|) denotes the maximum absolute value of x.
Step S33: construct the activation quantization error function between the output activation vector, the activation quantization function, and the initial activation quantization scale factor, as shown in Eq. (12):

$$\min_{\alpha,\,\hat{x}}\;\lVert x-\alpha\,\hat{x}\rVert_2^2 \tag{12}$$

where α denotes the activation quantization scale factor, x denotes the output activation vector, x̂ denotes the fixed-point activation vector produced by the activation quantization function, and ‖·‖₂² denotes the square of the 2-norm of a vector.
The activation quantization function is shown in Eq. (13):

$$\hat{x}_i=\mathrm{clip}\!\left(\mathrm{round}\!\left(\frac{x_i}{\alpha}\right),\,q_{min},\,q_{max}\right) \tag{13}$$

where q_min and q_max denote the minimum and maximum values of the activation quantization fixed-point range, respectively, α denotes the activation quantization scale factor, x_i denotes the output activation of the i-th layer, round(·) denotes the rounding operation, and clip(·) denotes the threshold truncation operation.
Step S34: based on the activation quantization error function, iteratively solve for the fixed-point activation vector and the activation quantization scale factor until the value of the activation quantization error function is lower than a set threshold, to obtain the optimized fixed-point activation vector and activation quantization scale factor, and thereby the quantized weight-activation fixed-point quantized convolutional neural network.
Fixing α, solve for the fixed-point activation vector x̂, as shown in Eq. (14):

$$\hat{x}=\mathrm{clip}\!\left(\mathrm{round}\!\left(\frac{x}{\alpha}\right),\,q_{min},\,q_{max}\right) \tag{14}$$
Fixing x̂, solve for α, as shown in Eq. (15):

$$\alpha=\frac{x^{T}\hat{x}}{\hat{x}^{T}\hat{x}} \tag{15}$$
By iterating the above process of solving for α and x̂, the optimal solution of α is obtained.
Each layer of the network is traversed to obtain its optimized fixed-point activation vector and activation quantization scale factor; these replace the corresponding parameters in the weight fixed-point quantized convolutional neural network, yielding the quantized weight-activation fixed-point quantized convolutional neural network.
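A minimal NumPy sketch of steps S32-S34 follows, assuming a signed symmetric fixed-point range and a simple error-decrease stopping rule; the function name, bit-width, and iteration limits are illustrative assumptions:

    import numpy as np

    def fit_activation_scale(x, bits=4, iters=30, tol=1e-8):
        # Alternate the fixed-point activation vector (Eq. 14) and the
        # scale factor alpha (Eq. 15), starting from Eq. (11).
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax                      # a range (0, 2**bits - 1) could
                                          # be used for post-ReLU layers
        alpha = np.max(np.abs(x)) / qmax                 # Eq. (11)
        if alpha == 0:
            return 1.0                    # degenerate all-zero input
        prev_err = np.inf
        for _ in range(iters):
            xq = np.clip(np.round(x / alpha), qmin, qmax)    # Eq. (14)
            denom = xq @ xq
            if denom == 0:
                break
            alpha = (x @ xq) / denom                         # Eq. (15)
            err = np.sum((x - alpha * xq) ** 2)              # Eq. (12)
            if prev_err - err < tol:
                break
            prev_err = err
        return alpha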
In an embodiment of the present invention, the verification data used in steps S20 and S30 are a small amount of the original convolutional neural network's training data, or a small amount of other data with a distribution similar to that of the original training data, or artificially generated simulation data, or randomly generated random data.
For the convolutional and fully connected layers, the floating-point matrix multiplication of the original convolutional neural network can be converted into fixed-point matrix multiplication, and the parameter matrices can be stored as fixed-point matrices; this significantly reduces computation overhead and storage and improves running speed.
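For illustration, the sketch below (reusing quantize_weight_rows and fit_activation_scale from the sketches above, both of which are assumed names) shows how a layer's floating-point matrix product can be replaced by one integer matrix product followed by a per-channel floating-point rescale:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 16)).astype(np.float32)  # activations
    W = rng.standard_normal((4, 16)).astype(np.float32)  # weights

    Q, w_scales = quantize_weight_rows(W, bits=4)
    alpha = fit_activation_scale(X.ravel(), bits=4)      # layer scale
    Xq = np.clip(np.round(X / alpha), -7, 7)

    Y_float = X @ W.T                                    # float path
    # Fixed-point path: integer matmul, then one rescale per channel.
    Y_fixed = (Xq.astype(np.int32) @ Q.astype(np.int32).T) \
              * (alpha * w_scales)
    print(np.max(np.abs(Y_float - Y_fixed)))             # small error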
The present invention quantizes the weights and activations of a deep convolutional neural network after training, converting them from 32-bit floating-point numbers into low-bit integer values, and stores the weights in a low-bit fixed-point format to compress the network model. At the same time, the convolution operation is converted from floating-point multiply-add operations into low-bit fixed-point operations, accelerating the forward inference of the network. It should be noted that the present invention is mainly oriented to post-training quantization, i.e., the quantized network does not need to be retrained or fine-tuned with training data. The present invention can therefore also be readily extended to quantization during network training.
The method provided by the present invention achieves acceleration and compression of deep convolutional neural networks. Unlike previous post-training quantization methods, which treat weights and activations separately and determine their quantization scale factors by manually selected criteria, one advantage of the method provided by the present invention is a post-training quantization method and computation scheme based on low-bit fixed-point fitting of activations. In previous post-training quantization schemes, the weight quantization scale factor is solved by minimizing the distance between the weight matrices before and after quantization, and the activation quantization scale factor is solved in the same way, i.e., by minimizing the distance between the activations before and after quantization. Such fixed-point quantization of the weights only minimizes the quantization error of the weight matrix itself, without considering whether the convolution outputs before and after weight quantization are similar, which is one reason for the limited accuracy of those fixed-point quantization methods. The present invention directly learns the low-bit mapping function from input activation to output activation, ensuring that the convolution outputs before and after weight quantization are similar; the model accuracy of the post-training quantization method provided by the present invention is therefore much higher than that of previous post-training quantization schemes.
Taking the ResNet18 deep convolutional neural network applied to image classification as an example, with both the weight and activation quantization bit-widths set to 4 bits, the method of the present invention is used to perform post-training quantization on the ResNet18 network, obtaining a weight-activation fixed-point quantized ResNet18 deep convolutional neural network. In tests, the storage space occupied by the resulting network is reduced to 1/4 of the original or less, and computation is converted from the original 32-bit floating-point operations into 4-bit fixed-point operations. Its test accuracy on the large-scale image classification task ImageNet is also the highest among currently known post-training quantized networks.
An activation fixed-point fitting-based post-training quantization system for a convolutional neural network according to a second embodiment of the present invention, based on the above activation fixed-point fitting-based post-training quantization method, comprises a network weight matrix fitting module, a network first activation matrix fitting module, a network second activation matrix fitting module, and an output module.
The network weight matrix fitting module is configured to obtain the weight matrix of each layer of the original convolutional neural network and perform low-bit fixed-point quantization of each layer's weight matrix separately, to obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network.
The network first activation matrix fitting module is configured to obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrices and weight quantization scale factors of the network layers, and iteratively optimize the fixed-point weight matrices and weight quantization scale factors, to obtain the weight fixed-point quantized convolutional neural network.
The network second activation matrix fitting module is configured to solve for the activation quantization scale factors based on the set of verification data and the weight fixed-point quantized convolutional neural network, to obtain the quantized weight-activation fixed-point quantized convolutional neural network.
The output module is configured to output the obtained quantized weight-activation fixed-point quantized convolutional neural network.
Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working process and related description of the system described above, which are not repeated here.
It should be noted that the activation fixed-point fitting-based post-training quantization system for a convolutional neural network provided by the above embodiment is illustrated only by the division of functional modules described above. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules of the above embodiment may be combined into one module, or further split into multiple sub-modules, to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps and are not to be regarded as improper limitations of the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs adapted to be loaded and executed by a processor to implement the above activation fixed-point fitting-based post-training quantization method for a convolutional neural network.
A processing device according to a fourth embodiment of the present invention comprises a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above activation fixed-point fitting-based post-training quantization method for a convolutional neural network.
Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working process and related description of the storage device and processing device described above, which are not repeated here.
Those skilled in the art should be aware that the modules and method steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. Programs corresponding to software modules and method steps may be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of the examples have been described above generally in terms of their functionality. Whether these functions are performed in electronic hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The terms "first", "second", and the like are used to distinguish similar objects, not to describe or indicate a particular order or sequence.
The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process, method, article, or device/apparatus comprising a list of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to such a process, method, article, or device/apparatus.
The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.

Claims (10)

  1. An activation fixed-point fitting-based post-training quantization method for a convolutional neural network, characterized in that the quantization method comprises:
    step S10, network weight matrix fitting: obtaining the weight matrix of each layer of the original convolutional neural network, and performing low-bit fixed-point quantization of each layer's weight matrix separately, to obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network;
    step S20, first activation matrix fitting of the network: obtaining a set of verification data, constructing an optimization objective function from input activation to output activation based on the fixed-point weight matrices and weight quantization scale factors of the network layers, and iteratively optimizing the fixed-point weight matrices and weight quantization scale factors, to obtain a weight fixed-point quantized convolutional neural network;
    step S30, second activation matrix fitting of the network: based on the set of verification data and the weight fixed-point quantized convolutional neural network, solving for the activation quantization scale factors, to obtain the quantized weight-activation fixed-point quantized convolutional neural network.
  2. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 1, characterized in that, in step S10, "performing low-bit fixed-point quantization of each layer's weight matrix separately to obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network" comprises:
    step S11, splitting the weight matrix of the current layer into weight vectors by row, and taking the maximum absolute value of each weight vector divided by the maximum value of the weight quantization fixed-point range as the initial weight quantization scale factor of each row;
    step S12, constructing the weight quantization error function between the weight vector, the weight quantization fixed-point vector, and the initial weight quantization scale factor;
    step S13, based on the weight quantization error function, iteratively solving for the weight quantization fixed-point vector and the weight quantization scale factor until the value of the weight quantization error function is lower than a set threshold, to obtain the fixed-point weight matrix and weight quantization scale factor of the current layer;
    step S14, obtaining the fixed-point weight matrix and weight quantization scale factor of each layer of the network by the method of steps S11-S13.
  3. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 2, characterized in that the weight quantization error function between the weight vector, the weight quantization fixed-point vector, and the initial weight quantization scale factor is:

$$\min_{\Lambda_{ii},\,Q_i}\;\lVert W_i-\Lambda_{ii}Q_i\rVert_2^2$$

    where W is a C_out×K two-dimensional floating-point matrix, W_i is the weight vector of the i-th row of the matrix, and C_out is the number of output channels of the current layer; when the current layer is a convolutional layer, K = C_in·K_h·K_w, where K_h and K_w are the height and width of the convolution kernel, respectively; when the current layer is a fully connected layer, K = C_in, where C_in is the number of input channels of the current layer; Q_i denotes the weight quantization fixed-point vector; Λ_ii denotes the initial weight quantization scale factor; ‖·‖₂² denotes the square of the 2-norm of a vector.
  4. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 1, characterized in that step S20 comprises:
    step S21, inputting the obtained set of verification data into the original convolutional neural network, to obtain the input activation and output activation of the current layer;
    step S22, based on the input activation and output activation of the current layer, constructing the linear least-squares optimization objective function under the fixed-point constraint of the current layer;
    step S23, splitting the linear least-squares optimization objective function by the rows of the fixed-point weight matrix, and iteratively optimizing the weight quantization scale factor and the fixed-point weights until the output activation quantization error is smaller than a set threshold, to obtain the optimized fixed-point weight matrix and weight quantization scale factor of the current layer;
    step S24, obtaining the optimized fixed-point weight matrix and weight quantization scale factor of each layer of the network by the method of steps S21-S23, to obtain the weight fixed-point quantized convolutional neural network.
  5. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 4, characterized in that the linear least-squares optimization objective function is:

$$\min_{\Lambda,\,Q}\;\lVert Y-X(\Lambda Q)^{T}\rVert_F^2$$

    where X and Y denote the input activation and output activation of the current layer, respectively, Q is the fixed-point weight matrix of the current layer, Λ is the weight quantization scale factor of the current layer, and ‖·‖_F² denotes the square of the F-norm of a matrix.
  6. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 1 or 4, characterized in that the set of verification data is a small amount of the original convolutional neural network's training data, or a small amount of other data with a distribution similar to that of the original convolutional neural network's training data, or artificially generated simulation data, or randomly generated random data.
  7. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 1, characterized in that step S30 comprises:
    step S31, inputting the set of verification data into the weight fixed-point convolutional neural network, and obtaining the output activations of each layer to form an output activation vector;
    step S32, taking the maximum absolute value of the output activation vector divided by the maximum value of the activation quantization fixed-point range as the initial activation quantization scale factor;
    step S33, constructing the activation quantization error function between the output activation vector, the activation quantization function, and the initial activation quantization scale factor;
    step S34, based on the activation quantization error function, iteratively solving for the fixed-point activation vector and the activation quantization scale factor until the value of the activation quantization error function is lower than a set threshold, to obtain the optimized fixed-point activation vector and activation quantization scale factor, and thereby the quantized weight-activation fixed-point quantized convolutional neural network.
  8. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 7, characterized in that the activation quantization function is:

$$\hat{x}_i=\mathrm{clip}\!\left(\mathrm{round}\!\left(\frac{x_i}{\alpha}\right),\,q_{min},\,q_{max}\right)$$

    where q_min and q_max denote the minimum and maximum values of the activation quantization fixed-point range, respectively, α denotes the activation quantization scale factor, x_i denotes the output activation of the i-th layer, round(·) denotes the rounding operation, and clip(·) denotes the threshold truncation operation.
  9. The activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to claim 7, characterized in that the activation quantization error function is:

$$\min_{\alpha,\,\hat{x}}\;\lVert x-\alpha\,\hat{x}\rVert_2^2$$

    where α denotes the activation quantization scale factor, x denotes the output activation vector, x̂ denotes the fixed-point activation vector produced by the activation quantization function, and ‖·‖₂² denotes the square of the 2-norm of a vector.
  10. An activation fixed-point fitting-based post-training quantization system for a convolutional neural network, characterized in that, based on the activation fixed-point fitting-based post-training quantization method for a convolutional neural network according to any one of claims 1-9, the quantization system comprises a network weight matrix fitting module, a network first activation matrix fitting module, a network second activation matrix fitting module, and an output module;
    the network weight matrix fitting module is configured to obtain the weight matrix of each layer of the original convolutional neural network and perform low-bit fixed-point quantization of each layer's weight matrix separately, to obtain the fixed-point weight matrix and weight quantization scale factor of each layer of the network;
    the network first activation matrix fitting module is configured to obtain a set of verification data, construct an optimization objective function from input activation to output activation based on the fixed-point weight matrices and weight quantization scale factors of the network layers, and iteratively optimize the fixed-point weight matrices and weight quantization scale factors, to obtain the weight fixed-point quantized convolutional neural network;
    the network second activation matrix fitting module is configured to solve for the activation quantization scale factors based on the set of verification data and the weight fixed-point quantized convolutional neural network, to obtain the quantized weight-activation fixed-point quantized convolutional neural network;
    the output module is configured to output the obtained quantized weight-activation fixed-point quantized convolutional neural network.
PCT/CN2020/101550 2020-07-10 2020-07-13 Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network WO2022006919A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010660722.5 2020-07-10
CN202010660722.5A CN111783961A (en) 2020-07-10 2020-07-10 Activation fixed point fitting-based convolutional neural network post-training quantization method and system

Publications (1)

Publication Number Publication Date
WO2022006919A1 true WO2022006919A1 (en) 2022-01-13

Family

ID=72767163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/101550 WO2022006919A1 (en) 2020-07-10 2020-07-13 Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network

Country Status (2)

Country Link
CN (1) CN111783961A (en)
WO (1) WO2022006919A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598020A (en) * 2020-11-24 2021-04-02 深兰人工智能(深圳)有限公司 Target identification method and system
CN112766456B (en) * 2020-12-31 2023-12-26 平安科技(深圳)有限公司 Quantization method, device and equipment for floating-point deep neural network and storage medium
CN112668702B (en) * 2021-01-15 2023-09-19 北京格灵深瞳信息技术股份有限公司 Fixed-point parameter optimization method, system, terminal and storage medium
CN113080988B (en) * 2021-03-26 2024-01-16 京东方科技集团股份有限公司 Attention mechanism-based 12-lead electrocardiogram overall classification method and device
WO2022216109A1 (en) * 2021-04-09 2022-10-13 Samsung Electronics Co., Ltd. Method and electronic device for quantizing dnn model
CN113887721B (en) * 2021-09-29 2024-02-27 中国科学技术大学 Post-training quantization compression method and system in speech recognition task
CN113673532B (en) * 2021-10-21 2022-04-22 北京科技大学 Target detection method and device based on quantitative model
WO2023137710A1 (en) * 2022-01-21 2023-07-27 深圳市大疆创新科技有限公司 Neural network training method, image processing method and device, system, and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480770A (en) * 2017-07-27 2017-12-15 中国科学院自动化研究所 The adjustable neutral net for quantifying bit wide quantifies the method and device with compression
US20190138882A1 (en) * 2017-11-07 2019-05-09 Samusung Electronics Co., Ltd. Method and apparatus for learning low-precision neural network that combines weight quantization and activation quantization
CN108334945A (en) * 2018-01-30 2018-07-27 中国科学院自动化研究所 The acceleration of deep neural network and compression method and device
CN111126557A (en) * 2018-10-31 2020-05-08 阿里巴巴集团控股有限公司 Neural network quantification method, neural network quantification application device and computing equipment
CN109409514A (en) * 2018-11-02 2019-03-01 广州市百果园信息技术有限公司 Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116720563A (en) * 2022-09-19 2023-09-08 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment
CN116720563B (en) * 2022-09-19 2024-03-29 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment
CN115859746A (en) * 2023-02-13 2023-03-28 无锡祝融航空航天科技有限公司 Copper material additive manufacturing forming precision control method based on deep learning
CN115859746B (en) * 2023-02-13 2023-04-25 无锡祝融航空航天科技有限公司 Deep learning-based copper material additive manufacturing forming precision control method

Also Published As

Publication number Publication date
CN111783961A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2022006919A1 (en) Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network
CN109754066B (en) Method and apparatus for generating a fixed-point neural network
US11429862B2 (en) Dynamic adaptation of deep neural networks
EP3340129B1 (en) Artificial neural network class-based pruning
US20210019630A1 (en) Loss-error-aware quantization of a low-bit neural network
CN108334945B (en) Acceleration and compression method and device of deep neural network
CN110799995A (en) Data recognizer training method, data recognizer training device, program, and training method
CN113610232B (en) Network model quantization method and device, computer equipment and storage medium
KR102152374B1 (en) Method and system for bit quantization of artificial neural network
WO2022105108A1 (en) Network data classification method, apparatus, and device, and readable storage medium
US20220414432A1 (en) Method and system for splitting and bit-width assignment of deep learning models for inference on distributed systems
WO2020088131A1 (en) Convolutional neural network computing acceleration method and apparatus, device, and medium
CN110647974A (en) Network layer operation method and device in deep neural network
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
US20200193270A1 (en) Low precision and coarse-to-fine dynamic fixed-point quantization design in convolution neural network
CN115952832A (en) Adaptive model quantization method and apparatus, storage medium, and electronic apparatus
CN112598062A (en) Image identification method and device
KR20190130443A (en) Method and apparatus for quantization of neural network
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
JP6935868B2 (en) Image recognition device, image recognition method, and program
US20210125066A1 (en) Quantized architecture search for machine learning models
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
WO2021248544A1 (en) Low resource computational block for trained neural network
US20230306255A1 (en) Method and system for smooth training of a quantized neural network
CN110717402A (en) Pedestrian re-identification method based on hierarchical optimization metric learning

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20943846; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20943846; Country of ref document: EP; Kind code of ref document: A1)