CN116484936A - Reservoir ESN network quantization method and device and electronic equipment - Google Patents

Reservoir ESN network quantization method and device and electronic equipment

Info

Publication number
CN116484936A
Authority
CN
China
Prior art keywords
model
quantization
value
esn
quantized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210032496.5A
Other languages
Chinese (zh)
Inventor
沈成
赵斌
张瑞涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Photon Arithmetic Beijing Technology Co ltd
Original Assignee
Photon Arithmetic Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Photon Arithmetic Beijing Technology Co ltd filed Critical Photon Arithmetic Beijing Technology Co ltd
Priority to CN202210032496.5A priority Critical patent/CN116484936A/en
Publication of CN116484936A publication Critical patent/CN116484936A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a reservoir ESN network quantization method and device and an electronic device, wherein the method comprises the following steps: inputting the weight parameters to be quantized and obtaining the original probability distribution; finding the maximum value of the weight parameters to be quantized; iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values; and finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model. The scheme reduces the size of the model, reduces its memory consumption and accelerates model inference while losing almost none of the ESN network's performance in predicting data center network traffic.

Description

Reservoir ESN network quantization method and device and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a reservoir ESN network quantization method, a reservoir ESN network quantization apparatus and an electronic device.
Background
With the continued development of AI (Artificial Intelligence) technology, neural networks are being used in scenarios such as face recognition, intelligent navigation, telecommunications and traffic prediction. With the wide application of cloud computing in search engines, social media, electronic commerce and the like, data center networks have become an important network structure. Predicting data center network traffic can avoid problems that easily arise in the network, effectively optimize and adjust network resources, and further guarantee the network connections of important nodes. Network traffic prediction results can serve as an important reference for traffic trends over a future period of time. The prediction of data center network traffic can be regarded as a time-series prediction problem: time-series data already collected on the network are analyzed in order to predict future network demand. Because network traffic does not change smoothly but fluctuates strongly, nonlinear prediction methods such as artificial neural networks and deep learning come closer to the real situation. The echo state network (Echo State Network, ESN) is an improved recurrent neural network model that is widely applied to data center network traffic prediction and achieves good results. The scale of an ESN is related to its number of neurons: the more neurons, the more accurately the nonlinear model can be described and the better future network traffic can be predicted. However, more neurons mean more model data, so an ESN model with higher prediction performance requires more storage space and computational power.
As one of the common means of deep learning optimization, model quantization converts a deep learning model into a smaller fixed-point model with a faster inference speed, and is applicable to the vast majority of models and usage scenarios. Model quantization trades inference precision for efficiency: floating-point parameters (weights or tensors) with continuous or discrete values in the network are linearly mapped to discrete fixed-point approximations, replacing the original float32-format data while the inputs and outputs remain floating point, thereby reducing the size of the model, reducing its memory consumption, accelerating model inference, and so on.
Disclosure of Invention
The embodiment of the application aims to provide a reservoir ESN network quantization method and device and an electronic device, which reduce the size of the model, reduce its memory consumption and accelerate model inference while losing almost none of the ESN network's performance in predicting data center network traffic.
The embodiment of the application provides a reservoir ESN network quantization method, which comprises the following steps: inputting the weight parameters to be quantized and obtaining the original probability distribution; finding the maximum value of the weight parameters to be quantized; iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values; and finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model.
Further, in the step of inputting the weight parameters to be quantized and obtaining the original probability distribution, specifically, the ESN weight parameters to be quantized, denoted θ, are input, and the probability distribution P0 of the original ESN model is calculated.
In the above implementation, the ESN (Echo State Network) weight parameters θ to be quantized comprise more than one weight parameter. The ESN model comprises three components: an input layer, a hidden (reservoir) layer and an output layer. The input layer activation vector containing K input neurons is u(n); the hidden activation vector containing F hidden neurons is x(n); the output activation vector containing L output neurons is y(n). In the model, u(n), x(n) and y(n) denote the values of the input layer, hidden and output layer activation vectors at a specific time n.
The state equations of the ESN are:

$$x(t+1) = f\big(W\,x(t) + W_{in}\,u(t+1) + W_{back}\,y(t)\big)$$

$$y(t+1) = f_{out}\big(W_{out}\,[u(t+1);\,x(t+1)]\big)$$

where x(t+1) is the update state of the hidden layer activation vector at time t+1; y(t+1) is the update state of the output layer activation vector at time t+1; W is the hidden layer-hidden layer connection weight matrix; W_in is the input layer-hidden layer connection weight matrix; W_back is the output layer-hidden layer connection weight matrix; W_out is the output connection weight matrix; f is the internal neuron activation function; f_out is the output function. During training of an ESN, the reservoir connection weight matrix W is randomly generated and does not change during training, while W_out is obtained by training. The parameters to be quantized, θ, refer to W, W_in, W_back and W_out. Quantizing these four parameters reduces the size of the model, reduces its memory consumption and accelerates model inference.
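For concreteness, the update step can be sketched in a few lines of NumPy; the tanh activation and linear readout are common ESN choices assumed here, not prescribed by the patent:

```python
import numpy as np

def esn_step(W, W_in, W_back, W_out, x, u_next, y):
    """One ESN update step following the state equations above.

    Shapes (illustrative): W (F, F), W_in (F, K), W_back (F, L),
    W_out (L, K + F); x (F,), u_next (K,), y (L,).
    """
    # x(t+1) = f(W x(t) + W_in u(t+1) + W_back y(t)); f assumed to be tanh
    x_next = np.tanh(W @ x + W_in @ u_next + W_back @ y)
    # y(t+1) = f_out(W_out [u(t+1); x(t+1)]); f_out assumed linear
    y_next = W_out @ np.concatenate([u_next, x_next])
    return x_next, y_next
```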
Further, in the step of finding the maximum value of the weight parameters to be quantized, specifically, the maximum absolute value in the series of parameters θ is found and recorded as θ_max.
Further, in the step of iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values, the merits of the quantized model under different threshold conditions are found by varying the size of the threshold S.
In the above implementation, the merits of the quantized model under different threshold conditions are found by varying the size of the threshold S. In general, weight-parameter quantization finds the maximum absolute value of the weight distribution and, taking it as the maximum boundary, maps the weights proportionally onto int8 to obtain the quantized parameters. However, the four weight parameters of the ESN network are unevenly distributed, and mapping directly from the maximum weight value loses information, so the quantization result is not ideal. This patent instead evaluates the quantized model under different threshold conditions by varying the threshold.
Further, the quantization calculation method includes: performing quantization calculation on the original weight parameters.
In the above implementation, the original floating-point weight parameters are mapped to fixed-point values of the required precision using the following equations:

$$Q = \mathrm{round}(R/M) + Z$$

$$M = \frac{R_{max} - R_{min}}{Q_{max} - Q_{min}} = \frac{2S}{Q_{max} - Q_{min}}$$

$$Z = Q_{max} - \mathrm{round}(R_{max}/M)$$

where R represents a true floating-point value; Q represents the quantized fixed-point value; Z represents the quantized fixed-point value corresponding to the floating-point value 0; M represents the minimum scale that can be represented after fixed-point quantization; R_max and R_min represent the largest and smallest floating-point values, which under the clipping threshold S are R_max = S and R_min = -S; Q_max and Q_min represent the maximum and minimum fixed-point values. Changing S realizes different quantization scales. Through these three formulas, the original weight data can be quantized into fixed-point data of the required precision.
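A minimal NumPy sketch of this mapping, assuming an int8 fixed-point target (the patent does not fix the bit width):

```python
import numpy as np

def quantize(weights, S, q_min=-128, q_max=127):
    """Affine quantization of floating-point weights under clipping threshold S."""
    r_max, r_min = S, -S                      # clip range implied by the threshold
    M = (r_max - r_min) / (q_max - q_min)     # minimum representable scale
    Z = q_max - round(r_max / M)              # fixed-point value of floating-point 0
    q = np.round(np.clip(weights, r_min, r_max) / M) + Z
    return np.clip(q, q_min, q_max).astype(np.int8), M, Z

def dequantize(q, M, Z):
    """Recover approximate floating-point values: R ≈ M * (Q - Z)."""
    return M * (q.astype(np.float32) - Z)
```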
Further, the calculating of the corresponding relative entropy value includes: calculating the relative entropy value between the probability distribution of the original model and that of the quantized model.
In the above implementation, the relative entropy value is calculated from the probability distribution of the original model and the probability distribution of the quantized model. Let P and Q be the probability distribution functions of two discrete random variables; the relative entropy of P with respect to Q is:

$$D_{KL}(P\,\|\,Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}$$

The higher the similarity of P and Q, the smaller the D_KL value. In the quantization process, the weight distribution of the original model is the optimal expected distribution; the closer the weight distribution of the quantized (fitted) model is to the original model, the better the quantized model. The invention adopts relative entropy as the evaluation index for optimizing the quantization fitting error.
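A sketch of this comparison over histogram approximations of the two weight distributions (the 128-bin histogram is an assumption; the patent does not specify how the distributions are discretized):

```python
import numpy as np

def weight_distribution(weights, bins=128, value_range=None):
    """Histogram of weight values, used as a discrete probability distribution."""
    hist, _ = np.histogram(weights, bins=bins, range=value_range)
    return hist.astype(np.float64)

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D_KL(P || Q) between two discrete distributions."""
    p = p / p.sum()
    q = np.maximum(q / q.sum(), eps)  # guard empty bins in the quantized histogram
    mask = p > 0                      # terms with P(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```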
Further, in the step of finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model, a series of relative entropy values under different threshold conditions is calculated by iteration, yielding quantization models of different degrees. The threshold S corresponding to the minimum relative entropy value is found; under this S value the quantized model is closest to the original model, and the quantization model corresponding to this S value is output.
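Combining the quantize, weight_distribution and kl_divergence helpers sketched above, the iterative threshold search might look as follows; the 100-step search grid is an illustrative assumption:

```python
import numpy as np

def search_best_threshold(weights, num_steps=100):
    """Scan S over (0, 1.2 * theta_max] and keep the threshold with minimum KL."""
    theta_max = np.abs(weights).max()
    hist_range = (-theta_max, theta_max)
    p = weight_distribution(weights, value_range=hist_range)

    best_S, best_kl = None, np.inf
    for S in np.linspace(1.2 * theta_max / num_steps, 1.2 * theta_max, num_steps):
        q_fixed, M, Z = quantize(weights, S)
        q = weight_distribution(dequantize(q_fixed, M, Z), value_range=hist_range)
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_S, best_kl = S, kl
    return best_S, best_kl
```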
The embodiment of the application also provides a reservoir ESN network quantization apparatus, comprising: an input acquisition module, an iterative quantization module and a best quantization model output module. The input acquisition module is used to input the ESN network weight parameters to be quantized and obtain the probability distribution of the original ESN model; the iterative quantization module is used to iteratively calculate the probability distribution of the quantized model under different threshold conditions and calculate the corresponding relative entropy values; the best quantization model output module is used to find the threshold corresponding to the minimum relative entropy value and output the corresponding quantization model.
The embodiment of the application also provides an electronic device comprising a processor, a memory and a communication bus; the communication bus is used to enable connection communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to implement any of the reservoir ESN network quantization methods described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be considered as limiting the scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a reservoir ESN network quantization method according to an embodiment of the present application;
fig. 2 is a detailed flowchart of a reservoir ESN network quantization method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a reservoir ESN network quantization apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Embodiment one:
the embodiment of the application aims to provide a method and a device for quantifying an ESN (storage pool) network and electronic equipment, and the method and the device achieve the aims of reducing the size of a model, reducing the memory consumption of the model and accelerating the reasoning speed of the model on the basis of hardly losing the network flow performance of an ESN network prediction data center. Referring to fig. 1, fig. 1 is a schematic flow chart of a model training method provided in an embodiment of the present application, including:
s101: and inputting weight parameters to be quantized, and obtaining original probability distribution.
In the above implementation, the ESN (Echo State Network) weight parameters θ to be quantized comprise more than one weight parameter. The ESN model comprises three components: an input layer, a hidden (reservoir) layer and an output layer. The input layer activation vector containing K input neurons is u(n); the hidden activation vector containing F hidden neurons is x(n); the output activation vector containing L output neurons is y(n). In the model, u(n), x(n) and y(n) denote the values of the input layer, hidden and output layer activation vectors at a specific time n.
The state equations of the ESN are:

$$x(t+1) = f\big(W\,x(t) + W_{in}\,u(t+1) + W_{back}\,y(t)\big)$$

$$y(t+1) = f_{out}\big(W_{out}\,[u(t+1);\,x(t+1)]\big)$$

where x(t+1) is the update state of the hidden layer activation vector at time t+1; y(t+1) is the update state of the output layer activation vector at time t+1; W is the hidden layer-hidden layer connection weight matrix; W_in is the input layer-hidden layer connection weight matrix; W_back is the output layer-hidden layer connection weight matrix; W_out is the output connection weight matrix; f is the internal neuron activation function; f_out is the output function. During training of an ESN, the reservoir connection weight matrix W is randomly generated and does not change during training, while W_out is obtained by training. The parameters to be quantized, θ, refer to W, W_in, W_back and W_out. Quantizing these four parameters reduces the size of the model, reduces its memory consumption and accelerates model inference.
S102: find the maximum value of the weight parameters to be quantized.
It should be noted that the target type in the embodiment of the present application refers to the data type of a preset quantization parameter. By way of example, the target type may be fp16, int8, uint8, etc.; this is not limited in the embodiments of the present application.
It should be noted that, in the embodiment of the present application, a user or an engineer may preset a quantization precision, so that the activation values and weight values are mapped to activation values and weight values of the target type according to that precision.
It should be understood that, in the embodiment of the present application, mapping the activation values and weight values to those of the target type may be implemented by discretizing continuous decimal numbers into nearby representable values.
For example, assume an activation value of 0.125677, a weight value of 2.186511, and a quantization precision of one digit after the decimal point. They can then be quantized to 0.1 and 2.2, yielding the quantized activation value and weight value, as the sketch below illustrates.
It should be noted that the above is only one optional way of mapping the activation values and weight values to the target type; various other quantization manners may be adopted in the embodiments of the present application, which are not limited here.
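A trivial sketch of the rounding in this example, using the one-decimal precision from the text (not a recommended setting):

```python
def round_to_precision(value, decimals=1):
    """Discretize a continuous value to the nearest step at the given precision."""
    return round(value, decimals)

assert round_to_precision(0.125677) == 0.1   # quantized activation value
assert round_to_precision(2.186511) == 2.2   # quantized weight value
```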
S103: iteratively calculate the probability distribution of the quantized model under different threshold conditions, and calculate the corresponding relative entropy values.
In the above implementation, the original floating-point weight parameters are mapped to fixed-point values of the required precision using the following equations:

$$Q = \mathrm{round}(R/M) + Z$$

$$M = \frac{R_{max} - R_{min}}{Q_{max} - Q_{min}} = \frac{2S}{Q_{max} - Q_{min}}$$

$$Z = Q_{max} - \mathrm{round}(R_{max}/M)$$

where R represents a true floating-point value; Q represents the quantized fixed-point value; Z represents the quantized fixed-point value corresponding to the floating-point value 0; M represents the minimum scale that can be represented after fixed-point quantization; R_max and R_min represent the largest and smallest floating-point values, which under the clipping threshold S are R_max = S and R_min = -S; Q_max and Q_min represent the maximum and minimum fixed-point values. Changing S realizes different quantization scales. Through these three formulas, the original weight data can be quantized into fixed-point data of the required precision. In this patent, the variation range of S is (0, 1.2·θ_max).
In the above implementation, the relative entropy value is calculated from the probability distribution of the original model and the probability distribution of the quantized model. Let P and Q be the probability distribution functions of two discrete random variables; the relative entropy of P with respect to Q is:

$$D_{KL}(P\,\|\,Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}$$

The higher the similarity of P and Q, the smaller the D_KL value. In the quantization process, the weight distribution of the original model is the optimal expected distribution; the closer the weight distribution of the quantized (fitted) model is to the original model, the better the quantized model. The invention adopts relative entropy as the evaluation index for optimizing the quantization fitting error.
S104: find the threshold corresponding to the minimum relative entropy value and output the corresponding quantization model.
By iteratively calculating a series of relative entropy values under different threshold conditions, quantization models of different degrees can be obtained. The threshold S corresponding to the minimum relative entropy value is found; under this S value the quantized model is closest to the original model, and the quantization model corresponding to this S value is output.
In the model quantization method provided by the embodiment of the application, after the maximum value of the weight parameters to be quantized is found, the quantization result is varied by changing the size of the threshold S, where S ranges from 0 to 1.2 times the maximum weight parameter. Different quantization results are obtained by changing the threshold S. Relative entropy is introduced, and the closeness of the different quantized models to the original model is calculated to find the optimal quantization result. The method thereby reduces the size of the model, reduces its memory consumption and accelerates model inference while losing almost none of the ESN network's performance in predicting data center network traffic.
Embodiment two:
based on the same inventive concept, a pool ESN network quantization apparatus 300 is also provided in the embodiments of the present application. Referring to fig. 3, fig. 3 shows a model quantization apparatus employing the method shown in fig. 1. It should be appreciated that the specific functions of the apparatus 300 may be found in the above description, and detailed descriptions are omitted herein as appropriate to avoid repetition. The device 300 includes at least one software functional module that can be stored in memory in the form of software or firmware or cured in the operating system of the device 300. Specifically:
referring to fig. 3, the apparatus 300 includes: an input acquisition module 301, an iterative quantization module 302, and an optimal quantization model output module 303. Wherein:
the input obtaining module 301 is configured to input an ESN network weight parameter to be quantized, and obtain probability distribution of an original ESN model;
the iterative quantization module 302 is configured to iteratively calculate probability distributions of the model after calculation under different threshold values, and calculate corresponding relative entropy values;
the best quantization model output module 303 is configured to find a threshold corresponding to the minimum relative entropy value, and output a corresponding quantization model.
It should be understood that, for simplicity of description, the descriptions in the first embodiment are omitted in this embodiment.
Embodiment III:
this embodiment provides an electronic device, see fig. 4, comprising a processor 401, a memory 402 and a communication bus 403. Wherein: a communication bus 403 is used to enable connected communication between the processor 401 and the memory 402.
The processor 401 is configured to execute one or more programs stored in the memory 402 to implement the reservoir ESN network quantization method in the first and/or second embodiments.
It will be appreciated that the configuration shown in fig. 4 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 4, or have a different configuration than shown in fig. 4.
It should be noted that, the electronic device described in the embodiments of the present application may be a device having data processing capability, such as a computer, a server, or the like.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit its protection scope; various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (5)

1. A reservoir ESN network quantization method, comprising:
inputting the weight parameters to be quantized and obtaining the original probability distribution; finding the maximum value of the weight parameters to be quantized; iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values; and finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model.
2. The reservoir ESN network quantization method of claim 1, wherein the weight parameters to be quantized comprise: the hidden layer-hidden layer connection weight matrix W, the input layer-hidden layer connection weight matrix W_in, the output layer-hidden layer connection weight matrix W_back, and the output connection weight matrix W_out.
3. The reservoir ESN network quantization method of claim 1, wherein the threshold value is greater than zero and less than 1.2 times the maximum weight parameter value.
4. A reservoir ESN network quantization apparatus, comprising:
the device comprises an input acquisition module, an iterative quantization module and an optimal quantization model output module; the input acquisition module is used for inputting ESN network weight parameters to be quantized and acquiring probability distribution of an original ESN model; the iterative quantization module is used for iteratively calculating probability distribution of the model after the calculation under the condition of different thresholds, and calculating corresponding relative entropy values; the optimal quantization model output module is used for finding out a threshold value corresponding to the minimum relative entropy value and outputting a corresponding quantization model.
5. An electronic device comprising a processor, a memory and a communication bus; the communication bus is used to enable connection communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to implement the reservoir ESN network quantization method of any one of claims 1 to 3.
CN202210032496.5A 2022-01-12 2022-01-12 Reservoir ESN network quantization method and device and electronic equipment Pending CN116484936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210032496.5A CN116484936A (en) 2022-01-12 2022-01-12 Reservoir ESN network quantization method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210032496.5A CN116484936A (en) 2022-01-12 2022-01-12 Reservoir ESN network quantization method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116484936A true CN116484936A (en) 2023-07-25

Family

ID=87216429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210032496.5A Pending CN116484936A (en) Reservoir ESN network quantization method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116484936A (en)


Legal Events

Date Code Title Description
PB01 Publication