CN116484936A - Reservoir ESN network quantization method and device and electronic equipment - Google Patents

Reservoir ESN network quantization method and device and electronic equipment

Info

Publication number
CN116484936A
Authority
CN
China
Prior art keywords
model
quantization
value
esn
quantized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210032496.5A
Other languages
Chinese (zh)
Inventor
沈成
赵斌
张瑞涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Photon Arithmetic Beijing Technology Co ltd
Original Assignee
Photon Arithmetic Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Photon Arithmetic Beijing Technology Co ltd filed Critical Photon Arithmetic Beijing Technology Co ltd
Priority to CN202210032496.5A priority Critical patent/CN116484936A/en
Publication of CN116484936A publication Critical patent/CN116484936A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a reservoir ESN network quantization method and device and an electronic device, wherein the method comprises the following steps: inputting the weight parameters to be quantized and obtaining the original probability distribution; finding the maximum value of the weight parameters to be quantized; iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values; and finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model. The scheme reduces the size of the model, reduces its memory consumption and accelerates model inference while losing almost none of the ESN network's performance in predicting data center network traffic.

Description

Reservoir ESN network quantization method and device and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a reservoir ESN network quantization method, a reservoir ESN network quantization apparatus and an electronic device.
Background
With the continued development of AI (Artificial Intelligence) technology, neural networks are being used in scenarios such as face recognition, intelligent navigation, telecommunications and traffic prediction. With the wide application of cloud computing in search engines, social media, electronic commerce and the like, data center networks have become an important network structure. Predicting data center network traffic can avoid problems that easily arise in the network, effectively optimize and adjust network resources, and further guarantee the network connections of important nodes. Network traffic prediction results can serve as an important reference for traffic trends over a future period of time. The prediction of data center network traffic can be regarded as a time-series prediction problem: time-series data already collected on the network are analyzed in order to predict future network demand. Because network traffic does not change smoothly but fluctuates strongly, nonlinear prediction methods such as artificial neural networks and deep learning come closer to the real situation. The echo state network (Echo State Network, ESN) is an improved recurrent neural network model that is widely applied to data center network traffic prediction and achieves good results. The scale of an ESN is related to its number of neurons: the more neurons, the more accurately the nonlinear model can be described and the better future network traffic can be predicted. However, more neurons mean more model data, so an ESN model with higher prediction performance requires more storage space and computational power.
As one of the common means of deep learning optimization, model quantization converts a deep learning model into a smaller fixed-point model with a faster inference speed, and is applicable to the vast majority of models and usage scenarios. Model quantization trades inference precision for efficiency: floating-point parameters (weights or tensors) with continuous or discrete values in the network are linearly mapped to discrete fixed-point approximations, replacing the original float32-format data while the inputs and outputs remain floating point, thereby reducing the size of the model, reducing its memory consumption, accelerating model inference, and so on.
Disclosure of Invention
The embodiment of the application aims to provide a reservoir ESN network quantization method and device and an electronic device, which reduce the size of the model, reduce its memory consumption and accelerate model inference while losing almost none of the ESN network's performance in predicting data center network traffic.
The embodiment of the application provides a reservoir ESN network quantization method, which comprises the following steps: inputting the weight parameters to be quantized and obtaining the original probability distribution; finding the maximum value of the weight parameters to be quantized; iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values; and finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model.
Further, in the step of inputting the weight parameters to be quantized and obtaining the original probability distribution, specifically, the ESN weight parameters to be quantized, denoted θ, are input, and the probability distribution P0 of the original ESN model is calculated.
In the above implementation, the ESN (Echo State Network) weight parameters θ to be quantized comprise more than one weight parameter. The ESN model comprises three components: an input layer, a hidden (reservoir) layer and an output layer. The input layer activation vector containing K input neurons is u(n); the hidden activation vector containing F hidden neurons is x(n); the output activation vector containing L output neurons is y(n). In the model, u(n), x(n) and y(n) denote the values of the input layer, hidden and output layer activation vectors at a specific time n.
The state equations of the ESN are:

$$x(t+1) = f\big(W\,x(t) + W_{in}\,u(t+1) + W_{back}\,y(t)\big)$$

$$y(t+1) = f_{out}\big(W_{out}\,[u(t+1);\,x(t+1)]\big)$$

where x(t+1) is the update state of the hidden layer activation vector at time t+1; y(t+1) is the update state of the output layer activation vector at time t+1; W is the hidden layer-hidden layer connection weight matrix; W_in is the input layer-hidden layer connection weight matrix; W_back is the output layer-hidden layer connection weight matrix; W_out is the output connection weight matrix; f is the internal neuron activation function; f_out is the output function. During training of an ESN, the reservoir connection weight matrix W is randomly generated and does not change during training, while W_out is obtained by training. The parameters to be quantized, θ, refer to W, W_in, W_back and W_out. Quantizing these four parameters reduces the size of the model, reduces its memory consumption and accelerates model inference.
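For concreteness, the update step can be sketched in a few lines of NumPy; the tanh activation and linear readout are common ESN choices assumed here, not prescribed by the patent:

```python
import numpy as np

def esn_step(W, W_in, W_back, W_out, x, u_next, y):
    """One ESN update step following the state equations above.

    Shapes (illustrative): W (F, F), W_in (F, K), W_back (F, L),
    W_out (L, K + F); x (F,), u_next (K,), y (L,).
    """
    # x(t+1) = f(W x(t) + W_in u(t+1) + W_back y(t)); f assumed to be tanh
    x_next = np.tanh(W @ x + W_in @ u_next + W_back @ y)
    # y(t+1) = f_out(W_out [u(t+1); x(t+1)]); f_out assumed linear
    y_next = W_out @ np.concatenate([u_next, x_next])
    return x_next, y_next
```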
Further, in the step of finding the maximum value of the weight parameters to be quantized, specifically, the maximum absolute value in the series of parameters θ is found and recorded as θ_max.
Further, in the step of iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values, the merits of the quantized model under different threshold conditions are found by varying the size of the threshold S.
In the above implementation, the merits of the quantized model under different threshold conditions are found by varying the size of the threshold S. In general, weight-parameter quantization finds the maximum absolute value of the weight distribution and, taking it as the maximum boundary, maps the weights proportionally onto int8 to obtain the quantized parameters. However, the four weight parameters of the ESN network are unevenly distributed, and mapping directly from the maximum weight value loses information, so the quantization result is not ideal. This patent instead evaluates the quantized model under different threshold conditions by varying the threshold.
Further, the quantization calculation method includes: performing quantization calculation on the original weight parameters.
In the above implementation, the original floating-point weight parameters are mapped to fixed-point values of the required precision using the following equations:

$$Q = \mathrm{round}(R/M) + Z$$

$$M = \frac{R_{max} - R_{min}}{Q_{max} - Q_{min}} = \frac{2S}{Q_{max} - Q_{min}}$$

$$Z = Q_{max} - \mathrm{round}(R_{max}/M)$$

where R represents a true floating-point value; Q represents the quantized fixed-point value; Z represents the quantized fixed-point value corresponding to the floating-point value 0; M represents the minimum scale that can be represented after fixed-point quantization; R_max and R_min represent the largest and smallest floating-point values, which under the clipping threshold S are R_max = S and R_min = -S; Q_max and Q_min represent the maximum and minimum fixed-point values. Changing S realizes different quantization scales. Through these three formulas, the original weight data can be quantized into fixed-point data of the required precision.
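A minimal NumPy sketch of this mapping, assuming an int8 fixed-point target (the patent does not fix the bit width):

```python
import numpy as np

def quantize(weights, S, q_min=-128, q_max=127):
    """Affine quantization of floating-point weights under clipping threshold S."""
    r_max, r_min = S, -S                      # clip range implied by the threshold
    M = (r_max - r_min) / (q_max - q_min)     # minimum representable scale
    Z = q_max - round(r_max / M)              # fixed-point value of floating-point 0
    q = np.round(np.clip(weights, r_min, r_max) / M) + Z
    return np.clip(q, q_min, q_max).astype(np.int8), M, Z

def dequantize(q, M, Z):
    """Recover approximate floating-point values: R ≈ M * (Q - Z)."""
    return M * (q.astype(np.float32) - Z)
```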
Further, the calculating of the corresponding relative entropy value includes: calculating the relative entropy value between the probability distribution of the original model and that of the quantized model.
In the above implementation, the relative entropy value is calculated from the probability distribution of the original model and the probability distribution of the quantized model. Let P and Q be the probability distribution functions of two discrete random variables; the relative entropy of P with respect to Q is:

$$D_{KL}(P\,\|\,Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}$$

The higher the similarity of P and Q, the smaller the D_KL value. In the quantization process, the weight distribution of the original model is the optimal expected distribution; the closer the weight distribution of the quantized (fitted) model is to the original model, the better the quantized model. The invention adopts relative entropy as the evaluation index for optimizing the quantization fitting error.
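A sketch of this comparison over histogram approximations of the two weight distributions (the 128-bin histogram is an assumption; the patent does not specify how the distributions are discretized):

```python
import numpy as np

def weight_distribution(weights, bins=128, value_range=None):
    """Histogram of weight values, used as a discrete probability distribution."""
    hist, _ = np.histogram(weights, bins=bins, range=value_range)
    return hist.astype(np.float64)

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D_KL(P || Q) between two discrete distributions."""
    p = p / p.sum()
    q = np.maximum(q / q.sum(), eps)  # guard empty bins in the quantized histogram
    mask = p > 0                      # terms with P(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```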
Further, in the step of finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model, a series of relative entropy values under different threshold conditions is calculated by iteration, yielding quantization models of different degrees. The threshold S corresponding to the minimum relative entropy value is found; under this S value the quantized model is closest to the original model, and the quantization model corresponding to this S value is output.
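Combining the quantize, weight_distribution and kl_divergence helpers sketched above, the iterative threshold search might look as follows; the 100-step search grid is an illustrative assumption:

```python
import numpy as np

def search_best_threshold(weights, num_steps=100):
    """Scan S over (0, 1.2 * theta_max] and keep the threshold with minimum KL."""
    theta_max = np.abs(weights).max()
    hist_range = (-theta_max, theta_max)
    p = weight_distribution(weights, value_range=hist_range)

    best_S, best_kl = None, np.inf
    for S in np.linspace(1.2 * theta_max / num_steps, 1.2 * theta_max, num_steps):
        q_fixed, M, Z = quantize(weights, S)
        q = weight_distribution(dequantize(q_fixed, M, Z), value_range=hist_range)
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_S, best_kl = S, kl
    return best_S, best_kl
```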
The embodiment of the application also provides a reservoir ESN network quantization apparatus, comprising: an input acquisition module, an iterative quantization module and a best quantization model output module. The input acquisition module is used to input the ESN network weight parameters to be quantized and obtain the probability distribution of the original ESN model; the iterative quantization module is used to iteratively calculate the probability distribution of the quantized model under different threshold conditions and calculate the corresponding relative entropy values; the best quantization model output module is used to find the threshold corresponding to the minimum relative entropy value and output the corresponding quantization model.
The embodiment of the application also provides an electronic device comprising a processor, a memory and a communication bus; the communication bus is used to enable connection communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to implement any of the reservoir ESN network quantization methods described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be considered as limiting the scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a reservoir ESN network quantization method according to an embodiment of the present application;
fig. 2 is a detailed flowchart of a reservoir ESN network quantization method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a reservoir ESN network quantization apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Embodiment one:
the embodiment of the application aims to provide a method and a device for quantifying an ESN (storage pool) network and electronic equipment, and the method and the device achieve the aims of reducing the size of a model, reducing the memory consumption of the model and accelerating the reasoning speed of the model on the basis of hardly losing the network flow performance of an ESN network prediction data center. Referring to fig. 1, fig. 1 is a schematic flow chart of a model training method provided in an embodiment of the present application, including:
s101: and inputting weight parameters to be quantized, and obtaining original probability distribution.
In the above implementation, the ESN (Echo State Network) weight parameters θ to be quantized comprise more than one weight parameter. The ESN model comprises three components: an input layer, a hidden (reservoir) layer and an output layer. The input layer activation vector containing K input neurons is u(n); the hidden activation vector containing F hidden neurons is x(n); the output activation vector containing L output neurons is y(n). In the model, u(n), x(n) and y(n) denote the values of the input layer, hidden and output layer activation vectors at a specific time n.
The state equations of the ESN are:

$$x(t+1) = f\big(W\,x(t) + W_{in}\,u(t+1) + W_{back}\,y(t)\big)$$

$$y(t+1) = f_{out}\big(W_{out}\,[u(t+1);\,x(t+1)]\big)$$

where x(t+1) is the update state of the hidden layer activation vector at time t+1; y(t+1) is the update state of the output layer activation vector at time t+1; W is the hidden layer-hidden layer connection weight matrix; W_in is the input layer-hidden layer connection weight matrix; W_back is the output layer-hidden layer connection weight matrix; W_out is the output connection weight matrix; f is the internal neuron activation function; f_out is the output function. During training of an ESN, the reservoir connection weight matrix W is randomly generated and does not change during training, while W_out is obtained by training. The parameters to be quantized, θ, refer to W, W_in, W_back and W_out. Quantizing these four parameters reduces the size of the model, reduces its memory consumption and accelerates model inference.
S102: find the maximum value of the weight parameters to be quantized.
It should be noted that the target type in the embodiment of the present application refers to the data type of a preset quantization parameter. By way of example, the target type may be fp16, int8, uint8, etc.; this is not limited in the embodiments of the present application.
It should be noted that, in the embodiment of the present application, a user or an engineer may preset a quantization precision, so that the activation values and weight values are mapped to activation values and weight values of the target type according to that precision.
It should be understood that, in the embodiment of the present application, mapping the activation values and weight values to those of the target type may be implemented by discretizing continuous decimal numbers into nearby representable values.
For example, assume an activation value of 0.125677, a weight value of 2.186511, and a quantization precision of one digit after the decimal point. They can then be quantized to 0.1 and 2.2, yielding the quantized activation value and weight value, as the sketch below illustrates.
It should be noted that the above is only one optional way of mapping the activation values and weight values to the target type; various other quantization manners may be adopted in the embodiments of the present application, which are not limited here.
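A trivial sketch of the rounding in this example, using the one-decimal precision from the text (not a recommended setting):

```python
def round_to_precision(value, decimals=1):
    """Discretize a continuous value to the nearest step at the given precision."""
    return round(value, decimals)

assert round_to_precision(0.125677) == 0.1   # quantized activation value
assert round_to_precision(2.186511) == 2.2   # quantized weight value
```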
S103: iteratively calculate the probability distribution of the quantized model under different threshold conditions, and calculate the corresponding relative entropy values.
In the above implementation, the original floating-point weight parameters are mapped to fixed-point values of the required precision using the following equations:

$$Q = \mathrm{round}(R/M) + Z$$

$$M = \frac{R_{max} - R_{min}}{Q_{max} - Q_{min}} = \frac{2S}{Q_{max} - Q_{min}}$$

$$Z = Q_{max} - \mathrm{round}(R_{max}/M)$$

where R represents a true floating-point value; Q represents the quantized fixed-point value; Z represents the quantized fixed-point value corresponding to the floating-point value 0; M represents the minimum scale that can be represented after fixed-point quantization; R_max and R_min represent the largest and smallest floating-point values, which under the clipping threshold S are R_max = S and R_min = -S; Q_max and Q_min represent the maximum and minimum fixed-point values. Changing S realizes different quantization scales. Through these three formulas, the original weight data can be quantized into fixed-point data of the required precision. In this patent, the variation range of S is (0, 1.2·θ_max).
In the above implementation, the relative entropy value is calculated from the probability distribution of the original model and the probability distribution of the quantized model. Let P and Q be the probability distribution functions of two discrete random variables; the relative entropy of P with respect to Q is:

$$D_{KL}(P\,\|\,Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}$$

The higher the similarity of P and Q, the smaller the D_KL value. In the quantization process, the weight distribution of the original model is the optimal expected distribution; the closer the weight distribution of the quantized (fitted) model is to the original model, the better the quantized model. The invention adopts relative entropy as the evaluation index for optimizing the quantization fitting error.
S104: find the threshold corresponding to the minimum relative entropy value and output the corresponding quantization model.
By iteratively calculating a series of relative entropy values under different threshold conditions, quantization models of different degrees can be obtained. The threshold S corresponding to the minimum relative entropy value is found; under this S value the quantized model is closest to the original model, and the quantization model corresponding to this S value is output.
In the model quantization method provided by the embodiment of the application, after the maximum value of the weight parameters to be quantized is found, the quantization result is varied by changing the size of the threshold S, where S ranges from 0 to 1.2 times the maximum weight parameter. Different quantization results are obtained by changing the threshold S. Relative entropy is introduced, and the closeness of the different quantized models to the original model is calculated to find the optimal quantization result. The method thereby reduces the size of the model, reduces its memory consumption and accelerates model inference while losing almost none of the ESN network's performance in predicting data center network traffic.
Embodiment two:
based on the same inventive concept, a pool ESN network quantization apparatus 300 is also provided in the embodiments of the present application. Referring to fig. 3, fig. 3 shows a model quantization apparatus employing the method shown in fig. 1. It should be appreciated that the specific functions of the apparatus 300 may be found in the above description, and detailed descriptions are omitted herein as appropriate to avoid repetition. The device 300 includes at least one software functional module that can be stored in memory in the form of software or firmware or cured in the operating system of the device 300. Specifically:
referring to fig. 3, the apparatus 300 includes: an input acquisition module 301, an iterative quantization module 302, and an optimal quantization model output module 303. Wherein:
the input obtaining module 301 is configured to input an ESN network weight parameter to be quantized, and obtain probability distribution of an original ESN model;
the iterative quantization module 302 is configured to iteratively calculate probability distributions of the model after calculation under different threshold values, and calculate corresponding relative entropy values;
the best quantization model output module 303 is configured to find a threshold corresponding to the minimum relative entropy value, and output a corresponding quantization model.
It should be understood that, for simplicity of description, the descriptions in the first embodiment are omitted in this embodiment.
Embodiment III:
this embodiment provides an electronic device, see fig. 4, comprising a processor 401, a memory 402 and a communication bus 403. Wherein: a communication bus 403 is used to enable connected communication between the processor 401 and the memory 402.
The processor 401 is configured to execute one or more programs stored in the memory 402 to implement the reservoir ESN network quantization method in the first and/or second embodiments.
It will be appreciated that the configuration shown in fig. 4 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 4, or have a different configuration than shown in fig. 4.
It should be noted that, the electronic device described in the embodiments of the present application may be a device having data processing capability, such as a computer, a server, or the like.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit its protection scope; various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (5)

1. A reservoir ESN network quantization method, comprising:
inputting the weight parameters to be quantized and obtaining the original probability distribution; finding the maximum value of the weight parameters to be quantized; iteratively calculating the probability distribution of the quantized model under different threshold conditions and calculating the corresponding relative entropy values; and finding the threshold corresponding to the minimum relative entropy value and outputting the corresponding quantization model.
2. The reservoir ESN network quantization method of claim 1, wherein the weight parameters to be quantized comprise: the hidden layer-hidden layer connection weight matrix W, the input layer-hidden layer connection weight matrix W_in, the output layer-hidden layer connection weight matrix W_back, and the output connection weight matrix W_out.
3. The reservoir ESN network quantization method of claim 1, wherein the threshold value is greater than zero and less than 1.2 times the maximum weight parameter value.
4. A reservoir ESN network quantization apparatus, comprising:
the device comprises an input acquisition module, an iterative quantization module and an optimal quantization model output module; the input acquisition module is used for inputting ESN network weight parameters to be quantized and acquiring probability distribution of an original ESN model; the iterative quantization module is used for iteratively calculating probability distribution of the model after the calculation under the condition of different thresholds, and calculating corresponding relative entropy values; the optimal quantization model output module is used for finding out a threshold value corresponding to the minimum relative entropy value and outputting a corresponding quantization model.
5. An electronic device comprising a processor, a memory and a communication bus; the communication bus is used to enable connection communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to implement the reservoir ESN network quantization method of any one of claims 1 to 3.
CN202210032496.5A 2022-01-12 2022-01-12 Reservoir ESN network quantization method and device and electronic equipment Pending CN116484936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210032496.5A CN116484936A (en) 2022-01-12 2022-01-12 Reservoir ESN network quantization method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210032496.5A CN116484936A (en) 2022-01-12 2022-01-12 Reservoir ESN network quantization method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116484936A true CN116484936A (en) 2023-07-25

Family

ID=87216429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210032496.5A Pending CN116484936A (en) Reservoir ESN network quantization method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116484936A (en)


Legal Events

Date Code Title Description
PB01 Publication