CN115982567A - Refrigerating system load prediction method based on sequence-to-sequence model - Google Patents

Refrigerating system load prediction method based on sequence-to-sequence model

Info

Publication number
CN115982567A
Authority
CN
China
Prior art keywords
sequence
model
input
output
characteristic diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211626813.2A
Other languages
Chinese (zh)
Inventor
李佳佳
宁德军
陈逸君
王天逸
郭千朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Carbon Soot Energy Service Co ltd
Original Assignee
Shanghai Carbon Soot Energy Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Carbon Soot Energy Service Co ltd
Priority to CN202211626813.2A
Publication of CN115982567A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a refrigeration system load prediction method based on a sequence-to-sequence model. Historical data are organized into a multi-dimensional time series MTS with a timestamp attribute, the MTS is used to train a sequence-to-sequence prediction model, and time-series data of a given window are then used as input to the trained prediction model to predict the load trend. The prediction model comprises an encoder and a decoder. The encoder adopts a multi-layer structure in which each layer consists of a sparse self-attention module and a distillation module; the vector formed by embedding and encoding the time-series data is used as the input of the initial layer, and the output of each layer is used as the input of the next layer until the final feature map is output. The input of each layer is divided into two paths: the first path passes sequentially through the sparse self-attention module and the distillation module to output a feature map I, and the second path passes through downsampling to output a feature map II; feature map I and feature map II are then fused by a residual connection module to output the feature map of the current layer.

Description

Refrigerating system load prediction method based on sequence-to-sequence model
Technical Field
The invention relates to the technical field of energy efficiency of a refrigerating machine room, in particular to a refrigerating system load prediction method based on a sequence-to-sequence model.
Background
Energy-efficiency optimization of high-efficiency power systems is increasingly important, given that the power-station rooms of many energy-intensive enterprises account for nearly 50% of total enterprise energy consumption; effective energy-efficiency optimization strategies, however, rely on accurate prediction of the load of such high-efficiency power systems.
Existing load prediction methods mostly rely on regression algorithms and short-horizon time-series methods such as recurrent neural networks (LSTM, GRU); when applied to medium- and long-horizon forecasting problems, they perform poorly because errors accumulate rapidly. The recent success of sequence-to-sequence architectures such as autoencoders and Transformers in natural language processing and related fields offers a new direction for medium- and long-horizon prediction of high-efficiency power system loads; however, these models have the following problems:
1. The use of the attention mechanism makes the computational cost and time complexity of the network far larger than those of conventional CNN and RNN deep-learning methods;
2. The embedded coding of the time series does not fully account for long-range dependence, positional dependence, special time points, or the relative weighting among them;
3. To obtain more features, many models stack multiple encoders, which not only adds a large number of network parameters but also makes training and convergence difficult.
Disclosure of Invention
To solve these problems, the invention provides a refrigeration system load prediction method based on a sequence-to-sequence model, which addresses the error accumulation and low accuracy of medium- and long-horizon prediction of high-efficiency power system loads, achieves more accurate medium- and long-horizon load prediction for power-station room systems, and meets the accuracy requirements of engineering-level power system energy-efficiency optimization applications.
The invention can be realized by the following technical scheme:
a refrigerating system load prediction method based on a sequence-to-sequence model is characterized in that a large amount of historical data is arranged into a multidimensional time sequence MTS with a timestamp attribute, a sequence-to-sequence prediction model is trained, then time sequence data of a given window is used as input, the trained prediction model is used for predicting load trend in the next period of time,
the prediction model comprises an encoder and a decoder, wherein the encoder is of a multilayer structure consisting of a sparse self-attention module and a distillation module, a vector formed by embedding and encoding time sequence data is used as the input of an initial layer structure, the output of a previous layer structure is used as the input of a next layer structure until a final feature map is output, the input of each layer structure is divided into two paths, the first path sequentially passes through the sparse self-attention module and the distillation module to output a feature map I, the second path sequentially passes through a downsampling output feature map II, and then the feature map I and the feature map II are subjected to feature fusion through a residual error connection module to output a current layer feature map;
the decoder fills the target elements to be predicted into zero, then carries out embedded coding to generate a vector input mask sparse self-attention module, then uses the generated feature map as a query vector and a final feature map output by the encoder to sequentially input the full self-attention module and a full connection layer, and finally outputs the predicted target elements in real time in a generating mode.
Further, the sparse self-attention module uses a fully connected layer to project the fused features of the query matrix Q and the key matrix K into a new probability space, computes the importance score of each query with the formula I(Q) = FC(Q + K), and sets n = c·ln(L_Q), where L_Q denotes the number of rows of the matrix Q; the n query vectors with the highest scores are selected for the subsequent attention computation. Here FC(·) denotes a fully connected operation whose number of input channels equals the feature size and whose number of output channels is 1, and I(Q) has shape L_Q × 1.
Further, the process of "refining" feature map I from the j-th sparse self-attention module to the (j+1)-th sparse self-attention module is defined as:

X_{j+1}^t = DS(X_j^t) + MaxPool( F([X_j^t]_AB) ) + γ · AvePool( F([X_j^t]_AB) )

where t denotes the current time period, [·]_AB denotes the attention block, γ denotes a learnable parameter, DS(·) denotes the downsampling operation, and F(·) is computed as

F(x) = ELU( Conv1d(x) )

where Conv1d(·) denotes one-dimensional convolution filtering in the time dimension and ELU(·) is the activation function.
Further, the multi-dimensional time series MTS is expressed in matrix form, first subjected to z-score standardization and then divided into batches by rows; embedded coding is then applied to form the vector used as the input of the prediction model. The embedded-coding result of the multi-dimensional time series, for rows i ∈ {1, …, L_x}, where t and L_x respectively denote the current time period and the number of data rows, is the sum of a value embedding, a temporal encoding and a positional encoding; the feature dimension after encoding is denoted d_model. The value embedding is obtained by projecting the input multi-dimensional time series through a one-dimensional convolution to a vector of feature dimension d_model. The positional encoding is

PE(pos, 2k) = sin( pos / 10000^(2k / d_model) )
PE(pos, 2k+1) = cos( pos / 10000^(2k / d_model) )

and is then projected to d_model dimensions using a one-dimensional convolution, where pos denotes the current position. A learnable parameter adjusts the weights of the positional encoding and the temporal encoding and is computed as Relu(Conv1d(·)), where Relu(·) is the activation function and Conv1d(·) is a one-dimensional convolution whose number of input channels is d_model and whose number of output channels is 1.
Further, the z-score standardization is performed using the following formula:

d'_(i,j) = ( d_(i,j) - Mean(D_(:,j)) ) / Std(D_(:,j))

where d_(i,j) is the value in row i of column j of the multi-dimensional time series MTS, D_(:,j) denotes all values in column j, and Mean(·) and Std(·) denote the mean and standard deviation of column j of the data set, respectively.
The beneficial technical effects of the invention are as follows:
1. A sparse self-attention mechanism based on a neural network is proposed: a learnable neural network selects the dot-product pairs that contribute most to attention, further improving on the time complexity and memory usage of the Transformer self-attention mechanism;
2. The stacking scheme of conventional deep-model encoders is improved: a multi-type pooling and residual distillation mechanism obtains as many features as possible without stacking multiple encoders;
3. A new time-series embedded-coding scheme is proposed, making the local positional encoding and the global temporal encoding more robust.
Drawings
FIG. 1 is a data presentation diagram of the present invention;
FIG. 2 is a schematic diagram of the overall structure of the prediction model of the present invention;
fig. 3 is a partial detailed view of the encoder structure of the present invention.
Detailed Description
The following detailed description of the preferred embodiments will be made with reference to the accompanying drawings.
As shown in FIG. 1, the invention provides a refrigeration system load prediction method based on a sequence-to-sequence model: a large amount of historical data is organized into a multi-dimensional time series MTS with a timestamp attribute and used to train a sequence-to-sequence prediction model; time-series data of a given window are then used as input, and the trained prediction model predicts the load trend over the following period.
The prediction model comprises an encoder and a decoder. The encoder adopts a multi-layer structure in which each layer consists of a sparse self-attention module and a distillation module; the vector formed by embedding and encoding the time-series data is used as the input of the initial layer, and the output of each layer is used as the input of the next layer until the final feature map is output. The input of each layer is divided into two paths: the first path passes sequentially through the sparse self-attention module and the distillation module to output a feature map I, and the second path passes through downsampling to output a feature map II; feature map I and feature map II are then fused by a residual connection module to output the feature map of the current layer;
the decoder fills the target elements to be predicted with zeros and applies embedded coding to generate a vector that is input to a masked sparse self-attention module; the generated feature map, used as the query, and the final feature map output by the encoder are then fed sequentially through a full self-attention module and a fully connected layer, and the predicted target elements are finally output in one pass in a generative manner.
The method comprises the following specific steps:
step 1: time series data preparation and preprocessing
Historical data from the high-efficiency power system are the key to training and building the model, so the method first processes the data further: a large amount of historical data is organized into a multi-dimensional time series MTS with a timestamp attribute, and the prediction model, a deep network, is trained on the MTS. Given time-series data of a certain window, the trained deep network can then predict the load trend over the following period, as shown in FIG. 1.
When the maximum and minimum values of an attribute in the MTS are unknown, or outliers are present, min-max normalization of the data is not applicable, so we define the z-score normalization of the data as follows:
d'_(i,j) = ( d_(i,j) - Mean(D_(:,j)) ) / Std(D_(:,j))

where d_(i,j) is the value in row i of column j of the MTS data set D, D_(:,j) denotes all values in column j, and Mean(·) and Std(·) denote the mean and standard deviation of column j of the data set, respectively. Note that the normalized data D' is used only for training the model, to keep differences in data magnitude from hindering training, while the validation and test data sets, which are cut from the whole data set, do not require z-score normalization.
Generally, we split the training data D' row-wise into several mini-batches; for example, with 100 rows of training data and a mini-batch size of 10 rows, there are B = 100/10 = 10 batches in total, and k indexes the rows within each batch. The input of the deep network can be defined as

X^b = { x^b_k ∈ R^(D_x) | k = 1, …, L_x },  b ∈ {1, …, B}

where b is the index of the mini-batch, k is the row index within the mini-batch, L_x is the total number of rows of the input time series, and D_x is the feature dimension of the mini-batch. The output predicted value is

Y^b = { y^b_k ∈ R^(D_y) | k = 1, …, L_y },  b ∈ {1, …, B}

where L_y is the total number of predicted rows in the time series, covering H time steps after the current timestamp, k is the row index in the output, and D_y is the feature dimension of the output.
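A minimal NumPy sketch of this preprocessing step, assuming a purely numeric matrix; the function names zscore_normalize and make_batches are illustrative and do not appear in the patent:

```python
import numpy as np

def zscore_normalize(data: np.ndarray) -> np.ndarray:
    """Column-wise z-score: d' = (d - mean) / std, applied to the training split only."""
    mean = data.mean(axis=0, keepdims=True)
    std = data.std(axis=0, keepdims=True)
    return (data - mean) / (std + 1e-8)  # small epsilon guards against constant columns

def make_batches(data: np.ndarray, rows_per_batch: int = 10):
    """Split the normalized series row-wise into B mini-batches of shape (rows_per_batch, D_x)."""
    n_batches = len(data) // rows_per_batch
    return [data[b * rows_per_batch:(b + 1) * rows_per_batch] for b in range(n_batches)]

# Example: 100 rows of 5-dimensional data -> B = 10 mini-batches of 10 rows each.
mts = np.random.rand(100, 5)
batches = make_batches(zscore_normalize(mts))
print(len(batches), batches[0].shape)  # 10 (10, 5)
```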
Step 2: constructing a sequence-to-sequence model;
the overall architecture of the present invention is shown in fig. 2 and follows an encoder-decoder architecture. In the encoding process, the input is embedded to form a vector, then the vector enters the sparse self-attention module provided by the invention, the output of the sparse self-attention module needs to pass through the distillation module and then output a feature map, in order to ensure that the loss is reduced in the feature forward propagation process, the embedded vector is subjected to down-sampling and then is fused with the output of the distillation module to obtain the feature map, and the process is called as a residual error connection module.
The decoder receives the long-sequence input, fills the target elements to be predicted with zeros, and applies the same embedded coding as the encoder; the resulting vector enters a masked sparse self-attention module, after which the generated feature map is fed, as the query, together with the key and value vectors output by the encoder, into a full self-attention module. A fully connected layer then predicts the output elements in one pass in a generative manner.
2.1 Embedded coding
A multi-dimensional time series is a chronologically ordered sequence of data whose values lie in a continuous space. In most cases, the raw data serve as model input only after embedding, so the embedded coding determines how well the data are represented. Previous work designed data embeddings by manually adding different time windows, lag operators and other hand-crafted feature derivations; this approach, however, is cumbersome and requires domain-specific knowledge. In deep learning models, neural-network-based embedding methods are widely used. In particular, considering that positional semantics and timestamp information influence the embedding of the data, the invention proposes the following embedded-coding scheme:
The embedded-coding result of the multi-dimensional time series, for rows i ∈ {1, …, L_x} and with encoded feature dimension d_model, is the sum of a value embedding, a positional encoding and a temporal encoding. The value embedding is obtained by projecting the input multi-dimensional time series through a one-dimensional convolution to a vector of feature dimension d_model.
The positional encoding PE is

PE(pos, 2k) = sin( pos / 10000^(2k / d_model) )
PE(pos, 2k+1) = cos( pos / 10000^(2k / d_model) )

and is then projected to d_model dimensions using a one-dimensional convolution. In other words, once the input sequence length L_x and the feature dimension d_model are given, the position embedding is fixed; pos denotes the current position.
The temporal encoding embeds each global timestamp through a learnable value: year, month, day, hour, minute, second and holiday are one-hot encoded and mapped by a full connection to vectors with the same feature dimension d_model as the other encodings. For example, take the timestamp 11:11:11 on 11 November 2022, and suppose all times range from 2000 to 2022; then:
Year: [0, 0, …] is a 23-dimensional vector; for 2022 the 1st element is set to 1, and the vector is then projected to 512 dimensions by a full connection.
Month: [0, 0, …] is a 12-dimensional vector; for month 11 the 11th element is set to 1, then projected to 512 dimensions by a full connection.
Day: [0, 0, …] is a 31-dimensional vector; for day 11 the 11th element is set to 1, then projected to 512 dimensions by a full connection.
Hour: [0, 0, …] is a 24-dimensional vector; for hour 11 the 11th element is set to 1, then projected to 512 dimensions by a full connection.
Minute: [0, 0, …] is a 60-dimensional vector; for minute 11 the 11th element is set to 1, then projected to 512 dimensions by a full connection.
Second: [0, 0, …] is a 60-dimensional vector; for second 11 the 11th element is set to 1, then projected to 512 dimensions by a full connection.
In addition, the invention uses a learnable parameter to adjust the weights of the positional encoding and the temporal encoding; it is computed as Relu(Conv1d(·)), where Relu(·) is the activation function and Conv1d(·) is a one-dimensional convolution whose number of input channels is d_model and whose number of output channels is 1. Such an embedding method not only mines more MTS features but also eases training, and it is used for both the encoder and the decoder embeddings.
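As a rough PyTorch illustration of this embedding scheme, the sketch below combines a convolutional value projection, a sinusoidal positional table and a one-hot timestamp projection. The module name, the kernel sizes, and in particular the convex-combination use of the learnable weight are assumptions made for illustration, since the patent gives the exact combining formula only as an image:

```python
import math
import torch
import torch.nn as nn

class TimeSeriesEmbedding(nn.Module):
    """Value projection + sinusoidal positional encoding + one-hot timestamp encoding,
    with a learnable weight balancing the two encodings (a sketch of the scheme above)."""
    def __init__(self, d_in: int, d_time: int, d_model: int = 512, max_len: int = 5000):
        super().__init__()
        self.value_proj = nn.Conv1d(d_in, d_model, kernel_size=3, padding=1)   # projects d_in -> d_model
        self.time_proj = nn.Linear(d_time, d_model)                            # one-hot timestamp -> d_model
        self.alpha_conv = nn.Conv1d(d_model, 1, kernel_size=3, padding=1)      # learnable weight, d_model in / 1 out

        pe = torch.zeros(max_len, d_model)                                     # fixed sinusoidal position table
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2], pe[:, 1::2] = torch.sin(pos * div), torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor, t_onehot: torch.Tensor) -> torch.Tensor:
        # x: (B, L, d_in) raw series, t_onehot: (B, L, d_time) concatenated one-hot timestamp fields
        v = self.value_proj(x.transpose(1, 2)).transpose(1, 2)                 # (B, L, d_model) value embedding
        alpha = torch.relu(self.alpha_conv(v.transpose(1, 2))).transpose(1, 2) # (B, L, 1) learnable weight
        pos_enc = self.pe[: x.size(1)].unsqueeze(0)                            # (1, L, d_model)
        time_enc = self.time_proj(t_onehot)                                    # (B, L, d_model)
        # How alpha weights PE against the temporal encoding is an assumption here.
        return v + alpha * pos_enc + (1 - alpha) * time_enc
```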
2.2 sparse self-attention Module
The conventional full self-attention module operates on tuple inputs, i.e. queries, keys and values, and can be described as

A(Q, K, V) = Softmax( Q K^T / sqrt(d_k) ) V

where Q, K, V are the query, key and value matrices respectively and d_k is the input dimension. Further, if q_i, k_i, v_i denote the i-th rows of the Q, K, V matrices, the i-th row of the output can be written as

A(q_i, K, V) = Σ_j [ k(q_i, k_j) / Σ_l k(q_i, k_l) ] v_j

where k(q_i, k_j) is in fact the asymmetric exponential kernel exp( q_i k_j^T / sqrt(d_k) ). This means the output is a weighted sum of the value vectors (the V matrix), which requires quadratic dot-product computation and O(L_Q · L_K) memory usage; this is a major limitation when extending prediction capability.
Numerous studies have shown that self-attention scores form a long-tailed distribution: a few dot-product pairs contribute most of the attention, and the remaining pairs can be ignored. In this case, if the most important n query vectors can be found from the relationship between Q and K, the O(L_Q · L_K) cost can be optimized. The invention therefore implements the query-filtering process through neural-network learning, defined as follows:

I(Q) = FC(Q + K)

where I(Q) represents the importance score of the queries, with shape L_Q × 1, and FC(·) denotes a fully connected operation whose number of input channels equals the feature size and whose number of output channels is 1.

In this way we abandon the conventional way of computing attention scores in the Transformer and instead use a fully connected layer to project the fused features of Q and K into a new probability space. We then take the query scores, set n = c·ln(L_Q), and select the n query vectors with the highest scores. We name this process the sparse self-attention method; with it, the time and space complexity of the self-attention module is reduced from O(L²) to O(L·lnL). A sketch of this query-filtering step follows the list of advantages below.
In summary, our method has the following advantages:
1. The computational workload of filtering the query vectors is reduced;
2. Faster training speed and lower GPU usage are obtained;
3. Good continuity of the feature fields is achieved.
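The following PyTorch sketch shows the query-filtering idea: I(Q) = FC(Q + K) scores the queries, the top n = c·ln(L_Q) rows attend to all keys, and the remaining rows fall back to the mean of V. That fallback, the class name, and the assumption L_Q = L_K are illustrative choices, not taken from the patent:

```python
import math
import torch
import torch.nn as nn

class SparseSelfAttention(nn.Module):
    """Neural-network-based query filtering: score queries with FC(Q + K),
    keep the top n = c * ln(L_Q), and compute full attention only for those rows."""
    def __init__(self, d_model: int, c: float = 5.0):
        super().__init__()
        self.score_fc = nn.Linear(d_model, 1)   # FC layer: feature-size channels in, 1 out
        self.c = c

    def forward(self, Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
        # Q, K, V: (B, L, d_model); L_Q == L_K assumed for simplicity.
        B, L, d = Q.shape
        scores = self.score_fc(Q + K).squeeze(-1)                  # I(Q): (B, L)
        n = max(1, min(L, int(self.c * math.log(L))))              # n = c * ln(L_Q)
        top_idx = scores.topk(n, dim=-1).indices                   # indices of the n most important queries
        Q_top = torch.gather(Q, 1, top_idx.unsqueeze(-1).expand(-1, -1, d))      # (B, n, d)
        attn = torch.softmax(Q_top @ K.transpose(1, 2) / math.sqrt(d), dim=-1)   # attention only for top queries
        out = V.mean(dim=1, keepdim=True).expand(B, L, d).clone()  # lazy queries get the mean of V (assumption)
        out.scatter_(1, top_idx.unsqueeze(-1).expand(-1, -1, d), attn @ V)
        return out
```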
2.3 encoder
To extract robust long-range dependence from long-sequence inputs, we propose a single-encoder feature extraction method and improve the distillation operation, as represented in fig. 3. After the input embedded-coding vector has been processed by our sparse self-attention module, we obtain the n-head weight matrix of the attention module shown in the figure, and our "refining" process from the j-th attention block to the (j+1)-th attention block can be defined as

X_{j+1}^t = DS(X_j^t) + MaxPool( F([X_j^t]_AB) ) + γ · AvePool( F([X_j^t]_AB) )

where [·]_AB denotes the attention block and γ is a learnable parameter; DS(·) denotes the downsampling operation, for which we use global mean pooling (stride = 2). In addition, F(·) is computed as

F(x) = ELU( Conv1d(x) )

where Conv1d(·) performs one-dimensional convolution filtering in the time dimension (convolution kernel size 3) and ELU(·) is the activation function. Although downsampling reduces the dimensionality of the features, some semantic information is lost. To mitigate this, we obtain as much semantic information as possible by applying a max pooling layer (MaxPool) and a mean pooling layer (AvePool) in parallel (both with stride 2), and we add the learnable γ to adjust the relative importance of these two pooling operations. Furthermore, to prevent gradients and features from vanishing, we add a residual connection. After encoding, the length of the feature map is one quarter of the original length. Compared with stacking encoders, this method has fewer parameters, computes faster, and still obtains as many features as possible.
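The following PyTorch sketch illustrates one encoder layer of this kind. The pooling kernel sizes, the use of average pooling on the residual branch, and the way γ weights the two pooling paths are assumptions made for illustration; the patent specifies the structure only through fig. 3 and the prose above:

```python
import torch
import torch.nn as nn

class DistillResidualLayer(nn.Module):
    """One encoder layer: attention output -> Conv1d(k=3) + ELU -> MaxPool and AvgPool in parallel
    (stride 2), weighted by a learnable gamma, plus a downsampled residual from the layer input."""
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.max_pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
        self.avg_pool = nn.AvgPool1d(kernel_size=3, stride=2, padding=1)
        self.gamma = nn.Parameter(torch.tensor(0.5))                         # weights max vs. mean pooling
        self.residual_ds = nn.AvgPool1d(kernel_size=3, stride=2, padding=1)  # DS(.) on the skip path

    def forward(self, x_in: torch.Tensor, attn_out: torch.Tensor) -> torch.Tensor:
        # x_in: layer input, attn_out: sparse self-attention output, both (B, L, d_model)
        h = self.act(self.conv(attn_out.transpose(1, 2)))                    # (B, d_model, L)
        pooled = self.gamma * self.max_pool(h) + (1 - self.gamma) * self.avg_pool(h)
        skip = self.residual_ds(x_in.transpose(1, 2))                        # downsampled residual branch
        return (pooled + skip).transpose(1, 2)                               # roughly (B, L/2, d_model)
```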
2.4 decoder
The input of the decoder comprises two parts: one part is the output of the encoder (keys and values), and the other is the query vector computed by the masked sparse self-attention module from the embedded vector in which the target elements are filled with 0. In contrast to the sparse self-attention module, the masked sparse self-attention module masks the future positions before computing Softmax(·) and fills each query with the sum of the V vectors at all preceding time points; this filling prevents the model from attending to future information. Finally, the query, key and value are passed to a conventional full self-attention module and then through a fully connected layer to obtain the prediction result.
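A compact sketch of this decoder flow, with the standard nn.MultiheadAttention standing in for both the masked sparse self-attention and the full self-attention modules (so the sparsification and the V-sum filling described above are not reproduced); module and argument names are illustrative:

```python
import torch
import torch.nn as nn

class GenerativeDecoder(nn.Module):
    """Decoder sketch: masked self-attention over [known prefix, zero-filled targets],
    cross-attention against the encoder feature map, then a linear head that emits
    all predicted steps in one forward pass."""
    def __init__(self, d_model: int, d_out: int, n_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, d_out)

    def forward(self, dec_emb: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        # dec_emb: (B, L_token + L_pred, d_model), the last L_pred positions embedded from zeros
        # enc_feat: (B, L_enc, d_model), the final encoder feature map (keys and values)
        L = dec_emb.size(1)
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool, device=dec_emb.device), diagonal=1)
        q, _ = self.self_attn(dec_emb, dec_emb, dec_emb, attn_mask=causal)   # mask future positions
        out, _ = self.cross_attn(q, enc_feat, enc_feat)                      # queries vs. encoder keys/values
        return self.head(out)   # (B, L_token + L_pred, d_out); read off the last L_pred rows
```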
Step 3: Experimental setup
3.1 data set
The data set used in the invention is the refrigeration-system data of a refrigeration machine room; the time span is 2022.05.01 to 2022.08.31 with a 1-minute interval, giving 177,120 records in 5 dimensions: time, chiller load (kW), chiller primary-side load (kW), cooling-tower load (kW) and cooling-pump load (kW). We split the data into training, validation and test sets at a ratio of 6:2:2 and process the data set with a sliding window: the input sequence length is N, and the following M steps are taken as the ground truth for MSE training against the model's predictions.
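A NumPy sketch of this 6:2:2 split and sliding-window construction; N and M correspond to n_in and m_out below, while the function name and the random placeholder data are illustrative:

```python
import numpy as np

def sliding_windows(series: np.ndarray, n_in: int, m_out: int):
    """Turn a (T, D) series into (X, Y) pairs: X holds n_in input steps, Y the next m_out steps."""
    X, Y = [], []
    for start in range(len(series) - n_in - m_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + m_out])
    return np.stack(X), np.stack(Y)

# Chronological 6:2:2 split before windowing, as in the text.
data = np.random.rand(177120, 5)                      # stand-in for the 1-minute, 5-dimensional records
n_train, n_val = int(0.6 * len(data)), int(0.2 * len(data))
train, val, test = np.split(data, [n_train, n_train + n_val])
X_train, Y_train = sliding_windows(train, n_in=60, m_out=30)
```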
3.2 Experimental setup
The deep network model of the invention is implemented in the PyTorch framework and trained with the Adam optimizer: the initial learning rate is 1e-4, the weight decay is 5e-4, the momentum is 0.9, the batch size is 32, there are 20 iterations, and the learning rate decays by a factor of 0.5 every 5 epochs. Training is performed on an NVIDIA GeForce GTX 3090Ti GPU and an Intel(R) Core(TM) i9-10900K CPU.
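These hyper-parameters translate roughly into the following PyTorch training scaffold; `model` and `train_loader` are placeholders, StepLR stands in for the 0.5 decay every 5 epochs, and Adam's default beta1 = 0.9 is taken to correspond to the stated momentum:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, train_loader, device: str = "cuda", epochs: int = 20):
    """Training loop using the hyper-parameters reported in the text."""
    model = model.to(device)
    criterion = nn.MSELoss()                                           # MSE against the ground-truth window
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:                                      # x: (32, L_x, D_x), y: (32, L_y, D_y)
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()                                               # halve the learning rate every 5 epochs
```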
3.3 evaluation index
For the evaluation of the deep model of the invention we use CORR, MAE and MSE, where CORR is the empirical correlation coefficient, MAE is the mean absolute error and MSE is the mean squared error. They are defined as follows:

CORR = Σ_i ( y_i - mean(y) )( ŷ_i - mean(ŷ) ) / sqrt( Σ_i ( y_i - mean(y) )² · Σ_i ( ŷ_i - mean(ŷ) )² )

MAE = (1/n) Σ_i | y_i - ŷ_i |

MSE = (1/n) Σ_i ( y_i - ŷ_i )²

where y and ŷ denote the ground-truth signal and the system's predicted signal, respectively; further, y = y_1, y_2, …, y_n and ŷ = ŷ_1, ŷ_2, …, ŷ_n, and n is the number of samples.
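A NumPy sketch of these three metrics (the function names are illustrative):

```python
import numpy as np

def corr(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Empirical correlation coefficient between ground truth and prediction."""
    y_c, p_c = y - y.mean(), y_hat - y_hat.mean()
    return float((y_c * p_c).sum() / (np.sqrt((y_c ** 2).sum() * (p_c ** 2).sum()) + 1e-12))

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.abs(y - y_hat).mean())

def mse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean squared error."""
    return float(((y - y_hat) ** 2).mean())
```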
Step 4: Model execution
Based on the above sequence-to-sequence load prediction model for the refrigeration machine room, the historical load data of the refrigeration machine room to be analysed are taken as input, and the corresponding load prediction result is obtained and used for energy-efficiency optimization of the refrigeration machine room.
Due to the adoption of the technical scheme, the invention has the beneficial effects that:
1) Compared with conventional prediction methods, for medium- and long-horizon prediction (more than 30 steps) the CORR obtained is improved by 3%, the MAE by 25%, and the MSE by 70%;
2) Compared with the self-attention module of the conventional Transformer, the invention reduces the time and space complexity from O(L²) to O(L·lnL);
3) Compared with the conventional Transformer model, the invention reduces the number of model parameters by more than 50%;
4) Compared with the self-attention module of the conventional Transformer, when the prediction horizon exceeds 100 steps the training time is reduced by more than 5 times and the GPU memory usage by more than 2 times.
In addition, application experiments in further fields have been carried out, as follows:
example 1: load prediction for refrigeration system of cold room
The method is applied to load prediction of a refrigeration system: the input data have a step length of 60 and 5 dimensions, and the prediction step length is 30 with a prediction dimension of 5 (time, chiller load (kW), chiller primary-side load (kW), cooling-tower load (kW), cooling-pump load (kW)); the total load is the sum of the loads of all devices. The model was constructed as shown in FIG. 2, and the CORR, MAE and MSE between the predicted and true values were CORR: 0.954, MAE: 0.188, MSE: 0.079, whereas the conventional Transformer method obtained CORR: 0.917.
Example 2: exchange-rate prediction based on the deep algorithm of the invention
We collected the daily exchange rates of eight countries, including Australia, the United Kingdom, Canada, Switzerland, China, Japan, New Zealand and Singapore, from 1990 to 2016, for a total of 7,588 records, and split them into training, validation and test sets at a ratio of 6:2:2. We apply the method of this patent to exchange-rate prediction with the input step length set to 120 and the prediction step length set to 60, and build the model of the invention according to FIG. 2. The CORR between the predicted and true values is 0.911.
Example 3:
We used a public statistical data set of residential electricity usage as a test. The data set covers two years from one region of China, with one record per minute, and contains 1,051,200 data points over the 2 years. Each data point has 8-dimensional features, including the recording date, the predicted value "oil temperature" and 6 different types of external load values. We split the data into training, validation and test sets at a ratio of 6:2:2, apply the method of this patent to this data set with the input step length set to 720 and the prediction step length set to 360, and build the model of the invention according to FIG. 2. The CORR and MAE between the predicted and true values are CORR: 0.772 and MAE: 0.308.
It will be appreciated by those skilled in the art that these are merely examples and that many variations or modifications may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is therefore defined by the appended claims.

Claims (5)

1. A refrigeration system load prediction method based on a sequence-to-sequence model, characterized in that: a large amount of historical data is organized into a multi-dimensional time series MTS with a timestamp attribute and used to train a sequence-to-sequence prediction model; time-series data of a given window are then used as input, and the trained prediction model predicts the load trend over the following period;
the prediction model comprises an encoder and a decoder, wherein the encoder adopts a multi-layer structure in which each layer consists of a sparse self-attention module and a distillation module; the vector formed by embedding and encoding the time-series data is used as the input of the initial layer, and the output of each layer is used as the input of the next layer until the final feature map is output; the input of each layer is divided into two paths, the first path passes sequentially through the sparse self-attention module and the distillation module to output a feature map I, the second path passes through downsampling to output a feature map II, and feature map I and feature map II are then fused by a residual connection module to output the feature map of the current layer;
the decoder fills the target elements to be predicted with zeros and applies embedded coding to generate a vector that is input to a masked sparse self-attention module; the generated feature map, used as the query, and the final feature map output by the encoder are then fed sequentially through a full self-attention module and a fully connected layer, and the predicted target elements are finally output in one pass in a generative manner.
2. The sequence-to-sequence-model-based refrigeration system load prediction method of claim 1, wherein: the sparse self-attention module uses a fully connected layer to project the fused features of the query matrix Q and the key matrix K into a new probability space, computes the importance score of each query with the formula I(Q) = FC(Q + K), and sets n = c·ln(L_Q), where L_Q denotes the number of rows of the matrix Q; the n query vectors with the highest scores are selected for the subsequent attention computation, where FC(·) denotes a fully connected operation whose number of input channels equals the feature size and whose number of output channels is 1, and I(Q) has shape L_Q × 1.
3. The sequence-to-sequence-model-based refrigeration system load prediction method of claim 2, wherein: the process of refining feature map I from the j-th sparse self-attention module to the (j+1)-th sparse self-attention module is defined as

X_{j+1}^t = DS(X_j^t) + MaxPool( F([X_j^t]_AB) ) + γ · AvePool( F([X_j^t]_AB) )

where t denotes the current time period, [·]_AB denotes the attention block, γ denotes a learnable parameter, DS(·) denotes the downsampling operation, and F(·) = ELU(Conv1d(·)), with Conv1d(·) denoting one-dimensional convolution filtering in the time dimension and ELU(·) the activation function.
4. The sequence-to-sequence-model-based refrigeration system load prediction method of claim 1, wherein: the multi-dimensional time series MTS is expressed in matrix form, first subjected to z-score standardization and then divided into batches by rows, after which embedded coding is applied to form the vector used as the input of the prediction model; the embedded-coding result of the multi-dimensional time series, for rows i ∈ {1, …, L_x}, where t and L_x respectively denote the current time period and the number of data rows, is the sum of a value embedding, a temporal encoding and a positional encoding, the feature dimension after encoding being d_model; the value embedding is obtained by projecting the input multi-dimensional time series through a one-dimensional convolution to a vector of feature dimension d_model; the positional encoding is

PE(pos, 2k) = sin( pos / 10000^(2k / d_model) )
PE(pos, 2k+1) = cos( pos / 10000^(2k / d_model) )

and is then projected to d_model dimensions using a one-dimensional convolution, where pos denotes the current position; a learnable parameter adjusts the weights of the positional encoding and the temporal encoding and is computed as Relu(Conv1d(·)), where Relu(·) is the activation function and Conv1d(·) is a one-dimensional convolution whose number of input channels is d_model and whose number of output channels is 1.
5. The sequence-to-sequence-model-based refrigeration system load prediction method of claim 4, wherein: the z-score standardization is performed using the formula

d'_(i,j) = ( d_(i,j) - Mean(D_(:,j)) ) / Std(D_(:,j))

where d_(i,j) is the value in row i of column j of the multi-dimensional time series MTS, D_(:,j) denotes all values in column j, and Mean(·) and Std(·) denote the mean and standard deviation of column j of the data set, respectively.
CN202211626813.2A 2022-12-16 2022-12-16 Refrigerating system load prediction method based on sequence-to-sequence model Pending CN115982567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211626813.2A CN115982567A (en) 2022-12-16 2022-12-16 Refrigerating system load prediction method based on sequence-to-sequence model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211626813.2A CN115982567A (en) 2022-12-16 2022-12-16 Refrigerating system load prediction method based on sequence-to-sequence model

Publications (1)

Publication Number Publication Date
CN115982567A true CN115982567A (en) 2023-04-18

Family

ID=85969298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211626813.2A Pending CN115982567A (en) 2022-12-16 2022-12-16 Refrigerating system load prediction method based on sequence-to-sequence model

Country Status (1)

Country Link
CN (1) CN115982567A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612393A (en) * 2023-05-05 2023-08-18 北京思源知行科技发展有限公司 Solar radiation prediction method, system, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN111161535A (en) Attention mechanism-based graph neural network traffic flow prediction method and system
CN111210633B (en) Short-term traffic flow prediction method based on deep learning
CN115240425B (en) Traffic prediction method based on multi-scale space-time fusion graph network
CN112330951B (en) Method for realizing road network traffic data restoration based on generation of countermeasure network
CN111861013B (en) Power load prediction method and device
CN110059896A (en) A kind of Prediction of Stock Index method and system based on intensified learning
CN110619419B (en) Passenger flow prediction method for urban rail transit
CN109829495A (en) Timing image prediction method based on LSTM and DCGAN
CN111915081B (en) Peak sensitive travel demand prediction method based on deep learning
CN115273464A (en) Traffic flow prediction method based on improved space-time Transformer
CN113051811B (en) Multi-mode short-term traffic jam prediction method based on GRU network
CN105024886B (en) A kind of Fast W eb service QoS Forecasting Methodologies based on user metadata
CN117096867A (en) Short-term power load prediction method, device, system and storage medium
CN115982567A (en) Refrigerating system load prediction method based on sequence-to-sequence model
CN115099461A (en) Solar radiation prediction method and system based on double-branch feature extraction
CN115310677A (en) Flight path prediction method and device based on binary coded representation and multi-classification
CN115840893A (en) Multivariable time series prediction method and device
Liang Optimization of quantitative financial data analysis system based on deep learning
CN116596150A (en) Event prediction method of transform Hoxwell process model based on multi-branch self-attention
CN116743182B (en) Lossless data compression method
CN116778709A (en) Prediction method for traffic flow speed of convolutional network based on attention space-time diagram
CN115796029A (en) NL2SQL method based on explicit and implicit characteristic decoupling
CN114564512A (en) Time series prediction method, time series prediction device, electronic equipment and storage medium
CN115481788A (en) Load prediction method and system for phase change energy storage system
CN111859263B (en) Accurate dosing method for tap water treatment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination