CN114742278A - Building energy consumption prediction method and system based on improved LSTM - Google Patents

Building energy consumption prediction method and system based on improved LSTM

Info

Publication number
CN114742278A
Authority
CN
China
Prior art keywords
lstm
neural network
energy consumption
optimal
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210265853.2A
Other languages
Chinese (zh)
Inventor
于军琪
董芳楠
权炜
康智桓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Architecture and Technology filed Critical Xian University of Architecture and Technology
Priority to CN202210265853.2A priority Critical patent/CN114742278A/en
Publication of CN114742278A publication Critical patent/CN114742278A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08Construction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The invention discloses a building energy consumption prediction method and system based on an improved LSTM. The method comprises the following processes: obtaining the optimal parameters corresponding to the LSTM neural network; introducing the optimal parameters into an LSTM variant neural network, optimizing the hyper-parameters in the LSTM variant neural network by using a random gradient optimization algorithm based on weight attenuation to obtain the optimal hyper-parameters of the LSTM variant neural network, and taking the LSTM variant neural network corresponding to the optimal hyper-parameters as the optimal LSTM prediction model; and processing the collected data influencing the building load by using the optimal LSTM prediction model to predict the load data of the building at the specified time, thereby realizing the prediction of the building energy consumption. The method has higher prediction precision and better stability, and is more suitable for short-term energy consumption prediction of commercial buildings.

Description

Building energy consumption prediction method and system based on improved LSTM
Technical Field
The invention belongs to the technical field of energy consumption prediction, and particularly relates to a building energy consumption prediction method and system based on improved LSTM.
Background
With the acceleration of urbanization, the number of urban buildings has increased, and building energy consumption accounts for an ever larger share of the whole energy consumption system. Globally, building energy consumption has exceeded that of industry and transportation, accounting for 46 percent of total energy consumption, and building carbon emissions account for as much as 36 percent. People spend about 90% of their time in buildings, and the continuous pursuit of thermal comfort drives up building energy consumption and greenhouse gas emissions, so energy demand management for the energy-intensive building industry has become an important research field.
Among all building types, commercial buildings consume 30% more energy than residential buildings, mainly because of their large area, heavy foot traffic, long operating hours, and high lighting and air conditioning demands. Surveys show that, among commercial buildings, shopping-mall public buildings have the highest energy consumption, with an average energy consumption per unit building area of 3.521 GJ/(m²·a), about 3 times that of office buildings and 2 times that of hotel buildings. Since energy consumption prediction is the key to improving energy utilization efficiency and reducing peak power demand, commercial building energy consumption prediction has become a problem of wide global concern. However, outdoor temperature and humidity, solar radiation, occupant movement and other factors change the energy consumption in a building, and the nonlinear, fluctuating operating characteristics of most in-building equipment make accurate prediction of energy consumption data a huge challenge. In recent years, the wide application of high-precision sensors has provided important support for predicting building energy consumption.
In recent years, energy consumption prediction methods have mainly fallen into two types: (1) physical models (including EnergyPlus, eQUEST, Ecotect, etc.); (2) data-driven models (including artificial neural networks (ANNs), convolutional neural networks (CNNs), support vector machines (SVMs), decision trees (DTs), regression models (RMs), etc.). Data-driven methods have the advantages of universality, flexibility and high precision, and have been widely adopted in building energy consumption prediction in recent years.
In terms of short-term load prediction: by predicting the 1-hour peak load of an agency building in Seoul, Korea, Kim Y demonstrated that the ANN model had the best prediction accuracy but poor interpretability. ANN has proven to be a robust method for efficiently predicting the power consumption of domestic units. A CNN has been used to extract nonlinear load characteristics and nonlinear load-temperature characteristics from a constructed hourly load cube, with the extracted characteristics used as input to support vector regression (SVR) for short-term load prediction. SVMs find their most active application in predicting short-term power loads. Mohandes M first applied the same data and weights as an autoregressive model to an SVM for power load prediction; the results showed better performance, but also indicated that the SVM is more suitable for long-term load prediction.
Regression prediction targets the dependency inside a data set, while time series prediction targets the dependency of the data on time. In recent years, when studying nonlinear time series problems, many scholars have combined traditional time series prediction models with the recurrent neural network (RNN) and achieved good performance. In terms of neuron structure, the nodes of the RNN hidden layer are connected to each other, so previous information can be memorized and influences the output of later nodes. Although this can in theory handle the training of sequence data well, the problems of gradient explosion and vanishing gradients are often encountered, and they are particularly serious when the sequence is long. To address this deficiency of the RNN, a variant of the RNN, the long short-term memory (LSTM) neural network, was proposed by Hochreiter & Schmidhuber in 1997. The innovation of the LSTM network is the introduction of three control gates (an input gate, an output gate and a forgetting gate); the memory function of the model is realized by adjusting the opening and closing of these valves, and previous data are selectively retained or forgotten, so that the early part of the sequence can influence the final result. Thanks to its excellent ability to memorize time series, LSTM is widely used for text generation, machine translation, speech recognition and gesture prediction. However, Klaus Greff et al. found that the setting of the LSTM-related hyper-parameters has a significant influence on the prediction accuracy, so selecting a suitable algorithm to improve LSTM prediction accuracy for different application scenarios becomes a challenging task. Gradient descent and back-propagation algorithms are commonly used to update the LSTM hyper-parameters. However, the performance of gradient descent depends on hyper-parameters such as the learning rate, weight decay and momentum.
Adaptive moment estimation (Adam) is an effective stochastic gradient optimization algorithm. It comprehensively considers the first-moment and second-moment estimates, can update the parameters according to the oscillation of the historical gradients and the true historical gradients after the oscillation is filtered out, limits the update step within a certain range according to the initial learning rate, is not affected by rescaling of the gradients, and gives the hyper-parameters good interpretability.
However, the performance of a single model is limited, and a large number of scholars now use hybrid models to improve the accuracy of energy consumption prediction. Current hybrid models, however, are built to accomplish specific tasks rather than adjusting the structure of the LSTM itself to improve its prediction accuracy. Research on solving practical engineering problems by enhancing the performance of the LSTM algorithm itself is therefore still largely blank. In addition, the stability of Adam is affected by weight decay.
Disclosure of Invention
Based on the above problems, and in order to better realize energy consumption prediction, the invention provides a building energy consumption prediction method (DwdAdam-LSTM) and system based on an improved LSTM.
The technical scheme adopted by the invention is as follows:
a building energy consumption prediction method based on improved LSTM comprises the following processes:
obtaining optimal parameters corresponding to the LSTM neural network;
introducing the optimal parameters into an LSTM variant neural network, optimizing the hyperparameters in the LSTM variant neural network by using a weight attenuation-based random gradient optimization algorithm to obtain the optimal hyperparameters of the LSTM variant neural network, and taking the LSTM variant neural network corresponding to the optimal hyperparameters as an optimal LSTM prediction model;
and processing the collected data influencing the building load by using the optimal LSTM prediction model, predicting the load data of the building at the specified time, and realizing the prediction of the building energy consumption.
Preferably, the obtaining process of the corresponding optimal parameter of the LSTM neural network includes:
converting data influencing building load in a preset time period before building energy consumption prediction into a three-dimensional array, taking the three-dimensional array as original data of a predicted later time step, determining an LSTM neural network batch b, a hidden layer number d and a hidden unit number u by adopting a grid search method, wherein the batch b, the hidden layer number d and the hidden unit number u form a three-dimensional search space, stepb, stepd and stepu respectively correspond to grid step lengths searched by the batch b, the hidden layer number d and the hidden unit number u, training and testing are carried out by using a training data set in a pre-established historical database, training is carried out for preset times in a parameter value range, and an average absolute error MAE value is taken as a target function of the grid search method to obtain an optimal parameter corresponding to the LSTM neural network.
Preferably, the corresponding optimal parameters of the LSTM neural network include batch number, hidden layer number, and hidden unit number, where the batch range is 13-18, the hidden layer number is 1-3, and the hidden unit number is 20-80.
Preferably, the LSTM variant neural network model is obtained by improving the gate structure of the LSTM neural network;
the process of improving the gate structure of the LSTM neural network includes: introducing the cell state of the previous moment when calculating forgetting data, and adding peephole connection into a forgetting gate system; the forgetting gate is connected with the input gate, new information is introduced when old information is forgotten, and the retained old information and the introduced new information are set to be complementary without changing the input activation function.
Preferably:
the forgetting gate calculation formula is as follows:
f_t = sigmoid(W_f [x_t, h_{t-1}, C_{t-1}] + b_f)
the input gate calculation formulas are as follows:
i_t = sigmoid(W_i [x_t, h_{t-1}, (1-f_t)] + b_i)
C̃_t = tanh(W_C [x_t, h_{t-1}] + b_C)
in the formulas, C_{t-1} is the cell state at time step t-1, 1-f_t is the retained old information, W_f is the weight matrix of the forgetting gate, W_i and W_C are the weight matrices of the input gate, b_f is the bias vector of the forgetting gate, b_i and b_C are the bias vectors of the input gate, h_{t-1} is the output value of the hidden state at time step t-1, x_t is the input information at time step t, f_t is the output of the forgetting gate, i_t is the output of the input gate, C̃_t is the candidate value at time t, and sigmoid() and tanh() are the activation functions.
Preferably, the random gradient optimization algorithm based on weight attenuation is obtained by introducing a weight attenuation term during parameter updating through deviation correction after gradient moment estimation in an Adam optimization algorithm.
Preferably, the parameter updating formula of the weight attenuation-based random gradient optimization algorithm is as follows:
θ_t = θ_{t-1} - η·( m_t / (√v̂_t + ε) + ω_{t-1}·θ_{t-1} )
θ = [W, b]
wherein W comprises the weight matrix W_f of the forgetting gate, the weight matrices W_i and W_C of the input gate and the weight matrix W_o of the output gate; b comprises the bias vector b_f of the forgetting gate, the bias vectors b_i and b_C of the input gate and the bias vector b_o of the output gate; θ_t is the parameter updated at time step t, θ_{t-1} is the parameter updated at time step t-1, m_t is the first moment vector, v̂_t is the bias-corrected second moment vector, ε is a minimal value, η is the learning rate, and ω_{t-1} is the weight decay rate.
Preferably, the hyper-parameters of the LSTM variant neural network optimized by the random gradient optimization algorithm based on weight attenuation include the weight matrix of the forgetting gate, the weight matrices of the input gate, the weight matrix of the output gate, the bias vector of the forgetting gate, the bias vectors of the input gate and the bias vector of the output gate (i.e., W_f, W_i, W_C, W_o, b_f, b_i, b_C and b_o). During optimization, the exponential decay rate β_1 of the first moment estimate, the exponential decay rate β_2 of the second moment estimate and the learning rate η of the random gradient optimization algorithm based on weight attenuation are first set, and the gradient g_t of the parameter vector at time step t, the first moment vector m_t, the second moment vector v_t, the bias-corrected first moment vector m̂_t, the bias-corrected second moment vector v̂_t and the updated parameter θ_t are initialized; then training is carried out with the training data set in a pre-established database, and the group of hyper-parameters that minimizes the loss function f(θ) is found, these hyper-parameters being the optimal hyper-parameters.
Preferably, the data influencing the building load comprise temperature, humidity, solar radiation, wind speed and air conditioning load actual data.
The invention also provides a building energy consumption prediction system based on the improved LSTM, which comprises the following components:
an optimal parameter acquisition unit: the method comprises the steps of obtaining optimal parameters corresponding to the LSTM neural network;
data is divided into units: the method comprises the steps of introducing optimal parameters into an LSTM variant neural network, optimizing hyper-parameters in the LSTM variant neural network by using a random gradient optimization algorithm based on weight attenuation to obtain optimal hyper-parameters of the LSTM variant neural network, and taking the LSTM variant neural network corresponding to the optimal hyper-parameters as an optimal LSTM prediction model;
a data application unit: the method is used for processing the collected data influencing the building load by using the optimal LSTM prediction model, predicting the load data of the building at the appointed time and realizing the prediction of the building energy consumption.
The invention has the following beneficial effects:
Most studies use SVR and Adam-LSTM neural network models to predict the hourly energy consumption of air conditioning systems. However, due to the limitations of the prediction models themselves, the prediction results are not satisfactory. The proposed method not only provides an adaptive learning rate for the hyper-parameters but also adds a weight decay term when updating the loss parameters, thereby improving the convergence rate. Experimental results show that, compared with an SVR network model, the method can fully and effectively memorize historical data; it is also more stable than Adam-LSTM and has more accurate prediction precision. The predicted MSE values for hourly energy consumption are reduced by 83% and 78% compared with SVR and LSTM, and by 66%, 71% and 30% compared with SCA-LSTM, RMSprop-LSTM and Adam-LSTM, respectively. The method therefore has higher prediction precision and better stability, and is more suitable for short-term energy consumption prediction of commercial buildings.
Drawings
FIG. 1 is a schematic diagram of an improved LSTM neural network according to the present invention;
FIG. 2 is a flow chart of the DwdAdam-LSTM network model of the present invention;
FIG. 3(a) is a graph showing the correlation between the load and the temperature at a lag period of 6 hours according to the present invention;
FIG. 3(b) is a graph showing the correlation between the load and the humidity at a lag period of 6 hours according to the present invention;
FIG. 3(c) is a graph showing the correlation between the load and the solar radiation at a lag period of 6 hours according to the present invention;
FIG. 3(d) is a graph showing the correlation between the load and the wind speed at a lag period of 6 hours according to the present invention;
FIG. 4 is a diagram illustrating the loss values of different numbers of hidden layers during iteration according to the present invention;
FIG. 5 is a schematic diagram of the variation of loss values of 1 hidden layer after 100 iterations according to the present invention;
fig. 6 is a schematic diagram of the change of loss values of 3 hidden layers after 100 iterations according to the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
The invention relates to a building energy consumption prediction method based on improved LSTM, which comprises the following steps:
step 1: construction of energy consumption prediction model (DwdAdam-LSTM)
The energy consumption prediction model comprises four layers of structures which are respectively as follows: the system comprises a data acquisition layer, a data preprocessing layer, a data analysis layer and a data application layer.
The data acquisition layer uses automation equipment such as high-precision sensors and controllers to acquire, summarize and store the temperature, humidity, solar radiation and wind speed data that affect the building load.
The data preprocessing layer processes the acquired raw data. First, the raw data may be corrupted or inaccurate because the acquisition equipment is exposed to extreme weather, and bad values or missing data may also arise from packet loss and other problems during transmission; the bad values are removed, and bad and missing data are repaired by interpolation, average filtering and similar means. Second, the internal regularity of each group of data is observed by drawing statistical charts, and the correlation between the energy consumption and each influencing factor is studied by correlation analysis so that the factors can be properly screened. In addition, the data are min-max normalized and mapped to [0, 1], which removes the problems caused by the different dimensions of the data and speeds up the convergence of the model. Finally, the idea of cross-validation is introduced in the data processing, and the energy consumption data are divided into a training set and a test set.
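As a minimal sketch of this preprocessing step (min-max normalization to [0, 1] followed by a chronological split into training and test sets), the following Python snippet can be used; the array shapes, helper names and the 10% test share mirror the experimental setup described later and are illustrative rather than taken verbatim from the invention.

```python
import numpy as np

def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Map each feature column to the range [0, 1]."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min + 1e-12)

def chronological_split(x: np.ndarray, test_ratio: float = 0.1):
    """Keep the last `test_ratio` share of the series as the test set."""
    split = int(len(x) * (1.0 - test_ratio))
    return x[:split], x[split:]

raw = np.random.rand(1080, 5)          # placeholder for the cleaned sensor data
scaled = min_max_scale(raw)
train_set, test_set = chronological_split(scaled, test_ratio=0.1)
```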
The data analysis layer optimizes the LSTM variant neural network with the improved DwdAdam optimization algorithm to obtain the optimal values of the hyper-parameters and thereby improve the prediction accuracy. The existing energy consumption data are used for training and testing, the loss (MAE) on the test set is calculated, and the iteration ends when the loss is minimal.
The data application layer predicts the load data of the large commercial building at the specified time with the trained model.
Step 2: constructing LSTM variant neural networks
The invention improves the gate structure of the LSTM: the cell state at the previous moment is introduced when the forgetting data are calculated, and a peephole connection is added to the forgetting gate so that the information to be retained can be learned more accurately. The forgetting gate is connected with the input gate, new information is introduced while old information is forgotten, and the retained old information and the introduced new information are set to be complementary without changing the input activation function. The structural model of the improved LSTM variant neural network is shown in Fig. 1.
The forgetting gate and the input gate of the LSTM variant are calculated as shown in formulas (1)-(3):
A) Forgetting gate:
f_t = sigmoid(W_f [x_t, h_{t-1}, C_{t-1}] + b_f)    (1)
B) Input gate:
i_t = sigmoid(W_i [x_t, h_{t-1}, (1-f_t)] + b_i)    (2)
C̃_t = tanh(W_C [x_t, h_{t-1}] + b_C)    (3)
In the formulas, C_{t-1} is the cell state at time step t-1, 1-f_t is the retained old information, W_f is the weight matrix of the forgetting gate, W_i and W_C are the weight matrices of the input gate, b_f is the bias vector of the forgetting gate, b_i and b_C are the bias vectors of the input gate, h_{t-1} is the output value of the hidden state at time step t-1, x_t is the input information at time step t, f_t is the output of the forgetting gate, i_t is the output of the input gate, C̃_t is the candidate value at time t, and sigmoid() and tanh() are the activation functions.
Step 3: DwdAdam optimization algorithm
The traditional Adam optimizer carries out first moment estimation and second moment estimation on the gradient, calculates the individual self-adaptive learning rate of different parameters through deviation correction, and finds the optimal value of the hyper-parameter. However, Adam was found to be less stable than SGD in some tasks, the main reason for this being weight decay.
In the method, after the gradient moment estimation and bias correction, a weight decay term is introduced when the parameters are updated, giving the random gradient optimizer DwdAdam based on weight attenuation. In this way the update of the individual adaptive learning rates is decoupled from the weight decay, the hyper-parameters no longer depend on each other, and each hyper-parameter is optimized independently. The DwdAdam update is calculated by formulas (4) and (5):
θ_t = θ_{t-1} - η·( m_t / (√v̂_t + ε) + ω_{t-1}·θ_{t-1} )    (4)
θ = [W, b]    (5)
wherein W comprises the weight matrix W_f of the forgetting gate, the weight matrices W_i and W_C of the input gate and the weight matrix W_o of the output gate; b comprises the bias vector b_f of the forgetting gate, the bias vectors b_i and b_C of the input gate and the bias vector b_o of the output gate; θ_t is the parameter updated at time step t, θ_{t-1} is the parameter updated at time step t-1, m_t is the first moment vector, v̂_t is the bias-corrected second moment vector, ε is a minimal value, η is the learning rate, and ω_{t-1} is the weight decay rate.
Step 4: Building energy consumption prediction
Data analysis shows that short-term load prediction depends to a certain extent on the load and weather factors of the preceding days, so the DwdAdam optimization algorithm is used to obtain the optimal hyper-parameters of the LSTM network and reach the most efficient result in the shortest time.
The whole work of building energy consumption prediction can be divided into 4 parts: (1) data cleaning; (2) optimizing the LSTM structure; (3) predicting the short-term energy consumption value; (4) evaluating the DwdAdam-LSTM model. The algorithm flow is shown in Fig. 2.
Step 4-1: data cleansing
The proposed model is used to predict the energy consumption demand of a building, and real data collected inside the building are used to demonstrate the superiority of the established model. To further improve the efficiency of parameter updating and the accuracy of the model in predicting the energy consumption data, the abnormal and missing data in the original data set are interpolated, the main factors influencing the energy consumption are screened out by correlation analysis, the data are normalized to the range [0, 1], and the sample set is divided into a training set and a test set, as shown in Figs. 3(a)-3(d).
Step 4-2: optimizing the structure of an LSTM neural network
Before load prediction with the LSTM, the load, temperature, humidity and solar radiation data of the preceding 6 hours are converted into a three-dimensional array and used as raw data for predicting the following time step. The LSTM batch b, number of hidden layers d and number of hidden units u are determined by a grid search method. b, d and u form a three-dimensional search space, with stepb, stepd and stepu the grid step sizes searched for the respective parameters. Training and testing are carried out with the training data set in the historical database; 5 training runs are performed within the value range of the parameters, and the mean absolute error (MAE) is taken as the objective function of the grid search algorithm to obtain the optimal parameters of the LSTM neural network. The value ranges of the LSTM network parameter variables are shown in Table 1.
TABLE 1
Number of batches [13-18]
Number of hidden layers [1-3]
Number of hidden layer units [20-80]
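A sketch of the grid search over the batch b, the number of hidden layers d and the number of hidden units u with the MAE as objective function is given below. `train_and_evaluate_mae` is a hypothetical helper that trains an LSTM with the given parameters on the training set and returns the test MAE; the step sizes are illustrative, while the value ranges follow Table 1.

```python
from itertools import product

def grid_search(train_and_evaluate_mae,
                b_range=(13, 18), d_range=(1, 3), u_range=(20, 80),
                stepb=1, stepd=1, stepu=10):
    """Exhaustive search over the (batch, hidden layers, hidden units) grid."""
    best = None
    for b, d, u in product(range(b_range[0], b_range[1] + 1, stepb),
                           range(d_range[0], d_range[1] + 1, stepd),
                           range(u_range[0], u_range[1] + 1, stepu)):
        mae = train_and_evaluate_mae(batch=b, layers=d, units=u)
        if best is None or mae < best[0]:
            best = (mae, b, d, u)
    return best  # (best MAE, batch, hidden layers, hidden units)

# Dummy objective standing in for the real training routine, for illustration only
best_mae, b, d, u = grid_search(lambda batch, layers, units:
                                abs(batch - 15) + abs(units - 50) + layers)
```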
The optimal parameters are introduced into the LSTM variant neural network. The hyper-parameters W_f, W_i, W_C, W_o, b_f, b_i, b_o and b_c of the LSTM are then optimized with the DwdAdam optimization algorithm: first the parameters β_1, β_2 and η of DwdAdam are set and g_t, m_t, v_t, m̂_t, v̂_t and θ_t are initialized; then training is performed with the training data set in the database, and the group of hyper-parameters that minimizes the loss function f(θ) is found, which yields the optimal LSTM prediction model.
Step 4-3: short term energy consumption prediction
The optimal LSTM prediction model obtained in step 4-2 is tested with the test set of the database to obtain the short-term energy consumption prediction values.
Step 4-4: Model evaluation
The accuracy of the energy consumption prediction model is evaluated with CV-RMSE, MSE, MAE and MAPE on the predicted energy consumption values.
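The four indexes named above can be computed as in the following sketch, where `y_true` and `y_pred` are 1-D arrays of actual and predicted hourly loads; expressing MAPE and CV-RMSE in percent is an assumption.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100.0        # in percent
    mse = np.mean(err ** 2)
    cv_rmse = np.sqrt(mse) / np.mean(y_true) * 100.0    # RMSE normalised by the mean load
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "CV-RMSE": cv_rmse}

print(evaluate(np.array([100.0, 120.0]), np.array([110.0, 115.0])))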
Examples
This embodiment demonstrates the excellent prediction performance of the model by studying the energy consumption data of a large commercial building.
Description of the experiments
Hardware such as temperature and humidity sensors, a solar radiation sensor, a micro wind speed sensor, an intelligent electric meter, an intelligent gateway, a DDC controller, a data concentrator, an air switch and a 24 V switching power supply is used in the experiment. The DwdAdam-LSTM energy consumption prediction model was implemented in a Python 3.8 environment under the Windows 10 operating system on an AMD R7 processor. The whole experiment is divided into four stages: (1) data acquisition; (2) data preprocessing; (3) DwdAdam-LSTM model setup; (4) performance evaluation.
1.1 data set description
Data were collected from a large commercial building, a shopping mall 40.6 meters high with a building area of about 25 thousand square meters, of which the air-conditioned area is about 18.76 thousand square meters. The data set comprises actual temperature, humidity, solar radiation, wind speed and air conditioning load data recorded once per hour from 8:00 to 22:00 each day between June 2, 2021 and August 12, 2021, for a total of 1080 groups of data.
1.2 data preprocessing
Data preprocessing is an indispensable step before the data are analyzed: it prevents the model from being distorted by missing values, abnormal values and the like, takes into account the incompatible formats caused by the different dimensions of the data, and applies min-max normalization to prevent the prediction precision from being reduced.
The internal regularity of each group of data is observed, the correlation between the energy consumption and each influencing factor is studied by correlation analysis, and the factors with high correlation are screened out. The processed data set is divided into training and test samples, which reduces the complexity of the model and improves its prediction accuracy. The experimental data set totals 1080 groups, with the test samples accounting for 10% of the total. Figs. 3(a)-3(d) show the correlation of the load with the other influencing factors over a lag period of 6 hours. The correlations of the load with temperature, humidity and solar radiation clearly exceed the upper and lower confidence limits, with the correlation coefficient between load and solar radiation the highest, while the correlation with wind speed is the lowest, so the wind speed variable is eliminated.
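A simple sketch of this screening step is given below: the Pearson correlation of the load with each candidate factor at a 6-hour lag is computed and weakly correlated factors are dropped. The fixed threshold stands in for the confidence bounds mentioned above and is an assumption, as are the factor names.

```python
import numpy as np

def screen_features(load, drivers, lag=6, threshold=0.2):
    """drivers: dict mapping factor name -> 1-D array aligned with `load`."""
    kept = {}
    for name, series in drivers.items():
        # correlate the current load with the driver observed `lag` hours earlier
        r = np.corrcoef(load[lag:], series[:-lag])[0, 1]
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Illustrative call with random placeholders for the real series
rng = np.random.default_rng(1)
load = rng.random(1080)
drivers = {"temperature": rng.random(1080), "wind_speed": rng.random(1080)}
print(screen_features(load, drivers))
```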
1.3 DwdAdam-LSTM model set-up
The structure parameters of the LSTM are obtained by a grid search algorithm, and the specific parameter settings are shown in table 2.
TABLE 2 (specific parameter settings; reproduced only as an image in the original publication)
Simulation experiments were carried out with the TensorFlow-based Keras deep learning library in Python.
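For reference, a minimal Keras baseline matching this setup can be sketched as follows, assuming the 6-hour input window and the optimal structure reported in section 2.1 (1 hidden layer, 50 hidden units, batch 15). The stock LSTM layer and Adam optimizer stand in for the variant cell and DwdAdam optimizer of the invention, which are not available off the shelf, and the random arrays are placeholders for the real training data.

```python
import numpy as np
import tensorflow as tf

n_features = 4                          # e.g. load, temperature, humidity, solar radiation
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, input_shape=(6, n_features)),   # 1 hidden layer, 50 units
    tf.keras.layers.Dense(1),                                 # next-hour load
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mae")

# X_train: (samples, 6, n_features), y_train: (samples,); random placeholders here
X_train = np.random.rand(972, 6, n_features).astype("float32")
y_train = np.random.rand(972).astype("float32")
model.fit(X_train, y_train, batch_size=15, epochs=100, verbose=0)
```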
1.4 model evaluation index
In order to evaluate the model effectively, four indexes, namely the mean absolute error (MAE), the mean absolute percentage error (MAPE), the mean square error (MSE) and the coefficient of variation of the root mean square error (CV-RMSE), are used as the standard for evaluating the performance of the energy consumption prediction model. The MAE is used as the loss function, and the smaller its value, the better the performance of the prediction model.
2 results and discussion
Various experimental models are compared and analyzed, demonstrating the advantage of DwdAdam-LSTM in energy consumption prediction accuracy over the comparison models. To reduce the accidental error of the experiment, each performance evaluation value is obtained by averaging over 20 runs.
2.1 LSTM optimal parameters
The first step of the experiment is to find the best LSTM hidden layer number, batch and hidden unit number by using a grid search algorithm. An LSTM neural network model with hidden layer number [1, 3], batch range [5, 18] and hidden unit number range [30, 80] was tested.
The experimental results show that the optimal batch is 15. The main reason is that the mall is open from 8:00 to 22:00 each day, the energy consumption prediction targets the air conditioning load of the commercial area, and the change of the air conditioning load is related to weather and time factors and follows a regular pattern. The optimal number of hidden units is 50, and the grid search gives 3 hidden layers. The experiments then verify the loss values of different numbers of hidden layers under the optimal hidden units and batch; Fig. 4 shows the influence of different numbers of hidden layers over 5 iterations in terms of the mean absolute error MAE. Fig. 4 shows that the loss of 2 hidden layers is slightly higher than that of the other two, with the loss of 3 hidden layers the smallest of the three. However, when the 3-hidden-layer model is run on the test data set for 100 iterations (Figs. 5 and 6), although its training loss is smaller than that of 1 hidden layer, its test loss is worse than that of 1 hidden layer, which indicates overfitting; therefore the optimal number of hidden layers is 1.
2.2 Comparison of the method of the invention with mainstream prediction models
Both the LSTM and SVR neural networks are capable of energy consumption prediction, and combining an optimization algorithm with a data-driven model when processing multi-source heterogeneous data can improve the precision and stability of an energy consumption prediction model.
The predicted values obtained with the LSTM neural network and the SVR network are basically consistent with the actual values, but the error of the SVR network during the week is obviously higher than that of the LSTM neural network, and its error range is relatively large. Analysis shows that the energy consumption of the air conditioning system is strongly affected by external environmental factors and fluctuates strongly; in addition, the many input parameters, such as temperature and solar radiation, are also a reason for the reduced performance of the SVR. LSTM, by contrast, is a recurrent neural network that retains the nonlinear mapping capability of the SVR neural network and is also suitable for processing trend data, because long histories of data can be handled and the forgetting gate retains the useful data.
After different optimization algorithms are applied to optimize the LSTM neural network, the prediction effect improves remarkably. However, the overall absolute error of SCA-LSTM is large, the range of the errors of RMSprop-LSTM and Adam-LSTM is large and their stability is poor, while DwdAdam-LSTM performs stably, with its error fluctuation range within 15%.
The quality performance indexes of the different prediction models are calculated under the optimal architecture. The experiments show that the single data-driven model LSTM has higher prediction accuracy than SVR, with the MSE of hourly energy consumption prediction reduced by 23%. However, a single data-driven model is far inferior to a hybrid model in terms of convergence and model accuracy, and the hybrid models also differ across the performance indexes; for example, SCA-LSTM is inferior to RMSprop-LSTM on MAE and MAPE but better on MSE. SCA-LSTM performs worst among the hybrid models, its CV-RMSE, MAE, MAPE and MSE being reduced by 19%, 8.6%, 10.7% and 35.7% compared with LSTM. Compared with LSTM, the CV-RMSE, MAE, MAPE and MSE of the proposed DwdAdam-LSTM model are reduced by 58%, 81%, 79% and 78%, respectively; compared with Adam-LSTM, the next-best model, they are reduced by 41%, 53%, 52% and 30%, respectively. The comparison of the quality performance indexes of the energy consumption predictions of the different models is shown in Table 3.
TABLE 3 (comparison of quality performance indexes for the different prediction models; reproduced only as an image in the original publication)
It can be seen that the energy consumption prediction model based on DwdAdam-LSTM is significantly improved in various performance evaluation indexes compared with the comparative model.

Claims (10)

1. A building energy consumption prediction method based on improved LSTM is characterized by comprising the following processes:
obtaining optimal parameters corresponding to the LSTM neural network;
introducing the optimal parameters into an LSTM variant neural network, optimizing the hyper-parameters in the LSTM variant neural network by using a random gradient optimization algorithm based on weight attenuation to obtain the optimal hyper-parameters of the LSTM variant neural network, and taking the LSTM variant neural network corresponding to the optimal hyper-parameters as an optimal LSTM prediction model;
and processing the collected data influencing the building load by using the optimal LSTM prediction model, predicting the load data of the building at the specified time, and realizing the prediction of the building energy consumption.
2. The improved LSTM-based building energy consumption prediction method of claim 1, wherein the obtaining process of the optimal parameters corresponding to the LSTM neural network comprises:
converting data influencing building load in a preset time period before building energy consumption prediction into a three-dimensional array, taking the three-dimensional array as original data of a predicted later time step, determining an LSTM neural network batch b, a hidden layer number d and a hidden unit number u by adopting a grid search method, wherein the batch b, the hidden layer number d and the hidden unit number u form a three-dimensional search space, stepb, stepd and stepu respectively correspond to grid step lengths searched by the batch b, the hidden layer number d and the hidden unit number u, training and testing are carried out by using a training data set in a pre-established historical database, training is carried out for preset times in a parameter value range, and an average absolute error MAE value is taken as a target function of the grid search method to obtain an optimal parameter corresponding to the LSTM neural network.
3. The improved LSTM-based building energy consumption prediction method of claim 2, wherein the LSTM neural network corresponding optimal parameters comprise batch number, hidden layer number and hidden unit number, wherein the batch range is 13-18, the hidden layer number is 1-3, and the hidden unit number is 20-80.
4. The improved LSTM-based building energy consumption prediction method of claim 1, wherein the LSTM variant neural network model is obtained by improving the gate structure of the LSTM neural network;
the process of improving the gate structure of the LSTM neural network includes: introducing the cell state of the previous moment when calculating forgetting data, and adding peephole connection into a forgetting gate system; the forgetting gate is connected with the input gate, new information is introduced when old information is forgotten, and the retained old information and the introduced new information are set to be complementary without changing the input activation function.
5. The improved LSTM-based building energy consumption prediction method of claim 4, wherein:
the forgetting gate calculation formula is as follows:
f_t = sigmoid(W_f [x_t, h_{t-1}, C_{t-1}] + b_f)
the input gate calculation formulas are as follows:
i_t = sigmoid(W_i [x_t, h_{t-1}, (1-f_t)] + b_i)
C̃_t = tanh(W_C [x_t, h_{t-1}] + b_C)
in the formulas, C_{t-1} is the cell state at time step t-1, 1-f_t is the retained old information, W_f is the weight matrix of the forgetting gate, W_i and W_C are the weight matrices of the input gate, b_f is the bias vector of the forgetting gate, b_i and b_C are the bias vectors of the input gate, h_{t-1} is the output value of the hidden state at time step t-1, x_t is the input information at time step t, f_t is the output of the forgetting gate, i_t is the output of the input gate, C̃_t is the candidate value at time t, and sigmoid() and tanh() are the activation functions.
6. The improved LSTM-based building energy consumption prediction method according to claim 1, wherein the weight attenuation-based stochastic gradient optimization algorithm is obtained by introducing a weight attenuation term during parameter update after gradient moment estimation in an Adam optimization algorithm through bias correction.
7. The improved LSTM-based building energy consumption prediction method according to claim 6, wherein the weight attenuation-based stochastic gradient optimization algorithm has the following parameter update formula:
θ_t = θ_{t-1} - η·( m_t / (√v̂_t + ε) + ω_{t-1}·θ_{t-1} )
θ = [W, b]
wherein W comprises the weight matrix of the forgetting gate, the weight matrices of the input gate and the weight matrix of the output gate; b comprises the bias vector of the forgetting gate, the bias vectors of the input gate and the bias vector of the output gate; θ_t is the parameter updated at time step t, θ_{t-1} is the parameter updated at time step t-1, m_t is the first moment vector, v̂_t is the bias-corrected second moment vector, ε is a minimal value, η is the learning rate, and ω_{t-1} is the weight decay rate.
8. The improved LSTM-based building energy consumption prediction method of claim 1, wherein the hyper-parameters optimized for the LSTM variant neural network using the weight attenuation-based stochastic gradient optimization algorithm comprise weight matrix of forgetting gates, weight matrix of input gates, weight matrix of output gates, bias matrix of forgetting gates, bias matrix of input gates, bias matrix of output gates; when optimization is carried out, firstly, the exponential decay rate of the first estimation, the exponential decay rate of the second estimation, the learning rate, the gradient of the initialization parameter vector t in a time step, the first moment vector, the second moment vector, the deviation correction of the first moment vector, the deviation correction of the second moment vector and the updating parameter are set in a random gradient optimization algorithm based on weight decay, then, training is carried out by using a training data set in a pre-established database, a group of hyper-parameters which enable the loss function f (theta) to be minimum are found, and the hyper-parameters at the moment are the optimal hyper-parameters.
9. The improved LSTM based building energy consumption prediction method of claim 1 where the data affecting building load includes temperature, humidity, solar radiation, wind speed and air conditioning load actual data.
10. An improved LSTM based building energy consumption prediction system, comprising:
an optimal parameter acquisition unit: the method comprises the steps of obtaining optimal parameters corresponding to the LSTM neural network;
dividing data into units: the LSTM variant neural network prediction model is used for introducing the optimal parameters into the LSTM variant neural network, optimizing the super parameters in the LSTM variant neural network by using a random gradient optimization algorithm based on weight attenuation to obtain the optimal super parameters of the LSTM variant neural network, and taking the LSTM variant neural network corresponding to the optimal super parameters as the optimal LSTM prediction model;
a data application unit: the method is used for processing the collected data influencing the building load by using the optimal LSTM prediction model, predicting the load data of the building at the appointed time and realizing the prediction of the building energy consumption.
CN202210265853.2A 2022-03-17 2022-03-17 Building energy consumption prediction method and system based on improved LSTM Pending CN114742278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210265853.2A CN114742278A (en) 2022-03-17 2022-03-17 Building energy consumption prediction method and system based on improved LSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210265853.2A CN114742278A (en) 2022-03-17 2022-03-17 Building energy consumption prediction method and system based on improved LSTM

Publications (1)

Publication Number Publication Date
CN114742278A true CN114742278A (en) 2022-07-12

Family

ID=82277721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210265853.2A Pending CN114742278A (en) 2022-03-17 2022-03-17 Building energy consumption prediction method and system based on improved LSTM

Country Status (1)

Country Link
CN (1) CN114742278A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115284A (en) * 2022-08-29 2022-09-27 同方德诚(山东)科技股份公司 Energy consumption analysis method based on neural network
CN115115284B (en) * 2022-08-29 2022-11-15 同方德诚(山东)科技股份公司 Energy consumption analysis method based on neural network
CN115271256A (en) * 2022-09-20 2022-11-01 华东交通大学 Intelligent ordering method under multi-dimensional classification
CN115271256B (en) * 2022-09-20 2022-12-16 华东交通大学 Intelligent ordering method under multi-dimensional classification
CN115511197B (en) * 2022-10-11 2023-09-08 呼伦贝尔安泰热电有限责任公司海拉尔热电厂 Heat supply load prediction method for heat exchange station in alpine region
CN115511197A (en) * 2022-10-11 2022-12-23 呼伦贝尔安泰热电有限责任公司海拉尔热电厂 Heat supply load prediction method for heat exchange station in alpine region
CN115841186A (en) * 2022-12-23 2023-03-24 国网山东省电力公司东营供电公司 Industrial park load short-term prediction method based on regression model
CN116070881A (en) * 2023-03-13 2023-05-05 淮阴工学院 Intelligent energy consumption scheduling method and device for modern industrial production area
CN116070881B (en) * 2023-03-13 2023-09-29 淮阴工学院 Intelligent energy consumption scheduling method and device for modern industrial production area
CN116187584A (en) * 2023-04-19 2023-05-30 深圳大学 Building carbon footprint prediction method and system based on gradient descent algorithm
CN116187584B (en) * 2023-04-19 2023-09-05 深圳大学 Building carbon footprint prediction method and system based on gradient descent algorithm
CN116861248A (en) * 2023-07-21 2023-10-10 浙江大学 Building energy consumption prediction method and system combining multi-window fusion method and focusing framework model
CN116861248B (en) * 2023-07-21 2024-02-27 浙江大学 Building energy consumption prediction method and system combining multi-window fusion method and focusing framework model

Similar Documents

Publication Publication Date Title
CN114742278A (en) Building energy consumption prediction method and system based on improved LSTM
Somu et al. A hybrid model for building energy consumption forecasting using long short term memory networks
CN109685252B (en) Building energy consumption prediction method based on cyclic neural network and multi-task learning model
CN111563610A (en) LSTM neural network-based building electrical load comprehensive prediction method and system
CN109242265B (en) Urban water demand combined prediction method based on least square sum of errors
CN105160441B (en) It is transfinited the real-time electric power load forecasting method of vector regression integrated network based on increment type
CN112101521A (en) Building energy consumption prediction method based on long-term and short-term memory network hybrid model
CN113554466A (en) Short-term power consumption prediction model construction method, prediction method and device
CN114119273A (en) Park comprehensive energy system non-invasive load decomposition method and system
Dong et al. Short-term building cooling load prediction model based on DwdAdam-ILSTM algorithm: A case study of a commercial building
Atanasovski et al. K-nearest neighbor regression for forecasting electricity demand
CN113591957B (en) Wind power output short-term rolling prediction and correction method based on LSTM and Markov chain
CN112949894B (en) Output water BOD prediction method based on simplified long-short-term memory neural network
Kumar et al. Forecasting indoor temperature for smart buildings with ARIMA, SARIMAX, and LSTM: A fusion approach
Ibrahim et al. LSTM neural network model for ultra-short-term distribution zone substation peak demand prediction
Nghiem et al. Applying Bayesian inference in a hybrid CNN-LSTM model for time-series prediction
CN114648147A (en) IPSO-LSTM-based wind power prediction method
CN115545503B (en) Power load medium-short term prediction method and system based on parallel time sequence convolutional neural network
Wu et al. Short-Term Electric Load Forecasting Model based on PSO-BP
CN110659775A (en) LSTM-based improved electric power short-time load prediction algorithm
CN114234392B (en) Air conditioner load fine prediction method based on improved PSO-LSTM
Kowalski et al. The comparison of linear models for PM10 and PM2.5 forecasting
CN111435471A (en) Heat supply gas consumption prediction model based on L STM
CN115293406A (en) Photovoltaic power generation power prediction method based on Catboost and Radam-LSTM
Viana et al. Load forecasting benchmark for smart meter data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination