CN113962454A

CN113962454A - LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization

Info

Publication number: CN113962454A
Application number: CN202111213171.9A
Authority: CN
Inventors: 谌东海; 王宁; 刘杰; 王伟; 刘畅
Original assignee: Changjiang Institute of Survey Planning Design and Research Co Ltd
Current assignee: Changjiang Institute of Survey Planning Design and Research Co Ltd
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2022-01-21

Abstract

The invention discloses an LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization. The method comprises the following steps: step one: performing correlation analysis on the time and feature dimensions of the original data set using the mutual information (MI) method, and selecting the top N features most effective for predicting the energy consumption target value; step two: performing secondary feature selection on the N features to obtain N' features after PMI (Pearson + MI) feature selection; step three: training an LSTM model on the data after PMI dual feature selection and using it for prediction to obtain an initial prediction sequence y(t); step four: optimizing the LSTM hyperparameters units, dropout and batchsize with the PSO algorithm, thereby improving the prediction precision of the LSTM model and finally obtaining the PMI-LSTM-PSO model. The method offers high prediction precision, high algorithm efficiency and stable prediction performance.

Description

LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization

Technical Field

The invention relates to the technical field of building energy consumption prediction, in particular to an LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization.

Background

With the wide application of increasingly complex technology products, demand for electric power is growing worldwide, and the power grid must be controlled to achieve the sustainable development of electric power. In the era of artificial intelligence, the power Internet of Things is gradually entering daily life; the development of the smart grid also requires adaptive testing capability, and smart meters emerged accordingly. The continuing worldwide expansion of smart meter infrastructure also lays a foundation for introducing active energy systems into smart grids. Since the 'strong smart grid' plan was introduced in 2009, Chinese power grid companies have been deploying smart meters, distribution automation, embedded intelligence and other technologies on a large scale.

For residential and enterprise buildings, energy consumption prediction helps improve energy efficiency and reduce consumption, so it has great practical significance. Commercial and residential buildings account for 30% to 40% of the total energy consumption of intelligent buildings. Current trends indicate that this percentage may increase in the near future, as global energy consumption and penetration continue to rise. Short-term energy consumption prediction is crucial, yet challenging, owing to the complexity and many uncertainties in the behavior of building infrastructure, and to the shortcomings of the traditional power grid: low efficiency, serious waste of electric energy, weak information interaction and a low degree of automation.

In view of this, researchers have developed many prediction methods to improve grid quality and optimize energy usage. In many related studies, time series models such as ARIMA serve as baselines for verifying whether newly proposed methods predict better. Researchers now often combine historical data with machine learning and deep learning algorithms, such as artificial neural networks (ANNs), support vector machines (SVMs), adaptive neuro-fuzzy inference systems (ANFIS) and extreme learning machines (ELMs). Convolutional neural networks, BP neural networks and similar models have also been studied for power consumption, but their use as prediction methods is still at an early stage.

In data preprocessing, model accuracy is largely determined by the quality of feature selection on the original data. A prediction model benefits when the number of input features is reduced by selecting only the most effective and useful inputs. Common feature selection methods include correlation analysis and numerical sensitivity analysis, but these are linear input selection methods, whereas energy consumption data are nonlinear. Mutual information feature selection is therefore more effective, and it computes the correlation between input and output data efficiently. Mutual-information-based variable selection is a newer approach in which the dependence between variables is quantified as mutual information.

1) MI mutual information algorithm

Mutual information (MI) measures the interdependence between two variables X and Y.

The mutual information I(X;Y) between X and Y is defined as:

$$I(X;Y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (1)$$

where p(x, y) is the joint probability density function, and p(x) and p(y) are the marginal probability density functions of x and y, respectively. MI quantifies how much information the occurrence of one event provides about the occurrence of another. The MI feature selection method computes the mutual information between every feature and the target feature, ranks the features, and keeps the N features with the highest values.
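The ranking step can be sketched with NumPy. This is a minimal sketch under stated assumptions: the source does not say how the densities in formula (1) are estimated, so a simple 2-D histogram (binning) estimator is assumed, and the function names `mutual_information` and `rank_by_mi` are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) in nats for two 1-D samples (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint probabilities p(x, y)
    px = pxy.sum(axis=1)                 # marginal p(x)
    py = pxy.sum(axis=0)                 # marginal p(y)
    outer = np.outer(px, py)             # p(x) p(y) for every cell
    nz = pxy > 0                         # empty cells contribute 0 (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / outer[nz])))

def rank_by_mi(X, y, n_keep, bins=16):
    """Score every column of X against y and return the n_keep best column indices."""
    scores = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]     # highest MI first
    return order[:n_keep], scores
```

Histogram estimates are biased upward for independent variables (roughly by (bins - 1)^2 / (2n) nats), so only the ranking, not the absolute value, should be relied on here.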

2) Pearson correlation coefficient

$$r=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \qquad (2)$$

where X̄ and Ȳ are the mean values of X and Y, respectively. If r ≥ 0.5, the correlation between X and Y is considered strong; otherwise it is weak. Applying the Pearson correlation coefficient as a secondary feature selection step reduces the feature set further.
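A sketch of the Pearson screening step, assuming the threshold is applied to the signed coefficient exactly as written (r ≥ 0.5), so strongly negative correlations are also dropped; screening on |r| would be a variant. The function names `pearson_r` and `pearson_filter` are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D samples."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

def pearson_filter(X, y, threshold=0.5):
    """Return indices of the columns of X with r >= threshold, plus all r values."""
    r = np.array([pearson_r(X[:, j], y) for j in range(X.shape[1])])
    keep = np.flatnonzero(r >= threshold)   # signed test, as in the text
    return keep, r
```

In the method this is applied only to the N columns that survived the MI ranking, which is why it acts as a secondary selection.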

3) LSTM model

LSTM is a deep learning model that can efficiently process long time series, learn automatically from data and extract deeper features. However, as with other neural network models, some hyperparameters of an LSTM network are usually set based on the researcher's experience, which lacks scientific rigor. PSO is simple to implement, converges quickly and has few parameters to tune. Genetic algorithms, ant colony algorithms and similar methods lack such a guiding mechanism.

The long short-term memory (LSTM) network was proposed by Hochreiter to solve the vanishing and exploding gradient problems of backpropagation through time (BPTT). Through continuous refinement, the model has evolved into the widely used LSTM architecture. Internally, an LSTM cell consists of three gates and one state module that stores memory; its structure is shown in Fig. 1. Here C_t is the state information stored in the current LSTM cell, h_t is the output of the cell's hidden layer, f_t is the forget gate, i_t is the input gate, C̃_t is the candidate information at the current time and o_t is the output gate; in Fig. 1 the × operator denotes element-wise matrix multiplication and the + operator denotes matrix addition.

Forget gate: controls the degree to which the previous cell state C_{t-1} is forgotten:

$$f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f) \qquad (3)$$

Input gate: controls which information is added to the cell:

$$i_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i) \qquad (4)$$

Cell state update: according to f_t and i_t, new information is selectively recorded into C_t:

$$\tilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C) \qquad (5)$$

$$C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t \qquad (6)$$

Output gate: activates C_t and controls the degree to which it is filtered:

$$o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o) \qquad (7)$$

$$h_t=o_t\odot\tanh(C_t) \qquad (8)$$

Here ⊙ denotes element-wise multiplication; W_f, W_i, W_C and W_o are the weight matrices of the respective modules, b_f, b_i, b_C and b_o are bias terms, σ is the sigmoid activation function and tanh is the hyperbolic tangent activation function, defined as

$$\sigma(x)=\frac{1}{1+e^{-x}} \qquad (9)$$

$$\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} \qquad (10)$$

The output layer passes h_t through a fully connected (dense) layer to obtain the final predicted value y_t, as in formula (11):

$$y_t=\sigma(W_y\cdot h_t+b_y) \qquad (11)$$

where W_y and b_y are the weight matrix and bias term, respectively.

The LSTM controls the transfer of historical information through a gate function, and has certain time sequence processing and prediction capabilities.
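To make formulas (3) to (11) concrete, a NumPy sketch of the forward pass follows. It is an illustration, not the authors' implementation: the dict layout of weights keyed "f", "i", "c", "o" (each acting on the concatenation [h_{t-1}, x_t]) and the names `lstm_step` and `predict` are assumptions, and no training is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # formula (9)

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W[k] has shape (hidden, hidden + n_in); b[k] has shape (hidden,);
    k is "f" (forget), "i" (input), "c" (candidate) or "o" (output).
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate, (3)
    i = sigmoid(W["i"] @ z + b["i"])           # input gate, (4)
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate state C~_t
    c = f * c_prev + i * c_tilde               # cell state update
    o = sigmoid(W["o"] @ z + b["o"])           # output gate, (7)
    h = o * np.tanh(c)                         # hidden output, (8)
    return h, c

def predict(xs, W, b, W_y, b_y, hidden):
    """Run the cell over a sequence xs of shape (T, n_in), then the dense layer (11)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
    return sigmoid(W_y @ h + b_y)              # y_t = sigma(W_y h_t + b_y)
```

Because formula (11) applies σ to the dense output, this sketch assumes the target has been scaled into (0, 1); a practical model would normally use a library LSTM layer rather than this loop.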

4) PSO particle swarm optimization algorithm

The basic idea of particle swarm optimization is as follows: a flock of birds flies randomly within an area searching for food, and each bird knows only its own distance from the food and the positions of the other birds. When moving from its current position to another, each bird relies on two pieces of information: its own flight experience, and the area around the bird currently nearest to the food.

PSO initializes a population of random particles (random solutions) and then searches for the optimal solution iteratively. In each iteration, each particle updates itself by tracking two extrema: the local optimal solution pbest and the global optimal solution gbest. Having found these two values, the particle updates its velocity and position using the following formulas:

$$v_i=v_i+c_1\times\mathrm{rand}()\times(pbest_i-x_i)+c_2\times\mathrm{rand}()\times(gbest_i-x_i) \qquad (12)$$

$$x_i=x_i+v_i$$

where i = 1, 2, …, N, and N is the total number of particles in the swarm;

v_i: current velocity of the i-th particle;

rand(): random number in (0, 1);

x_i: current position of the i-th particle;

c₁ and c₂: learning factors;

pbest_i and gbest_i: the local optimal position and the global optimal position of the current particle swarm, respectively.
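Formula (12) can be applied to the whole swarm at once with array operations. This sketch assumes that rand() is drawn independently for every particle and every dimension and uses c₁ = c₂ = 2.0 as defaults; the source gives no values for the learning factors, and the name `pso_update` is hypothetical. Like formula (12), it has no inertia weight.

```python
import numpy as np

def pso_update(x, v, pbest, gbest, c1=2.0, c2=2.0, rng=None):
    """Apply formula (12) to every particle.

    x, v, pbest: arrays of shape (n_particles, n_dims); gbest: shape (n_dims,).
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)                   # rand(), uniform on [0, 1)
    r2 = rng.random(x.shape)
    v_new = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x_new = x + v_new                          # position update x_i = x_i + v_i
    return x_new, v_new
```

A particle that already sits at both pbest and gbest with zero velocity stays put, while one that lags behind both is pulled toward them.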

However, the existing MI algorithm, LSTM model and PSO algorithm yield low precision and unstable performance in energy consumption prediction, and do not meet the requirements of building energy consumption prediction. It is therefore necessary to develop an energy consumption prediction method for buildings with high prediction accuracy and stable prediction performance.

Disclosure of Invention

The invention aims to provide an LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization: an energy consumption prediction method for buildings with high prediction precision and stable prediction performance.

To achieve this purpose, the technical solution of the invention is as follows: an energy consumption prediction method based on MI-LSTM-PSO which, as shown in Fig. 2, comprises the following steps:

step one: perform correlation analysis on the time and feature dimensions of the original data set using the MI method, and select the top N features most effective for predicting the energy consumption target value, thereby eliminating redundant data and improving the efficiency of the model algorithm;

step two: calculate the Pearson correlation coefficient between each of the top N features selected by the MI method and the target sequence, and keep the N' features whose Pearson correlation coefficient is greater than or equal to 0.5;

step three: train the LSTM model on the N' features obtained after PMI dual feature selection and use it for prediction to obtain an initial prediction sequence y(t);

step four: optimize the LSTM hyperparameters units, dropout and batchsize with the particle swarm optimization (PSO) algorithm, thereby improving the prediction precision of the LSTM model and finally obtaining the MI-LSTM-PSO model.

In the above technical solution, N is 60 in step one, that is, the top 60 features most effective for the energy consumption prediction target value are selected.

In the above technical solution, the first step specifically includes the following steps,

S11: using a sliding window, arrange the M = 20 features of the preceding 24 hours into 24M (i.e. 480) feature components. The original data sequence comprises the photovoltaic generation of 2 areas, the energy consumption of 17 areas and facilities, and the total electricity input from the system power grid (the data may be ordered differently for data sets from different scenarios);
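The sliding-window construction in S11 can be sketched as follows. It assumes an hourly array with rows in time order and one column per feature; the function name `sliding_window` is hypothetical.

```python
import numpy as np

def sliding_window(data, window=24):
    """Flatten the previous `window` hours of all M features into one row per sample.

    data: array of shape (T, M) with hourly rows in time order.
    Returns X of shape (T - window, window * M) and, for each row, the index
    of the hour it is used to predict. Column k * M + m holds feature m at
    lag (window - k), i.e. the columns run from t-24 up to t-1.
    """
    T, M = data.shape
    X = np.stack([data[t - window:t].reshape(-1) for t in range(window, T)])
    targets = np.arange(window, T)
    return X, targets
```

With M = 20 and window = 24 each row has 480 components, and the matching target vector would be `data[targets, gi_col]`, where `gi_col` is a hypothetical index of the Gi column.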

S12: select features from the 24M (480) feature components using the MI method:

$$I(X;Y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (1)$$

where p(x, y) is the joint probability density function of X and Y, and p(x) and p(y) are the marginal density functions. If X and Y are completely unrelated, p(x, y) equals p(x)p(y) and the mutual information is 0; the larger I(X;Y), the stronger the correlation between the two variables;

S13: determine the optimal MI feature selection dimension N experimentally. If N is too large, the training data set contains too much redundant information and noise, which degrades prediction performance; if N is too small, the training data set contains too little information, which also degrades the prediction results. The optimal N generally lies between 3M and 6M, and among the dimensions with good prediction performance the smaller N is chosen;
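S13 does not spell out how "better performance with a smaller N" is traded off. One possible reading, sketched here as an assumption: evaluate candidate values of N between 3M and 6M, then pick the smallest N whose validation error lies within a relative tolerance of the best. The helper names, the step size and the tolerance are all hypothetical, and the validation errors must come from actually retraining the model on each top-N feature set.

```python
def candidate_ns(m, step=None):
    """Candidate MI dimensions from 3M to 6M inclusive; step defaults to M."""
    step = m if step is None else step
    return list(range(3 * m, 6 * m + 1, step))

def choose_n(val_errors, rel_tol=0.02):
    """Smallest N whose validation error (e.g. MAE) is within rel_tol of the best.

    val_errors: dict mapping candidate N -> validation error (lower is better).
    """
    best = min(val_errors.values())
    eligible = [n for n, err in val_errors.items() if err <= best * (1 + rel_tol)]
    return min(eligible)

# For M = 20 features the candidates are 60, 80, 100 and 120.
print(candidate_ns(20))
```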

S14: based on the mutual information ranking between the feature sequence x(t) and the target sequence Y, combining the time and feature dimensions, select the top 60 features most effective for predicting the energy consumption target value as the training data set for the subsequent model.

In the above technical solution, the second step specifically comprises the following steps,

S21: calculate the Pearson correlation coefficient between each of the 60 feature components above and the target sequence Y (i.e. Gi):

$$r=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \qquad (2)$$

where X̄ and Ȳ are the mean values of X and Y, respectively. If r ≥ 0.5, the correlation between X and Y is strong; otherwise it is weak;

S22: since a Pearson correlation coefficient below 0.5 indicates weak correlation, keep the 37 features whose Pearson correlation coefficient is greater than or equal to 0.5.

In the above technical solution, the LSTM network comprises three gates and a state module for storing memory, as shown in Fig. 1, and step three specifically comprises the following steps:

S31: let C_t be the state information stored in the current LSTM cell, x_t the input of the input layer, h_t the output of the cell's hidden layer, f_t the forget gate, i_t the input gate, C̃_t the candidate information at the current time and o_t the output gate; "×" denotes element-wise matrix multiplication, "+" denotes addition, and σ is the sigmoid function;

S32: forget gate: controls the degree to which the previous cell state C_t-1 is forgotten, expressed as:

f_t＝σ(W_f*[h_t-1,x_t]+b_f) (3)

S33: input gate: controls which information is added to the cell, expressed as:

i_t＝σ(W_i*[h_t-1,x_t]+b_i) (4)

S34: cell state: according to f_t and i_t, new information is selectively recorded into C_t, expressed as:

$$\tilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C) \qquad (5)$$

$$C_t=f_t\times C_{t-1}+i_t\times\tilde{C}_t \qquad (6)$$

S35: output gate: activates C_t and controls the degree to which C_t is filtered, expressed as:

o_t＝σ(W_o*[h_t-1,x_t]+b_o) (7)

h_t＝o_t*tanh(C_t) (8)

where h_t is the output of the hidden layer of the current cell; h_t-1 is the output of the hidden layer of the previous cell; W_f, W_i, W_C and W_o are the weight matrices corresponding to f_t, i_t, C̃_t and o_t, respectively; b_f, b_i, b_C and b_o are the corresponding bias terms; σ is the sigmoid activation function and tanh is the hyperbolic tangent activation function, defined as follows:

σ(x)＝1/(1+e^-x) (9)

tanh(x)＝(e^x-e^-x)/(e^x+e^-x) (10)

S36: the output layer passes h_t through a fully connected layer to obtain the final predicted value y_t:

y_t＝σ(W_y*h_t+b_y) (11)

In the above formula, W_y and b_y are the weight matrix and bias term, respectively.

In the above technical solution, the step four specifically includes the following steps,

S41: initialize the parameters to be optimized and set their ranges: units ∈ [20, 300], dropout ∈ [0, 1], batchsize ∈ [20, 300];

S42: randomly initialize a swarm of 20 particles within the initial ranges; compute the fitness value (mean absolute error, MAE) of each particle with the fitness function (the LSTM model fitting result); and, according to the MAE of each current particle, determine the best position of the swarm in this iteration (pbest) and the best position in the swarm's history (gbest);

S43: update the position and velocity of each particle according to the best particle positions, fit the updated particles with the LSTM model, compute the MAE of each particle, and update pbest and gbest accordingly;

v_i＝v_i+c₁×rand()×(pbest_i-x_i)+c₂×rand()×(gbest_i-x_i) (12)

x_i＝x_i+v_i

in formula (12): i = 1, 2, …, N, where N is the total number of particles in the swarm;

v_i: the current velocity of the i-th particle;

rand(): a random number in (0, 1);

x_i: the current position of the i-th particle;

c₁ and c₂: learning factors;

pbest_i and gbest_i: the local optimal position and the global optimal position of the current particle swarm, respectively;

S44: after training the LSTM model with the updated particles, compute the fitness value of each particle and update the swarm's best position in this iteration and its historical best position accordingly;

S45: when the fitness value of the best particle no longer changes or the number of iterations reaches the upper limit, the algorithm is considered to have converged; otherwise, the flow returns to S43 to update the particles;

S46: substitute the optimal particle parameters units, dropout and batchsize into the LSTM model and predict on the data from step one to obtain the final prediction result.
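Steps S41 to S46 can be put together into one search loop. This is a minimal sketch, not the authors' code: `toy_fitness` is a hypothetical stand-in for "train the LSTM with these hyperparameters and return its validation MAE"; pbest is implemented as each particle's own best position (the usual PSO convention); and the velocity clamp `VMAX`, the stagnation counter `patience` and c₁ = c₂ = 2.0 are assumptions not given in the source.

```python
import numpy as np

# Search ranges from S41, in the order (units, dropout, batchsize).
LOW = np.array([20.0, 0.0, 20.0])
HIGH = np.array([300.0, 1.0, 300.0])
VMAX = 0.2 * (HIGH - LOW)  # assumed velocity clamp, not specified in the source

def decode(p):
    """Map a particle position to integer units, float dropout, integer batch size."""
    p = np.clip(p, LOW, HIGH)
    return int(round(p[0])), float(p[1]), int(round(p[2]))

def pso_search(fitness, n_particles=20, max_iter=30, patience=5, c1=2.0, c2=2.0, seed=0):
    """Minimize fitness(units, dropout, batch_size), e.g. a validation MAE."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(LOW, HIGH, size=(n_particles, 3))       # S42: random swarm in range
    v = np.zeros_like(x)
    fit = np.array([fitness(*decode(p)) for p in x])
    pbest, pbest_fit = x.copy(), fit.copy()                  # best position of each particle
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    stall = 0
    for _ in range(max_iter):                                # S45: iteration upper limit
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # formula (12)
        v = np.clip(v, -VMAX, VMAX)
        x = np.clip(x + v, LOW, HIGH)                        # stay inside the S41 ranges
        fit = np.array([fitness(*decode(p)) for p in x])     # S43/S44: re-evaluate
        better = fit < pbest_fit
        pbest[better] = x[better]
        pbest_fit[better] = fit[better]
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
            stall = 0
        else:
            stall += 1
            if stall >= patience:                            # S45: best fitness unchanged
                break
    return decode(gbest), float(gbest_fit)                   # S46: feed into the final LSTM

def toy_fitness(units, dropout, batch_size):
    """Hypothetical stand-in for training the LSTM and returning its validation MAE."""
    return ((units - 128) / 100) ** 2 + (dropout - 0.2) ** 2 + ((batch_size - 64) / 100) ** 2
```

Calling `pso_search(toy_fitness)` returns the decoded best `(units, dropout, batch_size)` and its fitness; in the real method each fitness call is a full LSTM training run, which is why the swarm size and iteration limit are kept small.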

In the formulas above, "*" denotes multiplication.

The invention has the following advantages:

(1) the invention is an energy consumption prediction method applied to buildings, which has high prediction precision and stable prediction performance;

(2) MI feature selection removes 87.5% of the redundant features (480 feature components reduced to 60), which markedly improves the efficiency of the model algorithm;

(3) the method uses the PSO algorithm to optimize the LSTM hyperparameters units, dropout and batchsize, which improves the prediction precision of the LSTM model and gives a good model fit;

(4) the predicted values of the PMI-PSO-LSTM model lie largely within the confidence interval of the true values, and the predicted trend is close to the true values, giving high prediction precision;

(5) the MAE and SMAPE of the PMI-PSO-LSTM combined model are better than all results of the other models, and the combined model is more robust and its prediction performance more stable.

Drawings

Fig. 1 is a schematic view of the internal structure of a conventional LSTM.

FIG. 2 is a schematic structural diagram of the PMI-PSO-LSTM model of the present invention.

FIG. 3 is a graph comparing the predicted results of the basic model according to the embodiment of the present invention.

FIG. 4 is a scatter plot comparing the prediction results of the base model in accordance with the present invention.

FIG. 5 is a comparison graph of the combined model prediction results according to the embodiment of the present invention.

FIG. 6 is a comparison scatter plot of combined model prediction results according to an embodiment of the present invention.

FIG. 7 is a comparison chart of evaluation indexes of the model according to the embodiment of the present invention.

Detailed Description

The embodiments of the invention are described in detail below with reference to the accompanying drawings; they are merely exemplary and do not limit the invention. The advantages of the invention will become clearer and easier to understand from this description.

Examples

The invention is described in detail taking the prediction of the electricity consumption of a certain building as an example; the example also serves as guidance for applying the invention to energy consumption prediction for other buildings.

This embodiment uses the historical electricity consumption of the building as a time series to make short-term, single-step predictions 1 h ahead.

In this embodiment, the prediction of the power consumption of a certain building includes the following contents:

1. experimental data set and MI feature selection

The data set used in this embodiment is the electricity consumption of a building from 15 October 2019 to 4 June 2019, with 20 features in total, described in Table 1. Column 5 gives the Pearson correlation coefficient between each feature and the Gi feature.

Table 1 data set description

In this embodiment, the data of the previous 24 hours are used to predict the value of Gi in the next hour, so a sliding window arranges the 20 features over 24 hours into 480 feature components. The MI method is then used to select the 60 components with the largest MI values among these 480.

The selection results are shown in Table 2.

TABLE 2 characteristics of MI selection

Here a selected feature such as Gi(t-1) denotes the power input from the public grid of the industrial plant in the hour preceding the current time. The MI value is the mutual information between the current feature component X and the Gi component at the current time, i.e. I(X; Gi(t)). Table 2 shows that most features from the previous four hours have large mutual information with Gi at the current time, and that the features Gi, Ao, Co and A2 over the previous 24 hours also have relatively large mutual information with it. MI selection therefore removes 87.5% of the redundant features, which markedly improves the efficiency of the model algorithm.

The data set used in this example has 20 features, and the data of the previous 24 hours are used to predict the data of the 25th hour.

Experiments on the 20-feature data set in this example show that:

1) selecting the top 60 features gives almost the same prediction results as selecting the top 100;

2) increasing the feature dimension (selecting more than 100 features) degrades the prediction results;

3) reducing the feature dimension (selecting fewer than 60 features) leaves too little information in the data set, which also degrades the prediction results.

Therefore, this embodiment uses the MI method to select the 60 components with the largest MI values among the 480 feature components formed by the sliding window.

2. Evaluation index

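Four evaluation metrics are used to assess model quality.

Root mean square error (RMSE): the smaller the value, the better the model fit.

Mean absolute error (MAE): the smaller the value, the better the model fit.

Symmetric mean absolute percentage error (SMAPE): the smaller the value, the better the model fit.

Coefficient of determination (R2): the larger the value, the better the model fit.

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2} \qquad (13)$$

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i-y_i\right| \qquad (14)$$

$$\mathrm{SMAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|\hat{y}_i-y_i\right|}{(\left|y_i\right|+\left|\hat{y}_i\right|)/2} \qquad (15)$$

$$R^2=1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \qquad (16)$$

In formulas (13) to (16), ŷ_i is the predicted value, y_i is the true value, ȳ is the mean of the true values and n is the number of data points.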
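The four metrics can be computed as below. This sketch assumes that formulas (13) to (16) correspond to RMSE, MAE, SMAPE and R2 in the order listed, and it uses the common SMAPE variant with denominator (|y| + |ŷ|)/2, expressed in percent; the exact variant used for the reported tables is an assumption.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y_hat - y)))

def smape(y, y_hat):
    # Assumed variant: mean of |error| / ((|y| + |y_hat|) / 2), in percent.
    return float(100.0 * np.mean(np.abs(y_hat - y) / ((np.abs(y) + np.abs(y_hat)) / 2)))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)
```

For true values [1, 2, 3, 4] and predictions [1, 2, 3, 6], these give RMSE 1.0, MAE 0.5, SMAPE 10.0 and R2 0.2.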

3. Model parameter setting

To verify the prediction performance of the proposed MI + PSO-LSTM combined model, this example compares the two groups of six experimental models (M1 to M6) listed in Table 3; their main parameters are given in Tables 4 and 5.

TABLE 3 experimental reference model

No.	Model	Description
M1	ARIMA	Autoregressive integrated moving average model
M2	KNR	K-nearest neighbors regression model
M3	LSTM	LSTM model
M4	MI-LSTM	Mutual information method + LSTM model
M5	PMI-LSTM	Pearson correlation + mutual information method + LSTM model
M6	PMI-LSTM-PSO	Pearson correlation + mutual information method + PSO-optimized LSTM model

Table 4 Main parameters of the comparison models (1)

Table 5 Main parameters of the comparison models (2)

4. Analysis of model Experimental data

4.1 analysis of basic model test results

In this embodiment, the basic models M1 to M3 in Table 3 are compared in single-step prediction of the total input electricity from the public grid, Gi, using features 1 to 20.

The experimental comparison (Table 6) shows that the LSTM model gives the best prediction results on all four evaluation metrics: the coefficient of determination, the root mean square error, the mean absolute error and the symmetric mean absolute percentage error.

TABLE 6 Comparison of basic model experiments

Model	R2	RMSE	MAE	SMAPE
ARIMA	0.872609	12.1688	7.496174	8.548175
KNR	0.849556	13.21612	8.155453	9.543262
LSTM	0.889503	11.211024	6.622012	7.594866

The comparison between the true values and the 1 h electricity consumption predicted by ARIMA, K-nearest-neighbor regression and LSTM is shown in Figs. 3 and 4. The predicted trend of the LSTM model is closest to the true values, and only the LSTM predictions lie within the confidence interval of the original values. The curves predicted by the ARIMA and K-nearest-neighbor models fall outside the confidence interval of the true values and show prediction lag. In summary, the LSTM model predicts better than the ARIMA and K-nearest-neighbor regression models, so LSTM is chosen as the base model for the experiments.

4.2 analysis of the results of the LSTM combined model experiment

In this embodiment, the combined models M3 to M6 in Table 3 are used to run 20 groups of single-step prediction comparison experiments on the total input electricity from the public grid, Gi, using features 1 to 20.

The comparison between the true values and the 1 h electricity consumption Gi predicted by the four models is shown in Figs. 5 and 6. The predicted values of all four models lie largely within the confidence interval of the true values, and the predicted trend of the PMI-PSO-LSTM model is closest to the true values. Fig. 7 shows that the PMI-PSO-LSTM model is best on all evaluation metrics (in Fig. 7, M3, M4, M5 and M6 denote the combined models M3 to M6 in Table 3).

Table 7 shows the averages over the 20 groups of experiments for the four combined models: the first four columns are the four evaluation metrics and the fifth column is the training time. As Table 7 shows, the MI + PSO-LSTM model does not improve R2 significantly, but compared with the LSTM, MI-LSTM and PMI-LSTM models it improves MAE and SMAPE by about 20%, 10% and 5%, respectively. Compared with the LSTM model, MI-LSTM does not improve performance significantly, but MI feature selection reduces the input dimension by 87.5% and the training time by about 63%. Compared with the MI-LSTM model, PMI-LSTM barely improves performance, but the secondary feature selection reduces the input dimension by about 40% and the training time by about 20%.

TABLE 7 Comparison of evaluation indexes of the combined models

Model	R2	RMSE	MAE	SMAPE	Training time (s)
LSTM	0.88724	11.18282	7.19766	8.56986	159
MI-LSTM	0.89722	10.67590	6.66639	7.82360	59
PMI-LSTM	0.92301	10.73256	6.42070	7.49299	46
MI-PSO-LSTM	0.90482	10.27717	6.12843	6.87869	44
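The percentages quoted in the analysis can be rechecked from the Table 7 values and the stated feature dimensions (480, 60 and 37). The short script below only does that arithmetic; the name `reduction` is hypothetical.

```python
# Values copied from Table 7 (MAE, SMAPE, training time in seconds).
table7 = {
    "LSTM": {"MAE": 7.19766, "SMAPE": 8.56986, "t": 159},
    "MI-LSTM": {"MAE": 6.66639, "SMAPE": 7.82360, "t": 59},
    "PMI-LSTM": {"MAE": 6.42070, "SMAPE": 7.49299, "t": 46},
    "MI-PSO-LSTM": {"MAE": 6.12843, "SMAPE": 6.87869, "t": 44},
}

def reduction(old, new):
    """Relative reduction in percent (positive means the new value is smaller)."""
    return 100.0 * (old - new) / old

ours = table7["MI-PSO-LSTM"]
for name in ("LSTM", "MI-LSTM", "PMI-LSTM"):
    base = table7[name]
    print(f"vs {name}: MAE -{reduction(base['MAE'], ours['MAE']):.1f}%, "
          f"SMAPE -{reduction(base['SMAPE'], ours['SMAPE']):.1f}%")

print(f"input dims 480 -> 60: -{reduction(480, 60):.1f}%")   # MI step
print(f"input dims 60 -> 37: -{reduction(60, 37):.1f}%")     # Pearson step
print(f"training time LSTM -> MI-LSTM: -{reduction(159, 59):.1f}%")
print(f"training time MI-LSTM -> PMI-LSTM: -{reduction(59, 46):.1f}%")
```

Run as written, it reports MAE/SMAPE reductions of 14.9%/19.7% against LSTM, 8.1%/12.1% against MI-LSTM and 4.6%/8.2% against PMI-LSTM, dimension reductions of 87.5% and 38.3%, and training-time reductions of 62.9% and 22.0%, in line with the rounded figures in the text.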

Fig. 7 shows box plots of the four evaluation metrics over the 20 experiments for M3 to M6. The '+' symbols outside the boxes are outliers and can be ignored. As Fig. 7 shows, the MI-PSO-LSTM model is clearly better than the other three models on the four metrics: its MAE and SMAPE are better than every result of the other models, and about 95% of its R2 and RMSE results are also better than those of the other models. The four metrics of MI-LSTM partly overlap with those of LSTM, but MI-LSTM is better overall. The box height (interquartile range) of the MI-PSO-LSTM model is also the smallest compared with the LSTM, MI-LSTM and PMI-LSTM models, indicating that MI-PSO-LSTM is more stable than the other models.

In summary, the invention provides a short-term energy consumption combined prediction model based on PMI, PSO and LSTM. First, in the data preprocessing stage, the mutual information method and the Pearson coefficient perform dual feature selection on the original data and remove redundant features. PSO then tunes the LSTM network architecture so that the LSTM topology best fits the current input data, and finally the feature-selected data are fed into the optimized LSTM to predict short-term energy consumption. To verify the MI-PSO-LSTM model on short-term energy consumption prediction, multi-dimensional single-step prediction comparison experiments were carried out on an energy consumption time series data set of a building. Taken together, the experimental results show that the MI-PSO-LSTM combined model is best on all four evaluation metrics, i.e. it has higher prediction precision and robustness and more stable prediction performance. The MI-PSO-LSTM combined model offers a useful research direction for time series predictive analysis with deep learning. However, it still leaves considerable room for optimization, for example noise filtering and dynamic, intelligent feature selection in time series, which could further improve prediction accuracy.

Other parts not described belong to the prior art.

Claims

1. An LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization, characterized by comprising the following steps:

step one: performing correlation analysis on the time and feature dimensions of the original data set using the mutual information (MI) method, and selecting the top N dimensional features most effective for predicting the energy consumption target value;

step two: performing secondary feature selection on the N-dimensional features selected in step one using the Pearson correlation coefficient, to obtain the N'-dimensional features after PMI feature selection;

step three: training an LSTM model on the N'-dimensional feature data after PMI feature selection and using it for prediction, to obtain an initial prediction sequence y(t);

step four: optimizing the LSTM hyperparameters units, dropout and batchsize using the particle swarm optimization (PSO) algorithm, thereby improving the prediction accuracy of the LSTM model and finally obtaining the PMI-LSTM-PSO model.

2. The LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization according to claim 1, wherein step one specifically comprises the following steps:

S11: using a sliding window, combining the M-dimensional feature data of the preceding 24 hours into 24M-dimensional feature components, wherein the original data sequence includes the photovoltaic power generation of 2 areas, the energy consumption of 17 different facilities in these areas, and the total electricity input from the system power grid;
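A minimal sketch of the S11 sliding window, assuming the raw data is a sequence of hourly rows with M feature values each and that the window for hour t covers the preceding 24 rows (t-24 to t-1). The function name `make_lag_features` and the component ordering (oldest hour first) are illustrative assumptions; the claim does not fix either.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    rows: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int = 24,
) -> Tuple[List[List[float]], List[float]]:
    """Flatten the previous `window` rows (M features each) into one
    window*M feature vector per time step, paired with that step's target.

    Hypothetical helper: rows[t] holds the M features observed at hour t and
    target[t] is the energy consumption value to predict at hour t.
    """
    if len(rows) != len(target):
        raise ValueError("rows and target must have the same length")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(window, len(rows)):
        vec: List[float] = []
        for lag_row in rows[t - window:t]:  # oldest hour first
            vec.extend(lag_row)
        X.append(vec)
        y.append(target[t])
    return X, y
```

With M input series, each vector in `X` has 24M entries, matching the 24M-dimensional feature components on which MI selection then operates.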

S12: performing feature selection on the 24M-dimensional feature components using the mutual information (MI) method, where the mutual information of two variables X and Y is

$$ I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \quad (1) $$

In formula (1): $p(x,y)$ is the joint probability density function of X and Y, and $p(x)$ and $p(y)$ are the marginal density functions. If X and Y are completely unrelated (independent), $p(x,y)$ equals $p(x)p(y)$ and the mutual information equals 0; the larger $I(X;Y)$ is, the stronger the correlation between the two variables;

S14: ranking the components of the feature sequence x(t) by their mutual information with the target sequence Y, jointly over the time and feature dimensions, and selecting the top N features most effective for predicting the energy consumption target value as the training data set for the subsequent model.
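Steps S12 to S14 can be sketched as: estimate the mutual information of formula (1) between each lagged feature and the target, rank the features, and keep the top N. This sketch rests on assumptions the claim does not state: it uses a simple plug-in estimator on equal-width histograms (result in nats), whereas practical tools often use other estimators (for example, scikit-learn's `mutual_info_regression` uses a k-nearest-neighbour estimator). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def _equal_width_bins(values: Sequence[float], bins: int) -> List[int]:
    """Map each value to an equal-width histogram bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], bins: int = 10) -> float:
    """Plug-in estimate of I(X;Y) in nats: a histogram version of formula (1)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    bx, by = _equal_width_bins(x, bins), _equal_width_bins(y, bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi


def select_top_n_by_mi(
    features: Dict[str, Sequence[float]], target: Sequence[float], n: int, bins: int = 10
) -> List[str]:
    """Rank lagged features by MI with the target (S14) and keep the top n."""
    scores = {name: mutual_information(col, target, bins) for name, col in features.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))  # ties broken by name
    return ranked[:n]
```

Because each key of `features` names one lagged component (one feature at one hour offset), a single ranking covers the time and feature dimensions together, as S14 describes.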

3. The LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization according to claim 2, wherein step two specifically comprises the following steps:

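S21: calculating the Pearson correlation coefficient r between each of the N-dimensional feature components X and the target sequence Y:

$$ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \quad (2) $$

In formula (2): $\bar{X}$ and $\bar{Y}$ are the mean values of X and Y, respectively, and n is the number of samples;

if r ≥ 0.5, the correlation between X and Y is considered strong; otherwise, it is considered weak;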

S22: retaining the features whose Pearson correlation coefficient is greater than or equal to 0.5, giving the N'-dimensional feature data.
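S21 and S22 amount to evaluating the Pearson coefficient of formula (2) for each remaining feature and applying the 0.5 threshold. A minimal sketch with hypothetical names: it applies the threshold to r itself, as the claim is worded, so strongly negatively correlated features are dropped as well (whether |r| was intended is not stated). Treating a constant feature as r = 0 is also an assumption.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / (sx * sy)


def pearson_filter(
    features: Dict[str, Sequence[float]], target: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Keep the features whose r with the target is >= threshold (S22)."""
    return [name for name, col in features.items() if pearson(col, target) >= threshold]
```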

4. The LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization according to claim 3, wherein the LSTM network internally comprises three gate structures (the forget gate, the input gate and the output gate) and a cell state that stores memory, and step three specifically comprises the following steps:

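$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (3) $$

$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (4) $$

$$ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (5) $$

$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad (6) $$

$$ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (7) $$

$$ h_t = o_t \odot \tanh(C_t) \quad (8) $$

In formulas (3) to (8): $W_f$, $W_i$ and $W_o$ are the weight matrices of $f_t$, $i_t$ and $o_t$, respectively, and $b_f$, $b_i$ and $b_o$ are the corresponding bias terms; $W_C$ and $b_C$ are the weight matrix and bias term of the candidate cell state $\tilde{C}_t$; $x_t$ is the input at time t, $h_{t-1}$ is the hidden state of the previous time step, and $\odot$ denotes element-wise multiplication;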

$$ \sigma(x) = \frac{1}{1+e^{-x}} \quad (9) $$

$$ \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} \quad (10) $$

$$ y_t = \sigma(W_y \cdot h_t + b_y) \quad (11) $$

In formula (11): $W_y$ and $b_y$ are the weight matrix and the bias (offset) term, respectively.
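One forward step of the cell described by formulas (3) to (11) can be sketched with plain Python lists so the data flow stays visible. Assumptions: the candidate-state and cell-update steps (formulas (5) and (6)) take the standard LSTM form, with hypothetical parameter names `W_c` and `b_c`; the dictionary layout of `p` is illustrative, and a practical model would be built with a deep-learning framework rather than this loop.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Formula (9), written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: Matrix, v: Sequence[float], b: Sequence[float]) -> Vector:
    """Dense W . v + b with W stored row by row."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(
    x_t: Sequence[float], h_prev: Sequence[float], c_prev: Sequence[float], p: Dict[str, list]
) -> Tuple[Vector, Vector]:
    """One time step of formulas (3)-(8); returns (h_t, C_t)."""
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]  # forget gate, (3)
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]  # input gate, (4)
    c_hat = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate state (tanh = formula (10))
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]  # new cell state
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]  # output gate, (7)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # hidden state, (8)
    return h, c


def lstm_output(h_t: Sequence[float], W_y: Matrix, b_y: Sequence[float]) -> Vector:
    """Output layer of formula (11): y_t = sigma(W_y . h_t + b_y)."""
    return [sigmoid(a) for a in affine(W_y, h_t, b_y)]
```

With all weights and biases at zero, every gate equals 0.5 and the candidate state is 0, so the cell state simply halves at each step; this makes a convenient sanity check.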

5. The LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization according to claim 4, wherein step four specifically comprises the following steps:

S42: randomly initializing the particle swarm within the initial range, calculating the fitness value of each particle with the fitness function, and, using each particle's current prediction MAE as the criterion, determining the individual best position pbest of each particle and the historical global best position gbest of the swarm;
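S42, together with the velocity and position update of formula (12), can be sketched as a small PSO loop for minimisation. Assumptions not stated in the claim: rand() is drawn independently for each dimension; velocities are clamped to 20% of each search range and positions to the bounds (formula (12) has no inertia weight, so some clamping is usual); `fitness` is a caller-supplied function. In the method described, `fitness` would train the LSTM with the candidate units, dropout and batchsize (rounding the integer parameters) and return the prediction MAE; here any function of a list of floats can be used.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    fitness: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 10,
    n_iters: int = 20,
    c1: float = 2.0,
    c2: float = 2.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Return (gbest position, gbest fitness) after n_iters PSO iterations."""
    rng = random.Random(seed)
    dim = len(bounds)
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]  # assumed velocity clamp
    # Random initialisation within the search range, zero initial velocity.
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda k: pbest_f[k])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                v = (vs[k][d]
                     + c1 * rng.random() * (pbest[k][d] - xs[k][d])
                     + c2 * rng.random() * (gbest[d] - xs[k][d]))  # formula (12)
                vs[k][d] = max(-vmax[d], min(vmax[d], v))
                lo, hi = bounds[d]
                xs[k][d] = max(lo, min(hi, xs[k][d] + vs[k][d]))  # x_i = x_i + v_i
            f = fitness(xs[k])
            if f < pbest_f[k]:  # lower MAE is better
                pbest[k], pbest_f[k] = xs[k][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[k][:], f
    return gbest, gbest_f
```

For the three LSTM hyperparameters, `bounds` might look like `[(16, 256), (0.0, 0.5), (16, 128)]` for units, dropout and batchsize; these ranges are illustrative only and are not taken from the source.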

$$ v_i = v_i + c_1 \times \mathrm{rand}() \times (pbest_i - x_i) + c_2 \times \mathrm{rand}() \times (gbest_i - x_i) \quad (12) $$

$$ x_i = x_i + v_i $$

$v_i$: the current velocity of the i-th particle;

rand(): a random number in (0, 1);

$x_i$: the current position of the i-th particle;

$c_1$ and $c_2$: learning factors;