CN110188919A - A load forecasting method based on a long short-term memory network - Google Patents
A load forecasting method based on a long short-term memory network
- Publication number
- CN110188919A (application number CN201910325295.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention belongs to the field of electric load forecasting and discloses a load forecasting method based on a long short-term memory (LSTM) network, comprising: collecting the power load data of a target area over a certain time period, together with the corresponding weather feature data, to form a raw data set; handling missing values in the raw data set using a Spark cluster; performing feature selection on the raw data set; performing feature compression on the raw data set; establishing a prediction model based on an LSTM network; training the prediction model in a distributed manner using the Spark cluster; and, from the weather feature data and power load data of the previous time point, performing distributed prediction with the prediction model to obtain the predicted load at the current time point. The invention addresses the difficulty in the prior art of forecasting electric load quickly and efficiently in big data scenarios, and can extract, process and compute on large data sets quickly and effectively.
Description
Technical field
The present invention relates to the field of electric load forecasting, and in particular to a load forecasting method based on a long short-term memory (LSTM) network.
Background art
Load forecasting is the problem of predicting the electric load that a power company will need to supply at a specific time in the future, and it is one of the core tasks of power grid planning. From an analysis of historical load data and a judgment of future development trends, a power company forecasts how the electric load will change and develop over a coming period. An accurate load forecast is vital both to the company's short-term dispatch scheduling and to its long-term system planning, and it is the basis for drawing up power supply plans, development plans and financial plans.
By the end of 2017, the grid-connected new energy capacity within the State Grid dispatching range had reached 280 million kW, of which wind power accounted for 145.39 million kW and solar power for 120.83 million kW, ranking first in the world. More than 400 million smart meters had been installed, essentially achieving full coverage of automatic collection of power consumption information. As data volumes keep growing, load forecasting has entered the big data era. Combining load forecasting with big data technology has become imperative and is of strategic importance for the development of the power industry.
To cope with the challenges of massive data, "big data" has given rise to a large number of distributed parallel computing and storage technologies. Building on the MapReduce programming framework and the Google File System developed by Google, the Apache Hadoop project opened the era of enterprise-level big data processing, and a variety of distributed computing and storage platforms developed around Hadoop followed. As the demand for real-time computation keeps rising, stream processing engines represented by Spark Streaming and Flink have started a new wave of big data development.
To forecast electric load quickly and efficiently in big data scenarios, a complete data processing and modeling scheme for load forecasting that suits such scenarios is needed.
Summary of the invention
By providing a load forecasting method based on a long short-term memory (LSTM) network, the embodiments of the present application solve the problem in the prior art that electric load is difficult to forecast quickly and efficiently in big data scenarios.
The embodiments of the present application provide a load forecasting method based on an LSTM network, comprising the following steps:
Step S1, collecting the power load data of a target area over a certain time period and the corresponding weather feature data to form a raw data set;
Step S2, handling missing values in the raw data set using a Spark cluster;
Step S3, performing feature selection on the raw data set;
Step S4, performing feature compression on the raw data set;
Step S5, establishing a prediction model based on an LSTM network;
Step S6, training the prediction model in a distributed manner using the Spark cluster;
Step S7, from the weather feature data and power load data of the previous time point, performing distributed prediction with the prediction model to obtain the predicted load at the current time point.
Preferably, the load forecasting method based on an LSTM network further comprises:
Step S8: storing the power load data and weather feature data collected in real time in an HBase cluster, and displaying the predicted load alongside the load value collected in real time.
Preferably, in step S1, the raw data set that is formed is stored in an HBase cluster.
Preferably, in step S2, the Spark cluster is used to perform K-means clustering, after which missing values are filled by within-cluster mean imputation.
Preferably, in step S3, the Pearson correlation coefficient between the power load data and the weather feature data is computed, and a gradient boosting decision tree (GBDT) is trained to rank the features by importance.
Preferably, feature variables whose significance level (p-value) is higher than 0.05 are rejected, all variables are ranked by correlation, and the top 30% of features are taken as a first feature set; a GBDT is trained to obtain a feature importance ranking, and the top 30% of features are taken as a second feature set; the intersection of the first feature set and the second feature set is taken as the screened features.
Preferably, in step S4, the weather feature data is compressed using principal component analysis (PCA).
Preferably, in step S5, parameters are updated using truncated backpropagation through time (TBPTT).
Preferably, in step S6, data-parallel distributed training is carried out on the Spark cluster using asynchronous stochastic gradient descent.
Preferably, in step S7, the power load data and weather feature data of the previous time point are obtained, the degree of parallelism is set according to the data volume and the cluster hardware, and the load is predicted using the Spark cluster.
The one or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
In the embodiments of the present application, historical power load data and weather feature data are obtained and subjected to data preprocessing, feature selection and feature compression. An LSTM network whose parameters are updated by truncated backpropagation through time is then trained in a distributed manner with asynchronous stochastic gradient descent, establishing a prediction model that forecasts the electricity consumption at a given moment. The invention can extract, process and compute on large data sets quickly and effectively.
Brief description of the drawings
To illustrate the technical solutions of the present embodiment more clearly, the drawings used in the description of the embodiment are briefly introduced below. The drawings described below show one embodiment of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows the data processing platform of a load forecasting method based on an LSTM network provided by an embodiment of the invention;
Fig. 2 is a topology diagram of the LSTM network model in a load forecasting method based on an LSTM network provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of truncated backpropagation through time in a load forecasting method based on an LSTM network provided by an embodiment of the invention;
Fig. 4 is a topology diagram of Spark plain-mode training in a load forecasting method based on an LSTM network provided by an embodiment of the invention;
Fig. 5 is a topology diagram of Spark mesh-mode training in a load forecasting method based on an LSTM network provided by an embodiment of the invention.
Detailed description of the embodiments
For a better understanding of the above technical solution, it is described in detail below with reference to the drawings and specific embodiments.
The load forecasting method based on an LSTM network provided by this embodiment mainly comprises the following steps:
Step S1: collect the historical power load data of a region and the associated weather data for the same region and time period to form a raw data set, and store it in an HBase cluster.
The historical data (i.e. the raw data set) is stored in HBase. Because HBase scales out well horizontally, it meets the requirements of large data volumes and of persisting and reading the data, while Hive can be used for interactive queries.
Step S2: handle missing values in the data set using a Spark cluster.
Once the relevant information has been supplied to the driver node of the Spark cluster, the compute nodes load the big data in parallel into the cluster's distributed memory and persist it. Within the Spark cluster the data is abstracted as resilient distributed datasets (RDDs) and computed in memory, which reduces the time overhead of data preprocessing. Missing values are filled in parallel by within-cluster mean imputation based on K-means clustering, which comes closer to the true values than ordinary mean imputation.
Step S3: perform feature selection on the data.
The Pearson correlation coefficient between each weather feature and the power load data is computed, and a gradient boosting decision tree (GBDT) is trained to obtain a feature importance ranking. The two are combined for feature screening, which preserves the characteristics of the original data as far as possible while reducing the amount of training data.
Step S4: perform feature compression on the data.
The data is compressed using principal component analysis (PCA), which retains the characteristics of the original data to the greatest extent while reducing the amount of data fed to the model and so speeding up model computation.
Step S5: establish the model based on an LSTM network.
An LSTM network is used to model load forecasting on the compressed data, which avoids the "vanishing gradient" problem. Parameters are updated by truncated backpropagation through time (TBPTT), which reduces the complexity of each parameter update in the network and raises the frequency of updates, so that the same computing power trains the neural network faster.
Step S6: train the model in a distributed manner using the Spark cluster.
On the Spark cluster, a data-parallel training scheme based on asynchronous stochastic gradient descent is used. Training the network in Spark cluster mode markedly reduces the volume of parameter updates exchanged between nodes, which speeds up model training.
Step S7: from the power load data and weather feature data of the previous time point, perform distributed prediction with the established LSTM network model to obtain the predicted load at the current time point.
The power load data and weather feature data of the previous moment are read, a reasonable degree of parallelism is set, and the load is predicted using the Spark cluster, which speeds up prediction.
Step S8: store the power load data and weather data collected in real time in the HBase cluster, and display the predicted load and the load value collected in real time (i.e. the true load value) in a graphical interface on the Web front end.
The latest power load data and weather feature data collected in real time are written to the HBase cluster. The real-time true load value (i.e. the load value collected in real time) is displayed together with the predicted load so that the error and the trend can be observed.
The present invention is described in further detail below.
Fig. 1 shows the structural block diagram of the data processing platform provided by the invention, which comprises: the Hadoop distributed file system HDFS, which provides the underlying storage; the MapReduce computing framework, which provides the data operation interface for HBase and Hive; the cluster computing framework Spark; and a message queue (MQ) that receives real-time data. HBase is a distributed database, and Hive provides SQL-style data operations for the relevant staff.
The invention takes real-world conditions into account, including weather, time and regional correlation, and uses mathematical modeling to predict the power load of a given area at a given future moment while guaranteeing a certain error tolerance and precision.
The load forecasting method based on an LSTM network provided by this embodiment mainly comprises:
Step S1: collect the historical power load data of a region and the associated weather data for the same region and time period, and store them in an HBase cluster.
The original electricity data set and weather data set are each stored in HBase in advance. HBase can store massive data sets, and as a column-oriented NoSQL database its columns can be added dynamically on demand, which suits load characteristic data with many dimensions.
Step S2: handle missing values in the data set using a Spark cluster.
K-means clustering is applied to the weather feature data that contains missing values, yielding several data clusters, and mean imputation is performed within each cluster. Compared with ordinary mean imputation, the imputed mean is closer to the true value, which improves the accuracy of the model.
K-means is an iterative algorithm. Suppose the data is to be clustered into K groups. First, K random points are selected as cluster centers. For each data point in the data set, the distance to every center is computed, measured as the 2-norm of the difference between the feature vectors, and the point is associated with the nearest center; all points associated with the same center form one cluster. The mean of each cluster is then computed, and the cluster's center is moved to that mean. K-means minimizes the sum of the distances between every data point and the cluster center associated with it. Its cost function is:
$$J(c^{(1)},\ldots,c^{(m)},u_1,\ldots,u_k) = \frac{1}{m}\sum_{i=1}^{m}\left\| x^{(i)} - u_{c^{(i)}} \right\|^2$$
where $u_{c^{(i)}}$ is the cluster center nearest to the feature vector $x^{(i)}$. The optimization goal is to find the $c^{(1)},\ldots,c^{(m)}$ and $u_1,\ldots,u_k$ that minimize the cost function.
The algorithm proceeds as follows:
(1) create K random points as the starting centers;
(2) for each feature vector in the weather feature data set, compute its distance to every center;
(3) assign the feature vector to its nearest center;
(4) for the newly obtained groups, compute the center (mean vector) of each group;
Repeat steps (2) to (4) until the algorithm converges.
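Steps (1) to (4) can be sketched in a few lines of plain Python. This is a single-machine illustration only, not the patent's Spark implementation; the function name `kmeans`, the fixed random seed and the iteration cap are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (equal-length tuples of floats) into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # (1) K random starting centers
    labels = [0] * len(points)
    for _step in range(max_iter):
        # (2)-(3) assign every vector to its nearest center (2-norm distance)
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # (4) move each center to the mean of the vectors assigned to it
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # empty cluster keeps its center
        if new_labels == labels and new_centers == centers:
            break  # nothing changed: converged
        labels, centers = new_labels, new_centers
    return centers, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of weather vectors returns one label per vector, which can then drive the per-cluster imputation.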
The K-means algorithm cannot determine the number of clusters by itself. Too many clusters make the cost function smaller but over-fit the data, so choosing a suitable number of clusters matters a great deal for the accuracy of mean imputation. The error of the weather feature data is computed for different numbers of clusters, and the number of clusters at which the error drops fastest is chosen as the basis for dividing the classes. Mean imputation is then carried out separately within each weather feature class, which completes the missing-value handling.
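The per-cluster imputation itself might look like the sketch below. It assumes that cluster labels are already available (for example from clustering on the fully observed columns, which the source does not spell out) and that missing entries are marked with `None`; the function name `impute_by_cluster_mean` is hypothetical.

```python
def impute_by_cluster_mean(rows, labels):
    """Replace None entries with the column mean of the row's own cluster."""
    sums = {}    # (cluster label, column) -> sum of observed values
    counts = {}  # (cluster label, column) -> number of observed values
    for row, lab in zip(rows, labels):
        for col, value in enumerate(row):
            if value is not None:
                sums[(lab, col)] = sums.get((lab, col), 0.0) + value
                counts[(lab, col)] = counts.get((lab, col), 0) + 1
    filled = []
    for row, lab in zip(rows, labels):
        new_row = []
        for col, value in enumerate(row):
            if value is None:
                if counts.get((lab, col), 0) == 0:
                    raise ValueError(
                        f"cluster {lab} has no observed value in column {col}"
                    )
                value = sums[(lab, col)] / counts[(lab, col)]
            new_row.append(value)
        filled.append(new_row)
    return filled
```

With a warm, dry cluster and a cold, humid cluster, a missing humidity reading is filled from its own cluster only, rather than from the mean over all rows.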
Step S3: perform feature selection on the data.
The Pearson correlation coefficient is computed and a model is trained with the gradient boosting decision tree (GBDT) algorithm; using the two jointly makes the feature selection more fault-tolerant.
The Pearson correlation coefficient and its significance are computed. The Pearson correlation coefficient between the load and each weather feature is given by:
$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
where $\bar{X}$ and $\bar{Y}$ are the means of the $X_i$ samples and the $Y_i$ samples respectively. Feature variables whose significance level (p-value) is higher than 0.05 are rejected, all variables are ranked by correlation, and the top 30% of features are taken as feature set A.
A GBDT is trained to obtain a feature importance ranking, and the top 30% of features are taken as feature set B. The intersection of feature set A and feature set B is taken as the screened features.
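The two-way screening can be illustrated as follows. This sketch computes the Pearson coefficient directly, but takes the p-values and the GBDT importances as inputs supplied by the caller (`p_values` and `importances` are hypothetical dictionaries, e.g. produced by a statistics package and a trained GBDT). Ranking by absolute correlation, applying the 30% cut to the features that survive the significance test, and rounding the cut up are assumptions, since the source does not settle these details.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def top_fraction(scores, fraction=0.3):
    """Names of the highest-scoring features, keeping at least one slot."""
    keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def select_features(features, load, p_values, importances, fraction=0.3):
    """Intersect the correlation-ranked and importance-ranked feature sets."""
    # Feature set A: drop features with p-value above 0.05, rank by |r|.
    significant = {
        name: abs(pearson(values, load))
        for name, values in features.items()
        if p_values[name] <= 0.05
    }
    set_a = top_fraction(significant, fraction)
    # Feature set B: rank by the (externally computed) GBDT importance.
    set_b = top_fraction(importances, fraction)
    return set_a & set_b
```

Taking the intersection means a feature survives only if both the linear correlation test and the tree-based importance ranking agree on it.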
Step S4: perform feature compression on the data.
Dimensionality reduction by principal component analysis (PCA) constructs a d × k transformation matrix W, which maps a feature vector x into a new k-dimensional feature subspace whose dimension is smaller than that of the original d-dimensional feature space:
$$x = [x_1, x_2, \ldots, x_d], \quad x \in \mathbb{R}^{d}$$
$$z = xW, \quad W \in \mathbb{R}^{d \times k}$$
$$z = [z_1, z_2, \ldots, z_k], \quad z \in \mathbb{R}^{k}$$
The PCA algorithm proceeds as follows:
(1) standardize the original d-dimensional data set;
(2) construct the sample covariance matrix;
(3) compute the eigenvalues and corresponding eigenvectors of the covariance matrix;
(4) select the eigenvectors corresponding to the k largest eigenvalues, where k is the dimension of the new feature space (k < d);
(5) construct the mapping matrix W from these top k eigenvectors;
(6) transform the d-dimensional input data set into the new k-dimensional feature subspace using the mapping matrix W.
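Steps (1) to (6) can be sketched without any numerical library. To stay dependency-free, this example finds the leading eigenvectors by power iteration with deflation, which is a substitute for a full eigendecomposition and not something the source prescribes; the function name `pca`, the iteration count and the seed are assumptions, and columns with zero variance are not handled.

```python
import math
import random


def pca(rows, k, iters=500, seed=0):
    """Project d-dimensional rows onto their top-k principal components."""
    n, d = len(rows), len(rows[0])
    # (1) standardize every column to zero mean and unit sample variance
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    # (2) sample covariance matrix (d x d)
    cov = [
        [sum(row[a] * row[b] for row in z) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # (3)-(4) leading eigenvectors via power iteration, largest first
    rng = random.Random(seed)
    components = []
    for _component in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(
            v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
        )
        components.append(v)
        # deflate so the next pass converges to the next eigenvector
        cov = [
            [cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
            for a in range(d)
        ]
    # (5)-(6) the eigenvectors are the columns of W; each row maps to z @ W
    return [
        [sum(row[j] * comp[j] for j in range(d)) for comp in components]
        for row in z
    ]
```

For two perfectly correlated weather columns, `pca(rows, 1)` collapses them into a single component that carries all of the variance.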
In the Spark cluster, the compressed time series data set is split in a ratio of 8:2 into a training feature data set and a test data set. The training feature data set is used to train the model, and the test data set is used to evaluate it.
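A minimal sketch of the 8:2 split is shown below. The source does not say whether the split is random or chronological; this example assumes a chronological cut, which keeps the test period strictly after the training period, and the function name `chronological_split` is hypothetical.

```python
def chronological_split(samples, train_ratio=0.8):
    """Split time-ordered samples into (train, test) without shuffling."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]
```

For ten hourly samples this yields the first eight for training and the last two for evaluation.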
Step S5: establish the recurrent network based on an LSTM network.
Because the hourly load over the next 24 h is to be predicted, the number of time steps of the LSTM network is set to 24, i.e. the sequence of 24 consecutive hours of load forms one sample; the specific structure is shown in Fig. 2. The training features are the compressed weather feature data, and the training label is the load value at the next time point. With the LSTM structure, the "vanishing gradient" problem does not arise and disturb model training even when the number of time steps is large.
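One way to turn an hourly series into 24-step samples is a sliding window, sketched below. The pairing of each 24-hour feature window with the load of the following hour is an interpretation of the text above, and the function name `make_windows` is hypothetical.

```python
def make_windows(features, loads, steps=24):
    """Build (window, label) pairs: `steps` consecutive hours -> next-hour load."""
    samples = []
    for start in range(len(loads) - steps):
        window = features[start:start + steps]  # 24 hourly feature vectors
        label = loads[start + steps]            # load at the following hour
        samples.append((window, label))
    return samples
```

A series of 30 hourly records produces six such samples, each shifted by one hour.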
Training an LSTM network on long sequences demands a great deal of computing power, so truncated backpropagation through time (TBPTT) is used to train the network quickly. Taking the TBPTT of length 4 in Fig. 3 as an example, the backward pass of each parameter update goes through only 4 time steps, which reduces the complexity of parameter updates in the network. When long sequences are fed in, TBPTT raises the frequency of parameter updates, so that the same computing power trains the neural network faster.
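The effect of truncation on update frequency can be seen by listing the segments over which gradients are propagated. The sketch below only computes the segment boundaries; it does not train a network, and the function name `tbptt_segments` is an assumption.

```python
def tbptt_segments(sequence_length, k=4):
    """Return (start, end) index pairs of the truncated backward passes."""
    return [
        (start, min(start + k, sequence_length))
        for start in range(0, sequence_length, k)
    ]
```

With a truncation length of 4, a 24-step sequence yields six parameter updates, each backpropagating through at most 4 steps, instead of a single update through all 24 steps.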
Step S6: train the model in a distributed manner using the Spark cluster.
A distributed training scheme based on data-parallel asynchronous stochastic gradient descent (SGD) is used. Let the loss function be L. For n parameters, the gradient vector of the loss function is:
$$\nabla L = \left( \frac{\partial L}{\partial w_1}, \ldots, \frac{\partial L}{\partial w_n} \right)$$
With learning rate a, the parameter vector W after the (i+1)-th SGD iteration is:
$$W_{i+1} = W_i - \frac{a}{n} \sum_{j=1}^{n} \nabla L_j$$
where $W_i$ is the parameter vector after the i-th iteration, $\nabla L_j$ is the gradient vector of the loss function obtained by the j-th compute node after training on its copy of the data, and n here denotes the number of compute nodes.
In asynchronous stochastic gradient descent, a parameter update $\Delta W_{i,j}$ is applied to the parameter vector as soon as its computation completes, without strictly following a fixed parameter update cycle. Asynchronous SGD achieves higher throughput in a distributed system: worker nodes spend more time on useful computation instead of waiting for a parameter averaging step to finish. In addition, worker nodes can incorporate information from other worker nodes sooner than with synchronous updates.
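The asynchronous update rule can be imitated on one machine with threads standing in for worker nodes. This is a toy sketch fitting y = w·x, not the patent's Spark training; the function name `async_sgd`, the learning rate and the epoch count are assumptions. Each worker reads the shared weight (possibly a stale value), computes a gradient on its own shard, and applies it immediately without waiting for the other workers.

```python
import threading


def async_sgd(shards, learning_rate=0.01, epochs=200):
    """Fit y = w * x with one thread per data shard sharing a single weight."""
    state = {"w": 0.0}
    lock = threading.Lock()

    def worker(shard):
        for _epoch in range(epochs):
            for x, y in shard:
                w_local = state["w"]                # possibly stale read
                grad = 2.0 * (w_local * x - y) * x  # d/dw of (w*x - y)^2
                with lock:
                    # apply the update at once; no barrier between workers
                    state["w"] -= learning_rate * grad

    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["w"]
```

The lock only protects the in-place subtraction; there is no synchronization point at which workers wait for each other, which is what distinguishes this from a synchronous averaging step.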
Distributed model training is carried out with asynchronous stochastic gradient descent on the Spark computing cluster. When the cluster has fewer than 32 nodes, plain mode is used (see Fig. 4); when the cluster is larger, mesh mode is used (see Fig. 5). Both training modes encode and compress the parameter updates exchanged between nodes, which reduces inter-node traffic and effectively speeds up model training.
In plain mode, each worker node sends its quantized, encoded updates to the master node, and the master node then propagates the updates to the remaining nodes. This ensures that the master node always holds the latest version of the model. In addition, the fault-tolerance mechanism of the Spark cluster ensures reliable communication between nodes: if the Spark Master node fails, the cluster can elect a new Master node, which avoids a single point of failure.
Mesh mode organizes the nodes as a multiway tree whose root is the Spark Master. By default, each node can have at most eight child nodes, and the node tree in the Spark cluster can have up to five levels. In mesh mode, each node relays its encoded updates to all nodes connected to it, and each node aggregates the updates received from all other nodes connected to it. In mesh mode the Master node is no longer a performance bottleneck, because the traffic it receives directly is reduced.
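The defaults above imply an upper bound on cluster size, which a short calculation makes concrete. The sketch assumes that the root counts as the first of the five levels and that a cluster of exactly 32 nodes falls on the mesh side of the threshold; neither detail is stated in the source, and both function names are hypothetical.

```python
def mesh_capacity(max_children=8, max_levels=5):
    """Largest tree size when every node has `max_children` children."""
    return sum(max_children ** level for level in range(max_levels))


def choose_mode(num_nodes, threshold=32):
    """Plain mode for small clusters, mesh mode otherwise."""
    return "plain" if num_nodes < threshold else "mesh"
```

Under these assumptions a full tree holds 1 + 8 + 64 + 512 + 4096 = 4681 nodes, while the Master only ever exchanges updates directly with its eight children.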
Step S7: from the power load data and weather feature data of the previous time point, perform distributed prediction with the established LSTM network model to obtain the predicted load at the current time point.
The power load data and weather feature data of the previous moment are read from the HBase cluster, and the Spark cluster predicts the load in parallel. A reasonable number of data partitions and block size are set according to the state of the cluster so as to maximize prediction speed.
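The source does not give a rule for choosing the number of partitions. The sketch below is one possible heuristic, not the patent's method: it caps the partition count at a small multiple of the available cores (the Spark tuning guide suggests two to three tasks per CPU core) and avoids creating partitions that would hold very few records. The function name `plan_partitions` and the `min_records_per_partition` default are assumptions.

```python
import math


def plan_partitions(num_records, total_cores, tasks_per_core=2,
                    min_records_per_partition=1000):
    """Pick a partition count from data volume and cluster size (heuristic)."""
    by_cores = total_cores * tasks_per_core
    by_data = max(1, math.ceil(num_records / min_records_per_partition))
    return min(by_cores, by_data)
```

For a 16-core cluster, one million records would be spread over 32 partitions, whereas a batch of 500 records would stay in a single partition.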
Step S8: store the power load data and weather data collected in real time in the HBase cluster, and display them in a graphical interface on the Web front end.
The load and weather feature data collected in real time are stored in HBase for training the model in the future. The prediction results are written to the message queue. The predicted load and the true value are received from the message queue in real time using network programming techniques and displayed as a line chart so that the error and the trend can be observed.
The load forecasting method based on a long short-term memory network provided in the embodiments of the present invention has at least the following technical effects:
Based on big data processing technologies such as Hadoop and Spark, the method performs similar-mean interpolation using K-means clustering, feature selection combining the Pearson correlation coefficient with gradient boosting decision trees, and feature compression by principal component analysis (PCA) dimensionality reduction. These data processing operations greatly reduce the volume of data used for model training while retaining the characteristics of the original data as far as possible, which speeds up model training. Distributed load-model training is carried out on the data with the long short-term memory network, which greatly accelerates the training process compared with conventional single-machine training. By saving the trained model in a distributed manner and running it on the Spark computing cluster, distributed parallel load prediction is completed and the prediction time overhead is reduced. The present invention can complete power load forecasting efficiently and accurately in big data scenarios. While guaranteeing a certain fault tolerance and accuracy, the present invention predicts the power load of a given region at a given future moment, providing relevant departments with a reference for dispatching power resources.
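The similar-mean interpolation via K-means clustering mentioned above can be sketched as follows. This is an illustrative reading, not the patent's code: it assumes that rows are clustered on their complete weather features and that a missing load value is filled with the mean load of its cluster. The function names and the deterministic centre initialization are hypothetical choices made to keep the example small and reproducible.

```python
def kmeans(points, k, iters=20):
    """Tiny Lloyd's k-means; returns one cluster label per point."""
    k = min(k, len(points))
    # Deterministic init: spread the initial centres over the input order.
    step = max(1, len(points) // k)
    centres = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])),
            )
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def impute_load(weather, load, k=2):
    """Fill None entries in `load` with the mean load of the same weather cluster."""
    known = [v for v in load if v is not None]
    if not known:
        raise ValueError("at least one load value must be present")
    global_mean = sum(known) / len(known)
    labels = kmeans(weather, k)
    filled = list(load)
    for c in set(labels):
        vals = [v for v, lab in zip(load, labels) if lab == c and v is not None]
        # Fall back to the global mean if a cluster has no observed load.
        fill = sum(vals) / len(vals) if vals else global_mean
        for i, lab in enumerate(labels):
            if lab == c and filled[i] is None:
                filled[i] = fill
    return filled
```

Filling from the cluster mean rather than the global mean keeps imputed values consistent with days of similar weather, which is the point of clustering before interpolating.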
Finally, it should be noted that the above specific embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to examples, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications and replacements shall fall within the scope of the claims of the present invention.
Claims (10)
1. A load forecasting method based on a long short-term memory network, characterized by comprising the following steps:
Step S1, collecting power load data and corresponding weather feature data of a target area over a certain time period to form a raw data set;
Step S2, performing missing-value processing on the raw data set using a Spark cluster;
Step S3, performing feature selection on the raw data set;
Step S4, performing feature compression on the raw data set;
Step S5, establishing a prediction model based on a long short-term memory network;
Step S6, performing distributed training of the prediction model using the Spark cluster;
Step S7, based on the weather feature data and power load data of the previous time point, performing distributed prediction using the prediction model to obtain the predicted load value for the current time point.
2. The load forecasting method based on a long short-term memory network according to claim 1, characterized by further comprising:
Step S8: storing the power load data and weather feature data collected in real time in an HBase cluster, and displaying the predicted load values and the load values collected in real time.
3. The load forecasting method based on a long short-term memory network according to claim 1, characterized in that, in step S1, the raw data set formed is stored in an HBase cluster.
4. The load forecasting method based on a long short-term memory network according to claim 1, characterized in that, in step S2, the missing values are processed by performing K-means clustering using the Spark cluster and then applying a similar-mean interpolation strategy.
5. The load forecasting method based on a long short-term memory network according to claim 1, characterized in that, in step S3, the Pearson correlation coefficient between the power load data and the weather feature data is calculated, and feature importance is ranked by training a gradient boosting decision tree.
6. The load forecasting method based on a long short-term memory network according to claim 5, characterized in that feature variables whose significance value is higher than 0.05 are rejected, all variables are ranked by correlation, and the top 30% of features are taken as a first feature set; a feature-variable importance ranking is obtained after training the gradient boosting decision tree, and the top 30% of features are taken as a second feature set; and the intersection of the first feature set and the second feature set is taken as the screened features.
7. The load forecasting method based on a long short-term memory network according to claim 1, characterized in that, in step S4, feature compression is performed on the weather feature data using principal component analysis.
8. The load forecasting method based on a long short-term memory network according to claim 1, characterized in that, in step S5, parameters are updated using truncated backpropagation through time.
9. The load forecasting method based on a long short-term memory network according to claim 1, characterized in that, in step S6, data-parallel distributed training is carried out on the Spark cluster using asynchronous stochastic gradient descent.
10. The load forecasting method based on a long short-term memory network according to claim 1, characterized in that, in step S7, the power load data and weather feature data of the previous time point are obtained, the degree of parallelism is set according to the data volume and the cluster hardware information, and the load is predicted using the Spark cluster.
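The two-stage feature screening of claims 5 and 6 can be sketched as below. This is a hedged illustration under several assumptions: the p-values and gradient-boosting importances are taken as precomputed inputs (the dictionaries `p_values` and `importances` are hypothetical), the top 30% is taken from the features that survive the significance filter, and the cut-off is rounded up so that at least one feature is kept. The claims do not settle these details.

```python
from math import ceil, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_fraction(scores, fraction=0.3):
    """Names of the top `fraction` of features by score (at least one)."""
    keep = max(1, ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def select_features(features, load, p_values, importances, alpha=0.05, fraction=0.3):
    """Intersect the correlation-based and importance-based feature sets."""
    # First set: drop features with p-value above alpha, rank the rest by |r|.
    corr = {
        name: abs(pearson(col, load))
        for name, col in features.items()
        if p_values[name] <= alpha
    }
    first = top_fraction(corr, fraction) if corr else set()
    # Second set: top features by (precomputed) tree-based importance.
    second = top_fraction(importances, fraction)
    return first & second
```

Taking the intersection is conservative: a feature survives only if both the linear correlation test and the tree-based ranking consider it informative.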
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910325295.2A CN110188919A (en) | 2019-04-22 | 2019-04-22 | A kind of load forecasting method based on shot and long term memory network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910325295.2A CN110188919A (en) | 2019-04-22 | 2019-04-22 | A kind of load forecasting method based on shot and long term memory network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110188919A (en) | 2019-08-30 |
Family
ID=67714857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910325295.2A Pending CN110188919A (en) | 2019-04-22 | 2019-04-22 | A kind of load forecasting method based on shot and long term memory network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110188919A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110879802A (en) * | 2019-10-28 | 2020-03-13 | 同济大学 | Log pattern extraction and matching method |
CN110991689A (en) * | 2019-10-17 | 2020-04-10 | 国网河南省电力公司鹤壁供电公司 | Distributed photovoltaic power generation system short-term prediction method based on LSTM-Morlet model |
CN111091243A (en) * | 2019-12-13 | 2020-05-01 | 南京工程学院 | PCA-GM-based power load prediction method, system, computer-readable storage medium, and computing device |
CN111178587A (en) * | 2019-12-06 | 2020-05-19 | 广东工业大学 | Spark framework-based short-term power load rapid prediction method |
CN111311001A (en) * | 2020-02-17 | 2020-06-19 | 合肥工业大学 | Bi-LSTM network short-term load prediction method based on DBSCAN algorithm and feature selection |
CN111680074A (en) * | 2019-12-31 | 2020-09-18 | 国网浙江省电力有限公司 | Clustering algorithm-based electric power collection load leakage point feature mining method |
CN113450141A (en) * | 2021-06-09 | 2021-09-28 | 重庆锦禹云能源科技有限公司 | Intelligent prediction method and device based on electricity selling quantity characteristics of large-power customer groups |
CN113988393A (en) * | 2021-10-21 | 2022-01-28 | 青岛联众芯云科技有限公司 | Energy internet load prediction method |
CN114881343A (en) * | 2022-05-18 | 2022-08-09 | 清华大学 | Short-term load prediction method and device of power system based on feature selection |
CN114970938A (en) * | 2022-03-11 | 2022-08-30 | 武汉大学 | Self-adaptive residential load prediction method considering user privacy protection |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985965A (en) * | 2018-06-22 | 2018-12-11 | 华北电力大学 | A kind of photovoltaic power interval prediction method of combination neural network and parameter Estimation |
CN109032671A (en) * | 2018-06-25 | 2018-12-18 | 电子科技大学 | A kind of distributed deep learning method and system based on data parallel strategy |
CN109117864A (en) * | 2018-07-13 | 2019-01-01 | 华南理工大学 | Coronary heart disease risk prediction technique, model and system based on heterogeneous characteristic fusion |
CN109325624A (en) * | 2018-09-28 | 2019-02-12 | 国网福建省电力有限公司 | A kind of monthly electric power demand forecasting method based on deep learning |
CN109508811A (en) * | 2018-09-30 | 2019-03-22 | 中冶华天工程技术有限公司 | Parameter prediction method is discharged based on principal component analysis and the sewage treatment of shot and long term memory network |
CN109543203A (en) * | 2017-09-22 | 2019-03-29 | 山东建筑大学 | A kind of Building Cooling load forecasting method based on random forest |
Non-Patent Citations (1)
Title |
---|
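Liu Qichen et al.: "Short-term power load forecasting based on the Spark platform and a parallel random forest regression algorithm", Electric Power Construction (《电力建设》) * |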
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991689B (en) * | 2019-10-17 | 2022-10-04 | 国网河南省电力公司鹤壁供电公司 | Distributed photovoltaic power generation system short-term prediction method based on LSTM-Morlet model |
CN110991689A (en) * | 2019-10-17 | 2020-04-10 | 国网河南省电力公司鹤壁供电公司 | Distributed photovoltaic power generation system short-term prediction method based on LSTM-Morlet model |
CN110879802A (en) * | 2019-10-28 | 2020-03-13 | 同济大学 | Log pattern extraction and matching method |
CN111178587A (en) * | 2019-12-06 | 2020-05-19 | 广东工业大学 | Spark framework-based short-term power load rapid prediction method |
CN111178587B (en) * | 2019-12-06 | 2022-11-22 | 广东工业大学 | Spark framework-based short-term power load rapid prediction method |
CN111091243A (en) * | 2019-12-13 | 2020-05-01 | 南京工程学院 | PCA-GM-based power load prediction method, system, computer-readable storage medium, and computing device |
CN111680074A (en) * | 2019-12-31 | 2020-09-18 | 国网浙江省电力有限公司 | Clustering algorithm-based electric power collection load leakage point feature mining method |
CN111680074B (en) * | 2019-12-31 | 2023-07-04 | 国网浙江省电力有限公司 | Clustering algorithm-based power acquisition load leakage point feature mining method |
CN111311001A (en) * | 2020-02-17 | 2020-06-19 | 合肥工业大学 | Bi-LSTM network short-term load prediction method based on DBSCAN algorithm and feature selection |
CN111311001B (en) * | 2020-02-17 | 2021-11-19 | 合肥工业大学 | Bi-LSTM network short-term load prediction method based on DBSCAN algorithm and feature selection |
CN113450141A (en) * | 2021-06-09 | 2021-09-28 | 重庆锦禹云能源科技有限公司 | Intelligent prediction method and device based on electricity selling quantity characteristics of large-power customer groups |
CN113450141B (en) * | 2021-06-09 | 2023-09-01 | 重庆锦禹云能源科技有限公司 | Intelligent prediction method and device based on electricity sales quantity characteristics of large power customer group |
CN113988393A (en) * | 2021-10-21 | 2022-01-28 | 青岛联众芯云科技有限公司 | Energy internet load prediction method |
CN114970938A (en) * | 2022-03-11 | 2022-08-30 | 武汉大学 | Self-adaptive residential load prediction method considering user privacy protection |
CN114970938B (en) * | 2022-03-11 | 2024-05-07 | 武汉大学 | Self-adaptive house load prediction method considering user privacy protection |
CN114881343A (en) * | 2022-05-18 | 2022-08-09 | 清华大学 | Short-term load prediction method and device of power system based on feature selection |
CN114881343B (en) * | 2022-05-18 | 2023-11-14 | 清华大学 | Short-term load prediction method and device for power system based on feature selection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110188919A (en) | A kind of load forecasting method based on shot and long term memory network | |
CN110378799A (en) | Aluminium oxide comprehensive production index decision-making technique based on multiple dimensioned depth convolutional network | |
CN106547882A (en) | A kind of real-time processing method and system of big data of marketing in intelligent grid | |
CN105844371A (en) | Electricity customer short-term load demand forecasting method and device | |
CN110751318A (en) | IPSO-LSTM-based ultra-short-term power load prediction method | |
CN109670650A (en) | The method for solving of Cascade Reservoirs scheduling model based on multi-objective optimization algorithm | |
CN108959187A (en) | A kind of variable branch mailbox method, apparatus, terminal device and storage medium | |
CN108694470A (en) | A kind of data predication method and device based on artificial intelligence | |
CN108921324A (en) | Platform area short-term load forecasting method based on distribution transforming cluster | |
CN116187640B (en) | Power distribution network planning method and device based on grid multi-attribute image system | |
CN114239385A (en) | Intelligent decision making system and method for warehouse resource allocation | |
CN113971089A (en) | Method and device for selecting equipment nodes of federal learning system | |
CN114118569A (en) | Wind power multi-step prediction method based on multi-mode multi-task Transformer network | |
CN110059873A (en) | A kind of intelligent dispatching method towards power grid enterprises' test environment cloud resource | |
CN106980906A (en) | A kind of Ftrl voltage-prediction methods based on spark | |
CN115600729A (en) | Grid load prediction method considering multiple attributes | |
CN115965160A (en) | Data center energy consumption prediction method and device, storage medium and electronic equipment | |
CN108241864A (en) | Server performance Forecasting Methodology based on multivariable grouping | |
CN112288187A (en) | Big data-based electricity sales amount prediction method | |
CN115270921B (en) | Power load prediction method, system and storage medium based on combined prediction model | |
CN109299725A (en) | A kind of forecasting system and device based on the decomposition of tensor chain Parallel Implementation high-order dominant eigenvalue | |
Pan et al. | Application of Parallel Clustering Algorithm Based on R in Power Customer Classification | |
CN103440540B (en) | A kind of parallel method of land utilization space layout artificial immunity Optimized model | |
Liu et al. | Line loss prediction method of distribution network based on long short-term memory | |
Guo | Research on Logistics Big Data Asset Management and Data Mining Based on Particle Swarm Optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20190830 |