CN111461400B - Kmeans and T-LSTM-based load data completion method - Google Patents
- Publication number: CN111461400B
- Application number: CN202010128406.3A
- Authority: CN (China)
- Prior art keywords: data, load, day, load data, complemented
- Legal status: Active
Classifications
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
- G06F18/10—Pattern recognition: pre-processing; data cleansing
- G06F18/23213—Non-hierarchical clustering techniques with a fixed number of clusters, e.g. K-means clustering
- G06N3/044—Neural network architectures: recurrent networks, e.g. Hopfield networks
- G06N3/045—Neural network architectures: combinations of networks
- G06N3/048—Neural network architectures: activation functions
- G06N3/08—Neural networks: learning methods
- G06Q50/06—ICT specially adapted for energy or water supply
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a load data completion method based on Kmeans and T-LSTM, and relates to the field of data completion. Existing data completion methods produce large deviations from the true values and often fail to achieve the expected effect. The invention comprises the following steps: constructing a data model; training the data of the K load intervals separately to obtain K corresponding data models; collecting, at regular intervals, the load data of the day containing the data to be completed; calculating the average value of that day's load data; selecting the corresponding data model according to the average value; and inputting the load data to be completed into the selected data model to calculate the completed, complete load data. With this technical scheme, load data with similar characteristics are grouped into one class and interference from data with different characteristics is excluded, so the true load values of the missing data are reflected accurately. The method achieves accurate data completion with small error and a high convergence rate.
Description
Technical Field
The invention relates to data completion methods, and in particular to a load data completion method based on Kmeans and T-LSTM.
Background
Against the current background, the rapid development of information-industry technologies and the diversification of data-acquisition channels have caused the data volume of organizations in every industry to grow rapidly; for example, the power load data of the State Grid is already extremely large in volume and still growing quickly. Experience shows that these data contain much usable content: if the information underlying the data can be analyzed effectively and completely, extracting its latent value and applying it at the upper layers is of great interest.
However, most theoretical innovation and technical implementation in the current data-mining field assume ideal, complete data sets, whereas load data collected by real terminals are missing and incomplete for various reasons, such as terminal damage and loss of communication. Incomplete load data can distort or invalidate the results of data mining, or even lead to erroneous conclusions. Completing the missing data is therefore a particularly important, non-negligible link in the data-mining process.
Existing data completion methods include linear completion, interpolation completion, and the like. The linear completion algorithm estimates a missing value as the average of the data at the time instants immediately before and after the missing point; this method is simple, but its deviation from the true value is large and it often fails to achieve the expected effect. Moreover, many completion algorithms do not classify the historical load data, so the model is affected by abrupt changes in the load data and the error becomes too large. In addition, an LSTM (Long Short-Term Memory) network based on the time sequence completes data well when the data are continuous and the time intervals regular, but in practice the missing data are random, so completion with a plain LSTM network cannot meet the requirement.
Disclosure of Invention
The invention aims to solve the above technical problems by perfecting and improving the prior art, and provides a load data completion method based on Kmeans and T-LSTM so as to complete data accurately. To this end, the invention adopts the following technical scheme.
A load data completion method based on Kmeans and T-LSTM comprises the following steps:
1) Constructing a data model;
101) Acquiring load data in batches;
102) Randomly removing consecutive points from the load data to serve as the load data to be completed;
103) Performing Kmeans clustering on the load data;
104) Obtaining the optimal number K of classes through Kmeans clustering and dividing the total sample into K categories accordingly, each category corresponding to a different load interval, thereby obtaining K classified load intervals;
105) Calculating the load average value and normalizing the load data;
106) Determining the load interval according to the load average value, and inputting the normalized load data into the T-LSTM neural network of the corresponding load interval for training, thereby obtaining the data model of that load interval; training the data of the K load intervals separately yields K corresponding data models;
2) Collecting, at regular intervals, the load data of the day containing the data to be completed;
3) Calculating the average value of that day's load data;
4) Selecting the corresponding data model according to the average value;
5) Inputting the load data to be completed into the selected data model and calculating the completed, complete load data.
As a preferable technical means, when the data model is constructed:
in step 101), the acquired load data include, for a given unit, the load data of a given day and of the days 1 and 7 days before;
in step 102), consecutive points are randomly removed from the load data of the given day to serve as the load data to be completed;
in step 105), the average load of the given day is calculated, and the load data of the given day and of the days 1 and 7 days before are normalized.
As a preferable technical means: in step 2), in addition to the load data of the day containing the data to be completed, the load data of the day before and of the seventh day before are also collected at regular intervals;
in step 5), in addition to the data to be completed, the normalized load data of the previous day and of the seventh day before are input into the corresponding data model; the data model performs the completion according to the load data of the current day, the previous day and the seventh day before.
As a preferable technical means: in step 104), the K value used for Kmeans clustering is obtained by the elbow method.
As a preferable technical means: and when the step 1) is carried out to construct the data model, finally, a verification step is further included, the data with the missing is normalized and then is input into the corresponding data model, the historical information at the moment is supplemented, the historical data before yesterday and seven days are included, the complete sequence is finally obtained, then the complete sequence is compared with the real data to obtain an error, and after the error is converged, training is finished, and a final data model is obtained and stored.
The beneficial effects are as follows: in this technical scheme, the Kmeans method clusters the collected public-transformer load data, so load data with similar characteristics are grouped together and interference from data with different characteristics is excluded. The data of each category are then input into the T-LSTM neural network. Because the design of T-LSTM takes the missing pattern of the load data into account (some missing points are consecutive and some are not), the time interval Δt lets the network learn the interval information, so the true load values of the missing data can be reflected more accurately. The method achieves accurate data completion with small error and a high convergence rate.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a graph of the sum of squares of cluster errors versus k of the present invention.
Fig. 3 is a diagram of the LSTM network structure of the present invention.
FIG. 4 is a diagram of the structure of the T-LSTM of the present invention.
Fig. 5 is a data model training diagram of the present invention.
FIG. 6 is a test flow chart of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the attached drawings.
As shown in fig. 1, the present invention includes the steps of:
1) Constructing a data model;
101) Acquiring in batches, for a given unit, the load data of a given day and of the days 1 and 7 days before;
102) Randomly removing consecutive points from the load data to serve as the load data to be completed;
103) Performing Kmeans clustering on the load data;
104) Obtaining the optimal number K of classes through Kmeans clustering and dividing the total sample into K categories accordingly, each category corresponding to a different load interval, thereby obtaining K classified load intervals;
105) Calculating the average load of the given day, and normalizing the load data of the given day and of the days 1 and 7 days before;
106) Determining the load interval according to the load average value, and inputting the normalized load data into the T-LSTM neural network of the corresponding load interval for training, thereby obtaining the data model of that load interval; training the data of the K load intervals separately yields K corresponding data models;
2) Collecting, at regular intervals, the load data of the day containing the data to be completed, of the day before, and of the seventh day before;
3) Calculating the average value of that day's load data;
4) Selecting the corresponding data model according to the average value;
5) Inputting the load data to be completed, together with the normalized load data of the previous day and of the seventh day before, into the corresponding data model, and calculating the completed, complete load data.
The following further describes some of the steps:
Kmeans clustering: the K value is obtained by the elbow method; the clustering effect is best at the point of the error curve where the curvature is largest.
This technical scheme uses the elbow method to determine the number of clusters k. Its core idea is the following: while k is smaller than the true number of clusters, increasing k greatly improves the cohesion of each cluster, so the sum of squared clustering errors over all samples drops sharply; once k reaches the true number of clusters, the return on increasing k further diminishes rapidly, so the decrease of the sum of squared errors slows abruptly and then flattens as k keeps growing. The plot of the sum of squared clustering errors against k therefore has an elbow shape, and the k at the elbow (highest curvature) is the true number of clusters in the data; this property is used to determine K.
Because different public transformers have different power-supply characteristics, their daily load profiles each have their own shape and their absolute load values differ greatly. Cluster analysis is therefore used to classify the data and to eliminate interference between samples with different power-supply characteristics. The total sample is divided into several categories by Kmeans clustering, and each category serves as the training sample of its own data-completion network. Specifically: the 96 load values of one day for 4,000 public transformers of the Jinhua bureau, together with their daily load averages, are taken as sample features and input into the Kmeans clustering model; the plot of the sum of squared clustering errors (the sum of squared differences between the sample load values and the center-point load values) against k is shown in Fig. 2. Since the curve decreases relatively quickly before k = 3 and only gradually from 3 onwards, the number of Kmeans clusters can be taken as 3 (highest curvature).
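The elbow curve described above can be sketched with a minimal NumPy k-means on synthetic data. The three load levels, noise scale and deterministic farthest-point initialization below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def init_centers(X, k):
    # deterministic farthest-point initialization keeps the demo reproducible
    centers = [X[0]]
    for _ in range(1, k):
        d = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[int(d.argmax())])
    return np.array(centers)

def kmeans_sse(X, k, n_iter=30):
    """Run plain k-means and return the sum of squared clustering errors (SSE)."""
    centers = init_centers(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                         # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)  # recompute centers
    return float(((X - centers[labels]) ** 2).sum())

# synthetic 96-point daily load curves around three hypothetical load levels
rng = np.random.default_rng(1)
X = np.vstack([lvl + rng.normal(0, 5, (40, 96)) for lvl in (50.0, 200.0, 800.0)])

sse = {k: kmeans_sse(X, k) for k in range(1, 7)}
# SSE drops sharply until k reaches the true cluster count (3), then flattens
```

Plotting `sse` against k reproduces the elbow shape of Fig. 2: the drop from k = 2 to k = 3 is large, while further increases in k buy almost nothing.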
T-LSTM (a variant of the long short-term memory network): the T-LSTM neural network handles the completion of missing data well, because it takes into account the uncertainty of missing load data, in which several consecutive points may be missing.
LSTM was originally proposed by Hochreiter et al. and improved by Graves; it is a modified recurrent neural network designed to address the gradient-explosion and long-term-dependency problems of the native RNN, as shown in Fig. 3. The main work of LSTM is to modify the internal structure of the RNN and to control how long information is remembered by adding several gates; for example, the forget gate filters information so that useful information can be remembered for longer.
The formulas are as follows:

g_t = tanh(W_g·x_t + U_g·h_{t-1} + b_g)
i_t = σ(W_i·x_t + U_i·h_{t-1} + b_i)
f_t = σ(W_f·x_t + U_f·h_{t-1} + b_f)
o_t = σ(W_o·x_t + U_o·h_{t-1} + b_o)
c_t = f_t·c_{t-1} + i_t·g_t
h_t = o_t·tanh(c_t)

where h_t, c_t ∈ R^H, H is the hidden-layer size, σ(·) is the sigmoid function, and i, f, o, g denote the input gate, forget gate, output gate and candidate cell state, respectively. {W_g, U_g, b_g}, {W_i, U_i, b_i}, {W_f, U_f, b_f} and {W_o, U_o, b_o} are the network parameters of the corresponding parts. More specifically, the input gate i adjusts how much of the new input enters the cell, the forget gate f adjusts how much of the history is forgotten, and the output gate o weights the different parts when computing the output.
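The gate equations above can be traced in a few lines of NumPy. This is a forward pass only, with small random parameters for illustration (not the patent's trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One forward step of the LSTM cell defined by the equations above."""
    g = np.tanh(P["Wg"] @ x_t + P["Ug"] @ h_prev + P["bg"])   # candidate cell state
    i = sigmoid(P["Wi"] @ x_t + P["Ui"] @ h_prev + P["bi"])   # input gate
    f = sigmoid(P["Wf"] @ x_t + P["Uf"] @ h_prev + P["bf"])   # forget gate
    o = sigmoid(P["Wo"] @ x_t + P["Uo"] @ h_prev + P["bo"])   # output gate
    c = f * c_prev + i * g                                    # new cell state
    h = o * np.tanh(c)                                        # new hidden state
    return h, c

# tiny demo: hidden size H=4, scalar input, random illustrative parameters
rng = np.random.default_rng(0)
H, D = 4, 1
P = {k + n: rng.normal(0.0, 0.1, dims) for n in "gifo"
     for k, dims in (("W", (H, D)), ("U", (H, H)), ("b", (H,)))}
h = c = np.zeros(H)
for x in (0.3, 0.5, 0.2):          # a short normalized load sequence
    h, c = lstm_step(np.array([x]), h, c, P)
```

Note that h is always bounded in (-1, 1) because it is a product of a sigmoid output and a tanh output.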
However, for data with missing values the input is discontinuous and the time intervals are irregular, which a plain LSTM network cannot handle well; this technical scheme therefore adopts a T-LSTM network that takes the time interval into account, as shown in Fig. 4: Δt is added to the input, with the other parameters unchanged, so that the network learns the interval information.
The portion in which T-LSTM improves on LSTM is as follows:

g(Δt) = 1 / log(e + Δt)

h_t = o_t·tanh(c_t)

where Δt is the time interval of the current input and g(Δt) is an elapsed-time weight that discounts the memory carried over from the previous step. The definitions of the input gate, output gate and forget gate are identical to those of LSTM; only the update of the cell state differs. Compared with LSTM, T-LSTM considers not only the value of the current input but also its time interval, which solves the problem of inconsistent intervals in a time series with missing values. Each T-LSTM cell takes the cell state c_{t-1} and hidden state h_{t-1} of the previous time instant, the current input value x_t and the time interval Δt, produces the cell state c_t and hidden state h_t of this cell, and passes them to the next T-LSTM cell.
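A sketch of one T-LSTM step follows. It uses a simplified variant in which the whole previous cell state is discounted by g(Δt); the published T-LSTM decays only a learned short-term component of the memory, a detail the patent text leaves to Fig. 4. The parameters and inputs are random illustrative values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def time_decay(dt):
    # g(Δt) = 1 / log(e + Δt): close to 1 for small gaps, shrinking as the gap grows
    return 1.0 / np.log(np.e + dt)

def tlstm_step(x_t, dt, h_prev, c_prev, P):
    c_star = time_decay(dt) * c_prev   # discount carried-over memory by elapsed time
    g = np.tanh(P["Wg"] @ x_t + P["Ug"] @ h_prev + P["bg"])
    i = sigmoid(P["Wi"] @ x_t + P["Ui"] @ h_prev + P["bi"])
    f = sigmoid(P["Wf"] @ x_t + P["Uf"] @ h_prev + P["bf"])
    o = sigmoid(P["Wo"] @ x_t + P["Uo"] @ h_prev + P["bo"])
    c = f * c_star + i * g             # standard update, applied to the decayed state
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 1
P = {k + n: rng.normal(0.0, 0.5, dims) for n in "gifo"
     for k, dims in (("W", (H, D)), ("U", (H, H)), ("b", (H,)))}

def run(gaps, xs=(0.8, 0.6, 0.7)):
    h = c = np.zeros(H)
    for x, dt in zip(xs, gaps):
        h, c = tlstm_step(np.array([x]), dt, h, c, P)
    return h

h_dense = run([1, 1, 1])     # the same values arriving at regular intervals
h_gappy = run([1, 20, 20])   # the same values arriving after long gaps
```

The two final hidden states differ even though the input values are identical, which is exactly the interval-awareness a plain LSTM lacks.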
Calculating the average value of the load data to be completed: this determines which load-interval class the data to be completed belong to.
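A minimal sketch of this lookup follows. The interval boundaries here are hypothetical; in practice they come from the Kmeans clustering step:

```python
import numpy as np

def pick_interval(day_load, bounds):
    """Map the mean of the observed daily load to a load-interval index."""
    day_mean = float(np.nanmean(day_load))   # NaN marks the points to be completed
    for idx, (lo, hi) in enumerate(bounds):
        if lo <= day_mean < hi:
            return idx
    return len(bounds) - 1

# hypothetical K=3 load intervals (kW) produced by clustering
bounds = [(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))]
idx = pick_interval(np.array([50.0, np.nan, 55.0]), bounds)  # → 0
```

Using `nanmean` means the missing points themselves do not distort the day average used to choose the model.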
Training a data model:
as shown in fig. 5, in order to improve accuracy, in step 1), when the data model is constructed, a verification step is finally included, the data with the missing is normalized and then input into the corresponding data model, and the historical information of this moment is supplemented, including the historical data before yesterday and seven days, so as to finally obtain a complete sequence, then the complete sequence is compared with the real data to obtain an error, and when the error converges, training is finished, and a final data model is obtained and stored. The complete model training process comprises the following steps: extracting data as a training data set, performing kmeans clustering to obtain n kinds of load data and load intervals after data processing, normalizing the data with the defects, then encoding by using T-LSTM to obtain a Temporal context, then inputting the Temporal context into a decoder taking the LSTM as a unit, assisting with the historical information of the moment, including the historical data before yesterday and seven days, finally obtaining a complete decoded sequence, comparing the complete decoded sequence with real data to obtain errors, and after error convergence, finishing training to obtain K models and storing.
The model training is described below using the Jinhua bureau data as an example:
1. Public-transformer load data of the Jinhua bureau were prepared: 5,000 public transformers over the period from November 2018 to May 2019, 8 months in total.
2. The training data set is processed: consecutive missing points are dug out as the data to be completed, and the average load of each day to be completed is obtained.
3. The processed training data are input into Kmeans for clustering, yielding K classes.
4. The data of each of the K classes, augmented with the load data of 1 day and 7 days before, are normalized and input into the T-LSTM network for encoding, yielding a sample context.
5. The sample context is input into the LSTM decoder, and the output is compared with the real data to obtain the error.
6. If the error has not converged, training continues.
7. Once the error converges, training is finished; the K models are obtained and stored.
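Step 2 of the list above, digging out a run of consecutive points and normalizing the day's curve, can be sketched like this. Min-max normalization is an assumption; the patent does not spell out the normalization formula:

```python
import numpy as np

def mask_consecutive(day_load, width=5, seed=0):
    """Remove `width` consecutive points; NaN marks the gap to be completed."""
    rng = np.random.default_rng(seed)
    start = int(rng.integers(0, len(day_load) - width + 1))
    masked = day_load.astype(float).copy()
    masked[start:start + width] = np.nan
    return masked, start

def minmax_normalize(day_load):
    """Scale the day's curve to [0, 1] using its observed (non-NaN) points."""
    obs = day_load[~np.isnan(day_load)]
    lo, hi = obs.min(), obs.max()
    return (day_load - lo) / (hi - lo)

day = 100.0 + 30.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 96))  # synthetic 96-point day
masked, start = mask_consecutive(day)
norm = minmax_normalize(masked)
```

Keeping the gap as NaN (rather than zero) makes it easy to distinguish "missing" from "genuinely low load" in the later stages of the pipeline.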
On the basis of the K models obtained, the data-completion flow is described below, again using the Jinhua bureau data as an example:
The data set consists of 221 days of data from the Jinhua bureau starting in November 2018, covering 174 users with 96 load points per day. About 1% of the points are removed manually (approximating the real data-loss rate), always in runs of 5 consecutive points, which is close to real-world loss.
The specific steps are as follows:
1. The data were prepared: 221 days from November 2018 from the Jinhua bureau, with 174 public-transformer users and 96 load points per day.
2. The data to be completed and the load data of the previous day and of 7 days before were collected in batches.
3. Runs of 5 consecutive points to be completed were manually dug out for verification, and the average of each day's data to be completed was calculated.
4. The load-interval class to which the load average belongs was determined.
5. The data were normalized.
6. The missing values, together with the historical information of the same time instants (the historical data of the previous day and of the seventh day before), were input into the trained model, finally yielding the completed load data.
The mean absolute error and mean absolute percentage error of the test data are obtained by comparing the completed load data with the original data; the results are shown in Table 1. The left column shows the results of the present method and the right column those of the linear model (which fills a missing value with the average of the points immediately before and after it); mae is the mean absolute error and mape is the mean absolute percentage error. The method outperforms the linear model, with a percentage error of about 10% when the load values are relatively large.
Table 1: test results
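The two error metrics, together with the linear baseline described above (filling the gap with the average of the points just before and after it), can be sketched as follows. The synthetic day is illustrative, not the Jinhua data:

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def linear_fill(series, start, width):
    """Baseline: fill the gap with the average of its two neighbouring points."""
    filled = series.copy()
    filled[start:start + width] = (series[start - 1] + series[start + width]) / 2.0
    return filled

day = 100.0 + 30.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 96))  # synthetic true curve
start, width = 40, 5
filled = linear_fill(day, start, width)
gap_mae = mae(day[start:start + width], filled[start:start + width])
gap_mape = mape(day[start:start + width], filled[start:start + width])
```

Evaluating the metrics only over the masked positions, as done here, matches how a completion method is normally scored: the untouched points would otherwise dilute the error.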
The load data completion method based on Kmeans and T-LSTM shown in Figs. 1-6 is a specific embodiment of the present invention that already exhibits its essential features and improvements; it may be modified in shape, structure and other respects according to practical requirements under the teaching of the present invention, and all such modifications fall within the scope of protection of the present invention.
Claims (5)
1. A load data completion method based on Kmeans and T-LSTM, characterized by comprising the following steps:
1) Constructing a data model;
101) Acquiring load data in batches;
102) Randomly removing consecutive points from the load data to serve as the load data to be completed;
103) Performing Kmeans clustering on the load data;
104) Obtaining the optimal number K of classes through Kmeans clustering and dividing the total sample into K categories accordingly, each category corresponding to a different load interval, thereby obtaining K classified load intervals;
105) Calculating the load average value and normalizing the load data;
106) Determining the load interval according to the load average value, and inputting the normalized load data into the T-LSTM neural network of the corresponding load interval for training, thereby obtaining the data model of that load interval; training the data of the K load intervals separately yields K corresponding data models;
2) Collecting, at regular intervals, the load data of the day containing the data to be completed;
3) Calculating the average value of that day's load data;
4) Selecting the corresponding data model according to the average value;
5) Inputting the load data to be completed into the selected data model and calculating the completed, complete load data.
2. The Kmeans and T-LSTM based load data completion method of claim 1, characterized in that, when the data model is constructed:
in step 101), the acquired load data include, for a given unit, the load data of a given day and of the days 1 and 7 days before;
in step 102), consecutive points are randomly removed from the load data of the given day to serve as the load data to be completed;
in step 105), the average load of the given day is calculated, and the load data of the given day and of the days 1 and 7 days before are normalized.
3. The Kmeans and T-LSTM based load data completion method of claim 2, characterized in that: in step 2), in addition to the load data of the day containing the data to be completed, the load data of the day before and of the seventh day before are also collected at regular intervals;
in step 5), in addition to the data to be completed, the normalized load data of the previous day and of the seventh day before are input into the corresponding data model; the data model performs the completion according to the load data of the current day, the previous day and the seventh day before.
4. The Kmeans and T-LSTM based load data completion method of claim 3, characterized in that: in step 104), the K value used for Kmeans clustering is obtained by the elbow method.
5. The Kmeans and T-LSTM based load data completion method of claim 2, characterized in that: when the data model is constructed in step 1), a final verification step is also included: the data containing the missing values are normalized and input into the corresponding data model, supplemented with the historical information of the same time instants, including the historical data of the previous day and of seven days before, to finally obtain a complete sequence; the complete sequence is then compared with the real data to obtain an error, and once the error has converged, training is finished and the final data model is obtained and stored.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010128406.3A | 2020-02-28 | 2020-02-28 | Kmeans and T-LSTM-based load data completion method
Publications (2)
Publication Number | Publication Date |
---|---|
CN111461400A CN111461400A (en) | 2020-07-28 |
CN111461400B true CN111461400B (en) | 2023-06-23 |
Family
ID=71682448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010128406.3A Active CN111461400B (en) | 2020-02-28 | 2020-02-28 | Kmeans and T-LSTM-based load data completion method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111461400B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107833153A (en) * | 2017-12-06 | 2018-03-23 | 广州供电局有限公司 | A kind of network load missing data complementing method based on k means clusters |
CN109598381A (en) * | 2018-12-05 | 2019-04-09 | 武汉理工大学 | A kind of Short-time Traffic Flow Forecasting Methods based on state frequency Memory Neural Networks |
CN109754113A (en) * | 2018-11-29 | 2019-05-14 | 南京邮电大学 | Load forecasting method based on dynamic time warping Yu length time memory |
CN109934375A (en) * | 2018-11-27 | 2019-06-25 | 电子科技大学中山学院 | Power load prediction method |
CN110245801A (en) * | 2019-06-19 | 2019-09-17 | 中国电力科学研究院有限公司 | A kind of Methods of electric load forecasting and system based on combination mining model |
CN110334726A (en) * | 2019-04-24 | 2019-10-15 | 华北电力大学 | A kind of identification of the electric load abnormal data based on Density Clustering and LSTM and restorative procedure |
CN110674999A (en) * | 2019-10-08 | 2020-01-10 | 国网河南省电力公司电力科学研究院 | Cell load prediction method based on improved clustering and long-short term memory deep learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190143517A1 (en) * | 2017-11-14 | 2019-05-16 | Arizona Board Of Regents On Behalf Of Arizona State University | Systems and methods for collision-free trajectory planning in human-robot interaction through hand movement prediction from vision |
2020-02-28: CN202010128406.3A, patent CN111461400B (CN), status Active
Non-Patent Citations (2)
Title |
---|
T-LSTM: A Long Short-Term Memory Neural Network Enhanced by Temporal Information for Traffic Flow Prediction;LUNTIAN MOU;IEEE ACCESS;98053-98061 * |
Location prediction model based on ST-LSTM network;XU Fangfang;Computer Engineering;1-7 *
Also Published As
Publication number | Publication date |
---|---|
CN111461400A (en) | 2020-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111738512B (en) | Short-term power load prediction method based on CNN-IPSO-GRU hybrid model | |
CN107315884B (en) | Building energy consumption modeling method based on linear regression | |
CN106022521B (en) | Short-term load prediction method of distributed BP neural network based on Hadoop architecture | |
WO2018045642A1 (en) | A bus bar load forecasting method | |
CN105488528B (en) | Neural network image classification method based on improving expert inquiry method | |
CN111563706A (en) | Multivariable logistics freight volume prediction method based on LSTM network | |
CN111814956B (en) | Multi-task learning air quality prediction method based on multi-dimensional secondary feature extraction | |
CN111178611B (en) | Method for predicting daily electric quantity | |
CN106022954B (en) | Multiple BP neural network load prediction method based on grey correlation degree | |
CN110674999A (en) | Cell load prediction method based on improved clustering and long-short term memory deep learning | |
CN106251001A (en) | A kind of based on the photovoltaic power Forecasting Methodology improving fuzzy clustering algorithm | |
CN111008726B (en) | Class picture conversion method in power load prediction | |
CN114065653A (en) | Construction method of power load prediction model and power load prediction method | |
CN114528949A (en) | Parameter optimization-based electric energy metering abnormal data identification and compensation method | |
CN112241836B (en) | Virtual load leading parameter identification method based on incremental learning | |
CN111353603A (en) | Deep learning model individual prediction interpretation method | |
CN109214444B (en) | Game anti-addiction determination system and method based on twin neural network and GMM | |
CN116542701A (en) | Carbon price prediction method and system based on CNN-LSTM combination model | |
CN113627594B (en) | One-dimensional time sequence data augmentation method based on WGAN | |
CN112766537B (en) | Short-term electric load prediction method | |
CN113762591A (en) | Short-term electric quantity prediction method and system based on GRU and multi-core SVM counterstudy | |
CN111783688B (en) | Remote sensing image scene classification method based on convolutional neural network | |
CN111461400B (en) | Kmeans and T-LSTM-based load data completion method | |
CN111311025B (en) | Load prediction method based on meteorological similar days | |
CN110288002B (en) | Image classification method based on sparse orthogonal neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||