CN111027841A - Low-voltage transformer area line loss calculation method based on gradient boosting decision tree

Publication number
CN111027841A
Authority
CN
China
Prior art keywords
line loss
low
voltage transformer
transformer area
decision tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911228303.8A
Other languages
Chinese (zh)
Inventor
祝云
姚梦婷
韦化
李滨
张驰
何鹏辉
伍文侠
徐泽天
陆世豪
甘莲琼
陈家腾
梁峻超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gxu Energy Co ltd
Guangxi University
Original Assignee
Gxu Energy Co ltd
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gxu Energy Co ltd and Guangxi University
Priority to CN201911228303.8A
Publication of CN111027841A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a method for calculating line loss in low-voltage transformer areas based on a gradient boosting decision tree (GBDT). The method comprises: preprocessing low-voltage transformer area data; extracting electrical characteristic indexes and establishing a characteristic index system for the low-voltage transformer area; classifying the low-voltage transformer areas; and establishing a GBDT line loss prediction model, predicting the line loss rate of the low-voltage transformer area, and performing error analysis on the prediction result. By mining low-voltage transformer area line loss data, the method reveals the nonlinear relationship between the electrical characteristic indexes of a transformer area and its line loss rate, performs error analysis and anomaly identification on the line loss results, and provides decision support for rapid evaluation, anomaly identification, and loss reduction planning, thereby effectively improving the standardization and fine-grained management of low-voltage transformer area line loss.

Description

Low-voltage transformer area line loss calculation method based on gradient boosting decision tree
Technical Field
The invention relates to the technical field of power distribution network line loss calculation, and in particular to a low-voltage transformer area line loss calculation method based on a gradient boosting decision tree.
Background
Line loss is the electric energy lost in the course of power supply and sale in the grid, and it is an important economic index for assessing grid operating departments. Theoretical line loss calculation makes it possible to assess the operating economy and structural soundness of the grid, analyze the causes of power loss, evaluate loss reduction measures, and provide a basis for grid planning and retrofitting.
At present there are many methods for calculating line loss, mainly traditional methods, power flow methods, and intelligent algorithms. Traditional methods include the average current method, the voltage loss method, and the equivalent resistance method; they all simplify the grid structure through a series of assumptions, so their calculation accuracy is low. Methods based on power flow calculation give more accurate results, but they require a large amount of operating data and structural parameters and therefore a heavy investment of manpower and material resources. Regression analysis uses a simple model and suits rapid line loss calculation where precision requirements are low, but determining the regression equation requires a large amount of data, the model cannot fit the complex nonlinear relationship between line loss and the characteristic indexes, and the prediction accuracy is low. In recent years intelligent algorithms have been widely applied in power systems. The most representative is line loss calculation based on artificial neural networks, which needs no explicit mathematical model and can fit arbitrarily complex functions, but cannot avoid drawbacks such as overfitting and local optima.
Meanwhile, low-voltage transformer areas are numerous, weakly managed, and poor in data quality, which increases the workload of line loss calculation and lowers its efficiency. Traditional transformer area line loss management usually sets line loss rate targets manually and lacks a scientific basis. Grid structures differ widely, and even when the nature and proportion of electricity consumption fall within a certain range, electricity consumption and load rate still vary greatly. These factors seriously affect the quality and level of fine-grained line loss management in low-voltage transformer areas.
Therefore, how to provide a fast and accurate method for calculating transformer area line loss is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of the above, the invention provides a low-voltage transformer area line loss calculation method based on a gradient boosting decision tree. It applies data mining and machine learning algorithms to the prediction of the low-voltage transformer area line loss rate, identifies line loss anomalies by analyzing the prediction results, and addresses the low accuracy of current transformer area line loss calculation.
To achieve this purpose, the invention adopts the following technical scheme:
A low-voltage transformer area line loss calculation method based on a gradient boosting decision tree comprises the following steps:
preprocessing the low-voltage transformer area data;
extracting electrical characteristic indexes and establishing a low-voltage transformer area characteristic index system;
classifying the low-voltage transformer areas;
establishing a GBDT line loss prediction model, predicting the line loss rate of the low-voltage transformer area, and performing error analysis on the prediction result.
Further, preprocessing the low-voltage transformer area data specifically comprises the following steps:
filling missing values with the mean; detecting abnormal points (outliers) with the DBSCAN clustering algorithm; if there are few abnormal points, deleting them directly, and otherwise replacing them with the mean;
extracting the characteristic data and standardizing it.
Outlier detection depends on the scan radius e, the minimum number of samples in the neighborhood MinPts, and the chosen distance metric. The invention uses the Euclidean distance, given by:
$$\rho = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
where $\rho$ is the Euclidean distance between the points $(x_2, y_2)$ and $(x_1, y_1)$.
Further, the feature data is normalized, and the conversion function is as follows:
z=(x-μ)/σ
where μ is the mean of the raw data and σ is the standard deviation.
Further, establishing the low-voltage transformer area characteristic index system specifically comprises the following steps:
preliminarily selecting a number of characteristic indexes that reflect the grid structure and the load characteristics;
evaluating the importance of the characteristic indexes using the GBDT model and the Spearman correlation coefficient together;
forming several feature sets from different numbers of characteristic indexes, inputting each feature set into the GBDT model, and calculating the standard deviation value obtained with each feature set. The GBDT model used here is not the final model; the standard deviation values of the resulting models are compared, and the feature set with the smallest standard deviation value is selected as the final low-voltage transformer area characteristic index system.
Further, the Spearman correlation coefficient is an index that measures the dependence between two variables, and it is calculated as follows:
$$S = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the rank difference of the $i$-th pair of observations of the two variables, $n$ is the number of samples, and $S$ takes values in $[-1, 1]$. Negative values indicate negative correlation, positive values indicate positive correlation, and a larger absolute value indicates a stronger correlation.
Further, in the GBDT model, the global importance of feature $j$ is measured by the average of its importance over the individual trees.
The global importance of feature $j$ is calculated as follows:
$$\hat{J}_j^2 = \frac{1}{M}\sum_{m=1}^{M} \hat{J}_j^2(T_m)$$
where $M$ is the number of trees and $\{T_m\}$ is the set of $M$ decision trees.
The importance of feature $j$ in a single tree $T$ is:
$$\hat{J}_j^2(T) = \sum_{t=1}^{L-1} \hat{i}_t^2 \, I(v_t = j)$$
where $L$ is the number of leaf nodes of the tree, $L-1$ is the number of non-leaf nodes, $v_t$ is the feature associated with node $t$, and $\hat{i}_t^2$ is the reduction in squared loss after node $t$ is split. The feature importance takes values in $[0, 100]$.
Further, the standard deviation (MSE) is calculated as follows:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$
where $m$ is the number of samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value. The smaller the standard deviation, the more accurate the model prediction.
Further, the low-voltage transformer areas are classified as follows:
let the set of transformer area sample points be $C = \{(X_1, y_1), (X_2, y_2), \dots, (X_n, y_n)\}$, where each variable is $X_i = (x_{i1}, x_{i2}, \dots, x_{im})$;
input the transformer area data set, and set the scan radius e and the minimum number of samples in the neighborhood MinPts;
calculate the standardized Euclidean distance between every pair of sample points:
$$d(X_h, X_i) = \sqrt{\sum_{k=1}^{n} \frac{(x_{hk} - x_{ik})^2}{S_k}}$$
where $n$ is the dimension of the space, $x_{hk}$ and $x_{ik}$ are the $k$-th components of the two sample points, and $S_k$ is the corresponding variance;
construct the core object sample set: for each sample point $X_i$, find all sample points within its neighborhood of radius e, and if their number exceeds MinPts, add $X_i$ to the core object sample set;
merge the objects in the core object sample set that are directly density-reachable: randomly select a core object as a new cluster, search for its density-reachable points through the core object list and add them to the new cluster, then continue searching from those points until the cluster no longer changes;
repeat the search-and-add process starting from the core objects outside the finished clusters until all core objects have been assigned to clusters, at which point the cluster set is:
$$H = \{M_1, M_2, \dots, M_K\}$$
where $K$ is the number of clusters.
Further, a GBDT line loss prediction model is established to predict the line loss rate of the low-voltage transformer area, as follows:
construct the GBDT line loss prediction model, where $X = (x_1, x_2, \dots, x_m)$ is the input transformer area feature vector, $m$ is the number of transformer areas, and $f_i(X)$ is the prediction of the $i$-th decision tree. The predicted line loss of the low-voltage transformer area is calculated as:
$$F(X) = f_1(X) + f_2(X) + \dots + f_n(X)$$
where $n$ is the total number of samples, the number of iterations is $M$, and the loss function is $L(y, f(X))$.
Initialization: find the constant $c$ that minimizes the loss function $L$ and set $f_0(X) = c$.
When the $m$-th decision tree is constructed, the following steps are executed in a loop:
a) calculate the negative gradient $r_{mi}$ of the loss function at the current prediction model as an estimate of the residual:
$$r_{mi} = -\left[\frac{\partial L(y_i, f(X_i))}{\partial f(X_i)}\right]_{f = f_{m-1}}$$
b) estimate the leaf node regions $R_{ms}$ ($s = 1, 2, \dots, S$) so as to fit the residual estimates;
c) estimate the value $c_{ms}$ of each leaf node region by line search so as to minimize the loss function:
$$c_{ms} = \arg\min_{c} \sum_{X_i \in R_{ms}} L\left(y_i, f_{m-1}(X_i) + c\right)$$
where $c$ is the constant that minimizes the loss function $L$ and $f_{m-1}(X)$ is the prediction of the model from the previous iteration.
d) update the GBDT line loss prediction model:
$$f_m(X) = f_{m-1}(X) + \upsilon \sum_{s=1}^{S} c_{ms}\, I(X \in R_{ms})$$
where $\upsilon$ is the learning rate and $I(\cdot)$ is the indicator function, which equals 1 if $X$ falls into $R_{ms}$ and 0 otherwise.
The data in the characteristic index system are then input into the GBDT line loss prediction model, which outputs the line loss prediction result.
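The boosting loop in steps a) to d) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the loss is taken to be the squared loss $L = \frac{1}{2}(y - f)^2$, for which the negative gradient is simply the residual and the optimal leaf value is the mean residual in the leaf; each tree is a depth-1 regression stump; and the names `fit_stump`, `fit_gbdt`, `predict`, and the toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the (feature, threshold) split that minimizes the squared error.

    Assumes at least one feature takes two or more distinct values.
    """
    best = None
    for j in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold,
        # so both sides of the split are always non-empty.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            c_left = sum(left) / len(left)      # leaf value for squared loss
            c_right = sum(right) / len(right)
            sse = (sum((r - c_left) ** 2 for r in left)
                   + sum((r - c_right) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, c_left, c_right)
    return best[1:]


def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss and depth-1 regression trees."""
    f0 = sum(y) / len(y)                  # initialization: constant minimizing L
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # a) negative gradient of the squared loss is the residual
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        # b), c) leaf regions and their loss-minimizing values
        j, threshold, c_left, c_right = fit_stump(X, residuals)
        stumps.append((j, threshold, c_left, c_right))
        # d) update the model, shrunk by the learning rate
        preds = [p + learning_rate * (c_left if row[j] <= threshold else c_right)
                 for row, p in zip(X, preds)]
    return f0, stumps, learning_rate


def predict(model, row):
    f0, stumps, learning_rate = model
    return f0 + sum(learning_rate * (cl if row[j] <= t else cr)
                    for j, t, cl, cr in stumps)


# Toy data: one hypothetical feature and a line loss rate in percent.
X_train = [[1.0], [2.0], [3.0], [4.0]]
y_train = [2.0, 2.0, 6.0, 6.0]
model = fit_gbdt(X_train, y_train, n_trees=20, learning_rate=0.5)
```

With a learning rate below 1, each tree corrects only part of the remaining residual, which is why several iterations are needed even on this trivially separable toy data.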
Further, the error analysis of the line loss prediction result proceeds as follows:
a transformer area whose measured line loss rate lies in the interval (0, 8%) and whose relative error lies between -5% and 5% is preliminarily judged to be qualified;
a transformer area whose measured value exceeds 8% and whose relative error lies between -5% and 5% is preliminarily judged to be a heavy-loss area. Such areas need close attention: they may suffer from small wire diameter, light load, or long transmission distance, and they have large loss reduction potential.
Specifically, a transformer area is marked as abnormal in three cases: a negative value, a missing value, or an excessively large line loss. For abnormal values, a reasonable value is given as a reference; in addition, the relevant departments should check whether the measured values are normal, determine the source of the error, and formulate reasonable loss reduction measures.
According to the above technical scheme, compared with the prior art, the invention provides a low-voltage transformer area line loss calculation method based on a gradient boosting decision tree. By mining low-voltage transformer area line loss data, the method reveals the nonlinear relationship between the electrical characteristic indexes of a transformer area and its line loss rate, performs error analysis and anomaly identification on the line loss results, and provides decision support for rapid evaluation, anomaly identification, and loss reduction planning, thereby effectively improving the standardization and fine-grained management of low-voltage transformer area line loss.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flow chart of the low-voltage transformer area line loss calculation method based on a gradient boosting decision tree according to the present invention.
Fig. 2 is a graph of the feature contributions given by the Spearman correlation coefficient in an embodiment of the present invention.
Fig. 3 is a diagram of the GBDT relative feature importance statistics in an embodiment of the present invention.
Fig. 4 is a line graph of the standard deviation values for different numbers of features in an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of the GBDT line loss prediction model in an embodiment of the present invention.
Fig. 6 is a schematic diagram of the establishment of the gradient boosting decision tree in an embodiment of the present invention.
Fig. 7 is a diagram illustrating the establishment of a regression tree in an embodiment of the present invention.
Fig. 8 is a graph of the line loss prediction in an embodiment of the present invention.
Fig. 9 is a graph of the predicted relative error in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, the embodiment of the invention discloses a low-voltage transformer area line loss calculation method based on a gradient boosting decision tree, which comprises the following steps:
S1, preprocessing the low-voltage transformer area data;
S2, extracting electrical characteristic indexes and establishing a low-voltage transformer area characteristic index system;
S3, classifying the low-voltage transformer areas;
S4, establishing a GBDT line loss prediction model, predicting the line loss rate of the low-voltage transformer area, and performing error analysis on the prediction result.
In a specific embodiment, preprocessing the low-voltage transformer area data specifically includes the following steps:
(1) filling missing values with the mean; detecting abnormal points (outliers) with the DBSCAN clustering algorithm; if there are few abnormal points, deleting them directly, and otherwise replacing them with the mean;
(2) outlier detection depends on the scan radius e, the minimum number of samples in the neighborhood MinPts, and the chosen distance metric. The invention uses the Euclidean distance, given by:
$$\rho = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
where $\rho$ is the Euclidean distance between the points $(x_2, y_2)$ and $(x_1, y_1)$.
(3) extracting the characteristic data and standardizing it.
Specifically, the feature data is normalized, and the conversion function is as follows:
z=(x-μ)/σ
where μ is the mean of the raw data and σ is the standard deviation.
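The mean filling and the standardization $z = (x - \mu)/\sigma$ described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the use of `None` to mark a missing value, the choice of the population standard deviation, and the sample column are assumptions made for the example.

```python
from statistics import mean, pstdev


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    return [mu if v is None else v for v in values]


def standardize(values):
    """Apply z = (x - mu) / sigma to every entry of a feature column."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column with one missing reading.
supply = [120.0, None, 80.0, 100.0]
filled = fill_missing_with_mean(supply)
z = standardize(filled)
```

After standardization every feature has zero mean and unit standard deviation, so features measured in different units contribute comparably to the distance calculations used in clustering.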
In a specific embodiment, establishing the low-voltage transformer area characteristic index system specifically comprises the following steps:
(1) preliminarily selecting a number of characteristic indexes that reflect the grid structure and the load characteristics;
(2) evaluating the importance of the characteristic indexes using the GBDT feature importance and the Spearman correlation coefficient together; the GBDT feature importance mentioned here is used to select the features of the GBDT model;
(3) forming several feature sets from different numbers of characteristic indexes, inputting each feature set into the GBDT model, and calculating the standard deviation value obtained with each feature set;
(4) selecting the feature set with the smallest standard deviation value as the final low-voltage transformer area characteristic index system.
In statistics, the Spearman rank correlation coefficient, named after Charles Spearman and often denoted by the Greek letter ρ, is a non-parametric index that measures the dependence between two variables. It assesses how well the relationship between two statistical variables can be described by a monotonic function. If the data contain no repeated values and the two variables are perfectly monotonically related, the Spearman correlation coefficient is +1 or -1.
In the present embodiment, the Spearman correlation coefficient is calculated as follows:
$$S = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the rank difference of the $i$-th pair of observations of the two variables, $n$ is the number of samples, and $S$ takes values in $[-1, 1]$. Negative values indicate negative correlation, positive values indicate positive correlation, and a larger absolute value indicates a stronger correlation.
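As an illustration, the rank-difference form of the Spearman coefficient, $1 - 6\sum d_i^2 / (n(n^2-1))$, can be computed as in the sketch below. The function names and the sample values are hypothetical, and the sketch assumes there are no tied values, which is the case in which this closed form is exact.

```python
def ranks(values):
    """Return the rank (1..n) of each entry; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman coefficient from the squared rank differences d_i."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical example: line loss rate falls as power supply amount rises.
supply = [50.0, 80.0, 120.0, 200.0, 310.0]
loss_rate = [6.1, 5.2, 4.8, 3.9, 3.0]
s = spearman(supply, loss_rate)  # perfectly monotone decreasing, so s is -1.0
```

Because only ranks enter the formula, the coefficient is unchanged by any monotonic rescaling of a feature, which makes it a useful complement to the model-based GBDT importance.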
In a specific embodiment, in the GBDT model, the global importance of feature $j$ is measured by the average of its importance over the individual trees.
The global importance of feature $j$ is calculated as follows:
$$\hat{J}_j^2 = \frac{1}{M}\sum_{m=1}^{M} \hat{J}_j^2(T_m)$$
where $M$ is the number of trees and $\{T_m\}$ is the set of $M$ decision trees.
The importance of feature $j$ in a single tree $T$ is:
$$\hat{J}_j^2(T) = \sum_{t=1}^{L-1} \hat{i}_t^2 \, I(v_t = j)$$
where $L$ is the number of leaf nodes of the tree, $L-1$ is the number of non-leaf nodes, $v_t$ is the feature associated with node $t$, and $\hat{i}_t^2$ is the reduction in squared loss after node $t$ is split. The feature importance takes values in $[0, 100]$.
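The averaging of per-tree importances can be illustrated with a short sketch. Each tree is represented here simply as a list of `(feature_index, squared_loss_reduction)` pairs, one per non-leaf node; this representation, the function names, and the numbers are hypothetical. Scaling the result so that the most important feature scores 100 is an assumption about how the stated range [0, 100] is obtained.

```python
def tree_importance(splits, n_features):
    """Sum the squared-loss reductions of the nodes that split on each feature."""
    importance = [0.0] * n_features
    for feature, gain in splits:
        importance[feature] += gain
    return importance


def global_importance(trees, n_features):
    """Average the per-tree importances, then scale so the maximum is 100."""
    totals = [0.0] * n_features
    for splits in trees:
        for j, value in enumerate(tree_importance(splits, n_features)):
            totals[j] += value
    averaged = [value / len(trees) for value in totals]
    top = max(averaged)
    return [100.0 * value / top if top > 0 else 0.0 for value in averaged]


# Two hypothetical trees, each with three non-leaf nodes (four leaves).
trees = [
    [(0, 4.0), (1, 1.0), (0, 1.0)],
    [(0, 3.0), (2, 2.0), (1, 1.0)],
]
scores = global_importance(trees, n_features=3)
```

In this toy case feature 0 accounts for most of the loss reduction in both trees and therefore receives the top score, while features 1 and 2 tie at a quarter of it.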
In one specific embodiment, the standard deviation MSE is calculated as follows:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$
where $m$ is the number of samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value. The smaller the standard deviation, the more accurate the model prediction.
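The selection of the feature set by its error value can be sketched as below. The sketch assumes that the metric is the mean squared error, as the label MSE suggests; the function names, the feature set labels, and the predicted values are all hypothetical and stand in for the outputs of GBDT models trained on each candidate feature set.

```python
def mse(actual, predicted):
    """Mean squared error between measured and predicted line loss rates."""
    m = len(actual)
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / m


def best_feature_set(actual, predictions_by_set):
    """Return the label of the feature set whose predictions have the lowest MSE."""
    return min(predictions_by_set,
               key=lambda name: mse(actual, predictions_by_set[name]))


# Hypothetical measured line loss rates (percent) and model outputs.
actual = [4.0, 5.0, 6.0]
predictions = {
    "3 features": [4.5, 5.5, 6.5],
    "4 features": [4.1, 5.0, 5.9],
    "5 features": [3.0, 5.0, 7.0],
}
best = best_feature_set(actual, predictions)
```

In practice each candidate set would be evaluated on held-out transformer areas rather than on the training data, so that adding features is not rewarded merely for fitting noise.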
In the present embodiment, the contribution of each low-voltage transformer area characteristic index is shown in Figs. 2 and 3. As can be seen from Figs. 2 and 3, the power supply amount and the main line cross-sectional area are consistently important features, the power factor consistently scores low, and the ranks of the other features vary. This shows that the GBDT feature importance and the Spearman correlation coefficient are broadly consistent and that the power factor can be removed.
The final low-voltage transformer area characteristic index system is shown in Fig. 4. As can be seen from Fig. 4, the standard deviation value is smallest, at 2.058, when the number of characteristic indexes is 4. The prediction performance of the GBDT model is then at its best, which is consistent with the conclusion obtained from Table 1, so the characteristic index system of the low-voltage transformer area finally contains 4 characteristic indexes.
In a specific embodiment, the low-voltage transformer areas are classified as follows:
let the set of transformer area sample points be $C = \{(X_1, y_1), (X_2, y_2), \dots, (X_n, y_n)\}$, where each variable is $X_i = (x_{i1}, x_{i2}, \dots, x_{im})$;
input the transformer area data set, and set the scan radius e and the minimum number of samples in the neighborhood MinPts;
calculate the standardized Euclidean distance between every pair of sample points:
$$d(X_h, X_i) = \sqrt{\sum_{k=1}^{n} \frac{(x_{hk} - x_{ik})^2}{S_k}}$$
where $n$ is the dimension of the space, $x_{hk}$ and $x_{ik}$ are the $k$-th components of the two sample points, and $S_k$ is the corresponding variance;
construct the core object sample set: for each sample point $X_i$, find all sample points within its neighborhood of radius e, and if their number exceeds MinPts, add $X_i$ to the core object sample set;
merge the objects in the core object sample set that are directly density-reachable: randomly select a core object as a new cluster, search for its density-reachable points through the core object list and add them to the new cluster, then continue searching from those points until the cluster no longer changes;
repeat the search-and-add process starting from the core objects outside the finished clusters until all core objects have been assigned to clusters, at which point the cluster set is:
$$H = \{M_1, M_2, \dots, M_K\}$$
where $K$ is the number of clusters.
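The clustering steps above can be sketched as a compact DBSCAN in plain Python. This is an illustrative sketch rather than the patent's implementation: the function names and sample points are hypothetical, each squared coordinate difference is divided by the population variance of that coordinate, and a point is treated as a core object when its neighborhood (counting the point itself) holds at least `min_pts` samples, which is the usual DBSCAN convention and an assumption here.

```python
from math import sqrt


def standardized_distance(a, b, variances):
    """Euclidean distance with each squared difference divided by its variance."""
    return sqrt(sum((x - y) ** 2 / s for x, y, s in zip(a, b, variances)))


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n, dims = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(dims)]
    # Population variance per coordinate; fall back to 1.0 for constant columns.
    variances = [sum((p[k] - means[k]) ** 2 for p in points) / n or 1.0
                 for k in range(dims)]
    neighbors = [[j for j in range(n)
                  if standardized_distance(points[i], points[j], variances) <= eps]
                 for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if i not in core or labels[i] != -1:
            continue
        # Grow a new cluster from an unassigned core object.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if q in core:          # only core objects keep expanding
                        frontier.append(q)
        cluster += 1
    return labels


# Two hypothetical groups of transformer areas and one isolated point.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (5.0, 5.1),
          (9.0, 1.0)]
labels = dbscan(points, eps=0.3, min_pts=3)
```

Points left with the label -1 belong to no cluster; in the preprocessing step these are the abnormal points that are either deleted or replaced with the mean.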
In this embodiment, the DBSCAN algorithm is used to calculate the cluster centers for the characteristic indexes in the characteristic index system, and the clustering results are shown in Table 1 below:
TABLE 1 DBSCAN clustering results
Figure RE-GDA0002388365620000091
As can be seen from Table 1, class A comprises the transformer areas whose main line cross-sectional area, total line length, and power supply area are large, and class D comprises the opposite. In short, each class of low-voltage distribution network has a clear practical meaning, the classes are clearly distinguished, and the clustering effect is good.
In a specific embodiment, a GBDT line loss prediction model is established to predict the line loss rate of the low-voltage transformer area, as follows:
as shown in Fig. 5, a GBDT line loss prediction model is constructed, where $X = (x_1, x_2, \dots, x_m)$ is the input transformer area feature vector, $m$ is the number of transformer areas, and $f_i(X)$ is the prediction of the $i$-th decision tree;
the predicted line loss of the low-voltage transformer area is calculated as:
$$F(X) = f_1(X) + f_2(X) + \dots + f_n(X)$$
where $n$ is the total number of samples, the number of iterations is $M$, and the loss function is $L(y, f(X))$.
A gradient boosting decision tree is then established for line loss prediction. For illustrative purposes, a regression tree with a maximum depth of 2 is built, as shown in Fig. 6.
The data in the characteristic index system are input into the GBDT line loss prediction model to obtain the line loss prediction curve and the relative error curve, shown in Figs. 8 and 9 respectively. GBDT, support vector regression (SVR), and random forest (RF) are compared in terms of prediction accuracy. The SVR prediction curve deviates most from the measured values, followed by RF, while the overall trend of the GBDT curve is essentially the same as that of the measured curve, with a maximum relative error below 8%. The prediction accuracy of GBDT is therefore higher than that of SVR and RF.
In a specific embodiment, the error analysis of the line loss prediction results is shown in Table 2:
TABLE 2 statistical table of line loss prediction results
Figure RE-GDA0002388365620000101
As shown in Table 2, 10 samples were taken and their line loss rates were estimated. As shown in Fig. 9, samples whose measured value lies in the interval (0, 8%) and whose relative error lies between -5% and 5% account for 91.40% of all samples, and these transformer area samples are marked as "qualified". Samples whose measured value exceeds 8% and whose relative error lies between -5% and 5% account for 5.66% of all samples; these can be judged to be heavy-loss samples. Heavy-loss transformer areas need close attention: they may suffer from small wire diameter, light load, or long transmission distance, and they have very large loss reduction potential. Abnormal transformer area samples fall into three cases: a negative value, a missing value, or an excessively large line loss. For these abnormal values, a reasonable value is given as a reference; in addition, the relevant departments should check whether the measured values are normal, determine the source of the error, and formulate reasonable loss reduction measures.
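The classification rules used in this error analysis can be written as a small function. The sketch below rests on several assumptions that the text does not spell out: the relative error is taken as (predicted - measured) / measured, a missing measurement is represented by `None`, and every sample that falls outside the two ±5% bands is treated as abnormal. The function name and the sample values are hypothetical.

```python
def classify_area(measured, predicted):
    """Classify a transformer area from its line loss rates, given in percent."""
    if measured is None or measured <= 0:
        return "abnormal"                 # missing or non-positive measurement
    relative_error = (predicted - measured) / measured * 100.0
    if abs(relative_error) <= 5.0:
        if measured < 8.0:
            return "qualified"            # measured value in (0, 8%)
        if measured > 8.0:
            return "heavy loss"           # measured value above 8%
    return "abnormal"                     # e.g. measurement far from prediction


# Hypothetical (measured, predicted) pairs in percent.
samples = [(4.0, 4.1), (10.0, 9.8), (-1.2, 3.0), (None, 3.0), (25.0, 6.0)]
results = [classify_area(m, p) for m, p in samples]
```

For a sample marked abnormal, the predicted value can serve as the reference value while the measurement is checked.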
The embodiment applies the ensemble learning algorithm to the prediction of the low-voltage transformer area line loss rate, realizes the line loss abnormal recognition through the analysis of the line loss prediction result, and provides a basis for scientifically and reasonably formulating the loss reduction plan, so that the low-voltage transformer area line loss management level is improved, and the practicability is high.
In summary, compared with the prior art, the low-voltage transformer area line loss calculation method based on the gradient boosting decision tree disclosed by the embodiment of the invention has the following advantages:
The method mines low-voltage transformer area line loss data, reveals the nonlinear relationship between the transformer area electrical characteristic indexes and the line loss rate, and performs error analysis and anomaly identification on the line loss results. It thereby provides decision support for rapid evaluation, anomaly identification, and loss reduction planning, effectively improving the standardization and fine-grained management of low-voltage transformer area line loss.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A low-voltage transformer area line loss calculation method based on a gradient boosting decision tree, characterized by comprising the following steps:
preprocessing the data of the low-voltage transformer area;
extracting electrical characteristic indexes, and establishing a low-voltage transformer area characteristic index system;
classifying the low-voltage transformer area;
and establishing a GBDT line loss prediction model, predicting the line loss rate of the low-voltage transformer area, and carrying out error analysis on a prediction result.
2. The low-voltage transformer area line loss calculation method based on the gradient boosting decision tree as claimed in claim 1, wherein the low-voltage transformer area data are preprocessed, and the method specifically comprises the following steps:
filling missing values by adopting an average value, detecting abnormal points, and deleting the abnormal points or replacing the abnormal points by the average value according to the number of the abnormal points;
and extracting the characteristic data and carrying out standardization processing on the characteristic data.
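The missing-value and abnormal-point handling of claim 2 might look like the sketch below. The claim does not say how abnormal points are detected or how their number decides between deletion and replacement, so the k-sigma detector, the `max_outliers_to_delete` threshold, and the rule "delete when few, replace when many" are all assumptions made for illustration; `preprocess_column` is a hypothetical name.

```python
import math
from statistics import mean, pstdev


def _is_missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def preprocess_column(values, max_outliers_to_delete=2, k=3.0):
    """Fill missing values with the mean, then delete or mean-replace abnormal points."""
    observed = [v for v in values if not _is_missing(v)]
    mu = mean(observed)
    # Step 1: fill missing values with the average of the observed values.
    filled = [mu if _is_missing(v) else v for v in values]
    # Step 2 (assumed detector): a point is abnormal if it lies more than
    # k standard deviations from the mean.
    sigma = pstdev(filled)
    outlier = [sigma > 0 and abs(v - mu) > k * sigma for v in filled]
    inliers = [v for v, bad in zip(filled, outlier) if not bad]
    # Step 3 (assumed rule): delete when there are few abnormal points,
    # otherwise replace them with the mean of the remaining points.
    if sum(outlier) <= max_outliers_to_delete:
        return inliers
    inlier_mean = mean(inliers)
    return [inlier_mean if bad else v for v, bad in zip(filled, outlier)]
```

Note that with small samples a 3-sigma rule rarely fires, so a smaller `k` may be needed in practice.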
3. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 2, wherein the feature data is standardized using the conversion function:
z=(x-μ)/σ
where μ is the mean of the raw data and σ is the standard deviation.
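The conversion z = (x - μ)/σ can be applied to one feature column as in the sketch below. Whether σ is the population or the sample standard deviation is not stated in the claim; the population form is assumed here, and `standardize` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def standardize(column):
    """Apply z = (x - mu) / sigma to every value of one feature column."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]
```

After this transformation each feature has zero mean and unit variance, so features with different units contribute comparably to distance-based steps such as clustering.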
4. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 1, wherein a low-voltage transformer area characteristic index system is established, and the method specifically comprises the following steps:
preliminarily selecting a plurality of characteristic indexes reflecting the grid structure characteristics and the load characteristics;
the GBDT feature importance and the Spearman correlation coefficient are jointly adopted to evaluate the importance of the characteristic indexes;
and forming a plurality of feature sets from different numbers of characteristic indexes, inputting the feature sets into the GBDT line loss prediction model, calculating the standard deviation obtained by the GBDT line loss prediction model for each feature set, and selecting the feature set with the minimum standard deviation as the final low-voltage transformer area characteristic index system.
5. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 4, wherein the Spearman correlation coefficient is calculated by the following formula:
$$S = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where d_i is the rank difference between the two columns of paired variables, n is the number of samples, and S takes values in [-1, 1]; a negative value indicates negative correlation, a positive value indicates positive correlation, and a larger absolute value indicates a stronger correlation.
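A small sketch of the Spearman coefficient computed from rank differences d_i and the sample count n is given below. It uses the standard closed form S = 1 - 6Σd_i²/(n(n² - 1)), which is exact only when there are no tied values; the helper names `ranks` and `spearman` are hypothetical.

```python
def ranks(values):
    """Return the rank (1 = smallest) of each value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman coefficient S = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # d_i is the difference between the ranks of the i-th paired values.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))
```

A feature whose ranking matches the ranking of the line loss rate exactly gives S = 1, and a fully reversed ranking gives S = -1.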
6. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 4, wherein in the GBDT feature importance, the global importance of the feature j is measured by the average value of the importance of the feature j in a single tree;
wherein, the global importance calculation formula of the feature j is as follows:
$$\hat{J}_j^2 = \frac{1}{M}\sum_{m=1}^{M} \hat{J}_j^2(T_m)$$
where M is the number of decision trees and T_m (m = 1, …, M) denotes the decision trees in the set;
the importance formula of feature j in a single tree is as follows:
$$\hat{J}_j^2(T) = \sum_{t=1}^{L-1} i_t^2 \, \mathbf{1}(v_t = j)$$
where L is the number of leaf nodes of the tree, L-1 is the number of non-leaf nodes of the tree, v_t is the feature associated with node t, and i_t^2 is the reduction in squared loss after node t is split, with a value range of (0, 100).
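The two importance measures of claim 6 can be illustrated with a toy representation in which each tree is a list of its non-leaf nodes, and each node is a pair (feature index, squared-loss reduction). This data layout and the function names are hypothetical; a real GBDT library exposes split gains in its own way, and any rescaling to the stated value range is not shown.

```python
def tree_importance(internal_nodes, j):
    """Importance of feature j in one tree: sum of squared-loss reductions
    over the non-leaf nodes that split on feature j."""
    return sum(gain for feature, gain in internal_nodes if feature == j)


def global_importance(trees, j):
    """Global importance of feature j: average of its single-tree importance
    over all M trees."""
    return sum(tree_importance(tree, j) for tree in trees) / len(trees)


# Two toy trees: the first has three non-leaf nodes, the second has one.
toy_trees = [
    [(0, 4.0), (1, 1.0), (0, 2.0)],
    [(1, 3.0)],
]
importance_of_feature_0 = global_importance(toy_trees, 0)
importance_of_feature_1 = global_importance(toy_trees, 1)
```

In the toy data, feature 0 contributes 6.0 in the first tree and nothing in the second, so its global importance is 3.0; feature 1 averages to 2.0.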
7. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 4, wherein the standard deviation calculation formula is as follows:
Figure FDA0002302838940000023
where m is the number of samples, y_i is the actual value, and ŷ_i is the predicted value; the smaller the standard deviation, the more accurate the model prediction result.
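The formula itself is available only as an image in the source. The sketch below therefore assumes the common root-mean-square form, the square root of the mean of (y_i - ŷ_i)² over the m samples, which matches the variables named above but may differ from the original in its divisor. It also shows how the feature set with the smallest value would be chosen; `prediction_std`, `select_feature_set`, and the feature-set names are hypothetical.

```python
import math


def prediction_std(actual, predicted):
    """Assumed form: sqrt((1/m) * sum((y_i - yhat_i)^2)) over m samples."""
    m = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / m)


def select_feature_set(candidates, actual):
    """Pick the feature set whose predictions have the smallest standard deviation.

    `candidates` maps a feature-set name to the predictions the model
    produced when trained on that feature set.
    """
    return min(candidates, key=lambda name: prediction_std(actual, candidates[name]))


measured = [1.0, 2.0, 3.0]
candidate_predictions = {
    "top_3_features": [1.1, 2.1, 2.9],
    "top_5_features": [1.0, 2.0, 3.5],
}
best_set = select_feature_set(candidate_predictions, measured)
```

Here the first candidate is selected because its predictions stay within 0.1 of every measured value.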
8. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 1, wherein the low-voltage transformer area is classified, and the specific steps are as follows:
letting the set of transformer area sample points be C = {(X_1, y_1), (X_2, y_2), …, (X_n, y_n)}, where each variable X_i = (x_i1, x_i2, …, x_im);
inputting the transformer area data set, and setting the scanning radius e and the minimum number of samples MinPts contained in a neighborhood;
respectively calculating the standardized Euclidean distance between every two sample points, wherein the calculation formula is as follows:
Figure FDA0002302838940000025
where n is the dimension of the Euclidean space, x_hk and x_ik are the k-th components of the two sample points, and S_k is the corresponding variance;
constructing a core object sample set: for each sample point X_i, finding all sample points in its neighborhood of radius e, and if the number of those sample points is more than MinPts, adding X_i to the core object sample set;
merging directly density-reachable objects in the core object sample set: randomly selecting one core object as a new cluster, searching for the density-reachable points of that core object through the core object list, adding them to the new cluster, and continuing the search from the newly added points until the cluster no longer changes;
repeating the search-and-add process starting from a core object outside the clusters that no longer change, until all core objects are assigned to clusters, at which point the resulting cluster set is:
H = {M_1, M_2, …, M_K}
where K is the number of clusters.
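The steps of claim 8 follow the pattern of density-based clustering, and a compact sketch is given below. Several points are assumptions: the distance formula is an image in the source, so the standardized Euclidean distance is taken as the square root of Σ (x_hk - x_ik)² / S_k with S_k the per-feature variance; a point's neighborhood is counted including the point itself; and core objects are visited in index order rather than at random so that the result is reproducible. All function names are hypothetical.

```python
import math


def column_variances(points):
    """Population variance of each feature; zero variance is replaced by 1."""
    n, dims = len(points), len(points[0])
    variances = []
    for k in range(dims):
        column = [p[k] for p in points]
        mu = sum(column) / n
        var = sum((v - mu) ** 2 for v in column) / n
        variances.append(var if var > 0 else 1.0)
    return variances


def standardized_distance(a, b, variances):
    return math.sqrt(sum((p - q) ** 2 / s for p, q, s in zip(a, b, variances)))


def density_cluster(points, eps, min_pts):
    """Return one label per point: 0..K-1 for clusters, -1 for unclustered points."""
    variances = column_variances(points)
    n = len(points)
    neighbors = [
        [j for j in range(n)
         if standardized_distance(points[i], points[j], variances) <= eps]
        for i in range(n)
    ]
    # Core objects: more than min_pts points within radius eps.
    core = {i for i in range(n) if len(neighbors[i]) > min_pts}
    labels = [-1] * n
    cluster_id = 0
    for seed in sorted(core):
        if labels[seed] != -1:
            continue  # already absorbed into an earlier cluster
        labels[seed] = cluster_id
        frontier = [seed]
        # Grow the cluster through density-reachable points until it stops changing.
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    if j in core:
                        frontier.append(j)
        cluster_id += 1
    return labels
```

The number of clusters K is then the count of distinct non-negative labels, and each M_k collects the points sharing label k.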
9. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 1, wherein a GBDT line loss prediction model is established to predict the line loss rate of the low-voltage transformer area, and the specific process is as follows:
constructing a GBDT line loss prediction model, and letting X be the input transformer area feature vector, where X = (x_1, x_2, …, x_m) and m is the number of transformer areas; f_i(X) is the prediction of the i-th decision tree; the predicted line loss of the low-voltage transformer area is calculated as:
F(X) = f_1(X) + f_2(X) + … + f_n(X)
wherein n is the total number of samples;
establishing a gradient boosting decision tree for line loss prediction;
and inputting the data in the characteristic index system into the GBDT line loss prediction model according to the gradient boosting decision tree, and outputting a line loss prediction result.
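The additive form F(X) = f_1(X) + f_2(X) + … can be illustrated with a deliberately small gradient boosting sketch for squared loss. It is not the patented model: it uses depth-1 regression trees instead of deeper trees, adds a constant initial prediction and a learning rate (both common practice but not stated in the claim), and all names and default values are hypothetical.

```python
def fit_stump(X, residuals):
    """Fit a depth-1 regression tree to the residuals by minimising squared error."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for k in range(len(X[0])):
        values = sorted({row[k] for row in X})
        for thr in values[:-1]:  # split "x_k <= thr" keeps both sides non-empty
            left = [r for row, r in zip(X, residuals) if row[k] <= thr]
            right = [r for row, r in zip(X, residuals) if row[k] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, k, thr, lv, rv)
    if best is None:  # every row is identical: predict the mean residual
        constant = sum(residuals) / len(residuals)
        return lambda row: constant
    _, k, thr, lv, rv = best
    return lambda row: lv if row[k] <= thr else rv


def fit_gbdt(X, y, n_trees=100, learning_rate=0.1):
    """Return a predictor F(X) = base + learning_rate * sum of f_i(X)."""
    base = sum(y) / len(y)
    trees = []
    predictions = [base] * len(y)
    for _ in range(n_trees):
        # Residuals are the negative gradient of the loss 0.5 * (y - F)^2.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(row) for p, row in zip(predictions, X)]

    def predict(row):
        return base + learning_rate * sum(tree(row) for tree in trees)

    return predict


features = [[1.0], [2.0], [3.0], [4.0]]
loss_rates = [1.0, 1.0, 3.0, 3.0]
model = fit_gbdt(features, loss_rates)
```

Each new tree is fitted to what the previous trees still get wrong, which is why the summed prediction approaches the training targets as trees are added.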
10. The method for calculating the line loss of the low-voltage transformer area based on the gradient boosting decision tree as claimed in claim 1, wherein the error analysis of the line loss prediction result is carried out by the following specific process:
a transformer area whose measured value lies in the interval (0, 8%) and whose relative error is between -5% and 5% is preliminarily judged to be qualified;
and a transformer area whose measured value is more than 8% and whose relative error is between -5% and 5% is preliminarily determined to be a heavy-loss area.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911228303.8A CN111027841A (en) 2019-12-04 2019-12-04 Low-voltage transformer area line loss calculation method based on gradient lifting decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911228303.8A CN111027841A (en) 2019-12-04 2019-12-04 Low-voltage transformer area line loss calculation method based on gradient lifting decision tree

Publications (1)

Publication Number Publication Date
CN111027841A true CN111027841A (en) 2020-04-17

Family

ID=70204354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911228303.8A Pending CN111027841A (en) 2019-12-04 2019-12-04 Low-voltage transformer area line loss calculation method based on gradient lifting decision tree

Country Status (1)

Country Link
CN (1) CN111027841A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529067A (en) * 2020-12-04 2021-03-19 国网电力科学研究院武汉南瑞有限责任公司 Power transmission line ice wind disaster fault type evaluation method based on naive Bayes
CN113125903A (en) * 2021-04-20 2021-07-16 广东电网有限责任公司汕尾供电局 Line loss anomaly detection method, device, equipment and computer-readable storage medium
CN113591322A (en) * 2021-08-11 2021-11-02 广西大学 Low-voltage transformer area line loss rate prediction method based on extreme gradient lifting decision tree

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967486A (en) * 2017-11-17 2018-04-27 江苏大学 A kind of nearby vehicle Activity recognition method based on V2V communications with HMM-GBDT mixed models
CN109272176A (en) * 2018-12-10 2019-01-25 贵州电网有限责任公司 Calculation method is predicted to platform area line loss per unit using K-means clustering algorithm
CN110348713A (en) * 2019-06-28 2019-10-18 广东电网有限责任公司 A kind of platform area line loss calculation method based on association analysis and data mining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MENGTING YAO et al.: "Research on Predicting Line Loss Rate in Low Voltage Distribution Network Based on Gradient Boosting Decision Tree", 《ENERGIES》 *
ZHANG Hailin et al.: "Feeder line loss calculation based on an improved K-means algorithm", 《软件导刊》 (Software Guide) *

Similar Documents

Publication Publication Date Title
CN104809658B (en) A kind of rapid analysis method of low-voltage distribution network taiwan area line loss
CN112149873B (en) Low-voltage station line loss reasonable interval prediction method based on deep learning
CN110991786A (en) 10kV static load model parameter identification method based on similar daily load curve
CN111027841A (en) Low-voltage transformer area line loss calculation method based on gradient lifting decision tree
CN113126019B (en) Remote estimation method, system, terminal and storage medium for error of intelligent ammeter
CN110689162B (en) Bus load prediction method, device and system based on user side classification
CN111008726B (en) Class picture conversion method in power load prediction
CN111028100A (en) Refined short-term load prediction method, device and medium considering meteorological factors
CN110555058A (en) Power communication equipment state prediction method based on improved decision tree
CN111178585A (en) Fault reporting amount prediction method based on multi-algorithm model fusion
CN112289391A (en) Anode aluminum foil performance prediction system based on machine learning
CN113193551A (en) Short-term power load prediction method based on multi-factor and improved feature screening strategy
CN116821832A (en) Abnormal data identification and correction method for high-voltage industrial and commercial user power load
CN113868938A (en) Short-term load probability density prediction method, device and system based on quantile regression
CN111709668A (en) Power grid equipment parameter risk identification method and device based on data mining technology
CN113902181A (en) Short-term prediction method and equipment for common variable heavy overload
CN107274025B (en) System and method for realizing intelligent identification and management of power consumption mode
CN113591322A (en) Low-voltage transformer area line loss rate prediction method based on extreme gradient lifting decision tree
CN117277312A (en) Gray correlation analysis-based power load influence factor method and equipment
CN117035509A (en) Electric energy meter state evaluation method and device, electronic equipment and readable storage medium
CN112508363A (en) Deep learning-based power information system state analysis method and device
CN117154716A (en) Planning method and system for accessing distributed power supply into power distribution network
CN115759395A (en) Training of photovoltaic detection model, detection method of photovoltaic power generation and related device
CN116245212A (en) PCA-LSTM-based power data anomaly detection and prediction method and system
CN115201394A (en) Multi-component transformer oil chromatography online monitoring method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination