CN117493922A - Data-driven method for identifying user-transformer relationships in a power distribution network

Info

Publication number: CN117493922A
Application number: CN202311233645.5A
Authority: CN
Inventors: 周云海; 高怡欣; 燕良坤; 崔黎丽; 石基辰; 郑培城; 张泰源; 陈潇潇; 罗琰琳; 季怀招; 周勇
Original assignee: China Three Gorges University CTGU
Current assignee: China Three Gorges University CTGU
Priority date: 2023-09-22
Filing date: 2023-09-22
Publication date: 2024-02-02

Abstract

A data-driven method for identifying user-transformer relationships in a power distribution network comprises the following steps. Step one: the voltage data are normalized; because voltage levels differ widely between users, min-max normalization is used. Step two: the dimension is reduced by kernel principal component analysis (kernel PCA), which extends principal component analysis with a kernel function: the kernel function maps the data into a high-dimensional feature space, and dimension reduction is then carried out in that space. Step three: the dimension-reduced data are clustered by K-means, on the principle that voltage data from the same distribution transformer area are similar, and the user-transformer relationships of the area are identified from the clusters. The raw data are first preprocessed by min-max normalization so that the characteristic differences between users' voltage data become more pronounced. Kernel PCA then reduces the dimension of the processed data, which improves the accuracy and speed of the subsequent clustering algorithm.

Description

Data-driven method for identifying user-transformer relationships in a power distribution network

Technical Field

The invention relates to the technical field of power systems, and in particular to a data-driven method for identifying user-transformer relationships in a power distribution network.

Background

In recent years, smart grids have become one of the most influential innovations in the power industry and an essential component of smart city construction. Their rapid development, however, places higher demands on the fine-grained management of distribution networks. Problems that are now widespread in distribution transformer areas, such as abnormal line-loss calculations, hinder advanced applications such as area operation and planning, limit the ability to manage the whole area intelligently, and directly affect the safety of users' electricity supply. A principal cause is the difficulty of matching each end user exactly to the distribution transformer that supplies it within the area. An accurate and efficient method for identifying user-transformer relationships is therefore of strategic importance for achieving the informatization, automation and interactivity goals of intelligent distribution areas.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a data-driven method for identifying user-transformer relationships in a power distribution network. The method exploits the two-way communication of smart meters in the advanced metering infrastructure (AMI) and their ability to record detailed load information, and identifies user-transformer relationships from AMI data on the basis of the similarity of voltage fluctuation curves. First, the raw data are preprocessed by min-max normalization so that the characteristic differences between users' voltage data become more pronounced. Next, kernel principal component analysis (kernel PCA) reduces the dimension of the processed data, which improves the accuracy and speed of the subsequent clustering algorithm. Finally, based on voltage fluctuation similarity, k-means clustering is applied to the dimension-reduced data: users whose voltage variations are highly similar are grouped into one class and taken to belong to the same transformer, which identifies the user-transformer relationships.

To solve the above technical problem, the invention adopts the following technical scheme: a data-driven method for identifying user-transformer relationships in a power distribution network, comprising the following steps:

step one: voltage data normalization, in which min-max normalization is used because voltage levels differ widely between users;

step two: dimension reduction by kernel principal component analysis. Kernel PCA is principal component analysis extended with a kernel function: the kernel function maps the data into a high-dimensional feature space, and dimension reduction is then carried out in that space;

step three: K-means clustering, in which the dimension-reduced data are clustered on the principle that voltage data from the same distribution transformer area are similar, and the user-transformer relationships of the area are identified.

Preferably, the first step includes the following:

1) For the data set $X_i = [x_{i,1}\ x_{i,2}\ \dots\ x_{i,t}]$ collected from a single user, determine the normalization range; here the voltage data are scaled to $[0, 1]$. $x_{i,1}$ is the first voltage sample acquired from user $i$, $x_{i,2}$ is the second, and $x_{i,t}$ is the $t$-th;

2) Find the maximum $x_{\max}$ and the minimum $x_{\min}$ of the single user's voltage data;

3) Convert each voltage data point $x$ collected from the single user to a normalized value $x'$ using the following normalization formula,

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

4) Apply the above normalization formula to every data point in the data set, converting each to a value within the specified range, output the data set $X_i'$, and form the total voltage matrix $X' = [X_1'\ X_2'\ \dots\ X_i']^T$.
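The normalization in step one can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function name `min_max_normalize`, the row-per-user layout and the handling of a perfectly flat voltage series are assumptions made for the example.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each row (one user's voltage series) to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=1, keepdims=True)   # per-user minimum
    x_max = X.max(axis=1, keepdims=True)   # per-user maximum
    span = x_max - x_min
    # Assumption: a constant series is mapped to zeros instead of dividing by zero.
    span[span == 0] = 1.0
    return (X - x_min) / span

# Two hypothetical users with three voltage samples each.
voltages = np.array([[220.0, 225.0, 230.0],
                     [218.0, 218.5, 219.0]])
normalized = min_max_normalize(voltages)
```

Because the minimum and maximum are taken per user, curves with different absolute voltage levels end up on the same scale, which is what makes their shapes comparable in the later steps.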

Preferably, the second step includes the following:

1) Input the preprocessed voltage data sets $X_i'$ of the individual users in the area and compute the Gaussian kernel function values between the data sets, using the Euclidean distance between them,

$$K(X_a', X_b') = \exp\left(-\frac{\|X_a' - X_b'\|^2}{2\delta^2}\right)$$

where $\delta$ is the bandwidth parameter of the Gaussian kernel function;

2) Center the kernel matrix $K$. First compute the centering matrix $H$, a square matrix of dimension $t$ given by:

$$H = I - \frac{1}{t}\mathbf{1}\mathbf{1}^T$$

where $I$ is the identity matrix, $\mathbf{1}$ is the all-ones vector, and $\mathbf{1}\mathbf{1}^T$ is the all-ones matrix generated from it;

Then compute the centered kernel matrix $K'$ by applying the centering matrix $H$ to the original kernel matrix $K$:

$$K' = HKH$$

The centered kernel matrix $K'$ is used in the subsequent eigenvalue decomposition and principal component selection steps;

3) Perform eigenvalue decomposition on the centered kernel matrix $K'$ to obtain the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_t$ and the corresponding eigenvectors $\phi_1, \phi_2, \dots, \phi_t$;

4) Normalize the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_t$ so that each is expressed as a percentage of their sum;

5) Set the number of principal components to retain according to the cumulative explained variance ratio;

6) Select the principal components to retain and project each data point $x$ onto them to obtain the dimension-reduced data point $z$; repeating this for all data points gives the dimension-reduced voltage data set $Z$.
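The six sub-steps above can be put together in a short NumPy sketch. Several choices here are assumptions rather than details confirmed by the text: the Gaussian kernel is taken in the common form exp(-d²/(2δ²)), each row of the input is treated as one sample so the kernel matrix is square in the number of rows, and the names `kernel_pca`, `delta` and `var_ratio` are hypothetical.

```python
import numpy as np

def kernel_pca(X, delta=1.0, var_ratio=0.95):
    """Gaussian-kernel PCA; returns projections and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # 1) Gaussian kernel from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * delta ** 2))   # assumed kernel form
    # 2) Centering matrix H = I - (1/n) 1 1^T and centered kernel K' = H K H.
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # 3) Eigen-decomposition (eigh returns ascending order, so reverse it).
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                 # drop numerically zero directions
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    # 4) Eigenvalues as fractions of their sum.
    ratio = eigvals / eigvals.sum()
    # 5) Smallest number of components reaching the cumulative target.
    m = int(np.searchsorted(np.cumsum(ratio), var_ratio) + 1)
    m = min(m, eigvals.size)
    # 6) Projection of the training samples: sqrt(lambda_j) * phi_j.
    Z = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    return Z, ratio[:m]

# Synthetic stand-in for 20 normalized 96-point voltage curves.
rng = np.random.default_rng(0)
X_demo = rng.random((20, 96))
Z_demo, ratio_demo = kernel_pca(X_demo, delta=2.0, var_ratio=0.9)
```

Choosing the component count from the cumulative ratio, as in sub-step 5, avoids fixing the target dimension by hand; a fixed count could be used instead by slicing the sorted eigenvectors directly.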

Preferably, in the third step, clustering is performed on the data after the dimension reduction processing, including the following steps:

1) Initialize the cluster centers: randomly select $k$ data points as the initial cluster centers, denoted $c_1, c_2, \dots, c_k$;

2) For each data point $z$, compute its distance to each cluster center $c_j$ and assign $z$ to the cluster $S_j$ whose center is closest,

$$S_j = \{\, z : \|z - c_j\|^2 \le \|z - c_l\|^2 \ \text{for all } 1 \le l \le k \,\}$$

3) For each cluster $S_j$, compute the mean of all data points in it to obtain the new cluster center $c_j$,

$$c_j = \frac{1}{|S_j|} \sum_{z \in S_j} z$$

4) Repeat steps 2) and 3) until the cluster centers no longer change significantly;

5) Finally, each data point $z$ is assigned to exactly one cluster $S_j$, which gives the clustering result.
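The clustering loop described above can be sketched as follows. It is an illustrative sketch only: the function name `k_means`, the stopping tolerance `tol`, the iteration cap and the rule that an empty cluster keeps its previous center are assumptions, and the six demo points are made up.

```python
import numpy as np

def k_means(Z, k, max_iter=100, tol=1e-6, seed=0):
    """Plain K-means: returns (labels, centers)."""
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    # 1) Pick k distinct data points as the initial centers.
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(max_iter):
        # 2) Assign each point to the nearest center (squared Euclidean distance).
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # 3) Move each center to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = Z[labels == j]
            if len(members) > 0:   # assumption: an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        # 4) Stop once the centers have essentially stopped moving.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # 5) labels[i] is the cluster index of point i.
    return labels, centers

# Two well-separated groups standing in for two transformer areas.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels_demo, centers_demo = k_means(points, k=2)
```

In the setting of this method, `Z` would be the dimension-reduced voltage data and `k` the number of distribution transformers in the area, so each resulting cluster is read as the set of users served by one transformer.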

The data-driven method for identifying user-transformer relationships in a power distribution network provided by the invention has the following beneficial effects:

1. For the user voltage data acquired through AMI, min-max normalization is adopted mainly because it effectively resists interference from abnormal values: mapping the data features into a normalized range limits the influence of extreme outliers on model training, which ensures the stability and robustness of the model. This not only helps maintain model accuracy but also helps mitigate unreasonable deviations in the results caused by outliers.

2. Kernel principal component analysis (kernel PCA) is an advanced dimension-reduction technique that combines the strengths of principal component analysis (PCA) and kernel methods. PCA is a linear technique and may fail to capture the intrinsic structure of nonlinear data. Kernel PCA uses the kernel trick to handle nonlinear data during dimension reduction, making the reduced data better suited to linear models. Because the kernel function captures the nonlinear structure of the data, kernel PCA retains key features better than linear PCA, so the reduced data separate different classes or clusters more clearly. A power system usually consists of a complex network, and the nonlinear analysis capability of kernel PCA allows it to capture this nonlinear topology more accurately while simplifying high-dimensional data, so that large volumes of power data can be managed and analyzed efficiently. In addition, in high-dimensional data sets where the number of features far exceeds the number of samples, problems such as overfitting may arise. Kernel PCA helps reduce the dimensionality, which improves the generalization performance of the model and mitigates the curse of dimensionality. Together, these characteristics make kernel PCA a powerful tool for power system topology identification and help further improve the reliability and efficiency of the power system.

3. K-means clustering groups the nodes or devices of the distribution network automatically, without a predefined topology, which is especially useful for complex and changeable low-voltage distribution networks. User-transformer relationship identification must process real-time information; k-means is fast enough to update the topology information as the state of the distribution network changes, which supports fault detection, monitoring and rapid response to problems. Its clustering results are also easy to understand and visualize, helping operators better understand the topology of the low-voltage distribution network, including the distribution of equipment, the connection relationships and the load distribution, and thereby supporting decision-making and problem analysis.

Drawings

The invention is further illustrated by the following examples in conjunction with the accompanying drawings:

FIG. 1 is a flow chart of the data processing of the invention;

FIG. 2 is a flow chart of the clustering process of the invention;

FIG. 3 is a graph of the data after dimension reduction by conventional principal component analysis;

FIG. 4 is a graph of the data after dimension reduction by kernel principal component analysis.

Detailed Description

As shown in FIG. 1, a data-driven method for identifying user-transformer relationships in a power distribution network comprises the following steps:

Preferably, the first step includes the following:

1) For the data set $X_i = [x_{i,1}\ x_{i,2}\ \dots\ x_{i,t}]$ collected from a single user, determine the normalization range; here the voltage data are scaled to $[0, 1]$. $x_{i,1}$ is the first voltage sample acquired from user $i$, $x_{i,2}$ is the second, and $x_{i,t}$ is the $t$-th;

Preferably, the second step includes the following:

where δ is the bandwidth parameter of the gaussian kernel function;

K′＝HKH

3) Perform eigenvalue decomposition on the centered kernel matrix $K'$ to obtain the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_t$ and the corresponding eigenvectors $\phi_1, \phi_2, \dots, \phi_t$;

4. The method for identifying the household transformer relation of the power distribution network based on data driving according to claim 1, wherein in the third step, the data after the dimension reduction processing is clustered, and the method comprises the following steps:

$$S_j = \{\, z : \|z - c_j\|^2 \le \|z - c_l\|^2 \ \text{for all } 1 \le l \le k \,\}$$

3) For each cluster $S_j$, compute the mean of all data points in it to obtain the new cluster center $c_j$,

$$c_j = \frac{1}{|S_j|} \sum_{z \in S_j} z$$

Taking the voltage data of 200 users drawn from five distribution transformer areas in a certain region as an example, the user-transformer relationship identification is calculated and analyzed as follows:

First, min-max preprocessing is applied to the voltage data acquired by the AMI system, followed by kernel PCA dimension reduction.

Voltage data from two randomly selected users in each area are reduced by both conventional principal component analysis and kernel principal component analysis, mapping the original data from 96 dimensions to a 50-dimensional space. Comparing the plots of the reduced data for the two methods shows clearly that the reduced data retain the variation characteristics of the original data, as shown in FIG. 3 and FIG. 4, where the horizontal axis is the data dimension and the vertical axis is the voltage fluctuation value.

This result shows that the dimension-reduction step extracts the information that matters most for the data variations and encodes it in a low-dimensional representation. Although the dimension is reduced, the trend and structure of the original data are preserved, which verifies that kernel principal component analysis achieves the dimension-reduction goal while retaining the key characteristics of the data.
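For reference, the conventional PCA baseline used in this comparison can be sketched with a singular value decomposition. The function name `linear_pca` and the synthetic input are assumptions for illustration; only the 96-to-50 mapping comes from the text.

```python
import numpy as np

def linear_pca(X, n_components):
    """Conventional PCA via SVD; returns projections and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T             # project onto the leading directions
    explained = (S ** 2) / np.sum(S ** 2)    # variance fraction per component
    return Z, explained[:n_components]

# Synthetic stand-in: 200 users, 96 samples each, reduced to 50 dimensions.
rng = np.random.default_rng(1)
X_pca_demo = rng.random((200, 96))
Z_pca_demo, explained_demo = linear_pca(X_pca_demo, n_components=50)
```

Unlike kernel PCA, this projection is a linear map of the original features, so any structure it preserves must already be linearly visible in the voltage curves.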

Accuracy comparison after dimension reduction

Method	PCA+K-means	Kernel PCA+K-means
Accuracy	90.96%	92.05%

The table shows that kernel principal component analysis (kernel PCA) outperforms conventional principal component analysis on the nonlinear user voltage data of the power system. By using the kernel function to map the data to a high-dimensional feature space, kernel PCA retains the nonlinear features of the data more effectively, so the dimension-reduced data yield higher accuracy. This finding supports applying kernel PCA in power systems: it retains the main variation characteristics of the user voltage and improves the accuracy of user-transformer relationship identification in low-voltage distribution transformer areas.

To verify the accuracy of the proposed method more thoroughly, a comprehensive performance comparison was carried out across several common clustering algorithms and dimension-reduction techniques. The evaluation uses the Calinski-Harabasz index and the Rand index (RI).
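Both evaluation indexes are straightforward to compute. The sketch below follows their standard definitions: the Rand index is the fraction of point pairs on which two labelings agree, and the Calinski-Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The function names and the toy data are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of point pairs grouped consistently by both labelings."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[a] == labels_true[b]) == (labels_pred[a] == labels_pred[b])
        for a, b in pairs
    )
    return agree / len(pairs)

def calinski_harabasz(Z, labels):
    """Between-cluster over within-cluster dispersion, scaled by degrees of freedom."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(Z), len(clusters)
    overall = Z.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        members = Z[labels == c]
        center = members.mean(axis=0)
        between += len(members) * np.sum((center - overall) ** 2)
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))

# Toy example: four points on a line, split into two clusters.
ri_demo = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ch_demo = calinski_harabasz([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1])
```

The Rand index needs the true user-transformer assignment, whereas the Calinski-Harabasz index uses only the data and the cluster labels; higher is better for both.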

Rand index

Method	20D	30D	40D	50D	60D
PCA+Birch	0.678413	0.66382	0.652543	0.660385	0.637875
PCA+Kmeans	0.756582	0.755845	0.759092	0.749383	0.728877
KPCA+Kmeans	0.757283	0.754523	0.759285	0.756562	0.7592

Calinski-Harabasz index

Method	20D	30D	40D	50D	60D
PCA+Birch	80.91547	62.07512	54.71555	47.2577	42.6965
PCA+Kmeans	123.8984	95.56359	80.76184	71.73475	64.62178
KPCA+Kmeans	123.6157	95.59159	80.65116	71.86636	65.91663

In this study, the combination of kernel principal component analysis (kernel PCA) and K-means clustering performs best overall: it outperforms PCA+Birch at every tested dimension and exceeds PCA+K-means in most cases, with the largest margin at 60 dimensions. As a nonlinear dimension-reduction technique, kernel PCA handles the nonlinear structure of power system time-series data well. The kernel function maps the data to a high-dimensional feature space, so the complex nonlinear characteristics of the data are better retained, which matters for modeling and interpreting the complexity of power system data. Kernel PCA also improves computational efficiency by reducing the data dimension while preserving the key information. The K-means clustering algorithm in turn performs well on the power system data clustering task. Its lower sensitivity to outliers allows outliers to be separated into independent clusters, which improves the accuracy of the clustering results. The strong performance of the two methods in power system topology identification comes from combining nonlinear dimension reduction with efficient clustering. Applied together, they provide a powerful tool for power system data analysis and noticeably improve the accuracy and efficiency of user-transformer relationship identification.

The above embodiments are only preferred embodiments of the invention and should not be construed as limiting it. The scope of protection of the invention is defined by the claims, including equivalents of the technical features recited in the claims; that is, equivalent replacements and modifications within this scope also fall within the scope of protection of the invention.

Claims

1. A data-driven method for identifying user-transformer relationships in a power distribution network, characterized by comprising the following steps:

step one: voltage data normalization, using min-max normalization;

step two: dimension reduction by kernel principal component analysis, in which a kernel function maps the data into a high-dimensional feature space and dimension reduction is then carried out in that space;

step three: K-means clustering, in which the dimension-reduced data are clustered and the user-transformer relationships of the distribution transformer area are identified.

2. The data-driven method for identifying user-transformer relationships in a power distribution network according to claim 1, wherein step one comprises the following:

1) For the data set $X_i = [x_{i,1}\ x_{i,2}\ \dots\ x_{i,t}]$ collected from a single user, determine the normalization range; here the voltage data are scaled to $[0, 1]$. $x_{i,1}$ is the first voltage sample acquired from user $i$, $x_{i,2}$ is the second, and $x_{i,t}$ is the $t$-th;

4) Apply the above normalization formula to every data point in the data set, converting each to a value within the specified range, output the data set $X_i'$, and form the total voltage matrix $X' = [X_1'\ X_2'\ \dots\ X_i']^T$.

3. The method for identifying the household transformer relations of the power distribution network based on data driving according to claim 1, wherein the second step comprises the following steps:

where δ is the bandwidth parameter of the gaussian kernel function;

K′＝HKH

$$S_j = \{\, z : \|z - c_j\|^2 \le \|z - c_l\|^2 \ \text{for all } 1 \le l \le k \,\}$$

3) For each cluster $S_j$, compute the mean of all data points in it to obtain the new cluster center $c_j$,