CN110766032A

CN110766032A - Power distribution network data clustering integration method based on hierarchical progressive strategy

Info

Publication number: CN110766032A
Application number: CN201810842519.2A
Authority: CN
Inventors: 王希; 罗海珠; 邵平珍; 袁璐; 王广生; 查四平; 黄纪佳; 陈立敏; 涂筠; 曾春
Original assignee: State Grid Corp of China SGCC; State Grid Jiangxi Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Jiangxi Electric Power Co Ltd
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2020-02-07

Abstract

The invention belongs to the technical field of power grid data analysis and decision research and provides a power distribution network data clustering integration method comprising the following steps: generating a candidate cluster set of the power distribution network data; screening the candidate cluster set with a hierarchical progressive local weight algorithm to obtain a basic cluster set; and integrating the basic cluster set by a hierarchical clustering method to obtain the final integrated clustering. The power distribution network data clustering integration method based on the hierarchical progressive strategy can thereby effectively improve the clustering of power distribution network data.

Description

Power distribution network data clustering integration method based on hierarchical progressive strategy

Technical Field

The invention relates to the technical field of power grid data analysis and decision research, and in particular to a power distribution network data clustering integration method.

Background

Clustering integration (ensemble clustering) is an effective method for clustering analysis of power distribution network line data. Among such methods, integration strategies based on the information entropy of cluster weights are an effective new ensemble clustering scheme, but they suffer from drawbacks such as a large computational load and sensitivity to the basic clusterings.

In summary, the clustering methods for power distribution network data in the prior art have obvious inconveniences and defects in practical use and need to be improved.

Disclosure of Invention

In view of the above defects, the present invention provides a power distribution network data clustering integration method, which can effectively improve the clustering effect of power distribution network data.

In order to achieve the above object, the present invention provides a power distribution network data clustering integration method based on a hierarchical progressive strategy, which includes:

A. generating a candidate cluster set of the power distribution network data;

B. screening the candidate cluster set based on a hierarchical progressive local weight algorithm to obtain a basic cluster set;

C. integrating the basic cluster set by a hierarchical clustering method to obtain the final integrated clustering.

According to the power distribution network data clustering integration method, in step A the candidate cluster set is generated by a K-means method.

According to the power distribution network data clustering integration method, generating the candidate cluster set by the K-means method comprises the following steps:

A1, for the initial data set S = {s₁, s₂, ..., s_n} of the distribution network data, the number of elements |S| is n, and the number of candidate clusters in the candidate cluster set to be generated is M;

A2, set a first control parameter j with initial value 1, and define a parameter k (the number of clusters in each candidate cluster);

A3, if j ≤ M, randomly select k elements from the data set S = {s₁, s₂, ..., s_n} as initial cluster centers to form k clusters; otherwise, jump to step A10;

A4, set a second control parameter i to 1; if i ≤ n-k, jump to step A5, otherwise jump to step A7;

A5, for the i-th of the remaining n-k elements of the data set S, calculate its Euclidean distances to the k cluster centers and add it to the cluster whose center is closest;

A6, add 1 to the second control parameter i; if i ≤ n-k, go to step A5, otherwise go to step A7;

A7, select a cluster center again from each of the k clusters to form k new cluster centers, and calculate the Euclidean distances from the elements to the new cluster centers; if the new cluster centers are the same as the previous ones, jump to step A8, otherwise jump to step A4;

A8, obtain the candidate cluster θ_j, mark it as determined, and add it to the candidate cluster set Θ;

A9, add 1 to the first control parameter j and jump to step A3;

A10, obtain the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}.
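Steps A1 to A10 amount to M independent K-means runs, each started from randomly chosen centers. The following is a minimal Python sketch under stated assumptions: each candidate cluster is stored as a list of labels (one per element), and step A7 is assumed to recompute each center as the mean of its cluster, which the text does not specify. The helpers `kmeans_labels` and `generate_candidates` are hypothetical names. Unlike step A5, which initially assigns only the n-k non-center elements, the sketch reassigns every element on each pass, as in the usual Lloyd formulation.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def kmeans_labels(data: List[Point], k: int, rng: random.Random,
                  max_iter: int = 100) -> List[int]:
    """One K-means run (steps A3-A8); returns one cluster label per element.

    Hypothetical helper; the centroid update used for A7 is an assumption.
    """
    # A3: choose k distinct elements at random as the initial cluster centers
    centers = [list(data[idx]) for idx in rng.sample(range(len(data)), k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # A4-A6: join each element to the cluster with the nearest center
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in data]
        # A7 (assumed): the new center of each cluster is its mean
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the old center of an empty cluster
        if new_centers == centers:  # centers unchanged: go to A8
            break
        centers = new_centers
    return labels


def generate_candidates(data: List[Point], k: int, M: int,
                        seed: int = 0) -> List[List[int]]:
    """Steps A1-A10: the candidate set Theta as M label vectors (hypothetical helper)."""
    rng = random.Random(seed)
    return [kmeans_labels(data, k, rng) for _ in range(M)]
```

Different random initial centers are what make the M candidates differ; on well-separated data they may all coincide.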

According to the power distribution network data clustering integration method, step B comprises the following steps:

B1, for the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M} generated by the K-means method, set the initial value of a control parameter m to 1 and set the loop limit to M;

B2, judge whether m ≤ M; if so, execute the next step, otherwise go to step B9;

B3, for any cluster C_n^m of the m-th candidate cluster θ_m ∈ Θ in the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}, calculate its uncertain information entropy with respect to all the candidate clusters in Θ;

B4, calculate the sum Σ(θ_m) of the uncertain information entropies of the clusters of θ_m obtained in step B3;

B5, convert the sum Σ(θ_m) of the uncertain information entropies of the clusters in each candidate cluster with a normalized weight conversion method; after normalization, Σ(θ_m) yields the weight W(Σ(θ_m)) of the corresponding candidate cluster, whose value lies in the interval (0, 1];

B6, set a first threshold α and a second threshold β with 0 ≤ α < β ≤ 1;

B7, compare the weight W(Σ(θ_m)) of the candidate cluster θ_m calculated in step B5 with the first threshold α and the second threshold β: if θ_m ∈ Θ satisfies α < W(Σ(θ_m)) ≤ β or β < W(Σ(θ_m)) ≤ 1, mark the candidate cluster as determined; if θ_m ∈ Θ satisfies 0 < W(Σ(θ_m)) ≤ α, mark the candidate cluster as deleted;

B8, add 1 to the control parameter m and return to step B2.

B9, extract all candidate clusters marked as determined from the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}, re-denote their number as M, and obtain the basic cluster set Θ = {θ₁, θ₂, ..., θ_M}.
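Because W(Σ(θ_m)) never exceeds 1, the two "determined" bands of step B7 together reduce to W(Σ(θ_m)) > α: α acts as the single cut-off and β only separates two bands of kept candidates. The sketch below illustrates steps B6 to B9 with a hypothetical `screen_candidates` function that takes the weights from step B5 as given and returns the 0-based indices of the candidates marked as determined.

```python
from typing import List, Sequence, Tuple


def screen_candidates(weights: Sequence[float], alpha: float,
                      beta: float) -> Tuple[List[int], List[str]]:
    """Steps B6-B9 (hypothetical helper): mark each candidate by its weight."""
    if not 0.0 <= alpha < beta <= 1.0:  # B6 requires 0 <= alpha < beta <= 1
        raise ValueError("thresholds must satisfy 0 <= alpha < beta <= 1")
    marks = []
    for w in weights:                 # B2-B8: one pass per candidate theta_m
        if alpha < w <= beta:         # middle band: marked as determined
            marks.append("determined")
        elif beta < w <= 1.0:         # upper band: marked as determined
            marks.append("determined")
        else:                         # 0 < w <= alpha for weights in (0, 1]: deleted
            marks.append("deleted")
    # B9: the determined candidates form the basic cluster set
    kept = [m for m, mark in enumerate(marks) if mark == "determined"]
    return kept, marks
```

For example, with α = 0.2 and β = 0.6, the weights [0.9, 0.5, 0.1, 0.3] keep candidates 0, 1 and 3 and delete candidate 2.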

According to the power distribution network data clustering integration method, in step B3 the uncertain information entropy is calculated by a formula in which 1 ≤ m ≤ M, 1 ≤ n ≤ N^m, 1 ≤ μ ≤ M, 1 ≤ j ≤ N^μ, and |·| denotes the number of elements of a set.
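The entropy formula itself is not reproduced in this text. One plausible form consistent with the stated index ranges is shown below. It is an assumption by analogy with the cluster-uncertainty entropy used in locally weighted ensemble clustering, not a quotation of the patent. It measures how the cluster C_n^m is split by the clusters C_j^μ of every candidate cluster θ_μ, with 0 · log 0 taken as 0:

```latex
H\left(C_n^m\right) = -\sum_{\mu=1}^{M}\sum_{j=1}^{N^{\mu}} p\left(C_n^m, C_j^{\mu}\right)\log_2 p\left(C_n^m, C_j^{\mu}\right),
\qquad
p\left(C_n^m, C_j^{\mu}\right) = \frac{\left|C_n^m \cap C_j^{\mu}\right|}{\left|C_n^m\right|}
```

Under this form, a cluster that every candidate keeps intact has entropy 0, and the more the other candidates split it, the larger its entropy.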

According to the power distribution network data clustering integration method, in step B4 the sum Σ(θ_m) is calculated by a formula in which N^m = |θ_m|.
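Read literally, step B4 adds up the entropies of the N^m clusters of θ_m. Writing H(C_n^m) for the step B3 entropy of the n-th cluster of θ_m (a symbol introduced here only for illustration), the sum presumably takes this form:

```latex
\Sigma\left(\theta_m\right) = \sum_{n=1}^{N^m} H\left(C_n^m\right), \qquad N^m = \left|\theta_m\right|
```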

According to the power distribution network data clustering integration method, in step B5 the normalized weight conversion is calculated as:

W(*) = e^(-*).
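Steps B3 to B5 can be sketched in Python as follows. The entropy form is an assumption, since the text does not reproduce its formula: each cluster is compared with every cluster of every candidate using p = |C ∩ C'| / |C| and base-2 logarithms. Candidate clusters are lists of Python sets of element indices, and `cluster_entropy`, `entropy_sum` and `weight` are hypothetical names.

```python
import math
from typing import List, Sequence, Set

Clustering = List[Set[int]]  # one candidate cluster theta_m: disjoint clusters


def cluster_entropy(cluster: Set[int], candidates: Sequence[Clustering]) -> float:
    """B3 (assumed form): entropy of one cluster with respect to all of Theta."""
    h = 0.0
    for theta_mu in candidates:        # mu = 1..M
        for other in theta_mu:         # j = 1..N^mu
            p = len(cluster & other) / len(cluster)
            if p > 0:                  # convention: 0 * log 0 = 0
                h -= p * math.log2(p)
    return h


def entropy_sum(theta_m: Clustering, candidates: Sequence[Clustering]) -> float:
    """B4: sum of the entropies of the N^m = |theta_m| clusters of theta_m."""
    return sum(cluster_entropy(c, candidates) for c in theta_m)


def weight(x: float) -> float:
    """B5: normalized weight conversion W(x) = e^(-x), in (0, 1] for x >= 0."""
    return math.exp(-x)
```

For the two hypothetical candidates θ_1 = {{1, 2}, {3, 4}} and θ_2 = {{1, 2, 3}, {4}}, `entropy_sum` gives 1.0 and about 0.918, so the weights are about 0.368 and 0.399. Because Σ(θ_m) grows with both M and the number of clusters, W = e^(-Σ) can become very small for large ensembles, which matters when choosing α and β.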

According to the power distribution network data clustering integration method, step C comprises the following steps:

C1, based on the weight information of the clusters in the basic cluster set obtained in step B, count how many times any two elements of the initial data set S = {s₁, s₂, ..., s_n} of the distribution network data appear in the same cluster of each basic cluster in the basic cluster set, and take the product of these counts and the cluster weights as the weighted integration distance Dis(s_i, s_j) between the two elements;

C2, using the weighted integration distance Dis(s_i, s_j) between any two elements of the initial data set S = {s₁, s₂, ..., s_n} obtained in step C1 as the inter-element clustering distance of the hierarchical clustering method, perform hierarchical clustering on the initial data set S to obtain the final integrated clustering output.
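The text calls Dis(s_i, s_j) a distance, yet it grows with how often two elements share a cluster, so it behaves like a similarity. The sketch below therefore assumes that hierarchical clustering runs on the dissimilarity 1 - Dis(s_i, s_j), with Dis scaled into [0, 1], and uses average linkage as one classical choice. Neither the conversion nor the linkage rule is specified in the text, and `average_linkage` is a hypothetical name. For real data volumes, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'` would replace this cubic-time loop.

```python
from typing import List, Sequence


def average_linkage(sim: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage on d = 1 - sim (assumed)."""
    n = len(sim)
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start from singleton clusters
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise dissimilarity between clusters a and b
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])  # merge the closest pair
        del clusters[b]  # b > a, so index a is still valid
    return clusters
```

Stopping at a target number of clusters is one way to cut the dendrogram; a distance threshold would be another.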

According to the power distribution network data clustering integration method, the weighted integration distance Dis(s_i, s_j) between any two elements of the initial data set S = {s₁, s₂, ..., s_n} is calculated by a formula in which s_i ∈ S, s_j ∈ S and s_i ≠ s_j; C_n^m is the cluster of the basic cluster θ_m to which s_i belongs, denoted s_i ∈ C_n^m, with C_n^m ∈ θ_m and n ∈ [1, N^m]; w_i^m is equal to

If s_i ∈ C_n^m and s_j also belongs to the cluster C_n^m of the basic cluster θ_m, then φ_ij^m = 1; if s_i ∈ C_n^m and s_j does not belong to the cluster C_n^m of the basic cluster θ_m, then φ_ij^m = 0.
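The definition of w_i^m is cut off in this text. The sketch below assumes, hypothetically, that w_i^m equals the weight W(Σ(θ_m)) of the basic cluster θ_m for every element and that the accumulated products are divided by M. Another reading would give each cluster its own weight. Under these assumptions Dis(s_i, s_j) is a weighted co-association value: for each basic cluster it adds that cluster's weight whenever φ_ij^m = 1. Basic clusters are given as label lists, one label per element, and `weighted_coassociation` is a hypothetical name.

```python
from typing import List, Sequence


def weighted_coassociation(labels: Sequence[Sequence[int]],
                           weights: Sequence[float]) -> List[List[float]]:
    """Assumed C1 matrix: Dis(s_i, s_j) = (1/M) * sum_m w^m * phi_ij^m."""
    M = len(labels)
    n = len(labels[0])
    dis = [[0.0] * n for _ in range(n)]
    for lab, w in zip(labels, weights):       # one basic cluster theta_m per label list
        for i in range(n):
            for j in range(n):
                # phi_ij^m = 1 when s_i and s_j share a cluster in theta_m (s_i != s_j)
                if i != j and lab[i] == lab[j]:
                    dis[i][j] += w / M        # assumed per-clustering weight, scaled by 1/M
    return dis
```

With label lists [0, 0, 1, 1] and [0, 0, 0, 1] and hypothetical weights 0.5 and 0.25, the pair (s_1, s_2) shares a cluster in both and gets 0.375, while (s_1, s_4) never does and gets 0.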

In view of the characteristics of power grid distribution network data, such as high complexity and huge data volume, the invention introduces the hierarchical progressive idea from uncertainty theory and provides a power distribution network data clustering integration method based on a hierarchical progressive strategy. First, a candidate cluster set of the power distribution network data is generated. Then, the candidate cluster set is screened with a hierarchical progressive local weight algorithm to obtain a basic cluster set. Finally, the basic cluster set is integrated by a hierarchical clustering method to obtain the final integrated clustering. The method can thereby effectively improve the clustering of power distribution network data.

Drawings

FIG. 1 is a flow chart of a power distribution network data clustering integration method based on a hierarchical progressive strategy according to the present invention;

FIG. 2 is a flowchart of a preferred embodiment of a method for clustering and integrating power distribution network data based on a hierarchical progressive strategy according to the present invention;

FIG. 3 is a flowchart of the steps of step S102 in a preferred embodiment of the power distribution network data clustering integration method based on a hierarchical progressive strategy according to the present invention;

FIG. 4 is an algorithm flowchart corresponding to the method steps of FIG. 3;

FIG. 5 is a flowchart of the steps of step S103 in a preferred embodiment of the power distribution network data clustering integration method based on a hierarchical progressive strategy according to the present invention;

FIG. 6 is an algorithm flowchart corresponding to the method steps of FIG. 5.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to FIG. 1, an embodiment of the present invention provides a power distribution network data clustering integration method based on a hierarchical progressive strategy, comprising:

Step S101, generating a candidate cluster set of the power distribution network data;

Step S102, screening the candidate cluster set with a hierarchical progressive local weight algorithm to obtain a basic cluster set;

Step S103, integrating the basic cluster set by a hierarchical clustering method to obtain the final integrated clustering.

In this embodiment, the research object is a data set of power distribution network data. In view of the characteristics of such data, such as high complexity and huge volume, and of the shortcomings of existing ensemble clustering techniques, the hierarchical progressive idea from uncertainty theory is introduced to obtain a power distribution network data clustering integration method based on a hierarchical progressive strategy. First, a candidate cluster set of the power distribution network data is generated. Then, the candidate cluster set is screened with a hierarchical progressive local weight algorithm to obtain a basic cluster set. Finally, the basic cluster set is integrated by a hierarchical clustering method to obtain the final integrated clustering. This can effectively improve the anti-interference performance and robustness of power distribution network data clustering and the final effect of the integrated clustering, and provides a stronger reference basis for power data clustering.

Referring to FIG. 2, in an embodiment of the present invention, the candidate cluster set in step S101 is generated by a K-means method. Specifically, the K-means clustering method is run on the data set S to produce M candidate clusters (all marked as determined), each of which has k clusters, forming the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}. Generating the candidate cluster set by the K-means method comprises the following steps:

Step S201, for the initial data set S = {s₁, s₂, ..., s_n} of the distribution network data, the number of elements |S| is n, and the number of candidate clusters in the candidate cluster set to be generated is M;

Step S202, set a first control parameter j with initial value 1, and define a parameter k (the number of clusters in each candidate cluster);

Step S203, if j ≤ M, randomly select k elements from the data set S = {s₁, s₂, ..., s_n} as initial cluster centers to form k clusters; otherwise, jump to step S210;

Step S204, set a second control parameter i to 1; if i ≤ n-k, jump to step S205, otherwise jump to step S207;

Step S205, select the cluster to join by Euclidean distance: for the i-th of the remaining n-k elements of the data set S, calculate its Euclidean distances to the k cluster centers and add it to the cluster whose center is closest;

Step S206, add 1 to the second control parameter i; if i ≤ n-k, go to step S205, otherwise go to step S207;

Step S207, select a cluster center again from each of the k clusters to form k new cluster centers, and calculate the Euclidean distances from the elements to the new cluster centers; if the new cluster centers are the same as the previous ones, jump to step S208, otherwise jump to step S204;

Step S208, obtain the candidate cluster θ_j, mark it as determined, and add it to the candidate cluster set Θ;

Step S209, add 1 to the first control parameter j and jump to step S203;

Step S210, obtain the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}.

Referring to FIGS. 3 and 4, FIG. 3 is a flowchart of the method steps and FIG. 4 is the corresponding algorithm flowchart of this embodiment. In an embodiment of the present invention, the basic clusters are screened with a hierarchical progressive local weight algorithm: weights are computed from local information entropy, the hierarchical progressive idea is applied iteratively on the basis of two decision strategies from uncertainty theory, and fewer basic clusters of higher quality are finally selected. Step S102 comprises:

Step S301, for the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M} generated by the K-means method, set the initial value of a control parameter m to 1 and set the loop limit to M;

Step S302, judge whether m ≤ M; if so, execute the next step, otherwise go to step S309;

Step S303, for any cluster C_n^m of the m-th candidate cluster θ_m ∈ Θ in the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}, calculate its uncertain information entropy with respect to all the candidate clusters in Θ.

Specifically, the uncertain information entropy is calculated by a formula in which 1 ≤ μ ≤ M, 1 ≤ j ≤ N^μ, and |·| denotes the number of elements of a set.

Step S304, calculate the sum Σ(θ_m) of the uncertain information entropies of the clusters of θ_m obtained in step S303; specifically, Σ(θ_m) is calculated by a formula in which N^m = |θ_m|.

Step S305, convert the sum Σ(θ_m) of the uncertain information entropies of the clusters in each candidate cluster with a normalized weight conversion method; after normalization, Σ(θ_m) yields the weight W(Σ(θ_m)) of the corresponding candidate cluster, whose value lies in the interval (0, 1]. Specifically, the normalized weight conversion is calculated as:

W(*) = e^(-*).

Step S306, set a first threshold α and a second threshold β with 0 ≤ α < β ≤ 1;

Step S307, compare the weight W(Σ(θ_m)) of the candidate cluster θ_m calculated in step S305 with the first threshold α and the second threshold β. If θ_m ∈ Θ satisfies α < W(Σ(θ_m)) ≤ β or β < W(Σ(θ_m)) ≤ 1, mark the candidate cluster as determined (that is, it performs relatively well and can serve as valid candidate clustering data of the power distribution network).

If θ_m ∈ Θ satisfies 0 < W(Σ(θ_m)) ≤ α, mark the candidate cluster as deleted (that is, it performs relatively poorly and cannot serve as valid candidate clustering data of the power distribution network);

Step S308, add 1 to the control parameter m and return to step S302.

Step S309, extract all candidate clusters marked as determined from the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}, re-denote their number as M, and obtain the basic cluster set Θ = {θ₁, θ₂, ..., θ_M}.
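As a small hand-worked illustration of steps S303 to S307, assume (hypothetically) four elements, M = 2 candidate clusters θ_1 = {{s_1, s_2}, {s_3, s_4}} and θ_2 = {{s_1, s_2, s_3}, {s_4}}, and the entropy form H(C) = -Σ p log_2 p with p = |C ∩ C'| / |C| taken over every cluster C' of every candidate. This form is an assumption, since the formula is not reproduced in this text. A cluster's own candidate contributes nothing, so only the splits made by the other candidate count:

```latex
\begin{aligned}
H(\{s_3,s_4\}) &= -\tfrac12\log_2\tfrac12-\tfrac12\log_2\tfrac12 = 1,\quad H(\{s_1,s_2\}) = 0
  &&\Rightarrow\ \Sigma(\theta_1) = 1,\quad W = e^{-1} \approx 0.368\\
H(\{s_1,s_2,s_3\}) &= -\tfrac23\log_2\tfrac23-\tfrac13\log_2\tfrac13 \approx 0.918,\quad H(\{s_4\}) = 0
  &&\Rightarrow\ \Sigma(\theta_2) \approx 0.918,\quad W = e^{-0.918} \approx 0.399
\end{aligned}
```

With hypothetical thresholds α = 0.2 and β = 0.6, both weights satisfy α < W ≤ β, so both candidates are marked as determined in step S307; raising α to 0.38 would delete θ_1 and keep θ_2.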

Referring to FIGS. 5 and 6, FIG. 5 is a flowchart of the method steps and FIG. 6 is the corresponding algorithm flowchart of this embodiment. In one embodiment of the present invention, step S103 comprises:

Step S501, based on the weight information of the clusters in the basic cluster set obtained in step S102, count how many times any two elements of the initial data set S = {s₁, s₂, ..., s_n} of the power distribution network data appear in the same cluster of each basic cluster in the basic cluster set, and take the product of these counts and the cluster weights as the weighted integration distance Dis(s_i, s_j) between the two elements. Specifically, Dis(s_i, s_j) between any two elements of S is calculated by a formula in which:

if s_i ∈ C_n^m and s_j also belongs to the cluster C_n^m of the basic cluster θ_m, then φ_ij^m = 1; if s_i ∈ C_n^m and s_j does not belong to the cluster C_n^m of the basic cluster θ_m, then φ_ij^m = 0.

Step S502, using the weighted integration distance Dis(s_i, s_j) between any two elements of the initial data set S = {s₁, s₂, ..., s_n} obtained in step S501 as the inter-element clustering distance of the hierarchical clustering method, perform hierarchical clustering on the initial data set S in the classical manner to obtain the final integrated clustering output.
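A hand-worked illustration of step S501 follows. Assume (hypothetically) the two basic clusters θ_1 = {{s_1, s_2}, {s_3, s_4}} and θ_2 = {{s_1, s_2, s_3}, {s_4}}, with weights of about 0.368 and 0.399. Assume also that w_i^m equals the weight of θ_m and that the sum is divided by M = 2. Both are assumptions, since the definition of w_i^m is cut off in this text:

```latex
\begin{aligned}
\mathrm{Dis}(s_1,s_2) &= \tfrac12\,(0.368 + 0.399) \approx 0.384, &
\mathrm{Dis}(s_3,s_4) &= \tfrac12\,(0.368) \approx 0.184,\\
\mathrm{Dis}(s_1,s_3) = \mathrm{Dis}(s_2,s_3) &= \tfrac12\,(0.399) \approx 0.200, &
\mathrm{Dis}(s_1,s_4) = \mathrm{Dis}(s_2,s_4) &= 0
\end{aligned}
```

The pair s_1, s_2, which both basic clusters keep together, receives the largest value, so a hierarchical clustering that treats a large Dis as close (for example by working on 1 - Dis) merges it first.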

In summary, in view of the characteristics of power distribution network data such as high complexity and huge data volume, the invention introduces a hierarchical progressive idea and provides a power distribution network data clustering integration method based on a hierarchical progressive strategy. First, a candidate cluster set of the power distribution network data is generated. Then, the candidate cluster set is screened with a hierarchical progressive local weight algorithm to obtain a basic cluster set. Finally, the basic cluster set is integrated by a hierarchical clustering method to obtain the final integrated clustering. The method can thereby effectively improve the anti-interference performance and robustness of power distribution network data clustering and the final effect of the integrated clustering, and provides a stronger reference basis for power data clustering.

The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A power distribution network data clustering integration method based on a hierarchical progressive strategy is characterized by comprising the following steps:

2. The power distribution network data clustering integration method according to claim 1, wherein in step A the candidate cluster set is generated by a K-means method.

3. The power distribution network data clustering integration method according to claim 2, wherein the step of generating the candidate cluster set by a K-means method comprises:

A6, add 1 to the second control parameter i; if i ≤ n-k, go to step A5, otherwise go to step A7;

A9, add 1 to the first control parameter j and jump to step A3;

4. The power distribution network data clustering integration method according to claim 3, wherein step B comprises the following steps:

B1, for the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M} generated by the K-means method, set the initial value of a control parameter m to 1 and set the loop limit to M;

B2, judge whether m ≤ M; if so, execute the next step, otherwise go to step B9;

B3, for any cluster C_n^m of the m-th candidate cluster θ_m ∈ Θ, calculate its uncertain information entropy with respect to all the candidate clusters in Θ;

B4, calculate the sum Σ(θ_m) of the uncertain information entropies of the clusters of θ_m obtained in step B3;

B7, compare the weight W(Σ(θ_m)) of the candidate cluster θ_m calculated in step B5 with the first threshold α and the second threshold β: if θ_m ∈ Θ satisfies α < W(Σ(θ_m)) ≤ β or β < W(Σ(θ_m)) ≤ 1, mark the candidate cluster as determined; if θ_m ∈ Θ satisfies 0 < W(Σ(θ_m)) ≤ α, mark the candidate cluster as deleted;

B8, add 1 to the control parameter m and return to step B2.

B9, extract all candidate clusters marked as determined from the candidate cluster set Θ = {θ₁, θ₂, ..., θ_M}, re-denote their number as M, and obtain the basic cluster set Θ = {θ₁, θ₂, ..., θ_M}.

5. The power distribution network data clustering integration method according to claim 4, wherein in step B3 the uncertain information entropy is calculated by a formula in which 1 ≤ m ≤ M, 1 ≤ n ≤ N^m, 1 ≤ μ ≤ M, 1 ≤ j ≤ N^μ, and |·| denotes the number of elements of a set.

6. The power distribution network data clustering integration method according to claim 4, wherein in step B4 the sum Σ(θ_m) is calculated by a formula in which N^m = |θ_m|.

7. The power distribution network data clustering integration method according to claim 4, wherein in step B5 the normalized weight conversion is calculated as:

W(*) = e^(-*).

8. The power distribution network data clustering integration method according to claim 1, wherein step C comprises:

9. The power distribution network data clustering integration method according to claim 1, wherein the weighted integration distance Dis(s_i, s_j) between any two elements of the initial data set S = {s₁, s₂, ..., s_n} is calculated by a formula in which s_i ∈ S, s_j ∈ S and s_i ≠ s_j; C_n^m is the cluster of the basic cluster θ_m to which s_i belongs, denoted s_i ∈ C_n^m, with C_n^m ∈ θ_m and n ∈ [1, N^m]; w_i^m is equal to