CN114004285A

CN114004285A - Non-invasive load identification method based on improved kNN algorithm

Info

Publication number: CN114004285A
Application number: CN202111201436.3A
Authority: CN
Inventors: 王新迪; 卞海红; 潘柯言; 王新策; 房可
Original assignee: Nanjing Institute of Technology
Current assignee: Nanjing Institute of Technology
Priority date: 2021-10-15
Filing date: 2021-10-15
Publication date: 2022-02-01

Abstract

The invention provides a non-intrusive load identification method based on an improved kNN algorithm, comprising the following steps. Step S01: select the V-I trajectory as a load feature, extract the trajectory feature, and add amplitude features to the extracted trajectory feature. Step S02: improve the kNN algorithm by assigning different weights to the training samples, which increases the voting weight of minority-class samples in the classification decision. Step S03: with the two load features extracted in step S01, the binary V-I trajectory and the amplitude, introduce the concept of comprehensive similarity, combine the two load features using the improved kNN algorithm of step S02, determine the class of the sample under test, and identify the load. On an imbalanced data set, the method increases sample weights to improve the identification accuracy for minority-class samples whose V-I trajectory shapes resemble those of majority classes, and it improves the identification accuracy for two types of appliances that share the same front-end circuit topology but differ in power level.

Description

Non-invasive load identification method based on improved kNN algorithm

Technical Field

The invention relates to a non-intrusive load identification method based on an improved kNN algorithm.

Background

The kNN algorithm classifies a sample under test by comparing its similarity with a large number of training samples. Its core idea is to select the K training samples most similar to the sample under test; the sample is then assigned to whichever class among those K training samples has the largest total similarity to it.

The drawback of the kNN algorithm is that, when the data set is imbalanced, training samples from majority classes, which have many samples, are easily selected as the K nearest neighbors and interfere with the decision for minority classes.

Disclosure of Invention

1. Technical problem to be solved:

When the data set is imbalanced, the conventional kNN algorithm tends to select training samples from majority classes, which have many samples, as the K nearest neighbors, which interferes with the decision for minority classes.

2. Technical solution:

To solve the above problem, the invention provides a non-intrusive load identification method based on an improved kNN algorithm, comprising the following steps. Step S01: select the V-I trajectory as a load feature, extract the trajectory feature, and add amplitude features to the extracted trajectory feature. Step S02: improve the kNN algorithm by assigning different weights to the training samples, which increases the voting weight of minority-class samples in the classification decision. Step S03: with the two load features extracted in step S01, the binary V-I trajectory and the amplitude, introduce the concept of comprehensive similarity, combine the two load features using the improved kNN algorithm of step S02, determine the class of the sample under test, and identify the load.

The amplitude features comprise the fundamental active power, the fundamental reactive power, the fundamental current amplitude, and the current amplitudes of the 3rd, 5th, and 7th harmonics during steady-state operation of the appliance.

To add the amplitude features, a fast Fourier transform is applied to the voltage and current to obtain the amplitude and phase of the fundamental and of each harmonic. The power is calculated as:

P = (1/2) × a_1 × b_1 × cos(φ)

Q = (1/2) × a_1 × b_1 × sin(φ)

where a_1 and b_1 are the amplitudes of the fundamental voltage and current, respectively, and φ is the phase difference between the two.

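In step S02, the improved kNN algorithm is specifically:

Sim(a, C_i) = Σ_{j=1}^{K} Sim(a, T_j) × weight(T_j) × δ(T_j, C_i)

where weight(T_j) is the weight of training sample T_j, T_j is the j-th of the K nearest neighbors of the sample under test a, and δ(T_j, C_i) equals 1 if T_j belongs to class C_i and 0 otherwise.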

The sample weights are assigned as follows:

weight(T_j) = 1/size(C_Tj)

where size(C_Tj) is the number of training samples contained in the class to which T_j belongs.

Step S03 specifically includes:

Step S031: Calculate the V-I trajectory similarity and the amplitude similarity between the sample under test and every training sample, denoted Sim1 and Sim2, respectively:

Sim1 = 1/(1 + dist1)

Sim2 = 1/(1 + dist2)

where dist1 and dist2 are the V-I trajectory distance and the amplitude distance between two samples, respectively.

Step S032: Sort the training samples in descending order of Sim1, and take the K training samples with the largest Sim1 as the K nearest neighbors of the current test sample.

Step S033: Calculate the comprehensive similarity between the current test sample and each of the K nearest neighbors: Sim(a, T_j) = Sim1(a, T_j) × weight(T_j) + Sim2(a, T_j).

Step S034: For each class among the K nearest neighbors, sum the comprehensive similarity between the sample under test and the neighbors of that class, and take the class with the largest total comprehensive similarity as the prediction result.

Both dist1 and dist2 are Euclidean distances.

3. Beneficial effects:

On an imbalanced data set, the method increases sample weights to improve the identification accuracy for minority-class samples whose V-I trajectory shapes resemble those of majority classes, and it improves the identification accuracy for two types of appliances that share the same front-end circuit topology but differ in power level.

Detailed Description

The present invention will be described in detail below.

The method addresses the problem that different types of appliances with similar front-end circuit topologies cannot be distinguished, because the V-I trajectory lacks numerical features.

Step S01: The shape of an appliance's V-I trajectory is related to the topology of its front-end circuit. This property allows appliances to be grouped by functional category and reduces the requirement on the completeness of the database. The invention therefore first selects the V-I trajectory as a load feature. The trajectory feature is extracted by mapping the original V-I trajectory to a binary V-I trajectory [6,12], as follows. First, high-frequency waveform data of the voltage u and current i over one cycle of steady-state operation are collected, and the original V-I trajectory is plotted with u as the abscissa and i as the ordinate. The voltage-current plane is then divided into a 2N × 2N grid, and the length (voltage span) and height (current span) of each cell are calculated as follows:

A 2-dimensional matrix B of size 2N × 2N is initialized with every element set to 1, displayed as white. For each data point (u_j, i_j) (j = 1, 2, …, J) of the original V-I trajectory, the index of the cell it occupies in matrix B is (x_j, y_j). If 0 < x_j < 2N + 1 and 0 < y_j < 2N + 1, the element B(x_j, y_j) is set to 0 and displayed as black, indicating that the appliance's V-I trajectory passes through this cell:

As the binary trajectory extraction method above shows, the mapping is equivalent to normalizing the voltage and current data. The trajectory therefore contains only shape features, which reflect information such as the voltage-current phase difference, load nonlinearity, and harmonic characteristics, and contains no features related to the power level. When the V-I trajectories of two types of appliances are similar, misclassification is likely, so the invention adds amplitude features as an extra dimension to make the appliances easier to distinguish.
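The mapping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the source text does not reproduce the cell-size and index formulas, so the cell size (maximum absolute value divided by N) and the ceiling-based index are assumed here, and the function name `binary_vi_trajectory` is hypothetical.

```python
import math


def binary_vi_trajectory(u, i, n):
    """Map one cycle of voltage/current samples onto a 2N x 2N binary grid.

    Every cell starts at 1 (white); a cell is set to 0 (black) when the
    trajectory passes through it. Cell size and index formulas are assumed.
    """
    size = 2 * n
    # Assumption: the grid spans [-max|u|, max|u|] x [-max|i|, max|i|].
    du = max(abs(v) for v in u) / n
    di = max(abs(c) for c in i) / n
    if du == 0 or di == 0:
        raise ValueError("voltage and current must not be identically zero")
    grid = [[1] * size for _ in range(size)]
    for uj, ij in zip(u, i):
        # Assumption: 1-based cell index, compatible with 0 < x_j < 2N + 1.
        xj = math.ceil(uj / du) + n
        yj = math.ceil(ij / di) + n
        if 0 < xj < size + 1 and 0 < yj < size + 1:
            grid[xj - 1][yj - 1] = 0  # shift to Python's 0-based indexing
    return grid


# Two loads with the same waveform shape but different magnitudes.
u_small = [-1.0, -0.25, 0.25, 1.0]
i_small = [-2.0, -0.5, 0.5, 2.0]
u_large = [10 * v for v in u_small]
i_large = [3 * c for c in i_small]
same_shape = binary_vi_trajectory(u_small, i_small, 2) == binary_vi_trajectory(u_large, i_large, 2)
```

In this sketch `same_shape` is true: scaling the voltage or the current leaves the binary matrix unchanged, which is the normalization effect noted above and the reason the trajectory alone carries no power-level information.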

The amplitude features comprise the fundamental active power, the fundamental reactive power, the fundamental current amplitude, and the current amplitudes of the 3rd, 5th, and 7th harmonics during steady-state operation of the appliance. The amplitude and phase of the fundamental and of each harmonic are obtained by applying a fast Fourier transform to the voltage and current, and the power is calculated as:

P = (1/2) × a_1 × b_1 × cos(φ)

Q = (1/2) × a_1 × b_1 × sin(φ)

where a_1 and b_1 are the amplitudes of the fundamental voltage and current, respectively, and φ is the phase difference between the two.
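A minimal sketch of the amplitude feature extraction is given below. It makes several assumptions: the input is exactly one cycle of uniformly sampled data, the power formulas are the standard ones for peak amplitudes (half the product of the amplitudes times the cosine or sine of the phase difference), and a direct single-bin DFT is used in place of a full FFT so that the example needs only the standard library. The function names `harmonic_phasor` and `amplitude_features` are hypothetical.

```python
import cmath
import math


def harmonic_phasor(samples, k):
    """Return (peak amplitude, phase) of the k-th harmonic of one full cycle."""
    n = len(samples)
    # Single DFT bin; an FFT would give the same value for bin k.
    acc = sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(samples))
    phasor = 2 * acc / n
    return abs(phasor), cmath.phase(phasor)


def amplitude_features(u, i):
    """Return [P, Q, fundamental current amplitude, 3rd, 5th, 7th harmonic amplitudes]."""
    a1, theta_u = harmonic_phasor(u, 1)  # fundamental voltage
    b1, theta_i = harmonic_phasor(i, 1)  # fundamental current
    phi = theta_u - theta_i              # phase difference between the two
    p = 0.5 * a1 * b1 * math.cos(phi)    # assumed standard active-power formula
    q = 0.5 * a1 * b1 * math.sin(phi)    # assumed standard reactive-power formula
    harmonics = [harmonic_phasor(i, k)[0] for k in (3, 5, 7)]
    return [p, q, b1] + harmonics


# Synthetic test signal: current lags the voltage by 60 degrees and
# carries a small 3rd harmonic.
n = 64
u = [10 * math.cos(2 * math.pi * m / n) for m in range(n)]
i = [2 * math.cos(2 * math.pi * m / n - math.pi / 3)
     + 0.5 * math.cos(3 * 2 * math.pi * m / n) for m in range(n)]
features = amplitude_features(u, i)
```

For this synthetic signal the features come out close to P = 5, Q ≈ 8.66, a fundamental current amplitude of 2, and a 3rd harmonic amplitude of 0.5, with the 5th and 7th harmonics near zero. The number of samples per cycle must exceed twice the highest harmonic order for the bins to be free of aliasing.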

Step S02: improved kNN algorithm

The kNN algorithm proceeds as follows:

(1) For a sample under test a, calculate the similarity between a and every training sample, sort the similarities in descending order, and take the first K training samples as the K nearest neighbors of a.

(2) For each class, calculate the sum of the similarities between a and those of the K nearest neighbors that belong to the class; a is assigned to the class with the largest total similarity. For example, the total similarity between sample a and class C_i is

Sim(a, C_i) = Σ_{j=1}^{K} Sim(a, T_j) × δ(T_j, C_i)

In the formula, T_j is the j-th nearest neighbor of the sample under test a, and δ(T_j, C_i) equals 1 if T_j belongs to class C_i and 0 otherwise, so the total similarity between a and C_i increases only when T_j belongs to C_i. The final class of a is:

C(a) = argmax_{C_i} Sim(a, C_i)
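The baseline decision rule can be sketched as follows, assuming the similarities to all training samples have already been computed. The function name `knn_similarity_vote` and the appliance labels are hypothetical; the numbers are made up only to show how a majority class can outvote the single most similar sample.

```python
from collections import defaultdict


def knn_similarity_vote(similarities, labels, k):
    """Sum similarities per class over the K most similar training samples."""
    ranked = sorted(zip(similarities, labels), key=lambda pair: pair[0], reverse=True)
    totals = defaultdict(float)
    for sim, label in ranked[:k]:
        totals[label] += sim
    # The predicted class is the one with the largest total similarity.
    return max(totals, key=totals.get), dict(totals)


# One very similar "fan" sample against three moderately similar "lamp" samples.
similarities = [0.9, 0.5, 0.45, 0.4, 0.1]
labels = ["fan", "lamp", "lamp", "lamp", "fan"]
predicted, totals = knn_similarity_vote(similarities, labels, k=4)
```

Here `predicted` is `"lamp"`: the three lamp neighbors sum to about 1.35 and outweigh the single fan neighbor at 0.9, even though the fan sample is the closest one.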

The drawback of the kNN algorithm is that, when the data set is imbalanced, training samples from majority classes, which have many samples, are easily selected as the K nearest neighbors and interfere with the decision for minority classes. There are two types of solutions. The first uses under-sampling or over-sampling: majority-class samples are deleted or minority-class samples are synthesized so that the two classes have similar numbers of samples, which eliminates the imbalance. The second improves the algorithm: different weights are assigned to the training samples so that the voting weight of minority-class samples in the classification decision increases. To avoid deleting useful data or introducing redundant data, the invention adopts the second type of solution and improves the calculation of the total similarity between sample a and class C_i:

Sim(a, C_i) = Σ_{j=1}^{K} Sim(a, T_j) × weight(T_j) × δ(T_j, C_i)

where weight(T_j) is the weight of training sample T_j, and δ(T_j, C_i) equals 1 if T_j belongs to class C_i and 0 otherwise. When assigning a weight to T_j, the principle is that minority-class samples receive large weights and majority-class samples receive small weights. The weights are assigned as follows:

weight(T_j) = 1/size(C_Tj)

where size(C_Tj) is the number of training samples contained in the class to which T_j belongs.
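The weighting rule can be illustrated with a short sketch. The function names `class_size_weights` and `weighted_class_similarity` are hypothetical, and the similarity values are invented for illustration.

```python
from collections import Counter


def class_size_weights(train_labels):
    """Return weight(T_j) = 1 / size(C_Tj) for every training sample."""
    sizes = Counter(train_labels)
    return [1.0 / sizes[label] for label in train_labels]


def weighted_class_similarity(neighbors):
    """Sum Sim(a, T_j) * weight(T_j) per class over (sim, weight, label) neighbors."""
    totals = {}
    for sim, weight, label in neighbors:
        totals[label] = totals.get(label, 0.0) + sim * weight
    return totals


# Imbalanced training set: 8 lamps, 2 fans.
train_labels = ["lamp"] * 8 + ["fan"] * 2
weights = class_size_weights(train_labels)  # lamps get 0.125, fans get 0.5

# K = 4 nearest neighbors as (similarity, weight, label).
neighbors = [(0.9, 0.5, "fan"), (0.5, 0.125, "lamp"),
             (0.45, 0.125, "lamp"), (0.4, 0.125, "lamp")]
totals = weighted_class_similarity(neighbors)
predicted = max(totals, key=totals.get)
```

With the weights applied, the fan class totals 0.45 against roughly 0.169 for the lamp class, so the minority class is no longer outvoted by the three majority-class neighbors.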

Step S03: class decision method based on comprehensive similarity. Once the weight-based similarity calculation has been determined, the next step is to decide the class of the sample under test using a decision rule. Because the invention extracts two load features, the binary V-I trajectory and the amplitude, the concept of comprehensive similarity is introduced, and the class of the sample under test is determined from the comprehensive similarity, which combines the two load features. The process is as follows:

Step S031: Calculate the V-I trajectory similarity and the amplitude similarity between the sample under test and every training sample, denoted Sim1 and Sim2, respectively:

Sim1 = 1/(1 + dist1)

Sim2 = 1/(1 + dist2),

where dist1 and dist2 are the V-I trajectory distance and the amplitude distance between two samples, respectively; both are Euclidean distances.

Step S032: Sort the training samples in descending order of Sim1, and take the K training samples with the largest Sim1 as the K nearest neighbors of the current test sample.

Step S033: Calculate the comprehensive similarity between the current test sample and each of the K nearest neighbors:

Sim(a, T_j) = Sim1(a, T_j) × weight(T_j) + Sim2(a, T_j).

Step S034: For each class among the K nearest neighbors, sum the comprehensive similarity between the sample under test and the neighbors of that class, and take the class with the largest total comprehensive similarity as the prediction result.
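Steps S031 to S034 can be put together in a short sketch. It assumes that each binary V-I trajectory has been flattened into a vector before the Euclidean distance is taken and that the amplitude features are used without rescaling; neither detail is specified in the text. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def classify(sample, training, k):
    """Predict the class of `sample` by comprehensive similarity.

    sample:   (trajectory_vector, amplitude_vector)
    training: list of (trajectory_vector, amplitude_vector, label)
    """
    sizes = Counter(label for _, _, label in training)
    scored = []
    for traj, amp, label in training:
        sim1 = 1.0 / (1.0 + euclidean(sample[0], traj))  # S031: trajectory similarity
        sim2 = 1.0 / (1.0 + euclidean(sample[1], amp))   # S031: amplitude similarity
        scored.append((sim1, sim2, label))
    # S032: the K nearest neighbors are chosen by Sim1 only.
    neighbors = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    totals = {}
    for sim1, sim2, label in neighbors:
        # S033: Sim = Sim1 * weight + Sim2, with weight = 1 / class size.
        sim = sim1 * (1.0 / sizes[label]) + sim2
        totals[label] = totals.get(label, 0.0) + sim
    # S034: the class with the largest total comprehensive similarity wins.
    return max(totals, key=totals.get)


# Two heater types with identical trajectory shape but different power levels.
shape = [0, 1, 1, 0]
training = [(shape, [1000.0], "heater_high")] * 3 + [(shape, [100.0], "heater_low")]
low_power_result = classify((shape, [100.0]), training, k=4)
high_power_result = classify((shape, [1000.0]), training, k=4)
```

In this toy data set the trajectories are indistinguishable, so the decision rests on the class weights and the amplitude similarity: the 100 W sample is assigned to `heater_low` despite that class having only one training sample, and the 1000 W sample is assigned to `heater_high`.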

Claims

1. A non-intrusive load identification method based on an improved kNN algorithm, comprising the following steps: step S01: selecting the V-I trajectory as a load feature, extracting the trajectory feature, and adding amplitude features to the extracted trajectory feature; step S02: improving the kNN algorithm by assigning different weights to the training samples in the kNN algorithm, thereby increasing the voting weight of minority-class samples in the classification decision; step S03: with the two load features extracted in step S01, the binary V-I trajectory and the amplitude, introducing the concept of comprehensive similarity, combining the two load features using the improved kNN algorithm of step S02, determining the class of the sample under test, and identifying the load.

2. The method of claim 1, wherein the amplitude features comprise the fundamental active power, the fundamental reactive power, the fundamental current amplitude, and the current amplitudes of the 3rd, 5th, and 7th harmonics during steady-state operation of the appliance.

3. The method of claim 2, wherein: to add the amplitude features, a fast Fourier transform is applied to the voltage and current to obtain the amplitude and phase of the fundamental and of each harmonic, and the power is calculated as:

P = (1/2) × a_1 × b_1 × cos(φ)

Q = (1/2) × a_1 × b_1 × sin(φ)

where a_1 and b_1 are the amplitudes of the fundamental voltage and current, respectively, and φ is the phase difference between the two.

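4. The method of claim 1, wherein: in step S02, the improved kNN algorithm is specifically:

Sim(a, C_i) = Σ_{j=1}^{K} Sim(a, T_j) × weight(T_j) × δ(T_j, C_i)

where weight(T_j) is the weight of training sample T_j, T_j is the j-th of the K nearest neighbors of the sample under test a, and δ(T_j, C_i) equals 1 if T_j belongs to class C_i and 0 otherwise.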

5. The method of claim 3, wherein: the sample weights are assigned as follows: weight(T_j) = 1/size(C_Tj), where size(C_Tj) is the number of training samples contained in the class to which T_j belongs.

6. The method of any one of claims 1 to 5, wherein: step S03 specifically includes: step S031: calculating the V-I trajectory similarity and the amplitude similarity between the sample under test and every training sample, denoted Sim1 and Sim2, respectively:

Sim1 = 1/(1 + dist1)

Sim2 = 1/(1 + dist2), where dist1 and dist2 are the V-I trajectory distance and the amplitude distance between two samples, respectively; step S032: sorting the training samples in descending order of Sim1, and taking the K training samples with the largest Sim1 as the K nearest neighbors of the current test sample; step S033: calculating the comprehensive similarity between the current test sample and each of the K nearest neighbors: Sim(a, T_j) = Sim1(a, T_j) × weight(T_j) + Sim2(a, T_j); step S034: for each class among the K nearest neighbors, summing the comprehensive similarity between the sample under test and the neighbors of that class, and taking the class with the largest total comprehensive similarity as the prediction result.

7. The method of claim 6, wherein: both dist1 and dist2 are Euclidean distances.