CN110287762B - Non-invasive load identification method and device based on data mining technology - Google Patents

Info

Publication number
CN110287762B
Authority
CN
China
Prior art keywords
load
data set
sample data
time
period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910267587.5A
Other languages
Chinese (zh)
Other versions
CN110287762A (en
Inventor
季海娟
朱德省
黄柳胜
腾锋雷
季海涛
顾心怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Linyang Energy Co ltd
Original Assignee
Jiangsu Linyang Energy Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Linyang Energy Co ltd
Priority to CN201910267587.5A
Publication of CN110287762A
Application granted
Publication of CN110287762B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A non-intrusive load identification method and device based on data mining technology are disclosed. The method comprises the following steps: acquiring a sample data set of load features; preprocessing the sample data set; performing feature selection on the preprocessed sample data set and reconstructing the sample data set; building a random forest model on the reconstructed sample data set with CART trees as weak classifiers, and retaining the partition attribute and attribute value of each CART tree to form a model parameter library; and, for the voltage and current waveform data of a period of time to be analyzed, performing load identification according to the model parameter library to obtain the load proportion composition within that period. By using data mining techniques such as data preprocessing, feature selection, and random forest models, the method identifies the load at each time point at the end of the power grid, can accurately obtain the load proportion composition within a period of time, and can provide a reference for future grid power supply and for energy efficiency management of large equipment and electrical appliances in factories.

Description

Non-invasive load identification method and device based on data mining technology
Technical Field
The invention relates to the field of non-intrusive load identification, and in particular to a non-intrusive load identification method based on data mining technology.
Background
Non-intrusive load identification means that a smart meter or a non-intrusive load decomposition device installed directly on the main circuit acquires the current, voltage, and power of the total electrical load and, from this information, identifies the operating state of each electrical device in the house or plant. Compared with intrusive load identification, the approach is simple and convenient: it does not require installing a large amount of additional equipment, saves considerable manpower and cost, is easy to deploy widely, and allows effective energy-saving measures to be taken based on combined information such as the load identification result and the load proportion composition.
Although the concept of non-intrusive load identification was proposed decades ago, no effective non-intrusive load identification method has yet been developed.
With the rapid development of data mining and machine learning, the field has attracted the attention of many researchers. Data mining is now widely applied to tasks such as speech recognition, face recognition, and digit recognition. Among ensemble learning methods, the random forest has performed particularly well in numerous competitions, and the randomness it introduces helps it resist overfitting and the influence of noisy data.
Disclosure of Invention
The invention aims to provide a load identification method based on data mining technology. The method uses data mining techniques such as data preprocessing, feature selection, and a random forest model built on CART trees to identify the load at each time point at the end of the power grid, so that the load proportion composition within a period of time can be accurately obtained. The result can provide a reference for future grid power supply and for energy efficiency management of large equipment and electrical appliances in factories, and the method is particularly suitable for large customers with fixed load types.
The technical scheme of the invention is as follows:
The invention provides a non-intrusive load identification method based on data mining technology, comprising the following steps:
Step one: acquire a sample data set of load features;
Step two: preprocess the sample data set;
Step three: perform feature selection on the preprocessed sample data set, and reconstruct the sample data set from the load feature data retained after feature selection;
Step four: build a random forest model on the reconstructed sample data set with CART trees as weak classifiers, and retain the partition attributes and attribute values of each CART tree to form a model parameter library;
Step five: for the voltage and current waveform data of a period of time to be analyzed, perform load identification according to the model parameter library built in step four, and obtain the load proportion composition within that period.
Further, in the first step:
the n electrical appliances to be identified are combined to obtain $2^n - 1$ combined load categories, and the no-load state is assigned category 0;
for each load category, a sample data set is acquired over a preset duration with a preset time window, wherein each sample in the sample data set comprises the following load features of the electrical load: active power, reactive power, fundamental current, second harmonic, third harmonic, fifth harmonic, seventh harmonic, ninth harmonic, eleventh harmonic, thirteenth harmonic, voltage, current, and phase (the fundamental and harmonic values are obtained by FFT, the fast Fourier transform).
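The category labeling in step one can be sketched as follows. This is a minimal illustration, assuming that every non-empty subset of the appliances counts as one combined load category and that label 0 is reserved for the no-load state; the function name `enumerate_load_categories` and the order in which labels are assigned are hypothetical, since the source does not state how labels are numbered.

```python
from itertools import combinations

def enumerate_load_categories(appliances):
    """Map category label -> tuple of appliances; label 0 is the no-load state."""
    categories = {0: ()}
    next_label = 1
    # Assumed ordering: single appliances first, then pairs, triples, and so on.
    for size in range(1, len(appliances) + 1):
        for combo in combinations(appliances, size):
            categories[next_label] = combo
            next_label += 1
    return categories

categories = enumerate_load_categories(
    ["microwave oven", "electric cooker", "electric kettle", "TV set"]
)
```

With the four appliances named in the embodiment, this yields labels 1 to 15 for the combinations plus label 0 for no load.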
Further, the second step specifically includes the following steps:
S2.1: a maximum threshold is preset for each load feature in the sample data set, and outlier handling is applied to the acquired sample data set: a sample is deleted when any one dimension of its load features exceeds the corresponding threshold, the number of dimensions being the number of features;
S2.2: over the whole sample data set, the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of each dimension of the load features are taken, and the data of each dimension are normalized using formula (1) to obtain the preprocessed sample data set;
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where $x$ denotes the data of the corresponding dimension, that is, the value of the corresponding load feature, and $x'$ denotes the data of that dimension after preprocessing.
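The two preprocessing steps can be sketched as below. The sketch assumes that formula (1) is ordinary min-max scaling, as the use of the per-dimension maximum and minimum suggests; the helper names, the threshold values, and the handling of a constant feature (mapped to 0.0) are illustrative assumptions, not taken from the source.

```python
def remove_outliers(samples, max_thresholds):
    """Drop any sample in which at least one feature exceeds its threshold."""
    return [s for s in samples
            if all(v <= limit for v, limit in zip(s, max_thresholds))]

def fit_min_max(samples):
    """Return the per-feature minimum and maximum over all samples."""
    columns = list(zip(*samples))
    return [min(c) for c in columns], [max(c) for c in columns]

def normalize(sample, mins, maxs):
    """Apply x' = (x - x_min) / (x_max - x_min) to each feature."""
    # Assumption: a feature with x_max == x_min is mapped to 0.0.
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(sample, mins, maxs)]

# Hypothetical samples with two features each (active power, fundamental current).
raw = [[100.0, 0.5], [300.0, 1.5], [200.0, 1.0], [9000.0, 1.2]]
clean = remove_outliers(raw, max_thresholds=[5000.0, 10.0])
mins, maxs = fit_min_max(clean)
scaled = [normalize(s, mins, maxs) for s in clean]
```

The saved `mins` and `maxs` can be reused later to scale the data to be analyzed in the same way as the training samples.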
Further, in the third step, the feature selection specifically includes the following processes:
s3.1: processing the preprocessed sample data set:
first, a univariate feature selection method based on a random forest model, an overall feature selection method based on a random forest model, and the chi-square test are each used to rank the load features, and the top-ranked load features of each method are selected;
second, recursive feature elimination is used to obtain the same number of load features as each of the first three methods (recursive feature elimination builds the model iteratively: it selects the best load feature, sets it aside, and repeats the process on the remaining load features until the required number of load features has been selected);
S3.2: among the load features obtained by the four methods in step S3.1, every load feature that appears in the results of any three methods is added to the finally retained load feature data, the number of finally retained load features being ω.
Further, in step S3.1, each of the first three methods selects its seven top-ranked load features.
Further, in step S3.1, recursive feature elimination is applied to the preprocessed sample data set as follows: the load features are ranked according to the model that has been built, the single feature that identifies the most appliances is selected, the model is rebuilt on the remaining load features, and the recursion is repeated until the number of selected features equals the number obtained by each of the first three methods.
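The combination rule of S3.2 can be sketched as a simple vote across the four selection results. The sketch reads "contained in the results of any three methods" as "selected by at least three of the four methods", which is an interpretation; the four feature lists below are made-up placeholders (P, Q, U, I for active power, reactive power, voltage, current; I1 for fundamental current; Hk for the k-th harmonic), not results reported in the source.

```python
from collections import Counter

def select_by_agreement(selections, min_votes=3):
    """Keep features that appear in at least `min_votes` of the selections."""
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)

# Hypothetical outputs of the four feature selection methods (seven features each).
selections = [
    ["P", "I1", "H3", "H5", "H7", "U", "phase"],   # univariate, random forest
    ["P", "I1", "H3", "H5", "H7", "Q", "phase"],   # overall, random forest
    ["P", "I1", "H3", "H5", "H9", "U", "phase"],   # chi-square test
    ["P", "I1", "H3", "H2", "H7", "U", "I"],       # recursive feature elimination
]
kept = select_by_agreement(selections)
omega = len(kept)  # number of finally retained load features
```

In this made-up example seven features survive the vote, so ω = 7.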
Further, the fourth step specifically includes the following steps:
S4.1: 70% of the samples in the reconstructed sample data set are drawn at random with replacement, N times, to form N training sample sets;
S4.2: N corresponding CART trees are built from the N training sample sets;
S4.3: the N CART trees are pruned, by limiting the tree height and setting a minimum number of samples for splitting a node, to obtain N pruned CART trees, which form the random forest model (the N CART tree models are generated from the sample sets; the height limit caps the height of each generated CART tree so that it does not grow too tall; the minimum split number means that when the number of samples in a node is smaller than this number, the node is not split further and becomes a leaf node);
S4.4: the partition attributes and their values are taken from the N pruned CART trees in the random forest model and added to the model parameter library.
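Steps S4.1 to S4.4 can be sketched in plain Python as below. This is an illustrative simplification, not the patented implementation: splits are chosen by exhaustive search over observed feature values using the Gini impurity, the height limit and minimum split number are applied while the tree grows, and each tree is stored as nested dictionaries whose `attribute` and `value` entries stand in for the model parameter library. All names, the data, and the default parameters are assumptions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature index, threshold) minimising weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        # Skipping the largest value guarantees both sides are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_depth, min_split, depth=0):
    """Grow one CART tree, limited by height and minimum split number."""
    split = None
    if depth < max_depth and len(y) >= min_split:
        split = best_split(X, y)
    if split is None:
        return {"label": Counter(y).most_common(1)[0][0]}
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "attribute": f,
        "value": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth, min_split, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth, min_split, depth + 1),
    }

def build_parameter_library(X, y, n_trees, max_depth=4, min_split=2, seed=0):
    """Draw 70% of the samples with replacement n_trees times; one tree each."""
    rng = random.Random(seed)
    size = int(0.7 * len(X))
    library = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(size)]
        library.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  max_depth, min_split))
    return library

# Hypothetical normalized samples with two features and two load categories.
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]] * 5
y = [1, 1, 2, 2] * 5
library = build_parameter_library(X, y, n_trees=10)
```

Storing only the split attribute, split value, and leaf label of each node keeps the library small, which matches the idea of retaining just the partition attributes and their values.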
Further, the step five specifically includes the following processes:
S5.1: for the voltage and current waveform data of each time window in the period to be analyzed, the ω load features corresponding to the model parameter library are obtained, the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of each dimension of the load features are determined, and the ω load features are normalized using formula (1) to obtain the data set to be analyzed;
S5.2: load identification is performed on the data set to be analyzed according to the model parameter library to obtain N load identification results, and the result that occurs most often is taken as the final load identification result (the model parameter library is the library formed by the partition attributes and values of the N CART trees; when a data set to be analyzed arrives, each of the N trees yields one load identification result, and the most frequent of these N results is the final one);
S5.3: the load proportion composition within the period to be analyzed is calculated from the load identification result of each time window and the active power corresponding to that time window.
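The majority vote of S5.2 can be sketched as follows. The sketch assumes each stored tree is a nested dictionary with a partition attribute (feature index), a value (threshold), and leaf labels; this storage layout and the three tiny trees below are hypothetical. The source does not say how ties are broken; here the label counted first wins.

```python
from collections import Counter

def predict_tree(tree, sample):
    """Walk one stored tree using its partition attributes and values."""
    while "label" not in tree:
        branch = "left" if sample[tree["attribute"]] <= tree["value"] else "right"
        tree = tree[branch]
    return tree["label"]

def identify_load(library, sample):
    """Return the load category predicted by the largest number of trees."""
    votes = Counter(predict_tree(tree, sample) for tree in library)
    return votes.most_common(1)[0][0]

# Hypothetical parameter library of three single-split trees.
library = [
    {"attribute": 0, "value": 0.5, "left": {"label": 1}, "right": {"label": 2}},
    {"attribute": 1, "value": 0.3, "left": {"label": 2}, "right": {"label": 1}},
    {"attribute": 0, "value": 0.7, "left": {"label": 1}, "right": {"label": 3}},
]
result = identify_load(library, [0.6, 0.2])
```

For the sample `[0.6, 0.2]` the three trees vote 2, 2, and 1, so category 2 is returned.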
Further, in S5.3, the calculating of the load proportion composition specifically includes the following steps:
if the load identification result of a time window is a single appliance, the load electricity consumption $W_i$ in that time window is the active power of the window multiplied by the time window value, calculated as:
$$W_i = p_i \times t \tag{2}$$
where $i$ is the index of the time window, $p_i$ is the active power in the data corresponding to that time window, and $t$ is the time window value in seconds, one piece of data being obtained every $t$ seconds;
if the load identification result of a time window is a combination of appliances, the total electricity $W_i$ used by the combination in that time window is obtained from formula (2), and $W_i$ is then divided among the appliances of the combination in proportion to their average active power in the other time windows of the period, calculated as:
$$W_{ij} = W_i \times \frac{\bar{p}_j}{\sum_{k=1}^{m} \bar{p}_k} \tag{3}$$
where $j$ is the index of the appliance, $m$ is the total number of appliances, $\bar{p}_j$ is the average active power of appliance $j$ over the time windows of the period in which it is identified as a single appliance, and $W_{ij}$ is the calculated load electricity consumption of the $j$-th appliance in the $i$-th time window;
then, the electricity consumption of each appliance is summed over all time windows of the period to obtain the electricity consumption of each appliance in the period;
finally, the ratio of each load's electricity consumption to the total electricity consumption in the period is calculated to obtain the load proportion composition within the period to be analyzed.
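The energy allocation of S5.3 can be sketched as below, under stated assumptions: the energy of a window is its active power times the window length (formula (2)); a combined window is split in proportion to each appliance's average active power over the windows where it was identified alone; and the sum in the denominator runs over the appliances in the identified combination. The function name, the input layout, and the numbers are illustrative. The case where an appliance never runs alone in the period is not covered by the source and is not handled here.

```python
def load_proportions(windows, t):
    """windows: list of (tuple of appliances, active power) per time window.
    t: time window value in seconds.
    Returns {appliance: share of the total electricity consumption}."""
    # Average active power of each appliance over windows where it ran alone.
    single = {}
    for appliances, p in windows:
        if len(appliances) == 1:
            single.setdefault(appliances[0], []).append(p)
    mean_power = {a: sum(ps) / len(ps) for a, ps in single.items()}

    energy = {}
    for appliances, p in windows:
        if not appliances:
            continue  # no-load window: nothing to attribute
        w_i = p * t  # W_i = p_i * t
        total_mean = sum(mean_power[a] for a in appliances)
        for a in appliances:
            # For a single appliance the ratio is 1, so it receives all of W_i.
            energy[a] = energy.get(a, 0.0) + w_i * mean_power[a] / total_mean
    total = sum(energy.values())
    return {a: e / total for a, e in energy.items()}

# Hypothetical identification results for four 80 ms windows.
windows = [
    (("kettle",), 1000.0),
    (("tv",), 100.0),
    (("kettle", "tv"), 1100.0),
    ((), 0.0),
]
shares = load_proportions(windows, t=0.08)
```

In this made-up example the kettle accounts for 10/11 of the consumption and the TV set for 1/11.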
A non-intrusive load identification apparatus based on data mining technology, the apparatus comprising:
a sample acquisition module, configured to acquire a sample data set of load features;
a preprocessing module, configured to preprocess the sample data set;
a sample recombination module, configured to perform feature selection on the preprocessed sample data set and to reconstruct the sample data set from the load feature data retained after feature selection;
a parameter library establishing module, configured to build a random forest model on the reconstructed sample data set with CART trees as weak classifiers, and to retain the partition attribute and attribute value of each CART tree to form a model parameter library;
a load identification module, configured to perform load identification on the voltage and current waveform data of a period of time to be analyzed according to the model parameter library, and to calculate the load proportion composition within that period.
The invention has the following beneficial effects:
The invention is simple to operate, and the load features involved are representative and easy to extract. A feature selection stage is added before the features are used, which reduces the feature dimension and the amount of computation during load identification.
The invention identifies the load of each time window within a period of time, so the result of any single window has little influence on the final load proportion composition, and the identification accuracy is high.
Load identification is performed by a random forest model, an ensemble learning method, which is robust to noise and resists overfitting. The method can provide a reference for future grid power supply and for energy efficiency management of large equipment and electrical appliances in factories, and is particularly suitable for load monitoring and discrimination for large customers with fixed load types.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flow chart of the feature selection process of the present invention.
FIG. 3 is a flow chart of a random forest model building process of the present invention.
FIG. 4 is an exemplary diagram of a decision tree result generated in an embodiment of the present invention.
In FIG. 4, X0, X1, X2, X3, and X6 denote the load features active power, fundamental current, third harmonic, fifth harmonic, and phase, respectively; value denotes how the samples are divided among the classes; gini denotes the Gini index value of the node; and samples denotes the number of samples contained in the node.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein.
The invention provides a non-intrusive load identification method based on a data mining technology, which comprises the following steps:
Step one: acquire a sample data set of load features. The n electrical appliances to be identified are combined to obtain $2^n - 1$ combined load categories, and the no-load state is assigned category 0;
for each load category, a sample data set is acquired over a preset duration with a preset time window, wherein each sample in the sample data set comprises the following load features of the electrical load: active power, reactive power, fundamental current, second harmonic, third harmonic, fifth harmonic, seventh harmonic, ninth harmonic, eleventh harmonic, thirteenth harmonic, voltage, current, and phase, the fundamental and harmonic values being obtained by FFT (fast Fourier transform);
Step two: preprocess the sample data set.
S2.1: a maximum threshold is preset for each load feature in the sample data set, and outlier handling is applied to the acquired sample data set: a sample is deleted when any one dimension of its load features exceeds the corresponding threshold, the number of dimensions being the number of features;
S2.2: over the whole sample data set, the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of each dimension of the load features are taken, and the data of each dimension are normalized using formula (1) to obtain the preprocessed sample data set;
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where $x$ denotes the data of the corresponding dimension, that is, the value of the corresponding load feature, and $x'$ denotes the data of that dimension after preprocessing.
Step three: perform feature selection on the preprocessed sample data set, and reconstruct the sample data set from the load feature data retained after feature selection.
Further, in step three, feature selection comprises the following process:
S3.1: the preprocessed sample data set is processed as follows:
first, a univariate feature selection method based on a random forest model, an overall feature selection method based on a random forest model, and the chi-square test are each used to rank the load features; for example, the seven top-ranked load features of each of the three methods are selected;
second, recursive feature elimination is used to obtain useful features;
S3.2: among the load features obtained by the four methods in step S3.1, every load feature that appears in the results of any three methods is added to the finally retained load feature data, the number of finally retained load features being ω.
Step four, establishing a random forest model by taking the CART trees as weak classifiers based on the reconstructed sample data set, and reserving the partition attributes and attribute values of the CART trees to form a model parameter library;
S4.1: 70% of the samples in the reconstructed sample data set are drawn at random with replacement, N times, to form N training sample sets;
S4.2: N corresponding CART trees are built from the N training sample sets;
S4.3: the N CART trees are pruned, by limiting the tree height and setting a minimum number of samples for splitting a node, to obtain N pruned CART trees, which form the random forest model (note: the N CART tree models are generated from the sample sets; the height limit caps the height of each generated CART tree so that it does not grow too tall; the minimum split number means that when the number of samples in a node on a branch is smaller than this number, the node is not split further and becomes a leaf node);
S4.4: the partition attributes and their values are taken from the N pruned CART trees in the random forest model and added to the model parameter library.
Step five: for the voltage and current waveform data of a period of time to be analyzed, perform load identification according to the model parameter library built in step four, and obtain the load proportion composition within that period.
S5.1: for the voltage and current waveform data of each time window in the period to be analyzed, the ω load features corresponding to the model parameter library are obtained, the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of each dimension of the load features are determined, and the ω load features are normalized using formula (1) to obtain the data set to be analyzed;
S5.2: load identification is performed on the data set to be analyzed according to the model parameter library to obtain N load identification results, and the result that occurs most often is taken as the final load identification result (note: the model parameter library is the library formed by the partition attributes and values of the N CART trees; when a data set to be analyzed arrives, each of the N trees yields one load identification result, and the most frequent of these N results is the final one);
S5.3: the load proportion composition within the period to be analyzed is calculated from the load identification result of each time window and the active power corresponding to that time window.
If the load identification result of a time window is a single appliance, the load electricity consumption $W_i$ in that time window is the active power of the window multiplied by the time window value, calculated as:
$$W_i = p_i \times t \tag{2}$$
where $i$ is the index of the time window, $p_i$ is the active power in the data corresponding to that time window, and $t$ is the time window value in seconds, one piece of data being obtained every $t$ seconds;
if the load identification result of a time window is a combination of appliances, the total electricity $W_i$ used by the combination in that time window is obtained from formula (2), and $W_i$ is then divided among the appliances of the combination in proportion to their average active power in the other time windows of the period, calculated as:
$$W_{ij} = W_i \times \frac{\bar{p}_j}{\sum_{k=1}^{m} \bar{p}_k} \tag{3}$$
where $j$ is the index of the appliance, $m$ is the total number of appliances, $\bar{p}_j$ is the average active power of appliance $j$ over the time windows of the period in which it is identified as a single appliance, and $W_{ij}$ is the calculated load electricity consumption of the $j$-th appliance in the $i$-th time window;
then, the electricity consumption of each appliance is summed over all time windows of the period to obtain the electricity consumption of each appliance in the period;
finally, the ratio of each load's electricity consumption to the total electricity consumption in the period is calculated to obtain the load proportion composition within the period to be analyzed.
A non-intrusive load identification apparatus based on data mining technology, the apparatus comprising:
a sample acquisition module, configured to acquire a sample data set of load features;
a preprocessing module, configured to preprocess the sample data set;
a sample recombination module, configured to perform feature selection on the preprocessed sample data set and to reconstruct the sample data set from the load feature data retained after feature selection;
a parameter library establishing module, configured to build a random forest model on the reconstructed sample data set with CART trees as weak classifiers, and to retain the partition attribute and attribute value of each CART tree to form a model parameter library;
a load identification module, configured to perform load identification on the voltage and current waveform data of a period of time to be analyzed according to the model parameter library, and to calculate the load proportion composition within that period.
Examples
In this embodiment, as shown in FIG. 1, the non-intrusive load identification method based on data mining technology of the present invention comprises the following steps:
Step one: acquire a sample data set of load features.
Take as an example collecting the voltage and current waveform data of 4 appliances and their combinations over a period from the start to the end of operation. The 4 appliances are a microwave oven, an electric cooker, an electric kettle, and a TV set; their combinations give 15 load categories, labeled 1 to 15, and the no-load state is labeled category 0. For the voltage and current waveforms of each load category, the active power, reactive power, voltage, current, and phase are calculated for each time window (80 ms), and the fundamental current and the second, third, fifth, seventh, ninth, eleventh, and thirteenth harmonics are obtained by FFT within the same window. These load features are combined into one sample, so that the collected data yield a set of samples divided by time window (80 ms).
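The harmonic features of one 80 ms window can be sketched as below. For brevity the sketch evaluates a direct single-bin DFT at each harmonic frequency rather than a full FFT; both give the same amplitudes when the window holds a whole number of mains cycles. The 50 Hz mains frequency, the 6400 Hz sampling rate, and the synthetic current waveform are assumptions, since the source does not state them.

```python
import cmath
import math

def harmonic_amplitudes(current, sample_rate, mains_hz=50.0,
                        orders=(1, 2, 3, 5, 7, 9, 11, 13)):
    """Amplitude of the selected harmonic orders in one window of samples."""
    n = len(current)
    amplitudes = {}
    for k in orders:
        freq = k * mains_hz
        # Single-bin DFT evaluated at the k-th harmonic frequency.
        acc = sum(x * cmath.exp(-2j * math.pi * freq * i / sample_rate)
                  for i, x in enumerate(current))
        amplitudes[k] = 2.0 * abs(acc) / n
    return amplitudes

sample_rate = 6400.0             # assumed sampling rate in Hz
n = round(0.080 * sample_rate)   # one 80 ms window: 512 samples, 4 mains cycles
# Synthetic current: 5 A fundamental plus a 1 A third harmonic.
current = [5.0 * math.sin(2 * math.pi * 50 * i / sample_rate)
           + 1.0 * math.sin(2 * math.pi * 150 * i / sample_rate)
           for i in range(n)]
amps = harmonic_amplitudes(current, sample_rate)
```

Order 1 corresponds to the fundamental current; the remaining orders are the second through thirteenth harmonics listed in the feature set.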
Step two: preprocess the sample data set.
(1) Outlier handling is applied to the sample data set: with thresholds set manually, a sample is treated as abnormal and deleted as soon as any one dimension of its load features exceeds the threshold;
(2) After the abnormal samples are deleted, the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of each dimension of the load features in the sample data set are taken, and the data of each dimension are then normalized;
Step three: perform feature selection on the preprocessed sample data set, and reconstruct the sample data set from the retained load feature data, as shown in FIG. 2.
(1) Feature selection is performed on the normalized sample data set using four methods: random-forest-based univariate feature selection, random-forest-based overall feature selection, the chi-square test, and recursive feature elimination;
(2) Based on the feature rankings or suggested features obtained by the 4 methods, 7 load features are extracted, namely active power, fundamental current, third harmonic, fifth harmonic, seventh harmonic, voltage, and phase, and the sample data set is reconstructed;
Step four: based on the reconstructed sample data set, establishing a random forest model with CART trees as weak classifiers, and retaining the partition attributes and attribute values of each CART tree to form a model parameter library, as shown in FIG. 3;
(1) Sampling is performed N times; taking N = 10 as an example, 70% of the samples in the reconstructed sample data set are randomly drawn each time, giving 10 training sample sets;
(2) Ten CART trees are built, one for each of the 10 training sample sets; for each tree, the 70% of samples used for training are removed from the reconstructed sample data set, and the remaining 30% of samples are used to test the accuracy of the CART tree model;
(3) The 10 CART trees are pruned, mainly by limiting the tree height to 4 layers and setting the minimum splitting number of leaf nodes to 200, giving 10 pruned CART trees; these 10 CART tree models are combined to form the random forest model, and one of the pruned CART trees is shown in FIG. 4;
(4) The partition attributes and values of the 10 CART trees in the random forest model are taken out (for example, X1, X2, X3, etc. in FIG. 4 are partition attributes, and 0.066, 0.035, 0.853, etc. are the values taken by those attributes) and stored in one-to-one correspondence to generate the model parameter library;
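Step four can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the Gini criterion is the usual CART choice but is not named in the text, the height and minimum-split limits are applied as stopping rules while the tree grows, each 70% draw is made without replacement inside the draw, and all function names and the nested-dict layout of the parameter library are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (attribute, value) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for attr in range(len(rows[0])):
        for value in sorted({r[attr] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[attr] <= value]
            right = [y for r, y in zip(rows, labels) if r[attr] > value]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (attr, value), score
    return best


def build_cart(rows, labels, max_depth=4, min_split=200, depth=0):
    """Grow one tree as nested dicts, stopping at the depth and size limits."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(rows) < min_split:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    attr, value = split
    left = [(r, y) for r, y in zip(rows, labels) if r[attr] <= value]
    right = [(r, y) for r, y in zip(rows, labels) if r[attr] > value]
    return {
        "attr": attr,    # partition attribute (feature index)
        "value": value,  # value taken by the partition attribute
        "left": build_cart([r for r, _ in left], [y for _, y in left],
                           max_depth, min_split, depth + 1),
        "right": build_cart([r for r, _ in right], [y for _, y in right],
                            max_depth, min_split, depth + 1),
    }


def predict(tree, row):
    """Walk the stored partition attributes and values down to a leaf label."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["value"] else tree["right"]
    return tree["leaf"]


def build_forest(rows, labels, n_trees=10, fraction=0.7, seed=0, **limits):
    """Train n_trees trees on random 70% draws and score each on its held-out 30%."""
    rng = random.Random(seed)
    library = []
    for _ in range(n_trees):
        chosen = set(rng.sample(range(len(rows)), int(fraction * len(rows))))
        train = [i for i in range(len(rows)) if i in chosen]
        held_out = [i for i in range(len(rows)) if i not in chosen]
        tree = build_cart([rows[i] for i in train], [labels[i] for i in train], **limits)
        hits = sum(predict(tree, rows[i]) == labels[i] for i in held_out)
        accuracy = hits / len(held_out) if held_out else None
        library.append({"tree": tree, "accuracy": accuracy})
    return library
```

The list returned by `build_forest` stands in for the model parameter library: each entry holds only attribute indices, threshold values and leaf labels, so identification later needs nothing beyond `predict`.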
Step five: performing load identification on the voltage and current waveform data of a period of time to be analyzed according to the model parameter library, and calculating the load proportion composition for that period.
(1) From the voltage and current waveform data of each time window (80 ms) in the period to be analyzed, data are obtained for the 7 load features selected in step three, in one-to-one correspondence;
(2) Using the saved maximum value $x_{\max}$ and minimum value $x_{\min}$ of each load-feature dimension, the 7 load features of each time window (80 ms) in the period to be analyzed are normalized to obtain the data set to be analyzed;
(3) Load identification is performed on the data set to be analyzed according to the partition attributes and values of the 10 CART trees in the random forest model parameter library, giving 10 candidate load results, and the final load is determined by voting;
(4) The load proportion composition for the period and the power consumption of each load are calculated from the load identification result of each time window (80 ms) and the corresponding active power.
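Sub-steps (3) and (4) can be illustrated with the sketch below. It assumes that each window's energy is its active power multiplied by the window length (0.08 s for an 80 ms window), and it attributes that energy to the identified load category as a whole, without splitting combined categories. The function names and the label strings used with them are hypothetical.

```python
from collections import Counter, defaultdict


def vote(tree_results):
    """Return the load label predicted by the most trees.

    On a tie, the label that appears first in tree_results wins; the
    source does not specify a tie-breaking rule.
    """
    return Counter(tree_results).most_common(1)[0][0]


def load_shares(window_labels, window_powers, t=0.08):
    """Accumulate p_i * t per identified load, then divide by the total energy."""
    energy = defaultdict(float)
    for label, power in zip(window_labels, window_powers):
        energy[label] += power * t
    total = sum(energy.values())
    if total == 0:
        return {}
    return {label: value / total for label, value in energy.items()}
```

For each window, the 10 tree outputs go through `vote`; the resulting label sequence and the per-window active power then go through `load_shares`.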
The above description is only an embodiment of the present invention and is not intended to limit its scope; all equivalent structures and equivalent processes derived from this specification and the drawings, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of the present invention.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (10)

1. A non-intrusive load identification method based on data mining technology, characterized by comprising the following steps:
step one: acquiring a sample data set of load features;
step two: preprocessing the sample data set;
step three: performing feature selection on the preprocessed sample data set, and reconstructing the sample data set from the load feature data retained after the feature selection;
step four: based on the reconstructed sample data set, establishing a random forest model with CART trees as weak classifiers, and retaining the partition attributes and attribute values of each CART tree to form a model parameter library;
step five: performing load identification on the voltage and current waveform data of a period of time to be analyzed according to the model parameter library established in step four, and obtaining the load proportion composition for that period.
2. The method according to claim 1, wherein step one comprises:
combining the n electrical load appliances to be identified to obtain $n^2 - 1$ combined load categories, with the no-load state denoted 0;
acquiring, for each load category, a sample data set with a preset time window and a preset duration, wherein each sample in the sample data set comprises the following load features: the active power, reactive power, fundamental current, second harmonic, third harmonic, fifth harmonic, seventh harmonic, ninth harmonic, eleventh harmonic, thirteenth harmonic, voltage, current and phase of the electrical load.
3. The method according to claim 1, wherein step two comprises the following steps:
S2.1: presetting a maximum threshold for each load feature in the sample data set, and performing outlier processing on the acquired sample data set, a sample being deleted when its load feature in any one dimension exceeds the corresponding threshold, wherein the number of dimensions is the number of features;
S2.2: taking the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of each load-feature dimension over the whole sample data set, and normalizing the data of each load-feature dimension by formula (1) to obtain the preprocessed sample data set;
$x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$ (1)
wherein x represents the data of the corresponding dimension, namely the value of the corresponding load feature parameter, and x' represents the data of the corresponding dimension after preprocessing.
4. The method according to claim 1, wherein in step three the feature selection specifically comprises the following process:
S3.1: processing the preprocessed sample data set:
first, obtaining a ranking of the load features by each of a univariate feature selection method based on a random forest model, an overall feature selection method based on the random forest model, and a chi-square test method, and selecting the several top-ranked load features from each of these methods;
second, obtaining, by a recursive feature elimination method, the same number of load features as obtained by each of the first three methods;
S3.2: from the load features obtained by the four processing methods in step S3.1, adding the load features contained in the results of any three of the methods to the finally retained load feature data, the number of finally retained load features being omega.
5. The non-intrusive load identification method based on data mining technology according to claim 4, wherein in step S3.1 each of the first three methods selects the seven top-ranked load features.
6. The non-intrusive load identification method based on data mining technology according to claim 4, wherein in step S3.1 the recursive feature elimination method is specifically: processing the preprocessed sample data set by recursive feature elimination, in which a ranking of the load features is obtained from the built model, the single feature that can identify the most electrical appliances is selected, the model is rebuilt on the remaining load features, and this recursion is repeated until the number of selected features equals the number obtained by each of the first three processing methods.
7. The method according to claim 1, wherein step four specifically comprises the following steps:
S4.1: randomly drawing 70% of the samples in the reconstructed sample data set, with replacement, N times to obtain N training sample sets;
S4.2: establishing N corresponding CART trees for the N training sample sets;
S4.3: pruning the N CART trees, including limiting the height and limiting the minimum splitting number of leaf nodes, to obtain N pruned CART trees, and establishing the random forest model;
S4.4: taking out the partition attributes and values of each of the N pruned CART trees in the random forest model, and adding them to the model parameter library.
8. The method according to claim 1, wherein step five specifically comprises the following steps:
S5.1: for the voltage and current waveform data of each time window in the period to be analyzed, obtaining the omega load feature data corresponding to the model parameter library, and, according to the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of each load-feature dimension, normalizing the omega load features by formula (1) to obtain the data set to be analyzed;
S5.2: performing load identification on the data set to be analyzed according to the model parameter library to obtain N load identification results, and taking the load identification result that occurs most often as the final load identification result;
S5.3: calculating the load proportion composition for the period to be analyzed from the load identification result of each time window and the active power corresponding to that time window.
9. The method according to claim 8, wherein calculating the load proportion composition comprises the following steps:
if the load identification result of a time window is a single electrical appliance, the load electricity consumption $W_i$ in that time window is obtained by multiplying the active power value of the time window by the time window length, the calculation formula being:
$W_i = p_i \times t$ (2)
wherein: i represents the index of the time window; $p_i$ represents the active power in the data of the corresponding time window; and t, in seconds, represents the time window length, one data record being obtained every t seconds;
if the load identification result of a time window is a combination of electrical appliances, the total electricity $W_i$ consumed by the combination in that time window is obtained by formula (2), and $W_i$ is divided among the appliances in the combination in proportion to their average active power values in the other time windows of the period, the load electricity consumption being calculated as:
$W_{ij} = W_i \times \dfrac{\bar{p}_j}{\sum_{k=1}^{m} \bar{p}_k}$ (3)
wherein: j represents the index of the appliance; m represents the total number of appliances; $\bar{p}_j$ represents the average active power per time window of appliance j over the other time windows in the period whose identification result is appliance j alone; and $W_{ij}$ is the calculated load electricity consumption of the j-th appliance in the i-th time window;
then, summing the electricity consumption of each appliance over all the time windows of the period in which it is involved, to obtain the electricity consumption of each appliance in the period;
and finally, calculating the ratio of each load's electricity consumption to the total electricity consumption in the period, to obtain the load proportion composition for the period to be analyzed.
10. A non-intrusive load identification device based on data mining technology, characterized in that the device comprises:
a sample acquisition module, configured to acquire a sample data set of load features;
a preprocessing module, configured to preprocess the sample data set;
a sample reconstruction module, configured to perform feature selection on the preprocessed sample data set and to reconstruct the sample data set from the load feature data retained after the feature selection;
a parameter library establishing module, configured to establish, based on the reconstructed sample data set, a random forest model with CART trees as weak classifiers, and to retain the partition attributes and attribute values of each CART tree to form a model parameter library;
a load identification module, configured to perform load identification on the voltage and current waveform data of a period of time to be analyzed according to the model parameter library, and to calculate the load proportion composition for that period.
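The energy allocation described in claim 9 can be illustrated with the sketch below. It assumes that formula (3) divides the window energy $W_i$ among the appliances of a combination in proportion to their average single-appliance active power, as the wording of the claim suggests; the function names and the appliance names in the usage note are hypothetical.

```python
def window_energy(p_i, t):
    """Formula (2): energy of one time window, W_i = p_i * t."""
    return p_i * t


def split_combined_energy(w_i, avg_single_power):
    """Divide W_i among appliances in proportion to their average active power.

    avg_single_power maps each appliance in the combination to its average
    active power over the windows identified as that appliance alone.
    """
    total = sum(avg_single_power.values())
    if total <= 0:
        raise ValueError("average powers must sum to a positive value")
    return {name: w_i * power / total for name, power in avg_single_power.items()}
```

For example, a window of 0.08 s at 1500 W shared by a hypothetical kettle averaging 1000 W and a lamp averaging 500 W would assign two thirds of the window energy to the kettle and one third to the lamp.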

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910267587.5A CN110287762B (en) 2019-04-03 2019-04-03 Non-invasive load identification method and device based on data mining technology

Publications (2)

Publication Number Publication Date
CN110287762A CN110287762A (en) 2019-09-27
CN110287762B true CN110287762B (en) 2023-01-20

Family

ID=68001313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910267587.5A Active CN110287762B (en) 2019-04-03 2019-04-03 Non-invasive load identification method and device based on data mining technology

Country Status (1)

Country Link
CN (1) CN110287762B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898694B (en) * 2020-08-07 2021-09-17 广东电网有限责任公司计量中心 Non-invasive load identification method and device based on random tree classification
CN112288598A (en) * 2020-12-24 2021-01-29 中国电力科学研究院有限公司 Method and system for determining composition of load element of transformer substation
CN113239976A (en) * 2021-04-22 2021-08-10 湘潭大学 GRU non-invasive load identification method based on preferential binary classification
CN114880948A (en) * 2022-06-02 2022-08-09 国网重庆市电力公司电力科学研究院 Harmonic prediction modeling method and system based on random forest optimization algorithm
CN115166625A (en) * 2022-07-06 2022-10-11 云南电网有限责任公司电力科学研究院 Intelligent ammeter error estimation method and device
CN116204784B (en) * 2022-12-30 2023-09-08 成都天仁民防科技有限公司 DAS-based subway tunnel external hazard operation intrusion recognition method
CN117807598A (en) * 2024-02-29 2024-04-02 典基网络科技(上海)有限公司 Method and device for detecting malicious software

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007130978A (en) * 2005-11-14 2007-05-31 Sumitomo Heavy Ind Ltd Control method of injection moulding machine
CN107273920A (en) * 2017-05-27 2017-10-20 西安交通大学 A kind of non-intrusion type household electrical appliance recognition methods based on random forest

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
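Short-term load forecasting based on fuzzy clustering and random forest; Huang Qingping et al.; Electrical Measurement & Instrumentation (《电测与仪表》); 20171210 (No. 23); full text *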

Also Published As

Publication number Publication date
CN110287762A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110287762B (en) Non-invasive load identification method and device based on data mining technology
CN109543943B (en) Electric price checking execution method based on big data deep learning
CN107273920A (en) A kind of non-intrusion type household electrical appliance recognition methods based on random forest
CN110544177A (en) Load identification method based on power fingerprint and computer readable storage medium
CN110991263B (en) Non-invasive load identification method and system for resisting background load interference
CN111722028A (en) Load identification method based on high-frequency data
CN109840691A (en) Non-intrusion type subitem electricity estimation method based on deep neural network
CN116401532B (en) Method and system for recognizing frequency instability of power system after disturbance
CN113030564A (en) Load identification method based on double-core intelligent electric meter system
Chakraborty et al. Random forest based fault classification technique for active power system networks
CN111612074A (en) Identification method and device of non-invasive load monitoring electric equipment and related equipment
CN114859169A (en) Intelligent identification method and system for distribution transformer outgoing line load and storage medium
CN112085111A (en) Load identification method and device
Hernandez et al. Development of a non-intrusive load monitoring (nilm) with unknown loads using support vector machine
CN116861316B (en) Electrical appliance monitoring method and device
CN113779328A (en) Power supply monitoring data integration processing method, system, terminal and storage medium
CN116883059B (en) Distribution terminal management method and system
CN113076354A (en) User electricity consumption data analysis method and device based on non-invasive load monitoring
CN112595918A (en) Low-voltage meter reading fault detection method and device
CN115828091A (en) Non-invasive load identification method and system based on end-cloud cooperation
CN116340724A (en) ICCEEMDAN-LRTC-based power load completion method and system
CN115796937A (en) Big data complex relevance electric power supply and demand trend analysis method and device
CN113671287B (en) Intelligent detection method, system and readable storage medium for power grid automation terminal
CN114186631A (en) Load identification method based on non-invasive intelligent terminal
CN113902136A (en) Load identification method based on electric power fingerprint features and integrated learning mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant