CN115329663A - Key feature selection method and device for processing power load monitoring sparse data - Google Patents

Key feature selection method and device for processing power load monitoring sparse data

Info

Publication number
CN115329663A
Authority
CN
China
Prior art keywords
data
features
bsf
power load
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210885026.3A
Other languages
Chinese (zh)
Inventor
汤向华
王栋
吴迪
施雄杰
张丽娟
汪家钰
俞天鹤
罗飞
陈飞龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nantong Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202210885026.3A
Publication of CN115329663A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Business, Economics & Management (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Mathematics (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Algebra (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Remote Monitoring And Control Of Power-Distribution Networks (AREA)

Abstract

The invention discloses a key feature selection method and device for processing power load monitoring sparse data. Performing feature selection on the power load monitoring sparse data reduces the computation required for subsequent analysis of the data set and improves working efficiency. Selecting the key features and removing irrelevant and redundant features improves the accuracy of machine learning training and prediction. Although high-dimensional heterogeneous data is highly redundant, the method can find the relationship between each typical power system scene and its associated features during feature analysis, which makes the actual analysis targeted and markedly shortens processing time. Used together, the method and the device address the problems that power data samples are difficult to obtain and time-sensitive, and that system failures or external interference during acquisition, transmission, storage and other links cause data loss. Correct operation decisions can then be made through accurate analysis of the incomplete data.

Description

Key feature selection method and device for processing power load monitoring sparse data
Technical Field
The invention relates to feature selection in the field of power load data mining, and in particular to a key feature selection method and device for processing power load monitoring sparse data.
Background
In recent years, with the wide application of digital power grid technology, the large-scale connection of a high proportion of power electronic equipment to the grid has generated massive multi-source heterogeneous data. High-dimensional heterogeneous data contains a large amount of redundancy, so the relationship between a scene and its associated features cannot be found in the feature analysis of each typical power system scene; the actual analysis lacks pertinence and the processing time increases markedly. In addition, power data samples are difficult to obtain and are time-sensitive, and system faults or external interference may occur in links such as acquisition, transmission and storage, causing data loss. Analysis of such incomplete data is inaccurate, which makes it difficult to reach correct operation decisions.
Publication No. CN110084408A describes a method for processing power quality data, comprising: a blocking step, in which power quality data acquired from a computer public network or a power quality monitoring platform is received, the data comprising spatial information, time information and event information, the data is grouped by spatial information so that data with the same spatial information falls into the same group, and each group is divided by time interval; a cleaning step, in which the blocked power quality data is obtained and cleaned using a block fusion method; and an analysis step, in which the cleaned power quality data is analyzed with a statistical model. That publication provides a method for processing high-speed power quality data, links the power quality data causally with environment information at different positions, and visualizes the processing and analysis results.
Publication No. WO2022074400 A1 describes a method and system that use machine learning techniques to process measurement data related to an electric power network or other electrical devices and to detect abnormal events from the electrical measurement data. According to a first aspect, a method of processing high-resolution electrical measurement data may comprise obtaining high-resolution electrical measurement data relating to time series data of an electrical or other parameter measured from an electrical grid system or other electrical device, wherein the time series data comprises a first set of data points. The time series data may be converted to feature vector format data, where the time series data is grouped into a plurality of data sets, each data set representing a subset of the first set of data points. A statistical data clustering scheme may be performed to generate different clustering patterns from the feature vector format data as cluster data, the cluster data including a first cluster related to a first electrical trend and a second cluster related to a second electrical trend different from the first electrical trend, wherein the cluster data comprises an anomalous data pattern that is part of the first cluster or the second cluster and is remote from its respective cluster center. The abnormal event detection may be based at least in part on the anomalous data.
In summary, the technical problems to be solved by the present invention are:
1) High-dimensional heterogeneous data contains a large amount of redundancy, so the relationship between a scene and its associated features cannot be found accurately in the feature analysis of each typical power system scene; the actual analysis lacks pertinence and the processing time increases markedly;
2) Power data samples are difficult to obtain and time-sensitive, and system failures or external interference in links such as acquisition, transmission and storage cause data loss;
3) Inaccurate analysis of incomplete data makes it difficult to reach correct operation decisions.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a key feature selection method and device for processing power load monitoring sparse data, so as to solve the problems.
In order to achieve the above object, the present invention is achieved by the following technical solutions.
The key feature selection method for processing power load monitoring sparse data comprises the following steps: acquiring the power load monitoring sparse data F, which arrives as a stream of sparse features, and constructing a buffer matrix B; putting the buffer matrix B into a pre-trained hidden feature filling model, calculating the missing values and filling them in to obtain a complete matrix (Figure RE-GDA0003870970170000031); and performing stream feature selection on the complete matrix and storing the result in an optimal feature subset BSF.
Preferably, the acquired power load monitoring sparse data F has M rows and N columns and includes steady-state feature data before a fault and transient feature data after a fault; the constructed buffer matrix B has M rows and Bs columns, where Bs << N, and is used to cache newly arrived sparse stream features.
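The buffering described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `StreamBuffer`, the helper `stream_in_buffers`, the use of `None` for a missing monitoring value, and the column-wise storage are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Optional

Column = List[Optional[float]]  # None marks a missing monitoring value (assumption)


class StreamBuffer:
    """Caches up to `bs` arriving feature columns, each with `m` entries (hypothetical helper)."""

    def __init__(self, m: int, bs: int) -> None:
        self.m = m
        self.bs = bs
        self.columns: List[Column] = []

    def push(self, column: Column) -> Optional[List[Column]]:
        """Cache one arriving feature; return the buffered columns once Bs of them are cached."""
        if len(column) != self.m:
            raise ValueError("each feature column must have M entries")
        self.columns.append(list(column))
        if len(self.columns) < self.bs:
            return None
        full, self.columns = self.columns, []  # hand over the full buffer and start a new one
        return full


def stream_in_buffers(features: Iterable[Column], m: int, bs: int) -> Iterator[List[Column]]:
    """Yield successive buffers; the last one may hold fewer than Bs columns."""
    buf = StreamBuffer(m, bs)
    for col in features:
        full = buf.push(col)
        if full is not None:
            yield full
    if buf.columns:
        yield buf.columns
```

Each yielded buffer would then be passed to the filling model while later features keep arriving, which matches the requirement that Bs be much smaller than N.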
Preferably, after the buffer matrix is full, the buffer matrix B is put into the pre-trained hidden feature filling model: matrices P (M × k) and Q (N × k) are randomly generated, R = B is set, and the objective function (Figure RE-GDA0003870970170000034) is optimized with the Cauchy loss, where λP and λQ are the regularization parameters corresponding to P and Q, γ is a constant, and Ω (M × N) is an indicator matrix whose entry is 1 when the corresponding position in R holds a monitored value and 0 otherwise. The prediction matrix (Figure RE-GDA0003870970170000032) is then computed, and its predicted values are filled into the missing positions to obtain the complete matrix (Figure RE-GDA0003870970170000033).
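The objective function itself appears only as an image in this text. As a point of reference, one common form of a Cauchy-loss latent factor objective that is consistent with the symbols defined here is given below. It is an assumption for illustration, not the patent's exact formula; the row vectors p_i and q_j (rows of P and Q), the hat notation for the prediction matrix and the tilde notation for the filled matrix are introduced only for this sketch.

```latex
\min_{P,\,Q}\;
\sum_{i=1}^{M}\sum_{j=1}^{N}
\Omega_{ij}\,\ln\!\left(1+\frac{\left(r_{ij}-p_i q_j^{\top}\right)^{2}}{\gamma^{2}}\right)
+\lambda_P\lVert P\rVert_F^{2}
+\lambda_Q\lVert Q\rVert_F^{2},
\qquad
\hat{R}=PQ^{\top},
\qquad
\tilde{r}_{ij}=\Omega_{ij}\,r_{ij}+\left(1-\Omega_{ij}\right)\hat{r}_{ij}
```

Under this reading, the logarithm grows slowly for large residuals, so outlying monitored values pull on P and Q less than they would under a squared loss.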
Preferably, stream feature selection is performed on the complete matrix (Figure RE-GDA0003870970170000035): the features in the complete matrix (Figure RE-GDA0003870970170000036) are analyzed one by one to simulate real-time arrival. Correlation analysis is first performed on each newly arrived feature: the correlation between the feature and the label is calculated using the Fisher-z test, which returns a p value, and the significance level α is adjusted through a fuzzy membership function so that it varies between 0.01 and 0.1;
judging whether p < α holds, and if so, carrying out redundancy analysis on the feature;
otherwise, judging whether α < p < 0.1 holds, and if so, performing fuzzy correlation analysis on the new feature;
otherwise, discarding the new feature.
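The three-way decision above can be written as a small routing function. This is a sketch under stated assumptions: the function name `route_feature` and the returned strings are hypothetical, and α is taken as an input because the text does not specify how the fuzzy membership function computes it.

```python
def route_feature(p_value: float, alpha: float, upper: float = 0.1) -> str:
    """Route a newly arrived feature based on its Fisher-z p value.

    alpha is the (fuzzily adjusted) significance level; upper is the fixed 0.1 bound.
    """
    if not 0.0 < alpha <= upper:
        raise ValueError("alpha is expected to lie in (0, upper]")
    if p_value < alpha:
        return "redundancy_analysis"         # clearly correlated with the label
    if alpha < p_value < upper:
        return "fuzzy_correlation_analysis"  # neither clearly correlated nor independent
    return "discard"                         # treated as independent of the label
```

Read literally, a feature with p exactly equal to α satisfies neither condition and is discarded; the sketch keeps that behavior rather than guessing at a tie-breaking rule.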
Preferably, redundancy analysis is carried out on new features whose p value is smaller than α. It comprises two steps: first, the redundancy between the new feature and the existing features in the optimal feature subset BSF (initialized to be empty) is calculated through the Fisher-z test; if the new feature is redundant it is discarded, otherwise it is added to the BSF; second, it is determined whether any existing features in the BSF have become redundant due to the arrival of the new feature, and any that have are discarded;
preferably, fuzzy correlation analysis is carried out on features that are neither clearly correlated with nor independent of the label: the dependency degree of each such feature is calculated through a neighborhood rough set, and the feature is added to the fuzzy correlation feature subset FSF, which is kept sorted. The above steps are repeated until no new features flow in; the top |BSF|/2 features in the FSF are then added to the BSF, and finally the BSF is output.
The invention also provides a key feature selection device for processing power load monitoring sparse data, which comprises a data buffer module, a data completion module and a stream feature selection module connected in sequence:
the data buffer module is used for acquiring power load monitoring sparse data and caching the real-time data into a buffer matrix;
the data completion module is used for putting the sparse buffer matrix into a pre-trained hidden feature model, calculating the missing values and filling them in to obtain a complete matrix;
the stream feature selection module is used for performing feature selection on the complete matrix and storing the result in the optimal feature subset, and comprises a correlation analysis unit, a redundancy analysis unit, a fuzzy correlation analysis unit and a storage unit.
Preferably, in the data buffer module, the acquired real-time sparse data includes steady-state feature data before a fault and transient feature data after a fault. The buffer matrix used to cache the real-time sparse data has far fewer columns than the entire power load data set.
Preferably, the data completion module generates predicted values for the missing positions by inputting the buffer matrix into the trained hidden feature model, and fills the predicted values into the missing positions.
Preferably, the correlation analysis unit performs correlation analysis on the features in the newly arrived buffer matrix and calculates the p value through the Fisher-z test. Let α denote the significance level of the fuzzy correlation. If p < α, the feature enters redundancy analysis; if α < p < 0.1, fuzzy correlation analysis is performed; otherwise the feature is discarded.
Preferably, the redundancy analysis unit determines whether each new feature entering redundancy analysis is redundant with the existing features in the BSF, discarding the new feature if so and adding it to the BSF otherwise; it also determines whether the new feature makes any original features in the BSF redundant, and discards those original features if so.
Preferably, the fuzzy correlation analysis unit calculates the dependency degree between the label and each new feature that is neither clearly correlated with nor independent of the label, and sorts these features. After no new features flow in, the top-ranked features, numbering half the size of the BSF, are stored in the BSF. The storage unit is used for storing the BSF and FSF generated after each unit executes.
Compared with the prior art, the key feature selection method and device for processing power load monitoring sparse data disclosed by the invention reduce the computation required for subsequent analysis of the data set and improve working efficiency by performing feature selection on the power load monitoring sparse data; selecting the key features and removing irrelevant and redundant features improves the accuracy of machine learning training and prediction;
although high-dimensional heterogeneous data is highly redundant, the relationship between each typical power system scene and its associated features can be found during feature analysis, which makes the actual analysis targeted and markedly shortens processing time;
used together, the key feature selection method and the device address the problems that power data samples are difficult to obtain and time-sensitive, and that system failures or external interference during acquisition, transmission, storage and other links cause data loss;
and the correct operation decision can be made through the accurate analysis of the incomplete data.
Drawings
FIG. 1 is a flow chart of the steps of a key feature selection method of the present invention for processing power load monitoring sparse data;
FIG. 2 is a block diagram of a key feature selection device for processing sparse data for power load monitoring according to the present invention;
FIG. 3 is a comparison of the accuracy of monitoring sparse data feature selection performed by the present invention with four comparison algorithms.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The key feature selection method for processing power load monitoring sparse data comprises: acquiring the power load monitoring sparse data F, which arrives as a stream of sparse features, and constructing a buffer matrix B; putting the buffer matrix B into a pre-trained hidden feature filling model, calculating the missing values and filling them in to obtain a complete matrix (Figure RE-GDA0003870970170000061); and performing stream feature selection on the complete matrix and storing the result in an optimal feature subset BSF.
The power load monitoring sparse data F, with M rows and N columns, is acquired; it includes steady-state feature data before a fault and transient feature data after a fault. A buffer matrix B with M rows and Bs columns, where Bs << N, is constructed to cache newly arrived sparse stream features.
The buffer matrix B is put into the pre-trained hidden feature model, and the missing values are calculated and filled in to obtain a complete matrix (Figure RE-GDA0003870970170000062). The missing values are calculated as follows:
step one: P (M × k) and Q (N × k) are randomly generated, R = B is set, and the following objective function is optimized with the Cauchy loss:
Figure RE-GDA0003870970170000071
where λP and λQ are the regularization parameters corresponding to P and Q, γ is a constant, and Ω (M × N) is an indicator matrix whose entry is 1 when the corresponding position in R holds a monitored value and 0 otherwise;
step two: the prediction matrix (Figure RE-GDA0003870970170000072) is computed, and its predicted values are filled into the missing positions to obtain the complete matrix (Figure RE-GDA0003870970170000073).
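The two filling steps can be sketched in plain Python as stochastic gradient descent on a Cauchy-loss factorization. This is a hedged illustration rather than the patent's trained model: the per-entry loss ln(1 + e²/γ²), the function name `fill_missing`, the learning rate, the epoch count, the initialization range and the use of `None` for missing entries are all assumptions. R is given as a list of rows, and Q has one row per column of R.

```python
import random
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # None marks a missing entry (assumption)


def fill_missing(R: Matrix, k: int = 2, gamma: float = 1.0,
                 lam_p: float = 0.01, lam_q: float = 0.01,
                 lr: float = 0.05, epochs: int = 500, seed: int = 0) -> List[List[float]]:
    """Factorize R ~ P Q^T on observed entries only, then fill the missing ones."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    # Step one: randomly generate P (m x k) and Q (n x k).
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(m)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if R[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
            # Derivative of ln(1 + err^2 / gamma^2) with respect to err; bounded by 1/gamma.
            w = 2.0 * err / (gamma ** 2 + err ** 2)
            for f in range(k):
                p_if, q_jf = P[i][f], Q[j][f]
                P[i][f] += lr * (w * q_jf - 2.0 * lam_p * p_if)
                Q[j][f] += lr * (w * p_if - 2.0 * lam_q * q_jf)
    # Step two: keep monitored values and fill missing positions with predictions.
    return [[R[i][j] if R[i][j] is not None
             else sum(P[i][f] * Q[j][f] for f in range(k))
             for j in range(n)] for i in range(m)]
```

Because the gradient weight `w` is bounded, a single extreme monitored value cannot dominate an update, which is the usual motivation for a Cauchy loss on noisy measurements.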
Stream feature selection on the complete matrix (Figure RE-GDA0003870970170000074) comprises the following steps:
step one, correlation analysis: when a new feature flows in, its correlation is calculated through the Fisher-z test, which returns a p value; let α denote the significance level of the fuzzy correlation; if p < α, the feature enters redundancy analysis; if α < p < 0.1, fuzzy correlation analysis is performed; otherwise the feature is discarded;
step two, redundancy analysis: it is determined whether the new feature entering redundancy analysis is redundant with the existing features in the BSF; if so, the new feature is discarded, otherwise it is added to the BSF; it is also determined whether the new feature makes any original features in the BSF redundant, and if so, those original features are discarded;
step three, fuzzy correlation analysis: the dependency degree between the label and each new feature routed to this step is calculated, the feature is added to the fuzzy correlation feature subset FSF, and the FSF is sorted; after no new features flow in, the top-ranked features, numbering half the size of the BSF, are stored in the BSF;
the above steps are repeated until no new features flow in, and finally the BSF is output.
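The two-part redundancy analysis in step two can be sketched as follows. The function name `redundancy_analysis` and the callable `is_redundant_given` are hypothetical; in practice the callable would wrap a Fisher-z conditional independence test between a feature and the label given another feature. Conditioning on one feature at a time is a simplifying assumption of this sketch.

```python
from typing import Callable, List


def redundancy_analysis(new: str, bsf: List[str],
                        is_redundant_given: Callable[[str, str], bool]) -> List[str]:
    """Return the updated BSF after analysing one new, correlated feature.

    is_redundant_given(x, g) should be True when feature x carries no extra
    information about the label once feature g is known.
    """
    # Part 1: discard the new feature if some selected feature already covers it.
    if any(is_redundant_given(new, kept) for kept in bsf):
        return list(bsf)
    # Part 2: keep the new feature and drop selected features that it makes redundant.
    survivors = [kept for kept in bsf if not is_redundant_given(kept, new)]
    return survivors + [new]
```

For example, with a toy oracle in which "b" is redundant given "a" and "a" is redundant given "c", feeding "a", then "b", then "c" leaves only "c" in the BSF.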
The key feature selection device for processing power load monitoring sparse data comprises a data buffer module, a data completion module and a stream feature selection module connected in sequence:
the data buffer module is used for acquiring power load monitoring sparse data and caching the real-time data into a buffer matrix;
the data completion module is used for putting the sparse buffer matrix into a pre-trained hidden feature model, calculating the missing values and filling them in to obtain a complete matrix;
the stream feature selection module is used for performing feature selection on the complete matrix and storing the result in the optimal feature subset;
the stream feature selection module comprises a correlation analysis unit, a redundancy analysis unit, a fuzzy correlation analysis unit and a storage unit.
In the data buffer module, the acquired monitoring sparse data includes steady-state feature data before a fault and transient feature data after a fault, and the buffer matrix used to cache the real-time monitoring sparse data has far fewer columns than the entire power load data set.
The data completion module generates predicted values for the missing positions by putting the buffer matrix into the trained hidden feature model, and fills the predicted values into the missing positions.
The correlation analysis unit performs correlation analysis on the features in the newly arrived buffer matrix and calculates the p value through the Fisher-z test. Let α denote the significance level of the fuzzy correlation; a fuzzy membership function adjusts α within the range 0.01 to 0.1. If p < α, the feature enters redundancy analysis; if α < p < 0.1, fuzzy correlation analysis is performed; otherwise the feature is discarded.
The redundancy analysis unit determines whether each new feature entering redundancy analysis is redundant with the features already in the BSF, discarding the new feature if so and adding it to the BSF otherwise; it also determines whether the new feature makes any original features in the BSF redundant, and discards those original features if so.
The fuzzy correlation analysis unit calculates the dependency degree between the label and each new feature routed to fuzzy correlation analysis, stores the feature in the fuzzy correlation feature subset FSF, and sorts the FSF; after no new features flow in, the top-ranked features, numbering half the size of the BSF, are stored in the BSF. The storage unit is used for storing the BSF and FSF generated after each unit executes.
The key feature selection method for processing the power load monitoring sparse data comprises the following steps:
S101: Input the power load monitoring sparse data F. Power load operation data is acquired automatically by equipment in real time and serves as the original input data. The data may be steady-state feature data before a fault or transient feature data after a fault. Because links such as acquisition, transmission and storage can fail, the data is usually sparse, with missing values.
S102: Store the data in the buffer matrix B. For data generated in real time, a buffer matrix B is set up to cache the arriving data; when the buffer matrix is full, the process goes to step S103, while step S102 continues to cache newly arriving data.
S103: Input the buffer matrix into the pre-trained hidden feature model to obtain a complete matrix. P (M × k) and Q (N × k) are randomly generated, R = B is set, and the following objective function is optimized with the Cauchy loss:
Figure RE-GDA0003870970170000091
where λP and λQ are the regularization parameters corresponding to P and Q, γ is a constant, and Ω (M × N) is an indicator matrix whose entry is 1 when the corresponding position in R holds a monitored value and 0 otherwise. The prediction matrix (Figure RE-GDA0003870970170000092) is computed, and its predicted values are filled into the missing positions to obtain the complete matrix (Figure RE-GDA0003870970170000093).
S104: The complete matrix releases new features f one by one.
S105: Evaluate the correlation between the new feature and the label, and return a p value. For the newly arrived feature f, its correlation with the target is calculated by the Fisher-z test, which returns a p value. The significance level α is adjusted through a trapezoidal fuzzy membership function. The Fisher-z test formula is as follows:
Figure RE-GDA0003870970170000094
where N is the number of instances, z is the conditioning feature, and xi is the partial correlation coefficient.
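Since the formula image is not reproduced here, the sketch below assumes the commonly used form of the Fisher-z test: the (partial) correlation coefficient r is transformed by 0.5·ln((1 + r)/(1 − r)), scaled by the square root of N minus the number of conditioning features minus 3, and compared against a standard normal distribution. The function names are hypothetical, and only zero or one conditioning feature is handled.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant input (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation of x and y given one conditioning feature z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return (rxy - rxz * ryz) / denom if denom > 0 else 0.0


def fisher_z_p_value(r: float, n: int, cond_size: int = 0) -> float:
    """Two-sided p value for H0: (partial) correlation is zero."""
    dof = n - cond_size - 3
    if dof <= 0:
        raise ValueError("need more than cond_size + 3 instances")
    r = max(min(r, 0.999999), -0.999999)  # keep the logarithm finite
    stat = math.sqrt(dof) * abs(0.5 * math.log((1 + r) / (1 - r)))
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))
```

A small p value rejects independence, so in step S106 a feature with p below α is treated as correlated with the target.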
S106: If p is smaller than α, f is correlated with the target and the process goes to step S107; otherwise, the process goes to step S111.
S107: Evaluate the redundancy between the new feature f and the existing features in the optimal feature subset BSF. The independence between the new feature f and the existing features in the BSF is calculated through the Fisher-z test; if f is independent of them, it is not redundant.
S108: Determine whether the new feature f is redundant. If not, the process goes to step S110; otherwise, it goes to step S109.
S109: Discard the new feature f and go to step S114.
S110: Add the new feature f to the BSF, evaluate the redundancy among the existing features, and discard the features that have become redundant. The specific method is the same as in step S107. The process then goes to step S114.
S111: Determine whether α < p < 0.1 holds. If it does not, go to step S112; if it does, go to step S113.
S112: Discard the new feature f and go to step S114.
S113: Calculate the dependency of the new feature f and store f, together with its dependency, in the fuzzy correlation feature subset FSF. The dependency is calculated through a fuzzy rough set as follows:
Figure RE-GDA0003870970170000101
i.e. the ratio of the lower approximation under the feature f to the full sample set.
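Because the dependency formula survives only as an image, the sketch below uses a simple neighborhood rough set reading of "lower approximation divided by the full sample set": a sample belongs to the lower approximation when every sample within distance `delta` of it on feature f shares its label. The function name `dependency`, the radius `delta` and this particular definition are assumptions for illustration, not the patent's formula.

```python
from typing import Hashable, Sequence


def dependency(feature: Sequence[float], labels: Sequence[Hashable],
               delta: float = 0.1) -> float:
    """Fraction of samples whose delta-neighborhood on this feature is label-consistent."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and equally long")
    positive = 0
    for i in range(n):
        # The neighborhood of sample i always contains i itself.
        consistent = all(labels[j] == labels[i]
                         for j in range(n)
                         if abs(feature[j] - feature[i]) <= delta)
        if consistent:
            positive += 1
    return positive / n
```

A value near 1 means the feature separates the classes well on its own, so such features would rank near the top of the FSF.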
S114: Check whether new features continue to flow in. If no new feature arrives, the process goes to step S115; otherwise, it returns to step S105 and feature selection is performed on the newly arrived feature.
S115: Add the top |BSF|/2 fuzzy correlation features in the FSF to the BSF; the resulting BSF is the finally selected optimal feature subset.
S116: Output the optimal feature subset BSF.
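The final merge in step S115 can be sketched as follows. The function name `merge_fuzzy_features` and the dictionary of dependency scores are hypothetical, and rounding |BSF|/2 down is an assumption, since the text does not say how an odd |BSF| is handled.

```python
from typing import Dict, List


def merge_fuzzy_features(bsf: List[str], fsf_scores: Dict[str, float]) -> List[str]:
    """Append the top |BSF| // 2 features of the FSF, ranked by dependency, to the BSF."""
    quota = len(bsf) // 2  # rounding down is an assumption
    ranked = sorted(fsf_scores, key=lambda name: fsf_scores[name], reverse=True)
    extra = [name for name in ranked if name not in bsf][:quota]
    return list(bsf) + extra
```

With four features in the BSF and FSF scores of 0.9, 0.7 and 0.4, the two highest-scoring fuzzy features are appended and the third is left out.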
Fig. 2 shows a block diagram of a key feature selection apparatus for processing sparse data of power load monitoring in an embodiment of the present invention. The device comprises:
the data buffering module 210: the method is used for acquiring sparse data of power load monitoring and caching real-time monitoring data into a buffer matrix;
in the embodiment of the invention, the acquisition of the monitoring sparse data comprises steady-state characteristic data before failure and transient-state characteristic data after failure. The columns of the buffer matrix used to buffer the monitoring sparse data are much smaller than the columns of the entire power load data set.
The data completion module 220: the method comprises the steps of putting a sparse buffer matrix into a pre-trained hidden feature model, calculating missing values and filling the missing values into a complete matrix;
the stream feature selection module 230: and carrying out feature selection on the complete matrix, and storing the result into the optimal feature subset.
The stream feature selection module includes:
Correlation analysis unit 231: performs correlation analysis on the features in the newly arrived buffer matrix, using the p-value returned by the Fisher-z test. Let α denote the significance level of the fuzzy correlation. If p < α, the feature enters redundancy analysis; if α < p < 0.1, fuzzy correlation analysis is performed; otherwise the feature is discarded.
Redundancy analysis unit 232: determines whether a new feature entering redundancy analysis is redundant with respect to the features already in the BSF; if so, the new feature is discarded, otherwise it is added to the BSF. It also determines whether the new feature makes any original feature in the BSF redundant; if so, that original feature is discarded.
Fuzzy correlation analysis unit 233: performs fuzzy correlation analysis. It calculates the dependency on the label of the new features that are neither clearly relevant to nor clearly independent of the label, adds them to the fuzzy correlation feature subset FSF, and sorts them. After no new features flow in, the top |BSF|/2 features of the sorted FSF are stored in the BSF.
Storage unit 234: stores the BSF generated after each unit executes.
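As an illustration of the p-value used by the correlation analysis unit, the following Python sketch computes a two-sided Fisher-z p-value for the correlation between one feature and the label. It is a simplified sketch under stated assumptions: the function names `pearson_r` and `fisher_z_pvalue` are hypothetical, only the unconditional case (empty conditioning set) is covered, and a test conditioned on other features would need a partial correlation in place of the plain Pearson coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0            # a constant sequence carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(x, y):
    """Two-sided p-value of the Fisher-z test for zero correlation (no conditioning set)."""
    n = len(x)
    if n <= 3:
        raise ValueError("the Fisher-z test needs more than 3 samples")
    r = pearson_r(x, y)
    r = max(min(r, 0.999999), -0.999999)   # keep atanh finite for |r| close to 1
    z = math.atanh(r)                      # Fisher z-transform: 0.5 * ln((1 + r) / (1 - r))
    stat = math.sqrt(n - 3) * abs(z)       # approximately standard normal under independence
    return math.erfc(stat / math.sqrt(2.0))  # equals 2 * (1 - Phi(stat))
```

The returned value is then compared with α and 0.1 to decide among redundancy analysis, fuzzy correlation analysis, and discarding.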
FIG. 3 compares the accuracy of feature selection on sparse data for four algorithms after applying the embodiment of the present invention. OS2FSU is the method proposed in the present invention; it is compared with the classical algorithm Fast-OSFS (IEEE T PATTERN ANAL, 2012) and the recently proposed algorithm SFS_FI (IEEE TNNLS, 2020). Verification was carried out on six datasets (COIL, lung, SMK_CAN_191, isolet, USPS, mfeat-fac) from ASU (https://jundongl.github.io/scikit-feature/datasets.html) and UCI (http://archive.ics.uci.edu/ml/index.php). Feature selection was performed with 50% of the data in each dataset missing, and classifiers were trained with the support vector machine, random forest, and K-nearest-neighbor algorithms. The BSF finally output by the feature selection algorithm is used as the classifier input: the training samples are fitted with each of the three classifiers, with hyper-parameters tuned to obtain the best-performing model, and the test samples are then predicted and the accuracy is calculated. The average is taken as the final accuracy, and the results are shown in FIG. 3. The prediction accuracy of the proposed method is clearly higher than that of the compared methods, indicating that the key feature selection method for processing power load monitoring sparse data can effectively improve the precision and speed of data mining.
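To make the 50%-missing setting concrete, the following Python sketch hides half of the entries of a small matrix and fills them back in with a latent (hidden) feature model. It is an illustration under stated assumptions rather than the evaluated implementation: the function names `mask_half` and `complete_matrix`, the hyper-parameter defaults, and the use of stochastic gradient descent are hypothetical, and the per-entry loss ln(1 + e²/γ²) with L2 regularization is an assumed, commonly used Cauchy-type form.

```python
import random


def mask_half(X, seed=0):
    """Return a copy of X in which half of the entries are replaced by None."""
    rng = random.Random(seed)
    rows, cols = len(X), len(X[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    hidden = set(rng.sample(cells, len(cells) // 2))
    return [[None if (i, j) in hidden else X[i][j] for j in range(cols)]
            for i in range(rows)]


def complete_matrix(R, k=2, gamma=1.0, lam=0.01, lr=0.05, epochs=500, seed=0):
    """Fill the None entries of R with predictions P[i] . Q[j] of a latent feature model."""
    rng = random.Random(seed)
    M, N = len(R), len(R[0])
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(M)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(N)]
    observed = [(i, j) for i in range(M) for j in range(N) if R[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
            # Derivative of the assumed Cauchy loss ln(1 + err^2 / gamma^2) w.r.t. err.
            w = 2.0 * err / (gamma ** 2 + err ** 2)
            for f in range(k):
                p_if, q_jf = P[i][f], Q[j][f]
                P[i][f] += lr * (w * q_jf - 2.0 * lam * p_if)
                Q[j][f] += lr * (w * p_if - 2.0 * lam * q_jf)
    # Keep monitored values as they are; only missing positions receive predictions.
    return [[R[i][j] if R[i][j] is not None
             else sum(P[i][f] * Q[j][f] for f in range(k))
             for j in range(N)] for i in range(M)]
```

Because the derivative of the Cauchy-type loss shrinks for large residuals, outlying monitored values pull the factors less than they would under a squared loss, which is the usual motivation for this choice.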
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise, and it should be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A key feature selection method for processing power load monitoring sparse data, characterized by comprising the following steps: acquiring power load monitoring sparse data F, taking the monitoring data as input sparse stream features, and constructing a buffer matrix B;
putting the buffer matrix B into a pre-trained hidden feature filling model, calculating the missing values, and filling them in to obtain a complete matrix
Figure RE-FDA0003870970160000016
and performing stream feature selection on the complete matrix and storing the result in an optimal feature subset BSF.
2. The key feature selection method for processing power load monitoring sparse data according to claim 1, characterized in that: power load monitoring sparse data F of M rows and N columns is acquired, comprising steady-state characteristic data before failure and transient-state characteristic data after failure, and a buffer matrix B of M rows and Bs columns is constructed to cache newly arrived sparse stream features, where Bs << N.
3. The key feature selection method for processing power load monitoring sparse data according to claim 1, characterized in that: the buffer matrix B is put into a pre-trained hidden feature model, and the missing values are calculated and filled in to obtain a complete matrix
Figure RE-FDA0003870970160000015
The missing value calculation steps are as follows:
step one: randomly generate matrices P (of size M × k) and Q (of size N × k), let R = B, and optimize the following objective function based on the Cauchy loss:
Figure RE-FDA0003870970160000011
where λ_P and λ_Q are the regularization parameters corresponding to P and Q, γ is a constant, and Ω (of size M × N) is an indicator matrix whose entry is 1 where the corresponding position in R holds a monitored value and 0 otherwise;
step two: compute the prediction matrix
Figure RE-FDA0003870970160000012
and fill the predicted values into the missing positions to obtain the complete matrix
Figure RE-FDA0003870970160000013
4. The key feature selection method for processing power load monitoring sparse data according to claim 1, characterized in that, for the complete matrix
Figure RE-FDA0003870970160000014
stream feature selection is performed, comprising the following steps:
step one, correlation analysis: when a new feature flows in, calculate the feature correlation by the Fisher-z test, which returns a p-value; let α denote the significance level of the fuzzy correlation; if p < α, enter redundancy analysis; if α < p < 0.1, perform fuzzy correlation analysis; otherwise discard the feature;
step two, redundancy analysis: determine whether the new feature entering redundancy analysis is redundant with respect to the features already in the BSF; if so, discard the new feature, otherwise add it to the BSF; also determine whether the new feature makes any original feature in the BSF redundant, and if so, discard that original feature;
step three, fuzzy correlation analysis: calculate the dependency on the label of the new features entering step three, add them to the fuzzy correlation feature subset FSF, and sort the features; after no new features flow in, store the top |BSF|/2 features of the FSF in the BSF;
the above steps are repeated until no new features flow in, and finally the BSF is output.
5. A key feature selection device for processing power load monitoring sparse data, implementing the method according to any one of claims 1 to 4, characterized in that: the device comprises a data buffering module, a data completion module, and a stream feature selection module connected in sequence;
the data buffering module is used for acquiring power load monitoring sparse data and caching the real-time data in a buffer matrix;
the data completion module is used for putting the sparse buffer matrix into a pre-trained hidden feature model, calculating the missing values, and filling them in to obtain a complete matrix;
the stream feature selection module is used for performing feature selection on the complete matrix and storing the result in the optimal feature subset;
the stream feature selection module comprises a correlation analysis unit, a redundancy analysis unit, a fuzzy correlation analysis unit, and a storage unit.
6. The key feature selection device for processing power load monitoring sparse data according to claim 5, characterized in that: in the data buffering module, the acquired monitoring sparse data comprises steady-state characteristic data before failure and transient-state characteristic data after failure, and the buffer matrix used to cache the real-time monitoring sparse data has far fewer columns than the entire power load data set.
7. The key feature selection device for processing power load monitoring sparse data according to claim 5, characterized in that: the data completion module puts the buffer matrix into the trained hidden feature model to generate predicted values for the missing positions and fills the predicted values into those positions.
8. The key feature selection device for processing power load monitoring sparse data according to claim 5, characterized in that: the correlation analysis unit is used for performing correlation analysis on the features in the newly arrived buffer matrix, with the p-value returned by the Fisher-z test; α denotes the significance level of the fuzzy correlation and is mapped to the range 0.01 to 0.1 by a fuzzy membership function; if p < α, the feature enters redundancy analysis; if α < p < 0.1, fuzzy correlation analysis is performed; otherwise the feature is discarded.
9. The key feature selection device for processing power load monitoring sparse data according to claim 5, characterized in that: the redundancy analysis unit is used for determining whether the new feature entering redundancy analysis is redundant with respect to the features already in the BSF, discarding the new feature if so and adding it to the BSF otherwise; and for determining whether the new feature makes any original feature in the BSF redundant, and if so, discarding that original feature.
10. The key feature selection device for processing power load monitoring sparse data according to claim 5, characterized in that: the fuzzy correlation analysis unit is used for performing fuzzy correlation analysis, namely calculating the dependency on the label of the new features entering step three, storing them in the fuzzy correlation feature subset FSF, and sorting them, and, after no new features flow in, storing the top |BSF|/2 features of the FSF in the BSF; the storage unit is used for storing the BSF and FSF generated after each unit executes.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210885026.3A CN115329663A (en) 2022-07-26 2022-07-26 Key feature selection method and device for processing power load monitoring sparse data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210885026.3A CN115329663A (en) 2022-07-26 2022-07-26 Key feature selection method and device for processing power load monitoring sparse data

Publications (1)

Publication Number Publication Date
CN115329663A true CN115329663A (en) 2022-11-11

Family

ID=83919268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210885026.3A Pending CN115329663A (en) 2022-07-26 2022-07-26 Key feature selection method and device for processing power load monitoring sparse data

Country Status (1)

Country Link
CN (1) CN115329663A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115729957A (en) * 2022-11-28 2023-03-03 安徽大学 Unknown stream feature selection method and device based on maximum information coefficient
CN115729957B (en) * 2022-11-28 2024-01-19 安徽大学 Unknown stream feature selection method and device based on maximum information coefficient
CN116485075A (en) * 2023-04-23 2023-07-25 国网江苏省电力有限公司南通市海门区供电分公司 FTS-based power grid load prediction method

Similar Documents

Publication Publication Date Title
CN115329663A (en) Key feature selection method and device for processing power load monitoring sparse data
CN107169628B (en) Power distribution network reliability assessment method based on big data mutual information attribute reduction
CN110703057B (en) Power equipment partial discharge diagnosis method based on data enhancement and neural network
CN106021771A (en) Method and device for diagnosing faults
CN107561997A (en) A kind of power equipment state monitoring method based on big data decision tree
CN112257963B (en) Defect prediction method and device based on spaceflight software defect data distribution outlier
CN107679089A (en) A kind of cleaning method for electric power sensing data, device and system
CN115278741A (en) Fault diagnosis method and device based on multi-mode data dependency relationship
CN111507504A (en) Adaboost integrated learning power grid fault diagnosis system and method based on data resampling
CN110321493A (en) A kind of abnormality detection of social networks and optimization method, system and computer equipment
CN115021679A (en) Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
CN114881101A (en) Power system typical scene associated feature selection method based on bionic search
CN115130578A (en) Incremental rough clustering-based online evaluation method for state of power distribution equipment
CN114037001A (en) Mechanical pump small sample fault diagnosis method based on WGAN-GP-C and metric learning
CN113569345B (en) Numerical control system reliability modeling method and device based on multisource information fusion
Gastoni et al. Robust state-estimation procedure based on the maximum agreement between measurements
CN115033591A (en) Intelligent detection method and system for electricity charge data abnormity, storage medium and computer equipment
CN117579513B (en) Visual operation and maintenance system and method for convergence and diversion equipment
CN114062812A (en) Fault diagnosis method and system for metering cabinet
CN111966758B (en) Electric power hidden trouble investigation method based on image data analysis technology
CN117495422A (en) Cost management system and method based on power communication network construction
WO2014173271A1 (en) Optimization method and system for the number of monitoring units of digital man-machine interface
CN116720095A (en) Electrical characteristic signal clustering method for optimizing fuzzy C-means based on genetic algorithm
CN110045691A (en) A kind of multitasking fault monitoring method of multi-source heterogeneous big data
CN115392710A (en) Wind turbine generator operation decision method and system based on data filtering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination