CN114154617A - Low-voltage resident user abnormal electricity utilization identification method and system based on VFL - Google Patents
- Publication number
- CN114154617A (application CN202111256656.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- abnormal
- user
- electricity
- low
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
A VFL-based method and system for identifying abnormal electricity usage by low-voltage residential users, comprising the following steps: step 1, collecting historical electricity usage data of low-voltage residential users over a set duration, importing the data into a database, and preprocessing it; step 2, extracting, from the data preprocessed in step 1, feature data that characterize the electricity usage patterns of the low-voltage residential users; step 3, using the feature data obtained in step 2 to extract abnormal electricity usage features in four dimensions (global anomaly, local anomaly, regional space, and time series) and performing vertical federation; step 4, constructing a convolutional neural network model and completing its training through stratified sampling of the data processed in the preceding step, description of the neural network structure, and configuration of the training method; and step 5, putting the model to work by loading the trained model and inputting the electricity usage data to be judged, thereby determining whether a user's electricity usage is abnormal.
Description
Technical Field
The invention belongs to the technical field of power distribution, and particularly relates to a low-voltage residential user abnormal power utilization identification method and system based on VFL.
Background
As power systems become increasingly informatized and the volume of power distribution and utilization data grows rapidly, researching algorithms suited to mining such data and building effective knowledge discovery models are of great significance for business model innovation in power distribution and utilization and for smart grid development. Building data mining models on existing power big data is the current development trend for smart grids.
However, although the rapid development of the smart grid has produced massive electricity usage data and a solid data foundation for big data analysis on the consumption side, accumulating data without exploiting it remains a major problem for power enterprises. Faced with this growth in data, most power departments still rely on traditional statistical methods for anomaly analysis and cannot effectively extract the event information hidden behind the abnormal data.
In recent years, methods for electricity usage identification and electricity fraud detection based on data mining theory have been proposed one after another. Existing non-technical loss detection methods fall mainly into clustering and classification. Clustering is an unsupervised learning method: it learns the structure of the given data directly, without a training data set carrying class labels. Classification is a supervised learning method and requires class labels: a model is trained on a data set of users labeled as abnormal or normal, and the trained model is then used to identify whether a given user is abnormal. The present invention belongs to the latter category.
Prior art document 1 (CN110321934A) uses clustering for electricity usage identification. It provides a method for detecting abnormal user electricity usage data in which the K-means algorithm performs the clustering calculation, and the data sets of cluster centers whose number of noise points exceeds a preset limit are all output as abnormal electricity usage data sets. Prior art document 1 has two disadvantages. First, K-means requires the value of K, that is, the number of classes into which the data set is divided, to be set in advance; the accuracy of the clustering depends on K, and the uncertain distribution of the sample data makes choosing the number of classes difficult. Second, the choice of initial cluster centers in the iterative process affects the stability of the final result: different initial centers yield different final clusterings, and if the initial centers lie far from the region of the global optimum, the algorithm may end in a locally optimal solution.
Prior art document 2 (CN112101420A) uses classification for electricity usage identification. It provides a method for identifying abnormal electricity users based on a Stacking ensemble algorithm over heterogeneous models, in which electricity usage feature indices are established in three dimensions (the recording status of a single user's electricity load data in the electricity usage information acquisition system, time-series segmented statistics, and the similarity of electricity usage among users), and a user electricity usage feature set is extracted so that deep-level features of the data are mined more effectively. The disadvantages of prior art document 2 are that user data privacy protection is not considered and that model training can only be performed locally, so multiple participants cannot cooperate to improve the machine learning effect.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention aims to provide a method and system for identifying abnormal electricity usage by low-voltage residential users based on VFL (Vertical Federated Learning), namely a supervised learning method that identifies abnormal electricity usage from the electricity load information of power users. It can effectively reduce the operating costs of a power company, on the one hand by saving a large amount of manpower and material resources and on the other hand by reducing the losses caused by electricity theft. The model uses algorithm modules such as a support vector machine, local outlier factor analysis, a similarity measure among users in the same station area, and a measure of the rate of change of correlation with the most similar user. Four-dimensional composite features are extracted with these algorithms for vertical federation, describing a user's degree of abnormality from four perspectives: global anomaly, local anomaly, regional space, and time series.
The invention adopts the following technical scheme. The invention provides a VFL-based method for identifying abnormal electricity usage by low-voltage residential users, comprising the following steps:
step 1, collecting historical electricity usage data of low-voltage residential users over a set duration, importing the data into a database, and preprocessing it;
step 2, extracting, from the data preprocessed in step 1, feature data that characterize the electricity usage patterns of the low-voltage residential users;
step 3, using the feature data obtained in step 2 to extract abnormal electricity usage features of low-voltage residential users in four dimensions (global anomaly, local anomaly, regional space, and time series) and performing vertical federation;
step 4, constructing a convolutional neural network model and completing its training through stratified sampling of the data processed in the preceding step, description of the neural network structure, and configuration of the training method;
and step 5, putting the model to work by loading the trained model and inputting the electricity usage data to be judged, thereby determining whether a user's electricity usage is abnormal.
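The stratified sampling named in step 4 can be illustrated with a minimal sketch. The function name `stratified_split`, the 20% test fraction, and the toy label list are assumptions introduced here for illustration; the source does not specify how the strata are drawn.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction from every class (at least one sample).
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (assumed): 90 normal users (0) and 10 abnormal users (1).
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels)
```

Because abnormal users are usually a small minority, drawing the same fraction from each class keeps some abnormal samples in both the training and the test set.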
Preferably, the preprocessing comprises missing value processing, outlier processing, and data normalization, for use in subsequent steps.
Preferably, step 1 specifically comprises:
step 1.1, collecting historical electricity usage data of low-voltage residential users over a set duration;
step 1.2, filling missing values in the data by Lagrange interpolation;
step 1.3, processing abnormal values in the data by replacing each abnormal value with the average of the preceding and following observations;
and step 1.4, standardizing the data by the standard deviation standardization method.
Preferably, in step 2, daily feature data and monthly feature data that characterize the electricity usage patterns of the low-voltage residential users are extracted from the electricity usage data preprocessed in step 1, wherein the daily feature data are the users' daily electricity consumption and the monthly feature data are the users' average daily consumption in each month, that is, each month's total consumption divided by the number of days in that month.
Preferably, step 3 specifically comprises:
extracting global anomaly features from the daily feature data obtained in step 2 and performing binary classification to obtain, for each user, an abnormal electricity usage label and a probability value for the degree of abnormality, expressed by the following formula:
$$\Pr(y = 1 \mid x) \approx P_{A,B}[f(x)] = \frac{1}{1 + e^{A f(x) + B}}$$
In the formula:
$x$ represents a user electricity data value,
$y$ indicates whether the electricity usage is abnormal, with 1 denoting abnormal electricity usage,
$\Pr(y = 1 \mid x)$ represents the conditional probability that the user's electricity usage is abnormal,
$P_{A,B}[f(x)]$ represents a sigmoid function with parameters $A$ and $B$,
$f(x)$ represents the data model of the user's electricity usage,
$e$ represents the natural constant.
Preferably, step 3 specifically comprises: extracting local anomaly features from the daily feature data obtained in step 2, wherein local outlier factor features are extracted with the LOF algorithm, and a low-voltage residential user is judged to be an abnormal electricity user when the LOF value of that user's electricity usage data is higher than a set value.
Preferably, step 3 specifically comprises: extracting regional-space anomaly features from the monthly feature data obtained in step 2, wherein a user is judged to be an abnormal electricity user when that user's electricity usage pattern is inconsistent with that of most users in the same region.
Preferably, step 3 specifically comprises: extracting time-series anomaly features from the monthly feature data obtained in step 2, wherein a user is judged to be an abnormal electricity user when that user's electricity load pattern differs greatly from that of the user with whom it was initially most correlated.
Preferably, step 5 specifically includes:
step 5.1, running the model: the number of the cell to be verified and the electricity usage data file are input, and the system determines whether a model exists; if so, data preprocessing and feature extraction are executed in sequence;
step 5.2, if no model exists, reporting that the model does not exist and prompting the user to upload data for model training, and executing step 5.1 after training is finished;
step 5.3, reading the trained model from the database, loading it into the system through deserialization, and running it to obtain the result.
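The control flow of steps 5.1 to 5.3 can be sketched as follows. Everything named here is hypothetical: `model_store` is an in-memory stand-in for the database, `train_model` is a placeholder threshold rule rather than the convolutional neural network, and `pickle` is just one possible serialization format, since the source does not name one.

```python
import pickle

# Hypothetical stand-in for the database table holding serialized models per cell.
model_store = {}

def train_model(training_rows):
    """Placeholder trainer (NOT the patent's CNN): threshold at the training mean."""
    return {"threshold": sum(training_rows) / len(training_rows)}

def identify(cell_id, usage_rows, training_rows=None):
    """Return one abnormal/normal flag per usage value for the given cell."""
    # Steps 5.1 / 5.2: check whether a trained model exists for this cell.
    if cell_id not in model_store:
        if training_rows is None:
            raise LookupError(f"no model for cell {cell_id}; upload training data")
        # Train, then serialize the model before storing it.
        model_store[cell_id] = pickle.dumps(train_model(training_rows))
    # Step 5.3: read the stored bytes and deserialize them into a model object.
    model = pickle.loads(model_store[cell_id])
    return [value > model["threshold"] for value in usage_rows]

first_run = identify("A01", [1.0, 5.0], training_rows=[1.0, 2.0, 3.0])
second_run = identify("A01", [3.0])  # reuses the stored model
```

Note that `pickle.loads` should only be applied to data from a trusted store, because unpickling untrusted bytes can execute arbitrary code.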
A second aspect of the present invention provides a VFL-based system for identifying abnormal electricity usage by low-voltage residential users, which uses the VFL-based method for identifying abnormal electricity usage by low-voltage residential users as set forth in any one of claims 1 to 9 and comprises: a data acquisition module, a data preprocessing module, a vertical federation module, a model generation module, and a low-voltage residential user abnormal electricity usage identification module,
characterized in that:
the data acquisition module is used for acquiring the electricity usage data of low-voltage residential users;
the data preprocessing module is used for preprocessing the electricity usage data of the low-voltage residential users, including missing value processing, abnormal value processing, and data normalization;
the vertical federation module is used for extracting abnormal electricity usage features of low-voltage residential users in four dimensions (global anomaly, local anomaly, regional space, and time series) and performing vertical federation;
the model generation module is used for constructing a convolutional neural network model and completing its training on the output received from the vertical federation module;
and the low-voltage residential user abnormal electricity usage identification module is used for loading the trained convolutional neural network model, receiving the electricity usage data of the low-voltage residential users to be judged from the data acquisition module, and determining whether abnormal electricity usage exists.
Compared with the prior art, the invention has the advantage that, to protect user privacy and data security, it adopts a model based on vertical federated learning and a convolutional neural network.
Vertical federated learning helps reduce the risks and costs of traditional machine learning models. Federated learning works on distributed data: following the principles of focused collection and data minimization, multiple clients cooperatively train a model under the coordination of a central server.
The convolutional neural network is a multilayer supervised learning neural network, in essence a multilayer perceptron, whose core components are the convolutional and pooling layers, which extract feature vectors from the network input.
The method extracts four user feature dimensions (global anomaly, local anomaly, regional space, and time series) for vertical federation, enabling multiple data owners to train jointly while ensuring information security during big data exchange, protecting the privacy of terminal data and member data, and maintaining legal compliance.
Traditional electricity theft prevention depends heavily on user reports and periodic patrols. By comparison, the low-voltage residential user abnormal electricity usage identification model based on vertical federated learning works proactively and requires less effort, and it is considerably more targeted and timely. It greatly reduces the manpower and material costs of the power company and the losses caused by electricity theft, and therefore has important practical significance.
Abnormal electricity usage is identified by combining vertical federated learning with multi-dimensional composite features of low-voltage residential users. Because a single data model cannot accurately separate normal users from abnormal ones, extracting multi-dimensional composite features with different models and classifying from several perspectives further improves accuracy. The model provided by the invention can make full use of massive electricity usage data and effectively extract the event information behind abnormal data.
Identifying users' abnormal electricity usage behavior with the method and system based on vertical federated learning achieves the following effects:
1) the workload of electricity theft prevention is reduced, its accuracy and timeliness are enhanced, suspected abnormal electricity users are located quickly, and case processing speed is improved;
2) training the model with a convolutional neural network improves the accuracy of electricity theft prevention, raises the success rate of locating offenders, and reduces unnecessary inspections; it also deters users who are considering electricity theft and reduces its occurrence;
3) federated learning fully protects user privacy and lets multiple participants cooperate, improving both model training and abnormal electricity usage identification.
In conclusion, a fast and accurate abnormal electricity usage identification system is essential for a power company. Compared with traditional field inspection, the method is more timely and accurate and can effectively reduce the company's operating costs, on the one hand by saving a large amount of manpower and material resources and on the other hand by reducing the losses caused by electricity theft. The abnormal electricity usage identification system therefore offers considerable economic benefits.
Drawings
FIG. 1 is a flow chart of the low-voltage residential user abnormal electricity usage identification method based on vertical federated learning provided by the invention;
FIG. 2 is a flow chart of data preprocessing in the abnormal electricity usage identification method provided by the invention;
FIG. 3 is a flow chart of composite feature extraction in the abnormal electricity usage identification method provided by the invention;
FIG. 4 is a flow chart of model training in the abnormal electricity usage identification method provided by the invention;
FIG. 5 is a flow chart of model operation in the abnormal electricity usage identification method provided by the invention;
FIG. 6 is a structural diagram of the convolutional neural network used in the abnormal electricity usage identification method provided by the invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples serve only to illustrate the technical solutions of the invention more clearly and do not limit the protection scope of the present application.
As shown in FIG. 1, embodiment 1 of the invention provides a VFL-based method for identifying abnormal electricity usage by low-voltage residential users, comprising steps 1 to 5, which are described in detail below.
Step 1 specifically comprises the following steps:
Step 1.1, collecting historical electricity usage data of low-voltage residential users over a set duration. In a preferred but non-limiting implementation, the users' historical electricity usage data for the past 2 years are collected, the source data in Excel format are imported into a MySQL database, and daily electricity consumption is derived from the daily frozen readings of the electric energy meters.
Step 1.2, processing missing values. The massive raw power data contain some incomplete and abnormal records, which seriously affect model training efficiency. In a further preferred embodiment of the invention, missing values are filled using Lagrange interpolation.
Step 1.3, processing abnormal values. Abnormal values in the daily electricity consumption data of low-voltage residential users are corrected using the average of the preceding and following observations.
In another preferred but non-limiting embodiment, electricity usage data that cannot be used to train the model are deleted, in two ways. Deletion by user: for a low-voltage residential user whose missing electricity usage data reach a set proportion, or whose historical data are all zero, all of that user's historical electricity usage data are deleted, that is, the user is removed. Deletion by date: when the missing electricity usage data on a single day reach a set proportion, all historical electricity usage data for that day are deleted, that is, the day is removed.
Step 1.4, standardizing the daily electricity consumption data of the low-voltage residential users. In a preferred but non-limiting implementation, the standard deviation (z-score) standardization method is used, so that the processed data have a mean of 0 and a standard deviation of 1.
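Steps 1.2 to 1.4 can be sketched in plain Python as below. Several details are assumptions made only for this illustration: missing readings are marked with `None`, the interpolation uses up to two known points on each side, and an abnormal value is taken to be one lying more than three standard deviations from the mean (the source does not state how abnormal values are detected).

```python
from statistics import mean, pstdev

def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at position x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fill_missing(series, window=2):
    """Step 1.2: fill None entries from up to `window` known points on each side."""
    filled = list(series)
    for t, value in enumerate(series):
        if value is None:
            before = [i for i in range(t - 1, -1, -1) if series[i] is not None][:window]
            after = [i for i in range(t + 1, len(series)) if series[i] is not None][:window]
            xs = sorted(before + after)
            filled[t] = lagrange_value(xs, [series[i] for i in xs], t)
    return filled

def correct_outliers(series, limit=3.0):
    """Step 1.3: replace flagged values with the average of their two neighbours.

    The `limit`-sigma flagging rule is an assumption, not taken from the source.
    """
    mu, sigma = mean(series), pstdev(series)
    fixed = list(series)
    for t in range(1, len(series) - 1):
        if sigma > 0 and abs(series[t] - mu) > limit * sigma:
            fixed[t] = (series[t - 1] + series[t + 1]) / 2
    return fixed

def standardize(series):
    """Step 1.4: z-score standardization (result has mean 0, standard deviation 1)."""
    mu, sigma = mean(series), pstdev(series)
    return [(value - mu) / sigma for value in series]

filled = fill_missing([1.0, 2.0, None, 4.0, 5.0])
cleaned = correct_outliers([2.0] * 10 + [100.0] + [2.0] * 10)
scaled = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
```

A small interpolation window is used because high-degree Lagrange polynomials tend to oscillate between sample points.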
Step 2, extracting, from the electricity usage data preprocessed in step 1, feature quantities that characterize the electricity usage patterns of the low-voltage residential users. The feature quantities comprise daily feature data and monthly feature data: the daily feature data are the users' daily electricity consumption, and the monthly feature data are the users' average daily consumption in each month, that is, each month's total consumption divided by the number of days in that month.
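The monthly feature described in step 2 amounts to a simple aggregation. The sketch below assumes the daily readings of one user are held in a dictionary keyed by `datetime.date`; the function name `monthly_daily_average` and the sample values are illustrative only.

```python
import calendar
from collections import defaultdict
from datetime import date

def monthly_daily_average(daily_usage):
    """Map (year, month) to that month's total consumption / days in the month."""
    totals = defaultdict(float)
    for day, kwh in daily_usage.items():
        totals[(day.year, day.month)] += kwh
    return {
        (year, month): total / calendar.monthrange(year, month)[1]
        for (year, month), total in totals.items()
    }

# Assumed sample: 5 kWh on every day of February 2021, one 31 kWh reading in March.
daily_usage = {date(2021, 2, d): 5.0 for d in range(1, 29)}
daily_usage[date(2021, 3, 1)] = 31.0
monthly = monthly_daily_average(daily_usage)
```

Dividing by the calendar length of the month, as the text states, means that days without a reading pull the average down; this is one reason the missing-value handling in step 1 comes first.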
Step 3, using the daily feature data and monthly feature data obtained in step 2 to extract abnormal electricity usage features of the low-voltage residential users in four dimensions (global anomaly, local anomaly, regional space, and time series) and performing vertical federation.
In embodiments of the invention, federated learning enables numerous clients, such as but not limited to mobile devices or entire organizations, to collaboratively train a model under the coordination of a central server, such as but not limited to a service provider, while the training data remain decentralized. This brings a substantial improvement in security and privacy. While meeting privacy requirements, federated learning also models slightly better than the traditional method.
Federated learning is subdivided into horizontal federated learning, vertical federated learning, and federated transfer learning. Horizontal federated learning suits cases where the two data sets share many user features but few users; vertical federated learning suits cases where the two data sets share many users but few user features; and federated transfer learning suits cases where the two data sets share few users and few user features. Because electricity usage data sets share many users but few user features, vertical federated learning is selected.
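The selection rule above can be written as a small decision function. This is a sketch only: overlap is measured here with the Jaccard ratio and judged against a 0.5 threshold, both of which are assumptions, and the user IDs and feature names are invented for the example.

```python
def overlap_ratio(a, b):
    """Jaccard overlap between two collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_federation(users_a, feats_a, users_b, feats_b, threshold=0.5):
    """Pick a federated learning variant from user and feature overlap."""
    user_overlap = overlap_ratio(users_a, users_b)
    feat_overlap = overlap_ratio(feats_a, feats_b)
    if feat_overlap >= threshold and user_overlap < threshold:
        return "horizontal"  # same features, different users
    if user_overlap >= threshold and feat_overlap < threshold:
        return "vertical"    # same users, different features
    if user_overlap < threshold and feat_overlap < threshold:
        return "transfer"    # little overlap in either
    return None              # both overlap heavily: not one of the three cases

# Two parties holding different features about mostly the same users (assumed data).
choice = suggest_federation(
    users_a=["u1", "u2", "u3", "u4"], feats_a=["daily_kwh"],
    users_b=["u2", "u3", "u4", "u5"], feats_b=["region_score"],
)
```

In the vertical case, training would then run only on the users present in both data sets, with each party contributing its own feature columns for those users.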
Multi-dimensional composite features are extracted at the same time, and feature fusion is performed over the four features of global anomaly, local anomaly, regional space, and time series.
The step 3 specifically comprises the following steps:
Global anomaly features are extracted from the daily feature data obtained in step 2. The data set is classified into two classes with the C-SVC model of the LIBSVM library, yielding for each user an abnormal-electricity-usage label and a probability value for the user's degree of abnormality, expressed by the following formula:
$$\Pr(y = 1 \mid x) \approx P_{A,B}[f(x)] = \frac{1}{1 + e^{A f(x) + B}}$$
In the formula:
x represents a user electricity data value,
y represents whether electricity usage is abnormal, where 1 represents abnormal usage,
Pr(y = 1 | x) represents the conditional probability that the user's electricity usage is abnormal,
P_{A,B}[f(x)] represents a sigmoid function with parameters A and B,
f(x) represents the data model of the user's electricity usage,
e denotes the natural constant.
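As a minimal sketch of the sigmoid mapping defined above, the snippet below assumes the Platt-scaling form 1 / (1 + exp(A·f(x) + B)) that LIBSVM uses for probability outputs. The parameter values A and B in the example are made up; in practice they are fitted on training data.

```python
import math


def platt_probability(decision_value, A, B):
    """Map an SVM decision value f(x) to Pr(y = 1 | x) = 1 / (1 + exp(A*f + B)).

    The two branches are algebraically identical and only avoid overflow.
    """
    z = A * decision_value + B
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical fitted parameters: A < 0 makes larger f(x) more "abnormal".
A, B = -2.0, 0.0
for f in (-1.0, 0.0, 1.0):
    print(f, round(platt_probability(f, A, B), 4))
```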
For massive user data, global anomalies can generally be detected with either logistic regression (LR) or an SVM. Compared with LR, the SVM uses hinge loss, generalizes well, and is sensitive to outliers, so it gives better results. Missing data is filled in by Lagrange interpolation.
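The Lagrange interpolation mentioned here can be sketched as follows. The window of two known neighbours on each side and the use of `None` to mark a missing reading are assumptions for illustration; the source does not specify them.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_missing(series, window=2):
    """Fill None entries from up to `window` known readings on each side."""
    filled = list(series)
    for k, value in enumerate(series):
        if value is not None:
            continue
        left = [i for i in range(k - 1, -1, -1) if series[i] is not None][:window]
        right = [i for i in range(k + 1, len(series)) if series[i] is not None][:window]
        idx = sorted(left + right)
        if idx:  # leave the gap if there is nothing to interpolate from
            filled[k] = lagrange_interpolate(idx, [series[i] for i in idx], k)
    return filled


daily_kwh = [1.0, 2.0, None, 4.0, 5.0]
print(fill_missing(daily_kwh))
```

A small window keeps the polynomial degree low, which limits the oscillation that high-degree Lagrange polynomials can show on noisy load data.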
Local anomaly features are extracted from the daily feature data obtained in step 2 to strengthen and refine the extracted global anomaly features. Because of the limitations of the SVM, extracting only the probability feature of abnormal usage can misclassify some users as normal when they are in fact abnormal. In other words, SVM detection has the drawback that some data show no anomaly when viewed globally but do show an anomaly when viewed locally. The abnormal electricity-usage information therefore needs further analysis.
In a preferred but non-limiting embodiment, the local outlier factor (LOF) method is used at this stage to identify abnormal users: the LOF algorithm extracts local outlier factor features, and when the LOF value of a low-voltage residential user's electricity data exceeds a set value, that user is judged to be an abnormal electricity user. Using the LOF algorithm for local anomaly detection improves its accuracy.
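The LOF rule above can be sketched in plain Python using the standard definition (k-distance, reachability distance, local reachability density). The neighbourhood size `k`, the `threshold` standing in for the "set value", and the toy two-dimensional profiles are assumptions for illustration.

```python
import math


def local_outlier_factors(points, k=2):
    """Return the LOF score of every point (about 1 = inlier, much larger = outlier)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_distance, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]          # distance to the k-th nearest neighbour
        k_distance.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])
    lrd = []                                 # local reachability density
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):               # duplicated points: treat as inlier
            scores.append(1.0)
        else:
            ratio = sum(lrd[j] for j in neighbours[i]) / len(neighbours[i])
            scores.append(ratio / lrd[i])
    return scores


# Four similar toy profiles and one that is far from the rest.
profiles = [(1.0, 1.1), (1.1, 1.0), (0.9, 1.0), (1.0, 0.9), (5.0, 5.0)]
threshold = 2.0                              # hypothetical "set value"
scores = local_outlier_factors(profiles, k=2)
abnormal = [i for i, s in enumerate(scores) if s > threshold]
print(abnormal)
```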
Regional spatial anomaly features are extracted from the monthly feature data obtained in step 2 by computing the set of correlation measures between a user's electricity data and that of similar users; when a user's load differs greatly from that of most users in the same region, the user is judged to be using electricity abnormally.
More specifically, from the geographical perspective, a correlation measure between the user's electricity data and that of similar users is calculated to reflect the user's degree of abnormality; when a user's load differs greatly from the surrounding load, abnormal usage may be present. The correlation coefficient is therefore used to compute the correlation feature set for similar users' electricity data. Since electricity-usage data hardly satisfy the normal-distribution condition, the Pearson correlation coefficient is used in this technique. When a user's electricity-usage pattern is inconsistent with that of most users in the same region, the user is judged to be an abnormal electricity user, as expressed by the following formula:
$$\mathrm{similarity\_r} = \frac{\operatorname{cov}(x, x_{mean})}{\sigma_x \, \sigma_{mean}}$$
in the formula:
similarity_r represents the similarity between the measured user's electricity data and the average electricity data of users in the same region,
cov(x, x_mean) represents the covariance between the measured user's electricity data and the average electricity data of users in the same region,
x represents the electricity data of the measured user,
x_mean represents the average electricity data of users in the same region,
σ_x represents the standard deviation of the measured user's electricity data,
σ_mean represents the standard deviation of the average electricity data of users in the same region.
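A minimal sketch of the regional similarity measure, computed as the covariance of the user's series and the regional average series divided by the product of their standard deviations. The function names and the toy monthly values are assumptions for illustration.

```python
import math


def region_mean_profile(profiles):
    """Element-wise average of the users' monthly series in one region."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def similarity_r(x, x_mean):
    """Pearson correlation: cov(x, x_mean) / (sigma_x * sigma_mean)."""
    n = len(x)
    mu_x, mu_m = sum(x) / n, sum(x_mean) / n
    cov = sum((a - mu_x) * (b - mu_m) for a, b in zip(x, x_mean)) / n
    sigma_x = math.sqrt(sum((a - mu_x) ** 2 for a in x) / n)
    sigma_m = math.sqrt(sum((b - mu_m) ** 2 for b in x_mean) / n)
    if sigma_x == 0 or sigma_m == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sigma_x * sigma_m)


region = [[10, 12, 14, 13], [11, 13, 15, 14], [9, 11, 13, 12]]
x_mean = region_mean_profile(region)
typical = similarity_r(region[0], x_mean)            # follows the regional pattern
suspect = similarity_r([14, 12, 10, 11], x_mean)     # moves against it
print(round(typical, 3), round(suspect, 3))
```

A value near 1 means the user follows the regional pattern; a low or negative value marks the user for closer inspection.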
Time-series anomaly features are extracted from the monthly feature data obtained in step 2. Because the data are arranged nonlinearly, the method works from the time series: the Pearson correlation coefficient is used to compute the rate of change of the correlation with the most correlated user, which measures the degree of abnormality; when a user's load pattern differs markedly from that of the initially most correlated user, the user is judged to be using electricity abnormally.
More specifically, from the time-series perspective, the dates are divided into two series, D1 and D2, and the rate of change of the correlation with the most correlated user is calculated. The result indicates the user's degree of abnormality: for example, when a user's load pattern differs greatly from that of the initially most correlated user, the user is judged to be an abnormal electricity user. This is expressed by the following formula:
$$\mathrm{change\_rate} = \frac{r_{ij}^{D2} - r_{ij}^{D1}}{r_{ij}^{D1}}$$
in the formula:
change_rate represents the rate of increase or decrease of the correlation coefficient between the earlier and later time series,
$r_{ij}^{D2}$ indicates the correlation coefficient between user i and the most correlated user j in the second half of the time series,
$r_{ij}^{D1}$ indicates the correlation coefficient between user i and the most correlated user j in the first half of the time series.
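The time-series measure can be sketched as below. Two points are assumptions rather than statements of the source: the series is split at its midpoint into D1 and D2, and the change rate is taken as the relative change (r_D2 - r_D1) / r_D1 of the correlation with the user that was most correlated in D1. The identifiers and toy series are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_change_rate(user, peers):
    """Return (most correlated peer in D1, relative change of r from D1 to D2)."""
    half = len(user) // 2
    r_d1 = {pid: pearson(user[:half], series[:half]) for pid, series in peers.items()}
    j = max(r_d1, key=r_d1.get)          # initially most correlated user
    r1 = r_d1[j]
    r2 = pearson(user[half:], peers[j][half:])
    return j, (r2 - r1) / r1             # undefined if r1 == 0


user_i = [1, 2, 3, 4, 4, 3, 2, 1]
peers = {"a": [1, 2, 3, 4, 1, 2, 3, 4], "b": [4, 3, 2, 1, 1, 2, 3, 4]}
j, change_rate = correlation_change_rate(user_i, peers)
print(j, change_rate)
```

In this toy case user "a" tracks user i perfectly in D1 and moves against it in D2, so the correlation drops sharply and the change rate is strongly negative.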
Feature fusion is performed on the features obtained from the four dimensions in an early-fusion manner: the features are fused across layers first, and prediction is performed afterwards. The technical scheme of the invention makes full use of massive electricity data, carries out multi-party model training in advance while keeping the data secure, reduces the privacy risks and costs of the traditional approach, draws on the strengths of each party's platform to the greatest extent, and achieves abnormal electricity-usage identification.
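As a rough illustration of early fusion, the sketch below joins four per-user feature tables on the user identifier and concatenates them into one vector per user before any prediction is made. The table names, identifiers, and values are hypothetical, and the sketch shows only the data layout, not the privacy-preserving exchange that a vertical federation would add.

```python
def early_fusion(feature_tables):
    """Concatenate per-user features from several tables keyed by user id.

    Only users present in every table are kept, mirroring the aligned
    (overlapping) user set of a vertical federation.
    """
    names = sorted(feature_tables)
    common_users = set.intersection(*(set(feature_tables[n]) for n in names))
    fused = {
        user: [feature_tables[n][user] for n in names]
        for user in sorted(common_users)
    }
    return names, fused


tables = {
    "global_prob": {"u1": 0.91, "u2": 0.08, "u3": 0.15},
    "lof": {"u1": 3.2, "u2": 1.0, "u3": 1.1},
    "similarity_r": {"u1": -0.4, "u2": 0.97},
    "change_rate": {"u1": -1.8, "u2": 0.02, "u3": 0.1},
}
names, fused = early_fusion(tables)
print(names)
print(fused)
```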
Step 4: Construct a convolutional neural network model and, for the data processed in the preceding step, perform stratified sampling, describe the neural network structure, and configure the training method; the model is trained through these steps, as follows:
Stratified sampling: the training data is divided into several groups of train/test pairs; the number of groups can be set as needed and defaults to 10. The proportions are set to train_size = 0.8 and test_size = 0.2, together with a random seed random_state, which makes the model produce the same training result, and the same predictions on the test set, each time it is trained on the same training set; otherwise the accuracy fluctuates.
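A standard-library sketch of the sampling described here: each class is shuffled with a seeded generator and split 80/20, and ten such train/test pairs are produced. The function names are hypothetical; the parameter names train_size and random_state follow the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_size=0.8, random_state=0):
    """Split sample indices so each class keeps the train/test proportion."""
    rng = random.Random(random_state)      # fixed seed gives a reproducible split
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_size)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


def stratified_shuffle_splits(labels, n_splits=10, train_size=0.8, random_state=0):
    """Generate n_splits train/test pairs, each from a different derived seed."""
    return [
        stratified_split(labels, train_size, random_state + i)
        for i in range(n_splits)
    ]


labels = [0] * 80 + [1] * 20               # 1 marks abnormal users (toy data)
train, test = stratified_split(labels, train_size=0.8, random_state=42)
print(len(train), len(test), sum(labels[i] for i in test))
```

Because the generator is seeded, calling the function again with the same random_state returns the same split, which is the reproducibility property the paragraph attributes to the seed.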
The neural network structure obtained through the model() method is shown in fig. 6. The first column represents the state. The second column gives the output size after convolution, calculated as (input size − convolution kernel size + 2 × padding) / stride + 1. The third column gives the number of output parameters, calculated as (convolution kernel size)² × number of channels × number of filters + number of filters.
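The two formulas in this paragraph can be checked with a few lines of arithmetic. The floor division for non-integer results and the example layer sizes are assumptions; the source does not give the actual layer dimensions of fig. 6.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """(input - kernel + 2 * padding) / stride + 1, rounded down."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


def conv2d_param_count(kernel_size, in_channels, filters):
    """kernel^2 * channels * filters weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


# Hypothetical layer: 28x28 single-channel input, 32 filters of size 3x3.
print(conv_output_size(28, 3, padding=0, stride=1))
print(conv2d_param_count(3, in_channels=1, filters=32))
```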
Training method: the invention selects SGD as the training optimizer with a learning rate of 0.1; SGD is fast to compute and can escape saddle points and poor local optima. Binary cross-entropy is chosen as the loss function. With binary cross-entropy, the gradient of the last layer's weights no longer depends on the derivative of the activation function and is proportional only to the difference between the output value and the true value, so convergence is faster. Because backpropagation multiplies gradients layer by layer, the update of the whole weight matrix is accelerated as well. In addition, the derivative of the multi-class cross-entropy loss is simple: the loss depends only on the probability of the correct class, and its derivative with respect to the input of the activation layer is easy to compute.
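The claim that the last-layer gradient reduces to the difference between output and true value can be checked numerically. The sketch assumes a sigmoid output unit and a single scalar weight, which is a simplification of the network in the text; the learning rate 0.1 is the one stated above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(z, y):
    """Binary cross-entropy of a sigmoid output with pre-activation z."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_grad_wrt_logit(z, y):
    """Analytic gradient dL/dz = p - y; no activation derivative remains."""
    return sigmoid(z) - y


def numeric_grad(z, y, eps=1e-6):
    """Central finite difference, used only to check the analytic form."""
    return (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)


def sgd_step(w, x, y, lr=0.1):
    """One SGD update of a single weight for the model p = sigmoid(w * x)."""
    return w - lr * bce_grad_wrt_logit(w * x, y) * x


print(bce_grad_wrt_logit(0.3, 1.0), numeric_grad(0.3, 1.0))
print(sgd_step(0.5, 2.0, 1.0))
```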
More specifically, the raw data is input grouped by residential-cell number, preprocessed, and fed to the convolutional neural network as a three-dimensional numpy array; training then yields the model. Because seasonal changes alter the trend and distribution of the electricity data, several models can be trained on the data of the same cell and used selectively. The specific flow is shown in fig. 4.
Step 5: Operate with the model. The trained model is loaded and the raw data is input to judge whether the user's electricity usage is abnormal.
The step 5 specifically comprises the following steps:
Step 5.1: Run the model and input the number of the cell to be checked and the electricity data file; the system determines whether a model exists and, if so, performs data preprocessing and feature extraction in sequence.
Step 5.2: If no model exists, the system reports that the model does not exist and prompts the user to upload data for model training; step 5.1 is executed after training is finished.
Step 5.3: The trained model is read from the database, loaded into the system through deserialization, and run to obtain the result. The above steps are shown in fig. 5.
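The control flow of steps 5.1 to 5.3 can be sketched as follows. Everything concrete here is an assumption made for illustration: an in-memory dictionary stands in for the database, pickle stands in for the unspecified serialization format, and a plain dictionary with a threshold stands in for the trained network.

```python
import pickle


class ModelStore:
    """Hypothetical stand-in for the database of serialized per-cell models."""

    def __init__(self):
        self._blobs = {}

    def exists(self, cell_id):
        return cell_id in self._blobs

    def save(self, cell_id, model):
        self._blobs[cell_id] = pickle.dumps(model)       # serialize

    def load(self, cell_id):
        return pickle.loads(self._blobs[cell_id])        # deserialize


def identify(store, cell_id, raw_data, preprocess, extract_features, predict):
    # Steps 5.1 / 5.2: check whether a model exists for this cell.
    if not store.exists(cell_id):
        return {"status": "model_missing",
                "message": "no model for this cell; upload data for training"}
    features = extract_features(preprocess(raw_data))
    # Step 5.3: read the model back and run it.
    model = store.load(cell_id)
    return {"status": "ok", "abnormal": predict(model, features)}


store = ModelStore()
store.save("cell-001", {"threshold": 0.5})               # toy "trained model"
result = identify(
    store, "cell-001", [0.2, 0.9, 0.4],
    preprocess=lambda data: data,
    extract_features=lambda data: data,
    predict=lambda model, feats: [f > model["threshold"] for f in feats],
)
print(result)
```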
the data acquisition module is used for acquiring the electricity utilization data of low-voltage resident users;
the data preprocessing module is used for preprocessing the electricity utilization data of the low-voltage resident users, and comprises missing value processing, abnormal value processing and data normalization;
the longitudinal federation module is used for extracting low-voltage resident user abnormal electricity utilization characteristics with four dimensions of global abnormality, local abnormality, region space and time sequence and carrying out longitudinal federation;
the model generation module is used for constructing a convolutional neural network model and finishing the training of the convolutional neural network model by receiving the output of the longitudinal federal module;
and the low-voltage resident user abnormal electricity utilization identification module is used for loading the trained convolutional neural network model, receiving from the data acquisition module the electricity data of the low-voltage resident user to be judged, and judging whether abnormal electricity usage exists.
Compared with the prior art, the invention has the advantage that a model based on vertical federated learning and a convolutional neural network is adopted in order to protect user privacy and data security.
Vertical federated learning helps reduce the risks and costs of traditional machine learning models: federated learning works on distributed data and, following the principles of focused collection and data minimization, uses multiple clients to collaboratively train a model under the coordination of a central server.
Meanwhile, the convolutional neural network is a multilayer supervised-learning neural network and is essentially a multilayer perceptron; its convolutional and pooling layers are its core and extract feature vectors from the network input.
The method extracts four user feature dimensions (global anomaly, local anomaly, regional space, and time series) for vertical federation, and enables several data owners to train jointly while guaranteeing information security during big-data exchange, protecting the privacy of terminal data and member data, and ensuring legal compliance.
Traditional anti-electricity-theft methods rely heavily on user reports and periodic patrols. By comparison, the low-voltage resident abnormal electricity-usage identification model based on vertical federated learning works more proactively with less workload, and is much more targeted and timely. It greatly reduces the power company's labor and material costs and the losses caused by electricity theft, so the invention is of practical significance.
Abnormal electricity usage is identified by combining vertical federated learning with the multi-dimensional composite features of low-voltage residential users. Because a single data model cannot accurately separate normal users from abnormal ones, extracting multi-dimensional composite features with different models and classifying from several aspects further improves accuracy. The model provided by the invention makes full use of massive electricity data and effectively extracts the event information behind abnormal data.
By adopting the low-voltage resident abnormal electricity utilization identification method and system based on longitudinal federal learning, the abnormal electricity utilization behavior of the user is identified, and the following effects can be realized:
1) the workload required by anti-electricity-theft work is reduced, its accuracy and timeliness are enhanced, users suspected of abnormal electricity usage are located quickly, and case-handling speed is improved;
2) abnormal information is fully mined by using a federal learning technology, the positioning success rate is improved, and unnecessary inspection is reduced;
3) the accuracy of electricity stealing prevention work is improved through the convolutional neural network training model, deterrence is provided for users with electricity stealing ideas, and the occurrence of electricity stealing behaviors is reduced.
In particular, compared to prior art document 2, the VFL employed in the present invention has several major features:
(1) all data are kept locally, so that privacy is not disclosed and regulations are not violated;
(2) a plurality of participants jointly establish a virtual common model by data, so that respective use purposes are realized to benefit together;
(3) the modeling effect of federated learning is similar to that of traditional deep learning;
(4) the federation is a data federation, and different federations have different operating frameworks and serve different operating purposes. For example, different federations can be formed within the power industry to identify customers' various business needs through modeling. Federations with different purposes can also be formed across industries, for example between the power industry and the financial industry, where modeling can support business-expansion approval for power-industry customers and provide banks with supporting evidence for enterprise credit evaluation.
In conclusion, a quick and accurate abnormal electricity-usage identification system is essential for a power company. Compared with traditional on-site inspection, the method is timely and accurate and effectively reduces the company's operating costs: it saves a large amount of manpower and material resources and reduces the losses caused by electricity theft. This indicates that the abnormal electricity-usage identification system brings substantial economic benefits.
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.
Claims (10)
1. A low-voltage resident user abnormal electricity utilization identification method based on VFL is characterized by comprising the following steps:
step 1, collecting historical electricity utilization data of a low-voltage resident user with set duration, importing the historical electricity utilization data into a database, and preprocessing the electricity utilization data;
step 2, extracting characteristic data capable of representing the electricity consumption mode of the low-voltage resident users according to the electricity consumption data obtained through the preprocessing in the step 1,
step 3, extracting abnormal electricity utilization characteristics of low-voltage residential users with four dimensions of global abnormity, local abnormity, region space and time sequence by using the characteristic data obtained in the step 2, and carrying out longitudinal federation;
step 4, constructing a convolutional neural network model, and performing hierarchical sampling, neural network structure description and training method configuration on the data processed in the step to complete the training of the model;
and step 5, operating with the model: loading the trained model and inputting the electricity data to be judged, so as to judge whether the user's electricity usage is abnormal.
2. A VFL-based low voltage residential customer abnormal electricity usage recognition method as claimed in claim 1, wherein:
the pretreatment comprises the following steps: missing value processing, outlier processing, and data normalization for subsequent use.
3. A VFL based low voltage resident user abnormal electricity consumption recognizing method in accordance with claim 1 or 2, wherein:
the step 1 specifically comprises the following steps:
step 1.1, collecting historical electricity utilization data of a set duration of a low-voltage resident user;
step 1.2, processing missing values of data by adopting a Lagrange interpolation method;
step 1.3, processing abnormal data values, and correcting the abnormal values by adopting the average value of the front observation value and the rear observation value;
and step 1.4, carrying out standardization treatment by adopting a standard deviation standardization method.
4. A VFL-based low voltage residential customer abnormal electricity usage recognition method as claimed in claim 3, wherein:
step 2, extracting, from the electricity data obtained through the preprocessing in step 1, daily feature data and monthly feature data that characterize the electricity-usage pattern of the low-voltage resident user, wherein the daily feature data is the user's daily electricity consumption, and the monthly feature data is the user's monthly daily-average electricity consumption, that is, the user's total consumption in each month divided by the number of days in that month.
5. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps:
extracting global abnormal features by using the daily feature data obtained in step 2, and performing binary classification to obtain, for each user, an abnormal-electricity-usage label and a probability value of the user's degree of abnormality, expressed by the following formula:
$$\Pr(y = 1 \mid x) \approx P_{A,B}[f(x)] = \frac{1}{1 + e^{A f(x) + B}}$$
In the formula:
x represents a user electricity data value,
y represents whether electricity usage is abnormal, where 1 represents abnormal usage,
Pr(y = 1 | x) represents the conditional probability that the user's electricity usage is abnormal,
P_{A,B}[f(x)] represents a sigmoid function with parameters A and B,
f(x) represents the data model of the user's electricity usage,
e denotes the natural constant.
6. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps: extracting local abnormal features by using the daily feature data obtained in step 2, wherein local outlier factor features are extracted by the LOF algorithm and, when the LOF value of the electricity data of a low-voltage resident user is higher than a set value, the user is judged to be an abnormal electricity user.
7. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps: extracting regional space abnormal features by using the monthly feature data obtained in step 2, and judging a user to be an abnormal electricity user when the user's electricity data pattern is inconsistent with that of most users in the same region.
8. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps: extracting time series abnormal features by using the monthly feature data obtained in step 2, and judging a user to be an abnormal electricity user when the user's electricity load pattern differs greatly from that of the initially most correlated user.
9. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with any one of claims 5 to 8, wherein:
the step 5 specifically comprises the following steps:
step 5.1, running the model and inputting the number of the cell to be checked and the electricity data file, the system determining whether a model exists and, if so, performing data preprocessing and feature extraction in sequence;
step 5.2, if no model exists, reporting that the model does not exist and prompting the user to upload data for model training, step 5.1 being executed after training is finished;
step 5.3, reading the trained model from the database, loading it into the system through deserialization, and running the model to obtain the result.
10. A VFL-based low voltage resident abnormal electricity usage recognition system using the VFL-based low voltage resident abnormal electricity usage recognition method according to any one of claims 1 to 9, comprising: a data acquisition module, a data preprocessing module and a longitudinal federal module,
the method is characterized in that:
the data acquisition module is used for acquiring the electricity utilization data of low-voltage resident users;
the data preprocessing module is used for preprocessing the electricity utilization data of the low-voltage resident users, and comprises missing value processing, abnormal value processing and data normalization;
the longitudinal federation module is used for extracting low-voltage resident user abnormal electricity utilization characteristics with four dimensions of global abnormality, local abnormality, region space and time sequence and carrying out longitudinal federation;
the model generation module is used for constructing a convolutional neural network model and finishing the training of the convolutional neural network model by receiving the output of the longitudinal federal module;
and the low-voltage resident user abnormal electricity utilization identification module is used for loading the trained convolutional neural network model, receiving the electricity utilization data obtained by the data acquisition module and used for judging the low-voltage resident user, and judging whether abnormal electricity utilization exists.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111256656.6A CN114154617A (en) | 2021-10-27 | 2021-10-27 | Low-voltage resident user abnormal electricity utilization identification method and system based on VFL |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114154617A true CN114154617A (en) | 2022-03-08 |
Family
ID=80458437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111256656.6A Pending CN114154617A (en) | 2021-10-27 | 2021-10-27 | Low-voltage resident user abnormal electricity utilization identification method and system based on VFL |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114154617A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114331761A (en) * | 2022-03-15 | 2022-04-12 | 浙江万胜智能科技股份有限公司 | Equipment parameter analysis and adjustment method and system for special transformer acquisition terminal |
CN114331761B (en) * | 2022-03-15 | 2022-07-08 | 浙江万胜智能科技股份有限公司 | Equipment parameter analysis and adjustment method and system for special transformer acquisition terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097297B (en) | Multi-dimensional electricity stealing situation intelligent sensing method, system, equipment and medium | |
Buzau et al. | Hybrid deep neural networks for detection of non-technical losses in electricity smart meters | |
CN109255506B (en) | Internet financial user loan overdue prediction method based on big data | |
CN110223168B (en) | Label propagation anti-fraud detection method and system based on enterprise relationship map | |
CN111882446B (en) | Abnormal account detection method based on graph convolution network | |
CN110852856B (en) | Invoice false invoice identification method based on dynamic network representation | |
Alzate et al. | Improved electricity load forecasting via kernel spectral clustering of smart meters | |
CN106570778A (en) | Big data-based data integration and line loss analysis and calculation method | |
CN112132233A (en) | Criminal personnel dangerous behavior prediction method and system based on effective influence factors | |
CN111681022A (en) | Network platform data resource value evaluation method | |
CN115545886A (en) | Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium | |
CN110009427B (en) | Intelligent electric power sale amount prediction method based on deep circulation neural network | |
CN114154617A (en) | Low-voltage resident user abnormal electricity utilization identification method and system based on VFL | |
CN116401601B (en) | Power failure sensitive user handling method based on logistic regression model | |
CN115905319B (en) | Automatic identification method and system for abnormal electricity fees of massive users | |
CN116451125A (en) | New energy vehicle owner identification method, device, equipment and storage medium | |
Aquize et al. | Self-organizing maps for anomaly detection in fuel consumption. Case study: Illegal fuel storage in Bolivia | |
CN116821759A (en) | Identification prediction method and device for category labels, processor and electronic equipment | |
CN114723554B (en) | Abnormal account identification method and device | |
CN114372835B (en) | Comprehensive energy service potential customer identification method, system and computer equipment | |
CN112256735B (en) | Power consumption monitoring method and device, computer equipment and storage medium | |
CN114626940A (en) | Data analysis method and device and electronic equipment | |
CN113435494A (en) | Low-voltage resident user abnormal electricity utilization identification method and simulation system | |
CN113379211A (en) | Block chain-based logistics information platform default risk management and control system and method | |
Peiyi et al. | Analysis and research on enterprise resumption of work and production based on K-means clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||