CN114154617A - Low-voltage resident user abnormal electricity utilization identification method and system based on VFL - Google Patents

Publication number
CN114154617A
Authority
CN
China
Prior art keywords
data
abnormal
user
electricity
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111256656.6A
Other languages
Chinese (zh)
Inventor
何维民
赵磊
邓君华
陈奕彤
许高俊
孙莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center filed Critical State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority to CN202111256656.6A priority Critical patent/CN114154617A/en
Publication of CN114154617A publication Critical patent/CN114154617A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A VFL-based method and system for identifying abnormal electricity use by low-voltage residential users, the method comprising the following steps: step 1, collecting historical electricity consumption data of low-voltage residential users over a set duration, importing the data into a database, and preprocessing the data; step 2, extracting, from the preprocessed data of step 1, feature data that characterize the electricity consumption pattern of the low-voltage residential users; step 3, using the feature data of step 2 to extract abnormal electricity use features in four dimensions (global anomaly, local anomaly, regional space, and time series) and performing vertical federation; step 4, constructing a convolutional neural network model and completing its training by applying stratified sampling, a neural network structure description, and a training method configuration to the data processed in step 3; and step 5, putting the model to work by loading the trained model and inputting the electricity consumption data to be judged, thereby determining whether the user's electricity use is abnormal.

Description

Low-voltage resident user abnormal electricity utilization identification method and system based on VFL
Technical Field
The invention belongs to the technical field of power distribution, and particularly relates to a low-voltage residential user abnormal power utilization identification method and system based on VFL.
Background
With the continuing informatization of power systems and the rapid growth of power distribution and consumption data, research into algorithms suited to mining these data, and the construction of effective knowledge discovery models, is of great significance for innovation in distribution and consumption business models and for smart grid development. Building data mining models on existing power big data is the current trend in smart grids.
However, although the rapid development of the smart grid has produced massive volumes of consumption data, and with them a solid basis for big data analysis of the consumption side, accumulating data without using it remains a major problem for power enterprises. Faced with this growth, most power departments still perform anomaly analysis only with traditional statistical methods and cannot effectively extract the event information hidden behind anomalous data.
In recent years, methods for abnormal electricity use identification and electricity fraud detection based on data mining theory have been proposed one after another. Existing non-technical loss detection methods fall mainly into two groups: clustering and classification. Clustering is an unsupervised learning method that learns the structure of the given data directly, without a training data set carrying class labels. Classification is a supervised learning method that requires class labels: a training data set, usually labeled as abnormal or normal electricity use, is used to train a model, and the trained model is then used to identify whether a given user is abnormal. The present invention belongs to the latter group.
Prior art document 1 (CN110321934A) is a clustering-based scheme for electricity use identification. It provides a method for detecting abnormal user electricity consumption data in which the K-means algorithm performs the clustering calculation, and the data sets of all cluster centers whose number of noise points exceeds a preset limit are output as abnormal electricity consumption data sets. Prior art document 1 has two disadvantages. First, K-means clustering requires the value of K, that is, the number of classes into which the data set is divided, to be set in advance; the accuracy of the clustering depends on this value, and uncertainty about the distribution of the sample points makes the number of classes hard to choose. Second, the choice of initial cluster centers in the iterative process affects the stability of the final result: different initial centers yield different final clusterings, and if the initial centers lie far from the region of the global optimum, the algorithm may converge to a local optimum.
Prior art document 2 (CN112101420A) is a classification-based scheme for electricity use identification. It provides a method for identifying abnormal electricity users based on a Stacking ensemble algorithm over dissimilar models, in which electricity consumption feature indices are established along three dimensions (the recording status of a single user's load data in the electricity consumption information acquisition system, time-series segment statistics, and the similarity of users' electricity consumption), and a user electricity consumption feature set is extracted so that deep features of the data are mined more effectively. The disadvantage of prior art document 2 is that it does not take user data privacy into account: model training can be performed only locally, so multiple participants cannot cooperate to improve the machine learning effect.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a method and system for identifying abnormal electricity use by low-voltage residential users based on VFL (Vertical Federated Learning), offering a supervised learning method for abnormal electricity use identification based on the load information of power users. It can effectively reduce the operating costs of power companies: on the one hand it saves considerable manpower and material resources, and on the other hand it reduces losses caused by electricity theft. The model uses algorithm modules including a support vector machine, local outlier factor analysis, a similarity measure among users in the same station area, and a measure of the rate of change of correlation with the most similar user. Four-dimensional composite features extracted by these algorithms are combined by vertical federation, describing the degree of user abnormality from four angles: global anomaly, local anomaly, regional space, and time series.
The invention adopts the following technical scheme. The invention provides a VFL-based method for identifying abnormal electricity consumption of low-voltage residential users, which comprises the following steps of:
step 1, collecting historical electricity utilization data of a low-voltage resident user with set duration, importing the historical electricity utilization data into a database, and preprocessing the electricity utilization data;
step 2, extracting, from the electricity consumption data obtained through the preprocessing in step 1, feature data that characterize the electricity consumption pattern of the low-voltage residential users;
step 3, using the feature data obtained in step 2 to extract abnormal electricity use features of the low-voltage residential users in four dimensions (global anomaly, local anomaly, regional space, and time series), and performing vertical federation;
step 4, constructing a convolutional neural network model, and completing the training of the model by applying stratified sampling, a neural network structure description, and a training method configuration to the data processed in step 3;
step 5, putting the model to work by loading the trained model and inputting the electricity consumption data to be judged, thereby determining whether the user's electricity use is abnormal.
Preferably, the preprocessing comprises: missing value processing, outlier processing, and data normalization for subsequent use.
Preferably, step 1 specifically comprises:
step 1.1, collecting historical electricity consumption data of the low-voltage residential users over a set duration;
step 1.2, handling missing values by Lagrange interpolation;
step 1.3, handling outliers by replacing each abnormal value with the average of the preceding and following observations;
step 1.4, standardizing the data by the standard deviation (z-score) method.
Preferably, in step 2, with the electricity consumption data obtained through the preprocessing in step 1, day feature data and month feature data capable of representing the electricity consumption pattern of the low-voltage residential users are extracted, wherein the day feature data refers to daily electricity consumption of the low-voltage residential users, and the month feature data refers to daily average electricity consumption of the low-voltage residential users per month, namely, the total electricity consumption of the low-voltage residential users per month is divided by the number of days of the month.
Preferably, step 3 specifically comprises:
extracting global anomaly features from the daily feature data obtained in step 2, and performing binary classification to obtain, for each user, an abnormal electricity use label and a probability value for the degree of abnormality, expressed by the following formula:
$$\Pr(y = 1 \mid x) \approx P_{A,B}[f(x)] = \frac{1}{1 + e^{A f(x) + B}}$$
In the formula:
x represents the user's electricity consumption data value,
y indicates whether the electricity use is abnormal, with y = 1 denoting abnormal use,
Pr(y = 1 | x) represents the conditional probability that the user's electricity use is abnormal,
P_{A,B}[f(x)] represents a sigmoid function with parameters A and B,
f(x) represents the data model of the user's electricity consumption,
e denotes the natural constant.
Preferably, step 3 specifically comprises: extracting local anomaly features from the daily feature data obtained in step 2, the local outlier factor features being extracted by the LOF algorithm; when the LOF value of the electricity consumption data of a low-voltage residential user exceeds a set value, that user is judged to be an abnormal electricity user.
Preferably, step 3 specifically comprises: extracting regional spatial anomaly features from the monthly feature data obtained in step 2; when a user's electricity consumption pattern is inconsistent with that of most users in the same area, the user is judged to be an abnormal electricity user.
Preferably, step 3 specifically comprises: extracting time-series anomaly features from the monthly feature data obtained in step 2; when a user's load pattern differs greatly from that of the user with whom it was initially most correlated, the user is judged to be an abnormal electricity user.
Preferably, step 5 specifically includes:
step 5.1, running the model: the number of the cell to be verified and the electricity consumption data file are input, and the system checks whether a model exists; if so, data preprocessing and feature extraction are executed in sequence;
step 5.2, if no model exists, the system reports this and prompts the user to upload data for model training, and step 5.1 is executed after training finishes;
step 5.3, the trained model is read from the database, loaded into the system by deserialization, and run to obtain the result.
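The control flow of steps 5.1 to 5.3 can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: an in-memory dictionary stands in for the database, JSON stands in for the unspecified serialization format, a simple threshold stands in for the trained network, and the names `model_store`, `save_model`, `load_model`, and `judge` are hypothetical.

```python
import json

# Stand-in for the database table that holds serialized models (assumption).
model_store = {}


def save_model(cell_id, params):
    """Serialize the trained model parameters and store them under the cell number."""
    model_store[cell_id] = json.dumps(params)


def load_model(cell_id):
    """Return the deserialized parameters, or None if no model exists (step 5.2 case)."""
    blob = model_store.get(cell_id)
    return None if blob is None else json.loads(blob)


def judge(cell_id, values):
    """Steps 5.1 to 5.3: check for a model, load it, and run it on the input data."""
    params = load_model(cell_id)
    if params is None:
        # Step 5.2: no model yet, so training data must be uploaded first.
        raise LookupError(f"no trained model for cell {cell_id}; upload training data")
    # Step 5.3: a threshold rule stands in for running the trained network.
    return [v > params["threshold"] for v in values]


save_model("cell-001", {"threshold": 2.0})
print(judge("cell-001", [0.5, 3.1]))
```

Raising an error when the model is missing keeps the "train first, then rerun step 5.1" branch explicit for the caller.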
A second aspect of the invention provides a VFL-based system for identifying abnormal electricity use by low-voltage residential users, which uses the VFL-based identification method as set forth in any one of claims 1 to 9 and comprises: a data acquisition module, a data preprocessing module, a vertical federation module, a model generation module, and a low-voltage residential user abnormal electricity use identification module,
characterized in that:
the data acquisition module is used for acquiring the electricity consumption data of low-voltage residential users;
the data preprocessing module is used for preprocessing the electricity consumption data of the low-voltage residential users, including missing value processing, outlier processing, and data normalization;
the vertical federation module is used for extracting abnormal electricity use features of the low-voltage residential users in four dimensions (global anomaly, local anomaly, regional space, and time series) and performing vertical federation;
the model generation module is used for constructing a convolutional neural network model and completing its training on the output of the vertical federation module;
the low-voltage residential user abnormal electricity use identification module is used for loading the trained convolutional neural network model, receiving from the data acquisition module the electricity consumption data of the low-voltage residential user to be judged, and determining whether abnormal electricity use exists.
Compared with the prior art, the invention has the following advantages. To protect user privacy and data security, a model based on vertical federated learning and a convolutional neural network is adopted.
Vertical federated learning helps to reduce the risks and costs associated with traditional machine learning models. Federated learning works on distributed data: following the principles of focused collection and data minimization, multiple clients train a model cooperatively under the coordination of a central server.
The convolutional neural network is a multilayer supervised learning neural network, in essence a multilayer perceptron, whose core components are the convolutional and pooling layers, which extract feature vectors from the network input.
The method extracts four user feature dimensions (global anomaly, local anomaly, regional space, and time series) for vertical federation, and enables joint training by multiple data owners while ensuring information security during big data exchange, protecting the privacy of terminal data and member data, and ensuring legal compliance.
Traditional anti-theft methods depend heavily on user reports and periodic patrols. By comparison, the abnormal electricity use identification model based on vertical federated learning works proactively and with less workload, and it is considerably better targeted and more timely. It greatly reduces the manpower and material costs of the power company and the losses caused by electricity theft, and is therefore of practical significance.
Abnormal electricity use is identified by combining vertical federated learning with multi-dimensional composite features of low-voltage residential users. Because a single data model cannot accurately separate normal users from abnormal users, extracting multi-dimensional composite features with different models and classifying from several aspects can further improve accuracy. The model provided by the invention can make full use of massive electricity consumption data and effectively extract the event information behind anomalous data.
By adopting the low-voltage resident abnormal electricity utilization identification method and system based on longitudinal federal learning, the abnormal electricity utilization behavior of the user is identified, and the following effects can be realized:
1) the workload of anti-theft work is reduced, its accuracy and timeliness are improved, suspected abnormal electricity users are located quickly, and case processing is sped up;
2) the model trained with the convolutional neural network improves the accuracy of anti-theft work, raises the success rate of locating offenders, and reduces unnecessary inspections; it also deters users inclined to steal electricity and reduces the occurrence of theft;
3) federated learning fully protects user privacy, and cooperation among multiple participants improves both model training and abnormal electricity use identification.
In conclusion, a fast and accurate abnormal electricity use identification system is essential for a power company. Compared with traditional field inspection, the method is more timely and accurate and can effectively reduce the operating costs of the power company: on the one hand it saves considerable manpower and material resources, and on the other hand it reduces the losses caused by electricity theft. This indicates that the abnormal electricity use identification system offers substantial economic benefits.
Drawings
FIG. 1 is a flow chart of the method for identifying abnormal electricity use by low-voltage residential users based on vertical federated learning provided by the invention;
FIG. 2 is a flow chart of data preprocessing in the abnormal electricity consumption identification method provided by the present invention;
FIG. 3 is a flow chart of composite feature extraction in the abnormal electricity consumption identification method provided by the invention;
FIG. 4 is a flowchart of model training in the abnormal electricity consumption recognition method according to the present invention;
FIG. 5 is a flowchart illustrating the operation of a model in the abnormal electricity consumption recognition method according to the present invention;
FIG. 6 is a structural diagram of the convolutional neural network used in the abnormal electricity consumption identification method provided by the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
As shown in fig. 1, embodiment 1 of the present invention provides a VFL-based method for identifying abnormal electricity consumption of low-voltage residential users, comprising the steps of:
step 1, collecting historical electricity consumption data of low-voltage residential users over a set duration, importing the data into a database, and preprocessing the data, where the preprocessing comprises missing value processing, outlier processing, and data normalization for subsequent use, as shown in FIG. 2.
The step 1 specifically comprises the following steps:
Step 1.1, collecting historical electricity consumption data of the low-voltage residential users over a set duration. In a preferred but non-limiting implementation, the historical consumption data of the past 2 years are collected, the source data in Excel format are imported into a MySQL database, and each user's daily electricity consumption is obtained from the daily frozen readings of the electric energy meter.
Step 1.2, handling missing data values. The massive raw power data contain some incomplete and abnormal records, which seriously affect model training efficiency. In a further preferred embodiment of the invention, missing values are filled by Lagrange interpolation.
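As a minimal sketch of filling gaps by Lagrange interpolation, the following example fits a polynomial through the nearest observed days and evaluates it at the missing day. The window of two known neighbours on each side and the names `lagrange_interpolate` and `fill_missing` are assumptions made for illustration; the patent does not specify them.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_missing(series, window=2):
    """Fill None entries using up to `window` observed neighbours on each side."""
    filled = list(series)
    for idx, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed indices before and after the gap.
        before = [k for k in range(idx - 1, -1, -1) if series[k] is not None][:window]
        after = [k for k in range(idx + 1, len(series)) if series[k] is not None][:window]
        known = sorted(before + after)
        if not known:
            continue  # nothing to interpolate from; leave the gap as it is
        filled[idx] = lagrange_interpolate(known, [series[k] for k in known], idx)
    return filled


daily_kwh = [1.0, 2.0, None, 4.0, 5.0]
print(fill_missing(daily_kwh))
```

Keeping the window small limits the polynomial degree, which avoids the oscillation that high-degree Lagrange polynomials can show on long series.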
Step 1.3, handling abnormal data values: abnormal values in the daily electricity consumption data of low-voltage residential users are corrected using the average of the preceding and following observations.
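The correction rule can be sketched as below. How abnormal values are detected is not stated in the text, so the example simply assumes the caller supplies the flagged positions; the function name `correct_outliers` and the handling of series edges (using the single available neighbour) are likewise assumptions.

```python
def correct_outliers(series, flagged):
    """Replace each flagged value with the mean of its nearest unflagged neighbours."""
    flagged = set(flagged)
    corrected = list(series)
    for idx in sorted(flagged):
        # Nearest trustworthy observation before and after the flagged position.
        prev_idx = next((k for k in range(idx - 1, -1, -1) if k not in flagged), None)
        next_idx = next((k for k in range(idx + 1, len(series)) if k not in flagged), None)
        neighbours = [series[k] for k in (prev_idx, next_idx) if k is not None]
        if neighbours:
            corrected[idx] = sum(neighbours) / len(neighbours)
    return corrected


daily_kwh = [3.0, 4.0, 60.0, 6.0, 5.0]
# Index 2 is replaced by the mean of its neighbours 4.0 and 6.0.
print(correct_outliers(daily_kwh, flagged=[2]))
```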
In another preferred but non-limiting embodiment, electricity consumption data that cannot be used to train the model are deleted, in two ways. Deletion by user: for a low-voltage residential user whose missing electricity consumption data reach a set proportion, or whose historical data are all zero, all of that user's historical electricity consumption data are deleted, that is, the user is removed. Deletion by date: when the missing electricity consumption data for a single day reach a set proportion, all historical electricity consumption data for that day are deleted, that is, the day is removed.
Step 1.4, standardizing the daily electricity consumption data of the low-voltage residential users. In a preferred but non-limiting implementation, the standard deviation (z-score) standardization method is used, so that the processed data have a mean of 0 and a standard deviation of 1.
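A short sketch of standard deviation standardization follows. Using the population standard deviation and mapping a constant series to zeros are assumptions; the text does not say which variant is intended.

```python
import statistics


def standardize(values):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        return [0.0 for _ in values]  # constant series: no spread to scale by
    return [(v - mean) / std for v in values]


z = standardize([2.0, 4.0, 6.0, 8.0])
print(z)
```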
Step 2, using the electricity consumption data obtained through the preprocessing in step 1, extracting feature quantities that characterize the electricity consumption pattern of the low-voltage residential users, comprising daily feature data and monthly feature data. The daily feature data are each user's daily electricity consumption; the monthly feature data are each user's daily average consumption for the month, that is, the user's total consumption in the month divided by the number of days in that month.
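The monthly feature described in step 2 can be computed as in the following sketch. The input layout (a dictionary from calendar date to daily consumption) and the function name `monthly_daily_average` are assumptions for illustration.

```python
import calendar
from collections import defaultdict
from datetime import date


def monthly_daily_average(daily):
    """Map {date: kWh} to {(year, month): total kWh / number of days in that month}."""
    totals = defaultdict(float)
    for day, kwh in daily.items():
        totals[(day.year, day.month)] += kwh
    return {
        # calendar.monthrange returns (weekday of the first day, days in month).
        (year, month): total / calendar.monthrange(year, month)[1]
        for (year, month), total in totals.items()
    }


daily = {date(2021, 2, d): 2.0 for d in range(1, 29)}
daily.update({date(2021, 1, d): 3.0 for d in range(1, 32)})
print(monthly_daily_average(daily))
```

Dividing by the calendar length of the month, rather than by the number of recorded days, follows the wording of step 2; the two differ only when days have been deleted during preprocessing.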
Step 3, using the daily feature data and the monthly feature data obtained in step 2, extracting abnormal electricity use features of the low-voltage residential users in four dimensions (global anomaly, local anomaly, regional space, and time series), and performing vertical federation.
In embodiments of the invention, federated learning enables many clients, such as, but not limited to, mobile devices or entire organizations, to train a model collaboratively under the coordination of a central server, such as, but not limited to, a service provider, while the training data remain decentralized. This brings a substantial improvement in security and privacy. While meeting the privacy requirement, federated learning also gives a slightly better modeling effect than the traditional method.
Federated learning is subdivided into horizontal federated learning, vertical federated learning, and federated transfer learning. Horizontal federated learning suits the case where the two data sets overlap heavily in user features but little in users; vertical federated learning suits the case where the two data sets overlap heavily in users but little in user features; and federated transfer learning suits the case where the two data sets overlap little in both users and user features. Electricity consumption data are distributed such that the two data sets overlap heavily in users and little in user features, so vertical federated learning is selected.
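The selection rule in the preceding paragraph can be written as a small decision function. This is only a sketch: the overlap measure (share of the smaller set found in the other set), the 0.5 threshold, and the example parties and feature names are all hypothetical.

```python
def overlap_ratio(a, b):
    """Share of the smaller set that also appears in the other set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def choose_scheme(users_a, feats_a, users_b, feats_b, threshold=0.5):
    """Pick a federated learning variant from user and feature overlap."""
    user_overlap = overlap_ratio(users_a, users_b) >= threshold
    feat_overlap = overlap_ratio(feats_a, feats_b) >= threshold
    if feat_overlap and not user_overlap:
        return "horizontal"
    if user_overlap and not feat_overlap:
        return "vertical"
    if not user_overlap and not feat_overlap:
        return "transfer"
    return None  # both overlaps large: not covered by the three cases in the text


# Two hypothetical parties holding mostly the same users but different features.
scheme = choose_scheme(
    users_a={"u1", "u2", "u3", "u4"}, feats_a={"daily_kwh", "monthly_avg"},
    users_b={"u2", "u3", "u4", "u5"}, feats_b={"area_id", "meter_type"},
)
print(scheme)
```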
And simultaneously extracting multi-dimensional composite features, and performing feature fusion from four features of global abnormality, local abnormality, region space and time sequence.
The step 3 specifically comprises the following steps:
Global anomaly features are extracted from the daily feature data obtained in step 2, and the data set is classified into two classes with the C-SVC model of the LIBSVM library to obtain, for each user, an abnormal electricity use label and a probability value for the degree of abnormality, expressed by the following formula:
$$\Pr(y = 1 \mid x) \approx P_{A,B}[f(x)] = \frac{1}{1 + e^{A f(x) + B}}$$
In the formula:
x represents the user's electricity consumption data value,
y indicates whether the electricity use is abnormal, with y = 1 denoting abnormal use,
Pr(y = 1 | x) represents the conditional probability that the user's electricity use is abnormal,
P_{A,B}[f(x)] represents a sigmoid function with parameters A and B,
f(x) represents the data model of the user's electricity consumption,
e denotes the natural constant.
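To make the probability output concrete, the sketch below evaluates a sigmoid of the Platt-scaling form 1 / (1 + e^(A f(x) + B)), which is the form LIBSVM uses for its probability estimates and is assumed here to be the one intended by the formula. The parameter values A = -2.0 and B = 0.0 are hypothetical; in practice they are fitted on held-out decision values.

```python
import math


def platt_probability(decision_value, A, B):
    """Map an SVM decision value f(x) to Pr(y = 1 | x) = 1 / (1 + exp(A*f(x) + B))."""
    z = A * decision_value + B
    # Branch on the sign of z so that exp() never receives a large positive argument.
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical fitted parameters; A is negative so larger f(x) means higher probability.
A, B = -2.0, 0.0
for f_x in (-1.0, 0.0, 1.0):
    print(f_x, platt_probability(f_x, A, B))
```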
For massive user data, global anomalies can generally be detected with either an LR method or an SVM. Compared with the LR method, the SVM uses hinge loss, generalizes well, and is sensitive to abnormal values, so it can give a better result. Missing data are compensated by Lagrange interpolation.
Local anomaly features are extracted from the daily feature data obtained in step 2 to strengthen and refine the extracted global anomaly features. Owing to the limitations of the SVM, the extracted abnormal electricity use probability feature may cause some users to be misclassified as normal when they are in fact abnormal. In other words, SVM detection has the drawback that some data show no abnormality when viewed globally yet are abnormal when viewed locally. The abnormal electricity use information therefore needs further analysis.
In a preferred but non-limiting embodiment, the LOF (local outlier factor) method is used at this stage to identify abnormal users: local outlier factor features are extracted by the LOF algorithm, and when the LOF value of a low-voltage residential user's electricity consumption data exceeds a set value, the user is judged to be an abnormal electricity user. Using the LOF algorithm for local anomaly detection improves the accuracy of that detection.
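A compact, pure-Python sketch of the LOF score is given below to show what "LOF value higher than a set value" means in practice. It follows the standard definition (k-distance, reachability distance, local reachability density) but takes exactly k neighbours and does not handle tied distances or duplicate points specially. The neighbourhood size k = 2, the one-dimensional sample data, and the threshold of 2.0 are hypothetical.

```python
import math


def local_outlier_factors(points, k=3):
    """Return the LOF score of every point; each point is a tuple of floats."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        return 1.0 / mean_reach if mean_reach > 0 else math.inf

    density = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(density[j] for j in neighbours[i]) / (len(neighbours[i]) * density[i])
        for i in range(n)
    ]


points = [(1.0,), (1.1,), (0.9,), (1.05,), (5.0,)]
scores = local_outlier_factors(points, k=2)
threshold = 2.0  # hypothetical "set value"
print([score > threshold for score in scores])
```

Scores close to 1 mean a point is about as dense as its neighbours; scores well above 1 mark points that sit in a much sparser region than their neighbours do.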
Regional spatial abnormal features are extracted from the monthly feature data obtained in step 2: the correlation features of the electricity data of similar users are extracted, and a user is judged to be an abnormal electricity user when the user's load differs greatly from that of most users in the same region.
More specifically, from the geographical perspective, a correlation measure over the electricity data of similar users is calculated to reflect the user's degree of abnormality; when a user's load differs greatly from the surrounding load, abnormal electricity use may be present. The correlation coefficient principle is therefore used to calculate the correlation features of similar users' electricity data. Since electricity data rarely satisfy the normality condition, the Pearson correlation coefficient is used in this technique. When a user's electricity data pattern is inconsistent with that of most users in the same region, the user is judged to be an abnormal electricity user, as expressed by the following formula:
$$\mathrm{similarity\_r} = \frac{\operatorname{cov}(x, x_{mean})}{\sigma_x \, \sigma_{mean}}$$
in the formula:
similarity_r represents the similarity between the electricity data of the user under test and the average electricity data of users in the same region,
cov(x, x_mean) represents the covariance of the electricity data of the user under test and the average electricity data of users in the same region,
x represents the electricity data of the user under test,
x_mean represents the average electricity data of users in the same region,
σ_x represents the standard deviation of the electricity data of the user under test,
σ_mean represents the standard deviation of the average electricity data of users in the same region.
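The regional similarity defined by the symbols above (covariance divided by the product of the two standard deviations) can be computed as in the following sketch. The function names, the handling of a flat series and the sample monthly values are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation: cov(x, y) / (sigma_x * sigma_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        return 0.0  # assumption: treat a flat series as uncorrelated
    return cov / (sigma_x * sigma_y)

def similarity_r(user_series, region_series):
    """Correlate one user's series with the per-month average of the region."""
    x_mean = [sum(month) / len(month) for month in zip(*region_series)]
    return pearson(user_series, x_mean)

# Hypothetical monthly daily-average consumption for three users in one region.
region = [
    [100, 120, 150, 130],
    [90, 110, 160, 120],
    [110, 130, 140, 125],
]
suspect = [140, 120, 90, 100]  # moves against the regional trend
print(round(similarity_r(region[0], region), 3))
print(round(similarity_r(suspect, region), 3))
```

A value near 1 means the user follows the regional pattern, while a low or negative value flags a pattern that runs against it.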
Time-series abnormal features are extracted from the monthly feature data obtained in step 2. Because the data are arranged nonlinearly, the method starts from the time series and uses the Pearson correlation coefficient to calculate the rate of change of the correlation with the most relevant user, thereby measuring the degree of abnormal electricity use; a user is judged to be an abnormal electricity user when the user's load pattern differs markedly from that of the initially most relevant user.
More specifically, from the time-series perspective, the dates are divided into two series, D1 and D2, and the rate of change of the correlation with the most relevant user is calculated; the result indicates the user's degree of abnormality. For example, when a user's load pattern differs greatly from that of the initially most relevant user, the user is judged to be an abnormal electricity user, as expressed by the following formula:
Figure RE-GDA0003473210730000092
in the formula:
the change _ rate represents the time series before and after, the rate of increase or decrease of the correlation coefficient,
Figure RE-GDA0003473210730000093
indicating the correlation coefficient between the user i and the maximum correlation user j in the second half time sequence,
Figure RE-GDA0003473210730000101
it is shown that in the first half of the time series,the correlation coefficient of user i with the largest correlated user j.
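The formula itself is an image that is not reproduced in this text, so the following sketch rests on an assumption: the change rate is taken as the relative change (r_D2 − r_D1) / |r_D1|, where r_D1 and r_D2 are the correlation coefficients of user i with the most correlated user j in the first and second halves. The user IDs, series and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_change_rate(series_by_user, user, split):
    """Return (most correlated user in D1, assumed relative change of correlation)."""
    d1 = {u: s[:split] for u, s in series_by_user.items()}
    d2 = {u: s[split:] for u, s in series_by_user.items()}
    others = [u for u in series_by_user if u != user]
    # The "initially most relevant user" is chosen on the first half only.
    j = max(others, key=lambda u: pearson(d1[user], d1[u]))
    r1 = pearson(d1[user], d1[j])
    r2 = pearson(d2[user], d2[j])
    if r1 == 0:
        raise ValueError("first-half correlation is zero; rate undefined")
    return j, (r2 - r1) / abs(r1)

# Hypothetical twelve-month series; user "a" reverses its pattern in the second half.
data = {
    "a": [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1],
    "b": [2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7],
    "c": [5, 1, 4, 2, 6, 3, 3, 3, 4, 4, 5, 5],
}
print(correlation_change_rate(data, "a", split=6))
```

A strongly negative rate means the user stopped tracking the neighbour it used to resemble, which is the situation the text treats as suspicious.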
Feature fusion is then performed on the features obtained from the four feature dimensions: the features are fused across multiple layers, and prediction is carried out in an early-fusion manner. The technical scheme provided by the invention can make full use of massive electricity data, carry out multi-party model training in advance while ensuring data security, reduce the privacy risks and costs of the traditional approach, make the most of each party's platform, and identify abnormal electricity use.
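Early fusion means the features are combined into a single input vector before the classifier sees them. The following is a minimal, non-private sketch of only that concatenation step, assuming each feature source is a dictionary keyed by user ID; the function name, IDs and values are hypothetical, and the privacy-preserving exchange that a real longitudinal federation requires is not shown.

```python
def early_fusion(feature_tables):
    """Join per-source feature dicts on user ID and concatenate the values.

    feature_tables: list of {user_id: [feature values]} dicts, one per source.
    Only users present in every table are kept.
    """
    shared_ids = set(feature_tables[0])
    for table in feature_tables[1:]:
        shared_ids &= set(table)
    fused = {}
    for user_id in sorted(shared_ids):
        row = []
        for table in feature_tables:
            row.extend(table[user_id])
        fused[user_id] = row
    return fused

# Hypothetical values for the four dimensions named in the text.
global_prob = {"u1": [0.91], "u2": [0.08], "u3": [0.40]}
lof_score = {"u1": [3.2], "u2": [1.0], "u3": [1.1]}
similarity = {"u1": [-0.35], "u2": [0.88]}
change = {"u1": [-1.6], "u2": [0.05], "u3": [0.10]}
fused = early_fusion([global_prob, lof_score, similarity, change])
print(fused)
```

User "u3" is dropped because one source has no row for it, which mirrors the sample alignment step that vertically partitioned data needs before training.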
Step 4: construct a convolutional neural network model, and perform stratified sampling, neural network structure description and training method configuration on the data processed in the preceding step; these steps complete the training of the model. The specific steps are as follows:
Stratified sampling: the training data are divided into several groups of train/test pairs; the number of groups can be set as needed and defaults to 10. Their proportions are set to train_size = 0.8 and test_size = 0.2, together with the random seed random_state, which ensures that the model produces the same training result and the same test-set predictions each time it is trained on the same training set; otherwise the accuracy would fluctuate.
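The sampling described above (10 train/test groups, an 80/20 split, and a fixed seed) can be sketched with the standard library alone. This is an illustration under assumptions: the function name is hypothetical, and the source does not say which library or routine it actually uses.

```python
import random
from collections import defaultdict

def stratified_splits(labels, n_splits=10, test_size=0.2, random_state=0):
    """Yield (train_indices, test_indices) pairs that keep class proportions."""
    rng = random.Random(random_state)  # fixed seed makes every run identical
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_splits):
        train, test = [], []
        for label in sorted(by_class):
            idx = by_class[label][:]
            rng.shuffle(idx)
            n_test = max(1, round(len(idx) * test_size))
            test.extend(idx[:n_test])
            train.extend(idx[n_test:])
        yield sorted(train), sorted(test)

# Hypothetical labels: 80 normal users (0) and 20 abnormal users (1).
labels = [0] * 80 + [1] * 20
for train_idx, test_idx in stratified_splits(labels, n_splits=2):
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

Sampling within each class keeps the rare abnormal users represented in every test group, which matters when abnormal cases are a small minority.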
The neural network structure obtained through the model() method is shown in fig. 6. The first column represents the state; the second column represents the output size after convolution, calculated as (input size − convolution kernel size + 2 × padding) / stride + 1; the third column represents the number of output parameters, calculated as (convolution kernel size)² × number of channels × number of filters + number of filters.
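The two formulas in this paragraph can be checked with a few lines of arithmetic. The sketch assumes a square two-dimensional kernel (so the weight count uses the kernel size squared) and floor division when the stride does not divide evenly, which is a common convention rather than something the source states; the function names and layer sizes are hypothetical.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """(input size - kernel size + 2 * padding) / stride + 1, rounded down."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def conv2d_param_count(kernel_size, in_channels, filters):
    """kernel size squared * channels * filters weights, plus one bias per filter."""
    return kernel_size ** 2 * in_channels * filters + filters

# Hypothetical layer: 28-wide input, 3x3 kernel, 1 input channel, 32 filters.
print(conv_output_size(28, 3))             # 26
print(conv_output_size(28, 3, padding=1))  # 28
print(conv2d_param_count(3, 1, 32))        # 320
```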
Training method: SGD is selected as the training optimizer with a learning rate of 0.1; SGD computes quickly and can escape saddle points and poor local optima on its own. Binary cross-entropy is selected as the loss function. With binary cross-entropy, the gradient of the last layer's weights no longer depends on the derivative of the activation function and is proportional only to the difference between the output value and the true value, so convergence is faster. Because backpropagation multiplies gradients layer by layer, the update of the whole weight matrix is accelerated as well. Furthermore, the derivative of the multi-class cross-entropy loss is simple, since the loss depends only on the probability of the correct class, and the derivative of the loss with respect to the input of the activation layer is also simple.
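The claim that the last-layer gradient depends only on the difference between output and true value can be verified numerically. The sketch below assumes a sigmoid output unit with binary cross-entropy, compares the analytic gradient (p − y) with a finite-difference estimate, and then applies one SGD step at the learning rate of 0.1 given in the text; the single-weight model and all names are hypothetical simplifications.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(z, y):
    """Binary cross-entropy of a sigmoid output with pre-activation z."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def numeric_grad(z, y, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dz."""
    return (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)

def sgd_step(w, x, y, lr=0.1):
    """One SGD update for a single-weight model p = sigmoid(w * x)."""
    p = sigmoid(w * x)
    return w - lr * (p - y) * x  # gradient is (p - y) * x, no sigmoid derivative

for z, y in [(0.3, 1), (-1.2, 0), (2.0, 0)]:
    analytic = sigmoid(z) - y
    print(round(analytic, 6), round(numeric_grad(z, y), 6))
print(sgd_step(0.0, 1.0, 1))
```

The two printed columns agree, which shows that the sigmoid derivative cancels out of the gradient and so cannot shrink the update when the unit saturates.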
More specifically, the raw data are input grouped by cell number, preprocessed, and fed to the convolutional neural network as a three-dimensional numpy array for training to obtain a model. Because seasonal changes alter the trend and distribution of electricity data, several models can be trained on the data of the same cell and selected as needed. The specific flow is shown in fig. 4.
Step 5: apply the model. Load the trained model and input the raw data to judge whether the user's electricity use is abnormal.
Step 5 specifically comprises the following steps:
Step 5.1: run the model and input the number of the cell to be verified and the electricity data file; the system judges whether a model exists and, if so, executes data preprocessing and feature extraction in sequence.
Step 5.2: if no model is found, report that the model does not exist, prompt the user to upload data for model training, and execute step 5.1 after training is finished.
Step 5.3: read the trained model from the database, load it into the system through deserialization, and run the model to obtain the result. The above steps are shown in fig. 5.
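The control flow of steps 5.1 to 5.3 can be sketched as below. Everything concrete here is a stand-in chosen for illustration: a dictionary plays the role of the database, `pickle` plays the role of the serialization format, and a simple threshold on the mean reading replaces the trained convolutional neural network, none of which is specified in the source.

```python
import pickle

def preprocess_and_extract(readings):
    """Placeholder for preprocessing and feature extraction: here, just the mean."""
    return sum(readings) / len(readings)

def identify(cell_id, readings, model_store):
    """Follow steps 5.1 to 5.3 for one cell and one electricity data file."""
    blob = model_store.get(cell_id)          # step 5.1: does a model exist?
    if blob is None:
        # Step 5.2: no model, so the caller must upload data and train first.
        return {"status": "model_missing", "abnormal": None}
    feature = preprocess_and_extract(readings)
    params = pickle.loads(blob)              # step 5.3: deserialize the stored model
    # Stand-in "model": a threshold on the mean reading, NOT the CNN of the text.
    return {"status": "ok", "abnormal": feature > params["threshold"]}

# Hypothetical store holding one serialized model, keyed by cell number.
store = {"cell-001": pickle.dumps({"threshold": 10.0})}
print(identify("cell-001", [12.0, 14.0, 13.0], store))
print(identify("cell-999", [12.0, 14.0, 13.0], store))
```

Only data from a trusted store should be passed to `pickle.loads`, since unpickling untrusted bytes can execute arbitrary code.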
Embodiment 2 of the present invention provides a VFL-based low-voltage resident user abnormal electricity utilization identification system, which uses the VFL-based low-voltage resident user abnormal electricity utilization identification method described in embodiment 1 and includes: a data acquisition module, a data preprocessing module, a longitudinal federation module, a model generation module and a low-voltage resident user abnormal electricity utilization identification module. More specifically:
the data acquisition module is used for acquiring the electricity utilization data of low-voltage resident users;
the data preprocessing module is used for preprocessing the electricity utilization data of the low-voltage resident users, and comprises missing value processing, abnormal value processing and data normalization;
the longitudinal federation module is used for extracting low-voltage resident user abnormal electricity utilization characteristics with four dimensions of global abnormality, local abnormality, region space and time sequence and carrying out longitudinal federation;
the model generation module is used for constructing a convolutional neural network model and finishing the training of the convolutional neural network model by receiving the output of the longitudinal federal module;
and the low-voltage resident user abnormal electricity utilization identification module is used for loading the trained convolutional neural network model, receiving from the data acquisition module the electricity utilization data of the low-voltage resident user to be judged, and judging whether abnormal electricity utilization exists.
Compared with the prior art, the invention has the following advantages: to protect user privacy and data security, it adopts a model based on longitudinal federated learning and a convolutional neural network.
Longitudinal federated learning helps to reduce the risks and costs of traditional machine learning models. Federated learning works on distributed data: following the principles of focused collection and data minimization, multiple clients cooperatively train a model under the coordination of a central server.
Meanwhile, the convolutional neural network is a multilayer supervised learning neural network, essentially a multilayer perceptron, whose convolutional and pooling layers form its core and extract feature vectors from the network input.
The method extracts four user feature dimensions (global abnormality, local abnormality, regional space and time series) for longitudinal federation, and realizes joint training by multiple data owners while guaranteeing information security during big-data exchange, protecting the privacy of terminal data and member data, and ensuring legal compliance.
The traditional anti-electricity-theft approach depends heavily on user reports and periodic patrols. By comparison, the low-voltage resident user abnormal electricity utilization identification model based on longitudinal federated learning works more proactively with less workload, and is considerably more targeted and timely. It greatly reduces the power company's labor and material costs and the losses caused by electricity theft, so the invention has important practical significance.
Abnormal electricity use is identified by combining longitudinal federated learning with the multi-dimensional composite features of low-voltage resident users. Because a single data model cannot accurately separate normal users from abnormal users, extracting multi-dimensional composite features with different models and classifying from multiple aspects further improves accuracy. The model provided by the invention can make full use of massive electricity data and effectively extract the event information behind abnormal data.
By adopting the low-voltage resident abnormal electricity utilization identification method and system based on longitudinal federal learning, the abnormal electricity utilization behavior of the user is identified, and the following effects can be realized:
1) the workload required by anti-electricity-theft work is reduced, its accuracy and timeliness are enhanced, users suspected of abnormal electricity use are located quickly, and case processing speed is improved;
2) abnormal information is fully mined by using a federal learning technology, the positioning success rate is improved, and unnecessary inspection is reduced;
3) the accuracy of electricity stealing prevention work is improved through the convolutional neural network training model, deterrence is provided for users with electricity stealing ideas, and the occurrence of electricity stealing behaviors is reduced.
In particular, compared to prior art document 2, the VFL employed in the present invention has several major features:
(1) all data are kept locally, so that privacy is not disclosed and regulations are not violated;
(2) multiple participants combine their data to build a virtual shared model, each serving its own purpose while all benefit jointly;
(3) the modeling effect of federated learning is similar to that of traditional deep learning;
(4) the federation is a data federation; different federations have different operating frameworks and serve different operating purposes. For example, different federations can be formed within the power industry to identify customers' various business needs through modeling. Federations with different purposes can also be formed across industries, for example between the power industry and the financial industry, where modeling can support the review and approval of customers' business expansion requests in the power industry and provide banks with support for evaluating enterprise credit.
In conclusion, a fast and accurate abnormal electricity utilization identification system is essential for a power company. Compared with traditional field inspection, the method offers good timeliness and accuracy and can effectively reduce the power company's operating costs: it saves substantial labor and material resources on the one hand and reduces the losses caused by electricity theft on the other. This indicates that the abnormal electricity utilization identification system has great economic benefits.
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.

Claims (10)

1. A low-voltage resident user abnormal electricity utilization identification method based on VFL is characterized by comprising the following steps:
step 1, collecting historical electricity utilization data of a low-voltage resident user with set duration, importing the historical electricity utilization data into a database, and preprocessing the electricity utilization data;
step 2, extracting characteristic data capable of representing the electricity consumption mode of the low-voltage resident users according to the electricity consumption data obtained through the preprocessing in the step 1;
step 3, extracting abnormal electricity utilization characteristics of low-voltage resident users in four dimensions of global abnormality, local abnormality, region space and time sequence by using the characteristic data obtained in the step 2, and carrying out longitudinal federation;
step 4, constructing a convolutional neural network model, and performing stratified sampling, neural network structure description and training method configuration on the data processed in the preceding step to complete the training of the model;
step 5, applying the model: loading the trained model, and inputting the electricity utilization data to be judged so as to judge whether the user's electricity utilization is abnormal.
2. A VFL-based low voltage residential customer abnormal electricity usage recognition method as claimed in claim 1, wherein:
the preprocessing comprises: missing value processing, outlier processing, and data normalization for subsequent use.
3. A VFL based low voltage resident user abnormal electricity consumption recognizing method in accordance with claim 1 or 2, wherein:
the step 1 specifically comprises the following steps:
step 1.1, collecting historical electricity utilization data of a set duration of a low-voltage resident user;
step 1.2, processing missing values of data by adopting a Lagrange interpolation method;
step 1.3, processing abnormal data values, and correcting the abnormal values by adopting the average value of the front observation value and the rear observation value;
and step 1.4, carrying out standardization treatment by adopting a standard deviation standardization method.
4. A VFL-based low voltage residential customer abnormal electricity usage recognition method as claimed in claim 3, wherein:
the step 2 comprises extracting daily characteristic data and monthly characteristic data capable of representing the electricity utilization mode of the low-voltage resident users according to the electricity utilization data obtained through the preprocessing in the step 1, wherein the daily characteristic data refers to the daily electricity consumption of the low-voltage resident users, and the monthly characteristic data refers to the monthly daily-average electricity consumption of the low-voltage resident users, namely the total electricity consumption of each month divided by the number of days of that month.
5. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps:
extracting global abnormal features by using the daily feature data obtained in the step 2, and performing binary classification to obtain a label of each user power consumption abnormality and a probability value of the user abnormal degree, wherein the label and the probability value are expressed by the following formula
Figure FDA0003324091570000021
In the formula:
x represents the user's electricity data value,
y indicates whether the electricity use is abnormal, with 1 denoting abnormal use,
Pr(y = 1 | x) represents the conditional probability that the user's electricity use is abnormal,
P_{A,B}[f(x)] represents a sigmoid function,
f(x) represents the data model of the user's electricity use,
e denotes the natural constant.
6. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps: extracting local abnormal features by using the daily feature data obtained in the step 2, extracting local outlier factor features by using an LOF algorithm, and judging a low-voltage residential user to be an abnormal electricity utilization user when the LOF value of that user's electricity utilization data is higher than a set value.
7. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps: extracting regional space abnormal features by using the monthly feature data obtained in the step 2, and judging a user to be an abnormal electricity utilization user when the user's electricity utilization data mode is inconsistent with that of most users in the same region.
8. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with claim 4, wherein:
the step 3 specifically comprises the following steps: extracting time series abnormal features by using the monthly feature data obtained in the step 2, and judging a user to be an abnormal electricity utilization user when the user's electricity utilization load mode is greatly different from that of the initially most relevant user.
9. A VFL-based low voltage resident user abnormal electricity usage recognizing method in accordance with any one of claims 5 to 8, wherein:
the step 5 specifically comprises the following steps:
step 5.1, operating the model, inputting the number of the cell to be verified and the electricity utilization data file, judging by the system whether the model exists, and if so, sequentially executing data preprocessing and feature extraction;
step 5.2, if the model is judged not to exist, reporting that the model does not exist, prompting the user to upload data for model training, and executing the step 5.1 after the training is finished;
step 5.3, reading the trained model from the database, loading the trained model into the system through deserialization, and operating the model to obtain a result.
10. A VFL-based low voltage resident abnormal electricity usage recognition system using the VFL-based low voltage resident abnormal electricity usage recognition method according to any one of claims 1 to 9, comprising: a data acquisition module, a data preprocessing module and a longitudinal federation module,
the system is characterized in that:
the data acquisition module is used for acquiring the electricity utilization data of low-voltage resident users;
the data preprocessing module is used for preprocessing the electricity utilization data of the low-voltage resident users, and comprises missing value processing, abnormal value processing and data normalization;
the longitudinal federation module is used for extracting low-voltage resident user abnormal electricity utilization characteristics with four dimensions of global abnormality, local abnormality, region space and time sequence and carrying out longitudinal federation;
the model generation module is used for constructing a convolutional neural network model and finishing the training of the convolutional neural network model by receiving the output of the longitudinal federal module;
and the low-voltage resident user abnormal electricity utilization identification module is used for loading the trained convolutional neural network model, receiving from the data acquisition module the electricity utilization data of the low-voltage resident user to be judged, and judging whether abnormal electricity utilization exists.
CN202111256656.6A 2021-10-27 2021-10-27 Low-voltage resident user abnormal electricity utilization identification method and system based on VFL Pending CN114154617A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111256656.6A CN114154617A (en) 2021-10-27 2021-10-27 Low-voltage resident user abnormal electricity utilization identification method and system based on VFL

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111256656.6A CN114154617A (en) 2021-10-27 2021-10-27 Low-voltage resident user abnormal electricity utilization identification method and system based on VFL

Publications (1)

Publication Number Publication Date
CN114154617A true CN114154617A (en) 2022-03-08

Family

ID=80458437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111256656.6A Pending CN114154617A (en) 2021-10-27 2021-10-27 Low-voltage resident user abnormal electricity utilization identification method and system based on VFL

Country Status (1)

Country Link
CN (1) CN114154617A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114331761A (en) * 2022-03-15 2022-04-12 浙江万胜智能科技股份有限公司 Equipment parameter analysis and adjustment method and system for special transformer acquisition terminal
CN114331761B (en) * 2022-03-15 2022-07-08 浙江万胜智能科技股份有限公司 Equipment parameter analysis and adjustment method and system for special transformer acquisition terminal

Similar Documents

Publication Publication Date Title
CN110097297B (en) Multi-dimensional electricity stealing situation intelligent sensing method, system, equipment and medium
Buzau et al. Hybrid deep neural networks for detection of non-technical losses in electricity smart meters
CN109255506B (en) Internet financial user loan overdue prediction method based on big data
CN110223168B (en) Label propagation anti-fraud detection method and system based on enterprise relationship map
CN111882446B (en) Abnormal account detection method based on graph convolution network
CN110852856B (en) Invoice false invoice identification method based on dynamic network representation
Alzate et al. Improved electricity load forecasting via kernel spectral clustering of smart meters
CN106570778A (en) Big data-based data integration and line loss analysis and calculation method
CN112132233A (en) Criminal personnel dangerous behavior prediction method and system based on effective influence factors
CN111681022A (en) Network platform data resource value evaluation method
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN110009427B (en) Intelligent electric power sale amount prediction method based on deep circulation neural network
CN114154617A (en) Low-voltage resident user abnormal electricity utilization identification method and system based on VFL
CN116401601B (en) Power failure sensitive user handling method based on logistic regression model
CN115905319B (en) Automatic identification method and system for abnormal electricity fees of massive users
CN116451125A (en) New energy vehicle owner identification method, device, equipment and storage medium
Aquize et al. Self-organizing maps for anomaly detection in fuel consumption. Case study: Illegal fuel storage in Bolivia
CN116821759A (en) Identification prediction method and device for category labels, processor and electronic equipment
CN114723554B (en) Abnormal account identification method and device
CN114372835B (en) Comprehensive energy service potential customer identification method, system and computer equipment
CN112256735B (en) Power consumption monitoring method and device, computer equipment and storage medium
CN114626940A (en) Data analysis method and device and electronic equipment
CN113435494A (en) Low-voltage resident user abnormal electricity utilization identification method and simulation system
CN113379211A (en) Block chain-based logistics information platform default risk management and control system and method
Peiyi et al. Analysis and research on enterprise resumption of work and production based on K-means clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination