CN111144540A - Generation method of anti-electricity-stealing simulation data set - Google Patents


Info

Publication number
CN111144540A
Authority
CN
China
Prior art keywords
data
simulation
variation
data sample
electricity
Legal status
Pending
Application number
CN201911230917.XA
Other languages
Chinese (zh)
Inventor
张志�
董贤光
陈祉如
代燕杰
杜艳
王平欣
王清
李琮琮
朱红霞
王者龙
郭亮
徐新光
梁波
Current Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
State Grid Shandong Electric Power Co Ltd
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, and State Grid Shandong Electric Power Co Ltd
Priority to CN201911230917.XA
Publication of CN111144540A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for generating an anti-electricity-stealing simulation data set, comprising the following steps. Step one, data preprocessing: normal electricity consumption data samples and electricity stealing behavior data samples are encoded as chromosomes to form an original sample population that a genetic algorithm can process. Step two, expansion of the normal electricity consumption data sample set: a crossover operator is used to increase the number of normal electricity consumption data samples, producing a normal electricity consumption data sample set. Step three, generation of the simulation data sample set: the original electricity stealing behavior data samples are used as variation factors to generate the simulation data sample set to be tested. The invention carries the prior knowledge and rules implicit in the original sample population over to the expanded normal electricity consumption data sample set, and guides the generation process with an entropy-based similarity function so that the final simulation data sample set has the same attributes and attribute-value distribution as the original sample population, providing an ample data basis for running a simulation system.

Description

Generation method of anti-electricity-stealing simulation data set
Technical Field
The invention relates to the technical field of electricity stealing prevention, in particular to a method for generating an electricity stealing prevention simulation data set.
Background
The rapid development of modern science and technology has also given electricity thieves new opportunities. Some of them modify anti-electricity-stealing devices or use equipment such as wireless jammers and wireless remote controls to steal electricity, misleading the personnel who monitor electricity usage, rendering the monitoring ineffective, and causing large losses of electric energy. The perpetrators are gradually shifting from individuals to enterprises and even organized groups, and electricity stealing is becoming increasingly concealed. A review of electricity stealing cases within the State Grid Corporation in recent years shows that electricity stealing at the present stage is characterized by specialized techniques, intelligent equipment, concealed behavior, large-scale perpetrators, professionalized personnel and networked promotion; if monitoring personnel lack deep professional expertise and insight, such behavior is difficult to detect. Compared with traditional methods, current electricity stealing is becoming more diverse and more sophisticated. Because electricity accounts for a large share of their production costs, some enterprises and groups are willing to take the risk.
To shift anti-electricity-stealing management from "detection and remediation after the fact" to "prevention in advance and control during the process", various anti-electricity-stealing early-warning, monitoring and analysis systems have been developed. For example, advanced anti-electricity-stealing systems are adopted (such as ones that use the GSM mobile communication network as the information transmission platform), automatic meter reading systems are widely deployed, and customers' electricity usage is monitored in real time and managed scientifically so that electricity stealing can be found and stopped quickly; in addition, data are collected from multiple channels and anti-electricity-stealing analysis models are built, forming an all-round, whole-process, three-dimensional anti-electricity-stealing management and control system. In data mining and analysis, however, the data sources are broad and the data derived from the various systems often contain customers' personal information. To guarantee data security and operational feasibility, only part of the electricity consumption data samples of typical customers can be used as experimental data. The original sample data then contain too few normal cases and electricity stealing cases, experiments run directly on them perform poorly, and it is difficult to distill anti-electricity-stealing knowledge from them.
At present, genetic algorithms are widely used to generate simulation data sets. However, the traditional interpolation variation operation is highly destructive and often destroys the trend characteristics of the original data; moreover, it ignores the historical state of the population, which makes it difficult for the data to converge in the later stages of evolution. An effective simulation data generation method is therefore needed: one that generates an experimental simulation data set from a small amount of collected data while preserving the attributes and correlations of the original data samples from the electricity consumption information acquisition system. Such a method would meet the sample data requirements of anti-electricity-stealing test cases, advance anti-electricity-stealing research and early-warning and control capability, and improve the overall anti-electricity-stealing detection capability of the State Grid Corporation.
Disclosure of Invention
To address the problem that anti-electricity-stealing applications require a sufficiently large amount of electricity stealing behavior feature data, the invention provides a method for generating an anti-electricity-stealing simulation data set. After the original data are preprocessed, the original population is expanded with a similarity-based crossover operator; then, while the characteristics of the population data set are preserved, a chromosome variation algorithm that keeps the population trend unchanged is used to increase the proportion of test cases. Together these steps form a method for automatically generating an anti-electricity-stealing simulation data set.
To solve the above problems, the invention adopts the following technical solution.
A method for generating an anti-electricity-stealing simulation data set based on an improved genetic algorithm comprises the following steps:
Step one: acquire normal electricity consumption data samples and electricity stealing behavior data samples, and encode them as chromosomes;
Step two: expand the number of normal electricity consumption data samples with a crossover operator to generate a normal electricity consumption data sample set;
Step three: taking the original electricity stealing behavior data samples as variation factors, iteratively vary the normal electricity consumption data samples and the electricity stealing behavior data samples to generate a simulation data sample set to be tested;
Step four: check the simulation data sample set by evaluating, on the basis of the entropy principle, its degree of coincidence with the original sample population; when the degree of coincidence is lower than a set threshold, return to step three and adjust the variation factors, repeating until the threshold is met;
Step five: output the anti-electricity-stealing simulation data set.
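The control flow of steps two to five, including the return from step four to step three, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the callables `expand`, `vary` and `sim` are hypothetical placeholders for the operators described later in the document, `sim` is assumed to return a value in [0, 1] where smaller means closer coincidence (matching the convention of formula (10)), and halving `factor_scale` is a hypothetical stand-in for "adjusting the variation factor", which the text does not specify.

```python
from typing import Callable, List

Chromosome = List[float]
Population = List[Chromosome]


def generate_simulation_dataset(
    normal: Population,
    stealing: Population,
    expand: Callable[[Population], Population],
    vary: Callable[[Population, Population, float], Population],
    sim: Callable[[Population, Population], float],
    max_sim: float,
    factor_scale: float = 1.0,
    max_rounds: int = 10,
) -> Population:
    """Steps two to five; step four loops back to step three until the check passes."""
    original = normal + stealing
    expanded = expand(normal)  # step two: crossover-based expansion
    for _ in range(max_rounds):
        # Step three: vary the expanded set using the stealing samples as factors.
        candidate = vary(expanded, stealing, factor_scale)
        # Step four: sim close to 0 means high coincidence with the original data.
        if sim(original, candidate) <= max_sim:
            return candidate  # step five: output the accepted data set
        factor_scale *= 0.5  # hypothetical adjustment of the variation factor
    raise RuntimeError("coincidence threshold not met within max_rounds")
```

Raising an error after `max_rounds` is a design choice of the sketch; the text only says the loop repeats until the threshold is met.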
Preferably, in step two, a crossover operator based on chromosome similarity is used so that crossover is performed between distant (dissimilar) chromosome samples; during crossover, the operator recombines chromosomes to generate new chromosomes.
Preferably, in step three, the original electricity stealing behavior data samples are used as variation factors, and a variation algorithm that keeps the population trend unchanged varies the chromosomes in the data sample set whose values are close to the variation factor values into a new chromosome set, ensuring that the variation of chromosomes in the data sample set does not affect the overall chromosome distribution.
Preferably, in step four, two kinds of simulation data sets are generated, one with the chromosome-similarity crossover operator and one with a conventional crossover operator that ignores chromosome similarity, and the degree of coincidence of each with the original data is compared using the entropy principle.
Preferably, in step three, the normal electricity consumption data samples in the data sample set are varied iteratively according to the relationship between the proportion of simulation cases and the degree of variation, as follows:
(1) At the first variation, specify an initial variation range of simulation cases, and obtain the chromosome variation degree for this initial proportion from the relationship between case proportion and variation degree;
(2) Vary the chromosomes within the initial variation range by the calculated variation degree, and after the variation, judge whether the current proportion has reached the specified proportion threshold for electricity stealing behavior data samples;
(3) If the proportion of electricity stealing behavior data samples has not reached the proportion threshold, reduce the current proportion proportionally and continue iterative variation;
(4) When the proportion of electricity stealing behavior data samples after iterative variation reaches the proportion threshold, the simulation data sample set to be tested is obtained.
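The four-step loop can be sketched in Python as follows. This is a minimal sketch built on hypothetical choices that the text does not state: the "variation range" is taken as the fraction `h` of not-yet-varied chromosomes closest (in L1 distance) to the variation factor, a constant learning rate `alpha` is used in the update $\eta(i+1) = \eta(i) + \alpha h(i)$ of formula (6), varying a chromosome means moving each gene a fraction $\eta$ of the way toward the factor, and "reducing the current proportion" means multiplying `h` by `shrink`.

```python
from typing import List, Sequence, Tuple

Chromosome = List[float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def iterative_variation(
    normal: Sequence[Chromosome],
    factor: Sequence[float],
    target_ratio: float,
    h0: float = 0.2,
    shrink: float = 0.8,
    alpha: float = 0.5,
    max_iter: int = 50,
) -> Tuple[List[Chromosome], List[bool]]:
    population = [list(c) for c in normal]
    varied = [False] * len(population)
    if not population:
        return population, varied
    h, eta = h0, 0.0
    for _ in range(max_iter):
        # (1) Variation degree from formula (6), with a constant learning rate here.
        eta = min(1.0, eta + alpha * h)
        # (2) Vary the not-yet-varied chromosomes closest to the variation factor.
        candidates = sorted(
            (i for i, done in enumerate(varied) if not done),
            key=lambda i: l1_distance(population[i], factor),
        )
        k = max(1, int(h * len(population)))
        for i in candidates[:k]:
            population[i] = [x + eta * (f - x) for x, f in zip(population[i], factor)]
            varied[i] = True
        # (4) Stop once the share of varied (simulated stealing) cases is reached.
        if sum(varied) / len(population) >= target_ratio:
            break
        # (3) Otherwise shrink the proportion and iterate again.
        h *= shrink
    return population, varied
```

Moving genes only part of the way toward the factor, rather than overwriting them, is meant to mirror the goal of keeping the population trend intact.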
Preferably, in step one, the original sample population processed by the genetic algorithm has the following form:
$$P = (c_1, c_2, \ldots, c_N) \qquad (1)$$
In formula (1), $P$ is the historical data population; each chromosome $c$ is one item of historical data; $N$ is the number of historical data items, i.e., the number of chromosomes; and $c = (t_1, t_2, \ldots, t_k)$, where $k$ is the chromosome length and each $t$ is a parameter of the historical data.
Preferably, in step two, during chromosome crossover in the genetic algorithm, the selected chromosomes are placed in a mating pool and paired two by two, and the crossover operator then recombines each pair to generate new chromosomes. The crossover operator (the crossover probability $p_c$) is calculated with formula (2):
[Formula (2) appears only as an image in the original publication.]
In formula (2), $q_{tar}$ is the number of chromosomes in the target population and $q_{init}$ is the number of chromosomes in the original population.
Preferably, in step four, after the simulation data sample set is generated, the degree of coincidence between the simulation data and the original data is calculated with formula (10):
[Formula (10) appears only as an image in the original publication.]
The degree of coincidence is $sim(P_I, P_G)$: the closer $sim(P_I, P_G)$ is to 0, the higher the coincidence of the two data sets, and the closer it is to 1, the lower the coincidence. In formula (10), $P_I$ is the original data and $P_G$ is the simulation data. The terms shown as images in the original denote, respectively:
the information entropy of the values at the $i$-th position of population $P_G$;
the information entropy of the values at the $i$-th position of population $P_I$, where $n$ is the number of population samples selected for the coincidence comparison;
the difference between the mutual information of the values at the $i$-th and $j$-th positions in population $P_G$ and the mutual information of the values at the corresponding positions in population $P_I$; and $H(P_G(X_i), P_I(X_i))$ is the relative entropy of the values at the corresponding positions of the two populations.
A device for generating an anti-electricity-stealing simulation data set comprises: a data preprocessing module, used to acquire normal electricity consumption data samples and electricity stealing behavior data samples and encode them as chromosomes;
an expansion module, used to expand the number of normal electricity consumption data samples with a crossover operator to generate a normal electricity consumption data sample set;
an iterative variation module, used to take the original electricity stealing behavior data samples as variation factors and iteratively vary the normal electricity consumption data samples and the electricity stealing behavior data samples to generate a simulation data sample set to be tested;
a detection module, used to evaluate, on the basis of the entropy principle, the degree of coincidence between the simulation data sample set and the original sample population;
and an output module, used to output the anti-electricity-stealing simulation data set.
In conclusion, the beneficial effects of the invention are as follows:
1. The invention is a digital simulation method for electricity stealing behavior based on an improved genetic algorithm and verified with the entropy principle. A genetic algorithm is used to expand the normal electricity consumption data sample set; because the genetic algorithm passes the properties of data samples on to their offspring, the prior knowledge and rules implicit in the original sample population are inherited by the expanded set. An entropy-based similarity function guides the generation process so that the final simulation data sample set has the same attributes and attribute-value distribution as the original sample population. Simulation from actual data produces a large number of normal electricity consumption data samples and electricity stealing behavior data samples, which support the operation of an anti-electricity-stealing simulation system and provide it with an ample data basis.
2. In the generation stage of the simulation data sample set, the variation method of the traditional genetic algorithm is improved: the characteristics of historical typical electricity stealing cases are used as variation factors, and the chromosomes in the set of normal electricity consumption data samples and electricity stealing behavior data samples are varied iteratively according to the relationship between the proportion of simulation cases and the degree of variation. This raises the proportion of simulation cases in the set while ensuring that the final simulation data set has the same or similar attribute-value distribution and correlations as the original sample population collected by the electricity consumption information acquisition system;
3. The invention provides a crossover operator based on chromosome similarity, which chooses the crossover operation according to the similarity of the chromosomes to be crossed so that crossover is performed between distant chromosomes. This prevents sample data that are identical or highly similar from gaining a locally increased share of the population. After crossover, the population formed by the new chromosomes is merged with the original population formed by the collected data, thereby expanding the population.
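The pairing-and-merge procedure of effect 3 (also described in step two of the detailed description) can be sketched as follows. This is a minimal Python sketch, assuming $p_c$ is supplied directly (formula (2) is not reproduced in the text), that `crossover` is any function returning either a pair of children or `None` when a pair is rejected as too similar, and that random shuffling is the pairing rule; that rule is a hypothetical choice.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Chromosome = List[float]
Crossover = Callable[[Chromosome, Chromosome], Optional[Tuple[Chromosome, Chromosome]]]


def expand_population(
    population: Sequence[Chromosome],
    p_c: float,
    crossover: Crossover,
    rng: random.Random,
) -> List[Chromosome]:
    """Pair chromosomes in a mating pool, cross them, and merge offspring with the originals."""
    pool = [list(c) for c in population]
    rng.shuffle(pool)  # hypothetical pairing rule: random pairs (an odd leftover is unpaired)
    offspring: List[Chromosome] = []
    for a, b in zip(pool[0::2], pool[1::2]):
        if rng.random() < p_c:
            children = crossover(a, b)
            if children is not None:  # the operator may reject too-similar pairs
                offspring.extend(children)
    # Merge the new chromosomes with the original population to expand it.
    return [list(c) for c in population] + offspring
```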
Drawings
FIG. 1 is a diagram of the process of generating an anti-electricity-stealing simulation data set according to the invention;
FIG. 2 is a diagram of the chromosome variation process of the normal sample population based on electricity stealing characteristics according to the invention;
FIG. 3 compares the entropy-based degree of coincidence for various data features according to the invention;
FIG. 4 compares the entropy-based degree of coincidence of the method of the invention with that of other simulation data generation methods.
Detailed Description
The invention is further described with reference to the following figures and examples.
The source data of this embodiment come from the electricity consumption information acquisition system of a provincial electric power company, specifically the 2018 acquisition data of 3 million typical electricity customers of that company. They cover meter reading data, load data and other data frequently used in daily business, as shown in Table 1:
[Table 1 appears only as images in the original publication.]
Table 1 Characteristics of part of the acquired data
The implementation environment uses two 4-way PC servers as test servers in dual-machine hot standby mode, each configured with a 32-core CPU, 64 GB of memory, a 1 TB hard disk and an Oracle 11g database. Two 2-way PC servers are used as application servers, on which the data generation services, the application programs and a Tomcat middleware server are deployed. The data generation service consists of several application services that form a computing cluster. During data generation, a separate data generation task is created for each type of data content, and each task is assigned its own execution node so that the tasks run in parallel; the default maximum number of execution nodes is 5.
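The task-per-data-type scheduling can be illustrated on a single machine. This Python sketch is an analogy under stated assumptions, not the deployed cluster: threads from `concurrent.futures.ThreadPoolExecutor` stand in for execution nodes, `max_workers` plays the role of the default maximum of 5 nodes, and `generate` is a hypothetical per-type generation function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

DEFAULT_MAX_NODES = 5  # default maximum number of execution nodes in the embodiment


def run_generation_tasks(
    data_types: Iterable[str],
    generate: Callable[[str], List[float]],
    max_nodes: int = DEFAULT_MAX_NODES,
) -> Dict[str, List[float]]:
    """Create one generation task per data type and run the tasks in parallel."""
    with ThreadPoolExecutor(max_workers=max_nodes) as pool:
        futures = {name: pool.submit(generate, name) for name in data_types}
        # f.result() blocks until the task finishes and re-raises any task exception.
        return {name: f.result() for name, f in futures.items()}
```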
The specific embodiment of the anti-electricity-stealing simulation data set generation method comprises the following steps.
Step one: collect various normal electricity consumption data samples of typical customers through the electricity consumption information acquisition system, take the historical electricity consumption behavior data of the typical customers as the original data set, and encode each data item as a chromosome to form an original sample population that the genetic algorithm can process, in the following form:
$$P = (c_1, c_2, \ldots, c_N) \qquad (1)$$
In formula (1), $P$ is the historical data population and $c$ is a chromosome, i.e., one item of historical data; $N$ is the number of chromosomes, i.e., the number of historical data items; and $c = (t_1, t_2, \ldots, t_k)$, where $k$ is the chromosome length and each $t$ is a parameter of the historical data;
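Formula (1) amounts to turning each historical record into a fixed-length tuple of parameters. A minimal Python sketch, assuming records arrive as dictionaries; the field names `ua` and `ia` (phase-A voltage and current) and the readings are hypothetical:

```python
from typing import Dict, List, Sequence, Tuple

Chromosome = Tuple[float, ...]  # c = (t_1, ..., t_k)


def encode_population(
    records: Sequence[Dict[str, float]], fields: Sequence[str]
) -> List[Chromosome]:
    """Encode each historical record as a chromosome; the list is the population P."""
    return [tuple(float(record[f]) for f in fields) for record in records]


# Hypothetical readings for two customers (fields: phase-A voltage, phase-A current).
P = encode_population([{"ua": 220.1, "ia": 5.0}, {"ua": 219.8, "ia": 4.5}], ["ua", "ia"])
N, k = len(P), len(P[0])  # number of chromosomes and chromosome length
```

Fixing the field order once keeps position $i$ of every chromosome referring to the same parameter, which the per-position entropy terms of formula (10) rely on.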
Step two: for the original sample population, the crossover operator based on chromosome similarity is used to perform crossover between distant chromosome samples, generating new normal electricity consumption data samples and thus expanding the sample set. During chromosome crossover in the genetic algorithm, the selected chromosomes are placed in a mating pool and paired two by two, and the pairs are then recombined with crossover probability $p_c$ to generate new chromosomes. The value of $p_c$ depends on the desired number of chromosomes in the generated population and the number in the original population, as given by formula (2):
[Formula (2) appears only as an image in the original publication.]
In formula (2), $q_{tar}$ is the number of chromosomes in the target population and $q_{init}$ is the number of chromosomes in the original population.
Under these conditions, if the two chromosomes being crossed are highly similar, the new chromosomes they produce differ little from their parents. After the population has evolved for many generations, the proportion of identical or highly similar chromosomes rises sharply, so that most crossover operations become ineffective; this greatly slows the convergence of the algorithm and may even prevent it from converging.
To avoid this, the data crossover operator first calculates the similarity of the chromosomes to be crossed using formula (3), and then applies non-uniform arithmetic crossover, generating new individuals as linear combinations of two individuals with low similarity. Suppose the two individuals are $x_a$ and $x_b$, the maximum and minimum values of the group formed by these two individuals are $x_{max}$ and $x_{min}$, and the new chromosomes are $x_a'$ and $x_b'$; the new individuals produced by the crossover are then given by formulas (4) and (5).
[Formulas (3), (4) and (5) appear only as images in the original publication.]
In formulas (3), (4) and (5), $\alpha$ is a variable factor. Because non-uniform linear crossover is used, $\alpha$ can at most range over [0, 1]; however, when $\alpha$ is close to 0 or 1, the new chromosome is too close to a parent chromosome, and repeated crossover reduces the balance of the population, so $\alpha$ is taken as a random number in [0.2, 0.8]. The parameter $\beta$ is the similarity threshold, generally 0.1; during crossover, only distant chromosomes with low similarity are crossed. The crossover of the phase-A voltage curve data is shown in Table 2.
[Table 2 appears only as an image in the original publication.]
Table 2 Crossover operation on phase-A voltage data
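Because formulas (3) to (5) are not reproduced, the following Python sketch only illustrates the described behavior under explicit assumptions. The similarity test is a hypothetical normalized mean absolute distance, scaled by $x_{max} - x_{min}$ of the two individuals, and a pair counts as distant enough to cross when this distance is at least $\beta = 0.1$. The children use standard arithmetic crossover, $x_a' = \alpha x_a + (1-\alpha) x_b$ and $x_b' = (1-\alpha) x_a + \alpha x_b$, with $\alpha$ drawn uniformly from [0.2, 0.8]; this is a swapped-in textbook form, not necessarily the patent's formulas (4) and (5).

```python
import random
from typing import List, Optional, Sequence, Tuple

Chromosome = List[float]


def normalized_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in for formula (3): mean |a_i - b_i| scaled by x_max - x_min."""
    genes = list(a) + list(b)
    span = max(genes) - min(genes)  # x_max - x_min over the two individuals
    if span == 0:
        return 0.0  # all genes equal, so the two chromosomes are identical
    return sum(abs(x - y) for x, y in zip(a, b)) / (len(a) * span)


def similarity_crossover(
    a: Sequence[float],
    b: Sequence[float],
    rng: random.Random,
    beta: float = 0.1,
    alpha_range: Tuple[float, float] = (0.2, 0.8),
) -> Optional[Tuple[Chromosome, Chromosome]]:
    """Cross only distant pairs; return None when the pair is too similar."""
    if normalized_distance(a, b) < beta:
        return None
    alpha = rng.uniform(*alpha_range)  # keeps children away from either parent
    child_a = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    child_b = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return child_a, child_b
```

Each child gene lies between the two parent genes, and the two children at each position sum to the parents' sum, so crossover redistributes values without pushing them outside the observed range.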
Step three: the expanded sample set contains only normal electricity consumption data samples, and there are few original typical electricity stealing behavior data samples, so electricity stealing samples are lacking. The various original typical electricity stealing behavior data samples are therefore used as variation factors, and a variation algorithm varies the chromosomes in the population whose values are close to the variation factor values into a new chromosome set. The variation uses a chromosome variation algorithm that keeps the population trend unchanged, which ensures that varying individual chromosomes does not affect the overall chromosome distribution: the chromosomes in the population are varied iteratively according to the relationship between the proportion of simulation cases and the degree of variation, until all varied chromosomes satisfying the required proportion of simulation cases are obtained. The specific process is shown in FIG. 2.
(1) Let $h(0)$ be the initial variation range of simulation cases specified at the first variation, and obtain the chromosome variation degree for the initial proportion $h(0)$ from the relationship between the proportion of simulation cases and the degree of variation.
This relationship is given by formula (6):
$$\eta(i+1) = \eta(i) + \alpha(i)\, h(i) \qquad (6)$$
In formula (6), $\eta(i)$ is the variation value at the previous iteration, with initial value $\eta(0) = 0$; $\alpha(i)$ is the variation learning rate, a gain function with $0 < \alpha(i) < 1$; and $h(i)$ is the proportion of simulation cases at the $i$-th variation, which decreases gradually as the number of variations increases.
The variation learning rate is expressed by formula (7):
[Formula (7) appears only as an image in the original publication.]
In formula (7), $\alpha(i)$ decreases as the number of variations increases; its initial value $\alpha(0)$ is specified as the maximum learning rate, and $i_{max}$ is the specified maximum number of variations.
The proportion function of simulation cases is a Gaussian function, expressed by formula (8):
[Formula (8) appears only as an image in the original publication.]
In formula (8), $r(i)$ is the radius of the variation range, which, like the learning rate, decreases as the iteration number $i$ increases, according to formula (9):
[Formula (9) appears only as an image in the original publication.]
In formula (9), INT is the rounding function; the variation range shrinks as its radius decreases.
(2) Vary the chromosomes within the range $h(0)$ by the calculated variation degree, and after the variation, judge whether the current proportion has reached the specified proportion threshold for electricity stealing behavior data samples;
(3) If the proportion of electricity stealing behavior data samples has not reached the proportion threshold, reduce the current proportion proportionally and continue iterative variation;
(4) When the proportion of electricity stealing behavior data samples after iterative variation reaches the proportion threshold, all simulation data sample sets to be tested are obtained.
The variation degree of each iteration follows from the relationship between the proportion of simulation cases and the degree of variation, together with the change in that proportion. The chromosomes in the population are varied to different degrees accordingly, finally yielding a population that satisfies the required proportion of simulation cases and meets the experimental requirements.
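Formula (6) can be computed directly once $\alpha(i)$ and $h(i)$ are known. Since formulas (7) to (9) are available only as images, the Python sketch below assumes forms that match their verbal description and resemble the schedules commonly used for self-organizing maps: a linear decay $\alpha(i) = \alpha(0)(1 - i/i_{max})$, a radius $r(i) = \mathrm{INT}(r(0)(1 - i/i_{max}))$ (floored at 1 here to avoid division by zero), and a Gaussian $h(i) = \exp(-d^2 / (2 r(i)^2))$ in a hypothetical distance $d$ between a chromosome and the variation factor. All three forms are assumptions.

```python
import math
from typing import List


def learning_rate(i: int, alpha0: float, i_max: int) -> float:
    """Assumed form of formula (7): linear decay from alpha0 toward 0 over i_max steps."""
    return alpha0 * (1 - i / i_max)


def radius(i: int, r0: float, i_max: int) -> int:
    """Assumed form of formula (9): INT-rounded linear shrink, floored at 1 in this sketch."""
    return max(1, int(r0 * (1 - i / i_max)))


def case_proportion(i: int, d: float, r0: float, i_max: int) -> float:
    """Assumed form of formula (8): Gaussian in distance d with radius r(i)."""
    r = radius(i, r0, i_max)
    return math.exp(-d * d / (2 * r * r))


def variation_schedule(alpha0: float, r0: float, i_max: int, d: float) -> List[float]:
    """Formula (6): eta(i+1) = eta(i) + alpha(i) * h(i), starting from eta(0) = 0."""
    eta = [0.0]
    for i in range(i_max):
        eta.append(eta[-1] + learning_rate(i, alpha0, i_max) * case_proportion(i, d, r0, i_max))
    return eta
```

For example, `variation_schedule(0.5, 4, 4, d=0.0)` keeps $h(i) = 1$ and yields $\eta = 0, 0.5, 0.875, 1.125, 1.25$, showing how the shrinking learning rate makes each increment smaller.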
Step four: using sample data provided by the electricity consumption information acquisition system, generate simulation data sets with the chromosome-similarity crossover operator proposed herein and with a traditional crossover operator that ignores chromosome similarity, and check the simulation data sample sets by comparing, on the basis of the entropy principle, their degree of coincidence with the original sample population. After a simulation data sample set is generated, the degree of coincidence $sim(P_I, P_G)$ between the simulation data and the original data is calculated with formula (10); the closer $sim(P_I, P_G)$ is to 0, the higher the coincidence of the two, and the closer it is to 1, the lower the coincidence.
[Formula (10) appears only as an image in the original publication.]
In formula (10), $P_I$ is the original data and $P_G$ is the simulation data. The terms shown as images in the original denote, respectively:
the information entropy of the values at the $i$-th position of population $P_G$;
the information entropy of the values at the $i$-th position of population $P_I$, where $n$ is the number of population samples selected for the coincidence comparison;
the difference between the mutual information of the values at the $i$-th and $j$-th positions in population $P_G$ and the mutual information of the values at the corresponding positions in population $P_I$; and $H(P_G(X_i), P_I(X_i))$ is the relative entropy of the values at the corresponding positions of the two populations.
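The exact combination in formula (10) is not recoverable from the text, but its ingredients are standard information-theoretic quantities. The Python sketch below computes them for populations whose gene values are already discrete (for example, binned readings), which is an assumption. It uses base-2 logarithms, treats the relative entropy as the Kullback-Leibler divergence $D(P_G \| P_I)$ of the empirical distributions at one position, and adds a hypothetical floor `eps` for values absent from $P_I$. It does not reproduce $sim(P_I, P_G)$ itself.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def relative_entropy(p_values: Sequence[Hashable], q_values: Sequence[Hashable], eps: float = 1e-9) -> float:
    """KL divergence D(P || Q) in bits; eps floors probabilities missing from Q."""
    p, q = Counter(p_values), Counter(q_values)
    n_p, n_q = len(p_values), len(q_values)
    total = 0.0
    for symbol, count in p.items():
        ps = count / n_p
        qs = max(q.get(symbol, 0) / n_q, eps)
        total += ps * math.log2(ps / qs)
    return total


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def column(population: Sequence[Sequence[Hashable]], i: int) -> List[Hashable]:
    """Values at the i-th gene position across all chromosomes."""
    return [chromosome[i] for chromosome in population]


def position_relative_entropy(p_g, p_i, i: int) -> float:
    """Relative entropy of position i in the simulated vs. original population."""
    return relative_entropy(column(p_g, i), column(p_i, i))


def mi_difference(p_g, p_i, i: int, j: int) -> float:
    """Mutual information of positions i and j in P_G minus the same quantity in P_I."""
    return mutual_information(column(p_g, i), column(p_g, j)) - mutual_information(
        column(p_i, i), column(p_i, j)
    )
```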
The entropy-based degree of coincidence between the original sample population and the simulation data sample sets generated with the two kinds of crossover operators is shown in FIG. 3.
The experimental results show that the simulation data generated with the similarity-based crossover operator coincide more closely with the original data than those generated with the traditional crossover operator. The coincidence is relatively low for electric energy data and electricity quantity data, whose values differ considerably, and relatively high for load, voltage and current data, whose values change at a relatively stable rate. The simulation data set generated by the data generation method adopted here therefore shows no unacceptable data distortion and meets the requirements for use as simulation data;
step five: compare and analyze the simulation data against the experimental results. Starting from the simulation data sample set obtained after crossover, the mutation algorithm proposed in this method and the conventional interpolation mutation method are each used to raise the proportion of simulated cases to the threshold value. The entropy coincidence with the original sample population is then compared for the sample sets obtained with no mutation, with the improved mutation, and with interpolation mutation, to verify the effectiveness of the chromosome mutation algorithm.
Interpolation mutation sets a specific chromosome value directly to the value of the mutation factor;
in the experiments, the mutation factors were case values that simulate data boundary conditions, namely the maximum and minimum values for each type of quantity. For example, the mutation factor for total forward active power is 99.99 at maximum and 0.01 at minimum.
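Interpolation mutation as described above simply overwrites a gene with a boundary value of its mutation factor. A minimal Python sketch, assuming chromosomes are lists of floats and that a hypothetical `BOUNDS` table maps gene positions to (minimum, maximum) factors; only the 0.01/99.99 pair for total forward active power comes from the text, and choosing the position and the bound at random is an assumption.

```python
import random

# Hypothetical table: gene position -> (minimum, maximum) mutation factor.
# Only the forward active total power bounds (0.01, 99.99) come from the text.
BOUNDS = {0: (0.01, 99.99)}


def interpolation_mutate(chromosome, bounds, rng):
    """Return a copy of `chromosome` with one bounded gene set to its min or max factor."""
    mutated = list(chromosome)
    position = rng.choice(sorted(bounds))        # which gene to overwrite (assumed random)
    mutated[position] = rng.choice(bounds[position])  # boundary value replaces the gene
    return mutated


print(interpolation_mutate([42.5, 220.1, 5.3], BOUNDS, random.Random(0)))
```

Because the injected value ignores the other genes in the same record, this kind of mutation can break the correlations between parameters, which is one plausible reading of why it lowers coincidence with the original data.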
The experimental results are shown in fig. 4. With the mutation algorithm proposed in this method, the coincidence between the simulation data and the original data is lower than with no mutation, but the reduction is within 0.03, which is an acceptable range. The simulation data generated by interpolation mutation coincide poorly with the original data; interpolation mutation damages the characteristics and correlations of the original data.
In summary, the method builds and verifies a digital simulation of electricity stealing behavior based on an improved genetic algorithm and the entropy principle. The genetic algorithm expands the set of normal electricity consumption data samples; because it passes the characteristics of the data samples on to their offspring, the prior knowledge and rules implicit in the original sample population are inherited by the expanded set. An entropy-based similarity function guides the generation process, ensuring that the final simulation data sample set has the same attributes and attribute-value distribution as the original sample population. Simulation from actual data yields a large number of normal electricity consumption data samples and electricity stealing behavior data samples, providing an ample data basis for running the anti-electricity-stealing simulation system. In the generation stage of the simulation data sample set, the invention improves the mutation method of the traditional genetic algorithm: taking the characteristics of historical typical electricity stealing cases as mutation factors, it iteratively mutates chromosomes in the normal electricity consumption and electricity stealing behavior data sample sets according to the correlation between the proportion of simulated cases and the mutation degree. This raises the proportion of simulated cases while ensuring that the final simulation data set has the same or a similar attribute-value distribution and correlations as the original sample population collected by the power consumption information acquisition system.
The invention provides a crossover operator based on chromosome similarity, which selects the crossover operator according to the similarity of the chromosomes to be crossed so that distant chromosomes are crossed. This avoids locally increasing the proportion of identical or highly similar sample data in the population. After crossover is complete, the population of new chromosomes is merged with the original population formed from the collected data, thereby expanding the population.
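The text does not specify the similarity measure or the operator-selection rule, so the following Python sketch fills those gaps with labeled assumptions: similarity is taken as 1 / (1 + Euclidean distance), each chromosome is greedily paired with its least similar unpaired partner (one way to cross "distant" chromosomes), single-point crossover recombines each pair, and the offspring are merged with the original population.

```python
import math
import random


def similarity(a, b):
    """Assumed similarity measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def pair_distant(population):
    """Greedily pair each chromosome with its least similar unpaired partner."""
    unpaired = list(range(len(population)))
    pairs = []
    while len(unpaired) >= 2:
        first = unpaired.pop(0)
        partner = min(unpaired, key=lambda j: similarity(population[first], population[j]))
        unpaired.remove(partner)
        pairs.append((population[first], population[partner]))
    return pairs


def single_point_crossover(a, b, rng):
    """Swap gene tails after a random cut point (assumed operator; needs length >= 2)."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def expand_population(population, rng):
    """Cross distant pairs, then merge the offspring with the original population."""
    offspring = []
    for a, b in pair_distant(population):
        offspring.extend(single_point_crossover(a, b, rng))
    return population + offspring
```

Pairing dissimilar records is intended to keep near-duplicate samples from breeding further near-duplicates, which is the local over-representation the operator is meant to avoid.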
The device for generating the anti-electricity-stealing simulation data set comprises a data preprocessing module, an expansion module, an iterative mutation module, a detection module, and an output module, wherein the data preprocessing module is used to acquire normal electricity consumption data samples and electricity stealing behavior data samples and encode them into chromosomes;
the expansion module is used to expand the number of normal electricity consumption data samples with a crossover operator, generating a normal electricity consumption data sample set;
the iterative mutation module is used to take the original electricity stealing behavior data samples as mutation factors and iteratively mutate the normal electricity consumption data samples and the electricity stealing behavior data samples, generating a simulation data sample set to be tested;
the detection module is used to evaluate the coincidence degree between the simulation data sample set and the original sample population according to the entropy principle;
and the output module is used to output the anti-electricity-stealing simulation data set.
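The five modules, and the loop in which a failed detection sends the process back to the mutation step, can be wired together as in the Python sketch below. Every stage is a pluggable callable because the patent does not fix their implementations; the scalar `strength` standing in for the adjustable mutation factor, the `adjust` callback, the `max_rounds` cap, and the convention that `coincidence` returns a degree where larger means closer agreement are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Chromosome = Tuple[float, ...]
Population = List[Chromosome]


@dataclass
class SimulationPipeline:
    """Hypothetical wiring of the five modules as pluggable callables."""
    preprocess: Callable[[], Tuple[Population, Population]]        # data preprocessing module
    expand: Callable[[Population], Population]                      # expansion module
    mutate: Callable[[Population, Population, float], Population]   # iterative mutation module
    coincidence: Callable[[Population, Population], float]          # detection module
    threshold: float
    max_rounds: int = 20

    def run(self, strength: float, adjust: Callable[[float], float]) -> Population:
        normal, theft = self.preprocess()
        expanded = self.expand(normal)
        original = normal + theft
        for _ in range(self.max_rounds):
            candidate = self.mutate(expanded, theft, strength)
            if self.coincidence(candidate, original) >= self.threshold:
                return candidate  # output module: accepted simulation data set
            strength = adjust(strength)  # below threshold: return to the mutation step
        raise RuntimeError("coincidence threshold not reached within max_rounds")
```

Keeping the stages injectable mirrors the module split in the text and lets each stage (for example, an entropy-based `coincidence`) be swapped without touching the control loop.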
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the embodiments without departing from the spirit and scope of the invention, and all of these are to be covered by the claims.

Claims (9)

1. A method of generating an anti-electricity-stealing simulation data set, comprising:
step one: acquiring normal electricity consumption data samples and electricity stealing behavior data samples, and encoding them into chromosomes;
step two: expanding the number of normal electricity consumption data samples with a crossover operator to generate a normal electricity consumption data sample set;
step three: taking the original electricity stealing behavior data samples as mutation factors, iteratively mutating the normal electricity consumption data samples and the electricity stealing behavior data samples to generate a simulation data sample set to be tested;
step four: evaluating, based on the entropy principle, the coincidence degree between the simulation data sample set and the original sample population to test the simulation data sample set, and, when the coincidence degree is below a set threshold, returning to step three to adjust the mutation factor until the coincidence threshold is met;
step five: outputting the anti-electricity-stealing simulation data set.
2. The method of generating an anti-electricity-stealing simulation data set according to claim 1, characterized in that: in step two, a crossover operator based on chromosome similarity is used to cross distant chromosome samples, and the crossover operator recombines chromosomes during crossover to generate new chromosomes.
3. The method of generating an anti-electricity-stealing simulation data set according to claim 1, characterized in that: in step three, the original electricity stealing behavior data samples are used as mutation factors, and a mutation algorithm that keeps the population trend unchanged mutates the chromosomes in the data sample set whose values are close to the mutation factor values into a new chromosome set, ensuring that chromosome mutation in the data sample set does not affect the overall chromosome distribution.
4. The method of generating an anti-electricity-stealing simulation data set according to claim 1, characterized in that: in step four, two types of simulation data sets are generated, one with a crossover operator based on chromosome similarity and one with a traditional crossover operator without chromosome similarity, and the coincidence degree of each type of simulation data with the original data is compared using the entropy principle.
5. The method of generating an anti-electricity-stealing simulation data set according to claim 1, characterized in that: in step three, the normal electricity consumption data samples in the data sample set are iteratively mutated according to the correlation between the proportion of simulated cases and the mutation degree, specifically as follows:
(1) at the first mutation, an initial simulated-case mutation range is specified, and the chromosome mutation degree at the initial proportion is obtained from the correlation between the case proportion and the mutation degree;
(2) the chromosomes within the initial simulated-case mutation range are mutated according to the calculated mutation degree, and after mutation it is determined whether the current proportion has reached the specified proportion threshold for electricity stealing behavior data samples;
(3) if the proportion of electricity stealing behavior data samples has not reached the proportion threshold, the current proportion is reduced proportionally and iterative mutation is performed;
(4) when the proportion of electricity stealing behavior data samples after iterative mutation reaches the proportion threshold, the simulation data sample set to be tested is obtained.
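Steps (1) to (4) leave the proportion/mutation-degree relation and the mutation operator unspecified, so the following Python sketch is one hypothetical reading. Assumptions: the mutation degree is a linear function of the current proportion (`degree_fn`), mutation blends a chromosome toward an electricity stealing sample used as the mutation factor, the "mutation range" is the set of normal chromosomes closest to that factor, and the proportional reduction in step (3) is read as shrinking the fraction of chromosomes mutated per round.

```python
import math


def blend(chromosome, factor, degree):
    """Move every gene a fraction `degree` of the way toward the mutation factor (assumed operator)."""
    return tuple(g + degree * (f - g) for g, f in zip(chromosome, factor))


def iterative_mutation(normal, theft, ratio_threshold, initial_step=0.2, shrink=0.5,
                       degree_fn=lambda ratio: 0.5 + 0.5 * ratio, max_rounds=50):
    """Mutate normal chromosomes near theft factors until theft-like samples reach ratio_threshold."""
    if not theft:
        raise ValueError("at least one electricity stealing sample is needed as a mutation factor")
    normal, theft = list(normal), list(theft)
    mutated = []  # normal chromosomes turned into simulated theft cases
    total = len(normal) + len(theft)
    step = initial_step
    for _ in range(max_rounds):
        ratio = (len(theft) + len(mutated)) / total
        if ratio >= ratio_threshold or not normal:
            break
        degree = degree_fn(ratio)  # step (1): mutation degree from the current proportion
        factor = theft[len(mutated) % len(theft)]
        # step (2): mutate the chromosomes closest to the factor (the assumed "mutation range")
        normal.sort(key=lambda c: math.dist(c, factor))
        count = max(1, math.ceil(step * len(normal)))
        batch, normal = normal[:count], normal[count:]
        mutated.extend(blend(c, factor, degree) for c in batch)
        step *= shrink  # step (3): shrink the per-round proportion (assumed reading)
    # step (4): remaining normal samples plus theft samples form the set to be tested
    return normal, theft + mutated
```

Mutating only the chromosomes nearest the factor is meant to echo claim 3, which limits mutation to chromosomes close to the mutation factor value so that the overall distribution is disturbed as little as possible.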
6. The method of generating an anti-electricity-stealing simulation data set according to claim 1, characterized in that: in step one, the original sample population processed by the genetic algorithm is formed as follows:
$$P = (c_1, c_2, \ldots, c_N) \qquad (1)$$
in formula (1), P is the historical data population, each c is a historical data record taken as a chromosome, N is the number of historical data records (that is, the number of chromosomes), and $c = (t_1, t_2, \ldots, t_k)$, where k is the chromosome length and t represents the parameters of the historical data.
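Formula (1) amounts to a fixed-order encoding of each historical record. A minimal Python sketch, assuming records arrive as dictionaries; the gene names and their order in `GENES` are hypothetical, chosen from quantities the description mentions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical gene order t_1..t_k; the description names these quantities but no encoding order.
GENES: Sequence[str] = ("forward_active_power", "voltage", "current", "load")


def encode(record: Dict[str, float], genes: Sequence[str] = GENES) -> Tuple[float, ...]:
    """Chromosome c = (t_1, ..., t_k) with one gene per parameter, so k = len(genes)."""
    return tuple(float(record[name]) for name in genes)


def build_population(records: List[Dict[str, float]]) -> List[Tuple[float, ...]]:
    """Population P = (c_1, ..., c_N), where N is the number of historical records."""
    return [encode(record) for record in records]
```

A fixed gene order matters because crossover and per-position entropy both compare values position by position across chromosomes.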
7. The method of generating an anti-electricity-stealing simulation data set according to claim 1, characterized in that: in step two, during chromosome crossover in the genetic algorithm, the selected chromosomes are placed in a crossover mating pool and paired two by two, and are then recombined with the crossover operator to generate new chromosomes; the crossover operator is calculated as follows:
[Formula (2): equation image not reproduced in the source text]
in formula (2), $q_{tar}$ is the number of chromosomes in the target population and $q_{init}$ is the number of chromosomes in the original population.
8. The method of generating an anti-electricity-stealing simulation data set according to claim 1, characterized in that: in step four, after the simulation data sample set is generated, the coincidence degree between the simulation data and the original data is calculated as follows:
[Formula (10): equation image not reproduced in the source text]
where sim($P_I$, $P_G$) is the coincidence degree between the original data and the simulation data: the closer sim($P_I$, $P_G$) is to 0, the higher the coincidence of the two, and the closer it is to 1, the lower the coincidence. In formula (10), $P_I$ is the original data and $P_G$ is the simulation data. The remaining symbols appear only as images in the source; in order, they denote the information entropy of the values at the i-th position of population $P_G$, the information entropy of the values at the i-th position of population $P_I$, and the difference between the mutual information of the values at the i-th and j-th positions in population $P_G$ and the mutual information of the values at the corresponding positions in population $P_I$. n is the number of population samples selected for the coincidence comparison, and $H(P_G(X_i), P_I(X_i))$ is the relative entropy of the values at corresponding positions of the two populations.
9. An apparatus for generating an anti-electricity-stealing simulation data set, comprising a data preprocessing module, an expansion module, an iterative mutation module, a detection module, and an output module, wherein the data preprocessing module is used to acquire normal electricity consumption data samples and electricity stealing behavior data samples and encode them into chromosomes;
the expansion module is used to expand the number of normal electricity consumption data samples with a crossover operator, generating a normal electricity consumption data sample set;
the iterative mutation module is used to take the original electricity stealing behavior data samples as mutation factors and iteratively mutate the normal electricity consumption data samples and the electricity stealing behavior data samples, generating a simulation data sample set to be tested;
the detection module is used to evaluate the coincidence degree between the simulation data sample set and the original sample population according to the entropy principle;
and the output module is used to output the anti-electricity-stealing simulation data set.
CN201911230917.XA 2019-12-05 2019-12-05 Generation method of anti-electricity-stealing simulation data set Pending CN111144540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911230917.XA CN111144540A (en) 2019-12-05 2019-12-05 Generation method of anti-electricity-stealing simulation data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911230917.XA CN111144540A (en) 2019-12-05 2019-12-05 Generation method of anti-electricity-stealing simulation data set

Publications (1)

Publication Number Publication Date
CN111144540A true CN111144540A (en) 2020-05-12

Family

ID=70517523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911230917.XA Pending CN111144540A (en) 2019-12-05 2019-12-05 Generation method of anti-electricity-stealing simulation data set

Country Status (1)

Country Link
CN (1) CN111144540A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036456A (en) * 2020-08-19 2020-12-04 阳光电源股份有限公司 Photovoltaic fault data generation method and device and computer readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102323906A (en) * 2011-09-08 2012-01-18 哈尔滨工程大学 MC/DC test data automatic generation method based on genetic algorithm
CN104765690A (en) * 2015-04-22 2015-07-08 哈尔滨工业大学 Embedded software test data generating method based on fuzzy-genetic algorithm
CN105573997A (en) * 2014-10-09 2016-05-11 普华讯光(北京)科技有限公司 Method and device for determining electric larceny suspect user
CN107423818A (en) * 2017-06-26 2017-12-01 中国电力科学研究院 A kind of method and system of the test data set generation of power information acquisition system unified interface
CN110097297A (en) * 2019-05-21 2019-08-06 国网湖南省电力有限公司 A kind of various dimensions stealing situation Intellisense method, system, equipment and medium

Non-Patent Citations (1)

Title
Han Xiaohan (韩霄汉): "Method for generating interface test data sets based on an improved genetic algorithm" *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination