CN116128332A - Knowledge graph-based power grid comprehensive evaluation index processing method and system - Google Patents

Knowledge graph-based power grid comprehensive evaluation index processing method and system

Publication number
CN116128332A
Authority
CN
China
Prior art keywords
power grid
data
evaluation index
knowledge graph
comprehensive evaluation
Prior art date
Legal status
Pending
Application number
CN202211499589.5A
Other languages
Chinese (zh)
Inventor
余晓伟
凌煦
刘天斌
冯晓霞
李悝
李锴
周晓刚
李泰军
刘兵
汪辰
王慧来
Current Assignee
Central China Grid Co Ltd
China Power Engineering Consultant Group Central Southern China Electric Power Design Institute Corp
Original Assignee
Central China Grid Co Ltd
China Power Engineering Consultant Group Central Southern China Electric Power Design Institute Corp
Priority date
Filing date
Publication date
Application filed by Central China Grid Co Ltd and China Power Engineering Consultant Group Central Southern China Electric Power Design Institute Corp
Priority to CN202211499589.5A
Publication of CN116128332A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a knowledge graph-based method and system for processing comprehensive evaluation indexes of a power grid. The method comprises the following steps: classifying the power grid evaluation indexes with their influence factors as the reference to obtain initialized power grid evaluation index data, and forming a knowledge graph from that data; normalizing the knowledge graph, thereby converting the initialized power grid evaluation index data into normalized power grid evaluation index data; performing fault tolerance processing on the data indexes in the normalized knowledge graph; performing safe transcoding processing on the waveform data in the power grid evaluation indexes; performing data cleaning, and filling all power grid evaluation index data through a linear regression model; and performing data checking on the comprehensive evaluation indexes of the power grid. The invention combines a plurality of power grid evaluation indexes, provides two different cleaning methods, and adopts fault tolerance processing, safe transcoding and other operations, which improves overall robustness.

Description

Knowledge graph-based power grid comprehensive evaluation index processing method and system
Technical Field
The invention belongs to the technical field of knowledge graph data cleaning, and particularly relates to a method and a system for processing comprehensive evaluation indexes of a power grid based on a knowledge graph.
Background Art
The five performance aspects of a regional power grid, namely adequacy, coordination, reliability, safety and economy, together reflect the level of grid construction and of grid operation and regulation. From these five aspects, the factors that govern a regional power grid, such as grid structure, geographic conditions, operation management and equipment configuration, can be accurately assessed, which yields a scientific, complete and comprehensive index system for evaluating the overall performance of the regional power grid.
A knowledge graph is a family of graphs that show how knowledge develops and how it is structured. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interrelations among those resources and carriers. This addresses the problem that traditional databases have insufficient capacity for expressing information and cannot effectively support the back-end data service requirements of most electric power systems.
The comprehensive evaluation indexes of a power grid involve large data volumes, diverse data sources and inconsistent data types, and the construction of the knowledge graph is usually automated to avoid manual supervision. As a result, various defects, including ambiguity, conflicts, errors and redundant information, are inevitably introduced. Ensuring the quality of the knowledge graph is a precondition for all knowledge-driven applications. Power enterprises therefore urgently need a general and efficient knowledge graph-based method and system for processing comprehensive evaluation indexes of a power grid, one that performs standardization, fault tolerance processing, safe transcoding, index cleaning, data checking and related operations on each index within the knowledge graph framework, and that improves the stability, reliability and robustness of the knowledge graph of the power grid comprehensive evaluation system.
Disclosure of Invention
The invention provides a knowledge graph-based method and system for processing comprehensive evaluation indexes of a power grid. It makes it easier for staff to learn and master the power grid comprehensive evaluation knowledge system, allows the operating condition of a regional power grid to be perceived from all aspects, and combines the traditional power grid comprehensive evaluation method with knowledge graph technology, whose knowledge is interconnected, open, structured and visual.
The invention provides a power grid comprehensive evaluation index processing method based on a knowledge graph, which comprises the following steps:
classifying the power grid evaluation indexes by taking the influence factors of the power grid evaluation indexes as references, realizing the distinguishing treatment of the power grid evaluation indexes, obtaining initialized power grid evaluation index data, and forming a knowledge graph by the initialized power grid evaluation index data;
normalizing the knowledge graph, and converting the initialized power grid evaluation index data into normalized power grid evaluation index data;
performing fault tolerance processing on the normalized power grid evaluation index data, and performing regression filling on the missing value through a filling function;
performing safe transcoding processing on waveform data in the regression-filled power grid evaluation index data;
performing data cleaning on all power grid evaluation index data subjected to the safe transcoding processing, checking the data cleaning result until the knowledge graph training model and the triplet classification model have fully converged, and performing data filling through a linear regression model;
performing data checking on comprehensive evaluation indexes of the regional power grid involved in the knowledge graph construction;
and storing the knowledge graph of the comprehensive evaluation index of the power grid, which is subjected to the data verification, into a graph database.
Further, the grid evaluation index at least comprises one of adequacy, coordination, reliability, safety and economy of the grid.
Further, after the fault tolerance processing is carried out, the missing data is automatically backfilled through a filling function, and then relatively complete data in a power grid comprehensive evaluation system is obtained.
Further, the secure transcoding process includes:
and carrying out smoothing filtering on the comprehensive evaluation index waveform data of the power grid by using Hilbert-Huang transformation, finding out association between the waveform data by using a configuration file, and carrying out waveform parallel drawing to form a graphical waveform file.
Further, the data cleaning and the checking of the data cleaning result include:
acquiring the uncleaned knowledge graph to be cleaned, wherein the knowledge graph comprises entities formed from the comprehensive evaluation index data of the power grid together with the corresponding entity attributes and relationship attributes, and each corresponding group consisting of an entity, an entity attribute and the relationship between them forms a triplet;
performing a first cleaning on the comprehensive evaluation index data of the power grid to be processed, removing invalid values and abnormal values from the data by using the Chauvenet criterion;
loading a preset knowledge graph training model and a preset triplet classification model into the knowledge graph-based power grid comprehensive evaluation index processing system;
performing training and analysis with the system's preset knowledge graph training model and triplet classification model as a second cleaning, which screens out erroneous triplets and removes them;
using a linear regression model to fill the missing values left by the removed data;
performing double-accumulation analysis and checking on the cleaned comprehensive evaluation indexes of the power grid, and ending data cleaning if the check result meets a preset value; otherwise, continuing to iterate over the data, repeating the first cleaning, the second cleaning and the missing value filling until the check result meets the preset value.
Further, the first cleaning includes:
using the Chauvenet criterion, constructing a data set with all power grid evaluation indexes as samples, determining a probability band centered on the mean of the normal distribution, judging any sample value that falls outside the probability band to be an abnormal value, and removing it from the data set;
the calculation formula of the Chauvenet criterion is as follows:
$$D_{\max} \geq \frac{|x - \mu|}{\delta}$$
wherein $D_{\max}$ is the set maximum deviation value, $x$ is the suspicious outlier, $\mu$ is the sample mean, and $\delta$ is the sample standard deviation.
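As a minimal sketch of the first cleaning, the Python function below applies the criterion above (known in English as the Chauvenet criterion) to a single index series: a value x is kept when |x - mu| / delta does not exceed D_max. The function name `chauvenet_filter`, the sample values, the use of the population standard deviation, and the way D_max is derived from the sample size when it is not supplied (the classic Chauvenet rule) are assumptions made for illustration; the patent only states that D_max is a set maximum deviation value.

```python
from statistics import NormalDist, mean, pstdev

def chauvenet_filter(samples, d_max=None):
    """Split samples into (kept, removed) using the Chauvenet criterion.

    A value x is kept when |x - mu| / delta <= D_max. If d_max is not given,
    it is derived from the sample size n so that the expected number of
    samples deviating at least that far is 0.5 (assumed default).
    """
    n = len(samples)
    mu = mean(samples)
    delta = pstdev(samples)  # population standard deviation (assumption)
    if delta == 0:
        return list(samples), []
    if d_max is None:
        # Two-sided tail probability 1/(2n) -> one-sided quantile 1 - 1/(4n)
        d_max = NormalDist().inv_cdf(1 - 1 / (4 * n))
    kept, removed = [], []
    for x in samples:
        (kept if abs(x - mu) / delta <= d_max else removed).append(x)
    return kept, removed

# Hypothetical node voltage qualification rates (%) with one implausible reading
rates = [98.1, 97.9, 98.4, 98.0, 98.2, 97.8, 98.3, 60.0]
kept, removed = chauvenet_filter(rates)
print(removed)
```

Passing an explicit `d_max` reproduces the "set maximum deviation value" wording of the text; the derived default is only a convenience for the example.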
Further, the second cleaning includes:
inputting all triplets into a TransE training model, and training a noise-aware knowledge graph model by using random negative sampling;
a triplet scoring formula is preset in the TransE model; according to the score of each trained triplet, all triplets are input into the triplet classification model for training, and the confidence of each triplet is updated after training is finished, the confidence differing from triplet to triplet;
a confidence threshold is preset: when the confidence of a triplet is greater than the preset threshold, the triplet is judged to be correct and is retained; when the confidence of a triplet is smaller than the preset threshold, the triplet is judged to be erroneous and is removed;
in the triplet classification model, a Sigmoid function constrains the output of the classifier to the range 0 to 1;
the TransE model and the triplet classification model are trained jointly and iteratively until the knowledge graph training model and the triplet classification model have fully converged.
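The scoring and thresholding part of the second cleaning can be sketched as follows. This is an illustration only: it uses the standard TransE energy ||h + r - t|| as the triplet score and a one-parameter Sigmoid classifier, and it skips training altogether (no negative sampling, no iteration to convergence). The embedding values, the classifier parameters `weight` and `bias`, and the threshold 0.5 are hypothetical.

```python
import math

def transe_score(head, relation, tail):
    """TransE energy: L2 distance ||h + r - t||; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def confidence(score, weight=-1.0, bias=2.0):
    """Map a TransE score into (0, 1) with a Sigmoid.

    weight and bias are hypothetical classifier parameters that training
    would normally learn.
    """
    return 1.0 / (1.0 + math.exp(-(weight * score + bias)))

def filter_triples(triples, embeddings, threshold=0.5):
    """Keep triplets whose confidence exceeds the threshold, reject the rest."""
    kept, rejected = [], []
    for h, r, t in triples:
        c = confidence(transe_score(embeddings[h], embeddings[r], embeddings[t]))
        (kept if c > threshold else rejected).append((h, r, t))
    return kept, rejected

# Toy 2-D embeddings (illustrative values, not trained)
embeddings = {
    "N-1 passing rate": [0.0, 1.0],
    "DG permeability": [3.0, -2.0],
    "evaluates": [1.0, 0.0],
    "safety": [1.0, 1.0],
}
triples = [
    ("N-1 passing rate", "evaluates", "safety"),
    ("DG permeability", "evaluates", "safety"),
]
kept, rejected = filter_triples(triples, embeddings)
print(kept, rejected)
```

In a full implementation the embeddings and the classifier would be trained jointly, and the confidence of every triplet would be recomputed after each round.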
Further, the filling by the linear regression model includes:
converting the mathematical expression of the linear regression model into a vector expression:
the mathematical expression of the linear regression model is:
Figure BDA0003966222040000041
the above mathematical expression is converted into the following vector expression:
Figure BDA0003966222040000042
wherein x is a power grid evaluation index containing a missing value; y is a power grid evaluation index without a missing value; k is the number of all power grid evaluation indexes; n is the number of comprehensive evaluation performance classes of the power grid; a_k is the influence factor of each evaluation index within a given power grid performance evaluation item; b is a constant term, used as the threshold for judging missing values of the power grid evaluation indexes.
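To make the regression filling concrete, the sketch below fits an ordinary least squares line and uses it to fill gaps. It is a deliberate simplification of the multi-index expression described above: a single complete index serves as the predictor for one index with gaps, and which index plays which role is an assumption. The helper names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def fill_missing(complete, incomplete):
    """Fill None entries of `incomplete` from a line fitted on the observed pairs."""
    pairs = [(c, v) for c, v in zip(complete, incomplete) if v is not None]
    a, b = fit_line([c for c, _ in pairs], [v for _, v in pairs])
    return [a * c + b if v is None else v for c, v in zip(complete, incomplete)]

# Hypothetical data: one complete index and one index with two gaps
complete_index = [1.0, 2.0, 3.0, 4.0, 5.0]
gappy_index = [3.0, None, 7.0, 9.0, None]
filled = fill_missing(complete_index, gappy_index)
print(filled)
```

With several predictor indexes the same idea carries over to a multiple regression, where each coefficient corresponds to one influence factor and the intercept to the constant term b.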
Further, the data checking of the regional power grid comprehensive evaluation index involved in the knowledge graph construction comprises the following steps:
checking whether the data quantity of each power grid evaluation index is correct;
checking whether the data format type is correct;
determining whether the relation attribute of each power grid evaluation index corresponds or not through the association among the different power grid evaluation indexes;
checking whether index redundancy exists under a certain power grid performance evaluation item;
checking data standardization, so that non-standard index names do not prevent related power grid evaluation indexes from being grouped under the corresponding power grid performance evaluation item in the knowledge graph.
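A minimal sketch of such checks is given below. It covers four of the five items listed above (data quantity, data type, redundancy and name standardization) and leaves out the relationship attribute check, which depends on the graph structure. The record layout, the `aliases` table and the sample values are hypothetical.

```python
def check_indexes(records, expected_count, aliases):
    """Return a list of human-readable problems found in the index records.

    records: list of dicts with keys 'performance', 'name' and 'values'.
    expected_count: number of data points every index should have.
    aliases: mapping from non-standard index names to standard names.
    """
    problems = []
    seen = set()
    for rec in records:
        name = aliases.get(rec["name"], rec["name"])
        if name != rec["name"]:
            problems.append(f"non-standard name: {rec['name']} -> {name}")
        if len(rec["values"]) != expected_count:
            problems.append(f"wrong data quantity: {name}")
        if not all(isinstance(v, (int, float)) for v in rec["values"]):
            problems.append(f"wrong data type: {name}")
        key = (rec["performance"], name)
        if key in seen:
            problems.append(f"redundant index under {rec['performance']}: {name}")
        seen.add(key)
    return problems

records = [
    {"performance": "safety", "name": "N-1 passing rate", "values": [0.97, 0.98, 0.99]},
    {"performance": "safety", "name": "N-1 pass rate", "values": [0.97, 0.98]},
    {"performance": "economy", "name": "net present value", "values": [1.2, "n/a", 1.4]},
]
aliases = {"N-1 pass rate": "N-1 passing rate"}
problems = check_indexes(records, expected_count=3, aliases=aliases)
for p in problems:
    print(p)
```

Mapping names through `aliases` before the redundancy check is what lets a differently spelled duplicate be detected under the same performance item.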
The invention also provides a knowledge graph-based system for processing comprehensive evaluation indexes of a power grid, which comprises:
the knowledge graph construction module is used for classifying the power grid evaluation indexes by taking the influence factors of the power grid evaluation indexes as references, realizing the distinguishing treatment of the power grid evaluation indexes, obtaining initialized power grid evaluation index data and forming a knowledge graph;
the normalization processing module is used for performing normalization processing on the knowledge graph and converting the initialized power grid evaluation index data into normalized power grid evaluation index data;
the fault-tolerant processing module is used for carrying out fault-tolerant processing on the normalized power grid evaluation index data, and then carrying out regression filling on the missing value through a filling function;
the safety transcoding processing module is used for carrying out safety transcoding processing on the waveform data in the regression-filled power grid evaluation index data;
the data cleaning module is used for performing data cleaning on all power grid evaluation index data subjected to the safe transcoding processing, checking the data cleaning result until the knowledge graph training model and the triplet classification model have fully converged, and then performing data filling through a linear regression model;
the data checking module is used for checking data of the comprehensive evaluation indexes of the regional power grid involved in the knowledge graph construction;
and the data storage module is used for storing the knowledge graph of the comprehensive evaluation index of the power grid, which is subjected to the data verification, into a graph database.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention provides an effective and complete data cleaning flow for the entity associations of the different comprehensive evaluation indexes of the power grid: two different cleaning methods are applied in succession, the missing values are filled, and the cleaning result is checked, so that the constructed knowledge graph reflects more truthfully and accurately how the data of the different evaluation indexes relate to the different aspects of power grid performance; multilevel data checking is then applied, which improves the construction efficiency of the power grid comprehensive evaluation knowledge graph, the authenticity of the associations among evaluation indexes, and the traceability and reliability of its data;
2. classifying all power grid evaluation indexes with their influence factors as the reference allows the indexes to be handled differentially. This avoids deviations in subsequent analysis caused by the different weights that different indexes carry in the power grid performance evaluation, makes the processing of the different index data more targeted and effective, reduces the complexity of the subsequent fault tolerance processing, safe transcoding, data cleaning and data checking steps, and improves data robustness during graph construction;
3. the method provides important technical support and a research vehicle for the dispatching and planning of regional power grids and for the corresponding decision-making, and has high value for wider adoption.
Drawings
Fig. 1 is a schematic structural diagram of a knowledge graph body model of a power grid comprehensive evaluation system;
fig. 2 is a schematic flow chart of a method for processing comprehensive performance evaluation indexes of a power grid based on a knowledge graph;
fig. 3 is a schematic flow chart of a data cleaning method for comprehensive evaluation indexes of a power grid.
Detailed Description
The invention is described in further detail below with reference to figures 1-3 and the specific examples.
As shown in fig. 2, the embodiment discloses a method for processing a comprehensive evaluation index of a power grid based on a knowledge graph, which comprises the following steps:
classifying the power grid evaluation indexes by taking the influence factors of the power grid evaluation indexes as references, realizing the distinguishing treatment of the power grid evaluation indexes, obtaining initialized power grid evaluation index data, and forming a knowledge graph by the initialized power grid evaluation index data;
normalizing the knowledge graph, and converting the initialized power grid evaluation index data into normalized power grid evaluation index data;
performing fault tolerance processing on the normalized power grid evaluation index data, and performing regression filling on the missing value through a filling function;
performing safe transcoding processing on waveform data in the regression-filled power grid evaluation index data;
performing data cleaning on all power grid evaluation index data subjected to the safe transcoding processing, checking the data cleaning result until the knowledge graph training model and the triplet classification model have fully converged, and performing data filling through a linear regression model;
performing data checking on comprehensive evaluation indexes of the regional power grid involved in the knowledge graph construction;
and storing the knowledge graph of the comprehensive evaluation index of the power grid, which is subjected to the data verification, into a graph database.
In this embodiment, the power grid evaluation indexes cover the adequacy, coordination, reliability, safety and economy of the power grid; alternatively, they may cover at least one of these five aspects. As shown in fig. 1, by analyzing the characteristics of the digital model of the regional power grid system, a knowledge graph-based comprehensive performance evaluation index system of the power grid is established in the five aspects of adequacy, coordination, reliability, safety and economy.
The evaluation indexes for adequacy include the node voltage qualification rate, power grid expansion margin, power supply capacity margin, high-loss distribution transformer ratio, resource margin and the like; the evaluation indexes for coordination include the node voltage qualification rate, DG capacity grid-connected rate, power supply capacity matching degree, line connection rate, DG permeability, load balancing degree and the like; the evaluation indexes for reliability include the voltage fluctuation rate, system average power failure frequency, user average power failure frequency, capacity-to-load ratio and the like; the evaluation indexes for safety include the capacity-to-load ratio, N-1 passing rate, post-accident overload risk, post-accident energy loss rate, fault rate and the like; the evaluation indexes for economy include the fault rate, EV charging station electricity consumption proportion, investment recovery period, net present value and the like.
The DG capacity grid-connected rate, where DG stands for distributed generation, is an important index for evaluating the coordination of a regional power grid. Its influence factor in the safety evaluation of the regional power grid, however, is very small and can be ignored; that is, fluctuations in the DG capacity grid-connected rate can be regarded as having no effect on the evaluation of the regional power grid's safety performance.
Classifying the power grid evaluation indexes with their influence factors as the reference allows the indexes to be handled differentially. This avoids deviations in subsequent analysis caused by the different weights that different indexes carry in the power grid performance evaluation, makes the processing of the different index data more targeted and effective, reduces the complexity of the subsequent fault tolerance processing, safe transcoding, data cleaning and data checking steps, and improves data robustness when the graph is constructed.
Normalization is needed to eliminate the dimensional differences among the power grid evaluation indexes within one power grid performance evaluation item, as well as the coupling effect that arises when different performance evaluation items call the same evaluation index. After the raw data are normalized, all power grid evaluation indexes are on the same order of magnitude, which facilitates the subsequent comprehensive evaluation.
In this embodiment, the normalization of the knowledge graph formed from the initialized power grid evaluation index data includes at least one of weight normalization, logic normalization and relationship attribute normalization, and converts the initialized power grid evaluation index data into normalized power grid evaluation index data. All indexes forming the knowledge graph are preprocessed at this early stage, which avoids problems such as missing data and incompatible formats when the graph is built.
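The text does not give formulas for the three normalization variants, so the sketch below only illustrates the stated goal of bringing indexes with different units to the same order of magnitude. It uses plain min-max scaling, which is an assumed stand-in rather than the patent's method; the index names and values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw indexes with different units
raw = {
    "investment recovery period (years)": [6.0, 8.0, 10.0],
    "node voltage qualification rate (%)": [95.0, 97.5, 100.0],
}
normalized = {name: min_max_normalize(vals) for name, vals in raw.items()}
print(normalized)
```

After scaling, both series lie in the same range and can be combined in a weighted evaluation without one unit dominating the other.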
In this embodiment, after the fault tolerance processing, the missing data are automatically backfilled through a filling function, which yields relatively complete data for the power grid comprehensive evaluation system. The filling function is the MICE (Multivariate Imputation by Chained Equations) algorithm, a common method, well known from the R language, for regression filling of missing values.
The specific implementation steps of the MICE algorithm comprise:
entering the following instruction in the command line window of the system's built-in Python IDE to count the missing values across all power grid comprehensive evaluation indexes:
data_full.isnull().sum().sum()
the number of missing values of each index is then divided by the length of the data to obtain the proportion of missing values:
data_mv.isnull().sum()/len(data_mv)
In this embodiment, about 16% of the data values for regional power grid adequacy are missing: among the adequacy evaluation indexes, the power grid expansion margin is missing about 30% of its data values, the power supply capacity margin about 9%, and the resource margin about 16%.
About 15% of the data values for regional power grid coordination are missing: among the coordination evaluation indexes, the DG capacity grid-connected rate is missing about 7% of its data values, the line connection rate about 14%, DG permeability about 29%, and the load balancing degree about 31%.
The data for regional power grid reliability and all of its evaluation indexes are complete; no missing values are found in the MICE algorithm check.
About 7% of the data values for regional power grid safety are missing: among the safety evaluation indexes, the post-accident overload risk is missing about 17% of its data values, the post-accident energy loss rate about 19%, and the fault rate about 6%.
About 17% of the data values for regional power grid economy are missing: among the economy evaluation indexes, the fault rate is missing about 6% of its data values, the investment recovery period about 15%, and the net present value about 21%.
The MICE algorithm is run from the system's built-in Python IDE using the third-party fancyimpute library, in which the MICE algorithm is named IterativeImputer. The index filling is carried out by entering instructions in the command line window of the built-in Python IDE; the specific MICE filling operation is as follows:
from fancyimpute import IterativeImputer
# Copy the data into Comprehensive_evaluation_index_mice_imputed
Comprehensive_evaluation_index_mice_imputed = data_mv.copy()
# Initialize the IterativeImputer
mice_imputer = IterativeImputer()
# Fill the missing values with fit_transform
Comprehensive_evaluation_index_mice_imputed.iloc[:, :] = mice_imputer.fit_transform(Comprehensive_evaluation_index_mice_imputed)
The filled data are then checked for remaining missing values with the following instruction:
Comprehensive_evaluation_index_mice_imputed.isnull().sum()
After the missing values of each power grid comprehensive evaluation index have been filled by the MICE algorithm, the check shows that all missing values were successfully filled.
In this embodiment, the safe transcoding processing applies to the waveform data in the power grid evaluation indexes, including waveform data that power grid staff obtain from data acquisition or experiments. The specific operation is as follows: waveform data in the comprehensive evaluation indexes, such as the high-loss distribution transformer ratio, line connection rate, DG permeability, system average power failure frequency and user average power failure frequency, are smoothed and filtered by using the Hilbert-Huang transform to reduce noise interference as far as possible; the associations among the waveform data are found by using a configuration file; and the waveforms are drawn in parallel to form a graphical waveform file that power grid staff can use directly, which completes the safe transcoding of the waveform data. Besides waveform data, the data types in the power grid evaluation indexes include common types such as integer, character and floating point.
In this embodiment, as shown in Fig. 3, cleaning the data and verifying the data cleaning result includes:
acquiring the uncleaned knowledge graph to be cleaned, wherein the knowledge graph to be cleaned comprises entities formed from the power grid comprehensive evaluation index data together with their corresponding entity attributes and relationship attributes, and each corresponding group of entity, entity attribute and the relationship between them forms a triplet;
performing a first cleaning on the power grid comprehensive evaluation index data to be processed, removing invalid values and abnormal values using the Chauvenet criterion;
deploying a preset knowledge graph training model and a triplet classification model in the knowledge graph-based power grid comprehensive evaluation index processing system;
performing a second cleaning by training and analysis with the preset knowledge graph training model and triplet classification model, screening out erroneous triplets and removing them;
filling the missing values left by the removed data using a linear regression model;
performing a double-accumulation analysis check on the cleaned power grid comprehensive evaluation indexes, and ending the data cleaning if the check result meets a preset value; otherwise, continuing to iterate, repeating the first cleaning, the second cleaning and the missing value filling until the check result meets the preset value.
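The control flow of the steps above can be sketched as follows. This is only a minimal illustration: the four callables (`first_clean`, `second_clean`, `fill_missing`, `passes_check`) and the `max_rounds` cap are hypothetical stand-ins for the Chauvenet screening, the triplet-based screening, the linear regression filling and the double-accumulation check, and none of these names come from the described system.

```python
from typing import Callable, List, Optional

Record = List[Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def clean_until_valid(
    data: List[Record],
    first_clean: Step,
    second_clean: Step,
    fill_missing: Step,
    passes_check: Callable[[List[Record]], bool],
    max_rounds: int = 10,  # hypothetical safety cap, not taken from the source
) -> List[Record]:
    """Repeat first cleaning, second cleaning and filling until the check passes."""
    for _ in range(max_rounds):
        data = first_clean(data)    # remove invalid and abnormal values
        data = second_clean(data)   # remove erroneous triplets
        data = fill_missing(data)   # fill the gaps left by the removals
        if passes_check(data):      # stop once the check meets the preset value
            return data
    raise RuntimeError("check did not pass within max_rounds")
```

The cap on the number of rounds is a design choice of this sketch so that a check that can never be satisfied does not loop forever; the source only states that iteration continues until the check result meets the preset value.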
In this embodiment, the first cleaning includes: processing abnormal data using the Chauvenet criterion and removing the data judged to be abnormal values.
According to the causal relationships between the different power grid evaluation indexes and the five aspects of power grid performance, cases are eliminated in which, without any manual setting, a causal relationship has been established between one aspect of power grid performance and a certain evaluation index, while the influence factor of that evaluation index on another aspect of power grid performance is smaller than a preset threshold.
A Chauvenet criterion calculation model is deployed in the knowledge graph-based power grid comprehensive evaluation index processing system, and abnormal data is processed using the Chauvenet criterion. The influence factors of all evaluation indexes on a given power grid performance evaluation item are calculated and analyzed, the weight of each index on that item is determined according to its influence factor, and unreasonable or redundant invalid evaluation indexes are eliminated.
The idea of the Chauvenet criterion is as follows: first, all power grid evaluation indexes used to evaluate the five performances of the regional power grid are constructed into a data set, and a probability band centered on the mean of the normal distribution is determined, which covers all k indexes of the power grid comprehensive evaluation indexes. In this embodiment, each power grid evaluation index is defined as a sample, and k is set to 21. With this setting, any sample data value not within the probability band is judged to be an abnormal value and removed from the data set.
An abnormal value is determined by finding the number of standard deviations D_max that corresponds to the boundary of the probability band around the mean and comparing it with the absolute value of the difference between the suspected abnormal value and the mean, expressed in units of the standard deviation; if this deviation is larger than the set maximum deviation value, the value is judged to be abnormal.
The calculation formula of the Chauvenet criterion is as follows:
Figure BDA0003966222040000121
wherein D_max is the set maximum deviation value, x is the suspected abnormal value, mu is the sample mean, and delta is the sample standard deviation.
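As a rough illustration of the first cleaning, the sketch below applies the Chauvenet criterion to a list of values by comparing |x - mu| / delta against D_max. It assumes the conventional choice of D_max, derived from the sample size through the standard normal distribution (a value is rejected when fewer than 0.5 samples are expected to deviate that far); the source does not state how D_max is set, and the function name `chauvenet_split` is hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def chauvenet_split(values: List[float]) -> Tuple[List[float], List[float]]:
    """Split values into (kept, rejected) using Chauvenet's criterion."""
    n = len(values)
    mu = mean(values)
    delta = stdev(values)  # sample standard deviation
    if delta == 0:
        return list(values), []
    # Boundary of the probability band: a value is rejected when
    # n * P(|Z| >= deviation) < 0.5, i.e. deviation > inv_cdf(1 - 1/(4n)).
    d_max = NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))
    kept: List[float] = []
    rejected: List[float] = []
    for x in values:
        if abs(x - mu) / delta <= d_max:
            kept.append(x)
        else:
            rejected.append(x)
    return kept, rejected
```

For example, calling `chauvenet_split` on nine readings close to 10.0 plus a single reading of 25.0 places the 25.0 reading in the rejected list and keeps the other nine.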
The second cleaning performs training and analysis through the knowledge graph training model and the triplet classification model preset by the system, screening out erroneous triplets and removing them.
The power grid comprehensive evaluation knowledge graph comprises entities formed from the five aspects of adequacy, coordination, reliability, safety and economy, together with their corresponding entity attributes and relationship attributes. The power grid comprehensive evaluation index processing system forms a triplet from each corresponding group of entity, entity attribute and the relationship between them, and initializes the confidence of all triplets in the knowledge graph.
A preset knowledge graph training model and a triplet classification model are deployed in the knowledge graph-based power grid comprehensive evaluation index processing system. The preset training model, Trans-E, is integrated and packaged, the relevant plug-ins are downloaded, and a calling command is set up at the back end for use.
The knowledge graph training model adopts the Trans-E model, an integrated plug-and-play model whose parameters can be trained and tuned to the optimum for different data sets when in use. All triplets are input into the Trans-E model, and a noise-aware knowledge graph model is trained using random negative sampling.
In an alternative embodiment, each entity and relationship in the knowledge-graph is mapped to a vector using a Trans-E model, and the impact of noise data on the embedded vector is reduced by adding confidence to the loss function of the Trans-E model.
A triplet scoring formula is preset in the Trans-E model. According to the score of each triplet after training, all triplets are input into the triplet classification model for training, and the confidence of each triplet is refreshed after training is finished; different triplets have different confidences.
A Sigmoid function is adopted in the triplet classification model to constrain the classifier output to between 0 and 1, so that the confidence of each triplet is bounded above by 1 and below by 0. The Sigmoid function, also called the Logistic function, is used for the output of hidden-layer neurons; its value range is (0, 1), so it maps any real number into the interval (0, 1) and can be used for classification.
In an alternative embodiment, the Trans-E model and the triplet classification model may be trained jointly and iteratively until both the knowledge graph training model and the triplet classification model have fully converged.
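The scoring and confidence steps can be illustrated with the short sketch below. It uses the usual TransE distance ||h + r - t|| as the triplet score, which is an assumption because the source does not give its preset scoring formula, and it stands in for the triplet classification model with a hypothetical one-feature classifier whose `weight` and `bias` values are invented for illustration.

```python
import math
from typing import Sequence


def transe_score(head: Sequence[float], relation: Sequence[float], tail: Sequence[float]) -> float:
    """TransE distance ||h + r - t||_2; a smaller value means a more plausible triplet."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def sigmoid(z: float) -> float:
    """Logistic function mapping a real number into (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def triplet_confidence(score: float, weight: float = -1.0, bias: float = 2.0) -> float:
    """Hypothetical classifier: a small distance gives a confidence close to 1."""
    return sigmoid(weight * score + bias)
```

With these illustrative parameters, a triplet whose embeddings satisfy h + r = t exactly has score 0 and a confidence of about 0.88, while a triplet with a large distance receives a confidence close to 0.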
The iteration times can be set according to actual conditions.
In an alternative embodiment, based on the results of multiple parameter-tuning experiments, the number of training iterations is set to 6, which gives the best cleaning effect. With fewer than 6 iterations the cleaning effect is poor and a small portion of dirty data is not cleaned; with more than 6 iterations the cleaning over-fits and a small portion of clean data may be removed by mistake.
In an alternative embodiment, the preset confidence threshold is 0.75, that is, when the confidence of the triplet is greater than or equal to 0.75, the triplet is determined to be correct, and then the triplet is reserved; when the confidence of the triplet is less than 0.75, the triplet is judged to be wrong, and then the triplet is rejected.
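The thresholding rule can be written in a few lines. In this sketch the 0.75 threshold is the value stated in the embodiment, while the function name `split_by_confidence` and the example triplets in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]
CONFIDENCE_THRESHOLD = 0.75  # value given in the embodiment


def split_by_confidence(
    confidences: Dict[Triplet, float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[List[Triplet], List[Triplet]]:
    """Keep triplets with confidence >= threshold and reject the rest."""
    kept = [t for t, c in confidences.items() if c >= threshold]
    rejected = [t for t, c in confidences.items() if c < threshold]
    return kept, rejected
```

A triplet such as ("reliability", "has_index", "system average power failure frequency") with confidence 0.92 would be kept, one with confidence exactly 0.75 would also be kept, and one with confidence 0.40 would be rejected.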
Clean data can be extracted after the data are cleaned, so that the data quality of the comprehensive evaluation knowledge graph system of the power grid can be improved, and repeated cleaning work caused by unqualified data checking in the future can be avoided.
The cleaning method combines the type characteristics of the power grid comprehensive evaluation indexes with the advantages of machine learning in data cleaning, and the related algorithms are written in Python. The linear regression model preserves the corresponding feature vector structure and eliminates qualitative data labels, so that the data cleaning task can be completed efficiently and conveniently and the variability of the data is effectively reduced.
Therefore, the multi-stage secondary cleaning of the power grid comprehensive evaluation indexes, together with the subsequent filling and checking, removes outdated or invalid evaluation indexes while retaining the core data and the main network topology, and provides a reasonable and consistent data basis for further construction of the knowledge graph of the power grid comprehensive evaluation system.
In this embodiment, the filling by the linear regression model includes:
Converting the mathematical expression of the linear regression model into a vector expression:
the mathematical expression of the linear regression model is:
Figure BDA0003966222040000141
wherein k is the number of all evaluation indexes in the power grid comprehensive evaluation knowledge graph system; x = (x_1, x_2, …, x_k)^T is a k-dimensional random variable in column-vector form; b is a constant term and is used as the judgment threshold for missing values of the power grid evaluation indexes.
The missing data filling method based on the linear regression model is to take a variable containing a missing value as a prediction target, take other variables or a subset thereof in a data set as input variables, construct a training set through non-missing values of the variables, train a regression model, and predict the missing value of the corresponding variable by using the constructed linear regression model.
In an alternative embodiment, the complete triplet in a given grid comprehensive evaluation knowledge graph system may be set as the data set S:
Figure BDA0003966222040000151
To make it easier to implement missing value filling with the linear regression model in the knowledge graph-based power grid comprehensive evaluation index processing system, the mathematical expression of the linear regression model is converted into a vector expression that is easy to program.
The vector expression of the linear regression model is:
Figure BDA0003966222040000152
In the formula, the power grid evaluation index containing a missing value is set as the dependent variable x; the remaining power grid evaluation indexes serve as the multidimensional independent variable y; k is the number of all evaluation indexes in the power grid comprehensive evaluation knowledge graph system, and k=21 is set in the system, that is, 21 power grid evaluation indexes such as node voltage qualification rate, power grid expansion margin and power supply capacity margin; n is the number of comprehensive evaluation performance categories of the power grid, and n=5 is set in the system, that is, the five performances of the regional power grid: adequacy, coordination, reliability, safety and economy; a is the influence factor of each evaluation index in a given power grid performance evaluation item; b is a constant term and is used as the judgment threshold for missing values of the power grid evaluation indexes.
In the process of training the linear regression model, taking complete data in a comprehensive evaluation knowledge graph system of the power grid as a training set, and taking a power grid evaluation index containing a missing value as a test set, wherein the missing value is a dependent variable x to be predicted.
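The regression-based filling described above can be sketched in plain Python as follows. This is an illustrative least-squares implementation under stated assumptions, not the system's own code: the complete records form the training set, the column with missing values is the prediction target, the other columns are assumed to be complete, and the helper names (`solve_linear_system`, `fit_linear`, `fill_missing`) are hypothetical.

```python
from typing import List, Optional, Sequence


def solve_linear_system(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def fit_linear(inputs: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares fit of target = a . input + b; returns [a_1, ..., a_m, b]."""
    rows = [list(x) + [1.0] for x in inputs]  # trailing 1.0 carries the constant b
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(m)]
    return solve_linear_system(xtx, xty)


def fill_missing(records: List[List[Optional[float]]], target_col: int) -> List[List[Optional[float]]]:
    """Fill None entries in target_col with a model trained on the complete rows."""
    def others(rec):
        return [v for i, v in enumerate(rec) if i != target_col]

    complete = [r for r in records if all(v is not None for v in r)]
    coeffs = fit_linear([others(r) for r in complete], [r[target_col] for r in complete])
    filled = []
    for rec in records:
        rec = list(rec)
        if rec[target_col] is None:
            # Prediction: sum of a_i * input_i plus the constant term b.
            rec[target_col] = sum(a * v for a, v in zip(coeffs, others(rec))) + coeffs[-1]
        filled.append(rec)
    return filled
```

In practice a library regression routine would normally replace the hand-written solver; it is spelled out here only to keep the sketch self-contained.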
In this embodiment, the data checking of the comprehensive evaluation index of the power grid refers to performing data checking on the comprehensive evaluation index of the regional power grid related to the knowledge graph construction according to the created entity, entity attribute and corresponding relationship attribute.
The data checking method for the comprehensive evaluation index of the power grid comprises the following five items:
checking whether the data quantity of each power grid evaluation index is correct, that is, the integrity of the data;
checking whether the data format type is correct;
determining whether the relationship attributes of each power grid evaluation index correspond, through the associations among the five comprehensive power grid performances;
checking whether index redundancy exists under any power grid performance evaluation item;
checking the standardization of the data, so that non-standard index names do not prevent related power grid evaluation indexes from being grouped under the corresponding power grid performance evaluation item in the knowledge graph.
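The five checks listed above might be organized as in the sketch below. The record layout (performance item, index name, list of values), the function name `check_indexes` and its parameters are all hypothetical; the source does not describe how the checks are implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (performance item, index name, list of values).
IndexRecord = Tuple[str, str, List[object]]


def check_indexes(
    records: List[IndexRecord],
    expected_count: int,
    allowed_items: List[str],
    standard_names: List[str],
) -> Dict[str, List[str]]:
    """Return, for each of the five checks, the index names that fail it."""
    problems: Dict[str, List[str]] = {
        "count": [], "type": [], "relation": [], "redundant": [], "name": [],
    }
    seen = set()
    for item, name, values in records:
        # 1. data quantity (integrity)
        if len(values) != expected_count:
            problems["count"].append(name)
        # 2. data format type: every value must be a real number
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            problems["type"].append(name)
        # 3. relationship attribute: index must hang under a known performance item
        if item not in allowed_items:
            problems["relation"].append(name)
        # 4. redundancy: same index listed twice under the same item
        if (item, name) in seen:
            problems["redundant"].append(name)
        seen.add((item, name))
        # 5. standardized naming
        if name not in standard_names:
            problems["name"].append(name)
    return problems
```

Returning the failing index names per check, rather than a single pass or fail flag, makes it easier to see which indexes need correction before the knowledge graph is stored.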
In this embodiment, storing the knowledge graph in the graph database includes: storing the knowledge graph of the power grid comprehensive evaluation system obtained after the above series of processing operations, together with its corresponding comprehensive evaluation indexes, into a Neo4j graph database through the py2neo library for Python, so that long-term storage and dynamic updating of the knowledge graph can be realized.
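As one possible way to prepare triplets for storage, the sketch below turns each triplet into a parameterized Cypher MERGE statement. It deliberately stops short of connecting to a database: the statements could then be executed with a Neo4j client such as py2neo, but the `Entity` label, the `RELATED` relationship type and the function name are hypothetical and not taken from the source.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]

# MERGE creates a node or relationship only if it does not already exist,
# which suits repeated (dynamic) updates of the same knowledge graph.
MERGE_TRIPLET = (
    "MERGE (h:Entity {name: $head}) "
    "MERGE (t:Entity {name: $tail}) "
    "MERGE (h)-[:RELATED {type: $relation}]->(t)"
)


def triplets_to_statements(triplets: List[Triplet]) -> List[Tuple[str, Dict[str, str]]]:
    """Pair the Cypher template with the parameters of each (head, relation, tail)."""
    return [
        (MERGE_TRIPLET, {"head": head, "relation": relation, "tail": tail})
        for head, relation, tail in triplets
    ]
```

Passing values as parameters instead of formatting them into the query string avoids quoting problems in index names.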
The embodiment also provides a power grid comprehensive evaluation index processing system based on the knowledge graph, which comprises:
the knowledge graph construction module is used for classifying the power grid evaluation indexes by taking the influence factors of the power grid evaluation indexes as references, realizing the distinguishing treatment of the power grid evaluation indexes, obtaining initialized power grid evaluation index data and forming a knowledge graph;
the normalization processing module is used for performing normalization processing on the knowledge graph and converting the initialized power grid evaluation index data into normalized power grid evaluation index data;
the fault-tolerant processing module is used for carrying out fault-tolerant processing on the normalized power grid evaluation index data, and then carrying out regression filling on the missing value through a filling function;
the secure transcoding processing module is used for performing secure transcoding processing on the waveform data in the regression-filled power grid evaluation index data;
the data cleaning module is used for performing data cleaning on all power grid evaluation index data subjected to the secure transcoding processing, checking the data cleaning result until the knowledge graph training model and the triplet classification model are completely converged, and then performing data filling through a linear regression model;
the data checking module is used for checking data of the comprehensive evaluation indexes of the regional power grid involved in the knowledge graph construction;
and the data storage module is used for storing the knowledge graph of the comprehensive evaluation index of the power grid, which is subjected to the data verification, into a graph database.
In summary, the regional power grid comprehensive performance evaluation index system considers the adequacy, coordination, reliability, safety and economy of the regional power grid and is applied to actual power grid work, and the knowledge graph is used to display the whole evaluation system intuitively.
The invention provides a method and a system for processing comprehensive evaluation indexes of a power grid based on a knowledge graph, which can not only carry out targeted standardized processing, fault-tolerant processing, safe transcoding and other preprocessing operations on the comprehensive evaluation indexes of the power grid, but also improve the accuracy and the reliability of the comprehensive evaluation indexes of each power grid.
The method has strong practicability and feasibility, can provide important technical support and research carriers for dispatching and planning of the regional power grid and corresponding decision-making work, and has high popularization and application values.
The above-described invention is merely representative of embodiments of the present invention and should not be construed as limiting the scope of the invention, nor any limitation in any way as to the structure of the embodiments of the present invention. It should be noted that it will be apparent to those skilled in the art that various changes and modifications can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. The power grid comprehensive evaluation index processing method based on the knowledge graph is characterized by comprising the following steps of:
classifying the power grid evaluation indexes by taking the influence factors of the power grid evaluation indexes as references, realizing the distinguishing treatment of the power grid evaluation indexes, obtaining initialized power grid evaluation index data, and forming a knowledge graph by the initialized power grid evaluation index data;
normalizing the knowledge graph, and converting the initialized power grid evaluation index data into normalized power grid evaluation index data;
performing fault tolerance processing on the normalized power grid evaluation index data, and performing regression filling on the missing value through a filling function;
performing safe transcoding processing on waveform data in the regression-filled power grid evaluation index data;
performing data cleaning on all power grid evaluation index data subjected to the secure transcoding processing, checking the data cleaning result until the knowledge graph training model and the triplet classification model are completely converged, and then performing data filling through a linear regression model;
performing data checking on comprehensive evaluation indexes of the regional power grid involved in the knowledge graph construction;
and storing the knowledge graph of the comprehensive evaluation index of the power grid, which is subjected to the data verification, into a graph database.
2. The knowledge-graph-based power grid comprehensive evaluation index processing method as set forth in claim 1, wherein: the power grid evaluation index comprises at least one of adequacy, coordination, reliability, safety and economy of the power grid.
3. The knowledge-graph-based power grid comprehensive evaluation index processing method as set forth in claim 1, wherein: and after the fault tolerance processing is carried out, the missing data is automatically backfilled through a filling function, and then relatively complete data in the comprehensive evaluation system of the power grid is obtained.
4. The knowledge-graph-based power grid comprehensive evaluation index processing method according to claim 1, wherein the secure transcoding process comprises:
and carrying out smoothing filtering on the comprehensive evaluation index waveform data of the power grid by using Hilbert-Huang transformation, finding out association between the waveform data by using a configuration file, and carrying out waveform parallel drawing to form a graphical waveform file.
5. The method for processing the comprehensive evaluation index of the power grid based on the knowledge graph according to claim 1, wherein cleaning the data and verifying the data cleaning result comprises:
acquiring the uncleaned knowledge graph to be cleaned, wherein the knowledge graph to be cleaned comprises entities formed from the power grid comprehensive evaluation index data together with their corresponding entity attributes and relationship attributes, and each corresponding group of entity, entity attribute and the relationship between them forms a triplet;
performing a first cleaning on the power grid comprehensive evaluation index data to be processed, removing invalid values and abnormal values using the Chauvenet criterion;
deploying a preset knowledge graph training model and a triplet classification model in the knowledge graph-based power grid comprehensive evaluation index processing system;
performing a second cleaning by training and analysis with the preset knowledge graph training model and triplet classification model, screening out erroneous triplets and removing them;
filling the missing values left by the removed data using a linear regression model;
performing a double-accumulation analysis check on the cleaned power grid comprehensive evaluation indexes, and ending the data cleaning if the check result meets a preset value; otherwise, continuing to iterate, repeating the first cleaning, the second cleaning and the missing value filling until the check result meets the preset value.
6. The knowledge-graph-based power grid comprehensive evaluation index processing method according to claim 5, wherein the first cleaning comprises:
constructing all power grid evaluation indexes as samples into a data set using the Chauvenet criterion, determining a probability band centered on the mean of the normal distribution, judging any sample data value not within the probability band to be an abnormal value, and removing it from the data set;
the calculation formula of the Chauvenet criterion is as follows:
Figure FDA0003966222030000021
wherein D_max is the set maximum deviation value, x is the suspected abnormal value, mu is the sample mean, and delta is the sample standard deviation.
7. The knowledge-graph-based power grid comprehensive evaluation index processing method according to claim 6, wherein the second cleaning comprises:
inputting all triples into a Trans-E training model, and training a noise perceived knowledge graph model by using random negative sampling;
a triplet scoring formula is preset in the Trans-E model, all triplets are input into the triplet classification model for training according to the score of each trained triplet, and the confidence of each triplet is refreshed after training is finished, wherein different triplets have different confidences;
a preset confidence coefficient threshold value, namely when the confidence coefficient of the triplet is larger than a preset threshold value, judging that the triplet is correct, and reserving the triplet; when the confidence coefficient of the triplet is smaller than a preset threshold value, judging that the triplet is wrong, and eliminating;
the output of the classifier is constrained to be 0-1 by adopting a Sigmoid function in the triplet classification model;
combining the Trans-E model and the triplet classification model with iterative training until the knowledge graph training model and the triplet classification model are completely converged.
8. The method for processing the comprehensive evaluation index of the power grid based on the knowledge graph according to claim 1, wherein the filling by the linear regression model comprises the following steps:
converting the mathematical expression of the linear regression model into a vector expression:
the mathematical expression of the linear regression model is:
Figure FDA0003966222030000031
converting the mathematical expression into a vector expression as follows:
Figure FDA0003966222030000041
wherein x is a power grid evaluation index containing a missing value; y is a power grid evaluation index without a missing value; k is the number of all the power grid evaluation indexes; n is the number of comprehensive evaluation performance categories of the power grid; a_k is the influence factor of each evaluation index in a given power grid performance evaluation item; b is a constant term and is used as the judgment threshold for missing values of the power grid evaluation indexes.
9. The method for processing the comprehensive evaluation index of the power grid based on the knowledge graph according to claim 1, wherein the data checking of the comprehensive evaluation index of the regional power grid involved in the construction of the knowledge graph comprises the following steps:
checking whether the data quantity of each power grid evaluation index is correct;
checking whether the data format type is correct;
determining whether the relation attribute of each power grid evaluation index corresponds or not through the association among the different power grid evaluation indexes;
checking whether index redundancy exists under a certain power grid performance evaluation item;
checking the standardization of the data.
10. A power grid comprehensive evaluation index processing system based on a knowledge graph is characterized by comprising:
the knowledge graph construction module is used for classifying the power grid evaluation indexes by taking the influence factors of the power grid evaluation indexes as references, realizing the distinguishing treatment of the power grid evaluation indexes, obtaining initialized power grid evaluation index data and forming a knowledge graph;
the normalization processing module is used for performing normalization processing on the knowledge graph and converting the initialized power grid evaluation index data into normalized power grid evaluation index data;
the fault-tolerant processing module is used for carrying out fault-tolerant processing on the normalized power grid evaluation index data, and then carrying out regression filling on the missing value through a filling function;
the secure transcoding processing module is used for performing secure transcoding processing on the waveform data in the regression-filled power grid evaluation index data;
the data cleaning module is used for cleaning each item of power grid evaluation index data subjected to the secure transcoding processing, checking the data cleaning result until the knowledge graph training model and the triplet classification model are completely converged, and then filling the data through a linear regression model;
the data checking module is used for checking data of the comprehensive evaluation indexes of the regional power grid involved in the knowledge graph construction;
and the data storage module is used for storing the knowledge graph of the comprehensive evaluation index of the power grid, which is subjected to the data verification, into a graph database.
CN202211499589.5A 2022-11-28 2022-11-28 Knowledge graph-based power grid comprehensive evaluation index processing method and system Pending CN116128332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211499589.5A CN116128332A (en) 2022-11-28 2022-11-28 Knowledge graph-based power grid comprehensive evaluation index processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211499589.5A CN116128332A (en) 2022-11-28 2022-11-28 Knowledge graph-based power grid comprehensive evaluation index processing method and system

Publications (1)

Publication Number Publication Date
CN116128332A true CN116128332A (en) 2023-05-16

Family

ID=86299833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211499589.5A Pending CN116128332A (en) 2022-11-28 2022-11-28 Knowledge graph-based power grid comprehensive evaluation index processing method and system

Country Status (1)

Country Link
CN (1) CN116128332A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination