CN115983250A - Knowledge graph-based power anomaly data root cause positioning method and system

Info

Publication number: CN115983250A
Application number: CN202310029233.3A
Authority: CN
Inventors: 唐汉; 杨芳; 汤鲸; 胡胜玉; 罗有志; 谢尚晟; 李琼
Original assignee: State Grid Corp of China SGCC; State Grid Hunan Electric Power Co Ltd; Metering Center of State Grid Hunan Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Hunan Electric Power Co Ltd; Metering Center of State Grid Hunan Electric Power Co Ltd
Priority date: 2023-01-09
Filing date: 2023-01-09
Publication date: 2023-04-18

Abstract

The invention discloses a knowledge graph-based method for locating the root cause of abnormal power data. The method comprises: acquiring data information of a target power system; combing the data assets; extracting knowledge from the data; constructing a corresponding knowledge graph; performing graph iteration on the knowledge graph; and searching the knowledge graph to complete the root cause positioning of the abnormal power data. The invention also discloses a system that implements the method. The method and system construct the knowledge graph through computation, combine analysis algorithms with regular expressions from natural language processing, and analyze the source of the abnormal data based on parallel computing algorithms. The method can significantly improve the efficiency of root cause positioning for abnormal data, and has high reliability and good accuracy.

Description

Knowledge graph-based power anomaly data root cause positioning method and system

Technical Field

The invention belongs to the field of electrical automation, and particularly relates to a knowledge graph-based electric power abnormal data root cause positioning method and system.

Background

With the development of economic technology and the improvement of living standard of people, electric energy becomes essential secondary energy in production and life of people, and brings endless convenience to production and life of people. Therefore, ensuring a stable and reliable supply of electric energy is one of the most important tasks of an electric power system.

Abnormal power data must be located promptly in a power system. The large scale of power data, and the fact that it spans departments and specialties, make root cause positioning of abnormal power data difficult. The data assets of power enterprises show typical big data characteristics: power data covers every link of power production, power marketing and power dispatching, including grid operation, equipment management, marketing service and enterprise management data, and abnormal data can be generated in any of these links. The richness of power data places ever higher demands on specialized and intelligent data handling, and a large number of data quality problems arise in the process, which makes root cause positioning of abnormal data more difficult.

At present, abnormal power data is discovered mainly through passive rule checking and manual checking with scripts. Because abnormal data involves complex business, a large number of systems and large data volumes, this traditional way of finding data anomalies is time-consuming, labor-intensive and inefficient; finding one type of abnormal data takes 48 hours on average.

Disclosure of Invention

One object of the invention is to provide a knowledge graph-based method for locating the root cause of abnormal power data that has high reliability, good accuracy and higher efficiency.

The invention also aims to provide a system for realizing the power anomaly data root cause positioning method based on the knowledge graph.

The invention provides a power anomaly data root cause positioning method based on a knowledge graph, which comprises the following steps:

S1, acquiring data information of a target power system;

S2, combing the data assets according to the data information obtained in step S1;

S3, extracting knowledge from the data according to the combing result of step S2;

S4, constructing a corresponding knowledge graph according to the knowledge extraction result obtained in step S3;

S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology;

S6, searching the knowledge graph obtained in step S5 based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the abnormal power data.

The step S2 of combing the data assets according to the data information obtained in the step S1 specifically includes the following steps:

the data assets include source end data systems; and importing the data into a set data path according to a physical model mode, and inputting the corresponding relation between the table and the field in the data asset and the detailed path of the table according to a designed data model.

Step S3, extracting knowledge from the data according to the combing result of step S2, specifically comprises the following steps:

acquiring data contents of the data assets after the combing;

constructing triples from the acquired data according to the entity-relation-entity data structure;

the entities comprise roles, businesses (services), processes, data, rules and rectification measures; the relations comprise the role-business relation, the business-process relation, the process-data relation, the data-rule relation and the rule-rectification relation.
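The entity-relation-entity structure described above can be illustrated with a minimal Python sketch. The entity types and relation names follow the text; the `Entity` and `Triple` classes, the `make_triple` helper and all concrete instance names are hypothetical and are not taken from the patent.

```python
from typing import NamedTuple

# Relations listed in the text, keyed by (head entity type, tail entity type).
ALLOWED_RELATIONS = {
    ("role", "business"): "role-business",
    ("business", "process"): "business-process",
    ("process", "data"): "process-data",
    ("data", "rule"): "data-rule",
    ("rule", "rectification"): "rule-rectification",
}

class Entity(NamedTuple):
    kind: str   # role, business, process, data, rule or rectification
    name: str   # hypothetical instance name

class Triple(NamedTuple):
    head: Entity
    relation: str
    tail: Entity

def make_triple(head: Entity, tail: Entity) -> Triple:
    """Build a triple, rejecting entity-type pairs the text does not list."""
    relation = ALLOWED_RELATIONS.get((head.kind, tail.kind))
    if relation is None:
        raise ValueError(f"no relation defined between {head.kind} and {tail.kind}")
    return Triple(head, relation, tail)

# Hypothetical chain from a role down to a rectification measure.
chain = [
    Entity("role", "meter reading clerk"),
    Entity("business", "meter reading"),
    Entity("process", "monthly reading upload"),
    Entity("data", "meter_reading.value"),
    Entity("rule", "value must be non-negative"),
    Entity("rectification", "re-read meter"),
]
triples = [make_triple(h, t) for h, t in zip(chain, chain[1:])]
```

Keying the allowed relations by entity-type pair keeps malformed triples out of the graph at construction time; a real system would likely load this schema from metadata rather than hard-code it.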

Step S4, constructing a corresponding knowledge graph according to the knowledge extraction result obtained in step S3, specifically including the following steps:

converting the knowledge extraction result into vectors in the form of one-hot vectors;

carrying out entity alignment on the data;

defining and dynamically managing the management relationship between organizations and roles along the vertical and horizontal dimensions in a metadata-driven manner, and constructing the graph relations; the graph relations comprise the graph relation between an organization and a role, the graph relation between a role and a service, the graph relation between a service and data, the graph relation between data and a rule, the graph relation between a rule and abnormal data, the graph relation between abnormal data and a service, and the graph relation between abnormal data and a role.
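The one-hot conversion mentioned in the first step above can be sketched as follows. This is only an illustration under assumptions: the text does not say which items are encoded or how they are ordered, so the helper `one_hot_vectors`, the alphabetical ordering and the choice of relation names as input are all hypothetical.

```python
def one_hot_vectors(items):
    """Map each distinct item to a one-hot vector (a list of 0/1 integers)."""
    vocabulary = sorted(set(items))              # fixed, reproducible ordering
    index = {item: i for i, item in enumerate(vocabulary)}
    vectors = {}
    for item in vocabulary:
        vec = [0] * len(vocabulary)
        vec[index[item]] = 1                     # exactly one position is set
        vectors[item] = vec
    return vocabulary, vectors

# Assumed input: the relation names from the knowledge extraction step.
relations = ["role-business", "business-process", "process-data",
             "data-rule", "rule-rectification"]
vocabulary, vectors = one_hot_vectors(relations)
```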

Step S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology, specifically comprises the following steps:

extracting keywords from each text in the text set to be processed;

clustering the texts to be processed to generate a plurality of topic text sets;

counting the frequency of the seed words in each topic text set: retaining the topic text sets in which the frequency exceeds a set threshold, and using them as the source text set for domain dictionary expansion;

calculating the degree of association between the seed words and each candidate word in the texts of the source text set, and storing the candidate words whose degree of association reaches a set threshold into the expanded domain dictionary as domain words;

regenerating the relationships between entities: reconstructing the association relations of the entities in the graph by combining the historical entities and the newly generated entities;

and updating the knowledge graph nodes and the relationships between the nodes.
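The dictionary expansion steps above can be sketched in Python. The text does not specify how the degree of association is computed, so this sketch substitutes a Jaccard overlap between the texts containing a candidate word and the texts containing any seed word; that measure, the function name `expand_dictionary` and the sample data are assumptions, not the patented method.

```python
def expand_dictionary(topic_sets, seed_words, freq_threshold, assoc_threshold):
    """topic_sets: list of topic text sets; each text is a list of words."""
    seeds = set(seed_words)

    # Keep only topic sets in which seed words occur more often than the threshold.
    source_texts = []
    for texts in topic_sets:
        seed_count = sum(1 for text in texts for word in text if word in seeds)
        if seed_count > freq_threshold:
            source_texts.extend(texts)

    # Which texts contain a seed word, and which texts contain each candidate.
    seed_docs = {i for i, text in enumerate(source_texts) if seeds & set(text)}
    docs_of = {}
    for i, text in enumerate(source_texts):
        for word in set(text) - seeds:
            docs_of.setdefault(word, set()).add(i)

    # Jaccard overlap as a stand-in association degree (an assumption).
    expanded = set()
    for word, docs in docs_of.items():
        union = docs | seed_docs
        if union and len(docs & seed_docs) / len(union) >= assoc_threshold:
            expanded.add(word)
    return expanded

# Hypothetical topic text sets: one about metering, one unrelated.
topic_sets = [
    [["meter", "reading", "anomaly"], ["meter", "tariff"], ["meter", "reading"]],
    [["holiday", "schedule"], ["canteen", "menu"]],
]
new_words = expand_dictionary(topic_sets, ["meter"],
                              freq_threshold=1, assoc_threshold=0.6)
```

In this example only the first topic set passes the frequency filter, and only "reading" co-occurs with the seed word often enough to be added to the dictionary.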

The seed words are obtained by ambiguity-resolving word segmentation and unknown-word identification;

the ambiguity-resolving word segmentation comprises the following steps:

detection of segmentation ambiguity: obtaining the probability of each candidate segmentation through a trained sequence labeling model, and selecting the several candidate segmentations with the highest probabilities;

resolution of segmentation ambiguity: obtaining the globally optimal segmentation of the text through a conditional random field model, and taking it as the final word segmentation result; the formula for the conditional random field model is:

P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)

wherein P(y | x) is the conditional probability of the state sequence y given the observation sequence x; \lambda_k is a transition feature coefficient; t_k is a transition feature function; y_i is the state at time i; x is the observation sequence; i is the time index; \mu_l is a state feature coefficient; s_l is a state feature function; Z(x) is a normalization term, and

Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)

y is a state sequence;

the identification of unknown words comprises the following steps:

comparing the segmented words with the existing lexicon: screening out the words that are not in the lexicon, and taking those whose frequency exceeds a set value as unknown words;

comparing with industry proper nouns: recognizing words in the segmented text through a proper noun dictionary, and taking the recognition result as unknown words of the text.
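The two unknown-word identification routes above can be sketched in Python as follows. The function name `find_unknown_words`, the threshold value and the sample vocabulary are hypothetical; the sketch assumes the text has already been segmented into a list of words.

```python
from collections import Counter

def find_unknown_words(segmented_words, lexicon, proper_nouns, min_count):
    """Return unknown words found by either of the two routes in the text."""
    counts = Counter(segmented_words)
    # Route 1: words missing from the existing lexicon that occur frequently.
    frequent_new = {w for w, c in counts.items()
                    if w not in lexicon and c > min_count}
    # Route 2: words recognised by the industry proper-noun dictionary.
    recognised = {w for w in counts if w in proper_nouns}
    return frequent_new | recognised

# Hypothetical segmented text and dictionaries.
words = ["line", "loss", "line", "loss", "line", "loss", "feeder", "the", "HPLC"]
unknown = find_unknown_words(words, lexicon={"the", "line"},
                             proper_nouns={"HPLC"}, min_count=2)
```

Here "loss" is picked up by frequency, "HPLC" by the proper-noun dictionary, and "feeder" is ignored because it occurs only once.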

Step S6, searching the knowledge graph obtained in step S5 based on the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm to complete the root cause positioning of the abnormal power data, specifically comprises: based on the constructed knowledge graph, using the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm to map the tables and fields involved in the abnormal data to nodes in the graph data, and locating the business process in which the abnormal data was generated together with all entities and relations related to that process, thereby identifying the process, link and data item that produced the abnormal data and locating the root cause of the abnormal data.

The breadth-first algorithm specifically comprises the following steps:

to find the shortest path between nodes α and β in the node set p:

first, visit all neighboring nodes of node α; record the visited nodes in a set T and the existing paths in a set S, so that T now contains α and all of its neighboring nodes, and S contains the path from α to each of those neighboring nodes;

then traverse all of these neighboring nodes and visit every not-yet-visited neighbor of each of them, adding the newly visited nodes to T, while recording the access paths and storing them in the set S;

repeat the above steps until node β is visited, at which point the shortest path is obtained from the set S.
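The breadth-first steps above can be sketched in Python on an unweighted graph. The adjacency-dictionary representation, the function name `bfs_shortest_path` and the node names are assumptions made for illustration; `visited` plays the role of the set T and `paths` the role of the set S.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return a fewest-edges path from start to goal, or None if unreachable."""
    visited = {start}                 # corresponds to the set T
    paths = {start: [start]}          # corresponds to the set S
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return paths[node]
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:          # skip nodes already visited
                visited.add(neighbour)
                paths[neighbour] = paths[node] + [neighbour]
                queue.append(neighbour)
    return None

# Hypothetical graph linking an anomalous field back to the responsible role.
graph = {
    "anomalous_field": ["table"],
    "table": ["process", "rule"],
    "process": ["business"],
    "rule": ["business"],
    "business": ["role"],
    "role": [],
}
path = bfs_shortest_path(graph, "anomalous_field", "role")
```

Because breadth-first search explores nodes in order of their edge distance from the start, the first path that reaches the goal has the fewest edges.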

The depth-first algorithm specifically comprises the following steps:

to find the shortest path between nodes α and β in the node set p:

first, visit one neighboring node α₁ of node α; with the set T recording the visited nodes and the set S recording the existing path, T = {α, α₁} and S = {α→α₁};

then visit a neighboring node α₂ of node α₁ that has not yet been visited; at this point T = {α, α₁, α₂} and S = {α→α₁→α₂};

repeat the above steps, restarting from node α when no unvisited neighboring node remains, until node β is visited, at which point the shortest path is obtained from the set S.
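The depth-first steps above can be sketched in Python as follows. The first path a depth-first search finds is not necessarily the shortest, so this sketch keeps backtracking and retains the shortest path seen so far; that pruning strategy, the function name `dfs_shortest_path` and the sample graph are assumptions for illustration, not details stated in the text.

```python
def dfs_shortest_path(graph, start, goal):
    """Explore paths depth-first and return the shortest one found, or None."""
    best = None

    def visit(node, path, on_path):
        nonlocal best
        if best is not None and len(path) >= len(best):
            return                       # cannot beat the best path found so far
        if node == goal:
            best = list(path)
            return
        for neighbour in graph.get(node, ()):
            if neighbour not in on_path:  # do not revisit nodes on the current path
                on_path.add(neighbour)
                path.append(neighbour)
                visit(neighbour, path, on_path)
                path.pop()                # backtrack
                on_path.remove(neighbour)

    visit(start, [start], {start})
    return best

# Hypothetical graph with a long route (a-b-d-e) and a short route (a-c-e).
graph = {"a": ["b", "c"], "b": ["d"], "d": ["e"], "c": ["e"], "e": []}
path = dfs_shortest_path(graph, "a", "e")
```

The search first finds a-b-d-e, then backtracks to a and replaces it with the shorter a-c-e. For deep graphs, an explicit stack would avoid Python's recursion limit.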

The shortest path algorithm specifically comprises the following steps:

to find the shortest path between nodes α and β:

initialize dis(α₀) = 0, α₀ = α;

find the path to the undetermined point with the shortest distance from vertex α₀, and mark that point as a determined point;

traverse all edges starting from α₀ to obtain (α₀, α₁, d); if dis(α₁) > dis(α₀) + d, update the value of dis(α₁) to dis(α₀) + d; d is a distance and α₁ is a node;

repeat the above two steps until all points are marked as points whose shortest path is determined; the finally determined path is the shortest path.
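The steps above match the usual formulation of Dijkstra's algorithm, which can be sketched in Python with a priority queue. The use of `heapq`, the helper names `dijkstra` and `path_to`, and the sample graph are assumptions for illustration; `dis` corresponds to dis(·) in the text and edge distances are assumed to be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, distance), ...]} with non-negative distances."""
    dis = {source: 0}          # dis(alpha_0) = 0
    prev = {}                  # predecessor of each node on its shortest path
    determined = set()         # points whose shortest path is fixed
    heap = [(0, source)]
    while heap:
        d0, a0 = heapq.heappop(heap)   # closest undetermined point
        if a0 in determined:
            continue                   # stale heap entry
        determined.add(a0)
        for a1, d in graph.get(a0, ()):
            if a1 not in dis or dis[a1] > d0 + d:
                dis[a1] = d0 + d       # relaxation step from the text
                prev[a1] = a0
                heapq.heappush(heap, (dis[a1], a1))
    return dis, prev

def path_to(prev, source, target):
    """Rebuild the path; raises KeyError if target is unreachable."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical weighted graph.
graph = {
    "alpha": [("m", 1), ("n", 4)],
    "m": [("n", 2), ("beta", 6)],
    "n": [("beta", 1)],
    "beta": [],
}
dis, prev = dijkstra(graph, "alpha")
shortest = path_to(prev, "alpha", "beta")
```

In this example the direct edge alpha-n (distance 4) is replaced by the route through m (distance 3), giving a total distance of 4 to beta.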

The invention also discloses a system for implementing the knowledge graph-based power anomaly data root cause positioning method, comprising a data acquisition module, an asset combing module, a knowledge extraction module, a graph construction module, a graph iteration module and a root cause positioning module, connected in series in that order. The data acquisition module acquires data information of the target power system and passes the data to the asset combing module; the asset combing module combs the data assets according to the received data and passes the data to the knowledge extraction module; the knowledge extraction module extracts knowledge from the received data and passes the data to the graph construction module; the graph construction module constructs the corresponding knowledge graph from the received data and passes the data to the graph iteration module; the graph iteration module performs graph iteration on the constructed knowledge graph based on natural language processing technology and passes the data to the root cause positioning module; and the root cause positioning module searches the resulting knowledge graph based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the abnormal power data.

The method and system provided by the invention construct the knowledge graph through computation, combine analysis algorithms with regular expressions from natural language processing, and analyze the source of the abnormal data based on parallel computing algorithms; the method can significantly improve the efficiency of root cause positioning for abnormal data, and has high reliability and good accuracy.

Drawings

FIG. 1 is a schematic process flow diagram of the process of the present invention.

FIG. 2 is a functional block diagram of the system of the present invention.

Detailed Description

FIG. 1 is a schematic flow chart of the method of the present invention: the invention provides a power anomaly data root cause positioning method based on a knowledge graph, which comprises the following steps:

S1, acquiring data information of a target power system;

S2, combing the data assets according to the data information obtained in step S1; this specifically comprises the following steps:

the data assets include source-end data systems (e.g., PMS, CMS, etc.); the data is imported into a set data path in physical model form, and the correspondence between the tables and fields in the data assets, together with the detailed path of each table, is entered according to the designed data model;

S3, extracting knowledge from the data according to the combing result of step S2; this specifically comprises the following steps:

acquiring data contents of the data assets after the combing;

constructing triples from the acquired data according to the entity-relation-entity data structure;

the entities comprise roles, businesses (services), processes, data, rules and rectification measures; the relations comprise the role-business relation, the business-process relation, the process-data relation, the data-rule relation and the rule-rectification relation;

S4, constructing a corresponding knowledge graph according to the knowledge extraction result obtained in step S3; this specifically comprises the following steps:

carrying out entity alignment on the data; for example, the same marketing data term may have different meanings in different application scenarios, so entity disambiguation may be required: the different marketing service data are unified, and different attributes are assigned to the terms according to the application scenario;

defining and dynamically managing the management relationship between organizations and roles along the vertical and horizontal dimensions in a metadata-driven manner, and constructing the graph relations; the graph relations comprise the graph relation between an organization and a role, the graph relation between a role and a service, the graph relation between a service and data, the graph relation between data and a rule, the graph relation between a rule and abnormal data, the graph relation between abnormal data and a service, and the graph relation between abnormal data and a role;

S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology; this specifically comprises the following steps:

extracting keywords of each text in the text set to be processed;

clustering the texts to be processed to generate a plurality of topic text sets;

counting the frequency of occurrence of the seed words in each topic text set: retaining the topic text sets in which the frequency exceeds a set threshold, and using them as the source text set for domain dictionary expansion;

calculating the degree of association between the seed words and each candidate word in the texts of the source text set, and storing the candidate words whose degree of association reaches a set threshold into the expanded domain dictionary as domain words;

and updating the knowledge graph nodes and the relationships among the nodes.

the ambiguity-resolving word segmentation comprises the following steps:

resolution of segmentation ambiguity: obtaining the globally optimal segmentation of the text through a conditional random field model, and taking it as the final word segmentation result; the formula for the conditional random field model is:

P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)

Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)

y is a state sequence;

the identification of the unknown words comprises the following steps:

comparing with industry proper nouns; recognizing words in the segmented text through a proper noun dictionary, and taking a recognition result as an unknown word of the text;

in the dictionary-based method, only words that exist in the dictionary can be recognized; the method used is forward maximum matching (MM), whose effectiveness depends on the coverage of the dictionary, so new words need to be added to the dictionary regularly;
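Forward maximum matching can be sketched in Python as below. The function name `forward_max_match`, the fallback of emitting a single character when nothing matches, and the sample dictionary are assumptions for illustration. The example uses unspaced English text for readability; Chinese text would be processed the same way, character by character.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text greedily, preferring the longest dictionary word at each position."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest window first and shrink it until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)   # single characters are emitted as a fallback
                i += size
                break
    return words

# Hypothetical dictionary.
dictionary = {"power", "powergrid", "grid", "data"}
known = forward_max_match("powergriddata", dictionary)
with_gap = forward_max_match("powerxdata", dictionary)
```

The second call shows the coverage limitation noted above: the out-of-dictionary character "x" can only be emitted on its own, which is why the dictionary needs regular updates.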

S6, searching the knowledge graph obtained in step S5 based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the abnormal power data; specifically, based on the constructed knowledge graph, the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm are used to map the tables and fields involved in the abnormal data to nodes in the graph data, and to locate the business process in which the abnormal data was generated together with all entities and relations related to that process, so that the process, link and data item that produced the abnormal data are identified and the root cause of the abnormal data is located.

In specific implementation, the breadth-first algorithm specifically includes the following steps:

the breadth-first search algorithm starts from a given point; the first step is to visit all neighboring nodes of that point and record them, and then to visit the neighbors of those neighboring nodes; any node that has already been visited is skipped, and the search continues until the target node is reached;

to find the shortest path between nodes α and β in the node set p:

first, visit all neighboring nodes of node α;

then traverse all of these neighboring nodes and visit every not-yet-visited neighbor of each of them, while recording the access paths and storing them in the set S;

repeat the above steps until node β is visited, at which point the shortest path is obtained from the set S;

the parallel breadth-first search algorithm means that, during the search, the next round of searching and visiting can start from several nodes at the same time without the nodes interfering with one another, which greatly improves efficiency;
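One way to picture the parallel breadth-first search is a level-synchronous sketch in which all nodes of the current frontier are expanded concurrently. The thread pool, the function name `parallel_bfs_levels` and the sample graph are assumptions; the text does not describe how the parallelism is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_bfs_levels(graph, start, max_workers=4):
    """Return {node: number of edges from start} using level-synchronous BFS."""
    level = {start: 0}
    frontier = [start]
    depth = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            depth += 1
            # Each frontier node is expanded independently of the others.
            neighbour_lists = list(pool.map(lambda n: graph.get(n, ()), frontier))
            next_frontier = []
            for neighbours in neighbour_lists:   # merge step runs in one thread
                for node in neighbours:
                    if node not in level:
                        level[node] = depth
                        next_frontier.append(node)
            frontier = next_frontier
    return level

# Hypothetical graph.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
levels = parallel_bfs_levels(graph, "a")
```

With an in-memory dictionary, CPython threads give little real speed-up; the pattern pays off mainly when each neighbor lookup is an I/O-bound call, for example a query to a graph database.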

in specific implementation, the depth-first algorithm specifically includes the following steps:

the principle of the depth-first search algorithm is that, after the next neighboring node is found from a node, the search continues from that neighboring node to its own next neighbor, until the target node is visited, the current node has already been visited, or no neighboring node exists;

to find the shortest path between nodes α and β in the node set p:

then visit a neighboring node α₂ of node α₁ that has not yet been visited; at this point T = {α, α₁, α₂} and S = {α→α₁→α₂};

repeat the above steps, restarting from node α when no unvisited neighboring node remains, until node β is visited, at which point the shortest path is obtained from the set S;

the parallel depth-first search algorithm means that, during the search, the next round of searching and visiting can start from several nodes at the same time without the nodes interfering with one another, which greatly improves efficiency;

in specific implementation, the shortest path algorithm specifically includes the following steps:

the single-source shortest path search algorithm supports finding shortest paths over weighted edges. The main principle is as follows: let G be the set of all vertices; maintain a vertex set S and expand it by repeated greedy selection, with V = G - S. A vertex belongs to the set S if and only if the length of the shortest path from the source to that vertex is known. Initially, S contains only the source, i.e. the starting point. Let u be a vertex of G. A path from the source to u whose intermediate vertices all lie in S is called a special path from the source to u. A matrix A records the length of the current shortest special path for each vertex; each time, the vertex with the shortest special path length is taken out of V, and in the end the matrix A records the shortest path length from the source to every other vertex;

to find the shortest path between nodes α and β:

initialize dis(α₀) = 0, α₀ = α;

traverse all edges starting from α₀ to obtain (α₀, α₁, d); if dis(α₁) > dis(α₀) + d, update the value of dis(α₁) to dis(α₀) + d; d is a distance and α₁ is a node;

FIG. 2 is a schematic diagram of the functional modules of the system of the present invention: the system for implementing the knowledge graph-based power anomaly data root cause positioning method comprises a data acquisition module, an asset combing module, a knowledge extraction module, a graph construction module, a graph iteration module and a root cause positioning module, connected in series in that order. The data acquisition module acquires data information of the target power system and passes the data to the asset combing module; the asset combing module combs the data assets according to the received data and passes the data to the knowledge extraction module; the knowledge extraction module extracts knowledge from the received data and passes the data to the graph construction module; the graph construction module constructs the corresponding knowledge graph from the received data and passes the data to the graph iteration module; the graph iteration module performs graph iteration on the constructed knowledge graph based on natural language processing technology and passes the data to the root cause positioning module; and the root cause positioning module searches the resulting knowledge graph based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the abnormal power data.

Claims

1. A knowledge graph-based power anomaly data root cause positioning method, comprising the following steps:

S1, acquiring data information of a target power system;

S2, combing the data assets according to the data information acquired in step S1;

S3, extracting knowledge from the data according to the combing result of step S2;

2. The method for locating the root cause of the abnormal power data based on the knowledge-graph according to claim 1, wherein the step S2 of combing the data assets according to the data information obtained in the step S1 specifically comprises the following steps:

the data assets include source data systems; and importing the data into a set data path according to a physical model mode, and inputting the corresponding relation between the table and the field in the data asset and the detailed path of the table according to a designed data model.

3. The method for locating the root cause of the abnormal power data based on the knowledge graph as claimed in claim 2, wherein the step S3 is to extract the knowledge of the data according to the combing result of the step S2, and specifically comprises the following steps:

acquiring data contents of the data assets after the combing;

4. The method for locating the root cause of the abnormal power data based on the knowledge graph according to claim 3, wherein the step S4 is to construct a corresponding knowledge graph according to the knowledge extraction result obtained in the step S3, and specifically comprises the following steps:

carrying out entity alignment on the data;

5. The method for locating the root cause of the abnormal power data based on the knowledge graph as claimed in claim 4, wherein step S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology, specifically comprises the following steps:

extracting keywords from each text in the text set to be processed;

clustering the texts to be processed to generate a plurality of topic text sets;

and updating the knowledge graph nodes and the relationships between the nodes.

6. The knowledge graph-based power anomaly data root cause positioning method according to claim 1, characterized in that the seed words are obtained by ambiguity-resolving word segmentation and unknown-word identification;

the ambiguity-resolving word segmentation comprises the following steps:

wherein P(y | x) is the conditional probability of the state sequence y given the observation sequence x; \lambda_k is a transition feature coefficient; t_k is a transition feature function; y_i is the state at time i; x is the observation sequence; i is the time index; \mu_l is a state feature coefficient; s_l is a state feature function; Z(x) is a normalization term, and

Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)

y is a state sequence;

the identification of the unknown words comprises the following steps:

comparing with industry proper nouns; and recognizing words in the segmented text through a proper noun dictionary, and taking a recognition result as an unknown word of the text.

7. The method for locating the root cause of the abnormal power data based on the knowledge graph as claimed in claim 1, wherein the knowledge graph obtained in the step S5 is searched based on the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm in the step S6 to complete the root cause location of the abnormal power data, and specifically, based on the constructed knowledge graph, the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm are adopted to correspond tables and fields related to the abnormal data to nodes in the graph data, locate a business process generated by the abnormal data and all entities and relations related to the process, thereby discovering processes, links and data items generated by the abnormal data and realizing the location of the root cause of the abnormal data.

8. The method for locating the root cause of the abnormal power data based on the knowledge graph according to claim 1, wherein the breadth-first algorithm specifically comprises the following steps:

to find the shortest path between nodes α and β in the node set p:

first, visit all neighboring nodes of node α;

then traverse all of these neighboring nodes and visit every not-yet-visited neighbor of each of them, while recording the access paths and storing them in the set S;

the depth-first algorithm specifically comprises the following steps:

to find the shortest path between nodes α and β in the node set p:

then visit a neighboring node α₂ of node α₁ that has not yet been visited; at this point T = {α, α₁, α₂} and S = {α→α₁→α₂};

repeat the above steps, restarting from node α when no unvisited neighboring node remains, until node β is visited, at which point the shortest path is obtained from the set S;

the shortest path algorithm specifically comprises the following steps:

to find the shortest path between nodes α and β:

initialize dis(α₀) = 0, α₀ = α;

find the path to the undetermined point with the shortest distance from vertex α₀, and mark that point as a determined point;

9. A system for implementing the knowledge graph-based power anomaly data root cause positioning method according to any one of claims 1 to 8, characterized by comprising a data acquisition module, an asset combing module, a knowledge extraction module, a graph construction module, a graph iteration module and a root cause positioning module, connected in series in that order; the data acquisition module acquires data information of the target power system and passes the data to the asset combing module; the asset combing module combs the data assets according to the received data and passes the data to the knowledge extraction module; the knowledge extraction module extracts knowledge from the received data and passes the data to the graph construction module; the graph construction module constructs the corresponding knowledge graph from the received data and passes the data to the graph iteration module; the graph iteration module performs graph iteration on the constructed knowledge graph based on natural language processing technology and passes the data to the root cause positioning module; and the root cause positioning module searches the resulting knowledge graph based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the abnormal power data.