CN115983250A - Knowledge graph-based power anomaly data root cause positioning method and system - Google Patents

Info

Publication number
CN115983250A
Authority
CN
China
Prior art keywords
data
map
knowledge
module
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310029233.3A
Other languages
Chinese (zh)
Inventor
唐汉
杨芳
汤鲸
胡胜玉
罗有志
谢尚晟
李琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Hunan Electric Power Co Ltd
Metering Center of State Grid Hunan Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Hunan Electric Power Co Ltd
Metering Center of State Grid Hunan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Hunan Electric Power Co Ltd, Metering Center of State Grid Hunan Electric Power Co Ltd
Priority to CN202310029233.3A
Publication of CN115983250A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph-based power anomaly data root cause positioning method, which comprises the steps of: acquiring data information of a target power system; combing the data assets; extracting knowledge from the data; constructing a corresponding knowledge graph; performing graph iteration on the knowledge graph; and searching the knowledge graph to complete the root cause positioning of the power anomaly data. The invention also discloses a system for implementing the knowledge graph-based power anomaly data root cause positioning method. In the method and system provided by the invention, a knowledge graph is constructed computationally, analysis algorithms and regular expressions from natural language processing are combined, and the source of the abnormal data is analyzed intelligently based on a parallel computing algorithm; the method can significantly improve the efficiency of abnormal data root cause positioning, and has high reliability and good accuracy.

Description

Knowledge graph-based power anomaly data root cause positioning method and system
Technical Field
The invention belongs to the field of electrical automation, and particularly relates to a knowledge graph-based electric power abnormal data root cause positioning method and system.
Background
With the development of the economy and technology and the improvement of living standards, electric energy has become an essential secondary energy source in production and daily life. Ensuring a stable and reliable supply of electric energy is therefore one of the most important tasks of a power system.
Power anomaly data must be located promptly in a power system. The large scale of power data, and the fact that it spans departments and specialties, make root cause positioning of power anomaly data difficult. The data assets of power enterprises show typical big data characteristics: power data cover all links of power production, power marketing and power scheduling, including grid operation, equipment management, marketing service and enterprise management data, and abnormal data can be generated in any of these links. The richness of power data places ever higher demands on specialized and intelligent data handling, and a large number of data quality problems arise in the process, which makes root cause positioning of abnormal data more difficult.
At present, the discovery of power anomaly data relies mainly on passive rule checking and manual checking with scripts. Because abnormal data involve complex businesses, many systems and large data volumes, the traditional way of discovering data anomalies is time-consuming, labor-intensive and inefficient; finding one type of abnormal data takes 48 hours on average.
Disclosure of Invention
One object of the invention is to provide a knowledge graph-based power anomaly data root cause positioning method with high reliability, good accuracy and high efficiency.
Another object of the invention is to provide a system for implementing the knowledge graph-based power anomaly data root cause positioning method.
The knowledge graph-based power anomaly data root cause positioning method provided by the invention comprises the following steps:
S1, acquiring data information of a target power system;
S2, combing the data assets according to the data information acquired in step S1;
S3, extracting knowledge from the data according to the combing result of step S2;
S4, constructing a corresponding knowledge graph according to the knowledge extraction result obtained in step S3;
S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology;
S6, searching the knowledge graph obtained in step S5 based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm, thereby completing the root cause positioning of the power anomaly data.
In step S2, combing the data assets according to the data information acquired in step S1 specifically comprises the following steps:
the data assets include the source-end data systems; the data are imported into a set data path in the form of a physical model, and the correspondence between tables and fields in the data assets, together with the detailed path of each table, is entered according to the designed data model.
And S3, extracting the knowledge of the data according to the combing result in the step S2, which specifically comprises the following steps:
acquiring data contents of the data assets after the combing;
constructing a triple of the acquired data according to the data structure of the entity-relation-entity;
the entities comprise roles, services, processes, data, rules and rectification; the relations include role-business relation, business-process relation, process-data relation, data-rule relation and rule-rectification relation.
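The triple construction in step S3 can be illustrated with a minimal Python sketch. The `Triple` type, the `build_triples` helper and the sample record are hypothetical names and data introduced only for illustration; the patent does not specify a storage format.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    head: str      # source entity
    relation: str  # relation type, e.g. "role-business"
    tail: str      # target entity


# The five relation types named in step S3, keyed by (head type, tail type).
RELATION_TYPES = {
    ("role", "business"): "role-business",
    ("business", "process"): "business-process",
    ("process", "data"): "process-data",
    ("data", "rule"): "data-rule",
    ("rule", "rectification"): "rule-rectification",
}


def build_triples(record: Dict[str, str]) -> List[Triple]:
    """Turn one combed record (entity type -> entity name) into triples."""
    triples = []
    for (head_type, tail_type), relation in RELATION_TYPES.items():
        if head_type in record and tail_type in record:
            triples.append(Triple(record[head_type], relation, record[tail_type]))
    return triples


# Hypothetical record; every name below is illustrative only.
record = {
    "role": "meter reader",
    "business": "meter reading",
    "process": "monthly reading upload",
    "data": "reading_table.kwh",
    "rule": "kwh must be non-negative",
    "rectification": "re-read meter",
}
triples = build_triples(record)
```

Each record yields at most five triples, one per relation type; a record missing an entity type simply produces fewer triples.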
In step S4, constructing a corresponding knowledge graph according to the knowledge extraction result obtained in step S3 specifically comprises the following steps:
converting the knowledge extraction result into vectors in the form of one-hot vectors;
performing entity alignment on the data;
defining and dynamically managing, in a metadata-driven manner, the management relationship between organizations and roles along the vertical and horizontal dimensions, and constructing the graph relations; the graph relations comprise the relation between organization and role, the relation between role and business, the relation between business and data, the relation between data and rule, the relation between rule and abnormal data, the relation between abnormal data and business, and the relation between abnormal data and role.
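As a rough illustration of the one-hot conversion in step S4, the sketch below encodes a list of relation labels. The function name `one_hot_encode` and the choice of a sorted vocabulary are assumptions; the patent does not say how the vocabulary is ordered.

```python
from typing import Dict, List


def one_hot_encode(items: List[str]) -> Dict[str, List[int]]:
    """Map each distinct item to a one-hot vector over the sorted vocabulary."""
    vocabulary = sorted(set(items))
    vectors = {}
    for position, item in enumerate(vocabulary):
        vector = [0] * len(vocabulary)
        vector[position] = 1  # exactly one component is set per item
        vectors[item] = vector
    return vectors


relations = ["role-business", "business-process", "process-data", "role-business"]
encoded = one_hot_encode(relations)
```

Duplicate labels share one vector, so the vector length equals the number of distinct labels rather than the length of the input.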
In step S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology specifically comprises the following steps:
extracting the keywords of each text in the text set to be processed;
clustering the texts to be processed to generate a plurality of topic text sets;
counting the frequency of occurrence of the seed words in each topic text set, and retaining the topic text sets whose frequency exceeds a set threshold as the source text sets for domain dictionary expansion;
calculating the degree of association between the seed words and each candidate word in the texts of the source text sets, and storing the candidate words whose degree of association reaches a set threshold in the expanded dictionary as domain words;
regenerating the relations between entities: reconstructing the association relations of the entities in the graph by combining the historical entities with the newly generated entities;
updating the knowledge graph nodes and the relations between the nodes.
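The dictionary expansion part of step S5 (seed word counting, source set selection and association thresholding) might be sketched as follows. The patent does not define the degree of association, so this sketch assumes a simple co-occurrence ratio: the share of texts containing a candidate word that also contain a seed word. The function names and sample texts are likewise hypothetical.

```python
from collections import Counter


def select_source_sets(topic_sets, seed_words, min_seed_count):
    """Keep topic text sets in which seed words occur more than min_seed_count times."""
    kept = []
    for texts in topic_sets:
        count = sum(1 for text in texts for word in text if word in seed_words)
        if count > min_seed_count:
            kept.append(texts)
    return kept


def expand_dictionary(source_sets, seed_words, min_association):
    """Return candidate words whose association with the seed words reaches the threshold.

    Assumption: association = share of the candidate's texts that also contain a seed word.
    """
    appears = Counter()    # number of texts containing the candidate
    with_seed = Counter()  # number of those texts that also contain a seed word
    for texts in source_sets:
        for text in texts:
            words = set(text)
            has_seed = bool(words & seed_words)
            for word in words - seed_words:
                appears[word] += 1
                if has_seed:
                    with_seed[word] += 1
    return {w for w in appears if with_seed[w] / appears[w] >= min_association}


# Each text is a list of already segmented words; all content is illustrative.
topic_sets = [
    [["meter", "reading", "anomaly"], ["meter", "voltage", "anomaly"]],
    [["holiday", "notice"], ["canteen", "menu"]],
]
seed_words = {"meter"}
sources = select_source_sets(topic_sets, seed_words, min_seed_count=1)
domain_words = expand_dictionary(sources, seed_words, min_association=0.8)
```

Other association measures, such as pointwise mutual information, could be substituted in `expand_dictionary` without changing the overall flow.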
The seed words are obtained through ambiguity-aware word segmentation and unknown word identification;
the ambiguity-aware word segmentation comprises the following steps:
detection of segmentation ambiguity: obtaining the probability of each candidate segmentation through a trained sequence labeling model, and selecting the several segmentations with the highest probabilities;
resolution of segmentation ambiguity: obtaining the globally optimal segmentation of the text through a conditional random field model, and taking it as the final word segmentation result; the formula of the conditional random field model is:
$$P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)$$
where $P(y \mid x)$ is the conditional probability of the state sequence $y$ given the observation sequence $x$; $\lambda_k$ is the transition feature weight; $t_k$ is the transition feature function; $y_i$ is the state at time step $i$; $x$ is the observation sequence; $i$ is the time step index; $\mu_l$ is the state feature weight; $s_l$ is the state feature function; $Z(x)$ is the normalization term, with
$$Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)$$
and $y$ is a state sequence;
the identification of unknown words comprises the following steps:
comparison with the existing lexicon: comparing the segmented words with the existing lexicon, screening out the words that are not in the lexicon, and taking those whose frequency exceeds a set value as unknown words;
comparison with industry proper nouns: recognizing words in the segmented text through a proper noun dictionary, and taking the recognition results as unknown words of the text.
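The two comparisons used for unknown word identification can be sketched in a few lines of Python. The function name `find_unknown_words`, the threshold parameter `min_count` and the sample tokens are assumptions made for illustration.

```python
from collections import Counter


def find_unknown_words(tokens, lexicon, proper_nouns, min_count):
    """Identify unknown words by lexicon comparison and proper-noun lookup."""
    counts = Counter(tokens)
    # Words missing from the existing lexicon whose frequency exceeds the set value.
    frequent_new = {w for w, c in counts.items() if w not in lexicon and c > min_count}
    # Words recognized through the industry proper-noun dictionary.
    recognized = {w for w in counts if w in proper_nouns}
    return frequent_new | recognized


# Illustrative segmented text, lexicon and proper-noun dictionary.
tokens = ["line", "loss", "lossrate", "lossrate", "lossrate", "feeder7", "meter"]
lexicon = {"line", "loss", "meter"}
proper_nouns = {"feeder7"}
unknown = find_unknown_words(tokens, lexicon, proper_nouns, min_count=2)
```

Here `lossrate` qualifies through frequency alone, while `feeder7` occurs only once and qualifies only because the proper-noun dictionary lists it.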
In step S6, searching the knowledge graph obtained in step S5 based on the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm to complete the root cause positioning of the power anomaly data specifically comprises: based on the constructed knowledge graph, using the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm to map the tables and fields involved in the abnormal data to nodes in the graph data, and locating the business process in which the abnormal data are generated together with all entities and relations involved in that process, thereby identifying the process, link and data item in which the abnormal data are generated and achieving root cause positioning of the abnormal data.
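A minimal sketch of the idea in step S6, assuming the graph is stored as a list of directed edges and that nodes are named with a hypothetical `type:name` scheme. It maps an abnormal table and field to a data node and walks the edges backwards to collect the process, business and role involved. The edge list and helper names are illustrative, not taken from the patent.

```python
from collections import deque

# Hypothetical graph: each edge points from an upstream entity to a downstream one.
EDGES = [
    ("role:meter reader", "business:meter reading"),
    ("business:meter reading", "process:reading upload"),
    ("process:reading upload", "data:reading_table.kwh"),
    ("data:reading_table.kwh", "rule:kwh non-negative"),
]


def upstream_entities(edges, start):
    """Collect every entity reachable by walking edges backwards from start."""
    parents = {}
    for head, tail in edges:
        parents.setdefault(tail, []).append(head)
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def locate_root_cause(edges, table, field):
    """Map an abnormal table and field to its data node, then trace its origin."""
    return upstream_entities(edges, f"data:{table}.{field}")


trace = locate_root_cause(EDGES, "reading_table", "kwh")
```

The returned list is ordered from the nearest upstream entity (the process) to the farthest (the role), which mirrors the process, link and data item chain described above.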
The breadth-first algorithm specifically comprises the following steps:
suppose the shortest path between nodes α and β in the node set p is to be found:
first, visit all neighboring nodes of node α; the visited nodes are recorded in a set T and the existing paths in a set S [formula images BDA0004046028530000051 and BDA0004046028530000052 are not reproduced in this text];
then, traverse all of these neighboring nodes and visit every not-yet-visited neighbor of each of them, updating T accordingly [formula images BDA0004046028530000053 to BDA0004046028530000055 are not reproduced in this text]; at the same time, record the access paths and store them in the set S;
repeat these steps until node β is visited, at which point the shortest path is obtained from the set S.
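The breadth-first steps above can be sketched as follows for an unweighted graph stored as an adjacency dictionary. The function name `bfs_shortest_path` and the sample graph are assumptions; the variables `visited` and `paths` play the roles of the sets T and S.

```python
from collections import deque


def bfs_shortest_path(graph, alpha, beta):
    """Return a shortest path (fewest edges) from alpha to beta, or None."""
    visited = {alpha}         # the set T of visited nodes
    paths = deque([[alpha]])  # the set S of recorded paths
    while paths:
        path = paths.popleft()
        node = path[-1]
        if node == beta:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:  # skip nodes visited earlier
                visited.add(neighbor)
                paths.append(path + [neighbor])
    return None


# Illustrative graph: node -> list of neighboring nodes.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
path = bfs_shortest_path(graph, "a", "e")
```

Because nodes are expanded level by level, the first path that reaches β has the fewest edges; this holds only when all edges count equally.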
The depth-first algorithm specifically comprises the following steps:
suppose the shortest path between nodes α and β in the node set p is to be found:
first, visit one neighboring node $\alpha_1$ of node α; with the set T recording the visited nodes and the set S recording the existing path, $T = \{\alpha, \alpha_1\}$ and $S = \{\alpha \to \alpha_1\}$;
then, visit a neighboring node $\alpha_2$ of node $\alpha_1$, subject to the condition given in formula image BDA0004046028530000056 (not reproduced in this text), at which point $T = \{\alpha, \alpha_1, \alpha_2\}$ and $S = \{\alpha \to \alpha_1 \to \alpha_2\}$;
repeat these steps, restarting from node α when no unvisited neighboring node exists, until node β is visited, at which point the shortest path is obtained from the set S.
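A minimal depth-first sketch under stated assumptions: the graph is an adjacency dictionary, and on a dead end the search steps back to the previous node instead of restarting from α, which is a common simplification rather than the exact procedure described. Note that a depth-first traversal returns some path to β, which is not guaranteed to be the shortest unless the alternatives are compared.

```python
def dfs_path(graph, alpha, beta):
    """Return one path from alpha to beta found depth-first, or None."""
    visited = {alpha}  # the set T of visited nodes
    path = [alpha]     # the current path, corresponding to S
    while path:
        node = path[-1]
        if node == beta:
            return path
        unvisited = [n for n in graph.get(node, []) if n not in visited]
        if unvisited:
            visited.add(unvisited[0])
            path.append(unvisited[0])
        else:
            path.pop()  # dead end: step back and try another branch
    return None


# Illustrative graph: node -> list of neighboring nodes.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
path = dfs_path(graph, "a", "e")
```

In the sample graph the search first exhausts the branch through `b` and `d` before finding `e` through `c`.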
The shortest path algorithm specifically comprises the following steps:
suppose the shortest path between nodes α and β is to be found:
initialize $\mathrm{dis}(\alpha_0) = 0$, with $\alpha_0 = \alpha$;
find the undetermined point with the shortest distance from vertex $\alpha_0$, and mark that point as determined;
traverse all edges starting from $\alpha_0$, obtaining $(\alpha_0, \alpha_1, d)$, where $\alpha_1$ is a node and $d$ is a distance; if $\mathrm{dis}(\alpha_1) > \mathrm{dis}(\alpha_0) + d$, update $\mathrm{dis}(\alpha_1)$ to $\mathrm{dis}(\alpha_0) + d$;
repeat the two preceding steps until all points are marked as points whose shortest path is determined; the finally determined path is the shortest path.
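The steps above correspond to the familiar single-source shortest path procedure, which can be sketched with a priority queue as follows. The function name `dijkstra`, the adjacency format and the early exit once β is determined are assumptions of this sketch; it also assumes that all distances are non-negative.

```python
import heapq


def dijkstra(graph, alpha, beta):
    """Return (distance, path) of a shortest weighted path from alpha to beta."""
    dis = {alpha: 0}   # dis(alpha_0) = 0
    previous = {}      # predecessor of each node on its best known path
    determined = set()
    heap = [(0, alpha)]
    while heap:
        d0, a0 = heapq.heappop(heap)  # closest undetermined point
        if a0 in determined:
            continue
        determined.add(a0)
        if a0 == beta:
            break
        for a1, d in graph.get(a0, []):            # edges (a0, a1, d)
            if a1 not in dis or dis[a1] > d0 + d:  # relaxation step
                dis[a1] = d0 + d
                previous[a1] = a0
                heapq.heappush(heap, (dis[a1], a1))
    if beta not in dis:
        return None
    path = [beta]
    while path[-1] != alpha:
        path.append(previous[path[-1]])
    return dis[beta], path[::-1]


# Illustrative weighted graph: node -> list of (neighbor, distance).
graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)]}
result = dijkstra(graph, "a", "d")
```

In the sample graph the direct edge from `a` to `b` costs 4, but the detour through `c` costs 3, so the relaxation step replaces the first estimate.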
The invention also discloses a system for implementing the knowledge graph-based power anomaly data root cause positioning method, which comprises a data acquisition module, an asset combing module, a knowledge extraction module, a graph construction module, a graph iteration module and a root cause positioning module, connected in series in that order. The data acquisition module acquires data information of the target power system and passes the data to the asset combing module; the asset combing module combs the data assets according to the received data and passes the data to the knowledge extraction module; the knowledge extraction module extracts knowledge from the received data and passes the data to the graph construction module; the graph construction module constructs the corresponding knowledge graph from the received data and passes the data to the graph iteration module; the graph iteration module performs graph iteration on the constructed knowledge graph based on natural language processing technology and passes the data to the root cause positioning module; and the root cause positioning module searches the resulting knowledge graph based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the power anomaly data.
In the knowledge graph-based power anomaly data root cause positioning method and system provided by the invention, a knowledge graph is constructed computationally, analysis algorithms and regular expressions from natural language processing are combined, and the source of the abnormal data is analyzed intelligently based on a parallel computing algorithm; the method can significantly improve the efficiency of abnormal data root cause positioning, and has high reliability and good accuracy.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a functional block diagram of the system of the present invention.
Detailed Description
FIG. 1 is a schematic flow chart of the method of the present invention. The knowledge graph-based power anomaly data root cause positioning method provided by the invention comprises the following steps:
S1, acquiring data information of a target power system;
S2, combing the data assets according to the data information acquired in step S1; specifically:
the data assets include the source-end data systems (e.g., PMS, CMS, etc.); the data are imported into a set data path in the form of a physical model, and at the same time the correspondence between tables and fields in the data assets, together with the detailed path of each table, is entered according to the designed data model;
S3, extracting knowledge from the data according to the combing result of step S2; specifically:
acquiring the data content of the combed data assets;
constructing triples from the acquired data according to the entity-relation-entity data structure;
the entities comprise roles, businesses, processes, data, rules and rectifications; the relations comprise the role-business relation, the business-process relation, the process-data relation, the data-rule relation and the rule-rectification relation;
S4, constructing a corresponding knowledge graph according to the knowledge extraction result obtained in step S3; specifically:
converting the knowledge extraction result into vectors in the form of one-hot vectors;
performing entity alignment on the data; for example, the same marketing data term may have different meanings in different application scenarios, so entity disambiguation may be required; the different marketing business data are unified, and terms are given different attributes according to the application scenario;
defining and dynamically managing, in a metadata-driven manner, the management relationship between organizations and roles along the vertical and horizontal dimensions, and constructing the graph relations; the graph relations comprise the relation between organization and role, the relation between role and business, the relation between business and data, the relation between data and rule, the relation between rule and abnormal data, the relation between abnormal data and business, and the relation between abnormal data and role;
S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology; specifically:
extracting the keywords of each text in the text set to be processed;
clustering the texts to be processed to generate a plurality of topic text sets;
counting the frequency of occurrence of the seed words in each topic text set, and retaining the topic text sets whose frequency exceeds a set threshold as the source text sets for domain dictionary expansion;
calculating the degree of association between the seed words and each candidate word in the texts of the source text sets, and storing the candidate words whose degree of association reaches a set threshold in the expanded dictionary as domain words;
regenerating the relations between entities: reconstructing the association relations of the entities in the graph by combining the historical entities with the newly generated entities;
updating the knowledge graph nodes and the relations between the nodes.
The seed words are obtained through ambiguity-aware word segmentation and unknown word identification;
the ambiguity-aware word segmentation comprises the following steps:
detection of segmentation ambiguity: obtaining the probability of each candidate segmentation through a trained sequence labeling model, and selecting the several segmentations with the highest probabilities;
resolution of segmentation ambiguity: obtaining the globally optimal segmentation of the text through a conditional random field model, and taking it as the final word segmentation result; the formula of the conditional random field model is:
$$P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)$$
where $P(y \mid x)$ is the conditional probability of the state sequence $y$ given the observation sequence $x$; $\lambda_k$ is the transition feature weight; $t_k$ is the transition feature function; $y_i$ is the state at time step $i$; $x$ is the observation sequence; $i$ is the time step index; $\mu_l$ is the state feature weight; $s_l$ is the state feature function; $Z(x)$ is the normalization term, with
$$Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)$$
and $y$ is a state sequence;
the identification of unknown words comprises the following steps:
comparison with the existing lexicon: comparing the segmented words with the existing lexicon, screening out the words that are not in the lexicon, and taking those whose frequency exceeds a set value as unknown words;
comparison with industry proper nouns: recognizing words in the segmented text through a proper noun dictionary, and taking the recognition results as unknown words of the text;
the dictionary-based method can recognize only words that exist in the dictionary; the method used is the forward maximum matching (MM) method, whose effectiveness depends on the coverage of the dictionary, so new words need to be added to the dictionary regularly;
S6, searching the knowledge graph obtained in step S5 based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the power anomaly data; specifically, based on the constructed knowledge graph, the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm are used to map the tables and fields involved in the abnormal data to nodes in the graph data, and to locate the business process in which the abnormal data are generated together with all entities and relations involved in that process, thereby identifying the process, link and data item in which the abnormal data are generated and achieving root cause positioning of the abnormal data.
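The forward maximum matching (MM) method mentioned under step S5 can be sketched as follows. The function name `forward_max_match`, the fallback to single characters for text not covered by the dictionary, and the sample dictionary are assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text greedily, always taking the longest dictionary word first."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest window first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)  # single characters are the fallback
                i += size
                break
    return tokens


# Illustrative domain dictionary and input text.
dictionary = {"电力", "数据", "异常", "电力数据"}
tokens = forward_max_match("电力数据异常", dictionary)
```

The sketch makes the dependence on dictionary coverage visible: any word missing from `dictionary` is broken into single characters, which is why the dictionary has to be extended regularly.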
In a specific implementation, the breadth-first algorithm comprises the following steps:
the breadth-first search algorithm starts from a given node; it first visits and records all neighboring nodes of that node, then visits the neighbors of those neighboring nodes, skipping any node that has already been visited, until the target node is reached;
suppose the shortest path between nodes α and β in the node set p is to be found:
first, visit all neighboring nodes of node α; the visited nodes are recorded in a set T and the existing paths in a set S [formula images BDA0004046028530000101 and BDA0004046028530000102 are not reproduced in this text];
then, traverse all of these neighboring nodes and visit every not-yet-visited neighbor of each of them, updating T accordingly [formula images BDA0004046028530000103 to BDA0004046028530000105 are not reproduced in this text]; at the same time, record the access paths and store them in the set S;
repeat these steps until node β is visited, at which point the shortest path is obtained from the set S;
the parallel breadth-first search algorithm means that, during the search, the next round of searching and visiting can be started from several nodes at the same time without the nodes interfering with one another, which greatly improves efficiency;
in a specific implementation, the depth-first algorithm comprises the following steps:
the principle of the depth-first search algorithm is that, after a neighboring node has been found from a node, the search continues from that neighboring node to its next neighbor, until the target node is reached, or the current node has already been visited, or no neighboring node exists;
suppose the shortest path between nodes α and β in the node set p is to be found:
first, visit one neighboring node $\alpha_1$ of node α; with the set T recording the visited nodes and the set S recording the existing path, $T = \{\alpha, \alpha_1\}$ and $S = \{\alpha \to \alpha_1\}$;
then, visit a neighboring node $\alpha_2$ of node $\alpha_1$, subject to the condition given in formula image BDA0004046028530000106 (not reproduced in this text), at which point $T = \{\alpha, \alpha_1, \alpha_2\}$ and $S = \{\alpha \to \alpha_1 \to \alpha_2\}$;
repeat these steps, restarting from node α when no unvisited neighboring node exists, until node β is visited, at which point the shortest path is obtained from the set S;
the parallel depth-first search algorithm means that, during the search, the next round of searching and visiting can be carried out from several nodes at the same time without the nodes interfering with one another, which greatly improves efficiency;
in a specific implementation, the shortest path algorithm comprises the following steps:
the single-source shortest path search algorithm supports finding the shortest path over weighted edges. Its main principle is as follows: let G be the set of all vertices; maintain a vertex set S that is expanded by repeated greedy selection, and let V = G - S. A vertex belongs to S if and only if the shortest path length from the source to that vertex is known. Initially, S contains only the source, i.e. the starting point. Let u be a vertex of G. A path from the source to u whose intermediate vertices all lie in S is called a special path from the source to u. A matrix A records the length of the shortest special path currently known for each vertex. Each time, the vertex with the shortest special path length is taken out of V and added to S, and A is updated; in the end, A records the shortest path length from the source to every other vertex;
suppose the shortest path between nodes α and β is to be found:
initialize $\mathrm{dis}(\alpha_0) = 0$, with $\alpha_0 = \alpha$;
find the undetermined point with the shortest distance from vertex $\alpha_0$, and mark that point as determined;
traverse all edges starting from $\alpha_0$, obtaining $(\alpha_0, \alpha_1, d)$, where $\alpha_1$ is a node and $d$ is a distance; if $\mathrm{dis}(\alpha_1) > \mathrm{dis}(\alpha_0) + d$, update $\mathrm{dis}(\alpha_1)$ to $\mathrm{dis}(\alpha_0) + d$;
repeat the two preceding steps until all points are marked as points whose shortest path is determined; the finally determined path is the shortest path.
FIG. 2 is a schematic diagram of the functional modules of the system of the present invention. The system for implementing the knowledge graph-based power anomaly data root cause positioning method comprises a data acquisition module, an asset combing module, a knowledge extraction module, a graph construction module, a graph iteration module and a root cause positioning module, connected in series in that order. The data acquisition module acquires data information of the target power system and passes the data to the asset combing module; the asset combing module combs the data assets according to the received data and passes the data to the knowledge extraction module; the knowledge extraction module extracts knowledge from the received data and passes the data to the graph construction module; the graph construction module constructs the corresponding knowledge graph from the received data and passes the data to the graph iteration module; the graph iteration module performs graph iteration on the constructed knowledge graph based on natural language processing technology and passes the data to the root cause positioning module; and the root cause positioning module searches the resulting knowledge graph based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm to complete the root cause positioning of the power anomaly data.

Claims (9)

1. A knowledge graph-based power anomaly data root cause positioning method, comprising the following steps:
S1, acquiring data information of a target power system;
S2, combing the data assets according to the data information acquired in step S1;
S3, extracting knowledge from the data according to the combing result of step S2;
S4, constructing a corresponding knowledge graph according to the knowledge extraction result obtained in step S3;
S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology;
S6, searching the knowledge graph obtained in step S5 based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm, thereby completing the root cause positioning of the power anomaly data.
2. The knowledge graph-based power anomaly data root cause positioning method according to claim 1, wherein combing the data assets in step S2 according to the data information acquired in step S1 specifically comprises the following steps:
the data assets include the source-end data systems; the data are imported into a set data path in the form of a physical model, and the correspondence between tables and fields in the data assets, together with the detailed path of each table, is entered according to the designed data model.
3. The knowledge graph-based power anomaly data root cause positioning method according to claim 2, wherein extracting knowledge from the data in step S3 according to the combing result of step S2 specifically comprises the following steps:
acquiring the data content of the combed data assets;
constructing triples from the acquired data according to the entity-relation-entity data structure;
the entities comprise roles, businesses, processes, data, rules and rectifications; the relations comprise the role-business relation, the business-process relation, the process-data relation, the data-rule relation and the rule-rectification relation.
4. The knowledge graph-based power anomaly data root cause positioning method according to claim 3, wherein constructing a corresponding knowledge graph in step S4 according to the knowledge extraction result obtained in step S3 specifically comprises the following steps:
converting the knowledge extraction result into vectors in the form of one-hot vectors;
performing entity alignment on the data;
defining and dynamically managing, in a metadata-driven manner, the management relationship between organizations and roles along the vertical and horizontal dimensions, and constructing the graph relations; the graph relations comprise the relation between organization and role, the relation between role and business, the relation between business and data, the relation between data and rule, the relation between rule and abnormal data, the relation between abnormal data and business, and the relation between abnormal data and role.
5. The method for locating the root cause of the abnormal power data based on the knowledge graph as claimed in claim 4, wherein step S5, performing graph iteration on the knowledge graph constructed in step S4 based on natural language processing technology, specifically comprises the following steps:
extracting keywords of each text in the text set to be processed;
clustering the texts to be processed to generate a plurality of topic text sets;
counting the frequency of occurrence of seed words in each topic text set, retaining the topic text sets whose frequency exceeds a set threshold, and using them as the source text sets for domain dictionary expansion;
calculating the degree of association between the seed words and each candidate word in the texts of the source text sets, and storing the candidate words whose degree of association reaches a set threshold in the expanded dictionary as domain words;
regenerating the relationships between entities: reconstructing the association relationships of the entities in the graph by combining the historical entities with the newly generated entities;
and updating the knowledge graph nodes and the relationships between the nodes.
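The dictionary expansion steps above can be sketched as follows. The claim does not say how the degree of association is computed, so this example assumes a simple co-occurrence ratio (the share of texts containing a candidate word that also contain a seed word). The function names, thresholds and sample texts are hypothetical.

```python
def select_source_sets(topic_sets, seed_words, min_freq):
    """Keep the topic text sets in which seed words occur more than min_freq times."""
    kept = []
    for texts in topic_sets:
        freq = sum(token in seed_words for text in texts for token in text)
        if freq > min_freq:
            kept.append(texts)
    return kept


def expand_dictionary(source_sets, seed_words, min_assoc):
    """Return candidate words whose association with the seed words reaches min_assoc.

    Association is ASSUMED to be the fraction of texts containing the candidate
    that also contain at least one seed word.
    """
    seeds = set(seed_words)
    texts = [set(text) for texts in source_sets for text in texts]
    candidates = set().union(*texts) - seeds
    expanded = set()
    for word in candidates:
        containing = [t for t in texts if word in t]
        with_seed = [t for t in containing if t & seeds]
        if len(with_seed) / len(containing) >= min_assoc:
            expanded.add(word)
    return expanded


# Hypothetical, already-tokenized topic text sets.
topic_sets = [
    [["meter", "fault", "voltage"], ["meter", "reading", "voltage"], ["voltage", "drop"]],
    [["weather", "rain"]],
]
seed_words = {"meter"}
source_sets = select_source_sets(topic_sets, seed_words, min_freq=1)
domain_words = expand_dictionary(source_sets, seed_words, min_assoc=0.5)
```

Other association measures, such as pointwise mutual information, could replace the ratio without changing the overall flow.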
6. The knowledge graph-based electric power anomaly data root cause positioning method according to claim 1, characterized in that the seed words are obtained by using word ambiguity segmentation and unknown word recognition;
the ambiguity segmentation of words comprises the following steps:
detection of segmentation ambiguity: obtaining the probability of each candidate segmentation through a trained sequence labeling model, and selecting the several segmentations with the highest probabilities;
resolution of segmentation ambiguity: obtaining the globally optimal segmentation of the text through a conditional random field model, and taking this segmentation as the final word segmentation result; the formula for the conditional random field model is:
Figure FDA0004046028520000031
wherein P(y|x) is the conditional probability of the state sequence y given the observation sequence x; λ_k is the transition feature coefficient; t_k is the transition feature function; y_i is the state at time i; x is the observation sequence; i is the time index; μ_l is the state feature coefficient; s_l is the state feature function; Z(x) is the normalization term, and
Figure FDA0004046028520000032
y is a state sequence;
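The two formulas of this claim survive here only as image references. For orientation, the standard linear-chain conditional random field, written with the symbols defined above, has the form below. This is the textbook expression and an assumption about what the images contain, not a transcription of them.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l \, s_l(y_i, x, i) \right)

Z(x) = \sum_{y} \exp\!\left( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l \, s_l(y_i, x, i) \right)
```

In this form, Z(x) sums over all possible state sequences y, so that P(y|x) is a valid probability distribution.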
the identification of the unknown words comprises the following steps:
comparison with the existing lexicon: comparing the segmented words with the existing lexicon, screening out the words that are not in the lexicon, and taking those whose frequency exceeds a set value as unknown words;
comparison with industry proper nouns: recognizing words in the segmented text through a proper noun dictionary, and taking the recognition results as unknown words of the text.
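The two unknown-word checks can be sketched as a single function. This is a minimal illustration: `find_unknown_words`, its parameters and the sample tokens are hypothetical names, and combining the two results by set union is an assumption.

```python
from collections import Counter


def find_unknown_words(tokens, lexicon, proper_nouns, min_count):
    """Collect unknown words from segmented text using the two checks in the claim."""
    counts = Counter(tokens)
    # Check 1: words absent from the lexicon whose frequency exceeds the set value.
    frequent_new = {w for w, c in counts.items() if w not in lexicon and c > min_count}
    # Check 2: words recognized by the industry proper-noun dictionary.
    matched_proper = {w for w in counts if w in proper_nouns}
    return frequent_new | matched_proper


# Hypothetical segmented text and dictionaries.
tokens = ["line_loss", "line_loss", "line_loss", "meter", "AMI", "typo"]
unknown = find_unknown_words(tokens, lexicon={"meter"}, proper_nouns={"AMI"}, min_count=2)
```

The frequency threshold filters out one-off segmentation errors such as `typo` in the sample.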
7. The method for locating the root cause of the abnormal power data based on the knowledge graph as claimed in claim 1, wherein step S6, searching the knowledge graph obtained in step S5 based on the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm to complete the root cause location of the abnormal power data, specifically comprises: based on the constructed knowledge graph, using the breadth-first algorithm, the depth-first algorithm and the shortest path algorithm to map the tables and fields related to the abnormal data to nodes in the graph data, and locating the business process in which the abnormal data were generated together with all entities and relations related to that process, thereby identifying the processes, links and data items in which the abnormal data were generated and locating the root cause of the abnormal data.
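One way to picture the mapping from tables and fields to graph nodes is sketched below. It assumes a lookup from (table, field) pairs to node names and walks edges backwards to collect the upstream entities. The `trace_upstream` helper, the `node_index` lookup and the sample triples are hypothetical and not taken from the patent.

```python
from collections import deque


def trace_upstream(edges, node_index, table, field):
    """Return (entity, relation) pairs reachable backwards from an abnormal data item.

    edges: list of (head, relation, tail) triples.
    node_index: hypothetical lookup from (table, field) to a node name.
    """
    start = node_index.get((table, field))
    if start is None:
        return []
    parents = {}
    for head, relation, tail in edges:
        parents.setdefault(tail, []).append((head, relation))
    found, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for head, relation in parents.get(node, []):
            if head not in seen:
                seen.add(head)
                found.append((head, relation))
                queue.append(head)
    return found


# Hypothetical graph fragment and table/field index.
edges = [
    ("meter reader", "role-service", "meter reading"),
    ("meter reading", "service-process", "data collection"),
    ("data collection", "process-data", "daily_energy.kwh"),
]
node_index = {("daily_energy", "kwh"): "daily_energy.kwh"}
upstream = trace_upstream(edges, node_index, "daily_energy", "kwh")
```

For the sample, the trace lists the process first, then the service, then the role, which matches the order in which a reviewer would walk back from the abnormal field.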
8. The method for locating the root cause of the abnormal power data based on the knowledge graph according to claim 1, wherein the breadth-first algorithm specifically comprises the following steps:
to find the shortest path between nodes α and β in the node set p:
first accessing all the neighboring nodes of node α (Figure FDA0004046028520000041), recording the visited nodes in set T and the existing paths in set S, at which time: Figure FDA0004046028520000042;
then traversing all the neighboring nodes (Figure FDA0004046028520000043) and accessing all unvisited neighbor nodes of each neighboring node (Figure FDA0004046028520000044), at which time: Figure FDA0004046028520000045; simultaneously recording the access paths and storing them in set S;
repeating the above steps until node β is accessed, at which point the shortest path is obtained from set S;
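The breadth-first steps can be sketched as follows, with `visited` playing the role of set T and the queue of partial paths playing the role of set S. The adjacency-dict representation and the sample graph are assumptions made for illustration.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    visited = {start}         # set T: nodes already visited
    paths = deque([[start]])  # set S: paths recorded so far
    while paths:
        path = paths.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                paths.append(path + [neighbor])
    return None


# Hypothetical unweighted graph as an adjacency dict.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
bfs_path = bfs_shortest_path(graph, "a", "e")
```

Because nodes are expanded level by level, the first path that reaches the goal has the fewest edges in an unweighted graph.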
the depth-first algorithm specifically comprises the following steps:
to find the shortest path between nodes α and β in the node set p:
first accessing one neighboring node α_1 of node α; with set T recording the visited nodes and set S recording the existing path, T = {α, α_1} and S = {α → α_1};
then accessing one neighboring node α_2 of node α_1, with α_2 satisfying Figure FDA0004046028520000046; at this time T = {α, α_1, α_2} and S = {α → α_1 → α_2};
repeating the above steps, and restarting from node α when no unvisited neighboring node exists, until node β is visited, at which point the shortest path is obtained from set S;
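A depth-first search along these lines can be sketched as follows, with `visited` as set T and `path` as set S. This sketch backtracks one step at a time rather than restarting from α, which is a simplification. Note that depth-first search returns a path from α to β but, in general, not necessarily the one with the fewest edges. The adjacency-dict graph is an assumption.

```python
def dfs_path(graph, start, goal):
    """Return some path from start to goal found depth-first, or None."""
    visited = {start}  # set T: nodes already visited
    path = [start]     # set S: the path currently being extended

    def visit(node):
        if node == goal:
            return True
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                if visit(neighbor):
                    return True
                path.pop()  # dead end: backtrack one step
        return False

    return path if visit(start) else None


# Hypothetical graph as an adjacency dict.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
found_path = dfs_path(graph, "a", "e")
```

For very deep graphs, an explicit stack would avoid Python's recursion limit.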
the shortest path algorithm specifically comprises the following steps:
to find the shortest path between nodes α and β:
initializing dis(α_0) = 0, with α_0 = α;
finding the undetermined point with the shortest distance from the vertex α_0, and marking that point as a determined point on the path;
traversing all edges starting from α_0 to obtain (α_0, α_1, d); if dis(α_1) > dis(α_0) + d, updating dis(α_1) to dis(α_0) + d, where d is a distance and α_1 is a node;
repeating the above two steps until all points are marked as determined points of the shortest path; the finally determined path is the shortest path.
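The initialization, selection and update steps of the shortest path algorithm match the usual Dijkstra scheme, which can be sketched as follows. The priority queue, the predecessor map and the sample weighted graph are implementation choices assumed for this example, and edge distances are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, prev) for shortest distances from source.

    graph maps a node to a list of (neighbor, distance) pairs.
    """
    dist = {source: 0}  # dis(alpha_0) = 0
    prev = {}
    determined = set()  # points already marked as determined
    heap = [(0, source)]
    while heap:
        d0, u = heapq.heappop(heap)  # closest undetermined point
        if u in determined:
            continue
        determined.add(u)
        for v, d in graph.get(u, []):
            if v not in dist or dist[v] > d0 + d:  # dis(a1) > dis(a0) + d
                dist[v] = d0 + d
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev


def shortest_path(graph, source, target):
    """Rebuild the shortest path from the predecessor map, or return None."""
    dist, prev = dijkstra(graph, source)
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical weighted graph.
graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)]}
dist, _ = dijkstra(graph, "a")
best = shortest_path(graph, "a", "d")
```

In the sample, the direct edge from `a` to `b` costs 4, but the route through `c` costs 3, so the update step replaces the earlier distance.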
9. A system for implementing the knowledge graph-based power anomaly data root cause positioning method according to any one of claims 1 to 8, characterized by comprising a data acquisition module, an asset combing module, a knowledge extraction module, a graph construction module, a graph iteration module and a root cause positioning module; the data acquisition module, the asset combing module, the knowledge extraction module, the graph construction module, the graph iteration module and the root cause positioning module are connected in series in that order; the data acquisition module is used for acquiring data information of the target power system and uploading the data to the asset combing module; the asset combing module is used for combing the data assets according to the received data and uploading the data to the knowledge extraction module; the knowledge extraction module is used for extracting knowledge from the data according to the received data and uploading the data to the graph construction module; the graph construction module is used for constructing the corresponding knowledge graph according to the received data and uploading the data to the graph iteration module; the graph iteration module is used for performing graph iteration on the constructed knowledge graph according to the received data based on natural language processing technology and uploading the data to the root cause positioning module; and the root cause positioning module is used for searching the obtained knowledge graph based on a breadth-first algorithm, a depth-first algorithm and a shortest path algorithm according to the received data to complete the root cause positioning of the abnormal power data.
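The series connection of the six modules can be pictured as a simple pipeline in which each module receives the previous module's output. In the sketch below the stages are placeholders that only record their own name. `make_stage`, `run_pipeline` and the payload layout are hypothetical, and the module labels are illustrative.

```python
from functools import reduce

# Module order as described in the claim (labels are illustrative).
MODULE_ORDER = [
    "data acquisition",
    "asset combing",
    "knowledge extraction",
    "graph construction",
    "graph iteration",
    "root cause positioning",
]


def make_stage(name):
    """Build a placeholder module that records its name and passes the payload on."""
    def stage(payload):
        return {**payload, "trace": payload["trace"] + [name]}
    return stage


def run_pipeline(stages, payload):
    """Feed the payload through the stages in series."""
    return reduce(lambda data, stage: stage(data), stages, payload)


result = run_pipeline([make_stage(name) for name in MODULE_ORDER], {"trace": []})
```

A real module would replace the placeholder body with its own processing while keeping the same input and output shape.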
CN202310029233.3A 2023-01-09 2023-01-09 Knowledge graph-based power anomaly data root cause positioning method and system Pending CN115983250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310029233.3A CN115983250A (en) 2023-01-09 2023-01-09 Knowledge graph-based power anomaly data root cause positioning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310029233.3A CN115983250A (en) 2023-01-09 2023-01-09 Knowledge graph-based power anomaly data root cause positioning method and system

Publications (1)

Publication Number Publication Date
CN115983250A (en) 2023-04-18

Family

ID=85962905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310029233.3A Pending CN115983250A (en) 2023-01-09 2023-01-09 Knowledge graph-based power anomaly data root cause positioning method and system

Country Status (1)

Country Link
CN (1) CN115983250A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186350A (en) * 2023-04-23 2023-05-30 浙江大学 Power transmission line engineering searching method and device based on knowledge graph and topic text
CN116186350B (en) * 2023-04-23 2023-07-25 浙江大学 Power transmission line engineering searching method and device based on knowledge graph and topic text
CN116562852A (en) * 2023-05-17 2023-08-08 国网安徽省电力有限公司黄山供电公司 Distribution network power failure information management system based on knowledge graph
CN116562852B (en) * 2023-05-17 2024-06-04 国网安徽省电力有限公司黄山供电公司 Distribution network power failure information management system based on knowledge graph
CN117094688A (en) * 2023-10-20 2023-11-21 国网信通亿力科技有限责任公司 Digital control method and system for power supply station
CN117094688B (en) * 2023-10-20 2023-12-19 国网信通亿力科技有限责任公司 Digital control method and system for power supply station

Similar Documents

Publication Publication Date Title
CN115983250A (en) Knowledge graph-based power anomaly data root cause positioning method and system
Ahmed et al. Learning role-based graph embeddings
CN110633366B (en) Short text classification method, device and storage medium
CN113535974B (en) Diagnostic recommendation method and related device, electronic equipment and storage medium
Zanghi et al. Strategies for online inference of model-based clustering in large and growing networks
CN103488790A (en) Polychronic time sequence similarity analysis method based on weighting BORDA counting method
CN111191051B (en) Method and system for constructing emergency knowledge map based on Chinese word segmentation technology
CN109857457A (en) A kind of function level insertion representation method learnt in source code in the hyperbolic space
CN116383422B (en) Non-supervision cross-modal hash retrieval method based on anchor points
CN117149974A (en) Knowledge graph question-answering method for sub-graph retrieval optimization
CN116881430A (en) Industrial chain identification method and device, electronic equipment and readable storage medium
CN110737779B (en) Knowledge graph construction method and device, storage medium and electronic equipment
WO2022188646A1 (en) Graph data processing method and apparatus, and device, storage medium and program product
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
Han et al. DeepRouting: A deep neural network approach for ticket routing in expert network
CN114491081A (en) Electric power data tracing method and system based on data blood relationship graph
CN111753151B (en) Service recommendation method based on Internet user behavior
CN112685452A (en) Enterprise case retrieval method, device, equipment and storage medium
CN110502669B (en) Social media data classification method and device based on N-edge DFS subgraph lightweight unsupervised graph representation learning
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
CN116821087A (en) Power transmission line fault database construction method, device, terminal and storage medium
Hadiji et al. Computer science on the move: Inferring migration regularities from the web via compressed label propagation
CN114124417B (en) Vulnerability assessment method with enhanced expandability under large-scale network
CN113821650A (en) Information retrieval system based on big data
Yu et al. Workflow recommendation based on graph embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination