CN115277453A - Method for generating abnormal knowledge graph in operation and maintenance field, application method and device - Google Patents
Method for generating abnormal knowledge graph in operation and maintenance field, application method and device
- Publication number
- CN115277453A, CN202210664886.4A
- Authority
- CN
- China
- Prior art keywords
- abnormal
- map
- sub
- fault
- indexes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/064—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/065—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a method for generating an abnormal knowledge graph in the operation and maintenance field, an application method and a device. The generation method comprises the following steps: determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data of the performance indexes to determine abnormal indexes; grouping the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes; determining the similarity between an abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity; and labeling the domain knowledge map with fault information, including a fault name and a fault solution, based on the experience of operation and maintenance experts, to obtain the operation and maintenance field abnormal knowledge graph of the target system. The method and the device can automatically generate the abnormal knowledge graph in the operation and maintenance field from the abnormal data generated when an abnormal event occurs.
Description
Technical Field
The invention relates to the technical field of computers, and in particular to a method for generating an abnormal knowledge graph in the operation and maintenance field, an application method and a device.
Background
In a large computer cluster environment such as an IT information system, the deployment of software and hardware is complex. When a fault occurs, it can be described by the performance indexes that become abnormal, and accumulating such fault knowledge into a knowledge graph provides experience for handling subsequent faults. Existing methods for constructing a knowledge graph from fault scenarios fall into two kinds: one relies on expert experience, and the other summarizes the results of fault simulation.
Relying on expert experience: experts summarize typical fault scenarios from their own experience, manually compile them into a knowledge graph, and add corresponding solutions, which provides a reference for subsequent fault judgment and fault resolution.
Summarizing from fault simulation: a common approach is to simulate as many fault scenarios as possible, for example with chaos testing tools or service instrumentation points, then manually count the abnormal indexes generated when the faults occur, and summarize the abnormal indexes and fault phenomena into a knowledge graph.
It can be seen that both approaches require manual statistics on the abnormal indexes associated with a fault scenario. Generating a general-purpose knowledge graph through manual statistics has the following defects. When faults are summarized manually, a few performance indexes with obvious abnormal characteristics are usually taken to represent a fault scenario. In a real system environment, however, a fault produces a large number of abnormal indexes within a period of time, and these form a multi-dimensional abnormal relationship; a few performance indexes cannot describe the fault accurately, which makes subsequent fault location more difficult. Filtering abnormal indexes by thresholds set according to the experience of operation and maintenance experts can detect a large number of abnormal indexes, but it easily produces many false alarms, in which normal indexes are identified as abnormal, and this consumes considerable operation and maintenance effort. Manually extracting fault knowledge from abnormal indexes is costly, prone to false alarms and missed alarms, and slow; it cannot run uninterrupted around the clock, and it cannot extract abnormal indexes by time window and convert them into fault knowledge, so the sampled fault knowledge tends to be incomplete and inaccurate.
Disclosure of Invention
The invention provides a method for generating an abnormal knowledge map in an operation and maintenance field, an application method and a device, which are used for overcoming the defects existing in the generation of the abnormal knowledge map through manual statistics and can realize the automatic generation of the abnormal knowledge map in the operation and maintenance field according to abnormal data generated when an abnormal event occurs.
In a first aspect, the invention provides a method for generating an abnormal knowledge graph in the operation and maintenance field, which comprises the following steps:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution.
According to the method for generating the abnormal knowledge graph in the operation and maintenance field, the abnormal indexes are grouped according to the component types of the performance indexes based on the preset time window, and the corresponding abnormal sub-graph is constructed based on the grouping of the abnormal indexes, and the method comprises the following steps:
dividing the time sequence data of the abnormal indexes according to a preset time window, and determining a first proportion of the abnormal data of each abnormal index in the time sequence data of the current time window;
determining the abnormal index of which the first proportion is larger than a preset first threshold value as a target abnormal index in the corresponding time window;
and grouping the target abnormal indexes in one time window according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-map based on the grouping of the target abnormal indexes.
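The window division and first-proportion filtering described in the steps above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `select_target_anomalies`, the tuple layout of the input, the 600-second window and the 0.5 first threshold are all assumptions chosen for the example.

```python
from collections import defaultdict

def select_target_anomalies(points, window_seconds=600, first_threshold=0.5):
    """Return {window_index: set of target abnormal index names}.

    points: iterable of (timestamp_seconds, index_name, is_anomalous) samples.
    The layout and the default values are assumptions for illustration only.
    """
    total = defaultdict(int)     # (window, index) -> number of samples
    abnormal = defaultdict(int)  # (window, index) -> number of anomalous samples
    for ts, name, is_anomalous in points:
        key = (int(ts // window_seconds), name)
        total[key] += 1
        if is_anomalous:
            abnormal[key] += 1

    targets = defaultdict(set)
    for (window, name), count in total.items():
        first_proportion = abnormal[(window, name)] / count
        if first_proportion > first_threshold:
            targets[window].add(name)
    return dict(targets)

samples = [
    (0, "host.cpu", True), (60, "host.cpu", True), (120, "host.cpu", False),
    (0, "db.conn", False), (60, "db.conn", True), (120, "db.conn", False),
    (600, "host.cpu", False),
]
result = select_target_anomalies(samples)
```

In this sample, `host.cpu` is anomalous in two of three samples in the first window and is kept, while `db.conn` (one of three) is filtered out.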
According to the method for generating the abnormal knowledge graph in the operation and maintenance field, determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as the domain knowledge map based on the similarity, comprises the following steps:
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system;
determining a second proportion, namely the share of the network elements of the same type that is made up of the network element generating the abnormal sub-map together with the other network elements whose similarity is larger than a preset second threshold;
determining the abnormal sub-map whose second proportion is larger than a preset third threshold as the domain knowledge map.
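The second-proportion test above can be sketched as follows, under one possible reading of the text: the numerator counts the network element that generated the abnormal sub-map plus every other same-type element whose similarity exceeds the second threshold, and the denominator is the number of network elements of that type. The function names, element names and threshold values are hypothetical.

```python
def second_proportion(similarity_by_element, same_type_count, second_threshold):
    """Share of same-type network elements showing a similar abnormal sub-map.

    similarity_by_element: {other_element_name: similarity to the current sub-map}.
    same_type_count: number of network elements of this type, including the
    element that generated the sub-map (an assumed reading of the text).
    """
    similar_others = sum(
        1 for s in similarity_by_element.values() if s > second_threshold
    )
    return (similar_others + 1) / same_type_count

def is_domain_knowledge_map(similarity_by_element, same_type_count,
                            second_threshold, third_threshold):
    """True when the sub-map is common enough among same-type elements."""
    proportion = second_proportion(
        similarity_by_element, same_type_count, second_threshold
    )
    return proportion > third_threshold

# Hypothetical similarities of host A's sub-map to hosts B, C and D.
sims = {"hostB": 0.9, "hostC": 0.4, "hostD": 0.8}
ratio = second_proportion(sims, same_type_count=4, second_threshold=0.7)
```

Here two other hosts exceed the second threshold, so three of the four hosts count and the sub-map passes a third threshold of 0.5.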
According to the method for generating the abnormal knowledge graph in the operation and maintenance field, determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system comprises, for each abnormal sub-map, the following steps:
determining, between the abnormal sub-map and each historical abnormal sub-map of other network elements of the same type in the target system, the abnormal index similarity, the number of identical abnormal indexes, and the map vector similarity based on node2vec;
ranking the determined abnormal index similarities, the numbers of identical abnormal indexes, and the map vector similarities based on node2vec, each separately;
and, for each historical abnormal sub-map, summing its rankings for the abnormal index similarity, the number of identical abnormal indexes, and the map vector similarity based on node2vec, to obtain the similarity between that historical abnormal sub-map and the abnormal sub-map.
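The rank-and-sum procedure above can be sketched as follows. Several details are assumptions, since the text does not fix them: abnormal index similarity is computed as Jaccard overlap, map vector similarity as cosine similarity over embedding vectors that are taken as already computed (node2vec itself is not implemented here), rank 1 is given to the least similar sub-map so that a larger sum means a closer match, and ties share a rank.

```python
import math

def jaccard(a, b):
    """Overlap of two index sets, used here as the abnormal index similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine similarity of two precomputed map embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def ranks(scores):
    """Rank 1 goes to the lowest score; equal scores share a rank."""
    return {
        name: 1 + sum(1 for other in scores.values() if other < score)
        for name, score in scores.items()
    }

def rank_sum_similarity(current, history):
    """current: (index_set, vector); history: {name: (index_set, vector)}."""
    cur_indexes, cur_vector = current
    index_sim = {n: jaccard(cur_indexes, idx) for n, (idx, _) in history.items()}
    shared = {n: len(cur_indexes & idx) for n, (idx, _) in history.items()}
    vector_sim = {n: cosine(cur_vector, vec) for n, (_, vec) in history.items()}
    r1, r2, r3 = ranks(index_sim), ranks(shared), ranks(vector_sim)
    return {n: r1[n] + r2[n] + r3[n] for n in history}

# Hypothetical sub-maps: index sets plus two-dimensional embeddings.
current = ({"cpu", "mem", "net"}, [1.0, 0.0])
history = {
    "hostB": ({"cpu", "mem", "net"}, [1.0, 0.0]),
    "hostC": ({"cpu"}, [0.0, 1.0]),
    "hostD": ({"cpu", "mem"}, [1.0, 1.0]),
}
scores = rank_sum_similarity(current, history)
```

Summing ranks rather than raw values keeps the three measures comparable even though one is a count and the other two lie between 0 and 1.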
According to the method for generating the abnormal knowledge graph in the operation and maintenance field, the time series data of the performance index is determined based on the collected operation data of the target system, and the abnormal index is determined by performing abnormal detection on the time series data of the performance index, and the method comprises the following steps:
collecting the operation data of the target system based on an agent program, and processing the operation data to obtain the time sequence data of the performance index;
and carrying out anomaly detection on the time series data of the performance index based on 4-sigma, and determining an anomaly index in the time series data.
In a second aspect, the present invention further provides an application method of the abnormal knowledge graph in the operation and maintenance field, including:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
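The matching and solution lookup in the last two steps above can be sketched as follows. The text does not state how a fault sub-map is matched against the labeled abnormal knowledge graphs, so this sketch assumes the best Jaccard overlap of abnormal indexes within the same component type, subject to a hypothetical minimum overlap. The fault names and solutions in the sample data are invented for illustration.

```python
def jaccard(a, b):
    """Overlap of two abnormal index sets (an assumed matching criterion)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_solutions(fault_submaps, knowledge_maps, min_overlap=0.5):
    """Return {component_type: (fault_name, solution)} for matched sub-maps.

    fault_submaps: {component_type: set of abnormal index names}.
    knowledge_maps: list of dicts with keys "component_type", "indexes",
    "fault_name" and "solution" (a hypothetical storage layout).
    """
    results = {}
    for component_type, indexes in fault_submaps.items():
        best, best_score = None, min_overlap
        for entry in knowledge_maps:
            if entry["component_type"] != component_type:
                continue
            score = jaccard(indexes, entry["indexes"])
            if score >= best_score:
                best, best_score = entry, score
        if best is not None:
            results[component_type] = (best["fault_name"], best["solution"])
    return results

# Hypothetical labeled knowledge graphs and fault sub-maps.
knowledge = [
    {"component_type": "host", "indexes": {"cpu", "mem", "net"},
     "fault_name": "host overload", "solution": "add capacity"},
    {"component_type": "database", "indexes": {"tablespace", "cache"},
     "fault_name": "tablespace exhausted", "solution": "extend tablespace"},
]
fault_submaps = {"host": {"cpu", "mem"}, "database": {"max_connections"}}
solutions = match_solutions(fault_submaps, knowledge)
```

In this sample only the host sub-map finds a match; the database sub-map shares no index with the stored entry and yields no solution.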
In a third aspect, the present invention further provides a device for generating an abnormal knowledge graph in the operation and maintenance field, including:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system, and performing abnormal detection on the time sequence data of the performance index to determine the abnormal index in the time sequence data;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
a domain knowledge graph extraction module, configured to determine similarity between the abnormal sub-graph and historical abnormal sub-graphs of other network elements of the same type in the target system, and determine the abnormal sub-graph as a domain knowledge graph based on the similarity;
the marking module of the domain knowledge map is used for marking fault information of the domain knowledge map based on the experience of an operation and maintenance expert to obtain the abnormal knowledge map of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution.
In a fourth aspect, the present invention further provides an application apparatus of an abnormal knowledge graph in the operation and maintenance field, including:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system, and performing abnormal detection on the time sequence data of the performance index to determine the abnormal index in the time sequence data;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
the fault knowledge map generation module is used for splicing the abnormal sub-maps based on the system architecture of the target system, verifying the spliced abnormal sub-maps and generating a fault knowledge map;
the fault knowledge map segmentation module is used for segmenting the fault knowledge map based on the component type to obtain a fault sub-map;
the domain knowledge map matching module is used for matching the fault sub-map with the operation and maintenance domain abnormal knowledge map of the target system and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and the fault solution extraction module is used for obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
In a fifth aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for generating the abnormal knowledge map in the operation and maintenance field according to the first aspect, or the method for applying the abnormal knowledge map in the operation and maintenance field according to the second aspect when executing the program.
In a sixth aspect, the invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for generating the abnormal knowledge map in the operation and maintenance field according to the first aspect, or the method for applying the abnormal knowledge map in the operation and maintenance field according to the second aspect.
In a seventh aspect, the invention further provides a computer program product, on which a computer program is stored, and when the computer program is executed by a processor, the method for generating the operation and maintenance domain anomaly knowledge graph according to the first aspect or the step of the method for applying the operation and maintenance domain anomaly knowledge graph according to the second aspect is implemented.
According to the method and the device for generating the abnormal knowledge graph in the operation and maintenance field, the abnormal knowledge graph is generated automatically from the abnormal data produced when an abnormal event occurs in the target system. Manual participation is not needed, and the abnormal event can be depicted more comprehensively and accurately. Because the abnormal indexes are extracted automatically from the abnormal data, the extraction of the abnormal indexes generated by an abnormal event is comprehensive and accurate. Because fault knowledge is extracted automatically from the abnormal indexes, the cost is low, false alarms and missed alarms are less likely, and timeliness is high; extraction can run uninterrupted around the clock, and the abnormal indexes can be extracted by time window and converted into fault knowledge, so that the sampled fault knowledge is comprehensive and accurate. Because operation and maintenance experts label fault information such as fault names and fault solutions, the generated abnormal knowledge graph can provide a solution targeted at one type of fault, and provides important data support for subsequent fault judgment, fault location and fault handling.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the following briefly introduces the drawings needed for the embodiments or the prior art descriptions, and obviously, the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow chart of a method for generating an abnormal knowledge graph in the operation and maintenance field according to the present invention;
FIGS. 2A, 2B and 2C are schematic diagrams of an anomaly sub-map provided by the present invention;
FIG. 3A is a schematic flow chart of constructing an anomaly sub-map according to the present invention;
FIG. 3B is a schematic flow chart of an application scenario for constructing an anomaly sub-map provided by the present invention;
FIG. 4 is a schematic flow chart of determining a domain knowledge graph provided by the present invention;
FIG. 5 is a schematic flow chart of determining the similarity between an abnormal sub-map and a historical abnormal sub-map according to the present invention;
FIG. 6A is a flow chart of an application method of the abnormal knowledge graph in the operation and maintenance field according to the present invention;
FIG. 6B is a schematic diagram of a failure knowledge graph generated according to the application method of the abnormal knowledge graph in the operation and maintenance field provided by the invention;
FIG. 7 is a schematic diagram of a component structure of an abnormal knowledge graph generation device in the operation and maintenance field according to the present invention;
FIG. 8 is a schematic diagram of a component structure of an application apparatus of an abnormal knowledge graph in the operation and maintenance field provided by the present invention;
FIG. 9 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention is described below with reference to fig. 1 to 5.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for generating an abnormal knowledge graph in an operation and maintenance field according to the present invention, where the method for generating an abnormal knowledge graph in an operation and maintenance field shown in fig. 1 may be executed by a device for generating an abnormal knowledge graph in an operation and maintenance field, and the device for generating an abnormal knowledge graph in an operation and maintenance field may be disposed in a server, for example, the server may be a physical server including an independent host, a virtual server borne by a host cluster, a cloud server, and the like, which is not limited in this embodiment of the present invention. As shown in fig. 1, the method for generating the abnormal knowledge graph in the operation and maintenance field at least includes:
Step 101: determining time series data of the performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time series data of the performance indexes to determine abnormal indexes in the time series data.
In the embodiment of the present invention, the target system may be any system that needs operation and maintenance, for example an IT information system; the embodiment of the present invention does not limit the type of the target system. The time series data of the performance indexes of the target system can be obtained by periodically collecting the operation data of the components in the target system. The embodiment of the invention does not limit the number and types of components from which operation data is collected, the collection interval, or the types of performance indexes derived from the collected operation data. For example, the operation data of all running hardware components and software components in the IT information system may be collected once per minute, and the time series data of the performance indexes of all running hardware components and software components may be determined based on the collected operation data, where the performance indexes of the hardware components may include CPU occupancy of the host, process count, memory usage rate, and the like, and the performance indexes of the software components may include software compatibility, security, maintainability, and the like.
The embodiment of the invention does not limit the implementation method for acquiring the running data of the target system to obtain the time sequence data of the performance index. Optionally, the running data of the target system may be collected based on the agent program, and the running data is processed to obtain time series data of the performance index; or, the running data of the target system can be acquired by other existing automatic data acquisition methods, and the running data is processed to obtain the time series data of the performance index. For example, the Agent technology may be used to collect operation data from a hardware component and a software component in an IT information system, store the collected operation data in a data warehouse, and process and aggregate the data to obtain time series data of performance indexes of the hardware component and the software component in the IT information system, where the method of processing and aggregating the data may be implemented by using an existing method according to the type of the performance index.
In the embodiment of the present invention, after the time series data of the performance indexes of the target system is obtained, the abnormal performance indexes, that is, the abnormal indexes, can be obtained by detecting abnormal data in that time series data. The embodiment of the invention does not limit the method used for anomaly detection on the time series data of the performance indexes. Optionally, an existing general-purpose anomaly detection algorithm may be used, for example the isolation forest algorithm or the Local Outlier Factor (LOF) algorithm. Alternatively, an existing anomaly detection algorithm may be improved and the improved algorithm used. For example, the existing N-sigma algorithm may be improved, where N is the deviation multiple of the threshold; when N = 3 it is the general 3-sigma algorithm. N may be solved for through multiple iterations and finally determined to be 4, which gives an improved 4-sigma algorithm; anomaly detection is then performed on the time series data of the performance indexes based on 4-sigma to determine the abnormal indexes.
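An N-sigma check of the kind described above can be sketched as follows. This is a plain N-sigma rule with N defaulting to 4, not the improved algorithm itself, whose iterative procedure is not detailed in the text. Estimating the mean and standard deviation over the whole series, rather than over a rolling window, is an assumption, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def n_sigma_anomalies(values, n=4.0):
    """Flag each point lying more than n standard deviations from the mean.

    Mean and population standard deviation are taken over the whole series,
    which is an assumption; a rolling baseline would be an equally valid choice.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no deviation, so nothing is flagged.
        return [False] * len(values)
    return [abs(v - mu) > n * sigma for v in values]

# One spike among 30 steady samples of a hypothetical CPU occupancy series.
long_series = [10.0] * 30 + [100.0]
long_flags = n_sigma_anomalies(long_series)

# The same spike among only 10 steady samples.
short_series = [10.0] * 10 + [100.0]
short_flags_4 = n_sigma_anomalies(short_series, n=4.0)
short_flags_3 = n_sigma_anomalies(short_series, n=3.0)
```

With one spike among 30 steady samples the spike is flagged at N = 4. With only 10 steady samples the same spike inflates the standard deviation enough that it passes at N = 3 but not at N = 4, which illustrates why the stricter multiple yields fewer alarms on short series.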
And 102, grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-map based on the grouping of the abnormal indexes.
In the embodiment of the invention, after the time series data of the target system performance indexes are subjected to anomaly detection to obtain the anomaly indexes, the detected anomaly indexes can be grouped according to the time dimension and the component dimension to construct each anomaly sub-map. The abnormal indexes are grouped according to the time dimension, a time window can be preset, the abnormal indexes are divided according to the preset time window, the abnormal indexes are grouped according to the component dimension, the abnormal indexes in each time window can be grouped according to the component type to which the performance index belongs, and therefore an abnormal Chang Zi map is constructed according to the grouping of each abnormal index, and each abnormal sub-map is constructed. The preset time window width is not limited in the embodiment of the present invention, for example, the preset time window width may be 10 minutes. The embodiment of the present invention does not limit the division of the component type to which the performance index belongs, for example, the component type to which the performance index belongs may include a host class index, a database class index, an application class index, a log class index, a call chain class index, an alarm class index, and the like.
For example, the abnormal indexes are divided using 10 minutes as the time window width, and the abnormal indexes in the current time window are grouped and summarized by host class, database class, application class, log class, call chain class, alarm class, and so on. This yields the host class, database class, and application class abnormal indexes of the current time window. Here the host class abnormal indexes are the network rate, CPU occupancy, disk IO speed, and memory utilization (MEM); the database class abnormal indexes are the maximum number of connections, table space capacity, and cache space size; and the application class abnormal indexes are the number of transactions processed by the application software per second (app.tps) and the request response time. As shown in fig. 2A, fig. 2B and fig. 2C, a host abnormal sub-map, a database abnormal sub-map and an application abnormal sub-map can be constructed respectively from these groups, and the constructed abnormal sub-maps are stored in a database.
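The grouping in this example can be sketched as follows. This is a minimal sketch under stated assumptions: the event tuple layout, the metric names, and the representation of a sub-map as an edge list from a component-type node to each abnormal index are all hypothetical, since the structure of the maps in fig. 2A to fig. 2C is not reproduced here.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute window width, as in the example above


def group_anomalies(events, window_seconds=WINDOW_SECONDS):
    """Group (timestamp, component_type, metric) events by time window and component type."""
    groups = defaultdict(set)
    for ts, component_type, metric in events:
        window_start = (ts // window_seconds) * window_seconds
        groups[(window_start, component_type)].add(metric)
    return groups


def build_sub_map(component_type, metrics):
    """Represent a sub-map as edges from the component-type node to each abnormal index."""
    return [(component_type, m) for m in sorted(metrics)]


# Hypothetical abnormal-index events: (seconds since start, component type, metric name).
events = [
    (30, "host", "cpu_usage"),
    (90, "host", "mem_usage"),
    (120, "database", "max_connections"),
    (700, "application", "app.tps"),
]
groups = group_anomalies(events)
sub_maps = {key: build_sub_map(key[1], metrics) for key, metrics in groups.items()}
```

Each key of `sub_maps` is a (window start, component type) pair, so one sub-map is produced per component type per time window.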
103, determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as the domain knowledge map based on the similarity.
In the embodiment of the present invention, after the abnormal indexes are grouped according to the time window and the component type to which the performance indexes belong to construct the abnormal sub-map, the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system can be determined from the historical abnormal sub-maps of the target system. Whether the abnormal sub-map is universal among the network elements of the same type in the target system is judged according to the similarity, and if it is, the abnormal sub-map is determined as the domain knowledge map. The embodiment of the present invention does not limit the method for determining this similarity; for example, it can be determined according to a preset algorithm. In the present invention, a network element may refer to an element in the network of the target system, for example a host, a server, a router, a virtual machine, or an application program. Network elements of the same type may refer to elements having the same or similar functions in the network of the target system; for example, host A and host B are network elements of the same type, whereas host A and virtual machine C are network elements of different types.
104, carrying out fault information labeling on the domain knowledge graph based on the experience of the operation and maintenance expert to obtain an abnormal knowledge graph of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution.
In the embodiment of the invention, after the abnormal sub-map is determined as the domain knowledge map according to the historical abnormal sub-map of the target system, the domain knowledge map can be labeled according to the experience of an operation and maintenance expert, whether the domain knowledge map is a fault or not is labeled, if the domain knowledge map is the fault, the corresponding domain knowledge map is used as an effective domain knowledge map, fault information such as a fault name, a fault solution and the like is further labeled, and the domain knowledge map labeled with the fault information is used as the operation and maintenance domain abnormal knowledge map of the target system and is stored in a knowledge base.
According to the method for generating the abnormal knowledge map in the operation and maintenance field provided by the embodiment of the present invention, the map is generated automatically from the abnormal data produced when an abnormal event occurs in the target system, without manual participation, so that abnormal events can be depicted more comprehensively and accurately. Because the abnormal indexes are extracted automatically from the abnormal data, the comprehensiveness and accuracy of extracting the abnormal indexes produced by an abnormal event can be ensured. Because the fault knowledge is extracted automatically from the abnormal indexes, the cost is low, false alarms and missed alarms are less likely, timeliness is high, and extraction can run continuously 24 hours a day. Because the abnormal indexes are extracted per time window and converted into fault knowledge, the fault knowledge can be sampled comprehensively and accurately. Because operation and maintenance experts label fault information such as the fault name and fault solution, each generated operation and maintenance field abnormal knowledge map can target a specific class of faults and provides important data support for subsequent fault judgment, fault location and fault handling.
Referring to fig. 3A, fig. 3A is a flow diagram illustrating a process of constructing an abnormal sub-graph according to the present invention, as shown in fig. 3A, the abnormal indexes are grouped according to component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-graph based on the grouping of the abnormal indexes at least includes:
301, dividing the time series data of the abnormal indexes according to a preset time window, and determining a first ratio of the abnormal data of each abnormal index in the time series data of the current time window.
302, determining the abnormal index with the first ratio larger than a preset first threshold value as a target abnormal index in the corresponding time window.
303, grouping the target abnormal indexes in a time window according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-maps based on the grouping of the target abnormal indexes.
In the embodiment of the present invention, after the abnormal indexes are obtained through anomaly detection, the time series data of the abnormal indexes may be divided according to the preset time window, and a first ratio is determined for each abnormal index: the proportion of its abnormal data in the time series data of the current time window, that is, the proportion of abnormal detections in the total number of detections in that window. Whether the first ratio is greater than a preset first threshold is then judged. If the first ratio is greater than the preset first threshold, the corresponding abnormal index is determined as a target abnormal index in the time window; if the first ratio is less than or equal to the preset first threshold, it is not. Finally, the target abnormal indexes in each time window are grouped according to the component types to which the performance indexes belong, and corresponding abnormal sub-maps are constructed from the groups of target abnormal indexes, as shown in fig. 3B, which is a flow diagram of an application scenario for constructing the abnormal sub-maps provided by the present invention. The first threshold may be set empirically in advance, and its value is not limited in the embodiment of the present invention; for example, the first threshold may be 10%.
In this embodiment, before the abnormal sub-map is constructed from the groups of abnormal indexes, the abnormal indexes are filtered according to the proportion of each index's abnormal data in its time series data within the time window concerned. Erroneously detected abnormal indexes can thereby be removed, which ensures the correctness of the abnormal indexes used to construct the abnormal sub-map and, in turn, the correctness of the constructed abnormal sub-map.
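The first-ratio filter of 301 and 302 can be sketched as follows. The function name `target_indexes`, the dictionary of per-index anomaly counts, and the sample numbers are assumptions made for illustration; only the 10% example threshold and the strict "greater than" comparison come from the description.

```python
def target_indexes(anomaly_counts, total_detections, first_threshold=0.10):
    """Keep the indexes whose share of abnormal detections in the window exceeds the threshold."""
    if total_detections <= 0:
        return []
    return [
        name
        for name, count in anomaly_counts.items()
        if count / total_detections > first_threshold
    ]


# Hypothetical window with 60 detections per index.
counts = {"cpu_usage": 12, "mem_usage": 3, "disk_io": 6}
kept = target_indexes(counts, total_detections=60)
```

In this sketch an index whose first ratio equals the threshold exactly is not kept, matching the "less than or equal to" branch described above.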
Referring to fig. 4, fig. 4 is a schematic flow chart of determining a domain knowledge graph provided by the present invention. As shown in fig. 4, determining the similarity between an abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in a target system, and determining the abnormal sub-graph as the domain knowledge graph based on the similarity, at least includes:
401, determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system.
402, determining a second ratio, among the network elements of the same type, of the network element generating the abnormal sub-graph together with the other network elements whose similarity is greater than a preset second threshold.
403, determining the abnormal sub-graph whose second ratio is greater than a third threshold as the domain knowledge graph.
In the embodiment of the present invention, after the abnormal sub-maps are constructed based on the abnormal indexes, similarity analysis may be performed one by one between each abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, to judge whether those historical abnormal sub-maps contain an abnormal sub-map whose similarity to the current abnormal sub-map is greater than a preset second threshold. If such an abnormal sub-map exists, a similar fault has occurred in the target system. The network elements having the similar fault, namely the network element that generated the abnormal sub-map and the other network elements whose similarity is greater than the preset second threshold, are then counted to obtain their second ratio among the network elements of the same type. Finally, whether the second ratio is greater than a preset third threshold is judged, and if so, the abnormal sub-map is determined as a domain knowledge map. The second threshold and the third threshold may be set empirically in advance, and their values are not limited in the embodiment of the present invention; for example, the second threshold may be 80% and the third threshold may be 30%.
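The decision in 401 to 403 can be sketched as follows. The function name `is_domain_knowledge`, the input mapping from each other network element to its best similarity against the current sub-map, and the host names are assumptions of this sketch; the 80% and 30% defaults are the example thresholds given above.

```python
def is_domain_knowledge(similarity_by_element, same_type_count,
                        second_threshold=0.80, third_threshold=0.30):
    """Decide whether a sub-map is general enough to become a domain knowledge map.

    similarity_by_element maps each OTHER same-type network element to the highest
    similarity between the current sub-map and that element's historical sub-maps.
    same_type_count is the total number of network elements of this type.
    """
    similar = sum(1 for s in similarity_by_element.values() if s > second_threshold)
    if similar == 0:
        return False  # no similar fault has occurred elsewhere
    # The +1 counts the network element that generated the current sub-map.
    second_ratio = (similar + 1) / same_type_count
    return second_ratio > third_threshold


# Hypothetical system with 10 hosts; the current sub-map comes from host A.
others = {"host_B": 0.92, "host_C": 0.85, "host_D": 0.81, "host_E": 0.40}
general = is_domain_knowledge(others, same_type_count=10)
```

Here four of ten hosts (host A plus three similar ones) share the fault, so the second ratio is 0.4 and exceeds the 30% example threshold.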
Referring to fig. 5, fig. 5 is a schematic flow chart illustrating the determining of the similarity between the abnormal sub-map and the historical abnormal sub-map provided by the present invention, as shown in fig. 5, the determining of the similarity between the abnormal sub-map and the historical abnormal sub-map of other network elements of the same type in the target system at least includes, for each abnormal sub-map:
501, determining the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system.
502, ranking the determined abnormal index similarity, number of identical abnormal indexes, and node2vec-based map vector similarity respectively.
503, summing, for each historical abnormal sub-map, the rankings of its abnormal index similarity, number of identical abnormal indexes, and node2vec-based map vector similarity to obtain the similarity between that historical abnormal sub-map and the abnormal sub-map.
In the embodiment of the present invention, when determining the similarity between the abnormal sub-map and the historical abnormal sub-maps, features for determining the similarity may be generated from the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system. These features may include the abnormal index similarity between the abnormal sub-map and the historical abnormal sub-map, the number of identical abnormal indexes, and the node2vec-based map vector similarity.
The abnormal index similarity between the abnormal sub-map and a historical abnormal sub-map can be calculated with the shape-based distance (SBD) correlation algorithm. SBD is insensitive to errors caused by time shifts of the performance indexes and therefore better reflects the degree of correlation between performance index time series. For example, suppose an abnormal sub-map g has m abnormal indexes, and n abnormal sub-maps of other network elements of the same type as g are selected from the historical abnormal sub-map library, each having k abnormal indexes; the computational complexity is then δ = n × m × k. In implementation, parallel processing can be used to improve the efficiency of computing similarity by map matching. The maximum of the SBD values calculated over all pairs of abnormal indexes, one from the abnormal sub-map and one from the historical abnormal sub-map, can be selected as the abnormal index similarity of the two sub-maps.
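A minimal pure-Python sketch of the pairwise computation is given below. It uses the commonly cited definition of SBD as one minus the maximum normalized cross-correlation over all time shifts. Two points are assumptions of this sketch rather than statements of the embodiment: the series are assumed to be equal in length and already normalized, and the similarity of two sub-maps is taken as the largest value of 1 − SBD over all index pairs so that a larger value means more similar.

```python
import math


def sbd(x, y):
    """Shape-based distance: 1 minus the maximum normalized cross-correlation over all shifts."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    n = len(x)
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        return 1.0  # an all-zero series carries no shape information
    best = -1.0
    for shift in range(-(n - 1), n):
        # Cross-correlation of x with y displaced by `shift` samples.
        cc = sum(x[i] * y[i - shift] for i in range(n) if 0 <= i - shift < n)
        best = max(best, cc / norm)
    return 1.0 - best


def index_similarity(current, historical):
    """Largest (1 - SBD) over all index pairs of two sub-maps given as {name: series}."""
    return max(1.0 - sbd(x, y) for x in current.values() for y in historical.values())


# Hypothetical series: the same spike, shifted by two samples.
g = {"cpu_usage": [0, 1, 0, 0, 0]}
h = {"cpu_usage": [0, 0, 0, 1, 0], "mem_usage": [1, 1, 1, 1, 1]}
similarity = index_similarity(g, h)
```

The two spike series differ only by a time shift, so their SBD is zero, which illustrates why the measure tolerates misaligned performance index data. Comparing one sub-map with m indexes against n historical sub-maps with k indexes each calls `sbd` n × m × k times.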
The number of identical abnormal indexes between the abnormal sub-map and each historical abnormal sub-map can be determined separately; that is, the number of abnormal indexes that the abnormal sub-map g shares with each of the n historical abnormal sub-maps can be taken as a feature.
For the node2vec-based map vector similarity, the abnormal sub-map and the historical abnormal sub-map can each be vectorized with node2vec to obtain 200-dimensional map vectors, and the similarity between the map vectors of the two sub-maps is then determined. node2vec is a graph embedding method that considers both DFS neighborhoods and BFS neighborhoods; it can be regarded as an extension of DeepWalk, namely DeepWalk with random walks that combine DFS and BFS.
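Once node embeddings are available, the comparison of map vectors can be sketched as follows. The sketch does not run node2vec itself; it assumes the node vectors have already been produced by some node2vec implementation. Mean pooling of node vectors into a single map vector and cosine similarity as the comparison measure are both assumptions, since the description does not specify either, and the tiny 2-dimensional vectors stand in for the real ones.

```python
import math


def graph_vector(node_vectors):
    """Mean-pool node embeddings into one map vector (the pooling choice is an assumption)."""
    dim = len(node_vectors[0])
    count = len(node_vectors)
    return [sum(v[d] for v in node_vectors) / count for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical node embeddings for two small sub-maps.
current_nodes = [[1.0, 3.0], [3.0, 5.0]]
historical_nodes = [[2.0, 4.0], [2.0, 4.0]]
vector_similarity = cosine_similarity(
    graph_vector(current_nodes), graph_vector(historical_nodes)
)
```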
Feature fusion is then performed on the generated features to obtain the final map similarity; the feature fusion can adopt a weighting method. The abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity obtained for the abnormal sub-map and the historical abnormal sub-maps are shown in table 1.
TABLE 1
And respectively ranking the abnormal index similarity, the same abnormal index quantity and the map vector similarity based on the node2vec of the abnormal sub-map and the historical abnormal sub-map in the table 1 to obtain a table 2.
TABLE 2
And summing the abnormal index similarity of the historical abnormal sub-maps in the table 2, the number of the same abnormal indexes and the ranking of the map vector similarity based on the node2vec to obtain a table 3.
TABLE 3
The rankings in table 3 are converted into similarities. The normalized exponential function softmax can be used for the conversion: the rankings are normalized to decimals between 0 and 1, and each normalized value is then subtracted from 1 to obtain table 4, which represents the similarity between the abnormal sub-map and each historical abnormal sub-map; the larger the value, the higher the similarity.
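The ranking, summation and softmax conversion described for tables 1 to 4 can be sketched as follows. The feature values are made up for illustration and are not the contents of table 1; the function names, the convention that rank 1 is the most similar, and the handling of ties are likewise assumptions of this sketch.

```python
import math


def rank_descending(values):
    """Rank 1 for the largest value; tied values share the smaller rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def softmax(xs):
    """Normalized exponential function, shifted by the maximum for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fused_similarity(index_sim, same_count, vector_sim):
    """Sum the three per-feature ranks, then map the rank sums to 1 - softmax(rank sum)."""
    ranks = zip(
        rank_descending(index_sim),
        rank_descending(same_count),
        rank_descending(vector_sim),
    )
    rank_sums = [sum(r) for r in ranks]
    return rank_sums, [1.0 - p for p in softmax(rank_sums)]


# Hypothetical features of three historical sub-maps against one current sub-map.
rank_sums, sims = fused_similarity(
    index_sim=[0.9, 0.5, 0.7],
    same_count=[3, 1, 2],
    vector_sim=[0.8, 0.6, 0.7],
)
```

A historical sub-map that ranks first on every feature has the smallest rank sum, receives the smallest softmax weight, and therefore ends up with the largest similarity after subtraction from 1.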
TABLE 4
Referring to fig. 6A, fig. 6A is a schematic flow chart of an application method of an abnormal knowledge graph in the operation and maintenance field according to the present invention, where the application method of the abnormal knowledge graph in the operation and maintenance field shown in fig. 6A may be executed by an application device of the abnormal knowledge graph in the operation and maintenance field, and the application device of the abnormal knowledge graph in the operation and maintenance field may be disposed in a server, for example, the server may be a physical server including an independent host, a virtual server borne by a host cluster, a cloud server, and the like, which is not limited in this embodiment of the present invention. As shown in fig. 6A, the application method of the abnormal knowledge graph in the operation and maintenance field at least includes:
601, determining time sequence data of the performance index based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance index to determine an abnormal index therein.
602, grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes.
603, splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map.
604, segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph.
605, matching the fault sub-map with the operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map.
606, obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by fault knowledge map segmentation.
In the embodiment of the present invention, after the operation and maintenance field abnormal knowledge graph of the target system is obtained, when an abnormal condition occurs in the target system, the abnormal sub-graphs can be obtained based on 601 and 602, the abnormal sub-graphs are then spliced according to the system architecture of the target system, the spliced abnormal sub-graphs are verified, and the fault knowledge graph is finally generated. Fig. 6B is a schematic diagram of a fault knowledge graph generated according to the application method of the abnormal knowledge graph in the operation and maintenance field provided by the present invention. For the description of 601 and 602, refer to the description of 101 and 102 in fig. 1, which is not repeated here. The method for splicing the abnormal sub-graphs is not limited; for example, the abnormal sub-graphs can be spliced using algorithms such as frequent subgraph mining. The method for verifying the spliced abnormal sub-graphs is likewise not limited; for example, they can be verified and confirmed by means of call chains, expert experience, and the like.
After the fault knowledge graph is generated, it can be segmented according to component type to form fault sub-graphs. Each fault sub-graph is then matched against the operation and maintenance field abnormal knowledge graphs of the target system in the knowledge base. If an operation and maintenance field abnormal knowledge graph corresponding to a fault sub-graph is matched, the final fault solution for the abnormal condition of the target system can be obtained from the fault solution labeled on the matched knowledge graph. The matching can be performed by determining the similarity between each fault sub-graph and the operation and maintenance field abnormal knowledge graphs of the target system in the knowledge base.
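The matching and lookup in 605 and 606 can be sketched as follows. This is a simplified stand-in, not the similarity measure of the embodiment: Jaccard similarity over sets of abnormal index names is used here purely for illustration, and the knowledge base layout, the 0.8 matching threshold, and the placeholder fault names and solutions are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of abnormal index names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_solution(fault_sub_map, knowledge_base, min_similarity=0.8):
    """Return (fault name, solution) of the best-matching labeled map, or None."""
    best, best_score = None, min_similarity
    for entry in knowledge_base:
        if entry["component_type"] != fault_sub_map["component_type"]:
            continue  # only compare maps of the same component type
        score = jaccard(entry["indexes"], fault_sub_map["indexes"])
        if score >= best_score:
            best, best_score = entry, score
    return None if best is None else (best["fault_name"], best["solution"])


# Hypothetical knowledge base of labeled operation and maintenance abnormal knowledge maps.
knowledge_base = [
    {"component_type": "host", "indexes": {"cpu_usage", "mem_usage", "disk_io"},
     "fault_name": "host-fault-A", "solution": "solution-A"},
    {"component_type": "database", "indexes": {"max_connections"},
     "fault_name": "db-fault-B", "solution": "solution-B"},
]
fault = {"component_type": "host", "indexes": {"cpu_usage", "mem_usage", "disk_io"}}
result = match_solution(fault, knowledge_base)
```

When several fault sub-graphs are cut from one fault knowledge graph, the function would be called once per sub-graph and the returned solutions combined into the target fault solution.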
The operation and maintenance domain abnormal knowledge map generation device provided by the invention is described below, and the operation and maintenance domain abnormal knowledge map generation device described below and the operation and maintenance domain abnormal knowledge map generation method described above can be referred to in a corresponding manner.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a composition structure of an operation and maintenance domain abnormal knowledge graph generation device according to the present invention, where the operation and maintenance domain abnormal knowledge graph generation device shown in fig. 7 may be disposed in a server for executing the operation and maintenance domain abnormal knowledge graph generation method shown in fig. 1, for example, the server may be a physical server including an independent host, a virtual server carried by a host cluster, a cloud server, and the like, which is not limited in the embodiment of the present invention. As shown in fig. 7, the apparatus for generating an abnormal knowledge graph in the operation and maintenance field at least includes:
The abnormal index detection module 710 is configured to determine time series data of the performance index based on the collected operation data of the target system, and perform abnormal detection on the time series data of the performance index to determine an abnormal index therein.
The abnormal map building module 720 is configured to group the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and build corresponding abnormal sub-maps based on the grouping of the abnormal indexes.
The domain knowledge graph extraction module 730 is configured to determine similarity between the abnormal sub-graph and historical abnormal sub-graphs of other network elements of the same type in the target system, and determine the abnormal sub-graph as the domain knowledge graph based on the similarity.
The domain knowledge map marking module 740 is configured to label fault information of the domain knowledge map based on experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge map of the target system, where the fault information includes: fault name and fault solution.
Optionally, the anomaly map building module 720 includes:
The time division unit is used for dividing the time sequence data of the abnormal indexes according to a preset time window and determining the first proportion of the abnormal data of each abnormal index in the time sequence data of the current time window.
The index filtering unit is used for determining the abnormal index of which the first ratio is greater than a preset first threshold as the target abnormal index in the corresponding time window.
The type grouping unit is used for grouping the target abnormal indexes in a time window according to the component types to which the performance indexes belong and constructing corresponding abnormal sub-maps based on the grouping of the target abnormal indexes.
Optionally, the domain knowledge graph extracting module 730 comprises:
The similarity calculation unit is used for determining the similarity between the abnormal sub-map and the historical abnormal sub-map of other network elements of the same type in the target system.
The network element proportion calculating unit is used for determining a second proportion, among the network elements of the same type, of the network element generating the abnormal sub-map together with the other network elements whose similarity is larger than a preset second threshold value.
The map extraction unit is used for determining the abnormal sub-map with the second proportion larger than the third threshold value as the domain knowledge map.
Optionally, the similarity calculation unit includes:
The feature generating subunit is used for determining, for each abnormal sub-map, the abnormal index similarity, the number of identical abnormal indexes and the node2vec-based map vector similarity between that abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system.
The feature ranking subunit is used for ranking, for each abnormal sub-map, the determined abnormal index similarity, the number of identical abnormal indexes and the node2vec-based map vector similarity respectively.
The similarity calculation subunit is used for summing, for each abnormal sub-map and each of its historical abnormal sub-maps, the rankings of the abnormal index similarity, the number of identical abnormal indexes and the node2vec-based map vector similarity to obtain the similarity between the corresponding historical abnormal sub-map and the abnormal sub-map.
Optionally, the abnormal index detecting module 710 includes:
The index determining unit is used for acquiring the operation data of the target system based on the agent program and processing the operation data to obtain the time sequence data of the performance index.
The abnormality detection unit is used for carrying out abnormality detection on the time series data of the performance indexes based on 4-sigma and determining the abnormal indexes.
The application device of the abnormal knowledge map in the operation and maintenance field provided by the invention is described below, and the application device of the abnormal knowledge map in the operation and maintenance field described below and the application method of the abnormal knowledge map in the operation and maintenance field described above can be referred to correspondingly.
Referring to fig. 8, fig. 8 is a schematic diagram illustrating a composition structure of an application device of an abnormal knowledge graph in the operation and maintenance field according to the present invention, where the application device of the abnormal knowledge graph in the operation and maintenance field shown in fig. 8 may be disposed in a server for executing the application method of the abnormal knowledge graph in the operation and maintenance field shown in fig. 6A, for example, the server may be a physical server including an independent host, a virtual server borne by a host cluster, a cloud server, and the like, which is not limited in this embodiment of the present invention. As shown in fig. 8, the application device of the abnormal knowledge graph in the operation and maintenance field at least includes:
The abnormal index detection module 810 is configured to determine time series data of the performance index based on the acquired operation data of the target system, and perform abnormal detection on the time series data of the performance index to determine an abnormal index therein.
The abnormal map building module 820 is configured to group the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and build corresponding abnormal sub-maps based on the grouping of the abnormal indexes.
The failure knowledge graph generation module 830 is configured to splice the abnormal sub-graphs based on the system architecture of the target system, check the spliced abnormal sub-graphs, and generate a failure knowledge graph.
The failure knowledge graph segmentation module 840 is used for segmenting the failure knowledge graph based on the component type to obtain a failure sub-graph.
The domain knowledge map matching module 850 is used for matching the fault sub-map with the operation and maintenance domain abnormal knowledge map of the target system to determine the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map.
The fault solution extracting module 860 is used for obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
Fig. 9 illustrates a physical structure diagram of an electronic device. As shown in fig. 9, the electronic device may include a processor 910, a communication interface 920, a memory 930, and a communication bus 940, wherein the processor 910, the communication interface 920, and the memory 930 communicate with each other via the communication bus 940. The processor 910 may invoke logic instructions in the memory 930 to perform the method described above, the method comprising:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity between the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution. Alternatively, the method comprises:
determining time sequence data of performance indexes based on the collected running data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
Furthermore, the logic instructions in the memory 930 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the above method, the method comprising:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution. Alternatively, the method comprises:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
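Both variants above share the time-window grouping step, which claim 2 refines into a per-window proportion test followed by grouping by component type. A minimal sketch of that refinement is given below, assuming each abnormal index is available as an equal-interval list of boolean anomaly flags; the function name, argument layout, and sample index names are hypothetical. How nodes and edges of the abnormal sub-map are then built from each group is not specified in this section and is left out.

```python
from collections import defaultdict

def target_indexes_by_component(flags, component_of, window, first_threshold):
    """Group target abnormal indexes per time window and component type.

    flags: {index_name: [bool, ...]} anomaly flag per sample (assumed input).
    component_of: {index_name: component_type}.
    window: number of samples per preset time window.
    Returns {window_no: {component_type: [index_name, ...]}}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for name, series in flags.items():
        for start in range(0, len(series), window):
            chunk = series[start:start + window]
            # "First proportion": share of abnormal samples in this window.
            first_proportion = sum(chunk) / len(chunk)
            if first_proportion > first_threshold:
                groups[start // window][component_of[name]].append(name)
    return {w: dict(g) for w, g in groups.items()}
```

Each inner list is the set of target abnormal indexes from which one abnormal sub-map for that component type and window would be constructed.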
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the above method, the method comprising:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity between the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an abnormal knowledge graph of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution. Alternatively, the method comprises:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance domain abnormal knowledge map of the target system, and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
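The similarity step of the generation method is detailed in claims 3 and 4 as a rank summation over three measures, followed by a proportion test across network elements of the same type. The sketch below is one possible reading, not the patented implementation: Jaccard overlap for the abnormal index similarity, cosine similarity over precomputed node2vec embedding vectors, ascending ranks (so a larger rank sum means more similar), and all function and field names are assumptions.

```python
import math

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def ranks(values):
    """Ascending ranks starting at 1; tied values share the lowest rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def rank_sum_similarity(current, history):
    """current and each history item: {"indexes": [...], "vector": [...]}.
    "vector" is assumed to be a precomputed node2vec embedding of the sub-map.
    Returns one rank-sum score per historical sub-map."""
    cur = set(current["indexes"])
    index_sim = [jaccard(cur, h["indexes"]) for h in history]
    same_count = [len(cur & set(h["indexes"])) for h in history]
    vector_sim = [cosine(current["vector"], h["vector"]) for h in history]
    return [sum(t) for t in zip(ranks(index_sim), ranks(same_count), ranks(vector_sim))]

def is_domain_knowledge(scores, second_threshold, third_threshold):
    """scores: rank-sum similarity for every other network element of the same type."""
    similar_peers = sum(1 for s in scores if s > second_threshold)
    # +1 counts the network element that generated the sub-map itself (assumed reading).
    second_proportion = (similar_peers + 1) / (len(scores) + 1)
    return second_proportion > third_threshold
```

Because the score is a sum of ranks rather than of raw values, the three measures contribute equally regardless of their scales; the second threshold is therefore expressed in rank-sum units in this sketch.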
The above-described embodiments of the apparatus are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement the method without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may be modified or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.
Claims (10)
1. A method for generating an abnormal knowledge graph in the operation and maintenance field is characterized by comprising the following steps:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution.
2. The method for generating the operation and maintenance field abnormal knowledge graph according to claim 1, wherein the step of grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window and constructing the corresponding abnormal sub-graph based on the grouping of the abnormal indexes comprises the steps of:
dividing the time sequence data of the abnormal indexes according to a preset time window, and determining a first proportion of the abnormal data of each abnormal index in the time sequence data of the current time window;
determining the abnormal index of which the first proportion is larger than a preset first threshold value as a target abnormal index in a corresponding time window;
and grouping the target abnormal indexes in one time window according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-map based on the grouping of the target abnormal indexes.
3. The method for generating the operation and maintenance domain abnormal knowledge graph according to claim 1 or 2, wherein the determining the similarity between the abnormal sub-graph and the historical abnormal sub-graph of other network elements of the same type in the target system, and the determining the abnormal sub-graph as the domain knowledge graph based on the similarity comprises:
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system;
determining a second proportion, among the network elements of the same type, of the network element generating the abnormal sub-map together with the other network elements whose similarity is larger than a preset second threshold;
determining the abnormal sub-map whose second proportion is larger than a preset third threshold as the domain knowledge graph.
4. The method for generating an abnormal knowledge graph in the operation and maintenance field according to claim 3, wherein the determining the similarity between the abnormal sub-graph and the historical abnormal sub-graph of other network elements of the same type in the target system includes, for each abnormal sub-graph:
determining, between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, the abnormal index similarity, the number of the same abnormal indexes, and the map vector similarity based on node2vec;
ranking the determined abnormal index similarity, the number of the same abnormal indexes and the map vector similarity based on the node2vec respectively;
and performing ranking summation on the abnormal index similarity, the same abnormal index quantity and the map vector similarity based on node2vec of each historical abnormal sub-map to obtain the similarity between the corresponding historical abnormal sub-map and the abnormal sub-map.
5. The method for generating the abnormal knowledge graph in the operation and maintenance field according to claim 1, wherein the determining of the time series data of the performance index based on the collected operation data of the target system, and performing anomaly detection on the time series data of the performance index to determine the abnormal index therein comprises:
collecting operation data of the target system based on an agent program, and processing the operation data to obtain time sequence data of the performance index;
and carrying out anomaly detection on the time series data of the performance index based on 4-sigma, and determining an anomaly index in the time series data.
6. An application method of an abnormal knowledge graph in the operation and maintenance field is characterized by comprising the following steps:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, verifying the spliced abnormal sub-maps and generating a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance domain abnormal knowledge map of the target system, and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
7. An operation and maintenance field abnormal knowledge map generation device is characterized by comprising:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system and carrying out abnormal detection on the time sequence data of the performance index to determine the abnormal index;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
the domain knowledge graph extraction module is used for determining the similarity between the abnormal sub-graph and historical abnormal sub-graphs of other network elements of the same type in the target system and determining the abnormal sub-graph as a domain knowledge graph based on the similarity;
the domain knowledge map marking module is used for marking fault information of the domain knowledge map based on experience of operation and maintenance experts to obtain the abnormal knowledge map of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution.
8. An application device of an abnormal knowledge graph in the operation and maintenance field is characterized by comprising:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system and carrying out abnormal detection on the time sequence data of the performance index to determine the abnormal index;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
the fault knowledge map generation module is used for splicing the abnormal sub-maps based on the system architecture of the target system, verifying the spliced abnormal sub-maps and generating a fault knowledge map;
the fault knowledge map segmentation module is used for segmenting the fault knowledge map based on the component type to obtain a fault sub-map;
the domain knowledge map matching module is used for matching the fault sub-map with the operation and maintenance domain abnormal knowledge map of the target system and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and the fault solution extraction module is used for obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for generating the operation and maintenance domain anomaly knowledge map according to any one of claims 1 to 5 or the method for applying the operation and maintenance domain anomaly knowledge map according to claim 6.
10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for generating the operation and maintenance domain anomaly knowledge map according to any one of claims 1 to 5, or the method for applying the operation and maintenance domain anomaly knowledge map according to claim 6.
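Claim 5 names 4-sigma as the anomaly detector for the performance index time series. A minimal sketch follows, assuming that 4-sigma means flagging samples that lie more than four standard deviations from the mean of the series; the function name `sigma_anomalies`, the use of the population standard deviation, and the parameter `k` are assumptions for illustration.

```python
from statistics import mean, pstdev

def sigma_anomalies(series, k=4.0):
    """Return the indices of samples farther than k standard deviations from the mean."""
    mu = mean(series)
    sd = pstdev(series)  # population standard deviation (assumed choice)
    if sd == 0:
        return []  # constant series: nothing to flag
    return [i for i, x in enumerate(series) if abs(x - mu) > k * sd]
```

With the population standard deviation, no sample can deviate from the mean by more than sqrt(n - 1) standard deviations, so a series of fewer than 18 samples can never produce a 4-sigma anomaly under this sketch; the series passed in should be long enough for the rule to be meaningful.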
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210664886.4A CN115277453B (en) | 2022-06-13 | 2022-06-13 | Method for generating abnormal knowledge graph in operation and maintenance field, application method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115277453A (en) | 2022-11-01 |
CN115277453B (en) | 2024-06-18 |
Family
ID=83758852
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210664886.4A Active CN115277453B (en) | 2022-06-13 | 2022-06-13 | Method for generating abnormal knowledge graph in operation and maintenance field, application method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115277453B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112787841A (en) * | 2019-11-11 | 2021-05-11 | 华为技术有限公司 | Fault root cause positioning method and device and computer storage medium |
WO2021114977A1 (en) * | 2019-12-12 | 2021-06-17 | 深圳前海微众银行股份有限公司 | Method and device for positioning fundamental cause of abnormal event |
CN113032238A (en) * | 2021-05-25 | 2021-06-25 | 南昌惠联网络技术有限公司 | Real-time root cause analysis method based on application knowledge graph |
WO2021184630A1 (en) * | 2020-03-19 | 2021-09-23 | 平安国际智慧城市科技股份有限公司 | Method for locating pollutant discharge object on basis of knowledge graph, and related device |
CN114218403A (en) * | 2021-12-20 | 2022-03-22 | 平安付科技服务有限公司 | Fault root cause positioning method, device, equipment and medium based on knowledge graph |
CN114430365A (en) * | 2022-04-06 | 2022-05-03 | 北京宝兰德软件股份有限公司 | Fault root cause analysis method and device, electronic equipment and storage medium |
CN114465874A (en) * | 2022-04-07 | 2022-05-10 | 北京宝兰德软件股份有限公司 | Fault prediction method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115277453B (en) | 2024-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |