CN115277453A - Method for generating abnormal knowledge graph in operation and maintenance field, application method and device - Google Patents

Method for generating abnormal knowledge graph in operation and maintenance field, application method and device

Info

Publication number
CN115277453A
Authority
CN
China
Prior art keywords
abnormal
map
sub
fault
indexes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210664886.4A
Other languages
Chinese (zh)
Other versions
CN115277453B (en)
Inventor
王旭鹏
刘诗垒
任纪良
彭高历
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baolande Software Co ltd
Original Assignee
Beijing Baolande Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baolande Software Co ltd
Priority to CN202210664886.4A
Publication of CN115277453A
Application granted
Publication of CN115277453B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a generation method, an application method and a device for an abnormal knowledge graph in the operation and maintenance field. The generation method comprises the following steps: determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data to determine abnormal indexes; grouping the abnormal indexes, within a preset time window, according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-map from each group of abnormal indexes; determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, and determining, based on the similarity, whether the abnormal sub-map is a domain knowledge graph; and labeling the domain knowledge graph with fault information, including fault names and fault solutions, based on the experience of operation and maintenance experts, to obtain the operation and maintenance field abnormal knowledge graph of the target system. The method and the device can automatically generate the abnormal knowledge graph in the operation and maintenance field from the abnormal data produced when an abnormal event occurs.

Description

Method for generating abnormal knowledge graph in operation and maintenance field, application method and device
Technical Field
The invention relates to the technical field of computers, and in particular to a generation method, an application method and a device for an abnormal knowledge graph in the operation and maintenance field.
Background
In a large computer cluster environment such as an IT information system, the deployment of software and hardware is complicated. When a fault occurs, it can be described by the performance indexes that become abnormal, and accumulating fault knowledge into a knowledge graph provides experience for handling subsequent faults. There are two existing approaches to constructing a knowledge graph from fault scenarios: one depends on expert experience, and the other summarizes the results of fault simulation.
Depending on expert experience: an expert draws on personal experience to summarize some typical fault scenarios, manually compiles them into a knowledge graph, and adds the corresponding solutions, providing a reference for subsequent fault judgment and fault resolution.
Summarizing according to fault simulation: the usual approach is to simulate as many fault scenarios as possible using chaos testing tools, service instrumentation points and similar means, then manually count the abnormal indexes generated when the faults occur, and summarize the abnormal indexes and fault phenomena into a knowledge graph.
It can be seen that both approaches require manual statistics of the abnormal indexes associated with a fault scenario. Generating a generally applicable knowledge graph through manual statistics has the following defects. When faults are summarized manually, a few performance indexes with obvious abnormal characteristics are usually chosen to represent a fault scenario. In a real system environment, however, a fault produces a large number of abnormal indexes within a period of time, forming a multi-dimensional abnormal relationship, so a few performance indexes cannot describe the fault accurately, which makes subsequent fault location more difficult. Filtering abnormal indexes with thresholds set according to the experience of operation and maintenance experts can detect a large number of abnormal indexes, but it easily produces many false alarms in which normal indexes are identified as abnormal, which seriously consumes operation and maintenance resources. Manually extracting fault knowledge from abnormal indexes is costly, prone to false alarms and missed alarms, and slow; it cannot run uninterrupted around the clock, and it cannot extract abnormal indexes by time window and convert them into fault knowledge, so the sampled fault knowledge tends to be incomplete and inaccurate.
Disclosure of Invention
The invention provides a method for generating an abnormal knowledge map in an operation and maintenance field, an application method and a device, which are used for overcoming the defects existing in the generation of the abnormal knowledge map through manual statistics and can realize the automatic generation of the abnormal knowledge map in the operation and maintenance field according to abnormal data generated when an abnormal event occurs.
In a first aspect, the invention provides a method for generating an abnormal knowledge graph in the operation and maintenance field, which comprises the following steps:
determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution.
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, grouping the abnormal indexes according to the component types to which the performance indexes belong based on the preset time window, and constructing the corresponding abnormal sub-map based on the grouping of the abnormal indexes, comprises:
dividing the time series data of the abnormal indexes according to the preset time window, and determining, for each abnormal index, a first proportion of abnormal data in its time series data within the current time window;
determining each abnormal index whose first proportion is greater than a preset first threshold as a target abnormal index of the corresponding time window;
and grouping the target abnormal indexes within one time window according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-map based on the grouping of the target abnormal indexes.
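As an illustration only, the three steps above can be sketched in Python as follows. The function name, the input layout (one list of per-sample anomaly flags per index for a single time window) and the first threshold value of 0.5 are assumptions made for this sketch; the source does not specify them.

```python
from collections import defaultdict


def target_indexes_by_component(flags, component_of, first_threshold=0.5):
    """Filter the indexes of one time window by first proportion, then group them.

    flags:        {index name: [True/False anomaly flag per sample in the window]}
    component_of: {index name: component type}  (hypothetical mapping)
    Returns {component type: sorted list of target abnormal indexes}.
    """
    groups = defaultdict(list)
    for name, window_flags in flags.items():
        if not window_flags:
            continue  # no samples in this window, nothing to decide
        first_proportion = sum(window_flags) / len(window_flags)
        if first_proportion > first_threshold:
            groups[component_of[name]].append(name)
    return {component: sorted(names) for component, names in groups.items()}


flags = {
    "host.cpu": [True] * 7 + [False] * 3,        # 70% abnormal samples
    "host.mem": [True] * 2 + [False] * 8,        # 20% abnormal samples
    "db.connections": [True] * 6 + [False] * 4,  # 60% abnormal samples
}
component_of = {"host.cpu": "host", "host.mem": "host", "db.connections": "database"}
groups = target_indexes_by_component(flags, component_of)
```

With these sample inputs only `host.cpu` and `db.connections` exceed the assumed threshold, so one host group and one database group remain, each of which would seed one abnormal sub-map.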
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as the domain knowledge graph based on the similarity, comprises:
determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system;
determining a second proportion, namely the share of the network elements of the same type that is made up by the network element generating the abnormal sub-map together with the other network elements whose similarity is greater than a preset second threshold;
determining an abnormal sub-map whose second proportion is greater than a preset third threshold as the domain knowledge graph.
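A minimal sketch of this universality check is given below, for illustration only. It assumes that a larger similarity score means "more similar" and that the network element generating the sub-map counts toward the numerator; the function name and the threshold values are hypothetical.

```python
def is_domain_knowledge(similarity_by_element, total_same_type,
                        second_threshold, third_threshold):
    """Decide whether an abnormal sub-map is general enough to keep.

    similarity_by_element: {other network element: similarity of its historical
                            abnormal sub-map to the current abnormal sub-map}
    total_same_type:       number of network elements of this type, including
                           the element that generated the current sub-map
    Returns (decision, second proportion).
    """
    similar_others = sum(
        1 for score in similarity_by_element.values() if score > second_threshold
    )
    # The generating element itself is counted together with the similar ones.
    second_proportion = (1 + similar_others) / total_same_type
    return second_proportion > third_threshold, second_proportion


decision, proportion = is_domain_knowledge(
    {"hostB": 0.9, "hostC": 0.8, "hostD": 0.2},
    total_same_type=4,
    second_threshold=0.7,
    third_threshold=0.5,
)
```

Here three of the four hosts (the generating host plus hostB and hostC) share the pattern, so the second proportion is 0.75 and the sub-map would be kept under the assumed third threshold.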
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system comprises, for each abnormal sub-map:
determining, between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity;
ranking the determined abnormal index similarities, numbers of identical abnormal indexes, and node2vec-based map vector similarities respectively;
and, for each historical abnormal sub-map, summing its rankings for abnormal index similarity, number of identical abnormal indexes and node2vec-based map vector similarity to obtain the similarity between that historical abnormal sub-map and the abnormal sub-map.
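The rank-sum combination can be sketched as follows, for illustration only. Several choices here are assumptions rather than statements of the source: Jaccard similarity stands in for the abnormal index similarity, cosine similarity is used for the map vectors, the vectors are taken as precomputed inputs (for example from a node2vec implementation, which is not run here), and ranks ascend so that a larger rank sum means a more similar historical sub-map.

```python
import math


def jaccard(a, b):
    """Set overlap of two index lists (assumed measure of index similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    """Cosine similarity of two precomputed map embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def dense_ranks(scores):
    """Rank 1 is the smallest score; equal scores share the same rank."""
    order = {s: r for r, s in enumerate(sorted(set(scores)), start=1)}
    return [order[s] for s in scores]


def rank_sum_similarity(current, history):
    """current and each history entry are (index list, embedding vector) pairs."""
    cur_indexes, cur_vector = current
    index_sim = [jaccard(cur_indexes, idx) for idx, _ in history]
    same_count = [len(set(cur_indexes) & set(idx)) for idx, _ in history]
    vector_sim = [cosine(cur_vector, vec) for _, vec in history]
    ranks = zip(dense_ranks(index_sim), dense_ranks(same_count), dense_ranks(vector_sim))
    return [sum(r) for r in ranks]


current = (["cpu", "mem", "io"], [1.0, 0.0])
history = [
    (["cpu", "mem", "io"], [1.0, 0.0]),
    (["cpu"], [0.0, 1.0]),
    (["cpu", "mem"], [1.0, 1.0]),
]
scores = rank_sum_similarity(current, history)
```

Summing ranks instead of raw values keeps the three measures comparable even though one is a count and the other two are ratios.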
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, determining the time series data of the performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes, comprises:
collecting the operation data of the target system based on an agent program, and processing the operation data to obtain the time sequence data of the performance index;
and carrying out anomaly detection on the time series data of the performance index based on 4-sigma, and determining an anomaly index in the time series data.
In a second aspect, the present invention further provides an application method of the abnormal knowledge graph in the operation and maintenance field, including:
determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
In a third aspect, the present invention further provides a device for generating an abnormal knowledge graph in the operation and maintenance field, including:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system, and performing abnormal detection on the time sequence data of the performance index to determine the abnormal index in the time sequence data;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
a domain knowledge graph extraction module, configured to determine similarity between the abnormal sub-graph and historical abnormal sub-graphs of other network elements of the same type in the target system, and determine the abnormal sub-graph as a domain knowledge graph based on the similarity;
the marking module of the domain knowledge map is used for marking fault information of the domain knowledge map based on the experience of an operation and maintenance expert to obtain the abnormal knowledge map of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution.
In a fourth aspect, the present invention further provides an application apparatus of an abnormal knowledge graph in the operation and maintenance field, including:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system, and performing abnormal detection on the time sequence data of the performance index to determine the abnormal index in the time sequence data;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
the fault knowledge map generation module is used for splicing the abnormal sub-maps based on the system architecture of the target system, verifying the spliced abnormal sub-maps and generating a fault knowledge map;
the fault knowledge map segmentation module is used for segmenting the fault knowledge map based on the component type to obtain a fault sub-map;
the domain knowledge map matching module is used for matching the fault sub-map with the operation and maintenance domain abnormal knowledge map of the target system and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and the fault solution extraction module is used for obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
In a fifth aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for generating the abnormal knowledge map in the operation and maintenance field according to the first aspect, or the method for applying the abnormal knowledge map in the operation and maintenance field according to the second aspect when executing the program.
In a sixth aspect, the invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for generating the abnormal knowledge map in the operation and maintenance field according to the first aspect, or the method for applying the abnormal knowledge map in the operation and maintenance field according to the second aspect.
In a seventh aspect, the invention further provides a computer program product, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for generating the abnormal knowledge graph in the operation and maintenance field according to the first aspect, or the steps of the method for applying the abnormal knowledge graph in the operation and maintenance field according to the second aspect.
According to the method and the device for generating the abnormal knowledge graph in the operation and maintenance field, the abnormal knowledge graph is generated automatically from the abnormal data produced when an abnormal event occurs in the target system, without manual participation, so the abnormal event can be depicted more comprehensively and accurately. Because the abnormal indexes are extracted automatically from abnormal data, the extraction of the abnormal indexes produced by an abnormal event is comprehensive and accurate. Because fault knowledge is extracted automatically from the abnormal indexes, the cost is low, false alarms and missed alarms are less likely, timeliness is high, and extraction can run uninterrupted 24 hours a day; the abnormal indexes can be extracted by time window and converted into fault knowledge, so the sampled fault knowledge is comprehensive and accurate. Fault information such as fault names and fault solutions is labeled by operation and maintenance experts, so each generated operation and maintenance field abnormal knowledge graph provides a solution for one specific type of fault and gives important data support for subsequent fault judgment, fault location and fault handling.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the following briefly introduces the drawings needed for the embodiments or the prior art descriptions, and obviously, the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow chart of a method for generating an abnormal knowledge graph in the operation and maintenance field according to the present invention;
FIGS. 2A, 2B and 2C are schematic diagrams of an anomaly sub-map provided by the present invention;
FIG. 3A is a schematic flow chart of constructing an anomaly sub-map according to the present invention;
FIG. 3B is a schematic flow chart of an application scenario for constructing an anomaly sub-map provided by the present invention;
FIG. 4 is a schematic flow chart of determining a domain knowledge graph provided by the present invention;
FIG. 5 is a schematic flow chart of determining the similarity between an abnormal sub-map and a historical abnormal sub-map according to the present invention;
FIG. 6A is a flow chart of an application method of the abnormal knowledge graph in the operation and maintenance field according to the present invention;
FIG. 6B is a schematic diagram of a failure knowledge graph generated according to the application method of the abnormal knowledge graph in the operation and maintenance field provided by the invention;
FIG. 7 is a schematic diagram of a component structure of an abnormal knowledge graph generation device in the operation and maintenance field according to the present invention;
FIG. 8 is a schematic diagram of a component structure of an application apparatus of an abnormal knowledge graph in the operation and maintenance field provided by the present invention;
FIG. 9 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention is described below with reference to fig. 1 to 5.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for generating an abnormal knowledge graph in an operation and maintenance field according to the present invention, where the method for generating an abnormal knowledge graph in an operation and maintenance field shown in fig. 1 may be executed by a device for generating an abnormal knowledge graph in an operation and maintenance field, and the device for generating an abnormal knowledge graph in an operation and maintenance field may be disposed in a server, for example, the server may be a physical server including an independent host, a virtual server borne by a host cluster, a cloud server, and the like, which is not limited in this embodiment of the present invention. As shown in fig. 1, the method for generating the abnormal knowledge graph in the operation and maintenance field at least includes:
Step 101: determining time series data of the performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein.
In the embodiment of the present invention, the target system may be any system that needs to be operated and maintained, for example an IT information system; the embodiment of the present invention does not limit the type of the target system. The time series data of the performance indexes of the target system can be obtained by periodically collecting the operation data of the components in the target system. The embodiment of the invention does not limit the number or types of components from which operation data is collected, the collection interval, or the types of performance index derived from the collected operation data. For example, the operation data of all running hardware and software components in the IT information system may be collected once per minute, and the time series data of their performance indexes determined from the collected data, where the performance indexes of the hardware components may include the CPU occupancy of the host, the process count, the memory usage rate, and the like, and the performance indexes of the software components may include software compatibility, security, maintainability, and the like.
The embodiment of the invention does not limit the implementation method for acquiring the running data of the target system to obtain the time sequence data of the performance index. Optionally, the running data of the target system may be collected based on the agent program, and the running data is processed to obtain time series data of the performance index; or, the running data of the target system can be acquired by other existing automatic data acquisition methods, and the running data is processed to obtain the time series data of the performance index. For example, the Agent technology may be used to collect operation data from a hardware component and a software component in an IT information system, store the collected operation data in a data warehouse, and process and aggregate the data to obtain time series data of performance indexes of the hardware component and the software component in the IT information system, where the method of processing and aggregating the data may be implemented by using an existing method according to the type of the performance index.
In the embodiment of the present invention, after the time series data of the target system's performance indexes is obtained, the abnormal performance indexes, that is, the abnormal indexes, can be obtained by detecting abnormal data in the time series data. The embodiment of the invention does not limit the method used to perform anomaly detection on the time series data of the performance indexes. Optionally, an existing anomaly detection algorithm may be used, for example a general-purpose algorithm such as the isolation forest algorithm or the Local Outlier Factor (LOF) algorithm. Alternatively, an existing anomaly detection algorithm may be improved and anomaly detection performed with the improved algorithm. For example, the existing N-sigma algorithm may be improved, where N is the deviation multiple of the threshold; when N = 3 it is the general 3-sigma algorithm. N may be solved for through multiple iterations and finally determined to be 4, yielding an improved 4-sigma algorithm, and anomaly detection is then performed on the time series data of the performance indexes based on 4-sigma to determine the abnormal indexes.
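For illustration only, an N-sigma check with N = 4 can be sketched as below. The source does not state how the mean and standard deviation are estimated or how N is iterated, so this sketch assumes a single pass over one series using the overall mean and the population standard deviation; the function name is hypothetical.

```python
from statistics import mean, pstdev


def n_sigma_anomalies(values, n=4.0):
    """Return the positions of samples more than n standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) > n * sigma]


# Thirty ordinary CPU samples followed by one spike.
cpu = [50.0] * 30 + [500.0]
abnormal_positions = n_sigma_anomalies(cpu)
```

Raising N from 3 to 4 widens the accepted band, so fewer samples are flagged than with the 3-sigma rule. A performance index whose series contains flagged samples would be treated as an abnormal index in the later steps.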
Step 102: grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-map based on the grouping of the abnormal indexes.
In the embodiment of the invention, after anomaly detection has been performed on the time series data of the target system's performance indexes to obtain the abnormal indexes, the detected abnormal indexes can be grouped along the time dimension and the component dimension to construct the abnormal sub-maps. To group along the time dimension, a time window is preset and the abnormal indexes are divided by that window. To group along the component dimension, the abnormal indexes in each time window are grouped according to the component type to which the performance index belongs. One abnormal sub-map is then constructed from each group of abnormal indexes. The embodiment of the present invention does not limit the width of the preset time window; for example, it may be 10 minutes. Nor does it limit how the component types to which performance indexes belong are divided; for example, the component types may include host class indexes, database class indexes, application class indexes, log class indexes, call chain class indexes, alarm class indexes, and the like.
For example, the abnormal indexes are divided using a time window 10 minutes wide, and the abnormal indexes in the current time window are grouped and summarized by host class, database class, application class, log class, call chain class, alarm class and so on. This yields the host-class, database-class and application-class abnormal indexes in the current time window: the host-class abnormal indexes are network rate, CPU occupancy, disk IO speed and memory utilization (MEM); the database-class abnormal indexes are maximum number of connections, table space capacity and cache space size; and the application-class abnormal indexes are the number of transactions processed per second by the application software (app.tps) and request response time. As shown in FIG. 2A, FIG. 2B and FIG. 2C, a host abnormal sub-map, a database abnormal sub-map and an application abnormal sub-map can then be constructed from the respective groups of abnormal indexes, and the constructed abnormal sub-maps are stored in a database.
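As a rough sketch of this example, the grouping by 10-minute window and component type might look like the following. The event layout (timestamp in seconds, component type, index name) and the representation of a sub-map as a list of edges from a component node to its abnormal indexes are assumptions for illustration; the actual structure shown in FIG. 2A to FIG. 2C is not reproduced here.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute window, as in the example above


def build_sub_maps(events, window=WINDOW_SECONDS):
    """Group abnormal index events into one sub-map per (window, component type).

    events: iterable of (timestamp in seconds, component type, index name)
    Returns {(window start, component type): sorted list of (component, index) edges}.
    """
    members = defaultdict(set)
    for timestamp, component, index_name in events:
        window_start = timestamp - timestamp % window
        members[(window_start, component)].add(index_name)
    # Each sub-map is stored as edges from the component node to its indexes.
    return {
        key: sorted((key[1], index_name) for index_name in names)
        for key, names in members.items()
    }


events = [
    (0, "host", "cpu"),
    (120, "host", "mem"),
    (300, "database", "max_connections"),
    (650, "host", "cpu"),  # falls into the next 10-minute window
]
sub_maps = build_sub_maps(events)
```

The resulting dictionary holds a host sub-map and a database sub-map for the first window and a separate host sub-map for the second, which could then be persisted to a database as described.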
103, determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as the domain knowledge map based on the similarity.
In the embodiment of the invention, after the abnormal indexes are grouped according to the time window and the component type to which the performance index belongs to construct the abnormal sub-map, the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system can be determined from the historical abnormal sub-maps of the target system. Whether the abnormal sub-map is universal among the network elements of the same type in the target system is judged according to the similarity, and if so, the abnormal sub-map is determined as the domain knowledge map. The embodiment of the invention does not limit the method used to determine this similarity; for example, it can be determined according to a preset algorithm. In the present invention, a network element may refer to an element in the network of the target system, for example a host, a server, a router, a virtual machine, an application program, and the like. Network elements of the same type may refer to elements having the same or similar functions in the network of the target system; for example, host A and host B are network elements of the same type, whereas host A and virtual machine C are network elements of different types.
104, labeling the domain knowledge graph with fault information based on the experience of the operation and maintenance expert to obtain an abnormal knowledge graph of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution.
In the embodiment of the invention, after the abnormal sub-map is determined as the domain knowledge map according to the historical abnormal sub-maps of the target system, the domain knowledge map can be labeled according to the experience of an operation and maintenance expert to indicate whether it represents a fault. If it does, the domain knowledge map is treated as an effective domain knowledge map and is further labeled with fault information such as the fault name and the fault solution. The domain knowledge map labeled with the fault information is used as the operation and maintenance domain abnormal knowledge map of the target system and is stored in a knowledge base.
According to the method for generating the abnormal knowledge map in the operation and maintenance field, the abnormal knowledge map is generated automatically from the abnormal data produced when an abnormal event occurs in the target system, without manual participation, so the abnormal event can be depicted more comprehensively and accurately. Because the abnormal indexes are extracted automatically from the abnormal data, the comprehensiveness and accuracy of extracting the abnormal indexes produced by abnormal events can be guaranteed. Because fault knowledge is extracted automatically from the abnormal indexes, the cost is low, false alarms and missed reports are less likely, timeliness is high, and extraction can run uninterrupted 24 hours a day. Because the abnormal indexes are extracted by time window and converted into fault knowledge, the sampling of fault knowledge can be comprehensive and accurate. Because operation and maintenance experts label fault information such as fault names and fault solutions, each generated operation and maintenance field abnormal knowledge map can be dedicated to solving one class of faults, providing important data support for subsequent fault judgment, fault location and fault handling.
Referring to fig. 3A, fig. 3A is a flow diagram illustrating a process of constructing an abnormal sub-graph according to the present invention, as shown in fig. 3A, the abnormal indexes are grouped according to component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-graph based on the grouping of the abnormal indexes at least includes:
301, dividing the time series data of the abnormal indexes according to a preset time window, and determining a first ratio of the abnormal data of each abnormal index in the time series data of the current time window.
302, determining the abnormal index with the first ratio larger than a preset first threshold value as a target abnormal index in the corresponding time window.
303, grouping the target abnormal indexes in a time window according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-maps based on the grouping of the target abnormal indexes.
In the embodiment of the present invention, after the abnormal indexes are obtained through anomaly detection, the time series data of the abnormal indexes may be divided according to a preset time window, and a first ratio is determined for each abnormal index: the proportion of its abnormal data in the time series data of the current time window, that is, the proportion of abnormal detections in the total number of detections in that window. It is then determined whether the first ratio is greater than a preset first threshold. If it is, the corresponding abnormal index is determined as a target abnormal index in the time window; if the first ratio is less than or equal to the preset first threshold, it is not. Finally, the target abnormal indexes in each time window are grouped according to the component types to which the performance indexes belong, and corresponding abnormal sub-maps are constructed from the groups, as shown in fig. 3B, which is a flow diagram of an application scenario for constructing the abnormal sub-maps provided by the invention. The first threshold may be set empirically in advance, and its value is not limited in the embodiment of the present invention; for example, the first threshold may be 10%.
In this embodiment, before the abnormal sub-map is constructed from the groups of abnormal indexes, the abnormal indexes are filtered according to the proportion of each index's abnormal data in its time series data within the time window. This removes spurious abnormal indexes, ensuring the correctness of the abnormal indexes used to construct the abnormal sub-map and therefore of the constructed abnormal sub-map itself.
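The filtering in 301 and 302 can be illustrated with a short sketch. The input layout (one boolean flag per detection in the window) and the function name are assumptions for the example; the 10% default is the example threshold given in the text.

```python
def target_indexes(window_flags, first_threshold=0.10):
    """Select target abnormal indexes for one time window.

    window_flags: {index_name: list of bool}, one flag per detection in the
    window, True meaning the detection was anomalous (hypothetical layout).
    An index is kept only if its first ratio (anomalous detections divided
    by total detections) is strictly greater than the first threshold.
    """
    targets = []
    for name, flags in window_flags.items():
        if not flags:  # no detections in this window, nothing to evaluate
            continue
        first_ratio = sum(flags) / len(flags)
        if first_ratio > first_threshold:
            targets.append(name)
    return targets
```

An index that is anomalous in exactly 10% of the detections is not kept, matching the "less than or equal to" branch of the description.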
Referring to fig. 4, fig. 4 is a schematic flow chart of determining a domain knowledge graph provided by the present invention. As shown in fig. 4, determining the similarity between an abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in a target system, and determining the abnormal sub-graph as the domain knowledge graph based on the similarity, at least includes:
401, determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system.
402, determining a second proportion, among the network elements of the same type, of the network element generating the abnormal sub-graph together with the other network elements whose similarity is greater than a preset second threshold.
403, determining the abnormal sub-graph whose second proportion is greater than a preset third threshold as the domain knowledge graph.
In the embodiment of the present invention, after the abnormal sub-maps are constructed based on the abnormal indexes, similarity analysis may be performed one by one between each abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, to determine whether any of those historical abnormal sub-maps has a similarity with the current abnormal sub-map greater than a preset second threshold. If such a historical abnormal sub-map exists, a similar fault has occurred in the target system. The network elements having similar faults, namely the network element generating the abnormal sub-map and the other network elements whose similarity is greater than the preset second threshold, are then counted, and their second proportion among the network elements of the same type is determined. Finally, it is determined whether the second proportion is greater than a preset third threshold; if it is, the abnormal sub-map is determined as a domain knowledge map. The second threshold and the third threshold may be set empirically in advance, and their values are not limited in the embodiment of the present invention; for example, the second threshold may be 80% and the third threshold may be 30%.
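The decision in 401 to 403 can be sketched as below. The sketch assumes that the similarity for each other network element has already been reduced to a single value (for instance its best-matching historical sub-map); that reduction, the function name and the argument layout are assumptions, while the 80% and 30% defaults are the example thresholds from the text.

```python
def is_domain_knowledge(similarities, total_same_type,
                        second_threshold=0.80, third_threshold=0.30):
    """Decide whether an abnormal sub-map is promoted to a domain knowledge map.

    similarities: {other_network_element: similarity in [0, 1]} between the
    current sub-map and that element's historical sub-maps (hypothetical).
    total_same_type: number of network elements of the same type, including
    the element that generated the current sub-map.
    """
    similar = [e for e, s in similarities.items() if s > second_threshold]
    if not similar:
        return False  # no similar fault has been seen on any other element
    # Elements with the fault = the generating element plus the similar ones.
    second_proportion = (1 + len(similar)) / total_same_type
    return second_proportion > third_threshold
```

With ten hosts, two of which have a historical sub-map above the second threshold, the second proportion is 3/10, which does not exceed 30%, so the sub-map is not promoted.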
Referring to fig. 5, fig. 5 is a schematic flow chart illustrating the determining of the similarity between the abnormal sub-map and the historical abnormal sub-map provided by the present invention, as shown in fig. 5, the determining of the similarity between the abnormal sub-map and the historical abnormal sub-map of other network elements of the same type in the target system at least includes, for each abnormal sub-map:
501, determining the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system.
502, ranking the determined abnormal index similarity, the number of the same abnormal indexes and the map vector similarity based on the node2vec respectively.
503, summing the rank of the similarity of the abnormal indexes, the number of the same abnormal indexes and the similarity of the map vectors based on the node2vec of each historical abnormal sub-map to obtain the similarity of the corresponding historical abnormal sub-map and the abnormal sub-map.
In the embodiment of the present invention, when determining the similarity between the abnormal sub-map and the historical abnormal sub-maps, the features for determining the similarity may be generated from the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system. These features may include the abnormal index similarity between the abnormal sub-map and the historical abnormal sub-map, the number of identical abnormal indexes, and the node2vec-based map vector similarity.
The abnormal index similarity between the abnormal sub-map and a historical abnormal sub-map can be calculated using the shape-based distance (SBD) correlation algorithm. SBD can ignore errors caused by time shifts between performance indexes and therefore better reflects the degree of correlation between performance index time series. For example, if an abnormal sub-map g has m abnormal indexes, and n historical abnormal sub-maps of other network elements of the same type as g are selected from the historical abnormal sub-map library, each having k abnormal indexes, then the computational complexity is δ = n × m × k. In practice, the computation can be parallelized to improve the efficiency of calculating similarity by map matching. The maximum of the SBD values calculated for any two abnormal indexes, one from the abnormal sub-map and one from the historical abnormal sub-map, can be selected as the abnormal index similarity between the two sub-maps.
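For illustration, a common formulation of SBD is one minus the maximum normalized cross-correlation over all time shifts, which is what makes it insensitive to shifts. The sketch below uses NumPy and that formulation; the z-normalization step, the value returned for constant series, and the function names are assumptions, and the patent does not state which SBD variant is used. The function `pairwise_sbd` produces the m × k values for one pair of sub-maps, leaving the reduction to a single score to the caller.

```python
import numpy as np


def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over shifts.

    Returns a value in [0, 2]; 0 means identical shape. Both series are
    z-normalized first (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.mean()) / (x.std() or 1.0)
    y = (y - y.mean()) / (y.std() or 1.0)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 1.0  # constant series carry no shape; treated as uncorrelated
    ncc = np.correlate(x, y, mode="full") / denom  # one value per shift
    return 1.0 - float(ncc.max())


def pairwise_sbd(current_series, historical_series):
    """SBD between every index of one sub-map and every index of another."""
    return np.array([[sbd(a, b) for b in historical_series]
                     for a in current_series])
```

Because the m × k entries are independent, the rows of this matrix could be computed in parallel, in line with the parallelization mentioned in the text.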
The number of abnormal indexes shared by the abnormal sub-map and each historical abnormal sub-map can be determined; that is, the number of identical abnormal indexes between the abnormal sub-map g and each of the n historical abnormal sub-maps is taken as a feature.
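Counting shared abnormal indexes amounts to a set intersection. A minimal sketch, assuming abnormal indexes are identified by name and that the historical sub-maps are keyed by a hypothetical identifier:

```python
def shared_index_counts(current_indexes, historical_submaps):
    """Count abnormal indexes shared with each historical sub-map.

    current_indexes: index names of the abnormal sub-map g.
    historical_submaps: {submap_id: iterable of index names} (hypothetical).
    """
    current = set(current_indexes)
    return {sid: len(current & set(names))
            for sid, names in historical_submaps.items()}
```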
For the node2vec-based map vector similarity, the abnormal sub-map and the historical abnormal sub-map can each be vectorized with node2vec to obtain map vectors of 200 rows, and the similarity between the map vectors of the abnormal sub-map and the historical abnormal sub-map is then determined. node2vec is a graph embedding method that considers both DFS neighborhoods and BFS neighborhoods; it can be regarded as an extension of DeepWalk, namely DeepWalk with random walks that combine DFS and BFS.
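The text does not specify how the map vectors are compared. One plausible choice, shown here purely as an assumption, is to average the node embeddings of each sub-map into a single vector and compare the two vectors by cosine similarity. The node embeddings themselves are taken as given (produced by any node2vec implementation), and both function names are hypothetical.

```python
import numpy as np


def graph_vector(node_embeddings):
    """Collapse per-node embeddings (one row per node) into one map vector
    by averaging. Mean pooling is an assumption of this sketch."""
    return np.asarray(node_embeddings, dtype=float).mean(axis=0)


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0
```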
Feature fusion is then performed on the generated features to obtain the final map similarity; a weighting method can be adopted for the fusion. The abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity obtained for the abnormal sub-map and the historical abnormal sub-maps are shown in table 1.
TABLE 1
[Table 1 appears as image BDA0003691326490000151 in the original publication.]
The abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity of the abnormal sub-map and the historical abnormal sub-maps in table 1 are each ranked to obtain table 2.
TABLE 2
[Table 2 appears as images BDA0003691326490000152 and BDA0003691326490000161 in the original publication.]
The rankings of the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity of each historical abnormal sub-map in table 2 are summed to obtain table 3.
TABLE 3
[Table 3 appears as image BDA0003691326490000162 in the original publication.]
The rank sums in table 3 are converted into similarities. The conversion can use the normalized exponential function (softmax) to normalize the rank sums to decimals between 0 and 1; each normalized value is then subtracted from 1 to obtain table 4, which represents the similarity between the abnormal sub-map and each historical abnormal sub-map, where a larger value indicates a higher similarity.
TABLE 4
[Table 4 appears as image BDA0003691326490000163 in the original publication.]
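The rank, sum and softmax conversion described for tables 2 to 4 can be sketched as follows. The sketch assumes that each of the three features is oriented so that a larger value means "more similar" and therefore receives rank 1, and that ties are broken by order of appearance; neither detail is stated in the text, and the input values used in any call are illustrative only.

```python
import numpy as np


def rank_desc(values):
    """Rank values so that the largest gets rank 1 (ties: first seen wins)."""
    order = np.argsort(-np.asarray(values, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks


def fused_similarity(index_sim, shared_count, vector_sim):
    """Fuse three per-historical-sub-map features into one similarity.

    Each argument holds one value per historical sub-map. Returns the rank
    sums (table 3) and 1 - softmax(rank sums) (table 4).
    """
    rank_sum = (rank_desc(index_sim) + rank_desc(shared_count)
                + rank_desc(vector_sim))
    z = rank_sum - rank_sum.max()        # shift for numerical stability
    soft = np.exp(z) / np.exp(z).sum()   # softmax over the rank sums
    return rank_sum, 1.0 - soft
```

A historical sub-map that ranks first on every feature has the smallest rank sum, the smallest softmax value, and therefore the largest fused similarity.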
Referring to fig. 6A, fig. 6A is a schematic flow chart of an application method of an abnormal knowledge graph in the operation and maintenance field according to the present invention, where the application method of the abnormal knowledge graph in the operation and maintenance field shown in fig. 6A may be executed by an application device of the abnormal knowledge graph in the operation and maintenance field, and the application device of the abnormal knowledge graph in the operation and maintenance field may be disposed in a server, for example, the server may be a physical server including an independent host, a virtual server borne by a host cluster, a cloud server, and the like, which is not limited in this embodiment of the present invention. As shown in fig. 6A, the application method of the abnormal knowledge graph in the operation and maintenance field at least includes:
601, determining time series data of the performance indexes based on the collected operation data of the target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein.
602, grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes.
603, splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map.
604, segmenting the fault knowledge graph based on the component type to obtain fault sub-graphs.
605, matching the fault sub-map with the operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map.
606, obtaining a target fault solution based on the fault solution labeled on the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map obtained by segmenting the fault knowledge map.
In the embodiment of the invention, after the abnormal knowledge graph in the operation and maintenance field of the target system is obtained, when an abnormal condition occurs in the target system, abnormal sub-maps can be obtained based on 601 and 602. The abnormal sub-maps are then spliced according to the system architecture of the target system, the spliced abnormal sub-maps are verified, and the fault knowledge graph is finally generated. Fig. 6B is a schematic diagram of a fault knowledge graph generated according to the application method of the abnormal knowledge graph in the operation and maintenance field provided by the present invention. For 601 and 602, refer to the description of 101 and 102 in fig. 1, which is not repeated here. The method for splicing the abnormal sub-maps is not limited; for example, the abnormal sub-maps can be spliced using algorithms such as frequent subgraph mining. The method for verifying the spliced abnormal sub-maps is not limited either; for example, the spliced abnormal sub-maps can be verified and confirmed by means of call chains, expert experience and other methods.
After the fault knowledge graph is generated, it can be segmented according to the component types to form fault sub-graphs. Each fault sub-graph is then matched against the operation and maintenance field abnormal knowledge graphs of the target system in the knowledge base, and if a corresponding operation and maintenance field abnormal knowledge graph is matched, the final fault solution for the abnormal condition of the target system can be obtained from the fault solution labeled on the matched knowledge graph. To perform the matching, the similarity between each fault sub-graph and the operation and maintenance field abnormal knowledge graphs of the target system in the knowledge base can be determined.
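The matching and lookup in 605 and 606 can be sketched as below. The similarity measure is not fixed by the text; Jaccard similarity over index names is swapped in here purely as a stand-in. The `KnowledgeMap` record, the `min_similarity` cut-off and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeMap:
    """A labeled operation and maintenance field abnormal knowledge map."""
    component_type: str
    indexes: frozenset
    fault_name: str
    solution: str


def jaccard(a, b):
    """Jaccard similarity of two index-name collections (stand-in measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_solutions(fault_submaps, knowledge_base, min_similarity=0.5):
    """Return {component_type: (fault_name, solution)} for matched sub-maps.

    fault_submaps: {component_type: set of index names}, the result of
    segmenting the fault knowledge map by component type.
    """
    results = {}
    for component, indexes in fault_submaps.items():
        candidates = [k for k in knowledge_base
                      if k.component_type == component]
        best = max(candidates, key=lambda k: jaccard(indexes, k.indexes),
                   default=None)
        if best is not None and jaccard(indexes, best.indexes) >= min_similarity:
            results[component] = (best.fault_name, best.solution)
    return results
```

Component types with no sufficiently similar entry in the knowledge base are simply absent from the result, which leaves those sub-maps for manual analysis.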
The operation and maintenance domain abnormal knowledge map generation device provided by the invention is described below, and the operation and maintenance domain abnormal knowledge map generation device described below and the operation and maintenance domain abnormal knowledge map generation method described above can be referred to in a corresponding manner.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a composition structure of an operation and maintenance domain abnormal knowledge graph generation device according to the present invention, where the operation and maintenance domain abnormal knowledge graph generation device shown in fig. 7 may be disposed in a server for executing the operation and maintenance domain abnormal knowledge graph generation method shown in fig. 1, for example, the server may be a physical server including an independent host, a virtual server carried by a host cluster, a cloud server, and the like, which is not limited in the embodiment of the present invention. As shown in fig. 7, the apparatus for generating an abnormal knowledge graph in the operation and maintenance field at least includes:
The abnormal index detection module 710 is configured to determine time series data of the performance indexes based on the collected operation data of the target system, and perform anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein.
The abnormal map building module 720 is configured to group the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and build corresponding abnormal sub-maps based on the grouping of the abnormal indexes.
The domain knowledge graph extraction module 730 is configured to determine the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and determine the abnormal sub-graph as the domain knowledge graph based on the similarity.
The domain knowledge map marking module 740 is configured to label the domain knowledge map with fault information based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge map of the target system, where the fault information includes: fault name and fault solution.
Optionally, the anomaly map building module 720 includes:
The time division unit is used for dividing the time series data of the abnormal indexes according to a preset time window and determining the first ratio of the abnormal data of each abnormal index in the time series data of the current time window.
The index filtering unit is used for determining the abnormal index whose first ratio is greater than a preset first threshold as the target abnormal index in the corresponding time window.
The type grouping unit is used for grouping the target abnormal indexes in a time window according to the component types to which the performance indexes belong and constructing corresponding abnormal sub-maps based on the grouping of the target abnormal indexes.
Optionally, the domain knowledge graph extracting module 730 comprises:
The similarity calculation unit is used for determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system.
The network element proportion calculating unit is used for determining a second proportion, among the network elements of the same type, of the network element generating the abnormal sub-map together with the other network elements whose similarity is greater than a preset second threshold.
The map extraction unit is used for determining the abnormal sub-map whose second proportion is greater than a preset third threshold as the domain knowledge map.
Optionally, the similarity calculation unit includes:
The characteristic generating subunit is used for determining, for each abnormal sub-map, the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system.
The characteristic ranking subunit is used for ranking, for each abnormal sub-map, the determined abnormal index similarity, number of identical abnormal indexes, and node2vec-based map vector similarity respectively.
The similarity calculation subunit is used for summing, for each abnormal sub-map, the rankings of the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity of each historical abnormal sub-map to obtain the similarity between the corresponding historical abnormal sub-map and the abnormal sub-map.
Optionally, the abnormal index detecting module 710 includes:
The index determining unit is used for acquiring the operation data of the target system through an agent program and processing the operation data to obtain the time series data of the performance indexes.
The abnormality detection unit is used for performing anomaly detection on the time series data of the performance indexes based on 4-sigma and determining the abnormal indexes.
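A sigma-rule detector of the kind named above can be sketched as follows. The text gives only the name "4-sigma"; computing the mean and standard deviation over the whole series (rather than over a sliding or historical baseline) is an assumption of this sketch, as is the function name.

```python
import numpy as np


def sigma_anomalies(values, k=4.0):
    """Flag points lying more than k standard deviations from the mean.

    Returns a boolean array, True where the point is considered anomalous.
    The statistics are computed over the full series (an assumption).
    """
    v = np.asarray(values, dtype=float)
    mean, std = v.mean(), v.std()
    if std == 0:
        return np.zeros(len(v), dtype=bool)  # flat series: nothing to flag
    return np.abs(v - mean) > k * std
```

The resulting flags are the per-detection anomaly markers from which the proportion of abnormal data in a time window can be computed.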
The application device of the abnormal knowledge map in the operation and maintenance field provided by the invention is described below, and the application device of the abnormal knowledge map in the operation and maintenance field described below and the application method of the abnormal knowledge map in the operation and maintenance field described above can be referred to correspondingly.
Referring to fig. 8, fig. 8 is a schematic diagram illustrating a composition structure of an application device of an abnormal knowledge graph in the operation and maintenance field according to the present invention, where the application device of the abnormal knowledge graph in the operation and maintenance field shown in fig. 8 may be disposed in a server for executing the application method of the abnormal knowledge graph in the operation and maintenance field shown in fig. 6A, for example, the server may be a physical server including an independent host, a virtual server borne by a host cluster, a cloud server, and the like, which is not limited in this embodiment of the present invention. As shown in fig. 8, the application device of the abnormal knowledge graph in the operation and maintenance field at least includes:
The abnormal index detection module 810 is configured to determine time series data of the performance indexes based on the collected operation data of the target system, and perform anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein.
The abnormal map building module 820 is configured to group the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and build corresponding abnormal sub-maps based on the grouping of the abnormal indexes.
The fault knowledge graph generation module 830 is configured to splice the abnormal sub-graphs based on the system architecture of the target system, verify the spliced abnormal sub-graphs, and generate a fault knowledge graph.
The fault knowledge graph segmentation module 840 is used for segmenting the fault knowledge graph based on the component type to obtain fault sub-graphs.
The domain knowledge map matching module 850 is used for matching the fault sub-map with the operation and maintenance domain abnormal knowledge map of the target system to determine the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map.
The fault solution extracting module 860 is used for obtaining a target fault solution based on the fault solution labeled on the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map obtained by segmenting the fault knowledge map.
Fig. 9 illustrates a physical structure diagram of an electronic device. As shown in fig. 9, the electronic device may include: a processor 910, a communication interface 920, a memory 930, and a communication bus 940, wherein the processor 910, the communication interface 920, and the memory 930 communicate with each other via the communication bus 940. The processor 910 may invoke logic instructions in the memory 930 to perform the method described above, the method comprising:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity between the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution. Alternatively,
determining time sequence data of performance indexes based on the collected running data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
Furthermore, the logic instructions in the memory 930 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the above method comprising:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution.
Alternatively, the method comprises:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance field abnormal knowledge map of the target system, and determining the operation and maintenance field abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the above method, the method comprising:
determining time sequence data of performance indexes based on the collected operation data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity between the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an abnormal knowledge graph of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution.
Alternatively, the method comprises:
determining time sequence data of performance indexes based on the collected running data of the target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance domain abnormal knowledge map of the target system, and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement the method without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may be modified or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method for generating an abnormal knowledge graph in the operation and maintenance field is characterized by comprising the following steps:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
and carrying out fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: fault name and fault solution.
2. The method for generating the operation and maintenance field abnormal knowledge graph according to claim 1, wherein the step of grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window and constructing the corresponding abnormal sub-graph based on the grouping of the abnormal indexes comprises the steps of:
dividing the time sequence data of the abnormal indexes according to a preset time window, and determining a first proportion of the abnormal data of each abnormal index in the time sequence data of the current time window;
determining the abnormal index of which the first proportion is larger than a preset first threshold value as a target abnormal index in the corresponding time window;
and grouping the target abnormal indexes in one time window according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-map based on the grouping of the target abnormal indexes.
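The window-based filtering and grouping of claim 2 can be illustrated with a minimal sketch. It assumes each index has already been reduced to a list of 0/1 anomaly flags per sample; the function name `target_anomalies_by_component`, the input layout, and the fixed-size non-overlapping windows are illustrative assumptions, not details given in the claim.

```python
from collections import defaultdict


def target_anomalies_by_component(flags, component_of, window, first_threshold):
    """Group target abnormal indexes by component type, window by window.

    flags: dict mapping index name -> list of 0/1 anomaly flags (hypothetical layout)
    component_of: dict mapping index name -> component type
    window: number of samples per time window (assumed non-overlapping)
    first_threshold: preset first threshold on the anomaly proportion
    """
    n = max((len(series) for series in flags.values()), default=0)
    result = []
    for start in range(0, n, window):
        groups = defaultdict(list)
        for name, series in flags.items():
            chunk = series[start:start + window]
            if not chunk:
                continue
            # First proportion: share of abnormal samples in the current window.
            first_proportion = sum(chunk) / len(chunk)
            if first_proportion > first_threshold:
                groups[component_of[name]].append(name)
        # Each group is the node set of one abnormal sub-map for this window.
        result.append({ctype: sorted(names) for ctype, names in groups.items()})
    return result


flags = {"cpu": [1, 1, 0, 0], "mem": [1, 0, 0, 0], "qps": [1, 1, 1, 1]}
component_of = {"cpu": "host", "mem": "host", "qps": "db"}
windows = target_anomalies_by_component(flags, component_of, window=2, first_threshold=0.5)
```

In this sketch an index whose proportion equals the threshold exactly is not selected, matching the "larger than" wording of the claim.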
3. The method for generating the operation and maintenance domain abnormal knowledge graph according to claim 1 or 2, wherein the determining the similarity between the abnormal sub-graph and the historical abnormal sub-graph of other network elements of the same type in the target system, and the determining the abnormal sub-graph as the domain knowledge graph based on the similarity comprises:
determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system;
determining a second proportion, namely the share, among the network elements of the same type, that is made up of the network element generating the abnormal sub-map together with the other network elements whose similarity is larger than a preset second threshold;
and determining the abnormal sub-map whose second proportion is larger than a third threshold value as the domain knowledge graph.
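One possible reading of the second-proportion test in claim 3 is sketched below. It assumes the numerator counts the network element that produced the sub-map plus every other same-type element whose similarity exceeds the second threshold, and that the denominator is the total number of same-type elements; the function name `is_domain_knowledge` is hypothetical.

```python
def is_domain_knowledge(similarities, second_threshold, third_threshold):
    """Decide whether an abnormal sub-map qualifies as a domain knowledge graph.

    similarities: similarity of the sub-map to the historical abnormal sub-map
                  of each OTHER network element of the same type
    Returns (decision, second_proportion).
    """
    # All same-type elements: the others plus the one that generated the sub-map.
    total = len(similarities) + 1
    # The generating element is counted together with the similar ones (assumption).
    matching = 1 + sum(1 for s in similarities if s > second_threshold)
    second_proportion = matching / total
    return second_proportion > third_threshold, second_proportion


decision, proportion = is_domain_knowledge([0.9, 0.8, 0.2], 0.5, 0.6)
```

With these inputs, three of four same-type elements show the pattern, so the sub-map would be kept under the assumed thresholds.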
4. The method for generating an abnormal knowledge graph in the operation and maintenance field according to claim 3, wherein the determining the similarity between the abnormal sub-graph and the historical abnormal sub-graph of other network elements of the same type in the target system includes, for each abnormal sub-graph:
determining, between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based map vector similarity;
ranking the determined abnormal index similarity, the number of the same abnormal indexes and the map vector similarity based on the node2vec respectively;
and performing ranking summation on the abnormal index similarity, the same abnormal index quantity and the map vector similarity based on node2vec of each historical abnormal sub-map to obtain the similarity between the corresponding historical abnormal sub-map and the abnormal sub-map.
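The ranking summation of claim 4 can be sketched as follows. The claim does not say how the individual measures are computed or in which direction ranks run, so this sketch makes several assumptions: abnormal index similarity is taken as Jaccard similarity, map vector similarity as cosine similarity between embeddings that are supplied precomputed (no node2vec training is performed here), and a higher rank number means a closer match. All function names are illustrative.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two index sets (assumed measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def cosine(u, v):
    """Cosine similarity of two embedding vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ascending_ranks(values):
    """Rank 1 = smallest value; tied values share the lowest rank of the tie."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_sum_similarity(current, history):
    """current: (index list, vector); history: dict name -> (index list, vector)."""
    names = list(history)
    cur_indexes, cur_vector = current
    index_sims = [jaccard(cur_indexes, history[n][0]) for n in names]
    same_counts = [len(set(cur_indexes) & set(history[n][0])) for n in names]
    vector_sims = [cosine(cur_vector, history[n][1]) for n in names]
    # Rank each measure separately, then sum the three ranks per historical sub-map.
    ranks = [ascending_ranks(m) for m in (index_sims, same_counts, vector_sims)]
    return {n: sum(r[i] for r in ranks) for i, n in enumerate(names)}


current = (["cpu", "mem", "io"], [1.0, 0.0])
history = {
    "A": (["cpu", "mem", "io"], [1.0, 0.0]),
    "B": (["cpu"], [0.0, 1.0]),
    "C": (["cpu", "mem"], [1.0, 1.0]),
}
scores = rank_sum_similarity(current, history)
```

Summing ranks rather than raw values keeps the three measures comparable even though one is a count and the other two lie between 0 and 1.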
5. The method for generating the abnormal knowledge graph in the operation and maintenance field according to claim 1, wherein the determining of the time series data of the performance index based on the collected operation data of the target system, and performing the abnormal detection on the time series data of the performance index to determine the abnormal index therein comprises:
collecting operation data of the target system based on an agent program, and processing the operation data to obtain time sequence data of the performance index;
and carrying out anomaly detection on the time series data of the performance index based on 4-sigma, and determining an anomaly index in the time series data.
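A sigma-rule detector of the kind named in claim 5 can be sketched in a few lines. The multiplier `k=4` follows the "4-sigma" wording of the claim; computing the mean and standard deviation over the same series being tested, and using the population standard deviation, are assumptions of this sketch.

```python
from statistics import mean, pstdev


def sigma_anomalies(series, k=4.0):
    """Flag samples that lie more than k standard deviations from the mean."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no spread, so nothing is flagged.
        return [False] * len(series)
    return [abs(x - mu) > k * sigma for x in series]


samples = [10.0] * 30 + [100.0]
anomaly_flags = sigma_anomalies(samples)
```

Because the baseline is estimated from the same data, a single outlier among n samples can reach at most sqrt(n - 1) standard deviations, so short series need either a separate historical baseline or a smaller multiplier.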
6. An application method of an abnormal knowledge graph in the operation and maintenance field is characterized by comprising the following steps:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing abnormity detection on the time sequence data of the performance indexes to determine abnormal indexes in the time sequence data;
grouping the abnormal indexes according to component types to which the performance indexes belong based on a preset time window, and constructing corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
splicing the abnormal sub-maps based on the system architecture of the target system, verifying the spliced abnormal sub-maps and generating a fault knowledge map;
segmenting the fault knowledge graph based on the component type to obtain a fault sub-graph;
matching the fault sub-map with an operation and maintenance domain abnormal knowledge map of the target system, and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
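The matching and solution-extraction steps of claim 6 can be illustrated with a small sketch. The claim does not fix a matching criterion, so Jaccard overlap of abnormal index sets within the same component type is assumed here, together with a hypothetical `min_score` cut-off; the record layout of the labeled knowledge graphs is likewise illustrative.

```python
def match_solutions(fault_subgraphs, knowledge_base, min_score=0.5):
    """Look up a labeled fault solution for each fault sub-map.

    fault_subgraphs: dict component type -> set of abnormal indexes
    knowledge_base: list of labeled entries with keys
                    "component_type", "indexes", "fault_name", "solution"
    """
    results = {}
    for ctype, indexes in fault_subgraphs.items():
        best, best_score = None, 0.0
        for entry in knowledge_base:
            if entry["component_type"] != ctype:
                continue
            union = indexes | entry["indexes"]
            # Jaccard overlap as the matching score (assumed criterion).
            score = len(indexes & entry["indexes"]) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = entry, score
        if best is not None and best_score >= min_score:
            results[ctype] = (best["fault_name"], best["solution"])
    return results


knowledge_base = [
    {"component_type": "db", "indexes": {"slow_queries", "connections"},
     "fault_name": "connection pool exhaustion", "solution": "increase pool size"},
    {"component_type": "db", "indexes": {"disk_io"},
     "fault_name": "disk saturation", "solution": "move data files"},
    {"component_type": "host", "indexes": {"cpu"},
     "fault_name": "cpu overload", "solution": "scale out"},
]
fault_subgraphs = {"db": {"slow_queries", "connections", "qps"}, "cache": {"hit_rate"}}
solutions = match_solutions(fault_subgraphs, knowledge_base)
```

Component types with no sufficiently similar labeled graph, such as "cache" above, simply yield no solution and would need expert labeling.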
7. An operation and maintenance field abnormal knowledge map generation device is characterized by comprising:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system and carrying out abnormal detection on the time sequence data of the performance index to determine the abnormal index;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
the domain knowledge graph extraction module is used for determining the similarity between the abnormal sub-graph and historical abnormal sub-graphs of other network elements of the same type in the target system and determining the abnormal sub-graph as a domain knowledge graph based on the similarity;
the domain knowledge map marking module is used for marking fault information of the domain knowledge map based on experience of operation and maintenance experts to obtain the abnormal knowledge map of the operation and maintenance domain of the target system, wherein the fault information comprises: fault name and fault solution.
8. An application device of an abnormal knowledge graph in the operation and maintenance field, characterized by comprising:
the abnormal index detection module is used for determining the time sequence data of the performance index based on the collected running data of the target system and carrying out abnormal detection on the time sequence data of the performance index to determine the abnormal index;
the abnormal map building module is used for grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window and building corresponding abnormal sub-maps based on the grouping of the abnormal indexes;
the fault knowledge map generation module is used for splicing the abnormal sub-maps based on the system architecture of the target system, verifying the spliced abnormal sub-maps and generating a fault knowledge map;
the fault knowledge map segmentation module is used for segmenting the fault knowledge map based on the component type to obtain a fault sub-map;
the domain knowledge map matching module is used for matching the fault sub-map with the operation and maintenance domain abnormal knowledge map of the target system and determining the operation and maintenance domain abnormal knowledge map corresponding to the fault sub-map;
and the fault solution extraction module is used for obtaining a target fault solution based on the fault solution labeled by the abnormal knowledge map in the operation and maintenance field corresponding to the fault sub-map obtained by the fault knowledge map segmentation.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for generating the operation and maintenance domain anomaly knowledge map according to any one of claims 1 to 5 or the method for applying the operation and maintenance domain anomaly knowledge map according to claim 6.
10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for generating the operation and maintenance domain anomaly knowledge map according to any one of claims 1 to 5, or the method for applying the operation and maintenance domain anomaly knowledge map according to claim 6.
CN202210664886.4A 2022-06-13 2022-06-13 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device Active CN115277453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210664886.4A CN115277453B (en) 2022-06-13 2022-06-13 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device

Publications (2)

Publication Number Publication Date
CN115277453A true CN115277453A (en) 2022-11-01
CN115277453B CN115277453B (en) 2024-06-18

Family

ID=83758852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210664886.4A Active CN115277453B (en) 2022-06-13 2022-06-13 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device

Country Status (1)

Country Link
CN (1) CN115277453B (en)

Also Published As

Publication number Publication date
CN115277453B (en) 2024-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant