CN115720186A - Abnormal root cause positioning method and device based on equipment topology and causal relationship - Google Patents

Abnormal root cause positioning method and device based on equipment topology and causal relationship

Info

Publication number
CN115720186A
Authority
CN
China
Prior art keywords
alarm
root
causal relationship
root cause
graph
Legal status
Pending
Application number
CN202211320044.3A
Other languages
Chinese (zh)
Inventor
吴侃
周世军
覃华云
毛恒
李敏敏
Current Assignee
Unihub China Information Technology Co Ltd
Original Assignee
Unihub China Information Technology Co Ltd
Application filed by Unihub China Information Technology Co Ltd
Priority to CN202211320044.3A
Publication of CN115720186A

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an abnormal root cause positioning method and device based on equipment topology and causal relationship. The method comprises the following steps: storing the topological relationships of the network devices in a graph database; collecting and processing historical alarm information to generate data to be analyzed, calculating alarm causal relationships with the LiNGAM algorithm, and calculating the weights of those causal relationships with a conditional probability formula; and analyzing alarm information generated in real time, based on the network device topology, the alarm causal relationships, and their weights, to obtain a root cause link, a root alarm, and a root device. By combining the topological structure of the network devices with the causal relationships between alarms and the time order in which the alarms occur, the method and device can give the root cause link of the alarms and identify the root device and root alarm from that link.

Description

Abnormal root cause positioning method and device based on equipment topology and causal relationship
Technical Field
The invention relates to the technical field of alarm root cause positioning, in particular to an abnormal root cause positioning method and device based on equipment topology and causal relationship.
Background
When a large number of alarms occur across the whole network in a short time, repeated alarms need to be compressed, a root cause alarm link needs to be given, and the root cause alarm needs to be found, so as to reduce the workload of operation and maintenance personnel handling the alarms. One existing technical scheme provides an alarm root cause positioning method based on the GRANO algorithm (a root cause analysis algorithm based on an interactive graph). Although that method can effectively compress alarms and determine the root cause alarm by ranking the score of each alarm node, it cannot provide a complete root cause link, and the interpretability of the root cause is weak.
Disclosure of Invention
To solve the above problems in the prior art, the present invention provides an abnormal root cause positioning method and apparatus based on device topology and causal relationship, which can give the root cause link of alarms by using the topological structure of the network devices in combination with the causal relationships between alarms and the time order in which the alarms occur, and can find the root device and root alarm from the root cause link.
To achieve this purpose, the invention adopts the following technical scheme:
In an embodiment of the present invention, a method for positioning an abnormal root cause based on device topology and causal relationship is provided. The method includes:
storing the topological relationships of the network devices in a graph database;
collecting and processing historical alarm information to generate data to be analyzed, calculating alarm causal relationships with the LiNGAM algorithm, and calculating alarm causal relationship weights with a conditional probability formula; and
analyzing alarm information generated in real time based on the network device topology, the alarm causal relationships, and their weights to obtain a root cause link, a root alarm, and a root device.
Further, storing the network device topology in a graph database includes:
defining each network device as a vertex of the graph database and each connection between network devices as an edge, so that the network devices and their connections are recorded in the graph database.
Further, collecting and processing the historical alarm information to generate data to be analyzed includes:
collecting historical alarm information; for each historical alarm, selecting the historical alarms in a time slice after it occurs, and using the connections between network devices to screen out the historical alarms that occurred on network devices within two hops; and
encoding the screened historical alarms in one-hot form to generate one piece of data to be analyzed.
Further, analyzing the alarm information generated in real time based on the network device topology, the alarm causal relationships, and their weights to obtain a root cause link, a root alarm, and a root device includes:
for each piece of alarm information generated in real time, selecting the alarm information in a time slice after it occurs;
obtaining all network devices in the alarm information from the graph database, screening out the network devices within two hops by using the connections between network devices, and recording the connections among these devices in a separate graph space;
analyzing the network devices in the graph space one by one and, if multiple alarms occur on one network device, determining the root alarm on that device; and
if the graph space is not fully connected, performing root cause positioning analysis on each connected subgraph separately, and obtaining and summarizing the root cause links, root alarms, and root devices.
Further, for multiple alarms generated on one network device, a directed graph is first constructed, and the starting alarm of the longest path in the directed graph, computed according to the causal relationship weights, is selected as the root alarm; if the directed graph contains no path, the earliest alarm is selected as the root alarm.
Further, if the graph space is not fully connected, performing root cause positioning analysis on each connected subgraph separately includes:
for each connected subgraph, first constructing a new directed graph and traversing the edges of the connected subgraph; for each edge, associating the combination of alarm types on its start and end network devices with the alarm causal relationship results; if the combination can be associated, checking whether causal relationship weights exist in the forward and reverse directions and, if so, taking both weights and recording the edge in the new directed graph in the direction with the higher weight; otherwise, recording the alarm causal relationship weight as 0; and
after the new directed graph is constructed, computing all paths in it and taking the path with the highest sum of alarm causal relationship weights as the root cause link, where the starting device of the root cause link is the root device and the alarm generated on the root device is the root alarm.
In an embodiment of the present invention, an abnormal root cause positioning apparatus based on device topology and causal relationship is further provided. The apparatus includes:
a device topology information construction module, configured to store the network device topology in a graph database;
an alarm causal relationship calculation module, configured to collect and process historical alarm information to generate data to be analyzed, calculate alarm causal relationships with the LiNGAM algorithm, and calculate alarm causal relationship weights with a conditional probability formula; and
an alarm root cause convergence module, configured to analyze alarm information generated in real time based on the network device topology, the alarm causal relationships, and their weights to obtain a root cause link, a root alarm, and a root device.
Further, the device topology information construction module is specifically configured to:
define each network device as a vertex of the graph database and each connection between network devices as an edge, so that the network devices and their connections are recorded in the graph database.
Further, collecting and processing the historical alarm information to generate data to be analyzed includes:
collecting historical alarm information; for each historical alarm, selecting the historical alarms in a time slice after it occurs, and using the connections between network devices to screen out the historical alarms that occurred on network devices within two hops; and
encoding the screened historical alarms in one-hot form to generate one piece of data to be analyzed.
Further, the alarm root cause convergence module is specifically configured to:
for each piece of alarm information generated in real time, select the alarm information in a time slice after it occurs;
obtain all network devices in the alarm information from the graph database, screen out the network devices within two hops by using the connections between network devices, and record the connections among these devices in a separate graph space;
analyze the network devices in the graph space one by one and, if multiple alarms occur on one network device, determine the root alarm on that device; and
if the graph space is not fully connected, perform root cause positioning analysis on each connected subgraph separately, and obtain and summarize the root cause links, root alarms, and root devices.
Further, for multiple alarms generated on one network device, a directed graph is first constructed, and the starting alarm of the longest path in the directed graph, computed according to the causal relationship weights, is selected as the root alarm; if the directed graph contains no path, the earliest alarm is selected as the root alarm.
Further, if the graph space is not fully connected, performing root cause positioning analysis on each connected subgraph separately includes:
for each connected subgraph, first constructing a new directed graph and traversing the edges of the connected subgraph; for each edge, associating the combination of alarm types on its start and end network devices with the alarm causal relationship results; if the combination can be associated, checking whether causal relationship weights exist in the forward and reverse directions and, if so, taking both weights and recording the edge in the new directed graph in the direction with the higher weight; otherwise, recording the alarm causal relationship weight as 0; and
after the new directed graph is constructed, computing all paths in it and taking the path with the highest sum of alarm causal relationship weights as the root cause link, where the starting device of the root cause link is the root device and the alarm generated on the root device is the root alarm.
In an embodiment of the present invention, a computer device is further provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the foregoing method for locating an abnormal root cause based on device topology and causal relationship is implemented.
In an embodiment of the present invention, a computer-readable storage medium is further provided, where a computer program for executing the method for locating an abnormal root cause based on device topology and causal relationship is stored in the computer-readable storage medium.
Beneficial effects:
1. The invention records the device topology and the alarm causal relationships in a graph database, which enables efficient querying and use.
2. The alarm root cause convergence calculation can accurately locate the alarm root cause link and obtain the root device and root alarm that caused the alarms, which can effectively improve alarm handling efficiency.
Drawings
FIG. 1 is a schematic flow chart of an abnormal root cause positioning method based on device topology and causal relationship according to the present invention;
FIG. 2 is a root cause link diagram with only two endpoints, obtained from alarm information occurring within 2 hours in a network with a certain topology according to an embodiment of the present invention;
FIG. 3 is a root cause link diagram with four endpoints, obtained from alarm information occurring within 2 hours in a network with a certain topology according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an abnormal root cause positioning apparatus based on device topology and causal relationship according to the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments, which should be understood to be presented only to enable those skilled in the art to better understand and implement the present invention, and not to limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to embodiments of the invention, an abnormal root cause positioning method and apparatus based on device topology and causal relationship are provided, which can give the root cause link of alarms by using the topological structure of the network devices in combination with the causal relationships between alarms and the time order in which they occur, and can find the root device and root alarm from the root cause link.
The principles and spirit of the present invention are explained in detail below with reference to several exemplary embodiments of the present invention.
FIG. 1 is a schematic flow chart of an abnormal root cause positioning method based on device topology and causal relationship according to the present invention. As shown in FIG. 1, the method is as follows:
1. Constructing device topology information
The connections between network devices are stored in a graph database. A graph database is built from vertices and edges, with edges connecting vertices, which makes it suitable for describing how network devices are connected. Each network device is defined as a vertex and each connection between network devices as an edge; the devices and their connections are recorded in the graph database in this way so that they can be queried at any time during subsequent analysis.
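The storage scheme of step 1 can be illustrated with a minimal in-memory sketch. The class `TopologyStore` and its methods are hypothetical stand-ins for a real graph database (the patent does not name one); the sketch assumes physical links are undirected and shows the "within two hops" query that the later steps rely on.

```python
from collections import defaultdict, deque


class TopologyStore:
    """Hypothetical in-memory stand-in for the graph database:
    devices are vertices, physical connections are undirected edges."""

    def __init__(self) -> None:
        self._adj: defaultdict[str, set[str]] = defaultdict(set)

    def add_device(self, device_id: str) -> None:
        self._adj.setdefault(device_id, set())  # vertex with no edges yet

    def connect(self, a: str, b: str) -> None:
        # ASSUMPTION: links are undirected, so store both directions.
        self._adj[a].add(b)
        self._adj[b].add(a)

    def neighbors(self, device_id: str) -> set[str]:
        return set(self._adj.get(device_id, ()))

    def within_hops(self, device_id: str, max_hops: int = 2) -> set[str]:
        """Devices reachable in at most max_hops links, including the start
        device itself, found by breadth-first search."""
        seen = {device_id}
        frontier = deque([(device_id, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self._adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        return seen
```

For a chain r1–r2–r3–r4, `within_hops("r1")` returns `{"r1", "r2", "r3"}`. A production graph database would answer the same question with a bounded-depth traversal query instead.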
2. Calculating alarm causal relationships
Historical alarm information is collected; each alarm record contains the alarm type and the corresponding alarm device. The collected data is then processed as follows. The historical alarms are traversed one by one, and for each one, all historical alarms in the time slice after it occurs are selected, where the time slice is 10 minutes. From the alarms in this time slice, those generated on network devices within two hops are screened out using the device connections built in step 1, and the screened alarms are encoded in one-hot form (prior art) into one piece of data to be analyzed. After all historical alarms have been traversed, all the data needed to calculate the alarm causal relationships is available.
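The sample construction of step 2 can be sketched as follows, assuming each alarm record carries a type, a device ID, and a timestamp. `Alarm`, `build_samples`, and the `near` predicate are illustrative names: `near(a, b)` is assumed to return True for the same device and for devices within two hops (for example, backed by a two-hop topology query). The sketch also assumes the anchoring alarm belongs to its own time slice, which the text leaves open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class Alarm:
    alarm_type: str
    device_id: str
    time: datetime


def build_samples(
    alarms: list[Alarm],
    near: Callable[[str, str], bool],
    window: timedelta = timedelta(minutes=10),  # the 10-minute time slice
) -> tuple[list[str], list[list[int]]]:
    """One one-hot row per historical alarm: which alarm types occur within
    `window` after it on devices that `near` accepts."""
    ordered = sorted(alarms, key=lambda a: a.time)
    columns = sorted({a.alarm_type for a in ordered})
    index = {t: i for i, t in enumerate(columns)}
    rows: list[list[int]] = []
    for anchor in ordered:
        row = [0] * len(columns)
        for other in ordered:
            in_slice = anchor.time <= other.time <= anchor.time + window
            if in_slice and near(anchor.device_id, other.device_id):
                row[index[other.alarm_type]] = 1
        rows.append(row)
    return columns, rows
```

The quadratic double loop keeps the sketch readable; with many alarms, a sliding window over the time-sorted list would avoid rescanning every record.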
The alarm causal relationships are calculated with the LiNGAM algorithm (Linear Non-Gaussian Acyclic Model, prior art), which is mainly used to analyze the causal directions and causal connection weights between variables. The calculation yields alarm causal relationships such as A → B and B → C, where A, B, and C represent different alarm types.
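LiNGAM models the variables as x = Bx + e, so a nonzero entry B[i][j] means that variable j directly influences variable i. Estimators such as `DirectLiNGAM` in the open-source `lingam` Python package expose this matrix as `adjacency_matrix_` after fitting, although the patent does not say which implementation it uses. The hypothetical helper below only turns such a matrix into a cause → effect table over alarm types; the `threshold` used to drop near-zero coefficients is an assumption.

```python
def causal_edges(
    columns: list[str],
    adjacency: list[list[float]],
    threshold: float = 0.0,
) -> list[tuple[str, str, float]]:
    """Turn a LiNGAM-style matrix B (x = Bx + e, B[i][j] = effect of j on i)
    into (cause, effect, coefficient) triples."""
    edges = []
    for i, effect in enumerate(columns):
        for j, cause in enumerate(columns):
            coef = adjacency[i][j]
            # ASSUMPTION: ignore self-loops and coefficients at or below threshold.
            if i != j and abs(coef) > threshold:
                edges.append((cause, effect, coef))
    return edges
```

With columns `["A", "B", "C"]` and a matrix whose only nonzero entries are B[1][0] and B[2][1], the result is the table A → B, B → C.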
After the alarm causal relationships are obtained, their weights are calculated; these weights are used in the subsequent alarm root cause convergence step to determine the direction of the root cause link. The weights use the standard conditional probability formula (prior art):

$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

where P(B) is the probability that alarm type B occurs, P(AB) is the probability that alarm types A and B occur together, and P(A|B) is the probability that alarm type A occurs given that alarm type B has occurred. This yields the causal relationships between alarms together with their weights.
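Estimated from the one-hot samples, P(AB)/P(B) reduces to a ratio of counts: the number of samples containing both A and B divided by the number containing B. The function names below are hypothetical. Because the text does not state which alarm of an edge X → Y plays the role of B, `edge_weights` assumes weight(X → Y) = P(X | Y); swapping the two arguments gives the other reading.

```python
def conditional_probability(columns: list[str], rows: list[list[int]],
                            a: str, b: str) -> float:
    """P(a | b) = P(ab) / P(b). With N samples both probabilities share the
    denominator N, so the ratio is count(a and b) / count(b)."""
    ia, ib = columns.index(a), columns.index(b)
    n_b = sum(1 for r in rows if r[ib])
    n_ab = sum(1 for r in rows if r[ia] and r[ib])
    return n_ab / n_b if n_b else 0.0  # ASSUMPTION: 0 when b never occurs


def edge_weights(columns: list[str], rows: list[list[int]],
                 edges: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Weight each causal edge (cause, effect).
    ASSUMPTION: weight = P(cause | effect)."""
    return {(c, e): conditional_probability(columns, rows, c, e)
            for c, e in edges}
```

For example, if B appears in three samples and A appears together with B in two of them, P(A|B) = 2/3.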
3. Alarm root cause convergence
The first two steps have obtained the alarm causal relationships and the topological relationships of the network devices. This step analyzes alarm information generated in real time to obtain the root cause link, root alarm, and root device. The alarm information analyzed here is the alarm information within one time slice, generated in real time.
3.1 First, all network devices involved in the alarm information, together with the network devices within two hops of them, are obtained from the graph database. A separate graph space (with the same data structure as the graph database) is constructed as the device graph space to record these devices and their connections, and the device graph space is then analyzed.
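Step 3.1 amounts to a multi-source breadth-first search over the topology followed by taking the induced subgraph. In this sketch the topology is a plain adjacency dictionary rather than a graph database query, and `build_graph_space` is a hypothetical name.

```python
from collections import deque


def build_graph_space(
    adjacency: dict[str, set[str]],
    alarm_devices: set[str],
    max_hops: int = 2,
) -> dict[str, set[str]]:
    """Device graph space: the alarm devices plus every device within
    max_hops of any of them, keeping only topology edges among members."""
    members = set(alarm_devices)
    # Multi-source BFS: every alarm device starts at depth 0.
    frontier = deque((d, 0) for d in alarm_devices)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in members:
                members.add(nxt)
                frontier.append((nxt, depth + 1))
    # Induced subgraph: drop edges that leave the graph space.
    return {d: adjacency.get(d, set()) & members for d in members}
```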
3.2 The network devices in the graph space from 3.1 are analyzed one by one. A single network device may carry multiple alarms, so the root alarm on each device must be determined first. If a network device has only one alarm, that alarm is the root alarm on the device. If a network device has multiple alarms, a directed graph is constructed to analyze its root alarm: every alarm generated on the device is added as a vertex, and, with reference to the alarm causal relationship table obtained in step 2, an edge is added between any two alarms that have a causal relationship in the table, carrying the causal relationship weight from the table. The starting alarm of the longest path in this directed graph, computed according to the alarm causal relationship weights, is selected as the root alarm. Specifically, all loop-free simple paths in the directed graph are found, and the one with the most edges is the longest path; if several longest paths have equal length, the causal relationship weights of the edges on each path are summed, the path with the highest total is selected, and its starting alarm is taken as the root alarm of the device. If the directed graph contains no path, the earliest alarm is selected as the root alarm.
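The per-device rule in 3.2 can be sketched by enumerating simple paths with depth-first search. `device_root_alarm` is a hypothetical helper; it assumes each alarm type appears once per device (keyed to its time) and that the causal table maps (cause, effect) pairs to weights. Enumerating all simple paths is exponential in the worst case, which is acceptable here only because a single device usually carries few alarms.

```python
from datetime import datetime


def device_root_alarm(
    alarms: dict[str, datetime],
    weights: dict[tuple[str, str], float],
) -> str:
    """Root alarm on one device: start of the longest simple path (most
    edges, ties broken by weight sum); earliest alarm if there are no edges."""
    types = list(alarms)
    # Edge t -> u whenever the causal table contains (t, u).
    out = {t: [u for u in types if u != t and (t, u) in weights] for t in types}
    best_key = None
    best_start = None

    def dfs(path: list[str], total: float) -> None:
        nonlocal best_key, best_start
        if len(path) > 1:
            key = (len(path) - 1, total)  # more edges first, then higher weight
            if best_key is None or key > best_key:
                best_key, best_start = key, path[0]
        for nxt in out[path[-1]]:
            if nxt not in path:  # simple (loop-free) paths only
                dfs(path + [nxt], total + weights[(path[-1], nxt)])

    for t in types:
        dfs([t], 0.0)
    if best_start is None:  # no path at all: the earliest alarm wins
        return min(types, key=lambda t: alarms[t])
    return best_start
```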
3.3 The graph space obtained in 3.1 is not necessarily fully connected and may be divided into multiple connected subgraphs, so root cause localization is performed subgraph by subgraph. A connected subgraph is formed by network devices that satisfy both the physical connection condition and the alarm root condition; it is independent within the whole graph space and has no edges to endpoints outside it. Each endpoint of a connected subgraph is one network device, which after the calculation in 3.2 carries exactly one root alarm, and each edge represents the relationship formed by connecting the root alarms of its two endpoints. For each connected subgraph, a new directed graph is constructed for analysis. The edges of the connected subgraph are traversed, and the alarm types of the start and end network devices of each edge are merged and associated with the alarm causal relationship table (that is, the alarm causal relationship results calculated in step 2, stored as a table). If the combination of alarm types of the start and end devices can be associated in the table, it is checked whether causal relationship weights exist in the forward and reverse directions; if they do, both weights are taken, and the edge is recorded in the new directed graph in the direction with the higher weight. Otherwise, the alarm causal relationship weight of the edge is recorded as 0. After the current directed graph is constructed, all paths in it are computed by traversal, and the path with the highest sum of alarm causal relationship weights is taken as the root cause link; the starting device of the root cause link is the root device, and the alarm on the root device is the root alarm.
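Steps 3.3 and 3.4 can be sketched together: split the devices into connected subgraphs, orient each link by comparing the forward and reverse causal weights of the root alarms at its ends, then keep the simple path with the largest weight sum. All names are hypothetical, and three details are assumptions the text leaves open: a link with no matching causal relationship keeps its stored orientation with weight 0, ties on weight are broken in favour of longer paths, and a device with no links forms a one-device root cause link.

```python
Link = tuple[str, str]


def components(devices: list[str], links: list[Link]) -> list[set[str]]:
    """Split the devices into connected subgraphs (links treated as undirected)."""
    adj: dict[str, set[str]] = {d: set() for d in devices}
    for a, b in links:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    seen: set[str] = set()
    comps: list[set[str]] = []
    for d in devices:
        if d in seen:
            continue
        comp: set[str] = set()
        stack = [d]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def orient(comp: set[str], links: list[Link], root_alarm: dict[str, str],
           weights: dict[Link, float]) -> dict[Link, float]:
    """Direct each link toward the side with the higher causal weight."""
    edges: dict[Link, float] = {}
    for a, b in links:
        if a not in comp or b not in comp:
            continue
        fwd = weights.get((root_alarm[a], root_alarm[b]))
        bwd = weights.get((root_alarm[b], root_alarm[a]))
        if fwd is None and bwd is None:
            edges[(a, b)] = 0.0  # ASSUMPTION: no causal match, keep stored orientation
        elif bwd is None or (fwd is not None and fwd >= bwd):
            edges[(a, b)] = fwd
        else:
            edges[(b, a)] = bwd
    return edges


def best_link(comp: set[str], edges: dict[Link, float]) -> list[str]:
    """Simple path with the largest weight sum (ties: longer path wins)."""
    out: dict[str, list[tuple[str, float]]] = {d: [] for d in comp}
    for (src, dst), w in edges.items():
        out[src].append((dst, w))
    best: tuple[float, int, list[str]] = (float("-inf"), 0, [])

    def dfs(path: list[str], total: float) -> None:
        nonlocal best
        if (total, len(path)) > best[:2]:
            best = (total, len(path), list(path))
        for dst, w in out[path[-1]]:
            if dst not in path:
                path.append(dst)
                dfs(path, total + w)
                path.pop()

    for d in sorted(comp):
        dfs([d], 0.0)
    return best[2]


def locate_roots(root_alarm: dict[str, str], links: list[Link],
                 weights: dict[Link, float]) -> list[dict]:
    """Analyze each connected subgraph (3.3) and summarize the results (3.4)."""
    results = []
    for comp in components(sorted(root_alarm), links):
        path = best_link(comp, orient(comp, links, root_alarm, weights))
        results.append({"link": path, "root_device": path[0],
                        "root_alarm": root_alarm[path[0]]})
    return results
```

Here `root_alarm` maps each device to the single root alarm chosen for it in 3.2, and `weights` maps (cause, effect) alarm-type pairs to causal weights.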
3.4 The root cause links, root alarms, and root devices obtained from each connected subgraph are summarized, yielding all root cause links in the time slice together with their root alarms and root devices.
It should be noted that although the operations of the method of the present invention have been described in the above embodiments and the accompanying drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the operations shown must be performed, to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
In order to clearly explain the above method for locating abnormal root cause based on device topology and causal relationship, a specific embodiment is described below, but it should be noted that the embodiment is only for better explaining the present invention and does not constitute an undue limitation to the present invention.
After the method was applied, the alarm information generated within 2 hours in a network with a certain topology was modeled and predicted, yielding 10 root cause link graphs; two representative results are shown for comparison. As shown in FIG. 2, the graph contains only two endpoints, and the label above each endpoint is the network device ID. The two endpoints are connected by a directed arrow, and the start of the arrow is the root device on which the root alarm occurred. As shown in FIG. 3, the graph contains four endpoints; the arrows show that the middle endpoint is the root device on which the root alarm occurred, because the fault alarm on the root device caused abnormal alarms on the other three network devices.
Based on the same inventive concept, the invention also provides an abnormal root cause positioning apparatus based on device topology and causal relationship. The implementation of the apparatus can refer to the implementation of the method, and repeated details are omitted. The term "module," as used below, may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a schematic structural diagram of an abnormal root cause locating device based on device topology and causal relationship according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes:
the device topology information construction module 101 is configured to store a network device topology relationship through a graph database; the method comprises the following specific steps:
and defining each network device as a vertex of the graph database, and defining the connection relation between the network devices as the edge of the graph database, so as to record and write the connection relation between the network devices and the network devices into the graph database.
The alarm causal relationship calculation module 102 is configured to collect and process historical alarm information, generate data to be analyzed, calculate an alarm causal relationship by using a LiNGAM algorithm, and calculate an alarm causal relationship weight by using a conditional probability formula;
collecting historical alarm information for processing, and generating data to be analyzed, wherein the steps comprise:
collecting history alarm information, selecting the history alarm information in a time slice after each piece of history alarm information occurs, and screening out the history alarm information occurring on the network equipment within two hops by using the connection relation between the network equipment;
and processing the screened historical alarm information in a one-hot mode to generate a piece of data to be analyzed.
The alarm root cause convergence module 103 is configured to analyze alarm information generated in real time based on a network device topology relationship, an alarm cause-and-effect relationship, and a weight thereof, and obtain a root cause link, a root alarm, and a root device; the method comprises the following specific steps:
for each piece of alarm information generated in real time, selecting the alarm information in a time slice after the alarm information generated in real time occurs;
obtaining all network devices in the alarm information through a graph database, screening out the network devices within two hops by using the connection relation between the network devices, and recording the connection relation between the network devices within two hops through a single graph space;
analyzing the network devices in the graph space one by one and, if multiple alarms occur on one network device, determining the root alarm of that device: a directed graph is first constructed over those alarms, and the starting alarm of the longest path, computed according to the causal relationship weights in the directed graph, is selected as the root alarm; if the directed graph contains no path, the earliest alarm is selected as the root alarm;
if the graph space is not fully connected, root cause localization analysis is carried out with each connected subgraph as a unit. For each connected subgraph, a new directed graph is first constructed: the edges are traversed, and for each edge the combination of alarm types on its start and end network devices is associated with the alarm causal relationship results. If the combination can be associated with those results, it is determined whether causal relationship weight values exist in the forward and reverse directions; if they do, both weight values are taken, and the direction with the higher weight value is recorded in the new directed graph as the actual direction of the edge; otherwise, the alarm causal relationship weight value is recorded as 0;
after the current new directed graph has been constructed, all paths in it are calculated, and the path with the highest sum of alarm causal relationship weight values is taken as the root cause link; the starting device of the root cause link is the root device, and the alarm generated on the root device is the root alarm;
and finally, obtaining and summarizing the root cause links, root alarms and root devices.
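The per-device root alarm selection described above (several alarms on one device, longest weighted path, earliest-alarm fallback) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the alarm tuple format (alarm_type, timestamp), the causal_weights dictionary keyed by (cause_type, effect_type), the function name root_alarm_on_device, and the rule that ties go to the earlier alarm are all hypothetical. "Longest path" is read here as the simple path with the largest sum of causal weights.

```python
def root_alarm_on_device(alarms, causal_weights):
    """Select the root alarm among several alarms raised on one device.

    alarms: list of (alarm_type, timestamp) tuples (hypothetical format).
    causal_weights: dict mapping (cause_type, effect_type) -> weight, standing
    in for the mined alarm causal relationships and their weights.
    """
    # Sort by time so that ties, and the no-path fallback, favour the earliest alarm.
    alarms = sorted(alarms, key=lambda a: a[1])
    n = len(alarms)

    # Directed graph over the alarms: edge i -> j when (type_i, type_j) is a
    # known causal pair, weighted by its causal relationship weight.
    adj = [[] for _ in range(n)]
    for i, (type_i, _) in enumerate(alarms):
        for j, (type_j, _) in enumerate(alarms):
            w = causal_weights.get((type_i, type_j))
            if i != j and w is not None:
                adj[i].append((j, w))

    if not any(adj):
        # No path in the directed graph: fall back to the earliest alarm.
        return alarms[0]

    def heaviest_from(i, visited):
        # Weight of the heaviest simple path starting at alarm i (exhaustive
        # depth-first search, acceptable for the few alarms on one device).
        best = 0.0
        for j, w in adj[i]:
            if j not in visited:
                best = max(best, w + heaviest_from(j, visited | {j}))
        return best

    scores = [heaviest_from(i, frozenset([i])) for i in range(n)]
    # The root alarm is the starting alarm of the heaviest ("longest") path.
    return alarms[max(range(n), key=lambda i: scores[i])]
```

For example, with the hypothetical weights {("LINK_DOWN", "BGP_DOWN"): 0.8, ("BGP_DOWN", "PKT_LOSS"): 0.6} and the alarms ("BGP_DOWN", 5), ("LINK_DOWN", 10), ("PKT_LOSS", 12), the function returns ("LINK_DOWN", 10): BGP_DOWN is the earliest alarm, but LINK_DOWN starts the heaviest path.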
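The edge orientation and root cause link steps for one connected subgraph can be sketched as follows, again as a hedged illustration rather than the patented implementation. Assumptions: each device in the subgraph is represented by a single (root) alarm type in the hypothetical mapping device_alarm_type; when only one direction has a weight, the missing direction counts as 0; a link with no associated causal relationship keeps an arbitrary orientation with weight 0; and "all paths" means all simple paths, enumerated by exhaustive depth-first search, which is only practical for small subgraphs. The function names orient_edges and root_cause_link are illustrative.

```python
def orient_edges(edges, device_alarm_type, causal_weights):
    """Turn undirected topology links of one connected subgraph into directed,
    weighted edges using the alarm causal relationship weights."""
    directed = {}
    for u, v in edges:
        fwd = causal_weights.get((device_alarm_type[u], device_alarm_type[v]))
        rev = causal_weights.get((device_alarm_type[v], device_alarm_type[u]))
        if fwd is None and rev is None:
            # No associated causal relationship: keep the link with weight 0
            # (orientation is arbitrary here, an assumption of this sketch).
            directed[(u, v)] = 0.0
            continue
        fwd_w = fwd if fwd is not None else 0.0
        rev_w = rev if rev is not None else 0.0
        # The side with the higher weight gives the actual direction.
        if fwd_w >= rev_w:
            directed[(u, v)] = fwd_w
        else:
            directed[(v, u)] = rev_w
    return directed


def root_cause_link(directed):
    """Return (path, weight_sum) for the simple path with the highest total
    causal weight; path[0] is the root device."""
    adj = {}
    for (u, v), w in directed.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, [])
    best_path, best_sum = None, float("-inf")

    def dfs(path, total):
        nonlocal best_path, best_sum
        if len(path) > 1 and total > best_sum:
            best_path, best_sum = list(path), total
        for nxt, w in adj[path[-1]]:
            if nxt not in path:  # simple paths only
                path.append(nxt)
                dfs(path, total + w)
                path.pop()

    for start in adj:
        dfs([start], 0.0)
    return best_path, best_sum
```

The first element of the returned path is the root device, and the alarm on that device is the root alarm. Because the comparison is strict, the first path found among equal sums is kept, so appending a zero-weight edge does not lengthen the chosen link.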
It should be noted that although several modules of the abnormal root cause locating device based on device topology and causal relationship are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the modules described above may be embodied in one module. Conversely, the features and functions of one module described above may be further divided among a plurality of modules.
Based on the aforementioned inventive concept, as shown in fig. 5, the present invention further provides a computer device 200, which includes a memory 210, a processor 220, and a computer program 230 stored on the memory 210 and operable on the processor 220, wherein the processor 220 implements the aforementioned abnormal root cause localization method based on device topology and causal relationship when executing the computer program 230.
Based on the above inventive concept, the present invention further provides a computer readable storage medium storing a computer program for executing the above method for locating an abnormal root cause based on device topology and causal relationship.
The abnormal root cause positioning method and device based on equipment topology and causal relationship provided by the invention use a graph database to query the network device topology and the alarm causal relationships, and combine the two to perform abnormal root cause localization and alarm root cause convergence calculation.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, and that the division into aspects does not mean that the features in those aspects cannot be combined to advantage; such division is for convenience of presentation only. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
The foregoing does not limit the protection scope of the present invention; those skilled in the art should understand that various modifications or variations which can be made without inventive effort on the basis of the technical solution of the present invention still fall within the protection scope of the present invention.

Claims (14)

1. An abnormal root cause positioning method based on equipment topology and causal relationship, characterized by comprising the following steps:
storing the topological relation of the network equipment through a graph database;
collecting and processing historical alarm information to generate data to be analyzed, calculating alarm causal relationships using the LiNMAG algorithm, and calculating alarm causal relationship weights using a conditional probability formula;
and analyzing alarm information generated in real time based on the network device topology, the alarm causal relationships and their weights to obtain a root cause link, a root alarm and a root device.
2. The abnormal root cause positioning method based on equipment topology and causal relationship according to claim 1, wherein storing the network device topology through a graph database comprises:
defining each network device as a vertex of the graph database and each connection between network devices as an edge of the graph database, so that the network devices and the connections between them are recorded and written into the graph database.
3. The abnormal root cause positioning method based on equipment topology and causal relationship according to claim 1, wherein collecting and processing historical alarm information to generate data to be analyzed comprises:
collecting historical alarm information; for each piece of historical alarm information, selecting the historical alarm information within a time slice after it occurs, and using the connections between network devices to select the historical alarm information that occurred on network devices within two hops;
and one-hot encoding the selected historical alarm information to generate the data to be analyzed.
4. The method for locating abnormal root cause based on device topology and causal relationship according to claim 1, wherein analyzing the alarm information generated in real time based on the network device topology relationship, alarm causal relationship and their weights to obtain a root cause link, a root alarm and a root device comprises:
for each piece of alarm information generated in real time, selecting the alarm information within a time slice after it occurs;
obtaining, through the graph database, all network devices involved in the selected alarm information, selecting the network devices within two hops using the connections between network devices, and recording the connections between the network devices within two hops in a single graph space;
analyzing the network devices in the graph space one by one and, if multiple alarms occur on one network device, determining the root alarm of that network device;
and, if the graph space is not fully connected, performing root cause localization analysis with each connected subgraph as a unit, and obtaining and summarizing the root cause link, root alarm and root device.
5. The abnormal root cause positioning method based on equipment topology and causal relationship according to claim 4, wherein, for multiple alarms occurring on one network device, a directed graph is first constructed, and the starting alarm of the longest path computed according to the causal relationship weights in the directed graph is selected as the root alarm; if the directed graph contains no path, the earliest alarm is selected as the root alarm.
6. The abnormal root cause positioning method based on equipment topology and causal relationship according to claim 4, wherein performing root cause localization analysis with each connected subgraph as a unit when the graph space is not fully connected comprises:
for each connected subgraph, first constructing a new directed graph: traversing the edges and, for each edge, associating the combination of alarm types on its start and end network devices with the alarm causal relationship results; if the combination can be associated in the results, determining whether causal relationship weight values exist in the forward and reverse directions; if so, taking both weight values and recording the direction with the higher weight value in the new directed graph as the actual direction of the edge; otherwise, recording the alarm causal relationship weight value as 0;
after the current new directed graph has been constructed, calculating all paths in the new directed graph and taking the path with the highest sum of alarm causal relationship weight values as the root cause link, wherein the starting device of the root cause link is the root device and the alarm generated on the root device is the root alarm.
7. An abnormal root cause locating device based on equipment topology and causal relationship, characterized by comprising:
a device topology information construction module, configured to store the network device topology through a graph database;
an alarm causal relationship calculation module, configured to collect and process historical alarm information to generate data to be analyzed, calculate alarm causal relationships using the LiNMAG algorithm, and calculate alarm causal relationship weights using a conditional probability formula;
and an alarm root cause convergence module, configured to analyze alarm information generated in real time based on the network device topology, the alarm causal relationships and their weights to obtain a root cause link, a root alarm and a root device.
8. The abnormal root cause locating device based on equipment topology and causal relationship according to claim 7, wherein the device topology information construction module is specifically configured to:
define each network device as a vertex of the graph database and each connection between network devices as an edge of the graph database, so that the network devices and the connections between them are recorded and written into the graph database.
9. The abnormal root cause locating device based on equipment topology and causal relationship according to claim 7, wherein collecting and processing historical alarm information to generate data to be analyzed comprises:
collecting historical alarm information; for each piece of historical alarm information, selecting the historical alarm information within a time slice after it occurs, and using the connections between network devices to select the historical alarm information that occurred on network devices within two hops;
and one-hot encoding the selected historical alarm information to generate a piece of data to be analyzed.
10. The device according to claim 7, wherein the alarm root cause convergence module is specifically configured to:
for each piece of alarm information generated in real time, selecting the alarm information within a time slice after it occurs;
obtaining, through the graph database, all network devices involved in the selected alarm information, selecting the network devices within two hops using the connections between network devices, and recording the connections between the network devices within two hops in a single graph space;
analyzing the network devices in the graph space one by one and, if multiple alarms occur on one network device, determining the root alarm of that network device;
and, if the graph space is not fully connected, performing root cause localization analysis with each connected subgraph as a unit, and obtaining and summarizing the root cause link, root alarm and root device.
11. The device according to claim 10, wherein, for multiple alarms occurring on one network device, a directed graph is first constructed, and the starting alarm of the longest path computed according to the causal relationship weights in the directed graph is selected as the root alarm; if the directed graph contains no path, the earliest alarm is selected as the root alarm.
12. The device according to claim 10, wherein performing root cause localization analysis with each connected subgraph as a unit when the graph space is not fully connected comprises:
for each connected subgraph, first constructing a new directed graph: traversing the edges and, for each edge, associating the combination of alarm types on its start and end network devices with the alarm causal relationship results; if the combination can be associated in the results, determining whether causal relationship weight values exist in the forward and reverse directions; if so, taking both weight values and recording the direction with the higher weight value in the new directed graph as the actual direction of the edge; otherwise, recording the alarm causal relationship weight value as 0;
after the current new directed graph has been constructed, calculating all paths in the new directed graph and taking the path with the highest sum of alarm causal relationship weight values as the root cause link, wherein the starting device of the root cause link is the root device and the alarm generated on the root device is the root alarm.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1-6 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1-6.
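The data preparation recited in claims 3 and 9 (a time slice after each historical alarm, two-hop screening, one-hot encoding) can be illustrated with the sketch below. It is one possible reading of the claim text, not the patented implementation. Assumptions: alarms are hypothetical (device, alarm_type, timestamp) tuples; an adjacency dictionary mapping each device to its linked devices stands in for the topology held in the graph database; the time slice is taken as the closed interval [t, t + window] starting at the triggering alarm, so the triggering alarm itself appears in its own row; and "within two hops" includes the triggering device. The function name build_one_hot_rows is illustrative.

```python
def build_one_hot_rows(alarms, adjacency, alarm_types, window):
    """Build one one-hot row per historical alarm.

    alarms: list of (device, alarm_type, timestamp) tuples (hypothetical format).
    adjacency: dict device -> set of directly linked devices.
    alarm_types: ordered list of alarm types defining the one-hot columns.
    """
    index = {t: i for i, t in enumerate(alarm_types)}
    rows = []
    for dev, _, ts in alarms:
        # Devices within two hops of the triggering device (itself included).
        near = {dev} | adjacency.get(dev, set())
        for n in list(near):
            near |= adjacency.get(n, set())
        # Mark every alarm type seen on those devices inside [ts, ts + window].
        row = [0] * len(alarm_types)
        for dev2, type2, ts2 in alarms:
            if ts <= ts2 <= ts + window and dev2 in near:
                row[index[type2]] = 1
        rows.append(row)
    return rows
```

Each row is one sample of the data to be analyzed; a causal discovery step (the claims name the LiNMAG algorithm, which is not reproduced here) would then consume these rows.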
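Claims 1 and 7 state that the causal relationship weights are calculated with a conditional probability formula but do not give the formula. The sketch below assumes, purely for illustration, that the weight of a cause-to-effect pair is P(effect | cause), estimated over one-hot encoded alarm rows as the fraction of rows containing the cause that also contain the effect. The LiNMAG step is not reproduced; its output is represented by the hypothetical input causal_pairs, and the function name conditional_probability_weights is illustrative.

```python
def conditional_probability_weights(rows, alarm_types, causal_pairs):
    """Estimate weight(cause -> effect) as P(effect present | cause present).

    rows: one-hot rows (lists of 0/1) whose columns follow alarm_types.
    causal_pairs: (cause_type, effect_type) pairs, standing in for the
    causal relationships found by the causal discovery step.
    """
    index = {t: i for i, t in enumerate(alarm_types)}
    weights = {}
    for cause, effect in causal_pairs:
        c, e = index[cause], index[effect]
        with_cause = [r for r in rows if r[c] == 1]
        if not with_cause:
            weights[(cause, effect)] = 0.0  # cause never observed
            continue
        both = sum(1 for r in with_cause if r[e] == 1)
        weights[(cause, effect)] = both / len(with_cause)
    return weights
```

Because P(effect | cause) and P(cause | effect) generally differ, the two directions of a pair can receive different weights, which is what the direction choice in claims 6 and 12 compares.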
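The real-time steps of claims 4 and 10 (time slice, two-hop screening, a single graph space, and splitting into connected subgraphs) can be sketched as follows. This is a hedged illustration, not the patented implementation. Assumptions: alarms are hypothetical (device, alarm_type, timestamp) tuples; an adjacency dictionary stands in for the graph database; two-hop screening is measured from the device of the triggering alarm by breadth-first search; and the graph space keeps only topology links between alarmed devices, so an unalarmed intermediate device can split the space into several connected subgraphs. The function names build_graph_space and connected_subgraphs are illustrative.

```python
from collections import deque


def build_graph_space(trigger, alarms, adjacency, window, max_hops=2):
    """Collect the alarms and topology links relevant to one real-time alarm.

    trigger and alarms are (device, alarm_type, timestamp) tuples; adjacency
    maps device -> set of linked devices (stand-in for the graph database).
    """
    dev0, _, ts0 = trigger
    # Alarms inside the time slice that follows the triggering alarm.
    in_slice = [a for a in alarms if ts0 <= a[2] <= ts0 + window]

    # Breadth-first search, at most max_hops links away from the trigger device.
    dist = {dev0: 0}
    queue = deque([dev0])
    while queue:
        d = queue.popleft()
        if dist[d] == max_hops:
            continue
        for n in adjacency.get(d, ()):
            if n not in dist:
                dist[n] = dist[d] + 1
                queue.append(n)

    kept = [a for a in in_slice if a[0] in dist]
    devices = {a[0] for a in kept}
    # Graph space: topology links between the retained alarmed devices.
    edges = {tuple(sorted((u, v)))
             for u in devices for v in adjacency.get(u, ()) if v in devices}
    return kept, devices, edges


def connected_subgraphs(devices, edges):
    """Split the graph space into connected subgraphs (device set, edge set)."""
    adj = {d: set() for d in devices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in sorted(devices):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first traversal
            d = stack.pop()
            comp.add(d)
            for n in adj[d]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        components.append((comp, {e for e in edges if e[0] in comp}))
    return components
```

Each returned subgraph is then analyzed on its own, as claims 4 and 10 require when the graph space is not fully connected.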
CN202211320044.3A 2022-10-26 2022-10-26 Abnormal root cause positioning method and device based on equipment topology and causal relationship Pending CN115720186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211320044.3A CN115720186A (en) 2022-10-26 2022-10-26 Abnormal root cause positioning method and device based on equipment topology and causal relationship

Publications (1)

Publication Number Publication Date
CN115720186A true CN115720186A (en) 2023-02-28

Family

ID=85254355

Country Status (1)

Country Link
CN (1) CN115720186A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230336403A1 (en) * 2022-03-03 2023-10-19 Arista Networks, Inc. Root cause analysis for operational issues using a rules mining algorithm

Similar Documents

Publication Publication Date Title
KR102483025B1 (en) Operational maintenance systems and methods
JP6706321B2 (en) Method and device for service call information processing
US10338982B2 (en) Hybrid and hierarchical outlier detection system and method for large scale data protection
CN107171819B (en) Network fault diagnosis method and device
JP5878537B2 (en) Evaluation of data flow graph characteristics
US20170161131A1 (en) Identification of storage system elements causing performance degradation
CN107124289B (en) Weblog time alignment method, device and host
CN110825769A (en) Data index abnormity query method and system
US20060041659A1 (en) Method and apparatus for correlating events in a network
US20090106174A1 (en) Methods, systems, and computer program products extracting network behavioral metrics and tracking network behavioral changes
CN110493025A (en) It is a kind of based on the failure root of multilayer digraph because of the method and device of diagnosis
CN109697455B (en) Fault diagnosis method and device for distribution network switch equipment
US20140279797A1 (en) Behavioral rules discovery for intelligent computing environment administration
CN112181758A (en) Fault root cause positioning method based on network topology and real-time alarm
CN112559376A (en) Automatic positioning method and device for database fault and electronic equipment
CN107611962A (en) Network system branch road searching method, system and electronic equipment
CN115720186A (en) Abnormal root cause positioning method and device based on equipment topology and causal relationship
US20230118175A1 (en) Event analysis in an electric power system
CN108664346A (en) The localization method of the node exception of distributed memory system, device and system
Lin et al. Facgraph: Frequent anomaly correlation graph mining for root cause diagnose in micro-service architecture
CN110493176B (en) User suspicious behavior analysis method and system based on unsupervised machine learning
CN117336228A (en) IGP simulation recommendation method, device and medium based on machine learning
Toka et al. Predicting cloud-native application failures based on monitoring data of cloud infrastructure
WO2024088025A1 (en) Automated 5gc network element management method and apparatus based on multi-dimensional data
CN109818808A (en) Method for diagnosing faults, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination