CN108809734B - Network alarm root analysis method, system, storage medium and computer equipment - Google Patents

Network alarm root analysis method, system, storage medium and computer equipment

Info

Publication number
CN108809734B
Authority
CN
China
Prior art keywords
alarm
shortest
matrix
reachable path
instance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810777256.1A
Other languages
Chinese (zh)
Other versions
CN108809734A (en
Inventor
谢远航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Si Tech Information Technology Co Ltd
Original Assignee
Beijing Si Tech Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Si Tech Information Technology Co Ltd filed Critical Beijing Si Tech Information Technology Co Ltd
Priority to CN201810777256.1A priority Critical patent/CN108809734B/en
Publication of CN108809734A publication Critical patent/CN108809734A/en
Application granted granted Critical
Publication of CN108809734B publication Critical patent/CN108809734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a network alarm root cause analysis method and system. The method comprises the following steps: performing off-line analysis on network topology relation data and monitoring index data to obtain an optimal alarm element shortest reachable path matrix; dividing alarm instances into alarm transactions according to a preset time step based on the occurrence time of the alarm instances; performing alarm root cause analysis with the alarm transaction as the unit according to the optimal alarm element shortest reachable path matrix; and determining the alarm instances that are alarm root causes in each alarm transaction. The invention uses the network topology relation data and the monitoring index data to convert the topology data into a topology matrix of alarm elements and analyzes along the alarm element dimension, so that real-time alarm data can be analyzed more intuitively and accurately. Alarms are clustered by occurrence time and the alarm data is divided with the transaction as the unit, which strengthens the relation among alarms in the same time range and reduces the noise that data randomness and topology complexity introduce when the topology is searched for a single alarm at a time.

Description

Network alarm root analysis method, system, storage medium and computer equipment
Technical Field
The invention relates to the technical field of computer application, in particular to a network alarm root cause analysis method, a system, a medium and computer equipment.
Background
As information technology develops and the volume of information grows, network applications keep expanding, and so does the equipment that supports them. Because the services are large and complex, the monitoring support system generates a large number of alarms in daily operation and maintenance, while the number of operation and maintenance personnel is limited. To guarantee service quality and reduce operation and maintenance cost, an intelligent, high-accuracy alarm root cause analysis solution is needed that helps operation and maintenance personnel quickly locate problems and eliminate faults in the service system. Ensuring high-quality, stable operation of applications has therefore become an important problem for intelligent monitoring.
In a network operation and maintenance system, when an application or a piece of equipment fails, the monitoring system issues alarm information and pushes it to the operation and maintenance personnel. When the operation and maintenance scale is large, the relations among software, hardware and network components are complex and the monitoring system tracks many kinds of indexes, so the personnel continuously receive a large amount of alarm information. This is especially pronounced when basic equipment fails, a new application goes online or the system is cut over. Some of the alarms in this flood are often related; for example, an application interface failure may be caused by insufficient disk space on a device. An intelligent, high-accuracy alarm root cause analysis solution can help operation and maintenance personnel locate the root cause of a problem by reducing the importance of low-dimensional alarms and recommending the important (root cause) alarms, so that alarms are handled quickly and efficiently and system faults are recovered quickly.
Existing alarm root cause analysis techniques mainly work in one of two ways. Some use the alarm equipment information to search the topological relation of network resources directly for a parent or child node and thereby judge whether the current alarm is a root cause alarm. Others reduce the dimensionality of the network topology, decompose it into several links and judge whether an alarm is a root cause alarm from the position of the alarm instance in its link. Both are simple judgments based on the network topological relation, and when the alarm information in the same link is dispersed to some degree, the root cause analysis becomes uncertain. In particular, most alarm information in current monitoring systems depends on periodic information acquisition, so strongly related alarms may not reach the monitoring system at the same point in time. In addition, most methods that search the network topology directly rely on only one layer of the topological relation and are easily affected by cross links.
Disclosure of Invention
The invention aims to solve the technical problem of the prior art and provides a network alarm root cause analysis method, a system, a medium and computer equipment.
The technical scheme for solving the technical problems is as follows: a network alarm root cause analysis method comprises the following steps:
S1, performing off-line analysis according to the network topology relation data and the monitoring index data to obtain an optimal alarm element shortest reachable path matrix;
S2, dividing the alarm instances into alarm transactions according to a preset time step based on the occurrence time of the alarm instances, performing alarm root cause analysis with the alarm transaction as the unit according to the optimal alarm element shortest reachable path matrix, and determining the alarm instances that are alarm root causes in the alarm transactions.
The technical scheme for solving the technical problems is as follows: a network alarm root cause analysis system comprises an off-line analysis module and an alarm root cause determination module;
The off-line analysis module is used for performing off-line analysis according to the network topology relation data and the monitoring index data to obtain a preferred alarm element shortest reachable path matrix;
The alarm root cause determining module is used for dividing the alarm instances into alarm transactions according to a preset time step based on the occurrence time of the alarm instances, performing alarm root cause analysis with the alarm transaction as the unit according to the preferred alarm element shortest reachable path matrix, and determining the alarm instances that are alarm root causes in the alarm transactions.
The technical scheme for solving the technical problems is as follows: a computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of the above aspect.
The technical scheme for solving the technical problems is as follows: a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the above aspect when executing the program.
The invention has the following beneficial effects. The method comprises off-line analysis and real-time alarm root cause analysis based on the existing network topology relation data and monitoring index data. The off-line analysis processes the existing network topology relation data and monitoring index data to obtain the optimal alarm element shortest reachable path matrix, which serves as an important basis for the real-time alarm root cause analysis; that is, the analysis is performed along the alarm element dimension, so that real-time alarm data is analyzed more intuitively and accurately. The real-time alarm root cause analysis divides the alarm instances generated by the monitoring system into alarm transactions by time slice, performs online root cause analysis on each alarm transaction in real time and, in combination with the optimal alarm element shortest reachable path matrix obtained by the off-line analysis, recommends the one or more alarms in the transaction that are most likely to be the alarm root cause. Dividing the alarm data with the transaction as the unit strengthens the relation among alarms in the same time range and reduces the noise that data randomness and topology complexity introduce when the topology is searched for a single alarm at a time. The invention has a small runtime overhead: the off-line analysis only needs to be computed at initialization and when the topology data is updated, and the real-time alarm root cause analysis processes alarm transactions in batches, traversing each transaction against the optimal alarm element shortest reachable path matrix with low time complexity. The invention relies on the network topology graph, is simple to configure and easy to use, and can be widely applied in the field of network monitoring and operation and maintenance.
Drawings
FIG. 1 is a schematic flow chart of a network alarm root cause analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a real-time alarm root cause analysis process provided in an embodiment of the present invention;
FIG. 3 is a block diagram of a network alarm root cause analysis system according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic flow chart of a network alarm root cause analysis method according to an embodiment of the present invention. As shown in fig. 1, the network alarm root cause analysis method includes:
S1, performing off-line analysis according to the network topology relation data and the monitoring index data to obtain an optimal alarm element shortest reachable path matrix;
S2, dividing the alarm instances into alarm transactions according to a preset time step based on the occurrence time of the alarm instances, performing alarm root cause analysis with the alarm transaction as the unit according to the optimal alarm element shortest reachable path matrix, and determining the alarm instances that are alarm root causes in the alarm transactions.
In this embodiment, network topology relation data refers to directed association data between software and hardware in the network, obtained either by manual drawing or by analyzing the service calls among modules in the network. Monitoring index data refers to the specific indexes that the monitoring system collects and monitors for software and hardware resources, such as the CPU utilization of a host. Alarm data refers to the log records that software and hardware components generate when a relevant monitoring index value exceeds a preset threshold; each record includes an alarm ID, the alarm occurrence time, the alarm level and the like, and the alarm information is presented to the monitoring operation and maintenance system for further processing by operation and maintenance personnel. In addition, the network topology relation data, monitoring index data and alarm data used in this embodiment are preprocessed real data from the mobile industry.
This embodiment comprises off-line analysis and real-time alarm root cause analysis based on the existing network topology relation data and monitoring index data. The off-line analysis processes the existing network topology relation data and monitoring index data to obtain the optimal alarm element shortest reachable path matrix, which serves as an important basis for the real-time alarm root cause analysis; that is, the analysis is performed along the alarm element dimension, so that real-time alarm data is analyzed more intuitively and accurately. The real-time alarm root cause analysis divides the alarm instances generated by the monitoring system into alarm transactions by time slice, performs online root cause analysis on each alarm transaction in real time and, in combination with the optimal alarm element shortest reachable path matrix obtained by the off-line analysis, recommends the one or more alarms in the transaction that are most likely to be the alarm root cause. Dividing the alarm data with the transaction as the unit strengthens the relation among alarms in the same time range and reduces the noise that data randomness and topology complexity introduce when the topology is searched for a single alarm at a time. The invention has a small runtime overhead: the off-line analysis only needs to be computed at initialization and when the topology data is updated, and the real-time alarm root cause analysis processes alarm transactions in batches, traversing each transaction against the optimal alarm element shortest reachable path matrix with low time complexity. The invention relies on the network topology graph, is simple to configure and easy to use, and can be widely applied in the field of network monitoring and operation and maintenance.
Optionally, the S1 includes:
S11, acquiring an alarm element directed topology relation matrix based on the network topology relation data sample and the monitoring index data sample; the network topology relation data sample refers to a set of directed relation records of network resources stored in a database, the monitoring index data sample refers to a set of monitoring system index records stored in the database, and the alarm element refers to the minimum unit that generates an alarm in the network system;
S12, calculating the shortest reachable path between each pair of alarm elements in the alarm element directed topology relation matrix according to the Dijkstra algorithm, and generating an alarm element shortest reachable path matrix P;
an element a_mn = k in the alarm element shortest reachable path matrix P indicates that the alarm element m may cause the alarm element n to generate an alarm (that is, the alarm element m can reach the alarm element n), and that the length of the shortest path from the alarm element m to the alarm element n is k;
[Figure BDA0001731665120000051: the alarm element shortest reachable path matrix P]
and S13, filtering the shortest reachable path matrix of the alarm element according to a preset shortest path threshold value to obtain an optimal shortest reachable path matrix of the alarm element, wherein the element value of the optimal shortest reachable path matrix is less than or equal to the shortest path threshold value.
In this embodiment, a shortest path threshold M may be set according to the actual application requirements and the scale of the network topology, and the alarm element shortest reachable path matrix is filtered according to this preset threshold: if k > M for an element a_mn = k, then a_mn is set to null. The result is the optimal alarm element shortest reachable path matrix P', whose element values are all less than or equal to the shortest path threshold M.
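Steps S12 and S13 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the alarm element directed topology is already available as a list of (source, target) edges, that every edge has length 1, and that the matrix is stored as a nested dictionary in which a missing entry plays the role of "null". The function names and the sample element names are hypothetical.

```python
import heapq


def shortest_reachable_matrix(edges, elements):
    """Return {m: {n: k}}, where k is the shortest path length from m to n.

    Assumption: every directed edge has length 1 (the source does not
    specify edge weights). Unreachable pairs are simply absent.
    """
    adjacency = {element: [] for element in elements}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])

    matrix = {}
    for source in adjacency:
        # Dijkstra's algorithm from a single source element.
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:
                continue  # stale heap entry, a shorter path was already found
            for nxt in adjacency[node]:
                candidate = d + 1
                if candidate < dist.get(nxt, float("inf")):
                    dist[nxt] = candidate
                    heapq.heappush(heap, (candidate, nxt))
        # The zero-length path from an element to itself is not stored.
        matrix[source] = {n: k for n, k in dist.items() if n != source}
    return matrix


def filter_by_threshold(matrix, max_len):
    """Drop entries whose shortest path length exceeds max_len (the threshold M)."""
    return {
        m: {n: k for n, k in row.items() if k <= max_len}
        for m, row in matrix.items()
    }


# Hypothetical alarm elements: disk -> db -> app -> portal.
edges = [("disk", "db"), ("db", "app"), ("app", "portal")]
P = shortest_reachable_matrix(edges, ["disk", "db", "app", "portal"])
P_preferred = filter_by_threshold(P, max_len=2)
```

With M = 2 in this example, the entry from "disk" to "portal" (length 3) is removed, while "disk" keeps its entries for "db" and "app".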
In the above embodiment, the topological relation of the alarm elements is converted into the alarm element shortest reachable path matrix, which represents the strength of association between alarm elements more intuitively. It also establishes a direct relation between alarm elements that are not directly connected, which reduces the influence of the data acquisition period on such alarm elements during alarm analysis and avoids the misanalysis that would otherwise occur when an alarm element in the middle of an alarm association link reports its alarm late.
Optionally, the S13 includes:
setting to null the elements of the alarm element shortest reachable path matrix whose values are greater than the preset shortest path threshold, so as to obtain the optimal alarm element shortest reachable path matrix.
In the above embodiment, the shortest path threshold is set according to the actual analysis requirements and the shortest reachable paths are pruned accordingly: paths whose shortest reachable length exceeds the threshold are deleted. This greatly reduces the influence of data randomness and reduces misjudgments involving non-parent nodes.
Optionally, S2 specifically includes:
S21, dividing the alarm instances into alarm transactions according to the preset time step based on the occurrence time of the alarm instances;
S22, when the alarm element of an alarm instance in the alarm transaction exists in the optimal alarm element shortest reachable path matrix and the alarm element has no parent node, determining the number of sub-alarms of the alarm instance in the current alarm transaction according to the optimal alarm element shortest reachable path matrix;
S23, when the number of sub-alarms is greater than the preset sub-alarm set element number threshold, marking the alarm instance as an alarm root cause.
In the above embodiment, the analysis compares the degree of overlap between the alarm transaction and the optimal alarm element shortest reachable path matrix obtained by applying the threshold. This reduces the influence of the data acquisition period: when a point in the middle of an alarm topology link cannot be reported together with the other alarms in the link because of acquisition problems, network failures and the like, the root cause of the alarm link can still be identified through the reachable paths.
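The transaction division of S21 can be sketched as follows. The source only states that alarm instances are divided by a preset time step based on their occurrence time; the fixed, non-overlapping windows used here, the `AlarmInstance` record and its field names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmInstance:
    # Hypothetical record layout; the source lists an alarm ID, an
    # occurrence time and an alarm level among the alarm fields.
    alarm_id: str
    element: str
    occurred_at: float  # seconds since some reference time


def split_into_transactions(alarms, step_seconds):
    """Group alarm instances into transactions by fixed time windows.

    Assumption: windows are non-overlapping and aligned to time 0.
    Returns a list of transactions ordered by window start time.
    """
    buckets = defaultdict(list)
    for alarm in alarms:
        window = int(alarm.occurred_at // step_seconds)
        buckets[window].append(alarm)
    return [buckets[window] for window in sorted(buckets)]


alarms = [
    AlarmInstance("a1", "disk", 10.0),
    AlarmInstance("a2", "db", 45.0),
    AlarmInstance("a3", "app", 299.0),
    AlarmInstance("a4", "router", 300.0),
]
transactions = split_into_transactions(alarms, step_seconds=300)
```

Here the first three alarms fall into one transaction and the fourth starts the next one. A sliding or session-based window would be an equally plausible reading of "preset time step".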
Optionally, the S2 further includes:
when the alarm element of the alarm instance in the alarm transaction does not exist in the preferred alarm element shortest reachable path matrix;
or the alarm element of the alarm instance in the alarm transaction exists in the preferred alarm element shortest reachable path matrix, but the alarm element has a parent node;
or the alarm element of the alarm instance in the alarm transaction exists in the preferred alarm element shortest reachable path matrix and the alarm element has no parent node, but the number of sub-alarms of the alarm instance in the current alarm transaction is less than or equal to the preset sub-alarm set element number threshold;
the alarm instance is marked as a non-root-cause alarm.
Optionally, determining the number of sub-alarms of the alarm instance in the current alarm transaction according to the optimal alarm element shortest reachable path matrix includes: determining the sub-alarm set U = {c1, c2, …, cn} of an alarm instance a in an alarm transaction x, where n is the number of sub-alarms and each element cn of the set U satisfies cn ∈ x, and the alarm element Cn of the sub-alarm cn and the alarm element A of the alarm instance a have a path P_A.Cn in the preferred alarm element shortest reachable path matrix.
In the above embodiment, checking whether the alarm element of an alarm instance in the alarm transaction exists in the shortest reachable path matrix determines whether the alarm instance has any known association, so that alarm data without associations is excluded. Checking whether a parent node exists in the alarm element shortest reachable path matrix determines whether the alarm instance is a root node on an alarm association link. Checking whether the number of sub-alarms is greater than the threshold determines whether the root alarm has enough sub-alarms. These three steps confirm, one by one, that the alarm is the parent alarm, that is, the root cause alarm, of several other alarms in the alarm transaction.
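The sub-alarm set U described above can be computed directly from the filtered matrix. The sketch below is illustrative only: it assumes alarm instances are (alarm_id, element) tuples and that the preferred matrix is a nested dictionary {m: {n: k}} in which an entry exists exactly when a path from m to n survives the threshold. All names are hypothetical.

```python
def sub_alarm_set(alarm, transaction, preferred):
    """Return the alarms in the transaction whose element is reachable
    from the element of `alarm` in the preferred matrix.

    alarm and the members of transaction are (alarm_id, element) tuples.
    """
    _, element = alarm
    reachable = preferred.get(element, {})
    return [c for c in transaction if c != alarm and c[1] in reachable]


# Hypothetical preferred matrix P' and alarm transaction x.
preferred = {"disk": {"db": 1, "app": 2}, "db": {"app": 1}, "app": {}}
transaction = [("a1", "disk"), ("a2", "db"), ("a3", "app"), ("a4", "router")]

U = sub_alarm_set(("a1", "disk"), transaction, preferred)
```

For the alarm on "disk", U contains the alarms on "db" and "app"; the alarm on "router" is excluded because no path from "disk" to "router" exists in the matrix.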
Specifically, the real-time alarm root cause analysis step analyzes the alarm instances generated by the alarm elements to determine which alarms are more likely to be the root cause of a group of alarms. The method flow shown in FIG. 2 includes the following steps.
1. Dividing the alarm instances into time slices according to the preset time step based on the occurrence time of the alarm instances, so that the real-time alarms are divided into batches of alarm transactions, each alarm transaction comprising the alarm instances that occur in the same time period;
2. Judging whether the alarm element A that generates an alarm instance a in the real-time alarm transaction x exists in the preferred alarm element shortest reachable path matrix P'; if so, continuing with step 3, otherwise triggering a non-root-cause mark;
3. For the alarm element A that generates the alarm instance a in the real-time alarm transaction x, judging whether there is an alarm instance b in the alarm transaction x whose alarm element is B such that a path P_B.A exists in the preferred alarm element shortest reachable path matrix P'; if not, continuing with step 4, otherwise triggering a non-root-cause mark;
4. For the alarm instance a in the real-time alarm transaction x, calculating the sub-alarm set U = {c1, c2, …, cn} of the alarm instance a, where each element cn of the set U satisfies cn ∈ x, and the alarm element Cn of cn and the alarm element A of a have a path P_A.Cn in the preferred alarm element shortest reachable path matrix P';
5. Setting a sub-alarm set element number threshold N according to the actual application requirements for root causes and the network scale;
6. Judging whether the number of elements in the sub-alarm set U of the alarm instance a is greater than the sub-alarm set element number threshold N; if so, triggering a root cause mark, otherwise triggering a non-root-cause mark.
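Steps 2 to 6 above, applied to a single alarm transaction, can be sketched in Python as below. This is a hedged illustration rather than the patented implementation: alarm instances are assumed to be (alarm_id, element) tuples, the preferred matrix P' is assumed to be a nested dictionary {m: {n: k}}, and an element is treated as "existing in the matrix" if it appears as a row or a column. The function name and sample data are hypothetical.

```python
def mark_root_causes(transaction, preferred, min_children):
    """Return {alarm_id: True/False}, True meaning "root cause mark".

    transaction: list of (alarm_id, element) tuples (one alarm transaction x).
    preferred:   nested dict {m: {n: k}} standing in for the matrix P'.
    min_children: the sub-alarm set element number threshold N.
    """
    # Elements that appear in the matrix as a row or a column (assumption).
    known = set(preferred)
    for row in preferred.values():
        known.update(row)

    elements_in_tx = {element for _, element in transaction}
    marks = {}
    for alarm_id, element in transaction:
        # Step 2: the alarm element must exist in P'.
        if element not in known:
            marks[alarm_id] = False
            continue
        # Step 3: no other element in the transaction may reach this one.
        has_parent = any(
            element in preferred.get(other, {})
            for other in elements_in_tx
            if other != element
        )
        if has_parent:
            marks[alarm_id] = False
            continue
        # Step 4: collect the sub-alarm set U.
        reachable = preferred.get(element, {})
        sub_alarms = [
            cid for cid, c_element in transaction
            if cid != alarm_id and c_element in reachable
        ]
        # Steps 5 and 6: compare |U| with the threshold N.
        marks[alarm_id] = len(sub_alarms) > min_children
    return marks


preferred = {"disk": {"db": 1, "app": 2}, "db": {"app": 1}, "app": {}}
transaction = [("a1", "disk"), ("a2", "db"), ("a3", "app"), ("a4", "router")]
marks = mark_root_causes(transaction, preferred, min_children=1)
```

In this example only the alarm on "disk" receives the root cause mark: "db" and "app" each have a parent in the transaction, and "router" does not appear in the matrix. Note that under this reading two elements that can reach each other would both be marked as non-root-cause; the source does not say how such cycles should be handled.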
Optionally, in S1, the optimal shortest reachable path matrix of the alarm element obtained through offline analysis is stored in a register, and when the network topology relationship data and/or the monitoring index data are updated, the optimal shortest reachable path matrix of the alarm element is obtained again and updated into the register.
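One way to picture this store-and-refresh behavior is a small in-memory holder that recomputes the matrix only when the input data changes. The sketch below is an assumption-laden illustration: the class name, the use of version stamps to detect updates, and the `compute` callback are all hypothetical, and the source does not describe how updates are detected.

```python
class PreferredMatrixStore:
    """Keep the preferred matrix in memory and rebuild it on data updates."""

    def __init__(self, compute):
        # compute: a zero-argument callable that runs the off-line analysis
        # and returns the preferred matrix (hypothetical interface).
        self._compute = compute
        self._version = None
        self._matrix = None

    def get(self, topology_version, index_version):
        """Return the stored matrix, recomputing it if either input changed."""
        version = (topology_version, index_version)
        if self._matrix is None or version != self._version:
            self._matrix = self._compute()
            self._version = version
        return self._matrix


calls = []


def run_offline_analysis():
    # Stand-in for the off-line analysis of S1; records each invocation.
    calls.append(1)
    return {"disk": {"db": 1}}


store = PreferredMatrixStore(run_offline_analysis)
store.get(topology_version=1, index_version=1)  # first call computes
store.get(topology_version=1, index_version=1)  # unchanged data, reuses
store.get(topology_version=2, index_version=1)  # topology updated, recomputes
```

The real-time analysis would then read the matrix through `get` for every transaction, so the cost of the off-line analysis is paid only at initialization and after updates.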
The network alarm root cause analysis method provided by the embodiment of the invention is described in detail above with reference to fig. 1 to 2. The network alarm root cause analysis system provided by the embodiment of the invention is described in detail below with reference to fig. 3. The system comprises an offline analysis module and an alarm root cause determination module.
The off-line analysis module carries out off-line analysis according to the network topology relation data and the monitoring index data to obtain an optimal alarm element shortest reachable path matrix; and the alarm root cause determining module is used for performing transaction division on the alarm instances according to preset time step length based on the alarm instance occurrence time, performing alarm root cause analysis by taking the alarm transactions as a unit according to the optimal alarm element shortest reachable path matrix, and determining the alarm instances which are alarm root causes in the alarm transactions.
The embodiment of the invention also provides a computer-readable storage medium, which comprises instructions, and when the instructions are run on a computer, the computer is enabled to execute the network alarm root cause analysis method in the scheme.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the network alarm root cause analysis method according to the above scheme is implemented.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical division, and an actual implementation may divide them differently: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other media capable of storing program code.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A network alarm root cause analysis method is characterized by comprising the following steps:
S1, performing off-line analysis according to the network topology relation data and the monitoring index data to obtain an optimal alarm element shortest reachable path matrix;
the S1 includes:
S11, acquiring an alarm element directed topology relation matrix based on the network topology relation data sample and the monitoring index data sample; the network topology relation data sample refers to a set of directed relation records of network resources stored in a database, the monitoring index data sample refers to a set of monitoring system index records stored in the database, and the alarm element refers to the minimum unit that generates an alarm in the network system;
S12, calculating the shortest reachable path between each pair of alarm elements in the alarm element directed topology relation matrix according to the Dijkstra algorithm, and generating an alarm element shortest reachable path matrix P;
element a in alarm element shortest reachable path matrix PmnK represents that the alarm element m may cause the alarm element n to generate an alarm, and the length of the shortest path from the alarm element m to the alarm element n is k;
Figure FDA0002826269710000011
S13, filtering the alarm element shortest reachable path matrix according to a preset shortest path threshold to obtain a preferred alarm element shortest reachable path matrix, wherein every element value of the preferred alarm element shortest reachable path matrix is less than or equal to the shortest path threshold;
S2, dividing the alarm instances into alarm transactions according to a preset time step based on the alarm instance occurrence time, performing alarm root cause analysis in units of alarm transactions according to the preferred alarm element shortest reachable path matrix, and determining the alarm instances that are alarm root causes in each alarm transaction.
2. The method according to claim 1, wherein S13 includes:
setting to null every element value of the alarm element shortest reachable path matrix that is greater than the preset shortest path threshold, so as to obtain the preferred alarm element shortest reachable path matrix.
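The off-line steps S11 to S13 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the directed topology is given as a dictionary of direct "may cause" relations, that every edge has length 1, and that the matrix is stored as a dictionary of rows in which a missing entry stands for a null element. The names `shortest_path_matrix`, `filter_matrix` and the sample elements are hypothetical.

```python
import heapq


def shortest_path_matrix(adjacency):
    """Build P, where P[m][n] = k is the shortest path length from m to n.

    adjacency maps each alarm element to the elements it may directly
    affect. Unit edge lengths are an assumption of this sketch.
    """
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes.update(targets)

    matrix = {}
    for source in nodes:
        # Dijkstra's algorithm from a single source.
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:
                continue  # stale heap entry
            for nxt in adjacency.get(node, ()):
                nd = d + 1
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        # Unreachable elements are simply absent from the row.
        matrix[source] = {n: k for n, k in dist.items() if n != source}
    return matrix


def filter_matrix(matrix, threshold):
    """Drop (set to null) every entry longer than the threshold."""
    return {
        m: {n: k for n, k in row.items() if k <= threshold}
        for m, row in matrix.items()
    }


topology = {"router": ["switch"], "switch": ["host"], "host": []}
P = shortest_path_matrix(topology)
preferred = filter_matrix(P, threshold=1)
```

With this sample topology, the row for "router" in P holds both "switch" (length 1) and "host" (length 2), while the preferred matrix with threshold 1 keeps only "switch".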
3. The method according to claim 1, wherein S2 specifically comprises:
S21, dividing the alarm instances into alarm transactions according to the preset time step based on the alarm instance occurrence time;
S22, when the alarm element of an alarm instance in an alarm transaction exists in the preferred alarm element shortest reachable path matrix and the alarm element has no parent node, determining the number of sub-alarms of the alarm instance in the current alarm transaction according to the preferred alarm element shortest reachable path matrix;
S23, when the number of sub-alarms is greater than a preset threshold on the number of elements in the sub-alarm set, marking the alarm instance as an alarm root cause.
4. The method according to claim 3, wherein the S2 further comprises:
when the alarm element of an alarm instance in the alarm transaction does not exist in the preferred alarm element shortest reachable path matrix;
or the alarm element of the alarm instance in the alarm transaction exists in the preferred alarm element shortest reachable path matrix, but the alarm element has a parent node;
or the alarm element of the alarm instance in the alarm transaction exists in the preferred alarm element shortest reachable path matrix and the alarm element has no parent node, but the number of sub-alarms of the alarm instance in the current alarm transaction is less than or equal to the preset threshold on the number of elements in the sub-alarm set;
marking the alarm instance as not being an alarm root cause.
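The on-line logic of claims 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: alarm instances are tuples of (identifier, alarm element, occurrence time in seconds), transactions are fixed windows of one time step, an element "exists in the matrix" when it has a row, and an element "has a parent node" when another alarm element in the same transaction has a path to it in the preferred matrix. The claims do not spell out these details, and all names below are hypothetical.

```python
from collections import defaultdict


def split_into_transactions(alarms, step_seconds):
    """Group alarm instances into fixed time windows (assumed meaning of S21)."""
    buckets = defaultdict(list)
    for alarm in alarms:
        buckets[int(alarm[2] // step_seconds)].append(alarm)
    return [buckets[key] for key in sorted(buckets)]


def find_root_causes(transaction, preferred, min_sub_alarms):
    """Return identifiers of alarm instances marked as root causes (S22, S23)."""
    elements = {element for _, element, _ in transaction}
    roots = []
    for alarm_id, element, _ in transaction:
        if element not in preferred:
            continue  # element unknown to the matrix: not a root cause
        has_parent = any(
            element in preferred.get(other, {})
            for other in elements
            if other != element
        )
        if has_parent:
            continue  # another alarm in the transaction may have caused it
        # Sub-alarms: alarms in this transaction reachable from the element.
        sub_alarms = [
            cid for cid, child, _ in transaction if child in preferred[element]
        ]
        if len(sub_alarms) > min_sub_alarms:
            roots.append(alarm_id)
    return roots


preferred = {
    "router": {"switch": 1, "host": 2},
    "switch": {"host": 1},
    "host": {},
}
alarms = [
    ("a1", "router", 0),
    ("a2", "switch", 5),
    ("a3", "host", 9),
    ("a4", "host", 65),
]
transactions = split_into_transactions(alarms, step_seconds=60)
roots_per_transaction = [
    find_root_causes(tx, preferred, min_sub_alarms=1) for tx in transactions
]
```

In the first window the router alarm has no parent and two sub-alarms, so it is the only instance marked as a root cause; the lone host alarm in the second window has no sub-alarms and is not marked.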
5. The method of claim 3, wherein the determining the number of sub-alarms of the alarm instance in the current alarm transaction according to the preferred alarm element shortest reachable path matrix comprises:
determining a sub-alarm set $U = \{c_1, c_2, \ldots, c_n\}$ of an alarm instance a in an alarm transaction x, where n is the number of sub-alarms, each element $c_n$ of the set U satisfies $c_n \in x$, and the alarm element $C_n$ of the sub-alarm $c_n$ and the alarm element A of the alarm instance a have a path $P_{A,C_n}$ in the preferred alarm element shortest reachable path matrix.
6. The method according to any one of claims 1 to 5, wherein in S1 the preferred alarm element shortest reachable path matrix obtained by the off-line analysis is stored in a register, and when the network topology relation data and/or the monitoring index data are updated, the preferred alarm element shortest reachable path matrix is obtained again and updated into the register.
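The store-and-refresh behaviour of claim 6 can be sketched as a small cache that rebuilds the matrix only when the source data changes. This is a hedged illustration: the class `PreferredMatrixCache`, the `build_matrix` callback and the idea of passing a data version token are assumptions of this sketch, and the claim does not say how an update is detected.

```python
class PreferredMatrixCache:
    """Keep the preferred matrix in memory and rebuild it on data updates."""

    _UNSET = object()  # sentinel: nothing has been built yet

    def __init__(self, build_matrix):
        # build_matrix is a hypothetical callable that runs the off-line
        # analysis and returns the preferred matrix.
        self._build_matrix = build_matrix
        self._version = self._UNSET
        self._matrix = None

    def get(self, data_version):
        """Return the stored matrix, rebuilding it if the data version changed."""
        if data_version != self._version:
            self._matrix = self._build_matrix()
            self._version = data_version
        return self._matrix


build_calls = []


def build_matrix():
    build_calls.append(1)
    return {"router": {"switch": 1}}


cache = PreferredMatrixCache(build_matrix)
first = cache.get("topology-v1")
second = cache.get("topology-v1")   # same version: served from the store
third = cache.get("topology-v2")    # data updated: matrix is rebuilt
```

The on-line root cause analysis then reads the matrix from the store for every transaction, so the shortest path computation runs only when the topology or monitoring data actually changes.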
7. A network alarm root cause analysis system is characterized by comprising an off-line analysis module and an alarm root cause determination module;
the off-line analysis module is used for performing off-line analysis according to the network topology relation data and the monitoring index data to obtain a preferred alarm element shortest reachable path matrix;
the offline analysis module is specifically configured to: acquiring an alarm element directed topology relation matrix based on the network topology relation data sample and the monitoring index data sample; the network topology relation data sample refers to a set of directed relation records of network resources stored in a database, the monitoring index data sample refers to a set of monitoring system index records stored in the database, and the alarm element refers to the minimum unit for generating alarms in the network system;
calculating the shortest reachable path between each pair of alarm elements in the alarm element directed topology relation matrix according to the Dijkstra algorithm, and generating an alarm element shortest reachable path matrix P;
an element $a_{mn} = k$ of the alarm element shortest reachable path matrix P indicates that alarm element m may cause alarm element n to generate an alarm, and that the length of the shortest path from alarm element m to alarm element n is k;
Figure FDA0002826269710000031
filtering the alarm element shortest reachable path matrix according to a preset shortest path threshold to obtain a preferred alarm element shortest reachable path matrix, wherein every element value of the preferred alarm element shortest reachable path matrix is less than or equal to the shortest path threshold;
and the alarm root cause determining module is used for dividing the alarm instances into alarm transactions according to a preset time step based on the alarm instance occurrence time, performing alarm root cause analysis in units of alarm transactions according to the preferred alarm element shortest reachable path matrix, and determining the alarm instances that are alarm root causes in each alarm transaction.
8. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the program.
CN201810777256.1A 2018-07-16 2018-07-16 Network alarm root analysis method, system, storage medium and computer equipment Active CN108809734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810777256.1A CN108809734B (en) 2018-07-16 2018-07-16 Network alarm root analysis method, system, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810777256.1A CN108809734B (en) 2018-07-16 2018-07-16 Network alarm root analysis method, system, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN108809734A CN108809734A (en) 2018-11-13
CN108809734B true CN108809734B (en) 2021-02-19

Family

ID=64076861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810777256.1A Active CN108809734B (en) 2018-07-16 2018-07-16 Network alarm root analysis method, system, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN108809734B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135603B (en) * 2019-05-21 2022-11-11 国网河南省电力公司信息通信公司 Power network alarm space characteristic analysis method based on improved entropy weight method
CN110351136B (en) * 2019-07-04 2022-06-28 创新先进技术有限公司 Fault positioning method and device
CN110661660B (en) * 2019-09-25 2021-09-10 北京宝兰德软件股份有限公司 Alarm information root analysis method and device
CN110855502A (en) * 2019-11-22 2020-02-28 叶晓斌 Fault cause determination method and system based on time-space analysis log
CN113282461B (en) * 2021-05-28 2023-06-23 中国联合网络通信集团有限公司 Alarm identification method and device for transmission network
CN113364623A (en) * 2021-06-04 2021-09-07 上海天旦网络科技发展有限公司 Method and system for reducing alarm misjudgment based on path diagram and network performance index

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101577636A (en) * 2009-06-05 2009-11-11 中兴通讯股份有限公司 Method and device for determining alarm correlation matrix and analyzing alarm correlation
CN104468191A (en) * 2014-11-05 2015-03-25 国家电网公司 Electric power telecommunication fault early warning method and system based on time window and network model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7844027B2 (en) * 2008-02-22 2010-11-30 Morpho Detection, Inc. XRD-based false alarm resolution in megavoltage computed tomography systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An improved algorithm for telecom alarm association rules (一种电信告警关联规则的改进算法); Zhu Yongxuan, Xu Qianfang, Guo Jun; 《现代传输》 (Modern Transmission); 20061231; Vol. 31, No. 2; 71-74 *

Also Published As

Publication number Publication date
CN108809734A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108809734B (en) Network alarm root analysis method, system, storage medium and computer equipment
CN110851321B (en) Service alarm method, equipment and storage medium
CN110493065B (en) Alarm correlation degree analysis method and system for cloud center operation and maintenance
CN111984503A (en) Method and device for identifying abnormal data of monitoring index data
CN113572625B (en) Fault early warning method, early warning device, equipment and computer medium
CN109257383B (en) BGP anomaly detection method and system
KR20150038905A (en) Apparatus and method for preprocessinig data
CN115809183A (en) Method for discovering and disposing information-creating terminal fault based on knowledge graph
CN114465874B (en) Fault prediction method, device, electronic equipment and storage medium
CN112416724A (en) Alarm processing method, system, computer equipment and storage medium
US10733514B1 (en) Methods and apparatus for multi-site time series data analysis
CN113992340B (en) User abnormal behavior identification method, device, equipment and storage medium
CN113986595A (en) Abnormity positioning method and device
CN111970168A (en) Method and device for monitoring full-link service node and storage medium
US11182267B2 (en) Methods and systems to determine baseline event-type distributions of event sources and detect changes in behavior of event sources
CN110889597A (en) Method and device for detecting abnormal business timing sequence indexes
CN106961358A (en) Web application system cluster method for monitoring operation states and its system based on daily record
CN112583847A (en) Method for network security event complex analysis for medium and small enterprises
CN112612844A (en) Data processing method, device, equipment and storage medium
CN116668264A (en) Root cause analysis method, device, equipment and storage medium for alarm clustering
CN115391148A (en) Anomaly detection method and apparatus
CN115033412A (en) Task log merging method and device
CN112905479B (en) Cloud platform-based method and system for determining optimal path of alarm accident root cause
CN113656452A (en) Method and device for detecting abnormal index of call chain, electronic equipment and storage medium
CN113568950A (en) Index detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant