CN116545835A - Fault alarm processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116545835A
Authority
CN
China
Prior art keywords
fault alarm
alarm information
fault
information
network topology
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310671791.XA
Other languages
Chinese (zh)
Inventor
杨光宇
周莉
郈大兰
陈栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310671791.XA priority Critical patent/CN116545835A/en
Publication of CN116545835A publication Critical patent/CN116545835A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a fault alarm processing method and device, an electronic device, and a storage medium, relating to financial technology and other related fields. The method comprises: acquiring first fault alarm information of a first object of an application system; determining, according to the first fault alarm information and the network topology of the application system, whether at least one network topology branch of the first object exists, where each network topology branch includes a second object that is located at a level above the level of the first object and has a network topology relationship with the first object; if so, determining whether second fault alarm information of a second object in the network topology branch already exists; and if it exists, aggregating the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information. The method reduces the likelihood that fault alarms of the application system cause an alarm storm.

Description

Fault alarm processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of financial science and technology and other related fields, and in particular, to a fault alarm processing method, a device, an electronic apparatus, and a storage medium.
Background
While an application system is running, any of the software and hardware that supports it, such as servers, switches, the operating systems deployed on the servers, and application software, may fail. To keep the application system running normally, the software and hardware in the application system are currently monitored for faults and alarms are raised, so that the running condition of the application system is known and faults are handled in time.
However, the current fault monitoring and alarm process is prone to alarm storms: a large amount of fault alarm information appears in a short time, the fault alarm information becomes disordered, and the root cause of the fault cannot be determined quickly and accurately.
Disclosure of Invention
The application provides a fault alarm processing method, a device, electronic equipment and a storage medium, which are used for solving the problem that alarm storm is easy to occur in the fault monitoring alarm process of an application system.
In a first aspect, the present application provides a fault alarm processing method, the method including:
acquiring first fault alarm information of a first object of an application system;
determining, according to the first fault alarm information and the network topology of the application system, whether at least one network topology branch of the first object exists, where each network topology branch includes a second object that is located at a level above the level of the first object and has a network topology relationship with the first object;
if so, determining whether second fault alarm information of the second object in the network topology branch already exists;
and if it exists, aggregating the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information.
Optionally, the aggregating the first fault alert information with the second fault alert information of the second object in the network topology branch includes:
if second objects of at least two levels exist in the network topology branch, determining, from the second objects of the at least two levels, a third device located at the highest level;
and aggregating the existing fault alarm information of the remaining second objects and the first fault alarm information into third fault alarm information of the third device to obtain the aggregated fault alarm information.
Optionally, the aggregating the existing fault alarm information of the remaining second objects and the first fault alarm information into the third fault alarm information of the third device to obtain the aggregated fault alarm information includes:
aggregating into the third fault alarm information those pieces of fault alarm information, among the existing fault alarm information of the remaining second objects and the first fault alarm information, whose generation time differs from the generation time of the third fault alarm information by no more than a preset duration, to obtain the aggregated fault alarm information.
Optionally, after the determining whether at least one network topology branch of the first object exists, the method further comprises:
if not, determining whether historical first fault alarm information of the first object already exists;
if so, aggregating the first fault alarm information with the historical first fault alarm information whose generation time differs from that of the first fault alarm information by no more than a preset duration, to obtain aggregated fault alarm information.
Optionally, the method further comprises:
and determining the root cause of the fault alarm according to the aggregated fault alarm information.
Optionally, the method further comprises:
determining the hierarchy of the first object according to the first fault alarm information and the network topology structure of the application system;
if the level of the first object cannot be determined, outputting prompt information indicating that the fault alarm information is abnormal.
Optionally, the method further comprises:
acquiring configuration information of equipment in the application system;
and according to the configuration information of the equipment, correlating the objects in the application system to obtain the network topology structure.
In a second aspect, the present application provides a fault alert processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire first fault alarm information of a first object of an application system;
a first determining module, configured to determine, according to the first fault alarm information and the network topology of the application system, whether at least one network topology branch of the first object exists, where each network topology branch includes a second object that is located at a level above the level of the first object and has a network topology relationship with the first object;
a second determining module, configured to determine, if such a branch exists, whether second fault alarm information of the second object in the network topology branch already exists;
and an aggregation module, configured to aggregate, if the second fault alarm information exists, the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information.
In a third aspect, the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any one of the first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the fault alarm processing method according to any one of the first aspects.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to any of the first aspects.
With the fault alarm processing method and device, electronic device, and storage medium provided by the present application, the electronic device determines, according to the first fault alarm information and the network topology of the application system, whether there is a second object that is located above the level of the first object and has a network topology relationship with the first object. If so, it determines whether second fault alarm information of a second object in the network topology branch already exists. After determining that the second fault alarm information exists, it aggregates the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information. In this way, the electronic device can use the network topology of the application system to determine what caused the first object to fail and aggregate the first and second fault alarm information accordingly. This reduces the likelihood of an alarm storm and simplifies the fault alarm information, so that operation and maintenance personnel can quickly and accurately identify the cause of the fault in the application system, and their workload is reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of an application system architecture;
FIG. 2 is a flow chart of a first method for processing fault alarms provided in the present application;
FIG. 3 is a schematic diagram of a network topology of an application system provided in the present application;
FIG. 4 is a flow chart of a second method for processing fault alarms provided in the present application;
FIG. 5 is a schematic structural diagram of a third fault alarm processing device provided in the present application;
FIG. 6 is a schematic structural diagram of a fault alarm processing device provided in the present application;
FIG. 7 is a schematic structural diagram of an electronic device 700 provided in the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
It should be noted that the fault alarm processing method, the device, the electronic equipment and the storage medium of the present application may be used in the field of financial technology, and may also be used in any field other than the field of financial technology, and the application fields of the fault alarm processing method, the device, the electronic equipment and the storage medium are not limited in the present application.
Normal operation of existing application systems (e.g., banking information systems) typically relies on many pieces of software and hardware working together. For example, an application system typically includes network devices, such as routers and switches; physical devices, such as computers and servers; and software deployed on the physical devices, such as operating systems and the software modules of the application system (databases, middleware, and so on). If a virtual machine is deployed on a physical device, that physical device hosts at least two operating systems.
However, the software and hardware in an application system fail relatively easily while running. To keep the application system running normally, the software and hardware are therefore generally monitored for operation faults, and alarms are raised. When any software or hardware in the application system runs abnormally, a corresponding fault alarm is triggered to report the running condition of the application system promptly and accurately, and to prompt operation and maintenance personnel to repair the fault as early as possible.
As described above, existing application systems typically include many pieces of software and hardware, and complex association relationships often exist among them. For example, FIG. 1 is a schematic architecture diagram of an application system. As shown in FIG. 1, a network device is typically connected to at least one physical device through a network port to support normal network communication of the physical devices of the application system, and at least one piece of software is typically deployed on a physical device, which provides the hardware basis for that software to run. In addition, software deployed on different physical devices may depend on each other. For example, one piece of software may need to call another piece of software, for instance through the Internet Protocol address (IP address) of that other software, in which case the normal operation of the first depends on the normal operation of the second.
Given the above, when any software or hardware in the application system fails and triggers its fault alarm, the software or hardware associated with it also runs abnormally and triggers fault alarms of its own. A large number of fault alarms triggered in a short time causes an alarm storm: the fault alarm information becomes disordered, operation and maintenance personnel cannot quickly and accurately determine the root cause of the fault, manually checking for the root cause is laborious, and the scope of the fault is hard to determine. Fault handling becomes less efficient, which affects the normal operation of the application system.
The inventor considers that, when the application system fails, if the software and hardware that have actually failed can be automatically distinguished from those that have not, and the fault alarm information is aggregated accordingly, faults of the application system can still be reported while alarm storms are avoided.
In view of this, the present application provides a fault alarm processing method, which can implement aggregation of fault alarm information according to the fault alarm information and the association relation between software and hardware in the application system, so as to avoid occurrence of alarm storm and confusion of the fault alarm information.
The execution subject of the present application is an electronic device, such as a cell phone, a computer, a server, etc. The electronic device may be a physical device in the application system, or may be an electronic device independent of the application system, which is not limited in this application.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flow chart of a first fault alarm processing method provided in the present application, as shown in fig. 2, where the method includes:
S101, acquiring first fault alarm information of a first object of an application system.
The architecture and service type of the application system are not limited, and may be, for example, a bank information system.
The present application does not limit the specific types of the first object and of the second object mentioned below; each may be, for example, a network device, a physical device, or software deployed on a physical device.
The first fault alarm information of the first object may include, for example, an identifier of the first object. When the first object is a network device or a physical device, the identifier may be, for example, the serial number (SN) of the first object or a preset number of the first object. If the first object is software deployed on a physical device, the identifier may be, for example, a preset number of the first object; if the first object is an operating system deployed on a physical device, the identifier may be, for example, the IP address of the first object. The present application does not limit the specific value of the preset number, nor whether the first fault alarm information includes content other than the identifier of the first object.
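As an illustration of the identifier choices described above, a fault alarm record could be modeled roughly as follows. This is a minimal sketch only: the names `ObjectType`, `FaultAlarm`, and `identifier_kinds`, and the choice of fields, are assumptions made for illustration and are not defined by the application.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ObjectType(Enum):
    """Hypothetical object categories, following the examples in the text."""
    NETWORK_DEVICE = "network device"
    PHYSICAL_DEVICE = "physical device"
    OPERATING_SYSTEM = "operating system"
    SOFTWARE = "software"


@dataclass(frozen=True)
class FaultAlarm:
    object_id: str          # SN, preset number, or IP address of the object
    object_type: ObjectType
    generated_at: datetime  # generation time of the alarm


def identifier_kinds(object_type: ObjectType) -> list[str]:
    """Return the identifier kinds the description gives as examples."""
    if object_type in (ObjectType.NETWORK_DEVICE, ObjectType.PHYSICAL_DEVICE):
        return ["serial number", "preset number"]
    if object_type is ObjectType.OPERATING_SYSTEM:
        return ["IP address"]
    return ["preset number"]
```

The record is frozen so that an alarm, once received, is not modified by later aggregation steps; that is a design choice of this sketch rather than something the text requires.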
In one possible implementation manner, the electronic device may monitor, for example, a running condition of a first object in the application system, and if the first object runs abnormally, the electronic device obtains first fault alarm information of the first object in the application system. The electronic device may acquire the first fault warning information sent by the first object, or may generate the first fault warning information based on monitoring the first object, which is not limited in this application. The present application is not limited to a specific implementation manner of the electronic device to monitor the running condition of the first object in the application system, and may refer to the prior art specifically.
In another possible implementation, the electronic device may obtain the first fault alert information sent by another electronic device (e.g., a server, a computer, etc.). For example, other electronic devices may monitor the failure alarm condition of the application system, and after acquiring the first failure alarm information sent by the first object, the other electronic devices send the first failure alarm information to the electronic device.
S102, determining whether at least one network topology branch of the first object exists according to the first fault alarm information and the network topology structure of the application system.
The present application does not limit the levels included in the network topology of the application system or the association relationships among the objects of each level. FIG. 3 is a schematic diagram of a network topology of an application system provided in the present application. As shown in FIG. 3, the network topology of the application system may include, for example, a network layer, a device layer, and a software layer. The network layer includes network devices, the device layer includes physical devices, and the software layer includes the software deployed on each physical device. Any network device in the network layer is connected to at least one physical device in the device layer, and at least one piece of software is deployed on any physical device in the device layer. Optionally, an association relationship may also exist between objects within the same level. For example, with continued reference to FIG. 3, software 7 needs to call software 4 to run normally, so an association relationship exists between software 7 and software 4.
Each of the network topology branches includes a second object that is located at a level above the level of the first object and has a network topology relationship with the first object.
Having a network topology relationship means having an association relationship, which may be direct or indirect. By way of example, with continued reference to FIG. 3, network device A in the network layer has a network topology relationship with physical device A and physical device B in the device layer; physical device A has a network topology relationship with software 1 and software 2; and network device A has a network topology relationship with software 1 and software 2 through physical device A, and with software 3, software 4, and software 5 through physical device B.
It should be appreciated that there are superior and inferior relationships between the levels of the network topology of the application system. Wherein the normal operation of the next level needs to depend on the previous level. Illustratively, with continued reference to fig. 3, the application system has a network layer, a device layer, and a software layer in order from top to bottom. Wherein, the normal operation of the physical equipment in the equipment layer is required to depend on the normal operation of the network equipment in the network layer; and the normal operation of the software layer depends on the normal operation of the physical equipment of the equipment layer where the software layer is deployed.
In this step, the electronic device may determine the failed first object according to the first fault alarm information, and may then determine, according to the network topology of the application system and the identified first object, whether at least one network topology branch of the first object exists.
If so, the first object may have triggered the first fault alarm information because of a fault of a second object at a level above the level of the first object, and step S103 is then performed.
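The branch lookup in step S102 can be sketched as an upward walk through the topology. The following is a minimal sketch, assuming the topology is stored as a mapping from each object to the objects one level above it; the mapping `PARENTS`, the function `topology_branches`, and the object names (taken from the FIG. 3 example) are illustrative assumptions, and the sketch assumes the topology contains no cycles.

```python
# Hypothetical topology based on the objects named for FIG. 3.
# Each object maps to the objects one level above it that it depends on.
PARENTS = {
    "network device A": [],
    "physical device A": ["network device A"],
    "physical device B": ["network device A"],
    "software 1": ["physical device A"],
    "software 2": ["physical device A"],
    "software 3": ["physical device B"],
    "software 4": ["physical device B"],
    "software 5": ["physical device B"],
}


def topology_branches(first_object, parents=PARENTS):
    """Return every upward path from first_object, excluding first_object.

    Each returned list holds the second objects of one branch, ordered from
    the level directly above first_object up to the highest level.
    """
    if first_object not in parents:
        # Corresponds to the case where the level cannot be determined.
        raise KeyError(f"unknown object: {first_object}")
    branches = []

    def walk(obj, path):
        uppers = parents.get(obj, [])
        if not uppers:
            if path:  # reached the top of a non-empty branch
                branches.append(path)
            return
        for upper in uppers:
            walk(upper, path + [upper])

    walk(first_object, [])
    return branches
```

For example, `topology_branches("software 4")` yields one branch containing physical device B and network device A, while an object at the top level yields no branch at all.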
S103, determining whether second fault alarm information of a second object in the network topology branch exists.
The content included in the second fault warning information may refer to the first fault warning information, which is not described herein.
For example, the electronic device may record all fault alert information, i.e. if the second object fails, the electronic device will acquire and record the second fault alert information of the second object. In this step, the electronic device determines whether second fault alert information of the second object in the network topology branch already exists, and further determines whether the first object triggers the first fault alert information due to a fault of the second object.
If it exists, the first object triggered the first fault alarm information because of the fault of the second object; that is, the first object has not actually failed, and step S104 is then executed.
S104, aggregating the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information.
Aggregating the first fault alarm information with the second fault alarm information of the second object in the network topology branch may mean, for example, that the electronic device retains only the second fault alarm information, and not the first fault alarm information, in the aggregated fault alarm information.
Alternatively, the electronic device adds an identifier to each of the first fault alarm information and the second fault alarm information, and obtains the aggregated fault alarm information from the first and second fault alarm information with the identifiers added. The identifier may, for example, indicate whether the object is the root cause of the fault. For example, if a second object is located at the highest level in the network topology branch, an identifier indicating that it is the root cause is added to the second fault alarm information of that second object, and identifiers indicating that they are not the root cause are added to the fault alarm information of the remaining second objects and of the first object. The present application does not limit the specific form of these identifiers.
The manner in which the electronic device aggregates the first fault alarm information with the second fault alarm information of the second object in the network topology branch is related to the number of levels of second objects in the branch. For example, if second objects of only one level exist in the network topology branch, the electronic device aggregates the first fault alarm information into the second fault alarm information of that second object to obtain the aggregated fault alarm information. If second objects of multiple levels exist in the network topology branch, the electronic device may, for example, aggregate the first fault alarm information into the fault alarm information of the second object at the highest level to obtain the aggregated fault alarm information.
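The identifier-based variant of steps S103 and S104 could look roughly like the sketch below. The function name `aggregate_with_flags`, the dictionary layout of an alarm, and the `root_cause` key are hypothetical. The sketch also assumes that, when the highest-level second object has no recorded alarm, the highest-level second object that does have one is flagged as the root cause; the text does not specify this case.

```python
def aggregate_with_flags(first_alarm, branch, recorded):
    """Aggregate alarms along one branch and flag the root cause.

    first_alarm: alarm dict of the first object.
    branch: second objects ordered from the nearest level to the highest.
    recorded: dict mapping object name -> alarm dict already recorded.
    Returns a list of flagged alarm dicts, or None when no second alarm
    exists (the S103 check fails, so there is nothing to aggregate).
    """
    second_alarms = [recorded[obj] for obj in branch if obj in recorded]
    if not second_alarms:
        return None
    aggregated = []
    for alarm in second_alarms:
        # The last entry belongs to the highest level that raised an alarm.
        is_root = alarm is second_alarms[-1]
        aggregated.append({**alarm, "root_cause": is_root})
    # The first object is treated as a symptom, not the root cause.
    aggregated.append({**first_alarm, "root_cause": False})
    return aggregated
```

Copying each alarm with `{**alarm, ...}` leaves the recorded alarms untouched, so the same record can take part in the aggregation of several lower-level alarms.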
In this embodiment, the electronic device determines, according to the first fault alarm information and the network topology of the application system, whether there is a second object that is located above the level of the first object and has a network topology relationship with the first object. If so, it determines whether second fault alarm information of a second object in the network topology branch already exists. After determining that the second fault alarm information exists, it aggregates the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information. In this way, the electronic device can use the network topology of the application system to determine what caused the first object to fail and aggregate the first and second fault alarm information accordingly. This reduces the likelihood of an alarm storm and simplifies the fault alarm information, so that operation and maintenance personnel can quickly and accurately identify the cause of the fault in the application system, and their workload is reduced.
Optionally, in the foregoing embodiment, if it is determined that no network topology branch of the first object exists, this indicates that the first object did not trigger the first fault alarm information because of a fault of a second object at a level above the level of the first object. In this case, in one possible implementation, the electronic device may generate the aggregated fault alarm information directly from the first fault alarm information alone. For example, the aggregated fault alarm information may include only the first fault alarm information.
In another possible implementation, the electronic device may determine whether historical first fault alarm information of the first object already exists. If it exists, first fault alarm information of the first object has been recorded before, and how the electronic device proceeds depends on the actual application scenario.
For example, if the electronic device records only the historically obtained first fault alarm information of the first object whose fault has not yet been eliminated, the electronic device may directly aggregate the historical first fault alarm information with the first fault alarm information to obtain the aggregated fault alarm information.
If the electronic device records all historically obtained first fault alarm information of the first object regardless of whether the fault has been eliminated, the electronic device cannot tell whether the fault that triggered the historical first fault alarm information is still present. In that case, the electronic device may aggregate the first fault alarm information with the historical first fault alarm information whose generation time differs from that of the first fault alarm information by no more than a preset duration, to obtain the aggregated fault alarm information. The aggregation may, for example, retain any one piece of the first fault alarm information in the aggregated fault alarm information. The present application does not limit the specific value of the preset duration, which can be determined by a person skilled in the art according to the actual situation.
For example, the generation time of the fault alarm information may be further included in the fault alarm information. The electronic device can determine whether the first fault alarm information and the historical first fault alarm information need to be aggregated according to the generation time, and the aggregated fault alarm information is obtained.
Because the duration of a fault is generally short, the electronic device can distinguish the first fault alarm information of the first object which has been subjected to fault clearing and is generated historically from the first fault alarm information of the first object which has not been subjected to fault clearing. Therefore, the possibility of error aggregation of the fault alarm information can be reduced, and the aggregation accuracy is improved.
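The generation-time check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Alert` record, its field names, the use of seconds as the time unit, and the 300-second default window are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alert:
    object_id: str       # identifier of the object that raised the alert (assumed field)
    generated_at: float  # generation time in seconds (assumed unit)


def aggregate_with_history(new: Alert, history: List[Alert],
                           window_s: float = 300.0) -> List[Alert]:
    """Return the historical alerts that should be aggregated with `new`.

    A historical alert qualifies only if it belongs to the same object and
    its generation time differs from that of `new` by at most `window_s`
    seconds (the "preset duration"); older alerts are assumed to stem from
    a fault that has already been eliminated.
    """
    return [
        old for old in history
        if old.object_id == new.object_id
        and abs(new.generated_at - old.generated_at) <= window_s
    ]
```

A caller could then keep any one of the matching alerts (for instance `new` itself) as the single entry of the aggregated fault alarm information, which corresponds to the "retain any one piece" option mentioned in the text.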
The following explains how the electronic device aggregates the first fault alarm information with the second fault alarm information of the second object in the network topology branch when the branch contains second objects at at least two hierarchical levels, that is, how step S104 in the above embodiment is performed. Fig. 4 is a flow chart of a second fault alarm processing method provided in the present application. As shown in Fig. 4, step S104 may include:
S201, determining a third object located at the highest hierarchy from the second objects of the at least two hierarchies.
In this step, if the network topology branch contains second objects at at least two levels, the electronic device first determines, from those second objects, a third object located at the highest level, so that the alarm information can be aggregated. For example, the electronic device may determine the third object according to the network topology structure of the application system.
S202, aggregating the existing fault alarm information of the remaining second objects and the first fault alarm information into third fault alarm information of the third object to obtain aggregated fault alarm information.
Because the third object, the second object and the first object are topologically connected, and the third object is located at the highest hierarchy in the network topology branch, the first fault alarm information triggered by the first object and the second fault alarm information triggered by the second object are both attributed to the fault of the third object. Therefore, in this step, the electronic device aggregates the existing fault alarm information of the remaining second objects and the first fault alarm information into the third fault alarm information of the third object to obtain the aggregated fault alarm information. For example, the aggregated fault alarm information may include only the third fault alarm information, and not the first fault alarm information or the second fault alarm information.
How the electronic device aggregates the existing fault alarm information of the remaining second objects and the first fault alarm information into the third fault alarm information of the third object depends on the actual application scenario. If the electronic device retains only the fault alarm information of objects whose faults have not been eliminated, the fault of the third object indicated by the currently recorded third fault alarm information has not been eliminated; that is, the first object and the remaining second objects triggered their fault alarm information because of the fault of the third object. The electronic device may then aggregate the existing fault alarm information of the remaining second objects and the first fault alarm information into the third fault alarm information without further distinction, to obtain the aggregated fault alarm information.
For example, with continued reference to Fig. 3, take the first object to be software 1. The network topology branch of the application system then contains second objects at two levels, the device layer and the network layer, namely physical device A and network device A. The electronic device may determine, according to the network topology structure of the application system, that network device A is the third object located at the highest hierarchy among the second objects of the two levels. The electronic device then aggregates the first fault alarm information of software 1 and the second fault alarm information of physical device A into the third fault alarm information of network device A to obtain the aggregated fault alarm information, which includes only the third fault alarm information of network device A and not the first fault alarm information or the second fault alarm information.
If the electronic device records the historically obtained fault alarm information of all objects, regardless of whether the faults have been eliminated, it cannot tell from the record alone whether the third fault alarm information of the third object was triggered by the current fault. In that case, the electronic device may aggregate into the third fault alarm information only those pieces of fault alarm information, among the existing fault alarm information of the remaining second objects and the first fault alarm information, whose generation time differs from that of the third fault alarm information by no more than the preset duration, to obtain the aggregated fault alarm information. The technical effect is similar to that of the above embodiment, in which the generation time is used to judge whether the first fault alarm information was triggered by the current fault, and is not repeated here.
With this implementation, the electronic device can distinguish a current fault alarm from earlier fault alarms in the history record, and avoids aggregating the first fault alarm information and the second fault alarm information into historical third fault alarm information that was not caused by the current fault. For example, in scenarios where the electronic device records and retains all historically generated fault alarm information, this approach reduces the likelihood of incorrect aggregation and improves aggregation accuracy.
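Steps S201 and S202, together with the optional generation-time filter, can be sketched in Python as follows. The function name, the dictionary layout, and the object identifiers are hypothetical. The sketch also assumes that a branch is given as a list ordered from the lowest level to the highest, and that the target of the aggregation is the highest-level object in the branch that already has fault alarm information.

```python
from typing import Dict, List, Optional


def aggregate_into_highest(branch: List[str],
                           alert_times: Dict[str, float],
                           first_object: str,
                           window_s: Optional[float] = None) -> Optional[dict]:
    """Fold lower-level alerts into the alert of the highest alerting object.

    branch      : second objects above `first_object`, lowest level first.
    alert_times : generation time of the existing alert of each alerting
                  object; must contain `first_object`.
    window_s    : optional preset duration; None means "no time filter",
                  i.e. only alerts of uneliminated faults are recorded.
    """
    existing = [obj for obj in branch if obj in alert_times]
    if not existing:
        return None  # nothing above the first object is alerting

    root = existing[-1]  # highest-level object with an existing alert
    root_time = alert_times[root]
    candidates = [first_object] + existing[:-1]
    absorbed = [
        obj for obj in candidates
        if window_s is None
        or abs(alert_times[obj] - root_time) <= window_s
    ]
    # Only the alert of `root` is reported; `absorbed` alerts are merged into it.
    return {"reported": root, "absorbed": absorbed}
```

With the Fig. 3 example, the alerts of software 1 and physical device A would be absorbed into the alert of network device A, and only the latter would be reported.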
Optionally, after acquiring the aggregated fault alarm information, the electronic device may further determine a root cause of the fault alarm according to the aggregated fault alarm information.
How the electronic device determines the root cause of the fault alarm from the aggregated fault alarm information depends on the content included in the aggregated fault alarm information. For example, if the aggregated fault alarm information includes only the fault alarm information of the object at the highest level in the network topology branch, the electronic device determines that highest-level object to be the root cause of the fault alarm.
If the aggregated fault alarm information includes the fault alarm information of the first object and of the objects in all network topology branches of the first object, with the object at the highest level in the network topology branches marked, the electronic device can determine from the mark that the highest-level object is the root cause of the fault alarm.
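The two cases above can be illustrated with a short Python sketch. The entry layout (a dictionary with an `object_id` key and an optional `highest_level` mark) is an assumption made for the example and does not come from the application.

```python
from typing import List, Optional


def root_cause(aggregated: List[dict]) -> Optional[str]:
    """Return the identifier of the object regarded as the root cause.

    Case 1: the aggregated information holds a single entry, which is the
            alert of the highest-level object.
    Case 2: it holds several entries and the highest-level one carries a
            mark (hypothetical key "highest_level").
    """
    if len(aggregated) == 1:
        return aggregated[0]["object_id"]
    for entry in aggregated:
        if entry.get("highest_level"):
            return entry["object_id"]
    return None  # no entry, or no mark found
```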
Optionally, the electronic device may further determine, according to the first fault alert information and a network topology structure of the application system, a hierarchy in which the first object is located; if the hierarchy of the first object cannot be determined, outputting prompt information of abnormality of the fault alarm information.
The present application does not limit how the electronic device outputs the prompt information indicating that the fault alarm information is abnormal. For example, the electronic device may output the prompt information by voice and/or as text.
If the electronic device cannot determine the hierarchy level of the first object, the first fault alarm information or the network topology structure of the application system is abnormal. For example, the network topology structure may be missing the first object corresponding to the first fault alarm information, or the identifier of the first object in the first fault alarm information may be inconsistent with its identifier in the network topology structure. In this way, the electronic device can promptly report an abnormality of the fault alarm information or of the network topology structure, so that operation and maintenance personnel learn of it and take corresponding measures in time.
With respect to the above-mentioned acquisition of the network topology, in one possible implementation manner, the electronic device may acquire the network topology input by the user. For example, the electronic device has a user operation interface, and the electronic device obtains a network topology structure input by a user through the user operation interface.
In another possible implementation manner, the electronic device may obtain configuration information of devices in the application system; and then, according to the configuration information of the equipment, correlating the objects in the application system to obtain the network topology structure.
The devices in the application system may be, for example, physical devices and/or network devices in the application system. The configuration information of a device characterizes the network topological relations of objects, and its specific content depends on the type of the device. For example, when the device is a physical device, the configuration information may include one or more of the following: an identification of the physical device; an identification of software deployed on the physical device, such as an identification of the operating system deployed on the device and an identification of a software module of the application system deployed under the operating system; an identification of the network device to which the physical device is connected; and an identification of software invoked by the software deployed on the physical device. When the device is a network device, the configuration information may include, for example, an identification of the physical device to which the network device is connected.
The electronic device may obtain configuration information of all physical devices in the application system, and then associate objects in the application system according to the configuration information of all physical devices, so as to obtain a network topology structure in the application system.
Or the electronic device can acquire the configuration information of all the physical devices in the application system and the configuration information of all the network devices, and then associate the objects in the application system according to the configuration information to obtain the network topology structure.
With this method, the electronic device can automatically generate the network topology structure according to the configuration information of the devices in the application system, without manual input, which improves the efficiency of obtaining the network topology structure. In addition, because the network topology structure is generated from the configuration information of the devices actually included in the application system, its accuracy can be improved.
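As an illustration of deriving the topology from device configuration, the following Python sketch builds a map from each object to the objects one level above it, using only the configuration of physical devices. The configuration keys (`id`, `software`, `network_devices`) and the identifiers are hypothetical; the application does not prescribe a data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_topology(physical_configs: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each object to the set of objects directly above it.

    Each configuration entry is assumed to hold the identifier of a
    physical device, the identifiers of the software deployed on it, and
    the identifiers of the network devices it is connected to.
    """
    parents: Dict[str, Set[str]] = defaultdict(set)
    for cfg in physical_configs:
        device = cfg["id"]
        for software in cfg.get("software", []):
            parents[software].add(device)   # software layer -> device layer
        for net in cfg.get("network_devices", []):
            parents[device].add(net)        # device layer -> network layer
    return dict(parents)
```

Objects that never appear as a key, such as network devices here, have no network topology branch above them.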
Optionally, if the first object is software in a software layer in the application system, the electronic device may execute the following steps after obtaining the first fault alert information of the first object of the application system:
S301, determining whether the first object has an associated node in the software layer.
The associated node here may be any software in the software layer other than the first object on whose normal operation the normal operation of the first object depends.
If so, the operation of the first object is affected by the associated node, and step S302 is performed.
If not, the operation of the first object is not affected by the association node, and step S102 in the above embodiment is performed.
S302, determining whether fourth fault alarm information of the associated node exists.
If so, the first fault alarm information triggered by the first object may have been caused by a fault of its associated node, so the root cause of the fault needs to be determined based on the associated node; the associated node is taken as the first object, and step S102 in the above embodiment is performed.
With this implementation, when the first object is software in the software layer of the application system, the electronic device can additionally take the association between pieces of software into account when aggregating fault alarm information during fault monitoring and alarming of the application system, which broadens the application scenarios of the fault alarm processing method provided by the present application.
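Steps S301 and S302 amount to a substitution of the first object, which the following Python sketch illustrates. The function name and the data structures are assumptions made for the example: `depends_on` maps a piece of software to the software it depends on, and `alerting` is the set of objects that already have fault alarm information.

```python
from typing import Dict, List, Set


def resolve_first_object(first_object: str,
                         depends_on: Dict[str, List[str]],
                         alerting: Set[str]) -> str:
    """Return the object whose network topology branch should be examined.

    S301: look up the associated nodes of `first_object`.
    S302: if an associated node already has fault alarm information, it
          replaces `first_object`; otherwise `first_object` is kept.
    """
    for node in depends_on.get(first_object, []):
        if node in alerting:
            return node
    return first_object
```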
The fault alarm processing method provided by the present application is described below, taking an application system with the network topology structure shown in Fig. 3 as an example. Fig. 5 is a flow chart of a third fault alarm processing method provided in the present application. As shown in Fig. 5, the fault alarm processing method may include the following steps:
S401, acquiring first fault alarm information of a first object of an application system.
S402, determining a hierarchy level of the first object according to the first fault alarm information and the network topology structure of the application system so as to determine whether at least one network topology branch of the first object exists.
If the level of the first object is the network layer, the first object is a network device located at the highest level of the network topology and has no network topology branch, and step S403 may be executed.
If the level of the first object is the device layer, the first object is a physical device located at the second level of the network topology and has at least one network topology branch, and step S406 may be executed.
If the level of the first object is the software layer, the first object is software located at the third level of the network topology and has at least one network topology branch, and step S408 may be executed.
If the level of the first object cannot be determined, step S415 is performed.
S403, determining whether historical first fault warning information of the first object exists.
If so, step S404 is performed.
If not, step S405 is performed.
S404, aggregating the historical first fault alarm information with the time difference of the generation time of the first fault alarm information being smaller than or equal to the preset time length with the first fault alarm information to obtain aggregated fault alarm information.
S405, ending the fault alarm information processing.
S406, determining whether second fault alarm information of a second object in the network topology branch exists.
If so, step S407 is performed.
If not, step S403 is performed.
S407, aggregating the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information.
S408, determining whether the first object has an associated node.
If so, step S409 is performed.
If not, step S411 is performed.
S409, determining whether fourth fault warning information of the associated node exists.
If so, the associated node is used as the first object, and step S410 is executed.
If not, step S411 is performed.
S410, acquiring a network topology branch of the first object according to the fourth fault alarm information and the network topology structure of the application system.
S411, according to the network topology branch of the first object, determining whether third fault alarm information of a third object located in the network layer in the network topology branch exists.
If so, step S412 is performed.
If not, step S413 is performed.
S412, aggregating the existing fault alarm information of the other second objects and the first fault alarm information into third fault alarm information of a third object to obtain aggregated fault alarm information.
S413, determining whether there is second fault alert information of the second object at the device layer remaining in the network topology branch.
If so, step S414 is performed.
If not, step S403 is performed.
S414, aggregating the first fault alarm information into second fault alarm information of the second object to obtain aggregated fault alarm information.
S415, outputting prompt information of abnormality of the fault alarm information.
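The branching of steps S401 to S415 can be condensed into a Python sketch for the three-layer topology of Fig. 3. Everything below is an illustrative assumption rather than the application's implementation: each object has at most one parent and one associated node, `alerting` is the set of objects with existing fault alarm information, `recent_history` is the set of objects that have historical alarm information within the preset duration, and the function only returns which action would be taken.

```python
from typing import Dict, Optional, Set, Tuple

NETWORK, DEVICE, SOFTWARE = "network", "device", "software"


def process_alert(obj: str,
                  levels: Dict[str, str],
                  parent: Dict[str, str],
                  depends_on: Dict[str, str],
                  alerting: Set[str],
                  recent_history: Set[str]) -> Tuple[str, Optional[str]]:
    """Return (action, target) for a new alert raised by `obj`."""

    def history_step(o: str) -> Tuple[str, Optional[str]]:
        if o in recent_history:                      # S403
            return ("aggregate_with_history", o)     # S404
        return ("end", None)                         # S405

    level = levels.get(obj)                          # S402
    if level not in (NETWORK, DEVICE, SOFTWARE):
        return ("anomaly_prompt", None)              # S415
    if level == NETWORK:
        return history_step(obj)
    if level == DEVICE:
        net = parent.get(obj)
        if net in alerting:                          # S406
            return ("aggregate_into", net)           # S407
        return history_step(obj)

    # Software layer
    node = depends_on.get(obj)                       # S408
    if node is not None and node in alerting:        # S409
        obj = node                                   # S410
    device = parent.get(obj)
    net = parent.get(device) if device is not None else None
    if net in alerting:                              # S411
        return ("aggregate_into", net)               # S412
    if device in alerting:                           # S413
        return ("aggregate_into", device)            # S414
    return history_step(obj)
```

Returning an action label instead of performing the aggregation keeps the sketch short; a fuller version could call time-window and aggregation helpers at the marked steps.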
In this embodiment, taking an application system with the network topology structure shown in Fig. 3 as an example, the electronic device determines the hierarchy level of the first object according to the first fault alarm information of the first object and the network topology structure, and thereby determines the network topology branch of the first object. The electronic device then applies the method corresponding to the level of the first object to judge whether the fault alarm information of the first object needs to be aggregated, and aggregates the fault alarm information once aggregation is determined to be needed. In this way, the electronic device can aggregate fault alarm information when the application system raises fault alarms, which reduces the likelihood of an alarm storm and of confusion among fault alarm information. The method can also reduce labor cost and improve the efficiency of determining the root cause of a fault.
Fig. 6 is a schematic structural diagram of a fault alarm processing device provided in the present application. As shown in Fig. 6, the fault alarm processing device includes: an acquisition module 11, a first determining module 12, a second determining module 13 and an aggregation module 14. Optionally, the device may further include a third determining module 15.
The acquisition module 11 is configured to obtain first fault alarm information of a first object of an application system;
the first determining module 12 is configured to determine whether at least one network topology branch of the first object exists according to the first fault alarm information and a network topology structure of the application system; each of the network topology branches includes: a second object that is located at a level above the level where the first object is located and has a network topological relation with the first object;
the second determining module 13 is configured to, if so, determine whether second fault alarm information of the second object in the network topology branch already exists;
and the aggregation module 14 is configured to, if so, aggregate the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information.
Optionally, the aggregation module 14 is specifically configured to determine, if there are at least two levels of second objects in the network topology branch, a third object located at the highest level from the at least two levels of second objects; and aggregate the existing fault alarm information of the remaining second objects and the first fault alarm information into third fault alarm information of the third object to obtain aggregated fault alarm information.
For example, the aggregation module 14 is specifically configured to aggregate into the third fault alarm information those pieces of fault alarm information, among the existing fault alarm information of the remaining second objects and the first fault alarm information, whose generation time differs from that of the third fault alarm information by no more than a preset duration, so as to obtain the aggregated fault alarm information.
Optionally, after determining whether at least one network topology branch of the first object exists, the first determining module 12 is configured to determine whether historical first fault alert information of the first object exists if the at least one network topology branch of the first object does not exist; if so, aggregating the historical first fault alarm information with the time difference less than or equal to the preset time length with the first fault alarm information to obtain aggregated fault alarm information.
Optionally, the third determining module 15 is configured to determine a root cause of the fault alarm according to the aggregated fault alarm information.
Optionally, the first determining module 12 is further configured to determine, according to the first fault alert information and the network topology structure of the application system, a hierarchy in which the first object is located; if the hierarchy of the first object cannot be determined, outputting prompt information of abnormality of the fault alarm information.
Optionally, the acquisition module 11 is further configured to obtain configuration information of devices in the application system, and to associate the objects in the application system according to the configuration information of the devices to obtain the network topology structure.
The fault alarm processing device provided in the embodiment of the present application may execute the fault alarm processing method in the above method embodiments; its implementation principle and technical effects are similar and are not described here again. The division of the modules shown in Fig. 6 is merely illustrative, and the present application does not limit the division or naming of the modules.
Fig. 7 is a schematic structural diagram of an electronic device 700 provided in the present application. As shown in fig. 7, the electronic device 700 may include: at least one processor 701, a memory 702.
A memory 702 for storing a program. In particular, the program may include program code including computer-operating instructions.
The memory 702 may comprise high-speed RAM memory or may further comprise non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor 701 is configured to execute computer-executable instructions stored in the memory 702 to implement the fault alert processing method described in the foregoing method embodiment. The processor 701 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The electronic device 700 may also include a communication interface 703, through which it can communicate with external devices such as user terminals (e.g., computers, tablets). In a specific implementation, if the communication interface 703, the memory 702, and the processor 701 are implemented independently, they may be connected to each other and communicate with each other through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on, which does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the communication interface 703, the memory 702, and the processor 701 are implemented on a single chip, the communication interface 703, the memory 702, and the processor 701 may complete communication through internal interfaces.
The present application also provides a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. Specifically, the computer-readable storage medium stores program instructions for the fault alarm processing method in the above embodiments.
The present application also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the electronic device may read the execution instructions from the readable storage medium, and execution of the execution instructions by the at least one processor causes the electronic device to implement the fault alert processing methods provided by the various embodiments described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A fault alert processing method, the method comprising:
acquiring first fault alarm information of a first object of an application system;
determining whether at least one network topology branch of the first object exists according to the first fault alarm information and the network topology structure of the application system; each of the network topology branches includes: a second object that is located at a level above the level where the first object is located and has a network topological relation with the first object;
if so, determining whether second fault alarm information of the second object in the network topology branch exists;
and if so, aggregating the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information.
2. The method of claim 1, wherein the aggregating the first fault alert information with the second fault alert information for the second object in the network topology branch comprises:
if at least two levels of second objects exist in the network topology branch, determining a third object located at the highest level from the at least two levels of second objects;
and aggregating the existing fault alarm information of the remaining second objects and the first fault alarm information into third fault alarm information of the third object to obtain aggregated fault alarm information.
3. The method according to claim 2, wherein aggregating the existing fault alarm information of the remaining second objects and the first fault alarm information into the third fault alarm information of the third object to obtain the aggregated fault alarm information comprises:
aggregating, into the third fault alarm information, those pieces of fault alarm information, among the existing fault alarm information of the remaining second objects and the first fault alarm information, whose generation time differs from that of the third fault alarm information by no more than a preset duration, to obtain the aggregated fault alarm information.
4. A method according to claim 3, wherein after said determining whether at least one network topology branch of the first object exists, the method further comprises:
if not, determining whether historical first fault alarm information of the first object exists;
if so, aggregating the first fault alarm information with the historical first fault alarm information whose generation time differs from that of the first fault alarm information by no more than the preset duration, to obtain aggregated fault alarm information.
5. The method according to any one of claims 1-4, further comprising:
and determining the root cause of the fault alarm according to the aggregated fault alarm information.
6. The method according to any one of claims 1-4, further comprising:
determining the hierarchy of the first object according to the first fault alarm information and the network topology structure of the application system;
if the hierarchy of the first object cannot be determined, outputting prompt information of abnormality of the fault alarm information.
7. The method according to any one of claims 1-4, further comprising:
acquiring configuration information of equipment in the application system;
and according to the configuration information of the equipment, correlating the objects in the application system to obtain the network topology structure.
8. A fault alert processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring first fault alarm information of a first object of the application system;
the first determining module is used for determining whether at least one network topology branch of the first object exists according to the first fault alarm information and the network topology structure of the application system; each of the network topology branches includes: the second object is positioned at a level above the level where the first object is positioned and has a network topological relation with the first object;
a second determining module, configured to, if so, determine whether second fault alarm information of the second object in the network topology branch already exists;
and an aggregation module, configured to, if so, aggregate the first fault alarm information with the second fault alarm information of the second object in the network topology branch to obtain aggregated fault alarm information.
9. An electronic device, the electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
The processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-7.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are for implementing the fault alert processing method according to any one of claims 1 to 7.
CN202310671791.XA 2023-06-07 2023-06-07 Fault alarm processing method and device, electronic equipment and storage medium Pending CN116545835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310671791.XA CN116545835A (en) 2023-06-07 2023-06-07 Fault alarm processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310671791.XA CN116545835A (en) 2023-06-07 2023-06-07 Fault alarm processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116545835A true CN116545835A (en) 2023-08-04

Family

ID=87456115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310671791.XA Pending CN116545835A (en) 2023-06-07 2023-06-07 Fault alarm processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116545835A (en)

Similar Documents

Publication Publication Date Title
CN110661659B (en) Alarm method, device and system and electronic equipment
CN108845912B (en) Alarm method for service interface call failure and computing device
CN110445650B (en) Detection alarm method, equipment and server
CN110727556A (en) BMC health state monitoring method, system, terminal and storage medium
CN111130938B (en) Index acquisition method and device, electronic equipment and computer readable storage medium
CN112737800B (en) Service node fault positioning method, call chain generating method and server
CN113067875B (en) Access method, device and equipment based on dynamic flow control of micro-service gateway
CN112242938B (en) Detection method, device, electronic equipment and computer readable storage medium
CN108073499B (en) Application program testing method and device
CN110674149B (en) Service data processing method and device, computer equipment and storage medium
CN114363151A (en) Fault detection method and device, electronic equipment and storage medium
CN115185777A (en) Abnormity detection method and device, readable storage medium and electronic equipment
CN111221775A (en) Processor, cache processing method and electronic equipment
CN112068935A (en) Method, device and equipment for monitoring deployment of Kubernetes program
CN114500249B (en) Root cause positioning method and device
CN116545835A (en) Fault alarm processing method and device, electronic equipment and storage medium
CN115756888A (en) Data processing method, processor, device and storage medium
CN113064765B (en) Node exception handling method, device, electronic equipment and machine-readable storage medium
CN114860432A (en) Method and device for determining information of memory fault
CN115934453A (en) Troubleshooting method, troubleshooting device and storage medium
CN107919980B (en) Evaluation method and device for clustered system
CN111694715A (en) Abnormity warning method, device, equipment and machine readable storage medium
CN110750418B (en) Information processing method, electronic equipment and information processing system
CN114124758B (en) Flow monitoring method and device
CN116263696A (en) Machine room task notification processing method, device and task notification processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination