CN115913887A - Fault processing method, device and storage medium - Google Patents


Info

Publication number
CN115913887A
Authority
CN
China
Prior art keywords
equipment
service state
type
state weight
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211435381.7A
Other languages
Chinese (zh)
Inventor
曾贵云
梁日惠
吕炜
谢小舜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211435381.7A priority Critical patent/CN115913887A/en
Publication of CN115913887A publication Critical patent/CN115913887A/en
Pending legal-status Critical Current

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention discloses a fault handling method, a fault handling apparatus, and a storage medium. The method comprises: receiving fault information and determining a faulty target device based on the fault information; acquiring first-type devices that have a direct calling relationship with the target device and second-type devices that have an indirect calling relationship with the target device; performing fault handling on the target device; determining, after the fault handling, a service state weight of the target device, a service state weight of the first-type devices, and a service state weight of the second-type devices, where each service state weight represents the priority with which the corresponding device should undergo fault handling; and determining, based on these service state weights, whether each corresponding device is normal. The invention addresses the technical problem in the prior art that the key fault cannot be identified in a short time when a system fault occurs.

Description

Fault processing method, device and storage medium
Technical Field
The application relates to the technical field of financial technology, and in particular to a fault handling method, a fault handling apparatus, and a storage medium.
Background
Real-time monitoring by a monitoring system helps guarantee system stability and greatly improves system availability. However, when the system fails, the monitoring system sends out a series of alarms. If the system cannot heal itself, manual intervention is needed for analysis, and the technical personnel, facing numerous alarms, do not know where to start and cannot identify the key problem in a short time, which prolongs system recovery. If the personnel involved are inexperienced, analyzing and solving the problem takes even longer.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
Embodiments of the invention provide a fault handling method, a fault handling apparatus, and a storage medium, which at least solve the technical problem in the prior art that the key fault cannot be identified in a short time when a system fault occurs.
According to an aspect of an embodiment of the present invention, there is provided a fault handling method, including: receiving fault information, and determining a faulty target device based on the fault information; acquiring first-type devices that have a direct calling relationship with the target device and second-type devices that have an indirect calling relationship with the target device; performing fault handling on the target device; determining, after the fault handling, a service state weight of the target device, a service state weight of the first-type devices, and a service state weight of the second-type devices, where each service state weight represents the priority with which the corresponding device should undergo fault handling; and determining, based on the service state weight of the target device after fault handling, the service state weight of the first-type devices, and the service state weight of the second-type devices, whether each corresponding device is normal.
Optionally, before the obtaining of the first type device having the direct call relationship with the target device and the second type device having the indirect call relationship with the target device, the method further includes: after receiving the fault information, obtaining state information of the device, wherein the device includes: the target device and other devices except the target device; determining a call relationship between the other device and the target device based on the state information, wherein the call relationship includes: the direct call relation and the indirect call relation; determining the category of the other device based on the calling relationship, wherein the other device includes: the first type of device and the second type of device.
Optionally, after the acquiring of the first-type devices having the direct calling relationship with the target device and the second-type devices having the indirect calling relationship with the target device, the method further includes: assigning the target device a first-type initial service state weight; assigning the first-type devices a second-type initial service state weight; and assigning the second-type devices a third-type initial service state weight, where each initial service state weight represents the priority with which the corresponding device should undergo fault handling.
Optionally, the determining of the service state weight of the target device after fault handling, the service state weight of the first-type devices, and the service state weight of the second-type devices includes: detecting whether the target device has returned to normal; if the target device has returned to normal, assigning the target device the normal service state weight, and detecting whether the first-type devices and the second-type devices are in a normal operating state; and if the first-type devices and the second-type devices are in the normal operating state, assigning both the first-type devices and the second-type devices the normal service state weight.
Optionally, the method further includes: if the target device has not returned to normal, assigning the target device the first-type service state weight, the first-type devices the second-type service state weight, and the second-type devices the third-type service state weight.
Optionally, the detecting whether the target device is recovered to normal includes: detecting the target equipment by adopting detection equipment within a preset time period according to a preset time interval to obtain a detection result; and judging whether the target equipment is recovered to be normal or not based on the detection result.
Optionally, the determining, based on the service state weight of the target device after the fault processing, the service state weight of the first type of device, and the service state weight of the second type of device, whether corresponding devices are normal respectively includes: judging whether the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights or not; and if the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights, determining that the equipment is recovered to be normal.
According to another aspect of the embodiments of the present invention, there is also provided a fault handling apparatus, including: a receiving module, configured to receive fault information and determine a faulty target device based on the fault information; an acquisition module, configured to acquire first-type devices that have a direct calling relationship with the target device and second-type devices that have an indirect calling relationship with the target device; a processing module, configured to perform fault handling on the target device; a first determining module, configured to determine a service state weight of the target device after fault handling, a service state weight of the first-type devices, and a service state weight of the second-type devices, where each service state weight represents the priority with which the corresponding device should undergo fault handling; and a second determining module, configured to determine whether each corresponding device is normal based on the service state weight of the target device after fault handling, the service state weight of the first-type devices, and the service state weight of the second-type devices.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor and execute any one of the above fault handling methods.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including one or more processors and a memory, where the memory is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to implement any one of the above fault handling methods.
In the embodiments of the invention, fault information is received, and a faulty target device is determined based on the fault information; first-type devices having a direct calling relationship with the target device and second-type devices having an indirect calling relationship with the target device are acquired; fault handling is performed on the target device; a service state weight of the target device after fault handling, a service state weight of the first-type devices, and a service state weight of the second-type devices are determined, where each service state weight represents the priority with which the corresponding device should undergo fault handling; and whether each corresponding device is normal is determined based on these service state weights. Because the calling relationships between the devices and the target device are determined from the dependencies among the service devices, and a service state weight is determined for each device, whether each device is normal after fault handling can be judged from its service state weight. This solves the technical problem in the prior art that the key fault cannot be identified in a short time when a system fault occurs.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, and the description of the exemplary embodiments of the application are intended to be illustrative of the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flowchart of a fault handling method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an alternative visual fault location system according to an embodiment of the present invention;
FIG. 3 is a flow chart of an alternative visual fault handling method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative fault handling process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a fault handling apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances in order to facilitate the description of the embodiments of the application herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that relevant information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data that are authorized by the user or sufficiently authorized by various parties. For example, an interface is provided between the system and the relevant user or institution, and before obtaining the relevant information, an obtaining request needs to be sent to the user or institution through the interface, and after receiving the consent information fed back by the user or institution, the relevant information needs to be obtained.
Based on this, the present application intends to provide a solution to the above technical problem, the details of which will be explained in the following embodiments.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a fault handling method is provided. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
FIG. 1 is a flowchart of a fault handling method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102, receiving fault information, and determining a faulty target device based on the fault information;
Step S104, acquiring first-type devices having a direct calling relationship with the target device and second-type devices having an indirect calling relationship with the target device;
Step S106, performing fault handling on the target device;
Step S108, determining a service state weight of the target device after fault handling, a service state weight of the first-type devices, and a service state weight of the second-type devices, where each service state weight represents the priority with which the corresponding device should undergo fault handling;
Step S110, determining whether each corresponding device is normal based on the service state weight of the target device after fault handling, the service state weight of the first-type devices, and the service state weight of the second-type devices.
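Steps S102 to S110 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the `fault_info["device"]` key, the `callers` mapping (device to the set of devices that call it), the `repair` and `is_healthy` callbacks, and the use of -1 as the normal service state weight are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, Set

NORMAL = -1  # assumed value for the "normal service state weight"


def handle_fault(
    fault_info: Dict[str, str],
    callers: Dict[str, Set[str]],
    repair: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Dict[str, bool]:
    # S102: determine the faulty target device from the fault information.
    target = fault_info["device"]

    # S104: first-type devices call the target directly; second-type
    # devices call a first-type device (indirect calling relationship).
    first = set(callers.get(target, set()))
    second: Set[str] = set()
    for device in first:
        second |= callers.get(device, set())
    second -= first | {target}

    # S106: perform fault handling on the target device.
    repair(target)

    # S108: determine the service state weights after fault handling.
    # Start from the initial weights 0 / 1 / 2 and switch to NORMAL
    # only for devices that are confirmed healthy.
    weights = {target: 0}
    weights.update({d: 1 for d in first})
    weights.update({d: 2 for d in second})
    if is_healthy(target):
        weights[target] = NORMAL
        for device in first | second:
            if is_healthy(device):
                weights[device] = NORMAL

    # S110: a device counts as normal when it carries the normal weight.
    return {device: weight == NORMAL for device, weight in weights.items()}
```

Passing the repair and health-check logic as callbacks keeps the sketch independent of any particular monitoring or remediation tooling.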
In an embodiment of the present invention, the fault handling method of steps S102 to S110 is executed by a fault handling system. The system receives fault information and determines a faulty target device based on the fault information; acquires first-type devices having a direct calling relationship with the target device and second-type devices having an indirect calling relationship with the target device; performs fault handling on the target device; determines a service state weight of the target device after fault handling, a service state weight of the first-type devices, and a service state weight of the second-type devices, where each service state weight represents the priority with which the corresponding device should undergo fault handling; and determines, based on these service state weights, whether each corresponding device is normal.
It should be noted that, as shown in the schematic structural diagram of the visual fault location system in FIG. 2, the fault handling system mainly includes six parts: a service monitoring module 002, a service dependency relationship management module 003, a service detection module 004, a service state diagram survival time module 005, a service state weight module 006, and a result analysis alarm module 007. The system is connected to the application systems 001 (the service devices).
Optionally, the application systems 001 (the service devices) are connected to the service monitoring module 002 and are the target systems to be monitored.
Optionally, the service monitoring module 002 is connected to the application systems 001 and to the service dependency relationship management module 003, and is configured to monitor the state of the target systems. When an application system fails, the service monitoring module detects the abnormality and sends the abnormality information to the service dependency relationship management module.
Optionally, the service dependency relationship management module 003 is connected to the service monitoring module 002, the service detection module 004, the service state diagram survival time module 005, and the service state weight module 006, and is configured to manage the dependency relationship and the service state of the service, and provide a visual service state interface for the user.
Optionally, the service detection module 004 is connected to the service dependency relationship management module 003, and is configured to probe services at regular intervals after a fault occurs and send the detection results to the service dependency relationship management module 003. The module works from the moment a fault begins until the fault is fully recovered: when the system fails, the service dependency relationship management module 003 triggers the service detection module 004 to start working, and when the system fault is fully recovered, the service dependency relationship management module 003 sends a system-recovered signal to the service detection module 004, at which point the service detection module stops working. As a result, once the key fault is resolved, there is no need to confirm manually whether the systems that depend on the failed service have returned to normal. The timed detection module obtains the latest state of each service, so the current condition of each system can be understood quickly and intuitively.
Optionally, the service state diagram survival time module 005 is connected to the service dependency relationship management module 003 and is mainly used to set the survival time of service dependencies. For example, when the survival time is set to 30 min, the service dependency relationship management module displays only the service dependencies from the most recent 30 min, which simplifies the service dependency graph while still reflecting the latest state of the services. When a service fails, the key problem is then easier to see. The user can set the survival time of the service state diagram as required.
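The survival-time filter can be illustrated with a short Python sketch. The record layout (caller, callee, time observed) and the function name `recent_calls` are assumptions made for this example; the application does not specify how call records are stored.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical call record: (caller, callee, time the call was observed).
Call = Tuple[str, str, datetime]


def recent_calls(
    calls: List[Call],
    now: datetime,
    survival: timedelta = timedelta(minutes=30),
) -> List[Tuple[str, str]]:
    """Keep only the dependency edges observed within the survival time."""
    cutoff = now - survival
    return [(caller, callee) for caller, callee, seen in calls if seen >= cutoff]
```

With the default of 30 minutes, an edge last observed 45 minutes ago is dropped from the displayed dependency graph, while an edge observed 10 minutes ago is kept.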
Optionally, the service state weight module 006 is connected to the service dependency relationship management module 003 and is mainly used to identify service fault levels. A failed service is marked with weight 0, services that depend directly on the failed service are marked with weight 1, and services that depend on a weight-1 service are marked with weight 2. Following this dependency hierarchy, the system administrator must first resolve the services with weight 0. After a weight-0 fault is resolved, the service detection module initiates detection, and the service dependency relationship diagram and the service state weight table are refreshed according to the detection result. The system administrator then resolves the remaining weight-0 services according to the latest result, until the whole system has fully recovered.
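One way to realize this level-by-level marking is a breadth-first traversal from the failed services along "is called by" edges. The sketch below is illustrative only: the `callers` mapping and the example dependency graph are hypothetical, since the application does not specify a data structure or algorithm.

```python
from collections import deque
from typing import Dict, Iterable, List


def assign_weights(
    faulty: Iterable[str],
    callers: Dict[str, List[str]],
) -> Dict[str, int]:
    """Mark failed services 0, their direct callers 1, callers of those 2, and so on."""
    weights = {service: 0 for service in faulty}
    queue = deque(weights)
    while queue:
        service = queue.popleft()
        for caller in callers.get(service, []):
            # The first visit gives the smallest level, so keep it.
            if caller not in weights:
                weights[caller] = weights[service] + 1
                queue.append(caller)
    return weights


# Hypothetical graph: C and D call A, E calls B, F calls C, G calls D, H calls E.
example = assign_weights(
    ["A", "B"],
    {"A": ["C", "D"], "B": ["E"], "C": ["F"], "D": ["G"], "E": ["H"]},
)
```

Breadth-first order means a service reachable through several paths receives the weight of its shortest path to a failed service, which matches the idea that a lower weight means a higher handling priority.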
Optionally, the result analysis alarm module 007 is connected to the service state weight module 006. It consolidates the weight information sent by the service state weight module, identifies the services with weights 0, 1, and 2 respectively, and sends an alarm prompt to the system administrator. A service with weight 0 is a failed service, a service with weight 1 depends on the failed service at the first level, a service with weight 2 depends on the failed service at the second level, and so on for higher weights. Based on the alarm, the system administrator can resolve the services with weight 0 first.
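The grouping performed by the alarm module can be sketched as follows. The message format, the `LABELS` table, and the choice to skip negative (normal) weights are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical human-readable labels for the first three weight levels.
LABELS = {
    0: "faulty service",
    1: "first-level dependent",
    2: "second-level dependent",
}


def build_alarm(weights: Dict[str, int]) -> List[str]:
    """Group services by weight and produce one alarm line per weight level."""
    groups = defaultdict(list)
    for service, weight in weights.items():
        if weight >= 0:  # assumed: negative weights mark healthy services
            groups[weight].append(service)
    lines = []
    for weight in sorted(groups):
        label = LABELS.get(weight, f"level-{weight} dependent")
        names = ", ".join(sorted(groups[weight]))
        lines.append(f"weight {weight} ({label}): {names}")
    return lines
```

Sorting by weight puts the failed services at the top of the alarm, so the administrator sees the highest-priority items first.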
As an optional embodiment, the service calling relationships are shown to the system administrator through an intuitive system calling relationship diagram, and the source and priority of each service fault are shown through the service weights. After a service is recovered, the service detection function verifies system availability (without manual, system-by-system confirmation), the state of the visual relationship diagram and the service weights within the set time are updated according to the service call results, and the results are reported to the system administrator as alarms, so that problems can be found, analyzed, and solved in the shortest possible time.
Optionally, as shown in the flowchart of the visual fault handling method in FIG. 3, when a system fails, its services become unavailable, and the service monitoring module finds the failed system and the other abnormal systems that depend on it. The service dependency relationship management module identifies the caller and callee of each service from the messages sent by the service monitoring module, and displays the service dependencies and service states to the user. The service state weight module generates a weight table for the services from the information sent by the service dependency relationship management module: a failed service device is marked 0, a service device that depends directly on the failed device is marked 1, a service device that depends on a weight-1 device is marked 2, and so on.
Optionally, the services are analyzed and classified according to the weight table of the service state weight module, the services with weights 0, 1, and 2 are listed separately, and alarm information is then sent to the user. After receiving the alarm information, the system administrator handles the faults of the weight-0 services first (alternatively, the system can handle them automatically). When the system fails, the service detection module also starts a working process to detect the service states in real time, and the service dependency relationship management module updates the service state diagram according to the latest detection results. Once all services have been restored, the fault location procedure ends.
In an optional embodiment, before the obtaining a first type device having a direct call relationship with the target device and a second type device having an indirect call relationship with the target device, the method further includes: after receiving the fault information, obtaining state information of the device, wherein the device includes: the target device and other devices except the target device; determining a call relationship between the other device and the target device based on the state information, wherein the call relationship includes: the direct call relation and the indirect call relation; determining the category of the other device based on the calling relationship, wherein the other device includes: the first type of device and the second type of device.
As an alternative embodiment, as shown in the schematic diagram of the fault handling process in FIG. 4, assume there are eight systems, A through H, and take as an example faults caused by a full disk on system A and a network failure on system B. The service monitoring module 002 runs its monitoring program continuously and monitors the state of each system 001; on finding that systems A and B are abnormal, it starts the fault location process. Each system sends an abnormal signal and related logs to the service monitoring module, so the service monitoring module 002 receives the complete state information and related logs of every system. The service dependency relationship management module analyzes the calling relationships among systems A through H according to the received exception information, and displays the calling relationships and the state of each system (as shown by 301 in FIG. 4). At this point, the module sends a fault signal to the service detection module to prompt it to start the detection process.
In an optional embodiment, after the acquiring of the first-type devices having the direct calling relationship with the target device and the second-type devices having the indirect calling relationship with the target device, the method further includes: assigning the target device a first-type initial service state weight; assigning the first-type devices a second-type initial service state weight; and assigning the second-type devices a third-type initial service state weight, where each initial service state weight represents the priority with which the corresponding device should undergo fault handling.
As an optional embodiment, the service detection module receives the fault signal, starts the service detection process, and detects the service state of each system in real time. Systems A and B are assigned the first-type initial service state weight, systems C, D, and E the second-type initial service state weight, and systems F, G, and H the third-type initial service state weight.
In an optional embodiment, the determining of the service state weight of the target device after fault handling, the service state weight of the first-type devices, and the service state weight of the second-type devices includes: detecting whether the target device has returned to normal; if the target device has returned to normal, assigning the target device the normal service state weight, and detecting whether the first-type devices and the second-type devices are in a normal operating state; and if the first-type devices and the second-type devices are in the normal operating state, assigning both the first-type devices and the second-type devices the normal service state weight.
As an alternative embodiment, the service states in the service dependency relationship management module are updated in real time according to the service detection results (as shown by 301 in FIG. 4). If system A is detected to have returned to normal, the service dependency state diagram changes from S301 to S303, and so on, until all systems have returned to normal. The service state weight module records the weights according to the relationship diagram of the service dependency relationship management module, forming an intuitive table.
In an optional embodiment, the method further includes: if the target device has not returned to normal, assigning the target device the first-type service state weight, the first-type devices the second-type service state weight, and the second-type devices the third-type service state weight.
In an optional embodiment, the detecting of whether the target device has returned to normal includes: probing the target device with a detection device at a preset time interval within a preset time period to obtain a detection result; and judging, based on the detection result, whether the target device has returned to normal.
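Probing at a preset interval within a preset period can be sketched as a simple polling loop. The `probe` callback, the parameter names, and the injectable `sleep` and `clock` functions are assumptions for this illustration; the application does not describe how the detection device performs its check.

```python
import time
from typing import Callable


def wait_for_recovery(
    probe: Callable[[], bool],
    period_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Probe every interval_s seconds for at most period_s seconds.

    Returns True as soon as a probe reports the device as normal, and
    False if the preset period ends without a successful probe.
    """
    deadline = clock() + period_s
    while True:
        if probe():
            return True
        # Stop if the next probe would fall outside the preset period.
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)
```

Taking `sleep` and `clock` as parameters lets the loop be exercised with a fake clock instead of real waiting, for example in unit tests.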
As an alternative embodiment, when there are many services, the service dependencies may become very complicated and cluttered, and in some cases only the state of each system in the most recent period (within a preset period) is of interest. In that case, a time can be set in the service state diagram survival time module, and the service dependency relationship management module then displays only the service relationship diagram for the set time.
In an optional embodiment, the determining, based on the service state weight of the target device after the fault processing, the service state weight of the first type of device, and the service state weight of the second type of device, whether corresponding devices are normal respectively includes: judging whether the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights or not; and if the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights, determining that the equipment is recovered to be normal.
As an alternative embodiment, the service state weight table (shown as 302 in fig. 4) is updated in real time according to the service dependency management module. When systems A and B fail, the weight table is as shown in the column for time 1.
Optionally, according to the weight table, alarm information and an analysis report are sent to the system administrator, indicating that systems A and B, with weight 0, have failed; that systems C, D, and E, with weight 1, are affected through a direct dependency; and that systems F, G, and H, with weight 2, are affected through an indirect dependency. On receiving the alarm information, the system administrator first resolves the problem of system A. After the service detection module detects that system A is normal, the service dependency relationship management module is triggered to update the service dependency relationship diagram (e.g., S303 in fig. 4), and the service state weight module updates the service state weight table in real time (see the column for time 2 in S302 in fig. 4). The alarm system then sends alarm information to the system administrator again, and the administrator resolves the problem of system B according to the latest information. Once that problem is solved, the service relationship diagram shows every system as normal, and every entry in the service state weight table is -1, as in the column for time 3 (as in S30 in fig. 4). Note that -1 is the weight of a normally operating system.
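The three time columns of the weight table can be reproduced with a breadth-first walk over the dependency graph. This is a sketch under stated assumptions, not the patented implementation: the graph in `HYPOTHETICAL_CALLERS` is invented for illustration (fig. 4 is not reproduced here), the function name `weight_table` is hypothetical, and every system more than one hop away from a failed system is assumed to receive weight 2.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

NORMAL, FAILED, DIRECT, INDIRECT = -1, 0, 1, 2

# HYPOTHETICAL graph: key = system, value = systems that call it directly.
HYPOTHETICAL_CALLERS: Dict[str, List[str]] = {
    "A": ["C", "D"],
    "B": ["E"],
    "C": ["F"],
    "D": ["G"],
    "E": ["H"],
}
HYPOTHETICAL_SYSTEMS = list("ABCDEFGH")


def weight_table(
    callers: Dict[str, List[str]], systems: Iterable[str], failed: Set[str]
) -> Dict[str, int]:
    """Assign 0 to failed systems, 1 to direct callers, 2 to indirect callers.

    Systems not reachable from any failed system keep the normal weight -1.
    """
    weights = {s: NORMAL for s in systems}
    queue = deque()
    for s in failed:
        weights[s] = FAILED
        queue.append(s)
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if weights[caller] == NORMAL:
                # Depth beyond 2 is capped: all indirect callers share weight 2.
                weights[caller] = min(weights[current] + 1, INDIRECT)
                queue.append(caller)
    return weights
```

Calling `weight_table` with the failed set `{"A", "B"}`, then `{"B"}`, then an empty set corresponds to times 1, 2, and 3 of the walkthrough on this assumed graph.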
Through the steps, the fault positioning analysis efficiency can be improved, the problem dependence relationship can be mastered more clearly and definitely, the root cause problem can be found, the problem with high priority can be solved, and the experience dependence on technicians can be reduced.
Example 2
Fig. 5 is a schematic structural diagram of a fault handling apparatus according to an embodiment of the present application, and as shown in fig. 5, the fault handling apparatus includes: a receiving module 50, an obtaining module 52, a processing module 54, a first determining module 56, and a second determining module 58, wherein:
a receiving module 50, configured to receive fault information and determine a target device with a fault based on the fault information;
an obtaining module 52, configured to obtain a first type device having a direct call relationship with the target device and a second type device having an indirect call relationship with the target device;
a processing module 54, configured to perform fault processing on the target device;
a first determining module 56, configured to determine a service state weight of a target device after fault processing, a service state weight of the first class of device, and a service state weight of the second class of device, where the service state weights are used to correspondingly characterize a priority of fault processing performed by a device;
a second determining module 58, configured to determine whether corresponding devices are normal respectively based on the service state weight of the target device after the fault processing, the service state weight of the first type of device, and the service state weight of the second type of device.
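The division into five modules can be pictured as five pluggable steps run in order. The class below is a minimal sketch only: the class name, field names, and callable signatures are assumptions introduced here, and each callable stands in for one module of fig. 5.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class FaultHandlingApparatus:
    # Each field is a hypothetical stand-in for one module of the apparatus.
    receive: Callable[[dict], str]                        # receiving module 50
    obtain: Callable[[str], Tuple[List[str], List[str]]]  # obtaining module 52
    process: Callable[[str], None]                        # processing module 54
    determine_weights: Callable[[str, List[str], List[str]], Dict[str, int]]  # first determining module 56
    determine_normal: Callable[[Dict[str, int]], Dict[str, bool]]             # second determining module 58

    def handle(self, fault_info: dict) -> Dict[str, bool]:
        """Run the five modules in the order given in the description."""
        target = self.receive(fault_info)
        first_type, second_type = self.obtain(target)
        self.process(target)
        weights = self.determine_weights(target, first_type, second_type)
        return self.determine_normal(weights)
```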
The fault processing device provided by this embodiment receives fault information and determines the faulty target device based on it; acquires the first type of device, which has a direct calling relationship with the target device, and the second type of device, which has an indirect calling relationship with the target device; performs fault processing on the target device; determines, respectively, the service state weight of the target device after fault processing, the service state weight of the first type of device, and the service state weight of the second type of device, where each service state weight characterizes the fault-processing priority of the corresponding device; and determines, based on these weights, whether each corresponding device is normal. Because the calling relationships between the devices and the target device are determined from the dependency relationships of the service devices, and a service state weight is determined for each device, whether a device is normal after fault processing can be judged from its service state weight. This solves the technical problem in the prior art that the key fault cannot be identified in a short time when a system fault occurs.
The fault handling apparatus includes a processor and a memory, the receiving module 50, the obtaining module 52, the processing module 54, the first determining module 56, the second determining module 58, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to implement corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more than one kernel can be set, and the training and predicting speed of the convolutional neural network is accelerated by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium on which a program is stored, which, when executed by a processor, implements the above-described fault handling method.
The embodiment of the invention provides a processor, which is used for running a program, wherein the fault processing method is executed when the program runs.
As shown in fig. 6, an embodiment of the present invention provides an electronic device, where the electronic device 10 includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor executes the program to implement the following steps: receiving fault information, and determining a target device with a fault based on the fault information; acquiring first equipment which has a direct calling relationship with the target equipment and second equipment which has an indirect calling relationship with the target equipment; carrying out fault processing on the target equipment; respectively determining a service state weight of a target device after fault processing, a service state weight of the first type of device and a service state weight of the second type of device, wherein the service state weights are used for correspondingly representing the priority of the device for fault processing; and respectively determining whether the corresponding equipment is normal or not based on the service state weight of the target equipment after fault processing, the service state weight of the first type of equipment and the service state weight of the second type of equipment.
Optionally, the processor executes the program to implement the following steps: after receiving the fault information, obtaining state information of the device, wherein the device includes: the target device and other devices except the target device; determining a call relationship between the other device and the target device based on the state information, wherein the call relationship includes: the direct call relation and the indirect call relation; determining the category of the other device based on the calling relationship, wherein the other device includes: the first type of device and the second type of device.
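The classification step can be sketched as a walk over recorded call relations. The sketch assumes that the state information yields a list of (caller, callee) pairs and that a device "has a calling relationship with" the target when it calls the target directly or through intermediaries; the function name `classify_devices` and this input format are assumptions, not taken from the source.

```python
from typing import Dict, Iterable, Set, Tuple


def classify_devices(
    calls: Iterable[Tuple[str, str]], target: str
) -> Tuple[Set[str], Set[str]]:
    """Split the other devices into first type (direct) and second type (indirect).

    `calls` is a hypothetical list of (caller, callee) pairs.
    """
    callers: Dict[str, Set[str]] = {}
    for caller, callee in calls:
        callers.setdefault(callee, set()).add(caller)

    first_type = set(callers.get(target, set())) - {target}
    second_type: Set[str] = set()
    frontier = list(first_type)
    while frontier:
        node = frontier.pop()
        for c in callers.get(node, ()):
            # Skip the target itself and anything already classified.
            if c != target and c not in first_type and c not in second_type:
                second_type.add(c)
                frontier.append(c)
    return first_type, second_type
```

Devices that never reach the target through any chain of calls fall into neither set.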
Optionally, the processor implements the following steps when executing the program: determining the target device as a first type initial service state weight; determining the first class of equipment as a second class of initial service state weight; and determining the second-class equipment as a third-class initial service state weight, wherein the initial service state weight is used for correspondingly representing the priority of fault processing of the equipment.
Optionally, the processor executes the program to implement the following steps: detecting whether the target equipment is recovered to be normal or not; if the target equipment returns to normal, determining that the target equipment is in a normal service state weight, and detecting whether the first equipment and the second equipment are in a normal operation state; and if the first-class equipment and the second-class equipment are in the normal operation state, determining that the first-class equipment and the second-class equipment are both the normal service state weight.
Optionally, the processor executes the program to implement the following steps: and if the target equipment does not recover to be normal, determining that the target equipment is a first class service state weight, and determining that the first class equipment is a second class service state weight, and the second class equipment is a third class service state weight.
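The two branches (target recovered, target not recovered) can be summarized in one small function. The numeric values below follow the worked weight-table example (0, 1, 2 for the three classes and -1 for normal). The function name is hypothetical, and the case where the target has recovered but the dependent devices are not yet running normally is not specified in the source; this sketch assumes their weights are left unchanged.

```python
from typing import Dict

NORMAL, FIRST_CLASS, SECOND_CLASS, THIRD_CLASS = -1, 0, 1, 2


def weights_after_processing(
    target_recovered: bool, dependents_running: bool
) -> Dict[str, int]:
    """Return the service state weights after one round of fault processing."""
    if not target_recovered:
        # Target still faulty: keep the three priority classes.
        return {
            "target": FIRST_CLASS,
            "first_type": SECOND_CLASS,
            "second_type": THIRD_CLASS,
        }
    # Target recovered: it gets the normal weight immediately.
    weights = {
        "target": NORMAL,
        "first_type": SECOND_CLASS,  # assumption: unchanged until checked
        "second_type": THIRD_CLASS,  # assumption: unchanged until checked
    }
    if dependents_running:
        weights["first_type"] = NORMAL
        weights["second_type"] = NORMAL
    return weights
```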
Optionally, the processor executes the program to implement the following steps: detecting the target equipment by adopting detection equipment within a preset time period according to a preset time interval to obtain a detection result; and judging whether the target equipment is recovered to be normal or not based on the detection result.
Optionally, the processor executes the program to implement the following steps: judging whether the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights or not; and if the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are all normal service state weights, determining that the equipment is recovered to be normal.
The present application further provides a computer program product adapted to perform a program for initializing the following method steps when executed on a data processing device: receiving fault information, and determining a target device with a fault based on the fault information; acquiring first equipment with a direct calling relation with the target equipment and second equipment with an indirect calling relation with the target equipment; carrying out fault processing on the target equipment; respectively determining a service state weight of a target device after fault processing, a service state weight of the first type of device and a service state weight of the second type of device, wherein the service state weights are used for correspondingly representing the priority of the device for fault processing; and respectively determining whether the corresponding equipment is normal or not based on the service state weight of the target equipment after fault processing, the service state weight of the first type of equipment and the service state weight of the second type of equipment.
Optionally, the program, when executed on a data processing device, is adapted to perform a procedure for initializing the following method steps: after receiving the fault information, obtaining state information of the device, wherein the device includes: the target device and other devices except the target device; determining a call relationship between the other device and the target device based on the state information, wherein the call relationship includes: the direct call relation and the indirect call relation; determining the category of the other device based on the call relation, wherein the other device includes: the first type of device and the second type of device.
Optionally, the program, when executed on a data processing device, is adapted to perform a procedure for initializing the following method steps: determining the target device as a first type initial service state weight; determining the first class of equipment as a second class of initial service state weight; and determining the second-class equipment as a third-class initial service state weight, wherein the initial service state weight is used for correspondingly representing the priority of fault processing of the equipment.
Optionally, the program, when executed on a data processing device, is adapted to perform a procedure for initializing the following method steps: detecting whether the target equipment is recovered to be normal or not; if the target equipment returns to normal, determining that the target equipment is in a normal service state weight, and detecting whether the first equipment and the second equipment are in a normal operation state; and if the first-class equipment and the second-class equipment are in the normal operation state, determining that the first-class equipment and the second-class equipment are both the normal service state weight.
Optionally, the program, when executed on a data processing device, is adapted to perform a procedure for initializing the following method steps: and if the target equipment does not return to normal, determining that the target equipment is a first class service state weight, and determining that the first class equipment is a second class service state weight, and the second class equipment is a third class service state weight.
Optionally, the program, when executed on a data processing device, is adapted to perform a procedure for initializing the following method steps: detecting the target equipment by adopting detection equipment within a preset time period according to a preset time interval to obtain a detection result; and judging whether the target equipment is recovered to be normal or not based on the detection result.
Optionally, the program, when executed on a data processing device, is adapted to perform a procedure for initializing the following method steps: judging whether the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights or not; and if the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights, determining that the equipment is recovered to be normal.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method of fault handling, comprising:
receiving fault information, and determining a target device with a fault based on the fault information;
acquiring first equipment with a direct calling relationship with the target equipment and second equipment with an indirect calling relationship with the target equipment;
performing fault processing on the target equipment;
respectively determining a service state weight of the target equipment after fault processing, a service state weight of the first type of equipment and a service state weight of the second type of equipment, wherein the service state weights are used for correspondingly representing the priority of the equipment for fault processing;
and respectively determining whether the corresponding equipment is normal or not based on the service state weight of the target equipment after fault processing, the service state weight of the first type of equipment and the service state weight of the second type of equipment.
2. The method of claim 1, wherein prior to the obtaining a first class of device having a direct calling relationship with the target device and a second class of device having an indirect calling relationship with the target device, the method further comprises:
after receiving the fault information, obtaining state information of the device, wherein the device comprises: the target device and other devices except the target device;
determining a calling relationship between the other device and the target device based on the state information, wherein the calling relationship comprises: the direct call relationship and the indirect call relationship;
determining a category of the other device based on the call relation, wherein the other device comprises: the first type of device and the second type of device.
3. The method of claim 1, wherein after obtaining the first class of devices having a direct calling relationship with the target device and the second class of devices having an indirect calling relationship with the target device, the method further comprises:
determining the target equipment as a first type initial service state weight;
determining the first type of equipment as a second type of initial service state weight;
and determining the second-class equipment as a third-class initial service state weight, wherein the initial service state weight is used for correspondingly characterizing the priority of fault processing of the equipment.
4. The method according to claim 1, wherein the determining the service state weight of the target device after the fault processing, the service state weight of the first type device, and the service state weight of the second type device respectively comprises:
detecting whether the target equipment is recovered to be normal or not;
if the target equipment returns to normal, determining that the target equipment is in a normal service state weight, and detecting whether the first type of equipment and the second type of equipment are in a normal operation state;
and if the first type of equipment and the second type of equipment are in the normal operation state, determining that the first type of equipment and the second type of equipment are both the normal service state weight.
5. The method of claim 4, further comprising:
and if the target equipment is not recovered to be normal, determining that the target equipment is a first class service state weight, and determining that the first class equipment is a second class service state weight, wherein the second class equipment is a third class service state weight.
6. The method of claim 4, wherein the detecting whether the target device is normal comprises:
detecting the target equipment by adopting detection equipment within a preset time period according to a preset time interval to obtain a detection result;
and judging whether the target equipment is recovered to be normal or not based on the detection result.
7. The method according to any one of claims 1 to 6, wherein the determining whether the corresponding device is normal based on the service state weight of the target device after the fault processing, the service state weight of the first type device, and the service state weight of the second type device respectively includes:
judging whether the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are normal service state weights or not;
and if the service state weight of the target equipment, the service state weight of the first type of equipment and the service state weight of the second type of equipment are all normal service state weights, determining that the equipment is recovered to be normal.
8. A fault handling device, comprising:
the receiving module is used for receiving the fault information and determining the target equipment with the fault based on the fault information;
the acquisition module is used for acquiring a first type of equipment which has a direct calling relationship with the target equipment and a second type of equipment which has an indirect calling relationship with the target equipment;
the processing module is used for carrying out fault processing on the target equipment;
a first determining module, configured to respectively determine a service state weight of the target device after fault processing, a service state weight of the first type of device, and a service state weight of the second type of device, where the service state weights are used to correspondingly characterize the priority of fault processing of the devices;
and a second determining module, configured to determine whether corresponding devices are normal respectively based on the service state weight of the target device after the fault processing, the service state weight of the first type of device, and the service state weight of the second type of device.
9. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the fault handling method of any one of claims 1 to 7.
10. An electronic device comprising one or more processors and memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the fault handling method of any of claims 1-7.
CN202211435381.7A 2022-11-16 2022-11-16 Fault processing method, device and storage medium Pending CN115913887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211435381.7A CN115913887A (en) 2022-11-16 2022-11-16 Fault processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211435381.7A CN115913887A (en) 2022-11-16 2022-11-16 Fault processing method, device and storage medium

Publications (1)

Publication Number Publication Date
CN115913887A true CN115913887A (en) 2023-04-04

Family

ID=86495832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211435381.7A Pending CN115913887A (en) 2022-11-16 2022-11-16 Fault processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115913887A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination