CN110618890A

CN110618890A - Fault processing method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN110618890A
Application number: CN201910751685.6A
Authority: CN
Inventors: 王盼盼
Original assignee: Ping An Puhui Enterprise Management Co Ltd
Current assignee: Ping An Puhui Enterprise Management Co Ltd
Priority date: 2019-08-15
Filing date: 2019-08-15
Publication date: 2019-12-27
Anticipated expiration: 2039-08-15
Also published as: CN110618890B

Abstract

The embodiments of the invention disclose a fault processing method, a fault processing device, electronic equipment and a computer readable storage medium. The method can be applied to the field of software monitoring and comprises the following steps: acquiring fault information, wherein the fault information comprises a fault identifier and a fault level of each fault; acquiring a first fault type of a target fault, wherein the target fault is the fault with the highest fault level in the fault information, and the first fault type is either a root fault type or a non-root fault type; if the first fault type of the target fault is the root fault type, performing a repair operation on the target fault; and if the first fault type of the target fault is the non-root fault type, performing a repair operation on the target root fault corresponding to a target root fault identifier in the fault information, wherein the target root fault can cause the target fault. By implementing the embodiments of the invention, a target fault of the root fault type, or the target root fault that causes a target fault of the non-root fault type, can be repaired preferentially, which helps to improve fault processing efficiency.

Description

Fault processing method and device, electronic equipment and computer readable storage medium

Technical Field

The invention relates to the technical field of computers, and in particular to a fault processing method and device, electronic equipment and a computer readable storage medium.

Background

Electronic devices such as computers and servers are widely used in the information industry. Because of defects in the devices themselves or operational errors by users, faults inevitably occur, and if they are not found and repaired in time, they can cause great losses.

Multiple faults may exist in an electronic device at the same time, and different faults may be associated with each other; for example, fault 1 may be caused by fault 2.

However, existing fault handling methods do not consider the associations between faults, so fault handling efficiency is low.

Disclosure of Invention

The embodiments of the invention disclose a fault processing method, a fault processing device, electronic equipment and a computer readable storage medium, which can preferentially repair a target fault of the root fault type, or the target root fault that causes a target fault of the non-root fault type, thereby helping to improve fault processing efficiency.

In a first aspect, an embodiment of the present invention discloses a fault handling method, which may include: acquiring fault information, wherein the fault information comprises one or more unprocessed faults and the fault level of each unprocessed fault; acquiring a first fault type of a target fault, wherein the target fault is the unprocessed fault with the highest fault level in the fault information, the first fault type is either a root fault type or a non-root fault type, a fault of the root fault type is caused by a self-fault, and a fault of the non-root fault type is caused by a fault of the root fault type; if the first fault type of the target fault is the root fault type, performing a repair operation on the target fault; and if the first fault type of the target fault is the non-root fault type, performing a repair operation on the target root fault corresponding to the target root fault identifier in the fault information, wherein the target root fault belongs to the root fault type and can cause the target fault.

In one implementation, the method may further include: determining, according to the correspondence between fault identifiers and root fault identifiers, a root fault identifier set corresponding to the identifier of the target fault, wherein each root fault corresponding to an identifier in the set belongs to the root fault type and can cause the target fault; and taking the fault identifiers that exist in both the root fault identifier set and the fault information as the identifiers of the target root fault.

In one implementation, the method may further include: acquiring, for the identifier of each unprocessed fault in the fault information, the fault score of the unprocessed fault in a system dimension and its fault score in a service dimension; taking the sum of these two scores as the total fault score of the unprocessed fault; and, according to the correspondence between fault scores and fault levels, taking the fault level corresponding to the total fault score as the fault level of the unprocessed fault.
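The score-to-level step can be sketched as a threshold lookup. This is a minimal sketch. The patent does not give concrete values for the correspondence between fault scores and fault levels, so the bounds in `LEVEL_LOWER_BOUNDS` and the function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical correspondence between total fault scores and fault levels:
# a total score >= LEVEL_LOWER_BOUNDS[i] (and below the next bound) maps to LEVELS[i].
# The patent does not specify these values.
LEVEL_LOWER_BOUNDS = [0, 10, 20, 30]
LEVELS = [1, 2, 3, 4]


def total_fault_score(system_score: float, service_score: float) -> float:
    """Total fault score = system-dimension score + service-dimension score."""
    return system_score + service_score


def fault_level(total_score: float) -> int:
    """Look up the fault level for a total fault score."""
    if total_score < LEVEL_LOWER_BOUNDS[0]:
        raise ValueError("fault scores are assumed to be non-negative")
    # bisect_right counts how many lower bounds are <= total_score.
    index = bisect_right(LEVEL_LOWER_BOUNDS, total_score) - 1
    return LEVELS[index]
```

With these assumed bounds, a fault that scores 12 in the system dimension and 8 in the service dimension has a total score of 20, which falls in level 3.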

In one implementation, the fault score of an unprocessed fault in the system dimension may be obtained as follows. For each unprocessed fault identifier in the fault information, a triggered fault identifier set is determined among all unprocessed fault identifiers in the fault information. Each triggered fault corresponding to an identifier in this set is triggered by the unprocessed fault corresponding to that unprocessed fault identifier, and its second fault type is the system type. The fault scores of the triggered faults corresponding to all identifiers in the triggered fault identifier set are summed to obtain a first value. The second fault type of the unprocessed fault, which is either the system type or the service type, is then determined. If the second fault type of the unprocessed fault is the system type, the sum of the fault score of the unprocessed fault and the first value is taken as its fault score in the system dimension. If it is the service type, the first value alone is taken as its fault score in the system dimension.
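The system-dimension rule above can be illustrated in a few lines. This is a minimal sketch. It assumes that each unprocessed fault carries its own fault score and the set of fault identifiers it triggers. The class `UnprocessedFault` and its fields are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UnprocessedFault:
    fault_id: int
    own_score: float   # the fault's own fault score (assumed to be given)
    second_type: str   # second fault type: "system" or "service"
    triggers: Set[int] = field(default_factory=set)  # ids of faults it triggers


def system_dimension_score(fault_id: int, faults: Dict[int, UnprocessedFault]) -> float:
    fault = faults[fault_id]
    # Triggered fault identifier set: ids that are present among the unprocessed
    # faults, are triggered by this fault, and whose second fault type is "system".
    triggered_ids = {t for t in fault.triggers
                     if t in faults and faults[t].second_type == "system"}
    # First value: the sum of the fault scores of those triggered faults.
    first_value = sum(faults[t].own_score for t in triggered_ids)
    if fault.second_type == "system":
        return fault.own_score + first_value
    return first_value  # service-type fault: only the triggered system faults count
```

Identifiers that are not in `faults` are skipped, because the rule only considers unprocessed faults. Triggered faults of the service type are also skipped.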

In one implementation, the method may further include: acquiring the sum of the fault levels of the unprocessed faults corresponding to all unprocessed fault identifiers in the fault information; and if this sum is greater than a preset fault level, triggering execution of the step of acquiring the first fault type of the target fault.

In one implementation, the method may further include: acquiring a detection program for the target fault; executing the detection program to obtain a detection result, wherein the detection result indicates whether the target fault has been successfully repaired; and if the detection result indicates that the target fault has not been successfully repaired, outputting reminder information.

In one implementation, there may be multiple unprocessed faults with the highest fault level in the fault information, and the method may further include: determining, among all unprocessed faults with the highest fault level in the fault information, the unprocessed fault whose first fault type is the root fault type as the target fault.

In a second aspect, an embodiment of the present invention discloses a fault handling apparatus, which includes means for performing the method described in the first aspect.

In a third aspect, an embodiment of the present invention discloses an electronic device, which includes a memory and a processor, where the memory is used for storing a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method according to the first aspect.

By implementing the embodiments of the invention, if the first fault type of the target fault is the root fault type, a repair operation is performed on the target fault. If the first fault type of the target fault is the non-root fault type, a repair operation is performed on the target root fault corresponding to the target root fault identifier in the fault information, wherein the target root fault belongs to the root fault type. In this way, a target fault of the root fault type, or the target root fault that causes a target fault of the non-root fault type, can be repaired preferentially, which helps to improve fault processing efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a fault handling method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another fault handling method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a fault handling apparatus according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a fault handling method according to an embodiment of the present invention. Specifically, as shown in fig. 1, the fault handling method according to the embodiment of the present invention may include, but is not limited to, the following steps:

S101, the electronic device acquires fault information, wherein the fault information comprises one or more unprocessed faults and the fault level of each unprocessed fault.

The faults indicated by the fault information acquired by the electronic device may include faults generated in the electronic device itself and, optionally, faults generated in other devices. For example, when the electronic device manages a plurality of devices (such as device 1, device 2, and device 3), the faults indicated by the fault information may include faults generated in any of device 1, device 2, device 3, and the electronic device itself.

Specifically, the electronic device may acquire the fault information from its local storage. The locally stored fault information may have been sent to the electronic device by each device it manages. In another implementation, the electronic device sends a fault information acquisition request to each managed device. After receiving the request, each device sends the electronic device the related information of each unprocessed fault generated in that device. The related information of each fault may include, but is not limited to, the fault identifier and the fault level of the fault. The fault identifier uniquely identifies a fault, and the fault level represents the severity of the fault: the higher the fault level, the more severe the fault, and the lower the fault level, the less severe the fault.

In the embodiment of the present invention, the severity of a fault may be determined according to the severity of its possible consequences. For example, if fault a can cause a catastrophic accident (for example, the system becomes completely unable to work) and must be repaired immediately, its severity is destructive and its fault level may be level four. If fault a can cause casualties or system damage (for example, some functions of the system cannot be performed) and needs to be repaired immediately, its severity is dangerous and its fault level may be level three. If fault a can cause minor injury or damage, so that measures should be taken to repair or mitigate it, its severity is critical and its fault level may be level two. If fault a causes no injury or damage, so that no measures need to be taken, its severity is safe and its fault level may be level one.
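The four-level severity scale in the example above maps naturally onto an ordered enumeration, so fault levels can be compared directly. This is a minimal sketch. The class `FaultLevel` and the helper `needs_immediate_repair` are hypothetical names, not part of the patent.

```python
from enum import IntEnum


class FaultLevel(IntEnum):
    """Example severity classes mapped to fault levels (higher = more severe)."""
    SAFE = 1         # no injury or damage; no measures needed
    CRITICAL = 2     # minor injury or damage; repair or mitigate
    DANGEROUS = 3    # casualties or system damage, e.g. some functions unavailable
    DESTRUCTIVE = 4  # catastrophic, e.g. the system cannot work at all


def needs_immediate_repair(level: FaultLevel) -> bool:
    """In the example, level-three and level-four faults must be repaired immediately."""
    return level >= FaultLevel.DANGEROUS
```

Because `IntEnum` members are integers, they can also be summed and compared with a preset fault level.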

In the embodiment of the present invention, the electronic device may be a terminal device or a server. The terminal device may be a smart phone, a tablet computer, a personal computer (PC), a smart television, a smart watch, a vehicle-mounted device, a wearable device, a terminal device in a future fifth-generation (5G) mobile communication network, or the like, which is not limited in the embodiments of the present invention.

S102, the electronic device acquires a first fault type of a target fault, wherein the target fault is the unprocessed fault with the highest fault level in the fault information, the first fault type is either a root fault type or a non-root fault type, a fault of the root fault type is caused by a self-fault, and a fault of the non-root fault type is caused by a fault of the root fault type.

In the embodiment of the present invention, faults may be classified by a first fault type, which may be a root fault type or a non-root fault type. A fault of the root fault type is caused by a self-fault, and a fault of the non-root fault type may be caused by a fault of the root fault type. A self-fault is a fault occurring inside the hardware unit or software unit in which the fault appears. For example, suppose the power supply is damaged because of a fault in its own power supply module. No power is then delivered to the central processing unit (CPU) through the power supply interface, and the CPU cannot operate. The power supply fault is caused by a self-fault, so it belongs to the root fault type. The CPU's inability to operate is caused by the damaged power supply, so it belongs to the non-root fault type. A fault of the root fault type may cause a fault of the non-root fault type either directly or indirectly. For example, a power failure (a fault of the root fault type) may directly cause the CPU to fail (a fault of the non-root fault type). As another example, a power failure may cause the CPU to stop operating. This in turn prevents the CPU from calling data and finally prevents the data from being displayed (a fault of the non-root fault type). In this case, the failure to display data is caused indirectly by the power failure.
It should be noted that a fault such as the CPU failing to work, which is of the non-root fault type in the power supply example, may also be caused by a self-fault. For example, the CPU may fail to work because a failed CPU fan lets the CPU temperature rise too high, or because the CPU pins make poor contact. The CPU fan and the CPU pins are both components of the CPU. A CPU fan failure and poor pin contact are therefore faults occurring in internal units of the CPU, that is, self-faults of the CPU.
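The indirect causation in the power supply example can be traced by following each fault's direct cause until a fault caused by a self-fault is reached. This is a minimal sketch. It assumes that each fault has exactly one direct cause, which is a simplification because a fault may also be caused jointly by several faults. The map `CAUSED_BY` and the function `causal_chain` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical cause map for the power supply example: each fault points to the
# fault that directly caused it, or to None if it arose from a self-fault
# (i.e. it is a root fault). Assumes one direct cause per fault.
CAUSED_BY: Dict[str, Optional[str]] = {
    "power supply failure": None,
    "CPU cannot operate": "power supply failure",
    "CPU cannot call data": "CPU cannot operate",
    "data cannot be displayed": "CPU cannot call data",
}


def causal_chain(fault: str, caused_by: Dict[str, Optional[str]]) -> List[str]:
    """Follow direct causes from `fault` back to its root fault."""
    chain = [fault]
    while True:
        cause = caused_by.get(chain[-1])
        if cause is None:          # reached a fault caused by a self-fault
            return chain
        if cause in chain:
            raise ValueError(f"cycle in cause map at {cause!r}")
        chain.append(cause)
```

The last element of the returned chain is the root fault. For "data cannot be displayed" that is the power supply failure, three causal steps away.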

In one implementation, there may be one or more unprocessed faults with the highest fault level in the fault information. When there are several, the electronic device may determine the target fault as follows: among all unprocessed faults with the highest fault level in the fault information, the unprocessed fault whose first fault type is the root fault type is determined as the target fault. A fault of the root fault type may cause other faults (that is, faults of the non-root fault type). If the identifiers of those faults also exist in the fault information, repairing the target fault of the root fault type also repairs the other faults it caused. Preferentially determining an unprocessed fault of the root fault type as the target fault therefore helps to improve fault processing efficiency and may also minimize the loss caused by faults. In the embodiment of the present invention, there may be one or more target faults, which is not limited in the embodiment of the present invention. For example, suppose the fault information is as shown in Table 1. The faults corresponding to identifiers 2, 3, and 4 all have fault level V3, so the unprocessed faults with the highest fault level in the fault information are fault 2 (identifier 2), fault 3 (identifier 3), and fault 4 (identifier 4).
If the first fault types of fault 2, fault 3, and fault 4 are the non-root fault type, the root fault type, and the root fault type, respectively, the electronic device may determine fault 3 and fault 4 as the target faults.

TABLE 1 Fault information

Fault identifier	Identifier 1	Identifier 2	Identifier 3	Identifier 4
Fault level	V1	V3	V3	V3
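The target-fault selection described above can be sketched against the data in Table 1, with levels V1 and V3 written as 1 and 3. This is a minimal sketch. The first fault type of fault 1 is not given in the text, so it is set hypothetically. The fallback used when no highest-level fault is a root fault is also an assumption. The function name `select_target_faults` is hypothetical.

```python
from typing import Dict, List


def select_target_faults(levels: Dict[int, int], is_root: Dict[int, bool]) -> List[int]:
    """Among the highest-level unprocessed faults, prefer those of the root type."""
    top = max(levels.values())
    highest = sorted(fid for fid, level in levels.items() if level == top)
    roots = [fid for fid in highest if is_root[fid]]
    # Assumption: if no highest-level fault is a root fault, keep all of them
    # as candidates (they would then be handled through their root faults).
    return roots or highest


# Table 1 (V1 -> 1, V3 -> 3); the first fault type of fault 1 is hypothetical.
levels = {1: 1, 2: 3, 3: 3, 4: 3}
is_root = {1: False, 2: False, 3: True, 4: True}
```

With these inputs the function returns faults 3 and 4, matching the example in the text.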

In one implementation, the fault information may further include a first fault type for each unprocessed fault, and the electronic device may query the fault information to obtain a first fault type for the target fault.

S103, if the first fault type of the target fault is the root fault type, the electronic device performs a repair operation on the target fault.

Specifically, after obtaining the first fault type of the target fault, the electronic device may determine whether it is the root fault type. If so, the electronic device performs a repair operation on the target fault. This also repairs the other faults caused by the target fault (that is, faults of the non-root fault type) and thus helps to improve fault processing efficiency.

In one implementation, the electronic device may perform the repair operation on the target fault by acquiring the solution information corresponding to the target fault and executing the steps included in the solution information, thereby repairing the target fault. This lowers the difficulty of repairing the fault and improves fault processing efficiency. The solution information corresponding to the target fault may be acquired from the cloud or may be pre-stored in the electronic device, which is not limited in the embodiment of the present invention.
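Executing solution information can be sketched as running an ordered list of steps. This is a minimal sketch. It assumes that solution information is a list of callables. It also assumes that pre-stored solutions are checked before the cloud, whereas the text allows either source. The functions `get_solution` and `repair` and the `fetch_from_cloud` callback are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[], None]  # one step of the solution information


def get_solution(fault_id: int,
                 local_store: Dict[int, List[Step]],
                 fetch_from_cloud: Callable[[int], Optional[List[Step]]]) -> List[Step]:
    """Look up pre-stored solution information first, then fall back to the cloud."""
    steps = local_store.get(fault_id)
    if steps is None:
        steps = fetch_from_cloud(fault_id)
    if steps is None:
        raise LookupError(f"no solution information for fault {fault_id}")
    return steps


def repair(fault_id: int,
           local_store: Dict[int, List[Step]],
           fetch_from_cloud: Callable[[int], Optional[List[Step]]]) -> None:
    """Perform the repair operation by executing the solution steps in order."""
    for step in get_solution(fault_id, local_store, fetch_from_cloud):
        step()
```

Passing the cloud lookup in as a callback keeps the sketch independent of any particular transport or service.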

In one implementation, after performing the repair operation on the target fault, the electronic device may further acquire a detection program for the target fault and execute it to obtain a detection result. The detection result indicates whether the target fault has been successfully repaired. If it indicates that the target fault has not been successfully repaired, the electronic device outputs reminder information. In the embodiment of the present invention, the electronic device can thus check after the repair operation whether the target fault has been repaired. A successful result indicates that the repair operation repaired the target fault, and an unsuccessful result indicates that it did not. The repair operation may fail for one of two reasons. It may be unable to repair the target fault at all. Alternatively, it could normally repair the target fault, but this particular attempt failed for other reasons (for example, another fault occurred while the repair operation was being performed).
If the target fault is not repaired by the repair operation, the electronic device may output reminder information prompting the user to perform the repair operation again (or to adopt another method) to repair the target fault. The reminder information may also prompt the user to determine why the repair operation did not repair the target fault. If the reason is that the repair operation cannot repair the target fault, solution information that can repair the target fault may be further acquired.
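The repair, detect, and remind sequence can be sketched as a single function. This is a minimal sketch. It assumes that each fault's detection program is a callable that returns `True` when the fault is fixed and that reminders are delivered through a callback. The function `repair_and_verify` and its parameters are hypothetical.

```python
from typing import Callable, Dict


def repair_and_verify(fault_id: int,
                      repair: Callable[[int], None],
                      detectors: Dict[int, Callable[[], bool]],
                      remind: Callable[[str], None]) -> bool:
    """Repair a fault, run its detection program, and remind the user on failure."""
    repair(fault_id)
    repaired = detectors[fault_id]()  # detection program: True if the fault is fixed
    if not repaired:
        remind(f"Fault {fault_id} was not repaired: retry the repair or use another "
               "method, and check whether the solution information can repair it.")
    return repaired
```

A caller could, for example, pass `print` as `remind` to show the reminder on a console.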

S104, if the first fault type of the target fault is the non-root fault type, the electronic device performs a repair operation on the target root fault corresponding to the target root fault identifier in the fault information, wherein the target root fault belongs to the root fault type and can cause the target fault.

Specifically, after obtaining the first fault type of the target fault, the electronic device may determine whether it is the root fault type. If so, the electronic device performs a repair operation on the target fault. If not, it performs a repair operation on the target root fault corresponding to the target root fault identifier in the fault information. When the target fault is of the non-root fault type, repairing the target fault itself repairs only that one fault. Repairing the target root fault instead also repairs the target fault, because the target root fault causes it, which improves fault processing efficiency. Moreover, the target root fault belongs to the root fault type. If it also causes faults other than the target fault, repairing it repairs all faults it causes, including the target fault.
For example, suppose the fault identifiers in the fault information include identifier 1, identifier 2, identifier 3, and identifier 4. Fault 1 (identifier 1) is the target fault and is caused by fault 3 (identifier 3). Fault 3 also causes fault 4 (identifier 4) and belongs to the root fault type. The electronic device can then repair both fault 1 and fault 4 by repairing fault 3: both are caused by fault 3, so they are naturally repaired once fault 3 is repaired. This effectively improves fault processing efficiency. As another example, when data cannot be displayed, the likely reason is that the database has failed, so the data stored in it cannot be accessed and therefore cannot be displayed. In this case, repairing the database failure allows the data to be displayed again. In other words, the failure to display data, which the database failure caused, is repaired by repairing the database failure.
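The effect of repairing a root fault can be estimated by walking the "causes" relation from that fault and collecting every unprocessed fault reached. This is a minimal sketch. It assumes that a fault is cleared whenever a fault that causes it is repaired, which may not hold for faults caused jointly by several faults. The function `faults_cleared_by` and the map `causes` are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def faults_cleared_by(root_id: int, causes: Dict[int, Set[int]],
                      unprocessed: Set[int]) -> Set[int]:
    """Unprocessed faults caused directly or indirectly by the fault `root_id`."""
    cleared: Set[int] = set()
    visited = {root_id}
    queue = deque([root_id])
    while queue:  # breadth-first walk over "causes" edges
        current = queue.popleft()
        for child in causes.get(current, set()):
            if child not in visited:
                visited.add(child)
                queue.append(child)
                if child in unprocessed:
                    cleared.add(child)
    return cleared


# Example from the text: fault 3 causes fault 1 and fault 4.
causes = {3: {1, 4}}
```

For the example in the text, repairing fault 3 is expected to clear faults 1 and 4.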

In one implementation, the electronic device may determine the target root fault as follows: determining, according to the correspondence between fault identifiers and root fault identifiers, a root fault identifier set corresponding to the identifier of the target fault, wherein each root fault corresponding to an identifier in the set belongs to the root fault type and can cause the target fault; and taking the fault identifiers that exist in both the root fault identifier set and the fault information as the identifiers of the target root fault.

In the embodiment of the present invention, the electronic device may store a correspondence between fault identifiers and root fault identifiers. The correspondence indicates which faults of the root fault type can cause a given fault, so it can be used to determine the root fault identifier set corresponding to the identifier of the target fault. Each root fault corresponding to an identifier in this set can cause the target fault. For example, suppose the correspondence is as shown in Table 2 and the identifier of the target fault is identifier 1. Table 2 shows that fault 1 (identifier 1) may be caused by fault 5 (identifier 5) or by fault 7 (identifier 7). The root fault identifier set corresponding to identifier 1 therefore includes identifier 5 and identifier 7.

TABLE 2 Correspondence between fault identifiers and root fault identifiers

Fault identifier	Identifier 1	Identifier 2	Identifier 3	Identifier 4
Root fault identifier	Identifier 5, Identifier 7	Identifier 6	Identifier 7	Identifier 7

In one implementation, the root fault identifier set may include one or more identifiers. When it includes several, the electronic device may take the fault identifiers that exist in both the root fault identifier set and the fault information as the identifiers of the target root fault. For example, suppose the fault identifiers in the fault information include identifiers 1, 2, 3, 4, and 5, and the root fault identifier set includes identifiers 5 and 7. Fault 5 (identifier 5) may then be determined as the target root fault. The faults whose identifiers appear in the fault information are unprocessed faults. A fault whose identifier does not appear in the fault information (such as identifier 7) may therefore already have been processed or may not have occurred at all. Determining such a fault as the target root fault and repairing it would waste resources.
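Determining the target root fault amounts to a set intersection between the root fault identifier set and the unprocessed identifiers in the fault information. This is a minimal sketch that encodes Table 2 as a dictionary of sets. The names `ROOT_CAUSES` and `target_root_ids` are hypothetical.

```python
from typing import Dict, Set

# Table 2: fault identifier -> identifiers of root faults that can cause it.
ROOT_CAUSES: Dict[int, Set[int]] = {1: {5, 7}, 2: {6}, 3: {7}, 4: {7}}


def target_root_ids(target_id: int, root_causes: Dict[int, Set[int]],
                    unprocessed_ids: Set[int]) -> Set[int]:
    """Root fault identifiers that are also present in the fault information."""
    return root_causes.get(target_id, set()) & unprocessed_ids
```

For target identifier 1 and fault information containing identifiers 1 to 5, the intersection is {5}. Identifier 7 is dropped because it does not appear in the fault information.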

In one implementation, one fault may be caused jointly by several faults. For example, fault a alone may not cause fault b, whereas fault a and fault c together can cause fault b. In one implementation, one fault may cause one or more faults. For example, fault 7 (identifier 7) in Table 2 may cause fault 1 (identifier 1), fault 3 (identifier 3), and fault 4 (identifier 4).

In one implementation, the electronic device may further acquire the sum of the fault levels of the unprocessed faults corresponding to all unprocessed fault identifiers in the fault information. It then triggers execution of the step of acquiring the first fault type of the target fault only if this sum is greater than a preset fault level. A sum greater than the preset fault level indicates that the overall fault condition of the electronic device (or of the devices it manages) is relatively serious. If the faults are not processed in time, an alarm may be raised or the electronic device may crash and stop working. A sum less than or equal to the preset fault level indicates that the overall fault condition of the electronic device (or of the devices it manages) is slight. The faults can then be left unprocessed for the time being without raising an alarm or affecting the performance of the electronic device. Triggering the step of acquiring the first fault type of the target fault only when the sum exceeds the preset fault level therefore saves system resources.
In an implementation manner, the preset fault level may be set by default in the electronic device or may be customized by a user, which is not limited in the embodiment of the present invention.
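The trigger condition can be sketched as a guard around the rest of the processing. This is a minimal sketch. The function `process_if_needed` and the `start_processing` callback are hypothetical names, and the preset fault level is passed in because it may be a default or user-defined value.

```python
from typing import Callable, Dict


def process_if_needed(levels: Dict[int, int], preset_level: int,
                      start_processing: Callable[[], None]) -> bool:
    """Start fault processing only when the summed fault levels exceed the preset level."""
    if sum(levels.values()) > preset_level:
        start_processing()  # e.g. acquire the first fault type of the target fault
        return True
    return False  # overall fault condition is slight; defer processing
```

For the levels in Table 1 (1 + 3 + 3 + 3 = 10), processing starts with a preset level of 8. It does not start with a preset level of 10, because the comparison is strictly "greater than".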

Referring to fig. 2, fig. 2 is a schematic flowchart illustrating another fault handling method according to an embodiment of the present invention. Specifically, as shown in fig. 2, the fault handling method according to the embodiment of the present invention may include, but is not limited to, the following steps:

S201, the electronic device acquires, for the unprocessed fault corresponding to each unprocessed fault identifier in the fault information, the fault score of that unprocessed fault in the system dimension and its fault score in the service dimension.

In practice, the impact of a fault occurring in an electronic device can be divided into a system-side impact and a service-side impact, and together these two impacts determine the fault level of the fault. To determine how strongly a fault affects the electronic device, both its impact on the system and its impact on the service must therefore be considered. In the embodiment of the present invention, a fault score characterizes the degree of impact of a fault on the system or on the service; generally, the higher the fault score, the greater the impact. Specifically, the electronic device may obtain, for the unprocessed fault corresponding to each unprocessed fault identifier in the fault information, its fault score in the system dimension and its fault score in the service dimension. If the unprocessed fault occurred in the electronic device itself, the electronic device determines both scores; if the unprocessed fault occurred in a device managed by the electronic device, the managed device sends both scores to the electronic device.

In an implementation manner, the electronic device may obtain the fault score of the unprocessed fault in the system dimension as follows. For each unprocessed fault identifier in the fault information, determine an initiated fault identifier set from all unprocessed fault identifiers in the fault information, where the initiated fault corresponding to each initiated fault identifier in the set is initiated by the unprocessed fault corresponding to that unprocessed fault identifier, and the second fault type of each such initiated fault is the system type; sum the fault scores of the initiated faults corresponding to all initiated fault identifiers in the set to obtain a first value; determine the second fault type of the unprocessed fault, where the second fault type is the system type or the service type; if the second fault type of the unprocessed fault is the system type, take the sum of the fault score of the unprocessed fault and the first value as its fault score in the system dimension; and if the second fault type of the unprocessed fault is the service type, take the first value as its fault score in the system dimension.

In the embodiment of the present invention, in addition to the first fault type, a fault may have a second fault type, which may be the system type or the service type. Faults of the system type may include, but are not limited to, faults of the CPU, memory, threads, databases, and the like, for example, the CPU usage exceeding 80%. Faults of the service type may include, but are not limited to: the total service volume of the device falling below a preset total threshold; a rise or fall in service volume; the service success count (or success rate) dropping to a preset success-count threshold (or a preset success-rate threshold); and the service failure count (or failure rate) rising to a preset failure-count threshold (or a preset failure-rate threshold). For example, the service success rate drops to 50%.

Because the first value is the sum of the fault scores of the initiated faults corresponding to all initiated fault identifiers in the initiated fault identifier set, it characterizes the maximum system-side impact of all system-type faults that the unprocessed fault can initiate, that is, the impact produced when all of these initiated faults occur. If the second fault type of the unprocessed fault is the system type, the sum of its own fault score and the first value characterizes the maximum impact of the unprocessed fault on the system. If the second fault type of the unprocessed fault is the service type, the fault itself contributes no direct system-side impact, so the first value alone characterizes its maximum impact on the system.
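As a non-authoritative sketch, the system-dimension calculation above might look as follows in Python. The `Fault` record, the `causes` correspondence (fault identifier to the identifiers it directly initiates) and the function name are hypothetical. Only direct, one-step initiation is considered, which is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

SYSTEM = "system"
SERVICE = "service"


@dataclass
class Fault:
    score: float       # the fault's own fault score
    second_type: str   # second fault type: SYSTEM or SERVICE


def system_dimension_score(
    fault_id: str,
    unprocessed: dict[str, Fault],
    causes: dict[str, set[str]],
) -> float:
    """Fault score of an unprocessed fault in the system dimension.

    unprocessed: every unprocessed fault identifier in the fault information.
    causes: hypothetical correspondence from a fault identifier to the
    identifiers of the faults it initiates (direct initiation only).
    """
    # Initiated fault identifier set: unprocessed faults initiated by
    # fault_id whose second fault type is the system type.
    initiated = {
        fid
        for fid in causes.get(fault_id, set())
        if fid in unprocessed and unprocessed[fid].second_type == SYSTEM
    }
    # First value: plain sum of the initiated faults' scores.
    first_value = sum(unprocessed[fid].score for fid in initiated)
    own = unprocessed[fault_id]
    # A system-type fault adds its own score; a service-type fault does not.
    return own.score + first_value if own.second_type == SYSTEM else first_value
```

Identifiers listed in `causes` but absent from the fault information (already processed, for example) are ignored. This matches the restriction of the set to unprocessed fault identifiers.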

In another implementation manner, the electronic device may obtain the fault score of the unprocessed fault in the system dimension as follows. For each unprocessed fault identifier in the fault information, determine a first initiated fault identifier set from all unprocessed fault identifiers in the fault information, where the first initiated fault corresponding to each first initiated fault identifier in the set may be initiated by the unprocessed fault corresponding to that unprocessed fault identifier, and the second fault type of each such first initiated fault is the system type; obtain the probability that the unprocessed fault initiates each of these first initiated faults, and sum the fault scores of the first initiated faults weighted by their initiation probabilities to obtain a second value; determine the second fault type of the unprocessed fault; if it is the system type, take the sum of the fault score of the unprocessed fault and the second value as its fault score in the system dimension; and if it is the service type, take the second value as its fault score in the system dimension.
For example, suppose the fault information includes identifier 1, identifier 2, identifier 3, identifier 4, and identifier 5, and take identifier 1 as the unprocessed fault identifier. Suppose further that the fault corresponding to identifier 1 may initiate the faults corresponding to identifiers 2 and 3, both of which have the system type as their second fault type. The probability that fault 1 initiates fault 2 is 50% and the probability that it initiates fault 3 is 100%. The fault score of fault 2 is 10 and that of fault 3 is 20. The second value is then 50% × 10 + 100% × 20 = 25. In practice, some faults are correlated, but such a correlation does not guarantee that one fault will actually cause another; introducing the initiation probability therefore makes the calculated fault score in the system dimension more accurate. In an implementation manner, the initiation probability between faults may be obtained by analyzing historical fault data of the electronic device, or may be set by a user based on empirical values, which is not limited in the embodiment of the present invention.
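The probability-weighted variant can be sketched in Python as shown below, offered only as an illustration. The `cause_prob` table (fault identifier to {initiated identifier: initiation probability}) and the function names are hypothetical. The example data reproduces the worked example above; the own score of fault 1, which the example does not give, is set to an arbitrary 5.

```python
from __future__ import annotations

from dataclasses import dataclass

SYSTEM = "system"
SERVICE = "service"


@dataclass
class Fault:
    score: float       # the fault's own fault score
    second_type: str   # SYSTEM or SERVICE


def second_value(
    fault_id: str,
    unprocessed: dict[str, Fault],
    cause_prob: dict[str, dict[str, float]],
) -> float:
    """Probability-weighted sum over the first initiated fault identifier set."""
    return sum(
        prob * unprocessed[fid].score
        for fid, prob in cause_prob.get(fault_id, {}).items()
        # Only unprocessed, system-type initiated faults are counted.
        if fid in unprocessed and unprocessed[fid].second_type == SYSTEM
    )


def weighted_system_dimension_score(
    fault_id: str,
    unprocessed: dict[str, Fault],
    cause_prob: dict[str, dict[str, float]],
) -> float:
    value = second_value(fault_id, unprocessed, cause_prob)
    own = unprocessed[fault_id]
    return own.score + value if own.second_type == SYSTEM else value


# Worked example: 1 initiates 2 (50%) and 3 (100%); scores 10 and 20.
faults = {
    "1": Fault(5, SYSTEM),   # own score 5 is hypothetical
    "2": Fault(10, SYSTEM),
    "3": Fault(20, SYSTEM),
    "4": Fault(7, SERVICE),
    "5": Fault(3, SYSTEM),
}
probs = {"1": {"2": 0.5, "3": 1.0}}
print(second_value("1", faults, probs))  # 25.0
```

Setting every probability to 1.0 reduces this to the plain-sum first value of the previous implementation.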

In one implementation, the electronic device may obtain the fault score of each fault (for example, a first fault) as follows: obtain the duration of the first fault (the time elapsed since it occurred) and the weight of the duration, and obtain the weight of the first fault; weight the duration of the first fault by the duration weight, then weight that result by the weight of the first fault, and take the final result as the fault score of the first fault. Taking into account the duration of the first fault, the weight of the duration, and the weight of the first fault makes the calculated fault score more accurate. The weight of the first fault represents the degree of impact of the first fault on the electronic device. For example, if the first fault does not cause other faults or is easy to repair, its weight may be set smaller; if it causes other faults or is difficult to repair, its weight may be set larger. In an implementation manner, both the weight of the first fault and the weight of the duration may be set by default in the electronic device or may be customized by a user, which is not limited in the embodiment of the present invention.
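A minimal sketch of this duration-based fault score follows, assuming that "weighting" a single quantity means multiplying it by its weight. The unit of the duration is not specified in the text, and the function name is hypothetical.

```python
def duration_based_fault_score(duration: float, duration_weight: float, fault_weight: float) -> float:
    """Fault score from the fault's duration, the duration weight and the fault weight.

    Assumption: each weighting step is read as a multiplication.
    """
    weighted_duration = duration * duration_weight  # weight the duration
    return weighted_duration * fault_weight         # then weight by the fault's own weight


# A fault lasting 30 (time units), duration weight 0.5, fault weight 2.0:
print(duration_based_fault_score(30, 0.5, 2.0))  # 30.0
```

Raising the fault weight for a fault that is hard to repair or that causes other faults raises its score proportionally, as described above.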

In one implementation, the electronic device may alternatively obtain the fault score of each fault (for example, the first fault) as follows. Obtain the score corresponding to the duration of the first fault and the weight of the duration; the score corresponding to the influence amount of the first fault and the weight of the influence amount; the score corresponding to the user influence range of the first fault and the weight of the user influence range; the score corresponding to the transaction amount affected by the first fault and the weight of the transaction amount; and the weight of the first fault. Multiply each of the four scores by its corresponding weight, add the four weighted results, weight the resulting sum by the weight of the first fault, and take the final result as the fault score of the first fault.

Taking into account the duration of the first fault and its weight, the influence amount and its weight, the user influence range and its weight, the affected transaction amount and its weight, and the weight of the first fault makes the calculated fault score more accurate. In one implementation, the weight of the first fault characterizes both the degree of impact of the first fault on the electronic device and how critical the first fault is in the service flow. For example, if the first fault affects not only the normal operation of the electronic device but also that of other devices (which depend on the processing result of the electronic device for their subsequent processing), the first fault is highly critical in the service flow. If the first fault affects only the normal operation of the electronic device and not that of other devices (which can carry out subsequent processing without the processing result of the electronic device), it is less critical in the service flow.
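The multi-factor fault score can be sketched in Python as follows, purely as an illustration. The factor names are hypothetical labels for the four quantities. The per-factor scores would come from a lookup such as Table 3, whose contents are not reproduced here, so the numbers in the example are invented.

```python
from __future__ import annotations

# Hypothetical labels for the four quantities scored via a table such as Table 3.
FACTORS = ("duration", "influence_amount", "user_influence_range", "transaction_amount")


def multi_factor_fault_score(
    factor_scores: dict[str, float],
    factor_weights: dict[str, float],
    fault_weight: float,
) -> float:
    """Weighted sum of the four per-factor scores, then weighted by the fault weight.

    Raises KeyError if a factor score or weight is missing.
    """
    weighted_sum = sum(factor_scores[name] * factor_weights[name] for name in FACTORS)
    return weighted_sum * fault_weight


scores = {"duration": 4, "influence_amount": 2, "user_influence_range": 8, "transaction_amount": 8}
weights = {"duration": 0.5, "influence_amount": 0.25, "user_influence_range": 0.125, "transaction_amount": 0.125}
print(multi_factor_fault_score(scores, weights, 2.0))  # 9.0
```

A fault that is critical in the service flow would receive a larger `fault_weight`, scaling the whole weighted sum.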

The scores corresponding to the duration, the influence amount, the user influence range, and the affected transaction amount of the first fault may be obtained by looking them up in Table 3. As Table 3 shows, the higher the score, the greater the loss caused by the first fault. Table 3 is only illustrative and does not limit the embodiments of the present invention. It should further be noted that the weight of the duration, the weight of the influence amount, the weight of the user influence range, and the weight of the transaction amount may be set by default in the electronic device or may be customized by a user, which is not limited in the embodiment of the present invention.

Table 3: Scores corresponding to duration, influence amount, user influence range, and affected transaction amount

Similarly, the electronic device may obtain the fault score of the unprocessed fault in the service dimension as follows. For each unprocessed fault identifier in the fault information, determine a second initiated fault identifier set from all unprocessed fault identifiers in the fault information, where the second initiated fault corresponding to each second initiated fault identifier in the set is initiated by the unprocessed fault corresponding to that unprocessed fault identifier, and the second fault type of each such second initiated fault is the service type; sum the fault scores of the second initiated faults corresponding to all second initiated fault identifiers in the set to obtain a third value; determine the second fault type of the unprocessed fault; if it is the service type, take the sum of the fault score of the unprocessed fault and the third value as its fault score in the service dimension; and if it is the system type, take the third value as its fault score in the service dimension.

In another implementation manner, the electronic device may obtain the fault score of the unprocessed fault in the service dimension as follows. For each unprocessed fault identifier in the fault information, determine a third initiated fault identifier set from all unprocessed fault identifiers in the fault information, where the third initiated fault corresponding to each third initiated fault identifier in the set may be initiated by the unprocessed fault corresponding to that unprocessed fault identifier, and the second fault type of each such third initiated fault is the service type; obtain the probability that the unprocessed fault initiates each of these third initiated faults, and sum the fault scores of the third initiated faults weighted by their initiation probabilities to obtain a fourth value; determine the second fault type of the unprocessed fault; if it is the service type, take the sum of the fault score of the unprocessed fault and the fourth value as its fault score in the service dimension; and if it is the system type, take the fourth value as its fault score in the service dimension.
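The service dimension mirrors the system dimension with the roles of the two types swapped. The Python sketch below is offered only as an illustration and covers both variants with one hypothetical probability table. Using 1.0 for every probability yields the plain-sum third value, and real probabilities yield the weighted fourth value. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

SYSTEM = "system"
SERVICE = "service"


@dataclass
class Fault:
    score: float       # the fault's own fault score
    second_type: str   # SYSTEM or SERVICE


def service_dimension_score(
    fault_id: str,
    unprocessed: dict[str, Fault],
    cause_prob: dict[str, dict[str, float]],
) -> float:
    """Fault score of an unprocessed fault in the service dimension.

    cause_prob (hypothetical) maps a fault identifier to
    {initiated fault identifier: initiation probability in [0, 1]}.
    """
    value = sum(
        prob * unprocessed[fid].score
        for fid, prob in cause_prob.get(fault_id, {}).items()
        # Only unprocessed faults whose second fault type is the service type count.
        if fid in unprocessed and unprocessed[fid].second_type == SERVICE
    )
    own = unprocessed[fault_id]
    # A service-type fault adds its own score; a system-type fault does not.
    return own.score + value if own.second_type == SERVICE else value


faults = {"a": Fault(4, SERVICE), "b": Fault(6, SERVICE), "c": Fault(8, SYSTEM)}
print(service_dimension_score("a", faults, {"a": {"b": 1.0, "c": 1.0}}))  # 10.0
```

In the example, fault "c" is ignored because it is system-type, so only fault "b" contributes to the third value.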

S202, the electronic device takes the sum of the system-dimension fault score and the service-dimension fault score of the unprocessed fault corresponding to each unprocessed fault identifier in the fault information as the total fault score of that unprocessed fault.

S203, the electronic device takes, according to the correspondence between fault scores and fault levels, the fault level corresponding to the total fault score of the unprocessed fault as the fault level of the unprocessed fault.

Specifically, the electronic device may store a correspondence between fault scores and fault levels, or between fault score ranges and fault levels, and use it to map the total fault score of the unprocessed fault to its fault level. For example, a score in [10, 20) corresponds to fault level v1, a score in [20, 30) corresponds to fault level v2, and a score in [30, 40] corresponds to fault level v3.
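Steps S202 and S203 can be sketched together in Python as shown below, as an illustration only. The level table reuses the example ranges above. Treating the ranges as half-open with the last one closed is an assumption, made so that each boundary score maps to exactly one level. Scores outside the table return `None`.

```python
from __future__ import annotations

# Example correspondence from the text; half-open ranges (last one closed)
# are an assumption so that boundary scores map to exactly one level.
LEVEL_TABLE = [(10, 20, "v1"), (20, 30, "v2"), (30, 40, "v3")]


def fault_level(system_score: float, service_score: float) -> str | None:
    """Map the total fault score to a fault level (None if outside the table)."""
    total = system_score + service_score  # S202: total fault score
    for i, (low, high, level) in enumerate(LEVEL_TABLE):  # S203: range lookup
        is_last = i == len(LEVEL_TABLE) - 1
        if low <= total < high or (is_last and total == high):
            return level
    return None


print(fault_level(10, 10))  # v2
```

A total of exactly 20 falls in v2 rather than v1 under this reading.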

S204, the electronic device acquires fault information, where the fault information includes the identifiers of one or more unprocessed faults and the fault level of each unprocessed fault.

S205, the electronic device acquires the first fault type of the target fault, where the target fault is the unprocessed fault with the highest fault level in the fault information, and the first fault type is the root fault type or the non-root fault type; a fault of the root fault type is caused by itself, and a fault of the non-root fault type is caused by a fault of the root fault type.

S206, if the first fault type of the target fault is the root fault type, the electronic device performs a repair operation on the target fault.

S207, if the first fault type of the target fault is the non-root fault type, the electronic device performs a repair operation on the target root fault corresponding to the target root fault identifier in the fault information, where the target root fault belongs to the root fault type and may cause the target fault.

It should be noted that, the execution processes of step S204 to step S207 may refer to the specific descriptions in step S101 to step S104 in fig. 1, which are not described herein again.
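By way of illustration only, the decision in steps S204 to S207 can be sketched in Python as follows. The sketch assumes that fault levels are integers (larger means higher). It also assumes a hypothetical `root_causes` correspondence from a fault identifier to the root fault identifiers that can cause it. The repair operation itself is not modelled; the function only returns the identifiers to repair.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FaultRecord:
    level: int      # assumption: larger integer = higher fault level
    is_root: bool   # first fault type: True = root, False = non-root


def select_repair_targets(
    fault_info: dict[str, FaultRecord],
    root_causes: dict[str, set[str]],
) -> list[str]:
    """Return the identifiers on which a repair operation should be performed."""
    if not fault_info:
        return []
    # S205: the target fault is the unprocessed fault with the highest level.
    target_id = max(fault_info, key=lambda fid: fault_info[fid].level)
    if fault_info[target_id].is_root:
        return [target_id]  # S206: repair the target fault itself
    # S207: repair the root faults that can cause the target fault and that
    # also appear in the fault information.
    return sorted(root_causes.get(target_id, set()) & set(fault_info))


info = {
    "db_down": FaultRecord(2, True),
    "api_timeout": FaultRecord(3, False),
    "cpu_high": FaultRecord(1, True),
}
print(select_repair_targets(info, {"api_timeout": {"db_down", "net_down"}}))  # ['db_down']
```

Here "net_down" is a possible root cause but is not in the fault information, so only "db_down" is selected, mirroring the intersection of the root fault identifier set with the fault information.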

In the embodiment of the present invention, the sum of the system-dimension fault score and the service-dimension fault score of the unprocessed fault corresponding to each unprocessed fault identifier in the fault information may be taken as the total fault score of that unprocessed fault. The electronic device then takes, according to the correspondence between fault scores and fault levels, the fault level corresponding to the total fault score as the fault level of the unprocessed fault. Determining the total fault score from both the system dimension and the service dimension allows the fault level of a fault to be evaluated more accurately.

Referring to fig. 3, fig. 3 is a schematic structural diagram of a fault handling apparatus according to an embodiment of the present invention, specifically, as shown in fig. 3, the fault handling apparatus 30 may include:

an obtaining unit 301, configured to obtain fault information, where the fault information includes an identifier of one or more unprocessed faults and a fault level of each unprocessed fault;

the obtaining unit 301 is further configured to acquire a first fault type of a target fault, where the target fault is the unprocessed fault with the highest fault level in the fault information, and the first fault type is a root fault type or a non-root fault type, where a fault of the root fault type is caused by itself, and a fault of the non-root fault type is caused by a fault of the root fault type;

a processing unit 302, configured to, if a first fault type of the target fault is a root fault type, perform a repair operation on the target fault;

the processing unit 302 is further configured to, if a first fault type of the target fault is a non-root fault type, perform a repair operation on a target root fault corresponding to the identifier of the target root fault in the fault information, where the target root fault belongs to a root fault type and the target root fault may cause the target fault.

In one implementation, the fault handling apparatus 30 may further include a determining unit 303, where the determining unit 303 is configured to determine, according to a correspondence between the fault identifier and the root fault identifier, a root fault identifier set corresponding to an identifier of the target fault, where a root fault corresponding to each root fault identifier in the root fault identifier set belongs to a root fault type, and a root fault corresponding to each root fault identifier in the root fault identifier set may cause the target fault; the processing unit 302 may be further configured to use the fault identifier existing in both the root fault identifier set and the fault information as the identifier of the target root fault.

In an implementation manner, the obtaining unit 301 may be further configured to obtain the fault score in the system dimension and the fault score in the service dimension of the unprocessed fault corresponding to each unprocessed fault identifier in the fault information. The processing unit 302 may be further configured to take the sum of these two fault scores as the total fault score of the unprocessed fault. It may also take, according to the correspondence between fault scores and fault levels, the fault level corresponding to the total fault score as the fault level of the unprocessed fault.

In an implementation manner, when obtaining the fault score in the system dimension of the unprocessed fault corresponding to an unprocessed fault identifier in the fault information, the obtaining unit 301 may be specifically configured to do the following. For each unprocessed fault identifier in the fault information, determine an initiated fault identifier set from all unprocessed fault identifiers in the fault information, where the initiated fault corresponding to each initiated fault identifier in the set is initiated by the unprocessed fault corresponding to that unprocessed fault identifier, and the second fault type of each such initiated fault is the system type; sum the fault scores of the initiated faults corresponding to all initiated fault identifiers in the set to obtain a first value; determine the second fault type of the unprocessed fault, where the second fault type is the system type or the service type; if the second fault type of the unprocessed fault is the system type, take the sum of the fault score of the unprocessed fault and the first value as its fault score in the system dimension; and if the second fault type of the unprocessed fault is the service type, take the first value as its fault score in the system dimension.

In an implementation manner, the obtaining unit 301 may further be configured to obtain a sum of fault levels of unprocessed faults corresponding to the identifiers of all unprocessed faults in the fault information; the processing unit 302 may be further configured to trigger execution of the step of obtaining the first fault type of the target fault if a sum of fault levels of the unprocessed faults corresponding to the identifiers of all the unprocessed faults in the fault information is greater than a preset fault level.

In one implementation, the obtaining unit 301 may be further configured to obtain a detection program of the target fault. The processing unit 302 may be further configured to execute the detection program of the target fault to obtain a detection result, which indicates whether the target fault has been successfully repaired. The fault handling apparatus 30 may further include an output unit 304, which may be configured to output warning information if the detection result indicates that the target fault has not been successfully repaired.
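A minimal Python sketch of this post-repair check follows. It assumes a hypothetical registry that maps each fault identifier to a callable detection program returning `True` when the fault is repaired. The warning text is likewise invented for illustration.

```python
from __future__ import annotations

from typing import Callable


def check_repair(fault_id: str, detection_programs: dict[str, Callable[[], bool]]) -> str | None:
    """Run the detection program of the target fault after a repair.

    detection_programs is a hypothetical registry: fault identifier -> callable
    returning True when the fault has been repaired. Returns warning text if
    the repair failed, otherwise None.
    """
    repaired = detection_programs[fault_id]()  # detection result
    if repaired:
        return None
    return f"warning: fault {fault_id} was not repaired successfully"


print(check_repair("f1", {"f1": lambda: False}))
```

In this sketch a missing detection program raises `KeyError`; how such a case should be handled is not specified in the text.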

In an implementation manner, the number of unprocessed faults with the highest fault level in the fault information may be multiple, and the determining unit 303 may be further configured to determine, as the target fault, an unprocessed fault in which a first fault type is a root fault type among all unprocessed faults with the highest fault level in the fault information.
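The tie-breaking rule above can be sketched in Python as follows, as an illustration only. Each fault is represented by a hypothetical `(fault level, is_root)` pair, and the input is assumed to be non-empty. The fallback when none of the tied faults is of the root type (returning the first in insertion order) is an assumption not stated in the text.

```python
from __future__ import annotations


def pick_target(fault_info: dict[str, tuple[int, bool]]) -> str:
    """Choose the target fault when several unprocessed faults share the highest level.

    fault_info maps an identifier to (fault level, is_root) and must be
    non-empty. Among the highest-level faults, a root-type fault is preferred;
    if none is root-type, the first in insertion order is returned (assumption).
    """
    top = max(level for level, _ in fault_info.values())
    candidates = [fid for fid, (level, _) in fault_info.items() if level == top]
    roots = [fid for fid in candidates if fault_info[fid][1]]
    return (roots or candidates)[0]


print(pick_target({"a": (3, False), "b": (3, True), "c": (1, True)}))  # b
```

Preferring the root-type fault among ties means the repair addresses a cause directly instead of a symptom.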

This apparatus embodiment and the method embodiments shown in fig. 1-2 are based on the same concept and achieve the same technical effects. For the specific principles, refer to the description of the embodiments shown in fig. 1-2, which is not repeated here.

Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 40 may include a memory 401, a processor 402, and a network interface 403, the memory 401, the processor 402, and the network interface 403 being connected by one or more communication buses. Wherein the network interface 403 is controlled by the processor 402 for transceiving messages.

Memory 401 may include both read-only memory and random-access memory, and provides instructions and data to processor 402. A portion of the memory 401 may also include non-volatile random access memory.

The processor 402 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor; optionally, the processor 402 may be any conventional processor. Wherein:

a memory 401 for storing program instructions.

A processor 402 for calling program instructions stored in the memory 401 for:

acquiring fault information, wherein the fault information comprises the identifiers of one or more unprocessed faults and the fault grade of each unprocessed fault;

acquiring a first fault type of a target fault, wherein the target fault is the unprocessed fault with the highest fault grade in the fault information, the first fault type comprises a root fault type or a non-root fault type, a fault belonging to the root fault type is caused by itself, and a fault belonging to the non-root fault type is caused by a fault belonging to the root fault type;

if the first fault type of the target fault is a root fault type, performing repair operation on the target fault;

and if the first fault type of the target fault is a non-root fault type, performing repair operation on a target root fault corresponding to the target root fault identification in the fault information, wherein the target root fault belongs to a root fault type, and the target root fault can cause the target fault.

It should be noted that details that are not mentioned in the embodiment corresponding to fig. 4 and specific implementation manners of each step may refer to the embodiments shown in fig. 1 to fig. 2 and the foregoing contents, and are not described again here.

Embodiments of the present invention also provide a computer-readable storage medium, in which a computer program is stored, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the steps executed in the method embodiments shown in fig. 1-2.

While the invention has been described with reference to a number of embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method of fault handling, the method comprising:

acquiring fault information, wherein the fault information comprises the identifiers of one or more unprocessed faults and the fault grade of each unprocessed fault;

acquiring a first fault type of a target fault, wherein the target fault is the unprocessed fault with the highest fault grade in the fault information, the first fault type comprises a root fault type or a non-root fault type, a fault belonging to the root fault type is caused by itself, and a fault belonging to the non-root fault type is caused by a fault belonging to the root fault type;

if the first fault type of the target fault is the root fault type, performing repair operation on the target fault;

and if the first fault type of the target fault is the non-root fault type, performing a repair operation on a target root fault corresponding to the target root fault identifier in the fault information, wherein the target root fault belongs to the root fault type, and the target root fault can cause the target fault.

2. The method of claim 1, further comprising:

determining a root fault identification set corresponding to the target fault identification according to the corresponding relation between the fault identification and the root fault identification, wherein the root fault corresponding to each root fault identification in the root fault identification set belongs to the root fault type, and the root fault corresponding to each root fault identification in the root fault identification set can cause the target fault;

and taking the fault identifier existing in both the root fault identifier set and the fault information as the identifier of the target root fault.

3. The method of claim 1, further comprising:

acquiring, for each unprocessed fault identified in the fault information, a fault score of the unprocessed fault in a system dimension and a fault score of the unprocessed fault in a service dimension;

taking the sum of the fault score of the unprocessed fault in the system dimension and its fault score in the service dimension as the total fault score of the unprocessed fault;

and, according to a correspondence between fault scores and fault grades, taking the fault grade corresponding to the total fault score of the unprocessed fault as the fault grade of the unprocessed fault.
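The score-to-grade step of claim 3 can be illustrated with a banded lookup. The claim does not say how the correspondence between scores and grades is defined, so this sketch assumes hypothetical score bands held in `GRADE_THRESHOLDS` (a total below 10 maps to grade 1, 10 to 19 to grade 2, 20 to 29 to grade 3, and 30 or more to grade 4) and uses Python's `bisect` module to find the band.

```python
import bisect
from typing import List

# Hypothetical score bands; the patent does not specify the actual values.
GRADE_THRESHOLDS: List[float] = [10, 20, 30]


def total_score(system_score: float, service_score: float) -> float:
    """Total fault score: system-dimension score plus service-dimension score."""
    return system_score + service_score


def grade_for(total: float, thresholds: List[float] = GRADE_THRESHOLDS) -> int:
    """Map a total score to a grade; bisect_right counts thresholds <= total."""
    return bisect.bisect_right(thresholds, total) + 1
```

For example, a fault with a system-dimension score of 7 and a service-dimension score of 5 has a total of 12 and, under these assumed bands, grade 2.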

4. The method of claim 3, wherein acquiring the fault score, in the system dimension, of the unprocessed fault corresponding to an unprocessed fault identifier in the fault information comprises:

for each unprocessed fault identifier in the fault information, determining a triggered fault identifier set from all unprocessed fault identifiers in the fault information, wherein the triggered fault corresponding to each triggered fault identifier in the set is triggered by the unprocessed fault corresponding to that unprocessed fault identifier, and the second fault type of each such triggered fault is a system type;

summing the fault scores of the triggered faults corresponding to all triggered fault identifiers in the triggered fault identifier set to obtain a first value;

determining a second fault type of the unprocessed fault, wherein the second fault type comprises the system type or a service type;

if the second fault type of the unprocessed fault is the system type, taking the sum of the fault score of the unprocessed fault and the first value as the fault score of the unprocessed fault in the system dimension;

and if the second fault type of the unprocessed fault is the service type, taking the first value as the fault score of the unprocessed fault in the system dimension.
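The system-dimension score of claim 4 can be sketched as below. The names `base_score`, `second_type`, `triggers`, `SYSTEM` and `SERVICE` are hypothetical. The sketch assumes that each fault has a fixed base score, that "fault score of a triggered fault" means that base score rather than the triggered fault's own system-dimension score, and that the trigger relation is given as a dictionary from a fault identifier to the identifiers it triggers.

```python
from typing import Dict, Set

SYSTEM = "system"    # hypothetical label for the system second fault type
SERVICE = "service"  # hypothetical label for the service second fault type


def system_dimension_score(
    fault_id: str,
    pending_ids: Set[str],
    base_score: Dict[str, float],
    second_type: Dict[str, str],
    triggers: Dict[str, Set[str]],
) -> float:
    """Fault score of fault_id in the system dimension, following claim 4."""
    # Triggered fault identifier set: unprocessed, system-type faults triggered by fault_id.
    triggered = {
        fid
        for fid in triggers.get(fault_id, set())
        if fid in pending_ids and second_type[fid] == SYSTEM
    }
    # First value: sum of the base fault scores of those triggered faults.
    first_value = sum(base_score[fid] for fid in triggered)
    if second_type[fault_id] == SYSTEM:
        # A system-type fault also counts its own score.
        return base_score[fault_id] + first_value
    # A service-type fault only inherits the scores of the system faults it triggers.
    return first_value
```

For instance, a system-type `disk_full` (score 4) that triggers a system-type `db_slow` (score 3) and a service-type `order_fail` gets 4 + 3 = 7; the service-type fault is excluded from the sum.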

5. The method according to any one of claims 1 to 4, further comprising:

acquiring the sum of the fault grades of all unprocessed faults identified in the fault information;

and if that sum is greater than a preset fault grade, triggering execution of the step of acquiring the first fault type of the target fault.
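The gating condition of claim 5 reduces to a single comparison. A minimal sketch, assuming each fault is represented as a hypothetical mapping with `grade` and `processed` keys; the comparison is strictly greater than, as in the claim wording.

```python
from typing import Iterable, Mapping


def should_start_handling(faults: Iterable[Mapping], preset_grade: int) -> bool:
    """Claim-5 gate: start handling only if unprocessed fault grades sum above the preset."""
    total = sum(f["grade"] for f in faults if not f["processed"])
    # Strictly greater than, matching the claim wording.
    return total > preset_grade
```

This keeps isolated low-grade faults from starting the repair flow until their combined grade crosses the preset value.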

6. The method according to any one of claims 1 to 4, wherein after performing a repair operation on the target fault, the method further comprises:

acquiring a detection program of the target fault;

executing the detection program of the target fault to obtain a detection result, wherein the detection result indicates whether the target fault has been successfully repaired;

and if the detection result indicates that the target fault has not been successfully repaired, outputting reminder information.
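The post-repair check of claim 6 can be sketched as a lookup of a per-fault detection routine followed by an alert on failure. This assumes, hypothetically, that detection programs are registered as zero-argument callables in a `detectors` dictionary that return `True` when the fault is gone, and that "outputting reminder information" is done through the standard `logging` module.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("fault_handler")  # hypothetical logger name


def verify_repair(fault_id: str, detectors: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the detection program for fault_id; return a reminder if the repair failed."""
    # Acquire the detection program of the target fault.
    detector = detectors[fault_id]
    # Execute it; True is assumed to mean "successfully repaired".
    if detector():
        return None
    reminder = f"Fault {fault_id} was not repaired successfully; manual follow-up needed"
    logger.warning(reminder)
    return reminder
```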

7. The method according to any one of claims 1 to 4, wherein the fault information contains multiple unprocessed faults with the highest fault grade, and the method further comprises:

determining, as the target fault, the unprocessed fault whose first fault type is the root fault type among all the unprocessed faults with the highest fault grade in the fault information.
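The tie-break in claim 7 prefers a root-type fault among the equally severe candidates. A minimal sketch, assuming unprocessed faults are given as hypothetical `(fault_id, grade)` pairs and the root-type test is supplied as an `is_root` callable. The claim does not cover the cases where several tied faults are root-type or none is, so the sketch assumes the first root-type fault in report order is chosen and otherwise falls back to the first tied fault.

```python
from typing import Callable, List, Optional, Tuple


def pick_target(pending: List[Tuple[str, int]], is_root: Callable[[str], bool]) -> Optional[str]:
    """Among the highest-grade unprocessed faults, prefer one of the root fault type."""
    if not pending:
        return None
    top = max(grade for _, grade in pending)
    tied = [fid for fid, grade in pending if grade == top]
    for fid in tied:
        if is_root(fid):
            return fid
    # Assumption: no root-type fault among the ties, so fall back to the first one.
    return tied[0]
```

Preferring the root-type fault means the repair addresses a cause directly instead of first resolving a symptom through claim 2's root-cause lookup.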

8. A fault handling device, characterized in that the device comprises means for performing the method according to any of claims 1-7.

9. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 7.