CN114296973A - Server troubleshooting system, method and storage medium - Google Patents


Info

Publication number
CN114296973A
Authority
CN
China
Legal status
Pending
Application number
CN202111552628.9A
Other languages
Chinese (zh)
Inventor
赵子腾
王晓玲
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111552628.9A
Publication of CN114296973A
Legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a server troubleshooting system, method and storage medium. The system includes: a model generation unit, which generates a troubleshooting model based on first fault information of a server and the corresponding fault solutions; a fault acquisition unit, communicatively connected to the server, which acquires second fault information of the server; an information storage unit, communicatively connected to the fault acquisition unit, which stores the second fault information; and an information comparison unit, which verifies whether the second fault information exists in the troubleshooting model. The server troubleshooting system, method and storage medium can eliminate server faults more accurately and more quickly, thereby improving production efficiency and reducing the consumption of manpower and material resources.

Description

Server troubleshooting system, method and storage medium
Technical Field
The present invention relates to the field of troubleshooting technologies, and in particular, to a server troubleshooting system, method, and storage medium.
Background
With the continuous development of the server industry and related technologies, servers have become functionally rich, complex in design and highly integrated. Design difficulty has grown, and so has the likelihood of server failures. As a result, during the research-and-development (R&D) test stage, troubleshooting places a heavy burden on R&D personnel, and a large amount of manpower is needed to handle failures and optimize designs. Correspondingly, server R&D work is finely divided into specialties such as hardware, software, components, heat dissipation and power supplies.
However, in the prior art, fault diagnosis first requires that the fault phenomenon be preserved. In practice, the fault phenomenon often cannot be kept as it is, for example because of the tester's test schedule, and some phenomena are low-probability events. Once the phenomenon is destroyed, subsequent troubleshooting may become very difficult, and it is quite possible that no fault-related data exist and the fault cannot be reproduced. Even when the fault can be reproduced, reproduction takes a long time and requires preparatory work such as capturing fault logs and reflashing firmware versions. In addition, when signal quality is unsatisfactory during hardware signal testing and the signal is driven by a Complex Programmable Logic Device (CPLD), the engineer first checks whether the drive strength of the relevant CPLD general-purpose input/output (GPIO) pins is too high or too low; if the signal quality is still poor, the value of the series resistor must be changed. Changing the drive strength, however, requires a CPLD engineer to modify the CPLD code; troubleshooting a single signal may require trying several drive strengths, and each trial requires releasing a new CPLD version, which makes troubleshooting inefficient. In summary, diagnosing server faults with the prior art suffers from the following problems: fault-related data are difficult to reproduce, reproduction consumes considerable time and effort, and troubleshooting is difficult.
Therefore, it is desirable to provide a server troubleshooting system, method, and storage medium that can perform fault location, fault classification, and troubleshooting.
Disclosure of Invention
To solve the above technical problems, the invention provides a server troubleshooting system, method and storage medium that can locate a fault, identify its cause, replace manual work in preliminary troubleshooting, and reduce the consumption of manpower and material resources.
In order to achieve the above object, the present application proposes a first technical solution:
A server troubleshooting system, comprising: a model generation unit, which generates a troubleshooting model based on first fault information of a server and the corresponding fault solutions; a fault acquisition unit, communicatively connected to the server, which acquires second fault information of the server; an information storage unit, communicatively connected to the fault acquisition unit, which stores the second fault information; and an information comparison unit, whose input end is communicatively connected to the information storage unit to obtain the second fault information, and whose output end is communicatively connected to the model generation unit to verify whether the second fault information exists in the troubleshooting model. If the second fault information exists in the troubleshooting model, the fault is eliminated based on the troubleshooting model; if it does not, a fault solution corresponding to the second fault information is formulated through manual intervention, and the troubleshooting model is updated.
In an embodiment of the present invention, the information comparison unit includes an information coarse screening module and an information fine screening module. The coarse screening module is communicatively connected to the information storage unit and screens the second fault information based on a first fault category to judge whether it matches the first fault information; if it matches, the fault is eliminated based on the fault solution corresponding to the matched first fault information. The fine screening module is communicatively connected to the coarse screening module; if the second fault information cannot be matched during coarse screening, the fine screening module is invoked to continue judging, based on a second fault category, whether the second fault information matches the first fault information. If it matches, the fault is eliminated based on the corresponding fault solution; if it still cannot be matched, a fault solution corresponding to the second fault information is formulated through manual intervention.
In order to achieve the above object, the present application further provides a second technical solution:
A server troubleshooting method, applied to the above server troubleshooting system, comprising the following step: when the server fails, communicatively connecting the server troubleshooting system to the server.
In one embodiment of the invention, the method comprises: generating a troubleshooting model based on first fault information of the server and the corresponding fault solutions; acquiring second fault information of the server, and verifying whether the second fault information exists in the troubleshooting model; if it exists, eliminating the fault based on the troubleshooting model; and if it does not exist, updating the troubleshooting model.
In an embodiment of the present invention, generating the troubleshooting model specifically includes: acquiring first fault information of the server and the corresponding fault solutions; generating a data set from the first fault information and the corresponding fault solutions; and generating the troubleshooting model based on a deep learning algorithm and the data set.
In one embodiment of the invention, updating the troubleshooting model includes: if the second fault information does not exist in the troubleshooting model, formulating a fault solution corresponding to the second fault information through manual intervention; generating a data set from the second fault information and its fault solution; and updating the troubleshooting model based on a deep learning algorithm and the data set.
In an embodiment of the present invention, verifying whether the second fault information exists in the troubleshooting model specifically includes: coarsely screening the second fault information based on a first fault category and judging whether it matches the first fault information; if it matches, eliminating the fault based on the fault solution corresponding to the matched first fault information; if it does not match, screening the second fault information based on a second fault category and continuing to judge whether it matches the first fault information; if it then matches, eliminating the fault based on the corresponding fault solution; and if it still does not match, formulating a fault solution corresponding to the second fault information through manual intervention.
In one embodiment of the invention, the first fault category encompasses the second fault category, i.e., the second fault category is a finer subdivision of the first. The first fault information and the second fault information each include a fault location and a fault cause.
In an embodiment of the present invention, acquiring the first fault information and the second fault information specifically includes: acquiring the fault location and capturing a fault log at that location; and extracting the fault cause from the fault log.
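A minimal Python sketch of this acquisition step, assuming a plain-text fault log whose error lines look like `<timestamp> <LEVEL> <location>: <message>`. The log format, the `LOG_PATTERN` regex, and the names `FaultInfo` and `extract_fault_info` are hypothetical; the patent does not specify how the fault cause is extracted, and here it is simply taken to be the message of the first error-level entry at the fault location.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical log-line format, assumed for illustration only:
#   "<timestamp> <LEVEL> <location>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>ERROR|CRITICAL)\s+(?P<location>[\w./-]+):\s+(?P<message>.+)$"
)


@dataclass(frozen=True)
class FaultInfo:
    location: str  # fault location, e.g. a component identifier
    cause: str     # fault cause extracted from the log


def extract_fault_info(log_lines: List[str], location: str) -> Optional[FaultInfo]:
    """Return the first error-level entry logged for `location`, or None."""
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group("location") == location:
            return FaultInfo(location=location, cause=match.group("message"))
    return None


# Hypothetical log excerpt captured at the fault location "CPU0".
log = [
    "2021-12-01T10:00:00 INFO CPU0: boot completed",
    "2021-12-01T10:05:00 ERROR CPU0: thermal trip asserted",
]
print(extract_fault_info(log, "CPU0"))
```

Informational lines are skipped because the pattern only accepts `ERROR` or `CRITICAL` levels; a real system would need a parser matched to its actual log sources.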
In order to achieve the above object, the present application proposes a third technical solution:
A computer-readable storage medium storing a program that, when executed by a processor, causes the processor to perform the following step: when the server fails, communicatively connecting the server troubleshooting system to the server.
In one embodiment of the invention, the program, when executed by a processor, causes the processor to perform the following steps: generating a troubleshooting model based on first fault information of the server and the corresponding fault solutions; acquiring second fault information of the server, and verifying whether the second fault information exists in the troubleshooting model; if it exists, eliminating the fault based on the troubleshooting model; and if it does not exist, updating the troubleshooting model.
In one embodiment of the invention, the program, when executed by a processor, causes the processor to perform the following steps: acquiring first fault information of the server and the corresponding fault solutions; generating a data set from the first fault information and the corresponding fault solutions; and generating the troubleshooting model based on a deep learning algorithm and the data set.
In one embodiment of the invention, the program, when executed by a processor, causes the processor to perform the following steps: if the second fault information does not exist in the troubleshooting model, formulating a fault solution corresponding to the second fault information through manual intervention; generating a data set from the second fault information and its fault solution; and updating the troubleshooting model based on a deep learning algorithm and the data set.
In one embodiment of the invention, the program, when executed by a processor, causes the processor to perform the following steps: coarsely screening the second fault information based on a first fault category and judging whether it matches the first fault information; if it matches, eliminating the fault based on the fault solution corresponding to the matched first fault information; if it does not match, screening the second fault information based on a second fault category and continuing to judge whether it matches the first fault information; if it then matches, eliminating the fault based on the corresponding fault solution; and if it still does not match, formulating a fault solution corresponding to the second fault information through manual intervention.
In one embodiment of the invention, the program, when executed by a processor, causes the processor to perform the following steps: acquiring the fault location and capturing a fault log at that location; and extracting the fault cause from the fault log.
Compared with the prior art, the technical scheme of the invention has the following advantages:
The server troubleshooting system, method and storage medium of the invention eliminate server faults by establishing a troubleshooting model, which reduces the consumption of manpower and material resources. The fault acquisition unit and the information storage unit work together so that acquired faults are stored in the information storage unit promptly and accurately; staff can therefore review fault information easily, and faults that have occurred do not need to be reproduced. In addition, because the information comparison unit is communicatively connected to the model generation unit, server faults can, on the one hand, be resolved quickly and accurately from the generated troubleshooting model without manual intervention; on the other hand, when newly acquired second fault information cannot be eliminated with the existing model, a corresponding solution is formulated through manual intervention, and the second fault information and its solution are added as first fault information and a corresponding solution to update the existing model, so that the whole system improves itself in a virtuous cycle. In summary, during server R&D testing, the system, method and storage medium of the application can determine fault locations and causes and eliminate server faults more accurately and quickly, thereby improving production efficiency and reducing the consumption of manpower and material resources.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a system block diagram of the present invention;
FIG. 2 is a flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1:
Referring to FIG. 1, FIG. 1 is a system block diagram according to the first embodiment.
The system of this embodiment comprises: a model generation unit, which generates a troubleshooting model based on first fault information of the server and the corresponding fault solutions; a fault acquisition unit, communicatively connected to the server, which acquires second fault information of the server; an information storage unit, communicatively connected to the fault acquisition unit, which stores the second fault information; and an information comparison unit, whose input end is communicatively connected to the information storage unit to obtain the second fault information, and whose output end is communicatively connected to the model generation unit to verify whether the second fault information exists in the troubleshooting model. If the second fault information exists in the troubleshooting model, the fault is eliminated based on the troubleshooting model; if it does not, a fault solution corresponding to the second fault information is formulated through manual intervention, and the troubleshooting model is updated.
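The wiring of the four units can be sketched in Python as below. This is an illustration under stated assumptions, not the patent's implementation: the class and method names are hypothetical, fault information is reduced to a `(location, cause)` pair, and an exact-match dictionary stands in for the deep-learning troubleshooting model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# (fault location, fault cause): the two fields the patent lists for fault information.
FaultKey = Tuple[str, str]


@dataclass
class ModelGenerationUnit:
    # Assumption: an exact-match lookup table stands in for the trained
    # troubleshooting model (the patent uses a deep learning algorithm).
    model: Dict[FaultKey, str] = field(default_factory=dict)

    def generate(self, records: Dict[FaultKey, str]) -> None:
        """Add first fault information and its solutions to the model."""
        self.model.update(records)


@dataclass
class FaultAcquisitionUnit:
    read_fault: Callable[[], FaultKey]  # hypothetical hook into the server

    def acquire(self) -> FaultKey:
        return self.read_fault()


@dataclass
class InformationStorageUnit:
    stored: List[FaultKey] = field(default_factory=list)

    def store(self, fault: FaultKey) -> None:
        self.stored.append(fault)


@dataclass
class InformationComparisonUnit:
    storage: InformationStorageUnit  # input end
    generator: ModelGenerationUnit   # output end

    def check_latest(self) -> Optional[str]:
        """Return a known solution, or None if manual intervention is needed."""
        return self.generator.model.get(self.storage.stored[-1])


# Wiring the four units together (hypothetical fault data).
generator = ModelGenerationUnit()
generator.generate({("PSU1", "output undervoltage"): "replace PSU1"})
storage = InformationStorageUnit()
acquisition = FaultAcquisitionUnit(read_fault=lambda: ("PSU1", "output undervoltage"))
comparison = InformationComparisonUnit(storage=storage, generator=generator)

storage.store(acquisition.acquire())
print(comparison.check_latest())
```

A `None` result corresponds to the branch in which a fault solution must be formulated through manual intervention.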
In one embodiment, the information comparison unit includes an information coarse screening module and an information fine screening module. The coarse screening module is communicatively connected to the information storage unit and screens the second fault information based on a first fault category to judge whether it matches the first fault information; if it matches, the fault is eliminated based on the fault solution corresponding to the matched first fault information. The fine screening module is communicatively connected to the coarse screening module; if the second fault information cannot be matched during coarse screening, the fine screening module is invoked to continue judging, based on a second fault category, whether the second fault information matches the first fault information. If it matches, the fault is eliminated based on the corresponding fault solution; if it still cannot be matched, a fault solution corresponding to the second fault information is formulated through manual intervention.
In one embodiment, the second fault category is a finer subdivision of the first fault category. The system first uses coarse screening to judge whether the second fault information matches the first fault information; if a solution can be found directly, the fault is eliminated immediately. If no match is found, fine screening is used to judge again, and if a solution can then be found, the fault is eliminated. Therefore, when the second fault information describes a common fault, that is, when the fault and its solution can be determined by coarse screening alone, the fault can be identified and eliminated in a very short time, which improves troubleshooting efficiency.
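One possible reading of the two-stage screening, sketched in Python: coarse screening succeeds when the first-level category and the fault cause identify exactly one known fault, and fine screening uses the second-level category to disambiguate when they do not. The matching criterion, the names `KnownFault`, `NewFault` and `screen`, and the sample data are assumptions for illustration; the patent does not define how a match is decided.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KnownFault:  # an entry of first fault information
    first_category: str   # coarse category, e.g. "heat dissipation system failure"
    second_category: str  # fine category, e.g. "CPU over-temperature"
    cause: str
    solution: str


@dataclass(frozen=True)
class NewFault:  # newly acquired second fault information
    first_category: str
    second_category: str
    cause: str


def screen(fault: NewFault, known: List[KnownFault]) -> Optional[str]:
    """Two-stage matching; returns a solution, or None when manual intervention is needed."""
    # Coarse screening: same first-level category and same cause.
    coarse = [k for k in known
              if k.first_category == fault.first_category and k.cause == fault.cause]
    if len(coarse) == 1:
        return coarse[0].solution
    # Fine screening: narrow the remaining candidates by second-level category.
    fine = [k for k in coarse if k.second_category == fault.second_category]
    if len(fine) == 1:
        return fine[0].solution
    return None


known = [
    KnownFault("hardware failure", "power failure", "PSU output undervoltage", "replace PSU"),
    KnownFault("heat dissipation system failure", "memory over-temperature",
               "temperature above threshold", "check DIMM airflow"),
    KnownFault("heat dissipation system failure", "CPU over-temperature",
               "temperature above threshold", "reseat CPU heatsink"),
]
print(screen(NewFault("hardware failure", "power failure", "PSU output undervoltage"), known))
print(screen(NewFault("heat dissipation system failure", "CPU over-temperature",
                      "temperature above threshold"), known))
```

The first query is resolved by coarse screening alone; the second is ambiguous at the coarse level (two over-temperature entries) and is resolved by the fine stage, which mirrors the efficiency argument above for common faults.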
In one embodiment, if the second fault information can be neither identified nor eliminated by coarse or fine screening, manual intervention is applied: a fault solution matching the second fault information is formulated, and the second fault information and its solution are then added, as first fault information and a corresponding solution, to update the troubleshooting model. In this way the troubleshooting model is continuously updated and refined, so that it can resolve more server faults.
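The manual-intervention and update cycle can be sketched as follows, assuming an `ask_engineer` callback that represents the engineer formulating a solution and a hypothetical `retrain` function that, in place of deep-learning training, rebuilds an exact-match lookup table from the growing data set. All names and data are illustrative.

```python
from typing import Callable, Dict, List, Tuple

FaultKey = Tuple[str, str]  # (fault location, fault cause)


def retrain(dataset: List[Tuple[FaultKey, str]]) -> Dict[FaultKey, str]:
    """Stand-in for deep-learning training: an exact-match lookup table."""
    return dict(dataset)


def troubleshoot(
    fault: FaultKey,
    model: Dict[FaultKey, str],
    dataset: List[Tuple[FaultKey, str]],
    ask_engineer: Callable[[FaultKey], str],
) -> str:
    """Resolve a fault from the model, falling back to manual intervention."""
    if fault in model:
        return model[fault]
    solution = ask_engineer(fault)     # manual intervention
    dataset.append((fault, solution))  # now treated as first fault information
    model.update(retrain(dataset))     # update the troubleshooting model
    return solution


# Usage with a hypothetical engineer callback that records each consultation.
consulted: List[FaultKey] = []


def engineer(fault: FaultKey) -> str:
    consulted.append(fault)
    return "reseat DIMM0"


dataset: List[Tuple[FaultKey, str]] = []
model = retrain(dataset)
for _ in range(2):
    troubleshoot(("DIMM0", "uncorrectable ECC error"), model, dataset, engineer)
print(len(consulted))
```

The engineer is consulted only on the first occurrence; the second occurrence is answered from the updated model, which is the self-improving cycle the text describes.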
The server troubleshooting system of this embodiment comprises the model generation unit, the fault acquisition unit, the information storage unit and the information comparison unit. Establishing a troubleshooting model allows server faults to be eliminated with less manpower and material resources. Because the fault acquisition unit and the information storage unit work together, acquired faults are stored promptly and accurately, so staff can review fault information without reproducing the fault. Because the information comparison unit is communicatively connected to the model generation unit, known faults can be resolved quickly and accurately from the generated model without manual intervention, while faults that the existing model cannot eliminate receive a manually formulated solution, and the fault information and solution are added as first fault information and a corresponding solution to update the model. The whole system thus improves in a virtuous cycle and eliminates more server faults more quickly and accurately.
In one embodiment, the information comparison unit may be implemented by a module with computing capability that can analyze and locate server faults, such as a central processing unit (CPU) or a graphics processing unit (GPU). The fault acquisition unit may be implemented by a module, such as one based on a field-programmable gate array (FPGA), that can capture the fault log corresponding to the server fault information. The information storage unit may be implemented by a module with data storage capability, such as a memory card (e.g., an SD card) or a hard disk; the data it stores include not only the first fault information and the corresponding solutions but also the second fault information and the corresponding solutions.
In one embodiment, the communication connections described herein may be wired network (Ethernet cable) connections, Bluetooth connections, radio-frequency connections, or any other connection that allows the units and modules to communicate with one another.
In one embodiment, the server troubleshooting system may be communicatively connected to the server by providing, on the troubleshooting system, interfaces that correspond to the server's hardware interfaces. For example, if the server has a network port and an extended debug port (XDP), the troubleshooting system is provided with a first network port and a first XDP interface. When the server fails, the first network port of the troubleshooting system is connected to the network port of the server and the first XDP interface is connected to the XDP interface of the server, so that the server failure can be monitored and resolved. It should be understood that, to reduce cost, a person skilled in the art may, based on working experience and the server's operating conditions, select several hardware parts of the server that are relatively prone to failure and provide on the troubleshooting system only the interfaces needed to connect to those parts.
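The text leaves the choice of interfaces to engineering experience. If a record of past hardware failures is available, one simple, data-driven way to shortlist interfaces is to rank hardware parts by failure count, as in this Python sketch. The fault-history list, the part-to-interface mapping, and the function name `choose_interfaces` are assumptions, not part of the patent.

```python
from collections import Counter
from typing import Dict, List


def choose_interfaces(fault_history: List[str], interface_for: Dict[str, str], k: int) -> List[str]:
    """Return the debug interfaces for the k most failure-prone hardware parts."""
    # Count only parts for which a matching interface exists.
    counts = Counter(part for part in fault_history if part in interface_for)
    return [interface_for[part] for part, _ in counts.most_common(k)]


# Hypothetical failure history and part-to-interface mapping.
history = ["NIC", "CPU", "NIC", "DIMM", "CPU", "NIC"]
interfaces = {"NIC": "network port", "CPU": "XDP interface"}
print(choose_interfaces(history, interfaces, k=2))
```

Parts without a corresponding interface (here `DIMM`) are ignored, so `k` bounds the number of interfaces that would have to be built into the troubleshooting system.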
Embodiment 2:
Referring to FIG. 2, FIG. 2 is a flow chart of the method according to the second embodiment.
A server troubleshooting method is applied to the above server troubleshooting system: when the server fails, the server troubleshooting system is communicatively connected to the server. It should be understood that the troubleshooting system and the server are not permanently connected; they are connected only when the server fails. There are two reasons. First, the server is an important component of a data center and its security must be guaranteed; a permanently connected troubleshooting system would provide a backdoor through which attackers could compromise or damage the server. Second, a server running normally in the data center has already passed layer-by-layer screening on the production line, so problems are comparatively unlikely; keeping the troubleshooting system connected to it would waste space and floor area and increase cost.
In one embodiment, determining whether the server has failed specifically includes: recording in advance first working parameters of the relevant server components under normal operation; and comparing the working parameters of those components acquired in real time with the first working parameters to determine whether the server has failed.
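A minimal sketch of this comparison, assuming each working parameter is a single numeric reading and that a deviation of more than a relative tolerance (10% here, an arbitrary assumed value) from the recorded normal-operation value indicates a fault. Parameter names, values and the `detect_fault` function are hypothetical.

```python
from typing import Dict, List


def detect_fault(
    baseline: Dict[str, float],
    current: Dict[str, float],
    rel_tol: float = 0.1,
) -> List[str]:
    """Return the names of parameters that deviate from the normal-operation baseline."""
    deviating = []
    for name, reference in baseline.items():
        value = current.get(name)
        if value is None:
            deviating.append(name)  # a missing reading is treated as abnormal
        elif abs(value - reference) > rel_tol * abs(reference):
            deviating.append(name)
    return deviating


# Hypothetical first working parameters and a real-time reading.
baseline = {"cpu_temp_c": 60.0, "fan_rpm": 8000.0, "p12v_v": 12.0}
current = {"cpu_temp_c": 85.0, "fan_rpm": 7900.0, "p12v_v": 12.1}
print(detect_fault(baseline, current))
```

A non-empty result would be taken as a server failure, which in this method is the trigger for connecting the troubleshooting system to the server. Real components would likely need per-parameter thresholds rather than one shared tolerance.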
In one embodiment, the method comprises: generating a troubleshooting model based on first fault information of the server and the corresponding fault solutions; acquiring second fault information of the server and verifying whether it exists in the troubleshooting model; if it exists, eliminating the fault based on the troubleshooting model; and if it does not exist, updating the troubleshooting model. When newly acquired second fault information cannot be eliminated with the existing model, a corresponding solution is formulated through manual intervention, and the second fault information and its solution are added as first fault information and a corresponding solution to update the existing model. The data set thus keeps expanding and the model keeps learning, so that the whole troubleshooting process improves in a virtuous cycle and the system can eliminate more server faults more quickly and accurately.
In one embodiment, generating the troubleshooting model specifically includes: acquiring first fault information of the server and the corresponding fault solutions; generating a data set from them; and generating the troubleshooting model based on a deep learning algorithm and the data set. Specifically, a person skilled in the art may obtain the first fault information by summarizing and classifying the faults that most often appear in servers, drawing on previous work experience and practical server applications, and may formulate a fault solution for each such fault. This fault information constitutes the first fault information; the first fault information and its solutions are organized into a data set, on which a deep learning algorithm is trained to generate the troubleshooting model. It should be understood that a fault solution as described in the present application includes both the resolution process and the resolution result.
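The data-set generation step can be illustrated by treating troubleshooting as a classification problem, which is an assumption: each fault's `(location, cause)` text becomes an input, and each distinct solution, comprising a resolution process and a resolution result as stated above, becomes a class label. The sketch stops at data preparation; the deep learning algorithm itself is not specified in the patent and is not shown. `FaultSolution` and `build_dataset` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FaultKey = Tuple[str, str]  # (fault location, fault cause)


@dataclass(frozen=True)
class FaultSolution:
    process: Tuple[str, ...]  # ordered resolution steps
    result: str               # outcome after the steps are applied


def build_dataset(
    records: List[Tuple[FaultKey, FaultSolution]],
) -> Tuple[List[Tuple[str, int]], List[FaultSolution]]:
    """Turn (fault, solution) records into (input text, label id) samples."""
    solutions: List[FaultSolution] = []
    label_of: Dict[FaultSolution, int] = {}
    samples: List[Tuple[str, int]] = []
    for (location, cause), solution in records:
        if solution not in label_of:  # assign each distinct solution a class id
            label_of[solution] = len(solutions)
            solutions.append(solution)
        samples.append((f"{location} | {cause}", label_of[solution]))
    return samples, solutions


# Hypothetical first fault information and solutions.
fix_psu = FaultSolution(("check PSU status LED", "replace PSU"), "12 V rail restored")
fix_fan = FaultSolution(("reseat fan",), "fan speed normal")
samples, labels = build_dataset([
    (("PSU1", "output undervoltage"), fix_psu),
    (("PSU2", "output undervoltage"), fix_psu),
    (("FAN3", "speed below threshold"), fix_fan),
])
print(samples)
```

Faults that share a solution share a label, so the resulting `samples` list can be fed to whatever classifier is chosen, with `labels` mapping predicted ids back to full solutions.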
In one embodiment, generating the troubleshooting model may specifically include the following steps. Known, relatively common fault information and the corresponding fault solutions are assembled into a data set. The most common fault information and solutions in the data set are selected as a training set, from which a preliminary troubleshooting model is generated. Less common fault information and solutions are then selected as a cross-validation set and used to refine the preliminary model into a more complete one. Finally, the remaining fault information and solutions in the data set serve as a test set for testing the refined model, which is further updated to produce the troubleshooting model. In other words, the collected first fault information and solutions form a data set that is divided into three parts of unequal size, namely a training set, a cross-validation set and a test set, and training and testing are repeated until a troubleshooting model with high accuracy and adaptability that can eliminate more faults is obtained.
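The frequency-based split described above can be sketched in Python as follows, assuming that "commonness" is measured by how often a fault appears in the data set and assuming a 70/20/10 proportion (the patent says only that the three parts are of unequal size). The function name, the percentages and the sample labels are illustrative.

```python
from collections import Counter
from typing import List, Tuple

Sample = Tuple[str, str]  # (fault description, solution)


def split_by_frequency(
    samples: List[Sample], train_pct: int = 70, val_pct: int = 20
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Most common faults -> training set, less common -> cross-validation, rest -> test."""
    freq = Counter(fault for fault, _ in samples)
    # sorted() is stable, so equally frequent samples keep their original order.
    ordered = sorted(samples, key=lambda s: freq[s[0]], reverse=True)
    n_train = len(ordered) * train_pct // 100  # integer arithmetic avoids float rounding
    n_val = len(ordered) * val_pct // 100
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )


# Hypothetical fault labels A to D, from most to least frequent.
samples = [("A", "s1")] * 4 + [("B", "s2")] * 3 + [("C", "s3")] * 2 + [("D", "s4")]
train, val, test = split_by_frequency(samples)
print(len(train), len(val), len(test))
```

Note that a fault seen only in the test set cannot be predicted by a classifier that never saw its label; this follows the ordering the text describes rather than a conventional random split.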
In one embodiment, updating the troubleshooting model includes: if the second fault information does not exist in the troubleshooting model, establishing a fault solution corresponding to the second fault information through manual intervention; generating a data set based on the second fault information and the corresponding fault solution; and updating the troubleshooting model based on a deep learning algorithm and the data set.
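A minimal sketch of this update path follows, assuming fault information is keyed by (location, cause) and that manual intervention can be modelled as a callback returning an engineer's solution. The exact-match table and the `_retrain` placeholder stand in for the deep learning model, whose details the text does not give; all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

FaultKey = Tuple[str, str]  # (fault location, fault cause)


class TroubleshootingModel:
    """Hypothetical stand-in: an exact-match table rebuilt from the data set."""

    def __init__(self, dataset: List[Tuple[FaultKey, str]]):
        self.dataset = list(dataset)
        self.table: Dict[FaultKey, str] = {}
        self._retrain()

    def _retrain(self) -> None:
        # Placeholder for re-running the (unspecified) deep learning training.
        self.table = dict(self.dataset)

    def lookup(self, fault: FaultKey) -> Optional[str]:
        return self.table.get(fault)

    def update(self, fault: FaultKey, solution: str) -> None:
        # Add the new (fault, solution) pair to the data set and retrain.
        self.dataset.append((fault, solution))
        self._retrain()


def handle_fault(
    model: TroubleshootingModel,
    fault: FaultKey,
    manual_intervention: Callable[[FaultKey], str],
) -> str:
    """Eliminate a known fault, or escalate an unknown one and update the model."""
    solution = model.lookup(fault)
    if solution is None:
        # Unknown fault: an engineer establishes a solution manually.
        solution = manual_intervention(fault)
        model.update(fault, solution)
    return solution
```

After the first manual intervention for a given fault, later occurrences of the same fault are resolved from the updated model without calling the engineer again.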
In one embodiment, verifying whether the second fault information exists in the troubleshooting model specifically includes: performing coarse screening on the second fault information based on a first fault category to determine whether the second fault information matches the first fault information; if it matches, eliminating the fault based on the fault solution corresponding to the matched first fault information; if it does not match, screening the second fault information based on a second fault category and again determining whether it matches the first fault information; if it now matches, eliminating the fault based on the fault solution corresponding to the matched first fault information; and if it still does not match, establishing a fault solution corresponding to the second fault information through manual intervention.
In one embodiment, the first fault category includes the second fault category. Preferably, the first fault category may include, but is not limited to, at least one of a hardware failure, a communication protocol failure, a single-component failure, and a heat dissipation system failure. The second fault category may include, but is not limited to: power failure and timing disorder under hardware failure; serial communication protocol failure and PCIe (high-speed serial computer expansion bus standard) failure under communication protocol failure; CPU failure and hard disk failure under single-component failure; and excessive memory temperature and excessive CPU temperature under heat dissipation system failure. That is, the second fault category is a finer subdivision of the first fault category. The second fault information is first matched against the first fault information by coarse screening, and if a solution is found directly, the fault is eliminated immediately; if no match is found, fine screening is used to determine whether the second fault information matches the first fault information, and if a solution is found, the fault is eliminated. Therefore, when the fault is a common one, that is, when the fault and its solution can be identified by coarse screening alone, it can be identified and eliminated in a very short time, which improves the efficiency of server troubleshooting.
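The two-stage screening can be sketched as below, using the category examples listed above as a two-level taxonomy. The text leaves the matching criterion of each stage open, so this sketch assumes that coarse screening requires a known fault in the same first-level category with an identical location and cause, while fine screening accepts a known fault in the same second-level category with the same cause at any location. The category strings, the `Fault` class, and `screen` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Second-level category -> first-level category, from the examples listed above.
SECOND_TO_FIRST: Dict[str, str] = {
    "power failure": "hardware",
    "timing disorder": "hardware",
    "serial communication protocol failure": "communication protocol",
    "PCIe failure": "communication protocol",
    "CPU failure": "single component",
    "hard disk failure": "single component",
    "memory over-temperature": "heat dissipation system",
    "CPU over-temperature": "heat dissipation system",
}


@dataclass(frozen=True)
class Fault:
    second_category: str  # fine category, e.g. "CPU over-temperature"
    location: str         # fault location, e.g. "CPU0"
    cause: str            # fault cause, e.g. "fan stalled"

    @property
    def first_category(self) -> str:
        return SECOND_TO_FIRST[self.second_category]


def screen(fault: Fault, known: Dict[Fault, str]) -> Tuple[str, Optional[str]]:
    """Return (stage, solution); stage is "coarse", "fine" or "manual"."""
    # Coarse screening: same first-level category, same location and cause.
    for k, solution in known.items():
        if (k.first_category == fault.first_category
                and k.location == fault.location and k.cause == fault.cause):
            return "coarse", solution
    # Fine screening: same second-level category and cause, at any location.
    for k, solution in known.items():
        if k.second_category == fault.second_category and k.cause == fault.cause:
            return "fine", solution
    # No match in either stage: escalate to manual intervention.
    return "manual", None
```

In a real system the known faults would likely be indexed by category so that each stage scans only its own candidates; the linear scans here simply keep the sketch short.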
In one embodiment, acquiring the first fault information and the second fault information specifically includes: acquiring the fault location and capturing the fault log at that location; and extracting the fault cause from the fault log.
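The log-based acquisition step might look like the sketch below. The log line format, the `ERROR` level convention, and the rule that the most recent error message at the fault location is taken as the fault cause are all assumptions; the text does not specify how the cause is extracted from the log.

```python
import re
from typing import Dict, Iterable, List, Optional

# Hypothetical log line format: "<date> <time> <location> <LEVEL>: <message>"
LOG_LINE = re.compile(r"^\S+ \S+ (?P<location>\S+) (?P<level>[A-Z]+): (?P<message>.*)$")


def capture_fault_log(lines: Iterable[str], location: str) -> List[Dict[str, str]]:
    """Keep only the parsed log entries emitted at the fault location."""
    entries = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("location") == location:
            entries.append(m.groupdict())
    return entries


def extract_fault_cause(entries: List[Dict[str, str]]) -> Optional[str]:
    """Take the message of the most recent ERROR entry as the fault cause."""
    for entry in reversed(entries):
        if entry["level"] == "ERROR":
            return entry["message"]
    return None
```

The pair (location, extracted cause) then forms one item of fault information in the sense used throughout this application.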
Example three:
The present embodiment provides a computer-readable storage medium storing a program that, when executed by a processor, causes the processor to perform the steps of the server troubleshooting method of Example two.
In one embodiment, when executed by the processor, the program may perform the following step: if the server fails, communicatively connecting the server troubleshooting system to the server.
In one embodiment, when executed by the processor, the program may perform the following steps: generating a troubleshooting model based on first fault information of the server and the corresponding fault solutions; acquiring second fault information of the server, and verifying whether the second fault information exists in the troubleshooting model; if it exists, eliminating the fault based on the troubleshooting model; and if it does not exist, updating the troubleshooting model.
In one embodiment, when executed by the processor, the program may perform the following steps: acquiring first fault information of a server and the corresponding fault solutions; generating a data set based on the first fault information and the corresponding fault solutions; and generating the troubleshooting model based on a deep learning algorithm and the data set.
In one embodiment, when executed by the processor, the program may perform the following steps: if the second fault information does not exist in the troubleshooting model, establishing a fault solution corresponding to the second fault information through manual intervention; generating a data set based on the second fault information and the corresponding fault solution; and updating the troubleshooting model based on a deep learning algorithm and the data set.
In one embodiment, when executed by the processor, the program may perform the following steps: performing coarse screening on the second fault information based on a first fault category to determine whether the second fault information matches the first fault information; if it matches, eliminating the fault based on the fault solution corresponding to the matched first fault information; if it does not match, screening the second fault information based on a second fault category and again determining whether it matches the first fault information; if it now matches, eliminating the fault based on the fault solution corresponding to the matched first fault information; and if it still does not match, establishing a fault solution corresponding to the second fault information through manual intervention.
As will be appreciated by one of skill in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a system for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A server troubleshooting system, characterized in that the system comprises:
a model generation unit, configured to generate a troubleshooting model based on first fault information of a server and the corresponding fault solutions;
a fault acquisition unit, communicatively connected to the server and configured to acquire second fault information of the server;
an information storage unit, communicatively connected to the fault acquisition unit and configured to store the second fault information of the server; and
an information comparison unit, whose input end is communicatively connected to the information storage unit to obtain the second fault information of the server, and whose output end is communicatively connected to the model generation unit to verify whether the second fault information exists in the troubleshooting model; if the second fault information exists in the troubleshooting model, the fault is eliminated based on the troubleshooting model; if the second fault information does not exist in the troubleshooting model, a fault solution corresponding to the second fault information is established through manual intervention, and the troubleshooting model is updated.
2. The server troubleshooting system according to claim 1, characterized in that the information comparison unit comprises:
an information coarse screening module, communicatively connected to the information storage unit and configured to perform coarse screening on the second fault information based on a first fault category to determine whether the second fault information matches the first fault information; if the second fault information matches the first fault information, the fault is eliminated based on the fault solution corresponding to the first fault information; and
an information fine screening module, communicatively connected to the information coarse screening module, which is invoked if the second fault information does not match the first fault information to further determine whether the second fault information matches the first fault information; if it matches, the fault is eliminated based on the fault solution corresponding to the first fault information; and if it still does not match, a fault solution corresponding to the second fault information is established through manual intervention.
3. A server troubleshooting method, applied to the server troubleshooting system according to claim 1 or 2, characterized in that:
if the server fails, the server troubleshooting system is communicatively connected to the server.
4. The server troubleshooting method according to claim 3, characterized in that the method comprises the following steps:
generating a troubleshooting model based on first fault information of the server and the corresponding fault solutions;
acquiring second fault information of the server, and verifying whether the second fault information exists in the troubleshooting model;
if the second fault information exists in the troubleshooting model, eliminating the fault based on the troubleshooting model; and if the second fault information does not exist in the troubleshooting model, updating the troubleshooting model.
5. The server troubleshooting method according to claim 4, characterized in that generating the troubleshooting model specifically includes the following steps:
acquiring first fault information of a server and the corresponding fault solutions;
generating a data set based on the first fault information and the corresponding fault solutions;
and generating the troubleshooting model based on a deep learning algorithm and the data set.
6. The server troubleshooting method according to claim 4 or 5, characterized in that updating the troubleshooting model includes:
if the second fault information does not exist in the troubleshooting model, establishing a fault solution corresponding to the second fault information through manual intervention;
generating a data set based on the second fault information and the corresponding fault solution;
and updating the troubleshooting model based on a deep learning algorithm and the data set.
7. The server troubleshooting method according to claim 6, characterized in that verifying whether the second fault information exists in the troubleshooting model specifically includes:
performing coarse screening on the second fault information based on a first fault category to determine whether the second fault information matches the first fault information; if the second fault information matches the first fault information, eliminating the fault based on the fault solution corresponding to the first fault information;
if the second fault information does not match the first fault information, screening the second fault information based on a second fault category and further determining whether the second fault information matches the first fault information;
if it matches, eliminating the fault based on the fault solution corresponding to the first fault information; and if it still does not match, establishing a fault solution corresponding to the second fault information through manual intervention.
8. The server troubleshooting method according to claim 7, characterized in that the first fault category includes the second fault category, and the first fault information and the second fault information each specifically include a fault location and a fault cause.
9. The server troubleshooting method according to claim 8, characterized in that acquiring the first fault information and the second fault information specifically includes:
acquiring the fault location and capturing a fault log at the fault location;
and extracting the fault cause based on the fault log.
10. A computer-readable storage medium, characterized in that it stores a program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 3 to 9.
CN202111552628.9A, priority date 2021-12-17, filing date 2021-12-17: Server troubleshooting system, method and storage medium (status: Pending; published as CN114296973A)

Priority Applications (1)

Application number CN202111552628.9A (published as CN114296973A); priority date 2021-12-17; filing date 2021-12-17; title: Server troubleshooting system, method and storage medium

Applications Claiming Priority (1)

Application number CN202111552628.9A (published as CN114296973A); priority date 2021-12-17; filing date 2021-12-17; title: Server troubleshooting system, method and storage medium

Publications (1)

Publication number CN114296973A; publication date 2022-04-08

Family

ID=80967499

Family Applications (1)

Application number CN202111552628.9A (published as CN114296973A); status: Pending; title: Server troubleshooting system, method and storage medium; priority date 2021-12-17; filing date 2021-12-17

Country Status (1)

CN (1 publication): CN114296973A

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106533754A * 2016-11-08 2017-03-22 北京交通大学 Fault diagnosis method and expert system for college teaching servers
CN108629432A * 2018-06-12 2018-10-09 中国三峡新能源有限公司 Troubleshooting planning method, system and device
CN109218114A * 2018-11-12 2019-01-15 西安微电子技术研究所 Decision-tree-based automatic server fault detection system and detection method
CN109262653A * 2018-09-19 2019-01-25 北京云迹科技有限公司 Automatic recovery method and device for a faulty robot
CN111445135A * 2020-03-26 2020-07-24 宁波邑畅交通设备科技有限公司 Intelligent maintenance method and device, computer equipment and storage medium
CN111711533A * 2020-05-21 2020-09-25 北京奇艺世纪科技有限公司 Fault diagnosis method, fault diagnosis device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111209131B (en) Method and system for determining faults of heterogeneous system based on machine learning
CN111817891A (en) Network fault processing method and device, storage medium and electronic equipment
CN110088744B (en) Database maintenance method and system
CN111147322B (en) Test system and method for micro service architecture of 5G core network
CN109150619B (en) Fault diagnosis method and system based on network flow data
CN108459951B (en) Test method and device
WO2023123943A1 (en) Interface automation testing method and apparatus, and medium, device and program
CN112000558A (en) Method for generating automatic test case of rail transit signal system
CN112231163A (en) Multifunctional computer detection equipment and operation method thereof
CN109815124B (en) MBSE-based interlocking function defect analysis method and device and interlocking system
CN115114064A (en) Micro-service fault analysis method, system, equipment and storage medium
CN110990289A (en) Method and device for automatically submitting bug, electronic equipment and storage medium
CN117729576A (en) Alarm monitoring method, device, equipment and storage medium
CN116107794B (en) Ship software fault automatic diagnosis method, system and storage medium
CN111090553B (en) Test system, test method and test device
CN114296973A (en) Server troubleshooting system, method and storage medium
CN113986618B (en) Cluster brain fracture automatic repair method, system, device and storage medium
CN116032581A (en) Network equipment security management method and electronic equipment
CN110489286B (en) BOX node machine power supply current sharing test method and system
CN110633201B (en) Integrated fuzzy test method and device for program
CN112527631A (en) bug positioning method, system, electronic equipment and storage medium
CN111967961B (en) Data mining method and device
CN113672498B (en) Automatic diagnosis test method, device and equipment
CN114090382B (en) Health inspection method and device for super-converged cluster
CN118101999B (en) Short video flow data analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination