WO2016090929A1 - Method, server and system for software system fault diagnosis - Google Patents

Method, server and system for software system fault diagnosis

Info

Publication number
WO2016090929A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
attribute
matching
software
database
Prior art date
Application number
PCT/CN2015/085932
Other languages
French (fr)
Chinese (zh)
Inventor
杜征
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016090929A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software

Definitions

  • The invention relates to the field of automated fault analysis for software systems, and in particular to a software system fault diagnosis method, a server, and a system.
  • Hardware problems are mainly caused by abnormal hardware components, which may result from hardware damage or design defects and which affect system operation. Such problems tend to be concentrated, and the set of problem symptoms and problem causes is relatively small, so they are easier to troubleshoot.
  • Software problems are more complicated. They are generally due to unreasonable configuration, configuration errors, incomplete network components, unreasonable resource planning, transmission or other network problems, and due to the network.
  • the embodiment of the present invention provides a software system fault diagnosis method, a server, and a system.
  • a software system fault diagnosis method provided by an embodiment of the present invention includes the following steps:
  • the step of performing the matching in the preset rule database according to the fault attribute, and generating the fault diagnosis decision list according to the matching degree between the fault attribute and the preset rule database further includes:
  • the fault attribute is sent to the fault analysis and rule development end for analysis;
  • the step of obtaining the fault attribute of the diagnosed software system by using the network management system further includes:
  • the fault attribute includes: a configuration attribute, an alarm attribute, a performance indicator attribute, a fault cause attribute, and a solution attribute;
  • a mapping relationship between the fault phenomenon and the combination of the fault cause attribute and the solution attribute is established, and the mapping relationship is entered into the fault rule database, where the fault phenomenon includes the configuration attribute, the alarm attribute, and the performance indicator attribute, and the fault cause corresponds one-to-one with the solution attribute;
  • the fault attribute database and the fault rule database are merged together into a rule database, and the data in the fault attribute database and the fault rule database correspond to each other.
  • the step of forming the fault attribute data record according to the verified fault attribute and inputting the fault attribute data record into the fault attribute database is specifically:
  • the fault attribute database includes:
  • a configuration attribute library, including a configuration attribute number, a software fault number queue, and a configuration table;
  • an alarm attribute library, including an alarm attribute number, a software fault number queue, and an alarm feature;
  • a performance indicator attribute library, including a performance indicator attribute number, a software fault number queue, and a performance indicator feature;
  • a fault cause attribute library, including a fault cause attribute number, a software fault number queue, and a fault cause description;
  • a solution attribute library, including a solution attribute number, a software fault number queue, and a solution description.
  • the fault rule database includes: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group, and a software defect identifier;
  • the configuration attribute includes a configuration attribute number and a configuration attribute weight
  • the alarm attribute includes an alarm attribute number and an alarm attribute weight
  • the performance indicator attribute includes a performance indicator number and a performance indicator weight
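  • The two databases described above can be pictured as plain record types. The following is a minimal sketch in Python, assuming hypothetical class and field names (`AttributeRecord`, `WeightedAttribute`, `FaultRule`, `RuleDatabase`); the source lists only the fields, not a concrete schema or storage engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributeRecord:
    """One row of an attribute library (configuration, alarm, indicator, cause, or solution)."""
    number: int                 # attribute number
    fault_numbers: List[int]    # software fault number queue
    content: str                # configuration table, alarm feature, indicator feature, or description


@dataclass
class WeightedAttribute:
    """An attribute reference inside a fault rule: attribute number plus weight."""
    number: int
    weight: float


@dataclass
class FaultRule:
    """One row of the fault rule database (field names are assumptions)."""
    fault_number: int
    fault_name: str
    configuration_group: List[WeightedAttribute] = field(default_factory=list)
    alarm_group: List[WeightedAttribute] = field(default_factory=list)
    indicator_group: List[WeightedAttribute] = field(default_factory=list)
    cause_group: List[int] = field(default_factory=list)
    solution_group: List[int] = field(default_factory=list)
    is_software_defect: bool = False   # software defect identifier


@dataclass
class RuleDatabase:
    """Rule database = fault attribute database + fault rule database."""
    attribute_libraries: Dict[str, Dict[int, AttributeRecord]]
    fault_rules: Dict[int, FaultRule]
```

  • Keeping the software fault number queue on each attribute record and the attribute groups on each rule is what lets the two databases be looked up in either direction.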
  • the step of performing matching in the preset rule database according to the fault attribute, and generating a fault diagnosis decision list according to the matching degree between the fault attribute and the preset rule database, includes:
  • the software fault number queues corresponding to the matched fault attributes are summarized and sorted to form a preliminary matching fault table; if the acquired fault attribute does not match any fault attribute, an unknown fault attribute table is formed;
  • the preliminary matching fault table includes a matched software fault number, a matched configuration attribute queue, a matched alarm attribute queue, and a matched performance indicator attribute queue, and each matched fault attribute queue is composed of matched fault attribute numbers;
  • the unknown fault attribute table includes unmatched configuration attributes, unmatched alarm attributes, and unmatched performance indicator attributes.
  • the fault attribute number includes a configuration attribute number, an alarm attribute number, and a performance indicator number;
  • the fault attribute weight includes a configuration attribute weight, an alarm attribute weight, and a performance indicator weight;
  • the matching faults are sorted by matching degree from large to small, and the fault cause attributes and solution attributes corresponding to the matching faults are extracted from the fault rule database to form a fault diagnosis decision list, wherein the fault diagnosis decision list includes a matched software fault number, a matched software fault name, a fault cause attribute, and a solution attribute.
  • an embodiment of the present invention further provides a software system fault diagnosis server, where the software system fault diagnosis server includes:
  • the fault attribute obtaining module is configured to acquire a fault attribute of the software system to be diagnosed through the network management system;
  • the matching decision module is configured to perform matching in the preset rule database according to the fault attribute, and generate a fault diagnosis decision list with a high to low matching degree.
  • the software system fault diagnosis server further includes a matching update module, and the matching update module is configured to:
  • the fault attribute is sent to the fault analysis and rule development end for analysis;
  • the software system fault diagnosis server further includes a database module, and the database module includes:
  • the attribute building unit is configured to form a fault attribute data record according to the verified fault attribute, and enter the fault attribute data record into the fault attribute database, wherein the fault attribute includes: a configuration attribute, an alarm attribute, a performance indicator attribute, a fault cause attribute, and a solution attribute;
  • the diagnosis building unit is configured to establish a mapping relationship between the fault phenomenon and the combination of the fault cause attribute and the solution attribute, and enter the mapping relationship into the fault rule database, wherein the fault phenomenon includes the configuration attribute, the alarm attribute, and the performance indicator attribute, and the fault cause corresponds one-to-one with the solution attribute;
  • the rule building unit is configured to merge the fault attribute database and the fault rule database into the rule database, and the data in the fault attribute database and the fault rule database correspond to each other.
  • the attribute building unit is further configured to:
  • all the verified fault attributes are stored as data records, organized into separate libraries, and entered into the fault attribute database, and the fault attribute database includes:
  • a configuration attribute library, including a configuration attribute number, a software fault number queue, and a configuration table;
  • an alarm attribute library, including an alarm attribute number, a software fault number queue, and an alarm feature;
  • a performance indicator attribute library, including a performance indicator attribute number, a software fault number queue, and a performance indicator feature;
  • a fault cause attribute library, including a fault cause attribute number, a software fault number queue, and a fault cause description;
  • a solution attribute library, including a solution attribute number, a software fault number queue, and a solution description.
  • the fault rule database includes: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group, and a software defect identifier;
  • the configuration attribute includes a configuration attribute number and a configuration attribute weight
  • the alarm attribute includes an alarm attribute number and an alarm attribute weight
  • the performance indicator attribute includes a performance indicator number and a performance indicator weight
  • the matching decision module includes:
  • the attribute matching unit is configured to match the acquired fault attributes against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database, respectively;
  • the preliminary matching unit is configured to summarize and sort the software fault number queues corresponding to the matched fault attributes to form a preliminary matching fault table, and, if the acquired fault attribute does not match any fault attribute, to form an unknown fault attribute table;
  • the preliminary matching fault table includes a matched software fault number, a matched configuration attribute queue, a matched alarm attribute queue, and a matched performance indicator attribute queue, and each matched fault attribute queue is composed of matched fault attribute numbers;
  • the unknown fault attribute table includes unmatched configuration attributes, unmatched alarm attributes, and unmatched performance indicator attributes.
  • the weight matching unit is configured to match the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database, and obtain the matching degree of each matching fault in the preliminary matching fault table, where
  • the fault attribute number includes a configuration attribute number, an alarm attribute number, and a performance indicator number;
  • the fault attribute weight includes a configuration attribute weight, an alarm attribute weight, and a performance indicator weight;
  • the decision matching unit is configured to sort the matching faults according to their matching degree in the preliminary matching fault table, and extract the fault cause attributes and solution attributes corresponding to the matching faults from the fault rule database to form a fault diagnosis decision list, wherein the fault diagnosis decision list includes a matched software fault number, a matched software fault name, a fault cause attribute, and a solution attribute.
  • an embodiment of the present invention further provides a software system fault diagnosis system.
  • the software system fault diagnosis system includes a software system diagnostic server, a software system client, and a fault analysis and rule development terminal.
  • the software system diagnostic server includes a fault attribute obtaining module, a matching decision module, and a matching update module, where
  • the fault attribute obtaining module is configured to acquire a fault attribute of the software system to be diagnosed through the network management system;
  • the matching decision module is configured to perform matching in the preset rule database according to the fault attribute, and generate a fault diagnosis decision list with high to low matching degree;
  • the matching update module is configured to:
  • the fault attribute is sent to the fault analysis and rule development end for analysis;
  • the matching update module is further configured to: receive the new rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attribute, and merge the new rule into the rule database;
  • the software system client is configured to provide a fault attribute to the software system diagnostic server, and receive the fault diagnosis decision list;
  • the fault analysis and rule development end is configured to receive the unmatched fault attribute sent by the software system diagnostic server, analyze the unmatched fault attribute to obtain a new rule, and merge the new rule into the rule database.
  • The fault attribute of the software system to be diagnosed is obtained through the network management system; the fault attribute includes a configuration attribute, an alarm attribute, a performance indicator attribute, a fault cause attribute, and a solution attribute. The verified combination of configuration attributes, alarm attributes, and performance indicator attributes of the software system to be diagnosed is then mapped to the corresponding fault cause and solution, and the mapping relationship is modeled and stored in a rule database comprising a software fault attribute database and a fault rule database. Finally, the fault attribute of the software system to be diagnosed is matched in the preset rule database, a fault diagnosis decision list is generated according to the matching degree between the fault attribute and the preset rule database, and the fault diagnosis decision list is sent to the client of the software system to be diagnosed to guide the operator in trying to recover from the fault. This implements intelligent diagnosis and repair of software system faults, real-time monitoring of software system faults, and online update of diagnostic rules, greatly improves the efficiency and automation of software fault diagnosis and repair, and improves the maintenance and improvement efficiency of the diagnostic system itself, thus solving the existing
  • FIG. 1 is a schematic flowchart of a first embodiment of a software system fault diagnosis method according to the present invention
  • FIG. 2 is a schematic flowchart of a second embodiment of a software system fault diagnosis method according to the present invention.
  • FIG. 3 is a schematic flowchart of a third embodiment of a software system fault diagnosis method according to the present invention.
  • FIG. 4 is a schematic flowchart of the step of performing matching in the preset rule database according to the fault attribute and generating a fault diagnosis decision list according to the matching degree between the fault attribute and the preset rule database;
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of a software system fault diagnosis server according to the present invention.
  • FIG. 6 is a schematic diagram of functional modules of a second embodiment of a software system fault diagnosis server according to the present invention.
  • FIG. 7 is a schematic diagram of functional modules of a third embodiment of a software system fault diagnosis server according to the present invention.
  • FIG. 8 is a schematic diagram of a refinement function module of the database module in FIG. 7;
  • FIG. 9 is a schematic diagram of a refinement function module of the matching decision module in FIG. 5;
  • FIG. 10 is a schematic diagram of functional modules of a software system fault diagnosis system according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a fault attribute database according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a fault rule database according to an embodiment of the present invention.
  • FIG. 13 is a system deployment diagram of a software system fault diagnosis system according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of a fault data analysis process according to an embodiment of the present invention.
  • FIG. 15 is a flowchart of software fault diagnosis based on a rule database according to an embodiment of the present invention.
  • FIG. 16 is a flowchart of interaction between a server program and a client program according to an embodiment of the present invention.
  • FIG. 17 is a flowchart of executing a diagnosis plan in an embodiment of the present invention.
  • FIG. 18 is a flowchart of a server-side update software fault attribute database according to an embodiment of the present invention.
  • the embodiment of the invention provides a software system fault diagnosis method.
  • FIG. 1 is a schematic flowchart diagram of a first embodiment of a software system fault diagnosis method according to the present invention.
  • the software system fault diagnosis method comprises the following steps:
  • Step S10: acquire a fault attribute of the software system to be diagnosed through the network management system.
  • Install and run the server program on the software system fault diagnosis server, that is, the network management server.
  • Install and run the client agent on the software system client, that is, the network management client.
  • The client agent edits the diagnostic task and the diagnostic plan through human-machine commands, and sends a message containing the diagnostic task and the diagnostic plan to the server program over TCP, so that the software system status is monitored in real time.
  • The server program runs on the network management server of the software system to be diagnosed; it obtains the diagnostic tasks and diagnostic plans sent by the client program, performs the diagnostic tasks, and outputs the diagnostic results to the client program.
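  • The source states only that the client agent sends a message containing the diagnostic task and diagnostic plan over TCP; it does not define the message format. As a minimal sketch, assuming a hypothetical format of a 4-byte big-endian length prefix followed by a JSON body, and hypothetical field names such as `object_level` and `period_seconds`, the exchange could look like this:

```python
import json
import socket
import struct


def send_message(sock, payload):
    """Send one length-prefixed JSON message (the framing is an assumption)."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock, count):
    """Read exactly `count` bytes; a stream socket may return fewer per recv call."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def demo():
    # socketpair() gives two connected stream sockets in one process;
    # it stands in for a real TCP connection between client agent and server program.
    client, server = socket.socketpair()
    try:
        send_message(client, {
            "task": {"object_level": "cell", "object_number": 7},   # hypothetical fields
            "plan": {"period_seconds": 300},                        # hypothetical fields
        })
        return recv_message(server)
    finally:
        client.close()
        server.close()
```

  • A length prefix is used here because TCP delivers a byte stream without message boundaries, so the receiver needs some way to know where one diagnostic message ends.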
  • Step S20 Perform matching in the preset rule database according to the fault attribute, and generate a fault diagnosis decision list according to the matching degree between the fault attribute and the preset rule database.
  • Matching is performed in the preset rule database to find the preset fault attribute that fits the acquired fault attribute, together with the fault cause and fault solution corresponding to that preset fault attribute; finally, the fault diagnosis decision list is generated according to the matching degree between the fault attribute and the preset fault attribute and the corresponding fault cause and fault solution.
  • The fault attribute of the software system to be diagnosed is obtained through the network management system; the fault attribute includes a configuration attribute, an alarm attribute, a performance indicator attribute, a fault cause attribute, and a solution attribute. The verified combination of configuration attributes, alarm attributes, and performance indicator attributes is then mapped to the corresponding fault causes and solutions, and the mapping relationship is modeled and stored in a rule database comprising a software fault attribute database and a fault rule database. Matching is then performed in the preset rule database according to the fault attribute of the software system to be diagnosed, a fault diagnosis decision list is generated according to the matching degree between the fault attribute and the preset rule database, and finally the fault diagnosis decision list is sent to the client of the software system to be diagnosed to guide the operator in trying to recover from the fault.
  • After step S20, the method further includes:
  • Step S30: when the fault attribute is not successfully matched in the preset rule database, the fault attribute is sent to the fault analysis and rule development end for analysis;
  • Step S40: receive the new rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attribute, and merge the new rule into the rule database.
  • New fault attributes and fault diagnosis rules are edited and formulated through human-machine commands, and a message containing the fault attributes and fault diagnosis rules is synchronized to the server program over TCP.
  • The matching process between the fault attribute and the preset rule database is also judged. If the matching is unsuccessful, the unmatched fault attributes (for example, the unknown fault table and the software defect table) are sent to the fault analysis and rule development end, where system developers analyze them and edit new fault attributes and fault diagnosis rules, which are then fed back to the server to update the fault attribute database and the fault rule database. In this way, intelligent diagnosis and repair of software system faults is realized, and the software system is monitored automatically.
  • The fault diagnosis rules are continuously improved during system operation, which greatly improves the efficiency and automation of software fault diagnosis and repair.
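  • The feedback loop of steps S30 and S40 can be sketched as follows. This is an illustrative simplification, not the patented implementation: the rule database is reduced to a dictionary from an attribute key to a list of software fault numbers, and the function names and the `pending_analysis` list (standing in for what is sent to the fault analysis and rule development end) are assumptions.

```python
def diagnose_or_escalate(fault_attributes, rule_database, pending_analysis):
    """Return matched attributes; queue unmatched ones for developer analysis (step S30)."""
    matched = {a: rule_database[a] for a in fault_attributes if a in rule_database}
    unmatched = [a for a in fault_attributes if a not in rule_database]
    pending_analysis.extend(unmatched)
    return matched


def merge_new_rules(rule_database, new_rules, pending_analysis):
    """Merge rules returned by the development end into the rule database (step S40)."""
    rule_database.update(new_rules)
    # Attributes that are now covered by a rule no longer need analysis.
    pending_analysis[:] = [a for a in pending_analysis if a not in rule_database]
```

  • After `merge_new_rules` runs, an attribute that previously failed to match is diagnosed directly on its next occurrence, which is the online rule update the text describes.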
  • FIG. 3 is a schematic flowchart of a third embodiment of a software system fault diagnosis method according to the present invention. Referring to FIG. 3 together with FIG. 11 and FIG. 12, in the third embodiment, before step S20, the method further includes:
  • Step S50: form a fault attribute data record according to the verified fault attribute, and enter the fault attribute data record into the fault attribute database, where the fault attribute includes: a configuration attribute, an alarm attribute, a performance indicator attribute, a fault cause attribute, and a solution attribute.
  • step S50 is specifically:
  • the fault attribute database includes:
  • a configuration attribute library, including a configuration attribute number, a software fault number queue, and a configuration table;
  • an alarm attribute library, including an alarm attribute number, a software fault number queue, and an alarm feature;
  • a performance indicator attribute library, including a performance indicator attribute number, a software fault number queue, and a performance indicator feature;
  • a fault cause attribute library, including a fault cause attribute number, a software fault number queue, and a fault cause description;
  • a solution attribute library, including a solution attribute number, a software fault number queue, and a solution description.
  • Step S60: a mapping relationship between the fault phenomenon and the combination of the fault cause attribute and the solution attribute is established, and the mapping relationship is entered into the fault rule database, wherein the fault phenomenon includes the configuration attribute, the alarm attribute, and the performance indicator attribute, and the fault cause corresponds one-to-one with the solution attribute.
  • the fault rule database includes: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group, and a software defect identifier;
  • the configuration attribute includes the configuration attribute number and the configuration attribute weight
  • the alarm attribute includes the alarm attribute number and the alarm attribute weight
  • the performance indicator attribute includes the performance indicator number and the performance indicator weight.
  • Step S70: the fault attribute database and the fault rule database are merged into the rule database, and the data in the fault attribute database and the fault rule database correspond to each other.
  • The software fault attributes are associated with the software faults to form the fault rule database of the software system, which consists of the software fault number, the software fault name, and the corresponding configuration attribute group, alarm attribute group, performance indicator attribute group, fault cause attribute group, solution attribute group, and software defect identifier. Each attribute queue belonging to a software fault is composed of attribute numbers and attribute weights. The software fault attribution relation library and the elements in each attribute library have a many-to-many relationship, and index tables are established between them.
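  • The many-to-many relationship and the mutual index tables mentioned above can be illustrated with two dictionaries built from the fault rules. This is a sketch under assumed data shapes (`fault_rules` maps a fault number to per-category lists of `(attribute_number, weight)` pairs); the source does not specify how the index tables are stored.

```python
from collections import defaultdict


def build_indexes(fault_rules):
    """
    fault_rules: {fault_number: {category: [(attribute_number, weight), ...]}}
    Returns (attribute -> fault numbers, fault number -> attributes).
    """
    attribute_to_faults = defaultdict(list)
    fault_to_attributes = defaultdict(list)
    for fault_number, groups in sorted(fault_rules.items()):
        for category, attributes in groups.items():
            for attribute_number, _weight in attributes:
                key = (category, attribute_number)
                attribute_to_faults[key].append(fault_number)   # plays the role of the software fault number queue
                fault_to_attributes[fault_number].append(key)
    return dict(attribute_to_faults), dict(fault_to_attributes)
```

  • The first index answers "which faults does this attribute point to", which is what the preliminary matching needs; the second answers "which attributes define this fault", which is what the weight calculation needs.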
  • FIG. 4 is a schematic diagram of the refinement process of step S20 in FIG. 1.
  • step S20 includes:
  • Step S201: match the acquired fault attributes against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database, respectively;
  • the server agent matches the acquired fault attribute data against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database.
  • Step S202: the software fault number queues corresponding to the matched fault attributes are summarized and sorted to form a preliminary matching fault table; if the acquired fault attribute does not match any fault attribute, an unknown fault attribute table is formed. The preliminary matching fault table includes the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue, and the matched performance indicator attribute queue, and each matched fault attribute queue is composed of matched fault attribute numbers. The unknown fault attribute table includes unmatched configuration attributes, unmatched alarm attributes, and unmatched performance indicator attributes;
  • the software fault number queues corresponding to the matched fault attributes are summarized and sorted to form a preliminary matching fault table; if the corresponding fault attribute does not match any software attribute, an unknown fault attribute table is formed. The preliminary matching fault table is composed of a matched software fault number, a matched configuration attribute queue, a matched alarm attribute queue, and a matched performance indicator attribute queue, and each matched attribute queue is composed of matched attribute numbers. The unknown fault attribute table is composed of unmatched configuration data, unmatched alarm data, and unmatched performance indicator data.
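  • Steps S201 and S202 can be sketched as a single pass over the observed data. The data shapes are assumptions made for illustration: each attribute library maps an observed feature string to `(attribute_number, software_fault_number_queue)`, and matching is reduced to an exact lookup, whereas a real system would presumably compare configuration tables, alarm features, and indicator features in a richer way.

```python
def preliminary_match(observed, attribute_libraries):
    """
    observed: {category: [feature, ...]} for "configuration", "alarm", "indicator"
    attribute_libraries: {category: {feature: (attribute_number, [fault_number, ...])}}
    Returns (preliminary matching fault table, unknown fault attribute table).
    """
    matching_table = {}                                  # fault number -> {category: [attribute numbers]}
    unknown_table = {category: [] for category in observed}
    for category, features in observed.items():
        library = attribute_libraries.get(category, {})
        for feature in features:
            if feature not in library:
                unknown_table[category].append(feature)  # goes to developer analysis
                continue
            attribute_number, fault_numbers = library[feature]
            for fault_number in fault_numbers:
                row = matching_table.setdefault(fault_number, {})
                row.setdefault(category, []).append(attribute_number)
    # Sort by software fault number so the table has a stable order.
    return dict(sorted(matching_table.items())), unknown_table
```

  • In this sketch every unmatched feature is recorded in the unknown fault attribute table, which is one reading of the text; the table then holds exactly the data that system developers would need in order to write a new rule.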
  • Step S203: match the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database, and obtain the matching degree of each matching fault in the preliminary matching fault table, wherein the fault attribute number includes the configuration attribute number, the alarm attribute number, and the performance indicator number, and the fault attribute weight includes the configuration attribute weight, the alarm attribute weight, and the performance indicator weight;
  • the server-side agent calculates the matching degree of each matching fault in the preliminary matching fault table according to the weight that each fault attribute (identified by its fault attribute number) carries within the corresponding software fault in the software fault attribution relationship, reorders the faults by matching degree, and extracts the fault cause and solution attributes from the fault rule database to form a fault diagnosis decision table.
  • Step S204: sort the matching faults according to their matching degree in the preliminary matching fault table, and extract the fault cause attributes and solution attributes corresponding to the matching faults from the fault rule database to form the fault diagnosis decision list, wherein the fault diagnosis decision list includes a matched software fault number, a matched software fault name, a fault cause attribute, and a solution attribute.
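  • Steps S203 and S204 can be sketched as below. The source does not give the formula for the matching degree; this sketch assumes it is the sum of the weights of the matched attributes divided by the sum of all attribute weights in the rule. The function name and the dictionary layout of `fault_rules` are also assumptions.

```python
def build_decision_list(matching_table, fault_rules):
    """
    matching_table: {fault_number: {category: [attribute_number, ...]}}
    fault_rules: {fault_number: {"name": str,
                                 "weights": {(category, attribute_number): weight},
                                 "cause": str, "solution": str}}
    """
    decisions = []
    for fault_number, matched in matching_table.items():
        rule = fault_rules[fault_number]
        total_weight = sum(rule["weights"].values())
        matched_weight = sum(
            rule["weights"].get((category, number), 0.0)
            for category, numbers in matched.items()
            for number in numbers
        )
        # Assumed definition: share of the rule's total weight that was observed.
        degree = matched_weight / total_weight if total_weight else 0.0
        decisions.append({
            "fault_number": fault_number,
            "fault_name": rule["name"],
            "matching_degree": degree,
            "cause": rule["cause"],
            "solution": rule["solution"],
        })
    # Highest matching degree first, as in step S204.
    decisions.sort(key=lambda d: d["matching_degree"], reverse=True)
    return decisions
```

  • Normalizing by the rule's total weight keeps the matching degree between 0 and 1, so faults defined by many attributes are not favored over faults defined by few; other definitions are equally compatible with the text.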
  • The fault diagnosis decision table is composed of the matched software fault number, the matched software fault name, the fault cause attribute, and the solution attribute. In addition, while matching against the fault attribute database and the fault rule database, it is judged whether a software defect is present; if there is a software defect record, a software defect table is formed, consisting of the preliminary matching records identified as software defects and the matching attribute data.
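  • Forming the software defect table amounts to filtering the preliminary matching records by the software defect identifier. A minimal sketch, assuming a hypothetical `defect_flags` dictionary that carries the software defect identifier of each fault rule:

```python
def build_software_defect_table(matching_table, defect_flags, observed):
    """
    matching_table: {fault_number: {category: [attribute_number, ...]}}
    defect_flags: {fault_number: bool}, the software defect identifier per fault rule
    observed: the raw attribute data that produced the match
    """
    defect_table = []
    for fault_number, matched in matching_table.items():
        if defect_flags.get(fault_number, False):
            defect_table.append({
                "fault_number": fault_number,
                "matched_attributes": matched,   # preliminary matching record
                "attribute_data": observed,      # matching attribute data for developers
            })
    return defect_table
```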
  • According to the target requested by the client, the server program automatically obtains the network management data (fault attributes) of the corresponding target, matches it against the rule base, forms a fault diagnosis decision table and execution suggestions according to the matching degree, and returns them to the client. For faults that cannot be matched or that match a software defect, it notifies and sends the fault data.
  • The server agent first obtains the corresponding network management data, either according to the fault diagnosis task sent by the client or by executing a periodic monitoring plan. It then matches the data against the fault attribute database and the fault diagnosis database to form the unknown fault attribute table, the final fault decision table, and the software defect table; it sends the final fault decision table to the client agent, and sends the unknown fault attribute table and the software defect table to the fault analysis and rule development client program;
  • the server agent receives new fault attributes and software fault diagnosis rules from the fault analysis and rule development client program, and synchronously updates the fault attribute database and the fault diagnosis database.
  • the network management data corresponding to the known fault of the software system to be diagnosed including the configuration data, the alarm data, and the performance indicator data, and the fault cause and solution corresponding to the known fault are regarded as five attributes.
  • the organization is a software fault diagnosis rule, and all the rules are organized into a library to form a software fault diagnosis rule base, and the five attributes are organized into a fault attribute database; the two libraries are deployed on the network management server of the software system to be diagnosed;
  • the server agent is deployed on the network management server, the client agent is deployed on the client PC, and the fault analysis and rule development client is deployed on a server of the developer of the system to be diagnosed.
  • the server-side agent obtains the diagnosis result by acquiring data from the network management system according to the diagnosis task and matching it against the fault attribute database and the fault diagnosis rule base. It feeds the result back to the client agent, so that client operators can perform recovery measures, and to the fault analysis and rule development client, so that system developers can analyze software failures.
  • the software system fault diagnosis method of the embodiment of the present invention is further described in detail below with reference to FIG. 13 to FIG. 18, and the method includes:
  • Step 1 The server agent receives the diagnosis object sent by the client, or the diagnostic plan periodic timer expires, and the diagnosis process starts;
  • Step 2 The server agent determines the diagnostic object level and the object number according to the content of the diagnosis object in the diagnosis object or the diagnosis plan, and extracts configuration data, alarm data, and performance indicator data of the corresponding object number in the network management;
  • Step 3 Match the extracted configuration data with the configuration attribute database in the fault attribute database, record the matching configuration attribute number, calculate the matching weight A, and extract the software fault attribute group corresponding to the configuration attribute;
  • Step 4 Match the extracted alarm data with the alarm attribute database in the fault attribute database, record the matched alarm attribute number, calculate the matching weight B, and extract the software fault attribute group corresponding to the alarm attribute;
  • Step 5 Match the extracted performance indicator data with the performance indicator attribute database in the fault attribute database, record the matched performance indicator attribute number, calculate the matching weight C, and extract the software fault attribute group corresponding to the performance indicator attribute;
  • Step 6 Summarize the matched software fault attribute groups, and use the software fault number as an index to summarize the matched matching configuration attribute number group, the alarm attribute number group, the performance indicator number group, and the matching weights of each attribute;
  • Step 7 Calculate and record the final matching value Z of each software fault matched in the previous step, according to the software fault recorded in the fault diagnosis rule base and the weights of the corresponding state attributes (A ⁇ , B ⁇ , C ⁇ ), and sort by the final matching value Z to form a preliminary matching fault table;
  • Step 8 If the preliminary matching fault table is empty and the system is abnormal, summarize the fault data to form an unknown fault attribute table. If the preliminary matching fault table is not empty, extract the fault cause attribute and the solution attribute corresponding to each software fault recorded in the fault diagnosis rule base to form a software fault decision table;
  • Step 9 According to the software defect attribute corresponding to each software fault recorded in the fault diagnosis rule base, extract the software fault attribute records confirmed as software defects, together with the matched fault data, to form a software defect table;
  • Step 10 Send the software fault decision table to the client agent through the network system of the network management; send the software defect table and the unknown software fault attribute table to the fault analysis and rule development client;
  • Step 11 The diagnosis process ends.
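  • Steps 3 through 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source gives no formula for the final matching value Z, so the sketch assumes Z is the summed rule weight of the matched attributes divided by the summed weight of all attributes in the rule. The predicate-style matching rules and the names `AttributeRecord`, `match_library`, and `score_faults` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttributeRecord:
    number: str                      # attribute number within its library
    fault_numbers: list              # software fault number queue
    matches: Callable[[dict], bool]  # hypothetical developer-edited matching rule


def match_library(data: dict, library: list) -> list:
    """Steps 3-5: return the library records whose rule matches the extracted data."""
    return [rec for rec in library if rec.matches(data)]


def score_faults(matched: list, rule_weights: dict) -> list:
    """Steps 6-7: group matched attribute numbers by fault number and rank by Z.

    rule_weights maps fault number -> {attribute number: weight}.
    Z is ASSUMED to be (matched weight) / (total weight); the source gives no formula.
    """
    by_fault = defaultdict(set)
    for rec in matched:
        for fault in rec.fault_numbers:
            by_fault[fault].add(rec.number)

    table = []
    for fault, attrs in by_fault.items():
        weights = rule_weights.get(fault, {})
        total = sum(weights.values())
        hit = sum(w for attr, w in weights.items() if attr in attrs)
        z = hit / total if total else 0.0
        table.append((fault, sorted(attrs), z))

    # Highest matching value first: this is the preliminary matching fault table.
    table.sort(key=lambda row: row[2], reverse=True)
    return table


# Hypothetical libraries and extracted network management data.
config_lib = [AttributeRecord("C1", ["F1", "F2"], lambda d: d.get("mtu", 1500) < 576)]
alarm_lib = [AttributeRecord("A1", ["F1"], lambda d: "LINK_DOWN" in d.get("alarms", []))]
data = {"mtu": 512, "alarms": ["LINK_DOWN"]}

matched = match_library(data, config_lib) + match_library(data, alarm_lib)
rules = {"F1": {"C1": 2, "A1": 2}, "F2": {"C1": 1, "P1": 3}}
preliminary = score_faults(matched, rules)
```

  • An empty result from `score_faults` corresponds to the empty preliminary matching fault table of Step 8, which is the case where the unknown fault attribute table would be formed.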
  • step 3 specifically includes the following steps:
  • Step 3.1 Match the extracted configuration data sequentially against the configuration attribute library in the fault attribute database. Each configuration attribute record in the library is a matching rule edited by the developer, and the specific expression is as follows:
  • Step 3.2 After matching against the entire configuration attribute library is completed, use the configuration attribute number as the index and summarize the matched configuration attribute numbers together with their matching weights and the corresponding software fault numbers. The matching process for the other attributes is the same as for the configuration attribute.
  • the fault attribute database and the fault rule database are formed by establishing the correspondence between the alarm attributes, performance attributes, and configuration attributes of the software system to be diagnosed and the fault cause attributes and solution attributes, then modeling this correspondence and storing it in the databases.
  • software fault diagnosis, task management, and human-machine interaction are divided between a server program and a client agent.
  • the client agent, by creating a fault diagnosis task or formulating a fault diagnosis plan, triggers the server to obtain the software system fault data and analyze it to generate fault diagnosis results.
  • the results are divided into a final fault diagnosis decision table, an unknown fault table, and a software defect table. The final fault diagnosis decision table is fed back to the client to guide the operator in trying to recover from the fault; the unknown fault table and the software defect table are fed back to the fault analysis and rule development client.
  • the fault analysis and rule development client is used by the system developer to analyze the faults and edit new fault attributes and fault diagnosis rules, which are then fed back to the server to update the software fault diagnosis attribute database and the software fault diagnosis rule base. In this way, intelligent diagnosis and repair of software faults and automatic monitoring of software system faults are realized, and the fault diagnosis rules can be continuously improved while the system is running, which greatly improves the efficiency and automation of software fault diagnosis and repair.
  • the embodiment of the present invention further provides an interaction process among a software system fault diagnosis server (i.e., the server end), a software system fault diagnosis client (i.e., the client), and a fault analysis and rule development client; the specific steps are as follows:
  • Step a the client agent organizes a diagnostic task or a diagnosis plan, encapsulates it into a command message, and sends it to the server agent;
  • Step b the server agent receives the command message sent by the client and decodes, if it is a diagnostic task, triggers the diagnosis process, and returns the diagnosis result; if it is a diagnosis plan, updates the diagnosis plan, and returns the diagnosis plan update result;
  • Step c The client agent receives the diagnosis result and displays it on the man-machine interface;
  • Step d the client agent receives the diagnosis plan update result, and displays it to the man-machine interface;
  • Step e If there is a software defect record, the server agent sends a software defect message to the fault analysis and rule development client through FTP.
  • Step f If the diagnosis result does not match any known attribute, the server agent notifies the client that an unknown fault has been found, organizes the unknown fault attribute table, and sends it to the fault analysis and rule development client through FTP.
  • Step g If the server-side agent detects that the diagnostic plan timer has timed out, the diagnostic plan is executed, and the steps are the same as steps a-f.
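  • Steps a and b can be sketched as a small encode-and-dispatch routine. The source does not specify the format of the command message, so JSON is used here purely as an assumption, and the names `encode_command` and `ServerAgent` are hypothetical.

```python
import json


def encode_command(kind: str, payload: dict) -> bytes:
    """Step a: encapsulate a diagnostic task or plan into a command message.

    JSON encoding is an ASSUMPTION; the source does not define the wire format.
    """
    if kind not in ("task", "plan"):
        raise ValueError(f"unknown command kind: {kind}")
    return json.dumps({"kind": kind, "payload": payload}).encode("utf-8")


class ServerAgent:
    def __init__(self, diagnose):
        self.diagnose = diagnose  # callable that runs the diagnosis process
        self.plan = None          # most recently received diagnosis plan

    def handle(self, raw: bytes) -> dict:
        """Step b: decode the command, then diagnose or update the plan."""
        msg = json.loads(raw.decode("utf-8"))
        if msg["kind"] == "task":
            return {"type": "diagnosis_result", "result": self.diagnose(msg["payload"])}
        if msg["kind"] == "plan":
            self.plan = msg["payload"]
            return {"type": "plan_update_result", "ok": True}
        return {"type": "error", "reason": "unknown command"}


# Hypothetical usage: a stub diagnosis that always reports fault F1.
agent = ServerAgent(diagnose=lambda target: ["F1"])
task_reply = agent.handle(encode_command("task", {"object": "cell-7"}))
plan_reply = agent.handle(encode_command("plan", {"period_s": 3600}))
```

  • Keeping the two reply types distinct mirrors Steps c and d, where the client agent displays a diagnosis result and a plan update result separately.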
  • an embodiment of the present invention further provides a process for updating a software fault attribute database of a server end, and the specific steps are as follows:
  • Step A The system developer edits the new fault attribute and the fault diagnosis rule through the fault analysis and rule development client, which encapsulates them into a fault diagnosis rule message and sends the message to the server agent.
  • the fault attribute includes the configuration attribute, alarm attribute, and performance indicator attribute numbers and the fault diagnosis primitive.
  • the fault diagnosis rule includes the software fault number, the five attributes, and the weight of each attribute in the software fault.
  • Step B The server-side agent receives the fault diagnosis rule message and decodes it, and updates to the fault attribute database and the fault diagnosis rule database respectively;
  • Step C The server-side agent sends a fault rule update result message to the fault analysis and rule development client.
  • Step B specifically includes the following steps:
  • Step B.1 The server determines from the received fault attribute number whether the attribute is new. If it is new, the server adds a record directly to the fault attribute database; if not, it adds the fault diagnosis primitive to the existing record;
  • Step B.2 The server determines from the received software fault number whether the software fault is new. If it is new, the server adds a record directly to the fault diagnosis rule database; if not, it updates the fault matching data on the existing record.
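  • Steps B.1 and B.2 amount to an insert-or-update on each database. The sketch below uses in-memory dictionaries as stand-ins for the two databases; the record layout and the function names are assumptions made for illustration only.

```python
def update_fault_attribute(attribute_db: dict, number: str, primitive: str) -> bool:
    """Step B.1: add a new record, or append the primitive to the existing record.

    Returns True when a new record was created.
    """
    record = attribute_db.get(number)
    if record is None:
        attribute_db[number] = {"primitives": [primitive]}
        return True
    if primitive not in record["primitives"]:
        record["primitives"].append(primitive)
    return False


def update_fault_rule(rule_db: dict, fault_number: str, matching_data: dict) -> bool:
    """Step B.2: add a new rule, or update the matching data of the existing rule.

    Returns True when a new rule was created.
    """
    is_new = fault_number not in rule_db
    rule_db.setdefault(fault_number, {}).update(matching_data)
    return is_new


# Hypothetical usage with empty databases.
attribute_db, rule_db = {}, {}
created_attr = update_fault_attribute(attribute_db, "C1", "mtu < 576")
appended_attr = update_fault_attribute(attribute_db, "C1", "mtu > 9000")
created_rule = update_fault_rule(rule_db, "F1", {"C1": 2})
updated_rule = update_fault_rule(rule_db, "F1", {"A1": 2})
```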
  • FIG. 5 is a schematic diagram of functional modules of the first embodiment of the software system fault diagnosis server according to the present invention.
  • the software system fault diagnosis server comprises:
  • the fault attribute obtaining module 10 is configured to acquire a fault attribute of the software system to be diagnosed through the network management system;
  • the server program is installed and run on the software system fault diagnosis server, that is, the network management server.
  • the client agent is installed and run on the software system client, that is, the network management client.
  • the client agent edits diagnostic tasks and diagnostic plans through human-machine commands and sends messages containing them to the server program over TCP, so that the software system status is monitored in real time.
  • the server program runs on the network management server of the software system to be diagnosed, obtains the diagnostic tasks and diagnostic plans sent by the client program, performs the diagnosis, and outputs the diagnostic results to the client program.
  • the matching decision module 20 is configured to perform matching in the preset rule database according to the fault attribute and to generate a fault diagnosis decision list ordered from high to low matching degree.
  • matching is performed in the preset rule database to find the preset fault attributes that fit the acquired fault attribute, and the fault cause and fault solution corresponding to those preset fault attributes are looked up. Finally, a fault diagnosis decision list is generated according to the matching degree between the fault attribute and the preset fault attributes, together with the corresponding fault causes and fault solutions.
  • the fault attribute of the software system to be diagnosed is obtained through the network management system; the fault attribute includes a configuration attribute, an alarm attribute, a performance indicator attribute, a fault cause attribute, and a solution attribute. The verified combinations of configuration attributes, alarm attributes, and performance indicator attributes of the software system are mapped to the corresponding fault causes and solutions, and this mapping is modeled and stored in a rule database comprising a software fault attribute database and a fault rule database. Matching is then performed in the preset rule database according to the fault attribute of the diagnosed software system, a fault diagnosis decision list is generated according to the matching degree between the fault attribute and the preset rule database, and the list is sent to the client of the software system to be diagnosed to guide the operator in trying to recover from the fault.
  • Each of the above units may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.
  • FIG. 6 is a schematic diagram of functional modules of a second embodiment of a software system fault diagnosis server according to the present invention.
  • the software system fault diagnosis server further includes a matching update module 30, and the match update module 30 is configured to:
  • send the fault attribute to the fault analysis and rule development end for analysis when the fault attribute fails to match the preset rule database;
  • receive the new processing rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attribute, and merge the new processing rule into the rule database.
  • the fault analysis and rule development client edits and formulates new fault attributes and fault diagnosis rules through human-machine commands, and synchronizes messages containing the fault attributes and fault diagnosis rules to the server program over TCP.
  • the outcome of matching the fault attribute against the preset rule database is also checked. If the matching is unsuccessful, the unmatched fault attributes (for example, an unknown fault table and a software defect table) are sent to the fault analysis and rule development end, where the system developer analyzes them and edits new fault attributes and fault diagnosis rules, which are then fed back to the server to update the fault attribute database and the fault rule database.
  • in this way, intelligent diagnosis and repair of software system faults and automatic monitoring of the software system are realized, and the fault diagnosis rules are continuously improved during system operation, which greatly improves the efficiency and automation of software fault diagnosis and repair.
  • FIG. 7 is a schematic diagram of a functional module of a software system fault diagnosis server according to a third embodiment of the present invention.
  • the software system fault diagnosis server further includes a database module 40, and the database module 40 includes:
  • the attribute building unit 401 is configured to form a fault attribute data record according to the verified fault attribute, and record the fault attribute data record into the fault attribute database, wherein the fault attribute includes: a configuration attribute, an alarm attribute, a performance indicator attribute, Fault cause attribute and solution attribute;
  • the attribute database unit 401 is further configured to:
  • the fault attribute database includes:
  • the configuration attribute library, including the configuration attribute number, the software fault number queue, and the configuration table
  • the alarm attribute library, including the alarm attribute number, the software fault number queue, and the alarm feature
  • the performance indicator attribute library, including the performance indicator attribute number, the software fault number queue, and the performance indicator feature
  • the fault cause attribute library, including the fault cause attribute number, the software fault number queue, and the fault cause description
  • the solution attribute library, including the solution attribute number, the software fault number queue, and the solution description.
  • the diagnostic building unit 402 is configured to establish a mapping relationship between the fault phenomenon and the combination of fault cause attribute and solution attribute, and to enter the mapping relationship into the fault rule database, wherein the fault phenomenon includes configuration attributes, alarm attributes, and performance indicator attributes, and fault causes correspond to solution attributes one to one;
  • the fault rule database includes: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group, and a software defect identifier;
  • the configuration attribute includes the configuration attribute number and the configuration attribute weight.
  • the alarm attribute includes the alarm attribute number and the alarm attribute weight.
  • the performance indicator attribute includes the performance indicator number and the performance indicator weight.
  • the rule building unit 403 is configured to merge the fault attribute database and the fault rule database into the rule database, and the data in the fault attribute database and the fault rule database correspond to each other.
  • the software fault attributes are associated with software faults to form the fault rule database of the software system, which consists of the software fault number, the software fault name, the corresponding configuration attribute group, alarm attribute group, performance indicator attribute group, fault cause attribute group, and solution attribute group, and the software defect identifier.
  • each attribute queue belonging to a software fault is composed of attribute numbers and attribute weights; the software fault attribution relation library and the elements in each attribute library have a many-to-many relationship, and index tables are established in both directions.
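  • The record layout of the fault rule database and its index table can be sketched with Python dataclasses. The field names below paraphrase the fields listed above and are otherwise assumptions; `build_index` is a hypothetical helper that derives the attribute-to-fault direction of the many-to-many relationship from the rules.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRule:
    fault_number: str
    fault_name: str
    # Each attribute group maps attribute number -> attribute weight.
    config_attrs: dict = field(default_factory=dict)
    alarm_attrs: dict = field(default_factory=dict)
    perf_attrs: dict = field(default_factory=dict)
    cause_attrs: list = field(default_factory=list)     # fault cause attribute numbers
    solution_attrs: list = field(default_factory=list)  # solution attribute numbers
    is_software_defect: bool = False                    # software defect identifier


def build_index(rules: list) -> dict:
    """Map each attribute number to the sorted fault numbers that reference it."""
    index = {}
    for rule in rules:
        for group in (rule.config_attrs, rule.alarm_attrs, rule.perf_attrs):
            for attr_number in group:
                index.setdefault(attr_number, set()).add(rule.fault_number)
    return {attr: sorted(faults) for attr, faults in index.items()}


# Hypothetical rules: attribute C1 belongs to two different software faults.
rules = [
    FaultRule("F1", "link flapping", config_attrs={"C1": 2}, alarm_attrs={"A1": 2}),
    FaultRule("F2", "fragmentation loss", config_attrs={"C1": 1}, is_software_defect=True),
]
index = build_index(rules)
```

  • The resulting index plays the role of the software fault number queue stored with each attribute record, so a matched attribute leads directly to its candidate faults.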
  • the matching decision module 20 includes:
  • the attribute matching unit 201 is configured to match the acquired fault attributes against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database, respectively;
  • the server agent matches the acquired fault attribute data with the alarm attribute library, the configuration attribute library, and the performance indicator attribute database in the fault attribute database.
  • the preliminary matching unit 202 is configured to summarize and sort the software fault number queues corresponding to the matched fault attributes to form a preliminary matching fault table; if the acquired fault attribute does not match any fault attribute, an unknown fault attribute table is formed, wherein:
  • the preliminary matching fault table includes the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue, and the matched performance indicator attribute queue.
  • each matched fault attribute queue is composed of the matched fault attribute numbers;
  • the unknown fault attribute table includes the unmatched configuration attributes, unmatched alarm attributes, and unmatched performance indicator attributes;
  • the software fault number queues corresponding to the matched fault attributes are summarized and sorted to form a preliminary matching fault table; if the fault attribute does not match any software fault attribute, an unknown fault attribute table is formed. The preliminary matching fault table consists of the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue, and the matched performance indicator attribute queue, and each matched attribute queue consists of the matched attribute numbers. The unknown fault attribute table consists of the unmatched configuration data, unmatched alarm data, and unmatched performance indicator data.
  • the weight matching unit 203 is configured to look up, in the fault rule database, the fault attribute weight for each fault attribute number in the preliminary matching fault table, and to obtain the matching degree of each matched fault in the preliminary matching fault table, where the fault attribute numbers include the configuration attribute number, the alarm attribute number, and the performance indicator number.
  • the fault attribute weights include the configuration attribute weight, the alarm attribute weight, and the performance indicator weight.
  • the server-side agent calculates the matching degree of each matched fault in the matching fault table according to the weight that each fault attribute (identified by its fault attribute number) carries in the corresponding software fault in the software fault attribution relationship, reorders the table by matching degree, and extracts the fault cause and solution attributes from the fault rule database to form a fault diagnosis decision table.
  • the decision matching unit 204 is configured to sort the matched faults according to their matching degree in the preliminary matching fault table, and to extract the fault cause attribute and the solution attribute corresponding to each matched fault from the fault rule database to form a fault diagnosis decision list, wherein the fault diagnosis decision list includes the matched software fault number, the matched software fault name, the fault cause attribute, and the solution attribute.
  • the fault diagnosis decision table consists of the matched software fault number, the matched software fault name, the fault cause attribute, and the solution attribute. In addition, while matching against the fault attribute database and the fault rule database, the server determines whether a software defect is present; if so, the corresponding records are extracted to form a software defect table consisting of the preliminary matching records identified as software defects and the matched attribute data.
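  • Forming the fault diagnosis decision table and the software defect table from a preliminary matching fault table can be sketched as below. The dictionary layout of the rule database and the name `build_tables` are assumptions; the sketch also assumes that a fault flagged as a software defect still appears in the decision table, which the source does not state explicitly.

```python
def build_tables(preliminary: list, rule_db: dict) -> tuple:
    """Build the decision table and the software defect table.

    preliminary: rows of (fault number, matched attribute numbers, matching degree).
    rule_db: fault number -> {"name", "cause", "solution", "is_defect"} (assumed layout).
    """
    ranked = sorted(preliminary, key=lambda row: row[2], reverse=True)
    decision_table, defect_table = [], []
    for fault_number, matched_attrs, _degree in ranked:
        rule = rule_db[fault_number]
        decision_table.append({
            "fault_number": fault_number,
            "fault_name": rule["name"],
            "cause": rule["cause"],
            "solution": rule["solution"],
        })
        if rule["is_defect"]:
            # Records identified as software defects go to the developers with their data.
            defect_table.append({"fault_number": fault_number,
                                 "matched_attributes": matched_attrs})
    return decision_table, defect_table


# Hypothetical rule database and preliminary table (deliberately unsorted).
rule_db = {
    "F1": {"name": "link flapping", "cause": "bad cable",
           "solution": "replace cable", "is_defect": False},
    "F2": {"name": "fragmentation loss", "cause": "MTU handling bug",
           "solution": "apply patch", "is_defect": True},
}
preliminary = [("F2", ["C1"], 0.25), ("F1", ["A1", "C1"], 1.0)]
decision_table, defect_table = build_tables(preliminary, rule_db)
```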
  • the server program automatically acquires the network management data (fault attributes) of the corresponding target according to the client's analysis target requirement, matches it against the rule base, forms a fault diagnosis decision table and execution suggestions according to the matching degree, and returns them to the client. For faults that cannot be matched, or that match a software defect, it notifies the fault analysis and rule development client and sends it the fault data.
  • specifically, the server agent first obtains the network management data of the corresponding object, either according to the fault diagnosis task sent by the client or by executing a periodic monitoring plan, and matches it against the fault attribute database and the fault diagnosis database to form an unknown fault attribute table, a final fault decision table, and a software defect table. It sends the final fault decision table to the client agent, and sends the unknown fault attribute table and the software defect table to the fault analysis and rule development client program.
  • the server agent then receives the new fault attributes and software fault diagnosis rules sent by the fault analysis and rule development client program and synchronously updates them into the fault attribute database and the fault diagnosis database.
  • the embodiment of the present invention further provides a software system fault diagnosis system.
  • the software system fault diagnosis system includes a software system diagnostic server 100, a software system client 200, and a fault analysis and rule development terminal 300,
  • the software system diagnostic server 100 includes a fault attribute obtaining module 10, a matching decision module 20, and a matching update module 30, where
  • the fault attribute obtaining module 10 is configured to acquire a fault attribute of the software system to be diagnosed through the network management system;
  • the matching decision module 20 is configured to perform matching in a preset rule database according to the fault attribute, and generate a fault diagnosis decision list with high to low matching degree;
  • the matching update module 30 is configured to send the fault attribute to the fault analysis and rule development end for analysis when the fault attribute is not successfully matched with the preset rule database;
  • the matching update module 30 is further configured to receive the new processing rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attribute, and to merge the new processing rule into the rule database;
  • a software system client 200 configured to provide a fault attribute to a software system diagnostic server and to receive the fault diagnosis decision list;
  • the fault analysis and rule development terminal 300 is configured to receive the unmatched fault attribute sent by the software system diagnostic server, analyze it to obtain a new processing rule, and have the new processing rule merged into the rule database.
  • the server program (i.e., the software system diagnostic server) runs on the network management server of the software system to be diagnosed. It acquires network management data as required, obtains the diagnosis tasks and diagnosis plans sent by the client agent, performs the diagnosis, and outputs the diagnostic results to the client. It also obtains the fault attributes and fault diagnosis rules sent by the fault analysis and rule development client and updates them into the fault attribute database and the fault diagnosis rule base.
  • the client agent edits diagnostic tasks and diagnostic plans through human-machine commands and sends messages containing the diagnostic task and diagnosis plan information to the server program over TCP. The client agent obtains the diagnosis result from the server program and displays it graphically; the operator then performs repairs based on the diagnosis result and the recommendations, attempting to recover from the fault.
  • the fault analysis and rule development client program edits and formulates new fault attributes and fault diagnosis rules through human-machine commands, and synchronizes messages containing the fault attributes and fault diagnosis rules to the server program over TCP. It also obtains the diagnosis results and fault data that the server program sends via FTP and provides them to R&D personnel for analysis and fault location.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

A method, a server, and a system for software system fault diagnosis are provided. The method includes: obtaining fault attributes of a diagnosed software system by means of a network management system (S10); and performing matching in a preset rule database according to the fault attributes, and generating a fault diagnosis decision list according to the matching degree between the fault attributes and the preset rule database (S20).

Description

Software system fault diagnosis method, server and system
Technical field
The invention relates to the field of automated fault analysis for software systems, and in particular to a software system fault diagnosis method, server, and system.
Background
With the advancement of technology, large-scale distributed software systems are increasingly deployed in communications, network services, intelligent management systems, and other areas. The complexity and scale of such systems place higher demands on maintenance, including the number and skills of personnel, and maintenance cost and difficulty keep growing.
Maintaining a software system mainly involves two kinds of problems: hardware problems and software problems. A hardware problem is typically a component working abnormally because of a hardware anomaly, which may stem from hardware damage or a design defect and affects system operation. The symptoms of such problems tend to be concentrated, and the set of symptoms and causes is relatively small, so they are easy to troubleshoot; compiling a hardware troubleshooting manual is enough to meet the need. Software problems are more complex. They are generally caused by unreasonable configuration, configuration errors, incomplete network components, unreasonable resource planning, or transmission or other network problems. Moreover, because of the complexity of the network, the set of correspondences between the symptoms and causes of such software problems is very large, so maintenance personnel need an excellent technical foundation and long-term technical accumulation to solve software system problems. This makes the learning cost of software system maintenance too high, prevents maintenance from being handled by ordinary users or general maintenance personnel, and makes maintenance very inconvenient.
Summary of the invention
In view of this, to solve the technical problems existing in the prior art, embodiments of the present invention provide a software system fault diagnosis method, server, and system.
An embodiment of the present invention provides a software system fault diagnosis method, which includes the following steps:
Acquiring fault attributes of the diagnosed software system through the network management system;
Performing matching in a preset rule database according to the fault attributes, and generating a fault diagnosis decision list according to the matching degree between the fault attributes and the preset rule database.
After the step of matching the fault attributes against the preset rule database and generating the fault diagnosis decision list according to the matching degree, the method further includes:
when the fault attributes fail to match the preset rule database, sending the fault attributes to a fault analysis and rule development end for analysis;
receiving the new processing rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attributes, and merging the new processing rule into the rule database.
Before the step of acquiring the fault attributes of the diagnosed software system through the network management system, the method further includes:
forming fault attribute data records from verified fault attributes and entering the records into a fault attribute database, where the fault attributes include configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes, and solution attributes;
establishing a mapping between a fault phenomenon and a combination of fault cause attributes and solution attributes, and entering the mapping into a fault rule database, where the fault phenomenon comprises the configuration attributes, alarm attributes, and performance indicator attributes, and the fault cause attributes correspond one-to-one with the solution attributes;
merging the fault attribute database and the fault rule database into the rule database, where the data in the fault attribute database and the data in the fault rule database correspond to each other.
The step of forming fault attribute data records from verified fault attributes and entering the records into the fault attribute database is specifically:
storing all verified fault attributes as data records, organizing them into separate libraries by type, and entering them into the fault attribute database, where the fault attribute database includes:
a configuration attribute library, including a configuration attribute number, a software fault number queue, and a configuration table;
an alarm attribute library, including an alarm attribute number, a software fault number queue, and alarm features;
a performance indicator attribute library, including a performance indicator attribute number, a software fault number queue, and performance indicator features;
a fault cause attribute library, including a fault cause attribute number, a software fault number queue, and a fault cause description;
a solution attribute library, including a solution attribute number, a software fault number queue, and a solution description.
The fault rule database includes: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group, and a flag indicating whether the fault is a software defect;
the configuration attribute includes a configuration attribute number and a configuration attribute weight, the alarm attribute includes an alarm attribute number and an alarm attribute weight, and the performance indicator attribute includes a performance indicator number and a performance indicator weight.
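As a rough illustration of the two databases described above, the records could be modeled as follows. This is a minimal sketch under stated assumptions, not the schema of the invention: the class names `AttributeRecord` and `FaultRule`, all field names, and the sample values are hypothetical and chosen only for readability.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributeRecord:
    """One record in an attribute library (configuration, alarm, performance, cause, solution)."""
    attribute_id: str      # attribute number
    fault_ids: List[str]   # software fault number queue
    content: str           # configuration table, alarm feature, or description text


@dataclass
class FaultRule:
    """One record in the fault rule database."""
    fault_id: str                                                  # software fault number
    name: str                                                      # software fault name
    config_attrs: Dict[str, int] = field(default_factory=dict)     # attribute number -> weight
    alarm_attrs: Dict[str, int] = field(default_factory=dict)      # attribute number -> weight
    perf_attrs: Dict[str, int] = field(default_factory=dict)       # attribute number -> weight
    cause_ids: List[str] = field(default_factory=list)             # fault cause attribute group
    solution_ids: List[str] = field(default_factory=list)          # solution attribute group
    is_software_defect: bool = False                               # software-defect flag


# Hypothetical sample data, invented for illustration only.
alarm_library = {"A1": AttributeRecord("A1", ["F1"], "link down alarm")}
rule = FaultRule(
    fault_id="F1",
    name="transmission link misconfigured",
    config_attrs={"C1": 4},
    alarm_attrs={"A1": 6},
    cause_ids=["R1"],
    solution_ids=["S1"],
)
```

Keeping the weights on the rule side and the fault number queue on the attribute side mirrors the text: an attribute record says which faults it may indicate, while the rule says how strongly each attribute counts toward that fault.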
The step of matching the fault attributes against the preset rule database and generating the fault diagnosis decision list according to the matching degree includes:
matching the acquired fault attributes against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database, respectively;
collecting and sorting the software fault number queues of the library attributes that the acquired fault attributes matched, to form a preliminary matching fault table; if the acquired fault attributes match no library attribute, forming an unknown fault attribute table, where the preliminary matching fault table includes the matched software fault number, a matched configuration attribute queue, a matched alarm attribute queue, and a matched performance indicator attribute queue, each matched fault attribute queue being composed of matched fault attribute numbers; and the unknown fault attribute table includes unmatched configuration attributes, unmatched alarm attributes, and unmatched performance indicator attributes, each unmatched fault attribute queue likewise being composed of fault attribute numbers;
matching the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database to obtain the matching degree of each matching fault in the preliminary matching fault table, where the fault attribute numbers include configuration attribute numbers, alarm attribute numbers, and performance indicator numbers, and the fault attribute weights include configuration attribute weights, alarm attribute weights, and performance indicator weights;
sorting the matching faults in the preliminary matching fault table by matching degree from highest to lowest, and extracting from the fault rule database the fault cause attributes and solution attributes corresponding to each matching fault, to form the fault diagnosis decision list, where the fault diagnosis decision list includes the matching software fault number, the matching software fault name, the fault cause attributes, and the solution attributes.
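The matching steps above can be sketched end to end as follows. This is an illustrative sketch only. The text does not give a formula for the matching degree here, so the code assumes it is the weight of the matched attributes divided by the total weight of the rule. The three attribute libraries are collapsed into one lookup table for brevity, attributes are matched by exact string equality, and every name and sample value (`ATTRIBUTE_LIBRARY`, `FAULT_RULES`, `F1`, and so on) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical attribute library: observed feature -> (attribute number, fault number queue).
ATTRIBUTE_LIBRARY: Dict[str, Tuple[str, List[str]]] = {
    "port rate mismatch": ("C1", ["F1"]),
    "link down": ("A1", ["F1", "F2"]),
    "packet loss high": ("P2", ["F2"]),
}

# Hypothetical fault rule database: fault number -> name, attribute weights, cause, solution.
FAULT_RULES: Dict[str, dict] = {
    "F1": {"name": "link misconfigured", "weights": {"C1": 5, "A1": 3, "P1": 2},
           "cause": "wrong port rate", "solution": "correct the port rate"},
    "F2": {"name": "congestion", "weights": {"A1": 4, "P2": 6},
           "cause": "undersized link", "solution": "add capacity"},
}


def preliminary_match(observed: List[str]) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Return (preliminary matching fault table, unknown fault attributes)."""
    table: Dict[str, Set[str]] = defaultdict(set)
    unknown: List[str] = []
    for feature in observed:
        hit = ATTRIBUTE_LIBRARY.get(feature)
        if hit is None:
            unknown.append(feature)       # candidate for the unknown fault attribute table
            continue
        attr_id, fault_ids = hit
        for fault_id in fault_ids:        # follow the software fault number queue
            table[fault_id].add(attr_id)
    return dict(table), unknown


def matching_degree(fault_id: str, matched: Set[str]) -> float:
    """Assumed formula: matched weight divided by the rule's total weight."""
    weights = FAULT_RULES[fault_id]["weights"]
    total = sum(weights.values())
    hit = sum(w for attr, w in weights.items() if attr in matched)
    return hit / total if total else 0.0


def decision_list(table: Dict[str, Set[str]]) -> List[Tuple[str, str, str, str, float]]:
    """Rank matching faults by matching degree, highest first."""
    rows = [(f, FAULT_RULES[f]["name"], FAULT_RULES[f]["cause"],
             FAULT_RULES[f]["solution"], matching_degree(f, m)) for f, m in table.items()]
    return sorted(rows, key=lambda row: row[4], reverse=True)


table, unknown = preliminary_match(["port rate mismatch", "link down", "fan failure"])
ranked = decision_list(table)
```

With the sample data, fault `F1` matches two of its three weighted attributes and ranks ahead of `F2`, while "fan failure" matches nothing and would be reported as an unknown attribute. The matching degree is carried in each row only to make the ordering visible; the decision list described in the text contains the fault number, fault name, cause, and solution.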
In addition, to achieve the above object, an embodiment of the present invention further provides a software system fault diagnosis server, which includes:
a fault attribute acquisition module, configured to acquire fault attributes of the diagnosed software system through the network management system;
a matching decision module, configured to match the fault attributes against the preset rule database and generate a fault diagnosis decision list ordered from highest to lowest matching degree.
The software system fault diagnosis server further includes a matching update module, which is configured to:
when the fault attributes fail to match the preset rule database, send the fault attributes to the fault analysis and rule development end for analysis;
receive the new processing rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attributes, and merge the new processing rule into the rule database.
The software system fault diagnosis server further includes a database module, which includes:
an attribute library building unit, configured to form fault attribute data records from verified fault attributes and enter the records into the fault attribute database, where the fault attributes include configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes, and solution attributes;
a diagnosis library building unit, configured to establish a mapping between a fault phenomenon and a combination of fault cause attributes and solution attributes, and enter the mapping into the fault rule database, where the fault phenomenon comprises the configuration attributes, alarm attributes, and performance indicator attributes, and the fault cause attributes correspond one-to-one with the solution attributes;
a rule library building unit, configured to merge the fault attribute database and the fault rule database into the rule database, where the data in the fault attribute database and the data in the fault rule database correspond to each other.
The attribute library building unit is further configured to:
store all verified fault attributes as data records, organize them into separate libraries by type, and enter them into the fault attribute database, where the fault attribute database includes:
a configuration attribute library, including a configuration attribute number, a software fault number queue, and a configuration table;
an alarm attribute library, including an alarm attribute number, a software fault number queue, and alarm features;
a performance indicator attribute library, including a performance indicator attribute number, a software fault number queue, and performance indicator features;
a fault cause attribute library, including a fault cause attribute number, a software fault number queue, and a fault cause description;
a solution attribute library, including a solution attribute number, a software fault number queue, and a solution description.
The fault rule database includes: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group, and a flag indicating whether the fault is a software defect;
the configuration attribute includes a configuration attribute number and a configuration attribute weight, the alarm attribute includes an alarm attribute number and an alarm attribute weight, and the performance indicator attribute includes a performance indicator number and a performance indicator weight.
The matching decision module includes:
an attribute matching unit, configured to match the acquired fault attributes against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database, respectively;
a preliminary matching unit, configured to collect and sort the software fault number queues of the library attributes that the acquired fault attributes matched, to form a preliminary matching fault table, and, if the acquired fault attributes match no library attribute, to form an unknown fault attribute table, where the preliminary matching fault table includes the matched software fault number, a matched configuration attribute queue, a matched alarm attribute queue, and a matched performance indicator attribute queue, each matched fault attribute queue being composed of matched fault attribute numbers; and the unknown fault attribute table includes unmatched configuration attributes, unmatched alarm attributes, and unmatched performance indicator attributes, each unmatched fault attribute queue likewise being composed of fault attribute numbers;
a weight matching unit, configured to match the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database to obtain the matching degree of each matching fault in the preliminary matching fault table, where the fault attribute numbers include configuration attribute numbers, alarm attribute numbers, and performance indicator numbers, and the fault attribute weights include configuration attribute weights, alarm attribute weights, and performance indicator weights;
a decision matching unit, configured to sort the matching faults in the preliminary matching fault table by matching degree from highest to lowest, and to extract from the fault rule database the fault cause attributes and solution attributes corresponding to each matching fault, to form the fault diagnosis decision list, where the fault diagnosis decision list includes the matching software fault number, the matching software fault name, the fault cause attributes, and the solution attributes.
In addition, to achieve the above object, an embodiment of the present invention further provides a software system fault diagnosis system, which includes a software system diagnosis server, a software system client, and a fault analysis and rule development end.
The software system diagnosis server includes a fault attribute acquisition module, a matching decision module, and a matching update module, where:
the fault attribute acquisition module is configured to acquire fault attributes of the diagnosed software system through the network management system;
the matching decision module is configured to match the fault attributes against the preset rule database and generate a fault diagnosis decision list ordered from highest to lowest matching degree;
the matching update module is configured to:
when the fault attributes fail to match the preset rule database, send the fault attributes to the fault analysis and rule development end for analysis;
the matching update module is further configured to receive the new processing rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attributes, and merge the new processing rule into the rule database;
the software system client is configured to provide fault attributes to the software system diagnosis server and to receive the fault diagnosis decision list;
the fault analysis and rule development end is configured to receive the unmatched fault attributes sent by the software system diagnosis server, analyze them to obtain a new processing rule, and merge the new processing rule into the rule database.
In the embodiments of the present invention, fault attributes of the diagnosed software system are acquired through the network management system; the fault attributes include configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes, and solution attributes. Combinations of verified configuration, alarm, and performance indicator attributes are mapped to the corresponding combinations of fault causes and solutions, and the mapping is modeled and stored to form a rule database comprising a software fault attribute database and a fault rule database. The fault attributes of the diagnosed software system are then matched against the preset rule database, a fault diagnosis decision list is generated according to the matching degree, and the list is sent to the client of the diagnosed software system to guide the operator in attempting to recover from the fault. This provides intelligent diagnosis and repair of software system faults, real-time fault monitoring, and online updating of diagnostic rules. It greatly improves the efficiency and degree of automation of software fault diagnosis and repair and also makes the diagnosis system itself easier to maintain and improve, thereby solving the technical problems of high learning cost and inconvenient maintenance in existing software systems.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a first embodiment of the software system fault diagnosis method of the present invention;
FIG. 2 is a schematic flowchart of a second embodiment of the software system fault diagnosis method of the present invention;
FIG. 3 is a schematic flowchart of a third embodiment of the software system fault diagnosis method of the present invention;
FIG. 4 is a detailed flowchart of the step in FIG. 1 of matching the fault attributes against the preset rule database and generating the fault diagnosis decision list according to the matching degree between the fault attributes and the preset rule database;
FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the software system fault diagnosis server of the present invention;
FIG. 6 is a schematic diagram of the functional modules of a second embodiment of the software system fault diagnosis server of the present invention;
FIG. 7 is a schematic diagram of the functional modules of a third embodiment of the software system fault diagnosis server of the present invention;
FIG. 8 is a detailed schematic diagram of the functional modules of the database module in FIG. 7;
FIG. 9 is a detailed schematic diagram of the functional modules of the matching decision module in FIG. 5;
FIG. 10 is a schematic diagram of the functional modules of the software system fault diagnosis system in an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of the fault attribute database in an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of the fault rule database in an embodiment of the present invention;
FIG. 13 is a system deployment diagram of the software system fault diagnosis system in an embodiment of the present invention;
FIG. 14 is a schematic diagram of the fault data analysis process in an embodiment of the present invention;
FIG. 15 is a flowchart of software fault diagnosis based on the rule database in an embodiment of the present invention;
FIG. 16 is a flowchart of the interaction between the server program and the client program in an embodiment of the present invention;
FIG. 17 is a flowchart of executing a diagnostic plan in an embodiment of the present invention;
FIG. 18 is a flowchart of the server updating the software fault attribute database in an embodiment of the present invention.
Detailed Description
The realization of the object, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
An embodiment of the present invention provides a software system fault diagnosis method.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the software system fault diagnosis method of the present invention.
In the first embodiment, the software system fault diagnosis method includes the following steps:
Step S10: acquire fault attributes of the diagnosed software system through the network management system;
A server program is installed and run on the software system fault diagnosis server (that is, the network management server), and a client agent program is installed and run on the software system client (the network management client). Using human-machine commands, the client agent program edits diagnostic tasks and diagnostic plans and sends messages containing them to the server program over TCP so that the state of the software system can be monitored in real time. The server program runs on the network management server of the software system to be diagnosed; it receives the diagnostic tasks and diagnostic plans sent by the client program, executes the diagnostic tasks, and returns the diagnosis results to the client program.
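The text states only that the client agent sends messages containing diagnostic tasks and diagnostic plans over TCP; it does not describe a message format. As one possible illustration, the sketch below frames a JSON payload with a 4-byte length prefix, a common way to delimit messages on a TCP byte stream. The JSON encoding, the length prefix, and the field names `type`, `target`, and `plan` are all assumptions made for this example.

```python
import json
import struct


def encode_message(payload: dict) -> bytes:
    """Serialize a payload as JSON and prepend a 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def decode_message(frame: bytes) -> dict:
    """Read the length prefix, then parse exactly that many bytes as JSON."""
    (length,) = struct.unpack(">I", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))


# Hypothetical diagnostic task with an attached diagnostic plan.
task = {"type": "diagnostic_task", "target": "nms-01", "plan": {"interval_s": 300}}
frame = encode_message(task)
restored = decode_message(frame)
```

In a real deployment the frame would be written to and read from a connected TCP socket; the length prefix lets the receiver know where one message ends and the next begins.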
Step S20: match the fault attributes against the preset rule database, and generate a fault diagnosis decision list according to the matching degree between the fault attributes and the preset rule database.
The fault attributes are matched against the preset rule database to find the preset fault attributes that fit them, and the fault causes and solutions corresponding to those preset fault attributes are looked up. Finally, the fault diagnosis decision list is generated from the matching degree between the fault attributes and the preset fault attributes, together with the corresponding fault causes and solutions.
In this embodiment, fault attributes of the diagnosed software system are acquired through the network management system; the fault attributes include configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes, and solution attributes. Combinations of verified configuration, alarm, and performance indicator attributes are mapped to the corresponding combinations of fault causes and solutions, and the mapping is modeled and stored to form a rule database comprising a software fault attribute database and a fault rule database. The fault attributes of the diagnosed software system are then matched against the preset rule database, a fault diagnosis decision list is generated according to the matching degree, and the list is sent to the client of the diagnosed software system to guide the operator in attempting to recover from the fault. This provides intelligent diagnosis and repair of software system faults, real-time fault monitoring, and online updating of diagnostic rules. It greatly improves the efficiency and degree of automation of software fault diagnosis and repair and also makes the diagnosis system itself easier to maintain and improve, thereby solving the technical problems of high learning cost and inconvenient maintenance in existing software systems.
FIG. 2 is a schematic flowchart of a second embodiment of the software system fault diagnosis method of the present invention. Referring to FIG. 2, in the second embodiment, the method further includes, after step S20:
Step S30: when the fault attributes fail to match the preset rule database, send the fault attributes to the fault analysis and rule development end for analysis;
Step S40: receive the new processing rule that the fault analysis and rule development end obtains by analyzing the unmatched fault attributes, and merge the new processing rule into the rule database.
After receiving the unmatched fault attributes, the fault analysis and rule development end edits and formulates new fault attributes and fault diagnosis rules through human-machine commands, and synchronizes messages containing these fault attributes and fault diagnosis rules to the server program over TCP.
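On the server side, merging such an update might look like the following sketch. It is illustrative only: the in-memory dictionaries stand in for the fault attribute database and the fault rule database, and the update layout (`attributes` and `rules` keys), the function name `merge_update`, and the sample values are hypothetical. The check that every rule refers only to known attributes is an added safeguard, not something the text requires.

```python
from typing import Dict

# Hypothetical in-memory stand-ins for the two databases on the server.
attribute_db: Dict[str, str] = {"A1": "link down alarm"}   # attribute number -> feature
rule_db: Dict[str, Dict[str, int]] = {"F1": {"A1": 3}}     # fault number -> {attribute number: weight}


def merge_update(update: dict) -> None:
    """Merge new attributes first, then the rules that reference them."""
    attribute_db.update(update.get("attributes", {}))
    for fault_id, weights in update.get("rules", {}).items():
        missing = [attr for attr in weights if attr not in attribute_db]
        if missing:
            raise ValueError(f"rule {fault_id} references unknown attributes: {missing}")
        rule_db[fault_id] = dict(weights)


# Hypothetical update produced by the fault analysis and rule development end.
merge_update({
    "attributes": {"P7": "CPU load above threshold"},
    "rules": {"F9": {"A1": 2, "P7": 5}},
})
```

Merging attributes before rules keeps the two databases consistent, so a later diagnosis can follow a rule's attribute numbers back to their definitions.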
In this embodiment, while the fault diagnosis decision list is being generated, the process of matching the fault attributes against the preset rule database is also checked. If matching fails, the unmatched fault attributes (for example, the unknown fault table and the software defect table) are sent to the fault analysis and rule development end, where system developers analyze them and edit new fault attributes and fault diagnosis rules, which are then fed back to the server to update the fault attribute database and the fault rule database. In this way, in addition to intelligent diagnosis and repair of software system faults, the software system is monitored automatically and the fault diagnosis rules are refined continuously while the system is running, which greatly improves the efficiency and degree of automation of software fault diagnosis and repair.
FIG. 3 is a schematic flowchart of a third embodiment of the software system fault diagnosis method of the present invention. Refer to FIG. 3 together with FIG. 11 and FIG. 12. In the third embodiment, the method further includes, before step S20:
Step S50: form fault attribute data records from verified fault attributes and enter the records into the fault attribute database, where the fault attributes include configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes, and solution attributes;
Step S50 is specifically:
store all verified fault attributes as data records, organize them into separate libraries by type, and enter them into the fault attribute database, where the fault attribute database includes:
a configuration attribute library, including a configuration attribute number, a software fault number queue, and a configuration table;
an alarm attribute library, including an alarm attribute number, a software fault number queue, and alarm features;
a performance indicator attribute library, including a performance indicator attribute number, a software fault number queue, and performance indicator features;
a fault cause attribute library, including a fault cause attribute number, a software fault number queue, and a fault cause description;
a solution attribute library, including a solution attribute number, a software fault number queue, and a solution description.
步骤S60,建立故障现象与故障原因属性和解决办法属性组合的映射关系,并将该映射关系录入故障规则数据库,其中,故障现象包括配置属性、告警属性和性能指标属性,故障原因与解决办法属性一一对应;In step S60, a mapping relationship between the fault phenomenon and the fault cause attribute and the solution attribute combination is established, and the mapping relationship is entered into the fault rule database, wherein the fault phenomenon includes configuration attributes, alarm attributes, and performance index attributes, fault causes and solution attributes. One-to-one correspondence;
其中,故障规则数据库包括:软件故障编号、软件故障名称、配置属性组、告警属性组、性能指标属性组、故障原因属性组、解决办法属性组 和是否软件缺陷标识;配置属性包括配置属性编号和配置属性权值,告警属性包括告警属性编号和告警属性权值,性能指标属性包括性能指标编号和性能指标权值。The fault rule database includes: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, and a solution attribute group. And the software defect identifier; the configuration attribute includes the configuration attribute number and the configuration attribute weight, the alarm attribute includes the alarm attribute number and the alarm attribute weight, and the performance indicator attribute includes the performance indicator number and the performance indicator weight.
Step S70: merge the fault attribute database and the fault rule database into the rule database, the data in the fault attribute database and the fault rule database corresponding to each other.
A correspondence is established between software faults and their attributes to form the fault rule database of the software system. Each record consists of a software fault number, a software fault name, the corresponding configuration attribute group, alarm attribute group, performance indicator attribute group, fault cause attribute group, and solution attribute group, and a software defect flag. Each attribute queue belonging to a software fault consists of attribute numbers and attribute weights. The elements of the software fault attribution relation library and of the individual attribute libraries are in a many-to-many relationship, and index tables are built in both directions.
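The rule record and the many-to-many index described above can be sketched as follows. This is a hedged Python illustration, not the layout used by the applicant: `FaultRule`, `build_index`, and the sample faults are hypothetical, and only the reverse index (attribute to faults) is built explicitly because the forward direction is already held in each rule.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FaultRule:
    """One record of the fault rule database (hypothetical layout)."""
    fault_id: int
    name: str
    # Each attribute group maps an attribute number to its weight in this fault.
    config_attrs: dict[int, float] = field(default_factory=dict)
    alarm_attrs: dict[int, float] = field(default_factory=dict)
    perf_attrs: dict[int, float] = field(default_factory=dict)
    cause_ids: list[int] = field(default_factory=list)
    solution_ids: list[int] = field(default_factory=list)
    is_software_defect: bool = False


def build_index(rules: list[FaultRule]) -> dict[tuple[str, int], set[int]]:
    """Build the reverse index: (attribute kind, attribute number) -> fault numbers."""
    index: dict[tuple[str, int], set[int]] = defaultdict(set)
    for rule in rules:
        groups = (
            ("config", rule.config_attrs),
            ("alarm", rule.alarm_attrs),
            ("performance", rule.perf_attrs),
        )
        for kind, group in groups:
            for attribute_id in group:
                index[(kind, attribute_id)].add(rule.fault_id)
    return index


rules = [
    FaultRule(101, "Bad MTU", config_attrs={1: 0.6}, alarm_attrs={7: 0.4}),
    FaultRule(102, "Port flapping", alarm_attrs={7: 1.0}, is_software_defect=True),
]
index = build_index(rules)
```

Here alarm attribute 7 belongs to two faults and fault 101 owns two attributes, which is the many-to-many case the index tables exist to serve.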
FIG. 4 is a detailed flowchart of step S20 in FIG. 1. Referring to FIG. 4, in this embodiment step S20 comprises:
Step S201: match the acquired fault attributes against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database, respectively;
The server agent (server) matches the acquired fault attribute data against the alarm attribute library, the configuration attribute library, and the performance indicator attribute library in the fault attribute database, respectively.
Step S202: collect and sort the software fault number queues corresponding to the matched fault attributes to form a preliminary matching fault table; if an acquired fault attribute matches no fault attribute in the database, form an unknown fault attribute table, wherein the preliminary matching fault table comprises the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue, and the matched performance indicator attribute queue, each matched fault attribute queue consisting of the matched fault attribute numbers; the unknown fault attribute table comprises the unmatched configuration attributes, the unmatched alarm attributes, and the unmatched performance indicator attributes, and each matched fault attribute queue consists of the matched fault attribute numbers;
The software fault number queues corresponding to the matched fault attributes are collected and sorted to form the preliminary matching fault table; if a fault attribute matches no software attribute, an unknown fault attribute table is formed. The preliminary matching fault table consists of the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue, and the matched performance indicator attribute queue, each matched attribute queue consisting of the matched attribute numbers. The unknown fault attribute table consists of the unmatched configuration data, the unmatched alarm data, and the unmatched performance indicator data, and each matched attribute queue consists of the matched attribute numbers.
Step S203: match the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database to obtain the matching degree of each matched fault in the preliminary matching fault table, wherein the fault attribute numbers comprise the configuration attribute number, the alarm attribute number, and the performance indicator number, and the fault attribute weights comprise the configuration attribute weight, the alarm attribute weight, and the performance indicator weight;
The server agent calculates the matching degree of each matched fault in the matching fault table according to the weight that each fault attribute (identified by its fault attribute number) carries in the corresponding software fault in the software fault attribution relation, reorders the table by matching degree, and extracts the fault cause and solution attributes from the fault rule database to form the fault diagnosis decision table.
Step S204: sort the matched faults in descending order of their matching degree in the preliminary matching fault table, and extract from the fault rule database the fault cause attributes and solution attributes corresponding to the matched faults to form a fault diagnosis decision list, wherein the fault diagnosis decision list comprises the matched software fault number, the matched software fault name, the fault cause attribute, and the solution attribute.
The fault diagnosis decision table consists of the matched software fault number, the matched software fault name, the fault cause attribute, and the solution attribute. In addition, while matching against the fault attribute database and the fault rule database, the server determines whether a software defect is involved; if so, the software defect records are extracted to form a software defect table, which consists of the preliminary matching records flagged as software defects and the matched attribute data. After completing the above analysis, the server program sends the fault diagnosis decision table to the client program, and sends the unknown fault attribute table and the software defect table to the fault analysis and rule development client.
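Steps S201 to S204 can be sketched end to end as below. This is a simplified Python illustration under stated assumptions: observed network management data is reduced to feature strings that either equal a library key or do not, the matching degree is taken as the sum of the rule weights of the matched attributes, and the names `diagnose`, `libraries`, and `rules` are hypothetical. The specification does not fix any of these choices.

```python
from collections import defaultdict


def diagnose(observed, libraries, rules):
    """Return (decision list sorted by matching degree, unknown attribute table)."""
    # S201/S202: match each observed feature; group hits by software fault number.
    preliminary = defaultdict(lambda: defaultdict(list))  # fault -> kind -> attribute numbers
    unknown = defaultdict(list)                           # kind -> unmatched features
    for kind, features in observed.items():
        for feature in features:
            hit = libraries.get(kind, {}).get(feature)
            if hit is None:
                unknown[kind].append(feature)
                continue
            attribute_id, fault_ids = hit
            for fault_id in fault_ids:
                preliminary[fault_id][kind].append(attribute_id)

    # S203: matching degree = sum of the rule weights of the matched attributes.
    decisions = []
    for fault_id, matched in preliminary.items():
        rule = rules[fault_id]
        degree = sum(
            rule["weights"].get(kind, {}).get(attribute_id, 0.0)
            for kind, attribute_ids in matched.items()
            for attribute_id in attribute_ids
        )
        decisions.append({
            "fault_id": fault_id,
            "name": rule["name"],
            "cause": rule["cause"],
            "solution": rule["solution"],
            "degree": degree,
        })

    # S204: highest matching degree first.
    decisions.sort(key=lambda d: d["degree"], reverse=True)
    return decisions, dict(unknown)


libraries = {
    "alarm": {"link down": (7, [101, 102])},
    "config": {"mtu=0": (1, [101])},
}
rules = {
    101: {"name": "Bad MTU", "weights": {"alarm": {7: 0.4}, "config": {1: 0.6}},
          "cause": "MTU misconfigured", "solution": "Restore default MTU"},
    102: {"name": "Port flapping", "weights": {"alarm": {7: 0.5}},
          "cause": "Unstable link", "solution": "Check cabling"},
}
observed = {"alarm": ["link down", "fan alarm"], "config": ["mtu=0"]}
decisions, unknown = diagnose(observed, libraries, rules)
```

In this example fault 101 is ranked ahead of fault 102 because two of its weighted attributes are matched, and the unrecognised "fan alarm" feature ends up in the unknown attribute table.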
In this embodiment, the server program (server) automatically acquires the network management data (fault attributes) of the target specified in the client's analysis request, matches it against the rule base, sorts the results by matching degree to form a fault diagnosis decision table with recommended actions, and returns it to the client; for faults that cannot be matched or that match a software defect, the fault data is reported and sent to the fault analysis and rule development client. Specifically, the server agent first acquires the network management data of the target, either in response to a fault diagnosis task sent by the client or when executing a periodic monitoring plan, and by matching against the fault attribute database and the fault diagnosis database forms an unknown fault attribute table, a final fault decision table, and a software defect table; it sends the final fault decision table to the client agent, and sends the unknown fault attribute table and the software defect table to the fault analysis and rule development client program. The server agent then receives the new fault attributes and software fault diagnosis rules sent by the fault analysis and rule development client program, and synchronizes them into the fault attribute database and the fault diagnosis database.
In the various embodiments of the present invention, the network management data corresponding to the known faults of the software system to be diagnosed (configuration data, alarm data, and performance indicator data), together with the fault causes and solutions corresponding to those known faults, are treated as five kinds of attributes and organized into software fault diagnosis rules; all rules are stored to form a software fault diagnosis rule base, and the five kinds of attributes are organized into a fault attribute database. Both databases are deployed on the network management server of the software system to be diagnosed. The server agent is deployed on the network management server, the client agent is deployed on the client personal computer, and the fault analysis and rule development client is deployed on a server of the developer of the system to be diagnosed. According to the diagnosis task, the server agent acquires data from the network management system, matches it against the fault attribute database and the fault diagnosis rule base, forms the diagnosis result, and feeds it back to the client agent and to the fault analysis and rule development client respectively, so that client operators can carry out recovery measures and system developers can analyze the software fault.
The software system fault diagnosis method of the embodiment of the present invention is described in further detail below with reference to FIG. 13 to FIG. 18. The method comprises:
Step 1: the server agent receives a diagnosis object sent by the client, or the periodic timer of a diagnosis plan expires, and the diagnosis process starts;
Step 2: the server agent determines the diagnosis object level and the object number from the diagnosis object, or from the diagnosis object content of the diagnosis plan, and extracts from the network management system the configuration data, alarm data, and performance indicator data of the corresponding object number;
Step 3: match the extracted configuration data against the configuration attribute library in the fault attribute database, record the matched configuration attribute numbers, calculate the matching weight A, and extract the software fault attribute group of the corresponding configuration attributes;
Step 4: match the extracted alarm data against the alarm attribute library in the fault attribute database, record the matched alarm attribute numbers, calculate the matching weight B, and extract the software fault attribute group of the corresponding alarm attributes;
Step 5: match the extracted performance indicator data against the performance indicator attribute library in the fault attribute database, record the matched performance indicator attribute numbers, calculate the matching weight C, and extract the software fault attribute group of the corresponding performance indicator attributes;
Step 6: merge the matched software fault attribute groups and, indexed by software fault number, collect the matched configuration attribute number group, alarm attribute number group, and performance indicator number group together with the matching weight of each attribute;
Step 7: for each software fault matched in the previous step, calculate and record the final matching value Z from the weights (A', B', C') that the fault diagnosis rule base records for that software fault and its corresponding state attributes, then sort by the final matching value Z to form the preliminary matching fault table;
where Z = A'*A + B'*B + C'*C, with A, B, and C being the matching weights calculated in steps 3 to 5.
Step 8: if the preliminary matching fault table is empty and the system is judged to be abnormal, collect the fault data into an unknown fault attribute table; if the preliminary matching fault table is not empty, extract the fault cause attributes and solution attributes that the fault diagnosis rule base records for the matched software faults to form a software fault decision table;
Step 9: according to the software defect flag that the fault diagnosis rule base records for each software fault, extract the software fault attribute records confirmed as software defects, together with the matched fault data, to form a software defect table;
Step 10: send the software fault decision table to the client agent through the network of the network management system; send the software defect table and the unknown software fault attribute table to the fault analysis and rule development client;
Step 11: the diagnosis process ends.
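The calculation of the final matching value in step 7 can be illustrated with a short Python sketch. It rests on one interpretation of the formula, namely that Z is the sum of each weight recorded in the rule base multiplied by the corresponding matching weight obtained in steps 3 to 5; that reading, the function names `final_match_value` and `rank_faults`, and the sample numbers are all assumptions made for the example.

```python
def final_match_value(rule_weights, match_weights):
    """Z as the sum of rule-base weight times matching weight, per attribute kind."""
    return sum(r * m for r, m in zip(rule_weights, match_weights))


def rank_faults(candidates):
    """candidates: fault number -> (rule-base weights, matching weights).

    Returns (fault number, Z) pairs sorted by Z in descending order.
    """
    scored = [
        (fault_id, final_match_value(rule_w, match_w))
        for fault_id, (rule_w, match_w) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Order of each triple: (configuration, alarm, performance indicator).
candidates = {
    101: ((0.5, 0.3, 0.2), (1.0, 1.0, 0.0)),   # configuration and alarm matched
    102: ((0.2, 0.2, 0.6), (1.0, 0.0, 0.0)),   # only configuration matched
}
ranked = rank_faults(candidates)
```

With these sample values fault 101 is ranked first, since more of its weighted attributes were matched.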
Specifically, step 3 comprises the following steps:
Step 3.1: match the extracted configuration data sequentially against the configuration attribute library in the fault attribute database. The configuration attribute data in the configuration attribute library are matching rules edited by developers, expressed in the following if-then form:
if (the extracted abnormal configuration satisfies the fault matching rule)
then calculate the matching weight and record the configuration attribute number and the corresponding software fault numbers
Step 3.2: after matching against the whole configuration attribute library is complete, collect, indexed by configuration attribute number, the matched configuration attribute numbers with their matching weights and corresponding software fault numbers. The matching process for the other attributes is the same as for configuration attributes.
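The if-then matching of steps 3.1 and 3.2 could look roughly like this in Python. The sketch assumes that each developer-edited rule is a predicate over the extracted configuration and that it carries a fixed weight; how the matching weight is actually calculated is not stated in the text, and `ConfigRule`, `match_config`, and the sample predicates are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConfigRule:
    """One developer-edited matching rule (hypothetical layout)."""
    attribute_id: int                     # configuration attribute number
    fault_ids: list                       # software fault numbers it points to
    weight: float                         # matching weight recorded on a hit
    predicate: Callable[[dict], bool]     # the "if" part of the rule


def match_config(config: dict, rules: list) -> dict:
    """Apply the rules in order; return hits indexed by configuration attribute number."""
    summary = {}
    for rule in rules:                    # step 3.1: sequential matching
        if rule.predicate(config):        # if the configuration satisfies the rule
            summary[rule.attribute_id] = {    # then record weight and fault numbers
                "weight": rule.weight,
                "fault_ids": list(rule.fault_ids),
            }
    return summary                        # step 3.2: summary keyed by attribute number


config_rules = [
    ConfigRule(1, [101], 0.6, lambda c: c.get("mtu", 1500) < 576),
    ConfigRule(2, [102, 103], 0.3, lambda c: c.get("neighbor_count", 0) == 0),
]
summary = match_config({"mtu": 128, "neighbor_count": 4}, config_rules)
```

Only rule 1 fires for the sample configuration, so the summary holds a single entry for configuration attribute 1.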
In this embodiment, correspondences are established between the alarm attributes, performance attributes, and configuration attributes of the software system to be diagnosed and its fault cause attributes and solution attributes; these are modelled and stored to form the fault attribute database and the fault rule database. Software fault diagnosis, task management, and human-machine interaction are divided between a server program and a client agent. By creating a fault diagnosis task or defining a fault diagnosis plan, the client agent triggers the server to acquire and analyze the fault data of the software system and to generate a diagnosis result, which is divided into a final fault diagnosis decision table, an unknown fault table, and a software defect table. The final fault diagnosis decision table is fed back to the client to guide operators in attempting to recover from the fault; the unknown fault table and the software defect table are fed back to the fault analysis and rule development client, where system developers analyze them and edit new fault attributes and fault diagnosis rules, which are in turn fed back to the server to update the software fault attribute database and the software fault diagnosis rule base. In this way, intelligent diagnosis and repair of software faults and automatic monitoring of software system faults are both achieved, and the fault diagnosis rules can be continuously improved while the system is running, which greatly improves the efficiency and degree of automation of software fault diagnosis and repair.
In addition, referring to FIG. 15, FIG. 16, and FIG. 17, the embodiment of the present invention further provides an interaction flow among the software system fault diagnosis server (the server side), the software system fault diagnosis client (the client side), and the fault analysis and rule development client. The specific steps are as follows:
Step a: the client agent assembles a diagnosis task or a diagnosis plan, encapsulates it into a command message, and sends it to the server agent;
Step b: the server agent receives and decodes the command message sent by the client; if it is a diagnosis task, the server agent triggers the diagnosis process and returns the diagnosis result; if it is a diagnosis plan, the server agent updates the diagnosis plan and returns the plan update result;
Step c: the client agent receives the diagnosis result and displays it on the human-machine interface;
Step d: the client agent receives the diagnosis plan update result and displays it on the human-machine interface;
Step e: if software defect records exist, the server agent sends a software defect message to the fault analysis and rule development client via FTP;
Step f: if the diagnosis result matches no known attribute, the server agent notifies the client that an unknown fault has been found, assembles the unknown fault attribute table, and sends it to the fault analysis and rule development client via FTP;
Step g: if the server agent detects that the diagnosis plan timer has expired, it executes the diagnosis plan, following the same procedure as steps a to f.
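The command handling of steps a, b, and g can be sketched as a small dispatcher. This Python sketch makes several assumptions that the text does not state: command messages are encoded as JSON, a message carries a `kind` field of either `task` or `plan`, and the diagnosis itself is supplied as a callable. The names `encode_command` and `ServerAgent` are hypothetical, and transport (TCP, FTP) is left out.

```python
import json


def encode_command(kind, body):
    """Step a: encapsulate a diagnosis task or plan into a command message."""
    return json.dumps({"kind": kind, "body": body}).encode("utf-8")


class ServerAgent:
    """Decodes command messages and dispatches them (hypothetical sketch)."""

    def __init__(self, run_diagnosis):
        self.run_diagnosis = run_diagnosis   # callable: target -> diagnosis result
        self.plan = None                     # currently stored diagnosis plan

    def handle_command(self, raw):
        """Step b: decode the message, then run a task or update the plan."""
        message = json.loads(raw.decode("utf-8"))
        if message["kind"] == "task":
            result = self.run_diagnosis(message["body"]["target"])
            return {"type": "diagnosis_result", "result": result}
        if message["kind"] == "plan":
            self.plan = message["body"]
            return {"type": "plan_update_result", "ok": True}
        raise ValueError("unknown command kind: %r" % message["kind"])

    def on_timer_expired(self):
        """Step g: when the plan timer expires, run the stored plan."""
        if self.plan is None:
            return None
        result = self.run_diagnosis(self.plan["target"])
        return {"type": "diagnosis_result", "result": result}


agent = ServerAgent(lambda target: ["fault 101 on " + target])
task_reply = agent.handle_command(encode_command("task", {"target": "cell-12"}))
plan_reply = agent.handle_command(encode_command("plan", {"target": "cell-12", "period_s": 600}))
timer_reply = agent.on_timer_expired()
```

Keeping the diagnosis routine behind a callable lets the same path serve both on-demand tasks and timer-driven plans, which mirrors the statement that step g follows the same procedure as the earlier steps.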
In addition, referring to FIG. 18, the embodiment of the present invention further provides a flow by which the server side updates the software fault attribute database. The specific steps are as follows:
Step A: system developers edit new fault attributes and fault diagnosis rules through the fault analysis and rule development client, encapsulate them into a fault diagnosis rule message, and send it to the server agent;
wherein the fault attributes comprise the numbers and fault diagnosis primitives of the configuration attributes, alarm attributes, and performance indicator attributes, and the fault diagnosis rules comprise the software fault number, the five kinds of attributes, and the weight that each attribute carries in the software fault.
Step B: the server agent receives and decodes the fault diagnosis rule message, and updates the fault attribute database and the fault diagnosis rule database accordingly;
Step C: the server agent sends a fault rule update result message to the fault analysis and rule development client.
Step B comprises the following steps:
Step B.1: for each received fault attribute, the server determines from its number whether it is new; if it is new, a record is added directly to the fault attribute database; if not, the fault diagnosis primitive of the existing record is updated;
Step B.2: for each received software fault, the server determines from its number whether it is new; if it is new, a record is added directly to the fault diagnosis rule database; if not, the fault matching data of the existing record is updated.
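Steps B.1 and B.2 amount to an insert-or-update keyed by number. The following Python sketch uses plain dictionaries as stand-ins for the two databases; the message layout and the names `upsert` and `apply_rule_message` are assumptions made for illustration.

```python
def upsert(table, key, fields):
    """Insert a new record or update the existing one; return True if it was new."""
    is_new = key not in table
    if is_new:
        table[key] = dict(fields)
    else:
        table[key].update(fields)
    return is_new


def apply_rule_message(message, attribute_db, rule_db):
    """Apply a decoded fault diagnosis rule message to both databases."""
    added = {"attributes": 0, "rules": 0}
    # Step B.1: fault attributes are keyed by (kind, attribute number).
    for attr in message.get("attributes", []):
        key = (attr["kind"], attr["id"])
        if upsert(attribute_db, key, {"primitive": attr["primitive"]}):
            added["attributes"] += 1
    # Step B.2: fault diagnosis rules are keyed by software fault number.
    for rule in message.get("rules", []):
        if upsert(rule_db, rule["fault_id"], {"match": rule["match"]}):
            added["rules"] += 1
    return added


attribute_db = {("alarm", 7): {"primitive": "alarm == 'link down'"}}
rule_db = {}
message = {
    "attributes": [
        {"kind": "alarm", "id": 7, "primitive": "alarm in ('link down', 'LOS')"},
        {"kind": "config", "id": 1, "primitive": "mtu < 576"},
    ],
    "rules": [{"fault_id": 101, "match": {"alarm": {7: 0.4}, "config": {1: 0.6}}}],
}
added = apply_rule_message(message, attribute_db, rule_db)
```

The returned counts could feed the update result message of step C, although the content of that message is not specified in the text.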
The embodiment of the present invention further provides a software system fault diagnosis server. Referring to FIG. 5, FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the software system fault diagnosis server of the present invention.
In the first embodiment, the software system fault diagnosis server comprises:
a fault attribute acquisition module 10, configured to acquire the fault attributes of the software system to be diagnosed through the network management system;
The server program is installed and run on the software system fault diagnosis server (the network management server), and the client agent is installed and run on the software system client (the network management client). The client agent edits diagnosis tasks and diagnosis plans through human-machine commands and sends messages containing them to the server program over TCP, so as to monitor the state of the software system in real time. The server program runs on the network management server of the software system to be diagnosed, receives the diagnosis tasks and diagnosis plans sent by the client program, executes the diagnosis tasks, and feeds the diagnosis results back to the client program.
a matching decision module 20, configured to perform matching in a preset rule database according to the fault attributes and to generate a fault diagnosis decision list ordered from the highest matching degree to the lowest.
Matching is performed in the preset rule database according to the fault attributes: the preset fault attributes that correspond to the acquired fault attributes are matched, the fault causes and solutions corresponding to those preset fault attributes are looked up, and finally the fault diagnosis decision list is generated from the matching degree between the fault attributes and the preset fault attributes together with the corresponding fault causes and solutions.
In this embodiment, the fault attributes of the software system to be diagnosed are acquired through the network management system, the fault attributes comprising configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes, and solution attributes. Mapping relationships are established between the verified combinations of configuration attributes, alarm attributes, and performance indicator attributes of the software system and the corresponding combinations of fault causes and solutions; these mapping relationships are modelled and stored to form a rule database comprising the software fault attribute database and the fault rule database. Matching is then performed in the preset rule database according to the fault attributes of the software system to be diagnosed, and a fault diagnosis decision list is generated from the matching degree between the fault attributes and the preset rule database. Finally, the fault diagnosis decision table is sent to the client of the software system to be diagnosed to guide operators in attempting to recover from the fault. In this way, intelligent diagnosis and repair of software system faults, real-time monitoring of software system faults, and online updating of the diagnosis rules are achieved, which greatly improves the efficiency and degree of automation of software fault diagnosis and repair and also makes the diagnosis system itself more efficient to maintain and improve, thereby solving the technical problem that existing software systems are costly to learn and inconvenient to maintain.
Each of the above units may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.
FIG. 6 is a schematic diagram of the functional modules of a second embodiment of the software system fault diagnosis server of the present invention. Referring to FIG. 6, in the second embodiment the software system fault diagnosis server further comprises a matching update module 30, which is configured to:
when the fault attributes fail to match the preset rule database, send the fault attributes to the fault analysis and rule development end for analysis;
receive the new handling rules that the fault analysis and rule development end derives by analyzing the unmatched fault attributes, and merge the new handling rules into the rule database.
After receiving the unmatched fault attributes, the fault analysis and rule development end edits and defines new fault attributes and fault diagnosis rules through human-machine commands, and synchronizes messages containing the fault attributes and fault diagnosis rules to the server program over TCP.
In this embodiment, while the fault diagnosis decision list is being generated, the matching of the fault attributes against the preset rule database is also checked. If matching fails, the unmatched fault attributes (for example, the unknown fault table and the software defect table) are sent to the fault analysis and rule development end, where system developers analyze them and edit new fault attributes and fault diagnosis rules, which are then fed back to the server to update the fault attribute database and the fault rule database. In this way, in addition to intelligent diagnosis and repair of software system faults, automatic monitoring of the software system is achieved, and the fault diagnosis rules are continuously improved while the system is running, which greatly improves the efficiency and degree of automation of software fault diagnosis and repair.
FIG. 7 is a schematic diagram of the functional modules of a third embodiment of the software system fault diagnosis server of the present invention. Referring to FIG. 7 and FIG. 8, in the third embodiment the software system fault diagnosis server further comprises a database module 40, and the database module 40 comprises:
an attribute database building unit 401, configured to form fault attribute data records from the verified fault attributes and to enter the fault attribute data records into the fault attribute database, wherein the fault attributes comprise: configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes, and solution attributes;
wherein the attribute database building unit 401 is further configured to:
store all verified fault attributes as data records, organize the verified fault attributes into separate libraries, and enter them into the fault attribute database, the fault attribute database comprising:
Configuration attribute library, comprising the configuration attribute number, the software fault number queue, and the configuration table;
Alarm attribute library, comprising the alarm attribute number, the software fault number queue, and the alarm feature;
Performance indicator attribute library, comprising the performance indicator attribute number, the software fault number queue, and the performance indicator feature;
Fault cause attribute library, comprising the fault cause attribute number, the software fault number queue, and the fault cause description;
Solution attribute library, comprising the solution attribute number, the software fault number queue, and the solution description.
a diagnosis database building unit 402, configured to establish a mapping relationship between each fault phenomenon and the combination of fault cause attributes and solution attributes, and to enter the mapping relationship into the fault rule database, wherein the fault phenomenon comprises configuration attributes, alarm attributes, and performance indicator attributes, and the fault cause attributes correspond one-to-one with the solution attributes;
wherein the fault rule database comprises: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group, and a flag indicating whether the fault is a software defect; each configuration attribute comprises a configuration attribute number and a configuration attribute weight, each alarm attribute comprises an alarm attribute number and an alarm attribute weight, and each performance indicator attribute comprises a performance indicator number and a performance indicator weight.
a rule database building unit 403, configured to merge the fault attribute database and the fault rule database into the rule database, the data in the fault attribute database and the fault rule database corresponding to each other.
A correspondence is established between software faults and their attributes to form the fault rule database of the software system. Each record consists of a software fault number, a software fault name, the corresponding configuration attribute group, alarm attribute group, performance indicator attribute group, fault cause attribute group, and solution attribute group, and a software defect flag. Each attribute queue belonging to a software fault consists of attribute numbers and attribute weights. The elements of the software fault attribution relation library and of the individual attribute libraries are in a many-to-many relationship, and index tables are built in both directions.
FIG. 9 is a schematic diagram of the detailed functional modules of the matching decision module in FIG. 5. Referring to FIG. 9, the matching decision module 20 includes:
The attribute matching unit 201 is configured to match the acquired fault attributes against the alarm attribute library, the configuration attribute library and the performance indicator attribute library in the fault attribute database, respectively;
The server-side agent (the server) matches the acquired fault attribute data against the alarm attribute library, the configuration attribute library and the performance indicator attribute library in the fault attribute database, respectively.
The preliminary matching unit 202 is configured to aggregate and sort the software fault number queues corresponding to the fault attributes that were matched, forming a preliminary matching fault table; if a fault attribute matches no fault attribute in the database, an unknown fault attribute table is formed. The preliminary matching fault table includes the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue and the matched performance indicator attribute queue, each matched fault attribute queue being composed of the matched fault attribute numbers. The unknown fault attribute table includes the unmatched configuration attributes, the unmatched alarm attributes and the unmatched performance indicator attributes, each unmatched fault attribute queue being composed of the corresponding fault attribute numbers;
The software fault number queues corresponding to the matched fault attributes are aggregated and sorted to form the preliminary matching fault table; if a fault attribute matches no software attribute, an unknown fault attribute table is formed. The preliminary matching fault table consists of the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue and the matched performance indicator attribute queue, each matched attribute queue consisting of the matched attribute numbers. The unknown fault attribute table consists of the unmatched configuration data, the unmatched alarm data and the unmatched performance indicator data, its attribute queues consisting of the corresponding attribute numbers.
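The preliminary matching step can be sketched as follows. The sketch makes several assumptions that the text does not state: observed data are compared with library features by exact string equality, the library is a nested dictionary, and the resulting table is sorted by software fault number. The names `preliminary_match`, `KINDS` and `Library` are hypothetical.

```python
from typing import Dict, List, Tuple

KINDS = ("config", "alarm", "perf")

# library[kind][feature] = (attribute number, software fault number queue)
Library = Dict[str, Dict[str, Tuple[str, List[str]]]]


def preliminary_match(observed: Dict[str, List[str]], library: Library):
    """Return (preliminary matching fault table, unknown fault attribute table)."""
    matched: Dict[str, Dict[str, List[str]]] = {}
    unknown: Dict[str, List[str]] = {kind: [] for kind in KINDS}
    for kind in KINDS:
        for feature in observed.get(kind, []):
            entry = library.get(kind, {}).get(feature)
            if entry is None:
                # No library entry: keep the raw data in the unknown table.
                unknown[kind].append(feature)
                continue
            attr_id, fault_ids = entry
            for fault_id in fault_ids:
                row = matched.setdefault(fault_id, {k: [] for k in KINDS})
                row[kind].append(attr_id)
    # Sort rows by fault number so the table order is deterministic.
    return dict(sorted(matched.items())), unknown
```

Each row of the returned table holds, per fault number, the queues of matched configuration, alarm and performance attribute numbers; anything that matched nothing ends up in the second return value for later analysis.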
The weight matching unit 203 is configured to match the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database, and to derive the matching degree of each matched fault in the preliminary matching fault table, wherein the fault attribute numbers include the configuration attribute number, the alarm attribute number and the performance indicator number, and the fault attribute weights include the configuration attribute weight, the alarm attribute weight and the performance indicator weight;
Based on the weight that each fault attribute (identified by its fault attribute number) carries within its corresponding software fault in the software fault attribution relationship, the server-side agent calculates the matching degree of each matched fault in the matching fault table, re-sorts the table by matching degree, and extracts the fault cause and solution attributes from the fault rule database to form the fault diagnosis decision table.
The decision matching unit 204 is configured to sort the matched faults in descending order of their matching degree in the preliminary matching fault table, and to extract from the fault rule database the fault cause attributes and solution attributes corresponding to the matched faults, forming a fault diagnosis decision list, wherein the fault diagnosis decision list includes the matched software fault number, the matched software fault name, the fault cause attributes and the solution attributes.
The fault diagnosis decision table consists of the matched software fault number, the matched software fault name, the fault cause attributes and the solution attributes. In addition, while the fault attribute database is being matched against the fault rule database, it is determined whether a software defect is present; if so, the software defect records are extracted to form a software defect table, which consists of the preliminary matching records identified as software defects and the matched attribute data. After completing the above analysis, the server program sends the fault diagnosis decision table to the client program, and sends the unknown fault attribute table and the software defect table to the fault analysis and rule development client.
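The text does not give a formula for the matching degree. The sketch below assumes one plausible choice, the sum of the weights of the matched attributes divided by the total weight of the rule, and then ranks the matched faults from highest to lowest. The function names, the dictionary layout of a rule, and the formula itself are assumptions made for illustration.

```python
from typing import Dict, List, Set


def matching_degree(weights: Dict[str, float], matched: Set[str]) -> float:
    """Assumed formula: matched weight divided by the rule's total weight."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w for attr_id, w in weights.items() if attr_id in matched) / total


def build_decision_list(preliminary: Dict[str, Set[str]],
                        rules: Dict[str, dict]) -> List[dict]:
    """Rank matched faults by matching degree, highest first."""
    rows = []
    for fault_id, matched in preliminary.items():
        rule = rules[fault_id]
        rows.append({
            "fault_id": fault_id,
            "name": rule["name"],
            "degree": matching_degree(rule["weights"], matched),
            "causes": rule["causes"],
            "solutions": rule["solutions"],
            "is_software_defect": rule.get("is_software_defect", False),
        })
    rows.sort(key=lambda row: row["degree"], reverse=True)
    return rows
```

Filtering the returned rows on `is_software_defect` would give the content of the software defect table, while the full ranked list corresponds to the fault diagnosis decision table.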
In this embodiment, the server program (the server) automatically acquires the network management data (fault attributes) of the target according to the analysis target requested by the client, matches the data against the rule base, sorts the results by matching degree to form a fault diagnosis decision table with execution suggestions, and returns these to the client. For faults that cannot be matched, or that match a software defect, it notifies the fault analysis and rule development client and sends it the fault data. Specifically, the server-side agent first acquires the network management data of the target object, either in response to a fault diagnosis task sent by the client or by executing a periodic monitoring plan. By matching against the fault attribute database and the fault diagnosis database, it forms an unknown fault attribute table, a final fault decision table and a software defect table; it sends the final fault decision table to the client agent, and sends the unknown fault attribute table and the software defect table to the fault analysis and rule development client program. The server-side agent then receives the new fault attributes and software fault diagnosis rules sent by the fault analysis and rule development client program, and synchronizes them into the fault attribute database and the fault diagnosis database.
An embodiment of the present invention further provides a software system fault diagnosis system. Referring to FIG. 10, the software system fault diagnosis system includes a software system diagnostic server 100, a software system client 200 and a fault analysis and rule development terminal 300,
The software system diagnostic server 100 includes a fault attribute obtaining module 10, a matching decision module 20 and a matching update module 30, wherein
The fault attribute obtaining module 10 is configured to acquire the fault attributes of the diagnosed software system through the network management system;
The matching decision module 20 is configured to perform matching in a preset rule database according to the fault attributes, and to generate a fault diagnosis decision list ordered from the highest matching degree to the lowest;
The matching update module 30 is configured to send the fault attributes to the fault analysis and rule development terminal for analysis when the fault attributes fail to match the preset rule database;
The matching update module 30 is further configured to receive the new processing rules that the fault analysis and rule development terminal obtains by analyzing the fault attributes that failed to match, and to merge the new processing rules into the rule database;
The software system client 200 is configured to provide fault attributes to the software system diagnostic server and to receive the fault diagnosis decision list;
The fault analysis and rule development terminal 300 is configured to receive the fault attributes that failed to match, as sent by the software system diagnostic server, to analyze those fault attributes to obtain new processing rules, and to merge the new processing rules into the rule database.
In this embodiment, the server program (that is, the software system diagnostic server) runs on the network management server of the software system to be diagnosed and acquires network management data as required. It receives the diagnostic tasks and diagnostic plans sent by the client agent, executes the diagnostic tasks, and returns the diagnosis results to the client. It also receives the fault attributes and fault diagnosis rules sent by the fault analysis and rule development client, and updates them into the fault attribute database and the fault diagnosis rule base.
The client agent edits diagnostic tasks and diagnostic plans through man-machine commands, and sends messages containing the diagnostic task and diagnostic plan information to the server program over TCP. The client agent receives the diagnosis results from the server program and displays them graphically; the operator then acts on the diagnosis results and the suggested repair actions in an attempt to recover from the fault. The fault analysis and rule development client program edits and formulates new fault attributes and fault diagnosis rules through man-machine commands, and synchronizes messages containing the fault attributes and fault diagnosis rules to the server program over TCP. It also receives the diagnosis results and fault data that the server program sends via FTP, and provides them to R&D personnel for analysis and fault localization.
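The text says only that task and rule messages travel over TCP; it does not specify a wire format. Because TCP is a byte stream with no message boundaries, some framing is needed. The sketch below assumes a 4-byte big-endian length prefix followed by a UTF-8 JSON body; the function names, the `kind` and `payload` keys, and the example message are all hypothetical.

```python
import json
import struct


def encode_message(kind: str, payload: dict) -> bytes:
    """Serialize a message as a 4-byte big-endian length followed by UTF-8 JSON."""
    body = json.dumps({"kind": kind, "payload": payload}).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def decode_message(data: bytes) -> tuple:
    """Decode one framed message; return (kind, payload, remaining bytes)."""
    if len(data) < 4:
        raise ValueError("incomplete length prefix")
    (length,) = struct.unpack(">I", data[:4])
    if len(data) < 4 + length:
        raise ValueError("incomplete message body")
    message = json.loads(data[4:4 + length].decode("utf-8"))
    return message["kind"], message["payload"], data[4 + length:]
```

A client would write the bytes returned by `encode_message` to its socket, and the server would buffer received bytes and call `decode_message` once enough data has arrived, keeping the remainder for the next message.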
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (14)

  1. A software system fault diagnosis method, the method comprising:
    acquiring fault attributes of a diagnosed software system through a network management system;
    performing matching in a preset rule database according to the fault attributes, and generating a fault diagnosis decision list according to the matching degree between the fault attributes and the preset rule database.
  2. The software system fault diagnosis method according to claim 1, wherein after performing matching in the preset rule database according to the fault attributes and generating the fault diagnosis decision list according to the matching degree between the fault attributes and the preset rule database, the method further comprises:
    when the fault attributes fail to match the preset rule database, sending the fault attributes to a fault analysis and rule development terminal for analysis;
    receiving new processing rules obtained by the fault analysis and rule development terminal from analyzing the fault attributes that failed to match, and merging the new processing rules into the rule database.
  3. The software system fault diagnosis method according to claim 1 or 2, wherein before acquiring the fault attributes of the diagnosed software system through the network management system, the method further comprises:
    forming fault attribute data records from verified fault attributes, respectively, and entering the fault attribute data records into a fault attribute database, wherein the fault attributes comprise: configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes and solution attributes;
    establishing a mapping relationship between a fault phenomenon and a combination of the fault cause attributes and solution attributes, and entering the mapping relationship into a fault rule database, wherein the fault phenomenon comprises the configuration attributes, alarm attributes and performance indicator attributes, and the fault cause attributes correspond one-to-one with the solution attributes;
    merging the fault attribute database and the fault rule database into the rule database, the data in the fault attribute database and in the fault rule database corresponding to each other.
  4. The software system fault diagnosis method according to claim 3, wherein forming the fault attribute data records from the verified fault attributes, respectively, and entering the fault attribute data records into the fault attribute database comprises:
    storing all verified fault attributes as data records, organizing the verified fault attributes into separate libraries, and entering them into the fault attribute database, the fault attribute database comprising:
    a configuration attribute library, comprising a configuration attribute number, a software fault number queue and a configuration table;
    an alarm attribute library, comprising an alarm attribute number, a software fault number queue and an alarm feature;
    a performance indicator attribute library, comprising a performance indicator attribute number, a software fault number queue and a performance indicator feature;
    a fault cause attribute library, comprising a fault cause attribute number, a software fault number queue and a fault cause description;
    a solution attribute library, comprising a solution attribute number, a software fault number queue and a solution description.
  5. The software system fault diagnosis method according to claim 4, wherein
    the fault rule database comprises: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group and a software defect identifier;
    the configuration attribute comprises a configuration attribute number and a configuration attribute weight, the alarm attribute comprises an alarm attribute number and an alarm attribute weight, and the performance indicator attribute comprises a performance indicator number and a performance indicator weight.
  6. The software system fault diagnosis method according to claim 5, wherein performing matching in the preset rule database according to the fault attributes and generating the fault diagnosis decision list according to the matching degree between the fault attributes and the preset rule database comprises:
    matching the acquired fault attributes against the alarm attribute library, the configuration attribute library and the performance indicator attribute library in the fault attribute database, respectively;
    aggregating and sorting the software fault number queues corresponding to the fault attributes that were matched, to form a preliminary matching fault table; and, if a fault attribute matches no fault attribute in the database, forming an unknown fault attribute table, wherein the preliminary matching fault table comprises the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue and the matched performance indicator attribute queue, each matched fault attribute queue being composed of the matched fault attribute numbers; and the unknown fault attribute table comprises the unmatched configuration attributes, the unmatched alarm attributes and the unmatched performance indicator attributes, each unmatched fault attribute queue being composed of the corresponding fault attribute numbers;
    matching the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database to derive the matching degree of each matched fault in the preliminary matching fault table, wherein the fault attribute numbers comprise the configuration attribute number, the alarm attribute number and the performance indicator number, and the fault attribute weights comprise the configuration attribute weight, the alarm attribute weight and the performance indicator weight;
    sorting the matched faults in descending order of their matching degree in the preliminary matching fault table, and extracting from the fault rule database the fault cause attributes and solution attributes corresponding to the matched faults, to form the fault diagnosis decision list, wherein the fault diagnosis decision list comprises the matched software fault number, the matched software fault name, the fault cause attributes and the solution attributes.
  7. A software system fault diagnosis server, the software system fault diagnosis server comprising:
    a fault attribute obtaining module, configured to acquire fault attributes of a diagnosed software system through a network management system;
    a matching decision module, configured to perform matching in a preset rule database according to the fault attributes, and to generate a fault diagnosis decision list ordered from the highest matching degree to the lowest.
  8. The software system fault diagnosis server according to claim 7, wherein the software system fault diagnosis server further comprises a matching update module, the matching update module being configured to:
    when the fault attributes fail to match the preset rule database, send the fault attributes to a fault analysis and rule development terminal for analysis;
    receive new processing rules obtained by the fault analysis and rule development terminal from analyzing the fault attributes that failed to match, and merge the new processing rules into the rule database.
  9. The software system fault diagnosis server according to claim 7 or 8, wherein the software system fault diagnosis server further comprises a database module, the database module comprising:
    an attribute building unit, configured to form fault attribute data records from verified fault attributes, respectively, and to enter the fault attribute data records into a fault attribute database, wherein the fault attributes comprise: configuration attributes, alarm attributes, performance indicator attributes, fault cause attributes and solution attributes;
    a diagnostic building unit, configured to establish a mapping relationship between a fault phenomenon and a combination of the fault cause attributes and solution attributes, and to enter the mapping relationship into a fault rule database, wherein the fault phenomenon comprises the configuration attributes, alarm attributes and performance indicator attributes, and the fault cause attributes correspond one-to-one with the solution attributes;
    a rule building unit, configured to merge the fault attribute database and the fault rule database into the rule database, the data in the fault attribute database and in the fault rule database corresponding to each other.
  10. The software system fault diagnosis server according to claim 9, wherein the attribute building unit is further configured to:
    store all verified fault attributes as data records, organize the verified fault attributes into separate libraries, and enter them into the fault attribute database, the fault attribute database comprising:
    a configuration attribute library, comprising a configuration attribute number, a software fault number queue and a configuration table;
    an alarm attribute library, comprising an alarm attribute number, a software fault number queue and an alarm feature;
    a performance indicator attribute library, comprising a performance indicator attribute number, a software fault number queue and a performance indicator feature;
    a fault cause attribute library, comprising a fault cause attribute number, a software fault number queue and a fault cause description;
    a solution attribute library, comprising a solution attribute number, a software fault number queue and a solution description.
  11. The software system fault diagnosis server according to claim 10, wherein
    the fault rule database comprises: a software fault number, a software fault name, a configuration attribute group, an alarm attribute group, a performance indicator attribute group, a fault cause attribute group, a solution attribute group and a software defect identifier;
    the configuration attribute comprises a configuration attribute number and a configuration attribute weight, the alarm attribute comprises an alarm attribute number and an alarm attribute weight, and the performance indicator attribute comprises a performance indicator number and a performance indicator weight.
  12. The software system fault diagnosis server according to claim 11, wherein the matching decision module comprises:
    an attribute matching unit, configured to match the acquired fault attributes against the alarm attribute library, the configuration attribute library and the performance indicator attribute library in the fault attribute database, respectively;
    a preliminary matching unit, configured to aggregate and sort the software fault number queues corresponding to the fault attributes that were matched, to form a preliminary matching fault table, and, if a fault attribute matches no fault attribute in the database, to form an unknown fault attribute table, wherein the preliminary matching fault table comprises the matched software fault number, the matched configuration attribute queue, the matched alarm attribute queue and the matched performance indicator attribute queue, each matched fault attribute queue being composed of the matched fault attribute numbers; and the unknown fault attribute table comprises the unmatched configuration attributes, the unmatched alarm attributes and the unmatched performance indicator attributes, each unmatched fault attribute queue being composed of the corresponding fault attribute numbers;
    a weight matching unit, configured to match the preliminary matching fault table against the fault attribute numbers and fault attribute weights in the fault rule database to derive the matching degree of each matched fault in the preliminary matching fault table, wherein the fault attribute numbers comprise the configuration attribute number, the alarm attribute number and the performance indicator number, and the fault attribute weights comprise the configuration attribute weight, the alarm attribute weight and the performance indicator weight;
    a decision matching unit, configured to sort the matched faults in descending order of their matching degree in the preliminary matching fault table, and to extract from the fault rule database the fault cause attributes and solution attributes corresponding to the matched faults, to form a fault diagnosis decision list, wherein the fault diagnosis decision list comprises the matched software fault number, the matched software fault name, the fault cause attributes and the solution attributes.
  13. A software system fault diagnosis system, comprising a software system diagnostic server, a software system client, and a fault analysis and rule development end,
    the software system diagnostic server includes a fault attribute obtaining module, a matching decision module, and a matching update module, wherein
    the fault attribute obtaining module is configured to obtain fault attributes of the diagnosed software system through a network management system;
    the matching decision module is configured to perform matching in a preset rule database according to the fault attributes and generate a fault diagnosis decision list ordered from highest to lowest matching degree;
    the matching update module is configured to send the fault attributes to the fault analysis and rule development end for analysis when the fault attributes are not successfully matched against the preset rule database;
    the matching update module is further configured to receive the new processing rule obtained by the fault analysis and rule development end from analyzing the unmatched fault attributes, and to incorporate the new processing rule into the rule database;
    the software system client is configured to provide fault attributes to the software system diagnostic server and to receive the fault diagnosis decision list;
    the fault analysis and rule development end is configured to receive the unmatched fault attributes sent by the software system diagnostic server, analyze them to obtain a new processing rule, and incorporate the new processing rule into the rule database.
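The interaction among the three parties in claim 13 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented design: `DiagnosticServer` and `RuleDevelopmentEnd` are hypothetical class names, rules are plain dictionaries, the matching degree is assumed to be the fraction of a rule's attributes that were observed, and the "analysis" step merely returns a placeholder rule where a real deployment would involve engineers.

```python
class RuleDevelopmentEnd:
    """Hypothetical stand-in for the fault analysis and rule development end."""

    def analyze(self, fault_attributes):
        # Placeholder analysis: a real system would derive cause and solution here.
        return {"attributes": frozenset(fault_attributes),
                "cause": "to be determined by analysis",
                "solution": "to be determined by analysis"}


class DiagnosticServer:
    """Hypothetical diagnostic server combining matching decision and matching update."""

    def __init__(self, rule_db, dev_end):
        self.rule_db = rule_db  # list of rule dictionaries
        self.dev_end = dev_end

    def diagnose(self, fault_attributes):
        observed = set(fault_attributes)
        matches = []
        for rule in self.rule_db:
            overlap = len(rule["attributes"] & observed)
            if overlap:
                # Assumed matching degree: share of the rule's attributes observed.
                matches.append((overlap / len(rule["attributes"]), rule))
        if not matches:
            # Matching failed: forward to the development end, then merge the new rule.
            self.rule_db.append(self.dev_end.analyze(observed))
            return []
        matches.sort(key=lambda match: match[0], reverse=True)
        return [rule for _, rule in matches]


# Hypothetical client-side usage
server = DiagnosticServer(
    [{"attributes": frozenset({"ALM-7", "KPI-9"}),
      "cause": "Unreasonable resource planning",
      "solution": "Expand or rebalance resources"}],
    RuleDevelopmentEnd(),
)
print(len(server.diagnose({"ALM-7"})))   # fault matched by the existing rule
print(len(server.diagnose({"CFG-42"})))  # no match; a new rule is added
print(len(server.rule_db))
```

Once the new rule is merged, a later diagnosis with the same attributes finds a match, which is the feedback loop the claim describes.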
  14. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the method of any one of claims 1 to 6.
PCT/CN2015/085932 2014-12-10 2015-08-03 Method, server and system for software system fault diagnosis WO2016090929A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410759411.9A CN105740140A (en) 2014-12-10 2014-12-10 Software system failure diagnosis method, server and system
CN201410759411.9 2014-12-10

Publications (1)

Publication Number Publication Date
WO2016090929A1 (en) 2016-06-16

Family

ID=56106596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/085932 WO2016090929A1 (en) 2014-12-10 2015-08-03 Method, server and system for software system fault diagnosis

Country Status (2)

Country Link
CN (1) CN105740140A (en)
WO (1) WO2016090929A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10191112B2 (en) * 2016-11-18 2019-01-29 Globalfoundries Inc. Early development of a database of fail signatures for systematic defects in integrated circuit (IC) chips
CN106774271B (en) * 2017-01-03 2020-06-23 中车株洲电力机车有限公司 Urban rail transit vehicle fault diagnosis and display system
CN109218042B (en) * 2017-06-29 2023-04-18 中兴通讯股份有限公司 Fault diagnosis method and device based on web server and computer-readable storage medium
CN107301131A (en) * 2017-06-30 2017-10-27 郑州云海信息技术有限公司 A kind of distributed storage management software fault diagnosis method and system
CN107329885A (en) * 2017-07-21 2017-11-07 金鹏电子信息机器有限公司 A kind of method for early warning based on view data feature difference
CN107181630A (en) * 2017-07-24 2017-09-19 郑州云海信息技术有限公司 The treating method and apparatus of service fault in cloud system
CN107943098B (en) * 2018-01-01 2021-04-23 深圳通联金融网络科技服务有限公司 Intelligent operation and maintenance robot system based on machine learning
CN108363665A (en) * 2018-02-09 2018-08-03 西安博达软件股份有限公司 A kind of CMS novel maintenances diagnostic system and method based on high in the clouds
CN109726071A (en) * 2018-07-18 2019-05-07 平安科技(深圳)有限公司 System failure processing method, device, equipment and storage medium
CN109062746A (en) * 2018-07-27 2018-12-21 郑州云海信息技术有限公司 A kind of fault self-diagnosis method, device and the storage medium of server admin unit
CN112838944B (en) * 2020-07-29 2022-08-12 中兴通讯股份有限公司 Diagnosis and management, rule determination and deployment method, distributed device, and medium
CN112631192A (en) * 2020-09-30 2021-04-09 中车青岛四方机车车辆股份有限公司 Monitoring system for coupling and/or uncoupling, operating method, computer and storage medium
CN113836044B (en) * 2021-11-26 2022-03-15 华中科技大学 Method and system for collecting and analyzing software faults

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3489727B2 (en) * 1999-09-03 2004-01-26 株式会社日立情報システムズ Software failure determination method and recording medium recording the program
CN1968148A (en) * 2006-10-13 2007-05-23 华为技术有限公司 Network management system for integrative supervision and management of application software system and host resource
CN101201788A (en) * 2006-12-15 2008-06-18 中兴通讯股份有限公司 System for locating detection item
CN103473400A (en) * 2013-08-27 2013-12-25 北京航空航天大学 Software FMEA (failure mode and effects analysis) method based on level dependency modeling

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008262510A (en) * 2007-04-13 2008-10-30 Fuji Xerox Co Ltd Electronic circuit device, failure diagnostic device, failure diagnostic system, and failure diagnostic program
CN102243497B (en) * 2011-07-25 2013-10-02 江苏吉美思物联网产业股份有限公司 Networking technology-based remote intelligent analysis service system used for engineering machinery
CN103684828B (en) * 2012-09-18 2018-08-03 长春亿阳计算机开发有限公司 A kind for the treatment of method and apparatus of telecommunication equipment fault
CN103699489B (en) * 2014-01-03 2016-05-11 中国人民解放军装甲兵工程学院 A kind of remote software fault diagnosis and restorative procedure based on knowledge base

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109245910A (en) * 2017-07-10 2019-01-18 中兴通讯股份有限公司 Identify the method and device of fault type
CN109245910B (en) * 2017-07-10 2023-03-24 中兴通讯股份有限公司 Method and device for identifying fault type
CN110635962A (en) * 2018-06-25 2019-12-31 阿里巴巴集团控股有限公司 Abnormity analysis method and device for distributed system
CN111221890B (en) * 2019-11-08 2024-03-12 中盈优创资讯科技有限公司 Automatic monitoring and early warning method and device for universal index class
CN111221890A (en) * 2019-11-08 2020-06-02 中盈优创资讯科技有限公司 Automatic monitoring and early warning method and device for general indexes
CN112702196A (en) * 2020-12-18 2021-04-23 上海中通吉网络技术有限公司 Automatic fault processing method and system
CN114116428A (en) * 2021-12-01 2022-03-01 中国建设银行股份有限公司 Fault diagnosis method and equipment for dispatching system
CN114500334A (en) * 2021-12-31 2022-05-13 钉钉(中国)信息技术有限公司 Diagnosis method and device of server application architecture
CN114500334B (en) * 2021-12-31 2024-04-09 钉钉(中国)信息技术有限公司 Diagnosis method and device for server application architecture
CN115225370A (en) * 2022-07-18 2022-10-21 北京天融信网络安全技术有限公司 Rule base optimization method and device, electronic equipment and storage medium
CN115225370B (en) * 2022-07-18 2023-11-10 北京天融信网络安全技术有限公司 Rule base optimization method and device, electronic equipment and storage medium
CN115396287B (en) * 2022-08-29 2023-05-12 武汉烽火技术服务有限公司 Fault analysis method and device
CN115396287A (en) * 2022-08-29 2022-11-25 武汉烽火技术服务有限公司 Fault analysis method and device

Also Published As

Publication number Publication date
CN105740140A (en) 2016-07-06

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15866829; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 15866829; Country of ref document: EP; Kind code of ref document: A1