CN104794136A - Fault analysis method and device - Google Patents


Publication number
CN104794136A
Authority
CN
China
Prior art keywords
event
matrix
fault analysis
matching degree
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410029699.4A
Other languages
Chinese (zh)
Inventor
刘克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201410029699.4A
Publication of CN104794136A
Legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the invention provide a fault analysis method and device. The method includes: collecting data within a preset time period; extracting from the data the time points at which key events occur and the running parameters at those time points, and generating a first event matrix, wherein the rows of the first event matrix represent the time points, arranged in chronological order, the columns represent the running parameters, and each element is the value of the corresponding running parameter at the corresponding time point; calculating the matching degree between the first event matrix and each second event matrix stored in a fault analysis database, wherein each fault analysis record includes a second event matrix and the fault analysis file corresponding to that second event matrix; and conducting fault analysis according to the fault analysis file corresponding to the second event matrix with the highest matching degree. The fault analysis method and device can improve fault analysis accuracy.

Description

Fault analysis method and device
Technical field
Embodiments of the present invention relate to communication technologies, and in particular to a fault analysis method and device.
Background
With the rapid development of science and technology, equipment and systems are becoming increasingly complex, and fault analysis of equipment, telecommunication equipment in particular, is becoming increasingly difficult. Fault analysis includes fault localization, fault prediction, and the like.
In the prior art, a fault analysis state database stores the correspondence between state sets and fault causes. When an alarm occurs, the states of the relevant modules are collected to form a state set, which is then matched against the state sets in the database to analyze the fault.
However, the same state set may correspond to multiple fault causes, so the prior-art method cannot analyze faults accurately.
Summary of the invention
Embodiments of the present invention provide a fault analysis method and device to improve the accuracy of fault analysis.
In a first aspect, an embodiment of the present invention provides a fault analysis method, comprising:
collecting data within a preset time period;
extracting, from the data, the time points at which key events occur and the operating parameters at those time points, and generating a first event matrix, wherein the rows of the first event matrix represent at least one time point, the time points being arranged in chronological order, the columns of the first event matrix represent at least one operating parameter, and each element of the first event matrix is the value of the corresponding operating parameter at the corresponding time point;
matching the first event matrix against at least one second event matrix stored in a fault analysis database, and calculating the matching degree between the first event matrix and each second event matrix, wherein the fault analysis database contains at least one fault analysis record, and each fault analysis record comprises a second event matrix and the fault analysis document corresponding to that second event matrix; and
performing fault analysis according to the fault analysis document corresponding to the second event matrix with the highest matching degree.
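The final selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FaultRecord`, `cell_agreement` and `best_record` are hypothetical, and the scoring function is a deliberately simple placeholder, since the text does not fix how the matching degree is computed at this point.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]


@dataclass
class FaultRecord:
    """One fault analysis record (hypothetical layout)."""
    event_matrix: Matrix   # the stored "second event matrix"
    analysis_doc: str      # the associated fault analysis document


def cell_agreement(a: Matrix, b: Matrix) -> float:
    """Placeholder matching degree: share of equal cells, penalising size mismatch."""
    rows_a, rows_b = len(a), len(b)
    cols_a = len(a[0]) if rows_a else 0
    cols_b = len(b[0]) if rows_b else 0
    denom = max(rows_a, rows_b) * max(cols_a, cols_b)
    if denom == 0:
        return 0.0
    equal = sum(
        a[i][j] == b[i][j]
        for i in range(min(rows_a, rows_b))
        for j in range(min(cols_a, cols_b))
    )
    return equal / denom


def best_record(
    first: Matrix,
    database: Sequence[FaultRecord],
    score: Callable[[Matrix, Matrix], float] = cell_agreement,
) -> FaultRecord:
    """Return the record whose stored matrix has the highest matching degree."""
    if not database:
        raise ValueError("fault analysis database is empty")
    return max(database, key=lambda rec: score(first, rec.event_matrix))
```

Any of the matching strategies described in the possible implementations could be passed in as `score` in place of the placeholder.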
In a first possible implementation of the first aspect, matching the first event matrix against the at least one second event matrix stored in the fault analysis database and calculating the matching degree between the first event matrix and each second event matrix comprises:
determining that a column of the first event matrix and a column of the second event matrix are matching columns of each other, the matching columns containing at least one identical key event;
comparing the other elements of the matching columns one by one, and calculating the matching degree of the matching columns; and
calculating the matching degree between the first event matrix and the second event matrix according to the order matching degree of the sequence in which the key events in the first event matrix occur and the matching degree of the matching columns.
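A minimal Python sketch of these three sub-steps follows. The text does not say how the order matching degree and the column matching degrees are combined, so the greedy in-order pairing and the final product used here are assumptions, as are the column layout (a dict with an `events` set and a `params` dict) and the function names.

```python
from typing import Any, Dict, List

Column = Dict[str, Any]  # {"events": set of key event IDs, "params": {name: value}}


def column_degree(col_a: Column, col_b: Column) -> float:
    """Share of operating parameters on which two matching columns agree."""
    names = set(col_a["params"]) | set(col_b["params"])
    if not names:
        return 1.0
    same = sum(
        name in col_a["params"]
        and name in col_b["params"]
        and col_a["params"][name] == col_b["params"][name]
        for name in names
    )
    return same / len(names)


def matrix_degree(first: List[Column], second: List[Column]) -> float:
    """Combine order agreement with per-column agreement (assumed: product)."""
    pairs = []
    start = 0
    # Greedy in-order pairing: a column may only match a later column of
    # `second` than the previous match, so out-of-order events stay unmatched.
    for col_a in first:
        for j in range(start, len(second)):
            if col_a["events"] & second[j]["events"]:
                pairs.append((col_a, second[j]))
                start = j + 1
                break
    if not pairs:
        return 0.0
    order_degree = len(pairs) / max(len(first), len(second))
    mean_column_degree = sum(column_degree(a, b) for a, b in pairs) / len(pairs)
    return order_degree * mean_column_degree
```

With this choice, two matrices containing the same events in reversed order score lower than two identical matrices, which is the property the order matching degree is meant to capture.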
In a second possible implementation of the first aspect, matching the first event matrix against the at least one second event matrix stored in the fault analysis database and calculating the matching degree between the first event matrix and each second event matrix comprises:
matching the first event matrix against the at least one second event matrix stored in the fault analysis database using a matrix similarity recognition algorithm, and calculating the matching degree between the first event matrix and each second event matrix; or
converting the first event matrix and the second event matrix into images, matching them using an image similarity algorithm, and calculating the matching degree between the first event matrix and each second event matrix.
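The text does not name a particular matrix similarity algorithm. As one possible stand-in, the sketch below uses cosine similarity over the flattened matrices; this choice, the function name `cosine_similarity`, and the requirement that both matrices have the same shape are all assumptions made for illustration. The image-based alternative is not sketched here.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def cosine_similarity(a: Matrix, b: Matrix) -> float:
    """Cosine similarity of two equally shaped numeric matrices, flattened row by row."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("matrices must contain the same number of elements")
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero matrix carries no direction to compare
    return dot / (norm_a * norm_b)
```

Matrices of different sizes would first need to be aligned or padded, which the text leaves open.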
According to the first aspect or the first or second possible implementation of the first aspect, in a third possible implementation, each operating parameter in the first event matrix has a corresponding weight coefficient, and the weight coefficient of the operating parameter corresponding to the key event is greater than those of the other operating parameters.
According to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, the key events include: alarms, abnormal logs, abrupt changes in key system indicators, user operations, and configuration changes.
According to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation, the operating parameters include: the value of a key system indicator, the trend of a key system indicator, the amplitude of change of a key system indicator, the direction of an abrupt change in a key system indicator, alarm name, alarm location, input commands, configuration changes, and abnormal logs.
According to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation,
the fault analysis document contains at least one of the following: fault type, fault location, recovery policy.
According to the first aspect or any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation,
if the second event matrix with the highest matching degree contains time points later than those of the first event matrix, together with the corresponding events, the method further comprises:
predicting upcoming events according to the second event matrix with the highest matching degree.
According to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation, before extracting from the data the time points at which key events occur and the operating parameters at those time points, the method further comprises:
receiving anomalous event information, or receiving a fault analysis command sent by a user.
In a second aspect, an embodiment of the present invention provides a fault analysis method, comprising:
collecting data within a preset time period;
extracting, from the data, the time points at which key events occur;
generating a key event sequence according to the order in which the key events occur;
representing each key event by a key event identifier to form a first event string;
matching, using a string matching algorithm, the first event string against at least one second event string stored in a fault analysis database, and calculating the matching degree between the first event string and each second event string, wherein the fault analysis database contains at least one fault analysis record, and each fault analysis record comprises a second event string and the fault analysis document corresponding to that second event string; and
performing fault analysis according to the fault analysis document corresponding to the second event string with the highest matching degree.
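The string-based variant can be sketched with Python's standard `difflib` module. The text only says "a string matching algorithm", so using `SequenceMatcher.ratio()` is an assumption, and the one-character identifiers in `EVENT_IDS` and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple

# Hypothetical mapping from key events to one-character identifiers.
EVENT_IDS = {
    "alarm_1": "A",
    "kpi1_jump_up": "B",
    "config_change": "C",
    "abnormal_log": "D",
}


def to_event_string(events: Iterable[Tuple[float, str]]) -> str:
    """Sort (time, event name) pairs by time and encode them as an event string."""
    ordered = sorted(events, key=lambda item: item[0])
    return "".join(EVENT_IDS[name] for _, name in ordered)


def matching_degree(first: str, second: str) -> float:
    """Similarity in [0, 1]; sensitive to the order of the events."""
    return SequenceMatcher(None, first, second).ratio()
```

Because the identifiers are concatenated in chronological order, two incidents with the same events in a different order produce different strings and therefore a lower matching degree.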
In a first possible implementation of the second aspect, the key events include: alarms, abnormal logs, abrupt changes in key system indicators, user operations, and configuration changes.
According to the second aspect or the first possible implementation of the second aspect, in a second possible implementation,
the fault analysis document contains at least one of the following: fault type, fault location, recovery policy.
According to the second aspect or the first or second possible implementation of the second aspect, in a third possible implementation,
if the second event string with the highest matching degree contains events that come after the first event string, the method further comprises:
predicting upcoming events according to the second event string with the highest matching degree.
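One way to read this prediction step: align the observed string with the best-matching stored string and report whatever the stored string contains after the last aligned event. The sketch below does this with `difflib`; the alignment method and the function name `predict_upcoming` are assumptions, not something the text prescribes.

```python
from difflib import SequenceMatcher


def predict_upcoming(observed: str, stored: str) -> str:
    """Return the events in `stored` that follow the last event aligned with `observed`."""
    blocks = SequenceMatcher(None, observed, stored).get_matching_blocks()
    # get_matching_blocks() ends with a zero-length sentinel; ignore it.
    real_blocks = [block for block in blocks if block.size > 0]
    if not real_blocks:
        return ""  # nothing aligned, so nothing to predict
    last = real_blocks[-1]
    return stored[last.b + last.size:]
```

For an observed string "ABC" and a stored string "ABCDE", the sketch reports "DE" as the events that may still occur.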
According to the second aspect or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation, before extracting from the data the time points at which key events occur, the method further comprises:
receiving anomalous event information, or receiving a fault analysis command sent by a user.
In a third aspect, an embodiment of the present invention provides a fault analysis device, comprising:
an acquisition module, configured to collect data within a preset time period;
a data processing module, configured to extract, from the data, the time points at which key events occur and the operating parameters at those time points, and to generate a first event matrix, wherein the rows of the first event matrix represent at least one time point, the time points being arranged in chronological order, the columns of the first event matrix represent at least one operating parameter, and each element of the first event matrix is the value of the corresponding operating parameter at the corresponding time point; and
an analysis module, configured to match the first event matrix against at least one second event matrix stored in a fault analysis database, and to calculate the matching degree between the first event matrix and each second event matrix, wherein the fault analysis database contains at least one fault analysis record, and each fault analysis record comprises a second event matrix and the fault analysis document corresponding to that second event matrix;
the analysis module being further configured to perform fault analysis according to the fault analysis document corresponding to the second event matrix with the highest matching degree.
In a first possible implementation of the third aspect, the analysis module is specifically configured to:
determine that a column of the first event matrix and a column of the second event matrix are matching columns of each other, the matching columns containing at least one identical key event;
compare the other elements of the matching columns one by one, and calculate the matching degree of the matching columns; and
calculate the matching degree between the first event matrix and the second event matrix according to the order matching degree of the sequence in which the key events in the first event matrix occur and the matching degree of the matching columns.
In a second possible implementation of the third aspect, the analysis module is specifically configured to:
match the first event matrix against the at least one second event matrix stored in the fault analysis database using a matrix similarity recognition algorithm, and calculate the matching degree between the first event matrix and each second event matrix; or
convert the first event matrix and the second event matrix into images, match them using an image similarity algorithm, and calculate the matching degree between the first event matrix and each second event matrix.
According to the third aspect or the first or second possible implementation of the third aspect, in a third possible implementation, each operating parameter in the first event matrix has a corresponding weight coefficient, and the weight coefficient of the operating parameter corresponding to the key event is greater than those of the other operating parameters.
According to the third aspect or any one of the first to third possible implementations of the third aspect, in a fourth possible implementation, the key events include: alarms, abnormal logs, abrupt changes in key system indicators, user operations, and configuration changes.
According to the third aspect or any one of the first to fourth possible implementations of the third aspect, in a fifth possible implementation, the operating parameters include: the value of a key system indicator, the trend of a key system indicator, the amplitude of change of a key system indicator, the direction of an abrupt change in a key system indicator, alarm name, alarm location, input commands, configuration changes, and abnormal logs.
According to the third aspect or any one of the first to fifth possible implementations of the third aspect, in a sixth possible implementation,
the fault analysis document contains at least one of the following: fault type, fault location, recovery policy.
According to the third aspect or any one of the first to sixth possible implementations of the third aspect, in a seventh possible implementation,
if the second event matrix with the highest matching degree contains time points later than those of the first event matrix, together with the corresponding events, the analysis module is further configured to:
predict upcoming events according to the second event matrix with the highest matching degree.
According to the third aspect or any one of the first to seventh possible implementations of the third aspect, in an eighth possible implementation, the acquisition module is further configured to:
receive anomalous event information, or receive a fault analysis command sent by a user, before the time points at which key events occur and the operating parameters at those time points are extracted from the data.
In a fourth aspect, an embodiment of the present invention provides a fault analysis device, comprising:
an acquisition module, configured to collect data within a preset time period;
a data processing module, configured to extract, from the data, the time points at which key events occur;
the data processing module being further configured to generate a key event sequence according to the order in which the key events occur;
the data processing module being further configured to represent each key event by a key event identifier to form a first event string; and
an analysis module, configured to match, using a string matching algorithm, the first event string against at least one second event string stored in a fault analysis database, and to calculate the matching degree between the first event string and each second event string, wherein the fault analysis database contains at least one fault analysis record, and each fault analysis record comprises a second event string and the fault analysis document corresponding to that second event string;
the analysis module being further configured to perform fault analysis according to the fault analysis document corresponding to the second event string with the highest matching degree.
In a first possible implementation of the fourth aspect, the key events include: alarms, abnormal logs, abrupt changes in key system indicators, user operations, and configuration changes.
According to the fourth aspect or the first possible implementation of the fourth aspect, in a second possible implementation,
the fault analysis document contains at least one of the following: fault type, fault location, recovery policy.
According to the fourth aspect or the first or second possible implementation of the fourth aspect, in a third possible implementation,
if the second event string with the highest matching degree contains events that come after the first event string, the analysis module is further configured to:
predict upcoming events according to the second event string with the highest matching degree.
According to the fourth aspect or any one of the first to third possible implementations of the fourth aspect, in a fourth possible implementation, the acquisition module is further configured to:
receive anomalous event information, or receive a fault analysis command sent by a user, before the time points at which key events occur are extracted from the data.
In the fault analysis method and device of the embodiments of the present invention, the time points at which key events occur and the operating parameters at those time points are extracted from the data collected within the preset time period to form a first event matrix, so that the time at which each event occurs is taken into account during fault analysis. The first event matrix is then matched against at least one second event matrix stored in the fault analysis database, and the order in which the events occur is also taken into account when calculating the matching degree between the first event matrix and each second event matrix. The result of the fault analysis is therefore more accurate.
In the fault analysis method and device of the embodiments of the present invention, the time points at which key events occur are extracted from the data collected within the preset time period, and a key event sequence is formed according to the order in which the events occur. The key event sequence is converted into a first event string, which is then matched against at least one second event string stored in the fault analysis database to calculate the matching degree between the first event string and each second event string. Fault analysis and handling are carried out according to the fault analysis document corresponding to the second event string with the highest matching degree. Because the order of the events is taken into account, the result of the fault analysis is more accurate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a cloud computing computer system;
Fig. 2 is a schematic structural diagram of a non-cloud computing computer system;
Fig. 3 is a flowchart of embodiment one of the fault analysis method of the present invention;
Fig. 4 is a schematic diagram of the data collected by the method of this embodiment;
Fig. 5 is a schematic diagram of the key events extracted in this embodiment and the time points at which they occur;
Fig. 6 is a schematic diagram of the first event matrix;
Fig. 7 is a sub-flowchart of the matrix matching method;
Fig. 8 is a flowchart of embodiment two of the fault analysis method of the present invention;
Fig. 9 is a schematic diagram of the key event sequence generated in embodiment two of the fault analysis method of the present invention;
Fig. 10 is a schematic diagram of the first event string generated in embodiment two of the fault analysis method of the present invention;
Fig. 11 is a flowchart of embodiment one of the fault analysis device of the present invention;
Fig. 12 is a schematic structural diagram of embodiment two of the fault analysis device of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
With the rapid development of communication technologies, the functions of communication equipment and communication systems are becoming increasingly powerful, and their structures increasingly complex. To improve the accuracy of fault analysis, the present invention arranges the events within a preset time period in the chronological order in which they occur and matches them against the fault records stored in a database. The fault records in the database are built from historical data, that is, from the events in historical time periods and the order in which those events occurred. Fault causes can therefore be located more accurately.
The fault analysis method of the embodiments of the present invention can be applied to a cloud computing computer system as well as to a non-cloud computing computer system. Fig. 1 is a schematic structural diagram of a cloud computing computer system. As shown in Fig. 1, the cloud computing computer system may comprise multiple subsystems. Each subsystem collects information about its objects and stores it in its own database. Each subsystem may also comprise a monitoring module, a query and statistics module, and an operation processing module. The operation processing module interacts with the client, receiving instructions from the user and presenting information to the user; for example, a fault analysis command sent by the user is passed on to the other modules by the operation processing module. The fault analysis method provided by the embodiments of the present invention may be executed by a centrally deployed analysis system comprising a data processing module, a database and an analysis module. The data processing module is connected to the operation processing module of each subsystem and thereby obtains the data collected by each subsystem; in other words, the operation processing module of each subsystem acts as the acquisition module of the analysis system.
Fig. 2 is a schematic structural diagram of a non-cloud computing computer system. As shown in Fig. 2, the non-cloud computing computer system may comprise an information collection module, a database and an operation processing module. The information collection module collects information about each object and saves the data in the database. The operation processing module analyzes the data in the database and interacts with the client, receiving instructions from the user and presenting information to the user; for example, a fault analysis command sent by the user is passed on to the other modules by the operation processing module. Fig. 2 also shows a monitoring module and a query and statistics module.
The data in the database comes from two sources. One part comes from the subsystems, which mainly provide "event data"; the other part consists of the fault analysis documents that the client provides for the event data. For example, suppose "A event data" belongs to a newly discovered fault for which no match is found in the system. After the user has analyzed "A event data" and reached a conclusion, the fault analysis document can be stored in the database, and the processed "A event data" is saved to the database as well. For the cloud computing computer system shown in Fig. 1, this database is the database of the analysis system; for the non-cloud computing computer system shown in Fig. 2, it is the database of the computer.
Fig. 3 is a flowchart of embodiment one of the fault analysis method of the present invention. The method of this embodiment may be triggered by a user creating a fault analysis task, or by an anomalous event. Anomalous events may include an alarm, an abrupt change in a key system indicator (KPI), the appearance of an abnormal log, or a high-risk operation, and may also include other events; the embodiments of the present invention place no limitation on this. As shown in Fig. 3, the fault analysis method of this embodiment may comprise:
Step 301: collect data within a preset time period.
In a specific implementation, the preset time period may be specified in advance by the user; for example, the user may define a time period when creating the fault analysis task. Alternatively, if the fault analysis task is triggered by an anomalous event, the time period may be set according to an anomalous event handling policy. The handling policy may be user-defined, with a corresponding analysis task formulated according to the severity or impact of the anomalous event. When the anomalous event handling policy is configured in advance, the time period may be specified as running from $t_a$, before the anomalous event occurs, to $t_b$, after the anomalous event occurs: $T = \{t \mid t_a \le t \le t_b\}$.
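The window around an anomalous event can be illustrated as follows. This is a minimal sketch; the record layout (a timestamp paired with a payload) and the name `collect_window` are assumptions.

```python
from datetime import datetime, timedelta
from typing import Any, List, Tuple

Record = Tuple[datetime, Any]  # (timestamp, payload); hypothetical layout


def collect_window(
    records: List[Record],
    anomaly_time: datetime,
    before: timedelta,
    after: timedelta,
) -> List[Record]:
    """Keep the records whose timestamp t satisfies t_a <= t <= t_b."""
    t_a = anomaly_time - before  # start of the window, before the anomaly
    t_b = anomaly_time + after   # end of the window, after the anomaly
    return [record for record in records if t_a <= record[0] <= t_b]
```

How far the window extends on each side would come from the user-defined task or the anomalous event handling policy.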
That is, before step 301, the method of this embodiment may further comprise: receiving anomalous event information, or receiving a fault analysis command sent by a user.
The collected data may be data related to the operation of the computer. It may include, but is not limited to: the trend and amplitude of change of key system indicators; the times and order of abrupt changes in key system indicators; the values of key system indicators; alarm names, alarm locations, and the times and order in which alarms are raised; user operations with their times and order; and configuration changes with their times and order. Fig. 4 is a schematic diagram of the data collected by the method of this embodiment.
Step 302: extract, from the data, the time points at which key events occur and the operating parameters at those time points, and generate a first event matrix, wherein the rows of the first event matrix represent at least one time point, the time points being arranged in chronological order, the columns of the first event matrix represent at least one operating parameter, and each element of the first event matrix is the value of the corresponding operating parameter at the corresponding time point.
Key events include, but are not limited to, alarms, abnormal logs, abrupt changes in key system indicators, user operations, and configuration changes. Each key event has a unique ID. Key events of the abrupt indicator change type can be subdivided further. For example, an abrupt change in KPI1 can be subdivided into an upward or downward abrupt change while KPI1 is healthy, and an upward or downward abrupt change while KPI1 is unhealthy.
In a specific implementation, step 302 may use a first algorithm A1: extract all key events within the time period $T = \{t \mid t_a \le t \le t_b\}$ together with the time points at which they occur. Fig. 5 is a schematic diagram of the key events extracted in this embodiment and the time points at which they occur, where the key events are denoted k1, k2, ..., kn and the corresponding time points are denoted $t_{k1}, t_{k2}, \ldots, t_{kn}$. The operating parameters at the time each key event occurs are extracted as well; they include, but are not limited to, the trend, the amplitude of change, the value and the health degree of the key system indicators. The first event matrix is then generated; Fig. 6 is a schematic diagram of the first event matrix. In Fig. 6, the health degree of an operating parameter is represented by "■" when it is between 90% and 100%, by "" when it is between 75% and 90%, and by "◇" when it is below 75%. The user can set different health degree ranges and representations for different KPIs. The amplitude of change of an operating parameter is represented by "●" when it is within ±10%, by "○" when it is within ±20%, and by "◎" when it exceeds ±20%. The user can set different amplitude ranges and representations for different KPIs. The symbol "★" indicates that the event occurred at the given time point, and "-" indicates that it did not.
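Assembling the matrix and encoding the amplitude of change can be sketched as below. The layout follows the Fig. 6 description, with one column per key-event time point and one row per operating parameter; the sample format, the handling of the interval boundaries, and the names `build_event_matrix` and `amplitude_symbol` are assumptions.

```python
from typing import Any, Dict, Hashable, Iterable, List, Sequence


def build_event_matrix(
    samples: Dict[Hashable, Dict[str, Any]],
    key_event_times: Iterable[Hashable],
    parameters: Sequence[str],
) -> List[List[Any]]:
    """One row per operating parameter, one column per key-event time (ascending)."""
    times = sorted(set(key_event_times))
    return [[samples[t].get(p) for t in times] for p in parameters]


def amplitude_symbol(change: float) -> str:
    """Map a relative change (0.15 means 15%) to the symbols described for Fig. 6."""
    magnitude = abs(change)
    if magnitude <= 0.10:
        return "●"  # within ±10%
    if magnitude <= 0.20:
        return "○"  # within ±20%
    return "◎"      # beyond ±20%
```

Transposing the result gives the layout used in the method statement, where rows are time points and columns are operating parameters.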
It can be seen that, in the first event matrix, each of the columns $t_{k1}$ to $t_{kn}$ corresponds to one or more key events.
Further, each operating parameter in the first event matrix has a corresponding weight coefficient, and the weight coefficient of the operating parameter corresponding to the key event is greater than those of the other operating parameters.
For the data in each column of the first event matrix, the data corresponding to the key event can be treated as the primary data of that column, and the remaining data as the secondary data of that column. The primary data can be given a higher weight coefficient, and the secondary data can be given different weight coefficients according to how strongly each item is associated with the key event of that column, subject to the weight coefficient of the secondary data being lower than that of the column's primary data. For example, if the key event corresponding to column $t_{k1}$ is alarm 1, then alarm 1 is the primary data of column K1 and has a higher weight coefficient. The remaining alarms, KPIs, user operations, abnormal logs and so on are secondary data, with different weight coefficients according to how strongly they are associated with alarm 1. Similarly, if the key event corresponding to column K3 is an abrupt change in KPI2, then the trend, amplitude of change, health degree and so on of KPI2 are the primary data of column K3 and have higher weight coefficients. The remaining KPIs, alarms, user operations, abnormal logs and so on are secondary data, with different weight coefficients according to how strongly they are associated with KPI2.
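The effect of the weight coefficients on a column comparison can be shown with a short sketch. The text does not give a formula, so the weighted share of agreeing parameters used here is an assumption, as are the function name and the example weights.

```python
from typing import Any, Dict


def weighted_column_degree(
    col_a: Dict[str, Any],
    col_b: Dict[str, Any],
    weights: Dict[str, float],
) -> float:
    """Weighted share of operating parameters on which two columns agree."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    agreeing = sum(
        weight
        for name, weight in weights.items()
        if name in col_a and name in col_b and col_a[name] == col_b[name]
    )
    return agreeing / total
```

For a column whose key event is alarm 1, giving `alarm_1` a weight of 3 and a secondary KPI trend a weight of 1 means a disagreement on the trend alone still leaves a matching degree of 0.75, whereas a disagreement on the alarm itself drops it to 0.25.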
Prior-art fault analysis methods usually require the fault to be modeled manually, after which the analysis conclusion is associated with the fault model and saved to a database; the workload of fault modeling is large. In this embodiment, for a new fault, the process of generating the first event matrix is itself the process of automatically building the fault model, and no manual intervention is required. Therefore, compared with prior-art fault analysis methods, the method of this embodiment reduces labor cost and is easier to maintain and upgrade.
Step 303: Match the first event matrix against at least one second event matrix stored in a fault analysis database, and calculate the matching degree between the first event matrix and each second event matrix. The fault analysis database contains at least one fault analysis record, and each fault analysis record includes a second event matrix and the fault analysis document corresponding to that second event matrix.
Optionally, the matching of the first event matrix against the at least one second event matrix stored in the fault analysis database, and the calculation of the matching degree between the first event matrix and each second event matrix, may use the method shown in Fig. 7, which is a sub-flowchart of the matrix matching method. As shown in Fig. 7, the method may include the following sub-steps:
Step 3031: Determine a column in the first event matrix and a column in the second event matrix that are matching columns of each other, where matching columns contain at least one identical critical event.
In a specific implementation, one column may be taken from each of the first event matrix and the second event matrix; if the two columns have at least one identical critical event, the two columns are matching columns of each other.
Step 3032: Compare the other elements of the matching columns one by one, and calculate the matching degree of the matching columns.
Specifically, the matching degree of the matching columns may be calculated from the weights of the primary data and the secondary data.
Step 3033: Calculate the matching degree between the first event matrix and the second event matrix according to the order matching degree of the sequence in which the critical events occur in the first event matrix and the matching degrees of the matching columns.
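Sub-steps 3031 to 3033 can be sketched as follows. The embodiment gives no formula for the column matching degree, for the order matching degree, or for how the two are combined, so every numeric choice here is an illustrative assumption rather than the claimed method: each column is a dictionary from parameter name to symbol, the hypothetical `key_weight` and `other_weight` stand in for the primary and secondary weight coefficients, the order matching degree is taken as the share of consecutive matched columns that keep the same relative order, and the hypothetical `order_weight` blends the two scores.

```python
from typing import Dict, List, Tuple

Column = Dict[str, str]                   # parameter name -> symbol at one time point
EventMatrix = List[Tuple[str, Column]]    # (critical event, column), in time order


def column_match(key: str, a: Column, b: Column,
                 key_weight: float = 3.0, other_weight: float = 1.0) -> float:
    """Step 3032 (assumed formula): weighted share of parameters whose values agree."""
    total = agree = 0.0
    for param in set(a) | set(b):
        # The parameter tied to the critical event is primary data and weighs more.
        weight = key_weight if param == key else other_weight
        total += weight
        if param in a and param in b and a[param] == b[param]:
            agree += weight
    return agree / total if total else 0.0


def matrix_match(first: EventMatrix, second: EventMatrix,
                 order_weight: float = 0.5) -> float:
    """Steps 3031 and 3033: pair columns by critical event, then combine scores."""
    pairs: List[Tuple[int, float]] = []   # (column index in second, column score)
    used = set()
    for event_a, col_a in first:
        for j, (event_b, col_b) in enumerate(second):
            if j not in used and event_a == event_b:
                used.add(j)
                pairs.append((j, column_match(event_a, col_a, col_b)))
                break
    if not pairs:
        return 0.0
    # Dividing by the larger width lets unmatched columns lower the score.
    column_score = sum(score for _, score in pairs) / max(len(first), len(second))
    # Order matching degree: share of consecutive matched columns in the same order.
    positions = [j for j, _ in pairs]
    steps = list(zip(positions, positions[1:]))
    order_score = sum(x < y for x, y in steps) / len(steps) if steps else 1.0
    return (1 - order_weight) * column_score + order_weight * order_score
```

With these assumptions, two identical matrices score 1.0, while the same columns in reversed time order score only 0.5, which reflects the point that event order contributes to the matching degree.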
Optionally, the matching of the first event matrix against the at least one second event matrix stored in the fault analysis database in step 303, and the calculation of the matching degree between the first event matrix and each second event matrix, may instead use the following method:
A matrix similarity recognition algorithm is used to match the first event matrix against the at least one second event matrix stored in the fault analysis database and to calculate the matching degree between the first event matrix and each second event matrix.
Alternatively, the first event matrix and the second event matrix may be converted into images and matched using an image similarity algorithm to calculate the matching degree between the first event matrix and each second event matrix. The conversion works as follows: each event is assigned a unique color (using the RGB (red, green, blue) color model, which can reproduce 16,777,216 (256 × 256 × 256) colors on screen), identical events use the same color, and each matrix corresponds to one rectangular image; an image similarity method is then used to compare the two images corresponding to the first event matrix and the second event matrix.
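The image-based variant can be illustrated with a small sketch. The text does not name an image similarity algorithm, so a plain pixel-by-pixel comparison is used here purely as a stand-in; the function names `assign_colors`, `to_image`, and `pixel_similarity` are hypothetical, and treating the empty cell "-" as just another colored value is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]


def assign_colors(events: List[str]) -> Dict[str, RGB]:
    """Give each distinct event a unique 24-bit RGB color (at most 256**3 events)."""
    palette: Dict[str, RGB] = {}
    for event in events:
        if event not in palette:
            n = len(palette)
            if n >= 256 ** 3:
                raise ValueError("more distinct events than RGB colors")
            palette[event] = ((n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF)
    return palette


def to_image(matrix: List[List[str]], palette: Dict[str, RGB]) -> List[List[RGB]]:
    """Turn an event matrix into a rectangular grid of pixels."""
    return [[palette[cell] for cell in row] for row in matrix]


def _pixel(img: List[List[RGB]], i: int, j: int) -> Optional[RGB]:
    return img[i][j] if i < len(img) and j < len(img[i]) else None


def pixel_similarity(img_a: List[List[RGB]], img_b: List[List[RGB]]) -> float:
    """Share of positions with the same color, measured over the larger image."""
    rows = max(len(img_a), len(img_b))
    cols = max([len(r) for r in img_a] + [len(r) for r in img_b] + [0])
    if rows == 0 or cols == 0:
        return 0.0
    same = 0
    for i in range(rows):
        for j in range(cols):
            a = _pixel(img_a, i, j)
            if a is not None and a == _pixel(img_b, i, j):
                same += 1
    return same / (rows * cols)
```

Both matrices must be converted with the same palette so that identical events map to identical colors; a production system would more likely substitute a dedicated image similarity measure for `pixel_similarity`.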
Step 304: Perform fault analysis according to the fault analysis document corresponding to the second event matrix with the highest matching degree.
Further, the fault analysis document may include at least one of the following: fault type, fault location, and recovery policy. Specifically, it may include the fault type, the fault name, the fault location, a performance evaluation or prediction conclusion, the consequences the fault may cause, a handling policy, and so on. In a specific implementation, the fault analysis document may be summarized by the user after analysis and saved to the system database.
Further, if the second event matrix with the highest matching degree contains time points later than those of the first event matrix, together with the events corresponding to those time points, then after step 304 the method may further include:
Predicting, according to the second event matrix with the highest matching degree, an event that is about to occur.
In a specific implementation, whether the upcoming event is a fault risk may also be judged according to preset rules. For example, when the judgment is that the event would seriously harm the computer system, a message is proactively sent to the user, warning that within a certain future time period there is an x% probability that a certain abnormality or fault may occur (if the analysis result also yields the probability of that abnormality or fault, the probability may be presented as well) and stating what consequences the abnormality or fault would cause. When the judgment is that the upcoming event would have little effect on the computer system, no message need be sent to the user.
It should be noted that in the early stage of applying the method of this embodiment, especially when the database holds relatively few cases, the matching degrees between the second event matrices in the database and the first event matrix may all be fairly low, so the result of fault analysis based on the fault analysis document of the best-matching second event matrix is not necessarily accurate. For example, suppose the event matrices that best match the first event matrix are, in order, m, n, x, y, and z, with matching degrees of 30.1%, 30.0%, 29.6%, 25.0%, and 22.0% respectively. Even the best match, event matrix m, reaches only 30.1%, so the analysis cannot rely solely on the documents of the best-matching event matrix. When several event matrices have close matching degrees, or when all matching degrees are low, the matching degrees and analysis documents of these event matrices may all be presented to the user, who then makes a judgment based on experience.
The same applies when the analysis is triggered automatically by the system in response to an abnormal event. For example, after automatic analysis completes, the system may give the analysis conclusions of the top five matching event matrices in a form such as: 30.1% may be the fault corresponding to event m, 30.0% may be the fault corresponding to event n, 29.6% may be the fault corresponding to event x, 25.0% may be the fault corresponding to event y, and 22.0% may be the fault corresponding to event z; because all matching degrees are low, the user is advised to perform manual analysis.
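The top-five conclusion described above could be produced along these lines. The text sets no numeric limits, so the cut-offs `low_threshold` and `close_margin` are hypothetical values chosen only for illustration, as is the function name `summarize`.

```python
from typing import Dict, List, Tuple


def summarize(matches: Dict[str, float], top_n: int = 5,
              low_threshold: float = 0.5,
              close_margin: float = 0.01) -> Tuple[List[str], bool]:
    """Return the top-N conclusion lines and whether manual analysis is advised."""
    ranked = sorted(matches.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    lines = [f"{score:.1%} may be the fault corresponding to event {name}"
             for name, score in ranked]
    # Assumed rule 1: even the best match is weak.
    all_low = not ranked or ranked[0][1] < low_threshold
    # Assumed rule 2: the two best matches are too close to separate.
    too_close = len(ranked) > 1 and ranked[0][1] - ranked[1][1] < close_margin
    return lines, all_low or too_close
```

For the matching degrees in the example (30.1%, 30.0%, 29.6%, 25.0%, 22.0%), both assumed rules fire, so the sketch lists all five candidates and advises manual analysis.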
Of course, the form of the conclusion may be customized by the user. In addition, the user may be prompted that, once manual analysis of the first event is complete, the analysis conclusion can be saved to the database; the next time an event similar to the first event occurs, an event matrix with a higher matching degree will be found.
It should be noted that a newly emerging fault event may likewise have a low matching degree with every second event matrix in the database, in which case the handling described above may be used.
In this embodiment, the time points at which critical events occur and the operating parameters at those time points are extracted from the data collected in the preset time period, so that the time at which events occur is taken into account during fault analysis, and the first event matrix is formed. The first event matrix is then matched against at least one second event matrix stored in the fault analysis database, and the order of events is also taken into account when calculating the matching degree between the first event matrix and each second event matrix; therefore, the fault analysis result is more accurate. In addition, in this embodiment, for a new fault, the process of generating the first event matrix is itself the process of automatically building the fault model, and no manual intervention is required: the user only needs to save the analysis conclusion to the system after completing manual analysis. This removes the specialized labor of building a fault model for each different kind of fault, so the method of this embodiment reduces labor cost and is easier to maintain and upgrade.
Fig. 8 is a flowchart of Embodiment 2 of the fault analysis method of the present invention. The method of this embodiment may be triggered by a user creating a fault analysis task, or by an abnormal event. Abnormal events may include an alarm, a sudden change in a key performance indicator (KPI), the appearance of an abnormal log, or a high-risk operation, and may also include other events; the embodiments of the present invention do not limit this. This embodiment differs from Embodiment 1 of the fault analysis method above in that, when analyzing the data, it arranges the critical events in order of occurrence into an event string rather than an event matrix. As shown in Fig. 8, the fault analysis method of this embodiment may include:
Step 801: Collect data within a preset time period.
In a specific implementation, the preset time period may be specified in advance by the user; for example, the user may define a time period when creating the fault analysis task. Alternatively, if the fault analysis task is triggered by an abnormal event, the time period may be set according to an abnormal event handling policy. The abnormal event handling policy may be user-defined, and specifies the corresponding analysis task according to the severity or degree of impact of the abnormal event. When the abnormal event handling policy is preconfigured, the time period may be specified as running from t_a, before the abnormal event occurs, to t_b, after the abnormal event occurs: T = {t | t_a ≤ t ≤ t_b}.
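As a minimal sketch of the window T = {t | t_a ≤ t ≤ t_b}, the snippet below derives t_a and t_b from the time of the abnormal event and keeps only the records inside the window. Deriving the bounds from hypothetical `before` and `after` offsets, and representing each record as a `(timestamp, description)` tuple, are assumptions made for illustration; the embodiment only states that the window runs from t_a before the abnormal event to t_b after it.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Record = Tuple[datetime, str]   # (timestamp, description); hypothetical layout


def collection_window(event_time: datetime, before: timedelta,
                      after: timedelta) -> Tuple[datetime, datetime]:
    """Return (t_a, t_b) around the abnormal event, per an assumed policy."""
    return event_time - before, event_time + after


def collect(records: List[Record], t_a: datetime, t_b: datetime) -> List[Record]:
    """Keep records with t_a <= t <= t_b, sorted by time of occurrence."""
    return sorted((r for r in records if t_a <= r[0] <= t_b), key=lambda r: r[0])
```

A policy for a more severe abnormal event might simply pass larger `before` and `after` offsets, which widens the window without changing the filtering step.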
That is, before step 801, the method of this embodiment may further include: receiving abnormal event information, or receiving a fault analysis command sent by the user.
Step 802: Extract, from the data, the time points at which critical events occur.
The critical events include: alarms, abnormal logs, sudden changes in KPIs, user operations, and configuration changes.
Step 803: Generate a critical event sequence according to the order in which the critical events occur.
Step 804: Represent each critical event by its critical event identifier to form a first event string.
Step 805: Use a string matching algorithm to match the first event string against at least one second event string stored in the fault analysis database, and calculate the matching degree between the first event string and each second event string. The fault analysis database contains at least one fault analysis record, and each fault analysis record includes a second event string and the fault analysis document corresponding to that second event string.
Step 806: Perform fault analysis according to the fault analysis document corresponding to the second event string with the highest matching degree.
Specifically, in this embodiment, the critical events in the data are extracted and a critical event sequence is generated according to the order in which they occur, as shown in Fig. 9, which is a schematic diagram of the critical event sequence generated in Embodiment 2 of the fault analysis method of the present invention. Then, in step 804, each critical event is replaced by its critical event identifier (ID) to obtain a special string, as shown in Fig. 10, which is a schematic diagram of the first event string generated in Embodiment 2 of the fault analysis method of the present invention. The matching degree between the first event string and each existing second event string in the database is then calculated using a string matching algorithm, and the matched second event strings are listed in order of matching degree. Fault analysis is performed according to the fault analysis documents corresponding to the matched second event strings, and these documents may also be presented to the user.
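Steps 803 to 805 can be sketched with the Python standard library. The embodiment does not name a particular string matching algorithm; `difflib.SequenceMatcher` is used here only as one readily available similarity measure, and the event names, IDs, and case labels are made up for illustration. The event string is kept as a list of IDs rather than joined characters so that multi-character IDs are compared as whole tokens.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def to_event_string(events: List[Tuple[float, str]],
                    ids: Dict[str, str]) -> List[str]:
    """Steps 803 and 804: order events by time, then replace each by its ID."""
    ordered = sorted(events, key=lambda e: e[0])
    return [ids[name] for _, name in ordered]


def rank_matches(first: List[str],
                 database: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Step 805: score every stored event string, best match first."""
    scored = [
        (case, SequenceMatcher(None, first, second, autojunk=False).ratio())
        for case, second in database.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The first entry of the ranked list identifies the record whose fault analysis document would be used in step 806, and the full list corresponds to presenting the matched strings in order of matching degree.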
Further, the fault analysis document may include at least one of the following: fault type, fault location, and recovery policy. Specifically, it may include the fault type, the fault name, the fault location, a performance evaluation or prediction conclusion, the consequences the fault may cause, a handling policy, and so on. In a specific implementation, the fault analysis document may be summarized by the user after analysis and saved to the system database.
Further, if the second event string with the highest matching degree contains events that follow the first event string, the method further includes:
Predicting, according to the second event string with the highest matching degree, an event that is about to occur.
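One way to read this prediction step is sketched below: align the first event string with the best-matching stored string and report whatever the stored string contains after the last aligned event. The alignment via `difflib.SequenceMatcher.get_matching_blocks` and the function name `predict_upcoming` are assumptions for illustration; the embodiment does not specify how the trailing events are located.

```python
from difflib import SequenceMatcher
from typing import List


def predict_upcoming(first: List[str], best: List[str]) -> List[str]:
    """Return the events in `best` that follow the last part matched by `first`."""
    matcher = SequenceMatcher(None, first, best, autojunk=False)
    # The final block returned by difflib is a zero-length sentinel; drop it.
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return []           # nothing aligned, so nothing can be predicted
    last = blocks[-1]
    return best[last.b + last.size:]
```

For instance, if the observed string is A1, K2, L3 and the stored string is A1, K2, L3, A7, F9, the sketch reports A7 and F9 as the events that may follow; whether they warrant a warning would then be decided by the preset rules.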
It should be noted that in the early stage of applying the method of this embodiment, especially when the database holds relatively few cases, the matching degrees between the second event strings in the database and the first event string may all be fairly low, so the result of fault analysis based on the fault analysis document of the best-matching second event string is not necessarily accurate. For example, suppose the event strings that best match the first event string are, in order, m, n, x, y, and z, with matching degrees of 30.1%, 30.0%, 29.6%, 25.0%, and 22.0% respectively. Even the best match, event string m, reaches only 30.1%, so the analysis cannot rely solely on the documents of the best-matching event string. When several event strings have close matching degrees, or when all matching degrees are low, the matching degrees and analysis documents of these event strings may all be presented to the user, who then makes a judgment based on experience.
The same applies when the analysis is triggered automatically by the system in response to an abnormal event. For example, after automatic analysis completes, the system may give the analysis conclusions of the top five matching event strings in a form such as: 30.1% may be the fault corresponding to event m, 30.0% may be the fault corresponding to event n, 29.6% may be the fault corresponding to event x, 25.0% may be the fault corresponding to event y, and 22.0% may be the fault corresponding to event z; because all matching degrees are low, the user is advised to perform manual analysis.
Of course, the form of the conclusion may be customized by the user. In addition, the user may be prompted that, once manual analysis of the first event is complete, the analysis conclusion can be saved to the database; the next time an event similar to the first event occurs, an event string with a higher matching degree will be found.
It should be noted that a newly emerging fault event may likewise have a low matching degree with every second event string in the database, in which case the handling described above may be used.
In this embodiment, the time points at which critical events occur are extracted from the data collected in the preset time period, a critical event sequence is formed according to the order in which the events occur, and the critical event sequence is converted into the first event string. The first event string is then matched against at least one second event string stored in the fault analysis database, the matching degree between the first event string and each second event string is calculated, and fault analysis and handling are performed according to the fault analysis document corresponding to the second event string with the highest matching degree. Because the order of events is taken into account, the fault analysis result is more accurate. In addition, in this embodiment, for a new fault, the process of generating the first event string is itself the process of automatically building the fault model, and no manual intervention is required: the user only needs to save the analysis conclusion to the system after completing manual analysis. This removes the specialized labor of building a fault model for each different kind of fault, so the method of this embodiment reduces labor cost and is easier to maintain and upgrade.
Fig. 11 is a flowchart of Embodiment 1 of the fault analysis device of the present invention. As shown in Fig. 11, the device 1100 of this embodiment may include an acquisition module 1101, a data processing module 1102, and an analysis module 1103, where:
The acquisition module 1101 may be configured to collect data within a preset time period;
The data processing module 1102 may be configured to extract, from the data, the time points at which critical events occur and the operating parameters at those time points, and to generate a first event matrix, where the rows of the first event matrix represent at least one time point, the time points are arranged in chronological order, the columns of the first event matrix represent at least one operating parameter, and each element in the first event matrix is the value of the operating parameter at the time point;
The analysis module 1103 may be configured to match the first event matrix against at least one second event matrix stored in a fault analysis database, and to calculate the matching degree between the first event matrix and each second event matrix, where the fault analysis database contains at least one fault analysis record, and each fault analysis record includes a second event matrix and the fault analysis document corresponding to that second event matrix;
The analysis module 1103 may further be configured to perform fault analysis according to the fault analysis document corresponding to the second event matrix with the highest matching degree.
Further, the analysis module 1103 is specifically configured to:
Determine a column in the first event matrix and a column in the second event matrix that are matching columns of each other, where matching columns contain at least one identical critical event;
Compare the other elements of the matching columns one by one, and calculate the matching degree of the matching columns;
Calculate the matching degree between the first event matrix and the second event matrix according to the order matching degree of the sequence in which the critical events occur in the first event matrix and the matching degrees of the matching columns.
Further, the analysis module 1103 is specifically configured to:
Use a matrix similarity recognition algorithm to match the first event matrix against at least one second event matrix stored in the fault analysis database, and calculate the matching degree between the first event matrix and each second event matrix; or,
Convert the first event matrix and the second event matrix into images, match them according to an image similarity algorithm, and calculate the matching degree between the first event matrix and each second event matrix.
Further, each operating parameter in the first event matrix has a corresponding weight coefficient, and the weight coefficient of the operating parameter corresponding to the critical event is greater than those of the other operating parameters.
Further, the critical events include: alarms, abnormal logs, sudden changes in KPIs, user operations, and configuration changes.
Further, the operating parameters include: the value of a KPI, the variation trend of a KPI, the variation amplitude of a KPI, the direction of a sudden change in a KPI, the alarm name, the alarm location, the input instruction, configuration changes, and abnormal logs.
Further, the fault analysis document includes at least one of the following: fault type, fault location, and recovery policy.
Further, if the second event matrix with the highest matching degree contains time points later than those of the first event matrix, together with the events corresponding to those time points, the analysis module 1103 is further configured to:
Predict, according to the second event matrix with the highest matching degree, an event that is about to occur.
Further, the acquisition module 1101 may also be configured to receive abnormal event information, or to receive a fault analysis command sent by the user, before the time points at which critical events occur and the operating parameters at those time points are extracted from the data.
The acquisition module 1101 may correspond to the information collection module shown in Fig. 1 or Fig. 2. In a specific implementation, the abnormal event information may be sent by the monitoring module shown in Fig. 1 or Fig. 2 according to the abnormal event handling policy. It may be sent only to the acquisition module 1101; it may also be sent at the same time to the client, through the operation processing module shown in Fig. 1 or Fig. 2, to notify the user; or it may be sent only to the client, in which case the user may send a fault analysis command, which the acquisition module 1101 receives. After receiving abnormal event information, the acquisition module 1101 may trigger the corresponding collection task according to the abnormal event policy.
The device of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 3; its implementation principle is similar and is not repeated here.
In the device of this embodiment, the time points at which critical events occur and the operating parameters at those time points are extracted from the data collected in the preset time period, so that the time at which events occur is taken into account during fault analysis, and the first event matrix is formed. The first event matrix is then matched against at least one second event matrix stored in the fault analysis database, and the order of events is also taken into account when calculating the matching degree between the first event matrix and each second event matrix; therefore, the fault analysis result is more accurate. In addition, in this embodiment, for a new fault, the process of generating the first event matrix is itself the process of automatically building the fault model, and no manual intervention is required: the user only needs to save the analysis conclusion to the system after completing manual analysis. This removes the specialized labor of building a fault model for each different kind of fault, so the device of this embodiment reduces labor cost and is easier to maintain and upgrade.
Fig. 12 is a schematic structural diagram of Embodiment 2 of the fault analysis device of the present invention. As shown in Fig. 12, the device 1200 of this embodiment may include an acquisition module 1201, a data processing module 1202, and an analysis module 1203, where:
The acquisition module 1201 is configured to collect data within a preset time period;
The data processing module 1202 is configured to extract, from the data, the time points at which critical events occur;
The data processing module 1202 is further configured to generate a critical event sequence according to the order in which the critical events occur;
The data processing module 1202 is further configured to represent each critical event by its critical event identifier to form a first event string;
The analysis module 1203 is configured to use a string matching algorithm to match the first event string against at least one second event string stored in a fault analysis database, and to calculate the matching degree between the first event string and each second event string, where the fault analysis database contains at least one fault analysis record, and each fault analysis record includes a second event string and the fault analysis document corresponding to that second event string;
The analysis module 1203 is further configured to perform fault analysis according to the fault analysis document corresponding to the second event string with the highest matching degree.
Further, the critical events include: alarms, abnormal logs, sudden changes in KPIs, user operations, and configuration changes.
Further, the fault analysis document includes at least one of the following: fault type, fault location, and recovery policy.
Further, if the second event string with the highest matching degree contains events that follow the first event string, the analysis module 1203 is further configured to:
Predict, according to the second event string with the highest matching degree, an event that is about to occur.
Further, the acquisition module 1201 may also be configured to receive abnormal event information, or to receive a fault analysis command sent by the user, before the time points at which critical events occur are extracted from the data.
The acquisition module 1201 may correspond to the information collection module shown in Fig. 1 or Fig. 2. In a specific implementation, the abnormal event information may be sent by the monitoring module shown in Fig. 1 or Fig. 2 according to the abnormal event handling policy. It may be sent only to the acquisition module 1201; it may also be sent at the same time to the client, through the operation processing module shown in Fig. 1 or Fig. 2, to notify the user; or it may be sent only to the client, in which case the user may send a fault analysis command, which the acquisition module 1201 receives. After receiving abnormal event information, the acquisition module 1201 may trigger the corresponding collection task according to the abnormal event policy.
The device of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 8; its implementation principle is similar and is not repeated here.
In the device of this embodiment, the time points at which critical events occur are extracted from the data collected in the preset time period, a critical event sequence is formed according to the order in which the events occur, and the critical event sequence is converted into the first event string. The first event string is then matched against at least one second event string stored in the fault analysis database, the matching degree between the first event string and each second event string is calculated, and fault analysis and handling are performed according to the fault analysis document corresponding to the second event string with the highest matching degree. Because the order of events is taken into account, the fault analysis result is more accurate. In addition, in this embodiment, for a new fault, the process of generating the first event string is itself the process of automatically building the fault model, and no manual intervention is required: the user only needs to save the analysis conclusion to the system after completing manual analysis. This removes the specialized labor of building a fault model for each different kind of fault, so the device of this embodiment reduces labor cost and is easier to maintain and upgrade.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (28)

1. A fault analysis method, characterized by comprising:
Collecting data within a preset time period;
Extracting, from the data, the time points at which critical events occur and the operating parameters at those time points, and generating a first event matrix, wherein the rows of the first event matrix represent at least one time point, the time points are arranged in chronological order, the columns of the first event matrix represent at least one operating parameter, and each element in the first event matrix is the value of the operating parameter at the time point;
Matching the first event matrix against at least one second event matrix stored in a fault analysis database, and calculating the matching degree between the first event matrix and each second event matrix, wherein the fault analysis database contains at least one fault analysis record, and each fault analysis record includes a second event matrix and the fault analysis document corresponding to that second event matrix;
Performing fault analysis according to the fault analysis document corresponding to the second event matrix with the highest matching degree.
2. The method according to claim 1, characterized in that matching the first event matrix against at least one second event matrix stored in the fault analysis database and calculating the matching degree between the first event matrix and each second event matrix comprises:
determining rows in the first event matrix and rows in the second event matrix that are matching rows of each other, the matching rows containing at least one identical key event;
comparing the other elements of the matching rows respectively, and calculating the matching degree of the matching rows;
calculating the matching degree between the first event matrix and the second event matrix according to the order matching degree of the sequence in which the key events in the first event matrix occur and the matching degree of the matching rows.
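One possible reading of claim 2 is sketched below in Python. Several things are assumptions rather than part of the claim: each row is modelled as a tuple whose first element is the key event identifier, the order matching degree is approximated with `difflib.SequenceMatcher`, and the two scores are blended with a hypothetical `order_weight`.

```python
from difflib import SequenceMatcher


def row_degree(row_a, row_b):
    """Share of equal 'other elements' (everything after the key event)."""
    pairs = list(zip(row_a[1:], row_b[1:]))
    if not pairs:
        return 1.0
    return sum(x == y for x, y in pairs) / len(pairs)


def matrix_degree(first, second, order_weight=0.5):
    """Combine the order matching degree with the matching-row degrees."""
    if not first or not second:
        return 0.0
    events_a = [row[0] for row in first]
    events_b = [row[0] for row in second]
    # Order matching degree: similarity of the two key-event sequences.
    order = SequenceMatcher(None, events_a, events_b).ratio()
    used = set()
    scores = []
    for row_a in first:
        for j, row_b in enumerate(second):
            # Matching rows share the same key event; each row is used once.
            if j not in used and row_a[0] == row_b[0]:
                used.add(j)
                scores.append(row_degree(row_a, row_b))
                break
    rows = sum(scores) / max(len(first), len(second))
    return order_weight * order + (1 - order_weight) * rows
```

Two matrices with the same key events in a different order still pair up row by row, but lose score through the order term, which is the behaviour the claim appears to aim for.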
3. The method according to claim 1, characterized in that matching the first event matrix against at least one second event matrix stored in the fault analysis database and calculating the matching degree between the first event matrix and each second event matrix comprises:
matching the first event matrix against at least one second event matrix stored in the fault analysis database by using a matrix similarity recognition algorithm, and calculating the matching degree between the first event matrix and each second event matrix; or
converting the first event matrix and the second event matrix into images, matching them according to an image similarity algorithm, and calculating the matching degree between the first event matrix and each second event matrix.
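The claim does not name a specific matrix or image similarity algorithm. As an illustration only, the Python sketch below uses cosine similarity as a stand-in for the matrix route and a grayscale conversion plus mean absolute pixel difference as a stand-in for the image route; all function names are hypothetical, and both matrices are assumed to be non-empty and of equal shape.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally shaped numeric matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("matrices must have the same shape")
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0


def to_grayscale(matrix):
    """Scale matrix values to 0..255 so the matrix can be treated as an image."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


def image_similarity(a, b):
    """1 minus the mean absolute pixel difference of the grayscale images."""
    ga, gb = to_grayscale(a), to_grayscale(b)
    diffs = [abs(x - y) for ra, rb in zip(ga, gb) for x, y in zip(ra, rb)]
    return 1.0 - sum(diffs) / (255 * len(diffs))
```

Note that scaling each matrix independently discards absolute magnitudes, so this image route compares shapes of variation rather than raw values.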
4. The method according to any one of claims 1 to 3, characterized in that each running parameter in the first event matrix corresponds to a weight coefficient, and the weight coefficient of the running parameter corresponding to the key event is greater than those of the other running parameters.
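The effect of the weight coefficients can be shown with a small Python sketch. The function `weighted_row_degree` and the example weights are assumptions made for illustration; the claim only requires that the key-event parameter carry a larger weight than the others.

```python
def weighted_row_degree(row_a, row_b, weights):
    """Weighted share of equal elements between two rows.

    `weights[i]` is the weight coefficient of the i-th running parameter.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(w for x, y, w in zip(row_a, row_b, weights) if x == y)
    return matched / total
```

With weights `[3, 1, 1]` and the key event in the first column, a mismatch on the key event costs three times as much as a mismatch on any other parameter.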
5. The method according to any one of claims 1 to 4, characterized in that the key events comprise: an alarm, an abnormal log, a sudden change in a key system indicator, a user operation, and a configuration change.
6. The method according to any one of claims 1 to 5, characterized in that the running parameters comprise: the value of a key system indicator, the change trend of a key system indicator, the change amplitude of a key system indicator, the direction of a sudden change in a key system indicator, an alarm name, an alarm location, an input instruction, a configuration change, and an abnormal log.
7. The method according to any one of claims 1 to 6, characterized in that
the fault analysis file contains at least one of the following: a fault type, a fault location, and a recovery policy.
8. The method according to any one of claims 1 to 7, characterized in that
if the second event matrix with the highest matching degree contains time points later than those of the first event matrix, together with the corresponding events, the method further comprises:
predicting, according to the second event matrix with the highest matching degree, the events that are about to occur.
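A hedged Python sketch of this prediction step follows. It assumes the key events of each matrix have already been reduced to an ordered list of identifiers, and it uses a simple greedy in-order alignment, which is an illustrative choice and not something the claim specifies; `predict_upcoming` is a hypothetical name.

```python
def predict_upcoming(first_events, second_events):
    """Return the events in the historical sequence that follow the observed ones.

    Observed events are located in order inside the historical sequence;
    whatever remains after the last located event is treated as the prediction.
    """
    pos = -1
    for event in first_events:
        try:
            pos = second_events.index(event, pos + 1)
        except ValueError:
            continue  # event not found after the current position; skip it
    return second_events[pos + 1:] if pos >= 0 else []
```

For example, if the observed sequence is alarm then configuration change, and the best-matching historical record continues with a service outage, the outage is returned as the event likely to come next.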
9. The method according to any one of claims 1 to 8, characterized in that, before extracting from the data the time points at which key events occur and the running parameters at those time points, the method further comprises:
receiving abnormal event information, or receiving a fault analysis command sent by a user.
10. A fault analysis method, characterized by comprising:
collecting data within a preset time period;
extracting, from the data, the time points at which key events occur;
generating a key event sequence according to the order in which the key events occur;
representing each key event by a key event identifier to form a first event character string;
matching, by using a string matching algorithm, the first event character string against at least one second event character string stored in a fault analysis database, and calculating the matching degree between the first event character string and each second event character string, wherein the fault analysis database comprises at least one fault analysis record, and each fault analysis record comprises a second event character string and the fault analysis file corresponding to that second event character string;
performing fault analysis according to the fault analysis file corresponding to the second event character string with the highest matching degree.
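The string-based variant of claim 10 can be sketched in Python as follows. The single-letter identifiers in `EVENT_IDS`, the function names, and the use of Levenshtein edit distance as the string matching algorithm are all assumptions made for illustration; the claim leaves the identifier scheme and the algorithm open.

```python
# Hypothetical one-letter identifiers for the key event types.
EVENT_IDS = {
    "alarm": "A",
    "abnormal_log": "L",
    "kpi_mutation": "K",
    "user_operation": "U",
    "config_change": "C",
}


def to_event_string(events):
    """Turn (timestamp, event_type) pairs into a chronological identifier string."""
    ordered = sorted(events, key=lambda event: event[0])
    return "".join(EVENT_IDS[kind] for _, kind in ordered)


def edit_distance(s, t):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def string_degree(s, t):
    """Matching degree in [0, 1]: 1 minus the normalised edit distance."""
    longest = max(len(s), len(t))
    return 1.0 - edit_distance(s, t) / longest if longest else 1.0


def analyse_by_string(first, records):
    """records: list of (second_event_string, fault_analysis_file) pairs."""
    return max(records, key=lambda record: string_degree(first, record[0]))[1]
```

Edit distance tolerates a missing or extra event in the observed sequence, which seems useful when the collection window cuts off part of a fault pattern; other string matching algorithms would fit the claim equally well.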
11. The method according to claim 10, characterized in that the key events comprise: an alarm, an abnormal log, a sudden change in a key system indicator, a user operation, and a configuration change.
12. The method according to claim 10 or 11, characterized in that
the fault analysis file contains at least one of the following: a fault type, a fault location, and a recovery policy.
13. The method according to any one of claims 10 to 12, characterized in that
if the second event character string with the highest matching degree contains events that follow those of the first event character string, the method further comprises:
predicting, according to the second event character string with the highest matching degree, the events that are about to occur.
14. The method according to any one of claims 10 to 13, characterized in that, before extracting from the data the time points at which key events occur, the method further comprises:
receiving abnormal event information, or receiving a fault analysis command sent by a user.
15. A fault analysis device, characterized by comprising:
a collection module, configured to collect data within a preset time period;
a data processing module, configured to extract, from the data, the time points at which key events occur and the running parameters at those time points, and to generate a first event matrix, wherein the rows of the first event matrix represent at least one time point, the time points being arranged in chronological order, the columns of the first event matrix represent at least one running parameter, and each element of the first event matrix is the value of the corresponding running parameter at the corresponding time point;
an analysis module, configured to match the first event matrix against at least one second event matrix stored in a fault analysis database, and to calculate the matching degree between the first event matrix and each second event matrix, wherein the fault analysis database comprises at least one fault analysis record, and each fault analysis record comprises a second event matrix and the fault analysis file corresponding to that second event matrix;
the analysis module being further configured to perform fault analysis according to the fault analysis file corresponding to the second event matrix with the highest matching degree.
16. The device according to claim 15, characterized in that the analysis module is specifically configured to:
determine rows in the first event matrix and rows in the second event matrix that are matching rows of each other, the matching rows containing at least one identical key event;
compare the other elements of the matching rows respectively, and calculate the matching degree of the matching rows;
calculate the matching degree between the first event matrix and the second event matrix according to the order matching degree of the sequence in which the key events in the first event matrix occur and the matching degree of the matching rows.
17. The device according to claim 15, characterized in that the analysis module is specifically configured to:
match the first event matrix against at least one second event matrix stored in the fault analysis database by using a matrix similarity recognition algorithm, and calculate the matching degree between the first event matrix and each second event matrix; or
convert the first event matrix and the second event matrix into images, match them according to an image similarity algorithm, and calculate the matching degree between the first event matrix and each second event matrix.
18. The device according to any one of claims 15 to 17, characterized in that each running parameter in the first event matrix corresponds to a weight coefficient, and the weight coefficient of the running parameter corresponding to the key event is greater than those of the other running parameters.
19. The device according to any one of claims 15 to 18, characterized in that the key events comprise: an alarm, an abnormal log, a sudden change in a key system indicator, a user operation, and a configuration change.
20. The device according to any one of claims 15 to 19, characterized in that the running parameters comprise: the value of a key system indicator, the change trend of a key system indicator, the change amplitude of a key system indicator, the direction of a sudden change in a key system indicator, an alarm name, an alarm location, an input instruction, a configuration change, and an abnormal log.
21. The device according to any one of claims 15 to 20, characterized in that
the fault analysis file contains at least one of the following: a fault type, a fault location, and a recovery policy.
22. The device according to any one of claims 15 to 21, characterized in that
if the second event matrix with the highest matching degree contains time points later than those of the first event matrix, together with the corresponding events, the analysis module is further configured to:
predict, according to the second event matrix with the highest matching degree, the events that are about to occur.
23. The device according to any one of claims 15 to 22, characterized in that the collection module is further configured to:
before the time points at which key events occur and the running parameters at those time points are extracted from the data, receive abnormal event information or receive a fault analysis command sent by a user.
24. A fault analysis device, characterized by comprising:
a collection module, configured to collect data within a preset time period;
a data processing module, configured to extract, from the data, the time points at which key events occur;
the data processing module being further configured to generate a key event sequence according to the order in which the key events occur;
the data processing module being further configured to represent each key event by a key event identifier to form a first event character string;
an analysis module, configured to match, by using a string matching algorithm, the first event character string against at least one second event character string stored in a fault analysis database, and to calculate the matching degree between the first event character string and each second event character string, wherein the fault analysis database comprises at least one fault analysis record, and each fault analysis record comprises a second event character string and the fault analysis file corresponding to that second event character string;
the analysis module being further configured to perform fault analysis according to the fault analysis file corresponding to the second event character string with the highest matching degree.
25. The device according to claim 24, characterized in that the key events comprise: an alarm, an abnormal log, a sudden change in a key system indicator, a user operation, and a configuration change.
26. The device according to claim 24 or 25, characterized in that
the fault analysis file contains at least one of the following: a fault type, a fault location, and a recovery policy.
27. The device according to any one of claims 24 to 26, characterized in that
if the second event character string with the highest matching degree contains events that follow those of the first event character string, the analysis module is further configured to:
predict, according to the second event character string with the highest matching degree, the events that are about to occur.
28. The device according to any one of claims 24 to 27, characterized in that the collection module is further configured to:
before the time points at which key events occur are extracted from the data, receive abnormal event information or receive a fault analysis command sent by a user.
CN201410029699.4A 2014-01-22 2014-01-22 Fault analysis method and device Pending CN104794136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410029699.4A CN104794136A (en) 2014-01-22 2014-01-22 Fault analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410029699.4A CN104794136A (en) 2014-01-22 2014-01-22 Fault analysis method and device

Publications (1)

Publication Number Publication Date
CN104794136A true CN104794136A (en) 2015-07-22

Family

ID=53558930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410029699.4A Pending CN104794136A (en) 2014-01-22 2014-01-22 Fault analysis method and device

Country Status (1)

Country Link
CN (1) CN104794136A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105471659A (en) * 2015-12-25 2016-04-06 华为技术有限公司 Root fault cause analysis method and analysis device
CN105577440A (en) * 2015-12-24 2016-05-11 华为技术有限公司 Network fault time location method and analyzing device
CN105812177A (en) * 2016-03-08 2016-07-27 华为技术有限公司 Network fault processing method and processing apparatus
CN106550271A (en) * 2016-12-08 2017-03-29 深圳Tcl数字技术有限公司 Television set running log recording method and device
CN106603264A (en) * 2015-10-20 2017-04-26 阿里巴巴集团控股有限公司 Method and equipment for positioning fault root
CN107203450A (en) * 2016-03-16 2017-09-26 伊姆西公司 The sorting technique and equipment of failure
WO2018059402A1 (en) * 2016-09-30 2018-04-05 华为技术有限公司 Method and apparatus for determining fault type
CN108564344A (en) * 2018-04-17 2018-09-21 许继电气股份有限公司 A kind of DC converter station fault data automatic acquiring method and device
CN108920296A (en) * 2018-07-10 2018-11-30 郑州云海信息技术有限公司 A kind of Fault Locating Method, device and computer readable storage medium
CN109039710A (en) * 2018-07-10 2018-12-18 中国联合网络通信集团有限公司 Route data auditing method, device, server and storage medium
CN109342033A (en) * 2018-09-11 2019-02-15 珠海格力电器股份有限公司 State analysis method and system of magnetic suspension centrifuge
CN109669844A (en) * 2018-11-27 2019-04-23 平安科技(深圳)有限公司 Equipment obstacle management method, apparatus, equipment and storage medium
CN109756382A (en) * 2019-02-28 2019-05-14 新华三技术有限公司 fault locating method and device
CN110514960A (en) * 2019-08-23 2019-11-29 索尔实业(集团)有限公司 A kind of cable fault locating platform
CN110930115A (en) * 2019-11-20 2020-03-27 拉扎斯网络科技(上海)有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN111404770A (en) * 2020-02-29 2020-07-10 华为技术有限公司 Network device, data processing method, device, system and readable storage medium
CN112731903A (en) * 2020-11-27 2021-04-30 成都飞机工业(集团)有限责任公司 Diagnosis system and method for all-electric transmission flight control fault
CN112782638A (en) * 2020-12-22 2021-05-11 国网浙江海宁市供电有限公司 Fault diagnosis and maintenance method for electric energy meter in low-voltage metering box
CN112782639A (en) * 2020-12-22 2021-05-11 国网浙江海宁市供电有限公司 Intelligent fault indication method and indicator for low-voltage metering box
WO2021147832A1 (en) * 2020-01-23 2021-07-29 阿里巴巴集团控股有限公司 Data processing method and apparatus, database system, electronic device, and storage medium
CN113825162A (en) * 2020-06-19 2021-12-21 ***通信集团设计院有限公司 Method and device for positioning fault reason of telecommunication network
CN114641740A (en) * 2019-11-05 2022-06-17 Abb瑞士股份有限公司 Method and device for monitoring an electric drive in an industrial system

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106603264A (en) * 2015-10-20 2017-04-26 阿里巴巴集团控股有限公司 Method and equipment for positioning fault root
CN105577440A (en) * 2015-12-24 2016-05-11 华为技术有限公司 Network fault time location method and analyzing device
CN105577440B (en) * 2015-12-24 2019-06-11 华为技术有限公司 A kind of network downtime localization method and analytical equipment
CN105471659B (en) * 2015-12-25 2019-03-01 华为技术有限公司 A kind of failure root cause analysis method and analytical equipment
CN105471659A (en) * 2015-12-25 2016-04-06 华为技术有限公司 Root fault cause analysis method and analysis device
CN105812177A (en) * 2016-03-08 2016-07-27 华为技术有限公司 Network fault processing method and processing apparatus
CN105812177B (en) * 2016-03-08 2019-10-18 华为技术有限公司 A kind of network failure processing method and processing equipment
CN107203450A (en) * 2016-03-16 2017-09-26 伊姆西公司 The sorting technique and equipment of failure
WO2018059402A1 (en) * 2016-09-30 2018-04-05 华为技术有限公司 Method and apparatus for determining fault type
US11140021B2 (en) 2016-09-30 2021-10-05 Huawei Technologies Co., Ltd. Method and apparatus for determining fault type
CN106550271A (en) * 2016-12-08 2017-03-29 深圳Tcl数字技术有限公司 Television set running log recording method and device
CN108564344A (en) * 2018-04-17 2018-09-21 许继电气股份有限公司 A kind of DC converter station fault data automatic acquiring method and device
CN109039710B (en) * 2018-07-10 2021-06-01 中国联合网络通信集团有限公司 Routing data auditing method, device, server and storage medium
CN109039710A (en) * 2018-07-10 2018-12-18 中国联合网络通信集团有限公司 Route data auditing method, device, server and storage medium
CN108920296A (en) * 2018-07-10 2018-11-30 郑州云海信息技术有限公司 A kind of Fault Locating Method, device and computer readable storage medium
CN109342033A (en) * 2018-09-11 2019-02-15 珠海格力电器股份有限公司 State analysis method and system of magnetic suspension centrifuge
CN109669844A (en) * 2018-11-27 2019-04-23 平安科技(深圳)有限公司 Equipment obstacle management method, apparatus, equipment and storage medium
CN109756382A (en) * 2019-02-28 2019-05-14 新华三技术有限公司 fault locating method and device
CN110514960A (en) * 2019-08-23 2019-11-29 索尔实业(集团)有限公司 A kind of cable fault locating platform
CN110514960B (en) * 2019-08-23 2021-06-11 索尔实业(集团)有限公司 Cable fault positioning platform
CN114641740A (en) * 2019-11-05 2022-06-17 Abb瑞士股份有限公司 Method and device for monitoring an electric drive in an industrial system
CN110930115A (en) * 2019-11-20 2020-03-27 拉扎斯网络科技(上海)有限公司 Data processing method and device, electronic equipment and computer readable storage medium
WO2021147832A1 (en) * 2020-01-23 2021-07-29 阿里巴巴集团控股有限公司 Data processing method and apparatus, database system, electronic device, and storage medium
US12001275B2 (en) 2020-01-23 2024-06-04 Alibaba Group Holding Limited Data processing method, apparatus, database system, electronic device, and storage medium
CN111404770A (en) * 2020-02-29 2020-07-10 华为技术有限公司 Network device, data processing method, device, system and readable storage medium
CN113825162A (en) * 2020-06-19 2021-12-21 ***通信集团设计院有限公司 Method and device for positioning fault reason of telecommunication network
CN113825162B (en) * 2020-06-19 2024-05-28 ***通信集团设计院有限公司 Method and device for positioning fault reasons of telecommunication network
CN112731903A (en) * 2020-11-27 2021-04-30 成都飞机工业(集团)有限责任公司 Diagnosis system and method for all-electric transmission flight control fault
CN112782639A (en) * 2020-12-22 2021-05-11 国网浙江海宁市供电有限公司 Intelligent fault indication method and indicator for low-voltage metering box
CN112782638A (en) * 2020-12-22 2021-05-11 国网浙江海宁市供电有限公司 Fault diagnosis and maintenance method for electric energy meter in low-voltage metering box
CN112782638B (en) * 2020-12-22 2022-09-13 国网浙江海宁市供电有限公司 Fault diagnosis and maintenance method for electric energy meter in low-voltage metering box

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150722