CN116955059A - Root cause positioning method, root cause positioning device, computing equipment and computer storage medium - Google Patents

Info

Publication number
CN116955059A
Authority
CN
China
Prior art keywords
abnormal event
abnormal
root cause
class
events
Legal status
Pending
Application number
CN202211531044.8A
Other languages
Chinese (zh)
Inventor
翁乐怡
陈青青
陈健飞
张皓恒
王恬
乔柏林
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Zhejiang Co Ltd
Priority to CN202211531044.8A
Publication of CN116955059A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The embodiments of the application disclose a root cause positioning method, a root cause positioning device, a computing device, and a computer storage medium. The method comprises: processing data of multiple dimensions according to sampling time periods to determine multiple abnormal events within the same sampling time period; grouping adjacent abnormal events among the multiple abnormal events into the same class to obtain multiple abnormal event classes; for each abnormal event class, determining the importance of each abnormal event in the class, and analyzing the causal rules between the abnormal events and the fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault; and performing root cause positioning according to the rule set. By processing data of multiple dimensions, this technical solution determines abnormal events quickly and conveniently, effectively reduces the resource overhead of data processing and analysis, and adaptively learns the causal rules between faults and abnormal events, thereby achieving fast root cause positioning so that service losses can be recovered in time.

Description

Root cause positioning method, root cause positioning device, computing equipment and computer storage medium
Technical Field
The embodiments of the application relate to the field of Internet technology, and in particular to a root cause positioning method, a root cause positioning device, a computing device, and a computer storage medium.
Background
With the in-depth development of cloud computing and the profound transformation of IT architecture, enterprise digital transformation has entered its most challenging stage. As business systems migrate to the cloud, the number of servers in an enterprise grows explosively, and operation and maintenance (O&M) alarm, event, and index (metric) data grow exponentially as well. For such large and complex systems, processing and analyzing massive data to localize faults quickly and accurately is becoming increasingly important.
In the prior art, root cause positioning techniques mainly include methods based on a configuration management database (Configuration Management Database, CMDB) and methods based on AI algorithms such as Monte Carlo tree search and cluster analysis. When a fault occurs, CMDB-based root cause positioning analyzes the root cause by combining the relationships between indexes, alarms, and other information and the fault manifestation, together with call relationships and the like; it therefore depends on complete relationship data. Root cause positioning methods based on AI algorithms such as Monte Carlo tree search and clustering behave logically like a black box: they have low interpretability and make it difficult to introduce domain experience. They must be trained on large amounts of historical data and require extensive feature extraction, so the training stage incurs a large resource overhead, and the real-time inference process is complex and has high latency. Moreover, similar faults generally do not recur, so training on historical data suffers from data imbalance and overfitting, which makes these methods difficult to apply in production.
Disclosure of Invention
The present application has been made in view of the above problems, and aims to provide a root cause positioning method, device, computing device, and computer storage medium that overcome, or at least partially solve, the above problems.
According to an aspect of an embodiment of the present application, there is provided a root cause positioning method, including:
processing data of multiple dimensions according to sampling time periods, and determining multiple abnormal events within the same sampling time period;
grouping adjacent abnormal events among the multiple abnormal events into the same class to obtain multiple abnormal event classes;
for each abnormal event class, determining the importance of each abnormal event in the class, and analyzing causal rules between the abnormal events and the fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault;
and performing root cause positioning according to the rule set.
According to another aspect of an embodiment of the present application, there is provided a root cause positioning device including:
a processing module, configured to process data of multiple dimensions according to sampling time periods and determine multiple abnormal events within the same sampling time period;
a dividing module, configured to group adjacent abnormal events among the multiple abnormal events into the same class to obtain multiple abnormal event classes;
an analysis module, configured to determine, for each abnormal event class, the importance of each abnormal event in the class, and to analyze causal rules between the abnormal events and the fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault;
and a root cause positioning module, configured to perform root cause positioning according to the rule set.
According to yet another aspect of an embodiment of the present application, there is provided a computing device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, which causes the processor to perform operations corresponding to the root cause positioning method described above.
According to still another aspect of the embodiments of the present application, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the root cause positioning method described above.
With the root cause positioning method, device, computing device, and computer storage medium provided by the embodiments of the application, abnormal events can be determined conveniently by processing data of multiple dimensions, such as alarm data, index anomaly detection results, and operation logs. Binarized coding effectively reduces the resource overhead of data processing and analysis and increases computation speed, which gives the scheme a strong computational advantage when processing massive O&M data. Importance pruning based on occurrence frequency is applied to the abnormal events in each abnormal event class, and a causal association interestingness measure algorithm adaptively learns the causal rules between faults and abnormal events, so that the root cause of the current fault can be determined quickly. Based on binarized coding and causal rules, the scheme achieves fast root cause positioning: when a fault occurs in the system, its root cause can be analyzed and located quickly, so that service losses can be recovered in time. Because causal relationships are fully exploited in the root cause inference stage, the scheme differs from the "black box" approach of machine learning algorithms based on statistical analysis: it can flexibly incorporate expert experience, prior causal rules, and the like, is simple to implement and highly interpretable, does not require complete relationship data for the whole system architecture, and generalizes well in the inference stage. It also reduces the overfitting caused by sparse samples, making it a preferred direction for the evolution of holistic root cause diagnosis.
The foregoing is only an overview of the technical solutions of the embodiments of the present application. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, specific implementations of the embodiments are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a flow diagram of a root cause positioning method according to one embodiment of the application;
FIG. 2a shows a schematic diagram of the construction of an abnormal event coding chain for the index anomaly detection result dimension;
FIG. 2b shows a schematic diagram of the construction of a fault time coding chain;
FIG. 2c shows a schematic diagram of dividing abnormal event classes based on the fault time coding chain;
FIG. 2d illustrates a flow architecture diagram of a root cause positioning method according to one embodiment of the application;
FIG. 3 shows a block diagram of a root cause positioning device according to one embodiment of the application;
FIG. 4 illustrates a schematic diagram of a computing device, according to one embodiment of the application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a flow diagram of a root cause positioning method according to one embodiment of the application. As shown in FIG. 1, the method comprises the following steps:
Step S101: process data of multiple dimensions according to the sampling time period, and determine multiple abnormal events within the same sampling time period.
In this embodiment, abnormal events are obtained by processing data of multiple dimensions, which may include alarm data, index anomaly detection results, and operation logs. In step S101, the data of each dimension may be processed according to the sampling time period to obtain the abnormal event analysis result of that dimension, and the abnormal event analysis results of the multiple dimensions for the same sampling time period are then aggregated to obtain the multiple abnormal events in that period. For example, for a given sampling time period, the abnormal event analysis results of the multiple dimensions are binarized and encoded to obtain an abnormal event coding chain for each dimension, and these coding chains are aggregated to obtain the multiple abnormal events in the sampling time period.
Specifically, binary coding is applied to the data of multiple dimensions, such as alarm data, index anomaly detection results, and operation logs, acquired in each sampling time period: each moment at which an abnormal event occurs is recorded as 1 and every other moment as 0, which yields an abnormal event coding chain for each dimension. For example, for the data of the i-th dimension, the abnormal event coding chain of that dimension for a single sampling time period is denoted $X_i$:
$$X_i = (x_0, x_1, \ldots, x_t), \qquad x_t = \begin{cases} 1, & \text{if an abnormal event occurs at the } t\text{-th moment} \\ 0, & \text{otherwise} \end{cases}$$
where $x_t$ represents the abnormal event analysis result at the t-th moment of the sampling time period.
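The binarization rule above can be sketched in Python for timestamped events such as alarms or operation-log errors. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `encode_event_chain`, the fixed slot width `step`, and the rule that a slot is coded 1 when at least one event falls inside it are hypothetical choices made for illustration.

```python
from typing import Iterable, List


def encode_event_chain(event_times: Iterable[float], period_start: float,
                       step: float, length: int) -> List[int]:
    """Binarize timestamped events for one sampling period (hypothetical helper).

    Slot t covers [period_start + t*step, period_start + (t+1)*step).
    A slot is coded 1 if at least one event falls inside it, otherwise 0.
    """
    chain = [0] * length
    for ts in event_times:
        t = int((ts - period_start) // step)
        if 0 <= t < length:  # events outside this sampling period are ignored
            chain[t] = 1
    return chain


# Alarms at 3.0 s, 4.5 s and 9.9 s in a 10-slot period with 1 s slots.
print(encode_event_chain([3.0, 4.5, 9.9], period_start=0.0, step=1.0, length=10))
# [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
```

Several events in the same slot still produce a single 1, which matches the binary "occurred / did not occur" coding described in the text.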
FIG. 2a shows the construction of an abnormal event coding chain for the index anomaly detection result dimension. In the coordinate system, the horizontal axis represents time (t) and the vertical axis represents the value of index x; the dotted line represents the preset threshold for index x. If the value of index x exceeds the preset threshold, no abnormal event exists at that moment and the corresponding code is 0; if the value does not exceed the preset threshold, an abnormal event occurs at that moment and the corresponding code is 1.
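For the index anomaly detection dimension, a comparable sketch compares each sample with the preset threshold. The name `encode_metric_chain` and the `anomaly_below` flag are hypothetical; the default follows the FIG. 2a description (a value that does not exceed the threshold is coded 1), and the flag is an added assumption for indexes where exceeding the threshold is the abnormal case.

```python
from typing import List, Sequence


def encode_metric_chain(values: Sequence[float], threshold: float,
                        anomaly_below: bool = True) -> List[int]:
    """Binarize one index's samples against its preset threshold (hypothetical helper).

    anomaly_below=True follows the FIG. 2a description: a value that does not
    exceed the threshold is coded 1 (abnormal) and a value above it is coded 0.
    anomaly_below=False is an assumed variant where exceeding the threshold is abnormal.
    """
    if anomaly_below:
        return [1 if v <= threshold else 0 for v in values]
    return [1 if v > threshold else 0 for v in values]


print(encode_metric_chain([5.0, 2.0, 8.0, 3.0], threshold=3.0))  # [0, 1, 0, 1]
```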
In this embodiment, the aggregated abnormal event coding chain of the multiple dimensions may be referred to as the fault time coding chain, which is the union of the abnormal event coding chains of the multiple dimensions. The fault time coding chain is denoted $G$:
$$G = \bigcup_{i=1}^{n} X_i$$
where $X_i$ is the abnormal event coding chain of the i-th dimension for a single sampling time period, $1 \le i \le n$, and $n$ is the total number of dimensions.
Assuming the total number of dimensions is 3, FIG. 2b shows the construction of a fault time coding chain: the fault time coding chain of the sampling time period is obtained by aggregating the abnormal event coding chains of the 3 dimensions, and it reflects the multiple abnormal events in that period. A 1 in the fault time coding chain indicates that an abnormal event occurs at the corresponding moment, and a 0 indicates that none occurs.
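Because every $X_i$ is a binary chain over the same moments, the union $G$ can be sketched as a position-wise logical OR: a moment is 1 if any dimension records an abnormal event there. The function name `fault_time_chain` and the equal-length check are assumptions for illustration.

```python
from typing import List, Sequence


def fault_time_chain(chains: Sequence[Sequence[int]]) -> List[int]:
    """G = X_1 ∪ ... ∪ X_n, computed as a position-wise OR of binary chains."""
    if not chains:
        raise ValueError("at least one dimension is required")
    length = len(chains[0])
    if any(len(c) != length for c in chains):
        raise ValueError("all chains must cover the same moments of the sampling period")
    return [1 if any(c[t] for c in chains) else 0 for t in range(length)]


alarms = [1, 0, 0, 0, 1, 0]
indexes = [0, 1, 0, 0, 0, 0]
logs = [0, 0, 0, 0, 1, 1]
print(fault_time_chain([alarms, indexes, logs]))  # [1, 1, 0, 0, 1, 1]
```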
Step S102: group adjacent abnormal events among the multiple abnormal events into the same class to obtain multiple abnormal event classes.
Adjacent abnormal events among the multiple abnormal events are grouped into the same class to obtain multiple abnormal event classes, and each abnormal event class can then be regarded as one fault. Specifically, the fault time coding chain can be sliced in time, and adjacent 1s in the chain are grouped into the same class to form multiple abnormal event classes, which is equivalent to dividing out multiple faults $g_k$, $k = 1, 2, \ldots, K$. The sequence formed by the abnormal events in each fault $g_k$ is $(X_1^k, X_2^k, \ldots, X_m^k)$, $k = 1, 2, \ldots, K$, where $X_i^k$ denotes the i-th abnormal event in this sequence, $1 \le i \le m$, and $m$ is the total number of abnormal events in the sequence.
FIG. 2c shows the division of abnormal event classes based on the fault time coding chain: the chain is sliced in time and adjacent 1s are grouped into the same class, forming 3 abnormal event classes, referred to as fault $g_1$, fault $g_2$, and fault $g_3$.
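The time slicing of step S102 can be sketched as a scan for runs of adjacent 1s in $G$. The helpers `split_faults` and `events_in_fault` are hypothetical; in particular, representing the abnormal events of a fault $g_k$ as the event types (keyed here by name) that are active within its run, together with how many moments each is active, is an assumption made so the counts can feed an importance calculation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_faults(g: Sequence[int]) -> List[Tuple[int, int]]:
    """Slice G into runs of adjacent 1s; each (start, end) run, end inclusive, is one fault g_k."""
    faults: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, bit in enumerate(g):
        if bit and start is None:
            start = t                      # a new run of 1s begins
        elif not bit and start is not None:
            faults.append((start, t - 1))  # the current run ended at t - 1
            start = None
    if start is not None:                  # a run reaching the end of the period
        faults.append((start, len(g) - 1))
    return faults


def events_in_fault(chains: Dict[str, Sequence[int]],
                    span: Tuple[int, int]) -> Dict[str, int]:
    """Count, for each event type active in the span, how many moments it is 1."""
    s, e = span
    counts = {name: sum(c[s:e + 1]) for name, c in chains.items()}
    return {name: n for name, n in counts.items() if n > 0}


chains = {"alarm_A": [1, 1, 0, 0, 1], "log_err": [0, 1, 0, 0, 0]}
g = [1, 1, 0, 0, 1]
for span in split_faults(g):
    print(span, events_in_fault(chains, span))
# (0, 1) {'alarm_A': 2, 'log_err': 1}
# (4, 4) {'alarm_A': 1}
```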
Step S103: for each abnormal event class, determine the importance of each abnormal event in the class, and analyze causal rules between the abnormal events and the fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault.
For each abnormal event class, the importance of each abnormal event in the class is calculated using the TF-IDF algorithm. For example, for each abnormal event in the class, its importance is calculated with the TF-IDF algorithm from the normalized value of the abnormal event within the class and the proportion of abnormal event classes that contain the abnormal event.
Different abnormal events contribute differently to a particular fault: the importance of an abnormal event to a fault increases in proportion to the number of times it occurs in that fault and decreases as its frequency of occurrence in other faults increases. In this embodiment, the abnormal events in each abnormal event class are simplified with reference to the TF-IDF algorithm from natural language processing. The term frequency is mapped to the abnormal event frequency, i.e., the normalized value of the abnormal event within the abnormal event class. The inverse document frequency is mapped to the base-10 logarithm of the quotient obtained by dividing the total number of abnormal event classes by the number of abnormal event classes that contain the abnormal event. As in standard TF-IDF, the product of the two measures the importance of the abnormal event.
After simplification, the sequence of abnormal events in each fault $g_k$ becomes a sequence of importance values, $k = 1, 2, \ldots, K$, whose i-th element is the importance of the i-th abnormal event $X_i^k$ in the sequence, $1 \le i \le m$, where $m$ is the total number of abnormal events in the sequence.
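The TF-IDF mapping above can be sketched as follows, assuming each fault is given as a mapping from abnormal event name to its occurrence count within that fault. The function name `event_importance` and this input shape are hypothetical; TF is the count normalized by the total count in the fault, and IDF is the base-10 logarithm described in the text.

```python
import math
from typing import Dict, List


def event_importance(fault_events: List[Dict[str, int]]) -> List[Dict[str, float]]:
    """TF-IDF-style importance of each abnormal event in each fault (hypothetical helper).

    fault_events[k] maps an abnormal event name to its occurrence count in fault g_k.
    TF  = count / total occurrences of all events in g_k   (normalized value in the class)
    IDF = log10(K / number of faults containing the event)  (base-10 logarithm)
    """
    k_total = len(fault_events)
    doc_freq: Dict[str, int] = {}
    for events in fault_events:
        for name in events:
            doc_freq[name] = doc_freq.get(name, 0) + 1
    importance = []
    for events in fault_events:
        total = sum(events.values()) or 1  # guard against an empty class
        importance.append({
            name: (count / total) * math.log10(k_total / doc_freq[name])
            for name, count in events.items()
        })
    return importance


faults = [{"alarm_A": 2, "log_err": 1}, {"alarm_A": 1}, {"disk_full": 3}]
for k, weights in enumerate(event_importance(faults), start=1):
    print(f"g_{k}", {name: round(w, 3) for name, w in weights.items()})
# g_1 {'alarm_A': 0.117, 'log_err': 0.159}
# g_2 {'alarm_A': 0.176}
# g_3 {'disk_full': 0.477}
```

An event that appears in every fault gets an IDF of 0 and therefore zero importance, which is one natural basis for frequency-based pruning; the source does not specify a pruning threshold.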
After the importance of each abnormal event is determined, the causal rules between the abnormal events and the fault are analyzed according to these importance values to obtain the rule set corresponding to the fault. Specifically, a causal association interestingness measure (Causal Association Interesting Measure, CAIM) algorithm may be used for this analysis.
The causes that lead to a fault can be expressed as a causal rule $X_1^k, X_2^k, \ldots, X_i^k \rightarrow g_k$, and such causal relationships are mined using the CAIM algorithm. When only a single abnormal event $X_i^k$ acting on fault $g_k$ is considered,
$$\mathrm{CAIM}(X_i^k \rightarrow g_k) = N(X_i, g)$$
where $N(X_i, g)$ is the normalized mutual information between $X_i$ and $g$. When abnormal events $X_i^k$ and $X_j^k$ acting on fault $g_k$ simultaneously are considered,
$$\mathrm{CAIM}(X_i^k, X_j^k \rightarrow g_k) = N(X_i, g) + N(X_j, g) - N(X_i, X_j) + N(X_i, X_j \mid g)$$
where $N(X_i, X_j)$ is the normalized mutual information between $X_i$ and $X_j$, and $N(X_i, X_j \mid g)$ is their normalized mutual information conditioned on $g$. For causal rules with more than three joint factors, the CAIM is computed with a correspondingly generalized expression.
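The one- and two-factor CAIM expressions can be sketched with entropy-based mutual information over aligned binary observations (for example, the moments of a sampling period). The source does not state which normalization $N(\cdot)$ uses, so this sketch assumes $N(X, Y) = I(X;Y)/\sqrt{H(X)H(Y)}$ and, analogously, $N(X, Y \mid g) = I(X;Y \mid g)/\sqrt{H(X \mid g)H(Y \mid g)}$; all function names are hypothetical, and rules with more factors are not covered.

```python
import math
from collections import Counter
from typing import Sequence

Chain = Sequence[int]


def _entropy(*cols: Chain) -> float:
    """Joint Shannon entropy (bits) of aligned discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def nmi(x: Chain, y: Chain) -> float:
    """Assumed N(X, Y) = I(X; Y) / sqrt(H(X) H(Y)); 0 when either side is constant."""
    hx, hy = _entropy(x), _entropy(y)
    if hx < 1e-12 or hy < 1e-12:
        return 0.0
    return (hx + hy - _entropy(x, y)) / math.sqrt(hx * hy)


def cond_nmi(x: Chain, y: Chain, g: Chain) -> float:
    """Assumed N(X, Y | g) = I(X; Y | g) / sqrt(H(X | g) H(Y | g))."""
    hg, hxg, hyg = _entropy(g), _entropy(x, g), _entropy(y, g)
    hx_g, hy_g = hxg - hg, hyg - hg
    if hx_g < 1e-12 or hy_g < 1e-12:
        return 0.0
    cmi = hxg + hyg - _entropy(x, y, g) - hg
    return cmi / math.sqrt(hx_g * hy_g)


def caim_single(x: Chain, g: Chain) -> float:
    """CAIM(X_i -> g) = N(X_i, g)."""
    return nmi(x, g)


def caim_pair(xi: Chain, xj: Chain, g: Chain) -> float:
    """CAIM(X_i, X_j -> g) = N(X_i, g) + N(X_j, g) - N(X_i, X_j) + N(X_i, X_j | g)."""
    return nmi(xi, g) + nmi(xj, g) - nmi(xi, xj) + cond_nmi(xi, xj, g)


# Neither event alone tracks the fault, but together (XOR) they determine it.
xi = [0, 1, 0, 1]
xj = [0, 0, 1, 1]
g = [0, 1, 1, 0]
print(caim_single(xi, g), caim_pair(xi, xj, g))  # 0.0 1.0
```

The XOR example shows why the conditional term matters: the pair rule scores highly even though each single-event rule scores zero.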
Causal rule mining based on CAIM yields the rule set for each fault. During the search in causal rule mining, pruning can also be performed flexibly in combination with expert prior knowledge.
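The rule search with expert pruning might be organized as follows. This is a hypothetical sketch: `mine_rules`, the `min_score` cut-off, and the convention that an expert-excluded event set prunes every candidate containing it are assumptions. The scoring functions are passed in (for example, single-factor and pair CAIM scores), and candidates larger than pairs are not enumerated. The usage example plugs in a toy agreement score only to keep the block self-contained.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Tuple

Chain = Sequence[int]
Rule = Tuple[Tuple[str, ...], float]


def mine_rules(events: Dict[str, Chain], g: Chain,
               score_single: Callable[[Chain, Chain], float],
               score_pair: Callable[[Chain, Chain, Chain], float],
               min_score: float,
               expert_excluded: Iterable[FrozenSet[str]] = ()) -> List[Rule]:
    """Enumerate 1- and 2-event causal rules for fault chain g (hypothetical sketch).

    A candidate is pruned without scoring if it contains any event set an expert
    has excluded; surviving candidates scoring below min_score are dropped.
    """
    excluded = [frozenset(s) for s in expert_excluded]

    def pruned(candidate: FrozenSet[str]) -> bool:
        return any(ex <= candidate for ex in excluded)

    rules: List[Rule] = []
    for name in sorted(events):
        if not pruned(frozenset([name])):
            s = score_single(events[name], g)
            if s >= min_score:
                rules.append(((name,), s))
    for a, b in combinations(sorted(events), 2):
        if not pruned(frozenset([a, b])):
            s = score_pair(events[a], events[b], g)
            if s >= min_score:
                rules.append(((a, b), s))
    rules.sort(key=lambda r: r[1], reverse=True)  # stable: ties keep insertion order
    return rules


def agreement(x: Chain, g: Chain) -> float:
    """Toy stand-in score: fraction of moments where x matches g."""
    return sum(a == b for a, b in zip(x, g)) / len(g)


events = {"a": [1, 1, 0, 0], "b": [0, 0, 1, 1], "c": [1, 0, 0, 0]}
print(mine_rules(events, [1, 1, 0, 0], agreement,
                 lambda x, y, g: max(agreement(x, g), agreement(y, g)),
                 min_score=0.7, expert_excluded=[frozenset({"c"})]))
# [(('a',), 1.0), (('a', 'b'), 1.0)]
```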
Step S104: perform root cause positioning according to the rule set.
After the rule set corresponding to the fault is obtained through the above mining process, root cause positioning inference can be performed based on the rule set obtained by causal rule mining.
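One simple way to use a mined rule set at inference time is to keep the rules whose causes were all observed in the current fault and rank them by score. The `CausalRule` type, `locate_root_cause`, and the tie-break that prefers rules explaining more events are hypothetical; the source does not specify the inference procedure.

```python
from typing import Iterable, List, NamedTuple, Tuple


class CausalRule(NamedTuple):
    causes: Tuple[str, ...]  # antecedent abnormal events X_1^k, ..., X_i^k
    fault: str               # consequent fault g_k
    score: float             # e.g. the CAIM value found during mining


def locate_root_cause(rules: Iterable[CausalRule],
                      observed: Iterable[str]) -> List[CausalRule]:
    """Return rules whose causes were all observed, best candidates first (hypothetical)."""
    seen = set(observed)
    matches = [r for r in rules if set(r.causes) <= seen]
    # Higher score first; among equal scores, prefer rules explaining more events.
    return sorted(matches, key=lambda r: (r.score, len(r.causes)), reverse=True)


rule_set = [
    CausalRule(("db_slow",), "api_timeout", 0.8),
    CausalRule(("db_slow", "disk_full"), "api_timeout", 0.9),
    CausalRule(("net_loss",), "api_timeout", 0.7),
]
best = locate_root_cause(rule_set, {"db_slow", "disk_full", "cpu_high"})[0]
print(best.causes)  # ('db_slow', 'disk_full')
```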
FIG. 2d shows the flow architecture of a root cause positioning method according to an embodiment of the application. As shown in FIG. 2d, data of multiple dimensions is input and undergoes data processing, binarized coding, aggregation, and other processing to obtain the abnormal event coding chains of the multiple dimensions and the fault time coding chain for the same sampling time period. In the subsequent feature optimization stage, the sequence of abnormal events in each fault, obtained by slicing the fault time coding chain, is simplified and the importance of each abnormal event is analyzed. In the causal inference stage, the rule set corresponding to the fault is obtained through causal rule learning and causal inference, root cause positioning is performed according to that rule set, and the root cause positioning result is output.
With the root cause positioning method provided by this embodiment of the application, abnormal events can be determined conveniently by processing data of multiple dimensions, such as alarm data, index anomaly detection results, and operation logs; binarized coding reduces the resource overhead of data processing and analysis and increases computation speed, giving a strong computational advantage for massive O&M data. Importance pruning based on occurrence frequency is applied to the abnormal events in each abnormal event class, and a causal association interestingness measure algorithm adaptively learns the causal rules between faults and abnormal events, so that the root cause of the current fault can be determined quickly and service losses recovered in time. Because causal relationships are fully exploited in the inference stage, the scheme, unlike the "black box" approach of statistical machine learning, can flexibly incorporate expert experience and prior causal rules, is simple to implement and highly interpretable, does not require complete system architecture relationship data, generalizes well in the inference stage, and reduces the overfitting caused by sparse samples, making it a preferred direction for the evolution of holistic root cause diagnosis.
FIG. 3 shows a block diagram of a root cause positioning device according to one embodiment of the application. As shown in FIG. 3, the device comprises a processing module 301, a dividing module 302, an analysis module 303, and a root cause positioning module 304.
The processing module 301 is configured to process data of multiple dimensions according to the sampling time period and determine multiple abnormal events within the same sampling time period.
The dividing module 302 is configured to group adjacent abnormal events among the multiple abnormal events into the same class to obtain multiple abnormal event classes.
The analysis module 303 is configured to determine, for each abnormal event class, the importance of each abnormal event in the class, and to analyze causal rules between the abnormal events and the fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault.
The root cause positioning module 304 is configured to perform root cause positioning according to the rule set.
Optionally, the processing module 301 is further configured to: for the data of each dimension, process that data according to the sampling time period to obtain the abnormal event analysis result of the dimension; and aggregate the abnormal event analysis results of the multiple dimensions for the same sampling time period to obtain the multiple abnormal events in that period.
Optionally, the processing module 301 is further configured to: for the same sampling time period, binarize and encode the abnormal event analysis results of the multiple dimensions for that period to obtain the abnormal event coding chains of the multiple dimensions, and aggregate these coding chains to obtain the multiple abnormal events in the sampling time period.
The data of the multiple dimensions includes alarm data, index anomaly detection results, and operation logs.
Optionally, the analysis module 303 is further configured to calculate, for each abnormal event class, the importance of each abnormal event in the class using the TF-IDF algorithm.
Optionally, the analysis module 303 is further configured to calculate, for each abnormal event in the abnormal event class, the importance of the abnormal event using the TF-IDF algorithm, based on the normalized value of the abnormal event within the class and the proportion of abnormal event classes that contain the abnormal event.
Optionally, the analysis module 303 is further configured to analyze the causal rules between the abnormal events and the fault according to the importance of each abnormal event, using the causal association interestingness measure algorithm, to obtain the rule set corresponding to the fault.
For details of the above modules, refer to the corresponding descriptions in the method embodiments; they are not repeated here.
With the root cause positioning device provided by this embodiment of the application, abnormal events can be determined conveniently by processing data of multiple dimensions, such as alarm data, index anomaly detection results, and operation logs; binarized coding reduces the resource overhead of data processing and analysis and increases computation speed, giving a strong computational advantage for massive O&M data. Importance pruning based on occurrence frequency is applied to the abnormal events in each abnormal event class, and a causal association interestingness measure algorithm adaptively learns the causal rules between faults and abnormal events, so that the root cause of the current fault can be determined quickly and service losses recovered in time. Because causal relationships are fully exploited in the inference stage, the scheme, unlike the "black box" approach of statistical machine learning, can flexibly incorporate expert experience and prior causal rules, is simple to implement and highly interpretable, does not require complete system architecture relationship data, generalizes well in the inference stage, and reduces the overfitting caused by sparse samples, making it a preferred direction for the evolution of holistic root cause diagnosis.
An embodiment of the application further provides a non-volatile computer storage medium storing at least one executable instruction, which can perform the root cause positioning method in any of the above method embodiments.
FIG. 4 shows a schematic diagram of a computing device according to an embodiment of the application; the specific embodiments of the application do not limit the specific implementation of the computing device.
As shown in fig. 4, the computing device may include: a processor 402, a communication interface (Communications Interface) 404, a memory 406, and a communication bus 408.
Wherein:
processor 402, communication interface 404, and memory 406 communicate with each other via communication bus 408.
The communication interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute the program 410, and may specifically perform relevant steps in the root cause positioning method embodiment described above.
In particular, program 410 may include program code including computer-operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program 410 may specifically be configured to cause the processor 402 to perform the root cause positioning method of any of the above method embodiments. For the specific implementation of each step in the program 410, refer to the corresponding steps and the corresponding descriptions of the units in the root cause positioning embodiments above; details are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments, which are not repeated here.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. In addition, the embodiments of the present application are not directed to any particular programming language. It will be appreciated that the teachings of the embodiments described herein may be implemented in a variety of programming languages, and any description of a specific language above is provided to disclose the enablement and best mode of the embodiments of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed embodiments of the application require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and they may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. All features disclosed in this specification (including any accompanying claims, abstract and drawings), and all processes or units of any method or apparatus so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of embodiments of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in accordance with embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). Embodiments of the present application may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the embodiments of the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names.

Claims (10)

1. A root cause positioning method, comprising:
performing data processing on data of multiple dimensions by sampling time period to determine a plurality of abnormal events within the same sampling time period;
dividing adjacent abnormal events among the plurality of abnormal events into the same class to obtain a plurality of abnormal event classes;
for each abnormal event class, determining the importance of each abnormal event in the abnormal event class, and analyzing causal rules between abnormal events and a fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault; and
performing root cause positioning according to the rule set.
2. The method of claim 1, wherein performing data processing on the data of multiple dimensions by sampling time period to determine the plurality of abnormal events within the same sampling time period further comprises:
for the data of each dimension, performing data processing on the data of that dimension by sampling time period to obtain an abnormal event analysis result for that dimension; and
aggregating the abnormal event analysis results of the multiple dimensions corresponding to the same sampling time period to obtain the plurality of abnormal events within that sampling time period.
3. The method of claim 2, wherein aggregating the abnormal event analysis results of the multiple dimensions corresponding to the same sampling time period to obtain the plurality of abnormal events within that sampling time period further comprises:
for the same sampling time period, performing binarization coding on the abnormal event analysis results of the multiple dimensions corresponding to that sampling time period to obtain abnormal event coding chains of the multiple dimensions, and aggregating the abnormal event coding chains of the multiple dimensions to obtain the plurality of abnormal events within that sampling time period.
4. The method of claim 1, wherein the data of the plurality of dimensions comprises: alarm data, index anomaly detection results and operation logs.
5. The method of claim 1, wherein determining, for each abnormal event class, the importance of each abnormal event in the abnormal event class further comprises:
for each abnormal event class, calculating the importance of each abnormal event in the abnormal event class by using a TF-IDF algorithm.
6. The method of claim 5, wherein calculating, for each abnormal event class, the importance of each abnormal event in the abnormal event class by using the TF-IDF algorithm further comprises:
for each abnormal event in the abnormal event class, calculating, by using the TF-IDF algorithm, the importance of the abnormal event according to the normalized value of the abnormal event within the abnormal event class and the ratio between the abnormal event and all abnormal event classes containing the abnormal event.
7. The method according to any one of claims 1-6, wherein analyzing causal rules between abnormal events and a fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault further comprises:
analyzing, by using a causal association interestingness measurement algorithm, the causal rules between the abnormal events and the fault according to the importance of each abnormal event to obtain the rule set corresponding to the fault.
8. A root cause positioning device, comprising:
a processing module, configured to perform data processing on data of multiple dimensions by sampling time period and determine a plurality of abnormal events within the same sampling time period;
a dividing module, configured to divide adjacent abnormal events among the plurality of abnormal events into the same class to obtain a plurality of abnormal event classes;
an analysis module, configured to determine, for each abnormal event class, the importance of each abnormal event in the abnormal event class, and to analyze causal rules between abnormal events and a fault according to the importance of each abnormal event to obtain a rule set corresponding to the fault; and
a root cause positioning module, configured to perform root cause positioning according to the rule set.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the root cause positioning method according to any one of claims 1-7.
10. A computer storage medium storing at least one executable instruction, the executable instruction causing a processor to perform operations corresponding to the root cause positioning method according to any one of claims 1-7.
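The claimed pipeline (grouping adjacent abnormal events, TF-IDF importance per claims 5 and 6, rule learning per claim 7, and root cause positioning) can be sketched end to end as follows. This is an illustration under stated assumptions, not the applicants' implementation: "adjacent" is taken to mean sampling periods at most `max_gap` apart, each abnormal event class is assumed to carry a set of fault labels, and because the patent does not define its causal association interestingness measure, standard association-rule confidence and lift are swapped in as a stand-in. `max_gap`, the thresholds, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Rule = Tuple[str, float, float]  # (abnormal event, confidence, lift)


def group_adjacent(periods: List[Tuple[int, Set[str]]], max_gap: int = 1) -> List[List[str]]:
    """Divide abnormal events into classes; periods at most max_gap apart count as adjacent."""
    classes: List[List[str]] = []
    last: Optional[int] = None
    for idx, events in sorted(periods, key=lambda p: p[0]):
        if not events:
            continue  # no abnormal event in this sampling period
        if last is None or idx - last > max_gap:
            classes.append([])  # gap too large: start a new abnormal event class
        classes[-1].extend(sorted(events))
        last = idx
    return classes


def tfidf_importance(classes: List[List[str]]) -> List[Dict[str, float]]:
    """Plain TF-IDF per class: TF = count / class size, IDF = log(#classes / #classes containing event)."""
    n = len(classes)
    doc_freq = Counter(e for cls in classes for e in set(cls))
    scores = []
    for cls in classes:
        counts = Counter(cls)
        scores.append({e: (c / len(cls)) * math.log(n / doc_freq[e]) for e, c in counts.items()})
    return scores


def learn_rules(classes: List[List[str]], faults: List[Set[str]], min_importance: float = 0.05,
                min_conf: float = 0.6, min_lift: float = 1.0) -> Dict[str, List[Rule]]:
    """Learn 'event -> fault' rules after pruning low-importance events.

    faults[i] holds the fault labels of classes[i]. Confidence and lift stand in
    for the unspecified interestingness measure; thresholds are hypothetical."""
    kept = [{e for e, s in sc.items() if s >= min_importance} for sc in tfidf_importance(classes)]
    n = len(classes)
    rules: Dict[str, List[Rule]] = {}
    for fault in set().union(*faults):
        p_fault = sum(fault in fs for fs in faults) / n
        for event in set().union(*kept):
            n_event = sum(event in k for k in kept)
            n_both = sum(event in k and fault in fs for k, fs in zip(kept, faults))
            if n_both == 0:
                continue
            conf = n_both / n_event
            lift = conf / p_fault
            if conf >= min_conf and lift > min_lift:
                rules.setdefault(fault, []).append((event, conf, lift))
        if fault in rules:
            rules[fault].sort(key=lambda r: (-r[2], -r[1], r[0]))  # strongest rule first
    return rules


def locate_root_cause(rules: Dict[str, List[Rule]], fault: str, current_events: Set[str]) -> List[str]:
    """Rank candidate root causes: rule antecedents observed now, strongest rule first."""
    return [e for e, _, _ in rules.get(fault, []) if e in current_events]
```

For instance, if `link_down` occurs in every class labeled with a `svc_timeout` fault while `latency_spike` also appears in a fault-free class, the `link_down -> svc_timeout` rule receives the higher lift and `locate_root_cause` lists `link_down` first. Note that plain TF-IDF assigns zero importance to an event present in every class, so a smoothed IDF may be preferable if such ubiquitous events can still be causal.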

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211531044.8A CN116955059A (en) 2022-12-01 2022-12-01 Root cause positioning method, root cause positioning device, computing equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211531044.8A CN116955059A (en) 2022-12-01 2022-12-01 Root cause positioning method, root cause positioning device, computing equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN116955059A true CN116955059A (en) 2023-10-27

Family

ID=88450061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211531044.8A Pending CN116955059A (en) 2022-12-01 2022-12-01 Root cause positioning method, root cause positioning device, computing equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN116955059A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117560706A (en) * 2024-01-12 2024-02-13 亚信科技(中国)有限公司 Root cause analysis method, root cause analysis device, electronic equipment and storage medium
CN117560706B (en) * 2024-01-12 2024-03-22 亚信科技(中国)有限公司 Root cause analysis method, root cause analysis device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110928718B (en) Abnormality processing method, system, terminal and medium based on association analysis
US20210092160A1 (en) Data set creation with crowd-based reinforcement
CN112615888B (en) Threat assessment method and device for network attack behavior
CN115296933B (en) Industrial production data risk level assessment method and system
CN112990281A (en) Abnormal bid identification model training method, abnormal bid identification method and abnormal bid identification device
CN114328277A (en) Software defect prediction and quality analysis method, device, equipment and medium
WO2022053163A1 (en) Distributed trace anomaly detection with self-attention based deep learning
CN115456107A (en) Time series abnormity detection system and method
CN116955059A (en) Root cause positioning method, root cause positioning device, computing equipment and computer storage medium
CN112463564B (en) Method and device for determining associated index influencing host state
US11501058B2 (en) Event detection based on text streams
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
CN115587017A (en) Data processing method and device, electronic equipment and storage medium
CN115658515A (en) Deep learning metamorphic test case sequencing method and computer readable medium
CN113052509B (en) Model evaluation method, model evaluation device, electronic apparatus, and storage medium
CN110874601A (en) Method for identifying running state of equipment, and state identification model training method and device
CN115328753A (en) Fault prediction method and device, electronic equipment and storage medium
CN114399407A (en) Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration
CN112948469A (en) Data mining method and device, computer equipment and storage medium
CN112750047A (en) Behavior relation information extraction method and device, storage medium and electronic equipment
CN114969335B (en) Abnormality log detection method, abnormality log detection device, electronic device and readable storage medium
CN117435441B (en) Log data-based fault diagnosis method and device
US20240223615A1 (en) System and method for data set creation with crowd-based reinforcement
CN116070897A (en) Order wind control method and device based on anomaly detection algorithm and storage medium
CN117829904A (en) Investment decision prediction method, apparatus, device, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination