NL2034766A - Alarming method for micro-service index prediction based on causality test

Publication number
NL2034766A
Authority
NL
Netherlands
Prior art keywords
service
causality
alarm
index
indices
Application number
NL2034766A
Other languages
Dutch (nl)
Inventor
Yang Jingbo
Ji Suozhao
Wu Wenjun
Original Assignee
Univ Beihang
Application filed by Univ Beihang

Classifications

    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F11/323 Visualisation of programs or trace data
    • G06F11/327 Alarm or error message display
    • G06F11/3452 Performance evaluation by statistical analysis
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to an alarming method for micro-service index prediction based on a causality test, including causality discovery of service indexes based on a Granger causality test and multi-index prediction based on Attention Long Short-Term Memory (Attention LSTM). Indexes having causality with the indexes to be predicted are found through the Granger causality test and jointly participate in the prediction, thus improving accuracy of the prediction. For relatively long index sequences in a micro-service scenario, causality between indexes is often only partial and the overall causality is weak. The Granger causality test is therefore improved, and the causality is incrementally calculated in segments. In practical applications, when new values are added to time sequences, only the causality of the newly added data needs to be calculated, and historical data does not need to be recalculated, thus reducing the calculation amount and improving efficiency of the causality discovery in the micro-service scenario.

Description

ALARMING METHOD FOR MICRO-SERVICE INDEX PREDICTION
BASED ON CAUSALITY TEST
TECHNICAL FIELD
[01] The present invention belongs to the technical field of computer applications, and more particularly relates to an alarming method for micro-service index prediction based on a causality test.
BACKGROUND ART
[02] With the development of the Internet, network services have grown explosively and brought convenience to people, who increasingly depend on them, and the number of users is rising rapidly. However, the system architecture of conventional network services iterates slowly and is difficult to deploy and maintain, and thus cannot meet current needs. The micro-service architecture overcomes these shortcomings and has attracted wide attention.
[03] The micro-service architecture divides a single service into a plurality of small services, each of which runs independently and provides services to users through cooperation. A lightweight communication mechanism is adopted in service communication, and each service is independently developed and deployed by a specific business team, thus it is suitable for the development of current Internet applications.
[04] Accurate prediction of future values of micro-service indexes is of great significance to the allocation and expansion of service resources. In many cases, it is difficult to predict an index accurately from its own history alone, so other relevant indexes must be introduced into the prediction. In conventional multi-index prediction, the relevant indexes are either known in advance or few in number and therefore easy to obtain. However, in the field of micro-services, there are many service indexes and the relationships between them are constantly changing. Therefore, new methods are needed to find the relevant indexes quickly and accurately among many indexes.
SUMMARY
[05] In order to overcome deficiencies in the prior art, the present invention provides a micro-service index prediction method based on a Granger causality test and Attention Long Short-Term Memory (Attention LSTM), which improves accuracy of micro-service index prediction.
[06] Technical solutions of the present invention are as follows. An alarming method for micro-service index prediction based on a causality test includes the following steps.
1. Causality discovery on service indexes based on a Granger causality test
(1) Firstly, service index data is preprocessed: a stationarity test is performed on the service index data, and differencing is performed on non-stationary sequences.
[07] (2) The Granger causality test is performed on the service indexes. Since the Granger causality test is prone to misjudgment on relatively long time sequences, and index sequences are relatively long in a micro-service scenario, causality between indexes is often only partial and the overall causality is weak. The Granger causality test is therefore improved in the present invention, and the causality is incrementally calculated in segments. Specifically, the service index data is segmented into segments with a same length, then the Granger causality test is performed on corresponding segments of two service indexes, and finally, the number of segments with the causality is counted. The greater the number of segments with the causality, the stronger the causality.
[08] A calculation method for performing the Granger causality test on a segment of service indexes X and Y is as follows:
$$Y_{t+1} = \sum_{j=0}^{m-1} \alpha_j Y_{t-j} + \varepsilon_{Y,t+1}$$
$$Y_{t+1} = \sum_{j=0}^{m-1} a_j X_{t-j} + \sum_{j=0}^{m-1} b_j Y_{t-j} + \varepsilon_{Y|X,t+1}$$
The above two formulas are used in succession, where $X_t$ and $Y_t$ are values of the service indexes X and Y at the moment t, $\alpha_j$, $a_j$, and $b_j$ are parameters of the model, m is a lag period of the model, i.e., the causality is calculated using the m values preceding $Y_{t+1}$, j is a value from 0 to m-1, $t-j$ represents the moment (t-j), and $\varepsilon_Y$ and $\varepsilon_{Y|X}$ are residual errors of the two models, i.e., differences between actual values and estimated values.
Regression calculation is performed using these formulas, and the variances of $\varepsilon_Y$ and $\varepsilon_{Y|X}$ are compared according to the regression results to determine whether X→Y has Granger causality. The coefficient of the Granger causality is defined as follows:
$$GC_{X \to Y} = \ln \frac{\operatorname{var}(\varepsilon_Y)}{\operatorname{var}(\varepsilon_{Y|X})}$$
When $\operatorname{var}(\varepsilon_{Y|X}) < \operatorname{var}(\varepsilon_Y)$, $GC_{X \to Y} > 0$, which indicates that X→Y has the Granger causality.
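The segment-wise test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the least-squares fit via NumPy, and the `threshold` parameter are all hypothetical. A positive threshold is used because an in-sample least-squares fit can never make the residual variance of the second regression larger than that of the first, so the coefficient is non-negative by construction; a statistical test such as an F-test would be the usual alternative.

```python
import numpy as np

def gc_coefficient(x, y, m):
    """Return ln(var(eps_Y) / var(eps_Y|X)) for one segment with lag period m."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[m:]  # the values Y_{t+1}
    # Column j holds the lagged values Y_{t-j} (or X_{t-j}) for j = 0..m-1.
    y_lags = np.column_stack([y[m - 1 - j : n - 1 - j] for j in range(m)])
    x_lags = np.column_stack([x[m - 1 - j : n - 1 - j] for j in range(m)])

    def residual_variance(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ coef)

    var_y = residual_variance(y_lags)                                # Y from its own past
    var_y_given_x = residual_variance(np.hstack([x_lags, y_lags]))   # Y from past X and Y
    eps = 1e-12  # guards against division by zero on perfectly fitted data
    return float(np.log((var_y + eps) / (var_y_given_x + eps)))

def count_causal_segments(x, y, seg_len, m, threshold=0.1):
    """Count equal-length segments in which X -> Y exceeds the (assumed) threshold."""
    count = 0
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = slice(start, start + seg_len)
        if gc_coefficient(x[seg], y[seg], m) > threshold:
            count += 1
    return count
```

A larger returned count corresponds to stronger causality in the sense used above; the leftover values that do not fill a whole segment are ignored in this sketch.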
[09] (3) The causality is saved in a causality graph for use in an Attention LSTM multi-index prediction model upon completion of calculating the causality between all the service indexes.
[10] 2. Multi-index prediction based on Attention LSTM
(1) The several service indexes having the strongest causality with the service index to be predicted are selected from the service index causality graph obtained from the Granger causality test and, together with the service index to be predicted, are taken as inputs of the Attention LSTM prediction model.
[11] (2) The input service indexes are preprocessed, and all the service indexes are normalized to the range 0 to 1. If there is missing data in the service indexes, each missing value is set to the average of the previous and subsequent values.
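The preprocessing in this step can be illustrated with a short Python sketch. The function names are hypothetical, and two details are assumptions not stated in the text: when several values in a row are missing, the nearest known value on each side is used, and at the start or end of a series only the one available neighbour is used. Min-max scaling is assumed as the way to normalize to the range 0 to 1.

```python
import numpy as np

def fill_missing(series):
    """Replace NaN entries with the mean of the nearest known neighbours."""
    s = np.asarray(series, dtype=float).copy()
    known = np.flatnonzero(~np.isnan(s))
    for i in np.flatnonzero(np.isnan(s)):
        before = known[known < i]
        after = known[known > i]
        neighbours = []
        if before.size:
            neighbours.append(s[before[-1]])  # previous known value
        if after.size:
            neighbours.append(s[after[0]])    # subsequent known value
        s[i] = np.mean(neighbours)
    return s

def min_max_normalize(series):
    """Scale a series linearly so that its values lie in [0, 1]."""
    s = np.asarray(series, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:
        return np.zeros_like(s)  # constant series: map everything to 0
    return (s - lo) / (hi - lo)
```

Filling is applied before normalization so that the minimum and maximum are computed on a complete series.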
[12] (3) The preprocessed service indexes are taken as inputs of an LSTM layer. A formula of the model in the LSTM layer is as follows:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tau(W_c x_t + U_c h_{t-1} + b_c)$$
$$h_t = o_t \odot \sigma(c_t)$$
Where t represents the moment, W, U, and b with their subscripts are parameters of the model, $f_t$ is a forgetting gate, $i_t$ is an input gate, $o_t$ is an output gate, $c_t$ is a state value of a memory unit, $h_t$ is an output value of a hidden layer, $\sigma$ is an activation function, $\odot$ represents a Hadamard product, $b_f$, $b_i$, $b_o$, and $b_c$ represent bias values of different functions, $x_t$ represents an input value, and $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, $U_o$, and $U_c$ with different subscripts represent weight coefficients of corresponding functions.
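One time step of the LSTM layer formulas above might be sketched in NumPy as follows. The parameter dictionary layout and the function names are hypothetical. The activation applied to the candidate state is not defined in the text, so it is passed in as `cand_act`; the default `np.tanh` is an assumption based on common LSTM practice, not something the source confirms. The output uses the sigmoid on the cell state, as the last formula is written.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p, cand_act=np.tanh):
    """Compute (h_t, c_t) from the input x_t and the previous state.

    p maps names such as "W_f", "U_f", "b_f" (gates f, i, o, c) to arrays.
    """
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])  # forgetting gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])  # input gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])  # output gate
    candidate = cand_act(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * candidate  # element-wise (Hadamard) products
    h_t = o_t * sigmoid(c_t)
    return h_t, c_t
```

In a full model this step would be applied to each time point of the normalized index sequence, with the resulting hidden states passed on to the Attention layer.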
[13] (4) Outputs of the LSTM layer are taken as inputs of an Attention layer. The Attention layer enables a neural network to selectively focus on input features, saves and assigns weights of the learned features to input vectors of the next time step, and allocates attention using a weight matrix to highlight an effect of key input features on the prediction. The formula of the model in the Attention layer is as follows:
$$s_{ki} = U_{k-1} \tanh(V_1 h_k + V_2 h_i + b)$$
$$\alpha_{ki} = \operatorname{softmax}(s_{ki}) = \frac{\exp(s_{ki})}{\sum_{j=1}^{N} \exp(s_{kj})}$$
$$C_k = \sum_{i=1}^{N} \alpha_{ki} h_i$$
$$U_k = \tanh(C_k, h_k)$$
$$\hat{y}_k = \operatorname{sigmoid}(U_k)$$
Where $s_{ki}$ represents an effect of an i-th sequence point on a k-th sequence point, $U_{k-1}$ is a vector saved by an update of the Attention hidden layer, $h_k$ represents the k-th point of the Attention hidden layer, $h_i$ represents the i-th point of the Attention hidden layer, N is the number of points, $V_1$, $V_2$, and $b$ are parameters of the model, $\alpha_{ki}$ is a probability distribution obtained by inputting each $s_{ki}$ into a Softmax layer and normalizing the same, $C_k$ is an attention coefficient of the k-th sequence point obtained by weighting and summing $h_i$ with $\alpha_{ki}$, an output value $U_k$ of the Attention layer is obtained according to $C_k$ and a saved value of the Attention hidden layer is updated, a predicted value $\hat{y}_k$ is output after $U_k$ passes a full connection layer and a sigmoid activation function, and finally, $\hat{y}_k$ is compared with a real value $y_k$.
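The scoring, Softmax, and weighted-sum steps of the Attention layer can be illustrated with a small NumPy sketch. The function name, argument names, and array shapes are assumptions made for illustration. The later steps (forming the layer output from the attention coefficient and the hidden state, then the full connection layer and sigmoid) are left out because the text does not specify how the two inputs are combined.

```python
import numpy as np

def attention_context(h, k, u_prev, V1, V2, b):
    """Return (alpha_k, C_k) for the k-th sequence point.

    h      : (N, d) hidden states from the LSTM layer
    u_prev : (a,)   vector saved from the previous attention update
    V1, V2 : (a, d) weight matrices; b : (a,) bias
    """
    scores = np.array(
        [u_prev @ np.tanh(V1 @ h[k] + V2 @ h[i] + b) for i in range(len(h))]
    )
    scores = scores - scores.max()  # subtracting the max keeps exp() stable
    weights = np.exp(scores)
    alpha = weights / weights.sum()  # Softmax over the N sequence points
    context = alpha @ h              # weighted sum of the hidden states
    return alpha, context
```

Subtracting the maximum score before exponentiation does not change the Softmax result; it only avoids overflow for large scores.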
[14] The present invention has the following advantages compared to the prior art.
(1) Indexes having causality with the indexes to be predicted are found by the Granger causality test and jointly participate in the prediction, thus improving accuracy of the prediction.
[15] (2) For relatively long index sequences in the micro-service scenario, causality between indexes is often only partial and the overall causality is weak. The Granger causality test is improved in the present invention, and the causality is incrementally calculated in segments. In practical applications, when new values are added to time sequences, only the causality of the newly added data needs to be calculated, and historical data does not need to be recalculated, thus reducing the calculation amount and improving efficiency of the causality discovery in the micro-service scenario.
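The incremental behaviour described above can be sketched as a small Python class. The class name, its attributes, and the `test` callable are hypothetical; any per-segment causality test that returns a boolean could be plugged in. The point of the sketch is that each completed segment is tested exactly once and earlier segments are never revisited.

```python
class IncrementalCausality:
    """Keep a running count of causal segments as new index values arrive."""

    def __init__(self, seg_len, test):
        self.seg_len = seg_len
        self.test = test            # callable(x_segment, y_segment) -> bool
        self.buf_x, self.buf_y = [], []
        self.causal_segments = 0
        self.total_segments = 0

    def append(self, x_val, y_val):
        """Add one new pair of values; test the segment only when it is full."""
        self.buf_x.append(x_val)
        self.buf_y.append(y_val)
        if len(self.buf_x) == self.seg_len:
            self.total_segments += 1
            if self.test(self.buf_x, self.buf_y):
                self.causal_segments += 1
            # Start a fresh buffer; historical segments are not recomputed.
            self.buf_x, self.buf_y = [], []
```

The ratio of `causal_segments` to `total_segments` could then serve as an edge weight in the causality graph, although the text itself only speaks of the count.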
[16] (3) Anomaly points can be marked on a line graph of real-time index data using the anomaly index detection method of the present invention in combination with graphic visualization technology, which makes it easier for operation and maintenance personnel to view and troubleshoot them. When a service becomes anomalous, anomaly alarms of a plurality of indexes are often generated at the same time; the causality graph generated by the Granger causality test can solve this problem. If indexes with causality fluctuate anomalously at the same time, the fluctuations can be converged into a single anomaly to avoid excessive anomaly alarms.
[17] (4) The present invention covers the principles and development of a composite alarm, alarm convergence, and alarm notification in a service platform. The composite alarm method allows a composite alarm to be configured over a plurality of indexes, simplifies that configuration by using an expression, and improves the flexibility of configuring alarms.
[18] (5) The alarm convergence method developed in the present invention aggregates anomaly alarms that occur within a same time range according to a service calling relationship graph generated by a service grid, a service index causality graph, and an alarm topological relationship edited by a developer, converges associated anomaly alarms into one alarm, and reduces troubleshooting costs for the developer.
BRIEF DESCRIPTION OF THE DRAWINGS
[19] FIG. 1 is an architecture diagram of a multi-index prediction model based on a Granger causality test and Attention LSTM according to the present invention;
FIG. 2 is a schematic diagram of a Granger causality test in segments according to the present invention;
FIG. 3 is a hierarchical structure diagram of an Attention LSTM prediction model according to the present invention;
FIG. 4 is a schematic diagram of parsing a composite alarm expression into an abstract syntax tree;
FIG. 5 is a schematic diagram of marking an index anomaly; and
FIG. 6 is an architecture diagram of alarm convergence.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[20] Technical solutions in embodiments of the present invention will now be described clearly and completely below with reference to the drawings in embodiments of the present invention. Embodiments described herein are only part of, but not all, embodiments of the present invention. Based on embodiments in the present invention, all the other embodiments obtained by those skilled in the art without making any inventive effort fall within the scope of protection of the present invention.
[21] According to an embodiment of the present invention, an alarming method for micro-service index prediction based on a causality test is shown in FIG. 1. A Granger causality test is performed according to index data, and a causality graph is generated. Prediction results are output through an Attention LSTM prediction model according to indexes to be predicted and indexes with causality. The method specifically includes the following steps.
In step 1, causality discovery is performed on service indexes based on a Granger causality test.
In step 2, multi-index prediction, service anomaly detection, and service intelligent alarm are performed based on Attention LSTM.
The causality discovery on service indexes based on the Granger causality test in step 1 specifically includes the following steps.
(1) Firstly, service index data is preprocessed: a stationarity test is performed on the service index data, and differencing is performed on non-stationary sequences.
[22] (2) The Granger causality test is performed on the service indexes. Since the Granger causality test is prone to misjudgment on relatively long time sequences, and index sequences are relatively long in a micro-service scenario, causality between indexes is often only partial and the overall causality is weak. The Granger causality test is therefore improved in the present invention, and the causality is incrementally calculated in segments. Specifically, the service indexes are segmented into segments with a same length, then the Granger causality test is performed on corresponding segments of two service indexes, and finally, the number of segments with the causality is counted. The greater the number of segments with the causality, the stronger the causality.
[23] A calculation method for performing the Granger causality test on a segment of service indexes X and Y is as follows:
$$Y_{t+1} = \sum_{j=0}^{m-1} \alpha_j Y_{t-j} + \varepsilon_{Y,t+1}$$
$$Y_{t+1} = \sum_{j=0}^{m-1} a_j X_{t-j} + \sum_{j=0}^{m-1} b_j Y_{t-j} + \varepsilon_{Y|X,t+1}$$
The above two formulas are used in succession, where $X_t$ and $Y_t$ are values of the service indexes X and Y at the moment t, $\alpha_j$, $a_j$, and $b_j$ are parameters of the model, m is a lag period of the model, i.e., the causality is calculated using the m values preceding $Y_{t+1}$, j is a value from 0 to m-1, $t-j$ represents the moment (t-j), and $\varepsilon_Y$ and $\varepsilon_{Y|X}$ are residual errors of the two models, i.e., differences between actual values and estimated values.
Regression calculation is performed using these formulas, and the variances of $\varepsilon_Y$ and $\varepsilon_{Y|X}$ are compared according to the regression results to determine whether X→Y has Granger causality. The coefficient of the Granger causality is defined as follows:
$$GC_{X \to Y} = \ln \frac{\operatorname{var}(\varepsilon_Y)}{\operatorname{var}(\varepsilon_{Y|X})}$$
When $\operatorname{var}(\varepsilon_{Y|X}) < \operatorname{var}(\varepsilon_Y)$, $GC_{X \to Y} > 0$, which indicates that X→Y has the Granger causality.
[24] (3) The causality is saved in the causality graph for use in the Attention LSTM multi-index prediction model upon completion of calculating the causality between all the service indexes.
[25] According to an embodiment of the present invention, the Granger causality test in segments is shown in FIG. 2. When performing the causality test on two time sequences x1 and x2, x1 and x2 are segmented, then the Granger causality test is performed on corresponding segments of the two sequences, and the number of segments with the causality x1→x2 is counted. The greater the number of segments with the causality, the stronger the causality. FIG. 3 is a hierarchical structure diagram of an Attention LSTM prediction model according to the present invention.
The multi-index prediction based on Attention LSTM in step 2 specifically includes the following steps.
(1) The several service indexes having the strongest causality with the service index to be predicted are selected from the service index causality graph obtained from the Granger causality test and, together with the service index to be predicted, are taken as inputs of the Attention LSTM prediction model.
[26] (2) The input service indexes are preprocessed, and all the service indexes are normalized to 0 to 1. If there is missing data in the service indexes, missing values of the service indexes are set to average values of previous and subsequent values.
[27] (3) The preprocessed service indexes are taken as inputs of an LSTM layer. A formula of the model in the LSTM layer is as follows:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tau(W_c x_t + U_c h_{t-1} + b_c)$$
$$h_t = o_t \odot \sigma(c_t)$$
Where t represents the moment, W, U, and b with their subscripts are parameters of the model, $f_t$ is a forgetting gate, $i_t$ is an input gate, $o_t$ is an output gate, $c_t$ is a state value of a memory unit, $h_t$ is an output value of a hidden layer, $\sigma$ is an activation function, $\odot$ represents a Hadamard product, $b_f$, $b_i$, $b_o$, and $b_c$ represent bias values of different functions, $x_t$ represents an input value, and $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, $U_o$, and $U_c$ with different subscripts represent weight coefficients of corresponding functions.
[28] (4) Outputs of the LSTM layer are taken as inputs of an Attention layer. The Attention layer enables a neural network to selectively focus on input features, saves and assigns weights of the learned features to input vectors of the next time step, and allocates attention using a weight matrix to highlight an effect of key input features on the prediction. The formula of the model in the Attention layer is as follows:
$$s_{ki} = U_{k-1} \tanh(V_1 h_k + V_2 h_i + b)$$
$$\alpha_{ki} = \operatorname{softmax}(s_{ki}) = \frac{\exp(s_{ki})}{\sum_{j=1}^{N} \exp(s_{kj})}$$
$$C_k = \sum_{i=1}^{N} \alpha_{ki} h_i$$
$$U_k = \tanh(C_k, h_k)$$
$$\hat{y}_k = \operatorname{sigmoid}(U_k)$$
Where $s_{ki}$ represents an effect of an i-th sequence point on a k-th sequence point, $U_{k-1}$ is a vector saved by an update of the Attention hidden layer, $h_k$ represents the k-th point of the Attention hidden layer, $h_i$ represents the i-th point of the Attention hidden layer, N is the number of points, $V_1$, $V_2$, and $b$ are parameters of the model, $\alpha_{ki}$ is a probability distribution obtained by inputting each $s_{ki}$ into a Softmax layer and normalizing the same, $C_k$ is an attention coefficient of the k-th sequence point obtained by weighting and summing $h_i$ with $\alpha_{ki}$, an output value $U_k$ of the Attention layer is obtained according to $C_k$ and a saved value of the Attention hidden layer is updated, a predicted value $\hat{y}_k$ is output after $U_k$ passes a full connection layer and a sigmoid activation function, and finally, $\hat{y}_k$ is compared with a real value $y_k$.
Furthermore, the service anomaly detection in step 2 is specifically as follows.
The anomaly detection is performed on multiple indexes according to the multi-index prediction model based on the Granger causality test and the Attention LSTM, and a future value of the index is predicted using the multiple indexes. If the difference between the true value and the predicted value of the index exceeds the confidence interval, the index is marked as an anomaly; the effect is shown in FIG. 5.
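The marking rule above can be illustrated with a short Python sketch. The text does not say how the confidence interval is obtained, so the bound used here, `z` standard deviations of past prediction residuals, is purely an assumption, as are the function and parameter names.

```python
import numpy as np

def mark_anomalies(actual, predicted, history_residuals, z=3.0):
    """Return a boolean array that is True where a point is marked as an anomaly.

    history_residuals : past (actual - predicted) errors used to size the interval
    z                 : assumed half-width of the interval in standard deviations
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    bound = z * np.std(history_residuals)
    return np.abs(actual - predicted) > bound
```

The returned mask can be used directly to highlight points on a line graph of the index.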
[29] When a service becomes anomalous, anomaly alarms of a plurality of indexes are often generated at the same time, causing difficulties in anomaly troubleshooting. The causality graph generated by the Granger causality test can solve this problem. If indexes with causality fluctuate anomalously at the same time, the fluctuations can be converged into a single anomaly to avoid excessive anomaly alarms.
[30] Furthermore, the service intelligent alarm in step 2 is specifically as follows.
3.1 Composite alarm
A composite alarm tool based on an expression engine firstly parses an expression of the composite alarm configured by a developer and generates an abstract syntax tree, then extracts the index data according to a time stamp related to the service indexes or other dimensions, calculates values of the expression at each time point according to the abstract syntax tree, and finally, determines whether an alarm is triggered at each time point according to the calculated values. The expression engine supports the four arithmetic operations of addition, subtraction, multiplication, and division as well as custom function operations. The custom functions can be written in the Python programming language. The parsing of the composite alarm expression into the abstract syntax tree by the expression engine is shown in FIG. 4. When parsing, indexes, operators, and functions are marked, and then a corresponding abstract syntax tree is generated according to the operation rules. For example, the expression of the composite alarm in FIG. 4 is "service 1. index 1 / (service 1. index 1 + service 2. index 2) < 0.8", and the root node of the generated abstract syntax tree is "<". For each time point of the two indexes, "service 1. index 1 / (service 1. index 1 + service 2. index 2)" is calculated first according to the abstract syntax tree, then the calculation result is compared with 0.8, and finally, it is determined whether the alarm is triggered at the time point. Since a plurality of composite alarms are required to be configured in practical applications, and each composite alarm often needs to calculate the index data over a long time span, the present invention encapsulates the expression engine as a stateless micro-service. The composite alarm tool is served by calling the expression engine. When the calculation amount is large, the expression engine can be horizontally scaled out into a plurality of instances, and the calculation efficiency can be improved through multi-instance parallel calculation. Through the composite alarm tool based on the expression engine provided herein, composite alarm configuration can be greatly simplified while the flexibility of the configuration is increased.
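The parse-then-evaluate flow of the expression engine can be sketched with Python's standard `ast` module. This is an illustrative stand-in, not the engine of the invention: the function names are hypothetical, and index names are written as Python identifiers such as `service1_index1` (an assumption, since names like "service 1. index 1" are not valid identifiers). Only the four arithmetic operators, comparisons, and explicitly registered custom functions are accepted.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le,
            ast.Gt: operator.gt, ast.GtE: operator.ge}

def evaluate(node, env, funcs):
    """Recursively evaluate one node of the abstract syntax tree."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env, funcs)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]  # index value at the current time point
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](evaluate(node.left, env, funcs),
                                       evaluate(node.right, env, funcs))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMP_OPS):
        return _CMP_OPS[type(node.ops[0])](
            evaluate(node.left, env, funcs),
            evaluate(node.comparators[0], env, funcs))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in funcs):
        return funcs[node.func.id](*[evaluate(a, env, funcs) for a in node.args])
    raise ValueError("unsupported expression element: " + ast.dump(node))

def composite_alarm(expr, series, funcs=None):
    """Return, per time point, whether the composite alarm expression is true.

    series maps each index name to a list of values aligned by time point.
    """
    tree = ast.parse(expr, mode="eval")  # parsed once, reused for every point
    funcs = funcs or {}
    length = len(next(iter(series.values())))
    return [bool(evaluate(tree, {k: v[t] for k, v in series.items()}, funcs))
            for t in range(length)]
```

Because evaluation keeps no state between calls, a function like `composite_alarm` could be placed behind a stateless service and run in several instances in parallel, which is the deployment style the paragraph describes.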
[31] 3.2 Alarm convergence
There are numerous services and monitoring indexes in most systems or platforms, and the calling relationships between services are complex, so service alarms are correlated with one another. When a service fails, multiple indexes of multiple services generate anomaly alarms at the same time. In this case, it is difficult for the developer to discover the correlation between different anomaly alarms, and each anomaly alarm has to be analyzed and troubleshot separately, which makes troubleshooting difficult. In response to this problem, an alarm convergence method and a module system architecture based on a service calling relationship, service index causality, and an alarm topological relationship defined by the developer are implemented in the present invention.
[32] As shown in FIG. 5, when a service becomes anomalous, anomaly alarms of a plurality of indexes are often generated at the same time, causing difficulties in anomaly troubleshooting. The causality graph generated by the Granger causality test can solve this problem. If indexes with causality fluctuate anomalously at the same time, the fluctuations can be converged into a single anomaly to avoid excessive anomaly alarms.
[33] A convergence architecture is shown in FIG. 6. When a plurality of service indexes trigger the anomaly alarm, anomalies that occur within a certain time range are aggregated according to information about a service calling relationship graph, a service index causality graph, and a user-defined alarm topological relationship graph. If there is an association between two anomaly alarms, the two anomaly alarms are aggregated. When sending an alarm notification, all the associated anomaly alarms are notified together to reduce troubleshooting costs for the developer. The service calling relationship graph is recorded and generated in real time through a service grid. The service index causality graph is generated using a service index causality discovery algorithm based on the Granger causality test according to the present invention. The present invention provides a user with an ability to define the alarm topological relationship using an alarm topological relationship editor.
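The aggregation rule above (two alarms are merged when they are associated and fall within the same time range) can be sketched with a union-find structure. The function name, the alarm tuple layout, and the `window` parameter are hypothetical. The sketch also assumes that the edges of the three graphs have already been merged into one set of pairs and that associations are treated as undirected.

```python
def converge_alarms(alarms, edges, window):
    """Group associated alarms that occur close together in time.

    alarms : list of (index_name, timestamp) tuples
    edges  : iterable of (name_a, name_b) pairs taken from the calling,
             causality, and user-defined topology graphs
    window : maximum timestamp difference for two alarms to be aggregated
    """
    parent = list(range(len(alarms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    related = {frozenset(e) for e in edges}
    for i, (name_i, t_i) in enumerate(alarms):
        for j in range(i + 1, len(alarms)):
            name_j, t_j = alarms[j]
            if abs(t_i - t_j) <= window and frozenset((name_i, name_j)) in related:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, alarm in enumerate(alarms):
        groups.setdefault(find(i), []).append(alarm)
    return list(groups.values())
```

Each returned group would then be sent as one notification, so the developer sees associated anomalies together.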
[34] The alarm topological relationship editor interfaces with a service index collection module and a service anomaly detection module, and uses nodes to represent alarms of the service indexes. The editor supports searching for existing alarms, adding alarms to the topological relationship graph, and performing operations such as dragging, connecting lines, and deleting on the nodes. Directed edges between the nodes represent topological relationships between the alarms. Upon completion of editing the alarm topological relationship, the editor supports exporting the graph data into multiple formats such as JSON and Gremlin statements of the JanusGraph graph database.
[35] 3.3 Alarm notification
The alarm notification interfaces with the service index collection module and the service anomaly detection module. When an alarm of a certain index is to be notified to the developer, an alarm notification tool automatically acquires relevant data of the anomaly alarm and sends text information about the alarm and a line graph of the anomaly index to a user via e-mail and similar channels. When notifying an aggregated alarm, the alarm notification tool automatically acquires relevant data of the alarm convergence and sends the service calling relationship graph, the service index causality graph, and the alarm topological relationship graph configured by the developer to the user.
[36] Although illustrative embodiments of the present invention have been described herein, the present invention is not limited to those embodiments. Those skilled in the art should understand that various changes may be made without departing from the spirit and scope of the present invention as defined by the claims.
All inventions that use the idea of the present invention fall within the scope of protection of the present invention.

Claims (5)

1. A computer-implemented alarming method for micro-service index prediction based on a causality test, comprising: step 1, performing causality discovery on service indices based on a Granger causality test; and step 2, performing multi-index prediction, service anomaly detection, and intelligent service alarming based on Attention Long Short-Term Memory (Attention-LSTM).

2. The method of claim 1, wherein step 1 specifically comprises: (1) preprocessing service index data, performing a stationarity test on the service index data, and applying differencing to non-stationary sequences; (2) performing the Granger causality test on the service indices in the service index data, wherein the Granger causality test is improved by calculating causality incrementally in segments, specifically by dividing the service index data into segments of equal length, performing the Granger causality test on corresponding segments of two service indices X and Y, and finally counting the number of segments that exhibit causality to obtain the causality between the service indices; and (3) after the causality between all service indices has been calculated, storing the causality in a causality graph for use in an Attention-LSTM multi-index prediction model.

3. The method of claim 1, wherein performing multi-index prediction based on Attention-LSTM specifically comprises: (2.1) first, taking as input of the Attention-LSTM prediction model the service indices to be predicted together with the several service indices that have the strongest causality with them in the service index causality graph obtained from the Granger causality test; (2.2) preprocessing the input service indices and normalizing all service indices to the range 0 to 1, wherein, if data are missing from a service index, each missing value is set to the average of the preceding and following values; (2.3) taking the preprocessed service indices as input of an LSTM layer; and (2.4) taking outputs of the LSTM layer as input of a hidden Attention layer, wherein the hidden Attention layer enables the neural network to focus selectively on input features, stores weights of the learned features and assigns them to the input vectors of the next time step, and allocates attention by means of a weight matrix to emphasize the effect of key input features on the prediction.

4. The method of claim 1, wherein the service anomaly detection in step 2 specifically comprises: performing anomaly detection on multiple indices based on the Granger causality test and the Attention-LSTM multi-index prediction model, and predicting future values of the service indices using the multiple indices, wherein, if the difference between a real value and a predicted value of a service index exceeds a confidence interval, the service index is marked as anomalous.

5. The method of claim 1, wherein the intelligent service alarming in step 2 specifically comprises: designing a composite alarm based on an expression engine, parsing an expression of the composite alarm configured by a developer and generating an abstract syntax tree, then extracting the index data according to a timestamp related to the service indices, calculating values of the expression at each time point according to the abstract syntax tree, and finally determining, according to the calculated values, whether an alarm is triggered at each time point; performing an alarm convergence method based on a service calling relationship, a service index causality, and an alarm topological relationship defined by the developer; when a plurality of service indices trigger anomaly alarms, aggregating anomalies that occur within a predetermined time range according to information from a service calling relationship graph, a service index causality graph, and a user-defined alarm topological relationship graph; if there is an association between two anomaly alarms, aggregating the two anomaly alarms; when sending an alarm notification, notifying all associated anomaly alarms together to reduce troubleshooting costs for the developer; when the developer receives an alarm notification of a certain index, automatically acquiring, by an alarm notification tool, relevant data of the anomaly alarm and sending text information about the alarm and a line graph of the anomalous index to a user; and, when notifying an aggregated alarm, automatically acquiring, by the alarm notification tool, relevant data of the alarm convergence and sending the service calling relationship graph, the service index causality graph, and the alarm topological relationship graph configured by the developer to the user.
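The composite-alarm evaluation recited in claim 5 (parse a developer-configured expression into an abstract syntax tree, then evaluate it at each time point) might be sketched as follows. This is a minimal illustration under assumptions, not the expression engine of the invention: it borrows Python's own expression grammar through the standard `ast` module, and the function name `evaluate_composite_alarm`, the supported operators, and the dictionary layout of the index data are hypothetical.

```python
import ast
import operator

# Operators the sketch accepts; anything else is rejected.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_COMPARE = {ast.Gt: operator.gt, ast.GtE: operator.ge,
            ast.Lt: operator.lt, ast.LtE: operator.le}


def _eval(node, values):
    """Recursively evaluate one AST node against the index values at one time point."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, values)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return values[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left, values),
                                      _eval(node.right, values))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _COMPARE):
        return _COMPARE[type(node.ops[0])](_eval(node.left, values),
                                           _eval(node.comparators[0], values))
    if isinstance(node, ast.BoolOp):
        results = [_eval(v, values) for v in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def evaluate_composite_alarm(expression, series):
    """Return {timestamp: triggered} for a composite alarm expression.

    series: {index_name: {timestamp: value}}; only timestamps present in
            every index are evaluated (an assumption of this sketch).
    """
    tree = ast.parse(expression, mode="eval")  # build the abstract syntax tree
    timestamps = sorted(set.intersection(*(set(s) for s in series.values())))
    return {
        t: bool(_eval(tree, {name: s[t] for name, s in series.items()}))
        for t in timestamps
    }


if __name__ == "__main__":
    data = {
        "cpu": {1: 0.5, 2: 0.9, 3: 0.95},
        "latency": {1: 100, 2: 150, 3: 400},
    }
    print(evaluate_composite_alarm("cpu > 0.8 and latency > 200", data))
```

Walking the tree with an explicit whitelist, rather than calling `eval`, keeps a developer-supplied expression from executing arbitrary code.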
NL2034766A 2022-05-05 2023-05-05 Alarming method for micro-service index prediction based on causality test NL2034766A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210478087.8A CN114579407B (en) 2022-05-05 2022-05-05 Causal relationship inspection and micro-service index prediction alarm method

Publications (1)

Publication Number Publication Date
NL2034766A true NL2034766A (en) 2023-11-14

Family

ID=81783976

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2034766A NL2034766A (en) 2022-05-05 2023-05-05 Alarming method for micro-service index prediction based on causality test

Country Status (2)

Country Link
CN (1) CN114579407B (en)
NL (1) NL2034766A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115051870B (en) * 2022-06-30 2024-02-06 浙江网安信创电子技术有限公司 Method for detecting unknown network attack based on causal discovery
CN116383096B (en) * 2023-06-06 2023-08-18 安徽思高智能科技有限公司 Micro-service system anomaly detection method and device based on multi-index time sequence prediction
CN117539648A (en) * 2024-01-09 2024-02-09 天津市大数据管理中心 Service quality management method and device for electronic government cloud platform

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231187B (en) * 2019-07-15 2022-07-26 华为技术有限公司 Micro-service abnormity analysis method and device
US11777966B2 (en) * 2019-11-25 2023-10-03 Cisco Technology, Inc. Systems and methods for causation analysis of network traffic anomalies and security threats
CN113391943B (en) * 2021-06-18 2023-01-06 广东工业大学 Micro-service fault root cause positioning method and device based on cause and effect inference
CN113837358B (en) * 2021-08-25 2024-06-07 华润数字科技有限公司 System strategy prediction method based on Grangel causal relationship and related equipment
CN113919599A (en) * 2021-11-26 2022-01-11 云南电网有限责任公司电力科学研究院 Medium-and-long-term load prediction method

Also Published As

Publication number Publication date
CN114579407A (en) 2022-06-03
CN114579407B (en) 2022-08-23
