CN113900844A

CN113900844A - Service code level-based fault root cause positioning method, system and storage medium

Info

Publication number: CN113900844A
Application number: CN202111127982.7A
Authority: CN
Inventors: 沈梦家; 曹立; 隋楷心; 刘大鹏; 王继斌; 张文池; 吴楠; 陈恒茂
Original assignee: Beijing Bishi Technology Co ltd
Current assignee: Beijing Bishi Technology Co ltd
Priority date: 2021-09-26
Filing date: 2021-09-26
Publication date: 2022-01-07
Anticipated expiration: 2041-09-26
Also published as: CN113900844B

Abstract

The invention provides a fault root cause localization method, system and storage medium based on the service code level. The method comprises: constructing a global heterogeneous topology graph that contains both inter-system call relationships and service code call relationships; constructing a time series anomaly detection model based on multi-dimensional indicators and performing anomaly detection on each call edge of the global heterogeneous topology graph; generating a heterogeneous fault graph from the anomaly detection result of each call edge; and locating the fault root cause in the heterogeneous fault graph with a random-walk-based object-level ranking algorithm. The heterogeneous topology graph presents the finer-grained call relationships and membership relationships of service codes simply and clearly; fusing the correlation features of multi-dimensional indicators improves the accuracy of indicator anomaly detection on the call edges of the heterogeneous topology; and the node ranking algorithm for heterogeneous graphs improves the accuracy of fault root cause localization.

Description

Service code level-based fault root cause positioning method, system and storage medium

Technical Field

The invention relates to fault root cause localization, and in particular to fault root cause localization at the service code level.

Background

With the rapid development of technologies such as cloud computing and service computing, and with growing business demands, more and more enterprises deploy applications and system services in cloud computing environments as distributed cloud applications or microservices. Compared with a traditional centralized architecture, a distributed architecture offers better component scalability, higher development productivity and lower cost.

To ensure high availability and reliability, application providers must deploy link monitoring systems that collect key performance indicators (KPIs) for each service, such as network response time, service response rate and success rate, so that complex distributed environments meet their availability constraints and stringent service level objectives. However, as business requirements grow more complex and microservice deployments grow larger, a single fault triggers a large number of indicator alarms through cross-system chains of call dependencies. Faced with this flood of alarm information, a system administrator relying on manual analysis alone can hardly find the key alarm indicators and the system that is the root cause. Monitoring data and system topology therefore need to be processed and analyzed automatically with machine learning algorithms so that the root cause system can be located quickly.

However, most existing link tracing and monitoring systems collect only inter-system call data and perform root cause localization on system-level call relationships, ignoring the service codes involved in each system call. Existing schemes therefore struggle to locate fine-grained root causes, and anomalies are easily masked by coarse-grained, system-level aggregation of the data.

In addition, because services are complex and periodic, simple anomaly detection strategies based on a fixed threshold or k-sigma produce many false positives or false negatives. For example, an alarm rule such as "response rate below 90% for more than 3 minutes" performs unsatisfactorily when applied across different services. Moreover, most current anomaly detection algorithms raise alarms from a single indicator and ignore the complex dependencies among multiple key performance indicators. This easily causes false alarms, and the false alarm rate is especially high when detecting indicator anomalies on fine-grained call edges in a heterogeneous topology.

Finally, when system-level and service-code-level data are combined, academia and industry mostly analyze call data of a single level, whereas most real scenarios involve call data at several different levels and are often considerably more complicated. A root cause localization scheme that fuses system and service code data is therefore needed.

Disclosure of Invention

To solve the above problems in the prior art, the present invention provides:

A fault root cause localization method based on the service code level, which mainly comprises the following steps:

S1: construct a global heterogeneous topology graph containing inter-system call relationships and service code call relationships;

S2: construct a time series anomaly detection model based on multi-dimensional indicators, and perform anomaly detection on each call edge of the global heterogeneous topology graph;

S3: generate a heterogeneous fault graph based on the anomaly detection result of each call edge;

S4: locate the fault root cause in the heterogeneous fault graph using a random-walk-based object-level ranking algorithm.

A fault root cause localization system based on the service code level mainly comprises the following modules:

a global heterogeneous topology graph generation module, used to construct a global heterogeneous topology graph containing inter-system call relationships and service code call relationships;

an anomaly detection module, used to construct a time series anomaly detection model based on multi-dimensional indicators and perform anomaly detection on each call edge of the global heterogeneous topology graph;

a heterogeneous fault graph generation module, used to generate a heterogeneous fault graph based on the anomaly detection result of each call edge;

and a fault root cause localization module, used to locate the fault root cause in the heterogeneous fault graph using a random-walk-based object-level ranking algorithm.

A storage medium storing a computer program; when the computer program is executed by a processor in a computer device, the computer device performs the method as described in any one of the above.

By constructing a heterogeneous topology graph, the invention presents the finer-grained call relationships and membership relationships of service codes simply and clearly. By fusing the correlation features of multi-dimensional indicators, it builds a time series anomaly detection model based on multi-dimensional indicators and performs anomaly detection on the call edges of the global heterogeneous topology graph. Compared with prior-art detection on a single indicator, which suffers from a high false alarm rate, this effectively improves the accuracy of indicator anomaly detection on call edges in the heterogeneous topology. Further, a node ranking algorithm for heterogeneous graphs, combined with automatic processing by machine learning algorithms, yields the heterogeneous fault graph and the root cause system for the current alarm, which are presented for subsequent analysis as a visual graph and a root cause recommendation. This helps administrators locate fault root causes efficiently and effectively improves localization accuracy.

Drawings

To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the method of the present invention

FIG. 2 is a heterogeneous topology graph of the system and service code call relationships of the present invention

FIG. 3 is a diagram of the time series anomaly detection model based on multi-dimensional indicators

FIG. 4 is a schematic diagram of the indicator anomaly detection result of the present invention

FIG. 5 is a heterogeneous fault graph of the present invention

FIG. 6 is a visual interface for fault root cause localization according to the present invention

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and thus the scope of the present invention is not limited by the specific embodiments disclosed below.

Example one

To solve the problems in the prior art, this embodiment provides a fault root cause localization method based on the service code level. As shown in the flowchart of FIG. 1, the method mainly includes the following steps:

S1: Construct a global heterogeneous topology graph containing inter-system call relationships and service code call relationships.

To locate anomalies and root causes at the finer-grained service code level, the invention provides a graph construction strategy for the mixed relationships between service codes and application systems. In addition, if a system call is forwarded through the enterprise service bus system ESB_F5, the service code call relationships and service code membership in the upstream and downstream systems can be obtained by compiling a CMDB service call mapping table. The construction of the heterogeneous topology graph is described below using actual sample data.

The service monitoring system records the call data and status of each service transaction in detail in its logs. For example, Table 1 shows the parsed call log at a certain alarm time:

TABLE 1 parsed transaction detail data

The data here include the system nodes S1, S2, S3 and S4 and the service code nodes T1 and T2 that they call. Taking all call relationships among these nodes into account, a heterogeneous topology graph containing the call relationships of both system nodes and service code nodes is constructed, as shown in FIG. 2, which reflects the global call relationships of systems and service codes. Each call edge carries time series indicator data obtained by aggregating transaction detail data at a set time granularity. The indicators used by the invention are: transaction count, success count, response count, failure count, no-response count, success rate and response time. Compared with the prior art, which uses only the inter-system call topology, the heterogeneous topology graph used by the invention, containing both inter-system and service code call relationships, captures the finer-grained call relationships and membership relationships of service codes in a concise and clear form.

Because real-world traffic is large and complex, the resulting global heterogeneous topology graph is often complicated. In a production environment, however, a fault usually affects only part of the services. The invention therefore performs anomaly detection on the call edges of the global heterogeneous topology graph in order to extract the local heterogeneous topology graph that contains the fault.
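As a rough illustration of S1, the sketch below builds the heterogeneous graph and aggregates per-edge KPI series from parsed transaction records. It is a minimal Python sketch under assumptions: Table 1 is not reproduced here, so the record fields (`ts`, `caller`, `service_code`, `provider`, `status`, `resp_ms`), the 60-second granularity and the modelling choice (a call edge from the calling system to the service code, plus a membership relation from the service code to the system that owns it) are all hypothetical. Aggregating response time as a mean is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """One parsed transaction-detail record (hypothetical field names)."""
    ts: int            # epoch seconds
    caller: str        # calling system, e.g. "S1"
    service_code: str  # service code invoked, e.g. "T1"
    provider: str      # system that owns the service code, e.g. "S2"
    status: str        # "success", "failure" or "no_response"
    resp_ms: float     # response time in ms, ignored when status == "no_response"


def build_hetero_graph(txns, granularity_s=60):
    """Return (node types, membership relations, per-call-edge KPI series by time bucket)."""
    node_type = {}                                     # node -> "system" | "service_code"
    membership = set()                                 # (service_code, provider) pairs
    grouped = defaultdict(lambda: defaultdict(list))   # call edge -> bucket -> records
    for t in txns:
        node_type[t.caller] = node_type[t.provider] = "system"
        node_type[t.service_code] = "service_code"
        membership.add((t.service_code, t.provider))
        grouped[(t.caller, t.service_code)][t.ts // granularity_s].append(t)

    kpis = {}
    for edge, per_bucket in grouped.items():
        kpis[edge] = {}
        for bucket, recs in sorted(per_bucket.items()):
            n = len(recs)
            ok = sum(r.status == "success" for r in recs)
            no_resp = sum(r.status == "no_response" for r in recs)
            rts = [r.resp_ms for r in recs if r.status != "no_response"]
            kpis[edge][bucket] = {
                "txn_count": n,
                "success_count": ok,
                "response_count": n - no_resp,
                "failure_count": sum(r.status == "failure" for r in recs),
                "no_response_count": no_resp,
                "success_rate": ok / n,
                "avg_response_ms": sum(rts) / len(rts) if rts else 0.0,
            }
    return node_type, membership, kpis
```

Each call edge then carries one KPI vector per time bucket, i.e. the per-edge time series that the anomaly detection step consumes.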

S2: Construct a time series anomaly detection model based on multi-dimensional indicators and perform anomaly detection on the call edges of the global heterogeneous topology graph. As shown in FIG. 3, the model is built on a graph attention mechanism. S2 specifically includes the following steps:

S2.1: Normalize the time series of the n indicators within the time window.

Here n is the number of KPIs collected on each call edge. To capture the correlations among the indicators, each KPI is represented as a node, i.e. the i-th KPI corresponds to node v_i. Min-max normalization yields the input features {v_1, v_2, …, v_n} of the n KPIs, where

$$v_i = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}$$

and x_i denotes the raw series of the i-th KPI in the window. Node v_i is the w-dimensional feature vector of the i-th KPI, where the dimension w equals the length of the time window.
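A minimal sketch of the S2.1 normalization in plain Python. Mapping a constant series (zero span) to all zeros is an assumption of this sketch; the text does not say how that case is handled.

```python
def min_max_normalize(window):
    """Min-max normalize each KPI series in a window.

    window: n lists, one per KPI, each holding w raw values.
    Returns the node features v_1..v_n, each scaled to [0, 1].
    """
    features = []
    for series in window:
        lo, hi = min(series), max(series)
        span = hi - lo
        # Assumption: a constant series (zero span) maps to all zeros
        # instead of raising a division-by-zero error.
        features.append([(x - lo) / span if span else 0.0 for x in series])
    return features
```

For example, `min_max_normalize([[2, 4, 6], [5, 5, 5]])` returns `[[0.0, 0.5, 1.0], [0.0, 0.0, 0.0]]`.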

S2.2: Learn the fused features of the nodes through a graph attention mechanism.

The fused feature h_i of node v_i is computed as

$$h_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij} v_j\Big)$$

where N(i) is the set of neighbor nodes of v_i, v_j is a neighbor node of v_i (the w-dimensional feature vector of the j-th indicator), σ is the sigmoid activation function, and α_ij is the association weight between nodes v_i and v_j, computed as

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l=1}^{L} \exp(e_{il})}, \qquad e_{ij} = \mathrm{LeakyReLU}\big(W^{\top}(v_i \oplus v_j)\big)$$

where e_ij and e_il are the attention values of the edges between v_i and v_j and between v_i and v_l, ⊕ denotes feature concatenation, LeakyReLU is an activation function, W is a learnable parameter matrix, L is the number of neighbor nodes of v_i, and l indexes the neighbors of v_i.

Computing this for every node gives the fused features of all nodes, denoted H = {h_1, …, h_n}.
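A NumPy sketch of S2.2, assuming the graph-attention form e_ij = LeakyReLU(W^T(v_i ⊕ v_j)), α_ij = softmax of e_ij over the neighbours of v_i, and h_i = σ(Σ_j α_ij v_j). Treating W as a single 2w-dimensional attention vector, the 0.2 LeakyReLU slope and the fully connected neighbourhood (every KPI node is a neighbour of every other one) are assumptions of this illustration; training is not shown.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_fuse(V, W):
    """One graph-attention pass over the KPI nodes.

    V: (n, w) normalized node features v_1..v_n, with n >= 2.
    W: (2w,) learnable attention weights.
    Returns (H, alpha): fused features (n, w) and attention weights (n, n).
    """
    n, w = V.shape
    # e_ij = LeakyReLU(W^T [v_i || v_j]) split into the v_i part and the v_j part
    left = V @ W[:w]                      # (n,)
    right = V @ W[w:]                     # (n,)
    E = leaky_relu(left[:, None] + right[None, :])
    # Assumption: each KPI node's neighbours are all other KPI nodes (no self-loop)
    np.fill_diagonal(E, -np.inf)
    E = E - E.max(axis=1, keepdims=True)  # numerical stability before exp
    alpha = np.exp(E)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax over neighbours
    H = 1.0 / (1.0 + np.exp(-(alpha @ V)))             # h_i = sigmoid(sum_j alpha_ij v_j)
    return H, alpha
```

Splitting W into halves computes W^T(v_i ⊕ v_j) for all pairs at once without materializing the n×n×2w concatenations.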

S2.3: Based on the fused features H of all nodes, learn the embedding features of the time series of the different indicators.

After graph attention learning, the fused features H of all nodes have dimension n×w. Concatenating them with the original series features gives an n×2w feature, which is fed into an LSTM module to encode long-term temporal dependencies and learn the embedding features of the time series of the different indicators.
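S2.3 can be sketched with a hand-written LSTM forward pass in NumPy. How the n×2w tensor is fed to the LSTM is not specified in the text; this sketch assumes each indicator is an independent sequence of w steps whose input at each step is the pair (fused value, original value), and takes the last hidden state as that indicator's embedding. The parameter names and gate layout are hypothetical, and in practice a framework LSTM layer would normally be used instead.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_embed(H, V, params):
    """Encode each indicator's window with a single-layer LSTM (forward pass only).

    H, V: (n, w) fused features and normalized original series.
    params: "Wx" (2, 4d), "Wh" (d, 4d), "b" (4d,), gate order i, f, o, g.
    Returns (n, d): the last hidden state per indicator, used as its embedding.
    """
    n, w = H.shape
    X = np.concatenate([H, V], axis=1)            # (n, 2w) as described in S2.3
    # Assumption: step tau of indicator i sees the pair [h_i[tau], v_i[tau]]
    seq = X.reshape(n, 2, w).transpose(0, 2, 1)   # (n, w, 2)
    Wx, Wh, b = params["Wx"], params["Wh"], params["b"]
    d = Wh.shape[0]
    h = np.zeros((n, d))
    c = np.zeros((n, d))
    for tau in range(w):
        z = seq[:, tau, :] @ Wx + h @ Wh + b      # all four gates at once, (n, 4d)
        i, f, o, g = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h
```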

S2.4: Based on the embedding features of the time series of the different indicators, obtain the predicted value of each indicator's time series at time t.

Specifically, the embedding features of all indicators are fed into a multi-layer perceptron (MLP) to obtain the predicted values x̂_i(t), i = 1, …, n, of all time series at time t.

The mean squared error (MSE) is used as the loss function to optimize:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \big(x_i(t) - \hat{x}_i(t)\big)^2$$

where n is the number of predicted indicators and x_i(t) is the observed value of the i-th indicator at time t.
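The prediction head and training objective of S2.4 might look like the sketch below. The text only states that an MLP maps the embeddings of all indicators to their predicted values at time t, so flattening the embeddings, the single ReLU hidden layer and the weight shapes are assumptions; only the MSE objective comes from the text.

```python
import numpy as np


def mlp_predict(emb, W1, b1, W2, b2):
    """Predict all n indicator values at time t from the indicator embeddings.

    emb: (n, d). The embeddings are flattened to n*d, passed through one ReLU
    hidden layer (W1: (n*d, k), b1: (k,)), then projected to n outputs
    (W2: (k, n), b2: (n,)).
    """
    hidden = np.maximum(0.0, emb.reshape(-1) @ W1 + b1)
    return hidden @ W2 + b2


def mse_loss(x_true, x_pred):
    """L_MSE = (1/n) * sum_i (x_i(t) - x_hat_i(t))**2 over the n indicators."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    return float(np.mean((x_true - x_pred) ** 2))
```

For instance, `mse_loss([1, 2, 3], [1, 2, 5])` is 4/3, since only the third indicator is off, by 2.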

S2.5: Based on the predicted values x̂_i(t) of all indicators at time t, compute the anomaly score score_i(t), which measures how far each indicator deviates.

The deviation of the i-th indicator is

$$\mathrm{Err}_i(t) = \big| x_i(t) - \hat{x}_i(t) \big|$$

and it is normalized as

$$\mathrm{score}_i(t) = \frac{\mathrm{Err}_i(t) - \tilde{\mu}_i}{\tilde{\sigma}_i}$$

where score_i(t) is the anomaly score, and μ̃_i and σ̃_i are the median and interquartile range of Err_i, used instead of the mean and standard deviation; experiments show that this normalization performs best. By using a multi-indicator time series anomaly detection model, the invention makes the degree of deviation of each indicator easier to observe directly.
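A sketch of S2.5 under two assumptions: the deviation is the absolute prediction error |x_i(t) − x̂_i(t)|, and the median and interquartile range are estimated per indicator from a reference window of past deviations (`ref_errors`, a hypothetical input, since the text does not say which samples they are taken over). The small floor on the IQR is also an addition of this sketch.

```python
import numpy as np


def anomaly_scores(x_true, x_pred, ref_errors):
    """Robustly normalized anomaly score per indicator at time t.

    x_true, x_pred: (n,) observed and predicted values.
    ref_errors: (m, n) past absolute deviations used as the reference window.
    Returns (n,) scores: (deviation - median) / IQR.
    """
    err = np.abs(np.asarray(x_true, dtype=float) - np.asarray(x_pred, dtype=float))
    median = np.median(ref_errors, axis=0)
    q1, q3 = np.percentile(ref_errors, [25, 75], axis=0)
    iqr = np.maximum(q3 - q1, 1e-8)   # avoid dividing by a zero spread
    return (err - median) / iqr
```

With reference deviations 0, 1, 2, 3, 4 (median 2, IQR 2) and an observed deviation of 6, the score is (6 − 2) / 2 = 2.0. Median and IQR are insensitive to the few large errors that an incident produces, which is why they are preferred over mean and standard deviation here.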

S2.6: Based on the anomaly scores score_i(t), determine whether the call edge is abnormal. Specifically, each anomaly score score_i(t) is compared with a preset threshold; when a score exceeds the threshold, the call edge is judged abnormal. The detection results are shown in FIG. 4, where red edges are abnormal and black edges are normal.

Unlike traditional time series anomaly detection methods, the multi-dimensional time series anomaly detection model constructed by the invention makes no assumption about the data distribution and takes into account the correlations and dependencies among the multiple indicators of each service call, which makes anomaly detection more accurate and efficient.

S3: Generate a heterogeneous fault graph based on the anomaly detection result of each call edge.

Specifically, the abnormal call edges in the heterogeneous topology graph are obtained from S2, and the data of call edges detected as normal are filtered out of the global heterogeneous topology graph, yielding a heterogeneous fault graph that shows only the faulty part. For example, filtering the global heterogeneous topology graph of FIG. 2 gives the heterogeneous fault graph shown in FIG. 5.
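Steps S2.6 and S3 together reduce to thresholding and filtering. In this sketch an edge counts as abnormal when any of its indicator scores exceeds the threshold, and membership relations are kept only for service codes touched by an abnormal call edge; both rules and the default threshold of 3.0 are assumptions, since the text specifies neither the aggregation over indicators nor the threshold value.

```python
def detect_and_filter(edge_scores, membership, threshold=3.0):
    """Keep only the faulty part of the global heterogeneous graph.

    edge_scores: {call_edge: [score_1(t), ..., score_n(t)]}.
    membership: {(service_code, owning_system)} relations.
    Returns (abnormal call edges, membership relations attached to them).
    """
    abnormal = {edge for edge, scores in edge_scores.items() if max(scores) > threshold}
    touched = {node for edge in abnormal for node in edge}
    kept_membership = {(code, system) for code, system in membership if code in touched}
    return abnormal, kept_membership
```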

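S4: Locate the fault root cause in the heterogeneous fault graph using a random-walk-based object-level ranking algorithm. Specifically, S4 includes the following steps: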

S4.1: Determine the object set V and the object type set A from the heterogeneous fault graph generated in S3.

Specifically, the heterogeneous fault graph generated in S3 can be formally expressed as

$$G = (V, E)$$

where V and E denote the object set and the relationship set, respectively. Because a heterogeneous graph contains multiple types of objects, an object type mapping function is defined:

$$\phi: V \rightarrow A$$

where A is the set of distinct object types after mapping; objects of the same type, across different instances, are mapped to their common object type by φ.

S4.2: Assign anomaly propagation factors to the different object types in the object type set A.

The factors reflect the importance of each object type in the heterogeneous fault graph. They can be assigned from expert knowledge, or learned from historical data with a search-based optimization algorithm such as simulated annealing.

The prior art ignores differences in anomaly propagation between object types. By setting anomaly propagation factors per object type, the invention expresses the differences in propagation weight between types and thereby improves the accuracy and specificity of the subsequent root cause score calculation.

S4.3: For the object set V, iteratively compute the pivot value of each object with the PageRank algorithm and use it as the object's initial root cause score R_e(a).

Here a denotes any object in the object set V.
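S4.3 relies on standard PageRank. The power-iteration sketch below assumes that edges point from caller to callee, so that rank accumulates on downstream objects, a damping factor of 0.85, and uniform redistribution of rank from nodes without outgoing edges; none of these settings is stated in the text. A graph library's PageRank implementation could be used instead.

```python
def pagerank(nodes, edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a directed graph.

    Rank flows along edges (caller -> callee); dangling nodes spread their
    rank uniformly. Returns {node: R_e(node)}, summing to 1.
    """
    nodes = list(nodes)
    N = len(nodes)
    out = {v: [] for v in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {v: 1.0 / N for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / N + damping * dangling / N for v in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank
```

On a chain S1 to T1 to S2, the downstream system S2 ends up with the highest score.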

S4.4: Determine the root cause score R_x of each object from the anomaly propagation factors and the initial root cause scores.

Specifically, the root cause score R_x of object x is computed by a formula in which X and Y denote the sets of objects of type X and type Y in the object type set A, x is an object of type X and y is an object of type Y; R_x and R_y are the root cause scores of objects x and y; M_xY is an adjacency matrix with elements m_xY, where m_xY = num(x, Y) if object x has a relationship with object type Y and m_xY = 0 otherwise, num(x, Y) being the total number of relationships between x and all objects of type Y; γ_XY is the anomaly propagation factor between object types X and Y;

and ε is the attenuation factor, selected based on expert knowledge.
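Because the exact expression for R_x is not given here, the sketch below is only one plausible ObjectRank-style reading of the symbols defined above, not the patent's exact formula: each object keeps a (1 − ε) share of its initial score R_e(x) and receives, with weight ε, the scores of its neighbours y, each scaled by the propagation factor between the two types and divided by num(y, X). Treating relations as undirected and running a fixed number of iterations are further assumptions.

```python
from collections import defaultdict


def root_cause_scores(node_type, relations, gamma, R_e, eps=0.5, iters=50):
    """Hypothetical reading of S4.4:

    R_x = (1 - eps) * R_e(x)
          + eps * sum over neighbours y of gamma[(type(y), type(x))] * R_y / num(y, type(x))

    node_type: {object: type}; relations: undirected object pairs;
    gamma: {(Y, X): factor}, missing pairs count as 0; R_e: initial scores.
    """
    nbrs = defaultdict(set)
    for u, v in relations:
        nbrs[u].add(v)
        nbrs[v].add(u)
    types = set(node_type.values())
    # num(y, X): number of relations between object y and objects of type X
    num = {(y, X): sum(node_type[z] == X for z in nbrs[y]) for y in node_type for X in types}
    R = dict(R_e)
    for _ in range(iters):
        # Jacobi-style update: every new score is computed from the previous R
        R = {
            x: (1 - eps) * R_e[x] + eps * sum(
                gamma.get((node_type[y], node_type[x]), 0.0) * R[y] / num[(y, node_type[x])]
                for y in nbrs[x]
            )
            for x in node_type
        }
    return R
```

With every propagation factor set to zero, each object's score reduces to (1 − ε) R_e(x); raising a factor lets anomalies propagate more strongly between the corresponding pair of types.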

By incorporating the object ranking algorithm for heterogeneous graphs, the invention addresses the fact that the initial root cause scores ignore the relationships between different object types.

S4.5: Based on the root cause score of each object, select the objects with the top-K root cause scores as the fault root cause localization result.

Here the top-K root cause scores are the K largest root cause scores.

Specifically, the localization result is displayed visually, as shown in FIG. 6, for reference by the system administrator.
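Selecting the top-K objects in S4.5 is a partial sort. A minimal sketch with the standard library `heapq.nlargest`, where K = 3 is only an illustrative default:

```python
import heapq


def top_k_root_causes(scores, k=3):
    """Return the k (object, root cause score) pairs with the largest scores, highest first."""
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
```

For example, `top_k_root_causes({"S1": 0.1, "S2": 0.7, "T1": 0.4}, k=2)` returns `[("S2", 0.7), ("T1", 0.4)]`.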

By adopting a heterogeneous topology graph, the invention presents the finer-grained call and membership relationships of service codes simply and clearly. By fusing the correlation features of multi-dimensional indicators, it effectively improves the accuracy of indicator anomaly detection on call edges in the heterogeneous topology. Root cause localization uses a node ranking algorithm for heterogeneous graphs that considers not only the pivot values of anomaly propagation among objects but also the differences in anomaly propagation between object types. After the monitoring data pass through this processing framework, automatic processing by machine learning algorithms yields the heterogeneous fault graph and the root cause system for the current alarm, which are presented for analysis as a visualization and a root cause recommendation. This helps administrators locate fault root causes efficiently and effectively improves localization accuracy.

Example two

This embodiment provides a fault root cause localization system based on the service code level, which mainly comprises the following modules:

The global heterogeneous topology graph generation module constructs a global heterogeneous topology graph containing inter-system call relationships and service code call relationships.

To locate anomalies and root causes at the finer-grained service code level, the invention provides a graph construction strategy for the mixed relationships between service codes and application systems. In addition, if a system call is forwarded through the enterprise service bus system ESB_F5, the service code call relationships and service code membership in the upstream and downstream systems can be obtained by compiling a CMDB service call mapping table.

The anomaly detection module constructs a time series anomaly detection model based on multi-dimensional indicators and performs anomaly detection on the call edges of the global heterogeneous topology graph. As shown in FIG. 3, the model is built on a graph attention mechanism. The anomaly detection module implements the following functions:

First, the time series of the n indicators within the time window are normalized.

Here n is the number of KPIs collected on each call edge. To capture the correlations among the indicators, each KPI is represented as a node, i.e. the i-th KPI corresponds to node v_i. Min-max normalization yields the input features {v_1, v_2, …, v_n} of the n KPIs, where

$$v_i = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}$$

and x_i denotes the raw series of the i-th KPI in the window.

The fused features of the nodes are then learned through a graph attention mechanism.

Specifically, the fused feature h_i of node v_i is computed as

$$h_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij} v_j\Big), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l=1}^{L} \exp(e_{il})}, \qquad e_{ij} = \mathrm{LeakyReLU}\big(W^{\top}(v_i \oplus v_j)\big)$$

where N(i) is the set of neighbor nodes of v_i, σ is the sigmoid activation function, α_ij is the association weight between nodes v_i and v_j, e_ij and e_il are the attention values of the edges between v_i and v_j and between v_i and v_l, ⊕ denotes feature concatenation, LeakyReLU is an activation function, W is a learnable parameter matrix, L is the number of neighbor nodes of v_i, and l indexes the neighbors of v_i. The fused features of all nodes are denoted H = {h_1, …, h_n}.

Based on the fused features H of all nodes, the embedding features of the time series of the different indicators are learned.

Based on these embedding features, the predicted values x̂_i(t) of all indicators' time series at time t are obtained.

The mean squared error (MSE) is used as the loss function to optimize:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \big(x_i(t) - \hat{x}_i(t)\big)^2$$

where n is the number of predicted indicators. By using a multi-indicator time series anomaly detection model, the invention makes the degree of deviation of each indicator easier to observe directly.

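Based on the predicted values of all indicators at time t, the anomaly score score_i(t) of each indicator is computed.

The deviation Err_i(t) = |x_i(t) − x̂_i(t)| of each indicator is normalized as

$$\mathrm{score}_i(t) = \frac{\mathrm{Err}_i(t) - \tilde{\mu}_i}{\tilde{\sigma}_i}$$

where score_i(t) is the anomaly score, and μ̃_i and σ̃_i are the median and interquartile range of Err_i.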

Based on the anomaly scores score_i(t), the module determines whether the call edge is abnormal.

Specifically, each anomaly score score_i(t) is compared with a preset threshold; when a score exceeds the threshold, the call edge is judged abnormal. The detection results are shown in FIG. 4, where red edges are abnormal and black edges are normal.

The heterogeneous fault graph generation module generates a heterogeneous fault graph based on the anomaly detection result of each call edge.

Specifically, the abnormal call edges in the heterogeneous topology graph are obtained from the anomaly detection module, and the data of call edges detected as normal are filtered out of the global heterogeneous topology graph, yielding a heterogeneous fault graph that shows only the faulty part. For example, filtering the global heterogeneous topology graph of FIG. 2 gives the heterogeneous fault graph shown in FIG. 5.

The fault root cause localization module implements the following functions:

It determines the object set V and the object type set A from the heterogeneous fault graph generated by the heterogeneous fault graph generation module.

Specifically, this heterogeneous fault graph can be formally expressed as G = (V, E), where V is the object set and E is the relationship set, together with an object type mapping function φ: V → A that maps each object to its type in A.

It assigns anomaly propagation factors to the different object types in the object type set A.

It iteratively computes the pivot value of each object in the object set V with the PageRank algorithm and uses it as the object's initial root cause score R_e(a).

Here a denotes any object in the object set V.

It determines the root cause score R_x of each object from the anomaly propagation factors and the initial root cause scores.

Here ε denotes the attenuation factor, selected based on expert knowledge.

It selects the objects with the top-K root cause scores as the fault root cause localization result.

Here the top-K root cause scores are the K largest root cause scores.

Example three:

This embodiment provides a storage medium storing a computer program; when the computer program is executed by a processor of a computer device, the computer device performs the method described in any one of the above.

The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the embodiments. It will be apparent, however, to one skilled in the art that the embodiments may be practiced without the specific details. Thus, the foregoing descriptions of specific embodiments described herein are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed. It will be apparent to those skilled in the art that many modifications and variations are possible in light of the above teaching. Further, as used herein to refer to the position of a component, the terms above and below, or their synonyms, do not necessarily refer to an absolute position relative to an external reference, but rather to a relative position of the component with reference to the drawings.

Moreover, the foregoing drawings and description include many concepts and features that may be combined in various ways to achieve various benefits and advantages. Thus, features, components, elements and/or concepts from various different figures may be combined to produce embodiments or implementations not necessarily shown or described in this specification. Furthermore, not all features, components, elements and/or concepts shown in a particular figure or description are necessarily required to be in any particular embodiment and/or implementation. It is to be understood that such embodiments and/or implementations fall within the scope of the present description.

Claims

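1. A fault root cause localization method based on the service code level, characterized by comprising the following steps:

S1: constructing a global heterogeneous topology graph containing inter-system call relationships and service code call relationships;

S2: constructing a time series anomaly detection model based on multi-dimensional indicators, and performing anomaly detection on each call edge of the global heterogeneous topology graph;

S3: generating a heterogeneous fault graph based on the anomaly detection result of each call edge;

S4: locating the fault root cause in the obtained heterogeneous fault graph using a random-walk-based object-level ranking algorithm.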

2. The method according to claim 1, wherein each call edge of the global heterogeneous topology graph carries time series indicator data generated by aggregating transaction detail data at a set time granularity, and the indicator data comprise a combination of at least two of transaction count, success count, response count, failure count, no-response count, success rate, response rate and response time.

3. The method for locating a root cause of a fault based on a service code level as claimed in claim 1, wherein the step of S2 further comprises the steps of:

S2.2: learning fused features of the nodes through a graph attention mechanism;

S2.3: learning, based on the obtained fused features h_i of all nodes, embedding features of the time series corresponding to the different indexes;

S2.4: obtaining, based on the obtained embedding features of the time series corresponding to the different indexes, the predicted values of the time series of all indexes at time t;

S2.5: calculating, based on the obtained predicted values of the time series of all indexes at time t, an anomaly score score_i(t) representing the degree of deviation of index i; and

S2.6: judging, based on the obtained anomaly score score_i(t), whether the calling edge is abnormal.
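The claims do not reproduce the anomaly score formula. This sketch assumes one common choice for forecast-based detectors:

1. Take the absolute prediction error of each index.
2. Normalize it robustly by the median and interquartile range of that index's past errors.
3. Flag the calling edge as abnormal when the largest normalized score exceeds a threshold.

The threshold of 3.0 and the function names are illustrative assumptions.

```python
from statistics import median, quantiles


def deviation_scores(actual, predicted, history_errors):
    """score_i(t) = (|x_i(t) - x_hat_i(t)| - median of past errors) / IQR of past errors."""
    scores = {}
    for name, value in actual.items():
        err = abs(value - predicted[name])
        past = history_errors[name]  # needs at least two past error values
        q1, _, q3 = quantiles(past, n=4)
        iqr = (q3 - q1) or 1e-9  # guard against a zero spread
        scores[name] = (err - median(past)) / iqr
    return scores


def edge_is_abnormal(scores, threshold=3.0):
    """Flag the calling edge if any index deviates beyond the threshold."""
    return max(scores.values()) > threshold


history = {"response_time": [1, 2, 3, 4, 5, 6, 7, 8],
           "success_rate": [1, 2, 3, 4, 5, 6, 7, 8]}
s = deviation_scores({"response_time": 40.0, "success_rate": 0.9},
                     {"response_time": 10.0, "success_rate": 0.9}, history)
```

Taking the maximum over indexes lets a single strongly deviating KPI mark the edge as abnormal. A stricter variant could instead require several indexes to deviate.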

4. The fault root cause positioning method based on service code level according to claim 3, wherein learning the fused features of the nodes through the graph attention mechanism in S2.2 comprises:

the fused feature h_i of node i is calculated by the following formula:

wherein N(i) represents the set of neighbor nodes of node v_i, v_j represents a neighbor node of node v_i, σ represents the sigmoid activation function, a_ij represents the association weight between node v_i and node v_j, and v_j represents the w-dimensional feature vector corresponding to the j-th KPI index;

the association weight a_ij is calculated by the following formula:

wherein ∥ represents a feature concatenation operation, LeakyReLU is an activation function, W represents a learnable parameter matrix, L represents the number of neighbor nodes of node v_i, and l represents the sequence number of a neighbor node of node v_i.
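The formula images for h_i and a_ij are missing, so this sketch assumes the standard graph attention (GAT) form suggested by the listed symbols:

- a_ij is a softmax, over the neighbors l of v_i, of LeakyReLU(a · [W v_i ∥ W v_l]);
- h_i = σ(Σ_{j∈N(i)} a_ij W v_j).

Three things are assumptions here: the attention vector `a`, the LeakyReLU slope of 0.2, and the pure-Python helpers. A real model would learn W and `a` by gradient descent rather than fix them by hand.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_weights(i, neighbors, feats, W, a):
    """a_ij = softmax over l in N(i) of LeakyReLU(a . [W v_i || W v_l])."""
    wi = matvec(W, feats[i])
    logits = {}
    for j in neighbors[i]:
        concat = wi + matvec(W, feats[j])  # list concatenation implements ||
        logits[j] = leaky_relu(sum(p * q for p, q in zip(a, concat)))
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - top) for j, s in logits.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def fused_feature(i, neighbors, feats, W, a):
    """h_i = sigmoid(sum over j in N(i) of a_ij * W v_j), applied element-wise."""
    alpha = attention_weights(i, neighbors, feats, W, a)
    acc = [0.0] * len(W)
    for j, weight in alpha.items():
        for k, value in enumerate(matvec(W, feats[j])):
            acc[k] += weight * value
    return [sigmoid(x) for x in acc]


feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}  # v_j: one vector per KPI
neighbors = {0: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for a learned matrix
a = [0.5, 0.5, 0.5, 0.5]
alpha = attention_weights(0, neighbors, feats, W, a)
h0 = fused_feature(0, neighbors, feats, W, a)
```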

5. The fault root cause positioning method based on service code level according to claim 1, wherein step S4 comprises the following steps:

S4.1: determining an object set V and an object type set A based on the heterogeneous fault graph generated in S3;

S4.2: assigning corresponding anomaly propagation factors to the different object types based on the obtained object type set A;

S4.3: iteratively computing, based on the obtained object set V, a pivot value of each object with the PageRank algorithm, and taking it as the initial root cause score R_ea of that object;

S4.4: determining a root cause score R_x of each object based on the obtained anomaly propagation factors and the initial root cause scores;
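For S4.3, a plain power-iteration PageRank is enough to produce initial root cause scores. The sketch assumes:

- the fault graph is given as an adjacency list of directed edges;
- the damping factor is 0.85;
- nodes with no outgoing edges redistribute their rank uniformly.

The claims do not state these parameters or which edge direction the walk follows.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over {node: [successor, ...]}."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


initial = pagerank({"a": ["b"], "b": ["c"]})  # hypothetical fault graph a -> b -> c
```

The total rank stays equal to 1 at every iteration, so the resulting values can be compared directly across objects as initial scores.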

6. The fault root cause positioning method based on service code level according to claim 5, wherein S4.2 comprises: assigning the anomaly propagation factors through expert knowledge, or through a combinatorial search optimization algorithm based on historical data.
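One reading of the "combinatorial search optimization algorithm based on historical data" is an exhaustive grid search. It tries every assignment of candidate values to the object-type pairs and keeps the assignment under which the labelled root causes of past incidents rank highest. The incident format, the hit-in-top-k objective, and the caller-supplied `rank_fn` in this sketch are all hypothetical.

```python
from itertools import product


def search_propagation_factors(type_pairs, candidates, incidents, rank_fn, k=3):
    """Exhaustively search factor assignments; return the best one and its hit count.

    type_pairs: list of (TypeX, TypeY) tuples that need a factor.
    candidates: candidate factor values tried for every pair.
    incidents:  list of (fault_graph, true_root) pairs from history (hypothetical format).
    rank_fn:    callable(fault_graph, gamma) -> objects sorted by descending root cause score.
    """
    best_gamma, best_hits = None, -1
    for values in product(candidates, repeat=len(type_pairs)):
        gamma = dict(zip(type_pairs, values))
        # Count incidents whose labelled root cause appears in the top-k ranking.
        hits = sum(root in rank_fn(graph, gamma)[:k] for graph, root in incidents)
        if hits > best_hits:
            best_gamma, best_hits = gamma, hits
    return best_gamma, best_hits


def toy_rank(graph, gamma):
    # Stand-in ranking that ignores the graph; only for demonstrating the search.
    return ["x", "y"] if gamma[("system", "service_code")] > 0.5 else ["y", "x"]


best, hits = search_propagation_factors(
    [("system", "service_code")], [0.2, 0.8], [(None, "x")], toy_rank, k=1)
```

The search costs len(candidates) ** len(type_pairs) evaluations. It is only practical for a few object types and a coarse candidate grid.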

7. The fault root cause positioning method based on service code level according to claim 5, wherein S4.4 comprises:

the root cause score R_x of object x is calculated by the following formula:

wherein X and Y respectively denote the object set of type X and the object set of type Y in the object type set A, x denotes an object in the object set of type X, and y denotes an object in the object set of type Y; R_x and R_y denote the root cause scores of object x and object y, respectively; M_xY is an adjacency matrix with elements m_xY: if there is a relationship between object x and object type Y, then m_xY = num(x, Y), and if there is no relationship between object x and object type Y, then m_xY = 0; num(x, Y) denotes the total number of relationships between object x and all objects in the object set of type Y; γ_XY denotes the anomaly propagation factor between object type X and object type Y; and ε denotes the attenuation factor.
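The text also omits the formula for R_x. This sketch assumes an ObjectRank-style update that is consistent with the listed symbols but is not confirmed by the source:

- each object keeps a (1 − ε) share of its initial score R_ea;
- each object receives an ε share of propagated score;
- a related object y of type Y passes on γ_XY · R_y divided by num(y, X), the number of its relationships to objects of x's type.

Three further choices are assumptions: treating ε as the weight of the propagated part, the fixed iteration count, and treating relations as undirected.

```python
def root_cause_scores(obj_type, relations, initial, gamma, epsilon=0.85, iters=50):
    """ObjectRank-style sketch (assumed form, not the patent's exact formula).

    obj_type:  {object: type name}
    relations: iterable of (x, y) pairs, treated as undirected links
    initial:   {object: initial score R_ea}, e.g. from PageRank
    gamma:     {(TypeX, TypeY): anomaly propagation factor}
    """
    neighbors = {o: [] for o in obj_type}
    for x, y in relations:
        neighbors[x].append(y)
        neighbors[y].append(x)

    def num(obj, type_name):
        # Number of relationships between obj and objects of the given type.
        return sum(1 for z in neighbors[obj] if obj_type[z] == type_name)

    scores = dict(initial)
    for _ in range(iters):
        new = {}
        for x, tx in obj_type.items():
            propagated = 0.0
            for y in neighbors[x]:
                factor = gamma.get((tx, obj_type[y]), 0.0)
                # num(y, tx) >= 1 because x itself is a neighbour of y with type tx.
                propagated += factor * scores[y] / num(y, tx)
            new[x] = (1.0 - epsilon) * initial[x] + epsilon * propagated
        scores = new
    return scores


types = {"s1": "system", "c1": "service_code", "c2": "service_code"}
r = root_cause_scores(types, [("s1", "c1"), ("s1", "c2")],
                      {"s1": 0.2, "c1": 0.5, "c2": 0.3},
                      {("system", "service_code"): 0.5, ("service_code", "system"): 0.5})
```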

8. The fault root cause positioning method based on service code level according to claim 5, wherein the top-K root cause scores are the K largest root cause scores; and S4.5 further comprises: displaying the obtained fault root cause positioning result in visual form.
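Selecting the top-K root cause scores and rendering them can be as simple as this sketch. It uses `heapq.nlargest` and a text bar chart as a stand-in for the visual display. K = 3, the bar scale, and the output layout are illustrative assumptions.

```python
import heapq


def top_k_root_causes(scores, k=3):
    """Return the k (object, score) pairs with the largest root cause scores."""
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


def render(ranked):
    """Plain-text bar chart: one '#' per 0.05 of score."""
    width = max(len(name) for name, _ in ranked)
    lines = []
    for name, score in ranked:
        bar = "#" * round(score * 20)
        lines.append(f"{name:<{width}}  {score:.3f}  {bar}")
    return "\n".join(lines)


ranked = top_k_root_causes({"db": 0.5, "api": 0.3, "cache": 0.1}, k=2)
```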

9. A fault root cause positioning system based on service code level, characterized by mainly comprising the following modules:

10. A storage medium, characterized by storing a computer program, wherein, when the computer program is executed by a processor of a computer device, the computer device performs the method of any one of claims 1 to 8.