CN113271224A - Node positioning method and device, storage medium and electronic device - Google Patents

Publication number
CN113271224A
Authority
CN
China
Prior art keywords
target
data
node
determining
index data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110535454.9A
Other languages
Chinese (zh)
Inventor
徐磊
徐竹胜
刘伟煜
刘义
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Postal Savings Bank of China Ltd
Original Assignee
Postal Savings Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postal Savings Bank of China Ltd
Priority to CN202110535454.9A
Publication of CN113271224A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0677: Localisation of faults

Abstract

Embodiments of the invention provide a node positioning method, a node positioning device, a storage medium and an electronic device. The method comprises: acquiring target index data generated when a target micro-service application runs, wherein the target micro-service application calls a plurality of nodes; detecting the target index data to determine a target detection result; and, in the case that the target detection result indicates that abnormal data exists in the target index data, locating, based on the target index data, a target node that triggered generation of the abnormal data. The invention solves the problem in the related art that failed nodes are located inefficiently, and thereby improves the efficiency of locating failed nodes.

Description

Node positioning method and device, storage medium and electronic device
Technical Field
The embodiment of the invention relates to the field of communication, in particular to a node positioning method, a node positioning device, a storage medium and an electronic device.
Background
In a large-scale, constantly changing micro-service environment, call hierarchies among service systems grow ever deeper and call relationships grow increasingly complex. When a service system fails, quickly analyzing the problem within these intricate call relationships and accurately locating the fault point is a central concern of system monitoring, operation and maintenance.
Generally, when a micro-service application behaves abnormally (for example, response time increases or the success rate drops), operation and maintenance personnel must rely on experience, querying logs and the like, to determine which micro-service interface, or even which server, has failed. However, ability and experience vary from person to person, and accumulating and passing on experience also takes time. Even with the help of tools such as application performance monitoring, personnel still have to search for the fault through large volumes of performance monitoring and link tracing data, which is time-consuming and laborious. The main reasons are as follows:
first, the service relationships in a micro-service architecture are complex and changeable, and the real-time topological relation is difficult to obtain;
second, the large number of micro-services generates a large amount of operational data, which is time-consuming to process and analyze;
third, traditional alarm functions mostly rely on constant thresholds, so the false alarm rate is high, which makes fault location more difficult;
fourth, conventional fault location lacks the ability to discover faults proactively.
Therefore, the related art has the problem of low efficiency in locating the failed node.
In view of the above problems in the related art, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a node positioning method, a node positioning device, a storage medium and an electronic device, which are used for at least solving the problem of low efficiency of positioning a failed node in the related art.
According to an embodiment of the present invention, there is provided a node positioning method, including: acquiring target index data generated when a target micro-service application runs, wherein the target micro-service application calls a plurality of nodes; detecting the target index data to determine a target detection result; and, in the case that the target detection result indicates that abnormal data exists in the target index data, locating, based on the target index data, a target node that triggered generation of the abnormal data.
According to another embodiment of the present invention, there is provided a node positioning device, including: an acquisition module, configured to acquire target index data generated when a target micro-service application runs, wherein the target micro-service application calls a plurality of nodes; a detection module, configured to detect the target index data to determine a target detection result; and a positioning module, configured to locate, based on the target index data, a target node that triggered generation of abnormal data in the case that the target detection result indicates that abnormal data exists in the target index data.
According to yet another embodiment of the invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the steps of the method as set forth in any of the above.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, target index data generated when the target micro-service application runs is acquired, the target index data is detected to determine a target detection result, and, in the case that the detection result indicates that abnormal data exists in the target index data, the target node that triggered generation of the abnormal data is located based on the target index data. Because the target index data can be detected automatically once acquired, the target node can be located directly when abnormal data is found. This solves the problem in the related art that failed nodes are located inefficiently and improves the efficiency of locating failed nodes.
Drawings
Fig. 1 is a block diagram of the hardware structure of a mobile terminal on which a node positioning method according to an embodiment of the present invention runs;
Fig. 2 is a flowchart of a node positioning method according to an embodiment of the present invention;
Fig. 3 is a flowchart of a node positioning method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a node positioning device according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an example of micro-service fault location according to an embodiment of the present invention;
Fig. 6 is a block diagram of a node positioning device according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a block diagram of the hardware structure of a mobile terminal on which a node positioning method according to an embodiment of the present invention runs. As shown in Fig. 1, the mobile terminal may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a programmable logic device such as an FPGA) and a memory 104 for storing data. The mobile terminal may further include a transmission device 106 for communication functions and an input-output device 108. Those skilled in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in Fig. 1, or have a different configuration from that shown in Fig. 1.
The memory 104 may be used for storing computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the node positioning method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of such a network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network interface controller (NIC) that can connect to other network devices through a base station and thereby communicate with the internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the internet wirelessly.
This embodiment provides a node positioning method. Fig. 2 is a flowchart of a node positioning method according to an embodiment of the present invention. As shown in Fig. 2, the flow includes the following steps:
Step S202, acquiring target index data generated when a target micro-service application runs, wherein the target micro-service application calls a plurality of nodes;
Step S204, detecting the target index data to determine a target detection result;
Step S206, in the case that the target detection result indicates that abnormal data exists in the target index data, locating, based on the target index data, a target node that triggered generation of the abnormal data.
In the above embodiments, the target index data may include key business indexes, call chain data, operating system performance indexes, and middleware data. The key business indexes can comprise average transaction response time, service calling success rate, calling times per minute and the like; the call chain data can comprise service names, ports, instances, link data, topological relation data and the like; the performance index and middleware data of the operating system may include CPU utilization, memory utilization, network interface ingress and egress traffic, total database connection count, and the like.
In the above embodiment, detecting the target index data may mean detecting the key business index data and the operating system performance index data included in the target index data, and determining whether either contains abnormal data.
Optionally, the above steps may be executed by a background processor or another device with similar processing capability, or by a machine that integrates at least a data processing device, where the data processing device may include, but is not limited to, a terminal such as a computer or a mobile phone.
According to the invention, target index data generated when the target micro-service application runs is acquired, the target index data is detected to determine a target detection result, and, in the case that the detection result indicates that abnormal data exists in the target index data, the target node that triggered generation of the abnormal data is located based on the target index data. Because the target index data can be detected automatically once acquired, the target node can be located directly when abnormal data is found. This solves the problem in the related art that failed nodes are located inefficiently and improves the efficiency of locating failed nodes.
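The control flow of steps S204 and S206 can be sketched as follows. This is a minimal illustration only: the `IndexData` container, its field names, and the `detect` and `locate` callables are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class IndexData:
    """Hypothetical container for the target index data of one collection window."""
    business: dict[str, float]                 # e.g. {"avg_response_ms": 850.0}
    system: dict[str, float]                   # e.g. {"cpu_util": 0.42}
    call_chains: list[dict] = field(default_factory=list)


def locate_node(
    data: IndexData,
    detect: Callable[[IndexData], list[str]],
    locate: Callable[[IndexData, list[str]], Optional[str]],
) -> Optional[str]:
    """Run detection (S204) and locate a node (S206) only if anomalies exist."""
    abnormal_indexes = detect(data)            # S204: names of abnormal indexes
    if not abnormal_indexes:                   # no abnormal data, nothing to locate
        return None
    return locate(data, abnormal_indexes)      # S206
```

Passing the detector and locator in as callables keeps the sketch independent of any particular detection model or location strategy.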
In one exemplary embodiment, detecting the target index data to determine a target detection result includes: determining a target data type of the target index data; determining a target detection model based on the target data type; and sending the target index data to the target detection model for detection, so as to determine the target detection result. In this embodiment, index data of different data types may be detected with different detection models. To determine the target data type, key features of the target index data may first be extracted, and the target index data may then be classified according to those features. When the target index data is rate-type data (e.g., memory usage), the target detection model may be a cumulative constant threshold detection algorithm model. When the target index data is periodic data, historical data can be used for targeted training to generate a model. Once the model is generated, anomaly detection is performed on the index data in real time.
In one exemplary embodiment, determining the target detection model based on the target data type includes: determining the target detection model corresponding to the target data type based on a preset correspondence, wherein the preset correspondence indicates the correspondence between data types and detection models. In this embodiment, the preset correspondence may be determined from the data characteristics of historical data. For example, when the target index data is rate-type data (e.g., memory usage), the target detection model may be a cumulative constant threshold detection algorithm model. When the target index data is periodic data, the target detection model may be a model trained by machine learning on multiple sets of training data, where each set includes index data and a data type. For the cumulative constant threshold detection algorithm model, the training data may be historical rate-type data and its average. That is, the average value of each kind of rate-type data is determined from the historical data; when target index data is detected with the cumulative constant threshold detection algorithm model, the value of the rate-type data in the target index data is compared with the historical average, and the data can be considered abnormal when the value is greater than the average.
It should be noted that using the average as the reference value is only an example; other statistics of the historical data, such as the median, may also be used, and the present invention does not limit the reference value.
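A minimal sketch of the preset correspondence and of the threshold comparison described above, assuming the reference value is a statistic of the historical data. The function names and the type labels `"rate"` and `"periodic"` are hypothetical, and the entry for periodic data is only a stand-in for a trained model so that the example runs.

```python
from statistics import mean, median
from typing import Callable, Sequence

Detector = Callable[[float], bool]


def make_threshold_detector(
    history: Sequence[float],
    statistic: Callable[[Sequence[float]], float] = mean,
) -> Detector:
    """Build a detector that flags values above a statistic of the history."""
    reference = statistic(history)
    return lambda value: value > reference


def select_detector(data_type: str, history: Sequence[float]) -> Detector:
    """Look up the detector for a data type through a preset correspondence."""
    correspondence = {
        "rate": lambda h: make_threshold_detector(h, mean),
        # Assumption: a trained model would be used for periodic data; the
        # median-based threshold here is only a placeholder.
        "periodic": lambda h: make_threshold_detector(h, median),
    }
    return correspondence[data_type](history)


is_abnormal = select_detector("rate", [0.4, 0.5, 0.6])
print(is_abnormal(0.7), is_abnormal(0.45))
```

An unknown data type raises `KeyError` in this sketch; a real system would need a policy for index data that matches no preset type.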
In an exemplary embodiment, before determining the target detection model corresponding to the target data type based on the preset correspondence, the method further includes: acquiring historical index data generated when the target micro-service application runs; preprocessing the historical index data to obtain time-series historical index data; extracting data features of the time-series historical index data; determining a historical data type of the historical index data based on the data features; and establishing the correspondence between the historical data type and the detection model. In this embodiment, the historical index data may be acquired, arranged in time order, and preprocessed, for example by filling in missing data and removing spikes (burr points). The time-series index data is then analyzed, key features are extracted, and the indexes are classified according to their data features; each data form uses different extracted features or a different detection algorithm. The length of historical data to extract and the algorithm to use are determined from the classification and feature extraction results. Abnormal alarms are merged according to a certain rule, which reduces the number of alarms. For example, when abnormal data would require an alarm, the number of alarms within a preset time can be counted, and an alarm is raised only when that number exceeds a preset threshold.
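The count-based alarm rule at the end of this paragraph could look like the sketch below. The class name, the sliding-window interpretation of "preset time", and the parameter names are assumptions made for illustration.

```python
from collections import deque


class AlarmSuppressor:
    """Raise an alarm only when anomalies within a sliding window exceed a threshold."""

    def __init__(self, window_seconds: float, max_count: int) -> None:
        self.window_seconds = window_seconds
        self.max_count = max_count
        self._timestamps = deque()  # timestamps of recent anomalies, oldest first

    def record(self, timestamp: float) -> bool:
        """Record one anomaly; return True if an alarm should be raised."""
        self._timestamps.append(timestamp)
        # Drop anomalies that fell out of the window ending at `timestamp`.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_count
```

With `window_seconds=60` and `max_count=2`, two anomalies inside one minute stay silent and the third raises the alarm, which is how the rule reduces the number of alarms.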
In one exemplary embodiment, before locating a target node for triggering generation of the anomaly data based on the target metric data, the method further comprises: determining a fault type of the abnormal data based on the abnormal data; under the condition that the fault type is determined to be a service fault, positioning the target node for triggering generation of the abnormal data based on the target index data; and under the condition that the fault type is determined to be other faults except the service fault, storing the target index data and/or executing alarm operation. In this embodiment, when the abnormal data is the non-key service index abnormal data, an alarm prompt is sent and/or the abnormal data index is stored. When the abnormal data is the service fault data, the correlation analysis can be carried out by combining the micro-service real-time topological relation, the call chain data and the alarm information, and the fault can be quickly positioned.
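The branch on fault type can be sketched as below. The metric names in `KEY_BUSINESS_METRICS` and the rule that a fault is a service fault exactly when a key business index is abnormal are assumptions drawn from this paragraph, not a definitive classification.

```python
from enum import Enum


class FaultType(Enum):
    SERVICE = "service"   # a key business index is abnormal
    OTHER = "other"       # any other abnormal index


# Hypothetical names for the key business indexes.
KEY_BUSINESS_METRICS = {"avg_response_time", "success_rate", "calls_per_minute"}


def classify_fault(abnormal_metric: str) -> FaultType:
    """Map an abnormal index name to a fault type."""
    if abnormal_metric in KEY_BUSINESS_METRICS:
        return FaultType.SERVICE
    return FaultType.OTHER


def handle_anomaly(abnormal_metric: str, stored: list, alarms: list) -> bool:
    """Return True when node location should run; otherwise store and alarm."""
    if classify_fault(abnormal_metric) is FaultType.SERVICE:
        return True
    stored.append(abnormal_metric)
    alarms.append(f"non-key index abnormal: {abnormal_metric}")
    return False
```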
In one exemplary embodiment, locating the target node for triggering generation of the anomaly data based on the target metric data comprises: determining a target time period for generating the abnormal data; acquiring call chain data in the target time period from the target index data; determining a candidate call chain set in the call chain data based on the fault type; determining the target node based on the set of candidate call chains. In this embodiment, when the key service indicator is abnormal or a fault is found, the abnormal time interval may be determined, and all call chain data of this time interval may be acquired. And analyzing and screening the selected call chain data according to the abnormal key service index type to obtain a call chain set which greatly contributes to the current fault. For example, if the average response time is abnormal, the call chains are sorted in a descending order according to the response time of the call chains, and the top 20 call chains are selected as a candidate call chain set according to the port information of the call chains. And determining a target node according to the call chain set.
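For the average-response-time example, candidate selection might be sketched as follows. The dictionary keys (`start_time`, `response_ms`, `port`) are hypothetical, and treating the port information as an optional filter is only one possible reading of the text.

```python
from typing import Optional


def select_candidate_chains(
    chains: list[dict],
    start: float,
    end: float,
    port: Optional[int] = None,
    top_n: int = 20,
) -> list[dict]:
    """Pick the slowest call chains recorded inside the abnormal time period."""
    in_window = [
        c for c in chains
        if start <= c["start_time"] <= end and (port is None or c["port"] == port)
    ]
    # Descending response time: the slowest chains contribute most to the fault.
    in_window.sort(key=lambda c: c["response_ms"], reverse=True)
    return in_window[:top_n]
```

A different abnormal index would need a different ranking key; for a success-rate anomaly, for instance, chains might be ranked by error status instead of response time.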
In one exemplary embodiment, determining the target node based on the set of candidate call chains comprises: acquiring the consumption time of each span included in a candidate call chain, wherein the candidate call chain is a call chain included in the candidate call chain set; determining candidate nodes included in the candidate call chain based on the consumption time and the topological relation included in the target index data; and determining the target node based on the candidate nodes. In this embodiment, for each call chain in the selected candidate call chain set, the consumption time of each span in the call chain may be counted, and a candidate node set is determined through layer-by-layer traversal analysis according to the consumption time and the topological relation data.
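One way to picture the layer-by-layer traversal is a breadth-first walk over the span tree that follows only the spans consuming a large share of the whole chain. The span layout, the `share` threshold of 0.5, and the rule itself are assumptions; the source does not specify how consumption time is weighed.

```python
from collections import deque


def candidate_nodes(spans: dict, root_id: str, share: float = 0.5) -> list:
    """Walk the span tree level by level and collect nodes of expensive spans.

    `spans` maps a span id to {"node": str, "duration_ms": float, "children": [ids]}.
    """
    total = spans[root_id]["duration_ms"]
    candidates = []
    queue = deque([root_id])
    while queue:
        span = spans[queue.popleft()]
        for child_id in span["children"]:
            child = spans[child_id]
            # Follow a child only if it accounts for a large share of the chain.
            if total > 0 and child["duration_ms"] / total >= share:
                if child["node"] not in candidates:
                    candidates.append(child["node"])
                queue.append(child_id)
    return candidates
```

Cheap branches are pruned early, so the traversal only descends along the path where the time is actually spent.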
In one exemplary embodiment, determining the target node based on the candidate nodes comprises: determining the state of the index data of each of the candidate nodes; determining, based on the state of the index data of each node, the nodes among the candidate nodes that are in an abnormal state; and determining a node in the abnormal state as the target node. In this embodiment, the abnormal status of each selected candidate node's index data is checked: if the node's index data has no abnormal alarm, the node is removed from the candidate nodes; if it has an abnormal alarm, the node and its abnormal indexes are added to a node abnormal index candidate set, from which the target node is determined. For example, information from multiple dimensions, such as time and anomaly type, can be combined to determine the final fault node and abnormal index set, which are pushed to the front end for display.
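The alarm-based filtering of candidate nodes can be sketched as a simple lookup. The shape of `alarms_by_node` (node name mapped to a list of abnormal index names) is a hypothetical choice for illustration.

```python
def filter_by_alarms(candidates: list, alarms_by_node: dict) -> dict:
    """Keep only candidate nodes that have at least one abnormal alarm.

    Returns the node abnormal index candidate set as {node: [abnormal indexes]}.
    """
    abnormal = {}
    for node in candidates:
        indexes = alarms_by_node.get(node, [])
        if indexes:                      # nodes without alarms are dropped
            abnormal[node] = list(indexes)
    return abnormal
```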
In one exemplary embodiment, after determining the node in the abnormal state as the target node, the method further includes: determining index data of the target node; and displaying the target node and the index data. In this embodiment, after the target node is determined, the target node and the index data may be pushed to the front end for presentation.
The following describes a node positioning method with reference to a specific embodiment:
Fig. 3 is a flowchart of a node positioning method according to an embodiment of the present invention. As shown in Fig. 3, the flow includes:
S1, collecting index data. This specifically includes the following steps:
S1-1, the collected index data falls into three main types:
1. key business indexes: average transaction response time, service call success rate, calls per minute, and the like;
2. call chain data: service name, port, instance, link data, topological relation data, and the like;
3. operating system performance indexes and middleware data: CPU utilization, memory utilization, network interface inbound and outbound traffic, total number of database connections, and the like.
S1-2, storing the collected index data in different databases according to type.
S2, monitoring the key business indexes and the performance indexes, with anomaly detection and alarming based on a machine learning algorithm. This specifically includes the following steps:
S2-1, preprocessing the historical time-series index data, including filling in missing data and removing spikes (burr points). The time-series index data is analyzed, key features are extracted, and the indexes are classified according to their data features; each data form uses different extracted features or a different detection algorithm.
S2-2, determining the length of historical data to extract and the algorithm to use, according to the classification of the time-series index data and the feature extraction results of S2-1. For example, if the index data is rate-type data (e.g., memory usage), a cumulative constant threshold detection algorithm is used directly. If the index data is periodic data, the historical data is used for targeted training to generate a model. Once the model is generated, anomaly detection is performed on the index data in real time.
S2-3, performing anomaly detection on the real-time index data with the anomaly detection model of S2-2, and merging the abnormal alarms according to a certain rule to reduce the number of alarms.
S2-4, if a non-key business index is abnormal, sending an alarm prompt and storing it.
S3, when a fault occurs, performing correlation analysis that combines the micro-service real-time topological relation, the call chain data and the alarm information, so as to locate the fault quickly. This specifically includes the following steps:
S3-1, when a key business index is abnormal or a fault is found, determining the abnormal time interval and acquiring all call chain data of that interval.
S3-2, analyzing and screening the acquired call chain data according to the type of the abnormal key business index, to obtain the set of call chains that contribute most to the current fault. For example, if the average response time is abnormal, the call chains are sorted in descending order of response time, and the top 20 call chains are selected as the candidate call chain set according to the port information of the call chains.
S3-3, for each call chain in the candidate call chain set selected in S3-2, counting the consumption time of each span in the call chain, and determining the candidate node set through layer-by-layer traversal analysis according to the consumption time and the topological relation data. Here a span is a single node or step in a service invocation.
S3-4, acquiring the abnormal status of the index data of the candidate nodes selected in S3-3. If a node's index data has no abnormal alarm, the node is removed from the candidate nodes. If a node's index data has an abnormal alarm, the node and its abnormal indexes are added to the node abnormal index candidate set.
S3-5, combining information from multiple dimensions, such as time and anomaly type, to determine the final fault node and abnormal index set, and pushing them to the front end for display.
It should be noted that step S3-4 is not required after step S2-4 is executed; when step S3-4 is executed, the non-key business index data may be used for a comprehensive judgment, so as to further filter the candidate node set.
Fig. 4 is a schematic diagram of a node positioning apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus includes:
The micro-service data monitoring and collecting module: mainly used for system monitoring and data collection. The monitored and collected data mainly includes key business indexes, the micro-service real-time topological relation, micro-service real-time call chain data, and real-time performance indexes (of the operating system, database, virtual machine and the like). The micro-service real-time topological relation and the call chain data are mainly obtained by deploying an application performance monitoring tool. All collected index data is stored in different databases according to type.
The micro-service fault perception module: based on machine learning, it analyzes the collected historical data of each index to obtain the index's key features, calculates resource allocation and historical data length from the characteristics of the time series, selects different features and algorithms for targeted training, generates an anomaly detection model, and performs real-time anomaly detection. The anomaly detection results are merged by a corresponding algorithm before an abnormal alarm is issued, so that the fault is perceived.
The micro-service fault location module: when the micro-service fault perception module detects an abnormal key business index or discovers a fault, this module performs correlation analysis. It combines the micro-service topological relation data with the call chain data to correlate the abnormal alarms, locates the problem quickly through decision derivation and weight analysis, and pushes the result to the micro-service data monitoring and collecting module for display.
When the micro-service fault perception module performs anomaly detection, the index data can be stored in different databases according to the type of the collected index data. For example, time-series data such as key business index data and performance index data is stored in a time-series database, while call chain data and topological relation data are stored in Elasticsearch. After the collected historical data is preprocessed, for example by filling in missing data and removing spikes (burr points), statistical features, time-series features and the like can be extracted with tsfresh and corresponding algorithms, and the time-series data is classified according to its key features. Different algorithms are used to train models for different types of indexes. For example, if the index data is rate-type data (such as memory utilization), a cumulative constant threshold detection algorithm is used directly. If the index data is periodic data, the historical data is used for targeted training to generate a model. Anomaly detection is then performed on the real-time data with the model. If a non-key business index is abnormal, an alarm prompt is sent and stored. If a key business index is abnormal, the micro-service fault location module is invoked to locate the fault. Elasticsearch is a distributed, highly scalable, near-real-time search and data analytics engine; tsfresh is an open-source Python package that extracts features from time-series data.
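The two preprocessing steps named above, filling in missing data and removing spikes, can be sketched in plain Python. The forward-fill strategy, the local-median rule, and the `window` and `factor` parameters are assumptions; the source does not say which methods are used, and feature extraction with tsfresh is not shown here.

```python
from statistics import median


def fill_missing(series: list) -> list:
    """Replace None with the previous valid value (or the first valid one at the start)."""
    filled = list(series)
    previous = next((v for v in filled if v is not None), None)
    if previous is None:                 # nothing valid to fill from
        return filled
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = previous
        else:
            previous = value
    return filled


def remove_spikes(series: list, window: int = 3, factor: float = 3.0) -> list:
    """Replace upward spikes that exceed `factor` times the local median."""
    half = window // 2
    cleaned = list(series)
    for i in range(len(series)):
        neighbours = series[max(0, i - half): i + half + 1]
        local = median(neighbours)
        if local > 0 and series[i] > factor * local:
            cleaned[i] = local
    return cleaned
```

Only upward spikes are handled in this sketch; index data that can also dip sharply would need a symmetric rule.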
After a micro-service anomaly is sensed, the micro-service fault locating module performs fault locating. Assume that the micro-service fault sensing module has raised an abnormal alarm for the key service index "average response time". As shown in fig. 5, the processing flow of the micro-service fault locating module is as follows:
S2-1, when the abnormal alarm for the key service index average response time is found, determine the abnormal time interval and obtain all call chain data in that interval, for example, trace1 and trace2 in fig. 5.
S2-2, sort the call chains in descending order of response time and, in combination with the port information of the call chains, select the first 20 call chains as the candidate call chain set.
S3-3, perform layer-by-layer traversal analysis on trace1 and trace2 in the figure. Taking trace1 as an example:
according to the call chain data of trace1, the time consumed by each span in the call chain is first counted, and layer-by-layer traversal analysis is performed according to the consumed time and the topological relation data. For example, if the call from micro-service B to micro-service D accounts for a relatively large share of the time consumed over the whole link, the node where micro-service D is located is added to the node candidate set according to the topological relation.
S3-4, obtain the abnormal status of the index data of the candidate nodes selected in S3-3. If a node's index data has no abnormal alarm, the node is removed from the candidate nodes, such as the node where micro-service B in the example is located. If the node's index data has an abnormal alarm, the node and the abnormal index are added to the node abnormal index candidate set.
S3-5, integrate information from multiple dimensions, such as time and abnormality type, to determine the final fault node and abnormal index set. For example, if the candidate fault nodes and abnormal index data sets obtained from both call chains, trace1 and trace2, include the database node, the database node is determined to be the fault point.
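The flow from S2-2 to S3-5 can be sketched as below. This is a simplified illustration under several assumptions that are not in the text: each call chain is a dictionary with a `response_ms` field and a list of spans, each span carries the `node` it ran on and its own consumed time `self_ms`, a span counts as slow when it takes at least `min_share` of the chain's time, and the "integration" in S3-5 is read as intersecting the candidates of all call chains. The port-based filtering of S2-2 and the topology lookup are omitted.

```python
def locate_fault_nodes(traces, node_alarms, top_n=20, min_share=0.3):
    """Return {node: abnormal indexes} for nodes implicated by every slow chain."""
    # S2-2: keep the top_n call chains with the longest response time.
    slowest = sorted(traces, key=lambda t: t["response_ms"], reverse=True)[:top_n]
    per_trace = []
    for trace in slowest:
        total = sum(span["self_ms"] for span in trace["spans"]) or 1
        # S3-3: nodes whose spans consume a large share of the chain's time.
        slow_nodes = {span["node"] for span in trace["spans"]
                      if span["self_ms"] / total >= min_share}
        # S3-4: discard candidate nodes that have no abnormal alarm.
        per_trace.append({n for n in slow_nodes if node_alarms.get(n)})
    if not per_trace:
        return {}
    # S3-5: a node reported by all candidate call chains is taken as the fault point.
    common = set.intersection(*per_trace)
    return {node: node_alarms[node] for node in common}


# Hypothetical data loosely modelled on trace1 and trace2 of the example.
trace1 = {"response_ms": 200, "spans": [
    {"node": "node-b", "self_ms": 20},
    {"node": "node-d", "self_ms": 80},
    {"node": "node-db", "self_ms": 100},
]}
trace2 = {"response_ms": 150, "spans": [
    {"node": "node-c", "self_ms": 30},
    {"node": "node-db", "self_ms": 120},
]}
node_alarms = {"node-db": {"slow_queries"}}  # only the database node has an alarm

faults = locate_fault_nodes([trace1, trace2], node_alarms)
```

In this made-up data, `node-d` is slow in trace1 but has no alarm, so it is dropped in the S3-4 step, and only the database node remains in both call chains.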
In the foregoing embodiment, the micro-service application topological relation and call chain data are obtained by an application performance monitoring tool. The key service indexes and performance indexes are monitored, and anomaly detection and alarming based on machine learning algorithms are used to discover micro-service anomalies in time and to reduce the false alarm rate. When a key service index is abnormal or a fault occurs, the real-time micro-service topological relation, the call chain data and the alarm information are combined for association analysis so that the fault is located quickly, thereby improving operation and maintenance efficiency.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a node positioning apparatus is further provided. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 6 is a block diagram of a node positioning apparatus according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes:
an obtaining module 62, configured to obtain target index data generated when a target micro-service application runs, wherein the target micro-service application calls a plurality of nodes;
a detection module 64, configured to detect the target index data to determine a target detection result;
a positioning module 66, configured to, if the target detection result indicates that abnormal data exists in the target index data, locate, based on the target index data, a target node that triggers generation of the abnormal data.
The obtaining module 62 corresponds to the micro-service data monitoring and collecting module, the detection module 64 corresponds to the micro-service fault sensing module, and the positioning module 66 corresponds to the micro-service fault locating module.
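How the three modules hand data to one another can be sketched as follows. The class name `NodePositioningApparatus`, the method `run_once`, and the dictionary fields are hypothetical and chosen only for illustration; the text does not prescribe any particular interface between the modules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NodePositioningApparatus:
    """Chains the three modules of fig. 6; each one is supplied as a callable."""
    obtain: Callable[[], dict]            # obtaining module 62
    detect: Callable[[dict], bool]        # detection module 64, True means abnormal
    locate: Callable[[dict], List[str]]   # positioning module 66

    def run_once(self) -> Optional[List[str]]:
        index_data = self.obtain()
        if not self.detect(index_data):
            return None  # no abnormal data, so nothing to locate
        return self.locate(index_data)


# Hypothetical stand-ins for the real modules, for illustration only.
apparatus = NodePositioningApparatus(
    obtain=lambda: {"avg_response_ms": 950, "slowest_node": "node-db"},
    detect=lambda data: data["avg_response_ms"] > 500,
    locate=lambda data: [data["slowest_node"]],
)
result = apparatus.run_once()
```

Passing the modules in as callables keeps the sketch independent of whether they run in one processor or in several, which the description leaves open.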
In an exemplary embodiment, the detection module 64 may perform the detection on the target index data to determine the target detection result by: determining a target data type of the target index data; determining a target detection model based on a target data type, and sending the target index data to the target detection model for detection so as to determine a target detection result, wherein the target detection model is trained by using multiple groups of target training data through machine learning, and each group of data included in the multiple groups of target training data includes: index data and detection results.
In an exemplary embodiment, the detection module 64 may determine the target detection model based on the target data type by: determining the target detection model corresponding to the target data type based on a preset correspondence, wherein the preset correspondence indicates the correspondence between data types and detection models.
In an exemplary embodiment, before determining the target detection model corresponding to the target data type based on the preset correspondence, the apparatus may be further configured to: obtain historical index data generated when the target micro-service application runs; perform data preprocessing on the historical index data to obtain time series historical index data; extract data features of the time series historical index data; determine a historical data type of the historical index data based on the data features; and establish the correspondence between the historical data type and the detection model.
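Building the correspondence between data types and detection models from historical data can be sketched as follows. The description extracts features with tsfresh; this sketch swaps in a single hand-written feature, the autocorrelation at an assumed period, so that it has no dependencies. The type labels, the model names in `MODEL_BY_TYPE`, and the `min_corr` cut-off are assumptions made for illustration.

```python
from statistics import mean


def autocorrelation(values, lag):
    """Sample autocorrelation of the series at the given lag."""
    mu = mean(values)
    var = sum((v - mu) ** 2 for v in values)
    if var == 0 or lag >= len(values):
        return 0.0
    cov = sum((values[i] - mu) * (values[i + lag] - mu)
              for i in range(len(values) - lag))
    return cov / var


def classify_series(values, period, min_corr=0.6):
    """Label a historical series by whether it repeats with the given period."""
    return "periodic" if autocorrelation(values, period) >= min_corr else "non_periodic"


# Assumed correspondence between data type and detection model.
MODEL_BY_TYPE = {
    "periodic": "model_trained_on_history",
    "non_periodic": "constant_threshold",
}

# Hypothetical historical series: one repeating every 4 samples, one flat.
requests_per_min = [0, 1, 0, -1] * 6
memory_utilization = [0.5] * 12

model_for_requests = MODEL_BY_TYPE[classify_series(requests_per_min, period=4)]
model_for_memory = MODEL_BY_TYPE[classify_series(memory_utilization, period=4)]
```

At detection time, looking up `MODEL_BY_TYPE` with the type of the incoming index data plays the role of the preset correspondence.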
In an exemplary embodiment, before locating, based on the target index data, the target node that triggers generation of the abnormal data, the apparatus may be further configured to: determine a fault type of the abnormal data based on the abnormal data; if the fault type is determined to be a service fault, locate the target node that triggers generation of the abnormal data based on the target index data; and if the fault type is determined to be a fault other than a service fault, store the target index data and/or perform an alarm operation.
In an exemplary embodiment, the apparatus may locate the target node that triggers generation of the abnormal data based on the target index data by: determining a target time period in which the abnormal data was generated; acquiring call chain data within the target time period from the target index data; determining a candidate call chain set in the call chain data based on the fault type; and determining the target node based on the candidate call chain set.
In an exemplary embodiment, the apparatus may determine the target node based on the candidate call chain set by: acquiring the consumption time of each span included in a candidate call chain, wherein the candidate call chain is a call chain included in the candidate call chain set; determining candidate nodes included in the candidate call chain based on the consumption time and a topological relation included in the target index data; and determining the target node based on the candidate nodes.
In an exemplary embodiment, the apparatus may determine the target node based on the candidate nodes by: determining a state of the index data of each node included in the candidate nodes; determining a node in an abnormal state among the candidate nodes based on the state of the index data of each node; and determining the node in the abnormal state as the target node.
In an exemplary embodiment, after determining the node in the abnormal state as the target node, the apparatus may be further configured to: determine index data of the target node; and display the target node and the index data.
It should be noted that the above modules may be implemented by software or hardware. The latter may be implemented in, but is not limited to, the following manners: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method as set forth in any of the above.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to: various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the various modules or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method for locating a node, comprising:
acquiring target index data generated when a target micro-service application runs, wherein the target micro-service application calls a plurality of nodes;
detecting the target index data to determine a target detection result;
and in a case where the target detection result indicates that abnormal data exists in the target index data, locating, based on the target index data, a target node that triggers generation of the abnormal data.
2. The method of claim 1, wherein detecting the target index data to determine a target detection result comprises:
determining a target data type of the target index data;
determining a target detection model based on the target data type, and sending the target index data to the target detection model for detection so as to determine the target detection result.
3. The method of claim 2, wherein determining the target detection model based on the target data type comprises:
determining the target detection model corresponding to the target data type based on a preset correspondence, wherein the preset correspondence indicates the correspondence between data types and detection models.
4. The method of claim 3, wherein prior to determining the target detection model corresponding to the target data type based on a preset correspondence, the method further comprises:
acquiring historical index data generated when the target micro-service application runs;
carrying out data preprocessing on the historical index data to obtain time sequence historical index data;
extracting data characteristics of the time sequence historical index data;
determining a historical data type of the historical index data based on the data features;
and establishing the correspondence between the historical data type and the detection model.
5. The method of claim 1, wherein prior to locating the target node that triggers generation of the abnormal data based on the target index data, the method further comprises:
determining a fault type of the abnormal data based on the abnormal data;
in a case where the fault type is determined to be a service fault, locating the target node that triggers generation of the abnormal data based on the target index data;
and in a case where the fault type is determined to be a fault other than a service fault, storing the target index data and/or performing an alarm operation.
6. The method of claim 5, wherein locating the target node that triggers generation of the abnormal data based on the target index data comprises:
determining a target time period for generating the abnormal data;
acquiring call chain data in the target time period from the target index data;
determining a candidate call chain set in the call chain data based on the fault type;
determining the target node based on the set of candidate call chains.
7. The method of claim 6, wherein determining the target node based on the set of candidate call chains comprises:
acquiring the consumption time of each span included in a candidate call chain, wherein the candidate call chain is a call chain included in the candidate call chain set;
determining candidate nodes included in the candidate call chain based on the consumption time and a topological relation included in the target index data;
determining the target node based on the candidate node.
8. The method of claim 7, wherein determining the target node based on the candidate nodes comprises:
determining a state of the index data of each node included in the candidate nodes, respectively;
determining a node in an abnormal state included in the candidate nodes based on the state of the index data of each node, and determining the node in the abnormal state as the target node.
9. The method of claim 8, wherein after determining a node in an abnormal state as the target node, the method further comprises:
determining index data of the target node;
and displaying the target node and the index data.
10. An apparatus for locating a node, comprising:
an acquisition module, configured to acquire target index data generated when a target micro-service application runs, wherein the target micro-service application calls a plurality of nodes;
a detection module, configured to detect the target index data to determine a target detection result;
and a positioning module, configured to, in a case where the target detection result indicates that abnormal data exists in the target index data, locate, based on the target index data, a target node that triggers generation of the abnormal data.
11. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.
12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110535454.9A CN113271224A (en) 2021-05-17 2021-05-17 Node positioning method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113271224A true CN113271224A (en) 2021-08-17

Family

ID=77231254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110535454.9A Pending CN113271224A (en) 2021-05-17 2021-05-17 Node positioning method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113271224A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109861858A (en) * 2019-01-28 2019-06-07 北京大学 Wrong investigation method of the micro services system root because of node
CN110888755A (en) * 2019-11-15 2020-03-17 亚信科技(中国)有限公司 Method and device for searching abnormal root node of micro-service system
CN111240876A (en) * 2020-01-06 2020-06-05 远光软件股份有限公司 Fault positioning method and device for microservice, storage medium and terminal
CN111722952A (en) * 2020-05-25 2020-09-29 中国建设银行股份有限公司 Fault analysis method, system, equipment and storage medium of business system
CN112187527A (en) * 2020-09-15 2021-01-05 中信银行股份有限公司 Micro-service abnormity positioning method and device, electronic equipment and readable storage medium
CN112698975A (en) * 2020-12-14 2021-04-23 北京大学 Fault root cause positioning method and system of micro-service architecture information system
CN112540905A (en) * 2020-12-18 2021-03-23 青岛特来电新能源科技有限公司 System risk assessment method, device, equipment and medium under micro-service architecture

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114966304A (en) * 2022-04-13 2022-08-30 中移互联网有限公司 Fault positioning method and device and electronic equipment
CN114723082A (en) * 2022-04-19 2022-07-08 镇江西门子母线有限公司 Abnormity early warning method and system for intelligent low-voltage complete equipment
CN114723082B (en) * 2022-04-19 2023-08-18 镇江西门子母线有限公司 Abnormality early warning method and system for intelligent low-voltage complete equipment
WO2024027127A1 (en) * 2022-08-03 2024-02-08 中兴通讯股份有限公司 Fault detection method and apparatus, and electronic device and readable storage medium
CN115514627A (en) * 2022-09-21 2022-12-23 深信服科技股份有限公司 Fault root cause positioning method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination