CN112882911A - Abnormal performance behavior detection method, system, device and storage medium - Google Patents

Publication number
CN112882911A
Authority
CN
China
Prior art keywords
abnormal
historical
data
performance
events
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110137565.4A
Other languages
Chinese (zh)
Other versions
CN112882911B (en)
Inventor
任睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cetc Cyberspace Security Research Institute Co Ltd
Original Assignee
Cetc Cyberspace Security Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cetc Cyberspace Security Research Institute Co Ltd filed Critical Cetc Cyberspace Security Research Institute Co Ltd
Priority to CN202110137565.4A priority Critical patent/CN112882911B/en
Publication of CN112882911A publication Critical patent/CN112882911A/en
Application granted granted Critical
Publication of CN112882911B publication Critical patent/CN112882911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3428 Benchmarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses an abnormal performance behavior detection method, system, device and computer-readable storage medium. Multiple anomaly detection algorithms are used in advance to comprehensively analyze various abnormal events and their corresponding abnormal feature data. The abnormal events and abnormal feature data are then used to construct abnormal association relationships between abnormal events, between abnormal events and abnormal feature data, and between items of abnormal feature data, so that the abnormal events and the abnormal feature data are closely linked and can be analyzed as a whole. Finally, a knowledge graph model constructed from the historical abnormal association relationships, historical abnormal events and historical abnormal feature data can effectively and comprehensively analyze the current abnormal events of a data center and their corresponding abnormal feature data, making the detection result more comprehensive and accurate.

Description

Abnormal performance behavior detection method, system, device and storage medium
Technical Field
The present invention relates to the field of distributed storage, and in particular, to a method, a system, an apparatus, and a computer-readable storage medium for detecting abnormal performance behavior.
Background
Operation and maintenance management of data center infrastructure ensures that the data center environment meets the requirements of the various facilities, customer SLAs, and the reliability needed for normal operation of computer equipment. As data centers grow in scale, server node types become more complex, operation and maintenance problem types multiply, and problems occur unpredictably, system operation and maintenance becomes increasingly difficult. The key to modern operation and maintenance is how to trigger intelligent decisions from monitoring data in large-scale scenarios, so that operation and maintenance can run automatically and intelligently.
Traditional automated operation and maintenance is triggered mainly through rule-based templates. However, with today's complex server node types and numerous operation and maintenance problem types, faults are hard to locate quickly, and in many cases manually written rules cannot solve the problem. A knowledge graph is high-quality structured data: it can be used to build a comprehensive operation and maintenance knowledge base, and machine learning can then be applied to automate operation and maintenance. For example, state information from the server hardware, operating system, job scheduling system and computing applications, such as CPU utilization, job load and storage utilization, is analyzed and processed to form service operation data. Meanwhile, collected user information, device hardware information, virtual machine information and other information are used as node attributes to create entity nodes. Relationships among the entity nodes are then established and serve as the relationship data of the knowledge graph, thereby constructing the operation and maintenance knowledge graph and realizing intelligent operation and maintenance management.
At present, existing operation and maintenance knowledge graphs mainly comprise a Configuration Management Database (CMDB) and an operation and maintenance knowledge base, and an enterprise-specific knowledge base is formed by automatically enriching operation and maintenance knowledge. However, a configuration change management library built around the CMDB relies on the change process to update configuration, and its non-real-time update mechanism cannot adapt to container and cloud environments, where the relationships between environments and between resources are completely dynamic. Moreover, the topology of conventional configuration changes is not time-ordered, so the topology corresponding to a failure time cannot be retrieved. Meanwhile, the traditional operation and maintenance knowledge base is static and single-purpose, and cannot meet the requirement of quick and accurate operation and maintenance.
In addition, most existing operation and maintenance knowledge graphs are constructed semi-automatically or manually, which causes two problems: (1) the operation and maintenance knowledge is incomplete, and potential relationships among many entities in the knowledge graph are not mined; (2) extensibility is poor, and new entities cannot be added to the knowledge graph automatically.
Therefore, a detection method capable of more comprehensively and effectively reflecting the abnormal event and the related abnormal data is needed.
Disclosure of Invention
In view of the above, the present invention provides a method, a system, a device and a computer readable storage medium for detecting abnormal performance behavior, which can perform abnormality detection and fault diagnosis more comprehensively and effectively. The specific scheme is as follows:
an abnormal performance behavior detection method, comprising:
acquiring performance data of a data center;
analyzing the performance data by using a pre-constructed knowledge graph model to obtain abnormal parameters of the abnormal event;
wherein the process of pre-constructing the knowledge graph model comprises the following steps:
extracting the characteristics of the historical performance data of the data center to obtain historical characteristic data corresponding to different historical events;
detecting historical characteristic data corresponding to different historical events by using an anomaly detection algorithm set to obtain a plurality of historical anomaly events and corresponding historical anomaly characteristic data;
establishing a historical abnormal association relation between each historical abnormal event and corresponding historical abnormal characteristic data by using each historical abnormal event and corresponding historical abnormal characteristic data; the historical abnormal feature data comprises indexes and performance factors;
and constructing the knowledge graph model by utilizing the historical abnormal association relation, the historical abnormal event and the historical abnormal characteristic data.
Optionally, the process of acquiring performance data of the data center includes:
and acquiring hardware layer performance data, architecture layer performance data, system layer performance data and application layer performance data of the data center.
Optionally, the process of performing feature extraction on the historical performance data of the data center to obtain historical feature data corresponding to different historical events includes:
and performing feature extraction on the historical performance data of the data center to obtain historical feature data corresponding to different single scene historical events.
Optionally, the process of detecting historical feature data corresponding to different historical events by using the anomaly detection algorithm set to obtain a plurality of historical anomaly events and corresponding historical anomaly feature data includes:
detecting historical characteristic data corresponding to different historical events by using an anomaly detection algorithm set to obtain a plurality of historical anomaly events and corresponding historical anomaly characteristic data;
the anomaly detection algorithm set comprises a load imbalance detection algorithm, a data skew detection algorithm, a data placement imbalance detection algorithm, an abnormal node detection algorithm, an abnormal index detection algorithm, an inter-process interference detection algorithm and a system fault category detection algorithm.
The invention also discloses an abnormal performance behavior detection system, which comprises:
the performance data acquisition module is used for acquiring performance data of the data center;
the knowledge graph analysis module is used for analyzing the performance data by utilizing a pre-constructed knowledge graph model to obtain abnormal parameters of the abnormal event;
wherein the knowledge-graph analysis module comprises:
the characteristic data extraction unit is used for extracting characteristics of historical performance data of the data center to obtain historical characteristic data corresponding to different historical events;
the abnormal event detection unit is used for detecting historical characteristic data corresponding to different historical events by using an abnormal detection algorithm set to obtain a plurality of historical abnormal events and corresponding historical abnormal characteristic data;
the association factor construction unit is used for constructing a historical abnormal association relation between each historical abnormal event and corresponding historical abnormal characteristic data by utilizing each historical abnormal event and corresponding historical abnormal characteristic data; the historical abnormal feature data comprises indexes and performance factors;
and the knowledge graph building unit is used for building the knowledge graph model by utilizing the historical abnormal association relation, the historical abnormal event and the historical abnormal characteristic data.
Optionally, the performance data acquiring module is specifically configured to acquire hardware layer performance data, architecture layer performance data, system layer performance data, and application layer performance data of the data center.
Optionally, the feature data extraction unit is specifically configured to perform feature extraction on the historical performance data of the data center to obtain historical feature data corresponding to different single-scene historical events.
Optionally, the abnormal event detecting unit is specifically configured to detect historical feature data corresponding to different historical events by using an abnormal detection algorithm set, so as to obtain a plurality of historical abnormal events and corresponding historical abnormal feature data;
the anomaly detection algorithm set comprises a load imbalance detection algorithm, a data skew detection algorithm, a data placement imbalance detection algorithm, an abnormal node detection algorithm, an abnormal index detection algorithm, an inter-process interference detection algorithm and a system fault category detection algorithm.
The invention also discloses an abnormal performance behavior detection device, which comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the abnormal performance behavior detection method as described above.
The invention also discloses a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the abnormal performance behavior detection method as described above.
The abnormal performance behavior detection method comprises the following steps: acquiring performance data of a data center; analyzing the performance data by using a pre-constructed knowledge graph model to obtain abnormal parameters of the abnormal event; wherein the process of pre-constructing the knowledge graph model comprises the following steps: extracting the characteristics of the historical performance data of the data center to obtain historical characteristic data corresponding to different historical events; detecting historical characteristic data corresponding to different historical events by using an anomaly detection algorithm set to obtain a plurality of historical anomaly events and corresponding historical anomaly characteristic data; establishing a historical abnormal association relation between each historical abnormal event and corresponding historical abnormal characteristic data by using each historical abnormal event and corresponding historical abnormal characteristic data; the historical abnormal characteristic data comprises indexes and performance factors; and constructing a knowledge graph model by using the historical abnormal association relation, the historical abnormal events and the historical abnormal characteristic data.
The invention uses multiple anomaly detection algorithms in advance to comprehensively analyze various abnormal events and their corresponding abnormal characteristic data. It then uses the abnormal events and abnormal characteristic data to construct abnormal association relations between abnormal events, between abnormal events and abnormal characteristic data, and between items of abnormal characteristic data, closely linking the abnormal events and the abnormal characteristic data so that they can be analyzed as a whole. Finally, the knowledge graph model constructed from the historical abnormal association relations, historical abnormal events and historical abnormal characteristic data can effectively and comprehensively analyze the current abnormal events of the data center and their corresponding abnormal characteristic data, making the detection result more comprehensive and accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for detecting abnormal performance behavior according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for pre-constructing a knowledge graph model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of an abnormal index detection algorithm disclosed in the embodiments of the present invention;
FIG. 4 is a schematic structural diagram of an abnormal performance behavior detection system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method for detecting abnormal performance behaviors, which is shown in figure 1 and comprises the following steps:
S11: Performance data of the data center is obtained.
Specifically, performance data in the data center is multi-dimensional and can be roughly divided into hardware layer, architecture layer, system layer and application layer performance data. A fine-grained, multi-layer performance data collection framework acquires this data from bottom to top, simultaneously collecting performance data at the hardware layer, the architecture layer, the system layer, the big data framework layer and the application workload layer of the data center system. The collected data serves as the raw input for performance analysis of the big data system, so that the performance of a big data application over its whole life cycle can be described effectively and used for subsequent correlation analysis and performance diagnosis.
Specifically, the hardware layer performance data of the hardware layer mainly includes hardware parameters such as hardware temperature and power consumption; in the hardware layer, the power consumption and temperature information of the hardware can be acquired mainly by depending on a group of special registers, namely a hardware counter, provided by the existing CPU.
Specifically, the architecture layer performance data mainly includes IPC (instructions per cycle), instruction ratio, TLB misses, cache misses, memory access bandwidth and the like. At the architecture layer, the hardware counters provided by existing CPUs can be used; these counters record the number of occurrences of microarchitecture-level events.
Specifically, the system layer performance data mainly includes CPU utilization, disk I/O, memory management, network and process related information, as well as system logs. On Linux, system layer data is collected mainly from the proc file system that ships with Linux. The proc file system has no backing storage: when a file in it is read, its content is generated dynamically, and when a file is written, the write function associated with that file is called. Through this file system, kernel components expose an interface to user space for querying information and modifying software behavior, and the information covers almost every part of the kernel and the key performance parameters of the system. In addition, the system logs include the system RAS log, the system security audit log and the like; they mainly record information about hardware, software and system problems, and can also be used to monitor events occurring in the system. System logs may be collected with Rsyslog, syslog or other log collection tools.
Specifically, at the application framework layer of the data center, the application layer performance data mainly includes configuration information and logs related to the application framework; at the application workload layer, it is mainly profile information at the user code level. Different data collection tools can be used for performance data at different levels. For example, a log collection tool can collect the logs of the current applications, tasks and stages that a data center application framework outputs at runtime, and performance-related data can then be parsed from those logs.
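As an illustration of system layer collection from the proc file system, the following is a minimal Python sketch that derives CPU utilization from two samples of the aggregate `cpu` line in `/proc/stat`. The function names and the choice to treat idle plus iowait as idle time are assumptions made for this example; the source does not specify how the counters are processed.

```python
from typing import List


def read_proc_stat(path: str = "/proc/stat") -> str:
    """Read the raw text of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.read()


def parse_cpu_times(proc_stat_text: str) -> List[int]:
    """Return the counters on the aggregate 'cpu' line of /proc/stat text."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before: List[int], after: List[int]) -> float:
    """Fraction of non-idle CPU time between two samples.

    Only the first eight counters (user, nice, system, idle, iowait,
    irq, softirq, steal) are summed, because guest time is already
    included in user time. Idle and iowait are both counted as idle
    (an assumption of this sketch).
    """
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = sum(deltas[3:5])
    return (total - idle) / total


# Two hand-written samples standing in for reads taken some time apart.
SAMPLE_1 = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n"
SAMPLE_2 = "cpu  160 0 70 900 70 0 0 0 0 0\ncpu0 160 0 70 900 70 0 0 0 0 0\n"
utilization = cpu_utilization(parse_cpu_times(SAMPLE_1), parse_cpu_times(SAMPLE_2))
```

On a live Linux host, the two samples would come from `read_proc_stat()` calls separated by a sampling interval.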
S12: The performance data is analyzed by using a pre-constructed knowledge graph model to obtain abnormal parameters of the abnormal event.
Specifically, the knowledge graph model of the embodiment of the present invention is built in advance from the abnormal association relationships between various abnormal events and various abnormal feature data. These relationships comprehensively reflect how the abnormal events and abnormal feature data interact, so that an abnormal event and its abnormal feature data in each single scenario are associated with the abnormal events and abnormal feature data of other scenarios. This better reflects the abnormal conditions of a data center in the complex scenarios of actual use.
Referring to FIG. 2, the process of pre-constructing the knowledge graph model may include S121 to S124:
S121: Feature extraction is performed on the historical performance data of the data center to obtain historical feature data corresponding to different historical events.
Specifically, the historical performance data is data previously collected from the data center; the collection process is the same as in S11 and is not described again here.
Specifically, after the historical performance data is obtained, feature extraction is performed on it. To make the various data easier to classify, analyze and extract, the behaviors and performance of execution subjects at different levels of the data center system are described uniformly by defining events (Event) and indexes (Metric).
An event abstracts the behavior of an execution subject of the application at a given level of the data center system. Events are divided into three types: (1) application layer events: the execution tasks, execution phases, user operations, etc. of the application; (2) system layer events: processor processes/threads, communication processes/threads, etc.; (3) hardware layer events: processor control instructions, memory access instructions, and the like.
An index is a value expressing the performance of the application at a given level of the data center system. Indexes are divided into three types: (1) application layer indexes: the most intuitive performance observations of the workload at the application layer, such as the amount of data processed per second; the processing logic of the application can also be analyzed, for example algorithm complexity and data change patterns (how data changes while the application runs); (2) system layer indexes: these reflect the behavior of the software runtime environment, operating system, hardware environment and the like, mainly the individual performance of each system component and the interactions among components; (3) architecture layer indexes: these include the instruction ratio and memory-related microarchitecture indexes. The common metrics are shown in the table, and each index already carries information about a system component or execution subject; for example, CPU utilization shows the performance status of the processor.
Further, a data center has a great number of events and indexes, and different events and indexes may depend on or propagate to one another, so analyzing only a single abnormal event or index cannot diagnose the performance of the data center comprehensively and accurately. The performance problem of the data center is therefore decomposed: a complex performance problem is broken into single-scenario performance problems that can be solved one by one, and these are then correlated, which enables adaptive performance analysis. For single-scenario performance problems, a performance state (Status) and a performance factor (Factor) are defined. The performance state is the state of an event or an index and is mainly either Normal or Abnormal. A performance factor is an event or index in a particular performance state, and mainly includes normal/abnormal events and normal/abnormal indexes.
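The four concepts defined above (Event, Metric, Status, Factor) can be pictured as a small data model. The following Python sketch is one possible encoding; the class layout, the `Layer` enumeration and the example names are assumptions for illustration and are not taken from the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Layer(Enum):
    """Levels at which events and indexes are observed (assumed naming)."""
    APPLICATION = "application"
    SYSTEM = "system"
    ARCHITECTURE = "architecture"
    HARDWARE = "hardware"


class Status(Enum):
    """Performance state of an event or an index."""
    NORMAL = "Normal"
    ABNORMAL = "Abnormal"


@dataclass(frozen=True)
class Event:
    """Abstracted behavior of an execution subject, e.g. a task or thread."""
    name: str
    layer: Layer


@dataclass(frozen=True)
class Metric:
    """A performance index, e.g. CPU utilization."""
    name: str
    layer: Layer


@dataclass(frozen=True)
class Factor:
    """An event or index together with its performance state."""
    subject: Union[Event, Metric]
    status: Status

    @property
    def is_abnormal(self) -> bool:
        return self.status is Status.ABNORMAL


# Hypothetical example: an abnormal index and a normal event.
cpu_factor = Factor(Metric("cpu_utilization", Layer.SYSTEM), Status.ABNORMAL)
task_factor = Factor(Event("map_task", Layer.APPLICATION), Status.NORMAL)
```

Making the classes frozen keeps them hashable, so factors can later serve as graph nodes or dictionary keys.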
S122: Historical feature data corresponding to different historical events is detected by using an anomaly detection algorithm set to obtain a plurality of historical abnormal events and corresponding historical abnormal feature data.
Specifically, targeted anomaly detection algorithms are used to detect different kinds of anomaly, so that the historical abnormal events contained in the historical feature data can be analyzed comprehensively. For example, it is possible to detect whether load imbalance exists in the data center, whether the data distribution of a data center application is skewed, whether the data placement of a data center application is imbalanced, whether an abnormal node exists, whether an abnormal index exists, whether processes interfere with one another, and which fault or failure category is present. In this way, a plurality of historical abnormal events is obtained, together with the historical abnormal feature data corresponding to each of them.
It is understood that embodiments of the present invention are not limited to the specific anomalies and anomaly detection methods above. Data-driven and model-driven performance anomaly detection techniques are combined according to the specific performance problems at each level, so that various abnormal conditions can be detected.
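The source does not disclose how the individual detectors work, so the following Python sketch only illustrates the idea of running a set of detectors over the same feature data. The two detectors shown (coefficient of variation for load imbalance, median absolute deviation for abnormal nodes), their thresholds and all names are assumptions chosen for the example.

```python
from statistics import mean, median, pstdev
from typing import Callable, Dict, List

NodeValues = Dict[str, float]  # per-node value of one index


def detect_load_imbalance(load: NodeValues, cv_threshold: float = 0.5) -> List[str]:
    """Flag imbalance when the coefficient of variation exceeds a threshold."""
    values = list(load.values())
    avg = mean(values)
    if avg == 0:
        return []
    cv = pstdev(values) / avg
    return [f"load imbalance (cv={cv:.2f})"] if cv > cv_threshold else []


def detect_abnormal_nodes(metric: NodeValues, k: float = 3.0) -> List[str]:
    """Flag nodes further than k median absolute deviations from the median."""
    values = list(metric.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [f"abnormal node {name}"
            for name, value in sorted(metric.items())
            if abs(value - med) > k * mad]


# The "algorithm set": every detector is applied to the same feature data.
ALGORITHM_SET: Dict[str, Callable[[NodeValues], List[str]]] = {
    "load_imbalance": detect_load_imbalance,
    "abnormal_node": detect_abnormal_nodes,
}


def run_algorithm_set(samples: NodeValues) -> Dict[str, List[str]]:
    """Run all detectors and keep only those that reported something."""
    findings = {name: detect(samples) for name, detect in ALGORITHM_SET.items()}
    return {name: found for name, found in findings.items() if found}


skewed = run_algorithm_set({"n1": 10, "n2": 11, "n3": 9, "n4": 50})
balanced = run_algorithm_set({"n1": 10, "n2": 10, "n3": 10})
```

Further detectors, such as data skew or inter-process interference checks, would be registered in `ALGORITHM_SET` in the same way.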
S123: A historical abnormal association relation between the historical abnormal events and the corresponding historical abnormal feature data is constructed by using each historical abnormal event and the corresponding historical abnormal feature data.
Specifically, a single anomaly may trigger a chain reaction and easily lead to multiple anomalies. Therefore, to study in depth the relationship between each abnormal event and the abnormal feature data, a historical abnormal association relation between the historical abnormal events and the corresponding historical abnormal feature data is constructed.
Specifically, correlation analysis is performed on the performance elements of the data center, where the performance elements include the abnormal events and the corresponding abnormal feature data. That is, the correlations between each historical abnormal event and the corresponding historical abnormal feature data are computed statistically and analyzed, where the historical abnormal feature data includes indexes and performance factors. For example, the correlation coefficient, a statistic designed by the statistician Karl Pearson, measures the degree of linear correlation between the variables under study. Correlation analysis can examine the correlation not only between two performance factors but also among several. In addition, an association mining algorithm can discover frequent association patterns that may exist among different performance elements.
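The Pearson correlation coefficient mentioned above can be computed directly from two equally long index series. The sketch below is a plain Python implementation of the standard formula; the function name and the sample series are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical index series sampled at the same instants.
disk_latency = [1.0, 2.0, 3.0, 4.0]
task_duration = [2.0, 4.0, 6.0, 8.0]
r = pearson(disk_latency, task_duration)
```

A value near 1 or -1 indicates a strong linear relationship between two indexes; a value near 0 indicates little linear relationship.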
Specifically, in an application scenario of a data center, the following four types of correlation relationships are defined: the correlation between the abnormal event and the abnormal event, the correlation between the index and the index, the correlation between the event and the index, and the correlation between the performance factor and the performance factor.
Further, the causal relationship between different performance elements can be expressed statistically by a probability or distribution function: with the occurrence of all other events held fixed, if the occurrence of an event A affects the probability that another event B occurs, and the two events are ordered in time (event A occurs before event B, i.e. A precedes B), then A can be said to be a cause of B. For example, Granger causality can be used to determine, by statistical hypothesis testing, whether one of two variables helps predict the other. Alternatively, causal relationships among different performance elements in the system can be established through causal path mining and a probability model, from which a causal chain for the performance problem is then deduced.
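The two conditions stated above, temporal order and an effect on occurrence probability, can be checked crudely on discretized event series. The Python sketch below compares the probability of B one step after A with the probability of B one step after the absence of A. It is a simplified illustration under assumed names and is not a Granger causality test, which would require a regression-based hypothesis test.

```python
from typing import Sequence


def precedence_lift(a: Sequence[int], b: Sequence[int], lag: int = 1) -> float:
    """Return P(B at t+lag | A at t) - P(B at t+lag | no A at t).

    `a` and `b` hold 0/1 flags per time slot. A clearly positive value
    means B is more likely shortly after A than otherwise.
    """
    if lag < 1 or len(a) != len(b) or len(a) <= lag:
        raise ValueError("need equally long series longer than the lag")
    pairs = list(zip(a[:-lag], b[lag:]))
    after_a = [later_b for earlier_a, later_b in pairs if earlier_a]
    after_no_a = [later_b for earlier_a, later_b in pairs if not earlier_a]
    if not after_a or not after_no_a:
        raise ValueError("need slots both with and without A")
    return sum(after_a) / len(after_a) - sum(after_no_a) / len(after_no_a)


# Hypothetical series in which B always follows A by one slot.
event_a = [1, 0, 0, 1, 0, 1, 0, 0]
event_b = [0, 1, 0, 0, 1, 0, 1, 0]
forward = precedence_lift(event_a, event_b)
backward = precedence_lift(event_b, event_a)
```

Comparing the forward and backward values gives a rough indication of direction; it does not control for other events, which the definition above requires.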
Specifically, based on the different performance elements (events, indexes, and performance factors), nine types of causal relationships may be defined:
1. Event to event: the occurrence of one event causes another event to occur.
2. Event to index: the occurrence of an event causes a change in an index.
3. Event to performance factor: the occurrence of an event causes a change in a performance factor.
4. Index to index: a change in one index causes a change in another index.
5. Index to event: a change in an index causes an event to occur.
6. Index to performance factor: a change in an index causes a change in a performance factor.
7. Performance factor to performance factor: a change in one performance factor causes a change in another.
8. Performance factor to event: a change in a performance factor causes an event to occur.
9. Performance factor to index: a change in a performance factor causes a change in an index.
S124: constructing a knowledge graph model by using the historical abnormal association relationship, the historical abnormal events and the historical abnormal characteristic data.
Specifically, the obtained historical abnormal association relationships, historical abnormal events and historical abnormal characteristic data are integrated to obtain multidimensional information about the data center's server hardware, operating system, job scheduling system and computing applications, and triples (entity-relationship-entity) of different operation and maintenance events are abstracted from this information. The knowledge graph model can therefore analyze performance data along multiple dimensions and yield more comprehensive and accurate anomaly detection results. Here, the entities are the abstracted performance events, indexes and factors, and the relationships are the association relationships among the entities.
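The triple structure described above can be sketched as a small in-memory graph. This is only an illustrative sketch: the class `KnowledgeGraph`, the relation names (`causes`, `correlates_with`) and the entity labels are hypothetical and are not specified by the embodiment.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Stores (entity, relationship, entity) triples and answers simple lookups."""

    def __init__(self):
        # head entity -> set of (relationship, tail entity)
        self._edges = defaultdict(set)

    def add(self, head, relation, tail):
        self._edges[head].add((relation, tail))

    def neighbors(self, head, relation=None):
        """Tail entities reachable from `head`, optionally filtered by relationship."""
        edges = self._edges.get(head, set())
        return sorted(t for r, t in edges if relation is None or r == relation)

# Hypothetical triples abstracted from historical abnormal events.
kg = KnowledgeGraph()
kg.add("event:data_placement_imbalance", "causes", "event:abnormal_task_duration")
kg.add("event:data_placement_imbalance", "correlates_with", "index:net_recv_bandwidth")
kg.add("factor:data_locality", "causes", "event:data_placement_imbalance")

effects = kg.neighbors("event:data_placement_imbalance", relation="causes")
```

A production system would more likely use a graph database; the dictionary of sets is chosen here only to keep the example self-contained.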
Therefore, the embodiment of the invention uses multiple anomaly detection algorithms in advance to comprehensively analyze the various abnormal events and their corresponding abnormal characteristic data. It then uses the abnormal events and the abnormal characteristic data to construct abnormal association relationships between abnormal events, and between abnormal events and abnormal characteristic data, so that the abnormal events and the abnormal characteristic data are closely linked and can be analyzed as a whole. Finally, the knowledge graph model constructed from the historical abnormal association relationships, historical abnormal events and historical abnormal characteristic data can effectively and comprehensively analyze the current abnormal events of the data center and their corresponding abnormal characteristic data, so that the detection result is more comprehensive and accurate.
The embodiment of the invention discloses a specific abnormal performance behavior detection method, and compared with the previous embodiment, the embodiment further explains and optimizes the technical scheme. Specifically, the method comprises the following steps:
Specifically, the anomaly detection algorithm set may include a load imbalance detection algorithm, a data volume inclination detection algorithm, a data placement imbalance detection algorithm, an abnormal node detection algorithm, an abnormal index detection algorithm, an inter-process interference detection algorithm, a system fault category detection algorithm, and other algorithms.
Specifically, the following application scenario is provided for the data placement imbalance detection algorithm. Data placement is another important factor that affects task running time and load balance. To determine whether data placement is balanced, data locality is the main consideration. Data locality expresses how close the data is to the code that processes it: if the data and the code are not on the same node or rack, remote data transfer overhead is incurred, which slows the task's data processing; if the data is kept as close as possible to the processing code, the cost of long-distance data copying and data migration is reduced, which improves the performance of big data applications.
On the Spark framework, the data locality levels, in priority order, are:
(1) PROCESS_LOCAL: data and code are in the same JVM;
(2) NODE_LOCAL: data and code are on the same node;
(3) NO_PREF (no preference): the data can be processed equally well from anywhere, i.e., it has no locality preference;
(4) RACK_LOCAL: data and code are on the same rack;
(5) ANY: data and code are on different machines that are not in the same rack.
Here, the order from PROCESS_LOCAL to ANY runs from high priority to low priority.
On the Hadoop framework, the data locality levels, in priority order, are:
(1) NODE_LOCALITY;
(2) RACK_LOCALITY;
(3) OFF_SWITCH (data center locality).
Likewise, the order from NODE_LOCALITY to OFF_SWITCH runs from high priority to low priority.
Because data locality at different priority levels may affect task running time differently, the main goal is to judge how data locality influences task running time. First, task running durations are divided into two categories: (1) normal running durations and (2) abnormal running durations, where
Figure BDA0002927608410000112
denotes running durations that are much longer than the normal running duration. Then, to evaluate the influence of different data locality levels on task running time, an impact weight is set for each type of data locality.
Table 1 (weights of the data locality priorities on the Spark framework) and Table 2 (weights of the data locality priorities on the Hadoop framework) list the weight values set for each data locality priority in the two frameworks. The larger the weight, the greater the influence of that data locality on the running duration; a weight of 0 indicates that the data locality has no influence on the running duration.
TABLE 1
Data locality      | ANY | RACK_LOCAL | NODE_LOCAL | PROCESS_LOCAL | NO_PREF
Priority weighting | 2   | 2          | 1          | 0             | 0
TABLE 2
Data locality      | OFF_SWITCH | RACK_LOCALITY | NODE_LOCALITY
Priority weighting | 2          | 2             | 1
Specifically, Algorithm 2, the data placement imbalance detection algorithm based on Euclidean distance, is as follows:
Figure BDA0002927608410000111
Figure BDA0002927608410000121
This algorithm detects data placement imbalance based on Euclidean distance. First, the distance $dis_j$ between the running duration of each task and the mean running duration is calculated. Combining the mean distance $mean(dis_j)$ and the standard deviation of the running durations $std(D_{S_i})$, the formula $\left|\,|dis_j| - mean(dis_j)\,\right| > 1.96 \cdot std(D_{S_i})$ is used to judge whether the running duration of task j is abnormal, and each abnormal running duration is added to an abnormal running duration list. Then, from each task with an abnormal running duration, the node on which the task ran and its data locality category can be found, and the number of abnormal running durations for each data locality type $locality_t$ can be calculated:
Figure BDA0002927608410000122
Further, the priority weight of data locality is introduced, and the $Ratio(locality_t, k)$ defined above represents the proportion of data placement imbalance on node k. When $Ratio(locality_t, k) > 0$, data placement on node k is considered to be unbalanced.
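The detection steps above can be sketched as follows. This is a hedged illustration rather than the patented Algorithm 2: the exact definition of the Ratio is given only in a figure, so the sketch assumes it is the weighted count of abnormal running durations on a node divided by the number of tasks on that node. The task records, the field names (`node`, `locality`, `duration`) and the function names are hypothetical; the weights are those of Table 1.

```python
import statistics
from collections import Counter

# Weights from Table 1 (Spark data locality levels).
SPARK_WEIGHTS = {"ANY": 2, "RACK_LOCAL": 2, "NODE_LOCAL": 1,
                 "PROCESS_LOCAL": 0, "NO_PREF": 0}

def abnormal_tasks(tasks, z=1.96):
    """Tasks whose running duration satisfies ||dis_j| - mean(dis)| > z * std(D)."""
    durations = [t["duration"] for t in tasks]
    mean_duration = statistics.fmean(durations)
    std_duration = statistics.pstdev(durations)
    # One-dimensional Euclidean distance of each duration from the mean duration.
    dis = [abs(d - mean_duration) for d in durations]
    mean_dis = statistics.fmean(dis)
    return [t for t, d in zip(tasks, dis) if abs(d - mean_dis) > z * std_duration]

def placement_ratio(tasks, node, weights=SPARK_WEIGHTS, z=1.96):
    """Assumed Ratio: weighted abnormal count on `node` / number of tasks on `node`."""
    on_node = [t for t in tasks if t["node"] == node]
    if not on_node:
        return 0.0
    counts = Counter(t["locality"] for t in abnormal_tasks(tasks, z)
                     if t["node"] == node)
    weighted = sum(weights[loc] * n for loc, n in counts.items())
    return weighted / len(on_node)

# Hypothetical stage: 18 process-local tasks of 10 s and one remote task of 100 s.
tasks = [{"node": n, "locality": "PROCESS_LOCAL", "duration": 10.0}
         for n in ["hw073", "hw106", "hw114"] * 6]
tasks.append({"node": "hw073", "locality": "ANY", "duration": 100.0})

ratio_hw073 = placement_ratio(tasks, "hw073")  # positive: imbalance suspected
ratio_hw106 = placement_ratio(tasks, "hw106")  # zero: no imbalance detected
```

Because PROCESS_LOCAL and NO_PREF carry weight 0, an abnormally long task that already runs next to its data does not raise the ratio, which matches the intent that only remote data access counts toward placement imbalance.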
Specifically, the following is an application scenario of the abnormal index detection algorithm. To detect the abnormal indexes present in each stage $S_i$, the performance indexes on node k are assembled into a performance index matrix $X_{S_i,k}$ of size m × n, where n denotes the number of collected performance indexes and m denotes the number of timestamps at which the performance indexes are collected during stage $S_i$:
Figure BDA0002927608410000123
Figure BDA0002927608410000131
Then, based on the constructed performance index matrix, the existing abnormal indexes are found through principal component analysis, time series transformation, standardization and outlier detection algorithms. Fig. 3 shows an exemplary diagram of an abnormal index detection process.
(1) Principal component analysis
Observation shows that not all indexes are strongly correlated with abnormal performance, and that different big data applications, and the execution behavior of different stages, affect different performance indexes to different degrees. To reduce the dimensionality of the data set, and thus its complexity, while retaining the features that contribute most to the variance, principal component analysis (PCA) is used for dimensionality reduction. Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables (principal components); the number of principal components is less than or equal to the number of original variables.
In the specific implementation, the covariance matrix of the performance index matrix is calculated, and its eigenvalues and eigenvectors are obtained. The eigenvectors corresponding to the d largest eigenvalues (i.e., the directions of largest variance) are selected to form a new matrix, thereby reducing the dimensionality of the data features. That is, the principal component features of the performance index matrix $X_{S_i}$ are the n eigenvectors of the covariance matrix
Figure BDA0002927608410000132
Then, the first d principal component indexes $PC_d$ are selected according to the cumulative contribution rate $CCRate_d$; that is, the eigenvectors whose cumulative contribution rate exceeds a given threshold are selected as the principal component vectors. In the experiment, 0.95 was chosen as the cumulative contribution rate threshold for selecting the principal component indexes.
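The selection of d by cumulative contribution rate can be sketched as follows, assuming the eigenvalues of the covariance matrix have already been computed. The function name `select_components` and the eigenvalues are made up for the example; the sketch stops at the first d whose cumulative rate reaches the threshold (using "greater than or equal to", which is an assumption about the boundary case).

```python
def select_components(eigenvalues, threshold=0.95):
    """Return (d, cumulative_rate): the smallest number of leading eigenvalues
    whose share of the total variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for d, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return d, cumulative / total
    return len(ordered), 1.0

# Hypothetical eigenvalues of a 5 x 5 covariance matrix.
d, rate = select_components([5.0, 3.0, 1.6, 0.3, 0.1], threshold=0.95)
```

Here the first two eigenvalues explain 80% of the variance and the first three explain 96%, so three principal components are kept.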
Then, the dimensionality of the performance index matrix is reduced through principal component analysis, converting the original performance index matrix $X_{S_i,k}$ into a principal component index matrix of size m × d:
Figure BDA0002927608410000133
Figure BDA0002927608410000134
(2) Time series transformation
Principal component analysis yields d principal component indexes. For each principal component index, the values of that index for big data application stage $S_i$ on each node k form a time series. For example, the time series of the first principal component index $PC_1$ is
Figure BDA0002927608410000141
The time series of the second principal component index $PC_2$ is
Figure BDA0002927608410000142
The time series of the d-th principal component index $PC_d$ is
Figure BDA0002927608410000143
A mean transformation is then applied: the performance index values in each time series are averaged, as given by the mean transformation formula. The rationale is that if the average value of a performance index on some nodes differs greatly from that on the other nodes, the index can be inferred to be a potential abnormal index on those nodes.
Mean transformation formula:
Figure BDA0002927608410000144
(3) Standardization
Different performance indexes in the system usually have different units, and their value ranges may differ considerably. For example, CPU utilization and memory usage are typically expressed as percentages (%), taking values between 0 and 1, whereas disk read/write bandwidth and network send/receive bandwidth are measured in MB/s, KB/s and the like. A standardization method is therefore used to bring the index values on different scales into a uniform range.
In this section, the index values obtained from the time series transformation are converted to the range 0 to 1 using the linear Min-Max normalization method, i.e., $(y - min) / (max - min)$. Specifically, the Min-Max normalization expression is
Figure BDA0002927608410000145
where y represents
Figure BDA0002927608410000146
max is the maximum value of the index, and min is the minimum value of the index. Although the Min-Max normalization method is simple and effective, its drawback is that the maximum and minimum values may need to be recomputed whenever new data is added. The index value after Min-Max normalization is denoted by
Figure BDA0002927608410000147
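The mean transformation and the Min-Max normalization can be sketched together as follows. The function names and the per-node time series values are assumptions for illustration; the fallback to 0.0 when all values are equal is also an assumption, since the text does not cover that case.

```python
from statistics import fmean

def mean_transform(series_by_node):
    """Collapse each node's time series of one principal component index to its mean."""
    return {node: fmean(series) for node, series in series_by_node.items()}

def min_max_normalize(value_by_node):
    """Linearly map values to [0, 1] using (y - min) / (max - min)."""
    lo = min(value_by_node.values())
    hi = max(value_by_node.values())
    if hi == lo:
        # Assumed convention: identical values carry no spread, map them all to 0.
        return {node: 0.0 for node in value_by_node}
    return {node: (y - lo) / (hi - lo) for node, y in value_by_node.items()}

# Hypothetical time series of the first principal component index on three nodes.
pc1 = {
    "hw073": [1.0, 2.0, 3.0],
    "hw106": [4.0, 6.0, 8.0],
    "hw114": [9.0, 10.0, 11.0],
}
means = mean_transform(pc1)            # {"hw073": 2.0, "hw106": 6.0, "hw114": 10.0}
normalized = min_max_normalize(means)  # {"hw073": 0.0, "hw106": 0.5, "hw114": 1.0}
```

As the text notes, `lo` and `hi` depend on the current data, so the normalized values must be recomputed when new nodes or new measurements change the extremes.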
(4) Outlier detection based on distance and dimension
The main purpose of outlier detection is to find abnormal data or behaviors that differ significantly from the characteristic attributes or behaviors of normal data. Outliers are usually far fewer than normal data points, but their influence cannot be ignored.
In this section, each stage of the big data application is checked for abnormal indexes on the cluster nodes. The normalized index values on all computing nodes in the cluster,
Figure BDA0002927608410000148
form a set of index vectors,
Figure BDA0002927608410000149
and an outlier detection algorithm is then used to find the abnormal indexes.
Specifically, an unsupervised outlier detection algorithm combining distance and dimension is proposed. In distance-based outlier detection generally, if at least a fraction pct of the objects in data set D lie at a distance greater than dmin from object o, then o is called a distance-based outlier with parameters pct and dmin, i.e., a DB(pct, dmin) outlier. Choosing the values of pct and dmin, and evaluating validity (deciding whether a DB(pct, dmin) outlier is a true outlier), require guidance from expert experience. With appropriate parameters for the normalized performance index data, the distance-based outlier detection algorithm can detect most abnormal values in the data set, but some are still missed. For example, suppose a set of values of the cpu_use index is [hw073: 0.006838, hw106: 0.15604399, hw114: 0.17810599]. With dmin set to 0.5 and pct set to 1, no abnormal value is detected, although 0.006838 can intuitively be regarded as an anomaly.
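The DB(pct, dmin) definition and the missed detection in the cpu_use example can be reproduced with a short sketch. The function name `db_outliers` is hypothetical, and the fraction is computed over the other objects (excluding o itself), which is an assumption about how the definition is applied.

```python
def db_outliers(values, pct, dmin):
    """Names whose value is farther than dmin from at least a fraction pct
    of the other values (DB(pct, dmin) outliers on one-dimensional data)."""
    outliers = []
    for name, value in values.items():
        others = [v for other, v in values.items() if other != name]
        if not others:
            continue
        far = sum(1 for v in others if abs(value - v) > dmin)
        if far / len(others) >= pct:
            outliers.append(name)
    return sorted(outliers)

# cpu_use values from the example in the text.
cpu_use = {"hw073": 0.006838, "hw106": 0.15604399, "hw114": 0.17810599}

missed = db_outliers(cpu_use, pct=1.0, dmin=0.5)  # [] : all distances are below 0.5
found = db_outliers(cpu_use, pct=1.0, dmin=0.1)   # ["hw073"] with a smaller dmin
```

The two calls show why parameter choice matters: the same data yields no outlier at dmin = 0.5 and flags hw073 at dmin = 0.1.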
Therefore, a logarithmic method (e.g., the base-10 logarithm) is used to transform the normalized raw data into a dimension representing the order of magnitude of each value, e.g., [hw073: 2, hw106: 0, hw114: 0]. Based on the logarithmized values, cpu_use on node hw073 can be considered an outlier.
In the pseudocode of the distance- and dimension-based abnormal index detection algorithm, the default value of the parameter pct is 1 and the value of the parameter dmin is adjustable. The steps of the algorithm are as follows:
(1) The order-of-magnitude dimension of each index value is obtained by the logarithmic method, and abnormal indexes are then detected with a dimension-based outlier detection algorithm. That is, the median of all index value dimensions is calculated, and then the distance dis between each index value dimension and the median. If the distance dis of an index value dimension is greater than the mean distance avg(dis), that dimension is added to the suspect group SuspG. The distance dis(SuspG) between each index value dimension in the suspect group and the median is then compared with avg(dis): if the difference between dis(SuspG) and avg(dis) is greater than the variance, the index value is considered an outlier.
(2) Abnormal indexes are detected with a distance-based outlier detection algorithm. Specifically, the index values are divided into two classes, A and B: the larger class (the one containing more index values) and the smaller class (the one containing fewer index values). The representative point of the larger class is computed in two ways: as the maximum/minimum value of the larger class, or as the median of the larger class. The distance between every index value in the smaller class and the representative point of the larger class is then calculated; if a distance is greater than the threshold dmin, the corresponding index value is considered an outlier. In subsequent experimental evaluations, the outlier detection results obtained with the maximum/minimum and with the median as the representative of the larger class are compared under different dmin values.
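Steps (1) and (2) can be sketched as below. Several details are not specified in the text and are assumptions of this sketch: the dimension is taken as the floor of the absolute base-10 logarithm (which reproduces [2, 0, 0] for the cpu_use example), the variance is the population variance of the distances, the two classes are formed by splitting the sorted values at their largest gap, and the "maximum/minimum" representative is the extreme of the larger class that faces the smaller class. All function names are hypothetical.

```python
import math
import statistics

def dimension(value):
    """Order-of-magnitude dimension of a positive normalized value."""
    if value <= 0:
        raise ValueError("log10 is undefined for non-positive values")
    return math.floor(abs(math.log10(value)))

def dimension_outliers(values):
    """Step (1): suspects lie farther from the median dimension than average;
    a suspect is an outlier if that excess is greater than the variance."""
    dims = {node: dimension(v) for node, v in values.items()}
    median = statistics.median(list(dims.values()))
    dis = {node: abs(d - median) for node, d in dims.items()}
    avg_dis = statistics.fmean(list(dis.values()))
    var_dis = statistics.pvariance(list(dis.values()))
    suspects = [node for node, d in dis.items() if d > avg_dis]
    return sorted(node for node in suspects if dis[node] - avg_dis > var_dis)

def split_at_largest_gap(values):
    """Assumed split rule: cut the sorted values at the widest gap.
    Returns (larger_class, smaller_class)."""
    ordered = sorted(values.items(), key=lambda item: item[1])
    gaps = [ordered[i + 1][1] - ordered[i][1] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    low, high = dict(ordered[:cut]), dict(ordered[cut:])
    return (low, high) if len(low) >= len(high) else (high, low)

def distance_outliers(values, dmin, representative="median"):
    """Step (2): members of the smaller class farther than dmin from the
    representative point of the larger class."""
    if len(values) < 2:
        return []
    larger, smaller = split_at_largest_gap(values)
    if representative == "median":
        rep = statistics.median(list(larger.values()))
    elif representative == "extreme":
        above = min(smaller.values()) > max(larger.values())
        rep = max(larger.values()) if above else min(larger.values())
    else:
        raise ValueError("representative must be 'median' or 'extreme'")
    return sorted(node for node, v in smaller.items() if abs(v - rep) > dmin)

cpu_use = {"hw073": 0.006838, "hw106": 0.15604399, "hw114": 0.17810599}
by_dimension = dimension_outliers(cpu_use)
by_median = distance_outliers(cpu_use, dmin=0.15, representative="median")
by_extreme = distance_outliers(cpu_use, dmin=0.15, representative="extreme")
```

With dmin = 0.15 the two representative points disagree on hw073, which illustrates why the text compares both choices across dmin values; the dimension-based step flags hw073 regardless of dmin. Note that a Min-Max normalized value of exactly 0 has no logarithm, so such values would need separate handling before step (1).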
Correspondingly, the embodiment of the present invention further discloses an abnormal performance behavior detection system, as shown in fig. 4, the system includes:
the performance data acquisition module 11 is used for acquiring performance data of the data center;
the knowledge graph analysis module 12 is configured to analyze the performance data by using a pre-constructed knowledge graph model to obtain an abnormal parameter of the abnormal event;
the knowledge graph analysis module 12 includes:
the characteristic data extraction unit is used for extracting characteristics of historical performance data of the data center to obtain historical characteristic data corresponding to different historical events;
the abnormal event detection unit is used for detecting historical characteristic data corresponding to different historical events by using an abnormal detection algorithm set to obtain a plurality of historical abnormal events and corresponding historical abnormal characteristic data;
the association factor construction unit is used for constructing a historical abnormal association relation between each historical abnormal event and corresponding historical abnormal characteristic data by utilizing each historical abnormal event and corresponding historical abnormal characteristic data; the historical abnormal characteristic data comprises indexes and performance factors;
and the knowledge graph building unit is used for building a knowledge graph model by utilizing the historical abnormal association relation, the historical abnormal event and the historical abnormal characteristic data.
Therefore, the embodiment of the invention uses multiple anomaly detection algorithms in advance to comprehensively analyze the various abnormal events and their corresponding abnormal characteristic data. It then uses the abnormal events and the abnormal characteristic data to construct abnormal association relationships between abnormal events, and between abnormal events and abnormal characteristic data, so that the abnormal events and the abnormal characteristic data are closely linked and can be analyzed as a whole. Finally, the knowledge graph model constructed from the historical abnormal association relationships, historical abnormal events and historical abnormal characteristic data can effectively and comprehensively analyze the current abnormal events of the data center and their corresponding abnormal characteristic data, so that the detection result is more comprehensive and accurate.
Specifically, the performance data acquiring module may be specifically configured to acquire hardware layer performance data, architecture layer performance data, system layer performance data, and application layer performance data of the data center.
Specifically, the feature data extraction unit may be specifically configured to perform feature extraction on historical performance data of the data center to obtain historical feature data corresponding to historical events of different single scenes.
Specifically, the abnormal event detection unit may be specifically configured to detect historical feature data corresponding to different historical events by using an abnormal detection algorithm set, so as to obtain a plurality of historical abnormal events and corresponding historical abnormal feature data;
the abnormal detection algorithm set comprises a load unbalance detection algorithm, a data volume inclination detection algorithm, a data placement unbalance detection algorithm, an abnormal node detection algorithm, an abnormal index detection algorithm, an inter-process interference detection algorithm and a system fault category detection algorithm.
In addition, the embodiment of the invention also discloses an abnormal performance behavior detection device, which comprises:
a memory for storing a computer program;
a processor for executing a computer program to implement the abnormal performance behavior detection method as described above.
In addition, the embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the abnormal performance behavior detection method.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The technical content provided by the present invention is described in detail above, and the principle and the implementation of the present invention are explained in this document by applying specific examples, and the above description of the examples is only used to help understanding the method of the present invention and the core idea thereof; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. An abnormal performance behavior detection method, comprising:
acquiring performance data of a data center;
analyzing the performance data by using a pre-constructed knowledge graph model to obtain abnormal parameters of the abnormal event;
wherein the pre-construction process of the knowledge graph model comprises the following steps:
extracting the characteristics of the historical performance data of the data center to obtain historical characteristic data corresponding to different historical events;
detecting historical characteristic data corresponding to different historical events by using an anomaly detection algorithm set to obtain a plurality of historical anomaly events and corresponding historical anomaly characteristic data;
establishing a historical abnormal association relation between each historical abnormal event and corresponding historical abnormal characteristic data by using each historical abnormal event and corresponding historical abnormal characteristic data; the historical abnormal feature data comprises indexes and performance factors;
and constructing the knowledge graph model by utilizing the historical abnormal association relation, the historical abnormal event and the historical abnormal characteristic data.
2. The abnormal performance behavior detection method according to claim 1, wherein the process of obtaining performance data of the data center comprises:
acquiring hardware layer performance data, architecture layer performance data, system layer performance data and application layer performance data of the data center.
3. The abnormal performance behavior detection method according to claim 2, wherein the process of extracting the features of the historical performance data of the data center to obtain the historical feature data corresponding to different historical events comprises:
and performing feature extraction on the historical performance data of the data center to obtain historical feature data corresponding to different single scene historical events.
4. The abnormal performance behavior detection method of claim 3, wherein the step of detecting historical feature data corresponding to different historical events by using the anomaly detection algorithm set to obtain a plurality of historical abnormal events and corresponding historical abnormal feature data comprises:
detecting historical characteristic data corresponding to different historical events by using an anomaly detection algorithm set to obtain a plurality of historical anomaly events and corresponding historical anomaly characteristic data;
the abnormal detection algorithm set comprises a load unbalance detection algorithm, a data volume inclination detection algorithm, a data placement unbalance detection algorithm, an abnormal node detection algorithm, an abnormal index detection algorithm, an inter-process interference detection algorithm and a system fault category detection algorithm.
5. An abnormal performance behavior detection system, comprising:
the performance data acquisition module is used for acquiring performance data of the data center;
the knowledge graph analysis module is used for analyzing the performance data by utilizing a pre-constructed knowledge graph model to obtain abnormal parameters of the abnormal event;
wherein the knowledge-graph analysis module comprises:
the characteristic data extraction unit is used for extracting characteristics of historical performance data of the data center to obtain historical characteristic data corresponding to different historical events;
the abnormal event detection unit is used for detecting historical characteristic data corresponding to different historical events by using an abnormal detection algorithm set to obtain a plurality of historical abnormal events and corresponding historical abnormal characteristic data;
the association factor construction unit is used for constructing a historical abnormal association relation between each historical abnormal event and corresponding historical abnormal characteristic data by utilizing each historical abnormal event and corresponding historical abnormal characteristic data; the historical abnormal feature data comprises indexes and performance factors;
and the knowledge graph building unit is used for building the knowledge graph model by utilizing the historical abnormal association relation, the historical abnormal event and the historical abnormal characteristic data.
6. The abnormal performance behavior detection system of claim 5, wherein the performance data obtaining module is specifically configured to obtain hardware layer performance data, architecture layer performance data, system layer performance data, and application layer performance data of a data center.
7. The abnormal performance behavior detection system according to claim 6, wherein the feature data extraction unit is specifically configured to perform feature extraction on historical performance data of the data center to obtain historical feature data corresponding to historical events of different single scenes.
8. The abnormal performance behavior detection system according to claim 7, wherein the abnormal event detection unit is specifically configured to detect historical feature data corresponding to different historical events by using an abnormal detection algorithm set, so as to obtain a plurality of historical abnormal events and corresponding historical abnormal feature data;
the abnormal detection algorithm set comprises a load unbalance detection algorithm, a data volume inclination detection algorithm, a data placement unbalance detection algorithm, an abnormal node detection algorithm, an abnormal index detection algorithm, an inter-process interference detection algorithm and a system fault category detection algorithm.
9. An abnormal performance behavior detection apparatus, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the abnormal performance behavior detection method of any of claims 1 to 4.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the abnormal performance behavior detection method of any one of claims 1 to 4.
CN202110137565.4A 2021-02-01 2021-02-01 Abnormal performance behavior detection method, system, device and storage medium Active CN112882911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110137565.4A CN112882911B (en) 2021-02-01 2021-02-01 Abnormal performance behavior detection method, system, device and storage medium


Publications (2)

Publication Number Publication Date
CN112882911A true CN112882911A (en) 2021-06-01
CN112882911B CN112882911B (en) 2023-12-29


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104125085A (en) * 2013-04-27 2014-10-29 ***通信集团黑龙江有限公司 EBS (Enterprise Service Bus) data management and control method and device
CN109213854A (en) * 2018-09-05 2019-01-15 平安科技(深圳)有限公司 Knowledge mapping approaches to IM, device, computer equipment and storage medium
CN109992440A (en) * 2019-04-02 2019-07-09 北京睿至大数据有限公司 A kind of IT root accident analysis recognition methods of knowledge based map and machine learning
CN110059775A (en) * 2019-05-22 2019-07-26 湃方科技(北京)有限责任公司 Rotary-type mechanical equipment method for detecting abnormality and device
CN110888788A (en) * 2019-10-16 2020-03-17 平安科技(深圳)有限公司 Anomaly detection method and device, computer equipment and storage medium
CN111061620A (en) * 2019-12-27 2020-04-24 福州林科斯拉信息技术有限公司 Intelligent detection method and detection system for server abnormity of mixed strategy
CN111158977A (en) * 2019-12-12 2020-05-15 深圳前海微众银行股份有限公司 Abnormal event root cause positioning method and device
CN111177284A (en) * 2019-12-31 2020-05-19 清华大学 Emergency plan model generation method, device and equipment
CN111191041A (en) * 2019-11-22 2020-05-22 腾讯云计算(北京)有限责任公司 Characteristic data acquisition method, data storage method, device, equipment and medium
WO2020244262A1 (en) * 2019-06-05 2020-12-10 厦门邑通软件科技有限公司 Device fault intelligent monitoring method based on event graph technology
CN112188531A (en) * 2019-07-01 2021-01-05 ***通信集团浙江有限公司 Abnormality detection method, abnormality detection device, electronic apparatus, and computer storage medium
CN112187514A (en) * 2020-09-02 2021-01-05 上海御威通信科技有限公司 Intelligent operation and maintenance system, method and terminal for data center network equipment
CN112241424A (en) * 2020-10-16 2021-01-19 中国民用航空华东地区空中交通管理局 Air traffic control equipment application system and method based on knowledge graph
CN112269901A (en) * 2020-09-14 2021-01-26 合肥中科类脑智能技术有限公司 Fault distinguishing and reasoning method based on knowledge graph

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268891A (en) * 2021-06-30 2021-08-17 云智慧(北京)科技有限公司 Modeling method and device of operation and maintenance system
CN113448806A (en) * 2021-06-30 2021-09-28 平安证券股份有限公司 Database cluster anomaly detection method and device, terminal device and storage medium
CN113268891B (en) * 2021-06-30 2022-06-03 云智慧(北京)科技有限公司 Modeling method and device of operation and maintenance system
CN114826718A (en) * 2022-04-19 2022-07-29 中国人民解放军战略支援部队航天工程大学 Multi-dimensional information-based internal network anomaly detection method and system
CN114826718B (en) * 2022-04-19 2022-11-04 中国人民解放军战略支援部队航天工程大学 Multi-dimensional information-based internal network anomaly detection method and system
CN116644975A (en) * 2023-07-27 2023-08-25 山东卓越精工集团有限公司 Intelligent supervision method and system for anti-collision hidden engineering construction
CN116644975B (en) * 2023-07-27 2023-10-10 山东卓越精工集团有限公司 Intelligent supervision method and system for anti-collision hidden engineering construction
CN117454299A (en) * 2023-12-21 2024-01-26 深圳市研盛芯控电子技术有限公司 Abnormal node monitoring method and system
CN117454299B (en) * 2023-12-21 2024-03-26 深圳市研盛芯控电子技术有限公司 Abnormal node monitoring method and system

Also Published As

Publication number Publication date
CN112882911B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN112882911B (en) Abnormal performance behavior detection method, system, device and storage medium
US9921937B2 (en) Behavior clustering analysis and alerting system for computer applications
KR102522005B1 (en) Apparatus for VNF Anomaly Detection based on Machine Learning for Virtual Network Management and a method thereof
US10452458B2 (en) Computer performance prediction using search technologies
Guo et al. Tracking probabilistic correlation of monitoring data for fault detection in complex systems
US9870294B2 (en) Visualization of behavior clustering of computer applications
US20100153431A1 (en) Alert triggered statistics collections
Jiang et al. Efficient fault detection and diagnosis in complex software systems with information-theoretic monitoring
US8903757B2 (en) Proactive information technology infrastructure management
US20090307347A1 (en) Using Transaction Latency Profiles For Characterizing Application Updates
Fu et al. Performance issue diagnosis for online service systems
WO2015110873A1 (en) Computer performance prediction using search technologies
CN114327964A (en) Method, device, equipment and storage medium for processing fault reasons of service system
KR102410151B1 (en) Method, apparatus and computer-readable medium for machine learning based observation level measurement using server system log and risk calculation using thereof
US9397921B2 (en) Method and system for signal categorization for monitoring and detecting health changes in a database system
CN114661568A (en) Abnormal operation behavior detection method, device, equipment and storage medium
Saluja et al. Optimized approach for antipattern detection in service computing architecture
CN111865899B (en) Threat-driven cooperative acquisition method and device
US20230333971A1 (en) Workload generation for optimal stress testing of big data management systems
CN116661954B (en) Virtual machine abnormality prediction method, device, communication equipment and storage medium
US20230306343A1 (en) Business process management system and method thereof
Chuah et al. Failure diagnosis for cluster systems using partial correlations
Berry et al. An approach to detecting changes in the factors affecting the performance of computer systems
Lyu et al. Alarm-Based Root Cause Analysis Based on Weighted Fault Propagation Topology for Distributed Information Network
CN117520040B (en) Micro-service fault root cause determining method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant