CN117596126B - Monitoring method for high-speed network abnormality in high-performance cluster - Google Patents

Publication number
CN117596126B
Authority
CN
China
Prior art keywords
cluster
representing
cluster node
domain
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410079549.8A
Other languages
Chinese (zh)
Other versions
CN117596126A (en)
Inventor
戴煜
刘翀
康浩鹏
张家杰
姚胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Advanced Computing Center Operations Management Co ltd
Original Assignee
Hefei Advanced Computing Center Operations Management Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Advanced Computing Center Operations Management Co ltd
Priority to CN202410079549.8A
Publication of CN117596126A
Application granted
Publication of CN117596126B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Abstract

The invention relates to the technical field of distributed computing and discloses a monitoring method for high-speed network abnormality in a high-performance cluster. The method comprises the following steps: acquiring information of cluster nodes with abnormal network connection; acquiring information of the parallel communication domain in which a first cluster node is located; generating domain relations according to that information, a domain relation being generated for two cluster nodes if they communicate with each other within one parallel communication domain; generating a characterization feature for each cluster node; and inputting the characterization features of the cluster nodes into an anomaly identification model, which outputs a result representing the type of network abnormality cause of the first cluster node. By training a model to learn the patterns in which network abnormalities of cluster nodes occur in each layer of communication domains, the invention identifies the cause of a cluster node's network abnormality, so that a targeted response and adjustment can be made in time and the service response speed of the high-performance computing cluster is stabilized.

Description

Monitoring method for high-speed network abnormality in high-performance cluster
Technical Field
The invention relates to the technical field of distributed computing, in particular to a monitoring method for high-speed network abnormality in a high-performance cluster.
Background
A high-performance computing cluster is an architecture that uses a plurality of computers for parallel computing, with messages passed among nodes through a cross-language communication protocol. As cluster scale and cluster performance continue to grow, network delay has an increasingly large influence on the high-performance computing cluster. In a typical existing approach, the network state of a node is judged from the delay of data packets in a log; for example, the invention with publication number CN115002001B discloses a method, a device, equipment and a readable medium for detecting sub-health of a cluster network, in which nodes with an abnormal network state are switched according to the delay and the packet loss rate of data transmission;
such general methods can only judge whether a node of the high-performance computing cluster has a network abnormality and cannot analyze its cause. Operation and maintenance personnel must check logs and perform debugging analysis, and identifying the specific cause takes a great deal of time, so to keep the cluster running normally the abnormality can only be handled by switching the node. Because the cause remains unidentified, however, the problem may persist after the switch, leading to frequent node switching and degrading the service response speed of the high-performance computing cluster.
Disclosure of Invention
The invention provides a monitoring method for high-speed network abnormality in a high-performance cluster, which solves the technical problem in the related art that, because the cause of a network abnormality cannot be identified, the abnormality can only be handled by switching nodes, which degrades the service response speed of the high-performance computing cluster.
The invention provides a monitoring method for high-speed network abnormality in a high-performance cluster, which comprises the following steps:
step 101, obtaining information of cluster nodes with abnormal network connection, defining the cluster nodes with abnormal network connection as first cluster nodes, and defining the cluster nodes outside the first cluster nodes as second cluster nodes;
step 102, obtaining information of a parallel communication domain where a first cluster node is located;
step 103, generating a domain relation according to the information of the parallel communication domain where the first cluster node is located, and if a relation of mutual communication exists between two cluster nodes in one parallel communication domain, generating the domain relation for the two cluster nodes;
step 104, generating a characterization feature for each cluster node;
step 105, inputting the characterization features of the cluster nodes into an anomaly identification model, and outputting a result representing the type of network anomaly cause of the first cluster node.
Further, a network connection abnormality of a cluster node is defined as the time taken to establish a network connection exceeding a set first time.
Further, the characterization feature of a cluster node is generated by encoding information of the subtasks performed by that cluster node.
Further, the anomaly identification model includes:
the feature fusion layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fusion feature of the i-th cluster node of the k-th parallel communication domain; the characterization features of the i-th and j-th cluster nodes of the k-th parallel communication domain; the first weight parameter and the second weight parameter; the weight vector of the first hidden layer; the set of cluster nodes that have a domain relation edge with the i-th cluster node in the k-th parallel communication domain; an activation function; the fusion weight between the i-th and j-th cluster nodes of the k-th parallel communication domain; the intermediate features of the i-th and j-th cluster nodes of the k-th parallel communication domain; the exponential function with the natural constant as its base; vector concatenation; and T, which denotes transposition;
the cross-domain fusion layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fusion features of the first cluster node in the k-th and h-th parallel communication domains; M, the total number of parallel communication domains in which the first cluster node is located; the characterization feature of the first cluster node; the first bias parameter; the third weight parameter; and the cross-domain fusion feature of the first cluster node;
the output layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fourth weight parameter; the second bias parameter; and the output vector.
Further, each component of the output vector corresponds to one network abnormality cause type of the first cluster node, and the cause type represented by the largest component of the output vector is used as the output result.
The invention provides a monitoring system for high-speed network abnormality in a high-performance cluster, which comprises:
the abnormality identification module is used for identifying cluster nodes with abnormal network connection;
the information acquisition module is used for acquiring information of cluster nodes with abnormal network connection and acquiring information of parallel communication domains where the first cluster nodes are located;
the domain relation generating module generates a domain relation according to the information of the parallel communication domain where the first cluster node is located;
a characterization module for generating a characterization feature for each cluster node;
and the pattern recognition module is used for inputting the characterization characteristics of the cluster nodes into the anomaly recognition model and outputting a result representing the type of network anomaly cause of the first cluster node.
Further, the types of network anomaly causes of the first cluster node include communication domain allocation errors and excessive CPU usage.
A communication domain allocation error means that a process corresponding to the first cluster node performs a certain service, but the parallel communication domain allocated to the cluster nodes organized by that service does not contain the first cluster node;
excessively high CPU usage means that the first cluster node uses a virtual network service, and an excessive number of data packets causes the CPU usage of running the virtual network service to become too high.
Further, the monitoring system further comprises an abnormal node control module, which executes a corresponding control policy to control the cluster nodes based on the network abnormality cause type of the first cluster node.
Further, if the network abnormality cause type of the first cluster node is excessively high CPU usage, one of the following policies is adopted: transferring part of the subtasks of the first cluster node to a second cluster node until the network connection abnormality of the first cluster node disappears; or
transferring all subtasks of the first cluster node to a second cluster node with higher CPU processing performance;
if the network abnormality cause type of the first cluster node is a communication domain allocation error, the following policy is adopted: registering the first cluster node in the parallel communication domain organized by the service performed by the corresponding process.
The present invention provides a storage medium storing non-transitory computer readable instructions that, when executed by a computer, are capable of performing the steps of a method of monitoring for high-speed network anomalies in a high-performance cluster as described above.
The invention has the following beneficial effects: by training a model to learn the patterns in which network abnormalities of cluster nodes occur in each layer of communication domains, the invention identifies the cause of a cluster node's network abnormality, so that a targeted response and adjustment can be made in time and the service response speed of the high-performance computing cluster is stabilized.
Drawings
FIG. 1 is a flow chart of a method of monitoring for high speed network anomalies in a high performance cluster in accordance with the present invention;
FIG. 2 is a schematic diagram of a monitoring system for high-speed network anomalies in a high-performance cluster according to one embodiment of the present invention;
FIG. 3 is a second module schematic diagram of the monitoring system for high-speed network anomalies in a high-performance cluster according to the present invention.
In the figures: anomaly identification module 201, information acquisition module 202, domain relation generation module 203, characterization module 204, pattern recognition module 205, abnormal node control module 206.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It is to be understood that these embodiments are merely discussed so that those skilled in the art may better understand and implement the subject matter described herein and that changes may be made in the function and arrangement of the elements discussed without departing from the scope of the disclosure herein. Various examples may omit, replace, or add various procedures or components as desired. In addition, features described with respect to some examples may be combined in other examples as well.
In at least one embodiment of the present invention, a method for monitoring a high-speed network anomaly in a high-performance cluster is provided, as shown in fig. 1, including the following steps:
step 101, obtaining information of cluster nodes with abnormal network connection, defining the cluster nodes with abnormal network connection as first cluster nodes, and defining the cluster nodes outside the first cluster nodes as second cluster nodes;
here, a network connection abnormality of a cluster node is defined as the time taken to establish a network connection exceeding a set first time. For a high-performance computing cluster, the processing response time of a cluster node is typically about 100 ms, so the first time can be set to 10 ms; by the barrel (weakest-link) effect, a node that exceeds it has a large impact on the performance of the whole cluster.
The time taken to establish a network connection is typically reflected by the delay of data packets recorded in a log;
for sporadic network connection abnormalities the delays of individual data packets differ, so to improve robustness the packet delays in a cluster node's log are sorted from largest to smallest, and the delay ranked L-th is taken as the time for establishing the network connection of that cluster node; the default value of L is 3.
Other statistical definitions of delay known in the art may also be used. Note that the delay of a lost packet is generally defined as infinite.
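The rank-based statistic above can be sketched in Python as follows. This is an illustrative sketch only: the function names `connection_time` and `is_first_cluster_node`, the representation of delays as a list of milliseconds, and the fallback used when fewer than L samples exist are assumptions that do not appear in the source. Lost packets are passed as `math.inf`, following the convention that packet loss has infinite delay.

```python
import math


def connection_time(delays_ms, rank=3):
    """Return the rank-th largest packet delay (rank=3 mirrors the default L)."""
    if not delays_ms:
        raise ValueError("at least one delay sample is required")
    # Sort from largest to smallest, as described in the text.
    ordered = sorted(delays_ms, reverse=True)
    # Assumption: with fewer than `rank` samples, use the smallest one available.
    index = min(rank, len(ordered)) - 1
    return ordered[index]


def is_first_cluster_node(delays_ms, first_time_ms=10.0, rank=3):
    """A node counts as abnormal when its connection time exceeds the first time."""
    return connection_time(delays_ms, rank) > first_time_ms


# A lost packet is recorded as an infinite delay.
samples = [5.0, 40.0, 8.0, 12.0, math.inf]
print(connection_time(samples), is_first_cluster_node(samples))
```

With these samples the third-largest delay is 12 ms, which exceeds a first time of 10 ms, so the node would be treated as a first cluster node. Taking the L-th largest value rather than the maximum keeps one or two isolated slow or lost packets from flagging the node.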
Step 102, obtaining information of a parallel communication domain where a first cluster node is located;
a parallel communication domain comprises a group of processes that are run by cluster nodes and that can communicate and cooperate with each other. Because each process corresponds to a cluster node, the relationships among the processes of a parallel communication domain can be mapped to relationships among cluster nodes;
here, the relationships among processes in a parallel communication domain include mutual communication and cooperation.
Step 103, generating a domain relation according to the information of the parallel communication domain where the first cluster node is located, and if a relation of mutual communication exists between two cluster nodes in one parallel communication domain, generating the domain relation for the two cluster nodes;
the first cluster node may belong to several different parallel communication domains, so an ID is defined for each parallel communication domain, and domain relations can be distinguished by the ID of the parallel communication domain from which they originate.
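A minimal Python sketch of step 103 is given below, under stated assumptions: the data layout (a dictionary from domain ID to member nodes, and a dictionary from domain ID to observed communicating pairs) and the name `build_domain_relations` are hypothetical and chosen only for illustration. Each domain relation is tagged with the ID of the parallel communication domain it originates from, so relations from different domains remain distinguishable.

```python
def build_domain_relations(first_node, domains, communications):
    """Collect (domain_id, node_a, node_b) relations for the first node's domains.

    domains:        {domain_id: set of node names in that domain}
    communications: {domain_id: iterable of (node_a, node_b) pairs that communicate}
    """
    relations = set()
    for domain_id, members in domains.items():
        # Only domains in which the first cluster node is located are considered.
        if first_node not in members:
            continue
        for a, b in communications.get(domain_id, ()):
            # Both nodes must belong to the same parallel communication domain.
            if a != b and a in members and b in members:
                low, high = sorted((a, b))  # store each undirected pair once
                relations.add((domain_id, low, high))
    return relations


domains = {"d0": {"n1", "n2", "n3"}, "d1": {"n1", "n4"}, "d2": {"n5", "n6"}}
communications = {
    "d0": [("n2", "n1"), ("n1", "n2"), ("n1", "n9")],
    "d1": [("n4", "n1")],
    "d2": [("n5", "n6")],
}
print(sorted(build_domain_relations("n1", domains, communications)))
```

In this example the pair involving `n9` is dropped because `n9` is not a member of domain `d0`, and domain `d2` is skipped because it does not contain the first cluster node `n1`.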
Step 104, generating a characterization feature for each cluster node;
in one embodiment of the invention, the characterization features are not hand-crafted features and are generated by methods such as one-hot encoding.
In one embodiment of the invention, the characterization features are generated by encoding information of the subtasks performed by the cluster nodes;
specifically, the content information of the subtasks is expressed in text form, and the characterization features are obtained through semantic encoding such as word-vector encoding.
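As a rough illustration of step 104, the sketch below turns subtask descriptions into fixed-length vectors. It is only a stand-in: it uses a simple presence/absence (multi-hot) encoding over a vocabulary rather than the word-vector encoding mentioned above, and the names `build_vocabulary` and `characterize`, the whitespace tokenization, and the example subtask texts are all assumptions.

```python
def build_vocabulary(subtask_texts):
    """Map every token seen in any subtask description to a vector position."""
    tokens = {tok for text in subtask_texts.values() for tok in text.lower().split()}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def characterize(text, vocabulary):
    """Return a multi-hot vector marking which vocabulary tokens occur in the text."""
    feature = [0] * len(vocabulary)
    for tok in text.lower().split():
        if tok in vocabulary:
            feature[vocabulary[tok]] = 1
    return feature


# Hypothetical subtask descriptions keyed by cluster node.
subtasks = {"n1": "matrix multiply", "n2": "matrix transpose"}
vocabulary = build_vocabulary(subtasks)
features = {node: characterize(text, vocabulary) for node, text in subtasks.items()}
print(features)
```

A real deployment would more likely replace `characterize` with a learned semantic encoder, so that subtasks with similar meaning but different wording receive similar features.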
Step 105, inputting the characterization features of the cluster nodes into an anomaly identification model, wherein the anomaly identification model comprises:
the feature fusion layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fusion feature of the i-th cluster node of the k-th parallel communication domain; the characterization features of the i-th and j-th cluster nodes of the k-th parallel communication domain; the first weight parameter and the second weight parameter; the weight vector of the first hidden layer; the set of cluster nodes that have a domain relation edge with the i-th cluster node in the k-th parallel communication domain; an activation function; the fusion weight between the i-th and j-th cluster nodes of the k-th parallel communication domain; the intermediate features of the i-th and j-th cluster nodes of the k-th parallel communication domain; the exponential function with the natural constant as its base; vector concatenation; and T, which denotes transposition;
the cross-domain fusion layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fusion features of the first cluster node in the k-th and h-th parallel communication domains; M, the total number of parallel communication domains in which the first cluster node is located; the characterization feature of the first cluster node; the first bias parameter; the third weight parameter; and the cross-domain fusion feature of the first cluster node;
the output layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fourth weight parameter; the second bias parameter; and the output vector.
Each component of the output vector corresponds to one network abnormality cause type of the first cluster node, and the cause type represented by the largest component of the output vector is used as the output result.
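The selection of the output result can be sketched as a simple arg-max over the output vector. The ordering of cause types in `CAUSE_TYPES`, the label strings, and the function name `decode_cause` are assumptions made for illustration; the source only states that each component corresponds to one cause type.

```python
# Assumed ordering of output components; the source does not fix an order.
CAUSE_TYPES = ("communication_domain_allocation_error", "cpu_usage_too_high")


def decode_cause(output_vector, cause_types=CAUSE_TYPES):
    """Return the cause type whose component of the output vector is largest."""
    if len(output_vector) != len(cause_types):
        raise ValueError("output vector and cause type list differ in length")
    # Index of the largest component; ties resolve to the earliest index.
    best = max(range(len(output_vector)), key=output_vector.__getitem__)
    return cause_types[best]


print(decode_cause([0.2, 0.8]))
```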
The trained anomaly identification model has the capability to output results representative of the type of network anomaly cause for the first cluster node.
If there is more than one first cluster node, they are processed serially or in parallel: each pass (or each channel) feeds the fusion features of only one first cluster node into the cross-domain fusion layer and outputs the network abnormality cause type of only that first cluster node.
In one embodiment of the present invention, to accommodate a dynamic high performance computer cluster, another anomaly identification model is provided, and the calculation formula of the output layer is as follows:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fourth weight parameter; the second bias parameter; the output vector; a vector concatenation function; and the service feature of the first cluster node;
the service feature is a representation of the service in which the first cluster node participates, and the service can also be represented by descriptive text or codes, and the service feature can be obtained by means of word vectors or other semantic codes.
The weight parameters and the bias parameters are obtained through back-propagation updates, following the usual neural network training procedure.
At least one embodiment of the present invention provides a storage medium storing non-transitory computer readable instructions that, when executed by a computer, are capable of performing the steps of a method for monitoring high-speed network anomalies in a high-performance cluster as described above.
At least one embodiment of the present invention provides a monitoring system for high-speed network anomalies in a high-performance cluster, as shown in fig. 2, including:
an anomaly identification module 201 for identifying cluster nodes in which network connection anomalies occur;
an information obtaining module 202, configured to obtain information of cluster nodes with abnormal network connection, and obtain information of a parallel communication domain where a first cluster node is located;
a domain relation generating module 203, configured to generate a domain relation according to information of a parallel communication domain where the first cluster node is located;
a characterization module 204 for generating characterization features for each cluster node;
the pattern recognition module 205 is configured to input the characterization feature of the cluster node into the anomaly recognition model, and output a result indicating the type of network anomaly cause of the first cluster node.
In one embodiment of the invention, the types of network anomaly causes for the first cluster node include communication domain allocation errors and CPU usage that is too high.
A communication domain allocation error means that a process corresponding to the first cluster node performs a certain service, but the parallel communication domain allocated to the cluster nodes organized by that service does not contain the first cluster node;
excessively high CPU usage means that the first cluster node uses a virtual network service, and an excessive number of data packets causes the CPU usage of running the virtual network service to become too high;
of course, the network abnormality cause types of the first cluster node may also include other types, known to those skilled in the art, that indicate other causes of cluster node network abnormality.
It should be noted that insufficient network bandwidth is a mismatch between tasks and allocated resources, which generally does not occur in a normally operating cluster, and that factors attributable to the network service provider lie outside the cluster; neither is counted among the network abnormality causes of the first cluster node.
In one embodiment of the present invention, a monitoring system for high-speed network anomalies in a high-performance cluster is provided, as shown in fig. 3, further including an anomaly node control module 206, which executes a corresponding control policy based on a network anomaly cause type of a first cluster node;
specifically, for excessively high CPU usage, one of the following policies is adopted: transferring part of the subtasks of the first cluster node to a second cluster node until the network connection abnormality of the first cluster node disappears; or
transferring all subtasks of the first cluster node to a second cluster node with higher CPU processing performance;
for a communication domain allocation error, the following policy is adopted: registering the first cluster node in the parallel communication domain organized by the service performed by the corresponding process.
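The control policies above can be sketched as a small dispatcher. Everything here is illustrative: the policy label strings, the `prefer_full_migration` switch, and the `still_abnormal` callback (standing in for re-checking the node's connection time after each transfer) are assumptions, and no real scheduler or communication library API is implied.

```python
def choose_policy(cause_type, prefer_full_migration=False):
    """Map a network abnormality cause type to one of the described policies."""
    if cause_type == "cpu_usage_too_high":
        if prefer_full_migration:
            return "migrate_all_subtasks_to_faster_second_node"
        return "migrate_some_subtasks_until_recovered"
    if cause_type == "communication_domain_allocation_error":
        return "register_node_in_service_domain"
    return "no_automatic_action"  # assumption: unknown causes are left to operators


def migrate_until_recovered(subtasks, still_abnormal):
    """Move subtasks away one at a time until the abnormality check clears."""
    remaining, moved = list(subtasks), []
    while remaining and still_abnormal(remaining):
        moved.append(remaining.pop())
    return remaining, moved


# Hypothetical check: the node recovers once it runs at most two subtasks.
remaining, moved = migrate_until_recovered(["a", "b", "c", "d"], lambda r: len(r) > 2)
print(choose_policy("cpu_usage_too_high"), remaining, moved)
```

Moving subtasks one at a time matches the first policy's stopping condition ("until the abnormality disappears") and avoids transferring more work than necessary.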
The embodiments have been described above, but the invention is not limited to the specific implementations described, which are illustrative rather than restrictive; many variations made by those of ordinary skill in the art with the benefit of this disclosure fall within the scope of the embodiments.

Claims (8)

1. The monitoring method for the high-speed network abnormality in the high-performance cluster is characterized by comprising the following steps of:
step 101, obtaining information of cluster nodes with abnormal network connection, defining the cluster nodes with abnormal network connection as first cluster nodes, and defining the cluster nodes outside the first cluster nodes as second cluster nodes;
step 102, obtaining information of a parallel communication domain where a first cluster node is located;
step 103, generating a domain relation according to the information of the parallel communication domain where the first cluster node is located, and if a relation of mutual communication exists between two cluster nodes in one parallel communication domain, generating the domain relation for the two cluster nodes;
step 104, generating a characterization feature for each cluster node, the characterization feature of a cluster node being generated by encoding information of the subtasks executed by that cluster node;
step 105, inputting the characterization features of the cluster nodes into an anomaly identification model, and outputting a result representing the type of network anomaly cause of the first cluster node;
the anomaly identification model includes:
the feature fusion layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fusion feature of the i-th cluster node of the k-th parallel communication domain; the characterization features of the i-th and j-th cluster nodes of the k-th parallel communication domain; the first weight parameter and the second weight parameter; the weight vector of the first hidden layer; the set of cluster nodes that have a domain relation edge with the i-th cluster node in the k-th parallel communication domain; an activation function; the fusion weight between the i-th and j-th cluster nodes of the k-th parallel communication domain; the intermediate features of the i-th and j-th cluster nodes of the k-th parallel communication domain; the exponential function with the natural constant as its base; vector concatenation; and T, which denotes transposition;
the cross-domain fusion layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fusion features of the first cluster node in the k-th and h-th parallel communication domains; M, the total number of parallel communication domains in which the first cluster node is located; the characterization feature of the first cluster node; the first bias parameter; the third weight parameter; and the cross-domain fusion feature of the first cluster node;
the output layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the quantities in the formula are: the fourth weight parameter; the second bias parameter; and the output vector.
2. The monitoring method for high-speed network abnormality in a high-performance cluster according to claim 1, characterized in that a network connection abnormality of a cluster node is defined as the time taken to establish a network connection exceeding a set first time.
3. The method according to claim 1, wherein each component of the output vector corresponds to one network abnormality cause type of the first cluster node, and the cause type represented by the largest component of the output vector is used as the output result.
4. A monitoring system for high-speed network anomalies in a high-performance cluster, characterized in that it is configured to perform the steps of a method for monitoring for high-speed network anomalies in a high-performance cluster according to any one of claims 1 to 3, comprising:
the abnormality identification module is used for identifying cluster nodes with abnormal network connection;
the information acquisition module is used for acquiring information of cluster nodes with abnormal network connection and acquiring information of parallel communication domains where the first cluster nodes are located;
the domain relation generating module generates a domain relation according to the information of the parallel communication domain where the first cluster node is located;
the characterization module is used for generating characterization features for each cluster node, and encoding the characterization features for the cluster node based on the information of the subtasks executed by the cluster node;
the pattern recognition module is used for inputting the characterization features of the cluster nodes into the anomaly recognition model and outputting a result representing the type of network anomaly cause of the first cluster node;
the anomaly identification model includes:
the feature fusion layer has the following calculation formula:
,/>
wherein the method comprises the steps ofFusion characteristics of the ith cluster node representing the kth parallel communication domain, +.>And->Characterization features of the ith and jth cluster nodes, respectively, representing the kth parallel communication domain,/->And->Respectively representing a first weight parameter and a second weight parameter,/respectively>Weight vector representing the first hidden layer, +.>A set of cluster nodes representing an edge domain relationship with an ith cluster node in a kth parallel communication domain,/v>Representing an activation function->Fusion weights representing the ith and jth cluster nodes of the kth parallel communication domain,/, are->And->Intermediate characteristics of the ith and jth cluster nodes respectively representing the kth parallel communication domain,/, are represented by->Represents an exponential function based on natural constants, < ->Representing vector stitching, T representing transposition;
the cross-domain fusion layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the terms denote, in order:
the fusion features of the first cluster node in the k-th and h-th parallel communication domains, respectively;
M, the total number of parallel communication domains in which the first cluster node is located;
the characterization feature of the first cluster node;
the first bias parameter;
the third weight parameter;
and the cross-domain fusion feature of the first cluster node;
the output layer has the following calculation formula:
[formula not reproduced in the source text]
wherein the terms denote, in order:
the fourth weight parameter;
the second bias parameter;
and the output vector.
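The three layers of the anomaly identification model can be illustrated with a small sketch. The formulas themselves are not present in this text, so every functional form below is an assumption chosen only to be consistent with the listed terms: a graph-attention-style fusion within each parallel communication domain, an attention-weighted combination across the M domains that uses the first cluster node's characterization feature as the query, and a softmax output. The function names, the LeakyReLU and tanh activations, and the way the first and second weight parameters are applied are hypothetical and are not taken from the patent.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v

def feature_fusion(x, neighbors, i, W1, W2, a):
    """Fusion feature of node i inside one parallel communication domain.

    ASSUMED form: z = W1 x (intermediate feature),
    alpha_ij = softmax_j(LeakyReLU(a^T [z_i || z_j])) (fusion weight),
    fusion_i = tanh(sum_j alpha_ij * W2 z_j).
    """
    nbrs = neighbors[i] or [i]          # fall back to the node itself if isolated
    z_i = matvec(W1, x[i])
    z_nb = [matvec(W1, x[j]) for j in nbrs]
    # z_i + z_j is list concatenation, i.e. the vector splice [z_i || z_j]
    alpha = softmax([leaky_relu(dot(a, z_i + z_j)) for z_j in z_nb])
    msgs = [matvec(W2, z_j) for z_j in z_nb]
    return [math.tanh(sum(w * m[d] for w, m in zip(alpha, msgs)))
            for d in range(len(W2))]

def cross_domain_fusion(x_first, fused, W3, b1):
    """Combine the first node's fusion features from its M domains.

    ASSUMED form: beta_k = softmax_k(x^T (W3 f_k + b1)), c = sum_k beta_k f_k.
    """
    scores = [dot(x_first, [p + b for p, b in zip(matvec(W3, f), b1)])
              for f in fused]
    beta = softmax(scores)
    return [sum(b * f[d] for b, f in zip(beta, fused))
            for d in range(len(fused[0]))]

def output_layer(c, W4, b2):
    """ASSUMED form: softmax(W4 c + b2), one entry per anomaly cause type."""
    return softmax([p + b for p, b in zip(matvec(W4, c), b2)])

# Toy data: node "n0" is the first cluster node and belongs to M = 2 domains.
x = {"n0": [1.0, 0.0], "n1": [0.0, 1.0], "n2": [1.0, 1.0]}
domains = [{"n0": ["n1", "n2"]}, {"n0": ["n2"]}]   # neighbor sets per domain
I2 = [[1.0, 0.0], [0.0, 1.0]]                      # identity used for all weights
a = [0.5, -0.5, 0.25, 0.25]

fused = [feature_fusion(x, nb, "n0", I2, I2, a) for nb in domains]
c = cross_domain_fusion(x["n0"], fused, I2, [0.0, 0.0])
y = output_layer(c, I2, [0.0, 0.0])
print(y)
```

With two cause types, the larger entry of `y` would be read as the predicted network anomaly cause type of the first cluster node; in practice the weight and bias parameters would be learned rather than fixed to the identity.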
5. The monitoring system for high-speed network anomalies in a high-performance cluster according to claim 4, wherein the network anomaly cause types of the first cluster node include a communication domain allocation error and excessive CPU usage;
a communication domain allocation error means that a process corresponding to the first cluster node executes a service, and the parallel communication domain allocated to the cluster nodes organized by that service does not contain the first cluster node;
excessive CPU usage means that the first cluster node uses a virtual network service, and an excessive number of data packets causes the CPU usage of running the virtual network service to be too high.
6. The system of claim 4, further comprising an anomaly node control module that executes a corresponding control policy to control the cluster nodes based on the network anomaly cause type of the first cluster node.
7. The system of claim 5, wherein, when the network anomaly cause type of the first cluster node is excessive CPU usage, one of the following policies is adopted:
transferring part of the subtasks of the first cluster node to a second cluster node until the abnormal network connection of the first cluster node disappears; or
transferring all subtasks of the first cluster node to a second cluster node with higher CPU processing performance;
and, when the network anomaly cause type of the first cluster node is a communication domain allocation error, the following policy is adopted: registering the first cluster node in the parallel communication domain organized by the service executed by the corresponding process.
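The control policies above can be sketched as a small dispatch routine. This is a minimal illustration under stated assumptions: `CauseType`, `apply_policy`, and the `is_abnormal` and `migrate` callbacks are hypothetical names, the parallel communication domain is modelled as a plain set of node names, and the toy `is_abnormal` check merely stands in for whatever signal the abnormality identification module provides.

```python
from enum import Enum

class CauseType(Enum):
    DOMAIN_ALLOCATION_ERROR = "communication domain allocation error"
    CPU_USAGE_TOO_HIGH = "excessive CPU usage"

def apply_policy(cause, node, subtasks, domain_members,
                 is_abnormal, migrate, migrate_all=False):
    """Apply the control policy for `cause`; return the list of moved subtasks.

    is_abnormal(): hypothetical callback, True while the node's network
                   connection is still abnormal.
    migrate(task): hypothetical callback that moves one subtask to the
                   second cluster node.
    """
    if cause is CauseType.DOMAIN_ALLOCATION_ERROR:
        # Register the node in the domain organized by its process's service.
        domain_members.add(node)
        return []
    if cause is CauseType.CPU_USAGE_TOO_HIGH:
        moved = []
        for task in list(subtasks):
            # Partial transfer stops as soon as the anomaly disappears;
            # full transfer (migrate_all=True) moves every subtask.
            if not migrate_all and not is_abnormal():
                break
            migrate(task)
            subtasks.remove(task)
            moved.append(task)
        return moved
    raise ValueError(f"unknown cause type: {cause!r}")

# Toy scenario: n0 is overloaded until only one subtask remains on it.
load = {"n0": ["t1", "t2", "t3"], "n1": []}

def migrate(task):
    load["n1"].append(task)

def is_abnormal():
    return len(load["n0"]) > 1   # stand-in for the real anomaly signal

moved = apply_policy(CauseType.CPU_USAGE_TOO_HIGH, "n0", load["n0"],
                     set(), is_abnormal, migrate)

members = {"n1"}
apply_policy(CauseType.DOMAIN_ALLOCATION_ERROR, "n0", [], members,
             is_abnormal, migrate)
print(moved, load, members)
```

Passing `migrate_all=True` corresponds to the second CPU policy, in which every subtask is moved to a second cluster node with higher CPU processing performance; choosing that node is outside the scope of this sketch.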
8. A storage medium storing non-transitory computer readable instructions which, when executed by a computer, are capable of performing the steps of a method of monitoring for high speed network anomalies in a high-performance cluster according to any one of claims 1 to 3.
CN202410079549.8A 2024-01-19 2024-01-19 Monitoring method for high-speed network abnormality in high-performance cluster Active CN117596126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410079549.8A CN117596126B (en) 2024-01-19 2024-01-19 Monitoring method for high-speed network abnormality in high-performance cluster

Publications (2)

Publication Number Publication Date
CN117596126A (en) 2024-02-23
CN117596126B (en) 2024-03-26

Family

ID=89917057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410079549.8A Active CN117596126B (en) 2024-01-19 2024-01-19 Monitoring method for high-speed network abnormality in high-performance cluster

Country Status (1)

Country Link
CN (1) CN117596126B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550139A (en) * 2022-09-19 2022-12-30 China Telecom Corporation Limited Fault root cause positioning method, device and system, electronic equipment and storage medium
CN115865625A (en) * 2022-11-28 2023-03-28 Wuhan Fiberhome Technical Services Co., Ltd. Method and device for analyzing fault root cause of communication equipment
CN116450399A (en) * 2023-06-13 2023-07-18 Xihua University Fault diagnosis and root cause positioning method for micro service system
CN116684253A (en) * 2023-05-24 2023-09-01 University of Electronic Science and Technology of China Network anomaly management and control method based on intelligent operation and maintenance
CN117376092A (en) * 2023-10-24 2024-01-09 China United Network Communications Group Co., Ltd. Fault root cause positioning method, device, equipment and storage medium
CN117376084A (en) * 2022-06-30 2024-01-09 Huawei Technologies Co., Ltd. Fault detection method, electronic equipment and medium thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10193780B2 (en) * 2015-10-09 2019-01-29 Futurewei Technologies, Inc. System and method for anomaly root cause analysis
US11252014B2 (en) * 2019-09-30 2022-02-15 Dynatrace Llc Forming root cause groups of incidents in clustered distributed system through horizontal and vertical aggregation
US11894969B2 (en) * 2021-07-12 2024-02-06 Ciena Corporation Identifying root causes of network service degradation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAD: Topology-Aware Time Series Anomaly Detection; Qi Qi; Shen Runye; Wang Jingyu; Journal on Communications; 2020-06-24 (06); pp. 152-160 *

Also Published As

Publication number Publication date
CN117596126A (en) 2024-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant