CN115604089A - Network fault positioning method and device - Google Patents

Info

Publication number
CN115604089A
Authority
CN
China
Prior art keywords
data
delay
data traffic
analysis
network fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210507852.4A
Other languages
Chinese (zh)
Inventor
余学山
赵耀
杨飘飘
陈镛先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210507852.4A
Publication of CN115604089A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the present application provide a network fault location method and device that can be used in the financial field. The method comprises: collecting data traffic forwarded by an edge access switch, performing message analysis, and determining data traffic that may be congested; judging whether the possibly congested data traffic is a micro-burst data flow and, if not, performing relevance matching and life cycle identification on the round-trip feature messages of the data traffic; and, after the relevance matching and life cycle identification are completed, performing delay analysis on the data traffic to determine the network fault location. The method and device can accurately delimit the cause of a network fault.

Description

Network fault positioning method and device
Technical Field
The present application relates to the field of data processing, can also be applied in the financial field, and in particular relates to a network fault location method and device.
Background
Data center storage services are generally exposed to the risk that network congestion causes heavy packet loss and a drop in service transaction volume across the whole link. Current data centers lack network modeling for storage services, so the relationship between network performance and storage service quality is unclear, and indicators such as network packet loss and throughput can explain only some basic network problems.
Disclosure of Invention
In view of the problems in the prior art, the present application provides a network fault location method and device that can accurately delimit the cause of a network fault.
In order to solve at least one of the above problems, the present application provides the following technical solutions:
In a first aspect, the present application provides a network fault location method, including:
collecting data traffic forwarded by an edge access switch, performing message analysis, and determining data traffic that may be congested;
judging whether the possibly congested data traffic is a micro-burst data flow and, if not, performing relevance matching and life cycle identification on the round-trip feature messages of the data traffic;
after the relevance matching and life cycle identification are completed, performing delay analysis on the data traffic to determine the network fault location.
Further, collecting the data traffic forwarded by the edge access switch, performing message analysis, and determining data traffic that may be congested includes:
monitoring, in real time, the data traffic forwarded by a storage service edge access switch and collecting the message header information of the data traffic;
performing feature analysis on the message header information of the data traffic to determine data traffic that may be congested.
Further, judging whether the possibly congested data traffic is a micro-burst data flow includes:
making the micro-burst judgment on the possibly congested data traffic according to the chip buffer queue depth information read within a set time period.
Further, performing relevance matching and life cycle identification on the round-trip feature messages of the data traffic includes:
performing relevance matching on the round-trip feature messages of the data traffic on the processors of the edge access switches to which the client side and the storage array side are respectively connected;
identifying the end-side data access delay, the data preparation delay, the data transmission delay, and the data confirmation delay in the life cycle of the data traffic.
Further, performing delay analysis on the data traffic after the relevance matching and life cycle identification are completed includes:
classifying the data traffic as a large flow or a small flow according to the relevance matching result of the round-trip feature messages;
if the data traffic is a small flow, uploading it to a designated collection server by data mirroring for delay analysis;
if the data traffic is a large flow, preprocessing the data and then uploading it to the designated collection server for delay analysis.
Further, performing delay analysis on the data traffic after the relevance matching and life cycle identification are completed to determine the network fault location includes:
if the delay ratio of the data transmission delay exceeds a threshold, judging that the fault is on the network side, and sending the bandwidth utilization and packet loss rate of the network switch to the network operation and maintenance end for emergency handling;
if the delay ratio of the data preparation delay and/or the data confirmation delay exceeds a threshold, judging that the fault is on the end side, and notifying the system operation and maintenance end to check the client and the storage array.
In a second aspect, the present application provides a network fault location apparatus, including:
an information acquisition module, configured to collect data traffic forwarded by an edge access switch, perform message analysis, and determine data traffic that may be congested;
a data volume association module, configured to judge whether the possibly congested data traffic is a micro-burst data flow and, if not, to perform relevance matching and life cycle identification on the round-trip feature messages of the data traffic;
a delay fault analysis module, configured to perform delay analysis on the data traffic after the relevance matching and life cycle identification are completed, and to determine the network fault location.
Further, the information acquisition module comprises:
a message header information acquisition unit, configured to monitor, in real time, the data traffic forwarded by a storage service edge access switch and to collect the message header information of the data traffic;
a message header feature analysis unit, configured to perform feature analysis on the message header information of the data traffic and determine data traffic that may be congested.
Further, the data volume association module comprises:
a micro-burst judging unit, configured to make the micro-burst judgment on the possibly congested data traffic according to the chip buffer queue depth information read within a set time period.
Further, the data volume association module comprises:
a relevance matching unit, configured to perform relevance matching on the round-trip feature messages of the data traffic on the processors of the edge access switches to which the client side and the storage array side are respectively connected;
a life cycle identification unit, configured to identify the end-side data access delay, the data preparation delay, the data transmission delay, and the data confirmation delay in the life cycle of the data traffic.
Further, the delay fault analysis module includes:
a large/small flow distinguishing unit, configured to classify the data traffic as a large flow or a small flow according to the relevance matching result of the round-trip feature messages;
a small flow uploading unit, configured to upload the data traffic to a designated collection server by data mirroring for delay analysis if the data traffic is a small flow;
a large flow uploading unit, configured to preprocess the data and then upload it to the designated collection server for delay analysis if the data traffic is a large flow.
Further, the delay fault analysis module includes:
a network-side fault location unit, configured to judge that the fault is on the network side if the delay ratio of the data transmission delay exceeds a threshold, and to send the bandwidth utilization and packet loss rate of the network switch to the network operation and maintenance end for emergency handling;
an end-side fault location unit, configured to judge that the fault is on the end side if the delay ratio of the data preparation delay and/or the data confirmation delay exceeds a threshold, and to notify the system operation and maintenance end to check the client and the storage array.
In a third aspect, the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the network fault location method are implemented.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the network fault location method described herein.
In a fifth aspect, the present application provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the network fault location method described.
As can be seen from the above technical solutions, the network fault location method and device provided by the present application collect the data traffic forwarded by the edge access switch and perform message analysis, which makes it possible to accurately distinguish micro-burst data flows, greatly reduces the total amount of collected data, and avoids a data collection bottleneck; the cause of an abnormality is then accurately distinguished and delimited by decomposing the total delay of the data traffic.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of a network fault location method in an embodiment of the present application;
Fig. 2 is a second schematic flowchart of a network fault location method in an embodiment of the present application;
Fig. 3 is a third schematic flowchart of a network fault location method in an embodiment of the present application;
Fig. 4 is a fourth schematic flowchart of a network fault location method in an embodiment of the present application;
Fig. 5 is a fifth schematic flowchart of a network fault location method in an embodiment of the present application;
Fig. 6 is a first structural diagram of a network fault location device in an embodiment of the present application;
Fig. 7 is a second structural diagram of a network fault location device in an embodiment of the present application;
Fig. 8 is a third structural diagram of a network fault location device in an embodiment of the present application;
Fig. 9 is a fourth structural diagram of a network fault location device in an embodiment of the present application;
Fig. 10 is a fifth structural diagram of a network fault location device in an embodiment of the present application;
Fig. 11 is a sixth structural diagram of a network fault location device in an embodiment of the present application;
Fig. 12 is a schematic flowchart of a network fault location method in a specific application example of the present application;
Fig. 13 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.
In the technical solutions of the present application, the acquisition, storage, use, and processing of data comply with the relevant national laws and regulations.
In view of the problem of inaccurate network fault location in the prior art, the present application provides a network fault location method and device. By collecting the data traffic forwarded by the edge access switch and performing message analysis, they accurately distinguish micro-burst data flows, greatly reduce the total amount of collected data, and avoid a data collection bottleneck; the cause of an abnormality is then accurately distinguished and delimited by decomposing the total delay of the data traffic.
In order to accurately delimit the cause of the network fault, the present application provides an embodiment of a network fault location method, and referring to fig. 1, the network fault location method specifically includes the following contents:
Step S101: Collect data traffic forwarded by the edge access switch, perform message analysis, and determine data traffic that may be congested.
Optionally, the data traffic forwarded by the storage service edge access switch may be monitored in real time, the message header information of the data traffic may be collected, and feature analysis may be performed on that header information to determine data traffic that may be congested.
For example, the QoS statistics of the storage service data traffic may be analyzed; if congestion control indications such as ECN, CNP, or PFC appear, the service data IO may be congested.
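The screening rule in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `HeaderSample` record, its field names, and the reading of CNP as a congestion notification packet are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass


@dataclass
class HeaderSample:
    """Hypothetical summary of one collected message header."""
    flow_id: str
    ecn_marked: bool = False  # assumed: ECN congestion mark seen in the header
    is_cnp: bool = False      # assumed: message is a congestion notification packet
    pfc_pause: bool = False   # assumed: a PFC pause was observed for this flow


def possibly_congested_flows(samples):
    """Return the IDs of flows showing at least one congestion indication."""
    flagged = set()
    for s in samples:
        if s.ecn_marked or s.is_cnp or s.pfc_pause:
            flagged.add(s.flow_id)
    return flagged


samples = [
    HeaderSample("io-1"),
    HeaderSample("io-2", ecn_marked=True),
    HeaderSample("io-3", pfc_pause=True),
    HeaderSample("io-2"),
]
print(sorted(possibly_congested_flows(samples)))
```

A flow is flagged as soon as any one of its headers carries an indication, which matches the "may be congested" wording: the result is a candidate list for the later steps, not a final verdict.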
Step S102: Judge whether the possibly congested data traffic is a micro-burst data flow and, if not, perform relevance matching and life cycle identification on the round-trip feature messages of the data traffic.
Optionally, the micro-burst judgment may be made on the possibly congested data traffic according to the chip buffer queue depth information read within a set time period.
For example, if a single peak appears in the queue depth information read in one period and no queue depth peak appears in the adjacent periods, a burst IO has occurred.
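One possible reading of this rule is sketched below in Python. The function name, the list-of-lists layout of the queue depth samples, and the use of a fixed `peak_threshold` to decide what counts as a peak are all assumptions made for illustration; the application does not specify them.

```python
def is_micro_burst(depths_by_period, index, peak_threshold):
    """Return True if period `index` shows a queue-depth peak while the
    adjacent periods do not (assumed interpretation of the rule)."""

    def has_peak(i):
        # Periods outside the observed window are treated as having no peak.
        if i < 0 or i >= len(depths_by_period):
            return False
        return max(depths_by_period[i], default=0) >= peak_threshold

    if not has_peak(index):
        return False
    return not has_peak(index - 1) and not has_peak(index + 1)


# Queue depth samples per read period (units are whatever the chip reports).
isolated = [[3, 4, 2], [5, 90, 4], [2, 3, 3]]
sustained = [[3, 80, 2], [5, 90, 4], [70, 3, 3]]
print(is_micro_burst(isolated, 1, peak_threshold=50))
print(is_micro_burst(sustained, 1, peak_threshold=50))
```

In the first series only the middle period crosses the threshold, so it is treated as a micro-burst; in the second the neighbouring periods also peak, so the traffic goes on to relevance matching and delay analysis.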
Optionally, relevance matching may further be performed on the round-trip feature messages of the data traffic on the processors of the edge access switches to which the client side and the storage array side are respectively connected, and the end-side data access delay, the data preparation delay, the data transmission delay, and the data confirmation delay in the life cycle of the data traffic may be identified.
Specifically, performing relevance matching on the round-trip feature messages of the data traffic means: inserting a unique identification field into the message header, based on the capabilities of the storage network protocol, and matching the round-trip feature messages that belong to one IO operation according to the identification field in the message header.
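The matching step can be illustrated with a short Python sketch. The `Packet` record, the `io_id` field standing in for the unique identification field, and the "request"/"response" direction labels are hypothetical; the actual header layout depends on the storage network protocol and is not given in the application.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    io_id: int         # hypothetical unique identifier carried in the header
    direction: str     # "request" or "response" (assumed labels)
    timestamp_us: int  # capture time in microseconds


def match_round_trips(packets):
    """Pair the request and response messages that carry the same io_id."""
    pending = {}
    pairs = {}
    for p in sorted(packets, key=lambda p: p.timestamp_us):
        if p.direction == "request":
            pending.setdefault(p.io_id, p)
        elif p.direction == "response" and p.io_id in pending:
            pairs[p.io_id] = (pending.pop(p.io_id), p)
    return pairs


packets = [
    Packet(7, "request", 100),
    Packet(8, "request", 120),
    Packet(7, "response", 450),
]
pairs = match_round_trips(packets)
for io_id, (req, resp) in pairs.items():
    print(io_id, resp.timestamp_us - req.timestamp_us)
```

IO 8 has no response yet and therefore stays unmatched. Once messages are paired per IO, the timestamps of the matched messages are what a later stage would use to split the IO life cycle into its delay components.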
Step S103: After the relevance matching and life cycle identification are completed, perform delay analysis on the data traffic to determine the network fault location.
In an embodiment of the present application, the data traffic may be classified as a large flow or a small flow according to the relevance matching result of the round-trip feature messages. If the data traffic is a small flow, it is uploaded to a designated collection server by data mirroring for delay analysis; if the data traffic is a large flow, the data is preprocessed and then uploaded to the designated collection server for delay analysis.
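The two upload paths might look as follows in Python. This is a sketch only: the application does not state how large and small flows are told apart, so the byte-count criterion, the `large_threshold_bytes` default, and the shape of the aggregated record are assumptions introduced here.

```python
def summarize(packet_sizes):
    """Aggregate a large flow into a compact record instead of mirroring it
    (hypothetical form of the preprocessing step)."""
    return {"packets": len(packet_sizes), "bytes": sum(packet_sizes)}


def upload_payload(packet_sizes, large_threshold_bytes=1_000_000):
    """Choose the upload path for one IO flow.

    Assumed criterion: a flow whose total size reaches the threshold is large.
    """
    if sum(packet_sizes) >= large_threshold_bytes:
        return ("preprocessed", summarize(packet_sizes))
    return ("mirrored", list(packet_sizes))


print(upload_payload([512, 1024]))
print(upload_payload([600_000, 600_000]))
```

The point of the split is volume: small flows are cheap to mirror in full, while large flows are reduced on the device so the collection server receives a summary rather than every message.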
Optionally, the delay analysis includes, for example, computing the respective ratio of each identified delay (end-side data access delay, data preparation delay, data transmission delay, and data confirmation delay).
Specifically, if the delay ratio of the data transmission delay exceeds a threshold, the fault is judged to be on the network side, and the bandwidth utilization and packet loss rate of the network switch are sent to the network operation and maintenance end for emergency handling. If the delay ratio of the data preparation delay and/or the data confirmation delay exceeds a threshold, the fault is judged to be on the end side, and the system operation and maintenance end is notified to check the client and the storage array.
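The decision rule can be sketched in Python as follows. It assumes that "delay ratio" means each phase's share of the total IO delay and that a single threshold applies to all phases; the dictionary keys and the default threshold of 0.5 are illustrative values, not figures from the application.

```python
def locate_fault(delays_us, threshold=0.5):
    """Return the sides judged faulty from per-phase delays of one IO.

    delays_us maps the assumed keys "access", "preparation", "transmission",
    and "confirmation" to delays in microseconds.
    """
    total = sum(delays_us.values())
    if total <= 0:
        return []
    ratio = {phase: value / total for phase, value in delays_us.items()}
    findings = []
    if ratio.get("transmission", 0.0) > threshold:
        findings.append("network_side")
    if (ratio.get("preparation", 0.0) > threshold
            or ratio.get("confirmation", 0.0) > threshold):
        findings.append("end_side")
    return findings


slow_network = {"access": 50, "preparation": 100, "transmission": 800, "confirmation": 50}
slow_array = {"access": 50, "preparation": 700, "transmission": 200, "confirmation": 50}
print(locate_fault(slow_network))
print(locate_fault(slow_array))
```

In the first case transmission accounts for 80% of the total delay, pointing to the network side; in the second, data preparation accounts for 70%, pointing to the client or storage array.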
As can be seen from the above description, the network fault location method provided in the embodiments of the present application collects the data traffic forwarded by the edge access switch and performs message analysis, which makes it possible to accurately distinguish micro-burst data flows, greatly reduces the total amount of collected data, and avoids a data collection bottleneck; the cause of an abnormality is then accurately distinguished and delimited by decomposing the total delay of the data traffic.
In order to determine the data traffic with congestion possibility, in an embodiment of the network fault location method of the present application, referring to fig. 2, the step S101 may further specifically include the following steps:
Step S201: Monitor, in real time, the data traffic forwarded by the storage service edge access switch and collect the message header information of the data traffic.
Step S202: Perform feature analysis on the message header information of the data traffic to determine data traffic that may be congested.
In order to be able to screen the micro-burst data stream, in an embodiment of the network fault location method of the present application, the step S102 may further include the following steps:
Make the micro-burst judgment on the possibly congested data traffic according to the chip buffer queue depth information read within a set time period.
In order to accurately perform association matching and life cycle identification, in an embodiment of the network fault location method according to the present application, referring to fig. 3, the step S102 may further specifically include the following steps:
Step S301: Perform relevance matching on the round-trip feature messages of the data traffic on the processors of the edge access switches to which the client side and the storage array side are respectively connected.
Step S302: Identify the end-side data access delay, the data preparation delay, the data transmission delay, and the data confirmation delay in the life cycle of the data traffic.
In order to accurately perform the delay analysis, in an embodiment of the network fault location method of the present application, referring to fig. 4, the step S103 may further specifically include the following steps:
Step S401: Classify the data traffic as a large flow or a small flow according to the relevance matching result of the round-trip feature messages.
Step S402: If the data traffic is a small flow, upload it to a designated collection server by data mirroring for delay analysis.
Step S403: If the data traffic is a large flow, preprocess the data and then upload it to the designated collection server for delay analysis.
In order to accurately determine the network fault location, in an embodiment of the network fault location method of the present application, referring to fig. 5, the step S103 may further include the following steps:
Step S501: If the delay ratio of the data transmission delay exceeds a threshold, judge that the fault is on the network side, and send the bandwidth utilization and packet loss rate of the network switch to the network operation and maintenance end for emergency handling.
Step S502: If the delay ratio of the data preparation delay and/or the data confirmation delay exceeds a threshold, judge that the fault is on the end side, and notify the system operation and maintenance end to check the client and the storage array.
In order to accurately delimit the cause of the network fault, the present application provides an embodiment of a network fault location apparatus for implementing all or part of the content of the network fault location method, and referring to fig. 6, the network fault location apparatus specifically includes the following content:
The information acquisition module 10 is configured to collect data traffic forwarded by the edge access switch, perform message analysis, and determine data traffic that may be congested.
The data volume association module 20 is configured to judge whether the possibly congested data traffic is a micro-burst data flow and, if not, to perform relevance matching and life cycle identification on the round-trip feature messages of the data traffic.
The delay fault analysis module 30 is configured to perform delay analysis on the data traffic after the relevance matching and life cycle identification are completed, and to determine the network fault location.
As can be seen from the above description, the network fault location device provided in the embodiments of the present application collects the data traffic forwarded by the edge access switch and performs message analysis, which makes it possible to accurately distinguish micro-burst data flows, greatly reduces the total amount of collected data, and avoids a data collection bottleneck; the cause of an abnormality is then accurately distinguished and delimited by decomposing the total delay of the data traffic.
In order to determine the data traffic with the possibility of congestion, in an embodiment of the network fault location apparatus of the present application, referring to fig. 7, the information collection module 10 includes:
The message header information acquisition unit 11 is configured to monitor, in real time, the data traffic forwarded by the storage service edge access switch and to collect the message header information of the data traffic.
The message header feature analysis unit 12 is configured to perform feature analysis on the message header information of the data traffic to determine data traffic that may be congested.
In order to be able to screen the micro-burst data stream, in an embodiment of the network fault location apparatus of the present application, referring to fig. 8, the data volume association module 20 includes:
The micro-burst judging unit 21 is configured to make the micro-burst judgment on the possibly congested data traffic according to the chip buffer queue depth information read within a set time period.
In order to accurately perform association matching and life cycle identification, in an embodiment of the network fault location apparatus of the present application, referring to fig. 9, the data volume association module 20 includes:
The relevance matching unit 22 is configured to perform relevance matching on the round-trip feature messages of the data traffic on the processors of the edge access switches to which the client side and the storage array side are respectively connected.
The life cycle identification unit 23 is configured to identify the end-side data access delay, the data preparation delay, the data transmission delay, and the data confirmation delay in the life cycle of the data traffic.
In order to accurately perform the delay analysis, in an embodiment of the network fault location apparatus of the present application, referring to fig. 10, the delay fault analysis module 30 includes:
The large/small flow distinguishing unit 31 is configured to classify the data traffic as a large flow or a small flow according to the relevance matching result of the round-trip feature messages.
The small flow uploading unit 32 is configured to upload the data traffic to a designated collection server by data mirroring for delay analysis if the data traffic is a small flow.
The large flow uploading unit 33 is configured to preprocess the data and then upload it to the designated collection server for delay analysis if the data traffic is a large flow.
In order to accurately determine the location of the network fault, in an embodiment of the network fault location apparatus of the present application, referring to fig. 11, the delay fault analysis module 30 includes:
The network-side fault location unit 34 is configured to judge that the fault is on the network side if the delay ratio of the data transmission delay exceeds a threshold, and to send the bandwidth utilization and packet loss rate of the network switch to the network operation and maintenance end for emergency handling.
The end-side fault location unit 35 is configured to judge that the fault is on the end side if the delay ratio of the data preparation delay and/or the data confirmation delay exceeds a threshold, and to notify the system operation and maintenance end to check the client and the storage array.
To further explain the present solution, the present application further provides a specific application example of implementing the network fault location method by using the network fault location apparatus, referring to fig. 12, which specifically includes the following contents:
S1: The monitoring and analysis platform monitors, in real time, the IO traffic forwarded by the storage service edge access switch, and collects device logs to observe IO traffic indicators.
S2: The monitoring and analysis platform works together with the switch so that only the header portion of each IO traffic message is collected and sent up, and the feature message information in the message header is analyzed; this greatly reduces the total amount of data to be collected and analyzed.
S3: IO that may be congested is preliminarily screened out through service modeling analysis, and the chip buffer queue depth information read within a period is used to determine whether the IO is micro-burst IO traffic. A micro-burst is a normal phenomenon; if the IO is not a micro-burst, further judgment is needed.
S3a: Relevance matching is performed on the IO round-trip feature messages on the processors of the edge access switches to which the client side and the storage array side are respectively connected, and each phase of the IO life cycle is identified (end-side data access delay, data preparation delay, data transmission delay, and data confirmation delay).
S3b: Large and small flows are distinguished according to the IO association result. Small IOs are uploaded to the collection server for analysis by mirroring or ERSPAN; large IOs are preprocessed on the device (feature extraction, aggregation, and the like) and then sent to the collection server for analysis.
S3c: The analysis server completes the decomposition of the total IO delay.
S4: The data transmission delay is related to bandwidth, network delay, and the like, while the data preparation delay and the data confirmation delay are related to the client and the storage array; on this basis, it is judged whether the abnormal delay occurs on the network side or on the end side.
S5: According to the result of S4, if the data transmission delay is abnormal, that is, the network side is abnormal, the network operation and maintenance personnel check indicators such as the bandwidth utilization and packet loss rate of the network switch and carry out emergency handling.
S6: According to the result of S4, if the data preparation delay and the data confirmation delay are abnormal, the system operation and maintenance personnel check the client and the storage array and carry out targeted handling.
S7: The delimitation of the cause of the storage service abnormality is complete.
As can be seen from the above, the present application can achieve at least the following technical effects:
1. A service network model is built, and micro-burst data flows and large and small IO data flows are accurately distinguished.
2. Collecting and analyzing only the IO message header information greatly reduces the total amount of collected data and avoids a data collection bottleneck.
3. The cause of a storage service abnormality is distinguished and delimited by decomposing the total delay of the IO data flow.
In terms of hardware, in order to accurately delimit the cause of the network fault, the present application provides an embodiment of an electronic device for implementing all or part of contents in the network fault location method, where the electronic device specifically includes the following contents:
a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface communicate with one another through the bus; the communication interface is used for information transmission between the network fault location device and related equipment such as a core service system, a user terminal, and a related database; the logic controller may be a desktop computer, a tablet computer, a mobile terminal, or the like, but the embodiment is not limited thereto. In this embodiment, the logic controller may be implemented with reference to the embodiments of the network fault location method and of the network fault location device described above, the contents of which are incorporated herein; repeated descriptions are omitted.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a personal digital assistant (PDA), a vehicle-mounted device, a smart wearable device, and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, and the like.
In practical applications, part of the network fault location method may be performed at the electronic device side as described above, or all operations may be performed in the client device. The selection may be specifically performed according to the processing capability of the client device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. The client device may further include a processor if all operations are performed in the client device.
The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
Fig. 13 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 13, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Note that fig. 13 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications or other functions.
In one embodiment, the functions of the network fault location method may be integrated into the central processor 9100. The central processor 9100 may be configured to perform the following control:
Step S101: Collect data traffic forwarded by the edge access switch, perform message analysis, and determine the data traffic with possible congestion.
Step S102: Judge whether the data traffic with possible congestion is micro-burst data traffic, and if not, perform relevance matching and life cycle identification on the round-trip characteristic messages of the data traffic.
Step S103: After the relevance matching and the life cycle identification are completed, perform delay analysis on the data traffic to determine the network fault position.
As can be seen from the above description, the electronic device provided in the embodiment of the present application collects the data traffic forwarded by the edge access switch and performs message analysis, so that micro-burst data traffic is accurately distinguished, the total amount of collected data is greatly reduced, a data collection bottleneck is avoided, and the cause of the abnormality is accurately distinguished and delimited by decomposing the total delay of the data traffic.
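The micro-burst judgment in step S102 is, according to claim 3, based on chip cache queue depth information read within a set time period. The following is a minimal sketch of one possible criterion, not the criterion of the application: it assumes that a micro-burst shows up as a short run of samples above a depth threshold, whereas sustained congestion shows up as a long run. The function name `is_micro_burst` and both parameters are hypothetical.

```python
from typing import Sequence


def is_micro_burst(depths: Sequence[int],
                   depth_threshold: int,
                   max_burst_samples: int) -> bool:
    """Judge whether queue-depth samples from one period look like a micro-burst.

    Assumed criterion: the depth exceeds depth_threshold at least once, and
    the longest consecutive run above the threshold lasts no more than
    max_burst_samples samples.
    """
    longest = 0  # longest run of consecutive samples above the threshold
    run = 0      # length of the current run
    for depth in depths:
        run = run + 1 if depth > depth_threshold else 0
        longest = max(longest, run)
    return 0 < longest <= max_burst_samples
```

Traffic for which this returns `False` while the queue is still deep would, under these assumptions, proceed to the relevance matching and life cycle identification of step S102.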
In another embodiment, the network fault locating device may be configured separately from the central processor 9100, for example, the network fault locating device may be configured as a chip connected to the central processor 9100, and the network fault locating method function is implemented by the control of the central processor.
As shown in fig. 13, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 13; in addition, the electronic device 9600 may further include components not shown in fig. 13, which can be referred to in the prior art.
As shown in fig. 13, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store fault-related information as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, or the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid state memory, e.g., Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be provided with more data; an example of such a memory is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which is used for storing application programs and function programs, or flows by which the central processor 9100 executes operations of the electronic device 9600.
The memory 9140 can also include a data store 9143, the data store 9143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps of the network fault location method in the foregoing embodiments, in which the execution subject is the server or the client. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, all the steps of that network fault location method are implemented. For example, when the processor executes the computer program, the following steps are implemented:
Step S101: Collect data traffic forwarded by the edge access switch, perform message analysis, and determine the data traffic with possible congestion.
Step S102: Judge whether the data traffic with possible congestion is micro-burst data traffic, and if not, perform relevance matching and life cycle identification on the round-trip characteristic messages of the data traffic.
Step S103: After the relevance matching and the life cycle identification are completed, perform delay analysis on the data traffic to determine the network fault position.
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application collects the data traffic forwarded by the edge access switch and performs message analysis, so that micro-burst data traffic is accurately distinguished, the total amount of collected data is greatly reduced, a data collection bottleneck is avoided, and the cause of the abnormality is accurately distinguished and delimited by decomposing the total delay of the data traffic.
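The relevance matching of round-trip characteristic messages in step S102 can be pictured as pairing each request header with the reply header that travels in the opposite direction. The sketch below is only an illustration under stated assumptions: it assumes that every header carries a per-IO identifier and a capture timestamp, and the names `Header`, `io_id`, `is_reply`, and `match_round_trips` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Header:
    src: str           # sender address
    dst: str           # receiver address
    io_id: int         # hypothetical per-IO identifier carried in the header
    is_reply: bool     # True for the returning message
    timestamp_us: int  # capture time in microseconds (assumed unit)


def match_round_trips(headers: List[Header]) -> List[Tuple[Header, Header]]:
    """Pair each request with the reply that reverses its direction."""
    pending: Dict[Tuple[str, str, int], Header] = {}
    pairs: List[Tuple[Header, Header]] = []
    for h in sorted(headers, key=lambda x: x.timestamp_us):
        if not h.is_reply:
            # Remember the request until its reply is seen.
            pending[(h.src, h.dst, h.io_id)] = h
        else:
            # A reply matches a request with swapped endpoints and the same id.
            request = pending.pop((h.dst, h.src, h.io_id), None)
            if request is not None:
                pairs.append((request, h))
    return pairs
```

The timestamp difference within each pair gives one round-trip interval; splitting it into the four delay components of the life cycle would need additional capture points that this sketch does not model.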
Embodiments of the present application further provide a computer program product capable of implementing all the steps of the network fault location method in the foregoing embodiments, in which the execution subject is the server or the client. When executed by a processor, the computer program/instructions implement the steps of the network fault location method, for example the following steps:
Step S101: Collect data traffic forwarded by the edge access switch, perform message analysis, and determine the data traffic with possible congestion.
Step S102: Judge whether the data traffic with possible congestion is micro-burst data traffic, and if not, perform relevance matching and life cycle identification on the round-trip characteristic messages of the data traffic.
Step S103: After the relevance matching and the life cycle identification are completed, perform delay analysis on the data traffic to determine the network fault position.
As can be seen from the above description, the computer program product provided in the embodiment of the present application collects the data traffic forwarded by the edge access switch and performs message analysis, so that micro-burst data traffic is accurately distinguished, the total amount of collected data is greatly reduced, a data collection bottleneck is avoided, and the cause of the abnormality is accurately distinguished and delimited by decomposing the total delay of the data traffic.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method for locating a network fault, the method comprising:
collecting data traffic forwarded by an edge access switch, performing message analysis, and determining the data traffic with possible congestion;
judging whether the data traffic with possible congestion is micro-burst data traffic, and if not, performing relevance matching and life cycle identification on the round-trip characteristic messages of the data traffic;
and after the relevance matching and the life cycle identification are completed, performing delay analysis on the data traffic to determine the network fault position.
2. The method according to claim 1, wherein the collecting data traffic forwarded by the edge access switch, performing message analysis, and determining the data traffic with possible congestion comprises:
monitoring, in real time, data traffic forwarded by a storage service edge access switch and acquiring message header information of the data traffic;
and performing characteristic analysis on the message header information of the data traffic to determine the data traffic with possible congestion.
3. The method according to claim 1, wherein the judging whether the data traffic with possible congestion is micro-burst data traffic comprises:
performing micro-burst data traffic judgment on the data traffic with possible congestion according to chip cache queue depth information read within a set time period.
4. The method according to claim 1, wherein the performing relevance matching and life cycle identification on the round-trip characteristic messages of the data traffic comprises:
performing relevance matching on the round-trip characteristic messages of the data traffic on the edge access switch processors accessed at the client side and at the storage array side, respectively;
and identifying the end-side data access delay, the data preparation delay, the data transmission delay, and the data confirmation delay in the life cycle of the data traffic.
5. The method according to claim 1, wherein the performing delay analysis on the data traffic after the relevance matching and the life cycle identification are completed comprises:
distinguishing whether the data traffic is of a large-traffic type or a small-traffic type according to the relevance matching result of the round-trip characteristic messages;
if the data traffic is of the small-traffic type, sending the data traffic to a set acquisition server by data mirroring for delay analysis;
and if the data traffic is of the large-traffic type, preprocessing the data and then uploading it to the set acquisition server for delay analysis.
6. The method according to claim 4, wherein the performing delay analysis on the data traffic after the relevance matching and the life cycle identification are completed, to determine the network fault position, comprises:
if the delay ratio of the data transmission delay exceeds a threshold value, judging that the network side has a fault, and sending the bandwidth utilization rate and the packet loss rate of the network switch to a network operation and maintenance end for emergency handling;
and if the delay ratio of the data preparation delay and/or the data confirmation delay exceeds a threshold value, judging that the end side has a fault, and notifying the system operation and maintenance end to check the client and the storage array.
7. A network fault location device, comprising:
an information acquisition module, used for collecting data traffic forwarded by the edge access switch, performing message analysis, and determining the data traffic with possible congestion;
a data volume correlation module, used for judging whether the data traffic with possible congestion is micro-burst data traffic, and if not, performing relevance matching and life cycle identification on the round-trip characteristic messages of the data traffic;
and a delay fault analysis module, used for performing delay analysis on the data traffic after the relevance matching and the life cycle identification are completed, and determining the network fault position.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the network fault location method of any of claims 1 to 6 are implemented when the program is executed by the processor.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the network fault localization method of any one of claims 1 to 6.
10. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the network fault localization method of any one of claims 1 to 6.
CN202210507852.4A 2022-05-11 2022-05-11 Network fault positioning method and device Pending CN115604089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210507852.4A CN115604089A (en) 2022-05-11 2022-05-11 Network fault positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210507852.4A CN115604089A (en) 2022-05-11 2022-05-11 Network fault positioning method and device

Publications (1)

Publication Number Publication Date
CN115604089A true CN115604089A (en) 2023-01-13

Family

ID=84842115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210507852.4A Pending CN115604089A (en) 2022-05-11 2022-05-11 Network fault positioning method and device

Country Status (1)

Country Link
CN (1) CN115604089A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118101526A (en) * 2024-04-24 2024-05-28 沈阳蓝巨人网络科技有限公司 Safety monitoring method and system based on information communication technology

Similar Documents

Publication Publication Date Title
EP3780702A1 (en) Method and device for monitoring network data
CN107294808B (en) Interface test method, device and system
CN111740860A (en) Log data transmission link monitoring method and device
CN111181800B (en) Test data processing method and device, electronic equipment and storage medium
CN111782470A (en) Distributed container log data processing method and device
CN109947585A (en) The processing method and processing device of PCIE device failure
CN111611129B (en) Performance monitoring method and device of PaaS cloud platform
CN109962827B (en) Equipment link detection method, device, equipment and readable storage medium
CN105391573A (en) Data acquisition system, data analysis system and monitoring analysis system based on intelligent terminal
CN112564954A (en) Network quality prediction method and device
CN107306200B (en) Network fault early warning method and gateway for network fault early warning
CN115604089A (en) Network fault positioning method and device
CN113190516B (en) Data synchronization monitoring method and device
CN111782473A (en) Distributed log data processing method, device and system
CN103517292A (en) Mobile terminal information reporting method and apparatus
CN114253710A (en) Processing method of computing request, intelligent terminal, cloud server, equipment and medium
CN103078905A (en) Communication management method of GPS (Global Position System) terminal
CN111741007A (en) Financial business real-time monitoring system and method based on network layer message analysis
CN115080363B (en) System capacity evaluation method and device based on service log
CN116260747A (en) Monitoring method and device of terminal test equipment and electronic equipment
CN112217944B (en) Online ticket processing method, device, equipment and storage medium
CN103269379A (en) Water source information acquisition system and water source information uploading method
CN111553497A (en) Equipment working state detection method and device of multimedia terminal
CN214014540U (en) Wireless network test system
CN112101810A (en) Risk event control method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination