CN111211926B

CN111211926B - Communication fault monitoring method and device, storage medium and equipment

Info

Publication number: CN111211926B
Application number: CN201911407462.4A
Authority: CN
Inventors: 齐永杰; 孙艳杰
Original assignee: Hangzhou DPTech Technologies Co Ltd
Current assignee: Hangzhou DPTech Technologies Co Ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2023-01-24
Anticipated expiration: 2039-12-31
Also published as: CN111211926A

Abstract

The specification provides a communication fault monitoring method and device, a storage medium, and equipment. The method is suitable for fault detection in networks built on the Internet protocol stack. It introduces health monitoring of the network layer, the transport layer, and the application layer, applied from bottom to top: when the fault is judged to belong to the current layer, a detection result is output; when it is judged not to belong to the current layer, health monitoring of the layer above is triggered. The faulty layer is thereby located quickly, which determines whether the fault stems from a server anomaly, a network anomaly, or another cause, making it easier for operation and maintenance personnel to clear network faults.

Description

Communication fault monitoring method and device, storage medium and equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a storage medium, and a device for monitoring a communication fault.

Background

As the Internet has entered new stages of development, its scale and the demands placed on it have grown, and the operation and maintenance workload for network devices has increased significantly. Monitoring requirements for network equipment are correspondingly higher: more information is expected, and when a device fails, the cause needs to be known immediately so that the problem can be solved in time. However, when most network devices have problems, operation and maintenance personnel must analyze the fault cause by combining various pieces of information, which wastes time.

Load balancer health monitoring is a technology for monitoring server state in real time, and its result is an important parameter in load balancing scheduling. There are currently dozens of health monitoring technologies, generally named after their protocol type, such as HTTP health monitoring, SIP health monitoring, TCP health monitoring, and ICMP health monitoring. In general, different health monitors target different services of the Internet protocol stack, and the monitoring functions are independent of one another. As a result, although most load balancers currently provide health monitoring, the function can only show whether a monitored real server is normal or failed; it cannot quickly give the cause of the failure. This makes troubleshooting the network less efficient for operation and maintenance personnel.

Disclosure of Invention

In order to overcome the problems in the related art, the present specification provides a method, an apparatus, and a device for monitoring a communication failure.

According to a first aspect of the embodiments of the present specification, there is provided a communication fault monitoring method, which is suitable for fault detection of a network constructed based on an internet protocol stack, and includes the steps of:

when a communication fault occurs in a running application, triggering the fault monitoring event of the current layer, starting from the network layer, wherein the fault monitoring event is implemented based on a health monitoring technology;

if the monitoring result of the fault monitoring event determines that the fault is not a fault of the current layer, triggering the fault monitoring event of the layer above;

if the fault is determined to be a fault of the current layer, outputting a detection result.
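The bottom-up escalation in these steps can be sketched as a loop over per-layer probes. This is a minimal illustration, assuming each layer's fault monitoring event can be modelled as a callable that returns True when the layer is healthy; `LayerMonitor` and `locate_fault` are hypothetical names, not part of the specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical probe type: returns True when its layer looks healthy.
Probe = Callable[[], bool]


@dataclass
class LayerMonitor:
    layer: str    # e.g. "network", "transport", "application"
    probe: Probe  # assumed health check for that layer


def locate_fault(monitors: Sequence[LayerMonitor]) -> Optional[str]:
    """Run the monitors bottom-up and return the first failing layer.

    Returns None when every layer passes, i.e. no layer is judged faulty.
    """
    for monitor in monitors:      # ordered from the network layer upward
        if not monitor.probe():
            return monitor.layer  # fault belongs to this layer: stop and report
        # layer is healthy: escalate to the layer above
    return None
```

Because the loop returns at the first failing layer, probes of higher layers are never run once a lower layer has been blamed, which matches the "output a detection result" branch of the method.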

In some examples, the health monitoring technology of the network layer includes:

ICMP health monitoring.

In some examples, the layer above the network layer is a transport layer, and the health monitoring technology of the transport layer includes any one of:

TCP health monitoring, UDP health monitoring.

In some examples, the layer above the transport layer is an application layer, and the health monitoring technology of the application layer includes any one of the following:

HTTP health monitoring, SIP health monitoring, FTP health monitoring.

In some examples, when the monitoring result of the fault monitoring event of the transport layer determines that the fault is a fault of that layer, the method includes:

configuring a request message with specified content based on the health monitoring technology of the transport layer, comparing the content of the received response message with the specified content, and determining the cause of the fault according to the comparison result.

In some examples, the output detection result is a fault log.

In some examples, the method further comprises:

displaying an interface control, and displaying the fault log when a user triggers the interface control.

According to a second aspect of embodiments of the present specification, there is provided a communication failure monitoring apparatus adapted to detect a failure of a network constructed based on an internet protocol stack, the apparatus including:

a monitoring module, used for triggering, when a communication fault occurs in a running application, the fault monitoring event of the current layer, starting from the network layer, wherein the fault monitoring event is implemented based on a health monitoring technology, and for triggering the fault monitoring event of the layer above if the detection result of the fault monitoring event determines that the fault is not a fault of the current layer;

an output module, used for outputting the detection result when the fault is determined to be a fault of the current layer.

In some examples, the apparatus further comprises:

a display module, used for displaying the output detection result in the form of a fault log.

According to a third aspect of the embodiments of the present specification, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of the embodiments of the specification when executing the program.

The technical scheme provided by the embodiment of the specification can have the following beneficial effects:

In the embodiments of this specification, a communication fault monitoring method and device, a storage medium, and equipment are disclosed. The method is suitable for fault detection in networks built on the Internet protocol stack. By introducing health monitoring of the network layer, the transport layer, and the application layer from bottom to top, a detection result is output when the fault is judged to belong to the current layer, and health monitoring of the layer above is triggered when it is judged not to. The faulty layer is thus located quickly, and the fault is determined to be a server anomaly, a network anomaly, or another cause, making it easier for operation and maintenance personnel to clear network faults.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this specification and, together with the description, serve to explain the principles of the specification.

FIG. 1 is a flow chart illustrating a method of monitoring for communication faults in accordance with an exemplary embodiment;

FIG. 2 is a flow chart illustrating another method of monitoring for communication faults in accordance with an exemplary embodiment of the present description;

FIG. 3 is a hardware configuration diagram of a computer device in which a communication failure monitoring apparatus according to an embodiment of the present disclosure is located;

FIG. 4 is a block diagram of a communication failure monitoring device shown in accordance with an exemplary embodiment of the present description;

FIG. 5 is a block diagram of another communication failure monitoring device according to an exemplary embodiment of the present description.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the claims that follow.

The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present description. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

Health monitoring is a technology for monitoring the state of servers in real time. Its result can serve as an important parameter in load balancing scheduling; generally, only real servers whose monitoring result is normal can be scheduled. Health monitoring works by sending requests to a real server in the role of a client. There are more than ten health monitoring technologies, generally named after their protocol type, including HTTP health monitoring, SIP health monitoring, TCP health monitoring, ICMP health monitoring, and so on. Different types of health monitoring check different aspects of server state and send different request contents.

The Internet protocol stack has five layers, from top to bottom: the application layer, the transport layer, the network layer, the link layer, and the physical layer. The application layer is where network applications and their application-layer protocols reside; processes running on different hosts communicate using application-layer protocols, and application-layer packets are called messages. The transport layer carries application-layer messages between application endpoints, providing data transfer between the source and destination application processes; transport-layer packets are called segments. The network layer moves network-layer packets, called datagrams, from one host to another. The link layer encapsulates IP datagrams into frames suitable for transmission over the physical network and sends them, or decapsulates frames received from the physical network, extracts the IP datagrams, and passes them up to the network layer. The physical layer transmits the bit stream between nodes, i.e., it performs the physical transmission.

In general, different health monitors target different services of the Internet protocol stack, and the health monitoring of each layer is independent of the others. In the related art, when a communication fault occurs in an application, the fault is detected by application-layer health monitoring, but this cannot tell whether the cause is a server anomaly, a network failure, or something else. Operation and maintenance personnel therefore often have to combine various information to analyze the cause, which wastes time, and because the fault cannot be cleared promptly, significant losses can easily result.

The following provides a detailed description of examples of the present specification.

FIG. 1 is a flowchart of a communication fault monitoring method according to an exemplary embodiment. The method is applied to fault detection in networks built on the Internet protocol stack and includes the following steps:

In step 101, when a communication fault occurs in a running application, the fault monitoring event of the current layer is triggered, starting from the network layer; the fault monitoring event is implemented based on a health monitoring technology.

In step 102, if the monitoring result of the fault monitoring event determines that the fault is not a fault of the current layer, the fault monitoring event of the layer above is triggered.

The cause of a communication fault in a running application may be a server anomaly or a network anomaly: a server anomaly corresponds to an application-layer fault, while a network anomaly corresponds to a transport-layer or network-layer fault.
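The mapping from the faulty layer to the anomaly category can be written down as a small lookup table. The enum and function names below are illustrative assumptions; only the layer-to-category mapping itself comes from the text.

```python
from enum import Enum
from typing import Optional


class Layer(Enum):
    NETWORK = "network"
    TRANSPORT = "transport"
    APPLICATION = "application"


# From the text: application-layer faults indicate a server anomaly;
# transport- or network-layer faults indicate a network anomaly.
ANOMALY_BY_LAYER = {
    Layer.NETWORK: "network anomaly",
    Layer.TRANSPORT: "network anomaly",
    Layer.APPLICATION: "server anomaly",
}


def classify(faulty_layer: Optional[Layer]) -> str:
    """Return the anomaly category for the failing layer, or a healthy verdict."""
    if faulty_layer is None:
        return "network and server normal"
    return ANOMALY_BY_LAYER[faulty_layer]
```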

First, the fault monitoring event of the network layer is used to judge whether the fault is a network anomaly. In some examples, the health monitoring technology of the network layer may include ICMP health monitoring, which may work as follows: an echo request is sent to the server and the server's echo reply is awaited; if the echo reply is received and its contents all correspond to the request, the result of the ICMP health monitoring is judged normal; otherwise, the result of the ICMP health monitoring is a fault, i.e., the fault is determined to be a network-layer fault.
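A sketch of the echo request/reply comparison, assuming that "the contents all correspond" means the reply has type 0, code 0, a valid checksum, and the same identifier, sequence number, and payload as the request. The header layout and checksum follow RFC 792 and RFC 1071; actually sending the packet is omitted because raw ICMP sockets typically require elevated privileges. The function names are hypothetical.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(icmp_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo message: type, code, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", icmp_type, 0, csum, ident, seq) + payload


def echo_reply_matches(request: bytes, reply: bytes) -> bool:
    """True if `reply` is a valid echo reply mirroring `request`."""
    # A correct checksum makes the checksum over the whole message zero.
    if len(reply) < 8 or inet_checksum(reply) != 0:
        return False
    r_type, r_code, _, r_id, r_seq = struct.unpack("!BBHHH", reply[:8])
    _, _, _, q_id, q_seq = struct.unpack("!BBHHH", request[:8])
    return (r_type == ICMP_ECHO_REPLY and r_code == 0
            and (r_id, r_seq) == (q_id, q_seq)
            and reply[8:] == request[8:])
```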

When the fault is determined to be a network-layer fault, it can be concluded that the fault is a network anomaly. A network anomaly typically has several possible causes: an interface configuration problem on a network device, a network protocol configuration error, traffic congestion, and the like. To determine the specific cause, in some examples, this embodiment further includes: when the fault is determined to be a network-layer fault, changing the type of echo request and judging which network anomaly has occurred from the content of the echo reply.
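The specification does not list which reply contents map to which anomaly. One plausible reading, offered here only as an assumption, is to inspect the ICMP error message that comes back in place of an echo reply. The type/code values below are standard (RFC 792, RFC 1812); the suggested causes are illustrative hints rather than conclusions drawn in the source.

```python
from typing import Optional, Tuple

# (type, code) -> hint. Codes are standard; the "check ..." advice is assumed.
ICMP_ERRORS = {
    (3, 0): "destination network unreachable: check routing / interface configuration",
    (3, 1): "destination host unreachable: check host interface / address resolution",
    (3, 4): "fragmentation needed but DF set: check MTU configuration",
    (3, 13): "communication administratively prohibited: check ACL / filter rules",
    (11, 0): "TTL exceeded in transit: check for a routing loop",
}


def diagnose_icmp(reply: Optional[Tuple[int, int]]) -> str:
    """Interpret the (type, code) of whatever came back for an echo request."""
    if reply is None:
        return "no reply: possible congestion, link failure, or ICMP filtering"
    icmp_type, code = reply
    if icmp_type == 0:
        return "echo reply received: network layer reachable"
    return ICMP_ERRORS.get(reply, f"unclassified ICMP type {icmp_type} code {code}")
```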

When the result of the ICMP health monitoring is normal, the network layer can be judged normal, and the fault monitoring event of the layer above the network layer is then used to judge whether the fault belongs to that layer. The layer above the network layer is the transport layer, and in some examples, the health monitoring technology of the transport layer may include either of the following: TCP health monitoring or UDP health monitoring. First, it is judged whether the application-layer program is connection-oriented; if it is, TCP health monitoring is started, otherwise UDP health monitoring is started. TCP health monitoring may work as follows: it checks whether a TCP connection can be established; if it can, the fault is determined not to be a fault of the current layer, and if it cannot, the fault is determined to be a fault of the current layer. UDP health monitoring may work as follows: a UDP message is sent to the server; if no error report is returned, the fault is determined not to be a fault of the current layer, and if an error report is returned, the fault is determined to be a fault of the current layer.
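The TCP and UDP checks described above can be sketched with Python's standard `socket` module. The UDP check assumes a connected UDP socket, on which many operating systems report an ICMP port-unreachable reply as `ConnectionRefusedError` on the next `recv()`; that behaviour is OS-dependent. Function names and default timeouts are hypothetical.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def udp_probe(host: str, port: int, payload: bytes = b"probe",
              timeout: float = 1.0) -> bool:
    """Healthy unless an error report comes back for the datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            sock.send(payload)
            sock.recv(2048)
        except socket.timeout:
            return True   # silence: no error report within the timeout
        except OSError:
            return False  # error report, e.g. ECONNREFUSED from ICMP port unreachable
        return True       # the server answered with a datagram


def transport_probe(host: str, port: int, connection_oriented: bool) -> bool:
    """Use TCP for connection-oriented applications, UDP otherwise."""
    if connection_oriented:
        return tcp_probe(host, port)
    return udp_probe(host, port)
```

The `socket.timeout` clause must come before `OSError`, because in current Python versions the timeout exception is itself a subclass of `OSError`.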

Similarly, to determine the specific cause of a transport-layer fault, in some examples, when the fault is determined to be a transport-layer fault, a request message with specified content is configured based on the health monitoring technology of the transport layer, the content of the received response message is compared with the specified content to obtain the state information of the current server application, and the specific cause of the fault is determined from the comparison result.
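The specification does not define the "specified content" or the resulting state categories. The sketch below assumes the server is expected to answer a configured request with a known byte string; the category labels and the function name are purely illustrative.

```python
from typing import Optional


def compare_transport_response(expected: bytes, received: Optional[bytes]) -> str:
    """Compare a probe response with the configured content (hypothetical categories)."""
    if received is None:
        return "no response: server application not answering on this port"
    if received == expected:
        return "match: server application responding normally"
    if received and expected.startswith(received):
        return "truncated response: connection closed early or application stalled"
    return f"mismatch: expected {expected!r}, got {received!r}"
```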

When the fault is judged not to be a transport-layer fault, the fault monitoring event of the layer above the transport layer is used to judge whether the fault belongs to that layer. The layer above the transport layer is the application layer, and in some examples, the health monitoring technology of the application layer may include any of the following: HTTP health monitoring, SIP health monitoring, or FTP health monitoring. Application-layer health monitoring may work as follows: a request is sent to the server and the server's response message is awaited; if no response message is received, the fault is determined to be a fault of the current layer; if a response message is received, its content is parsed and matched against the configured content, and whether the fault is a fault of the current layer is determined from the matching result.
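For HTTP, the send-and-wait step can be sketched with `http.client` from the Python standard library, using a HEAD request as in the application example of FIG. 2. Treating any connection error or malformed reply as "no response message received" is an assumption of this sketch, and `http_head_probe` is a hypothetical name.

```python
import http.client
from typing import Optional


def http_head_probe(host: str, port: int, path: str = "/",
                    timeout: float = 2.0) -> Optional[int]:
    """Send an HTTP HEAD request and return the response status code.

    Returns None when no usable response message arrives, which the text
    treats as an application-layer fault.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

The returned status code is what the matching step then compares against the configured content.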

When the application layer is judged to be faulty, the fault is a server anomaly. As with the network layer and the transport layer, in some examples, when the fault is determined to be an application-layer fault, its specific cause may be determined from the configured content matched by the application-layer health monitoring technology. Generally, the configured content to be matched is a status code for SIP health monitoring, a user name and password for FTP health monitoring, and a status code, a normal service identifier, and an abnormal service identifier for HTTP health monitoring. Taking HTTP health monitoring as an example, matching works as follows: if the status code in the server's response message does not match the status code configured for HTTP health monitoring, the monitoring result is a fault; if the content of the server's response message matches the abnormal service identifier, the monitoring result is a fault; and if the content of the server's response message does not match the normal service identifier, the monitoring result is a fault. Whichever of these conditions is detected also identifies the cause of the fault.
For example, when an application fails, the load balancer actively sends an HTTP health monitoring request to the server and checks the response in order to monitor the server's health. The HTTP health monitoring configuration contains a rule that matches the status code against response code 200. When the response code in the server's response message is 2XX, the server is considered matched: it passes health monitoring and its state is normal. When the response code in the server's response message is 500, the server fails health monitoring and its state is faulty, and the cause is determined from the meaning defined for that status code. For example, since status code 500 is defined to mean an internal server error, the cause of the fault is determined to be an error on the server that prevents it from answering the load balancer's request.
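The three HTTP matching rules and the status-code interpretation from this example can be combined into one sketch. Following the example, it assumes that a configured code of 200 accepts any 2XX response; the marker strings and cause wording are hypothetical, while the reason phrases come from Python's `http.HTTPStatus`.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Optional


@dataclass
class HttpMatchConfig:
    expected_status: int                   # e.g. 200; any code in the same class matches
    normal_marker: Optional[str] = None    # hypothetical: must appear in a healthy body
    abnormal_marker: Optional[str] = None  # hypothetical: must not appear in the body


def status_cause(code: int) -> str:
    """Turn a status code into a readable cause using the standard reason phrase."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        return f"unregistered status code {code}"
    if 500 <= code < 600:
        return f"{code} {phrase}: server error, cannot answer the load balancer"
    return f"{code} {phrase}: unexpected status"


def match_http(cfg: HttpMatchConfig, status: int, body: str) -> Optional[str]:
    """Apply the three matching rules; return a fault cause, or None if healthy."""
    # Rule 1: the status code must fall in the configured code's class (2XX for 200).
    if status // 100 != cfg.expected_status // 100:
        return status_cause(status)
    # Rule 2: the abnormal service identifier must not appear.
    if cfg.abnormal_marker and cfg.abnormal_marker in body:
        return f"abnormal service identifier {cfg.abnormal_marker!r} found"
    # Rule 3: the normal service identifier must appear.
    if cfg.normal_marker and cfg.normal_marker not in body:
        return f"normal service identifier {cfg.normal_marker!r} missing"
    return None
```

For instance, with `HttpMatchConfig(200, "status=ok", "status=down")`, a 500 response yields "500 Internal Server Error: server error, cannot answer the load balancer", mirroring the example above.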

When the fault is not an application-layer fault, the server can be judged normal; at this point, both the network and the server are judged to be in a normal state.

In step 103, if the fault is determined to be a fault of the current layer, a detection result is output.

In some examples, the detection result output in this step may be displayed in the form of a fault log. In some examples, the method further includes: displaying an interface control, and displaying the fault log when a user triggers the interface control. Providing an interface control for displaying the fault log, and writing the detected fault cause to the log, makes the log easy to check and supplies useful information for resolving problems when communication faults occur.
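A minimal sketch of the fault log and its display control, assuming the control simply calls a `show()` method when triggered. `FaultLog` and the logger name `fault_monitor` are hypothetical; each record is also forwarded to Python's `logging` module so that an existing device-log handler could pick it up.

```python
import logging
from typing import List


class FaultLog:
    """Collects detected fault causes; show() stands in for the UI control."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self._logger = logging.getLogger("fault_monitor")
        self._logger.setLevel(logging.INFO)

    def record(self, layer: str, cause: str) -> None:
        line = f"[{layer}] {cause}"
        self.records.append(line)
        self._logger.info(line)  # forwarded to any configured log handlers

    def show(self) -> str:
        """Called when the user triggers the interface control."""
        return "\n".join(self.records) if self.records else "no faults recorded"
```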

This embodiment addresses the obvious shortcomings of existing health monitoring functions in clearing communication faults. By applying health monitoring to the layers of the Internet protocol stack from bottom to top to determine the cause of a communication fault, the health monitoring function gains a simple network diagnosis capability, and user-defined input content can be changed automatically to determine the fault cause. The cause is therefore found in time, and the user can clear the fault promptly.

To facilitate understanding of the embodiments of the present application, a specific application example is described below, in which a browser accesses a web server. As shown in FIG. 2, when the browser, after starting, attempts to communicate with the web server and the web server fails to return the web page, the load balancer's communication fault monitoring flow is triggered:

Step 201: the load balancer checks the state of the server's network layer based on ICMP health monitoring, as follows: an echo request is sent to the server and the server's echo reply is awaited; if the echo reply is received and its contents all correspond, the result of the ICMP health monitoring is judged normal and step 203 is executed; otherwise, the result of the ICMP health monitoring is a fault, i.e., the fault is determined to be a network-layer fault, and step 202 is executed;

Step 202: the specific cause of the network-layer fault is checked with the network-layer health monitoring technology by changing the echo request type, the specific cause is determined from the content of the echo reply, and step 209 is executed;

Step 203: it is judged whether the application-layer program is connection-oriented; if so, step 204 is executed, otherwise step 205 is executed;

Step 204: the load balancer checks the state of the server's transport layer based on TCP health monitoring, as follows: TCP health monitoring checks whether a TCP connection can be established; if it can, the fault is determined not to be a fault of the current layer and step 207 is executed; if it cannot, the fault is determined to be a fault of the current layer and step 206 is executed;

Step 205: the load balancer checks the state of the server's transport layer based on UDP health monitoring, as follows: a UDP message is sent to the server; if no error report is returned, the fault is determined not to be a fault of the current layer and step 207 is executed; if an error report is returned, the fault is determined to be a fault of the current layer and step 206 is executed;

Step 206: a request message with specified content is configured based on the transport-layer health monitoring technology, the content of the received response message is compared with the specified content to obtain the state information of the current server application, the specific cause of the fault is determined from that state information, and step 209 is executed;

Step 207: the load balancer checks the state of the application layer based on HTTP health monitoring, as follows: first, the load balancer establishes a TCP connection with the web server; if the connection succeeds, an HTTP HEAD request is sent to the server and the server's response message is awaited; if no response message is received, the fault is determined to be a fault of the current layer and step 209 is executed; if a response message is received, its content is parsed and matched against the configured status code and service identifiers, and whether the fault is a fault of the current layer is determined from the matching result: if it is, step 208 is executed; if it is not, the network and the server are judged to be in a normal state and step 209 is executed;

Step 208: the cause of the fault is determined from the configured content and the matching result, and step 209 is executed;

Step 209: the detection result is output and the fault cause is displayed in log form: a control is displayed, and triggering the control outputs the detected fault cause to the device log for developers' reference.
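The branching of steps 201 to 209 can be traced with a small function that, given the outcome of each probe, returns the step numbers visited. The field names and the three-valued HTTP outcome are assumptions chosen to mirror the flow described above, not definitions from the specification.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class ProbeResults:
    icmp_ok: bool              # outcome of step 201
    connection_oriented: bool  # decision in step 203
    transport_ok: bool         # outcome of step 204 (TCP) or 205 (UDP)
    http_result: Literal["ok", "no_response", "mismatch"]  # outcome of step 207


def trace_steps(r: ProbeResults) -> List[int]:
    """Return the FIG. 2 step numbers visited for the given probe outcomes."""
    steps = [201]
    if not r.icmp_ok:
        return steps + [202, 209]  # network-layer fault
    steps.append(203)
    steps.append(204 if r.connection_oriented else 205)
    if not r.transport_ok:
        return steps + [206, 209]  # transport-layer fault
    steps.append(207)
    if r.http_result == "mismatch":
        steps.append(208)          # application-layer fault, cause from matching
    return steps + [209]           # "no_response" and "ok" go straight to 209
```

For example, a connection-oriented application whose HTTP response fails matching visits `[201, 203, 204, 207, 208, 209]`.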

In this application example, bottom-up health monitoring allows the fault cause to be located quickly, so that problems are found in time and developers can clear the fault easily.

Corresponding to the foregoing method embodiments, this specification also provides embodiments of a communication fault monitoring apparatus and of the terminal to which it is applied.

The embodiments of the communication fault monitoring apparatus in this specification can be applied to computer equipment, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the computer device in which it resides reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, FIG. 3 shows a hardware structure diagram of the computer device in which the communication fault monitoring apparatus of the embodiments of the present disclosure resides. Besides the processor 510, the memory 530, the network interface 520, and the non-volatile memory 540 shown in FIG. 3, the server or electronic device in which the apparatus 531 resides may also include other hardware according to the actual functions of the computer device, which is not described again here.

Accordingly, the embodiments of the present specification also provide a computer storage medium, in which a program is stored, and the program, when executed by a processor, implements the method in any of the above embodiments.

Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

FIG. 4 is a block diagram of a communication failure monitoring apparatus according to an exemplary embodiment. The apparatus is adapted to detect faults in networks built on the Internet protocol stack and includes:

a monitoring module 41, configured to, when a communication fault occurs in a running application, trigger the fault monitoring event of the current layer, starting from the network layer, where the fault monitoring event is implemented based on a health monitoring technology, and to trigger the fault monitoring event of the layer above if the detection result of the fault monitoring event determines that the fault is not a fault of the current layer;

an output module 42, configured to output a detection result when the fault is determined to be a fault of the current layer.

FIG. 5 is a block diagram of another communication failure monitoring apparatus according to an exemplary embodiment. The apparatus is adapted to detect faults in networks built on the Internet protocol stack and includes:

a monitoring module 51, configured to, when a communication fault occurs in a running application, trigger the fault monitoring event of the current layer, starting from the network layer, where the fault monitoring event is implemented based on a health monitoring technology, and to trigger the fault monitoring event of the layer above if the detection result of the fault monitoring event determines that the fault is not a fault of the current layer;

an output module 52, configured to output a detection result when the fault is determined to be a fault of the current layer;

a display module 53, configured to display the output detection result in the form of a fault log.

The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.

The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following the general principles of the specification and including such departures from the present disclosure as come within known or customary practice in the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.

It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. A communication fault monitoring method, suitable for fault detection of a network constructed based on an Internet protocol stack, characterized by comprising the following steps:

when a communication fault occurs in a running application, taking the network layer, the transport layer and the application layer in turn as the current layer, starting from the network layer, and triggering a fault monitoring event of the current layer, wherein the fault monitoring event is implemented based on a health monitoring technology;

and if the fault is determined to be a fault of the current layer, outputting a monitoring result, wherein the monitoring result is a fault log.

2. The method of claim 1, wherein the health monitoring technique of the network layer comprises:

ICMP health monitoring.
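ICMP health monitoring is usually an echo request/echo reply exchange (ICMP type 8 and type 0, RFC 792). Sending the packet normally needs raw-socket privileges, so this sketch only builds the echo request together with its RFC 1071 checksum; the function names are hypothetical, and transmitting the packet and matching the reply are left out.

```python
import struct

ICMP_ECHO_REQUEST = 8  # type 8, code 0 (RFC 792)


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request: compute the checksum with the field zeroed, then fill it in."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload
```

A receiver can validate the packet by recomputing the checksum over the whole message, checksum field included; a correct packet yields zero.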

3. The method of claim 1, wherein the layer above the network layer is the transport layer, and the health monitoring technology of the transport layer comprises any one of:

TCP health monitoring, UDP health monitoring.
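A common form of TCP health monitoring checks whether a TCP connection to the service port can be established within a timeout. The sketch below assumes that criterion; the function name and the 2-second default are illustrative, not taken from this specification. UDP is connectionless, so a UDP check generally has to wait for an application-level reply instead, which this sketch does not cover.

```python
import socket


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Healthy if the TCP three-way handshake to (host, port) completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out
        # (socket.timeout is a subclass of OSError).
        return False
```

Only the handshake is tested, so a passing result says the transport path and listening socket are up, not that the application behind them answers correctly.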

4. The method of claim 3, wherein the layer above the transport layer is an application layer, and the health monitoring technology of the application layer comprises any one of the following:

HTTP health monitoring, SIP health monitoring, FTP health monitoring.
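HTTP health monitoring is commonly implemented by requesting a known URL and checking the status code. The sketch uses Python's standard `http.client`; treating any 2xx or 3xx status as healthy, the function name and the `path` default are assumptions rather than details from this specification. SIP and FTP checks would follow the same pattern with their own requests and reply codes.

```python
import http.client


def http_health_check(host: str, port: int, path: str = "/", timeout: float = 2.0) -> bool:
    """Healthy if GET <path> answers with a 2xx or 3xx status within the timeout."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed HTTP response.
        return False
    finally:
        conn.close()
```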

5. The method of claim 3, wherein, when the monitoring result of the fault monitoring event of the transport layer determines that the fault is a fault of the current layer, the method further comprises:

configuring, based on the health monitoring technology of the transport layer, a request message with specified content, comparing the content of the received response message with the specified content, and determining the cause of the fault according to the comparison result.
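The content comparison in claim 5 can be illustrated with a probe that sends specified content over TCP and compares the reply with it, which assumes the target echoes the request back. The function names and the cause labels returned by `diagnose` are hypothetical; the specification does not enumerate the possible causes.

```python
import socket
from typing import Optional


def send_probe(host: str, port: int, payload: bytes, timeout: float = 2.0) -> Optional[bytes]:
    """Send a request message with the specified content and collect the full response."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            sock.shutdown(socket.SHUT_WR)  # tell the peer the request is complete
            chunks = []
            while True:
                chunk = sock.recv(4096)
                if not chunk:  # peer closed the connection
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    except OSError:  # connect failure or timeout; any partial reply is discarded
        return None


def diagnose(specified: bytes, received: Optional[bytes]) -> str:
    """Map the comparison result to a (hypothetical) cause label."""
    if received is None:
        return "no-response"          # connection failed or timed out
    if received == b"":
        return "empty-response"       # peer closed without replying
    if received == specified:
        return "content-match"        # content came back intact
    if specified.startswith(received):
        return "truncated-response"   # only part of the content came back
    return "content-mismatch"         # content altered or a different service answered
```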

6. The method of claim 1, further comprising:

displaying the output monitoring result in the form of a fault log.

7. A communication fault monitoring apparatus, suitable for fault detection of a network constructed based on an Internet protocol stack, the apparatus comprising:

a monitoring module, configured to, when a communication fault occurs in a running application, take the network layer, the transport layer and the application layer in turn as the current layer, starting from the network layer, and trigger a fault monitoring event of the current layer, wherein the fault monitoring event is implemented based on a health monitoring technology, and to trigger the fault monitoring event of the layer above if the monitoring result of the fault monitoring event determines that the fault is not a fault of the current layer;

and an output module, configured to output a monitoring result when the fault is determined to be a fault of the current layer, wherein the monitoring result is a fault log.

8. The apparatus of claim 7, further comprising:

a display module, configured to display the monitoring result output by the output module in the form of a fault log.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 6.