CN115242688A

CN115242688A - Network fault detection method, device and medium

Info

Publication number: CN115242688A
Application number: CN202210890235.7A
Authority: CN
Inventors: 单云凡
Original assignee: Jinan Inspur Data Technology Co Ltd
Current assignee: Zhengzhou Inspur Data Technology Co Ltd
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2022-10-25

Abstract

The application discloses a network fault detection method, device and medium. The method comprises: confirming, according to the IDs of a target virtual machine and a source virtual machine, the computing nodes on which the two virtual machines are respectively located; sending request information and basic information of the target virtual machine and the source virtual machine to the corresponding computing nodes, so that each computing node generates and captures virtual traffic at a port of the corresponding virtual machine according to the basic information and analyzes the captured virtual traffic, the basic information comprising network card information and network types of the virtual machines; and receiving the analysis result fed back by each computing node. With this technical scheme, the control node locates the computing nodes on which the target virtual machine and the source virtual machine reside and directs those computing nodes to generate and capture virtual traffic at the ports of the corresponding virtual machines. The computing nodes also analyze the captured virtual traffic, so that the position of the fault can be determined and a network link fault between the target virtual machine and the source virtual machine can be detected.

Description

Network fault detection method, device and medium

Technical Field

The present application relates to the field of cloud server technologies, and in particular, to a method, an apparatus, and a medium for detecting a network fault.

Background

When an OpenStack cloud platform is in use, it is inevitable that some virtual machines on the platform become unable to reach one another over the network, which seriously affects services. When a network link fault occurs on a traditional OpenStack cloud platform, users can rarely troubleshoot the problem themselves, so each time they must turn to the cloud platform's operation and maintenance personnel, who may in turn have to call on research and development personnel. Solving the problem therefore takes too long, efficiency is low, and the user experience suffers badly.

Therefore, how to timely locate the position of the network link fault and improve the operation and maintenance efficiency of the cloud platform network link part is a problem to be solved urgently by technical personnel in the field.

Disclosure of Invention

The application aims to provide a network fault detection method, a network fault detection device and a network fault detection medium, which are used for timely positioning the position of a network link fault, improving the operation and maintenance efficiency of a cloud platform network link part, reducing time waste and improving the use experience of a user.

In order to solve the above technical problem, the present application provides a network fault detection method, which is applied to a control node, and includes:

acquiring IDs of a target virtual machine and a source virtual machine;

inquiring directory information of the target virtual machine and the source virtual machine according to the ID so as to confirm respective computing nodes of the target virtual machine and the source virtual machine;

sending request information and basic information of the target virtual machine and the source virtual machine to the corresponding computing nodes, so that each computing node generates and captures virtual traffic at a port of the corresponding virtual machine according to the basic information and analyzes the virtual traffic according to the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines;

and receiving the analysis result fed back by each computing node.

Preferably, after the step of receiving the analysis result fed back by each of the computing nodes, the method further includes:

carrying out classification polling comparison on the analysis result and data in an expert database;

sending the detection result to the user; the detection result comprises a comparison result and a troubleshooting suggestion.

Preferably, the generating and capturing, by each of the computing nodes, virtual traffic at a port of a corresponding virtual machine according to the basic information includes:

the computing node where the source virtual machine is located generates virtual traffic at a tap port of the source virtual machine according to the basic information, and captures the virtual traffic at a physical port of a host machine;

and the computing node where the target virtual machine is located performs virtual flow capture on a tap port of the target virtual machine and a physical port of a host machine according to the basic information.

Preferably, the method further comprises the following steps:

and if no data matching the analysis result is found in the expert database, storing the analysis result into the expert database.

In order to solve the above technical problem, the present application further provides another network fault detection method, which is applied to a computing node, and includes:

receiving request information sent by a control node and basic information of a target virtual machine and a source virtual machine;

generating and capturing virtual flow at an interface of a corresponding virtual machine according to the basic information, and analyzing according to the captured virtual flow; the basic information comprises network card information and network types of the virtual machines;

and feeding back an analysis result to the control node.

Preferably, the generating and capturing the virtual traffic at the interface of the corresponding virtual machine according to the basic information includes:

performing virtual traffic generation at a tap port of the source virtual machine according to the basic information, and performing virtual traffic capture at a physical port of a host machine;

and performing virtual flow capture on a tap port of the target virtual machine and a physical port of the host machine according to the basic information.

In order to solve the above technical problem, the present application further provides a network fault detection apparatus, which is applied to a control node, and includes:

the acquisition module is used for acquiring the IDs of the target virtual machine and the source virtual machine;

the query module is used for querying the directory information of the target virtual machine and the source virtual machine according to the ID so as to confirm the respective computing nodes of the target virtual machine and the source virtual machine;

a sending module, configured to send request information and basic information of the target virtual machine and the source virtual machine to the corresponding computing nodes, so that each computing node generates and captures virtual traffic at a port of the corresponding virtual machine according to the basic information, and analyzes according to the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines;

and the first receiving module is used for receiving the analysis result fed back by each computing node.

In order to solve the above technical problem, the present application further provides another network fault detection apparatus, applied to a compute node, including:

the second receiving module is used for receiving the request information sent by the control node and the basic information of the target virtual machine and the source virtual machine;

the processing module is used for generating and capturing virtual flow at an interface of a corresponding virtual machine according to the basic information and analyzing according to the captured virtual flow; the basic information comprises network card information and network types of the virtual machines;

and the feedback module is used for feeding back the analysis result to the control node.

In order to solve the above technical problem, the present application further provides another network fault detection apparatus, including a memory for storing a computer program;

a processor for implementing the steps of the network failure detection method as described above when executing the computer program.

To solve the above technical problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the network fault detection method as described above.

The network fault detection method provided by the application is applied to control nodes, the computing nodes where the target virtual machine and the source virtual machine are respectively located are confirmed according to the IDs of the target virtual machine and the source virtual machine, request information and basic information of the target virtual machine and the source virtual machine are sent to the corresponding computing nodes, so that each computing node can generate and capture virtual flow at the port of the corresponding virtual machine according to the basic information and can analyze the captured virtual flow; the basic information comprises network card information and network types of the virtual machines; and receiving the analysis result fed back by each computing node. Compared with the prior art that when a network link between virtual machines fails, a user needs to seek the help of cloud platform operation and maintenance personnel to waste a lot of time, by adopting the technical scheme, the control node positions the computing nodes where the target virtual machine and the source virtual machine are located, controls the computing nodes to generate and capture virtual flow at the ports of the corresponding virtual machines, and can analyze the captured virtual flow, so that the failure position is judged, and the detection of the network link failure between the target virtual machine and the source virtual machine is realized. The method is applied to the control node, the automatic identification of the interruption position of the link can be realized through the control node, the minute-level network fault detection can be achieved, the operation and maintenance efficiency of the cloud platform network link part can be improved, the time waste is reduced, and the use experience of a user is improved.

In addition, the network fault detection method, the network fault detection device and the medium applied to the computing node provided by the application correspond to the network fault detection method, and the effect is the same as that of the network fault detection method.

Drawings

In order to more clearly illustrate the embodiments of the present application, the drawings required for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained by those skilled in the art without inventive effort.

Fig. 1 is a flowchart of a network fault detection method according to an embodiment of the present application;

Fig. 2 is a structural diagram of a network fault detection system according to an embodiment of the present application;

Fig. 3 is a flowchart of another network fault detection method according to an embodiment of the present application;

Fig. 4 is a structural diagram of a network fault detection apparatus according to an embodiment of the present application;

Fig. 5 is a structural diagram of another network fault detection apparatus according to an embodiment of the present application;

Fig. 6 is a structural diagram of another network fault detection apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.

OpenStack is today the most influential cloud computing management tool; it manages the resource pool (servers, storage, and network) of an IaaS cloud through commands or a Web-based visual control panel. OpenStack is an open source cloud computing management platform project made up of a series of open source software projects, rather than a single piece of software. The project handles the core cloud computing services, including compute, networking, storage, identity and image services, which are provided by several major components (Nova, Neutron, Swift, Cinder, Keystone, Glance). More than ten further optional projects can be bundled by the user to create a unique, deployable cloud architecture. OpenStack provides scalable and elastic cloud computing services for private and public clouds. The project aims to provide a cloud computing management platform that is simple to implement, massively scalable, feature-rich and standardized.

The core of the application is to provide a network fault detection method, a network fault detection device and a network fault detection medium, so that the position of a network link fault can be timely located, the operation and maintenance efficiency of a cloud platform network link part is improved, the time waste is reduced, and the use experience of a user is improved.

In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings.

Fig. 1 is a flowchart of a network fault detection method provided in an embodiment of the present application, and as shown in fig. 1, the method is applied to a control node, and includes:

S10: acquiring the IDs of the target virtual machine and the source virtual machine.

S11: querying directory information of the target virtual machine and the source virtual machine according to the IDs, so as to confirm the computing nodes on which the target virtual machine and the source virtual machine are respectively located.

S12: sending the request information and the basic information of the target virtual machine and the source virtual machine to the corresponding computing nodes, so that each computing node generates and captures virtual traffic at a port of the corresponding virtual machine according to the basic information and analyzes the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines.

S13: receiving the analysis result fed back by each computing node.

First, it should be noted that the method provided in the embodiment of the present application is applied to a control node, and both the control node and a computing node represent identities of host nodes in a cloud platform. One control node can control a plurality of computing nodes, and there may be a plurality of virtual machines in one computing node. The execution subject in this embodiment may be a network failure detection device applied to the control node, or may be a plug-in deployed in a server serving as the control node.

In step S10, when a user encounters a network link connectivity fault between virtual machines while using the OpenStack cloud platform, the user first enables the network link fault detection function on the network section of the OpenStack cloud platform's web page, and then inputs the ID of the target virtual machine and the ID of the source virtual machine to the control node.

The detection control plug-in icc-plugin in the control node first calls the Nova API in OpenStack to query the virtual machine information according to the IDs of the two virtual machines, and obtains the computing nodes to which the two virtual machines respectively belong. It then sends a REST API request to start intelligent detection of network link faults, together with the basic information of the related virtual machines (including their network card information and network types), to the detection execution plug-in icc-agent on each computing node where the virtual machines are located, and begins listening for the preliminary analysis result fed back by each icc-agent.
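The control-node side of this exchange can be sketched roughly as follows. This is a minimal illustration, not the icc-plugin implementation: the `VmInfo` layout, the `action` string and the `lookup_vm` callable (standing in for the Nova query) are all assumptions introduced here, and the REST call itself is left out.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List


@dataclass
class VmInfo:
    """Basic information forwarded for one virtual machine (hypothetical layout)."""
    vm_id: str
    host: str            # compute node that hosts the VM
    nics: List[dict]     # network card information
    network_type: str    # e.g. "vlan" or "vxlan"


def build_detection_requests(
    source_id: str,
    target_id: str,
    lookup_vm: Callable[[str], VmInfo],
) -> Dict[str, dict]:
    """Return one request body per compute node, keyed by host name.

    lookup_vm stands in for the Nova API query; it is injected so the
    sketch stays independent of any particular client library.
    """
    source = lookup_vm(source_id)
    target = lookup_vm(target_id)
    body = {
        "action": "start_link_detection",  # hypothetical request type
        "source": asdict(source),
        "target": asdict(target),
    }
    # Both VMs may share a host; keying by host sends that node one request.
    return {vm.host: dict(body) for vm in (source, target)}
```

Each entry of the returned mapping would then be posted to the icc-agent on the named host, after which the control node waits for the preliminary results.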

In a specific implementation, after the detection execution plug-in icc-agent in each computing node receives the REST API request and the basic information of the virtual machines sent by the detection control plug-in icc-plugin, it uses a Python pcap library to generate and capture virtual traffic at each interface of the host that is related to the virtual machine, preliminarily determines a list of possible network fault causes from the capture results, and returns the list to icc-plugin for in-depth analysis in combination with the expert database.
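The application does not say what kind of probe packet the icc-agent generates. As one possible illustration, the sketch below assumes an ICMP echo request carrying a recognizable marker payload, built with the standard library only. Injecting and capturing the packet through a pcap library is deliberately left out, and the function names are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp_echo(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a marker payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload
```

A distinctive payload and identifier let the capturing side pick the probe out of ordinary tenant traffic at each capture point.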

Preferably, the step of generating and capturing virtual traffic at the port of the corresponding virtual machine by each computing node according to the basic information includes:

a computing node where a source virtual machine is located generates virtual traffic at a tap port of the source virtual machine according to basic information, and captures the virtual traffic at a physical port of a host machine;

and the computing node where the target virtual machine is located captures virtual flow at a tap port of the target virtual machine and a physical port of the host machine according to the basic information.

Therefore, the virtual flow is captured at different nodes, so that the position of the network fault can be judged in more detail, and the subsequent analysis and comparison of the fault position are facilitated.
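How capture results from several points narrow down the fault position can be sketched as below. The capture-point names and the assumption that the probe crosses them in a fixed order are illustrative choices made here, not details taken from the application.

```python
from typing import Dict, Optional

# Capture points in the order a probe travels after being generated at the
# source VM's tap port (names are hypothetical).
CAPTURE_PATH = ["source_phys", "target_phys", "target_tap"]


def locate_break(seen: Dict[str, bool]) -> Optional[str]:
    """Return the segment where the probe was lost, or None if it arrived."""
    previous = "source_tap"  # where the probe was injected
    for point in CAPTURE_PATH:
        if not seen.get(point, False):
            return f"{previous} -> {point}"
        previous = point
    return None
```

For example, a probe seen on the source host's physical port but not on the target host's physical port points at the network between the two hosts rather than at either host's virtual switch.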

It can be seen that the analysis of the network fault cause relies on the expert database. Therefore, in a specific implementation, the method further includes: if no data matching the analysis result is found in the expert database, storing the analysis result into the expert database.

Through the updating of the expert database, richer contrast data are provided, more accurate suggestions are provided for users, and the user experience is improved.

After the detection control plug-in icc-plugin receives the preliminary analysis results fed back by each detection execution plug-in icc-agent, it compares and analyzes the collected preliminary results against the extensible expert database in a classification polling manner, and automatically selects the next action according to the resulting list of fault root causes. For example, if the expert database indicates that the network link fault is caused by a firewall or a security group, icc-plugin automatically calls the corresponding OpenStack API to query and compare, reaches a final conclusion, and then returns the detection result and troubleshooting suggestions to the user.
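One way to read "classification polling" is to look up each preliminary result within its own category of the expert database and to store any result that has no match. The sketch below follows that reading; the dictionary layout and the field names `category`, `symptom` and `suggestion` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical expert database layout: category -> list of known entries.
ExpertBase = Dict[str, List[dict]]


def compare_with_expert_base(
    results: List[dict], expert_base: ExpertBase
) -> Tuple[List[dict], List[dict]]:
    """Split results into matched (with a suggestion) and unmatched.

    Unmatched results are stored in the expert database so that later
    detections have richer comparison data.
    """
    matched, unmatched = [], []
    for result in results:
        entries = expert_base.setdefault(result["category"], [])
        hit = next((e for e in entries if e["symptom"] == result["symptom"]), None)
        if hit is None:
            # No comparison data yet: record the new symptom for future runs.
            entries.append({"symptom": result["symptom"], "suggestion": None})
            unmatched.append(result)
        else:
            matched.append({**result, "suggestion": hit["suggestion"]})
    return matched, unmatched
```

The matched entries would feed the follow-up step, such as querying security group rules when the suggested cause is a security group.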

The network fault detection method provided by this embodiment is applied to a control node. It confirms the computing nodes on which the target virtual machine and the source virtual machine are respectively located according to their IDs, and sends request information and basic information of the target virtual machine and the source virtual machine to the corresponding computing nodes, so that each computing node generates and captures virtual traffic at the port of the corresponding virtual machine according to the basic information and analyzes the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines. It then receives the analysis result fed back by each computing node. In the prior art, when a network link between virtual machines fails, the user has to seek help from cloud platform operation and maintenance personnel, which wastes a large amount of time. Because the method is applied to the control node, the control node can automatically identify where the link is interrupted, network faults can be detected within minutes, the operation and maintenance efficiency of the cloud platform network link part is improved, wasted time is reduced, and the user experience is improved.

Fig. 3 is a flowchart of another network fault detection method provided in the embodiment of the present application, and as shown in fig. 3, the method is applied to a compute node, and includes:

S20: receiving request information sent by a control node and basic information of a target virtual machine and a source virtual machine;

S21: generating and capturing virtual traffic at an interface of the corresponding virtual machine according to the basic information, and analyzing the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines;

S22: feeding back the analysis result to the control node.

In the above embodiments, the network fault detection method applied to the control node is described in detail, and the method corresponds to the method in this embodiment, and therefore is not described herein again.

The network fault detection method provided by this embodiment is applied to a computing node. It receives request information sent by a control node and basic information of a target virtual machine and a source virtual machine; generates and captures virtual traffic at an interface of the corresponding virtual machine according to the basic information and analyzes the captured virtual traffic, the basic information comprising network card information and network types of the virtual machines; and feeds back the analysis result to the control node. In the prior art, when a network link between virtual machines fails, the user has to seek help from cloud platform operation and maintenance personnel, which wastes a large amount of time. Because the method is applied to the computing node and generates and captures virtual traffic at the ports of the virtual machines, network faults can be detected within minutes, the operation and maintenance efficiency of the cloud platform network link part is improved, wasted time is reduced, and the user experience is improved.

Similarly, on the basis of the foregoing embodiment, in this embodiment, the generating and capturing virtual traffic at the interface of the corresponding virtual machine according to the basic information includes:

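performing virtual traffic generation at a tap port of the source virtual machine according to the basic information, and performing virtual traffic capture at a physical port of the host machine;

and performing virtual traffic capture at a tap port of the target virtual machine and a physical port of the host machine according to the basic information.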

By capturing the virtual flow at different nodes, the position of the network fault can be judged in more detail, and the subsequent analysis and comparison of the fault position are facilitated.
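The division of work between the two computing nodes can be sketched as a small planning function. The field names `host`, `tap_port` and `phys_port` are hypothetical, and the function only decides what to do; it does not generate or capture any traffic itself.

```python
from typing import List, Tuple


def plan_agent_actions(
    local_host: str, source: dict, target: dict
) -> List[Tuple[str, str]]:
    """List (action, port) pairs for the agent running on local_host."""
    actions = []
    if source["host"] == local_host:
        # Source side: inject at the VM's tap port, watch the host's physical port.
        actions.append(("generate", source["tap_port"]))
        actions.append(("capture", source["phys_port"]))
    if target["host"] == local_host:
        # Target side: capture only, at both the physical port and the tap port.
        actions.append(("capture", target["phys_port"]))
        actions.append(("capture", target["tap_port"]))
    return actions
```

When both virtual machines sit on the same host, the same agent receives all four actions.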

In the foregoing embodiment, a network fault detection method is described in detail, and the present application also provides an embodiment corresponding to a network fault detection apparatus. It should be noted that the present application describes the embodiments of the apparatus portion from two perspectives, one from the perspective of the function module and the other from the perspective of the hardware.

Fig. 4 is a structural diagram of a network fault detection apparatus provided in an embodiment of the present application, and as shown in fig. 4, the apparatus is applied to a control node, and includes:

the obtaining module 01 is used for obtaining the IDs of the target virtual machine and the source virtual machine;

the query module 02 is configured to query directory information of the target virtual machine and the source virtual machine according to the IDs, so as to confirm respective computing nodes where the target virtual machine and the source virtual machine are located;

the sending module 03 is configured to send the request information and the basic information of the target virtual machine and the source virtual machine to corresponding computing nodes, so that each computing node generates and captures virtual traffic at a port of the corresponding virtual machine according to the basic information, and analyzes the virtual traffic according to the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines;

the first receiving module 04 is configured to receive an analysis result fed back by each computing node.

Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.

The network fault detection device provided by the embodiment of the application is applied to a control node, confirms the respective computing nodes where a target virtual machine and a source virtual machine are located according to the IDs of the target virtual machine and the source virtual machine, and sends request information and basic information of the target virtual machine and the source virtual machine to the corresponding computing nodes, so that each computing node generates and captures virtual traffic at the port of the corresponding virtual machine according to the basic information and analyzes the virtual traffic according to the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines; and receiving the analysis result fed back by each computing node. Compared with the prior art that when a network link between virtual machines fails, a user needs to seek the help of cloud platform operation and maintenance personnel to waste a lot of time, by adopting the technical scheme, the control node positions the computing nodes where the target virtual machine and the source virtual machine are located, controls the computing nodes to generate and capture virtual flow at the ports of the corresponding virtual machines, and can analyze the captured virtual flow, so that the failure position is judged, and the detection of the network link failure between the target virtual machine and the source virtual machine is realized. The device is applied to the control node, the automatic identification of the interruption position of the link can be realized through the control node, the minute-level network fault detection can be achieved, the operation and maintenance efficiency of the cloud platform network link part can be improved, the time waste is reduced, and the use experience of a user is improved.

Fig. 5 is a structural diagram of another network fault detection apparatus provided in an embodiment of the present application, and as shown in fig. 5, the apparatus is applied to a computing node, and includes:

a second receiving module 10, configured to receive request information sent by a control node and basic information of a target virtual machine and a source virtual machine;

the processing module 11 is configured to generate and capture virtual traffic at an interface of a corresponding virtual machine according to the basic information, and analyze the virtual traffic according to the captured virtual traffic; the basic information comprises network card information and network types of the virtual machines;

and the feedback module 12 is used for feeding back the analysis result to the control node.

The network fault detection device provided by this embodiment is applied to a computing node. It receives request information sent by a control node and basic information of a target virtual machine and a source virtual machine; generates and captures virtual traffic at an interface of the corresponding virtual machine according to the basic information and analyzes the captured virtual traffic, the basic information comprising network card information and network types of the virtual machines; and feeds back the analysis result to the control node. In the prior art, when a network link between virtual machines fails, the user has to seek help from cloud platform operation and maintenance personnel, which wastes a large amount of time. Because the device is applied to the computing node and generates and captures virtual traffic at the ports of the virtual machines, network faults can be detected within minutes, the operation and maintenance efficiency of the cloud platform network link part is improved, wasted time is reduced, and the user experience is improved.

Fig. 6 is a structural diagram of another network fault detection apparatus provided in an embodiment of the present application, and as shown in fig. 6, the apparatus includes: a memory 20 for storing a computer program;

a processor 21 for implementing the steps of the network failure detection method as described in the above embodiments when executing the computer program.

The network failure detection apparatus provided in this embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.

The processor 21 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The Processor 21 may be implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 21 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 21 may be integrated with a Graphics Processing Unit (GPU) which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 21 may further include an Artificial Intelligence (AI) processor for processing computational operations related to machine learning.

The memory 20 may include one or more computer-readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 20 is at least used for storing a computer program 201 which, after being loaded and executed by the processor 21, can implement the relevant steps of the network fault detection method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 20 may also include an operating system 202, data 203, and the like, and the storage may be transient or permanent. The operating system 202 may include Windows, Unix, Linux, and the like. The data 203 may include, but is not limited to, request information, basic information, and the like.

In some embodiments, the network fault detection apparatus may further include a display 22, an input/output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.

Those skilled in the art will appreciate that the configuration shown in fig. 6 does not limit the network fault detection apparatus, which may include more or fewer components than those shown.

The network fault detection apparatus provided by the embodiment of the application comprises a memory and a processor, and when the processor executes the program stored in the memory, the network fault detection method described above can be implemented.
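As a rough illustration only, the control-node side of the method summarized in the abstract (confirming the computing node of each virtual machine from its ID, sending request information and basic information, and receiving the analysis results) might be sketched as follows. The names `BasicInfo`, `detect_network_fault`, `vm_to_node`, and `send_request`, the shape of the request dictionary, and the choice to send each node only the basic information of its own virtual machine are all hypothetical assumptions and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class BasicInfo:
    """Basic information of one virtual machine (hypothetical field names)."""
    vm_id: str
    nic_info: str      # network card information of the virtual machine
    network_type: str  # network type of the virtual machine


def detect_network_fault(
    source_id: str,
    target_id: str,
    vm_to_node: Dict[str, str],
    basic_info: Dict[str, BasicInfo],
    send_request: Callable[[str, Dict[str, Any], BasicInfo], Any],
) -> Dict[str, Any]:
    """Control-node side: locate nodes, dispatch requests, collect results."""
    results: Dict[str, Any] = {}
    for vm_id in (source_id, target_id):
        # Confirm the computing node where this virtual machine is located.
        node = vm_to_node[vm_id]
        # Hypothetical request payload; the application does not define it.
        request = {"type": "detect", "vm_id": vm_id}
        # The computing node is assumed to generate, capture and analyze
        # virtual traffic, then feed its analysis result back.
        results[vm_id] = send_request(node, request, basic_info[vm_id])
    return results
```

Here `send_request` stands in for whatever transport the control node uses to reach a computing node; passing it as a parameter keeps the sketch independent of any particular messaging mechanism.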

Finally, the application also provides a corresponding embodiment of the computer-readable storage medium. The computer-readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps described in the above method embodiments (which may be the method corresponding to the control node side, the method corresponding to the computing node side, or the methods corresponding to both the control node side and the computing node side).

It is to be understood that if the method in the above embodiments is implemented in the form of software functional units and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on such understanding, all or part of the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and, when executed, performs all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The network fault detection method, device and medium provided by the present application are described in detail above. The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant points can be found in the description of the method. It should be noted that those skilled in the art can make several improvements and modifications to the present application without departing from its principle, and those improvements and modifications also fall within the protection scope of the claims of the present application.

It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.

Claims

1. A network fault detection method is applied to a control node and comprises the following steps:

acquiring IDs of a target virtual machine and a source virtual machine;

and receiving the analysis result fed back by each computing node.

2. The method according to claim 1, wherein after the step of receiving the analysis result fed back by each of the computing nodes, the method further comprises:

3. The method according to claim 2, wherein the step of generating and capturing virtual traffic at the port of the corresponding virtual machine by each of the computing nodes according to the basic information includes:

4. The method of claim 1, further comprising:

5. A network fault detection method is applied to a computing node and comprises the following steps:

and feeding back an analysis result to the control node.

6. The method according to claim 5, wherein the generating and capturing virtual traffic at the interface of the corresponding virtual machine according to the basic information comprises:

7. A network fault detection device is applied to a control node and comprises the following components:

8. A network fault detection device applied to a computing node comprises:

9. A network fault detection apparatus comprising a memory for storing a computer program;

a processor for implementing the steps of the network fault detection method according to any of claims 1 to 6 when executing said computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the network fault detection method according to any one of claims 1 to 6.