WO2020244067A1 - Fault detection method and related equipment - Google Patents

Fault detection method and related equipment

Info

Publication number
WO2020244067A1
WO2020244067A1 PCT/CN2019/102769 CN2019102769W WO2020244067A1 WO 2020244067 A1 WO2020244067 A1 WO 2020244067A1 CN 2019102769 W CN2019102769 W CN 2019102769W WO 2020244067 A1 WO2020244067 A1 WO 2020244067A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
machine
virtual
switch
heartbeat message
Prior art date
Application number
PCT/CN2019/102769
Inventor
李爽久
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020244067A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Definitions

  • This application relates to the field of cloud computing technology, and in particular to a fault detection method and related equipment.
  • network function virtualization (NFV) products are widely used to implement network functions through software.
  • a common practice is to deploy virtual machines on multiple physical server machines and implement network communication through software functions.
  • the embodiments of the present application provide a fault detection method and related equipment, which can accurately detect the cause of a fault.
  • an embodiment of the present application provides a fault detection method applied to a control device, and the method includes:
  • the control device receives first indication information sent by the switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by the first virtual machine to the second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
  • if the control device does not receive second indication information sent by the switch within a preset period of time after receiving the first indication information, the control device determines that the second virtual machine is faulty, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  • an embodiment of the present application provides a fault detection method applied to a switch, and the method includes:
  • the switch receives a first heartbeat message sent by the first virtual machine to the second virtual machine, where the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
  • the switch sends first indication information to the control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
  • if the switch receives, within a preset time period after sending the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, the switch sends second indication information to the control device, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  • an embodiment of the present application provides a control device, which includes a module or unit for executing the fault detection method described in the first aspect.
  • the control device includes: a receiving unit and a processing unit.
  • the receiving unit is configured to receive first indication information sent by the switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
  • the processing unit is configured to determine that the second virtual machine is faulty if the receiving unit does not receive second indication information sent by the switch within a preset time period after receiving the first indication information,
  • where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  • an embodiment of the present application provides a switch, which includes a module or unit for performing the fault detection method described in the second aspect.
  • the switch includes: a receiving unit and a sending unit.
  • the receiving unit is configured to receive a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on the first physical machine, and the second virtual machine is one of one or more virtual machines configured on the second physical machine;
  • the sending unit is configured to send first indication information to a control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
  • the sending unit is further configured to send second indication information to the control device if the receiving unit receives, within a preset time period after the sending unit sends the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine,
  • where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  • an embodiment of the present application provides a control device, including a processor, a communication interface, and a memory, and the processor is respectively connected to the memory and the communication interface.
  • the communication interface is used to communicate with other network devices (such as physical machines, switches)
  • the memory is used to store the implementation code of the fault detection method provided in the first aspect
  • the processor is used to execute the program code stored in the memory, that is, to execute the fault detection method provided by the first aspect.
  • an embodiment of the present application provides a switch including: a processor, a communication interface, and a memory, and the processor is connected to the memory and the communication interface, respectively.
  • the communication interface is used to communicate with other network devices (such as physical machines, control devices)
  • the memory is used to store the implementation code of the fault detection method provided in the second aspect
  • the processor is used to execute the program code stored in the memory, that is, to execute the fault detection method provided in the second aspect.
  • an embodiment of the present application provides a communication system, including a control device, multiple physical machines, and one or more switches.
  • the control device is the control device according to the third aspect or the fifth aspect
  • the switch is the switch according to the fourth or sixth aspect.
  • the multiple physical machines include the first physical machine and the second physical machine described in the first aspect or the second aspect, and the one or more switches include the switch described in the first or second aspect.
  • One or more virtual machines can be deployed on each physical machine.
  • The one or more virtual machines deployed on the first physical machine include the aforementioned first virtual machine, and the one or more virtual machines deployed on the second physical machine include the aforementioned second virtual machine.
  • Communication between different physical machines requires a switch.
  • the switch can identify the IP address carried in the message sent by the originating physical machine to find the corresponding receiving physical machine, and then send the message to the receiving physical machine.
  • the control device can perform global failure detection of physical machines and virtual machines, and can accurately identify which physical machines and virtual machines across the whole network have failed.
  • an embodiment of the present application provides a computer-readable storage medium storing instructions which, when run on a processor, cause the processor to execute the fault detection method described in the first or second aspect above.
  • the embodiments of the present application provide a computer program product containing instructions that, when run on a processor, cause the processor to execute the fault detection method described in the first or second aspect.
  • the control device can determine whether a virtual machine is faulty according to the time interval between the sending of a heartbeat message and the response to it between virtual machines. For example, if the second virtual machine does not respond within a preset period of time after the first virtual machine sends the first heartbeat message to the second virtual machine, the control device can recognize that the second virtual machine has failed. Therefore, it can accurately detect whether the source of a communication link failure is a virtual machine, which saves time in locating faults.
  • FIG. 1 is a schematic diagram of the architecture of a communication system provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of the hardware structure of a network device provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of a fault detection method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of a logical structure of a control device provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a logical structure of a switch provided by an embodiment of the application.
  • FIG. 1 is a schematic diagram of the architecture of a communication system related to an embodiment of the present application.
  • the communication system may be a cloud network system.
  • the communication system includes a control device, a switch (or physical switch) cluster, and multiple physical machines.
  • one or more virtual machines can be configured or deployed on each physical machine, and the specific configuration number is determined by the control device.
  • Each physical machine has a physical network port.
  • Each virtual machine can correspond to a virtual network card.
  • Multiple virtual machines on the same physical machine correspond to the same physical network port.
  • Each virtual network card can have an independent IP address, which is configured by the control device and sent to each physical machine.
  • the IP addresses of multiple virtual machines on the same physical machine can be under the same IP network segment, and the IP addresses of two physical machines that communicate can be under the same IP network segment.
  • the switch is used to forward packets or data transmitted between any two physical machines.
  • each physical machine has a physical network port.
  • When the switch receives a message sent from a physical network port, it identifies the destination IP address or destination MAC address of the message, and then sends the message to the other physical network port corresponding to that destination IP address or destination MAC address, thereby enabling message communication between two physical machines.
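  • The forwarding decision described above can be sketched as a simple table lookup. This is a minimal illustration only: the table contents, the port names, and the function `select_output_port` are hypothetical and do not come from the application.

```python
from typing import Dict, Optional

# Hypothetical forwarding table: destination VM IP -> physical network port.
# Addresses and port names are illustrative assumptions.
FORWARDING_TABLE: Dict[str, str] = {
    "192.168.1.1": "port-pm1",
    "192.168.2.1": "port-pm2",
}


def select_output_port(dst_ip: str, table: Dict[str, str]) -> Optional[str]:
    """Return the physical port that serves dst_ip, or None if it is unknown."""
    return table.get(dst_ip)
```

  • A lookup keyed on the destination MAC address would have the same shape; only the key type changes.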
  • the switch cluster can include one or more switches. If multiple switches are included in the switch cluster, there can be a master switch and a backup switch. When the master switch fails, the backup switch can take over from the master switch to continue data packet forwarding operations.
  • the control device is respectively connected with each physical machine and the main switch in the switch cluster.
  • the control device can allocate the IP network segments of each physical machine in the network and the IP addresses of all virtual machines on each physical machine.
  • the IP addresses of two physical machines that need to communicate must be under the same IP network segment; as long as the IP addresses are in the same network segment, Layer 2 communication can be realized.
  • After the control device allocates the IP addresses, it can send the IP address mapping table to the switch and each physical machine.
  • Each physical machine can learn the IP address of each virtual machine on the physical machine at the communication peer through the IP address mapping table.
  • the switch can implement packet forwarding between physical machines through the IP address mapping table.
  • the control device mentioned in the embodiments of this application may be a software defined network (SDN) control device or another control device.
  • the first physical machine and the second physical machine may be computers, servers, or other physical devices.
  • the embodiments of the present application take the first physical machine and the second physical machine among multiple physical machines, and the main switch among the multiple switches as examples for description.
  • the control device communicates with the main switch, the first physical machine and the second physical machine through a cloud network.
  • the first virtual machine on the first physical machine sends a first heartbeat message to the main switch. After the main switch receives the first heartbeat message sent by the first virtual machine on the first physical machine to the second virtual machine on the second physical machine, it sends the first indication information to the control device, then identifies the peer physical machine, that is, the second physical machine, through the IP address mapping table, and forwards the first heartbeat message to the second physical machine.
  • After receiving the first heartbeat message sent by the first virtual machine on the first physical machine, the second virtual machine on the second physical machine generates a second heartbeat message based on the first heartbeat message and sends the second heartbeat message to the main switch. After receiving the second heartbeat message, the main switch sends the second indication information to the control device. If the control device does not receive the second indication information within the preset time period after receiving the first indication information, it determines that the second virtual machine has a fault. If all virtual machines on the second physical machine have failed, it determines that the second physical machine has failed. Therefore, this application can accurately identify whether the fault source is a virtual machine or a physical machine.
  • FIG. 2 shows a schematic diagram of the hardware structure of a network device provided by an embodiment of the present application.
  • the network device 200 may include a memory 201, a communication interface 202, and one or more processors 203. These components can be connected through the bus 204 or in other ways.
  • FIG. 2 takes the connection through the bus as an example, where:
  • the memory 201 may be coupled with the processor 203 through a bus 204 or an input/output port, and the memory 201 may also be integrated with the processor 203.
  • the memory 201 is used to store various software programs and/or multiple sets of instructions.
  • the memory 201 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 201 may also store a network communication program, which may be used to communicate with one or more additional devices, one or more terminals, and one or more network devices.
  • the processor 203 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the processor 203 may process data received through the communication interface 202.
  • the communication interface 202 is used for the network device 200 to communicate with other network devices, for example, with a physical machine.
  • the communication interface 202 may be a transceiver, a transceiver circuit, etc., where the communication interface is a general term and may include one or more interfaces, such as an interface between a control device and a switch.
  • the communication interface 202 may include a wired interface and a wireless interface, such as a standard interface, Ethernet, and a multi-machine synchronization interface.
  • the processor 203 can be used to read and execute computer-readable instructions. Specifically, the processor 203 may be used to call data stored in the memory 201. Optionally, when the processor 203 sends any message or data, it specifically drives or controls the communication interface 202 to do the sending. Optionally, when the processor 203 receives any message or data, it specifically drives or controls the communication interface 202 to do the reception. Therefore, the processor 203 can be regarded as a control center that performs transmission or reception, and the communication interface 202 is a specific performer of transmission and reception operations.
  • the communication interface 202 is specifically configured to execute the steps of data transceiving involved in the following method embodiments, and the processor 203 is specifically configured to implement data processing steps other than the data transceiving.
  • the network device 200 may further include an output device and an input device.
  • the output device communicates with the processor 203 and can display information in a variety of ways.
  • the output device may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, etc.
  • the input device communicates with the processor 203 and can accept user input in various ways.
  • the input device may be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the aforementioned network device 200 may be a general-purpose computer device or a special-purpose computer device.
  • the network device 200 may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that shown in FIG. 2.
  • the embodiment of the present application does not limit the type of the network device 200.
  • the control device in FIG. 1 may be the device shown in FIG. 2, and one or more software modules (such as an interaction module and a processing module) are stored in the memory of the control device.
  • the switch in FIG. 1 may also be the device shown in FIG. 2, and one or more software modules (such as interaction modules and processing modules) are stored in the memory of the switch.
  • the control device or the switch may implement the software module through the processor and the program code in the memory to implement the fault detection method involved in the following method embodiments.
  • FIG. 3 provides a schematic flow diagram of a fault detection method.
  • the fault detection method may include:
  • S301 The switch receives a first heartbeat message sent by the first virtual machine to the second virtual machine.
  • the first virtual machine is a virtual machine on the first physical machine, and one or more virtual machines are configured on the first physical machine.
  • the second virtual machine is a virtual machine on the second physical machine, and one or more virtual machines are configured on the second physical machine.
  • Each virtual machine on the first physical machine establishes a heartbeat link with a certain virtual machine on the second physical machine for transmitting heartbeat messages.
  • the following embodiment takes the transmission of heartbeat messages between the first virtual machine and the second virtual machine as an example, with the switch assisting the control device in performing global fault detection.
  • the switch sends first indication information to the control device, and the control device receives the first indication information sent by the switch, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine.
  • After the switch receives the heartbeat message, it needs to report the heartbeat transmission event to the control device to assist the control device in detecting whether a physical machine or virtual machine in the network has failed.
  • the switch can identify the source IP address and the destination IP address of the first heartbeat message.
  • the source IP address is the IP address of the first virtual machine, and the destination IP address is the IP address of the second virtual machine.
  • the first indication information may include the source IP address and the destination IP address of the first heartbeat packet.
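  • One way to picture the first indication information is as a small record built from the addresses the switch reads off the heartbeat message. The class and field names below (`HeartbeatMessage`, `Indication`, `kind`) are hypothetical; the application only states that the indication may carry the source and destination IP addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatMessage:
    src_ip: str  # IP address of the sending virtual machine
    dst_ip: str  # IP address of the receiving virtual machine


@dataclass(frozen=True)
class Indication:
    kind: str    # "first" or "second"; the labels are an assumption
    src_ip: str
    dst_ip: str


def make_first_indication(hb: HeartbeatMessage) -> Indication:
    """Build the report the switch sends to the control device."""
    return Indication(kind="first", src_ip=hb.src_ip, dst_ip=hb.dst_ip)
```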
  • If the switch receives the second heartbeat message sent by the second virtual machine to the first virtual machine within a preset time period after sending the first indication information, the switch sends the second indication information to the control device. The second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  • If the switch does not receive the second heartbeat message sent by the second virtual machine to the first virtual machine within the preset time period after sending the first indication information, the switch does not send the second indication information to the control device.
  • After the switch sends the first heartbeat message to the second physical machine to which the second virtual machine belongs, it can start a timer. If the switch receives the second heartbeat message sent by the second virtual machine to the first virtual machine before the timer expires, the switch needs to report second indication information to the control device, indicating that the second virtual machine responded to the heartbeat message. If the second heartbeat message has not been received when the timer expires, the switch does not report the second indication information to the control device, implicitly indicating that the second virtual machine has not responded to the heartbeat message. Alternatively, if the second heartbeat message has not been received when the timer expires, the switch reports third indication information to the control device, indicating that the second virtual machine has not responded to the heartbeat message.
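  • The switch-side timer behaviour can be sketched as follows. This is a minimal sketch under stated assumptions: the class `SwitchHeartbeatTracker`, its method names, and the use of caller-supplied timestamps instead of a real timer are all illustrative choices, not part of the application.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (requesting VM IP, responding VM IP)


class SwitchHeartbeatTracker:
    """Tracks outstanding heartbeats; time is passed in explicitly."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._deadlines: Dict[Pair, float] = {}

    def on_first_heartbeat(self, src_ip: str, dst_ip: str, now: float) -> str:
        # Start the countdown and report the first indication.
        self._deadlines[(src_ip, dst_ip)] = now + self.timeout_s
        return "first"

    def on_second_heartbeat(self, src_ip: str, dst_ip: str,
                            now: float) -> Optional[str]:
        # The reply travels from the responder (src) back to the requester (dst).
        pair = (dst_ip, src_ip)
        deadline = self._deadlines.get(pair)
        if deadline is None or now > deadline:
            return None  # no pending heartbeat, or the reply came too late
        del self._deadlines[pair]
        return "second"

    def expire(self, now: float) -> List[Pair]:
        # Pairs whose timer ran out; a switch using the optional third
        # indication would report each of these to the control device.
        expired = [p for p, d in self._deadlines.items() if now > d]
        for p in expired:
            del self._deadlines[p]
        return expired
```

  • Passing `now` in explicitly keeps the sketch deterministic; a real switch would use its own clock or hardware timers.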
  • After receiving the first indication information, the control device may start a timer. If the second indication information sent by the switch is received before the timer expires, the control device determines that the second virtual machine has not failed. If the second indication information has not been received when the timer expires, the control device determines that the second virtual machine is faulty. Alternatively, if the third indication information sent by the switch is received after the timer expires, indicating that the second virtual machine has not responded to the heartbeat message, the control device determines that the second virtual machine has failed.
  • the above timers can all be in countdown mode.
  • the length of the countdown can be configured by the control device.
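  • The control device's decision can be reduced to a comparison against a deadline. The function below is a sketch only: the name `vm_status`, the three status strings, and the timestamp parameters are assumptions made for illustration.

```python
from typing import Optional


def vm_status(first_at: float, second_at: Optional[float],
              now: float, timeout_s: float) -> str:
    """Classify the responding VM from indication arrival times.

    first_at  -- when the first indication arrived at the control device
    second_at -- when the second indication arrived, or None if not yet
    now       -- current time on the control device
    timeout_s -- configured countdown length
    """
    deadline = first_at + timeout_s
    if second_at is not None and second_at <= deadline:
        return "healthy"
    if now > deadline:
        return "faulty"
    return "pending"  # countdown still running, no verdict yet
```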
  • the control device can determine whether a virtual machine is faulty according to the time interval between the sending of a heartbeat message and the response to it between virtual machines. For example, if the second virtual machine does not respond within a preset period of time after the first virtual machine sends the first heartbeat message to the second virtual machine, the control device can identify that the second virtual machine has failed, and can therefore accurately detect whether the source of a communication link failure is a virtual machine.
  • The control device may further identify whether the source of the fault is a physical machine.
  • the control device can detect whether each virtual machine has failed in the above-mentioned manner. If it detects that all virtual machines on a certain physical machine have failed, it is determined that the physical machine has failed. A failure of a physical machine will cause all virtual machines on the physical machine to fail to respond to heartbeat packets normally.
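  • The rule that a physical machine is considered failed when every virtual machine on it has failed can be written as a short check. The function name `failed_physical_machines` and the inventory layout are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def failed_physical_machines(vms_by_pm: Dict[str, Iterable[str]],
                             faulty_vms: Set[str]) -> List[str]:
    """Return the physical machines whose virtual machines are all faulty."""
    failed = []
    for pm, vms in vms_by_pm.items():
        vm_list = list(vms)
        # A machine with no VMs gives no evidence, so it is not reported.
        if vm_list and all(vm in faulty_vms for vm in vm_list):
            failed.append(pm)
    return failed
```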
  • By implementing the embodiments of this application, after the control device determines, according to the time interval between the sending of heartbeat messages and the responses to them between virtual machines, that all virtual machines on a physical machine have failed, it can further determine that the source of the communication link failure is the physical machine. The fault source that causes the communication link between the virtual machines to fail can therefore be found quickly and effectively, saving time in locating the fault.
  • the identifier of the faulty virtual machine and/or the identifier of the faulty physical machine can be summarized and output to the management personnel (or operation and maintenance personnel), who further investigate the cause of the fault and repair the faulty virtual or physical machine. For example, if a physical machine failure is caused by the operating system of the physical machine, the administrator resolves the system problem; if it is caused by a hardware circuit problem of the physical machine, the administrator resolves the circuit problem, so as to restore normal operation of the faulty physical machine. For another example, if a virtual machine failure is caused by a virtual machine configuration problem, the administrator resolves the configuration problem to restore normal operation of the failed virtual machine.
  • the network segment of each physical machine and the IP address of the virtual machine on each physical machine can be configured by the control device, which allocates IP addresses to all virtual machines on the first physical machine and all virtual machines on the second physical machine.
  • the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine are sent to the switch, and the IP address of the second virtual machine is used by the switch to send the first heartbeat message to the second physical machine.
  • the control device also sends the IP addresses of all virtual machines on the first physical machine to the first physical machine, and sends the IP addresses of all virtual machines on the second physical machine to the second physical machine.
  • each virtual machine is associated with a peer virtual machine.
  • the control device also sends the IP address of the virtual machine associated with each virtual machine on the first physical machine to the first physical machine, and sends the IP address of the virtual machine associated with each virtual machine on the second physical machine to the second physical machine.
  • the control device needs to configure an IP subnet segment for the first physical machine and the second physical machine respectively, and ensure that the IP subnet segment of the first physical machine and the IP subnet segment of the second physical machine are in the same network segment.
  • the IP subnet segment allocated by the control device for the first physical machine is 192.168.1.X
  • the IP subnet segment allocated by the control device for the second physical machine is 192.168.2.X
  • both subnet segments are located in the network.
  • After the control device configures IP subnet segments for the first physical machine and the second physical machine respectively, it also needs to allocate the IP addresses of several virtual machines for the first physical machine and the second physical machine; the number of virtual machines is decided by the control device. It also configures the mapping table between each virtual machine IP address of the first physical machine and each virtual machine IP address of the second physical machine. For example, the IP subnet segment of the first physical machine is 192.168.1.X, the IP subnet segment of the second physical machine is 192.168.2.X, and the control device allocates the IP addresses of 3 virtual machines for each of the first physical machine and the second physical machine, and configures the association relationship between the IP address of each virtual machine in the first physical machine and the IP address of each virtual machine in the second physical machine.
  • association means that if the assigned IP addresses of the virtual machines on two physical machines are associated, the two virtual machines have established a heartbeat link and need to send and respond to heartbeat packets to each other.
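  • The allocation and association steps can be sketched with Python's standard `ipaddress` module. Several points here are assumptions rather than statements from the application: the subnets are written with a /24 prefix, addresses are handed out in ascending order, and virtual machines are paired by position. The function names are hypothetical.

```python
import ipaddress
from itertools import islice
from typing import Dict, List


def allocate_vm_ips(subnet: str, count: int) -> List[str]:
    """Take the first `count` usable host addresses from `subnet`."""
    hosts = ipaddress.ip_network(subnet).hosts()
    return [str(ip) for ip in islice(hosts, count)]


def build_mapping_table(first_pm_ips: List[str],
                        second_pm_ips: List[str]) -> Dict[str, str]:
    """Associate VMs on the two machines pairwise by position (assumed)."""
    return dict(zip(first_pm_ips, second_pm_ips))


pm1_ips = allocate_vm_ips("192.168.1.0/24", 3)
pm2_ips = allocate_vm_ips("192.168.2.0/24", 3)
mapping_table = build_mapping_table(pm1_ips, pm2_ips)
```

  • Each key-value pair in `mapping_table` stands for one heartbeat link between a virtual machine on the first physical machine and its peer on the second.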
  • the format and content of the IP address mapping table can be, for example, but not limited to, as shown in Table 1 below.
  • After the control device allocates an IP subnet segment for each physical machine and an IP address mapping table for the virtual machines on each physical machine, it can send the IP address mapping table to the switch, and the switch forwards messages according to the IP address mapping table.
  • the IP address mapping table sent by the control device to the switch may be, for example, but not limited to, shown in Table 1.
  • the control device also needs to send the IP subnet segment allocated to each physical machine and the IP address mapping table of the virtual machines to each physical machine.
  • taking the first physical machine as an example, the control device needs to send the IP subnet segment 192.168.1.X allocated to the first physical machine to the first physical machine, and also needs to send the first physical machine the mapping table between the IP address of each virtual machine on the first physical machine and the IP address of its corresponding virtual machine.
  • the IP address mapping table sent by the control device to the first physical machine may be, for example, but not limited to, as shown in Table 2 below.
  • the IP address mapping table sent by the control device to the first physical machine may also be as shown in Table 1.
  • correspondingly, in addition to sending the IP subnet segment 192.168.2.X allocated to the second physical machine to the second physical machine, the control device also needs to send the second physical machine the mapping table between the IP address of each virtual machine on the second physical machine and the IP address of its corresponding virtual machine.
  • the IP address mapping table sent by the control device to the second physical machine may be, for example, but not limited to, as shown in Table 3 below.
  • the IP address mapping table sent by the control device to the second physical machine may also be as shown in Table 1.
  • after receiving the IP address mapping table sent by the control device, the physical machine creates virtual machines according to the IP address mapping table.
  • for example, after the first physical machine receives the IP address mapping table shown in Table 2 sent by the control device, it creates three virtual machines according to Table 2, namely virtual machine 1, virtual machine 2, and virtual machine 3, and assigns an IP address to each of them according to the virtual machine IP addresses allocated to it by the control device.
  • the IP address configured for virtual machine 1 is: 192.168.1.102
  • the IP address configured for virtual machine 2 is: 192.168.1.68
  • the IP address configured for virtual machine 3 is: 192.168.1.94.
  • similarly, after the second physical machine receives the IP address mapping table shown in Table 3 sent by the control device, it creates three virtual machines according to Table 3, namely virtual machine 4, virtual machine 5, and virtual machine 6, and assigns an IP address to each of them according to the virtual machine IP addresses allocated to it by the control device.
  • the IP address configured for virtual machine 4 is: 192.168.2.104
  • the IP address configured for virtual machine 5 is: 192.168.2.70
  • the IP address configured for virtual machine 6 is: 192.168.2.96.
  • the first heartbeat message sent by the first physical machine is sent by a virtual machine on the first physical machine (for ease of description, virtual machine 1 is taken as an example) to a virtual machine on the second physical machine (for ease of description, virtual machine 4 is taken as an example).
  • the first physical machine sets the destination IP address of the first heartbeat message sent by virtual machine 1 to the destination IP address that matches virtual machine 1, according to the IP address of virtual machine 1 and the IP address mapping table. For example, the first physical machine determines the destination IP address 192.168.2.104 from IP address mapping Table 1 according to the IP address 192.168.1.102 of virtual machine 1, and sets the destination IP address of the heartbeat message sent by virtual machine 1 to 192.168.2.104.
  • after the switch receives the first heartbeat message sent by the first physical machine, it parses the destination IP address of the first heartbeat message, finds from the IP address mapping table that the address 192.168.2.104 corresponds to the second physical machine, and then sends the first heartbeat message to the second physical machine.
  • the control device can assign IP addresses to the virtual machines on each physical machine in the network and send them to the switch, so that the switch can forward the heartbeat messages transmitted between virtual machines based on the IP address of each virtual machine.
  • the first virtual machine and the second virtual machine may be a primary/backup pair, where the first virtual machine is the primary virtual machine and the second virtual machine is the backup virtual machine.
  • the primary machine can periodically send heartbeat messages to the backup machine, and the backup machine can check whether the primary machine sends heartbeat messages periodically to identify whether the primary machine is working normally. If the backup machine does not receive a heartbeat message from the primary machine within a period of time, the backup machine determines that the primary machine is faulty and is promoted to primary to continue the operations of the primary machine.
  • the above-mentioned switch is the main switch in a switch cluster, and the switch cluster includes at least two switches. After the main switch fails, a new main switch can be elected in the switch cluster to replace the failed switch and continue the aforementioned interaction with the first physical machine, the second physical machine, and the control device.
  • the switches in the switch cluster store the same data content as the main switch.
  • the reliability of the entire communication system can be improved through the configuration of the active and standby clusters, and data loss after a failure of the main switch can be avoided.
  • FIG. 4 shows a schematic diagram of a logical structure of a control device.
  • the control device 400 includes a receiving unit 401 and a processing unit 402.
  • the receiving unit 401 is configured to receive first indication information sent by the switch, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine
  • the first virtual machine is one of the one or more virtual machines configured on the first physical machine
  • the second virtual machine is one of the one or more virtual machines configured on the second physical machine;
  • the processing unit 402 is configured to determine that the second virtual machine has failed if the receiving unit 401 does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  • the processing unit 402 is further configured to: if the processing unit 402 detects that all virtual machines on the second physical machine are faulty, determine that the second physical machine is faulty.
  • the processing unit 402 is further configured to summarize and output the identifiers of the virtual machines and/or physical machines that have failed.
  • the processing unit 402 is further configured to: before the receiving unit 401 receives the first indication information sent by the switch, allocate IP addresses to all virtual machines on the first physical machine and all virtual machines on the second physical machine respectively;
  • the receiving unit 401 is further configured to send the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine to the switch, where the IP address of the second virtual machine is used by the switch to send the first heartbeat message to the second physical machine.
  • FIG. 5 shows a schematic diagram of a logical structure of a switch.
  • the switch 500 includes a receiving unit 501 and a sending unit 502.
  • the receiving unit 501 is configured to receive a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on the first physical machine, and the second virtual machine is one of one or more virtual machines configured on the second physical machine;
  • the sending unit 502 is configured to send first indication information to the control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
  • the sending unit 502 is further configured to: if the receiving unit 501 receives, within a preset time period after the sending unit 502 sends the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, send second indication information to the control device, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  • the receiving unit 501 is further configured to receive, before receiving the first heartbeat message sent by the first virtual machine to the second virtual machine, the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine sent by the control device;
  • the sending unit 502 is further configured to, after the receiving unit 501 receives the first heartbeat message sent by the first virtual machine to the second virtual machine, send the first heartbeat message to the second physical machine according to the IP address of the second virtual machine.
  • for the functions and implementation of each unit in the switch 500, refer to the related description in the method embodiment shown in FIG. 3; they are not repeated here.
  • a computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions are implemented when executed by a processor.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the readable storage medium may be any available medium that can be accessed by the computer or a data storage device such as a server or data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Hardware Redundancy (AREA)

Abstract

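This application discloses a fault detection method and related equipment. The method includes: a control device receives first indication information sent by a switch, where the first indication information indicates that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine; if the control device does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, it determines that the second virtual machine has failed, where the second indication information indicates that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message. With the embodiments of this application, the cause of a fault can be detected accurately.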

Description

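Fault Detection Method and Related Equipment
This application claims priority to Chinese patent application No. 201910484497.1, entitled "A Fault Detection Method and Related Equipment", filed with the Chinese Patent Office on June 4, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of cloud computing technology, and in particular to a fault detection method and related equipment.
Background
In cloud computing scenarios, network function virtualization (NFV) products are widely used to implement network functions in software. A common practice is to deploy virtual machines on multiple physical servers and implement network communication through software functions.
However, after a communication link is established between virtual machines, communication may fail when a virtual machine sends messages over the link, and at present the cause of the link failure cannot be detected.
Summary
The embodiments of this application provide a fault detection method and related equipment that can accurately detect the cause of a fault.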
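In a first aspect, an embodiment of this application provides a fault detection method applied to a control device. The method includes:
The control device receives first indication information sent by a switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
If the control device does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, it determines that the second virtual machine has failed, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.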
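In a second aspect, an embodiment of this application provides a fault detection method applied to a switch. The method includes:
The switch receives a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
The switch sends first indication information to a control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
If the switch receives, within a preset time period after sending the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, the switch sends second indication information to the control device, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.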
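In a third aspect, an embodiment of this application provides a control device that includes modules or units for performing the fault detection method described in the first aspect. For example, the control device includes a receiving unit and a processing unit.
The receiving unit is configured to receive first indication information sent by a switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
The processing unit is configured to determine that the second virtual machine has failed if the receiving unit does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.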
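In a fourth aspect, an embodiment of this application provides a switch that includes modules or units for performing the fault detection method described in the second aspect. For example, the switch includes a receiving unit and a sending unit.
The receiving unit is configured to receive a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
The sending unit is configured to send first indication information to a control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
The sending unit is further configured to send second indication information to the control device if the receiving unit receives, within a preset time period after the sending unit sends the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.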
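In a fifth aspect, an embodiment of this application provides a control device, including a processor, a communication interface, and a memory, where the processor is connected to the memory and to the communication interface. The communication interface is used to communicate with other network devices (for example, physical machines and switches), the memory is used to store implementation code of the fault detection method provided in the first aspect, and the processor is used to execute the program code stored in the memory, that is, to perform the fault detection method provided in the first aspect.
In a sixth aspect, an embodiment of this application provides a switch, including a processor, a communication interface, and a memory, where the processor is connected to the memory and to the communication interface. The communication interface is used to communicate with other network devices (for example, physical machines and the control device), the memory is used to store implementation code of the fault detection method provided in the second aspect, and the processor is used to execute the program code stored in the memory, that is, to perform the fault detection method provided in the second aspect.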
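In a seventh aspect, an embodiment of this application provides a communication system, including a control device, multiple physical machines, and one or more switches. The control device is the control device described in the third aspect or the fifth aspect, and the switch is the switch described in the fourth aspect or the sixth aspect.
The multiple physical machines include the first physical machine and the second physical machine described in the first aspect or the second aspect, and the one or more switches include the switch described in the first aspect or the second aspect. One or more virtual machines can be deployed on each physical machine; the one or more virtual machines deployed on the first physical machine include the first virtual machine, and the one or more virtual machines deployed on the second physical machine include the second virtual machine. Communication between different physical machines passes through the switch. The switch can identify the IP address carried in a message sent by the sending physical machine, find the corresponding receiving physical machine, and then send the message to the receiving physical machine. The control device can perform fault detection on all physical machines and virtual machines, and can accurately identify which of them have failed.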
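In an eighth aspect, an embodiment of this application provides a computer-readable storage medium storing instructions which, when run on a processor, cause the processor to perform the fault detection method described in the first aspect or the second aspect.
In a ninth aspect, an embodiment of this application provides a computer program product containing instructions which, when run on a processor, cause the processor to perform the fault detection method described in the first aspect or the second aspect.
By implementing the embodiments of this application, the control device can judge whether a virtual machine is faulty from the time interval between the sending of a heartbeat message and the response to it. For example, if the second virtual machine does not respond to the heartbeat message within a preset time period after the first virtual machine sends the first heartbeat message to the second virtual machine, the second virtual machine can be identified as having failed. Therefore, whether the fault source of the communication link is a virtual machine can be detected accurately, which saves fault-location time.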
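Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below.
FIG. 1 is a schematic architecture diagram of a communication system provided in an embodiment of this application;
FIG. 2 is a schematic hardware structure diagram of a network device provided in an embodiment of this application;
FIG. 3 is a schematic flowchart of a fault detection method provided in an embodiment of this application;
FIG. 4 is a schematic logical structure diagram of a control device provided in an embodiment of this application;
FIG. 5 is a schematic logical structure diagram of a switch provided in an embodiment of this application.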
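Detailed Description
Refer to FIG. 1, which is a schematic architecture diagram of a communication system involved in the embodiments of this application. The communication system may be a cloud network system. It includes a control device, a cluster of switches (also called physical switches), and multiple physical machines.
One or more virtual machines can be configured or deployed on each physical machine; the specific number is decided by the control device. Each physical machine has one physical network port, and each virtual machine can correspond to one virtual network card. Multiple virtual machines on the same physical machine map to the same physical network port. Each virtual network card can have an independent IP address, which can be configured by the control device and sent to each physical machine. The IP addresses of multiple virtual machines on the same physical machine can be in the same IP network segment, and the IP addresses of two physical machines that communicate with each other can be in the same IP network segment.
The switch forwards messages or data transmitted between any two physical machines. For example, each physical machine has a physical network port. When the switch receives a message sent from a physical network port, it identifies the destination IP address or destination MAC address of the message and then sends the message to the other physical network port corresponding to that destination IP address or destination MAC address, thereby realizing message communication between the two physical machines. The switch cluster can include one or more switches. If it includes multiple switches, there can be a main switch and backup switches; when the main switch fails, a backup switch can take over from the main switch and continue forwarding data packets.
The control device is connected to each physical machine and to the main switch in the switch cluster. The control device can allocate the IP network segment of each physical machine in the network and the IP addresses of all virtual machines on each physical machine. The IP network segments of two physical machines that need to communicate must lie within the same IP network segment; as long as the IP addresses are in the same network segment, layer-2 communication is possible. After allocating the IP addresses, the control device can send the IP address mapping table to the switch and to each physical machine. Each physical machine can learn from the IP address mapping table the IP addresses of the virtual machines on the peer physical machine. The switch can forward messages between physical machines by using the IP address mapping table.
The control device mentioned in the embodiments of this application may be a software defined network (SDN) control device or another control device. The first physical machine and the second physical machine may be computers, servers, or other physical devices.
For ease of description, the embodiments of this application take the first physical machine and the second physical machine among the multiple physical machines, and the main switch among the multiple switches, as examples. The control device, the main switch, the first physical machine, and the second physical machine communicate with each other through the cloud network. In this communication system, after the first virtual machine on the first physical machine generates a heartbeat message, the heartbeat message is sent to the main switch. After receiving the heartbeat message sent by the first virtual machine on the first physical machine to the second virtual machine on the second physical machine, the main switch sends first indication information to the control device, then identifies the peer physical machine, that is, the second physical machine, through the IP address mapping table, and sends the heartbeat message from the first physical machine to the second physical machine. After the second virtual machine on the second physical machine receives the heartbeat message sent by the first virtual machine on the first physical machine, it generates a second heartbeat message based on the first heartbeat message and sends the second heartbeat message to the main switch. After receiving the second heartbeat message, the main switch sends second indication information to the control device. If the control device does not receive the second indication information within a preset time period after receiving the first indication information, it determines that the second virtual machine has failed. If all virtual machines on the second physical machine have failed, it determines that the second physical machine has failed. Therefore, this application can accurately identify whether the fault source is a virtual machine or a physical machine.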
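Refer to FIG. 2, which shows a schematic hardware structure diagram of a network device provided in an embodiment of this application. The network device 200 may include a memory 201, a communication interface 202, and one or more processors 203. These components may be connected through a bus 204 or in other ways; FIG. 2 takes connection through a bus as an example. Specifically:
The memory 201 may be coupled to the processor 203 through the bus 204 or an input/output port, or may be integrated with the processor 203. The memory 201 is used to store various software programs and/or multiple sets of instructions. Specifically, the memory 201 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 201 may also store a network communication program, which can be used to communicate with one or more additional devices, one or more terminals, and one or more network devices.
The processor 203 may be a general-purpose processor, such as a central processing unit (CPU), or may be a digital signal processor (DSP), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application. The processor 203 can process data received through the communication interface 202.
The communication interface 202 is used by the network device 200 to communicate with other network devices, for example with physical machines. The communication interface 202 may be a transceiver, a transceiver circuit, or the like. "Communication interface" is a collective term and may include one or more interfaces, for example the interface between the control device and the switch. The communication interface 202 may include wired interfaces and wireless interfaces, such as a standard interface, Ethernet, or a multi-machine synchronization interface.
The processor 203 can be used to read and execute computer-readable instructions. Specifically, the processor 203 can be used to call data stored in the memory 201. Optionally, when the processor 203 sends any message or data, it does so by driving or controlling the communication interface 202. Optionally, when the processor 203 receives any message or data, it does so by driving or controlling the communication interface 202. The processor 203 can therefore be regarded as the control center for sending and receiving, and the communication interface 202 as the component that actually performs the sending and receiving operations.
In the embodiments of this application, the communication interface 202 is specifically used to perform the data sending and receiving steps involved in the following method embodiments, and the processor 203 is specifically used to perform the data processing steps other than data sending and receiving.
In a specific implementation, as an embodiment, the network device 200 may further include an output device and an input device. The output device communicates with the processor 203 and can display information in a variety of ways. For example, the output device may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device communicates with the processor 203 and can accept user input in a variety of ways. For example, the input device may be a mouse, a keyboard, a touch screen device, or a sensing device.
The network device 200 described above may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, the network device 200 may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in FIG. 2. The embodiments of this application do not limit the type of the network device 200.
For example, the control device in FIG. 1 may be the device shown in FIG. 2, with one or more software modules (such as an interaction module and a processing module) stored in the memory of the control device. The switch in FIG. 1 may also be the device shown in FIG. 2, with one or more software modules (such as an interaction module and a processing module) stored in the memory of the switch. The control device or the switch can implement the software modules through the processor and the program code in the memory, and thereby implement the fault detection method involved in the following method embodiments.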
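With reference to the schematic architecture diagram of the communication system shown in FIG. 1, see FIG. 3, which provides a schematic flowchart of a fault detection method. The fault detection method may include:
S301: the switch receives a first heartbeat message sent by the first virtual machine to the second virtual machine.
In this application, the first virtual machine is a virtual machine on the first physical machine, and one or more virtual machines are configured on the first physical machine. The second virtual machine is a virtual machine on the second physical machine, and one or more virtual machines are configured on the second physical machine. Each virtual machine on the first physical machine establishes a heartbeat link with a virtual machine on the second physical machine for transmitting heartbeat messages. The following embodiments take the transmission of heartbeat messages between the first virtual machine and the second virtual machine as an example of how the control device is assisted in performing global fault detection.
S302: the switch sends first indication information to the control device, and the control device receives the first indication information sent by the switch. The first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine.
After receiving a heartbeat message, the switch needs to report the heartbeat transmission event to the control device, to assist the control device in detecting whether a physical machine or virtual machine in the network has failed. The switch can identify the source IP address and the destination IP address of the first heartbeat message. The source IP address is the IP address of the first virtual machine, and the destination IP address is the IP address of the second virtual machine. The first indication information may include the source IP address and the destination IP address of the first heartbeat message.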
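S303: if the switch receives, within a preset time period after sending the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, the switch sends second indication information to the control device. The second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
If the switch does not receive the second heartbeat message sent by the second virtual machine to the first virtual machine within the preset time period after sending the first indication information, the switch does not send the second indication information to the control device.
Exemplarily, after sending the first heartbeat message to the second physical machine to which the second virtual machine belongs, the switch can start a timer. If the switch receives the second heartbeat message sent by the second virtual machine to the first virtual machine before the timer expires, the switch needs to report second indication information to the control device, indicating that the second virtual machine has responded to the heartbeat message. If the second heartbeat message has still not been received when the timer expires, the switch does not report second indication information to the control device, which implicitly indicates that the second virtual machine has not responded to the heartbeat message. Alternatively, if the second heartbeat message has still not been received when the timer expires, the switch reports third indication information to the control device, indicating that the second virtual machine has not responded to the heartbeat message.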
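The switch-side timing in S303 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of this application: the class name `SwitchHeartbeatTracker`, its method names, the tuple form of the indication information, and the practice of passing the current time in as a `now` argument are all hypothetical. The IP addresses are the ones used for virtual machine 1 and virtual machine 4 in the description.

```python
class SwitchHeartbeatTracker:
    """Hypothetical helper: tracks first heartbeats that still await a reply."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.pending = {}  # (src_ip, dst_ip) of the first heartbeat -> deadline

    def on_first_heartbeat(self, src_ip, dst_ip, now):
        # Start the countdown for the reply and report the first indication.
        self.pending[(src_ip, dst_ip)] = now + self.timeout_s
        return ("first", src_ip, dst_ip)

    def on_second_heartbeat(self, src_ip, dst_ip, now):
        # The reply travels in the opposite direction, so the key is swapped.
        deadline = self.pending.pop((dst_ip, src_ip), None)
        if deadline is not None and deadline >= now:
            return ("second", src_ip, dst_ip)
        return None  # late or unexpected reply: nothing is reported

    def expire(self, now):
        # Optional variant from the description: report a third indication
        # for every first heartbeat whose reply did not arrive in time.
        expired = [key for key, deadline in self.pending.items() if now > deadline]
        for key in expired:
            del self.pending[key]
        return [("third", src, dst) for src, dst in expired]
```

In the implicit variant the switch would simply never call `expire`, and the control device would infer the failure from the missing second indication.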
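S304: if the control device does not receive the second indication information sent by the switch within a preset time period after receiving the first indication information, it determines that the second virtual machine has failed.
After receiving the first indication information sent by the switch, the control device can start a timer. If it receives the second indication information sent by the switch before the timer expires, it determines that the second virtual machine has not failed. If it has still not received the second indication information when the timer expires, it determines that the second virtual machine has failed. Alternatively, if after the timer expires it receives third indication information sent by the switch, indicating that the second virtual machine has not responded to the heartbeat message, it determines that the second virtual machine has failed.
Each of the above timers can work in countdown mode. The countdown duration can be configured by the control device.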
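The control-device decision in S304 can be illustrated with a similar sketch. It is only one possible reading of the step: the class name `FaultDetector`, the method names, and keying the countdown by the IP address of the responding virtual machine are assumptions made for illustration, and time is again passed in explicitly rather than read from a real timer.

```python
class FaultDetector:
    """Hypothetical control-device helper for the S304 timeout decision."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.deadlines = {}      # IP of the VM expected to reply -> deadline
        self.failed_vms = set()  # IPs of VMs currently judged faulty

    def on_first_indication(self, src_ip, dst_ip, now):
        # The destination of the first heartbeat is the VM that must reply.
        self.deadlines[dst_ip] = now + self.timeout_s

    def on_second_indication(self, src_ip, dst_ip, now):
        # src_ip is the second VM; a reply in time means it has not failed.
        deadline = self.deadlines.get(src_ip)
        if deadline is not None and deadline >= now:
            del self.deadlines[src_ip]
            self.failed_vms.discard(src_ip)

    def check(self, now):
        # Every countdown that ran out without a second indication marks a fault.
        for ip, deadline in list(self.deadlines.items()):
            if now > deadline:
                self.failed_vms.add(ip)
                del self.deadlines[ip]
        return set(self.failed_vms)
```

A third indication from the switch, where that variant is used, could be handled by adding the reported virtual machine to `failed_vms` directly.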
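By implementing the embodiments of this application, the control device can judge whether a virtual machine is faulty from the time interval between the sending of a heartbeat message and the response to it. For example, if the second virtual machine does not respond to the heartbeat message within a preset time period after the first virtual machine sends the first heartbeat message to the second virtual machine, the second virtual machine can be identified as having failed. Therefore, whether the fault source of the communication link is a virtual machine can be detected accurately.
Optionally, in addition to identifying whether the fault source is a virtual machine, the control device can further identify whether the fault source is a physical machine. The control device can use the above method to detect whether each virtual machine has failed; if it detects that all virtual machines on a physical machine have failed, it determines that the physical machine has failed. A failure of the physical machine causes all virtual machines on that physical machine to be unable to respond to heartbeat messages normally.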
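The rule that a physical machine is judged faulty only when all of its virtual machines are faulty amounts to a subset check, sketched below. The function name `failed_physical_machines` and the labels `pm1` and `pm2` are hypothetical; the virtual machine IP addresses are the ones used as examples in the description.

```python
def failed_physical_machines(vms_by_host, failed_vms):
    """Return the hosts whose virtual machines are all in failed_vms."""
    failed = set(failed_vms)
    return {
        host
        for host, vms in vms_by_host.items()
        # A host with no virtual machines gives no evidence, so it is skipped.
        if vms and set(vms).issubset(failed)
    }


# Example layout: three virtual machines per physical machine.
VMS_BY_HOST = {
    "pm1": ["192.168.1.102", "192.168.1.68", "192.168.1.94"],
    "pm2": ["192.168.2.104", "192.168.2.70", "192.168.2.96"],
}
```

With only 192.168.2.104 marked faulty the result is empty, so the fault is attributed to that virtual machine; once all three addresses of `pm2` are marked faulty, `pm2` itself is returned.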
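By implementing the embodiments of this application, after the control device judges from the time interval between the sending of heartbeat messages and the responses to them that all virtual machines on a physical machine have failed, it can further judge that the fault source of the communication link is that physical machine. The fault source that caused the communication link between virtual machines to fail can therefore be found quickly and effectively, which saves fault-location time.
Optionally, after detecting a faulty virtual machine or a faulty physical machine, the control device can summarize the identifiers of the faulty virtual machines and/or physical machines and output them to management personnel (or operation and maintenance personnel), who further investigate the cause of the fault and carry out repairs. For example, if a physical machine fault is caused by a problem in the operating system of the physical machine, the management personnel resolve the system problem; if it is caused by a problem in the hardware circuit of the physical machine, the management personnel resolve the circuit problem, so as to restore the faulty physical machine to normal operation. As another example, if a virtual machine fault is caused by a virtual machine configuration problem, the management personnel resolve the configuration problem so as to restore the faulty virtual machine to normal operation.
Optionally, the network segment of each physical machine and the IP addresses of the virtual machines on each physical machine can be configured by the control device. The control device allocates IP addresses to all virtual machines on the first physical machine and all virtual machines on the second physical machine respectively, and sends the IP addresses of all virtual machines on the first physical machine and of all virtual machines on the second physical machine to the switch; the IP address of the second virtual machine is used by the switch to send the first heartbeat message to the second physical machine. The control device also sends the IP addresses of all virtual machines on the first physical machine to the first physical machine, and the IP addresses of all virtual machines on the second physical machine to the second physical machine. In addition, each virtual machine is associated with a peer virtual machine, and the control device also sends the first physical machine the IP addresses of the virtual machines associated with each virtual machine on the first physical machine, and sends the second physical machine the IP addresses of the virtual machines associated with each virtual machine on the second physical machine.
To enable the first physical machine and the second physical machine to communicate within the local area network, the control device needs to configure IP subnet segments for the first physical machine and the second physical machine respectively, and needs to ensure that the IP subnet segment of the first physical machine and the IP subnet segment of the second physical machine are within the same network segment. For example, the IP subnet segment allocated by the control device to the first physical machine is 192.168.1.X, the IP subnet segment allocated to the second physical machine is 192.168.2.X, and both subnet segments fall within the network segment 192.168.X.X.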
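The requirement that both subnet segments lie within one network segment can be checked with Python's standard `ipaddress` module. The sketch below assumes that "192.168.1.X" denotes the /24 network 192.168.1.0/24 and that "192.168.X.X" denotes 192.168.0.0/16; the description does not state prefix lengths, so these, like the function name `in_common_segment`, are assumptions.

```python
import ipaddress


def in_common_segment(subnets, segment):
    """Return True if every subnet lies inside the given parent segment."""
    parent = ipaddress.ip_network(segment)
    return all(ipaddress.ip_network(s).subnet_of(parent) for s in subnets)


# Subnet segments from the example in the description, written in CIDR form.
FIRST_PM_SUBNET = "192.168.1.0/24"
SECOND_PM_SUBNET = "192.168.2.0/24"
COMMON_SEGMENT = "192.168.0.0/16"
```

`in_common_segment([FIRST_PM_SUBNET, SECOND_PM_SUBNET], COMMON_SEGMENT)` holds for the example allocation, while a subnet such as 10.0.2.0/24 would fail the check.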
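Further, after the control device configures IP subnet segments for the first physical machine and the second physical machine respectively, it also needs to allocate IP addresses for several virtual machines on the first physical machine and the second physical machine; the number of virtual machines is decided by the control device. It also configures a mapping table between each virtual machine IP address of the first physical machine and each virtual machine IP address of the second physical machine. For example, the IP subnet segment of the first physical machine is 192.168.1.X, the IP subnet segment of the second physical machine is 192.168.2.X, the control device allocates IP addresses for 3 virtual machines to each of the first physical machine and the second physical machine, and it configures the association between the IP address of each virtual machine on the first physical machine and the IP address of each virtual machine on the second physical machine. Here, "association" means that if the IP addresses assigned to virtual machines on two physical machines are associated, the two virtual machines have established a heartbeat link and need to send heartbeat messages to each other and respond to them. For example, the format and content of the IP address mapping table can be, but are not limited to, those shown in Table 1 below.
Table 1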
Figure PCTCN2019102769-appb-000001
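Optionally, after the control device allocates an IP subnet segment for each physical machine and an IP address mapping table for the virtual machines on each physical machine, it can send the IP address mapping table to the switch, and the switch forwards messages according to the IP address mapping table. For example, the IP address mapping table sent by the control device to the switch can be, but is not limited to, the one shown in Table 1.
In addition, the control device also needs to send the IP subnet segment allocated to each physical machine and the IP address mapping table of the virtual machines to each physical machine. Taking the first physical machine as an example, the control device needs to send the IP subnet segment 192.168.1.X allocated to the first physical machine to the first physical machine, and also needs to send the first physical machine the mapping table between the IP address of each virtual machine on the first physical machine and the IP address of its corresponding virtual machine. For example, the IP address mapping table sent by the control device to the first physical machine can be, but is not limited to, the one shown in Table 2 below.
Table 2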
Figure PCTCN2019102769-appb-000002
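Of course, the IP address mapping table sent by the control device to the first physical machine may also be as shown in Table 1.
Correspondingly, in addition to sending the IP subnet segment 192.168.2.X allocated to the second physical machine to the second physical machine, the control device also needs to send the second physical machine the mapping table between the IP address of each virtual machine on the second physical machine and the IP address of its corresponding virtual machine. For example, the IP address mapping table sent by the control device to the second physical machine can be, but is not limited to, the one shown in Table 3 below.
Table 3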
Figure PCTCN2019102769-appb-000003
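Of course, the IP address mapping table sent by the control device to the second physical machine may also be as shown in Table 1.
Optionally, after receiving the IP address mapping table sent by the control device, the physical machine creates virtual machines according to the IP address mapping table. For example, after the first physical machine receives the IP address mapping table shown in Table 2 sent by the control device, it creates 3 virtual machines according to Table 2, namely virtual machine 1, virtual machine 2, and virtual machine 3, and assigns an IP address to each of them according to the virtual machine IP addresses allocated to it by the control device. For example, the IP address configured for virtual machine 1 is 192.168.1.102, the IP address configured for virtual machine 2 is 192.168.1.68, and the IP address configured for virtual machine 3 is 192.168.1.94. Similarly, after the second physical machine receives the IP address mapping table shown in Table 3 sent by the control device, it creates 3 virtual machines according to Table 3, namely virtual machine 4, virtual machine 5, and virtual machine 6, and assigns an IP address to each of them according to the virtual machine IP addresses allocated to it by the control device. For example, the IP address configured for virtual machine 4 is 192.168.2.104, the IP address configured for virtual machine 5 is 192.168.2.70, and the IP address configured for virtual machine 6 is 192.168.2.96.
Optionally, the first heartbeat message sent by the first physical machine is sent by a virtual machine on the first physical machine (for ease of description, virtual machine 1 is taken as an example) to a virtual machine on the second physical machine (for ease of description, virtual machine 4 is taken as an example). The first physical machine sets the destination IP address of the first heartbeat message sent by virtual machine 1 to the destination IP address that matches virtual machine 1, according to the IP address of virtual machine 1 and the IP address mapping table. For example, the first physical machine determines the destination IP address 192.168.2.104 from IP address mapping Table 1 according to the IP address 192.168.1.102 of virtual machine 1, and sets the destination IP address of the heartbeat message sent by virtual machine 1 to 192.168.2.104.
After the switch receives the first heartbeat message sent by the first physical machine, it parses the destination IP address of the first heartbeat message, finds from the IP address mapping table that the address 192.168.2.104 corresponds to the second physical machine, and then sends the first heartbeat message to the second physical machine.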
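The two lookups in this example, the sender choosing the destination IP address and the switch resolving the destination to a physical machine, can be sketched with plain dictionaries. Tables 1 to 3 are images in the source, so only the association confirmed in the text (192.168.1.102 with 192.168.2.104) is encoded; the names `PEER_OF`, `HOST_OF`, `destination_for`, and `host_for` are hypothetical.

```python
# Association confirmed by the description: virtual machine 1 and virtual machine 4.
# Other pairs would come from Table 1, which is not reproduced in the text.
PEER_OF = {
    "192.168.1.102": "192.168.2.104",
    "192.168.2.104": "192.168.1.102",
}

# Virtual machine IP address -> physical machine, using the addresses in the text.
HOST_OF = {
    "192.168.1.102": "first physical machine",
    "192.168.1.68": "first physical machine",
    "192.168.1.94": "first physical machine",
    "192.168.2.104": "second physical machine",
    "192.168.2.70": "second physical machine",
    "192.168.2.96": "second physical machine",
}


def destination_for(vm_ip):
    """Sender side: pick the destination IP address of the heartbeat message."""
    return PEER_OF[vm_ip]


def host_for(dst_ip):
    """Switch side: resolve a destination IP address to a physical machine."""
    return HOST_OF.get(dst_ip)  # None when the address is unknown
```

For virtual machine 1, `destination_for("192.168.1.102")` yields 192.168.2.104, and `host_for` on that address resolves to the second physical machine, matching the example above.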
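By implementing the embodiments of this application, the control device can assign IP addresses to the virtual machines on each physical machine in the network and send them to the switch, so that the switch can forward the heartbeat messages transmitted between virtual machines based on the IP address of each virtual machine.
Optionally, the first virtual machine and the second virtual machine may be a primary/backup pair, where the first virtual machine is the primary virtual machine and the second virtual machine is the backup virtual machine. The primary machine can periodically send heartbeat messages to the backup machine, and the backup machine can check whether the primary machine sends heartbeat messages periodically to identify whether the primary machine is working normally. If the backup machine does not receive a heartbeat message from the primary machine within a period of time, the backup machine determines that the primary machine has failed and is promoted to primary to continue the operations of the primary machine.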
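The primary/backup behavior can be sketched as a small state holder on the backup side. The class name `BackupVM`, the role strings, and the explicit `now` arguments are assumptions for illustration; the description only states that the backup is promoted when no heartbeat arrives within a period of time.

```python
class BackupVM:
    """Hypothetical backup-side monitor of the primary's periodic heartbeat."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_heartbeat = now
        self.role = "backup"

    def on_heartbeat(self, now):
        # Record the arrival time of the latest heartbeat from the primary.
        self.last_heartbeat = now

    def tick(self, now):
        # Promote the backup once the primary has been silent for too long.
        if self.role == "backup" and now - self.last_heartbeat > self.timeout_s:
            self.role = "primary"
        return self.role
```

The timeout would normally be chosen as a multiple of the heartbeat period so that a single delayed message does not trigger promotion; that choice is outside what the description specifies.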
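Optionally, the above-mentioned switch is the main switch in a switch cluster, and the switch cluster includes at least two switches. After the main switch fails, a new main switch can be elected in the switch cluster to replace the failed switch and continue the above interaction with the first physical machine, the second physical machine, and the control device.
Specifically, the switches in the switch cluster store the same data content as the main switch.
By implementing the embodiments of this application, the primary/backup cluster configuration can improve the reliability of the entire communication system and avoid data loss after the main switch fails.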
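Refer to FIG. 4, which shows a schematic logical structure diagram of a control device. As shown in FIG. 4, the control device 400 includes a receiving unit 401 and a processing unit 402.
The receiving unit 401 is configured to receive first indication information sent by a switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
The processing unit 402 is configured to determine that the second virtual machine has failed if the receiving unit 401 does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
Optionally, the processing unit 402 is further configured to: if the processing unit 402 detects that all virtual machines on the second physical machine have failed, determine that the second physical machine has failed.
Optionally, the processing unit 402 is further configured to summarize and output the identifiers of the virtual machines and/or physical machines that have failed.
Optionally, the processing unit 402 is further configured to: before the receiving unit 401 receives the first indication information sent by the switch, allocate IP addresses to all virtual machines on the first physical machine and all virtual machines on the second physical machine respectively;
The receiving unit 401 is further configured to send the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine to the switch, where the IP address of the second virtual machine is used by the switch to send the first heartbeat message to the second physical machine.
It should be noted that for the functions and implementation of each unit in the control device 400, refer to the related description in the method embodiment shown in FIG. 3; they are not repeated here.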
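Refer to FIG. 5, which shows a schematic logical structure diagram of a switch. As shown in FIG. 5, the switch 500 includes a receiving unit 501 and a sending unit 502.
The receiving unit 501 is configured to receive a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
The sending unit 502 is configured to send first indication information to a control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
The sending unit 502 is further configured to send second indication information to the control device if the receiving unit 501 receives, within a preset time period after the sending unit 502 sends the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
Optionally, the receiving unit 501 is further configured to receive, before receiving the first heartbeat message sent by the first virtual machine to the second virtual machine, the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine sent by the control device;
The sending unit 502 is further configured to, after the receiving unit 501 receives the first heartbeat message sent by the first virtual machine to the second virtual machine, send the first heartbeat message to the second physical machine according to the IP address of the second virtual machine.
It should be noted that for the functions and implementation of each unit in the switch 500, refer to the related description in the method embodiment shown in FIG. 3; they are not repeated here.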
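Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, implement the fault detection method described above.
The above embodiments can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that the computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), etc.
The specific implementations described above further explain in detail the purposes, technical solutions, and beneficial effects of the embodiments of this application. It should be understood that the above are only specific implementations of the embodiments of this application and are not intended to limit their scope of protection. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the embodiments of this application shall fall within the scope of protection of the embodiments of this application.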

Claims (20)

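  1. A fault detection method, characterized by comprising:
    a control device receiving first indication information sent by a switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
    if the control device does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, determining that the second virtual machine has failed, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  2. The method according to claim 1, characterized by further comprising:
    if the control device detects that all virtual machines on the second physical machine have failed, determining that the second physical machine has failed.
  3. The method according to claim 2, characterized by further comprising:
    the control device summarizing and outputting the identifiers of the virtual machines and/or physical machines that have failed.
  4. The method according to any one of claims 1 to 3, characterized in that, before the control device receives the first indication information sent by the switch, the method further comprises:
    the control device allocating IP addresses to all virtual machines on the first physical machine and all virtual machines on the second physical machine respectively;
    the control device sending the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine to the switch, where the IP address of the second virtual machine is used by the switch to send the first heartbeat message to the second physical machine.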
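  5. A fault detection method, characterized by comprising:
    a switch receiving a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
    the switch sending first indication information to a control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
    if the switch receives, within a preset time period after sending the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, the switch sending second indication information to the control device, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  6. The method according to claim 5, characterized in that, before the switch receives the first heartbeat message sent by the first virtual machine to the second virtual machine, the method further comprises:
    the switch receiving the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine sent by the control device;
    and after the switch receives the first heartbeat message sent by the first virtual machine to the second virtual machine, the method further comprises:
    the switch sending the first heartbeat message to the second physical machine according to the IP address of the second virtual machine.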
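  7. A control device, characterized by comprising:
    a receiving unit, configured to receive first indication information sent by a switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
    a processing unit, configured to determine that the second virtual machine has failed if the control device does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  8. The control device according to claim 7, characterized in that the processing unit is further configured to: if the processing unit detects that all virtual machines on the second physical machine have failed, determine that the second physical machine has failed.
  9. The control device according to claim 8, characterized in that the processing unit is further configured to: summarize and output the identifiers of the virtual machines and/or physical machines that have failed.
  10. The control device according to claim 7 or 8, characterized in that the processing unit is further configured to: before the receiving unit receives the first indication information sent by the switch, allocate IP addresses to all virtual machines on the first physical machine and all virtual machines on the second physical machine respectively;
    the receiving unit is further configured to send the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine to the switch, where the IP address of the second virtual machine is used by the switch to send the first heartbeat message to the second physical machine.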
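  11. A switch, characterized by comprising:
    a receiving unit, configured to receive a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
    a sending unit, configured to send first indication information to a control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
    the sending unit being further configured to send second indication information to the control device if the receiving unit receives, within a preset time period after the sending unit sends the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  12. The switch according to claim 11, characterized in that the receiving unit is further configured to receive, before receiving the first heartbeat message sent by the first virtual machine to the second virtual machine, the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine sent by the control device;
    the sending unit is further configured to, after the receiving unit receives the first heartbeat message sent by the first virtual machine to the second virtual machine, send the first heartbeat message to the second physical machine according to the IP address of the second virtual machine.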
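  13. A control device, characterized by comprising a processor, a memory, and a communication interface, where the processor is connected to the memory and to the communication interface, the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to perform the following operations:
    receiving, through the communication interface, first indication information sent by a switch, where the first indication information is used to indicate that the switch has received a first heartbeat message sent by a first virtual machine to a second virtual machine, the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
    if the control device does not receive second indication information sent by the switch within a preset time period after receiving the first indication information, the processor determining that the second virtual machine has failed, where the second indication information is used to indicate that the switch has received a second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  14. The control device according to claim 13, characterized in that the processor is further configured to:
    if the control device detects that all virtual machines on the second physical machine have failed, determine that the second physical machine has failed.
  15. The control device according to claim 14, characterized in that the processor is further configured to:
    summarize and output, through the communication interface, the identifiers of the virtual machines and/or physical machines that have failed.
  16. The control device according to any one of claims 13 to 15, characterized in that the processor is further configured to:
    before receiving, through the communication interface, the first indication information sent by the switch, allocate IP addresses to all virtual machines on the first physical machine and all virtual machines on the second physical machine respectively;
    send, through the communication interface, the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine to the switch, where the IP address of the second virtual machine is used by the switch to send the first heartbeat message to the second physical machine.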
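  17. A switch, characterized by comprising a processor, a memory, and a communication interface, where the processor is connected to the memory and to the communication interface, the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to perform the following operations:
    receiving, through the communication interface, a first heartbeat message sent by a first virtual machine to a second virtual machine, where the first virtual machine is one of one or more virtual machines configured on a first physical machine, and the second virtual machine is one of one or more virtual machines configured on a second physical machine;
    sending, through the communication interface, first indication information to a control device, where the first indication information is used to indicate that the switch has received the first heartbeat message sent by the first virtual machine to the second virtual machine;
    if the switch receives, within a preset time period after sending the first indication information, a second heartbeat message sent by the second virtual machine to the first virtual machine, sending second indication information to the control device through the communication interface, where the second indication information is used to indicate that the switch has received the second heartbeat message sent by the second virtual machine to the first virtual machine, and the second heartbeat message is generated by the second virtual machine according to the first heartbeat message.
  18. The switch according to claim 17, characterized in that the switch is further configured to: before receiving, through the communication interface, the first heartbeat message sent by the first virtual machine to the second virtual machine, receive through the communication interface the IP addresses of all virtual machines on the first physical machine and the IP addresses of all virtual machines on the second physical machine sent by the control device;
    and after receiving, through the communication interface, the first heartbeat message sent by the first virtual machine to the second virtual machine, further:
    send, through the communication interface, the first heartbeat message to the second physical machine according to the IP address of the second virtual machine.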
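  19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 4.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 5 to 6.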
PCT/CN2019/102769 2019-06-04 2019-08-27 Fault detection method and related equipment WO2020244067A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910484497.1A CN110247821B (zh) 2019-06-04 2019-06-04 A fault detection method and related equipment
CN201910484497.1 2019-06-04

Publications (1)

Publication Number Publication Date
WO2020244067A1 true WO2020244067A1 (zh) 2020-12-10

Family

ID=67886049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102769 WO2020244067A1 (zh) 2019-06-04 2019-08-27 Fault detection method and related equipment

Country Status (2)

Country Link
CN (1) CN110247821B (zh)
WO (1) WO2020244067A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116248618B (zh) * 2023-05-08 2023-09-08 河北豪沃尔智能科技有限责任公司 信息传输装置、信息传输线路故障检测方法及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103067242A (zh) * 2012-12-04 2013-04-24 中国电信股份有限公司云计算分公司 一种用于提供网络服务的虚拟机***
CN103607296A (zh) * 2013-11-01 2014-02-26 杭州华三通信技术有限公司 一种虚拟机故障处理方法和设备
US20160127509A1 (en) * 2014-10-29 2016-05-05 Vmware, Inc. Methods, systems and apparatus to remotely start a virtual machine
CN105656715A (zh) * 2015-12-30 2016-06-08 ***股份有限公司 用于监测云计算环境下网络设备的状态的方法和装置
CN109218141A (zh) * 2018-11-20 2019-01-15 郑州云海信息技术有限公司 一种故障节点检测方法及相关装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7778191B2 (en) * 2008-12-12 2010-08-17 Mitel Networks Corporation System and method for fast detection of communication path failures
US10263836B2 (en) * 2014-03-24 2019-04-16 Microsoft Technology Licensing, Llc Identifying troubleshooting options for resolving network failures
CN105763471B (zh) * 2014-12-16 2019-12-17 中兴通讯股份有限公司 虚拟机环境下链路管理方法、装置和***
CN105591955B (zh) * 2015-10-30 2019-07-09 新华三技术有限公司 一种报文传输的方法和装置
CN107179957B (zh) * 2016-03-10 2020-08-25 阿里巴巴集团控股有限公司 物理机故障分类处理方法、装置和虚拟机恢复方法、***

Also Published As

Publication number Publication date
CN110247821A (zh) 2019-09-17
CN110247821B (zh) 2022-10-18

Similar Documents

Publication Publication Date Title
US10715411B1 (en) Altering networking switch priority responsive to compute node fitness
US11743097B2 (en) Method and system for sharing state between network elements
US8274881B2 (en) Altering access to a fibre channel fabric
US9678826B2 (en) Fault isolation method, computer system, and apparatus
CN110166355B (zh) 一种报文转发方法及装置
US9838245B2 (en) Systems and methods for improved fault tolerance in solicited information handling systems
US10470111B1 (en) Protocol to detect if uplink is connected to 802.1D noncompliant device
WO2018137520A1 (zh) 一种业务恢复方法及装置
US10530634B1 (en) Two-channel-based high-availability
CN113300917B (zh) Open Stack租户网络的流量监控方法、装置
CN106982244B (zh) 在云网络环境下实现动态流量的报文镜像的方法和装置
CN103746855A (zh) 电信云中异常事件的处理方法及装置
CN103036701A (zh) 一种跨网段的n+1备用方法及装置
WO2018171728A1 (zh) 服务器、存储***及相关方法
JP5558422B2 (ja) ネットワークシステム、冗長化方法、障害検知装置及び障害検知プログラム
WO2020244067A1 (zh) 故障检测方法及相关设备
CN114760192A (zh) 容器切换方法及节点设备
US9563388B2 (en) Sharing a hosted device in a computer network
WO2021244500A1 (zh) 一种备份状态确定方法、装置及***
CN112217718A (zh) 一种业务处理方法、装置、设备及存储介质
CN111741077A (zh) 网络服务的调度方法、装置、电子设备以及存储介质
US20180367634A1 (en) Redundant Network Routing with Proxy Servers
WO2023207235A1 (zh) 用户面管理方法、控制面设备及用户面设备
CN114513398B (zh) 网络设备告警处理方法、装置、设备及存储介质
JP2016131355A (ja) Bmc、情報処理装置、方法、および、プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932122

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932122

Country of ref document: EP

Kind code of ref document: A1