CN111209145A - Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium - Google Patents

Info

Publication number
CN111209145A
Authority
CN
China
Prior art keywords
virtual machine
standby
main
healing
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811393959.0A
Other languages
Chinese (zh)
Inventor
周志军
李华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201811393959.0A
Priority to PCT/CN2019/112364 (WO2020103627A1)
Publication of CN111209145A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Abstract

The invention discloses a virtual machine disaster recovery-based service self-healing method, device, and storage medium. The method comprises: while the main virtual machine is running, monitoring the state of the main virtual machine; and when the monitored state of the main virtual machine meets a service self-healing trigger condition, controlling a standby virtual machine corresponding to the main virtual machine to process the service of the main virtual machine. The standby virtual machine and the main virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the main virtual machine. When the state of the main virtual machine meets the trigger condition, the standby virtual machine replaces the main virtual machine to complete service self-healing. The main virtual machine does not need to be restarted or re-created; the standby virtual machine is controlled directly to take over and process the service of the main virtual machine. This process takes little time, so service self-healing is achieved quickly and the service interruption time is shortened.

Description

Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular to a virtual machine disaster recovery-based service self-healing method, device, and storage medium.
Background
With the popularization and application of cloud computing, more and more applications are deployed in cloud environments. Cloud computing can adjust resources dynamically, so many applications, particularly load-balanced cluster applications such as web applications, support dynamic scaling: the application servers in a cluster are adjusted dynamically according to the application's load in order to improve the reliability and availability of the application. However, an application that supports dynamic scaling must satisfy one restriction: it must be stateless. Stateful applications, such as applications containing state data, file system data, or database data, do not support dynamic scaling under load balancing.
For stateful applications, service self-healing is generally used to improve reliability and availability. At present, service self-healing is generally implemented by regenerating the virtual machine. The virtual machine mounts a cloud hard disk as its data storage disk, and the state of the application's virtual machine is monitored. If the state is abnormal, for example PING (Packet Internet Groper) fails or URL (Uniform Resource Locator) access fails, the abnormal virtual machine is restarted. If the service does not recover, the virtual machine is deleted, a virtual machine with the same IP address (Internet Protocol address) is created, and the same cloud hard disk is mounted as its data storage disk to ensure data consistency, thereby achieving service self-healing. However, deleting and re-creating the virtual machine takes a long time, at least several minutes, so the service interruption time is long.
Disclosure of Invention
The invention aims to solve the technical problem of providing a virtual machine disaster recovery-based service self-healing method, equipment and a storage medium, which are used for solving the problems of long self-healing time and long service interruption time of the existing service self-healing method.
In order to solve the technical problems, the invention solves the problems by the following technical scheme:
The invention provides a virtual machine disaster recovery-based service self-healing method, comprising: while the main virtual machine is running, monitoring the state of the main virtual machine; and when the monitored state of the main virtual machine meets a service self-healing trigger condition, controlling a standby virtual machine corresponding to the main virtual machine to process the service of the main virtual machine; wherein the standby virtual machine and the main virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the main virtual machine.
Wherein the standby virtual machine being configured as a disaster recovery virtual machine of the main virtual machine comprises: the standby virtual machine is configured with the same IP address as the main virtual machine; and the standby virtual machine is configured to synchronize data with the main virtual machine.
Wherein the standby virtual machine being configured with the same Internet Protocol (IP) address as the main virtual machine comprises: controlling a network switch between the standby virtual machine and the main virtual machine by calling an Application Programming Interface (API) of a cloud resource management system, so that the network switch configures the standby virtual machine with the same IP address as the main virtual machine.
Wherein the standby virtual machine being configured to synchronize data with the main virtual machine comprises: configuring, by calling an API of the cloud resource management system, the data in the storage device mounted by the standby virtual machine to be mirror-synchronized with the data in the storage device mounted by the main virtual machine.
Wherein the method further comprises: while the main virtual machine is running, copying the data image in the storage device mounted by the main virtual machine to the storage device mounted by the standby virtual machine, so that the standby virtual machine and the main virtual machine stay synchronized.
Wherein monitoring the state of the main virtual machine comprises: sending a monitoring message to the main virtual machine at preset intervals; collecting the return message corresponding to the monitoring message; and determining the state of the main virtual machine according to the return message.
Wherein controlling the standby virtual machine corresponding to the main virtual machine to process the service of the main virtual machine comprises: shutting down the main virtual machine so that its device state becomes the standby state; and starting the standby virtual machine so that its device state becomes the main state.
Wherein the service self-healing trigger condition comprises: a network exception and/or a service exception of the main virtual machine.
The invention also provides a virtual machine disaster recovery-based service self-healing device, which comprises a processor and a memory; the processor is used for executing the virtual machine disaster recovery-based service self-healing program stored in the memory so as to realize the virtual machine disaster recovery-based service self-healing method.
The present invention further provides a storage medium storing one or more programs, which are executable by one or more processors to implement the above-mentioned virtual machine disaster recovery-based service self-healing method.
The beneficial effects of one embodiment of the invention are as follows:
In the invention, a standby virtual machine is configured as a disaster recovery virtual machine of a main virtual machine. When the state of the main virtual machine meets a service self-healing trigger condition, the standby virtual machine replaces the main virtual machine to complete the service self-healing process. During service self-healing, the main virtual machine does not need to be restarted or re-created; the standby virtual machine is controlled directly to replace the main virtual machine and process its service. This process takes little time, so service self-healing is achieved quickly and the service interruption time is shortened.
Drawings
Fig. 1 is a flowchart of a virtual machine disaster recovery-based service self-healing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a virtual machine disaster recovery-based service self-healing method according to a second embodiment of the present invention;
Fig. 3 is a structural diagram of a virtual machine disaster recovery-based service self-healing device according to a third embodiment of the present invention;
Fig. 4 is a structural diagram of a virtual machine disaster recovery-based service self-healing system according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Example one
The embodiment provides a service self-healing method based on virtual machine disaster recovery. Fig. 1 is a flowchart of a virtual machine disaster recovery-based service self-healing method according to a first embodiment of the present invention.
Step S110, during the process of running the active virtual machine, performing state monitoring on the active virtual machine.
Monitoring the state of the main virtual machine comprises: sending a monitoring message to the main virtual machine at preset intervals; collecting the return message corresponding to the monitoring message; and determining the state of the main virtual machine according to the return message.
The state of the main virtual machine includes: network status and/or access status.
The categories of monitoring messages include, but are not limited to: PING messages and URL access messages. The PING message may be an ICMP (Internet Control Message Protocol) message corresponding to the PING command.
The return messages corresponding to the monitoring messages include: Time Out information corresponding to a PING message, and URL access failure information corresponding to a URL access message.
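The monitoring step above can be sketched as follows. This is a minimal illustration and not the patent's implementation: `VmState` and `check_once` are hypothetical names, and the two probes are passed in as callables so that no real PING or URL request is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VmState:
    network_ok: bool  # False when the PING probe returned Time Out
    access_ok: bool   # False when the URL access probe returned a failure


def check_once(ping: Callable[[], bool], access_url: Callable[[], bool]) -> VmState:
    """Send one round of monitoring messages and derive the VM state.

    Each probe returns True for a normal reply and False for a
    Time Out / URL access failure return message.
    """
    return VmState(network_ok=ping(), access_ok=access_url())


# Stand-in probes: PING times out, URL access still succeeds.
state = check_once(ping=lambda: False, access_url=lambda: True)
print(state)
```

In a real deployment the callables would wrap an ICMP echo request and an HTTP request, and `check_once` would be invoked once per preset interval.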
Step S120, when the state of the main virtual machine is monitored to accord with the service self-healing triggering condition, controlling a standby virtual machine corresponding to the main virtual machine to process the service of the main virtual machine; the standby virtual machine is located in a different data center than the active virtual machine, and the standby virtual machine is configured as a disaster recovery virtual machine of the active virtual machine.
A disaster recovery virtual machine is a virtual machine that corresponds to the same service as the main virtual machine and is used to replace the main virtual machine in processing the service.
The service self-healing trigger condition is used to identify whether the current state of the main virtual machine requires the standby virtual machine to replace it in order to achieve service self-healing.
The service self-healing trigger conditions include: a network exception and/or a service exception of the main virtual machine.
In this embodiment, whether the state of the main virtual machine meets the service self-healing trigger condition may be determined according to the messages returned by the main virtual machine. For example:
when the state of the main virtual machine meets the service self-healing trigger condition, the main virtual machine is shut down so that its device state becomes the standby state, and the standby virtual machine is started so that its device state becomes the main state. Because the standby virtual machine is the disaster recovery virtual machine of the main virtual machine, once the main virtual machine is shut down and the standby virtual machine is started, the standby virtual machine replaces the main virtual machine and starts processing its service. In the process, the main virtual machine is converted into the disaster recovery virtual machine of the standby virtual machine.
The standby virtual machine being configured as a disaster recovery virtual machine of the main virtual machine comprises: the standby virtual machine is configured with the same IP address as the main virtual machine; and the standby virtual machine is configured to synchronize data with the main virtual machine. Because the standby virtual machine has the same IP address as the main virtual machine and its data is synchronized with that of the main virtual machine, the standby virtual machine can replace the main virtual machine.
Further, an Application Programming Interface (API) of the cloud resource management system may be called to control a network switch between the standby virtual machine and the main virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the main virtual machine.
Further, the data in the storage device mounted by the standby virtual machine can be configured, by calling the API of the cloud resource management system, to be mirror-synchronized with the data in the storage device mounted by the main virtual machine. This configuration achieves data synchronization between the main virtual machine and the standby virtual machine. The data of the main virtual machine is stored in the storage device mounted by the main virtual machine, and the data of the standby virtual machine is stored in the storage device mounted by the standby virtual machine. The storage devices mounted by the main virtual machine and the standby virtual machine may be cloud hard disks.
Because data synchronization between the main virtual machine and the standby virtual machine must be maintained, the data image in the storage device mounted by the main virtual machine is copied to the storage device mounted by the standby virtual machine while the main virtual machine is running, so that the data of the standby virtual machine stays synchronized with that of the main virtual machine (data mirror synchronization).
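As a rough illustration of data mirror synchronization, the toy model below applies every write to both disks. It is only a sketch under simplifying assumptions: `MirroredVolume` is a hypothetical name, the disks are in-memory dictionaries, and a real cloud storage service would replicate at the block-device level through its own mechanism.

```python
class MirroredVolume:
    """Toy model of two cloud disks kept in mirror synchronization."""

    def __init__(self):
        self.main_disk = {}     # block number -> data on the main VM's disk
        self.standby_disk = {}  # block number -> data on the standby VM's disk

    def write(self, block, data):
        # Every write to the main VM's disk is also copied to the
        # standby VM's disk, so the two never diverge.
        self.main_disk[block] = data
        self.standby_disk[block] = data


volume = MirroredVolume()
volume.write(0, b"order-1")
volume.write(1, b"order-2")
print(volume.main_disk == volume.standby_disk)
```

Because the standby disk already holds the same data, the standby virtual machine can start serving without a separate restore step.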
In this embodiment, the standby virtual machine is configured as a disaster recovery virtual machine of the primary virtual machine. When the state of the primary virtual machine meets the service self-healing trigger condition, the standby virtual machine replaces the primary virtual machine to complete service self-healing. During service self-healing, the primary virtual machine does not need to be restarted or re-created; the standby virtual machine is controlled directly to replace the primary virtual machine and process its service.
In this embodiment, the active virtual machine and the standby virtual machine may be disaster recovery virtual machines of each other. That is, after the standby virtual machine corresponding to the active virtual machine is controlled to process the service of the active virtual machine, the active virtual machine that stopped processing the service has been converted into the standby virtual machine, and the standby virtual machine that started processing the service has been converted into the active virtual machine. At this point, the original main virtual machine (the converted standby virtual machine) can be troubleshot so that it is again able to process the service normally. Then, when the state of the original standby virtual machine (the converted main virtual machine) is found to meet the service self-healing trigger condition, the original main virtual machine can replace the original standby virtual machine and complete service self-healing.
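The mutual role conversion can be pictured with a small sketch. The names `DrPair` and `fail_over`, and the VM names, are illustrative assumptions; the point is only that the same operation works in both directions.

```python
from dataclasses import dataclass


@dataclass
class DrPair:
    main: str      # name of the VM currently processing the service
    standby: str   # name of its disaster recovery VM

    def fail_over(self) -> None:
        """Swap roles: the standby takes over and the old main becomes standby."""
        self.main, self.standby = self.standby, self.main


pair = DrPair(main="vm-dc1", standby="vm-dc2")
pair.fail_over()   # vm-dc2 now serves; vm-dc1 can be repaired
print(pair)
pair.fail_over()   # later, vm-dc1 can take over again
print(pair)
```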
Example two
In order to make the present invention clearer, a more specific embodiment is provided below to describe the virtual machine disaster recovery-based service self-healing method of the present invention.
Fig. 2 is a flowchart of a virtual machine disaster recovery-based service self-healing method according to a second embodiment of the present invention.
Step S210, a first virtual machine is set in a first data center, and a second virtual machine corresponding to the same service application as the first virtual machine is set in a second data center.
The deployment of the first virtual machine and the second virtual machine corresponding to the same service application can distribute the service application in two different data centers, reduce the risk of service interruption and realize data center level disaster recovery.
Step S220, configuring a virtual machine disaster recovery policy for the first virtual machine and the second virtual machine, so that the first virtual machine is used as the active virtual machine, and the second virtual machine is used as the standby virtual machine.
Configuring the virtual machine disaster recovery policy comprises: configuring the first virtual machine as the active virtual machine and the second virtual machine as the standby virtual machine, so that the second virtual machine serves as the disaster recovery virtual machine of the first virtual machine.
Specifically, the second virtual machine is configured with the same IP address as the first virtual machine; the second virtual machine is configured to synchronize data with the first virtual machine; and the device state of the first virtual machine is configured as the active state and the device state of the second virtual machine as the standby state, so that the first virtual machine becomes the active virtual machine and the second virtual machine becomes the standby virtual machine. With this configuration, the standby virtual machine can serve as the disaster recovery virtual machine of the active virtual machine and replace it in processing the service.
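The constraints of the disaster recovery policy can be expressed as a simple validation. The sketch below is an assumption-laden illustration: `VmConfig` and `validate_dr_policy` are hypothetical names, and the IP address comes from a documentation-only range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmConfig:
    name: str
    data_center: str
    ip_address: str
    powered_on: bool  # powered on = active role, shut down = standby role


def validate_dr_policy(first: VmConfig, second: VmConfig) -> None:
    """Raise ValueError if the pair violates the disaster recovery policy."""
    if first.data_center == second.data_center:
        raise ValueError("the two VMs must be in different data centers")
    if first.ip_address != second.ip_address:
        raise ValueError("the two VMs must share the same IP address")
    if first.powered_on == second.powered_on:
        raise ValueError("exactly one VM must be powered on (active)")


first = VmConfig("vm-1", "dc-1", "192.0.2.10", powered_on=True)
second = VmConfig("vm-2", "dc-2", "192.0.2.10", powered_on=False)
validate_dr_policy(first, second)  # a valid pair passes silently
```

Data synchronization is not checked here because it is a property of the mounted storage rather than of the virtual machine record.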
Configuring the device state of the first virtual machine as the active state and the device state of the second virtual machine as the standby state comprises: configuring the first virtual machine into the powered-on state and the second virtual machine into the shutdown state, where the powered-on state represents the active state and the shutdown state represents the standby state. By adjusting the powered-on and shutdown states of a virtual machine, the virtual machine can be made the active virtual machine or the standby virtual machine. Further, if the first virtual machine is the active virtual machine, the second virtual machine is the standby virtual machine; and if the second virtual machine is the active virtual machine, the first virtual machine is the standby virtual machine.
The first virtual machine and the second virtual machine can be configured with the same IP address by using the Virtual Router Redundancy Protocol (VRRP) capability of a network switch. Further, the network switch connecting the first virtual machine (the active virtual machine) and the second virtual machine (the standby virtual machine) can be controlled by calling the API of the cloud resource management system, so that the network switch uses VRRP to configure the second virtual machine with the same IP address as the first virtual machine.
The first virtual machine and the second virtual machine are configured for data synchronization, so that the data consistency of the first virtual machine of the first data center and the second virtual machine of the second data center can be ensured. The data synchronization may be a data mirror synchronization. Further, the data in the storage device mounted by the first virtual machine (primary virtual machine) and the data in the storage device mounted by the second virtual machine (standby virtual machine) can be configured to be in mirror synchronization by calling the API interface of the cloud resource management system.
Step S230, configuring a service self-healing policy for the first virtual machine and the second virtual machine.
Configuring the service self-healing policy comprises: configuring the service self-healing process to start when the state of the main virtual machine meets the service self-healing trigger condition.
The service self-healing trigger conditions include: a network exception and/or a service exception of the main virtual machine.
Network exceptions of the main virtual machine include, but are not limited to: the network of the main virtual machine is detected as unreachable N consecutive times. N is a positive integer greater than 1, and may be an empirical value or a value obtained by experiment. For example: PING to the main virtual machine fails N consecutive times.
Service exceptions of the main virtual machine include, but are not limited to: service access to the main virtual machine fails M consecutive times. M is a positive integer greater than 1, and may be an empirical value or a value obtained by experiment. For example: access to the URL of the main virtual machine fails M consecutive times.
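A "fails N consecutive times" condition can be tracked with a counter that resets on any success. The following is a minimal sketch; `ConsecutiveFailureTrigger` is a hypothetical name, and one instance would be kept per probe type (one for PING with threshold N, one for URL access with threshold M).

```python
class ConsecutiveFailureTrigger:
    """Fires once a probe has failed `threshold` times in a row."""

    def __init__(self, threshold: int) -> None:
        if threshold <= 1:
            raise ValueError("threshold must be an integer greater than 1")
        self.threshold = threshold
        self.failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when the trigger fires."""
        # A successful probe breaks the streak; a failure extends it.
        self.failures = 0 if probe_ok else self.failures + 1
        return self.failures >= self.threshold


trigger = ConsecutiveFailureTrigger(threshold=3)
results = [trigger.record(ok) for ok in [False, False, True, False, False, False]]
print(results)  # only the final probe completes a streak of three failures
```

Resetting on success is what distinguishes "N consecutive failures" from "N failures in total", which would fire on intermittent packet loss.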
Step S240, in the process of running the first virtual machine, performing state monitoring on the first virtual machine.
The device state of the first virtual machine is the main state (powered-on state), so the first virtual machine runs as the main virtual machine and can process the service; the device state of the second virtual machine is the standby state (shutdown state), so the second virtual machine, as the standby virtual machine, is temporarily not running and cannot process the service.
Specifically, a monitoring message is sent at preset intervals to the first virtual machine, which serves as the main virtual machine, and the return message corresponding to the monitoring message is collected in order to monitor the state of the first virtual machine, for example: URL access failure information indicating that access failed, or Time Out information indicating that PING failed. Whether the state of the first virtual machine meets the service self-healing trigger condition is then judged according to the configured service self-healing policy. If it does, the service self-healing process is started; if it does not, monitoring of the state of the first virtual machine continues.
The categories of monitoring messages include, but are not limited to: PING messages and URL access messages.
For example, the service self-healing trigger conditions include: PING to the main virtual machine fails 3 consecutive times, and URL access to the main virtual machine fails 3 consecutive times. A PING message is sent to the first virtual machine every 5 seconds and PING fails 3 consecutive times; a URL access message is sent to the first virtual machine every 5 seconds and URL access fails 3 consecutive times. At this point, it can be determined that the state of the first virtual machine meets the service self-healing trigger condition, and the service self-healing process can be started.
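With a 5-second probe interval and a threshold of 3 consecutive failures, the time needed to detect a failure depends on where the failure falls between two probes. The sketch below works this out under simplifying assumptions that are not stated in the patent: probes are sent at exact multiples of the interval, a probe fails if it is sent at or after the failure, and the time spent waiting for a probe to time out is ignored. `detection_time` is a hypothetical helper.

```python
import math


def detection_time(failure_start: float, interval: float, threshold: int) -> float:
    """Time at which the threshold-th consecutive failed probe is sent.

    Probes are sent at t = 0, interval, 2*interval, ...; a probe fails
    if it is sent at or after failure_start.
    """
    first_failed_probe = math.ceil(failure_start / interval) * interval
    return first_failed_probe + (threshold - 1) * interval


for start in (1.0, 5.0):
    latency = detection_time(start, interval=5.0, threshold=3) - start
    print(f"failure at t={start}s detected after {latency}s")
```

Under these assumptions, detection takes between 10 seconds (the failure occurs just as a probe is sent) and just under 15 seconds (the failure occurs right after a successful probe).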
Step S250, when it is monitored that the state of the first virtual machine conforms to the service self-healing trigger condition in the service self-healing policy, the second virtual machine is used as the active virtual machine, and the first virtual machine is used as the standby virtual machine.
After the service self-healing process is started, the API of the cloud resource management system is called to shut down the first virtual machine in the first data center and start the second virtual machine in the second data center. In this way, the device state of the first virtual machine becomes the shutdown state, that is, the first virtual machine enters the standby state; and the device state of the second virtual machine becomes the powered-on state, that is, the second virtual machine enters the active state, replaces the first virtual machine, and starts to process the service of the first virtual machine.
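The switchover can be sketched as two API calls made in order. The patent does not name a concrete cloud resource management API, so `CloudApi`, `stop_vm`, and `start_vm` below are hypothetical placeholders, and `RecordingApi` is an in-memory stand-in used only to show the call order.

```python
from typing import List, Protocol, Tuple


class CloudApi(Protocol):
    # Hypothetical interface: a real cloud resource management system
    # exposes its own calls for powering virtual machines off and on.
    def stop_vm(self, vm_id: str) -> None: ...
    def start_vm(self, vm_id: str) -> None: ...


def self_heal(api: CloudApi, first_vm: str, second_vm: str) -> Tuple[str, str]:
    """Shut down the failed VM, start its DR VM, return (active, standby)."""
    api.stop_vm(first_vm)    # first VM enters the standby (shutdown) state
    api.start_vm(second_vm)  # second VM enters the active (powered-on) state
    return second_vm, first_vm


class RecordingApi:
    """In-memory stand-in that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def stop_vm(self, vm_id: str) -> None:
        self.calls.append(("stop", vm_id))

    def start_vm(self, vm_id: str) -> None:
        self.calls.append(("start", vm_id))


api = RecordingApi()
active, standby = self_heal(api, first_vm="vm-dc1", second_vm="vm-dc2")
print(active, standby, api.calls)
```

Stopping the first virtual machine before starting the second avoids having two powered-on machines with the same IP address at the same time.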
In this embodiment, since the first virtual machine and the second virtual machine correspond to the same application, the data of the first virtual machine and the second virtual machine are mirror-synchronized, and the IP addresses of the first virtual machine and the second virtual machine are the same, after the first virtual machine is closed and the second virtual machine is opened, there is no influence on the access to the service, thereby implementing self-healing of the service.
In this embodiment, in the process of the service self-healing processing, it is not necessary to re-create the active virtual machine, but only the pre-configured disaster recovery virtual machine needs to be started, and the time for starting the disaster recovery virtual machine is usually less than 1 minute, so that the speed of the service self-healing can be effectively increased, and the time for service interruption can be shortened.
Example three
The embodiment provides a virtual machine disaster recovery-based service self-healing device. Fig. 3 is a structural diagram of a virtual machine disaster recovery-based service self-healing device according to a third embodiment of the present invention.
In this embodiment, the virtual machine disaster recovery-based service self-healing device includes, but is not limited to, a processor 310 and a memory 320.
The processor 310 is configured to execute the virtual machine disaster recovery-based service self-healing program stored in the memory 320, so as to implement the virtual machine disaster recovery-based service self-healing method described above.
Specifically, the processor 310 is configured to execute the virtual machine disaster recovery-based service self-healing program stored in the memory 320 to implement the following steps of the virtual machine disaster recovery-based service self-healing method: while the active virtual machine is running, monitoring the state of the active virtual machine; and when the monitored state of the active virtual machine meets a service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine. The standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine.
Optionally, configuring the standby virtual machine as the disaster recovery virtual machine of the active virtual machine includes: configuring the standby virtual machine with the same Internet Protocol (IP) address as the active virtual machine; and configuring the standby virtual machine to synchronize data with the active virtual machine.
Optionally, configuring the standby virtual machine with the same IP address as the active virtual machine includes: calling an application programming interface (API) of a cloud resource management system to control a network switch between the standby virtual machine and the active virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
Optionally, configuring the standby virtual machine to synchronize data with the active virtual machine includes: calling the API of the cloud resource management system to configure the data in the storage device mounted by the standby virtual machine to be mirror-synchronized with the data in the storage device mounted by the active virtual machine.
Optionally, while the active virtual machine is running, the data in the storage device mounted by the active virtual machine is mirrored to the storage device mounted by the standby virtual machine, so that the data of the standby virtual machine stays synchronized with the data of the active virtual machine.
Optionally, monitoring the state of the active virtual machine includes: sending a monitoring message to the active virtual machine at preset intervals; collecting the return message corresponding to the monitoring message; and determining the state of the active virtual machine according to that return message.
Optionally, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine includes: shutting down the active virtual machine so that its device state becomes the standby state; and starting the standby virtual machine so that its device state becomes the active state.
Optionally, the service self-healing trigger condition includes a network abnormality and/or a service abnormality of the active virtual machine.
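The optional monitoring steps above (periodic monitoring message, collected return message, state decision) can be sketched as below. Everything named here is an assumption made for illustration: the `probe` callable, the `ProbeReply` fields, the state names, and the rule that a missing reply means a network abnormality while an unhealthy reply means a service abnormality. The patent does not specify the message format.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class VmState(Enum):
    NORMAL = "normal"
    NETWORK_ABNORMAL = "network_abnormal"
    SERVICE_ABNORMAL = "service_abnormal"


@dataclass
class ProbeReply:
    """Hypothetical return message for one monitoring message."""

    service_ok: bool


def classify(reply: Optional[ProbeReply]) -> VmState:
    """Derive the active VM's state from the collected return message."""
    if reply is None:                 # no return message arrived in time
        return VmState.NETWORK_ABNORMAL
    if not reply.service_ok:          # VM reachable, but the service is failing
        return VmState.SERVICE_ABNORMAL
    return VmState.NORMAL


def monitor(
    probe: Callable[[], Optional[ProbeReply]],
    on_trigger: Callable[[VmState], None],
    period_s: float,
    max_rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> VmState:
    """Probe once per period; call on_trigger at the first abnormal state."""
    state = VmState.NORMAL
    for _ in range(max_rounds):
        state = classify(probe())
        if state is not VmState.NORMAL:
            on_trigger(state)         # self-healing trigger condition met
            break
        sleep(period_s)
    return state


# Usage with canned replies: healthy, healthy, then no reply at all.
replies = iter([ProbeReply(True), ProbeReply(True), None])
triggered = []
final = monitor(lambda: next(replies), triggered.append,
                period_s=0.0, max_rounds=5, sleep=lambda s: None)
```

Passing `sleep` as a parameter is only a convenience for exercising the loop without waiting; a real deployment would presumably use the preset period directly.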
The virtual machine disaster recovery-based service self-healing device can be arranged on the cloud resource management system side or can be independently arranged. Fig. 4 is a structural diagram of a virtual machine disaster recovery-based service self-healing system according to a third embodiment of the present invention. In fig. 4, the virtual machine disaster recovery-based service self-healing device and the cloud resource management system are independently arranged.
The virtual machine disaster recovery-based service self-healing system comprises: a cloud resource management system 410, a virtual machine disaster recovery-based service self-healing device 420, a first data center 430, and a second data center 440. The cloud resource management system 410 includes a network switch (not shown), through which the first data center 430 and the second data center 440 may be connected.
A first virtual machine 431 and a third virtual machine 432 are provided in the first data center 430, and a second virtual machine 441 and a fourth virtual machine 442 are provided in the second data center 440.
The cloud resource management system 410 is configured to manage the virtual machines of the first data center 430 and the virtual machines of the second data center 440. The cloud resource management system 410 provides an API interface 411, and the API interface 411 is connected to the first data center 430 and the second data center 440, respectively.
The virtual machine disaster recovery-based service self-healing device 420 may call the API interface 411 to configure the first virtual machine 431, the second virtual machine 441, the third virtual machine 432, and the fourth virtual machine 442. It configures the second virtual machine 441 as the disaster recovery virtual machine of the first virtual machine 431, so that the first virtual machine 431 and the second virtual machine 441 both correspond to the first service, and configures the fourth virtual machine 442 as the disaster recovery virtual machine of the third virtual machine 432, so that the third virtual machine 432 and the fourth virtual machine 442 both correspond to the second service. The device 420 may also call the API interface 411 to configure a first service self-healing trigger condition for the first virtual machine 431 and the second virtual machine 441, and a second service self-healing trigger condition for the third virtual machine 432 and the fourth virtual machine 442.
According to the configuration made by the virtual machine disaster recovery-based service self-healing device 420, the first virtual machine 431 is in the power-on state as an active virtual machine, and the second virtual machine 441 is in the power-off state as a standby virtual machine; the third virtual machine 432 is in the power-on state as an active virtual machine, and the fourth virtual machine 442 is in the power-off state as a standby virtual machine.
The virtual machine disaster recovery-based service self-healing device 420 may send monitoring messages to the first virtual machine 431 and the third virtual machine 432, respectively, and monitor their states by collecting the return messages corresponding to the monitoring messages. When the monitored state of the first virtual machine 431 meets the first service self-healing trigger condition, the device 420 calls the API interface 411 of the cloud resource management system 410 to shut down the first virtual machine 431 and start the second virtual machine 441, completing the self-healing of the first service. When the monitored state of the third virtual machine 432 meets the second service self-healing trigger condition, the device 420 calls the API interface 411 of the cloud resource management system 410 to shut down the third virtual machine 432 and start the fourth virtual machine 442, completing the self-healing of the second service.
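The arrangement in Fig. 4 pairs each service with its own active and standby virtual machine and its own trigger condition. A minimal sketch of that bookkeeping follows. The `ServicePair` structure, the condition strings, the specific conditions assigned to each service, and the `api_calls` log that stands in for calls to API interface 411 are illustrative assumptions; the reference numerals are reused as identifiers only for readability.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class ServicePair:
    """Active/standby VMs for one service plus its trigger conditions."""

    active: str
    standby: str
    triggers: FrozenSet[str]


# Reference numerals from Fig. 4 reused as hypothetical VM identifiers.
pairs: Dict[str, ServicePair] = {
    "first_service": ServicePair("vm431", "vm441", frozenset({"network_abnormal"})),
    "second_service": ServicePair(
        "vm432", "vm442", frozenset({"network_abnormal", "service_abnormal"})
    ),
}

# Stand-in for calls made through the cloud resource management API.
api_calls: List[Tuple[str, str]] = []


def handle_state(service: str, observed_state: str) -> bool:
    """Fail over one service if its own trigger condition is met."""
    pair = pairs[service]
    if observed_state not in pair.triggers:
        return False
    api_calls.append(("stop", pair.active))
    api_calls.append(("start", pair.standby))
    pair.active, pair.standby = pair.standby, pair.active
    return True


# A service abnormality triggers only the second service's pair here,
# because only that pair was (hypothetically) configured to react to it.
first_result = handle_state("first_service", "service_abnormal")
second_result = handle_state("second_service", "service_abnormal")
```

Keeping the trigger conditions per pair means a fault in one service's active virtual machine never touches the other service's virtual machines.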
EXAMPLE IV
An embodiment of the present invention also provides a storage medium (a computer-readable storage medium). The storage medium stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state disk; and it may also comprise a combination of the above kinds of memory.
The one or more programs in the storage medium are executable by one or more processors to implement the virtual machine disaster recovery-based service self-healing method described above.
The processor is configured to execute the virtual machine disaster recovery-based service self-healing program stored in the memory to implement the following steps of the virtual machine disaster recovery-based service self-healing method: while the active virtual machine is running, monitoring the state of the active virtual machine; and when the monitored state of the active virtual machine meets a service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine. The standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine.
Optionally, configuring the standby virtual machine as the disaster recovery virtual machine of the active virtual machine includes: configuring the standby virtual machine with the same Internet Protocol (IP) address as the active virtual machine; and configuring the standby virtual machine to synchronize data with the active virtual machine.
Optionally, configuring the standby virtual machine with the same IP address as the active virtual machine includes: calling an application programming interface (API) of a cloud resource management system to control a network switch between the standby virtual machine and the active virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
Optionally, configuring the standby virtual machine to synchronize data with the active virtual machine includes: calling the API of the cloud resource management system to configure the data in the storage device mounted by the standby virtual machine to be mirror-synchronized with the data in the storage device mounted by the active virtual machine.
Optionally, while the active virtual machine is running, the data in the storage device mounted by the active virtual machine is mirrored to the storage device mounted by the standby virtual machine, so that the data of the standby virtual machine stays synchronized with the data of the active virtual machine.
Optionally, monitoring the state of the active virtual machine includes: sending a monitoring message to the active virtual machine at preset intervals; collecting the return message corresponding to the monitoring message; and determining the state of the active virtual machine according to that return message.
Optionally, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine includes: shutting down the active virtual machine so that its device state becomes the standby state; and starting the standby virtual machine so that its device state becomes the active state.
Optionally, the service self-healing trigger condition includes a network abnormality and/or a service abnormality of the active virtual machine.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, and the scope of the invention should not be limited to the embodiments described above.

Claims (10)

1. A service self-healing method based on virtual machine disaster tolerance, characterized by comprising the following steps:
while an active virtual machine is running, monitoring the state of the active virtual machine;
when the monitored state of the active virtual machine meets a service self-healing trigger condition, controlling a standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine;
wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the active virtual machine.
2. The method of claim 1, wherein configuring the standby virtual machine as the disaster recovery virtual machine of the active virtual machine comprises:
configuring the standby virtual machine with the same Internet Protocol (IP) address as the active virtual machine;
configuring the standby virtual machine to synchronize data with the active virtual machine.
3. The method of claim 2, wherein configuring the standby virtual machine with the same IP address as the active virtual machine comprises:
calling an application programming interface (API) of a cloud resource management system to control a network switch between the standby virtual machine and the active virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
4. The method of claim 2, wherein configuring the standby virtual machine to synchronize data with the active virtual machine comprises:
calling an API of a cloud resource management system to configure the data in the storage device mounted by the standby virtual machine to be mirror-synchronized with the data in the storage device mounted by the active virtual machine.
5. The method of claim 4, wherein the method further comprises:
while the active virtual machine is running, mirroring the data in the storage device mounted by the active virtual machine to the storage device mounted by the standby virtual machine, so that the data of the standby virtual machine and the data of the active virtual machine are synchronized.
6. The method of claim 1, wherein monitoring the state of the active virtual machine comprises:
sending a monitoring message to the active virtual machine at preset intervals;
collecting a return message corresponding to the monitoring message;
determining the state of the active virtual machine according to the return message corresponding to the monitoring message.
7. The method according to claim 1, wherein controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine comprises:
shutting down the active virtual machine so that the device state of the active virtual machine is the standby state;
starting the standby virtual machine so that the device state of the standby virtual machine is the active state.
8. The method according to any one of claims 1 to 7, wherein the service self-healing trigger condition comprises a network abnormality and/or a service abnormality of the active virtual machine.
9. A virtual machine disaster recovery-based service self-healing device, characterized by comprising a processor and a memory; the processor is configured to execute a virtual machine disaster recovery-based service self-healing program stored in the memory to implement the virtual machine disaster recovery-based service self-healing method according to any one of claims 1 to 8.
10. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the virtual machine disaster recovery-based service self-healing method according to any one of claims 1 to 8.
CN201811393959.0A 2018-11-21 2018-11-21 Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium Pending CN111209145A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811393959.0A CN111209145A (en) 2018-11-21 2018-11-21 Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium
PCT/CN2019/112364 WO2020103627A1 (en) 2018-11-21 2019-10-21 Service self-healing method and device based on virtual machine disaster recovery, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811393959.0A CN111209145A (en) 2018-11-21 2018-11-21 Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111209145A true CN111209145A (en) 2020-05-29

Family

ID=70774552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811393959.0A Pending CN111209145A (en) 2018-11-21 2018-11-21 Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111209145A (en)
WO (1) WO2020103627A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112202853A (en) * 2020-09-17 2021-01-08 杭州安恒信息技术股份有限公司 Data synchronization method, system, computer device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497288A (en) * 2011-12-13 2012-06-13 华为技术有限公司 Dual-server backup method and dual system implementation device
US20130262390A1 (en) * 2011-09-30 2013-10-03 Commvault Systems, Inc. Migration of existing computing systems to cloud computing sites or virtual machines
CN204859222U (en) * 2015-06-02 2015-12-09 郑州银行股份有限公司 With two high available systems that live of city data center
CN107171870A (en) * 2017-07-17 2017-09-15 郑州云海信息技术有限公司 A kind of two-node cluster hot backup method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3125122B1 (en) * 2014-03-28 2017-10-11 Ntt Docomo, Inc. Virtualized resource management node and virtual machine migration method
CN104579791A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Method for achieving automatic K-DB main and standby disaster recovery cluster switching
CN106817238A (en) * 2015-11-30 2017-06-09 中兴通讯股份有限公司 Virtual machine repair method, virtual machine, system and business function network element

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112202853A (en) * 2020-09-17 2021-01-08 杭州安恒信息技术股份有限公司 Data synchronization method, system, computer device and storage medium
CN112202853B (en) * 2020-09-17 2022-07-22 杭州安恒信息技术股份有限公司 Data synchronization method, system, computer device and storage medium

Also Published As

Publication number Publication date
WO2020103627A1 (en) 2020-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200529