CN114500238A - Automatic switching system and method for block-level disaster recovery, electronic device and medium - Google Patents

Automatic switching system and method for block-level disaster recovery, electronic device and medium Download PDF

Info

Publication number
CN114500238A
Authority
CN
China
Prior art keywords
self
address
healing
application server
control center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210090705.1A
Other languages
Chinese (zh)
Other versions
CN114500238B (en)
Inventor
董帅 (Dong Shuai)
陈跃俊 (Chen Yuejun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ybm Technologies Pvt ltd
Original Assignee
Ybm Technologies Pvt ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ybm Technologies Pvt ltd filed Critical Ybm Technologies Pvt ltd
Priority to CN202210090705.1A priority Critical patent/CN114500238B/en
Publication of CN114500238A publication Critical patent/CN114500238A/en
Application granted granted Critical
Publication of CN114500238B publication Critical patent/CN114500238B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0893 Assignment of logical groups to network elements
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics by checking availability and by checking functioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides an automatic switching system and method for block-level disaster recovery, as well as an electronic device and a medium. The automatic switching method for block-level disaster recovery comprises the following steps: monitoring the status of an application port and an IP address in real time; executing a self-healing script when the application port and the IP address are closed; when the application port and the IP address are open, judging whether the application server has a persistent abnormal condition; and, when self-healing fails or the application server has a persistent abnormal condition, generating a snapshot creation instruction, exporting the snapshot, creating a virtual machine on the disaster recovery resource pool from the exported snapshot, and setting the hardware resources of the virtual machine based on the original production environment. The automatic switching method for block-level disaster recovery solves the problem that, in the prior art, block-level disaster recovery in replication mode cannot be switched over automatically.

Description

Automatic switching system and method for block-level disaster recovery, electronic device and medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to an automatic block-level disaster recovery switching system, method, electronic device, and medium.
Background
Block-level disaster recovery copies all relevant applications, the operating system, and other disk contents to a block replication server through disk block-level replication; the server stores the disk files in a virtualization format, and rollback points are reserved through the snapshot function of the virtualized disk.
At present, under the block-level CDM (Copy Data Management) replication mode used in the industry, emergency takeover must be switched manually, which is a complex operation.
Disclosure of Invention
The invention aims to provide an automatic switching system and method for block-level disaster recovery, as well as an electronic device and a medium; the automatic switching method for block-level disaster recovery can solve the problem that, in the prior art, block-level disaster recovery in replication mode cannot be switched over automatically.
To achieve the above purpose, the invention provides the following technical solution:
an embodiment of the invention provides an automatic switching method for block-level disaster recovery, which specifically comprises the following steps:
monitoring the status of an application port and an IP address in real time;
executing a self-healing script when the application port and the IP address are closed;
when the application port and the IP address are open, judging whether the application server has a persistent abnormal condition;
when self-healing fails or the application server has a persistent abnormal condition, generating a snapshot creation instruction, exporting the snapshot, creating a virtual machine on the disaster recovery resource pool from the exported snapshot, and setting the hardware resources of the virtual machine based on the original production environment.
On the basis of the above technical solution, the invention can be further improved as follows:
Further, monitoring the status of the application port and the IP address in real time includes:
judging whether communication between the control center and the control center gateway fails, and if so, executing a self-healing script;
when communication between the control center and the control center gateway succeeds, judging whether communication between the control center and the application server gateway fails, and if so, sending a warning signal and executing a self-healing script;
and when communication between the control center and the control center gateway succeeds and communication between the control center and the application server gateway succeeds, judging whether the application port and the IP address are closed, and if so, executing a self-healing script.
Further, executing a self-healing script when the application port and the IP address are closed includes:
presetting, in the control center, the number of times the self-healing script is executed and the interval between executions;
after the self-healing script is executed, monitoring whether the application port and the IP address are closed again within a specified time;
and if the application port and the IP address are not closed again, judging that the self-healing result is a success.
Further, monitoring whether the application port and the IP address are closed again within the specified time after the self-healing script is executed includes:
if the application port and the IP address are closed again, judging that the self-healing result is a failure;
sending, by the application server, self-healing failure feedback information to the control center;
and stopping, by the control center, execution of the self-healing script.
Further, judging whether the application server has a persistent abnormal condition when the application port and the IP address are open includes:
acquiring monitoring data of the application server in real time;
and comparing the monitoring data to judge whether it shows prolonged abnormal CPU and memory usage, and if so, judging that the application server is abnormal.
Further, when self-healing fails or the application server has a persistent abnormal condition, generating a snapshot creation instruction, exporting the snapshot, creating a virtual machine on the disaster recovery resource pool from the exported snapshot, and setting the hardware resources of the virtual machine based on the original production environment includes:
setting the hardware resources of the virtual machine based on the original production environment and a preset resource limit;
and executing, by the application server, a port closing action on the open application port and IP address.
Further, when self-healing fails or the application server has a persistent abnormal condition, generating a snapshot creation instruction, exporting the snapshot, creating a virtual machine on the disaster recovery resource pool from the exported snapshot, and setting the hardware resources of the virtual machine based on the original production environment further comprises:
detecting whether the application server starts normally, and if so, monitoring the application port, the IP address, and the running state of the application server after it starts.
An automatic switching system for block-level disaster recovery comprises:
an application server, configured to monitor the status of an application port and an IP address in real time and to execute a self-healing script when the application port and the IP address are closed;
a control center, configured to judge whether the application server has a persistent abnormal condition when the application port and the IP address are open;
and a creation module, configured to generate a snapshot creation instruction when self-healing fails or the application server has a persistent abnormal condition, export the snapshot, create a virtual machine on the disaster recovery resource pool from the exported snapshot, and set the hardware resources of the virtual machine based on the original production environment.
An electronic device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
A non-transitory computer-readable medium has stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
The invention has the following advantages:
The automatic switching method for block-level disaster recovery monitors the status of an application port and an IP address in real time; executes a self-healing script when the application port and the IP address are closed; judges, when the application port and the IP address are open, whether the application server has a persistent abnormal condition; and, when self-healing fails or the application server has a persistent abnormal condition, generates a snapshot creation instruction, exports the snapshot, creates a virtual machine on the disaster recovery resource pool from the exported snapshot, and sets the hardware resources of the virtual machine based on the original production environment. This solves the problem that, in the prior art, block-level disaster recovery in replication mode cannot be switched over automatically.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an automatic switching method of block-level disaster recovery according to the present invention;
FIG. 2 is a block diagram of an automatic switching system for block-level disaster recovery according to the present invention;
FIG. 3 is a block diagram of the disaster recovery system of the present invention;
fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.
Description of the reference numerals
The system comprises an application server 10, a control center 20, a creation module 30, a disaster recovery execution module 40, an electronic device 50, a processor 501, a memory 502 and a bus 503.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, such that the embodiments of the application described herein may be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the present application and its embodiments, and are not used to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
In addition, the term "plurality" shall mean two as well as more than two.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a flowchart of an embodiment of the automatic switching method for block-level disaster recovery. As shown in Fig. 1, the method provided by the embodiment of the present invention includes the following steps:
S101, monitoring the status of an application port and an IP address in real time.
Specifically, whether communication between the control center 20 and the gateway of the control center 20 fails is judged; if so, a self-healing script is executed. When communication between the control center 20 and its gateway succeeds, whether communication between the control center 20 and the gateway of the application server 10 fails is judged; if so, a warning signal is sent and a self-healing script is executed.
the Gateway (Gateway) is also called an internetwork connector and a protocol converter. The gateway realizes network interconnection above a network layer, is a complex network interconnection device and is only used for interconnection of two networks with different high-level protocols. The gateway can be used for interconnection of both wide area networks and local area networks. A gateway is a computer system or device that acts as a switch-operative. The gateway is a translator used between two systems that differ in communication protocol, data format or language, or even in an entirely different architecture. Instead of the bridge simply communicating the information, the gateway repackages the received information to accommodate the needs of the destination system. Same layer-application layer.
When the control center 20 communicates successfully both with its own gateway and with the gateway of the application server 10, whether the application port is closed is judged; if so, a self-healing script is executed.
A port can be regarded as an outlet through which a device communicates with the outside world. Ports are divided into virtual ports and physical ports. Virtual ports are ports inside a computer or inside a switch or router and are not visible, for example ports 80, 21, and 23 on a computer. Physical ports, also called interfaces, are visible ports, such as the RJ45 network port on a computer backplane and the RJ45 ports of switches, routers, and hubs; the RJ11 jack used by telephones also falls into the category of physical ports.
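The real-time port and IP checks described above can be sketched as follows. This is an illustrative assumption, not part of the disclosure: the patent does not name probe mechanisms, so a TCP connect is used for the application port and an ICMP ping (Linux `ping -c 1`) for IP reachability, with hypothetical timeout values.

```python
# Hedged sketch of the real-time monitoring step (S101).
# Probe mechanisms, hosts, ports, and timeouts are illustrative assumptions.
import socket
import subprocess

def ip_reachable(ip: str, timeout_s: int = 2) -> bool:
    """Check IP reachability with a single ICMP ping (Linux `ping -c 1`)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def port_open(ip: str, port: int, timeout_s: float = 2.0) -> bool:
    """Check whether the application port accepts TCP connections."""
    try:
        with socket.create_connection((ip, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

In a deployment, the monitor would call these probes in a loop and report a "closed" status when both checks fail.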
S102, executing a self-healing script when the application port and the IP address are closed.
Specifically, the number of times the self-healing script is executed and the interval between executions are preset in the control center 20.
After the self-healing script is executed, whether the application port and the IP address are closed again within the specified time is monitored; if they are not closed again, the self-healing result is judged a success.
If the application port and the IP address are closed again, the self-healing result is judged a failure;
the application server 10 sends self-healing failure feedback information to the control center 20;
and the control center 20 stops executing the self-healing script.
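The retry logic of step S102 might look like the following sketch. The callback names and the default attempt count and interval are assumptions; the patent states only that both values are configurable in the control center.

```python
# Minimal sketch of the self-healing retry logic of S102.
# `run_self_heal_script` and `port_is_closed` are hypothetical callbacks.
import time

def self_heal(run_self_heal_script, port_is_closed,
              max_attempts: int = 3, interval_s: float = 5.0,
              sleep=time.sleep) -> bool:
    """Return True on self-healing success, False on failure."""
    for _attempt in range(max_attempts):
        run_self_heal_script()
        sleep(interval_s)            # wait the preset interval
        if not port_is_closed():     # port stayed open: healed
            return True
    # Port closed again after every attempt: the caller reports failure
    # to the control center, which then stops issuing the script.
    return False
```

On failure, the application server would send this result as feedback to the control center, matching the flow above.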
S103, when the application port and the IP address are open, judging whether the application server has a persistent abnormal condition.
Specifically, monitoring data of the application server 10 is acquired in real time,
and the monitoring data is compared to judge whether it shows prolonged abnormal CPU and memory usage; if so, the application server 10 is judged abnormal.
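One way to realize this comparison of short-term against long-term monitoring data is sketched below. The window size and the ratio threshold are assumptions; the patent says only that the duration and count are configurable.

```python
# Illustrative sketch of the persistent-anomaly check of S103: compare
# recent CPU/memory samples against a longer-term baseline average.
# `window` and `factor` are assumed, configurable parameters.
from statistics import mean

def persistent_anomaly(samples, window: int = 5, factor: float = 2.0) -> bool:
    """True if the last `window` samples average more than `factor` times
    the long-term average (suggesting a CPU or memory leak)."""
    if len(samples) <= window:
        return False                     # not enough history yet
    baseline = mean(samples[:-window])   # long-term average
    recent = mean(samples[-window:])     # short-term average
    return baseline > 0 and recent > factor * baseline
```

The same function could be applied separately to CPU and memory series pushed by the monitor module.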
S104, when self-healing fails or the application server has a persistent abnormal condition, generating a snapshot creation instruction, exporting the snapshot, automatically creating a virtual machine on the disaster recovery resource pool from the exported snapshot, and automatically setting the resources of the virtual machine according to the original production environment and preset resource limits.
Specifically, a port closing action is executed on the open application port and IP address through the application server 10,
and whether the application server 10 starts normally is detected; if so, the application port, the IP address, and the running state of the application server 10 are monitored after it starts.
The disaster recovery system consists of the control center 20, the disaster recovery execution module 40, and a client (the application server 10).
The application server 10 has three functional modules: an agent module, a CDP module, and a monitor module. The agent module communicates with the disaster recovery execution module 40, executes tasks, and transfers data; the CDP module handles the driver, real-time data capture, and local caching; and the monitor module monitors the state of the application server 10 and actively pushes the monitoring results to the control center 20, reporting the status of the CPU, memory, network, and the specified application ports every 3-5 seconds.
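As a rough sketch of the monitor module's push cycle, assuming hypothetical `collect` and `push` callbacks and a 4-second default period (the patent specifies only the 3-5 second cadence):

```python
# Hedged sketch of the monitor module's push loop: every cycle it collects
# CPU, memory, network, and application-port status and pushes the snapshot
# to the control center.  `collect` and `push` are hypothetical callbacks.
import time

def monitor_loop(collect, push, period_s: float = 4.0, cycles=None,
                 sleep=time.sleep):
    """Push one status snapshot per cycle; `cycles=None` runs forever."""
    n = 0
    while cycles is None or n < cycles:
        push(collect())      # e.g. {"cpu": ..., "mem": ..., "ports": ...}
        sleep(period_s)
        n += 1
```

The control center, on the receiving side of `push`, would infer the client IP and port status from these snapshots.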
The control center 20 actively queries the task running status of the disaster recovery execution module 40, issues tasks to it, receives monitor information from the client, checks the conditions returned by the monitor, and infers from them the status of the client IP and the application port.
The number of self-healing attempts and the interval between them are preset in the control center 20. When the monitor module detects that the application port is closed, it automatically executes the preset self-healing script and monitors whether the application port is closed again within the specified time (the number of attempts and the interval are configurable). If the application port self-heals successfully within the specified time, no further operation is performed;
if self-healing of the application port fails, the monitor feeds the information back to the control center 20 and the self-healing operation stops.
First, the control center 20 confirms communication with its own gateway; if this communication fails, the control center 20 itself has a problem and performs no subsequent operation.
Second, the control center 20 checks gateway communication with the application server 10; if the first step succeeded but this communication fails, the network between the control center 20 and the application server 10 has a problem, so only an alarm is raised and no other operation is performed.
Third, the control center checks the status of the application port; if the first and second steps succeeded but the third step fails, the automatic switching process is prepared.
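The three-step cascade just described can be condensed into a small decision function. The probe results are passed in as booleans; the returned action names are illustrative, not taken from the disclosure.

```python
# Sketch of the three-step decision cascade: control-center gateway,
# application-server gateway, then application port.
def decide_action(center_gw_ok: bool, app_gw_ok: bool, port_ok: bool) -> str:
    if not center_gw_ok:
        # Step 1 failed: the control center itself is suspect; do nothing.
        return "no-op"
    if not app_gw_ok:
        # Step 2 failed: network problem between center and server; alarm only.
        return "alarm"
    if not port_ok:
        # Steps 1-2 succeeded but the port check failed: prepare switching.
        return "prepare-switch"
    return "healthy"
```

The ordering matters: a gateway failure closer to the control center masks the later checks, so switching is only prepared when the fault is isolated to the application port.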
Regarding the self-healing judgment conditions: after obtaining the self-healing failure result returned by the client, the control center 20 judges communication between itself and its gateway, communication between itself and the gateway of the application server 10, and the status of the application port, and determines the subsequent operation according to the results.
Judgment based on other conditions mainly concerns persistent abnormal conditions of the application's CPU, memory, and the like, determined by comparing and analyzing short-term and long-term monitoring data. If CPU and memory usage is abnormally high for a long time (the duration and the number of occurrences are configurable), or suddenly stays high for an extended period, there may be a memory or CPU leak, so that the application cannot be provided normally or can only be provided partially. Once these conditions are met, the control center judges communication with its own gateway, communication with the gateway of the application server 10, and the status of the application port; with the added requirement that the first and second steps succeed, the switching process is started.
The switching process is triggered automatically by the judgment result for the application port:
Before switching: the open application port is blocked, and the monitor program executes the port closing action so that the production network card goes offline or is reset to 1.1.1.1; switching then proceeds (switching is performed when the IP is unavailable).
During switching, the following process is completed automatically: a snapshot creation instruction is generated, the snapshot is exported, a virtual machine is automatically created on the disaster recovery resource pool from the exported snapshot, resources such as the MAC address, VLAN number, CPU, and memory are automatically set according to the original production environment and preset resource limits, and the virtual machine starts up automatically once the settings are complete.
After switching: whether the application server 10 starts normally is detected. When it starts normally, communication between the control center 20 and its gateway, communication between the control center 20 and the gateway of the application server 10, and the status of the application port and the IP address are judged; once the IP, application port, CPU, memory, and so on are all normal, the detection result (including any abnormal conditions) is sent to the control center 20.
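The before/during/after flow above can be sketched as a single orchestration function. Every object and method here is hypothetical, since the patent does not name a snapshot or virtualization API; only the ordering of steps and the capping of resources at the preset limits follow the description.

```python
# Hedged sketch of the switching flow: export a snapshot, create a VM in
# the disaster-recovery pool, copy MAC/VLAN/CPU/memory from the original
# production environment subject to preset limits, then power on.
def switch_over(snapshot_api, pool, prod_env, limits):
    snap = snapshot_api.create()          # generate snapshot creation instruction
    exported = snapshot_api.export(snap)  # export the snapshot
    vm = pool.create_vm(exported)         # create VM from the exported snapshot
    vm.configure(
        mac=prod_env["mac"],
        vlan=prod_env["vlan"],
        cpu=min(prod_env["cpu"], limits["cpu"]),          # cap at preset limit
        memory=min(prod_env["memory"], limits["memory"]),
    )
    vm.power_on()                         # start automatically once configured
    return vm
```

After `power_on`, the post-switch checks described above (gateway communication, port and IP status) would run against the new virtual machine.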
Fig. 2 is a block diagram of an embodiment of an automatic switching system for block-level disaster recovery, and Fig. 3 is a block diagram of the disaster recovery system of the present invention. As shown in Figs. 2 and 3, the automatic switching system for block-level disaster recovery provided in the embodiment of the present invention includes:
the application server, configured to monitor the status of an application port and an IP address in real time and to execute a self-healing script when the application port and the IP address are closed;
the control center, configured to judge whether the application server has a persistent abnormal condition when the application port and the IP address are open; to judge whether communication between the control center and the control center gateway fails and, if so, execute a self-healing script; when that communication succeeds, to judge whether communication between the control center and the application server gateway fails and, if so, send a warning signal and execute a self-healing script; and, when both communications succeed, to judge whether the application port and the IP address are closed and, if so, execute a self-healing script; and
the creation module 30, configured to generate a snapshot creation instruction when self-healing fails or the application server has a persistent abnormal condition, export the snapshot, create a virtual machine on the disaster recovery resource pool from the exported snapshot, and set the hardware resources of the virtual machine based on the original production environment.
the disaster recovery backup system is composed of a control center 20, a disaster recovery backup execution module 40 and a client (application server 10).
The application server 10 has three functional modules, which are an agent module, a CDP module and a monitor module; the agent module is used for communicating, executing and transmitting data with the disaster recovery execution module 40; the CDP module is used for driving, real-time data capturing and local caching, the monitor module is used for monitoring the state of the application server 10 and actively pushing the monitoring result to the control center 20 for use, and the monitor module pushes the opening conditions of the CPU, the memory, the network and the application port (specified) to the control center 20 within 3-5 seconds.
The control center 20 actively queries the task running condition of the disaster recovery execution module 40, issues the task to the disaster recovery execution module 40, receives monitor information of the client, detects the condition returned by the monitor, and reversely determines the opening condition of the client IP and the application port according to the condition returned by the monitor.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device 50 includes: a processor 501 (processor), a memory 502 (memory), and a bus 503;
the processor 501 and the memory 502 communicate with each other through the bus 503;
the processor 501 is configured to call the program instructions in the memory 502 to perform the methods provided by the above method embodiments, for example including: monitoring the status of an application port and an IP address in real time; executing a self-healing script when the application port and the IP address are closed; when the application port and the IP address are open, determining whether the application server has a persistent abnormal condition; and when self-healing fails or the application server has a persistent abnormal condition, generating a snapshot creation instruction, deriving the snapshot creation instruction, creating a virtual machine on the disaster recovery resource pool through the derived snapshot creation instruction, and setting the hardware resources of the virtual machine based on the original production environment.
The present embodiments provide a non-transitory computer-readable medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example including: monitoring the status of an application port and an IP address in real time; executing a self-healing script when the application port and the IP address are closed; when the application port and the IP address are open, determining whether the application server has a persistent abnormal condition; and when self-healing fails or the application server has a persistent abnormal condition, generating a snapshot creation instruction, deriving the snapshot creation instruction, creating a virtual machine on the disaster recovery resource pool through the derived snapshot creation instruction, and setting the hardware resources of the virtual machine based on the original production environment.
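The method steps recited above can be sketched end to end. This is a hedged illustration only: `run_self_heal`, `create_snapshot` and `create_vm` are hypothetical stand-ins for the agent's self-healing script and the disaster-recovery pool's snapshot/VM calls, and the retry count follows claim 3's "preset times and interval":

```python
import time

def attempt_self_heal(run_self_heal, is_closed, retries=3, interval=0.0):
    """Run the self-healing script up to a preset number of times with a
    preset interval; succeed as soon as the port/IP stays open."""
    for _ in range(retries):
        run_self_heal()
        if interval:
            time.sleep(interval)
        if not is_closed():
            return True   # port/IP no longer closed: self-healing succeeded
    return False          # still closed after all attempts: self-healing failed

def failover(is_closed, run_self_heal, create_snapshot, create_vm,
             production_env):
    """Self-heal first; only on failure derive a snapshot, create a VM in
    the disaster-recovery resource pool, and size its hardware resources
    from the original production environment."""
    if attempt_self_heal(run_self_heal, is_closed):
        return "self-healed"
    snapshot = create_snapshot()
    return create_vm(snapshot,
                     cpus=production_env["cpus"],
                     memory_mb=production_env["memory_mb"])
```

Sizing the replacement VM from the recorded production environment (optionally capped by the preset resource limit of claim 6) is what lets the switched-over instance stand in for the original server without manual tuning.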
Those of ordinary skill in the art will understand that all or part of the steps for implementing the method embodiments may be completed by hardware executing program instructions; the program may be stored in a computer-readable medium and, when executed, performs the steps of the method embodiments; and the aforementioned media include various media that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disk.
The above-described apparatus embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An automatic switching method for block-level disaster recovery is characterized by specifically comprising the following steps:
monitoring the conditions of an application port and an IP address in real time;
executing a self-healing script when the application port and the IP address are closed;
when the application port and the IP address are opened, judging whether the application server has continuous abnormal conditions;
when self-healing fails or continuous abnormal conditions exist in the application server, a snapshot creating instruction is generated, the snapshot creating instruction is derived, a virtual machine is created on the disaster recovery resource pool through the derived snapshot creating instruction, and hardware resources of the virtual machine are set based on the original production environment.
2. The method according to claim 1, wherein the monitoring the application port and the IP address in real time comprises:
judging whether the communication between the control center and the control center gateway fails or not, and if so, executing a self-healing script;
when the communication between the control center and the control center gateway is successful, judging whether the communication between the control center and the application server gateway fails or not, if so, sending a warning signal and executing a self-healing script;
and when the control center is successfully communicated with the control center gateway and the control center is successfully communicated with the application server gateway, judging whether the application port and the IP address are closed or not, and if so, executing a self-healing script.
3. The method according to claim 1, wherein the executing a self-healing script when the application port and the IP address are closed comprises:
presetting times and interval time for executing the self-healing script in the control center;
after the self-healing script is executed, monitoring whether the application port and the IP address are closed again within the designated time;
and if the application port and the IP address are not closed again, judging that the self-healing result is successful in self-healing.
4. The method according to claim 3, wherein the monitoring whether the application port and the IP address are closed again within a specified time after the self-healing script is executed comprises:
if the application port and the IP address are closed again, judging that the self-healing result is self-healing failure;
the application server sends the self-healing failure feedback information to a control center;
the control center stops executing the self-healing script.
5. The method according to claim 1, wherein the determining whether the application server has a persistent abnormal condition when the application port and the IP address are open includes:
acquiring monitoring data of the application server in real time;
and comparing the monitoring data, judging whether the monitoring data has long-time CPU and memory occupation abnormity, and if so, judging that the application server has abnormity.
6. The method according to claim 1, wherein the generating a snapshot creating instruction when self-healing fails or a persistent abnormal condition exists in the application server, deriving the snapshot creating instruction, creating a virtual machine on a disaster recovery resource pool through the derived snapshot creating instruction, and setting hardware resources of the virtual machine based on an original production environment includes:
setting hardware resources of the virtual machine based on the original production environment and preset resource limit;
and executing a port closing action on the opened application port and the IP address through an application server.
7. The method according to claim 1, wherein when self-healing fails or a persistent abnormal condition exists in the application server, generating a snapshot creation instruction, deriving the snapshot creation instruction, creating a virtual machine on a disaster recovery resource pool through the derived snapshot creation instruction, and setting hardware resources of the virtual machine based on an original production environment, further comprises:
and detecting whether the application server is normally started, if so, monitoring the application port, the IP address and the operation condition of the application server after the application server is normally started.
8. An automatic switching system for block-level disaster recovery, comprising:
the application server is used for monitoring the conditions of an application port and an IP address in real time and executing a self-healing script when the application port and the IP address are closed;
the control center is used for judging whether the application server has continuous abnormal conditions or not when the application port and the IP address are opened;
and the creating module is used for generating a snapshot creating instruction and deriving the snapshot creating instruction when the self-healing fails or the application server has continuous abnormal conditions, creating a virtual machine on the disaster recovery resource pool through the derived snapshot creating instruction, and setting hardware resources of the virtual machine based on the original production environment.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor realizes the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210090705.1A 2022-01-25 2022-01-25 Automatic switching system, method, electronic equipment and medium for block-level disaster recovery Active CN114500238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210090705.1A CN114500238B (en) 2022-01-25 2022-01-25 Automatic switching system, method, electronic equipment and medium for block-level disaster recovery


Publications (2)

Publication Number Publication Date
CN114500238A true CN114500238A (en) 2022-05-13
CN114500238B CN114500238B (en) 2024-02-20

Family

ID=81474875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210090705.1A Active CN114500238B (en) 2022-01-25 2022-01-25 Automatic switching system, method, electronic equipment and medium for block-level disaster recovery

Country Status (1)

Country Link
CN (1) CN114500238B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120072685A1 (en) * 2010-09-16 2012-03-22 Hitachi, Ltd. Method and apparatus for backup of virtual machine data
CN110928728A (en) * 2019-11-27 2020-03-27 上海英方软件股份有限公司 Virtual machine copying and switching method and system based on snapshot
CN111580929A (en) * 2020-05-07 2020-08-25 上海英方软件股份有限公司 Validity verification system and method based on virtual machine protection data
CN112380062A (en) * 2020-11-17 2021-02-19 上海英方软件股份有限公司 Method and system for rapidly recovering system for multiple times based on system backup point
WO2021072880A1 (en) * 2019-10-15 2021-04-22 平安科技(深圳)有限公司 Method for asynchronously creating internal snapshot of virtual machine, apparatus, system and storage medium


Also Published As

Publication number Publication date
CN114500238B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN110807064B (en) Data recovery device in RAC distributed database cluster system
EP1117040A2 (en) Method and apparatus for resolving partial connectivity in a clustered computing system
US20160134467A1 (en) Method and apparatus for switching between master device and backup device
CN102546135B (en) Active/standby server switched system and method
CN107480014A (en) A kind of High Availabitity equipment switching method and device
CN113300917B (en) Traffic monitoring method and device for Open Stack tenant network
CN109274761A (en) A kind of NAS clustered node, system and data access method
CN113347037A (en) Data center access method and device
CN112256498A (en) Fault processing method and device
CN114840495A (en) Database cluster split-brain prevention method, storage medium and device
CN113965459A (en) Consul-based method for monitoring host network to realize high availability of computing nodes
CN117201507A (en) Cloud platform switching method and device, electronic equipment and storage medium
CN111399978A (en) OpenStack-based fault migration system and migration method
CN114500238A (en) Automatic switching system and method for block-level disaster recovery, electronic device and medium
CN111277593A (en) Multi-line parallel monitoring method based on internal and external network isolation
CN116192885A (en) High-availability cluster architecture artificial intelligent experiment cloud platform data processing method and system
CN105763365A (en) Method and device for processing anomaly
CN101202658A (en) System and method for service take-over of multi-host system
CN114285822A (en) Domain name resolution server switching method and device
CN109408123B (en) Method and device for reloading configuration file
Cisco Release Notes for Cisco MGX 8260 Media Gateway, Version 1.2.3
CN113238893A (en) Disaster recovery system, method, computer device and medium for multiple data centers
CN109828765B (en) Method for upgrading online service, general routing platform and storage medium
CN114826886B (en) Disaster recovery method and device for application software and electronic equipment
KR102221018B1 (en) Relay system and method for deling with fault of secure session for DB connection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant