CN112463449B

CN112463449B - Data disaster tolerance backup method and device

Info

Publication number: CN112463449B
Application number: CN202011359464.3A
Authority: CN
Inventors: 张云云
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2022-07-19
Anticipated expiration: 2040-11-27
Also published as: CN112463449A

Abstract

The invention discloses a data disaster recovery backup method and device. The method comprises the following steps: continuously monitoring the maximum time delay of the synchronous data, and determining that a time period is a bad period in response to the synchronous data whose time delay exceeds the maximum time delay threshold accounting for more than a percentage threshold of the total data amount in that time period; updating the bad cycle duration based on the number of bad cycles, and, in response to the bad cycle duration exceeding a link tolerance length threshold, judging that a compression or conversion problem has occurred at the master end or the slave end or that a transmission delay problem has occurred in the network transmission link; in response to determining that a compression or conversion problem has occurred at the master end or the slave end, allocating more computing resources at the master end or the slave end for compressing or converting the synchronous data; and in response to determining that a transmission delay problem has occurred in the network transmission link, using an artificial intelligence engine of the storage controller at the master end or the slave end to optimize the network transmission link and adaptively adjust the number of virtual links. The invention enables efficient and fast synchronization under large data traffic and low network bandwidth.

Description

Data disaster tolerance backup method and device

Technical Field

The present invention relates to the field of storage, and in particular, to a method and an apparatus for disaster recovery backup of data.

Background

Among the existing copy modes, in the synchronous copy mode, after service application data is written to the master end, the master end receives the confirmation signal that the write operation is complete only after the corresponding data has been synchronized to the slave volume, and only then performs the next write operation. In the asynchronous copy mode, the data at the slave end does not need to be consistent with the data at the master end at every point in time: the host write operation can be acknowledged as complete as soon as the host service data reaches the master-end cache, the next write operation then proceeds, and the synchronous data of the master end is transmitted to the slave end in sequence. For IP (Internet Protocol) remote copy, the asynchronous copy mode is more commonly adopted because it avoids the limitation of long-distance link delay.

In a low-bandwidth network environment, IP remote copy in the asynchronous copy mode may suffer relatively serious network delay, which can leave the backup data at the slave end inconsistent with the data at the master end; if the amount of service data is relatively large, the data difference may keep growing. When too much unsynchronized data has accumulated, the service data keeps being refreshed, and the network delay persists, data synchronization may be stopped because the data frame synchronization period becomes too long, which is a performance problem. Synchronization then stops at both ends, disaster recovery backup cannot continue, and the synchronization function must be restarted manually, wasting synchronization time. In this situation the disaster recovery backup capability of the remote copy function is greatly limited: the backup data is neither timely nor complete, and data synchronization is easily interrupted by performance problems.

Aiming at the problems that disaster recovery backup function is limited and backup data is not timely and comprehensive in the prior art, no effective solution is available at present.

Disclosure of Invention

In view of this, an object of the embodiments of the present invention is to provide a method and an apparatus for disaster recovery backup of data, which can efficiently and quickly synchronize with large data traffic and low network bandwidth.

Based on the above object, a first aspect of the embodiments of the present invention provides a data disaster recovery backup method, including the following steps:

continuously compressing the synchronous data at the master end and writing the synchronous data into the cache of the master end, and continuously transmitting the synchronous data in the cache of the master end to the slave end through a plurality of virtual links to execute disaster recovery backup;

continuously monitoring the maximum time delay of the synchronous data, and determining that a time period is a bad period in response to the synchronous data whose maximum time delay exceeds the maximum time delay threshold exceeding a percentage threshold of the total data amount in that time period;

updating the bad cycle duration based on the number of the bad cycles, and judging that the compression or conversion problem occurs at the master end or the slave end or the transmission delay problem occurs in a network transmission link in response to the fact that the bad cycle duration exceeds a link tolerance length threshold;

allocating more computing resources at the master or slave for compressing or transforming the synchronous data in response to determining that a compression transformation problem occurs at the master or slave;

in response to determining that a transmission delay problem occurs with the network transmission link, an artificial intelligence engine of the storage controller is used at the master or slave to optimize the network transmission link and to adaptively adjust the number of virtual links.

In some embodiments, continuously compressing the synchronized data at the master and writing to the master cache comprises: the synchronous data is compressed using hardware, which is a compression card device installed to the master, and/or software, which is a calculation instruction of the central processor.
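As an illustration of the compress-then-cache step, the following is a minimal sketch and not the patented implementation. It assumes `zlib` as the software compressor and represents the compression card by a hypothetical callable `hw_compress`; the function names and the use of a `deque` as the master-end cache are likewise assumptions made for the example.

```python
import zlib
from collections import deque


def software_compress(block: bytes, level: int = 6) -> bytes:
    """Software path: compress one block of synchronous data on the CPU."""
    return zlib.compress(block, level)


def compress_to_cache(blocks, cache, hw_compress=None):
    """Compress each block and append the result to the master-end cache.

    hw_compress is a hypothetical callable standing in for a compression
    card driver; when it is None, the software path is used instead.
    """
    compress = hw_compress or software_compress
    for block in blocks:
        cache.append(compress(block))
    return cache


# Two 4 KiB blocks of highly repetitive data, compressed in software.
cache = compress_to_cache([b"A" * 4096, b"B" * 4096], deque())
```

Keeping the compressor behind a single callable means the transmit path does not need to know whether a block was compressed in hardware or in software.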

In some embodiments, allocating more computing resources for the compressed synchronized data at the master or slave includes:

in response to the master end or the slave end being provided with the compression card equipment, the compression card equipment is used in a hardware mode to compress or convert the synchronous data, and meanwhile, idle computing resources of the central processing unit are called in a software mode to compress or convert the synchronous data;

in response to the master end or the slave end not being provided with a compression card device, additional computing resources of the central processor are invoked in a software manner to compress or convert the synchronous data, and in response to the central processor lacking sufficient additional computing resources, a process placing a high load on the central processor is suspended to release more additional computing resources until the load on the central processor drops again.
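The two branches above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `NodeState` fields, the `needed_cpu` threshold, and the action names are hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    has_compression_card: bool
    cpu_idle_fraction: float  # share of CPU currently idle, 0.0 to 1.0


def plan_compression_resources(node: NodeState, needed_cpu: float = 0.2) -> List[str]:
    """Return the ordered actions for giving compression more resources."""
    actions = []
    if node.has_compression_card:
        # Hardware path, plus whatever CPU happens to be idle.
        actions.append("use_compression_card")
        if node.cpu_idle_fraction > 0:
            actions.append("use_idle_cpu")
        return actions
    # No card: software only. Free up CPU first if there is not enough.
    if node.cpu_idle_fraction < needed_cpu:
        actions.append("suspend_high_load_processes")
    actions.append("use_additional_cpu")
    return actions


with_card = plan_compression_resources(NodeState(True, 0.3))
busy_no_card = plan_compression_resources(NodeState(False, 0.05))
```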

In some embodiments, updating the bad cycle duration based on the number of bad cycles comprises: the bad cycle duration is accumulated in response to the newly completed time period being a bad cycle, and the bad cycle duration is reduced in response to the newly completed time period not being a bad cycle.
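A minimal sketch of this update rule follows. It assumes, consistent with the worked example later in this description, that a good period reduces the accumulated time; the step size (one period length per update) and the floor at zero are assumptions, since the text does not specify them.

```python
def update_bad_duration(bad_duration: float, period_was_bad: bool,
                        period_length: float) -> float:
    """Accumulate on a bad period, reduce on a good one, never below zero."""
    if period_was_bad:
        return bad_duration + period_length
    return max(0.0, bad_duration - period_length)


# Four consecutive 10-second periods: bad, bad, good, bad.
duration = 0.0
trace = []
for was_bad in [True, True, False, True]:
    duration = update_bad_duration(duration, was_bad, 10.0)
    trace.append(duration)
# trace is now [10.0, 20.0, 10.0, 20.0]
```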

In some embodiments, the method further comprises: while the number of virtual links is adaptively adjusted, the number of virtual links is also stored in a storage controller, and a plurality of virtual links are directly reconstructed based on the number of virtual links in response to the virtual links being restarted after being interrupted.
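The persistence of the link count can be illustrated as follows. This is a sketch only: `LinkCountStore`, the JSON file, and the `vlink-N` names are hypothetical stand-ins for whatever settings storage the storage controller actually provides.

```python
import json
import os
import tempfile


class LinkCountStore:
    """Hypothetical stand-in for the controller's persistent settings."""

    def __init__(self, path: str):
        self.path = path

    def save(self, count: int) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump({"virtual_links": count}, f)

    def load(self, default: int = 1) -> int:
        if not os.path.exists(self.path):
            return default
        with open(self.path, encoding="utf-8") as f:
            return int(json.load(f)["virtual_links"])


def rebuild_links(store: LinkCountStore):
    """Recreate the virtual links directly from the stored count."""
    return [f"vlink-{i}" for i in range(store.load())]


with tempfile.TemporaryDirectory() as tmp:
    store = LinkCountStore(os.path.join(tmp, "links.json"))
    links_before_save = rebuild_links(store)  # nothing stored yet: default of 1
    store.save(4)                              # count chosen by the adaptive step
    links_after_restart = rebuild_links(store)
```

Rebuilding from the stored count avoids re-running the adaptive adjustment from scratch after an interruption.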

In some embodiments, the method further comprises: the synchronization data is retransmitted in response to the synchronization data being lost in the virtual link.
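Retransmission on loss can be sketched as a bounded retry loop. The `send` callback, the sequence numbers, and the `max_attempts` limit are assumptions for illustration; the text only states that lost synchronization data is sent again.

```python
def send_with_retransmit(chunks, send, max_attempts=5):
    """Send each chunk, retrying until send(seq, chunk) reports success.

    Returns a dict mapping sequence number to the attempts it took.
    """
    attempts = {}
    for seq, chunk in enumerate(chunks):
        for attempt in range(1, max_attempts + 1):
            if send(seq, chunk):
                attempts[seq] = attempt
                break
        else:
            raise RuntimeError(f"chunk {seq} not delivered after {max_attempts} attempts")
    return attempts


# A simulated virtual link that drops chunk 1 on its first attempt.
received = {}
dropped_once = set()


def flaky_send(seq, chunk):
    if seq == 1 and seq not in dropped_once:
        dropped_once.add(seq)
        return False
    received[seq] = chunk
    return True


attempts = send_with_retransmit([b"a", b"b", b"c"], flaky_send)
```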

In some embodiments, the method further comprises: the transmission of the synchronization data is suspended in response to the bad cycle duration exceeding the link-tolerant length threshold by at least an order of magnitude.
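One possible reading of "by at least an order of magnitude" is a factor of ten, which gives the following predicate. The factor and the function name are assumptions made for this sketch.

```python
def should_suspend(bad_duration: float, link_tolerance: float,
                   factor: float = 10.0) -> bool:
    """True when the bad cycle duration is at least `factor` times the threshold."""
    return bad_duration >= factor * link_tolerance


# With a 300-second tolerance, suspension starts at 3000 seconds.
slightly_over = should_suspend(400.0, 300.0)
far_over = should_suspend(3000.0, 300.0)
```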

A second aspect of an embodiment of the present invention provides a data disaster recovery backup apparatus, including:

a processor; and

a memory storing program code executable by the processor, the program code when executed performing the steps of:

continuously compressing the synchronous data at the master end and writing the synchronous data into the master end cache, and continuously transmitting the synchronous data in the master end cache to the slave end through a plurality of virtual links to execute disaster recovery backup;

updating the bad cycle duration based on the number of the bad cycles, and judging that the master end or the slave end has a compression or conversion problem or a network transmission link has a transmission delay problem in response to the fact that the bad cycle duration exceeds a link tolerance length threshold;

allocating more computing resources for compressing or transforming the synchronous data at the master or slave in response to determining that a compression transformation problem occurs at the master or slave;

in response to determining that a transmission delay problem occurs with the network transmission link, an artificial intelligence engine of the storage controller is used at the master or slave to optimize the network transmission link and adaptively adjust the number of virtual links.

in response to the master end or the slave end not being provided with a compression card device, additional computing resources of the central processing unit are invoked in a software manner to compress or convert the synchronization data, and in response to the central processing unit lacking sufficient additional computing resources, a process placing a high load on the central processing unit is suspended to release more additional computing resources until the load on the central processing unit drops again.

The invention has the following beneficial technical effects: according to the data disaster recovery backup method and device provided by the embodiment of the invention, synchronous data are continuously compressed at the main end and written into the main end cache, and meanwhile, the synchronous data in the main end cache are continuously transmitted to the slave end through a plurality of virtual links to execute disaster recovery backup; continuously monitoring the maximum time delay of the synchronous data, and determining that the period is a bad period in response to the synchronous data with the maximum time delay exceeding the maximum time delay threshold value in a time period exceeding a percentage threshold value of total data amount; updating the bad cycle duration based on the number of the bad cycles, and judging that the compression or conversion problem occurs at the master end or the slave end or the transmission delay problem occurs in a network transmission link in response to the fact that the bad cycle duration exceeds a link tolerance length threshold; allocating more computing resources at the master or slave for compressing or transforming the synchronous data in response to determining that a compression transformation problem occurs at the master or slave; the technical scheme of using an artificial intelligence engine of a storage controller to optimize the network transmission link and adaptively adjust the number of virtual links at a master end or a slave end in response to the fact that the transmission delay problem of the network transmission link is determined can efficiently and quickly synchronize under the conditions of large data traffic and low network bandwidth.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a data disaster recovery backup method provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or two parameters that have the same name but are not the same. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.

In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a data disaster recovery backup method for efficient and fast synchronization under large data traffic and low network bandwidth. Fig. 1 is a schematic flow chart of a data disaster recovery backup method provided by the present invention.

As shown in fig. 1, the data disaster recovery backup method includes the following steps:

step S101, continuously compressing synchronous data at a master end and writing the data into a master end cache, and continuously transmitting the synchronous data in the master end cache to a slave end through a plurality of virtual links to execute disaster recovery backup;

step S103, continuously monitoring the maximum time delay of the synchronous data, and determining that the period is a bad period in response to the fact that the synchronous data of which the maximum time delay exceeds the maximum time delay threshold exceeds the percentage threshold of the total data amount in a time period;

step S105, updating the bad cycle duration based on the number of the bad cycles, and judging that the master end or the slave end has a compression or conversion problem or a network transmission link has a transmission delay problem in response to the fact that the bad cycle duration exceeds a link tolerance length threshold;

step S107, responding to the compression conversion problem of the master end or the slave end, and allocating more computing resources for compressing or converting the synchronous data at the master end or the slave end;

step S109, in response to determining that the network transmission link has a transmission delay problem, optimizing the network transmission link and adaptively adjusting the number of virtual links at the master end or the slave end using an artificial intelligence engine of the storage controller.
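The branching in steps S105 to S109 can be sketched as a small dispatcher. The text does not say how the two problem types are told apart, so the diagnosis is passed in as a hypothetical `diagnose` callback; the enum and the action names are also assumptions.

```python
from enum import Enum


class Diagnosis(Enum):
    COMPRESSION = "compression_or_conversion"
    LINK_DELAY = "transmission_delay"


def next_action(bad_duration: float, link_tolerance: float, diagnose) -> str:
    """Choose the remedy once the bad cycle duration passes the threshold."""
    if bad_duration <= link_tolerance:
        return "keep_syncing"
    if diagnose() is Diagnosis.COMPRESSION:
        return "allocate_more_compute"                  # step S107
    return "optimize_link_and_adjust_virtual_links"     # step S109


healthy = next_action(5.0, 10.0, lambda: Diagnosis.COMPRESSION)
cpu_bound = next_action(20.0, 10.0, lambda: Diagnosis.COMPRESSION)
link_bound = next_action(20.0, 10.0, lambda: Diagnosis.LINK_DELAY)
```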

The design scheme provided by the invention solves the problems that in the IP remote copy function, the data synchronization is delayed due to the conditions of smaller network bandwidth, large service data volume and the like, a large amount of unsynchronized data exists at the main end, and the remote copy synchronization is interrupted due to slow synchronization period and low performance; the method improves the remote copy synchronization rate, optimizes the related synchronization strategy of the remote copy function, realizes that the IP remote copy function is more rapid and efficient in synchronization under the conditions of large data traffic and lower network bandwidth, and is not interrupted due to performance problems.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. Embodiments of the computer program may achieve the same or similar effects as any of the preceding method embodiments corresponding thereto.

in response to the master end or the slave end being provided with the compression card equipment, the idle computing resources of the central processing unit are called in a software mode to compress or convert the synchronous data while the synchronous data is compressed or converted by using the compression card equipment in a hardware mode;

In some embodiments, the method further comprises: the synchronization data is resent in response to the synchronization data being lost in the virtual link.

The following further illustrates embodiments of the invention in terms of specific examples:

First, a compression technology is added to the IP remote copy function of the storage at both ends. Compression can be implemented in two ways: software compression and hardware compression. Software compression consumes the system's CPU resources, whereas hardware compression requires a hardware device such as a compression card to be added at the storage end for online real-time compression; using hardware compression reduces CPU consumption. Either way, compression reduces the amount of data that actually has to be transmitted over the copy link.

Second, monitoring of the remote copy data stream is added to the storage system. A monitoring period Monitorperiod is set at a certain time interval, and two global monitoring parameters are set for remote copy: Maxhostdelay, which specifies the maximum time delay allowed on a remote copy link, and Linktolerance, which records the length of time for which an insufficient link between the systems in a global mirror operation is tolerated. If the round-trip delay of at least one third of the IOs in one period is greater than Maxhostdelay, the period is regarded as a bad period. When a bad period occurs, the bad cycle time is accumulated; when a good period occurs, the bad cycle time is reduced. The accumulated bad cycle time is compared with Linktolerance, and if it is higher than that value within the specified fixed time period, the current remote copy environment has a synchronization performance problem, which may lie in the master-end storage, the slave-end storage, or the link between the systems. To continue data synchronization at this point, the system automatically adjusts CPU and memory resources, increases the network bandwidth, and improves the compression ratio of the transmitted data, according to the cause and the degree of the delay.
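The one-third rule for classifying a monitoring period can be sketched as follows. The function name and the treatment of an empty period as good are assumptions; `Fraction` is used so that the boundary case of exactly one third is compared exactly.

```python
from fractions import Fraction


def is_bad_period(delays_ms, max_host_delay_ms, share=Fraction(1, 3)):
    """True when at least `share` of the IOs exceeded the allowed delay."""
    if not delays_ms:
        return False  # assumption: a period with no IO counts as good
    slow = sum(1 for d in delays_ms if d > max_host_delay_ms)
    return Fraction(slow, len(delays_ms)) >= share


# Maxhostdelay of 20 ms: one slow IO out of three is exactly one third.
one_of_three = is_bad_period([5, 5, 50], 20)
one_of_four = is_bad_period([5, 5, 5, 50], 20)
```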

When a compression or conversion problem is detected in the master-end or slave-end storage and a compression card is installed, the system automatically enables the hardware compression function for data transmission and also uses idle CPU resources for software compression. If no compression card is installed and CPU utilization is high while the IOPS has not increased, the system finds the event that is driving up CPU utilization, moves that event to a time of low load, and then performs software compression on the remotely transmitted data.

If the delay arises during data transmission, the transmitted data is compressed and, at the same time, the AI engine at the storage controller optimizes the network transmission link in real time according to the delay conditions, using link virtualization to mitigate the effect of the delay. Link virtualization virtualizes one physical link into multiple links and transmits data over them. If a packet is lost on any virtual link, the data is retransmitted. The number of virtual links is controlled by the AI engine, which also monitors link performance during data transmission and adjusts the number of virtual links as appropriate. This information is kept in the controller, so that if the link is stopped and started again, it restarts with the previously set virtual links.
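To make the idea of sending data over several virtual links concrete, here is a minimal sketch that stripes a payload across links round-robin and reassembles it by sequence number. The round-robin policy, the chunk size, and the function names are assumptions; the text does not describe how data is distributed across the virtual links.

```python
def stripe(data: bytes, n_links: int, chunk_size: int = 4):
    """Split data into chunks and assign them round-robin to n_links queues.

    Each queue holds (sequence_number, chunk) pairs.
    """
    links = [[] for _ in range(n_links)]
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        links[seq % n_links].append((seq, data[start:start + chunk_size]))
    return links


def reassemble(links) -> bytes:
    """Merge the per-link queues back into the original byte order."""
    chunks = sorted(pair for link in links for pair in link)
    return b"".join(chunk for _, chunk in chunks)


payload = b"disaster-recovery-sync"
links = stripe(payload, n_links=3)
restored = reassemble(links)
```

Because every chunk carries its sequence number, the receiving end can restore the order regardless of which virtual link delivered a chunk first.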

When performance is too low, an overly long bad cycle duration would otherwise be detected and synchronization would stop. To prevent this, and to allocate resources to remote copy synchronization promptly once the load is small while still ensuring correct data transmission, the remote copy synchronization function automatically waits for a period of time and then retries whenever the bad cycle duration during remote copy transmission is far longer than the preset link tolerance duration. This avoids the synchronization time and resource waste caused by stopping outright.

It can be seen from the foregoing embodiments that, in the data disaster recovery backup method provided in the embodiments of the present invention, synchronous data is continuously compressed at the master and written into the master cache, and meanwhile, the synchronous data in the master cache is continuously transmitted to the slave through a plurality of virtual links to execute disaster recovery backup; continuously monitoring the maximum time delay of the synchronous data, and determining that the period is a bad period in response to the synchronous data with the maximum time delay exceeding the maximum time delay threshold value in a time period exceeding a percentage threshold value of total data amount; updating the bad cycle duration based on the number of the bad cycles, and judging that the master end or the slave end has a compression or conversion problem or a network transmission link has a transmission delay problem in response to the fact that the bad cycle duration exceeds a link tolerance length threshold; allocating more computing resources for compressing or transforming the synchronous data at the master or slave in response to determining that a compression transformation problem occurs at the master or slave; the technical scheme of using an artificial intelligence engine of a storage controller to optimize the network transmission link and adaptively adjust the number of virtual links at a master end or a slave end in response to the fact that the transmission delay problem of the network transmission link is determined can efficiently and quickly synchronize under the conditions of large data traffic and low network bandwidth.

It should be particularly noted that, the steps in the embodiments of the data disaster recovery backup method may be intersected, replaced, added, or deleted, and therefore, the data disaster recovery backup method based on these reasonable permutation and combination transformations shall also belong to the scope of the present invention, and shall not limit the scope of the present invention to the described embodiments.

In view of the foregoing, a second aspect of the embodiments of the present invention provides an embodiment of a data disaster recovery backup device that synchronizes efficiently and quickly under large data traffic and low network bandwidth. The data disaster recovery backup device comprises:

a processor; and

continuously monitoring the maximum time delay of the synchronous data, and determining that a time period is a bad period in response to the synchronous data whose maximum time delay exceeds the maximum time delay threshold exceeding the percentage threshold of the total data amount in that time period;

It can be seen from the foregoing embodiments that, in the data disaster recovery backup apparatus provided in the embodiments of the present invention, synchronous data is continuously compressed at the master and written into the master cache, and meanwhile, the synchronous data in the master cache is continuously transmitted to the slave through a plurality of virtual links to execute disaster recovery backup; continuously monitoring the maximum time delay of the synchronous data, and determining that the period is a bad period in response to the synchronous data with the maximum time delay exceeding the maximum time delay threshold value in a time period exceeding a percentage threshold value of total data amount; updating the bad cycle duration based on the number of the bad cycles, and judging that the compression or conversion problem occurs at the master end or the slave end or the transmission delay problem occurs in a network transmission link in response to the fact that the bad cycle duration exceeds a link tolerance length threshold; allocating more computing resources for compressing or transforming the synchronous data at the master or slave in response to determining that a compression transformation problem occurs at the master or slave; the technical scheme of using an artificial intelligence engine of a storage controller to optimize the network transmission link and adaptively adjust the number of virtual links at a master end or a slave end in response to the fact that the transmission delay problem of the network transmission link is determined can efficiently and quickly synchronize under the conditions of large data traffic and low network bandwidth.

It should be particularly noted that, in the embodiment of the data disaster recovery backup apparatus, the working process of each module is specifically described by using the embodiment of the data disaster recovery backup method, and a person skilled in the art can easily think that these modules are applied to other embodiments of the data disaster recovery backup method. Of course, since the steps in the embodiment of the data disaster recovery backup method may be mutually intersected, replaced, added, and deleted, the data disaster recovery backup device that is transformed by these reasonable permutations and combinations shall also belong to the protection scope of the present invention, and shall not limit the protection scope of the present invention to the embodiment.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant only to be exemplary, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of an embodiment of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. A data disaster recovery backup method is characterized by comprising the following steps:

continuously compressing synchronous data at a master end and writing the synchronous data into a master end cache, and continuously transmitting the synchronous data in the master end cache to a slave end through a plurality of virtual links to execute disaster recovery backup;

continuously monitoring a maximum time delay of the synchronized data, and determining that a time period is a bad period in response to the synchronized data whose maximum time delay exceeds a maximum time delay threshold exceeding a percentage threshold of a total amount of data in that time period;

updating the duration of the bad cycles based on the number of the bad cycles, and judging that the master end or the slave end has a compression or conversion problem or a network transmission link has a transmission delay problem in response to the duration of the bad cycles exceeding a link tolerance length threshold;

allocating more computing resources at the master or the slave for compressing or transforming the synchronous data in response to determining that a compression transformation problem occurs at the master or the slave;

in response to determining that a transmission delay problem occurs with a network transmission link, an artificial intelligence engine of a storage controller is used at the master or the slave to optimize the network transmission link and to adaptively adjust the number of virtual links.

2. The method of claim 1, wherein continuously compressing the synchronized data at the master and writing to the master cache comprises: the synchronization data is compressed using hardware, which is a compression card device installed to the master, and/or software, which is a calculation instruction of a central processor.

3. The method of claim 1, wherein allocating more computing resources at the master or the slave for compressing the synchronization data comprises:

in response to the master end or the slave end being provided with a compression card device, calling idle computing resources of a central processing unit in a software mode to compress or convert the synchronous data while compressing or converting the synchronous data by using the compression card device in a hardware mode;

in response to the master or the slave not having the compression card device, additional computing resources of the central processor are invoked in a software manner to compress or convert the synchronous data, and in response to the central processor lacking sufficient additional computing resources, a process placing a high load on the central processor is suspended to release more additional computing resources until the load on the central processor drops again.

4. The method of claim 1, wherein updating the bad cycle duration based on the number of bad cycles comprises: the bad cycle duration is accumulated in response to the newly completed time period being a bad cycle, and the bad cycle duration is reduced in response to the newly completed time period not being a bad cycle.

5. The method of claim 1, further comprising: while adaptively adjusting the number of virtual links, the number of virtual links is also stored to the storage controller, and the plurality of virtual links are directly reconstructed based on the number of virtual links in response to the virtual links being restarted after being interrupted.

6. The method of claim 1, further comprising: resending the synchronization data in response to the synchronization data being lost in the virtual link.

7. The method of claim 1, further comprising: suspending transmission of the synchronization data in response to the bad cycle duration exceeding a link-tolerant length threshold by at least an order of magnitude.

8. A data disaster recovery backup device, comprising:

a processor; and

allocating more computing resources at the master or the slave for compressing or transforming the synchronized data in response to determining that a compression transformation problem occurred at the master or the slave;

9. The apparatus of claim 8, wherein the continuing to compress the synchronized data at the master and write to the master cache comprises: the synchronization data is compressed using hardware, which is a compression card device installed to the master, and/or software, which is a calculation instruction of a central processor.

10. The apparatus of claim 8, wherein allocating more computing resources at the master or the slave for compressing the synchronization data comprises:

in response to the master or the slave not having the compression card device, invoking additional computing resources of the central processor in a software manner to compress or convert the synchronization data, and in response to the central processor lacking sufficient additional computing resources, suspending a process placing a high load on the central processor to release more additional computing resources until the load on the central processor drops again.