CN118113526A

CN118113526A - Distributed data storage planning method and system for improving disaster recovery capacity of data center

Info

Publication number: CN118113526A
Application number: CN202410392022.0A
Authority: CN
Inventors: 张腾; 谢作斌; 怀丹阳
Original assignee: Shenzhen Ai Rui Good Technology Co ltd
Current assignee: Shenzhen Ai Rui Good Technology Co ltd
Priority date: 2024-04-02
Filing date: 2024-04-02
Publication date: 2024-05-31

Abstract

The invention discloses a distributed data storage planning method and system for improving disaster recovery capacity of a data center, relating to the technical field of data storage management. The method comprises the following steps: determining an importance weight for each piece of data in the data center; acquiring all data storage nodes in the data center; retrieving historical operation data of the data center; determining the risk correlation between all data storage nodes; dividing a backup storage space in each data center; and determining a data backup scheme for the data center and backing up the data of the data storage nodes accordingly. The advantages of the invention are that, when an actual disaster causes some data storage nodes to fail, the data and the system can be recovered quickly and accurately, the loss caused by the disaster is reduced, the availability and disaster recovery capability of the data are greatly improved, and the risk of service interruption caused by node failure is reduced.

Description

Distributed data storage planning method and system for improving disaster recovery capacity of data center

Technical Field

The invention relates to the technical field of data storage management, in particular to a distributed data storage planning method and system for improving disaster recovery capacity of a data center.

Background

As data centers continue to scale up, the security and reliability of data become particularly important. Traditional data center storage architectures are often prone to data loss and difficult recovery, and cannot meet modern enterprises' requirements for data security. A new distributed data storage planning method and system is therefore needed to improve the disaster recovery capability of data centers.

Disclosure of Invention

To solve the above technical problems, a distributed data storage planning method and system for improving the disaster recovery capability of a data center are provided. They effectively improve the disaster recovery capability of the data center, ensure the integrity and availability of data, and reduce the loss caused by disasters, thereby addressing the problem that traditional data center storage architectures are prone to data loss and difficult recovery and cannot meet modern enterprises' requirements for data security.

In order to achieve the above purpose, the invention adopts the following technical scheme:

A distributed data storage planning method for improving disaster recovery capacity of a data center comprises the following steps:

analyzing the data in the data center based on preset data classification logic, and determining the importance weight of each piece of data in the data center;

acquiring all data storage nodes in the data center;

retrieving historical operation data of the data center;

determining the risk correlation between all data storage nodes based on the historical operation data of the data center;

dividing a backup storage space in each data center;

and determining a data backup scheme of the data center based on the importance weight of the data stored on each data storage node and the risk correlation between the data storage nodes, and backing up the data of the data storage nodes.

Preferably, the data classification logic specifically includes:

assigning a weight reference value to each piece of data based on its data type;

setting a data analysis period;

acquiring the number of times each piece of data was accessed in the data analysis period closest to the current moment, and recording it as the data access analysis times;

determining a data access frequency index for each piece of data based on the data access analysis times of all the data;

calculating the importance weight of each piece of data with a data weight formula, based on the weight reference value and the data access frequency index of that piece of data;

the data weight formula is specifically:

Z_i = P_i × J_i;

where Z_i is the importance weight of the i-th data in the data center, P_i is the data access frequency index of the i-th data in the data center, and J_i is the weight reference value of the i-th data in the data center.

Preferably, determining the data access frequency index of each piece of data based on the data access analysis times of all the data specifically includes:

calculating the data access frequency index of each piece of data with an access frequency index calculation formula, based on the data access analysis times of all the data;

the access frequency index calculation formula is specifically:

where P_i is the data access analysis times of the i-th data in the data center, and S_i is the total number of data stored in the data center.
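The weighting steps above can be illustrated with a minimal Python sketch. The access frequency index formula is not reproduced in this text, so the sketch assumes, purely for illustration, that the index is each item's access count divided by the mean access count over all items. The function names, item names, and reference values are hypothetical.

```python
from typing import Dict


def access_frequency_index(access_counts: Dict[str, int]) -> Dict[str, float]:
    """Assumed stand-in for the missing formula: count / mean count."""
    if not access_counts:
        return {}
    mean = sum(access_counts.values()) / len(access_counts)
    if mean == 0:
        # No item was accessed in the analysis period.
        return {name: 0.0 for name in access_counts}
    return {name: count / mean for name, count in access_counts.items()}


def importance_weights(
    access_counts: Dict[str, int], reference_values: Dict[str, float]
) -> Dict[str, float]:
    """Data weight formula from the text: Z_i = P_i * J_i."""
    index = access_frequency_index(access_counts)
    return {name: index[name] * reference_values[name] for name in access_counts}


# Hypothetical access counts for the most recent analysis period.
counts = {"orders.db": 30, "logs.tar": 10, "config.yaml": 20}
# Hypothetical weight reference values assigned by data type.
refs = {"orders.db": 3.0, "logs.tar": 1.0, "config.yaml": 2.0}
weights = importance_weights(counts, refs)
```

With a different normalization for the index, only `access_frequency_index` would change; the product Z_i = P_i × J_i stays as stated in the text.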

Preferably, determining the risk correlation between all data storage nodes based on the historical operation data of the data center specifically includes:

determining, in the historical operation data of the data center, the operation data in which a storage node failed, and recording it as fault operation data;

combining all data storage nodes in the system pairwise at random to obtain a plurality of data storage node groups;

and calculating the risk correlation between the two data storage nodes in each data storage node group with a correlation algorithm.
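As a rough illustration of the pairing step, the sketch below assumes that "random pairwise combination" means enumerating every unordered pair of nodes and processing the pairs in shuffled order. The function name and node identifiers are hypothetical.

```python
import itertools
import random
from typing import List, Tuple


def node_groups(nodes: List[str], seed: int = 0) -> List[Tuple[str, str]]:
    """Return every unordered pair of nodes, in a shuffled order."""
    pairs = list(itertools.combinations(nodes, 2))
    # A seeded generator keeps the shuffled order reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs


groups = node_groups(["M1", "M2", "M3", "M4"])
```

Enumerating all pairs gives every node a correlation value against every other node, which the later backup planning step relies on.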

Preferably, the correlation algorithm specifically includes:

denoting the two data storage nodes in the data storage node group as M_j and M_k respectively;

screening, from the fault operation data, the fault operation data in which M_j failed, as first fault operation data;

screening, from the fault operation data, the fault operation data in which M_k failed, as second fault operation data;

taking the intersection of the first fault operation data and the second fault operation data as third fault operation data;

taking the union of the first fault operation data and the second fault operation data as fourth fault operation data;

determining the total number of the first, second, third and fourth fault operation data respectively;

calculating the risk correlation between M_j and M_k with a correlation calculation formula;

the correlation calculation formula is specifically:

where X_jk is the risk correlation between M_j and M_k, N_0 is the total number of all fault operation data, N_j is the number of first fault operation data, N_k is the number of second fault operation data, N_j∩k is the number of third fault operation data, and N_j∪k is the number of fourth fault operation data.
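The counting steps of the correlation algorithm can be sketched as follows. Each fault record is modeled as the set of node identifiers that failed in that event, which is an assumed data layout. The correlation calculation formula itself is not reproduced in this text, so the sketch substitutes the Jaccard index N_j∩k / N_j∪k as a stand-in; it is not the patent's formula, which also involves N_0, N_j and N_k.

```python
from typing import Dict, List, Set


def fault_counts(fault_events: List[Set[str]], node_j: str, node_k: str) -> Dict[str, int]:
    """Count the first to fourth fault operation data for one node pair."""
    first = {i for i, failed in enumerate(fault_events) if node_j in failed}
    second = {i for i, failed in enumerate(fault_events) if node_k in failed}
    return {
        "N_0": len(fault_events),          # all fault operation data
        "N_j": len(first),                 # events in which M_j failed
        "N_k": len(second),                # events in which M_k failed
        "N_j_and_k": len(first & second),  # intersection (third)
        "N_j_or_k": len(first | second),   # union (fourth)
    }


def jaccard_correlation(counts: Dict[str, int]) -> float:
    """Stand-in correlation (assumption): intersection over union."""
    union = counts["N_j_or_k"]
    return counts["N_j_and_k"] / union if union else 0.0


# Hypothetical fault history: each set lists the nodes that failed together.
events = [{"M1", "M2"}, {"M1"}, {"M2", "M3"}, {"M1", "M2"}, {"M3"}]
c = fault_counts(events, "M1", "M2")
x_12 = jaccard_correlation(c)
```

In this toy history M1 and M2 fail together in two of the four events that involve either node, so the stand-in correlation is 0.5.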

Preferably, determining the data backup scheme of the data center based on the importance weight of the data stored on each data storage node and the risk correlation between the data storage nodes specifically includes:

determining the storage node corresponding to each piece of data in the data center;

determining the size of each piece of data;

taking the risk correlation between the storage node corresponding to each piece of data and each data storage node as the backup risk degree between that data and each data storage node;

constructing a data backup storage constraint condition;

generating a plurality of preliminary data backup schemes based on the data backup storage constraint condition;

constructing a scheme evaluation model, which takes as input the backup node corresponding to the data stored on each data storage node in a preliminary data backup scheme, and outputs the scheme rationality of that preliminary data backup scheme;

calculating the scheme rationality of each preliminary data backup scheme based on the scheme evaluation model;

selecting the preliminary data backup scheme with the minimum scheme rationality as the data backup scheme of the data storage nodes;

The mathematical expression of the data backup storage constraint condition is:

The mathematical expression of the data backup storage constraint condition involves three quantities: the total number of data for which the j-th data storage node serves as the backup node, the size of the i-th data for which the j-th data storage node serves as the backup node, and the backup storage space size of the j-th data storage node;
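Read together with the later remark that the backup data assigned to a node must fit within its backup space, the constraint can plausibly be written as below. The symbols n_j (number of data backed up to node j), s_ij (size of the i-th such data) and C_j (backup storage space of node j) are assumed names, since the original symbols are not reproduced in this text. The text says "smaller than", so it is not certain whether the original bound is strict.

```latex
\sum_{i=1}^{n_j} s_{ij} \le C_j \qquad \text{for every data storage node } j
```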

The scheme evaluation model is specifically:

In the scheme evaluation model, H is the scheme rationality of the preliminary data backup scheme, and X_i is the backup risk degree of the backup node corresponding to the i-th data in the preliminary data backup scheme.
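The planning steps can be sketched end to end under stated assumptions. The scheme evaluation model is not reproduced in this text, so the sketch assumes H is the sum of Z_i × X_i over all data, which matches the stated aim of placing more important data on lower-risk nodes. Brute-force enumeration of candidate schemes, and excluding a data item's own storage node as its backup node, are illustrative choices rather than the patent's procedure. All names are hypothetical.

```python
import itertools
from typing import Dict, List, Optional, Tuple

Scheme = Dict[str, str]  # data item -> backup node


def feasible(scheme: Scheme, sizes: Dict[str, int], capacity: Dict[str, int]) -> bool:
    """Constraint: data backed up to a node must fit in its backup space."""
    used: Dict[str, int] = {}
    for item, node in scheme.items():
        used[node] = used.get(node, 0) + sizes[item]
    return all(used[node] <= capacity[node] for node in used)


def rationality(scheme: Scheme, weights: Dict[str, float], home: Dict[str, str],
                risk: Dict[Tuple[str, str], float]) -> float:
    """Assumed model: H = sum of importance weight times backup risk degree."""
    return sum(weights[i] * risk[(home[i], node)] for i, node in scheme.items())


def best_scheme(items: List[str], nodes: List[str], sizes, capacity, weights,
                home, risk) -> Optional[Tuple[float, Scheme]]:
    """Enumerate candidate schemes and keep the feasible one with minimum H."""
    best: Optional[Tuple[float, Scheme]] = None
    for choice in itertools.product(nodes, repeat=len(items)):
        scheme = dict(zip(items, choice))
        if any(scheme[i] == home[i] for i in items):
            continue  # assumption: never back data up onto its own node
        if not feasible(scheme, sizes, capacity):
            continue
        h = rationality(scheme, weights, home, risk)
        if best is None or h < best[0]:
            best = (h, scheme)
    return best


# Hypothetical example: two data items stored on node A, three nodes in total.
result = best_scheme(
    items=["d1", "d2"], nodes=["A", "B", "C"],
    sizes={"d1": 4, "d2": 3}, capacity={"A": 5, "B": 5, "C": 5},
    weights={"d1": 4.5, "d2": 0.5}, home={"d1": "A", "d2": "A"},
    risk={("A", "B"): 0.8, ("A", "C"): 0.1},
)
```

Here both items cannot share one backup node because of the capacity limit, so the heavier-weighted d1 goes to the low-risk node C and d2 goes to B. Exhaustive enumeration grows exponentially with the number of data items, so a real planner would need a heuristic or solver instead.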

Furthermore, a distributed data storage planning system for improving disaster recovery capability of a data center is provided, which is used for implementing the above distributed data storage planning method for improving disaster recovery capability of a data center, the system comprising:

a data classification module, used for analyzing the data in the data center based on preset data classification logic and determining the importance weight of each piece of data in the data center;

a data center analysis module, used for acquiring all data storage nodes in the data center, retrieving historical operation data of the data center, and determining the risk correlation between all data storage nodes based on the historical operation data of the data center;

and a storage planning module, electrically connected with the data classification module and the data center analysis module, used for determining a data backup scheme of the data center based on the importance weight of the data stored on each data storage node and the risk correlation between the data storage nodes, and backing up the data of the data storage nodes.

Optionally, the data classification module includes:

a reference assignment unit, used for assigning a weight reference value to each piece of data based on its data type;

an access analysis unit, used for setting a data analysis period, acquiring the number of times each piece of data was accessed in the data analysis period closest to the current moment, recording it as the data access analysis times, and determining the data access frequency index of each piece of data based on the data access analysis times of all the data;

and a comprehensive weight analysis unit, used for calculating the importance weight of each piece of data with the data weight formula, based on the weight reference value and the data access frequency index of that piece of data.

Optionally, the data center analysis module includes:

a fault extraction unit, used for determining, in the historical operation data of the data center, the operation data in which a storage node failed, and recording it as fault operation data;

a node combination unit, used for combining all data storage nodes in the system pairwise at random to obtain a plurality of data storage node groups;

and a correlation calculation unit, used for calculating the risk correlation between the two data storage nodes in each data storage node group with the correlation algorithm.

Optionally, the storage planning module includes:

a data storage analysis unit, used for determining the storage node corresponding to each piece of data in the data center, determining the size of each piece of data, and taking the risk correlation between the storage node corresponding to each piece of data and each data storage node as the backup risk degree between that data and each data storage node;

a preliminary scheme generation unit, used for constructing the data backup storage constraint condition and generating a plurality of preliminary data backup schemes based on it;

a model building unit, used for constructing the scheme evaluation model;

and a data backup planning unit, used for calculating the scheme rationality of each preliminary data backup scheme based on the scheme evaluation model, and selecting the preliminary data backup scheme with the minimum scheme rationality as the data backup scheme of the data storage nodes.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a distributed data storage planning scheme for improving disaster recovery capacity of a data center. The scheme assigns data different importance weights according to their value and importance, formulates corresponding backup storage strategies for data with different importance weights, and adds redundant backup storage space to the distributed storage system to hold data backups. When some nodes fail, the data backup nodes can continue to provide service, ensuring the availability of the system. When an actual disaster causes some data storage nodes to fail, the data and the system can be recovered quickly and accurately, the loss caused by the disaster is reduced, the availability and disaster recovery capability of the data are greatly improved, and the risk of service interruption caused by node failure is reduced.

Drawings

FIG. 1 is a flow chart of a distributed data storage planning method for improving disaster recovery capacity of a data center according to the present disclosure;

FIG. 2 is a flow chart of a method of data classification logic in the present approach;

FIG. 3 is a flow chart of a method for determining risk correlation among all data storage nodes in the present approach;

FIG. 4 is a flow chart of a method of the correlation algorithm in the present solution;

FIG. 5 is a flow chart of a method for determining a data backup scheme for a data center in the present scheme;

FIG. 6 is a block diagram of a distributed data storage planning system for improving disaster recovery capability of a data center according to the present disclosure.

Detailed Description

The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.

Referring to FIG. 1, a distributed data storage planning method for improving disaster recovery capability of a data center includes:

acquiring all data storage nodes in the data center;

retrieving historical operation data of the data center;

dividing a backup storage space in each data center;

The scheme assigns data different importance weights according to their value and importance, formulates corresponding backup storage strategies for data with different importance weights, and adds redundant backup storage space to the distributed storage system to hold data backups. When some nodes fail, the data backup nodes can continue to provide service, ensuring the availability of the system.

Referring to FIG. 2, the data classification logic is specifically:

assigning a weight reference value to each piece of data based on its data type;

setting a data analysis period;

the data weight formula is specifically:

Z_i = P_i × J_i;

determining the data access frequency index of each piece of data based on the data access analysis times of all the data specifically includes:

the access frequency index calculation formula is specifically:

In this scheme, when classifying the data, the weight reference value of each piece of data is first determined from the attributes of the data and the data classification rules of the data center; the importance weight of the data is then determined by combining this reference value with the number of times the data has recently been accessed.

Referring to fig. 3, determining risk correlation among all data storage nodes based on historical operating data of the data center specifically includes:

Referring to fig. 4, the correlation algorithm specifically includes:

the correlation calculation formula specifically comprises the following steps:

It can be understood that, because data storage nodes in a data center share some of the same infrastructure and line connections, data storage nodes with a higher degree of correlation tend to fail at the same time. When backing up, the data stored on a data storage node should therefore be backed up to a data storage node that has low correlation with it, so that when the source node fails the backup node is unaffected and the backup data can be used to keep the system operating normally.
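The principle of backing up to a weakly correlated node can be illustrated in isolation with a small sketch. It assumes the pairwise risk correlations are already available in a dictionary keyed by unordered node pairs; the function name and the values are hypothetical, and capacity limits and importance weights are ignored here.

```python
from typing import Dict, Tuple


def least_correlated_node(source: str, correlation: Dict[Tuple[str, str], float]) -> str:
    """Pick the node whose failures are least correlated with the source node."""
    candidates: Dict[str, float] = {}
    for (a, b), x in correlation.items():
        if a == source:
            candidates[b] = x
        elif b == source:
            candidates[a] = x
    if not candidates:
        raise ValueError(f"no correlation data for node {source!r}")
    return min(candidates, key=candidates.get)


# Hypothetical risk correlations between three nodes.
corr = {("M1", "M2"): 0.5, ("M1", "M3"): 0.0, ("M2", "M3"): 0.25}
target = least_correlated_node("M1", corr)
```

In this example M3 has never failed together with M1, so it is the preferred backup target for data stored on M1.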

Referring to FIG. 5, determining the data backup scheme of the data center based on the importance weight of the data stored on each data storage node and the risk correlation between the data storage nodes specifically includes:

determining the size of each piece of data;

constructing a data backup storage constraint condition;

constructing a scheme evaluation model, which takes as input the backup node corresponding to the data stored on each data storage node in a preliminary data backup scheme, and outputs the scheme rationality of that preliminary data backup scheme;

the scheme evaluation model is specifically:

It can be understood that, when backing up data, the backup data assigned to each data center must be smaller than its backup space, and more important data should be backed up to data storage nodes with a smaller corresponding backup risk degree. The backup storage scheme is planned based on the data backup storage constraint condition and the scheme evaluation model, and the data are stored in a distributed manner across multiple nodes. This enables differentiated storage management for data of different levels and improves the availability and disaster recovery capability of the data.

Further, referring to fig. 6, based on the same inventive concept as the above-mentioned distributed data storage planning method for improving disaster recovery capability of a data center, the present disclosure further provides a distributed data storage planning system for improving disaster recovery capability of a data center, including:

the storage planning module is electrically connected with the data classification module and the data center analysis module, and is used for determining a data backup scheme of the data center and backing up data of the data storage nodes based on important weights of stored data of each data storage node and risk correlation between the data storage nodes.

The data classification module comprises:

The data center analysis module comprises:

The fault extraction unit is used for determining the operation data of the storage node faults in the historical operation data of the data center and recording the operation data as fault operation data;

The storage planning module comprises:

the model building unit is used for building a scheme evaluation model;

The use process of the distributed data storage planning system for improving the disaster recovery capability of the data center is as follows:

Step one: the reference assignment unit attaches a weight reference value to each data based on the data type;

Step two: the access analysis unit sets a data analysis period, acquires the accessed times of each data in the data analysis period closest to the current moment, marks the accessed times as data access analysis times, and determines the data access frequency index of each data based on the data access analysis times of all the data;

Step three: the comprehensive weight analysis unit calculates the importance weight of each piece of data with the data weight formula, based on the weight reference value and the data access frequency index of that piece of data;

Step four: the fault extraction unit determines the operation data of the storage node faults in the historical operation data of the data center, and records the operation data as fault operation data;

Step five: the node combination unit performs random pairwise combination on all data storage nodes in the system to obtain a plurality of data storage node groups;

Step six: the correlation calculation unit calculates risk correlation between two data storage nodes in each data storage node group by adopting a correlation algorithm.

Step seven: the data storage analysis unit determines storage nodes corresponding to each data in the data center, determines the size of each data, and takes the risk correlation degree between the storage nodes corresponding to each data and each data storage node as the backup risk degree between the data and each data storage node;

Step eight: the preliminary scheme generation unit constructs the data backup storage constraint condition and generates a plurality of preliminary data backup schemes based on it;

Step nine: the model building unit constructs the scheme evaluation model;

Step ten: the data backup planning unit calculates the scheme rationality of each preliminary data backup scheme based on the scheme evaluation model, and selects the preliminary data backup scheme with the minimum scheme rationality as the data backup scheme of the data storage nodes.

In summary, the advantages of the invention are that, when an actual disaster causes some data storage nodes to fail, the data and the system can be recovered quickly and accurately, the loss caused by the disaster is reduced, the availability and disaster recovery capability of the data are greatly improved, and the risk of service interruption caused by node failure is reduced.

The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which merely illustrate its principles, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A distributed data storage planning method for improving disaster recovery capacity of a data center is characterized by comprising the following steps:

acquiring all data storage nodes in a data center;

Calling historical operation data of a data center;

dividing a backup storage space in each data center;

2. The distributed data storage planning method for improving disaster recovery capacity of a data center according to claim 1, wherein the data classification logic specifically comprises:

assigning a weight reference value to each piece of data based on its data type;

setting a data analysis period;

the data weight formula is specifically:

Z_i = P_i × J_i;

3. The method for planning distributed data storage for improving disaster recovery capacity of a data center according to claim 2, wherein determining the data access frequency index of each data based on the number of data access analysis of all data specifically comprises:

the access frequency index calculation formula specifically comprises:

4. The method for planning distributed data storage for improving disaster recovery capacity of a data center according to claim 3, wherein determining risk correlation among all data storage nodes based on historical operation data of the data center specifically comprises:

5. The distributed data storage planning method for improving disaster recovery capacity of a data center according to claim 4, wherein the correlation algorithm specifically comprises:

the correlation calculation formula specifically comprises the following steps:

6. The method for planning distributed data storage for improving disaster recovery capacity of a data center according to claim 5, wherein determining a data backup scheme of the data center based on importance weights of stored data of each data storage node and risk correlation between the data storage nodes specifically comprises:

Determining the size of each data;

constructing a data backup storage limiting condition;

The scheme evaluation model specifically comprises the following steps:

7. A distributed data storage planning system for improving disaster recovery capacity of a data center, used for implementing the distributed data storage planning method for improving disaster recovery capacity of a data center according to any one of claims 1 to 6, comprising:

8. The distributed data storage planning system for improving disaster recovery capacity of a data center of claim 7 wherein said data classification module comprises:

9. The distributed data storage planning system for improving disaster recovery capacity of a data center of claim 7 wherein the data center analysis module comprises:

10. The distributed data storage planning system for improving disaster recovery capacity of a data center of claim 7 wherein said storage planning module comprises:

the model building unit is used for building a scheme evaluation model;