CN111770158B - Cloud platform recovery method and device, electronic equipment and computer readable storage medium - Google Patents

Cloud platform recovery method and device, electronic equipment and computer readable storage medium Download PDF

Info

Publication number
CN111770158B
CN111770158B
Authority
CN
China
Prior art keywords
rbd
file
identification information
subfiles
ceph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010591210.8A
Other languages
Chinese (zh)
Other versions
CN111770158A (en)
Inventor
葛凯凯
邬沛君
郑松坚
潘晓东
吴晓清
徐凯
李文达
江鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010591210.8A priority Critical patent/CN111770158B/en
Publication of CN111770158A publication Critical patent/CN111770158A/en
Application granted granted Critical
Publication of CN111770158B publication Critical patent/CN111770158B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F11/1484 Generic software techniques for error detection or fault masking by means of middleware or OS functionality involving virtual machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a cloud platform recovery method and device, electronic equipment and a computer readable storage medium. The method includes: when one or more storage nodes in a Ceph storage cluster of the cloud platform fail, acquiring identification information of each virtual machine in the cloud platform; based on the identification information of each virtual machine, determining the identification information of each block device rbd corresponding to the virtual machine; based on the identification information of each rbd, acquiring each sub-file corresponding to the rbd that is stored in the Ceph storage cluster in a distributed manner; splicing the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd, and uploading the local file to a standby Ceph storage cluster; and controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster so as to recover the cloud platform. With this scheme, the faulty functional components of the Ceph storage cluster do not need to be repaired one by one, and the cloud platform can be recovered in time even when the Ceph storage cluster is seriously damaged or carries a large volume of data.

Description

Cloud platform recovery method and device, electronic equipment and computer readable storage medium
Technical Field
The application relates to the technical field of computers, in particular to a cloud platform recovery method, a cloud platform recovery device, electronic equipment and a computer readable storage medium.
Background
Many cloud platforms are a combination of an OpenStack management system and a Ceph storage cluster: the OpenStack management system provides the management functions of the cloud platform, and the Ceph storage cluster provides unified storage functions, including block storage, object storage and file storage. The most widely used mode of Ceph storage clusters at present is block storage, serving as the system disks and data disks of virtual machines.
During use of the cloud platform, as users apply for more virtual machines, the volume of data borne by the Ceph storage cluster grows larger and larger, and the Ceph storage cluster can gradually grow through capacity expansion to hundreds or even thousands of OSDs (Object Storage Devices). As the scale of the Ceph storage cluster grows, data migration and recovery become challenging. Storage is the basis of a cloud platform: if the Ceph storage cluster fails, the cloud platform is at risk of becoming unusable, so repairs must be performed in time to restore normal operation of the cloud platform when the Ceph storage cluster fails.
At present, the repair scheme for a damaged Ceph storage cluster usually repairs the faulty components one by one until all components operate normally, and then uses the self-recovery capability of the Ceph storage cluster to restore data consistency, finally recovering the whole cloud platform. However, when the Ceph storage cluster is seriously damaged or carries a large volume of data, this repair scheme cannot recover the cloud platform in time.
Disclosure of Invention
The application aims to solve at least one of the above technical defects. The technical solutions provided by the embodiments of the application are as follows:
in a first aspect, an embodiment of the present application provides a cloud platform recovery method, including:
when one or more storage nodes in a Ceph storage cluster of the cloud platform fail, acquiring identification information of each virtual machine in the cloud platform;
based on the identification information of each virtual machine, determining the identification information of each block device rbd corresponding to the virtual machine;
based on the identification information of each rbd, acquiring each sub-file corresponding to the rbd, which is stored in a Ceph storage cluster in a distributed manner;
splicing all the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd, and uploading the local file to a standby Ceph storage cluster;
and controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster so as to recover the cloud platform.
In an optional embodiment of the present application, determining, based on the identification information of each virtual machine, identification information of each block device rbd corresponding to the virtual machine includes:
acquiring names of rbds corresponding to each virtual machine based on the identification information of each virtual machine and the corresponding relation between the virtual machine and each data storage pool in the Ceph storage cluster;
the identification information of each rbd is acquired from the name of the rbd.
In an optional embodiment of the present application, a file name of each subfile includes identification information of a corresponding rbd, and based on the identification information of each rbd, obtaining each subfile corresponding to the rbd, which is stored in a Ceph storage cluster in a distributed manner, includes:
comparing the identification information of each rbd with the file names of all subfiles stored in the Ceph storage cluster, and determining the subfiles containing the identification information of the rbd in the file names as subfiles corresponding to the rbd;
all subfiles corresponding to each rbd are obtained.
In an alternative embodiment of the present application, comparing the identification information of each rbd with the file name of each subfile stored in the Ceph storage cluster, and determining the subfile including the identification information of the rbd in the file name as the subfile corresponding to the rbd includes:
and respectively comparing the identification information of each rbd with the identification information contained in the file name of each sub-file stored in each object storage device OSD in the Ceph storage cluster, and determining a sub-file whose file name contains the identification information of the rbd as a sub-file corresponding to the rbd.
In an alternative embodiment of the present application, after determining the subfile including the identification information of the rbd in the file name as the subfile corresponding to the rbd, the method further includes:
acquiring storage path information of each sub-file corresponding to each rbd, and storing the storage path information of each sub-file and the identification information of the corresponding rbd in a preset database in a one-to-one correspondence manner, wherein the storage path information comprises the host name of the OSD where the corresponding sub-file is located and the storage directory information of the sub-file in the OSD;
obtaining all subfiles corresponding to each rbd, including:
acquiring storage path information of each corresponding subfile from a preset database based on the identification information of each rbd;
and acquiring all corresponding subfiles from the corresponding OSD based on the path information of each subfile.
In an optional embodiment of the present application, a file name of each sub-file includes a position offset of the sub-file in a corresponding rbd, where the position offset indicates the position of the corresponding sub-file in the corresponding rbd, and splicing the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd includes:
determining target subfiles of all positions of each rbd based on all subfiles corresponding to each rbd;
determining the splicing sequence of each target sub-file based on the position offset contained in the file name of each target sub-file corresponding to each rbd;
and splicing the target subfiles corresponding to each rbd according to the splicing sequence to obtain the local file corresponding to the rbd.
In an alternative embodiment of the present application, determining, based on the subfiles corresponding to each rbd, a target subfile for each rbd location includes:
for each position in rbd, if the position corresponds to one sub-file, determining the sub-file as a target sub-file of the position;
if the position corresponds to at least two subfiles, selecting one subfile from the at least two subfiles according to a preset strategy to determine the subfile as a target subfile of the position.
In an alternative embodiment of the present application, selecting one subfile from at least two subfiles according to a preset policy to determine the subfile as a target subfile of the location includes:
acquiring the message digest algorithm MD5 value of each subfile;
if the MD5 values of all the subfiles are the same, determining any one of the subfiles as the target subfile of the position;
if the MD5 values of the subfiles are different, selecting the subfile whose modification time is the first or the last among the subfiles as the target subfile of the position.
In an alternative embodiment of the present application, after controlling the cloud platform to switch from the Ceph storage cluster to the backup Ceph storage cluster, the method further includes:
formatting the Ceph storage cluster to obtain a formatted Ceph storage cluster, and adding the storage nodes of the formatted Ceph storage cluster into the standby Ceph storage cluster to obtain a combined Ceph storage cluster;
and removing the storage nodes belonging to the standby Ceph storage cluster from the combined Ceph storage cluster.
In a second aspect, an embodiment of the present application provides a cloud platform recovery apparatus, including:
the identification information acquisition module of the virtual machine is used for acquiring the identification information of each virtual machine in the cloud platform when one or more storage nodes in the Ceph storage cluster of the cloud platform fail;
the identification information acquisition module of rbd is used for determining the identification information of each block device rbd corresponding to each virtual machine based on the identification information of each virtual machine;
the sub-file acquisition module is used for acquiring each sub-file corresponding to each rbd, which is stored in the Ceph storage cluster in a distributed manner, based on the identification information of each rbd;
the local file uploading module is used for splicing the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd, and uploading the local file to the standby Ceph storage cluster;
and the storage cluster switching module is used for controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster so as to recover the cloud platform.
In an alternative embodiment of the present application, the rbd identification information obtaining module is specifically configured to:
acquiring names of rbds corresponding to each virtual machine based on the identification information of each virtual machine and the corresponding relation between the virtual machine and each data storage pool in the Ceph storage cluster;
the identification information of each rbd is acquired from the name of the rbd.
In an optional embodiment of the present application, a file name of each subfile includes identification information of a corresponding rbd, and the subfile obtaining module is specifically configured to:
comparing the identification information of each rbd with the file names of all subfiles stored in the Ceph storage cluster, and determining the subfiles containing the identification information of the rbd in the file names as subfiles corresponding to the rbd;
all subfiles corresponding to each rbd are obtained.
In an alternative embodiment of the present application, the subfile acquisition module is further configured to:
and respectively comparing the identification information of each rbd with the identification information contained in the file name of each sub-file stored in each object storage device OSD in the Ceph storage cluster, and determining a sub-file whose file name contains the identification information of the rbd as a sub-file corresponding to the rbd.
In an alternative embodiment of the present application, the apparatus further includes a storage path information acquisition module configured to:
after determining a subfile containing identification information of the rbd in a file name as a subfile corresponding to the rbd, acquiring storage path information of each subfile corresponding to each rbd, and storing the storage path information of each subfile and the identification information of the corresponding rbd into a preset database in a one-to-one correspondence manner, wherein the storage path information comprises a host name of an OSD where the corresponding subfile is located and storage directory information of the subfile in the OSD;
correspondingly, the subfile acquisition module is specifically configured to:
acquiring storage path information of each corresponding subfile from a preset database based on the identification information of each rbd;
and acquiring all corresponding subfiles from the corresponding OSD based on the path information of each subfile.
In an optional embodiment of the present application, a file name of each sub-file includes a position offset of the sub-file in the corresponding rbd, where the position offset indicates the position of the corresponding sub-file in the corresponding rbd, and the local file uploading module is specifically configured to:
determining target subfiles of all positions of each rbd based on all subfiles corresponding to each rbd;
determining the splicing sequence of each target sub-file based on the position offset contained in the file name of each target sub-file corresponding to each rbd;
and splicing the target subfiles corresponding to each rbd according to the splicing sequence to obtain the local file corresponding to the rbd.
In an alternative embodiment of the present application, the local file upload module is further configured to:
for each position in rbd, if the position corresponds to one sub-file, determining the sub-file as a target sub-file of the position;
if the position corresponds to at least two subfiles, selecting one subfile from the at least two subfiles according to a preset strategy to determine the subfile as a target subfile of the position.
In an alternative embodiment of the present application, the local file upload module is further configured to:
acquiring the message digest algorithm MD5 value of each subfile;
if the MD5 values of all the subfiles are the same, determining any one of the subfiles as the target subfile of the position;
if the MD5 values of the subfiles are different, selecting the subfile whose modification time is the first or the last among the subfiles as the target subfile of the position.
In an alternative embodiment of the present application, the apparatus further includes a capacity expansion module for:
after controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster, formatting the Ceph storage cluster to obtain a formatted Ceph storage cluster, and adding the storage nodes of the formatted Ceph storage cluster into the standby Ceph storage cluster to obtain a combined Ceph storage cluster;
and removing the storage nodes belonging to the standby Ceph storage cluster from the combined Ceph storage cluster.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor;
a memory having a computer program stored therein;
a processor for executing a computer program to implement the method provided in the first aspect embodiment or any of the alternative embodiments of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, which when executed by a processor implements the method provided in the embodiment of the first aspect or any of the alternative embodiments of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. A processor of a computer device reads the computer instructions from the computer readable storage medium and executes them, so that the computer device implements the method provided in the embodiment of the first aspect or any alternative embodiment of the first aspect.
The technical scheme provided by the application has the beneficial effects that:
the identification information of the corresponding rbds is obtained through the identification information of each virtual machine; the corresponding undamaged subfiles are then obtained from the faulty Ceph storage cluster based on the identification information of each rbd; the corresponding local file is obtained from the undamaged subfiles of each rbd and uploaded to the standby Ceph storage cluster; and finally the cloud platform is recovered by switching it to the standby Ceph storage cluster. The faulty functional components of the Ceph storage cluster do not need to be repaired one by one, so the cloud platform can be recovered in time even when the Ceph storage cluster is seriously damaged or carries a large volume of data.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is an interaction schematic diagram of an OpenStack management system and a Ceph storage cluster in a cloud platform according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a cloud platform recovery method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a distributed storage of subfiles corresponding to rbd in a Ceph storage cluster according to an embodiment of the present application;
fig. 4 is a schematic diagram of storage cluster switching of a cloud platform according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of determining identification information of a corresponding rbd based on identification information of a virtual machine in an embodiment of the application;
FIG. 6 is a schematic diagram illustrating a correspondence between a local file and a sub-file according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a local file spliced from a target subfile in an example of an embodiment of the application;
FIG. 8 is a schematic diagram of parallel alignment of identification information in an example of an embodiment of the present application;
FIG. 9 is a schematic diagram of obtaining corresponding subfiles based on rbd identification information in an example of an embodiment of the present application;
fig. 10 is a schematic diagram of cloud platform recovery after dividing a Ceph storage cluster into a plurality of small Ceph storage clusters according to an embodiment of the present application;
fig. 11 is a structural block diagram of a cloud platform recovery device according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as will be understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combinations of one or more of the associated listed items.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, several terms related to the present application are described and explained:
OpenStack: OpenStack is an open-source IaaS (Infrastructure as a Service) management platform.
Ceph: ceph is an open-source distributed storage system, and can provide object, block and file storage services at the same time.
OSD: OSD (Object Storage Device, object store) is a component in Ceph that is used to manage specific data disks.
Pg: pg is a logical concept in Ceph and represents a collection of a group of data objects.
rbd: rbd (Rados Block Device) is a block device service of a Ceph cluster.
Fig. 1 is a schematic interaction diagram of the OpenStack management system 101 and the Ceph storage cluster 102 in a cloud platform according to an embodiment of the present application. The data in the cloud platform is generated by the system disks and data disks of the virtual machines applied for by users in the OpenStack management system 101, as well as by the base images, and is stored in the corresponding data storage pools of the Ceph storage cluster 102. These data storage pools can be understood as being formed by multiple storage nodes, that is, the data is stored in the storage nodes of the corresponding data storage pool. For example, the data corresponding to a base image is stored in the images pool, the data corresponding to the system disk of an image-booted virtual machine is stored in the vms pool, the data corresponding to the data disk of an image-booted virtual machine is stored in the volumes pool, and the data corresponding to both the system disk and the data disk of a volume-booted virtual machine is stored in the volumes pool. When the Ceph storage cluster 102 fails, the cloud platform may not be able to operate normally; to ensure that users can use the cloud platform normally, the cloud platform needs to be recovered in time.
Fig. 2 is a flow chart of a cloud platform recovery method according to an embodiment of the present application, as shown in fig. 2, where the method may include:
step S201, when one or more storage nodes in a Ceph storage cluster of a cloud platform fail, acquiring identification information of each virtual machine in the cloud platform.
It should be noted that the execution body of the method may be a cloud platform recovery component provided in the OpenStack management system of the cloud platform, or a cloud platform recovery component independent of both the OpenStack management system and the Ceph storage cluster. This component may interact with the OpenStack management system and the Ceph storage cluster respectively, to obtain from them the relevant information required for recovering the cloud platform and to control them to execute the relevant instructions required for recovering the cloud platform.
Specifically, when one or more storage nodes in the Ceph storage cluster of the cloud platform fail, the cloud platform recovery component obtains the identification information of the virtual machines applied for by users by accessing the OpenStack management system of the cloud platform, for example, the identification information of the base image, of the image-booted virtual machines, of the volume-booted virtual machines, and the like.
Step S202, based on the identification information of each virtual machine, the identification information of each block device rbd corresponding to the virtual machine is determined.
When each rbd is stored in the corresponding data storage pool, it carries identification information indicating the corresponding virtual machine and the corresponding data storage pool, so the identification information of the corresponding rbd can be determined based on the identification information of each virtual machine and the corresponding data storage pool.
Step S203, based on the identification information of each rbd, each sub-file corresponding to the rbd, which is stored in the Ceph storage cluster in a distributed manner, is acquired.
When each rbd is stored in the Ceph storage cluster, it is distributed and stored in the storage nodes of the corresponding data storage pool according to a preset algorithm; specifically, each rbd is split into a plurality of subfiles that are distributed and stored in multiple OSDs of the Ceph storage cluster, where the preset algorithm may be the CRUSH distribution algorithm. Meanwhile, the file name of each subfile corresponding to an rbd contains the identification information of the rbd to which the subfile belongs.
Specifically, according to the identification information of each rbd, all subfiles whose file names contain the identification information of the rbd can be searched out from the Ceph storage cluster; these are exactly the subfiles the rbd was split into. It should be noted that, in the process of splitting an rbd and storing it across multiple OSDs of the Ceph storage cluster, each sub-file is generally replicated into multiple copies that are stored in different OSDs, so as to ensure the reliability of data storage.
As shown in fig. 3, for the n data storage pools Pool1, Pool2, …, Pooln in the Ceph storage cluster, each rbd in a pool is converted (split) into corresponding Pg files by a Hash algorithm, where each Pg file includes multiple corresponding subfiles, and each subfile is distributed and stored in the OSDs of the Ceph storage cluster by the CRUSH algorithm; each subfile is duplicated with one copy, and the two copies of the same subfile are stored in different OSDs, so as to ensure the reliability of data storage.
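For illustration only, the following minimal Python sketch shows how the subfile names of an rbd can be enumerated from its identification information; the 4M subfile size and the zero-padded offset suffix are assumptions taken from the examples in this embodiment, not a normative format:

```python
# Sketch: enumerate the subfile names an rbd is split into, assuming
# the pattern "<rbd id>.<zero-padded position offset>" and a fixed 4M
# subfile size, as in the examples of this embodiment.

OBJECT_SIZE = 4 * 1024 * 1024  # assumed size of one subfile (4M)

def subfile_names(rbd_id, rbd_size):
    """Yield the expected subfile names of an rbd of rbd_size bytes."""
    count = (rbd_size + OBJECT_SIZE - 1) // OBJECT_SIZE
    for position in range(count):
        # hypothetical 16-hex-digit suffix; real clusters may differ
        yield "{}.{:016x}".format(rbd_id, position)

# e.g. rbd_data.e7626a6b8b4567.0000000000000000, ...0001, ...0002
for name in subfile_names("rbd_data.e7626a6b8b4567", 3 * OBJECT_SIZE):
    print(name)
```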
Step S204, splicing the subfiles corresponding to each rbd to obtain a local file corresponding to each rbd, and uploading the local file to a standby Ceph storage cluster.
Specifically, as can be seen from the foregoing description, since each sub-file is replicated into multiple copies during distributed storage, it is unlikely that all copies of a sub-file are damaged when the Ceph storage cluster fails, so an undamaged copy can be obtained from the multiple copies as the sub-file to ensure its accuracy. After the undamaged subfiles at each position of the rbd are obtained, the local file of the rbd can be obtained based on these undamaged subfiles, and the local file of the rbd is then uploaded to the standby Ceph storage cluster for storage.
Step S205, the cloud platform is controlled to switch from the Ceph storage cluster to the backup Ceph storage cluster, so as to recover the cloud platform.
Specifically, after the local files corresponding to the rbds of all virtual machines are uploaded to the standby Ceph storage cluster, the data of all virtual machines has been recovered, and the storage cluster of the cloud platform is then switched from the Ceph storage cluster to the standby Ceph storage cluster, so that normal operation of the cloud platform can be restored. Specifically, as shown in fig. 4, which is a schematic diagram of the storage cluster switching of the cloud platform, switching the storage cluster of the cloud platform from the Ceph storage cluster to the standby Ceph storage cluster requires the following three updates: (1) updating the Ceph configuration files corresponding to all storage nodes in the OpenStack management system and the corresponding keyrings of all users, wherein the keyrings are used for authenticating user identities and differ between users; (2) updating the database of the glance component that manages images in the OpenStack management system, wherein the storage path information of each piece of image data in the glance database includes the cluster identification (ID) of the Ceph storage cluster, and the cluster IDs of different Ceph storage clusters differ; (3) updating the database of the nova component that manages virtual machines in the OpenStack management system, wherein the nova database stores all disk information, each piece of disk information includes the ip addresses of the mons corresponding to the Ceph storage cluster, and the ip addresses of the mons of different Ceph clusters differ. The mons, also called monitor (control) nodes, comprehensively manage the Ceph storage cluster, including authority authentication, the OSD topology map, the Pg topology map, OSD health status, and so on.
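As a non-authoritative illustration of these three updates, the sketch below outlines them in Python; the glance location URI layout ("rbd://<fsid>/<pool>/<image>/<snap>") and the nova connection-info field names are assumptions about a typical OpenStack deployment rather than part of the patented scheme:

```python
# Sketch of the three switch-over updates; every URI layout and field
# name below is an assumption for illustration only.
import json

# (1) The Ceph configuration files and per-user keyrings on every
#     storage node are replaced with those of the standby cluster
#     (distribution mechanism, e.g. scp/ansible, not shown here).

def update_glance_location(location_uri, old_fsid, new_fsid):
    # (2) Image locations in the glance database embed the cluster ID,
    #     e.g. "rbd://<cluster fsid>/<pool>/<image>/<snap>" (assumed).
    return location_uri.replace(old_fsid, new_fsid)

def update_nova_connection_info(conn_info, new_mon_ips):
    # (3) Disk records in the nova database embed the mon ip addresses
    #     of the cluster in their connection info (assumed field names).
    info = json.loads(conn_info)
    info["data"]["hosts"] = new_mon_ips
    return json.dumps(info)
```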
According to the scheme provided by the embodiment of the application, the identification information of the corresponding rbds is obtained through the identification information of each virtual machine; the corresponding undamaged subfiles are then obtained from the faulty Ceph storage cluster based on the identification information of each rbd; the corresponding local file is obtained from the undamaged subfiles of each rbd and uploaded to the standby Ceph storage cluster; and finally the cloud platform is recovered by switching it to the standby Ceph storage cluster. Each faulty functional component in the Ceph storage cluster does not need to be repaired one by one, so the cloud platform can be recovered in time even when the Ceph storage cluster is seriously damaged or carries a large volume of data.
In an optional embodiment of the present application, determining, based on the identification information of each virtual machine, identification information of each block device rbd corresponding to the virtual machine includes:
acquiring names of rbds corresponding to each virtual machine based on the identification information of each virtual machine and the corresponding relation between the virtual machine and each data storage pool in the Ceph storage cluster;
the identification information of each rbd is acquired from the name of the rbd.
The rbd data in different data storage pools follows preset naming rules. For example, for the images pool, which stores the data generated by base images, the name of the corresponding rbd is the identification information of the base image. For the vms pool, which stores the data generated by the system disk of an image-booted virtual machine, the name of the corresponding rbd is the combination of the identification information of the image-booted virtual machine and "_disk". For the volumes pool, which stores the data generated by the data disk of an image-booted virtual machine as well as the data generated by the data disk and system disk of a volume-booted virtual machine, the name of the corresponding rbd is the combination of "volume-" and the identification information of the image-booted virtual machine or of the volume-booted virtual machine.
After the name of each rbd is determined, the identification information used when the rbd is stored in the Ceph storage cluster can be obtained by parsing the name of the rbd. Specifically, the identification information of the corresponding rbd can be obtained by querying the name of the rbd with the "rbd info" command; for example, the identification information of an rbd may take the form "rbd_data.e7626a6b8b4567". As shown in fig. 5, the process of determining the identification information of each rbd corresponding to each virtual machine from the identification information of the virtual machine can be divided into two steps: step 501 obtains the name of each rbd corresponding to each virtual machine through the preset naming rule of the corresponding data storage pool, and step 502 parses the corresponding identification information from the name of each rbd through the "rbd info" command.
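The two steps of fig. 5 can be sketched as follows; the pool naming rules follow the description above, and it is assumed that the deployed rbd CLI supports "rbd info --format json" with a "block_name_prefix" field in its output:

```python
# Sketch: derive rbd names per pool naming rule (step 501), then parse
# the identification information of the rbd via "rbd info" (step 502).
import json
import subprocess

def rbd_name(pool, object_id):
    """Apply the preset naming rule of each data storage pool."""
    if pool == "images":
        return object_id                # identification of the base image
    if pool == "vms":
        return object_id + "_disk"      # image-booted VM system disk
    if pool == "volumes":
        return "volume-" + object_id    # volume-backed disks
    raise ValueError("unknown pool: " + pool)

def rbd_identification(pool, name):
    """Return e.g. 'rbd_data.e7626a6b8b4567' for pool/name."""
    out = subprocess.check_output(
        ["rbd", "info", "{}/{}".format(pool, name), "--format", "json"])
    return json.loads(out)["block_name_prefix"]  # assumed output field
```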
In an optional embodiment of the present application, a file name of each subfile includes identification information of a corresponding rbd, and based on the identification information of each rbd, obtaining each subfile corresponding to the rbd, which is stored in a Ceph storage cluster in a distributed manner, includes:
comparing the identification information of each rbd with the file names of all subfiles stored in the Ceph storage cluster, and determining the subfiles containing the identification information of the rbd in the file names as subfiles corresponding to the rbd;
all subfiles corresponding to each rbd are obtained.
As can be seen from the foregoing description, the file name of each sub-file corresponding to an rbd includes the identification information of the rbd. To restore the rbd, all the sub-files belonging to it must be obtained from the Ceph storage cluster where the fault occurred, that is, it must be determined which subfiles in the faulty Ceph storage cluster belong to the rbd.
Specifically, the identification information of each rbd is compared with the file names of all the subfiles stored in the Ceph storage cluster; every subfile whose file name contains the identification information of the rbd belongs to the rbd. The subfiles belonging to the rbd are determined and recorded, and they are acquired before the local file of the rbd is generated.
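A minimal sketch of this comparison, assuming the subfiles are visible as plain files under each OSD's data directory (as in the directory layout used later in this description):

```python
# Sketch: find the subfiles of one rbd by comparing its identification
# information against the file names stored under an OSD data directory.
import os

OSD_ROOT = "/var/lib/ceph/osd"  # directory used later in this description

def find_subfiles(rbd_id, osd_root=OSD_ROOT):
    """Return the paths of all subfiles whose file names contain rbd_id."""
    matches = []
    for dirpath, _dirs, filenames in os.walk(osd_root):
        for fname in filenames:
            if rbd_id in fname:  # the file name embeds the rbd id
                matches.append(os.path.join(dirpath, fname))
    return matches
```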
In an alternative embodiment of the present application, comparing the identification information of each rbd with the file name of each subfile stored in the Ceph storage cluster, and determining the subfile including the identification information of the rbd in the file name as the subfile corresponding to the rbd includes:
and respectively comparing the identification information of each rbd with the identification information contained in the file name of each sub-file stored in each object storage device OSD in the Ceph storage cluster, and determining a sub-file whose file name contains the identification information of the rbd as a sub-file corresponding to the rbd.
The Ceph storage cluster includes multiple OSDs, which are the main data-bearing components in the Ceph storage cluster; the data corresponding to the virtual machines is distributed and stored on the Ceph storage cluster through the OSDs. Therefore, in the process of comparing the identification information of each rbd with the file names of all the subfiles stored in the Ceph storage cluster, the subfiles stored by every OSD in the Ceph storage cluster need to be compared, and when the number of OSDs is large, comparing the subfiles of each OSD in sequence is inefficient. To improve the efficiency of determining the subfiles corresponding to the rbds, the embodiment of the application can perform the comparison on all OSDs in parallel, that is, simultaneously compare the identification information of each rbd with the file names of the subfiles stored in each OSD.
Specifically, multiple OSD operation components (OSD runners) can be controlled by a sub-file collection component (collection manager) to respectively perform parallel comparison on each OSD, that is, an OSD runner is configured for each OSD to perform comparison of identification information and file names of sub-files.
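One possible sketch of this collection manager / OSD runner division, reusing the hypothetical find_subfiles helper above and running one worker per OSD:

```python
# Sketch: a "collection manager" drives one "OSD runner" per OSD in
# parallel, reusing the hypothetical find_subfiles helper above.
from concurrent.futures import ThreadPoolExecutor

def collection_manager(rbd_ids, osd_dirs):
    """Compare every rbd id against every OSD's subfiles in parallel."""
    results = {rbd_id: [] for rbd_id in rbd_ids}

    def osd_runner(osd_dir):  # one runner per OSD
        return {rbd_id: find_subfiles(rbd_id, osd_dir) for rbd_id in rbd_ids}

    with ThreadPoolExecutor(max_workers=len(osd_dirs)) as pool:
        for per_osd in pool.map(osd_runner, osd_dirs):
            for rbd_id, paths in per_osd.items():
                results[rbd_id].extend(paths)
    return results
```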
In an alternative embodiment of the present application, after determining the subfile including the identification information of the rbd in the file name as the subfile corresponding to the rbd, the method may further include:
acquiring storage path information of each sub-file corresponding to each rbd, and storing the storage path information of each sub-file and the identification information of the corresponding rbd in a preset database in a one-to-one correspondence manner, wherein the storage path information comprises the host name of the OSD where the corresponding sub-file is located and the storage directory information of the sub-file in the OSD;
obtaining all subfiles corresponding to each rbd, including:
acquiring storage path information of each corresponding subfile from a preset database based on the identification information of each rbd;
and acquiring all corresponding subfiles from the corresponding OSD based on the path information of each subfile.
Specifically, after the subfiles corresponding to each rbd are determined, the storage path information of each subfile is obtained, and the storage path information of each subfile and the identification information of the corresponding rbd are stored in a preset database in a one-to-one correspondence. The storage path information includes the host name of the OSD where the corresponding sub-file is located and the storage directory information of the sub-file in the OSD. In the step of obtaining the local file corresponding to each rbd from its subfiles, the storage path information of all corresponding subfiles is obtained from the preset database through the identification information of the rbd, and the subfiles are acquired according to their storage path information; specifically, the corresponding subfiles are acquired under the storage directory of the OSD indicated by the host name, according to the host name and the storage directory information in the storage path information.
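A sketch of the preset database, using SQLite as a stand-in (the table and column names are assumptions; any store that maps the identification information of an rbd to host names and storage directories would serve):

```python
# Sketch: a preset database recording, per subfile, the rbd it belongs
# to, the host name of its OSD and its storage directory (SQLite used
# as a stand-in; table and column names are assumptions).
import sqlite3

db = sqlite3.connect("subfile_paths.db")
db.execute("""CREATE TABLE IF NOT EXISTS subfile_path (
                  rbd_id    TEXT,   -- identification information of the rbd
                  host      TEXT,   -- host name of the OSD
                  directory TEXT    -- storage directory inside the OSD
              )""")

def record_subfile(rbd_id, host, directory):
    db.execute("INSERT INTO subfile_path VALUES (?, ?, ?)",
               (rbd_id, host, directory))
    db.commit()

def paths_for(rbd_id):
    cur = db.execute("SELECT host, directory FROM subfile_path "
                     "WHERE rbd_id = ?", (rbd_id,))
    return cur.fetchall()
```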
In an optional embodiment of the present application, a file name of each sub-file includes a position offset of the sub-file in the corresponding rbd, where the position offset indicates the position of the corresponding sub-file in the corresponding rbd, and splicing the subfiles corresponding to each rbd to obtain the local file corresponding to the rbd includes:
determining target subfiles of all positions of each rbd based on all subfiles corresponding to each rbd;
determining the splicing sequence of each target sub-file based on the position offset contained in the file name of each target sub-file corresponding to each rbd;
and splicing the target subfiles corresponding to each rbd according to the splicing sequence to obtain the local file corresponding to the rbd.
As can be seen from the foregoing description, each rbd is distributed and stored in the storage nodes of the corresponding data storage pool according to a preset algorithm, that is, each rbd is split into a plurality of subfiles stored across multiple OSDs of the Ceph storage cluster, and these subfiles occupy different positions in the corresponding rbd. The file name of each sub-file contains the identification information of the corresponding rbd and the position offset of the sub-file in the corresponding rbd: the identification information indicates the rbd to which the sub-file belongs, and the position offset indicates the position of the sub-file in that rbd. For example, as shown in fig. 6, the identification information of a certain rbd is "rbd_data.e7626a6b8b4567", the rbd is split into n (n ≥ 2) subfiles at the time of storage, and the size of each subfile is 4M. When a subfile with the file name "rbd_data.e7626a6b8b4567.0000000000002" is acquired, since the prefix of the subfile name (the identification information of the corresponding rbd) is the same as that of the rbd, "rbd_data.e7626a6b8b4567", the subfile is determined to belong to the rbd, and the suffix of the subfile name (the position offset) "0000000000002" indicates that the subfile occupies the 2nd of the n positions in the rbd, counting from the beginning.
It should be noted that, as described above, in the process of splitting an rbd and storing it across multiple OSDs of the Ceph storage cluster, each sub-file is generally replicated into multiple copies stored in different OSDs. Therefore, after the subfiles corresponding to each rbd are obtained, the target sub-file at each position in the rbd needs to be determined, that is, an undamaged target sub-file is determined from the multiple copies corresponding to each position.
Specifically, after all the subfiles corresponding to each rbd are obtained, firstly determining target subfiles at each position, and then splicing all the subfiles according to the arrangement sequence of the subfiles indicated by the position offset to obtain the local files corresponding to the rbd.
For example, fig. 7 is a schematic diagram of the splicing process of the local file corresponding to an rbd: the rbd is formed of 61 target subfiles, and the corresponding target subfiles are obtained from the corresponding OSDs respectively, arranged according to the splicing order, and spliced to obtain the local file of the rbd.
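A minimal splicing sketch under the same assumptions as above (4M subfiles, zero-padded offset suffix assumed hexadecimal); the final "rbd import" call is one plausible way to upload the local file to the standby cluster, not necessarily the one used by the patented scheme:

```python
# Sketch: splice target subfiles into the local file by position offset,
# then upload it to the standby cluster.
import os
import subprocess

OBJECT_SIZE = 4 * 1024 * 1024  # assumed 4M per subfile

def offset_of(path):
    # file name looks like "<rbd id>.<zero-padded offset>" (offset
    # suffix assumed hexadecimal for this illustration)
    return int(os.path.basename(path).rsplit(".", 1)[1], 16)

def splice(target_subfiles, local_path):
    with open(local_path, "wb") as out:
        for part_path in sorted(target_subfiles, key=offset_of):
            out.seek(offset_of(part_path) * OBJECT_SIZE)  # sparse gaps ok
            with open(part_path, "rb") as part:
                out.write(part.read())

def upload(local_path, pool, name):
    # hypothetical upload of the local file to the standby cluster
    subprocess.check_call(["rbd", "import", local_path,
                           "{}/{}".format(pool, name)])
```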
The recovery process of the cloud platform is illustrated below with reference to the drawings again. As shown in fig. 8 and fig. 9, the faulty Ceph storage cluster of a certain cloud platform includes OSD1, OSD2 and OSD3; when the subfiles corresponding to the rbds are searched for by the collection manager, a corresponding OSD runner is configured for each OSD, that is, OSD runner1 is configured for OSD1, OSD runner2 for OSD2, and OSD runner3 for OSD3.
The parallel comparison process is shown in fig. 8: specifically, the subfiles of each OSD are compared in parallel by the corresponding OSD runner, and the storage path information of the subfiles obtained by the comparison and the identification information of the corresponding rbd are stored in a preset database (Data Base, DB). The subfiles of an OSD are typically stored under the specified directory "/var/lib/ceph/osd". The process of acquiring the subfiles corresponding to an rbd is shown in fig. 9: specifically, the storage path information of each corresponding subfile is acquired from the DB according to the identification information of the rbd, and the corresponding subfiles are acquired from the specified directories of the corresponding OSDs according to the storage path information. Then, for each rbd position with multiple subfiles (copies), the undamaged subfile is selected as the target subfile through a preset strategy, all target subfiles corresponding to the rbd are uploaded to an aggregation storage node and spliced there according to the splicing order to obtain the local file corresponding to the rbd, and the local file is uploaded to the standby Ceph storage cluster, thereby recovering the data of each virtual machine.
In an alternative embodiment of the present application, determining, based on the subfiles corresponding to each rbd, a target subfile for each rbd location includes:
for each position in rbd, if the position corresponds to one sub-file, determining the sub-file as a target sub-file of the position;
if the position corresponds to at least two subfiles, selecting one subfile from the at least two subfiles according to a preset strategy to determine the subfile as a target subfile of the position.
Specifically, if the subfile corresponding to a position in the rbd was not replicated during storage, the acquired subfile corresponding to that position is directly determined as the target subfile of the position. If the subfile corresponding to a position was replicated during storage, that is, the position corresponds to multiple subfiles (copies), the target subfile of the position is determined from the multiple subfiles according to a preset strategy.
For example, for the rbd with the identification information "rbd_data.e7626a6b8b4567", 3 subfiles (copies) corresponding to the 1st position are obtained, and the storage paths of these 3 subfiles are respectively:
“/var/lib/ceph/osd/ceph-1/current/9.f7_head/rbd_data.e7626a6b8b4567.0000000000001”;
“/var/lib/ceph/osd/ceph-4/current/9.f7_head/rbd_data.e7626a6b8b4567.0000000000001”;
“/var/lib/ceph/osd/ceph-7/current/9.f7_head/rbd_data.e7626a6b8b4567.0000000000001”。
then the target subfile corresponding to the 1st position needs to be determined from these 3 subfiles before the subfiles are spliced.
In an alternative embodiment of the present application, selecting one subfile from at least two subfiles according to a preset policy to determine the subfile as a target subfile of the location includes:
acquiring the message digest algorithm MD5 value of each subfile;
if the MD5 values of all the subfiles are the same, determining any one of the subfiles as the target subfile of the position;
if the MD5 values of the subfiles are different, selecting the subfile whose modification time is the first or the last among the subfiles as the target subfile of the position.
Specifically, if an rbd corresponds to multiple subfiles at the same position, the MD5 values of these subfiles are obtained first. If the MD5 values are all the same, the subfiles can be considered identical and undamaged by the Ceph storage cluster fault, and any one of them can be selected as the target subfile of the position. If the MD5 values differ, some of the subfiles may be damaged, and the undamaged target subfile needs to be determined from them: specifically, if a deletion operation was being performed on the sub-file when the Ceph storage cluster failed, the sub-file with the first (earliest) modification time among the subfiles is determined as the target sub-file; if a new write operation was being performed on the sub-file when the Ceph storage cluster failed, the sub-file with the last (latest) modification time among the subfiles is determined as the target sub-file.
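A sketch of this preset strategy; the flag indicating whether a deletion was in flight when the fault occurred is an assumed input:

```python
# Sketch: choose the target subfile among the copies of one position
# using MD5 values, falling back to modification time when they differ.
import hashlib
import os

def md5_of(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def pick_target(copies, deletion_in_flight):
    """copies: storage paths of the copies at one rbd position."""
    if len({md5_of(p) for p in copies}) == 1:
        return copies[0]              # identical copies: any one will do
    by_mtime = sorted(copies, key=os.path.getmtime)
    # a delete was in flight -> earliest copy; a write -> latest copy
    return by_mtime[0] if deletion_in_flight else by_mtime[-1]
```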
In an alternative embodiment of the present application, after controlling the cloud platform to switch from the Ceph storage cluster to the backup Ceph storage cluster, the method may further include:
formatting the Ceph storage cluster to obtain a formatted Ceph storage cluster, and adding the storage nodes of the formatted Ceph storage cluster into the standby Ceph storage cluster to obtain a combined Ceph storage cluster;
and removing the storage nodes belonging to the standby Ceph storage cluster from the combined Ceph storage cluster.
Specifically, after the cloud platform is controlled to switch from the Ceph storage cluster to the standby Ceph storage cluster, the data of the virtual machines has been recovered and the cloud platform has been restored; further, the cloud platform can be switched back from the standby Ceph storage cluster to the original Ceph storage cluster. Specifically, the faulty Ceph storage cluster is first formatted and emptied to obtain a formatted Ceph storage cluster, and the storage nodes of the formatted Ceph storage cluster are then added into the standby Ceph storage cluster to obtain a combined Ceph storage cluster; this process can be understood as expanding the Ceph storage cluster of the cloud platform. Then, the storage nodes belonging to the standby Ceph storage cluster are removed from the combined Ceph storage cluster; during the removal, the data stored on those storage nodes is automatically migrated to the storage nodes of the formatted Ceph storage cluster, and this process can be understood as shrinking the Ceph storage cluster of the cloud platform. Through this expansion and contraction, the cloud platform is switched from the standby Ceph storage cluster back to the original Ceph storage cluster, completing the repair of the faulty Ceph storage cluster.
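A hedged sketch of the shrinking step, driving common Ceph OSD-removal commands from Python; the exact decommissioning sequence varies by Ceph release, so this is an assumption rather than the patented procedure:

```python
# Sketch: scale-in after the merge, removing the standby cluster's OSDs
# so only the (formatted) original nodes remain; the command sequence
# is a common Ceph decommissioning pattern, assumed here, and data is
# rebalanced automatically once an OSD is marked "out".
import subprocess

def run(*cmd):
    subprocess.check_call(list(cmd))

def remove_standby_osd(osd_id):
    run("ceph", "osd", "out", "osd.{}".format(osd_id))   # trigger migration
    # ... wait here until the cluster reports HEALTH_OK (not shown) ...
    run("ceph", "osd", "crush", "remove", "osd.{}".format(osd_id))
    run("ceph", "auth", "del", "osd.{}".format(osd_id))
    run("ceph", "osd", "rm", str(osd_id))                # drop the OSD entry
```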
In an alternative embodiment of the present application, in order to further improve the efficiency of recovering the cloud platform when its Ceph storage cluster fails, the Ceph storage cluster of the cloud platform may be divided into a plurality of small Ceph storage clusters. When a small Ceph storage cluster fails, only that small cluster is processed with the scheme described in the above embodiments to restore normal operation of the cloud platform; because the Ceph storage cluster is divided, data repair and switching can be limited to the faulty small Ceph storage cluster, which further improves the recovery efficiency of the cloud platform. Meanwhile, the standby Ceph storage cluster can be built as an integrated cabinet that contains the standby Ceph storage cluster and the aggregation storage nodes, so the cost is lower.
As shown in fig. 10, the Ceph storage cluster of a cloud platform is divided into two small Ceph storage clusters, Ceph storage cluster 1 and Ceph storage cluster 2. When Ceph storage cluster 1 fails, it is processed with the method described in the embodiments above to restore the cloud platform. Specifically, the target subfiles corresponding to each rbd are obtained from Ceph storage cluster 1; the local file corresponding to each rbd is spliced together from its target subfiles; each local file is then uploaded for storage to the standby Ceph storage cluster built from the integrated cabinet; and finally the cloud platform is switched from Ceph storage cluster 1 to the standby Ceph storage cluster, completing the recovery of the cloud platform.
Fig. 11 is a block diagram of a cloud platform recovery apparatus according to an embodiment of the present application. As shown in fig. 11, the apparatus 1100 may include: a virtual machine identification information acquisition module 1101, an rbd identification information acquisition module 1102, a subfile acquisition module 1103, a local file uploading module 1104 and a storage cluster switching module 1105, wherein:
the virtual machine identification information obtaining module 1101 is configured to obtain identification information of each virtual machine in an OpenStack management system of the cloud platform when one or more storage nodes in a Ceph storage cluster of the cloud platform fail;
the rbd identification information acquisition module 1102 is configured to determine identification information of each block device rbd corresponding to each virtual machine based on the identification information of the virtual machine;
the sub-file obtaining module 1103 is configured to obtain, based on the identification information of each rbd, each sub-file corresponding to the rbd, which is stored in the Ceph storage cluster in a distributed manner;
the local file uploading module 1104 is configured to splice each sub-file corresponding to each rbd to obtain a local file corresponding to the rbd, and upload the local file to the backup Ceph storage cluster;
the storage cluster switching module 1105 is configured to control the cloud platform to switch from the Ceph storage cluster to the backup Ceph storage cluster, so as to restore the cloud platform.
In an alternative embodiment of the present application, the rbd identification information obtaining module is specifically configured to:
acquiring names of rbds corresponding to each virtual machine based on the identification information of each virtual machine and the corresponding relation between the virtual machine and each data storage pool in the Ceph storage cluster;
the identification information of each rbd is acquired from the name of the rbd.
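As an illustration, under common OpenStack-on-Ceph naming conventions (Nova ephemeral disks stored as `vms/<instance_uuid>_disk`, Cinder volumes as `volumes/volume-<volume_uuid>`; both pool names are deployment-specific assumptions), the rbd's identification information can be read from the image header roughly like this:

```python
import json
import subprocess

def rbd_id_for_image(pool, image_name):
    """Return the rbd's internal id, i.e. the suffix of block_name_prefix.

    Requires the cluster to still serve header reads; when it cannot, the
    same id can be recovered from the rbd header object on the OSDs.
    """
    info = json.loads(subprocess.run(
        ["rbd", "info", f"{pool}/{image_name}", "--format", "json"],
        capture_output=True, check=True).stdout)
    # block_name_prefix looks like "rbd_data.5e156d8b4567".
    return info["block_name_prefix"].split(".")[-1]
```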
According to the scheme provided by the application, the identification information of each rbd is obtained from the identification information of the corresponding virtual machine; the corresponding undamaged subfiles are then obtained from the failed Ceph storage cluster based on each rbd's identification information; the local file corresponding to each rbd is assembled from its undamaged subfiles and uploaded to the standby Ceph storage cluster; and finally the cloud platform is restored by switching it to the standby Ceph storage cluster.
In an optional embodiment of the present application, a file name of each subfile includes identification information of a corresponding rbd, and the subfile obtaining module is specifically configured to:
Comparing the identification information of each rbd with the file names of all subfiles stored in the Ceph storage cluster, and determining the subfiles containing the identification information of the rbd in the file names as subfiles corresponding to the rbd;
all subfiles corresponding to each rbd are obtained.
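A minimal sketch of this filename matching, assuming OSDs with the filestore backend, whose on-disk object files embed the rbd id and the hex position offset in their names (the OSD root path below is the common default and an assumption here):

```python
import os
import re

OSD_ROOT = "/var/lib/ceph/osd"  # typical filestore data location (assumption)

def find_subfiles(rbd_id):
    """Collect (position_offset, path) pairs for one rbd's subfiles.

    Filestore escapes '_' as '\\u', so an object named rbd_data.<id>.<offset>
    is stored as a file whose name begins rbd\\udata.<id>.<offset>.
    """
    pat = re.compile(r"rbd(?:\\u|_)data\.%s\.([0-9a-f]{16})" % re.escape(rbd_id))
    hits = []
    for dirpath, _dirs, files in os.walk(OSD_ROOT):
        for name in files:
            m = pat.search(name)
            if m:
                hits.append((int(m.group(1), 16), os.path.join(dirpath, name)))
    return hits
```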
In an alternative embodiment of the present application, the subfile acquisition module is further configured to:
respectively comparing the identification information of each rbd with the rbd identification information contained in the file names of the subfiles stored on each object storage device OSD in the Ceph storage cluster, and determining a subfile whose file name contains the identification information of the rbd as a subfile corresponding to that rbd.
In an alternative embodiment of the present application, the apparatus further includes a storage path information acquisition module configured to:
after determining a subfile whose file name contains the identification information of the rbd as a subfile corresponding to the rbd, acquiring the storage path information of each subfile corresponding to each rbd, and storing the storage path information of each subfile and the identification information of the corresponding rbd into a preset database in one-to-one correspondence, wherein the storage path information comprises the host name of the OSD where the corresponding subfile is located and the storage directory information of the subfile within that OSD;
Correspondingly, the subfile acquisition module is specifically configured to:
acquiring storage path information of each corresponding subfile from a preset database based on the identification information of each rbd;
and acquiring all corresponding subfiles from the corresponding OSD based on the path information of each subfile.
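A sketch of such a preset database using SQLite (the schema and column names are illustrative only):

```python
import sqlite3

def build_path_index(db_path, records):
    """Persist (rbd_id, osd_host, osd_dir) rows so that later retrieval of a
    failed rbd's subfiles does not require rescanning every OSD."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS subfile_paths (
                       rbd_id   TEXT,
                       osd_host TEXT,
                       osd_dir  TEXT)""")
    con.executemany("INSERT INTO subfile_paths VALUES (?, ?, ?)", records)
    con.commit()
    con.close()

def paths_for_rbd(db_path, rbd_id):
    """Look up the OSD host and directory of every subfile of one rbd."""
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT osd_host, osd_dir FROM subfile_paths "
                       "WHERE rbd_id = ?", (rbd_id,)).fetchall()
    con.close()
    return rows
```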
In an optional embodiment of the present application, the file name of each sub-file includes a position offset of the sub-file in the corresponding rbd, where the position offset indicates the position of the corresponding sub-file in the corresponding rbd, and the local file uploading module is specifically configured to:
determining target subfiles of all positions of each rbd based on all subfiles corresponding to each rbd;
determining the splicing sequence of each target sub-file based on the position offset contained in the file name of each target sub-file corresponding to each rbd;
and splicing the target subfiles corresponding to each rbd according to the splicing sequence to obtain the local file corresponding to the rbd.
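A minimal splicing sketch, assuming Ceph's default 4 MiB rbd object size and target subfiles keyed by their position offsets:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # Ceph's default rbd object size (4 MiB)

def splice_local_file(out_path, target_subfiles):
    """Rebuild the rbd's local file by writing each target subfile at
    offset_index * OBJECT_SIZE; unwritten ranges remain sparse (zeros).

    target_subfiles: iterable of (offset_index, path), one per position.
    """
    with open(out_path, "wb") as out:
        for offset_index, path in sorted(target_subfiles):
            out.seek(offset_index * OBJECT_SIZE)
            with open(path, "rb") as src:
                out.write(src.read())
```

The rebuilt file can then be pushed to the standby cluster with, for example, `rbd import` pointed at the standby cluster's configuration.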
In an alternative embodiment of the present application, the local file upload module is further configured to:
for each position in rbd, if the position corresponds to one sub-file, determining the sub-file as a target sub-file of the position;
if the position corresponds to at least two subfiles, selecting one subfile from the at least two subfiles according to a preset strategy to determine the subfile as a target subfile of the position.
In an alternative embodiment of the present application, the local file upload module is further configured to:
acquiring the message-digest algorithm MD5 value of each subfile;
if the MD5 values of all the subfiles are the same, determining any one of the subfiles as the target subfile of the position;
if the MD5 values of the subfiles are different, selecting the subfile whose modification time is the earliest or the latest, as appropriate, and determining it as the target subfile of the position.
In an alternative embodiment of the present application, the apparatus further includes a capacity expansion module for:
after controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster, formatting the Ceph storage cluster to obtain a formatted Ceph storage cluster, and adding the storage nodes of the formatted Ceph storage cluster into the standby Ceph storage cluster to obtain a merged Ceph storage cluster;
and removing the storage nodes that originally belonged to the standby Ceph storage cluster from the merged Ceph storage cluster.
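Taken together, the modules above correspond to a flow along the following lines (an illustrative sketch reusing the helpers from the earlier sketches; the pool and image naming conventions remain assumptions):

```python
import subprocess
from itertools import groupby
from operator import itemgetter

def recover_rbds(vm_uuids, standby_conf):
    """Rebuild each VM's rbd from the failed cluster's OSDs and upload it
    to the standby cluster identified by the ceph.conf at standby_conf."""
    for vm_uuid in vm_uuids:
        rbd_id = rbd_id_for_image("vms", f"{vm_uuid}_disk")
        hits = sorted(find_subfiles(rbd_id))              # (offset, path) pairs
        targets = [(off, pick_target_subfile([p for _, p in grp], False))
                   for off, grp in groupby(hits, key=itemgetter(0))]
        local = f"/tmp/{rbd_id}.img"
        splice_local_file(local, targets)
        subprocess.run(["rbd", "--conf", standby_conf, "import",
                        local, f"vms/{vm_uuid}_disk"], check=True)
```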
In the embodiments of the present application, the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
Cloud technology is a general term for the network, information, integration, management-platform and application technologies built on the cloud computing business model; it allows resources to be pooled and used on demand, flexibly and conveniently. Cloud computing technology will become an important supporting technology: the background services of networked systems, such as video websites, image websites and portal sites, require large amounts of computing and storage resources. With the rapid development of the internet industry, every item may in the future carry its own identification mark, which must be transmitted to a background system for logical processing; data at different levels will be processed separately, and industry data of all kinds will require strong back-end system support, which can only be provided through cloud computing.
A database can be regarded as an electronic filing cabinet, a place for storing electronic files, in which users can add, query, update and delete data. A "database" is a collection of data that is stored together in a way that can be shared by multiple users, has as little redundancy as possible, and is independent of applications.
A database management system (DBMS) is computer software designed for managing databases, and generally provides basic functions such as storage, retrieval, security and backup. A DBMS can be classified according to the database model it supports, e.g., relational or XML (Extensible Markup Language); according to the type of computer it supports, e.g., server cluster or mobile phone; according to the query language it uses, e.g., SQL (Structured Query Language) or XQuery; or according to its performance emphasis, e.g., maximum scale or maximum running speed. Whichever classification is used, some DBMSs span categories, for example by supporting multiple query languages at the same time.
Based on the same principle, an embodiment of the present application further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the method provided in any optional embodiment of the present application can be implemented, namely:
When one or more storage nodes in a Ceph storage cluster of the cloud platform fail, acquiring identification information of each virtual machine in the cloud platform; based on the identification information of each virtual machine, determining the identification information of each block device rbd corresponding to the virtual machine; based on the identification information of each rbd, acquiring each sub-file corresponding to the rbd, which is stored in a Ceph storage cluster in a distributed manner; splicing all the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd, and uploading the local file to a standby Ceph storage cluster; and controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster so as to recover the cloud platform.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as shown in any of the embodiments of the present application.
It can be appreciated that the medium may store a computer program corresponding to the cloud platform recovery method.
Fig. 12 is a schematic structural diagram of an electronic device to which embodiments of the present application apply. As shown in fig. 12, the electronic device 1200 includes a processor 1201 and a memory 1203, the processor 1201 being coupled to the memory 1203, for example via a bus 1202. Further, the electronic device 1200 may also include a transceiver 1204, through which the electronic device 1200 can interact with other electronic devices. It should be noted that in practical applications there may be more than one transceiver 1204, and that the structure of the electronic device 1200 does not constitute a limitation on the embodiments of the present application.
The processor 1201 is applied to the embodiment of the present application, and may be used to implement the functions of the cloud platform recovery apparatus shown in fig. 11.
The processor 1201 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 1201 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 1202 may include a path for transferring information between the components. The bus 1202 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 12, but this does not mean that there is only one bus or only one type of bus.
The memory 1203 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 1203 is used to store application code for performing the present inventive arrangements and is controlled by the processor 1201 for execution. The processor 1201 is configured to execute application program codes stored in the memory 1203 to implement the actions of the cloud platform recovery apparatus provided in the embodiment shown in fig. 11.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions such that the computer device performs:
when one or more storage nodes in a Ceph storage cluster of the cloud platform fail, acquiring identification information of each virtual machine in the cloud platform; based on the identification information of each virtual machine, determining the identification information of each block device rbd corresponding to the virtual machine; based on the identification information of each rbd, acquiring each sub-file corresponding to the rbd, which is stored in a Ceph storage cluster in a distributed manner; splicing all the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd, and uploading the local file to a standby Ceph storage cluster; and controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster so as to recover the cloud platform.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations should and are intended to be comprehended within the scope of the present application.

Claims (12)

1. A cloud platform recovery method, characterized by comprising the following steps:
when one or more storage nodes in a Ceph storage cluster of the cloud platform fail, acquiring identification information of each virtual machine in the cloud platform;
Based on the identification information of each virtual machine, determining the identification information of each block device rbd corresponding to the virtual machine;
based on the identification information of each rbd, acquiring each sub-file corresponding to the rbd, which is stored in the Ceph storage cluster in a distributed manner;
splicing all the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd, and uploading the local file to a standby Ceph storage cluster;
and controlling the cloud platform to be switched from the Ceph storage cluster to the standby Ceph storage cluster so as to recover the cloud platform.
2. The method of claim 1, wherein the determining, based on the identification information of each virtual machine, the identification information of each block device rbd corresponding to the virtual machine includes:
acquiring names of rbds corresponding to each virtual machine based on the identification information of each virtual machine and the corresponding relation between the virtual machine and each data storage pool in the Ceph storage cluster;
the identification information of each rbd is acquired from the name of the rbd.
3. The method of claim 1, wherein a file name of each subfile includes identification information of a corresponding rbd, and the obtaining, based on the identification information of each rbd, each subfile corresponding to the rbd and stored in the Ceph storage cluster in a distributed manner includes:
Comparing the identification information of each rbd with the file names of all the subfiles stored in the Ceph storage cluster, and determining the subfiles with the file names containing the identification information of the rbd as subfiles corresponding to the rbd;
all subfiles corresponding to each rbd are obtained.
4. The method of claim 3, wherein comparing the identification information of each rbd with the file names of the subfiles stored in the Ceph storage cluster, and determining the subfiles of the file names that include the identification information of the rbd as subfiles corresponding to the rbd, comprises:
and respectively comparing the identification information of each rbd with the identification information of the rbd contained in each sub-file stored in the OSD of each object storage device in the Ceph storage cluster, and determining the sub-file containing the identification information of the rbd in the file name as the sub-file corresponding to the rbd.
5. The method of claim 3, wherein after determining the subfile having the identification information of the rbd included in the file name as the subfile corresponding to the rbd, the method further comprises:
acquiring storage path information of each sub-file corresponding to each rbd, and storing the storage path information of each sub-file and identification information of the corresponding rbd in a preset database in a one-to-one correspondence manner, wherein the storage path information comprises a host name of the object storage device OSD where the corresponding sub-file is located and storage directory information of the sub-file in the OSD;
The obtaining all subfiles corresponding to each rbd includes:
acquiring storage path information of each corresponding subfile from the preset database based on the identification information of each rbd;
and acquiring all corresponding subfiles from the corresponding OSD based on the path information of each subfile.
6. The method of claim 1, wherein the file name of each sub-file includes a position offset of the sub-file in the corresponding rbd, the position offset indicating a position of the corresponding sub-file in the corresponding rbd, and the splicing the sub-files corresponding to each rbd to obtain the local file corresponding to the rbd includes:
determining target subfiles of all positions of each rbd based on all subfiles corresponding to each rbd;
determining the splicing sequence of each target sub-file based on the position offset contained in the file name of each target sub-file corresponding to each rbd;
and splicing the target subfiles corresponding to each rbd according to the splicing sequence to obtain the local file corresponding to the rbd.
7. The method of claim 6, wherein determining the target subfiles for each rbd location based on the subfiles for each rbd comprises:
For each position in rbd, if the position corresponds to one sub-file, determining the sub-file as a target sub-file of the position;
if the position corresponds to at least two subfiles, selecting one subfile from the at least two subfiles according to a preset strategy to be determined as a target subfile of the position.
8. The method of claim 7, wherein selecting one subfile from the at least two subfiles according to a preset policy to determine the subfile as the target subfile of the location comprises:
acquiring the message-digest algorithm MD5 value of each subfile;
if the MD5 values of all the subfiles are the same, determining any one of the subfiles as the target subfile of the position;
if the MD5 values of the subfiles are different, selecting the subfile whose modification time is the earliest or the latest, as appropriate, and determining it as the target subfile of the position.
9. The method of claim 1, wherein after controlling the cloud platform to switch from the Ceph storage cluster to the backup Ceph storage cluster, the method further comprises:
formatting the Ceph storage cluster to obtain a formatted Ceph storage cluster, and adding the storage nodes of the formatted Ceph storage cluster into the standby Ceph storage cluster to obtain a merged Ceph storage cluster;
and removing the storage nodes belonging to the standby Ceph storage cluster from the merged Ceph storage cluster.
10. A cloud platform recovery device, comprising:
the virtual machine identification information acquisition module is used for acquiring the identification information of each virtual machine in the cloud platform when one or more storage nodes in a Ceph storage cluster of the cloud platform fail;
the identification information acquisition module of rbd is used for determining the identification information of each block device rbd corresponding to each virtual machine based on the identification information of each virtual machine;
the sub-file acquisition module is used for acquiring all sub-files corresponding to each rbd, which are stored in the Ceph storage cluster in a distributed mode, based on the identification information of each rbd;
the local file uploading module is used for splicing all the subfiles corresponding to each rbd to obtain a local file corresponding to the rbd, and uploading the local file to the standby Ceph storage cluster;
and the storage cluster switching module is used for controlling the cloud platform to switch from the Ceph storage cluster to the standby Ceph storage cluster so as to recover the cloud platform.
11. An electronic device comprising a memory and a processor;
The memory stores a computer program;
the processor for executing the computer program to implement the method of any one of claims 1 to 9.
12. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1 to 9.
CN202010591210.8A 2020-06-24 2020-06-24 Cloud platform recovery method and device, electronic equipment and computer readable storage medium Active CN111770158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010591210.8A CN111770158B (en) 2020-06-24 2020-06-24 Cloud platform recovery method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111770158A CN111770158A (en) 2020-10-13
CN111770158B true CN111770158B (en) 2023-09-19

Family

ID=72722069

Country Status (1)

Country Link
CN (1) CN111770158B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230136274A1 (en) * 2021-11-04 2023-05-04 Softiron Limited Ceph Media Failure and Remediation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070061088A (en) * 2005-12-08 2007-06-13 한국전자통신연구원 File management method in file system and metadata server for the same
CN106095527A (en) * 2016-06-07 2016-11-09 国云科技股份有限公司 A kind of storage pool implementation method being applicable to cloud platform virtual machine
CN106662983A (en) * 2015-12-31 2017-05-10 华为技术有限公司 Method, apparatus and system for data reconstruction in distributed storage system
CN107608826A (en) * 2017-09-19 2018-01-19 郑州云海信息技术有限公司 A kind of fault recovery method, device and the medium of the node of storage cluster
CN109324927A (en) * 2018-09-06 2019-02-12 郑州云海信息技术有限公司 A kind of virtual machine backup method and system based on distributed memory system
CN109729129A (en) * 2017-10-31 2019-05-07 华为技术有限公司 Configuration modification method, storage cluster and the computer system of storage cluster
CN111124755A (en) * 2019-12-06 2020-05-08 中国联合网络通信集团有限公司 Cluster node fault recovery method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7814364B2 (en) * 2006-08-31 2010-10-12 Dell Products, Lp On-demand provisioning of computer resources in physical/virtual cluster environments
US9176829B2 (en) * 2011-07-01 2015-11-03 Microsoft Technology Licensing, Llc Managing recovery virtual machines in clustered environment
US20190384678A1 (en) * 2018-06-14 2019-12-19 Nutanix, Inc. System and method for managing backup and restore of objects over cloud platforms
US10733064B2 (en) * 2018-07-23 2020-08-04 EMC IP Holding Company LLC Efficient restore of synthetic full backup based virtual machines that include user checkpoints
US11200124B2 (en) * 2018-12-06 2021-12-14 Commvault Systems, Inc. Assigning backup resources based on failover of partnered data storage servers in a data storage management system

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40030083

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant