CN113254394B

CN113254394B - Snapshot processing method, system, equipment and storage medium

Info

Publication number: CN113254394B
Application number: CN202110529051.3A
Authority: CN
Inventors: 赵鑫
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-05-14
Filing date: 2021-05-14
Publication date: 2023-10-31
Anticipated expiration: 2041-05-14
Also published as: CN113254394A

Abstract

The application discloses a snapshot processing method, system, device and storage medium. The snapshot processing method comprises: constructing a file name list from the names of all files in a data engine, storing the file name list in a metadata file under a snapshot directory, and at the same time recording the current log number; and, when an operation instruction sent by a requester is detected, performing a file deletion, file copy or file recovery operation of a distributed storage system that uses a consistency algorithm, based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log.

Description

Snapshot processing method, system, equipment and storage medium

Technical Field

The application belongs to the technical field of storage and relates to a snapshot processing method, system, device and storage medium.

Background

A distributed storage system that uses a consistency (consensus) algorithm generally divides the cluster into replication groups, and each replication group uses the algorithm to keep the data held by its members identical. Specifically, all client I/O is packaged into log entries that are distributed within the replication group; each member appends the received entries to its own log file and, at the same time, converts them back into concrete operations that are applied to its own data engine (that is, the persistently stored data). Consequently, if one or more members (fewer than a majority of the group) drop out, then once they rejoin, only the missing log entries need to be sent to them; by replaying these entries they catch up with the group's authoritative log, and the data become consistent again.

However, the log cannot grow indefinitely; otherwise the data would effectively be stored twice, once in the log and once in the data engine. The consistency algorithm therefore uses snapshots to solidify the data engine at a given moment: once the data engine has taken a snapshot, the log entries before the point recorded by the snapshot can be deleted to free space. When a replication group member later needs to restore data, the system first determines whether the member can be restored from the log alone; if the gap is too large and exceeds the retained authoritative log, the snapshot must be copied to that node in full. By default the leader of the replication group is authoritative, so the leader transmits the data to the member, after which the member completes the restoration by replaying the log. A snapshot method is therefore needed for data engines on storage systems that use this kind of consistency algorithm.
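The catch-up decision described above can be sketched as follows. This is a minimal illustration, assuming the leader keeps only the log entries from a hypothetical `first_index` to `last_index` after deleting the entries covered by its latest snapshot; the names `RetainedLog` and `catch_up_plan` are illustrative and are not part of the application.

```python
from dataclasses import dataclass


@dataclass
class RetainedLog:
    """Hypothetical summary of the leader's log after compaction: only entries
    first_index..last_index are still on disk; older entries were deleted
    once a snapshot covered them."""
    first_index: int
    last_index: int


def catch_up_plan(log: RetainedLog, member_last_index: int) -> str:
    """Choose how a lagging replication-group member catches up."""
    if member_last_index >= log.last_index:
        return "up-to-date"
    # The member is missing entries member_last_index + 1 .. log.last_index.
    if member_last_index + 1 >= log.first_index:
        return "send-log"       # every missing entry is still retained
    # The gap reaches past the oldest retained entry: full snapshot copy, then log.
    return "send-snapshot"
```

With a retained range of 100 to 200, a member at index 150 can be served from the log alone, while a member at index 50 needs the snapshot first.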

Traditional incremental snapshot algorithms, such as copy-on-write or redirect-on-write, are complex to implement and struggle to meet high-performance and small-footprint requirements at the same time. In addition, snapshotting the data in a data engine requires extra disk operations that consume considerable bandwidth of the underlying file system and therefore take bandwidth away from the service, which is very unfavorable in performance-sensitive service scenarios.

Disclosure of Invention

The application aims to overcome the shortcomings of the prior art by providing a snapshot processing method, system, device and storage medium that eliminate additional disk operations, consume little bandwidth of the underlying file system, and avoid taking up service bandwidth.

In order to achieve the above purpose, the application adopts the following technical scheme:

In a first aspect, the present application provides a snapshot processing method, comprising:

constructing a file name list by using names of all files in a data engine, and storing the file name list in a metadata file under a snapshot directory;

when a data copy operation instruction sent by a requester is detected, sending the metadata file to be copied to the requester, wherein the metadata file to be copied stores the file names of all files to be copied in the data copy operation instruction; and when a file pull request sent by the requester is detected, sending the requested file to be copied to the requester according to the file pull request;

when a file recovery operation instruction sent by a requester is detected, downloading copies of the files from the data engine according to the recovery operation instruction, creating a snapshot directory, and constructing a metadata file to be recovered that stores the names of the downloaded files; storing the metadata file to be recovered under the newly created snapshot directory; and then performing log replay on the data engine.

The method further comprises: when a deletion operation instruction sent by the requester is detected, creating in the snapshot directory a blank file with the same name as the file to be deleted in the deletion operation instruction, and deleting the file to be deleted from the data engine.

When a data copy operation instruction sent by the requester is detected, the metadata file to be copied under the snapshot directory is sent to the requester according to the data copy operation instruction.

When a file pull request sent by the requester is detected, the data engine is searched for a file according to the file name in the file pull request, and the file found is sent to the requester.

When a file pull request sent by the requester is detected, the data engine is searched for the file named in the request. If a file with that name is found in the data engine, it is sent to the requester; if not, the snapshot directory is searched for the blank file with the same name, and that blank file is sent to the requester.

When a data recovery operation request sent by the requester is detected, the files that the request asks to recover are downloaded from the data engine as copies, and the name of the download directory is modified during the download;

a snapshot directory is created, a metadata file to be recovered is constructed, and the metadata file to be recovered is stored under the newly created snapshot directory;

the directory of the data engine is deleted, the download directory is renamed to become the directory of the data engine, and log replay is then performed on the data engine to complete the data recovery.

In a second aspect, the present application provides a snapshot processing system comprising:

the creation module is used for constructing a file name list by utilizing the names of all files in the data engine, and storing the file name list in a metadata file under a snapshot directory;

the file copy operation module, used for sending the metadata file to be copied to the requester when a data copy operation instruction sent by the requester is detected, wherein the metadata file to be copied stores the file names of all files to be copied in the data copy operation instruction, and for sending the requested file to be copied to the requester according to a file pull request when such a request sent by the requester is detected;

and the file recovery operation module, used for, when a file recovery operation instruction sent by the requester is detected, downloading copies of the files from the data engine according to the recovery operation instruction, creating a snapshot directory, constructing a metadata file to be recovered that stores the names of the downloaded files, storing the metadata file to be recovered under the created snapshot directory, and then performing log replay on the data engine.

The system further comprises:

the file deletion operation module, used for, when a deletion operation instruction sent by the requester is detected, creating in the snapshot directory a blank file with the same name as the file to be deleted in the deletion operation instruction, and deleting the file to be deleted from the data engine.

In a third aspect, the present application provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the snapshot processing method when executing the computer program.

In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the snapshot processing method.

The application has the following beneficial effects:

In operation, the snapshot processing method, system, device and storage medium construct a file name list from the names of all files in the data engine and store it in a metadata file under the snapshot directory. A file copy or file recovery operation can then be completed simply by transmitting the snapshot files and replaying the log, so no additional disk operations are needed: the bandwidth used on the underlying file system during a snapshot is minimized, and the impact on service bandwidth is reduced as far as possible.

Furthermore, when a deletion is performed, an empty file with the same name is saved in the snapshot directory as the file is deleted, which reduces the time spent downloading data during data recovery.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:

FIG. 1 is a schematic diagram of the structure of the present application;

FIG. 2 is a schematic diagram of a snapshot processing system in accordance with the present application;

FIG. 3 is a flow chart of the present application when a recovery operation is performed.

Reference numerals: 1, creation module; 2, operation module; 21, file deletion operation module; 22, file copy operation module; 23, file recovery operation module; 221, first obtaining module; 222, second obtaining module; 223, pushing module; 231, third obtaining module; 232, storage module; 233, log replay module.

Detailed Description

The application will be described in detail below with reference to the drawings and in connection with embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with each other.

The following detailed description is exemplary and is intended to provide further details of the application. Unless defined otherwise, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the application.

In the prior art, snapshots serve for online data backup and recovery: when a storage device suffers an application fault or file corruption, data can be quickly restored to the state of an available point in time. Snapshots also give storage users another data access channel: while the original data is being processed online, users can access the snapshot data or use it for testing and other work. Snapshot is therefore an indispensable function for every storage system, whether high-, mid- or low-end, that serves online systems.

The principle of the application is as follows. Because the content of the data engine remains unchanged after it replays the log, a storage system using a consistency algorithm does not need to strictly solidify the data content at each point in time. For example, when a snapshot is taken of the files in a replication group, the snapshot files are identical to the original files at that moment; as the service proceeds, the engine files are updated and come to differ from the snapshot files. When a member later needs to restore data, it is sufficient to send the snapshot files to that member and then perform log replay.
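The principle above relies on log replay being idempotent. The sketch below illustrates it under a stated assumption: every log entry is an overwrite-style operation (set a key to a value, or delete a key), modeled as a hypothetical `(index, key, value)` tuple applied to an in-memory dict rather than to real engine files. Shipping either the exact state at the snapshot's log number or the newer live state, and then replaying the log from that number, gives the same result.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (index, key, value); value None means "delete key".
Entry = Tuple[int, str, Optional[str]]


def apply(state: Dict[str, str], entry: Entry) -> None:
    _, key, value = entry
    if value is None:
        state.pop(key, None)  # deleting an already-absent key changes nothing
    else:
        state[key] = value    # overwriting with the same value changes nothing


def replay(state: Dict[str, str], log: List[Entry], after: int, upto: int) -> Dict[str, str]:
    """Apply, in order, every entry whose index lies in (after, upto]."""
    for entry in log:
        if after < entry[0] <= upto:
            apply(state, entry)
    return state


log: List[Entry] = [(1, "a", "1"), (2, "b", "2"), (3, "a", "3"), (4, "b", None), (5, "c", "5")]
snapshot_index = 2  # log number recorded together with the snapshot

frozen = replay({}, log, 0, snapshot_index)  # exact state at the snapshot
live = replay({}, log, 0, 4)                 # engine kept changing after the snapshot

# Ship either version to a recovering member, then replay from the snapshot's log number.
via_frozen = replay(dict(frozen), log, snapshot_index, 5)
via_live = replay(dict(live), log, snapshot_index, 5)
print(via_frozen == via_live, via_live)  # prints: True {'a': '3', 'c': '5'}
```

The convergence depends on that assumption: non-idempotent operations such as appends or increments would be applied twice and would need extra handling.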

Example 1

Referring to fig. 1, the snapshot processing method of the present application includes:

constructing a file name list by using names of all files in a data engine, storing the file name list in a metadata file under a snapshot directory, and simultaneously recording a log number at the current moment;
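A minimal sketch of this snapshot step, assuming the data engine's files live in a single local directory and that the metadata file is a JSON document named `meta` (the format, the file name and the `create_snapshot` helper are assumptions for illustration). Only file names and the log number are written; no file data is copied.

```python
import json
from pathlib import Path

META_NAME = "meta"  # hypothetical metadata file name


def create_snapshot(engine_dir: Path, snapshot_dir: Path, current_log_index: int) -> Path:
    """Record the names of all engine files plus the current log number.

    No file contents are read or copied; the snapshot is just this list.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    names = sorted(p.name for p in engine_dir.iterdir() if p.is_file())
    meta = {"log_index": current_log_index, "files": names}
    meta_path = snapshot_dir / META_NAME
    meta_path.write_text(json.dumps(meta))
    return meta_path
```

Because the cost is a directory listing plus one small write, taking the snapshot does not compete noticeably with service I/O, which is the effect the application claims.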

A file deletion operation of the distributed storage system using a consistency algorithm is performed based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log. Specifically, when a deletion operation instruction sent by a requester is detected, a blank file with the same name as the file to be deleted in the deletion operation instruction is created in the snapshot directory, and the file to be deleted is deleted from the data engine.

For example, if the file to be deleted is named file, an empty file named file is created in the snapshot directory, and the file named file is then deleted from the data engine.

It should be noted that saving a blank file with the same name when a file is deleted helps reduce the time spent downloading data during data recovery.
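The deletion step can be sketched as below, under the same single-directory assumption; `delete_file` is a hypothetical helper. The blank placeholder is written before the engine file is removed, following the order given in the text.

```python
from pathlib import Path


def delete_file(engine_dir: Path, snapshot_dir: Path, name: str) -> None:
    """Delete `name` from the data engine, leaving a same-name blank file in
    the snapshot directory so a later pull for that name can still be answered."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    (snapshot_dir / name).write_bytes(b"")       # blank placeholder, zero bytes
    (engine_dir / name).unlink(missing_ok=True)  # remove the real file (Python 3.8+)
```

The placeholder costs one empty file per deletion, whereas the deleted file's content is never transferred again during recovery.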

Example 2

A file copy operation of the distributed storage system using a consistency algorithm is performed based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log, as follows:

when a data copy operation instruction sent by a requester is detected, the metadata file to be copied under the snapshot directory is sent to the requester according to the data copy operation instruction;

when a file pull request sent by the requester is detected, the data engine is searched for the file named in the request; if a file with that name is found in the data engine, it is sent to the requester; otherwise, the blank file with the same name is looked up under the snapshot directory and sent to the requester.
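A sketch of the leader-side pull handling, assuming the same directory layout as before; `handle_pull` is hypothetical. The file-name check and the `FileNotFoundError` raised when neither copy exists are assumptions added for the sketch; the text does not say what happens in that case.

```python
from pathlib import Path


def handle_pull(engine_dir: Path, snapshot_dir: Path, name: str) -> bytes:
    """Answer one file pull request on the leader side."""
    if Path(name).name != name:  # reject names like "a/../b" (assumed safety check)
        raise ValueError(f"invalid file name: {name!r}")
    live = engine_dir / name
    if live.is_file():
        return live.read_bytes()         # found in the data engine: send it
    placeholder = snapshot_dir / name
    if placeholder.is_file():
        return placeholder.read_bytes()  # deleted since the snapshot: send the blank file
    raise FileNotFoundError(name)        # case not covered by the text; assumed error path
```

Reading the whole file into memory keeps the sketch short; a real transfer would stream the file in chunks.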

The requester, in turn, sends a file pull request for each file name listed in the received metadata file to be copied, one after another, so that the files are pulled in sequence.

For example, suppose the requester needs the files file1, file2 and file3. It sends a data copy operation instruction to the leader, and the leader sends it the metadata file to be copied, which stores the file names file1, file2 and file3. After receiving the metadata file, the requester sends pull requests for file1, file2 and file3 to the leader in turn; on receiving each request, the leader looks up file1, file2 and file3 in the data engine and sends them to the requester in turn.

It should be noted that when the corresponding file is not found in the data engine, a blank file with the same name is sent to the requester. Because its content is empty, it does not cause subsequent operations to go wrong, and it informs the requester that the file was not found in the data engine.
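The requester's side of the copy can be sketched as follows. The transport is abstracted into two hypothetical callables, `fetch_meta` (returns the names stored in the metadata file to be copied) and `pull` (returns the content of one file), so the example runs without any network code.

```python
from pathlib import Path
from typing import Callable, List


def copy_from_leader(fetch_meta: Callable[[], List[str]],
                     pull: Callable[[str], bytes],
                     dest_dir: Path) -> List[str]:
    """Requester side of a copy: get the file-name list, then pull each file in turn."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    names = fetch_meta()  # reply to the data copy operation instruction
    for name in names:    # one pull request per listed name, in order
        (dest_dir / name).write_bytes(pull(name))
    return names
```

In a quick test, `pull` can be a lambda over an in-memory dict that returns `b""` for a missing name, which mimics the blank file the leader sends for a deleted file.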

Example 3

A file recovery operation of the distributed storage system using a consistency algorithm is performed based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log, as follows:

when a data recovery operation request sent by a requester is obtained, the files that the request asks to recover are downloaded from the data engine as copies, and the name of the download directory is modified during the download;

For example, referring to Fig. 3, a data recovery operation instruction sent by the requester is obtained, asking to recover file1, file2 and file3. The files file1, file2 and file3 are downloaded from the data engine, and a snapshot directory is created at the same time; a meta file storing the list of file names file1, file2 and file3 is saved under the newly created snapshot directory. The directory of the data engine is then deleted, the download directory is renamed to become the directory of the data engine, and log replay is performed on the data engine, thereby writing the files into the data engine.
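A minimal sketch of the recovery sequence, assuming local directories, a download directory named by appending `.download` to the engine directory, a snapshot directory located outside the engine directory, and a JSON metadata file named `meta`. The names `recover`, `pull` and `replay_log` are hypothetical; `replay_log` stands in for the log replay performed by the consistency algorithm.

```python
import json
import shutil
from pathlib import Path
from typing import Callable, List

META_NAME = "meta"  # hypothetical metadata file name


def recover(names: List[str], pull: Callable[[str], bytes], engine_dir: Path,
            snapshot_dir: Path, replay_log: Callable[[Path], None]) -> None:
    # 1. Download copies of the requested files into a separately named directory.
    download_dir = engine_dir.with_name(engine_dir.name + ".download")
    if download_dir.exists():
        shutil.rmtree(download_dir)  # discard leftovers of an interrupted attempt
    download_dir.mkdir(parents=True)
    for name in names:
        (download_dir / name).write_bytes(pull(name))
    # 2. Create the snapshot directory and the metadata file listing the downloaded names.
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    (snapshot_dir / META_NAME).write_text(json.dumps({"files": names}))
    # 3. Delete the old engine directory and rename the download directory into its place.
    if engine_dir.exists():
        shutil.rmtree(engine_dir)
    download_dir.rename(engine_dir)
    # 4. Replay the log on the data engine to finish the recovery.
    replay_log(engine_dir)
```

Renaming the fully downloaded directory into place, rather than writing into the live engine directory, means the old engine directory is only removed once all requested files have been fetched.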

It should be noted that the present application is a solution specifically designed for storage systems that use a distributed consistency algorithm. Compared with traditional snapshot solutions, it is simple and reliable to implement, and it essentially eliminates both the consumption of underlying disk resources and the disturbance to the service when the consistency algorithm takes a snapshot of the data engine.

Example 4

Referring to fig. 2, the snapshot processing system of the present application includes:

the creation module 1 is used for constructing a file name list by utilizing the names of all files in the data engine, and storing the file name list in a metadata file under a snapshot directory;

and the operation module 2, used for performing a file deletion, file copy or file recovery operation of the distributed storage system using a consistency algorithm when an operation instruction sent by the requester is detected, based on the principle that the content of the data engine remains unchanged before and after the data engine replays the log.

The operation module 2 includes:

the file deletion operation module 21 is configured to, when a deletion operation instruction sent by the requester is detected, create in the snapshot directory a blank file with the same name as the file to be deleted in the deletion operation instruction, and delete the file to be deleted from the data engine;

the file replication operation module 22 is configured to send a metadata file to be replicated to the requester when detecting a data replication operation instruction sent by the requester, where file names of all files to be replicated in the data replication operation instruction are stored in the metadata file to be replicated; when a file pulling request sent by a requester is detected, sending the pulled file to be copied to the requester according to the file pulling request;

and the file recovery operation module 23 is configured to, when detecting a file recovery operation instruction sent by the requester, perform downloading copying of the file from the data engine according to the data recovery operation instruction sent by the requester, then create a snapshot directory, and construct a metadata file to be recovered, where a name of the file obtained by downloading copying is stored in the metadata file to be recovered, store the metadata file to be recovered under the created snapshot directory, and then perform log playback on the data engine.

The file copy operation module 22 includes:

a first obtaining module 221, configured to detect a data replication operation instruction sent by a requester, and send, under a snapshot directory, a metadata file to be replicated to the requester according to the data replication operation instruction;

a second obtaining module 222, configured to obtain a file pulling request sent by a requester;

the pushing module 223 is configured to search the data engine for the file named in the file pull request and, if a file with that name is found in the data engine, send it to the requester; if no file with that name is found in the data engine, search the snapshot directory for the blank file with the same name and send that blank file to the requester.

The file recovery operation module 23 includes:

a third obtaining module 231, configured to detect a data recovery operation request sent by a requester, download a file requested to be recovered by the data recovery operation request from a data engine, copy the file, and modify a name of a download directory in a file downloading process;

the storage module 232 is configured to create a snapshot directory, construct a metadata file to be restored, and store the metadata file to be restored under the newly created snapshot directory;

the log replay module 233 is configured to delete the directory of the data engine, modify the downloaded directory to be the directory of the data engine, and perform log replay on the data engine to complete recovery of the data.

Example 5

A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the snapshot processing method when executing the computer program. The memory may include internal memory, such as high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device. The processor, network interface and memory are interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like; the bus may be divided into an address bus, a data bus, a control bus, and so on. The memory stores programs, which may include program code comprising computer operation instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.

Example 6

A computer readable storage medium stores a computer program which, when executed by a processor, implements the steps of the snapshot processing method. The computer readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory may include random access memory (RAM) and/or cache memory (cache), among others. The non-volatile memory may include read-only memory (ROM), a hard disk, flash memory, an optical disk, a magnetic disk, and the like.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present application and do not limit them. Although the present application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent replacements may be made to the specific embodiments of the application without departing from the spirit and scope of the application, and such modifications are intended to be covered by the claims.

Claims

1. A snapshot processing method, comprising:

when a file recovery operation instruction sent by a requester is detected, downloading and copying the file from a data engine according to the data recovery operation instruction sent by the requester, creating a snapshot directory, and constructing a metadata file to be recovered, wherein the name of the file obtained by downloading and copying is stored in the metadata file to be recovered, the metadata file to be recovered is stored under the created snapshot directory, and then the log replay is carried out on the data engine;

when a file pull request sent by the requester is detected, searching the data engine for the file named in the file pull request; if a file with that name is found in the data engine, sending it to the requester; if no file with that name is found in the data engine, searching the snapshot directory for the blank file with the same name, and then sending that blank file to the requester.

2. The snapshot processing method according to claim 1, further comprising: when a deletion operation instruction sent by the requester is detected, creating in the snapshot directory a blank file with the same name as the file to be deleted in the deletion operation instruction, and deleting the file to be deleted from the data engine.

3. The snapshot processing method according to claim 1, wherein when a data copy operation instruction sent by a requester is detected, a metadata file to be copied is sent to the requester under a snapshot directory according to the data copy operation instruction.

4. The snapshot processing method according to claim 1, wherein when a pull file request sent by a requester is detected, a file is searched from a data engine according to a file name in the pull file request, and the searched file is sent to the requester.

5. A snapshot processing system, comprising:

the creation module (1) is used for constructing a file name list by utilizing the names of all files in the data engine, and storing the file name list in a metadata file under a snapshot directory;

the file copying operation module (22) is used for sending metadata files to be copied to the requesting party when detecting a data copying operation instruction sent by the requesting party, wherein the metadata files to be copied store file names of all files to be copied in the data copying operation instruction; when a file pulling request sent by a requester is detected, sending the pulled file to be copied to the requester according to the file pulling request;

the file recovery operation module (23) is used for, when a file recovery operation instruction sent by the requester is detected, downloading copies of the files from the data engine according to the recovery operation instruction, then creating a snapshot directory, constructing a metadata file to be recovered that stores the names of the downloaded files, storing the metadata file to be recovered under the created snapshot directory, and then performing log replay on the data engine.

6. The snapshot processing system of claim 5, further comprising:

and the file deletion operation module (21), used for, when a deletion operation instruction sent by the requester is detected, creating in the snapshot directory a blank file with the same name as the file to be deleted in the deletion operation instruction, and then deleting the file to be deleted from the data engine.

7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the snapshot processing method according to any one of claims 1 to 4 when the computer program is executed.

8. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the snapshot processing method according to any one of claims 1 to 4.