CN111399774A

CN111399774A - Data processing method and device based on snapshot under distributed storage system

Info

Publication number: CN111399774A
Application number: CN202010159285.9A
Authority: CN
Inventors: 雷俊; 廖俊威; 王文斌; 刘名欣; 肖永玲; 张旭明; 王豪迈; 胥昕
Original assignee: Xsky Beijing Data Technology Corp ltd
Current assignee: Beijing Xingchen Tianhe Technology Co ltd
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2020-07-10
Anticipated expiration: 2040-03-09
Also published as: CN111399774B

Abstract

The invention discloses a snapshot-based data processing method and device in a distributed storage system. The method comprises the following steps: acquiring a first snapshot corresponding to a first block device in a first resource pool; cloning the first snapshot in a second resource pool to generate a second block device in the second resource pool; and, when a write request is received, creating a new object in the second block device and writing the data to be written in the write request directly into the new object. The invention solves the technical problem in the prior art of poor data read/write performance after a snapshot of a block device is cloned across resource pools.

Description

Data processing method and device based on snapshot under distributed storage system

Technical Field

The invention relates to the field of distributed storage, in particular to a data processing method and device based on snapshots in a distributed storage system.

Background

Currently, snapshots in the Ceph distributed storage system use copy-on-write (COW), and a block device can be cloned across resource pools. As shown in FIG. 1, a snapshot is taken of the block device of virtual machine 1 and then cloned across pools to obtain clone1 in Pool2. When data is later written to the clone, as shown in FIG. 2, for example 4K of data written to the clone volume (i.e., clone1), the object corresponding to the 4K data does not yet exist in Pool2. The data of the corresponding source object (generally 4M) must therefore be read from Pool1, a new object is created in Pool2, and the 4M of data read is merged with the 4K of service data and written into the newly created object.

The drawback of this method is that on each such write (one that lands in an object not yet present in the clone's resource pool), regardless of the write size, the complete 4M object must be read from the original resource pool where the snapshot resides and written into the resource pool where the clone resides before the new data can be written. This read-then-write sequence sharply reduces performance and rapidly increases the space occupancy of the clone resource pool, which in turn further degrades read/write performance.
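To make the cost concrete, the following Python sketch counts the bytes moved by the prior-art path for a single write that lands in an object not yet present in the clone pool. The 4 MB object size comes from the example above; the function name and the assumption that the IO fits inside one object are illustrative only.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB source object, as in the example above


def prior_art_clone_write(io_size: int, object_size: int = OBJECT_SIZE) -> dict:
    """Bytes moved by the prior-art copy-on-write clone write.

    Assumes (hypothetically) that the IO falls inside a single object that
    does not yet exist in the clone's pool, i.e. the first write to it.
    """
    if not 0 < io_size <= object_size:
        raise ValueError("io_size must be in (0, object_size]")
    read_from_source_pool = object_size  # whole source object read from Pool1
    written_to_clone_pool = object_size  # source data merged with the IO, written as a full object
    return {
        "read_bytes": read_from_source_pool,
        "write_bytes": written_to_clone_pool,
        "amplification": (read_from_source_pool + written_to_clone_pool) / io_size,
    }


cost = prior_art_clone_write(4 * 1024)
print(cost)  # {'read_bytes': 4194304, 'write_bytes': 4194304, 'amplification': 2048.0}
```

For a 4K write, roughly 8 MB of device traffic is generated, which is the amplification the paragraph above describes.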

No effective solution has yet been proposed for the problem in the prior art of poor data read/write performance after a snapshot of a block device is cloned across resource pools.

Disclosure of Invention

Embodiments of the present invention provide a snapshot-based data processing method and device in a distributed storage system, so as to at least solve the technical problem in the prior art of poor data read/write performance after a snapshot of a block device is cloned across resource pools.

According to an aspect of the embodiments of the present invention, a snapshot-based data processing method in a distributed storage system is provided, including: acquiring a first snapshot corresponding to a first block device in a first resource pool; cloning the first snapshot in a second resource pool to generate a second block device in the second resource pool; and, when a write request is received, creating a new object in the second block device and writing the data to be written in the write request directly into the new object.

Further, the second block device is mounted to a virtual machine for use, and when a client of the virtual machine receives a write operation, the write request is sent to the second resource pool.

Further, the first resource pool is composed of all-flash media disks, and the second resource pool is composed of hard disk drive (HDD) media.

Further, the method further comprises: receiving a read request; determining whether the data to be read indicated by the read request is in the second block device in the second resource pool; and acquiring the data to be read from the second block device or the first block device according to the determination result.

Further, the read request includes a start position and a length of the data to be read in the block device, and determining whether the data to be read indicated by the read request is in the second block device in the second resource pool includes: taking the start position and the length as a first window and performing sliding matching against a second window corresponding to the second block device; and determining that the overlapping part of the first window and the second window is data to be read that exists in the second block device, and that the non-overlapping part is data to be read that does not exist in the second block device.

Further, acquiring the data to be read from the second block device or the first block device according to the determination result includes: acquiring, from the second block device, the data to be read that exists in the second block device; and acquiring, from the first block device, the data to be read that does not exist in the second block device.

According to another aspect of the embodiments of the present invention, a snapshot-based data processing apparatus in a distributed storage system is provided, including: an acquisition module, configured to acquire a first snapshot corresponding to a first block device in a first resource pool; a generating module, configured to clone the first snapshot in a second resource pool and generate a second block device in the second resource pool; and a writing module, configured to create a new object in the second block device when a write request is received, and to write the data to be written in the write request directly into the new object.

According to another aspect of the embodiments of the present invention, a storage medium is provided. The storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the snapshot-based data processing method in the distributed storage system.

According to another aspect of the embodiments of the present invention, a processor is provided. The processor is configured to run a program, and the program, when running, executes the snapshot-based data processing method in the distributed storage system.

In the embodiments of the present invention, a first snapshot corresponding to a first block device in a first resource pool is acquired; the first snapshot is cloned in a second resource pool to generate a second block device in the second resource pool; and, when a write request is received, a new object is created in the second block device and the data to be written in the write request is written directly into the new object. With this write method for cross-resource-pool clones of block devices, a write IO does not require copying data from the original block device to the cloned block device. The data is written directly into a new object, and the data on the cross-pool clone volume serves as incremental data on top of the snapshot in the source resource pool. Repeated reads and writes are therefore unnecessary, and the clone volume does not occupy a large amount of space: if 1% of the data changes, the clone volume occupies only 1% of the size of the original volume. This solves the technical problem in the prior art of poor data read/write performance after a snapshot of a block device is cloned across resource pools.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of a block device being cloned across resource pools according to the prior art;

FIG. 2 is a schematic diagram of writing data after a clone operation according to the prior art;

FIG. 3 is a flow chart of a snapshot-based data processing method under a distributed storage system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of writing data in a clone volume, according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a read operation according to an embodiment of the present invention; and

FIG. 6 is a schematic diagram of a snapshot-based data processing apparatus in a distributed storage system according to an embodiment of the present invention.

In the following, in order to facilitate understanding of the embodiments, terms appearing in the embodiments are explained:

Resource pool (Pool): also called a storage pool; a logical storage space composed of a set of physical hard disks or solid-state disks.

Object: the basic unit of data storage in a resource pool; its size is typically set to 4 MB.

Block device (Volume): a block is a sequence of bytes (e.g., a 512-byte data block). Block-based storage interfaces are the most common way of storing data on rotating media such as hard disks, CDs, floppy disks, and even traditional 9-track tapes. A block device can be created in a resource pool and is composed of a set of objects.
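How a block device is composed of objects can be sketched as a mapping from device byte offsets to objects. This Python sketch assumes a simple linear layout in which object i holds device bytes [i × 4 MB, (i + 1) × 4 MB); that layout and the helper name object_extents are illustrative assumptions, not taken from the text.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # typical object size from the definition above


def object_extents(offset: int, length: int, object_size: int = OBJECT_SIZE):
    """Split a byte range of a block device into per-object pieces.

    Yields (object_index, offset_in_object, piece_length) tuples under the
    assumed linear layout: object i covers [i * object_size, (i + 1) * object_size).
    """
    end = offset + length
    while offset < end:
        index, in_obj = divmod(offset, object_size)
        piece = min(object_size - in_obj, end - offset)  # stop at the object boundary
        yield index, in_obj, piece
        offset += piece


# An 8 KB request straddling the boundary between objects 0 and 1:
print(list(object_extents(OBJECT_SIZE - 4096, 8192)))
# [(0, 4190208, 4096), (1, 0, 4096)]
```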

COW: Copy-on-Write. When a block of a protected entity is about to be overwritten, its original content is first copied elsewhere (i.e., to a location specified by the snapshot system), and the new data is then written at the original location (i.e., the storage location of the protected entity).

ROW: Redirect-on-Write. A ROW snapshot uses pointers to reference all blocks of the protected entity. When a block is about to be overwritten, the storage system redirects the pointer for that block to a new location and writes the new data there.
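The two definitions differ in where the old data ends up. The following minimal Python model shows COW copying the old block aside before overwriting in place, and ROW leaving the old block untouched while redirecting the live pointer. It assumes a single snapshot and uses in-memory lists as stand-ins for disk blocks. The class names are hypothetical.

```python
class CowVolume:
    """Copy-on-Write: the old block is copied aside, then overwritten in place."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data at fixed locations
        self.snap_area = {}         # block number -> content preserved for the snapshot

    def write(self, n, data):
        if n not in self.snap_area:             # first overwrite since the snapshot
            self.snap_area[n] = self.blocks[n]  # copy the old block elsewhere
        self.blocks[n] = data                   # then overwrite at the original location

    def read(self, n):
        return self.blocks[n]

    def read_snapshot(self, n):
        return self.snap_area.get(n, self.blocks[n])


class RowVolume:
    """Redirect-on-Write: new data goes to a new location; only the pointer moves."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # physical locations
        self.live_map = list(range(len(blocks)))  # logical block -> physical location
        self.snap_map = list(self.live_map)       # snapshot keeps the original pointers

    def write(self, n, data):
        self.store.append(data)                 # write new data to a new location
        self.live_map[n] = len(self.store) - 1  # redirect the live pointer

    def read(self, n):
        return self.store[self.live_map[n]]

    def read_snapshot(self, n):
        return self.store[self.snap_map[n]]
```

In the COW model an overwrite costs a copy plus a write; in the ROW model it costs a single write, at the price of a pointer indirection on reads.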

Snapshot (Snap): a read-only copy of a block device at a specific point in time, containing all data of the block device as of that point in time (the moment the copy began). A snapshot may be a duplicate or a replica of the data it represents.

Clone: a writable block device created from a read-only snapshot of a block device. Reads and writes on the clone do not affect the source snapshot or the source block device.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can operate in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

According to an embodiment of the present invention, an embodiment of a snapshot-based data processing method in a distributed storage system is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

Fig. 3 is a flowchart of a snapshot-based data processing method in a distributed storage system according to an embodiment of the present invention, and as shown in fig. 3, the method includes the following steps:

Step S302, acquiring a first snapshot corresponding to a first block device in a first resource pool.

Specifically, the first block device may be a block device used by a virtual machine. This step serves to create a virtual machine image: a snapshot operation is performed on the block device used by the virtual machine, and a read-only snapshot of the block device is persisted.

Step S304, cloning the first snapshot in a second resource pool, and generating a second block device in the second resource pool.

When a new virtual machine is created, a clone operation is performed on the first snapshot to obtain a new readable and writable block device, which can be mounted to the new virtual machine for use.
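Steps S302 and S304 can be modelled in memory to show that the clone starts empty and only references its parent snapshot. In this Python sketch, a pool is a dict of images and an image is a dict of object index to bytes. Copying the dict into a read-only MappingProxyType is a stand-in for whatever mechanism the storage system really uses to persist a snapshot. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class Pool:
    name: str
    images: dict = field(default_factory=dict)  # image name -> {object_index: bytes}


@dataclass
class Snapshot:
    pool: str                  # name of the first resource pool
    image: str                 # the first block device
    name: str
    objects: MappingProxyType  # read-only view of the objects at snapshot time


@dataclass
class CloneImage:
    pool: Pool        # the second resource pool
    name: str
    parent: Snapshot  # reference to the first snapshot; no data is copied


def take_snapshot(pool: Pool, image: str, snap_name: str) -> Snapshot:
    # Step S302: capture a point-in-time, read-only copy of the image's objects.
    frozen = MappingProxyType(dict(pool.images[image]))
    return Snapshot(pool.name, image, snap_name, frozen)


def clone(snapshot: Snapshot, target: Pool, name: str) -> CloneImage:
    # Step S304: register the clone in the second pool with no objects of its own.
    target.images[name] = {}
    return CloneImage(target, name, snapshot)


pool1 = Pool("pool1", {"vm1": {0: b"A" * 8, 1: b"B" * 8}})
pool2 = Pool("pool2")
snap = take_snapshot(pool1, "vm1", "base")
clone1 = clone(snap, pool2, "clone1")
```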

Step S306, when a write request is received, creating a new object in the second block device, and directly writing the data to be written in the write request in the new object.

In this step, when the user writes data through the newly created virtual machine, the data to be written is written directly into the new object created in the second block device.

FIG. 4 is a schematic diagram of writing data into a clone volume according to an embodiment of the present invention. In an optional embodiment, as shown in FIG. 4, a clone operation is performed on a snapshot of the original volume in POOL1 to obtain Clone volume 1 and Clone volume 2 in POOL2 (Clone volume 1 and Clone volume 2 can serve different virtual machines). A write request is received through the client of the newly created virtual machine, the data to be written is written directly (direct write) into a newly created object of Clone volume 1, and the snapshot operation is performed on Clone volume 1. For example, for a 4K write IO, an empty new object is created directly in Clone1 of the new resource pool POOL2, and only the 4K of data specified by the IO command is written.
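The direct-write path of step S306 can be sketched as follows. This is a minimal Python model, not the patented implementation. It assumes objects are sparse, so an object consumes space only for bytes actually written, and it represents each object as a {offset_in_object: byte} map for simplicity. The class name CloneVolume and its methods are hypothetical.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # object size used in the example above


class CloneVolume:
    """Second block device: stores only data written after the clone was created."""

    def __init__(self, object_size: int = OBJECT_SIZE):
        self.object_size = object_size
        self.objects = {}  # object_index -> {offset_in_object: byte value}

    def write(self, offset: int, data: bytes) -> None:
        index, in_obj = divmod(offset, self.object_size)
        if in_obj + len(data) > self.object_size:
            raise ValueError("this sketch only handles writes inside one object")
        # Create an empty object on first write, then store only the IO's bytes.
        # Nothing is read from the parent snapshot in the first resource pool.
        obj = self.objects.setdefault(index, {})
        obj.update(zip(range(in_obj, in_obj + len(data)), data))

    def stored_bytes(self) -> int:
        return sum(len(obj) for obj in self.objects.values())


clone1 = CloneVolume()
clone1.write(8192, b"\x01" * 4096)  # a 4K write IO
print(clone1.stored_bytes())  # 4096
```

In contrast with the prior-art path, the 4K write here touches only 4K in the clone pool and issues no read against the source pool.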

As can be seen from the above, in the foregoing embodiments of the present application, a first snapshot corresponding to a first block device in a first resource pool is acquired; the first snapshot is cloned in a second resource pool to generate a second block device in the second resource pool; and, when a write request is received, a new object is created in the second block device and the data to be written in the write request is written directly into the new object. With this write method for cross-resource-pool clones of block devices, a write IO does not require copying data from the original block device to the cloned block device. The data is written directly into a new object, so the data on the cross-pool clone volume serves as incremental data on top of the snapshot in the source resource pool. Multiple reads and writes are unnecessary, and the clone volume does not occupy a large amount of space: if 1% of the data changes, the clone volume occupies only 1% of the size of the original volume. This solves the technical problem in the prior art of poor data read/write performance after a snapshot of a block device is cloned across resource pools.
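The space claim can be checked with a small calculation. This Python sketch compares the clone-pool space consumed by the direct-write method with that of the prior-art full-object copy, given a set of write offsets. It assumes aligned 4K writes that do not straddle objects. The write pattern below spreads about 1% of changes evenly over a 1 GiB volume, which is close to a worst case for the prior art; clustered writes would narrow the gap. The function name and parameters are illustrative.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB objects
IO_SIZE = 4 * 1024             # 4K write IOs


def clone_space(write_offsets, io_size=IO_SIZE, object_size=OBJECT_SIZE):
    """Return (direct_write_bytes, prior_art_bytes) used in the clone's pool.

    Assumes every write is io_size bytes, io_size-aligned and inside one
    object; rewriting the same extent is counted once.
    """
    extents = {off // io_size for off in write_offsets}
    objects = {off // object_size for off in write_offsets}
    direct = len(extents) * io_size         # only the written extents are stored
    prior_art = len(objects) * object_size  # each touched object is copied in full
    return direct, prior_art


VOLUME = 1024 ** 3                         # 1 GiB source volume
offsets = range(0, VOLUME, 100 * IO_SIZE)  # one 4K write per 100 blocks, about 1%
direct, prior = clone_space(offsets)
print(direct / VOLUME, prior / VOLUME)
```

Here the direct-write clone uses about 1% of the volume size, while the prior-art clone grows to the full volume because every object is touched at least once.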

As an optional embodiment, the second block device is mounted to a virtual machine for use, and when a client of the virtual machine receives a write operation, the write request is sent to the second resource pool.

In this step, when a new virtual machine is created, the snapshot clone of the original virtual machine's block device is mounted to the new virtual machine for use. When the user performs a write operation from the virtual machine's client, the client sends a write request to the second resource pool.

As an optional embodiment, the first resource pool is composed of all-flash media disks and the second resource pool is composed of hard disk drive (HDD) media.

In application scenarios such as cloud platforms, virtualization, and cloud desktops, this scheme runs the block device used by a virtual machine image in a resource pool composed of all-flash media disks (all-flash disks cost many times more per unit of capacity than HDDs), and runs the new virtual machine block devices created from that image in a resource pool composed of HDD (Hard Disk Drive) media. This preserves the read/write speed of the virtual machines while greatly reducing the storage cost of the cloud platform.

As an optional embodiment, the method further includes: receiving a read request; determining whether the data to be read indicated by the read request is in the second block device in the second resource pool; and acquiring the data to be read from the second block device or the first block device according to the determination result.

Specifically, a read request is likewise issued when the user performs a read operation on the newly created virtual machine. In this case, it must first be determined whether the data to be read exists in the second block device, and the result determines how the data is read.

In an optional implementation, still referring to FIG. 4, a read IO first reads the object on the clone volume in pool2. If that object satisfies the full size of the read IO, the read completes and the result is returned. If the object does not exist or cannot satisfy the whole read IO, the corresponding object on the snapshot in the source resource pool pool1 is read as well, and the two results are merged and returned.
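The per-object read with fallback can be sketched as below. This minimal Python sketch reuses a sparse {offset_in_object: byte} map for the clone object, or None when the object does not exist in pool2. It assumes that bytes missing from the parent object read as zeros, as with sparse objects. The function name and the returned flag (whether pool1 was consulted) are illustrative.

```python
def read_object(clone_obj, parent_obj, offset, length):
    """Read `length` bytes at `offset` within one object.

    clone_obj:  {offset_in_object: byte value} written to the clone in pool2,
                or None if the object does not exist there.
    parent_obj: bytes of the same object in the source snapshot in pool1.
    Returns (data, read_parent), where read_parent reports whether pool1 was read.
    """
    wanted = range(offset, offset + length)
    if clone_obj is not None and all(i in clone_obj for i in wanted):
        # The clone object satisfies the whole read: complete without touching pool1.
        return bytes(clone_obj[i] for i in wanted), False
    # Otherwise read the snapshot object (missing bytes as zeros, an assumption)
    # and overlay whatever the clone object already holds.
    merged = bytearray(parent_obj[offset:offset + length].ljust(length, b"\0"))
    if clone_obj is not None:
        for i in wanted:
            if i in clone_obj:
                merged[i - offset] = clone_obj[i]
    return bytes(merged), True
```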

As an optional embodiment, the read request includes a start position and a length of the data to be read in the block device, and determining whether the data to be read indicated by the read request is in the second block device in the second resource pool includes: taking the start position and the length as a first window and performing sliding matching against a second window corresponding to the second block device; and determining that the overlapping part of the first window and the second window is data to be read that exists in the second block device, and that the non-overlapping part is data to be read that does not exist in the second block device.

In this scheme, the read request includes the start position and length of the data to be read within the block device, so a first window can be derived from the read request. The data stored in the second block device likewise has a start position and length, and therefore forms a second window. Sliding matching of the two windows determines whether the data to be read exists in the second block device. If the two windows do not overlap, none of the data to be read is in the second block device. If they partially overlap, the data represented by the part of the first window that overlaps the second window is in the second block device, and the data represented by the non-overlapping part is not. If the first window is completely covered by the second window, all of the data to be read is in the second block device.
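The window matching described above reduces to an interval intersection. This Python sketch follows the text in treating the second block device's data as a single (start, length) window. A device holding several written extents would repeat the match for each extent, sliding the read window across them. The function name and tuple layout are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (start, length)


def match_windows(read: Window, stored: Window) -> Tuple[Optional[Window], List[Window]]:
    """Split the read (first) window against the stored (second) window.

    Returns (overlap, missing): overlap is the part served by the second
    block device (None if the windows do not overlap); missing lists the
    parts to fetch from the first block device.
    """
    r_start, r_len = read
    s_start, s_len = stored
    r_end, s_end = r_start + r_len, s_start + s_len
    lo, hi = max(r_start, s_start), min(r_end, s_end)
    if lo >= hi:                     # no overlap: everything comes from the first device
        return None, [read]
    missing = []
    if r_start < lo:                 # part of the read before the stored window
        missing.append((r_start, lo - r_start))
    if hi < r_end:                   # part of the read after the stored window
        missing.append((hi, r_end - hi))
    return (lo, hi - lo), missing    # missing is empty when fully covered
```

The three outcomes named in the text map to: overlap None (no overlap), overlap plus a non-empty missing list (partial overlap), and an empty missing list (full coverage).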

As an optional embodiment, acquiring the data to be read from the second block device or the first block device according to the determination result includes: acquiring, from the second block device, the data to be read that exists in the second block device; and acquiring, from the first block device, the data to be read that does not exist in the second block device.

FIG. 5 is a schematic diagram of data reading according to an embodiment of the present invention. Referring to FIG. 5, the client of a virtual machine can read data through two paths. If all of the data to be read is in Clone volume 1 of POOL2 (i.e., the second block device obtained by cloning), it is read from Clone volume 1 via hash read (1). If all of it is in the original volume (i.e., the first block device), it is read from the original volume via hash read (2). If part of the data is in Clone volume 1 and part is in the original volume, the parts are read via (1) and (2) respectively, merged in the merge extension buffer (3), and returned to the virtual machine's client.
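The merge step (3) can be sketched as assembling the pieces returned by paths (1) and (2) into one reply buffer ordered by offset. This Python sketch assumes the pieces exactly tile the requested range, and it rejects gaps or overlaps rather than guessing how the real buffer handles them. The function name is hypothetical.

```python
def merge_extent_buffers(start, length, pieces):
    """Assemble the reply for a read of [start, start + length).

    pieces: iterable of (offset, data) read via path (1) from the clone
    volume or path (2) from the original volume; together they must cover
    the requested range exactly once.
    """
    buf = bytearray(length)
    covered = 0
    for offset, data in sorted(pieces):  # order pieces by device offset
        if offset != start + covered:
            raise ValueError(f"gap or overlap at offset {offset}")
        buf[covered:covered + len(data)] = data
        covered += len(data)
    if covered != length:
        raise ValueError("pieces do not cover the requested range")
    return bytes(buf)


# Original-volume piece first, clone piece second, delivered out of order:
print(merge_extent_buffers(100, 6, [(103, b"CCC"), (100, b"OOO")]))  # b'OOOCCC'
```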

Example 2

According to an embodiment of the present invention, a snapshot-based data processing apparatus in a distributed storage system is provided. FIG. 6 is a schematic diagram of the apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus includes:

An obtaining module 60, configured to acquire a first snapshot corresponding to a first block device in the first resource pool.

A generating module 62, configured to clone the first snapshot in a second resource pool, and generate a second block device in the second resource pool.

A writing module 64, configured to create a new object in the second block device when a write request is received, and to write the data to be written in the write request directly into the new object.

As an optional embodiment, the apparatus further comprises: a receiving module, configured to receive a read request; a determining module, configured to determine whether the data to be read indicated by the read request is in the second block device in the second resource pool; and a reading module, configured to acquire the data to be read from the second block device or the first block device according to the determination result.

As an optional embodiment, the read request includes a start position and a length of the data to be read in the block device, and the determining module includes: a matching submodule, configured to take the start position and the length as a first window and perform sliding matching against a second window corresponding to the second block device; and a determining submodule, configured to determine that the overlapping part of the first window and the second window is data to be read that exists in the second block device, and that the non-overlapping part is data to be read that does not exist in the second block device.

As an alternative embodiment, the reading module comprises: the first reading submodule is used for acquiring data to be read existing in the second block device from the second block device; and the second reading submodule is used for acquiring data to be read which does not exist in the second block device from the first block device.

Example 3

According to an embodiment of the present invention, a storage medium is provided, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the snapshot-based data processing method in the distributed storage system described in embodiment 1.

Example 4

According to an embodiment of the present invention, a processor is provided, where the processor is configured to run a program, where the program executes the snapshot-based data processing method in the distributed storage system according to embodiment 1 during running.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A data processing method based on snapshot under a distributed storage system is characterized by comprising the following steps:

acquiring a first snapshot corresponding to a first block device in a first resource pool;

cloning the first snapshot in a second resource pool to generate a second block device in the second resource pool;

when a write request is received, creating a new object in the second block device and writing the data to be written in the write request directly into the new object.

2. The method of claim 1, wherein the second block device is configured to mount for use by a virtual machine, and wherein the write request is sent to the second resource pool when a write operation is received by a client of the virtual machine.

3. The method of claim 1, wherein the first resource pool is comprised of full flash media disks and the second resource pool is comprised of hard drive disk media.

4. The method according to any one of claims 1 to 3, further comprising:

receiving a read request;

determining whether the data to be read indicated by the read request is in the second block device in the second resource pool;

and acquiring the data to be read from the second block device or the first block device according to the determination result.

5. The method of claim 4, wherein the read request includes a start position and a length of data to be read in a block device, and determining whether the data to be read indicated by the read request is in a second block device in the second resource pool comprises:

taking the start position and the length as a first window and performing sliding matching against a second window corresponding to the second block device;

and determining that the overlapping part of the first window and the second window is data to be read that exists in the second block device, and that the non-overlapping part is data to be read that does not exist in the second block device.

6. The method according to claim 5, wherein obtaining the read data from the second block device or the first block device according to the determination result comprises:

acquiring data to be read existing in the second block device from the second block device; and

acquiring, from the first block device, the data to be read that does not exist in the second block device.

7. A data processing device based on snapshot under a distributed storage system is characterized by comprising:

the acquisition module is used for acquiring a first snapshot corresponding to a first block device in a first resource pool;

a generating module, configured to clone the first snapshot in a second resource pool, and generate a second block device in the second resource pool;

and a writing module, configured to create a new object in the second block device when a write request is received, and to write the data to be written in the write request directly into the new object.

8. The apparatus of claim 7, wherein the second block device is configured to mount for use by a virtual machine, and wherein when a client of the virtual machine receives a write operation, the write request is sent to the second resource pool.

9. A storage medium, characterized in that the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the snapshot-based data processing method in the distributed storage system according to any one of claims 1 to 6.

10. A processor, configured to execute a program, where the program executes the snapshot based data processing method under the distributed storage system according to any one of claims 1 to 6 during the execution.