CN113704349A - Data synchronization method and device and node equipment

Info

Publication number: CN113704349A
Application number: CN202110787718.XA
Authority: CN
Inventors: 刘浩
Original assignee: New H3C Technologies Co Ltd Chengdu Branch
Current assignee: New H3C Technologies Co Ltd Chengdu Branch
Priority date: 2021-07-13
Filing date: 2021-07-13
Publication date: 2021-11-26

Abstract

The specification provides a data synchronization method, a data synchronization device and node equipment, and relates to the technical field of communication. A data synchronization method is applied to node equipment and comprises the following steps: receiving a write-in request sent by a client; determining a target disk from the local disks according to locally maintained metadata and a write-in request; writing data into a target disk, updating metadata, and sending write feedback to a client; and if the recorded time reaches the preset time, synchronizing the data stored in the target disk and the corresponding metadata to the backup node equipment. By the method, the writing efficiency of the file system can be improved.

Description

Data synchronization method and device and node equipment

Technical Field

The present specification relates to the field of communications technologies, and in particular, to a data synchronization method, an apparatus, and a node device.

Background

With the development of technologies such as high-definition video, image processing, and video monitoring, the volume of data that users need to store keeps growing, and users' requirements on data read and write performance are rising accordingly.

In some scenarios, a distributed file system is adopted. When data is written, the file system selects a plurality of nodes from the deployed nodes and writes the data to each of them, so as to ensure the reliability of the data. Only after all the selected nodes have completed the backup is a completion notice returned to the client that initiated the data writing. In this process, if the initiating client supports only single-threaded transmission, it must wait until the data has been written into all the selected nodes before the next data writing can be performed. Therefore, when the amount of data is large, the client waits for a long time and more write requests are blocked, which reduces the efficiency of the file system.

Disclosure of Invention

In order to overcome the problems in the related art, the present specification provides a data synchronization method, an apparatus, and a node device.

The application provides a method for synchronizing data, which is applied to node equipment and comprises the following steps:

receiving a write-in request sent by a client;

determining a target disk from the local disks according to locally maintained metadata and a write-in request;

writing data into a target disk, updating metadata, and sending write feedback to a client;

and if the recorded time reaches the preset time, synchronizing the data stored in the target disk and the corresponding metadata to the backup node equipment.

Further, determining a target disk from the local disk according to the locally maintained metadata and the write request, including:

searching the metadata maintained by the node device itself according to the file name carried in the write-in request;

if the metadata corresponding to the file name is found, determining a disk which is recorded in the metadata and corresponds to the file name as a target disk;

and if the metadata corresponding to the file name is not found, performing hash operation according to the file name, and determining the target disk from the local disk based on the calculation result.

Optionally, the number of target disks is at least two.

Optionally, after writing data to the target disk and sending the write feedback to the client, the method further includes:

if the occupancy rate of the target disk is greater than the preset occupancy rate, balancing the deployed disks;

the recorded metadata is updated.

In combination with the second aspect of the embodiments of the present specification, the present application provides a data synchronization apparatus, applied to a node device, including:

the receiving unit is used for receiving a writing request sent by a client;

the selecting unit is used for determining a target disk from the local disks according to the locally maintained metadata and the write-in request;

the sending unit is used for writing data into the target disk, updating the metadata and sending write feedback to the client;

and the synchronization unit is used for synchronizing the data stored in the target disk and the corresponding metadata to the backup node equipment if the recorded time reaches the preset time.

Further, the selection unit includes:

the searching module is used for searching the metadata maintained by the node device itself according to the file name carried in the writing request;

and the determining module is used for determining the disk which is recorded in the metadata and corresponds to the file name as the target disk if the metadata corresponding to the file name is found, and for performing hash operation according to the file name and determining the target disk from the local disks based on the calculation result if the metadata corresponding to the file name is not found.

Optionally, the number of target disks is at least two.

Optionally, the apparatus further includes:

the balancing unit is used for balancing the deployed disks if the occupancy rate of the target disk is greater than the preset occupancy rate;

an updating unit for updating the recorded metadata.

In connection with a third aspect of embodiments herein, there is provided a node device comprising a processor and a machine-readable storage medium;

the machine-readable storage medium stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to implement any of the above method steps.

In connection with a fourth aspect of embodiments herein, there is provided a machine-readable storage medium having stored thereon machine-executable instructions which, when executed by a processor, implement a method as in any one of the above.

The technical scheme provided by the implementation mode of the specification can have the following beneficial effects:

In the embodiments of the present specification, a redundancy policy is set: after a node device receives a write request sent by a client, it selects a target disk from the disks deployed on the node device, writes the data to the target disk, and sends write feedback directly to the client once the data has been written. Synchronization with the backup node device in the file system is then performed asynchronously by the node device. Because the client does not wait for synchronization among multiple node devices, the writing efficiency of the file system can be improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.

FIG. 1 is a flow chart of a method of data synchronization to which the present application relates;

FIG. 2 is a networking diagram of a file system according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a data synchronization apparatus according to the present application;

FIG. 4 is a schematic structural diagram of a node device according to the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification.

The application provides a method for synchronizing data, which is applied to node equipment, and as shown in FIG. 1, the method includes:

S100, receiving a write request sent by a client.

The networking of the distributed file system, as shown in FIG. 2, includes several node devices and switching devices, and the client is communicatively connected to the distributed file system through a network or a direct connection. Specifically, the distributed file system 1 is described by taking three node devices 2 as an example, namely node devices 20, 21, and 22. Multiple switching devices may be disposed between the node devices 2 to implement different functions; for example, the distributed file system may include a management switch 3, an interconnection switch 4, and a network switch 5. The management switch 3 may be connected to a server to manage and control the node devices 2 in the distributed file system 1; the interconnection switch 4 is connected to the different node devices 2 to implement data interaction between them; and the network switch 5 is connected to the client 6 to implement operations such as writing and reading of data. The node device 2 may be a server provided with a plurality of disks 7, a server including a plurality of disks 7, or a specially configured storage device, and this scheme is not limited thereto.

When the client 6 finishes collecting data to be written into the distributed file system, a write request is generated, and the write request contains information such as a file name of a file to be written, data length and the like.

S101, determining a target disk from the local disks according to locally maintained metadata and a write-in request.

Each node device 2 maintains metadata of the data stored in it. The metadata is description information of the data and can record the physical location information of the data (the physical address where the data is stored, that is, a specific location on a disk).

After receiving the write request sent by client 6, network switch 5 may distribute the write request to one of the plurality of node devices 2, such as to node device 20.

After receiving the write request, the node device 20 analyzes the write request, and based on the write request and the metadata maintained by itself, the node device 20 may determine a target disk to which data carried by the write request needs to be written.

Specifically, step S101, determining the target disk from the local disk according to the locally maintained metadata and the write request may include:

S101A, according to the file name carried in the writing request, searching the metadata maintained by itself.

After parsing out the file name carried in the write request, the node device 20 searches the metadata maintained in the node device 20 according to the file name. If the metadata corresponding to the file name is found, the method proceeds to step S101B; if the metadata corresponding to the file name is not found, the method proceeds to step S101C.

S101B, if the metadata corresponding to the file name is found, determining the disk corresponding to the file name recorded in the metadata as a target disk.

S101C, if the metadata corresponding to the file name is not found, performing hash operation according to the file name, and determining the target disk from the local disks based on the calculation result.

If the metadata maintained by the node device 20 includes the file name, the node device 20 has already stored data corresponding to the file name, and the current request modifies, adds to, or deletes that data. In this case, the disk storing the data is determined from the entry recorded in the metadata for the file name, and the determined disk is referred to as the target disk. The data is then written into this target disk, which already stores the data corresponding to the file name carried by the write request.

If the metadata maintained by the node device 20 does not include the file name, the node device 20 has not previously stored data corresponding to the file name. In this case, the node device 20 may perform a hash operation on the file name using a hash algorithm. The hash operation maps the file name to a value within a certain range, so that a specific disk, such as the disk 70, may be selected from the disks 7 deployed in the node device 20 based on the value.
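The lookup-then-hash decision of steps S101A to S101C can be sketched as follows. This is a minimal illustration only: the function name `select_target_disk`, the dictionary used as metadata, the disk labels, and the choice of MD5 reduced modulo the number of disks are all assumptions made for the example, since the specification does not fix a particular hash algorithm or data layout.

```python
import hashlib


def select_target_disk(file_name, metadata, disks):
    """Return the target disk for file_name (hypothetical helper).

    metadata: dict mapping a file name to the disk that already stores it.
    disks:    list of local disk identifiers.
    """
    # S101B: the file name is already recorded, so reuse the recorded disk.
    if file_name in metadata:
        return metadata[file_name]
    # S101C: hash the file name and reduce it to an index into the local disks.
    # MD5 is an assumed choice; it only needs to be stable across calls.
    digest = hashlib.md5(file_name.encode("utf-8")).digest()
    index = int.from_bytes(digest, "big") % len(disks)
    return disks[index]


disks = ["disk70", "disk71", "disk72"]
metadata = {"a.mp4": "disk71"}
known = select_target_disk("a.mp4", metadata, disks)  # metadata hit
fresh = select_target_disk("b.mp4", metadata, disks)  # hash fallback
```

A stable hash is used rather than Python's built-in `hash()`, because the built-in string hash is randomized per process and would map the same file name to different disks after a restart.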

In addition, it should be noted that, in order to further improve the reliability of data within one node device 2, a plurality of disks 7 may be determined as target disks, and the data is written into each of the plurality of target disks. For example, the disks 70 and 71 may be determined as target disks. The number of determined target disks may be two or more and may be set based on actual requirements, which is not limited herein. The following description takes the determination of the two disks 70 and 71 as target disks as an example. When the target disks are selected, one disk may be determined according to the hash operation, and the second disk may then be derived from the operation result of the hash operation, for example by subtracting the operation result from a base value equal to the number of disks included in the node device 20, or by shifting the operation result by a fixed deviation. Alternatively, the hash operation may be configured to produce two operation results directly. This embodiment is not limited in this respect.

That is, if the corresponding metadata is stored, the metadata maintained in the node device 20 may point to the physical locations on the two disks used for storage; if the corresponding metadata is not stored, the two disks used for data storage may be determined through the hash operation.
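One possible reading of the fixed-deviation approach to choosing two target disks is sketched below. The function name `select_two_disks`, the default offset of 1, the wrap-around by modulo, and the use of MD5 are assumptions for illustration; the specification leaves the exact derivation of the second disk open.

```python
import hashlib


def select_two_disks(file_name, disk_count, offset=1):
    """Return two distinct disk indices derived from file_name (hypothetical)."""
    if disk_count < 2:
        raise ValueError("at least two disks are required")
    if offset % disk_count == 0:
        raise ValueError("offset must not be a multiple of disk_count")
    # First index: hash of the file name reduced to the number of disks.
    digest = hashlib.md5(file_name.encode("utf-8")).digest()
    first = int.from_bytes(digest, "big") % disk_count
    # Second index: first index shifted by a fixed offset, wrapping around,
    # so it always differs from the first index.
    second = (first + offset) % disk_count
    return first, second


first, second = select_two_disks("b.mp4", disk_count=3)
```

Rejecting offsets that are multiples of the disk count guarantees that the two copies never land on the same disk, which is the point of keeping two copies within one node device.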

S102, writing data into the target disk, updating metadata, and sending write feedback to the client.

After the node device 20 determines the target disks (i.e., the disks 70 and 71), the data carried in the write request may be written into the determined target disks. Specifically, free data blocks may be allocated in each target disk and the data written into them, where the data blocks correspond to physical locations in the disks.

After the data is written to the target disk, the node device 20 needs to update (including adding, modifying, or deleting) the metadata and send write feedback to the client to inform the client that the data write is complete. Based on the network setting, the write request and the write feedback may be messages based on TCP/IP (Transmission Control Protocol/Internet Protocol), or messages of other protocols, which may be set according to actual requirements and is not limited herein.

In addition, if the writing to the disk 7 of the node device 20 fails, for example due to insufficient disk space, the write feedback of the node device 20 is used to notify the client of the write failure.

After the node device 20 sends the feedback to the client 6, it can process subsequent write requests sent by the client 6 without waiting for synchronization. This reduces the blocking that arises when feedback is sent to the client 6 only after synchronization among multiple node devices 2, and improves the write efficiency of the file system.
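The write path of step S102 can be sketched as follows, assuming an in-memory model in which each disk is a dictionary with a byte capacity and a file table. The function name `handle_write`, the request and feedback dictionaries, and the capacity check are hypothetical; a real node device would allocate data blocks on physical disks and send the feedback over the network.

```python
def handle_write(request, disks, metadata, targets):
    """Write request data to every target disk and return client feedback.

    request:  {"file_name": str, "data": bytes}
    disks:    {disk_name: {"capacity": int, "files": {file_name: bytes}}}
    metadata: {file_name: [disk_name, ...]}
    targets:  disk names chosen in step S101
    """
    name = request["file_name"]
    data = request["data"]
    # Check free space on every target first, so that a failure leaves
    # both the disks and the metadata untouched.
    for target in targets:
        disk = disks[target]
        used = sum(len(v) for k, v in disk["files"].items() if k != name)
        if used + len(data) > disk["capacity"]:
            return {"file_name": name, "ok": False,
                    "reason": "insufficient disk space"}
    for target in targets:
        disks[target]["files"][name] = data
    metadata[name] = list(targets)
    # Feedback is returned immediately; synchronization to the backup node
    # device is a separate, later flow and is not awaited here.
    return {"file_name": name, "ok": True}


disks = {
    "disk70": {"capacity": 10, "files": {}},
    "disk71": {"capacity": 10, "files": {}},
}
metadata = {}
feedback = handle_write({"file_name": "a", "data": b"1234"},
                        disks, metadata, ["disk70", "disk71"])
```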

S103, if the recorded time reaches the preset time, synchronizing the data stored in the target disk and the corresponding metadata to the backup node equipment.

In order to ensure the data reliability of the node devices 2, data synchronization between the node devices 2 is still necessary. Since the node device 20 has already sent write feedback to the client 6 in step S102 to inform the client 6 that data writing has been completed, the subsequent synchronization may be regarded as a flow independent of the data writing, that is, an asynchronous synchronization process. In the node device 20, a preset time may be configured. The preset time may be a fixed time period, such as 1 hour, or may be determined in combination with the system time of the node device 20, such as 00:00 each day.
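The two ways of configuring the preset time, a fixed period or a time of day taken from the system clock, can be sketched as a small helper that computes when the next synchronization is due. The function name `next_sync_time` and its parameters `interval` and `daily_at_hour` are assumptions for illustration.

```python
from datetime import datetime, timedelta


def next_sync_time(last_sync, interval=None, daily_at_hour=None):
    """Return the moment at which the next synchronization is due."""
    if interval is not None:
        # Fixed period, for example every hour after the last run.
        return last_sync + interval
    if daily_at_hour is not None:
        # Time of day, for example 00:00 each day: take that hour on the
        # same date, and move to the next day if it is not in the future.
        candidate = last_sync.replace(hour=daily_at_hour, minute=0,
                                      second=0, microsecond=0)
        if candidate <= last_sync:
            candidate += timedelta(days=1)
        return candidate
    raise ValueError("either interval or daily_at_hour must be given")


last = datetime(2021, 7, 13, 10, 30)
hourly = next_sync_time(last, interval=timedelta(hours=1))
midnight = next_sync_time(last, daily_at_hour=0)
```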

After the node device 20 starts timing or starts recording the system time, the synchronization process is started if the preset time is reached. The term "synchronization" as used herein refers to synchronizing data stored in a disk of one node device to another node device, for example, data in a disk 7 of a node device 20 may be synchronized to a node device 21, and at this time, the node device 21 is a backup node device.

It should be noted that, in the distributed file system 1, a policy for selecting a backup node device may be configured in advance. For example, based on the configuration of the distributed file system 1, the node device with the lowest disk occupancy rate in the multiple node devices 2 may be selected as the backup node device, and in the distributed file system 1, each node device 2 may interact and record the disk status in each node device 2 through the management switch 3, where the disk occupancy rate of the node device 2 may be included; and a certain node device or a plurality of node devices can be directly designated as backup node devices by workers, and the specific mode can be set according to actual networking and requirements, which is not limited to this. The following description will be given by taking an example in which the worker designates the node device 21 as a backup node device.
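The two backup-selection policies described above, lowest disk occupancy or manual designation, can be sketched as follows. The function name `choose_backup_node`, the occupancy dictionary, and the exclusion of the local node device from the candidates are assumptions made for the example.

```python
def choose_backup_node(occupancy, local_node, designated=None):
    """Pick the backup node device (hypothetical helper).

    occupancy:  {node_name: disk occupancy rate between 0.0 and 1.0}
    local_node: name of the node device that holds the data to back up
    designated: node name fixed by the operator, if any
    """
    # A manually designated backup node takes precedence.
    if designated is not None:
        return designated
    # Otherwise pick the node with the lowest occupancy; the local node is
    # skipped (an assumption), since backing up to itself adds no redundancy.
    candidates = {n: r for n, r in occupancy.items() if n != local_node}
    if not candidates:
        raise ValueError("no other node device is available")
    return min(candidates, key=candidates.get)


occupancy = {"node20": 0.10, "node21": 0.30, "node22": 0.50}
by_occupancy = choose_backup_node(occupancy, local_node="node20")
by_designation = choose_backup_node(occupancy, "node20", designated="node22")
```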

When the preset time is reached, the node device 20 may start a backup process and send the data stored in the disk 7 deployed by the node device 20 to the node device 21 through the interconnection switch 4. After the node device 21 stores the data, the node device 21 may feed back a notification of completion of synchronization to the node device 20.
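The timed, asynchronous synchronization of step S103 can be sketched with a small class whose `tick` method is called periodically with the current time. The class name `AsyncSynchronizer`, the `send_to_backup` callback standing in for the transfer over the interconnection switch, and the idea of tracking only writes made since the last synchronization are assumptions; the specification only states that the stored data and corresponding metadata are sent to the backup node device.

```python
class AsyncSynchronizer:
    """Send pending writes to the backup node once the preset time elapses."""

    def __init__(self, preset_seconds, send_to_backup, start_time=0.0):
        self.preset_seconds = preset_seconds
        # Callback returning True when the backup node confirms completion.
        self.send_to_backup = send_to_backup
        self.last_sync = start_time
        # file_name -> (data, metadata entry) written since the last sync.
        self.pending = {}

    def record_write(self, file_name, data, meta):
        """Remember a completed local write for the next synchronization."""
        self.pending[file_name] = (data, meta)

    def tick(self, now):
        """Run a synchronization if the preset time has been reached."""
        if now - self.last_sync < self.preset_seconds:
            return False
        # Keep the pending entries if the backup node did not confirm,
        # so that they are retried after the next period.
        if self.send_to_backup(dict(self.pending)):
            self.pending.clear()
        self.last_sync = now
        return True


received = []


def fake_send(batch):
    received.append(batch)
    return True


sync = AsyncSynchronizer(preset_seconds=3600, send_to_backup=fake_send)
sync.record_write("a", b"x", ["disk70", "disk71"])
early = sync.tick(1800)   # preset time not reached yet
due = sync.tick(3600)     # preset time reached, batch is sent
```

Passing the time in as an argument, rather than reading the clock inside `tick`, keeps the sketch deterministic and easy to test.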

In this way, in the distributed file system 1, three copies of data, that is, two copies of data contained in the disks 70, 71 of the node device 20 and one copy of data stored in the node device 21, can be contained, thereby ensuring data reliability in the distributed file system 1.

In the distributed file system 1, the node device 2 selects the disk 7 by a hash operation on the file name carried in the write request. If write operations on some files are excessive, the occupancy of a certain disk may become far larger than that of the other disks. In this case, the node device 2 may perform load balancing.

Optionally, after step S102, writing data to the target disk and sending the write feedback to the client, the method further includes:

S104, if the occupancy rate of the target disk is greater than the preset occupancy rate, balancing the deployed disks.

S105, updating the recorded metadata.

For example, the node device 20 is configured to start the balancing operation when the occupancy rate of a disk 7 exceeds a preset occupancy rate of 75%. After the node device 20 has been written to for a period of time, the occupancy rate of the disk 70 reaches 80%, which exceeds the preset occupancy rate of 75%. At this time, the node device 20 may perform balancing according to the occupancy rates of the disks 7. The balancing referred to herein means migrating data from a disk whose occupancy rate is higher than the preset occupancy rate to disks with lower occupancy rates, that is, migrating data from the disk 70 to the disk 71 and the disk 72. The specific migration manner may be any existing migration manner, which is not limited herein.

After the node device 20 completes data equalization between the disks 7, the metadata is updated so that the updated metadata can reflect information such as the physical location of the equalized data storage.
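Steps S104 and S105 can be sketched as follows, under simplifying assumptions: each file has a single copy, sizes are plain integers, and the migration policy (largest files first, always to the least occupied other disk) is chosen only for illustration. The function name `balance_disks` and the data layout are hypothetical, and the specification leaves the migration manner open.

```python
def balance_disks(disks, metadata, threshold=0.75):
    """Migrate files off disks above threshold and update metadata.

    disks:    {disk_name: {"capacity": int, "files": {file_name: size}}}
    metadata: {file_name: disk_name}
    """
    def occupancy(name):
        disk = disks[name]
        return sum(disk["files"].values()) / disk["capacity"]

    for src in list(disks):
        others = [n for n in disks if n != src]
        if not others:
            break
        # Largest files first, so fewer migrations are needed.
        by_size = sorted(disks[src]["files"].items(), key=lambda kv: -kv[1])
        for file_name, size in by_size:
            if occupancy(src) <= threshold:
                break
            dst = min(others, key=occupancy)
            used_after = sum(disks[dst]["files"].values()) + size
            if used_after / disks[dst]["capacity"] > threshold:
                continue  # this file would overload the destination
            # S104: move the file; S105: record its new location.
            disks[dst]["files"][file_name] = disks[src]["files"].pop(file_name)
            metadata[file_name] = dst
    return metadata


disks = {
    "disk70": {"capacity": 100, "files": {"a": 50, "b": 30}},  # 80% occupied
    "disk71": {"capacity": 100, "files": {"c": 10}},
    "disk72": {"capacity": 100, "files": {}},
}
metadata = {"a": "disk70", "b": "disk70", "c": "disk71"}
balance_disks(disks, metadata)
```

In this example the disk 70 starts at 80%, above the 75% threshold from the text, and the largest file is moved to the emptiest disk, after which every disk is at or below the threshold.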

Through the balancing process, data is stored more evenly across the disks in the node device. This avoids the problem that the occupancy of one or more disks 7 becomes too high to store further data because too much data has been written to them, and improves the reliability of data storage in the node device.

Correspondingly, the present application provides a data synchronization apparatus, applied to a node device, as shown in FIG. 3, including:

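the receiving unit is used for receiving a writing request sent by a client;

the selecting unit is used for determining a target disk from the local disks according to the locally maintained metadata and the write-in request;

the sending unit is used for writing data into the target disk, updating the metadata and sending write feedback to the client;

and the synchronization unit is used for synchronizing the data stored in the target disk and the corresponding metadata to the backup node equipment if the recorded time reaches the preset time.

Further, the selection unit includes:

the searching module is used for searching the metadata maintained by the node device itself according to the file name carried in the writing request;

and the determining module is used for determining the disk which is recorded in the metadata and corresponds to the file name as the target disk if the metadata corresponding to the file name is found, and for performing hash operation according to the file name and determining the target disk from the local disks based on the calculation result if the metadata corresponding to the file name is not found.

Optionally, the number of target disks is at least two.

Optionally, the apparatus further includes:

the balancing unit is used for balancing the deployed disks if the occupancy rate of the target disk is greater than the preset occupancy rate;

an updating unit for updating the recorded metadata.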

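Correspondingly, the present application provides a node device, as shown in FIG. 4, comprising a processor and a machine-readable storage medium; the machine-readable storage medium stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to implement any of the above method steps.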

It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof.

The above description is only for the purpose of illustrating the preferred embodiments of the present disclosure and is not to be construed as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A data synchronization method is applied to node equipment and comprises the following steps:

receiving a write-in request sent by a client;

determining a target disk from the local disks according to locally maintained metadata and the write-in request;

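writing data into the target disk, updating the metadata, and sending write feedback to the client;

and if the recorded time reaches the preset time, synchronizing the data stored in the target disk and the corresponding metadata to the backup node equipment.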

2. The method of claim 1, wherein determining the target disk from the local disks according to the locally maintained metadata and the write request comprises:

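searching the locally maintained metadata according to the file name carried in the write-in request;

if the metadata corresponding to the file name is found, determining a disk which is recorded in the metadata and corresponds to the file name as a target disk;

and if the metadata corresponding to the file name is not found, performing hash operation according to the file name, and determining a target disk from the local disks based on a calculation result.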

3. The method of claim 2, wherein the number of target disks is at least two.

4. The method of claim 1, after writing data to the target disk and sending write feedback to the client, further comprising:

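if the occupancy rate of the target disk is greater than the preset occupancy rate, balancing the deployed disks;

and updating the recorded metadata.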

5. A data synchronization device applied to a node device includes:

the receiving unit is used for receiving a writing request sent by a client;

the selecting unit is used for determining a target disk from the local disks according to the locally maintained metadata and the writing request;

a sending unit, configured to write data to the target disk, update the metadata, and send write feedback to the client;

and the synchronization unit is used for synchronizing the data stored in the target disk and the corresponding metadata to the backup node device if the recorded time reaches the preset time.

6. The apparatus of claim 5, wherein the selection unit comprises:

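the searching module is used for searching the locally maintained metadata according to the file name carried in the writing request;

and the determining module is used for determining the disk which is recorded in the metadata and corresponds to the file name as a target disk if the metadata corresponding to the file name is found, and for performing hash operation according to the file name and determining the target disk from the local disks based on a calculation result if the metadata corresponding to the file name is not found.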

7. The apparatus of claim 6, wherein the number of target disks is at least two.

8. The apparatus of claim 5, further comprising:

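the balancing unit is used for balancing the deployed disks if the occupancy rate of the target disk is greater than the preset occupancy rate;

an updating unit for updating the recorded metadata.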

9. A node device comprising a processor and a machine-readable storage medium;

the machine-readable storage medium stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to carry out the method steps of any one of claims 1 to 4.

10. A machine-readable storage medium having stored thereon machine-executable instructions, which when executed by a processor implement the method of any one of claims 1-4.