CN114647383A - Data access method, device, storage node and storage medium - Google Patents

Data access method, device, storage node and storage medium

Info

Publication number
CN114647383A
CN114647383A (application CN202210323902.3A)
Authority
CN
China
Prior art keywords
data
metadata
written
unit
units
Prior art date
Legal status
Pending
Application number
CN202210323902.3A
Other languages
Chinese (zh)
Inventor
代洪跃
易曌平
Current Assignee
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202210323902.3A priority Critical patent/CN114647383A/en
Publication of CN114647383A publication Critical patent/CN114647383A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors


Abstract

The invention relates to the field of storage technology and provides a data access method, apparatus, storage node, and storage medium applied to a storage node. The storage node includes a persistent memory comprising a metadata area and a data area, and is communicatively connected to a client. The method includes: receiving a write data request sent by the client, the request carrying the data length and target write position of the data to be written; generating, according to the data length, first metadata for managing the data to be written; generating, according to the data length and the position to be written, second metadata for managing a write log of the data to be written; and, after the data to be written has been written into the data area, writing the first metadata into the metadata area and the second metadata into the data area. The invention can ensure the consistency of data in the persistent memory.

Description

Data access method, device, storage node and storage medium
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a data access method, apparatus, storage node, and storage medium.
Background
Although persistent memory (PMem) ensures that data already written to it survives a power-off restart, data destined for PMem is typically written to the CPU cache first and then flushed into PMem through a series of CPU instructions. Owing to hardware limitations of PMem and the CPU, writing more than 8 bytes to PMem and persisting them cannot guarantee atomicity of the write operation (i.e., if power is lost while the write is being persisted, a complete write cannot be guaranteed). How to guarantee the consistency of data persisted to PMem is therefore an urgent problem for those skilled in the art.
Disclosure of Invention
The invention aims to provide a data access method, a data access apparatus, a storage node, and a storage medium that can ensure the consistency of data persistently written into a PMem.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a data access method applied to a storage node, where the storage node includes a persistent memory, the persistent memory includes a metadata area and a data area, and the storage node is communicatively connected to a client. The method includes: receiving a write data request sent by the client, the request carrying the data length and target write position of the data to be written; generating, according to the data length, first metadata for managing the data to be written; generating, according to the data length and the position to be written, second metadata for managing the write log of the data to be written; and, after the data to be written has been written into the data area, writing the first metadata into the metadata area and the second metadata into the data area.
Optionally, the step of generating, according to the data length, first metadata for managing the data to be written includes:
calculating the number of segments according to the data length and a preset length;
segmenting the data to be written according to the number of segments to obtain at least one data segment;
generating metadata of each data segment according to the number of the data segments and the position of each data segment in the data to be written;
generating reserved metadata for the data to be written, wherein the reserved metadata comprises a value equal to the number of the data segments plus 1;
and taking the reserved metadata and the metadata of all the data segments as the first metadata.
Optionally, the step of generating second metadata for managing the write log of the data to be written according to the data length and the position to be written includes:
acquiring a flag bit representing that the data to be written has been successfully written into the data area;
generating check data according to the flag bit, the data length and the position to be written;
and taking the flag bit, the data length, the position to be written and the check data as the second metadata.
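The second metadata described above can be sketched as a small log record. The field layout and the use of CRC32 as the check data are assumptions for illustration; the patent does not specify an exact encoding.

```python
import struct
import zlib

FLAG_WRITTEN = 0x1  # flag bit: data successfully written to the data area

def build_log_record(data_length: int, write_pos: int) -> bytes:
    """Pack the flag bit, data length, write position, and check data."""
    header = struct.pack("<BQQ", FLAG_WRITTEN, data_length, write_pos)
    check = zlib.crc32(header)  # check data computed over the other fields
    return header + struct.pack("<I", check)

def verify_log_record(record: bytes) -> bool:
    """A record is trusted only if its check data matches its fields."""
    header, (check,) = record[:-4], struct.unpack("<I", record[-4:])
    return zlib.crc32(header) == check
```

On recovery, a record whose check data does not match is treated as an incomplete write, which is how the log distinguishes a torn write from a completed one.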
Optionally, the metadata area includes multiple metadata units, the multiple metadata units are managed in a hierarchical manner, each level corresponds to one linked list, each linked list includes at least one management node, each management node is configured to manage metadata units adjacent to each other in position, the number of metadata units managed by management nodes in the same linked list is the same, the number of metadata units managed by management nodes in any two linked lists is different, and the step of writing the first metadata into the metadata area includes:
determining a target level according to the data length and the preset length;
determining first metadata units to be written and a second metadata unit to be written from a management node of the linked list corresponding to the target level, wherein the number of first metadata units to be written equals the number of data segments, and the number of second metadata units to be written is 1;
sequentially writing the metadata of each data segment into each first metadata unit to be written according to the position of each data segment in the data to be written;
and writing the reserved metadata into the second metadata unit to be written.
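The target-level selection above can be sketched as follows. The power-of-two level sizing is an assumption for illustration; the patent only states that management nodes in different linked lists manage different numbers of adjacent metadata units.

```python
import math

PRESET_LENGTH = 4096            # bytes per metadata/data unit (assumed)
LEVEL_SIZES = [1, 2, 4, 8, 16]  # units managed per node, one entry per level

def target_level(data_length: int) -> int:
    """Pick the smallest level whose nodes can hold all data segments
    plus the one reserved metadata unit."""
    segments = math.ceil(data_length / PRESET_LENGTH)
    needed = segments + 1       # + the reserved metadata unit
    for level, size in enumerate(LEVEL_SIZES):
        if size >= needed:
            return level
    raise ValueError("request too large for any level")
```

Allocating all units for one write from a single management node keeps the metadata units adjacent in position, which is what lets the reserved unit and the segment units be found together on recovery.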
Optionally, the data area includes a plurality of data units, each metadata unit corresponds to one data unit, each data segment is written to a data unit corresponding to each target metadata unit, and the step of writing the second metadata to the data area includes:
taking a data unit corresponding to the second metadata unit to be written as a data unit to be written;
and writing the second metadata into the data unit to be written.
Optionally, the data area includes a plurality of data units, the storage node further includes a hard disk, a disk refreshing list and a recovery list, the disk refreshing list includes a disk refreshing position of data to be refreshed and a data unit storing the data to be refreshed, the data to be refreshed is data stored in the persistent memory and is to be refreshed in the hard disk, the disk refreshing position is a position to be written in a write data request for writing the data to be refreshed, and the method further includes:
determining valid data units and invalid data units according to the position to be written and the disk refreshing position;
if the valid data unit does not exist in the disk refreshing list, updating the valid data unit into the disk refreshing list so as to flush the data in the valid data unit from the persistent memory to the hard disk through the disk refreshing list;
and if the invalid data unit exists in the disk refreshing list, deleting the invalid data unit from the disk refreshing list and inserting the invalid data unit into the recovery list so as to recover the invalid data unit through the recovery list.
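The flush-list bookkeeping above can be sketched as follows. Representing each pending unit as a byte range and treating a fully overwritten range as invalid are assumptions for illustration.

```python
def update_lists(write_pos, write_len, flush_list, recycle_list):
    """flush_list: {unit_id: (pos, length)} awaiting flush to the hard disk.
    Units fully overwritten by the new request become invalid and are
    moved to recycle_list; the rest remain valid and stay pending."""
    still_valid = {}
    for unit, (pos, length) in flush_list.items():
        fully_covered = (write_pos <= pos
                         and pos + length <= write_pos + write_len)
        if fully_covered:
            recycle_list.append(unit)   # stale copy: reclaim the unit
        else:
            still_valid[unit] = (pos, length)
    return still_valid
```

Dropping fully overwritten units before they reach the hard disk avoids flushing stale data that the client has already replaced.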
Optionally, the step of deleting the invalid data unit from the disk refreshing list and inserting the invalid data unit into the recovery list includes:
if the data units to be merged meeting preset merging conditions exist in the recovery list, deleting the data units to be merged from the recovery list;
merging the data unit to be merged and the invalid data unit to obtain a data unit to be inserted;
inserting the data unit to be inserted into the recovery list;
and if the data unit to be merged does not exist in the recovery list, inserting the invalid data unit into the recovery list.
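The merge step above can be sketched as coalescing adjacent free ranges before insertion. Representing a recycled unit as a (start, length) byte range, with adjacency as the preset merging condition, is an assumption for illustration.

```python
def insert_with_merge(recycle, unit):
    """recycle: list of (start, length) free ranges.
    Ranges adjacent to the new unit are removed, merged into it,
    and the combined range is inserted back."""
    start, length = unit
    merged = []
    for s, l in recycle:
        if s + l == start:            # free range ends where unit begins
            start, length = s, l + length
        elif start + length == s:     # unit ends where free range begins
            length += l
        else:
            merged.append((s, l))     # not adjacent: keep as-is
    merged.append((start, length))
    return sorted(merged)
```

Coalescing keeps the recovery list short and produces larger contiguous units for reuse by later writes.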
Optionally, the method further comprises:
receiving a data reading request sent by the client, wherein the data reading request comprises the data length and the position of data to be read;
reading original data from the hard disk according to the data length and the position of the data to be read;
according to the position to be read, determining whether the disk refreshing list contains latest data that corresponds to the data to be read and has not yet been stored in the hard disk;
and combining the original data and the latest data to obtain the data to be read.
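The read path above can be sketched as overlaying the not-yet-flushed bytes onto the copy read from disk. The byte-range representation of flush-list entries is an assumption for illustration.

```python
def merge_read(disk_data: bytearray, read_pos: int, flush_list):
    """flush_list: list of (pos, bytes) entries cached in persistent
    memory and not yet flushed to the hard disk. Cached bytes override
    the (possibly stale) bytes read from disk."""
    out = bytearray(disk_data)
    for pos, data in flush_list:
        lo = max(pos, read_pos)
        hi = min(pos + len(data), read_pos + len(out))
        if lo < hi:  # the cached entry overlaps the read range
            out[lo - read_pos:hi - read_pos] = data[lo - pos:hi - pos]
    return bytes(out)
```

This is why the disk refreshing list is consulted on every read: the hard disk alone may not hold the latest copy of a range that is still pending flush.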
Optionally, the storage node further includes a hard disk, when the storage node is powered off, the persistent memory stores data to be downloaded, which is to be written in the hard disk, the metadata area includes a plurality of metadata units, each metadata unit includes a log identifier, the data area includes a plurality of data units, each metadata unit manages one data unit, and the data to be downloaded at least exists in one data unit, and the method further includes:
when the storage node is powered on, dividing the plurality of metadata units into at least one metadata unit group according to the log identification;
and reconstructing each metadata unit group to write the data to be downloaded into the hard disk.
Optionally, each metadata unit includes a slice number, a slice index, and a log identifier, and the step of performing reconstruction processing on each metadata unit group to write the data to be downloaded into the hard disk includes:
for any target metadata unit group in the at least one metadata unit group, reconstructing the target metadata unit group according to the slice number and the slice index of the target metadata units in the group, to obtain the log corresponding to the target metadata unit group;
reading the state of the log from the data unit managed by the target metadata unit with the last slice index, wherein the state of the log indicates whether the target data in the data units managed by the remaining target metadata units (those other than the one with the last slice index) exists in the data area;
if the state of the log is a cache state indicating that the target data exists in the data area, flushing the target data into the hard disk;
and flushing the data in the data units managed by the metadata units of every metadata unit group whose log state is the cache state into the hard disk, thereby finally writing the data to be downloaded into the hard disk.
Optionally, the method further comprises:
if the state of the log is a non-cache state indicating that the target data does not exist in the data area, recovering the data units of the target metadata units;
and recovering all data units managed by the metadata units of every metadata unit group whose log state is the non-cache state.
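The power-on recovery described above can be sketched as follows: metadata units are grouped by log identifier, ordered by slice index, and each group's data is either replayed to disk or recycled depending on the log state found in the last slice. The dict field names are assumptions for illustration.

```python
CACHE_STATE = 1  # log state: the target data exists in the data area

def recover(metadata_units, read_state):
    """metadata_units: dicts with log_id, slice_index, data_unit.
    read_state(data_unit) returns the log state stored in that unit.
    Returns (data units to flush to disk, data units to recycle)."""
    groups = {}
    for m in metadata_units:
        groups.setdefault(m["log_id"], []).append(m)
    to_flush, to_recycle = [], []
    for units in groups.values():
        units.sort(key=lambda m: m["slice_index"])
        last = units[-1]  # the log state lives in the last slice's data unit
        data_units = [m["data_unit"] for m in units]
        if read_state(last["data_unit"]) == CACHE_STATE:
            to_flush.extend(data_units[:-1])  # replay the payload units
        else:
            to_recycle.extend(data_units)     # incomplete write: discard all
    return to_flush, to_recycle
```

A group whose log never reached the cache state corresponds to a write torn by the power failure, so discarding it restores the pre-write state rather than exposing partial data.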
In a second aspect, an embodiment of the present invention provides a data access apparatus, which is applied to a storage node, where the storage node includes a persistent memory, the persistent memory includes a metadata area and a data area, and the storage node is communicatively connected to a client, and the apparatus includes: the receiving module is used for receiving a data writing request sent by the client, wherein the data writing request comprises the data length and the position of data to be written; the generating module is used for generating first metadata used for managing the data to be written according to the data length; the generating module is further configured to generate second metadata for managing a write log of the data to be written according to the data length and the position to be written; and the writing module is used for writing the first metadata into the metadata area and writing the second metadata into the data area after the data to be written is written into the data area.
In a third aspect, an embodiment of the present invention provides a storage node, including a processor and a memory; the memory is used for storing programs; the processor is configured to implement the data access method in the first aspect when executing the program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data access method in the first aspect.
Compared with the prior art, in the data access method, apparatus, storage node, and storage medium provided by the embodiments of the present invention, when a write data request sent by a client is received, first metadata for managing the data to be written is generated according to the data length carried in the request, and second metadata for managing the write log of the data to be written is generated according to the data length and the position to be written. The data to be written is written into the data area first, and only then are the first metadata and the second metadata written into the metadata area and the data area, respectively. The first and second metadata ensure that writes to the persistent memory satisfy atomicity, thereby ensuring the consistency of the data in the persistent memory.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is an exemplary diagram of an application scenario provided in an embodiment of the present invention.
Fig. 2 is a schematic block diagram of a storage node according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating a data access method according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating another data access method according to an embodiment of the present invention.
Fig. 5 is a flowchart illustrating another data access method according to an embodiment of the present invention.
Fig. 6 is a diagram illustrating an example of division of an address space of a persistent memory according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram of a hierarchical linked list provided by an embodiment of the present invention.
Fig. 8 is a flowchart illustrating another data access method according to an embodiment of the present invention.
Fig. 9 is an exemplary diagram of linked list updating according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of a write flow of data to be written according to an embodiment of the present invention.
Fig. 11 is a flowchart illustrating another data access method according to an embodiment of the present invention.
Fig. 12 is a schematic diagram of various invalid data and valid data determination manners provided by the embodiment of the present invention.
Fig. 13 is an exemplary diagram of a process for inserting a rank linked list according to an embodiment of the present invention.
Fig. 14 is a flowchart illustrating another data access method according to an embodiment of the present invention.
Fig. 15 is a flowchart illustrating another data access method according to an embodiment of the present invention.
Fig. 16 is a block diagram illustrating a data access apparatus according to an embodiment of the present invention.
Reference numerals: 10-storage node; 11-processor; 12-persistent memory; 13-hard disk; 14-bus; 20-client; 100-data access apparatus; 110-receiving module; 120-generating module; 130-writing module; 140-reading module; 150-disk-flushing module; 160-reconstruction module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inner", and "outer", if used, indicate orientations or positional relationships based on the drawings or on the customary placement of the product. They are used only for convenience and simplicity of description; they do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
Persistent memory is non-volatile memory that is accessed through ordinary memory access instructions (rather than system calls), offers low latency (it sits on the memory bus rather than an I/O bus), and is byte-addressable (rather than block-addressable). Byte-addressable means the basic addressing unit is the byte rather than the block, and non-volatile means data is not lost after a power failure. Persistent memory usually sits between external storage (an ordinary hard disk or solid-state disk) and main memory (dynamic random access memory), falling between the two in capacity, performance, and price.
A technical hurdle in using persistent memory is avoiding data consistency problems. Because registers and caches in the processor are volatile, and the backup capacitor can only guarantee that data already in the memory controller is written to persistent memory after a power failure, the data in persistent memory may not be the latest copy. The problem caused by this cache-memory inconsistency is called the data consistency problem; in the mild case it causes data loss, and in the severe case the system cannot recover.
In the prior art, to avoid data consistency problems, an application generally calls persistence instructions explicitly, and all persistence instructions must execute sequentially without overlapping. This guarantees data consistency, but the execution efficiency of such persistence instructions is unacceptable; in scenarios that do not care about the completion order of persistence instructions, such as memory copies, the ordering constraint is usually sacrificed in exchange for higher execution efficiency.
In view of this, embodiments of the present invention provide a data access method, an apparatus, a storage node, and a storage medium, which are capable of ensuring data consistency without sacrificing order constraint of persistent instructions, and are described in detail below.
Referring to fig. 1, fig. 1 is an exemplary diagram of an application scenario provided by an embodiment of the present invention. In fig. 1, a storage node 10 is communicatively connected to a client 20; the storage node 10 includes a persistent memory and a hard disk. The client 20 sends data access requests (including read data requests and write data requests) to the storage node 10. When the client 20 sends a write data request, the storage node 10 may write the data to be written to the hard disk, or to the persistent memory; alternatively, to ensure write performance, it may respond to the client immediately after temporarily storing the data in the persistent memory and later flush the temporarily stored data from the persistent memory to the hard disk. The data access method provided by the embodiment of the invention applies to at least one of the latter two cases: writing the data to be written into the persistent memory, and temporarily storing it in the persistent memory before flushing it to the hard disk.
In this embodiment, the storage node 10 may be a single storage server, or any storage node in a storage array or a server group composed of multiple storage servers, and is used for storing the user data that write data requests from the client 20 need to store and for managing the metadata of that user data.
The client 20 may be a general host, a server, a mobile terminal, or the like. A user issues a data access request through the client 20, and the client 20 sends the request to the storage node 10.
The hard disk may be a Serial Attached SCSI (SAS) hard disk, a Serial Advanced Technology Attachment (SATA) hard disk, or a Solid State Disk (SSD).
Based on the application scenario in fig. 1, an embodiment of the present invention provides a block diagram of the storage node 10 in fig. 1. Referring to fig. 2, fig. 2 is a block diagram of a storage node provided by an embodiment of the present invention. In fig. 2, the storage node 10 includes a processor 11, a persistent memory 12, a hard disk 13, and a bus 14. The processor 11, persistent memory 12, and hard disk 13 communicate over the bus 14.
The processor 11 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits in hardware or by instructions in the form of software in the processor 11. The processor 11 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The storage 12 is used for storing programs, for example, the data access apparatus in the embodiment of the present invention. The data access apparatus includes at least one software functional module that can be stored in the storage 12 in the form of software or firmware, and the processor 11, after receiving an execution instruction, executes the program to implement the data access method in the embodiment of the present invention.
The storage 12 is also used for storing the user data that client write data requests need to store and the metadata for managing that user data.
The storage 12 may be at least one of the persistent memory and the hard disk in fig. 1; optionally, the storage 12 may be a storage device built into the processor 11 or a storage device independent of the processor 11.
The bus 14 may be an ISA bus, a PCI bus, an EISA bus, or the like. In fig. 2 the bus is represented by only one double-headed arrow, but this does not mean that there is only one bus or one type of bus.
On the basis of fig. 1 and fig. 2, an embodiment of the present invention provides a data access method applied to the storage node 10 in fig. 1 and fig. 2. Referring to fig. 3, fig. 3 is a flowchart illustrating a data access method provided by an embodiment of the present invention; the method includes the following steps:
step S100, receiving a data writing request sent by a client, wherein the data writing request comprises the data length and the position of data to be written.
In this embodiment, the data to be written is data that the user needs to store on the storage node 10. The storage node 10 provides the client 20 with accessible storage space; for example, the storage node 10 provides the client 20 with a 10 GB storage space A, and a user can store data at any position in storage space A through the client. For instance, to write data into the first 1 GB of storage space A, the user sends a write data request to the storage node 10 through the client 20; the request states that the data length of the data to be written is 1 GB and that the position to be written is the start of the storage space, i.e., byte 0.
Step S101, generating first metadata for managing data to be written according to the data length.
In this embodiment, the persistent memory may be managed in units of a fixed length. If the data length is greater than the fixed length, the data to be written may be divided into multiple data segments before being written into the persistent memory; otherwise, the data to be written may be written directly into a storage space of the fixed length. The first metadata is used to manage the data to be written, for example, by recording information such as the number of data segments into which the data is divided and the position of each data segment within the data to be written.
And S102, generating second metadata for managing a write log of the data to be written according to the data length and the position to be written.
In this embodiment, the second metadata is used to manage the write log of the data to be written. The write log records the write status of the data to be written, which may include whether the write succeeded, the write location, and so on; for example, the second metadata may use different flag values to indicate whether the data to be written has been successfully written into the persistent memory.
Step S103, after writing the data to be written into the data area, writing the first metadata into the metadata area, and writing the second metadata into the data area.
In this embodiment, the data to be written is written first, and the first metadata and second metadata are written afterwards. This ordering ensures one of two outcomes: either the data to be written has been completely and correctly written into the data area, so the data area holds the latest data, or, on power-up after an abnormal power failure, the data area is restored according to the first and second metadata to its state before the write, so the data area holds the old data. A state in which the data area mixes latest data with old data cannot occur.
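The write ordering above can be sketched as two persist steps: the payload becomes durable before the metadata that makes it visible, so a crash between the steps leaves the old state intact. `pmem_persist` here is a hypothetical stand-in for the CPU flush/fence instruction sequence, passed in as a callable for illustration.

```python
def ordered_write(pmem, data_off, payload, meta_off, first_meta,
                  log_off, second_meta, pmem_persist):
    """pmem: a byte buffer standing in for the persistent memory.
    Step 1 persists the payload; step 2 persists the metadata that
    makes the payload visible. Recovery ignores payload bytes whose
    metadata never reached step 2."""
    pmem[data_off:data_off + len(payload)] = payload
    pmem_persist()                       # step 1: payload durable first
    pmem[meta_off:meta_off + len(first_meta)] = first_meta
    pmem[log_off:log_off + len(second_meta)] = second_meta
    pmem_persist()                       # step 2: metadata commits the write
```

Because visibility is gated on the second persist, atomicity of the multi-byte payload write is never required; only the metadata commit needs to be recognizable as complete or incomplete.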
According to the method provided by the embodiment of the invention, atomicity can be ensured when data is written into the persistent memory through the first metadata and the second metadata, which in turn ensures the consistency of the data in the persistent memory.
Referring to fig. 4, fig. 4 is a flowchart illustrating another data access method provided in an embodiment of the present invention, where step S101 includes the following sub-steps:
Substep S1010, calculating the number of segments according to the data length and a preset length.
In this embodiment, the preset length may be set according to the needs of the actual scenario; for example, the preset length is set to 4 KB. As a specific implementation, the number of segments can be calculated using the following formula:
number of segments = ⌈data length / preset length⌉, that is, the data length divided by the preset length, rounded up.
Substep S1011, segmenting the data to be written according to the number of segments to obtain at least one data segment.
In this embodiment, it can be understood that when the data length is an integer multiple of the preset length, the data to be written is divided into segments of equal length; when it is not, the last data segment is shorter than the remaining data segments.
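Substeps S1010 and S1011 can be sketched as follows; a minimal sketch in Python, using the 4KB preset length from the example (the function name is illustrative, not from the embodiment):

```python
def split_into_segments(data: bytes, preset_len: int = 4096) -> list:
    """Split the data to be written into segments of the preset length."""
    # Number of segments = data length / preset length, rounded up
    num_segments = (len(data) + preset_len - 1) // preset_len
    # The last segment is shorter when the data length is not an
    # integer multiple of the preset length
    return [data[i * preset_len:(i + 1) * preset_len]
            for i in range(num_segments)]
```

For example, 10KB of data yields three segments of 4KB, 4KB, and 2KB.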
Substep S1012 generates metadata of each data segment according to the number of data segments and the position of each data segment in the data to be written.
In this embodiment, the metadata of each data segment may include the number of data segments and the location of each data segment in the data to be written.
In this embodiment, in order to determine which data segments belong to the same data to be written, the same identifier may be set in each data segment of the same data to be written. The data to be written of a data writing request covers at least the following two cases: the data to be written can be written in a single pass, or it must be written in multiple passes. In the former case, the data to be written is divided into multiple data segments, and the identifiers of those data segments are all the same. In the latter case, the length of each pass is determined first, the data of each pass is then segmented separately, the data segments of the same pass carry the same identifier, and the data segments of different passes carry different identifiers.
In this embodiment, in order to improve the reliability of the metadata of each data segment, the metadata may further include check data that protects the information contained in the metadata. If the metadata of each data segment includes the number of data segments and the position of the data segment in the data to be written, the check data may be derived from those two fields; if it includes the identifier, the number of data segments, and the position of the data segment in the data to be written, the check data may be derived from all three.
In this embodiment, in order to reduce the impact of generating the check data on write performance, the check data may be obtained by an addition operation; for example, the identifier, the number of data segments, and the position of each data segment in the data to be written are added to obtain the corresponding check data. Of course, other ways of calculating the check data, such as a Cyclic Redundancy Check (CRC), may also be used as long as the impact on write performance is acceptable.
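The additive check described above can be sketched in a few lines (masking to 32 bits is an assumption, matching the 4B CheckSum field introduced later for the metadata unit):

```python
def additive_check(identifier: int, num_segments: int, position: int) -> int:
    # Add the fields and keep the low 32 bits so the result fits a 4B field
    return (identifier + num_segments + position) & 0xFFFFFFFF
```

A simple sum is far cheaper than CRC32 on the write path, at the cost of weaker error detection.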
Substep S1013, generating reserved metadata for the data to be written, where the reserved metadata includes a value obtained by adding 1 to the number of data segments.
In this embodiment, similar to the metadata of each data segment, the reserved metadata may also include a position, check data, and an identifier. The differences are that the reserved metadata records the number of data segments plus 1, that its position is the position after the last data segment, and that its identifier is the same as the identifier of any data segment.
In sub-step S1014, the reserved metadata and the metadata of all the data segments are used as the first metadata.
According to the method provided by the embodiment of the invention, the corresponding metadata is generated for each data segment, and then the data to be written is taken as a whole to generate the corresponding reserved metadata, so that the integrity and the accuracy of the data to be written recorded by the first metadata are ensured.
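Putting substeps S1012 to S1014 together, the first metadata for one piece of data to be written might be assembled as below (a sketch; the field names and the dict representation are illustrative, and the check value follows the additive scheme described earlier):

```python
def build_first_metadata(log_id: int, num_segments: int) -> list:
    """Per-segment metadata entries followed by the reserved metadata entry.

    count = number of data segments + 1; every entry carries the same
    identifier (log_id); the check value is the sum of the other fields.
    """
    count = num_segments + 1
    entries = []
    for index in range(1, count + 1):  # index == count is the reserved entry
        check = (log_id + count + index) & 0xFFFFFFFF
        entries.append({"log_id": log_id, "count": count,
                        "index": index, "check": check})
    return entries
```

The reserved entry sits after the last data segment's entry and shares its identifier, matching the description above.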
Referring to fig. 5, fig. 5 is a flowchart illustrating another data access method provided in an embodiment of the present invention, and step S102 includes the following sub-steps:
Substep S1020, obtaining a flag bit representing that the data to be written has been successfully written into the data area.
In this embodiment, different values may be set for the flag bit to represent the writing condition of the data to be written. For example, a flag value of "cached" represents that the data to be written has been successfully written into the data area, and a flag value of "none" represents that the data has been flushed from the data area to the hard disk. Of course, the flag bit may also be represented by integer values: a value of 1 represents that the data to be written has been successfully written into the data area, and a value of 0 represents that the data has been flushed from the data area to the hard disk.
Substep S1021, generating check data according to the flag bit, the data length, and the position to be written.
In this embodiment, to avoid impairing write performance when generating the check data, the check data may be obtained by adding the flag bit, the data length, and the position to be written.
Substep S1022, using the flag bit, the data length, the position to be written, and the check data as the second metadata.
According to the method provided by the embodiment of the invention, the flag bit accurately identifies whether the data to be written has been successfully written into the data area; the check data ensures the reliability of the flag bit, the data length, and the position to be written; and taking the flag bit, the data length, the position to be written, and the check data together as the second metadata ensures the integrity and accuracy of the write log of the data to be written.
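Substeps S1020 to S1022 might look as follows (a sketch; the integer flag values 1/0 follow the example above, and the dict layout is illustrative):

```python
FLAG_CACHED = 1  # data to be written successfully written into the data area
FLAG_NONE = 0    # data has been flushed from the data area to the hard disk

def build_second_metadata(flag: int, data_len: int, write_pos: int) -> dict:
    # Check data by addition, keeping the cost on the write path low
    check = (flag + data_len + write_pos) & 0xFFFFFFFF
    return {"flag": flag, "len": data_len, "pos": write_pos, "check": check}

def second_metadata_valid(md: dict) -> bool:
    # Recompute the additive check and compare with the stored value
    expected = (md["flag"] + md["len"] + md["pos"]) & 0xFFFFFFFF
    return md["check"] == expected
```

On recovery after a power failure, a failed check indicates a torn or incomplete write log.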
It should be noted that, in an application scenario where data in the persistent memory needs to be flushed to a hard disk, the storage node 10 needs to generate hard disk metadata in addition to the second metadata in order to flush the disk efficiently. The hard disk metadata represents the location at which the data to be written needs to be written on the hard disk. Because hard disk spaces are managed in different ways, the representation of the hard disk metadata may also differ; for example, the hard disk metadata may include a hard disk identifier, a file identifier, an object identifier, a block identifier, a version number, and the like.
To describe the writing process of the first metadata and the second metadata more clearly, an embodiment of the present invention first introduces a specific partitioning manner of the address space of the persistent memory, please refer to fig. 6, fig. 6 is a diagram illustrating a partitioning example of the address space of the persistent memory provided in an embodiment of the present invention, and in fig. 6, the address space of the persistent memory is divided into a first reserved area, a second reserved area, a metadata area, and a data area.
The first reserved area and the second reserved area are both 4KB in size, are arranged at the head and the tail of the persistent memory respectively, and are backups of each other. This arrangement both indicates the start and end positions of the persistent memory and enhances the reliability of the data within it. Taking the second reserved area as an example, the reserved area includes the following fields: a size field of 8B that records the size of the whole space of the persistent memory; a magic field of 4B set to a fixed value, for example 0x4A3B2C1D; a CheckSum field that stores the CRC32 checksum of the first 12 bytes (size field + magic field) of the reserved area; and a reserved field kept for subsequent expansion. The first reserved area is the same as the second reserved area and is not described again here.
The metadata area is adjacent to the first reserved area and occupies the address space after it. The address space of the metadata area is aligned to 16B and divided into a plurality of metadata units of 16B each. One metadata unit includes 4 fields: a Log ID field, a count field, an index field, and a CheckSum field. The Log ID field records the Log ID of the LOG, where a LOG is a set of stored data written in a certain order. A LOG can be represented as LOG(offset, len), where offset is the offset position at which the data to be written should be written and len is the length of the data to be written; that is, one LOG represents the data written in one pass. The Log ID may be formed from the number of seconds from the current system time back to January 1, 1970, combined with a 6-digit linearly increasing serial number, and uniquely identifies a LOG. When the second rolls over, the serial number starts counting again from 0.
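One way to form such a Log ID is sketched below, under the assumption that "6-digit serial number" means six decimal digits; this sketch is not thread-safe and is purely illustrative:

```python
import time

_last_second = -1
_serial = 0

def next_log_id() -> int:
    """Seconds since 1970-01-01 combined with a 6-digit serial number.

    The serial restarts from 0 whenever the second rolls over, so the
    pair uniquely identifies a LOG within this process."""
    global _last_second, _serial
    now = int(time.time())
    if now != _last_second:
        _last_second, _serial = now, 0
    else:
        _serial += 1
    return now * 1_000_000 + _serial
```

Successive calls produce strictly increasing identifiers, whether or not the second rolls over between them.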
The count field represents the number of data segments into which the data to be written of one pass is split, plus 1. For example, if the data to be written is 256KB and is split into 64 data segments of 4KB, the value of the count field is 65. The count field occupies 2B.
The index field represents the relative position of the data segment in the data to be written. For example, if the data to be written is divided into 64 data segments, the value of the count field in the metadata unit of the first data segment is 65 and the value of its index field is 1, and the index values in the metadata units of the remaining data segments increase sequentially. The index field occupies 2B.
The CheckSum field stores the checksum of the first 12B (that is, Log ID + count + index) of the metadata unit and occupies 4B. Because metadata is read and written frequently, the checksum is obtained by adding the first three fields rather than by CRC32, since CRC32 is more time-consuming.
Whether it is the CheckSum field in the metadata area or the CheckSum field in the data area, the number of bytes occupied is 4B, which is no more than 8B. On a 64-bit computer, a store of up to 8 bytes into the persistent memory is atomic, so writing the checksum is atomic and data consistency is guaranteed.
The data area is adjacent to the metadata area and occupies the address space after it. The address space of the data area is aligned to 4KB and divided into a plurality of data units according to the 4KB size. The data units correspond one-to-one with the metadata units, and each metadata unit manages its corresponding data unit. The last 4KB of each data unit stores the hard disk metadata and the second metadata, and the remaining 4KB stores the data of a data segment. The hard disk metadata in fig. 6 includes a hard disk identifier field, a hard disk location field, a file identifier, an object identifier, a block identifier, and a version number; the second metadata includes the position to be written, the data length, a status flag (corresponding to the flag bit described above), and check data.
For the address space division in fig. 6, once the size sizepmem of the persistent memory is known, the starting addresses of the first reserved area, the metadata area, the data area, and the second reserved area can be determined, and from them the starting address of any metadata unit can be obtained. Since the metadata units and the data units correspond one-to-one, the starting address of any data unit can be obtained as well, so that the first metadata, the second metadata, and the data to be written can each be written into its own address space. The manner of determining the starting addresses is described in detail below.
The first reserved area is the first 4KB of the address space of the persistent memory, so the starting address addrsuper1 of the first reserved area is: addrsuper1 = 0.
The starting address of the metadata area is: addrmetabase = addrsuper1 + sizesuper1, where sizesuper1 is the size of the first reserved area.
In order to calculate the starting address of the data area, the total number of metadata units cntmeta, which equals the total number of data units cntdata, is calculated first:
cntmeta = cntdata = (sizepmem − sizesuper1 × 2) / (sizedata + sizemeta), where sizedata is the length of a data unit and sizemeta is the length of a metadata unit. The starting address of the data area is then:
addrdatabase = addrsuper1 + sizesuper1 + ceiling(cntmeta × sizemeta), where ceiling indicates that the result is aligned to 4KB, corresponding to the pad alignment in fig. 6 of the partition of the persistent memory address space.
Once the starting addresses of the metadata area and the data area are determined, the position of any metadata unit and any data unit can be calculated by multiplication: the i-th metadata unit starts at addrmetabase + i × sizemeta, and the i-th data unit starts at addrdatabase + i × sizedata.
Similarly, the starting address addrsuper2 of the second reserved area is:
addrsuper2 = addrdatabase + cntdata × sizedata.
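The address computations above can be checked with a short sketch; the names follow the document, and the 4KB alignment helper is an assumption about how the ceiling operation works:

```python
def pmem_layout(sizepmem: int, sizesuper1: int = 4096,
                sizemeta: int = 16, sizedata: int = 4096) -> dict:
    """Compute the start addresses of the four regions of the persistent memory."""
    def ceiling_4k(x: int) -> int:
        # Align up to a 4KB boundary (the pad alignment in fig. 6)
        return (x + 4095) // 4096 * 4096

    addrsuper1 = 0
    addrmetabase = addrsuper1 + sizesuper1
    # cntmeta = cntdata: one 16B metadata unit per 4KB data unit
    cntmeta = (sizepmem - sizesuper1 * 2) // (sizedata + sizemeta)
    addrdatabase = addrmetabase + ceiling_4k(cntmeta * sizemeta)
    addrsuper2 = addrdatabase + cntmeta * sizedata
    return {"addrsuper1": addrsuper1, "addrmetabase": addrmetabase,
            "addrdatabase": addrdatabase, "addrsuper2": addrsuper2,
            "cntmeta": cntmeta}
```

For a 1MB persistent memory, for instance, this yields 253 metadata/data unit pairs, with the second reserved area starting exactly 4KB before the end.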
in conjunction with the exemplary diagram of the space partition of the persistent memory shown in fig. 6, the embodiment of the present invention specifically describes how to determine and write the metadata unit and the data unit corresponding to the first metadata, the second metadata, and the data to be written.
In order to determine the metadata units and data units corresponding to the first metadata, the second metadata, and the data to be written more efficiently, the embodiment of the invention manages the metadata units hierarchically. Each level corresponds to one linked list, each linked list includes at least one management node, and each management node manages metadata units that are adjacent to each other in address. Management nodes in the same linked list manage the same number of metadata units, and management nodes in any two different linked lists manage different numbers of metadata units. The number of levels may be determined from the maximum length of data written in one pass and the length of a data unit. For example, if the maximum length of data written in one pass is 256KB and the length of a data unit is 4KB, the levels of the linked lists range from 1 to 256KB/4KB, that is, from 1 to 64. Referring to fig. 7, fig. 7 is an exemplary diagram of hierarchical linked lists provided in an embodiment of the present invention. Fig. 7 includes 64 linked lists, linked list 1 through linked list 64, where the numbers of metadata units managed by one management node in each linked list are 1, 2, 3, 4, …, 64 respectively.
Referring to fig. 8, fig. 8 is a flowchart illustrating another data access method according to an embodiment of the present invention, where step S103 includes the following sub-steps to write the first metadata:
Substep S103-10, determining a target level according to the data length and the preset length.
In this embodiment, the number of data segments, that is, the number of segments, is obtained from the data length and the preset length, and an initial level equal to the number of segments + 1 is determined. If a management node exists in the linked list corresponding to the initial level, the initial level is taken as the target level; if not, then starting from the initial level, the level of the first linked list found to contain a management node is taken as the target level. For example, if the number of segments is 5, the initial level is 5 + 1 = 6; if no management node exists in linked list 6 and, searching upward from linked list 6, the first linked list found to contain a management node is linked list 10, then level 10 is the target level.
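The target-level search in substep S103-10 amounts to the following sketch, where `lists` maps a level to its management nodes (a hypothetical representation, not from the embodiment):

```python
def find_target_level(num_segments: int, lists: dict):
    """Return the level of the first non-empty linked list at or above
    the initial level (number of segments + 1), or None if none exists."""
    initial_level = num_segments + 1
    for level in range(initial_level, max(lists) + 1):
        if lists[level]:  # a management node exists at this level
            return level
    return None
```

With 5 segments and only linked list 10 non-empty, the search starts at level 6 and returns 10, matching the example above.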
Substep S103-11, determining, from a management node of the linked list corresponding to the target level, first metadata units to be written and a second metadata unit to be written, where the number of first metadata units to be written equals the number of data segments and the number of second metadata units to be written is 1.
In this embodiment, if there are multiple management nodes in the linked list corresponding to the target level, one management node is arbitrarily selected from the linked list, and then a first metadata unit to be written and a second metadata unit to be written are determined from the management node, where the number of the first metadata unit to be written is the number of data segments, that is, the number of segments.
Substep S103-12, writing the metadata of each data segment into the first metadata units to be written in sequence, according to the position of each data segment in the data to be written.
In this embodiment, the first metadata units to be written and the second metadata unit to be written are consecutive in address; when there are multiple first metadata units to be written, they too are consecutive in address. According to the position of each data segment in the data to be written, the metadata of each data segment is written into the first metadata units to be written in sequence: the metadata of the first data segment is written into the first of them, the metadata of the second data segment into the second, and so on.
Substep S103-13 writes the reserved metadata to the second to-be-written metadata unit.
In this embodiment, the second metadata unit to be written is adjacent to the last first metadata unit to be written.
With continued reference to fig. 8, step S103 further includes the following sub-steps to write the second metadata:
Substep S103-20, taking the data unit corresponding to the second metadata unit to be written as the data unit to be written.
Substep S103-21, writing the second metadata into the data unit to be written.
It should be noted that the data segments are written in sequence into the data units corresponding to their metadata units, according to the positions of the data segments in the data to be written; for example, the first data segment is written into the data unit corresponding to the metadata unit of the first data segment. The first metadata and the second metadata are written at the same time, and only after all data segments of the data to be written have been written into their corresponding data units; this ensures the data consistency of the data segments in the event of an abnormal power failure.
It should be further noted that, because the first metadata units to be written and the second metadata unit to be written are now occupied, they need to be removed from the corresponding management node of the corresponding linked list. If the sum of the numbers of first metadata units to be written and second metadata units to be written equals the number of metadata units managed by the management node, the management node is deleted directly. If the sum is less than the number of metadata units managed by the management node, the number of metadata units remaining under that management node changes after removal; to preserve the hierarchical management of the linked lists, the management node must then be inserted into the linked list of the level corresponding to the changed number of metadata units. Referring to fig. 9, fig. 9 is an illustration of linked list updating provided by an embodiment of the present invention: the number of first metadata units to be written is 5 and the number of second metadata units to be written is 1; linked list 6 is empty and linked list 7 is not, so 6 metadata units are allocated from a management node in linked list 7, leaving 1 metadata unit, which is moved from linked list 7 into linked list 1. Note that fig. 9 draws only linked list 1, linked list 6, and linked list 7, which are involved in the example; the remaining linked lists are represented by ellipses, which does not mean they do not exist.
It should be further noted that, as a specific implementation, only one level of linked list may contain a management node initially; for example, a single management node in linked list 64 manages all metadata units, and the linked lists at the other levels are empty.
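The allocation and linked-list update described above (and illustrated by fig. 9) can be sketched as follows; a management node is represented simply by the start index of its run of contiguous metadata units, a hypothetical simplification:

```python
def allocate_units(lists: dict, units_needed: int):
    """Allocate `units_needed` contiguous metadata units.

    Searches upward from the linked list whose level equals the number
    of units needed; the leftover of the chosen run is reinserted at the
    level equal to its remaining size (cf. fig. 9)."""
    level = units_needed
    max_level = max(lists)
    while level <= max_level and not lists[level]:
        level += 1
    if level > max_level:
        return None  # no sufficiently large run of free units
    start = lists[level].pop(0)        # take a node from the found list
    leftover = level - units_needed
    if leftover > 0:
        # Reinsert the remainder of the run at its matching level
        lists[leftover].append(start + units_needed)
    return start
```

Replaying the fig. 9 example: requesting 6 units (5 segment metadata units plus 1 reserved) when only linked list 7 is non-empty splits its node, and the single leftover unit lands in linked list 1.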
An embodiment of the present invention further provides a schematic diagram of the write flow of data to be written based on fig. 6. Referring to fig. 10, fig. 10 is a schematic diagram of a write flow of data to be written provided in an embodiment of the present invention. In fig. 10, the data length of the data to be written is 254KB; since 254KB is not an integer multiple of 4KB, data needs to be read from the hard disk to pad it to 256KB, an integer multiple of 4KB.
In this embodiment, for an application scenario in which data to be written is temporarily stored in the persistent memory and finally stored on a hard disk, the data to be written in the persistent memory needs to be flushed (that is, stored) to the hard disk. Once the data to be written has been successfully flushed, the metadata units and corresponding data units it occupies need to be released in a timely manner so that the persistent memory space can be used by subsequent writes. To make flushing and releasing more efficient, the embodiment of the invention introduces a flush list and a recycle list: when data to be written is written into the persistent memory, the data and the data units storing it are also added to the flush list; after data in the flush list has been flushed to the hard disk, the data units corresponding to the flushed data are added to the recycle list so that they can be recovered. As a specific implementation, the flush process and the recycle process may each be executed periodically by an independent thread. An embodiment of the present invention further provides a specific implementation of the flush process. Referring to fig. 11, fig. 11 is a flowchart illustrating another data access method provided in an embodiment of the present invention, and the method further includes the following steps:
Step S200, determining valid data units and invalid data units according to the position to be written and the flush position.
In this embodiment, a valid data unit is a data unit that needs to be flushed to the hard disk, and an invalid data unit is one that does not. For example, a first write request writes data A at position 1, so A resides in the persistent memory; before A is flushed to the hard disk, a second write request writes data B at position 1. When the disk is actually flushed, data A no longer needs to be flushed; only the latest data B does.
In this embodiment, the flush list may include multiple pieces of data to be flushed, each corresponding to one flush position. The position to be written needs to be compared with each flush position in the flush list one by one to determine all valid and invalid data units. Since the comparison is the same for every flush position, only the comparison between the position to be written and one flush position is described below.
Depending on the relation between the position to be written and the flush position, there are 3 cases, each of which includes 3 sub-cases; each is described in detail below.
For convenience of description, the position to be written and the flush position are denoted login and logcmp respectively; each includes a start offset beginOffset and an end offset endOffset. From the start and end offsets of the position to be written and the flush position, valid data and invalid data can be determined: the data unit corresponding to valid data is a valid data unit, and the data unit corresponding to invalid data is an invalid data unit. The different cases below mainly describe how invalid data and valid data are determined.
Case 1: the start offset of login is less than the start offset of logcmp
Case 1.1: end offset of login is less than end offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the part of login that overlaps logcmp is invalid data and the rest is valid data; otherwise, the end offset of login is assigned to the start offset of logcmp, so that logcmp retains only the data outside the overlap as valid data.
In this embodiment, each case is illustrated with a diagram. Referring to fig. 12, fig. 12 is a schematic diagram of the various ways of determining invalid data and valid data provided by the embodiment of the present invention. For convenience of description, in all the drawings of fig. 12 the white portion of a rectangular frame represents valid data and the shaded portion represents invalid data. The schematic diagram of valid and invalid data for case 1.1 is shown in fig. 12(a).
Case 1.2: the ending offset of login is equal to the ending offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the part of login that overlaps logcmp is invalid data, the start offset of logcmp is assigned to the end offset of login to represent this, and the rest of the data is valid; otherwise, the data in the whole of logcmp is invalid. The schematic diagram of valid and invalid data for case 1.2 is shown in fig. 12(b).
Case 1.3: the ending offset of login is greater than the ending offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the part of login that overlaps logcmp is invalid data, and the start offset of logcmp is assigned to the end offset of login to represent this; otherwise, the data in the whole of logcmp is invalid. The schematic diagram of valid and invalid data for case 1.3 is shown in fig. 12(c).
Case 2: the start offset of login is equal to the start offset of logcmp
Case 2.1: end offset of login is less than end offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the data in the whole of login is invalid; otherwise, the end offset of login is assigned to the start offset of logcmp to indicate the valid data of logcmp. The schematic diagram of valid and invalid data for case 2.1 is shown in fig. 12(d).
Case 2.2: the ending offset of login is equal to the ending offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the data in the whole of login is invalid and the data in the whole of logcmp is valid; otherwise, the data in the whole of logcmp is invalid and the data in the whole of login is valid. The schematic diagram of valid and invalid data for case 2.2 is shown in fig. 12(e).
Case 2.3: the ending offset of login is greater than the ending offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the part of login that overlaps logcmp is invalid data, the rest is valid, and the end offset of logcmp is assigned to the start offset of login to represent this; otherwise, the data in the whole of logcmp is invalid and the rest is valid. The schematic diagram of valid and invalid data for case 2.3 is shown in fig. 12(f).
Case 3: the start offset of login is greater than the start offset of logcmp
Case 3.1: log (log)inIs less than the ending offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the data in the whole of login is invalid and the rest is valid; otherwise, the part of logcmp that overlaps login is invalid and the rest is valid. The schematic diagram of valid and invalid data for case 3.1 is shown in fig. 12(g).
Case 3.2: the ending offset of login is equal to the ending offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the data in the whole of login is invalid; otherwise, the part of logcmp that overlaps login is invalid, and the start offset of login is assigned to the end offset of logcmp to represent this. The schematic diagram of valid and invalid data for case 3.2 is shown in fig. 12(g).
Case 3.3: the ending offset of login is greater than the ending offset of logcmp
In this case, the version numbers of login and logcmp need to be compared. If the version number of login is less than that of logcmp, the part of login that overlaps logcmp is invalid data, the rest is valid, and the end offset of logcmp is assigned to the start offset of login to represent this; otherwise, the part of logcmp that overlaps login is invalid data, the rest is valid, and the start offset of login is assigned to the end offset of logcmp to represent this. The schematic diagram of valid and invalid data for case 3.3 is shown in fig. 12(h).
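The nine cases above reduce to a single rule: the log with the lower version number loses whatever part of its range the other log covers. A sketch of that interval subtraction follows (the tuple representation is illustrative, not from the embodiment):

```python
def resolve_overlap(log_in, log_cmp):
    """Each log is (beginOffset, endOffset, version).

    Returns the valid intervals of login and of logcmp after comparison:
    the lower-version log keeps only the parts outside the overlap."""
    b1, e1, v1 = log_in
    b2, e2, v2 = log_cmp
    lo, hi = max(b1, b2), min(e1, e2)
    if lo >= hi:  # the ranges do not intersect: both stay fully valid
        return [(b1, e1)], [(b2, e2)]
    if v1 < v2:   # login is older: the overlap inside login is invalid
        return [(s, t) for s, t in ((b1, lo), (hi, e1)) if s < t], [(b2, e2)]
    # logcmp is older: the overlap inside logcmp is invalid
    return [(b1, e1)], [(s, t) for s, t in ((b2, lo), (hi, e2)) if s < t]
```

For instance, case 1.1 with login older keeps only the prefix of login before the overlap, while case 2.2 with equal ranges invalidates the older log entirely.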
As a specific embodiment, for invalid data, the flag bit in the corresponding second metadata needs to be set from "cached" to "none", so that the data unit corresponding to the invalid data can be recovered.
In step S201, if a valid data unit does not yet exist in the flush list, the valid data unit is added to the flush list, so that the data in the valid data unit is flushed from the persistent memory to the hard disk through the flush list.
Across the nine cases above, if a valid data unit already present in the flush list is changed, the first metadata and the second metadata of the changed valid data unit need to be updated. If part of the data units corresponding to the data to be written become invalid, the first metadata and the second metadata of the data to be written need to be updated accordingly, and the data units that became invalid are inserted into the recycle list to be recycled through it. If all the data units corresponding to the data to be written become invalid, they all need to be inserted into the recycle list, so that the invalid data units are recycled through the recycle list.
In step S202, if the invalid data unit exists in the flush list, the invalid data unit is deleted from the flush list and inserted into the recycle list, so that the invalid data unit is recycled through the recycle list.
In this embodiment, in order to recycle data units in a timely and effective manner without excessively fragmenting the persistent memory space, an embodiment of the present invention further provides a specific implementation of inserting into the recycle list:
Firstly, if a data unit to be merged that satisfies a preset merging condition exists in the recycle list, the data unit to be merged is deleted from the recycle list.
In this embodiment, the preset merging condition may be that a data unit in the recycle list is contiguous in address with the invalid data unit: either the ending address of the invalid data unit is contiguous with the starting address of a data unit in the recycle list, or the starting address of the invalid data unit is contiguous with the ending address of one data unit while its ending address is contiguous with the starting address of another data unit.
In this embodiment, in order to quickly find a data unit to be merged that satisfies the preset merging condition, a red-black tree may be used to manage the data units in the persistent memory.
Secondly, the data unit to be merged and the invalid data unit are merged to obtain a data unit to be inserted.
Thirdly, the data unit to be inserted is inserted into the recycle list.
Finally, if no data unit to be merged exists in the recycle list, the invalid data unit is inserted into the recycle list.
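The insertion procedure above can be sketched as follows. This is an illustrative model only: a linear scan stands in for the red-black tree lookup the text suggests, and the `(start, end)` tuple representation of a data unit is an assumption:

```python
def insert_into_recycle_list(recycle, unit):
    """Merge `unit` with any recycle-list entries it is address-contiguous
    with (the preset merging condition), then insert the merged result.
    `recycle` is a list of (start, end) byte ranges."""
    start, end = unit
    merged = []
    for s, e in recycle:
        if e == start:          # entry ends where the freed unit begins
            start = s
        elif s == end:          # entry begins where the freed unit ends
            end = e
        else:
            merged.append((s, e))
    merged.append((start, end))
    merged.sort()
    return merged
```

A freed unit contiguous on both sides folds its two neighbours into a single entry, which is what keeps the free space from fragmenting.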
It can be understood that, when a data unit in the recycle list is recycled, since data units and metadata units are in one-to-one correspondence, the information in the metadata unit corresponding to the data unit also needs to be updated accordingly, for example by setting the value of each field in the metadata unit to 0 or to an invalid value.
It should be noted that, as an implementation, the recycle list may be a list independent of the hierarchical linked list, and recycling through the recycle list is actually the process of inserting the metadata units corresponding to the data units in the recycle list into the hierarchical linked list. Referring to fig. 13, fig. 13 is an exemplary diagram of the process of inserting into the hierarchical linked list according to an embodiment of the present invention. In fig. 13, metadata unit 3 is to be inserted, linked list 2 contains metadata units 1 and 2, and linked list 3 contains metadata units 4, 5, and 6. Since metadata unit 2 and metadata unit 4 are both contiguous with metadata unit 3, metadata units 1 to 6 may be merged, and after merging, metadata unit 1 to metadata unit 6 are inserted into linked list 6 as one management node.
The method provided by this embodiment can avoid flushing invalid data and reduce the amount of data flushed to the hard disk, thereby reducing the impact on write performance. At the same time, the data units corresponding to invalid data are released in a timely manner to improve persistent memory utilization, and data units to be recycled are merged as much as possible when inserted into the recycle list, effectively reducing fragmentation of the persistent memory space.
In this embodiment, when data stored in the storage node 10 is read, the data to be read may be stored in the persistent memory, in the hard disk, or partially in each. In order to read the data correctly, an embodiment of the present invention further provides a specific implementation of reading data. Referring to fig. 14, fig. 14 is a flowchart of another data access method provided in an embodiment of the present invention, and the method further includes the following steps:
step S300, receiving a data reading request sent by the client, where the data reading request includes a data length and a position of data to be read.
Step S301, reading original data from the hard disk according to the data length and the position of the data to be read.
Step S302, determining, according to the position to be read, whether the flush list contains the latest data that corresponds to the data to be read and has not yet been stored in the hard disk.
Step S303, the original data and the latest data are combined to obtain the data to be read.
In this embodiment, the original data and the latest data may or may not overlap. When they overlap, the latest data replaces the overlapping part of the original data to obtain the data to be read; when they do not overlap, the original data and the latest data are spliced to obtain the data to be read.
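The combining step can be sketched as patching the bytes read from the hard disk with the newer bytes still held in the flush list. The function and parameter names below are illustrative assumptions, not the patent's interface:

```python
def merge_read(original, patches, base_offset):
    """original: bytes read from the hard disk starting at base_offset.
    patches: list of (offset, data) pairs still in the flush list, i.e. the
    newest data not yet on disk. Overlapping regions of the original are
    overwritten by the patch bytes; the patched buffer is the data to be read."""
    buf = bytearray(original)
    for off, data in patches:
        rel = off - base_offset              # position inside the read window
        buf[rel:rel + len(data)] = data      # newest data wins
    return bytes(buf)
```

If the flush list holds nothing for the read range, `patches` is empty and the original data is returned unchanged, matching the no-latest-data case below.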
In this embodiment, if the flush list contains no latest data that corresponds to the data to be read and has not yet been stored in the hard disk, the original data read from the hard disk is already the latest, and in that case the original data is the data to be read.
In this embodiment, in an actual operating environment, the storage node 10 may suddenly lose power. If data was only half written at the moment of the power failure, then when the storage node 10 restarts and the location of that data is read, the data read back may be neither the latest version nor the previous version; that is, the data has become inconsistent. To avoid this inconsistency, an embodiment of the present invention further provides a process for reconstructing data after power is restored. Referring to fig. 15, fig. 15 is a flowchart of another data access method provided in an embodiment of the present invention, where the method includes the following steps:
In step S400, when the storage node is powered on, the plurality of metadata units are divided into at least one metadata unit group according to the log identifier.
In this embodiment, the log identifier may be an identifier field stored in the metadata unit; for data to be flushed to disk that was written at the same time, the log identifiers in the corresponding metadata units are the same.
In this embodiment, when the storage node 10 is powered off, the persistent memory stores data that has yet to be flushed to the hard disk; this data occupies at least one data unit and corresponds to at least one metadata unit.
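The grouping step can be sketched as follows, assuming each metadata unit exposes `log_id`, `index`, and `count` fields (illustrative names standing in for the log identification, index, and count fields described later):

```python
from collections import defaultdict

def group_metadata_units(units):
    """Group metadata units by log identifier and order each group by shard
    index. Only groups whose member count matches the recorded shard count
    (i.e. all shards survived the power loss) are returned."""
    groups = defaultdict(list)
    for u in units:
        groups[u["log_id"]].append(u)
    complete = {}
    for log_id, members in groups.items():
        members.sort(key=lambda u: u["index"])
        if len(members) == members[0]["count"]:   # all shards present
            complete[log_id] = members
    return complete
```

An incomplete group (a write interrupted before all shards persisted) simply never qualifies for replay, which is one way the scheme preserves atomicity.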
Step S401, performing reconstruction processing on each metadata unit group to write the data to be flushed to disk into the hard disk.
In this embodiment, reconstruction may be performed based on the information recorded in the metadata units of each metadata unit group.
Before the actual reconstruction, in order to ensure data reliability, as a specific implementation, the information in the first reserved area or the second reserved area may be read first, and its integrity verified by checking the CheckSum field against the magic field and the size field. If the information is incomplete, the persistent memory is considered to be initialized for the first time, and an initialization operation is performed on the entire persistent memory.
During initialization, each metadata unit is traversed, all fields except the CheckSum field are set to 0, and the check value of the metadata unit is calculated and written into the CheckSum field. Then, the whole metadata area is treated as one fragment, and the hierarchical linked list and the corresponding red-black tree are generated according to the recycling process in the foregoing embodiment. Finally, the first 8 bytes of each metadata unit (the log identification field) are persisted, and after that completes, the last 8 bytes (the count field, index field, and CheckSum field) are persisted.
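A hedged sketch of this checksum step follows. CRC-32, the 64-byte unit size, and the field offsets are all assumptions for illustration; the patent names neither the checksum algorithm nor the unit layout:

```python
import zlib

METADATA_UNIT_SIZE = 64   # assumed unit size; not specified in the text
CHECKSUM_OFFSET = 56      # assumed: CheckSum occupies the last 8 bytes

def init_metadata_unit():
    """Zero every field except CheckSum, then store the check value of the
    remaining bytes in the CheckSum field, as the initialization describes."""
    unit = bytearray(METADATA_UNIT_SIZE)            # all fields set to 0
    crc = zlib.crc32(bytes(unit[:CHECKSUM_OFFSET]))
    unit[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 8] = crc.to_bytes(8, "little")
    return bytes(unit)

def verify_metadata_unit(unit):
    """Recompute the check value and compare it with the stored CheckSum."""
    stored = int.from_bytes(unit[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 8], "little")
    return zlib.crc32(bytes(unit[:CHECKSUM_OFFSET])) == stored
```

A unit that fails this verification after a restart is treated as never initialized, which is what triggers the whole-memory initialization path above.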
If the information in both the first reserved area and the second reserved area is complete, the size field is compared with the size of the current persistent memory. If they are inconsistent, an error is reported; if they are consistent, the actual reconstruction process begins.
As a specific embodiment, the specific reconstruction process may be:
Firstly, for any target metadata unit group in the at least one metadata unit group, the target metadata unit group is reconstructed according to the shard count and shard index of the target metadata units in the group, obtaining the log corresponding to the target metadata unit group.
In this embodiment, as a specific implementation, the shard count and shard index may be obtained from the count field and index field in the metadata unit, so the metadata units can be sorted by the value of the index field, yielding the correspondingly ordered data units.
Secondly, the state of the log is read from the data unit managed by the target metadata unit with the last shard index; the state of the log indicates whether the target data, held in the data units managed by the target metadata units other than the one with the last shard index, exists in the data area.
In this embodiment, since the data unit managed by the last target metadata unit stores the second metadata, including the flag bit, the data length, the position to be written, and the check data, the flag bit, data length, and position to be written may first be verified against the check data, with subsequent processing performed only after the verification passes.
Thirdly, if the state of the log is the cached state, indicating that the target data exists in the data area, the target data is flushed to the hard disk.
Finally, the data in the data units managed by the metadata units of every metadata unit group whose log state is the cached state is flushed to the hard disk, so that all data to be flushed to disk is eventually written to the hard disk.
In this embodiment, if the state of the log is the non-cached state, indicating that the target data does not exist in the data area, the data units of the target metadata units need to be recycled. Specifically: if the state of the log is the non-cached state, the data units of the target metadata units are recycled; that is, the data units managed by the metadata units of every metadata unit group whose log state is the non-cached state are recycled.
In this embodiment, the recycling of data units has been described in step S202 of the foregoing embodiment and is not repeated here.
In order to perform the corresponding steps in the above embodiments and their possible implementations, an implementation of the data access apparatus 100 is given below. Referring to fig. 16, fig. 16 is a block diagram of a data access apparatus 100 according to an embodiment of the present invention. It should be noted that the basic principle and technical effects of the data access apparatus 100 provided in this embodiment are the same as those of the above embodiments; for brevity, reference may be made to the corresponding content of the above embodiments for anything not mentioned in this embodiment.
The data access apparatus 100 includes a receiving module 110, a generating module 120, a writing module 130, a reading module 140, a flush module 150, and a rebuilding module 160.
The receiving module 110 is configured to receive a data writing request sent by a client, where the data writing request includes a data length and a position to be written of data to be written.
Optionally, the receiving module 110 is further configured to receive a read data request sent by the client, where the read data request includes a data length and a read position of data to be read.
The generating module 120 is configured to generate first metadata for managing data to be written according to the data length.
Optionally, the generating module 120 is specifically configured to: calculating the number of segments according to the data length and the preset length; segmenting data to be written according to the number of segments to obtain at least one data segment; generating metadata of each data segment according to the number of the data segments and the position of each data segment in the data to be written; generating a reserved metadata for the data to be written, wherein the reserved metadata comprises a value obtained by adding 1 to the number of data segments; and taking the reserved metadata and the metadata of all the data segments as first metadata.
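The segmentation logic this module implements can be sketched as follows. The dict representation and field names are assumptions; the patent only specifies that each segment's metadata records the segment count and the segment's position, and that the one reserved metadata entry holds the count plus one:

```python
import math

def make_first_metadata(data, preset_len):
    """Split the payload into ceil(len/preset_len) segments and emit the
    first metadata: one entry per segment carrying the segment count and
    that segment's position in the data to be written, plus a reserved
    entry whose count field is the segment count plus 1."""
    count = math.ceil(len(data) / preset_len)
    segments = [
        {"count": count, "position": i * preset_len}
        for i in range(count)
    ]
    reserved = {"count": count + 1}
    return segments + [reserved]
```

The out-of-range value in the reserved entry makes it distinguishable from any real segment's metadata during reconstruction.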
The generating module 120 is further configured to generate second metadata for managing a log written with the data to be written according to the data length and the position to be written.
Optionally, the generating module 120 is further specifically configured to: acquiring a flag bit used for representing successful writing of data to be written into a data area; generating check data according to the flag bit, the data length and the position to be written; and taking the zone bit, the data length, the position to be written and the check data as second metadata.
The writing module 130 is configured to write the first metadata into the metadata area and write the second metadata into the data area after writing the data to be written into the data area.
Optionally, the metadata area includes a plurality of metadata units, the metadata units are managed in a hierarchical manner, each level corresponds to one linked list, each linked list includes at least one management node, each management node is configured to manage metadata units adjacent in position, the number of metadata units managed by the management nodes in the same linked list is the same, and the number of metadata units managed by the management nodes in any two linked lists is different. The writing module 130 is specifically configured to: determine a target level according to the data length and the preset length; determine first metadata units to be written and a second metadata unit to be written from a management node of the linked list corresponding to the target level, where the number of first metadata units to be written is the number of data segments and the number of second metadata units to be written is 1; sequentially write the metadata of each data segment into each first metadata unit to be written according to the position of each data segment in the data to be written; and write the reserved metadata into the second metadata unit to be written.
Optionally, the data area includes a plurality of data units, each metadata unit corresponds to one data unit, each data segment is written into the data unit corresponding to each target metadata unit, and the writing module 130 is configured to write the second metadata into the data area, and specifically configured to: taking a data unit corresponding to the second metadata unit to be written as a data unit to be written; and writing the second metadata into the data unit to be written.
Optionally, the reading module 140 is configured to: read the original data from the hard disk according to the data length and the position of the data to be read; determine, according to the position to be read, whether the flush list contains the latest data that corresponds to the data to be read and has not yet been stored in the hard disk; and combine the original data and the latest data to obtain the data to be read.
Optionally, the data area includes a plurality of data units, the storage node further includes a hard disk, a flush list, and a recycle list, the flush list includes the flush position of the data to be flushed and the data unit storing the data to be flushed, the data to be flushed being data stored in the persistent memory that is to be flushed to the hard disk, and the flush module 150 is configured to: determine valid and invalid data units according to the position to be written and the flush position; if a valid data unit does not exist in the flush list, add the valid data unit to the flush list, so that the data in the valid data unit is flushed from the persistent memory to the hard disk through the flush list; and if an invalid data unit exists in the flush list, delete the invalid data unit from the flush list and insert it into the recycle list, so that the invalid data unit is recycled through the recycle list.
Optionally, the flush module 150 is specifically configured to: if a data unit to be merged that satisfies the preset merging condition exists in the recycle list, delete the data unit to be merged from the recycle list; merge the data unit to be merged with the invalid data unit to obtain a data unit to be inserted; insert the data unit to be inserted into the recycle list; and if no data unit to be merged exists in the recycle list, insert the invalid data unit into the recycle list.
Optionally, the storage node further includes a hard disk; when the storage node is powered off, the persistent memory stores data to be flushed to the hard disk; the metadata area includes a plurality of metadata units, each including a log identifier; the data area includes a plurality of data units, with each metadata unit managing one data unit; the data to be flushed to disk exists in at least one data unit; and the rebuilding module 160 is configured to: when the storage node is powered on, divide the plurality of metadata units into at least one metadata unit group according to the log identifier; and perform reconstruction processing on each metadata unit group to write the data to be flushed to disk into the hard disk.
Optionally, each metadata unit includes a shard count, a shard index, and a log identifier, and the rebuilding module 160 is specifically configured to: for any target metadata unit group in the at least one metadata unit group, reconstruct the target metadata unit group according to the shard count and shard index of the target metadata units in the group, obtaining the log corresponding to the target metadata unit group; read the state of the log from the data unit managed by the target metadata unit with the last shard index, the state of the log indicating whether the target data in the data units managed by the other target metadata units exists in the data area; if the state of the log is the cached state, indicating that the target data exists in the data area, flush the target data to the hard disk; and flush the data in the data units managed by the metadata units of every metadata unit group whose log state is the cached state to the hard disk, so that the data to be flushed to disk is eventually written to the hard disk.
Optionally, the rebuilding module 160 is further specifically configured to: if the state of the log is the non-cached state, indicating that the target data does not exist in the data area, recycle the data units of the target metadata units; and recycle the data units managed by the metadata units of every metadata unit group whose log state is the non-cached state.
Embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the data access method as described above.
To sum up, embodiments of the present invention provide a data access method, apparatus, storage node, and storage medium, applied to a storage node that includes a persistent memory, where the persistent memory includes a metadata area and a data area and the storage node is communicatively connected to a client. The method includes: receiving a data writing request sent by the client, where the data writing request includes the data length and the position to be written of the data to be written; generating first metadata for managing the data to be written according to the data length; generating second metadata for managing the write log of the data to be written according to the data length and the position to be written; and after writing the data to be written into the data area, writing the first metadata into the metadata area and the second metadata into the data area. Compared with the prior art, the embodiments of the present invention ensure, through the first metadata and the second metadata, that writes to the persistent memory are atomic, thereby ensuring the consistency of the data in the persistent memory.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention fall within its scope. Therefore, the protection scope of the present invention shall be subject to the appended claims.

Claims (14)

1. A data access method is applied to a storage node, the storage node comprises a persistent memory, the persistent memory comprises a metadata area and a data area, and the storage node is in communication connection with a client, and the method comprises the following steps:
receiving a data writing request sent by the client, wherein the data writing request comprises the data length and the position of data to be written;
generating first metadata for managing the data to be written according to the data length;
generating second metadata for managing the write log of the data to be written according to the data length and the position to be written;
after the data to be written is written into the data area, the first metadata is written into the metadata area, and the second metadata is written into the data area.
2. The data access method of claim 1, wherein the step of generating first metadata for managing the data to be written according to the data length comprises:
calculating the number of segments according to the data length and a preset length;
segmenting the data to be written according to the number of the segments to obtain at least one data segment;
generating metadata of each data segment according to the number of the data segments and the position of each data segment in the data to be written;
generating a reserved metadata for the data to be written, wherein the reserved metadata comprises a value obtained by adding 1 to the number of the data segments;
and taking the reserved metadata and the metadata of all the data segments as the first metadata.
3. The data access method according to claim 1, wherein the step of generating second metadata for managing a write log of the data to be written according to the data length and the position to be written comprises:
acquiring a flag bit which is used for representing that the data to be written is successfully written into the data area;
generating check data according to the zone bit, the data length and the position to be written;
and taking the flag bit, the data length, the position to be written and the check data as the second metadata.
4. The data access method according to claim 2, wherein the metadata area includes a plurality of metadata units, the metadata units are managed in a hierarchical manner, each level corresponds to a linked list, each linked list includes at least one management node, each management node is used for managing metadata units adjacent to each other, the number of metadata units managed by the management nodes in the same linked list is the same, the number of metadata units managed by the management nodes in any two linked lists is different, and the step of writing the first metadata into the metadata area includes:
determining a target level according to the data length and the preset length;
determining a first metadata unit to be written and a second metadata unit to be written from a management node of a linked list corresponding to the target level, wherein the number of the first metadata unit to be written is the number of the data segments, and the number of the second metadata unit to be written is 1;
sequentially writing the metadata of each data segment into each first metadata unit to be written according to the position of each data segment in the data to be written;
and writing the reserved metadata into the second metadata unit to be written.
5. The data access method of claim 4, wherein the data area includes a plurality of data units, one data unit for each metadata unit, each of the data segments is written to the data unit corresponding to each of the metadata units, and the step of writing the second metadata to the data area includes:
taking a data unit corresponding to the second metadata unit to be written as a data unit to be written;
and writing the second metadata into the data unit to be written.
6. The data access method according to claim 1, wherein the data area includes a plurality of data units, the storage node further includes a hard disk, a flush list and a recycle list, the flush list includes a flush position of data to be flushed and a data unit storing the data to be flushed, the data to be flushed is data stored in the persistent memory and to be flushed to the hard disk, the flush position is a position to be written in a write data request for writing the data to be flushed, the method further includes:
determining a valid data unit and an invalid data unit according to the position to be written and the flush position;
if the valid data unit does not exist in the flush list, adding the valid data unit to the flush list, so as to flush the data in the valid data unit from the persistent memory to the hard disk through the flush list;
and if the invalid data unit exists in the flush list, deleting the invalid data unit from the flush list and inserting the invalid data unit into the recycle list, so as to recycle the invalid data unit through the recycle list.
7. The data access method of claim 6, wherein the step of deleting the invalid data unit from the flush list and inserting it into the recycle list comprises:
if a data unit to be merged that satisfies a preset merging condition exists in the recycle list, deleting the data unit to be merged from the recycle list;
merging the data unit to be merged and the invalid data unit to obtain a data unit to be inserted;
inserting the data unit to be inserted into the recycle list;
and if no data unit to be merged exists in the recycle list, inserting the invalid data unit into the recycle list.
8. The data access method of claim 6, wherein the method further comprises:
receiving a data reading request sent by the client, wherein the data reading request comprises the data length and the position of data to be read;
reading original data from the hard disk according to the data length and the position of the data to be read;
determining, according to the position to be read, whether the flush list contains the latest data that corresponds to the data to be read and has not yet been stored in the hard disk;
and combining the original data and the latest data to obtain the data to be read.
9. The data access method of claim 1, wherein the storage node further comprises a hard disk, the persistent memory stores data to be flushed to the hard disk when the storage node is powered down, the metadata area comprises a plurality of metadata units, each metadata unit comprises a log identifier, the data area comprises a plurality of data units, each metadata unit manages one data unit, and the data to be flushed to disk exists in at least one data unit, the method further comprising:
when the storage node is powered on, dividing the plurality of metadata units into at least one metadata unit group according to the log identification;
and performing reconstruction processing on each metadata unit group to write the data to be flushed to disk into the hard disk.
10. The data access method according to claim 9, wherein each metadata unit includes a shard count, a shard index, and a log identifier, and the step of performing reconstruction processing on each metadata unit group to write the data to be flushed to disk into the hard disk includes:
for any target metadata unit group in the at least one metadata unit group, reconstructing the target metadata unit group according to the shard count and shard index of the target metadata units in the target metadata unit group to obtain a log corresponding to the target metadata unit group;
reading the state of the log from the data unit managed by the target metadata unit with the last shard index, wherein the state of the log is used for representing whether target data in the data units managed by the target metadata units other than the one with the last shard index exists in the data area;
if the state of the log is a cached state representing that the target data exists in the data area, flushing the target data into the hard disk;
and flushing the data in the data units managed by the metadata units in each metadata unit group whose log state is the cached state into the hard disk, so that the data to be flushed to disk is finally written into the hard disk.
11. The data access method of claim 10, wherein the method further comprises:
if the state of the log is a non-cached state representing that the target data does not exist in the data area, recycling the data units of the target metadata units;
and recycling all data units managed by the metadata units in each metadata unit group whose log state is the non-cached state.
12. A data access apparatus, applied to a storage node, where the storage node includes a persistent memory, where the persistent memory includes a metadata area and a data area, and where the storage node is communicatively connected to a client, the apparatus includes:
the receiving module is used for receiving a data writing request sent by the client, wherein the data writing request comprises the data length and the position of data to be written;
the generating module is used for generating first metadata used for managing the data to be written according to the data length;
the generating module is further configured to generate second metadata for managing a write log of the data to be written according to the data length and the position to be written;
and the writing module is used for writing the first metadata into the metadata area and writing the second metadata into the data area after the data to be written is written into the data area.
13. A storage node comprising a processor and a memory; the memory is used for storing programs; the processor is configured to implement the data access method of any one of claims 1-11 when executing the program.
14. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the data access method of any one of claims 1 to 11.
CN202210323902.3A 2022-03-29 2022-03-29 Data access method, device, storage node and storage medium Pending CN114647383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210323902.3A CN114647383A (en) 2022-03-29 2022-03-29 Data access method, device, storage node and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210323902.3A CN114647383A (en) 2022-03-29 2022-03-29 Data access method, device, storage node and storage medium

Publications (1)

Publication Number Publication Date
CN114647383A true CN114647383A (en) 2022-06-21

Family

ID=81994822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210323902.3A Pending CN114647383A (en) 2022-03-29 2022-03-29 Data access method, device, storage node and storage medium

Country Status (1)

Country Link
CN (1) CN114647383A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115981875A (en) * 2023-03-21 2023-04-18 人工智能与数字经济广东省实验室(广州) Incremental update method, apparatus, device, medium, and product for memory storage systems
CN115981875B (en) * 2023-03-21 2023-08-25 人工智能与数字经济广东省实验室(广州) Incremental updating method, device, equipment, medium and product of memory storage system
CN116431080A (en) * 2023-06-09 2023-07-14 苏州浪潮智能科技有限公司 Data disc-dropping method, system, equipment and computer readable storage medium
CN116431080B (en) * 2023-06-09 2023-08-29 苏州浪潮智能科技有限公司 Data disc-dropping method, system, equipment and computer readable storage medium
CN116560585A (en) * 2023-07-05 2023-08-08 支付宝(杭州)信息技术有限公司 Data hierarchical storage method and system
CN116560585B (en) * 2023-07-05 2024-04-09 支付宝(杭州)信息技术有限公司 Data hierarchical storage method and system
CN117891409A (en) * 2024-03-13 2024-04-16 济南浪潮数据技术有限公司 Data management method, device, equipment and storage medium for distributed storage system

Similar Documents

Publication Publication Date Title
EP3726364B1 (en) Data write-in method and solid-state drive array
CN114647383A (en) Data access method, device, storage node and storage medium
TWI765289B (en) storage system
US11422703B2 (en) Data updating technology
US20200320036A1 (en) Data unit cloning in memory-based file systems
US9946655B2 (en) Storage system and storage control method
US11023318B1 (en) System and method for fast random access erasure encoded storage
JP6026538B2 (en) Non-volatile media journaling of validated datasets
US8214620B2 (en) Computer-readable recording medium storing data storage program, computer, and method thereof
CN113868192B (en) Data storage device and method and distributed data storage system
KR101077904B1 (en) Apparatus and method for managing flash memory using page level mapping algorithm
US8370587B2 (en) Memory system storing updated status information and updated address translation information and managing method therefor
WO2019184012A1 (en) Data writing method, client server, and system
WO2015054897A1 (en) Data storage method, data storage apparatus, and storage device
CN112631950B (en) L2P table saving method, system, device and medium
CN113568582B (en) Data management method, device and storage equipment
CN110147203B (en) File management method and device, electronic equipment and storage medium
CN111124266A (en) Data management method, device and computer program product
CN112799595A (en) Data processing method, device and storage medium
US11487428B2 (en) Storage control apparatus and storage control method
KR101077901B1 (en) Apparatus and method for managing flash memory using log block level mapping algorithm
JP2006099802A (en) Storage controller, and control method for cache memory
KR101618999B1 (en) Network boot system
US8364905B2 (en) Storage system with middle-way logical volume
CN116893783A (en) Operating system data management method and device, solid state disk and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination