CN116400855A - Data processing method and data storage system - Google Patents


Info

Publication number
CN116400855A
CN116400855A (application CN202310232546.9A)
Authority
CN
China
Prior art keywords
partition
protocol processing
metadata
data storage
partitions
Prior art date
Legal status
Pending
Application number
CN202310232546.9A
Other languages
Chinese (zh)
Inventor
裴晓辉
李文兆
龚晓峰
邹勇波
熊杉杉
陈亮
汪先登
石兆斌
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority: CN202310232546.9A
Publication: CN116400855A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0611 Improving I/O performance in relation to response time
    • G06F3/0632 Configuration or reconfiguration of storage systems by initialisation or re-initialisation
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

Embodiments of the present invention provide a data processing method and a data storage system. The data storage system is configured with a data storage partition for storing metadata, and the method comprises: creating, in the data storage system, a protocol processing partition separate from the data storage partition, where the protocol processing partition handles access scheduling for the metadata; and splitting the current protocol processing partition according to the access conditions for the metadata, so that access requests are distributed across the split protocol processing partitions for service processing. By separating the protocol processing layer of the metadata from its storage layer and splitting access requests by means of the lightweight, rapid splitting capability of the protocol processing partition, the embodiments can cope with high-concurrency access scenarios such as burst traffic and hot-spot access, improving the overall scalability of the service.

Description

Data processing method and data storage system
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a data processing method and a data storage system.
Background
In PDS (Drive & Photo Service) there is massive metadata; to manage it, a partition (Partition) management mode may be used to create data storage partitions.
Given the high-concurrency access characteristic of services such as PDS, system performance can be improved by splitting the data storage partition. However, splitting a data storage partition consumes considerable resources and time, making high-frequency splitting impractical; it therefore adapts poorly to high-concurrency scenarios such as burst traffic and hot-spot access, limiting the overall scalability of the service.
Disclosure of Invention
In view of the above, a data processing method and a data storage system are proposed that overcome, or at least partially solve, the above problems, comprising:
a data processing method, applied to a data storage system configured with a data storage partition for storing metadata, the method comprising:
creating, in the data storage system, a protocol processing partition separate from the data storage partition, the protocol processing partition being used to handle access scheduling for the metadata;
splitting the current protocol processing partition according to the access conditions for the metadata, so as to distribute access requests across the split protocol processing partitions for service processing.
Optionally, the method further comprises:
while the current protocol processing partition processes an access request for metadata, writing target metadata in a local cache to a log file, and synchronizing the log file to the data storage partition; wherein the target metadata is metadata updated based on the access request.
Optionally, before writing the target metadata in the local cache to the log file, the method further includes:
the required metadata is read from the data storage partition and stored to the local cache.
Optionally, before synchronizing the log file to the data storage partition, further comprising:
determining whether the amount of metadata written to the log file exceeds a preset threshold, and synchronizing the log file to the data storage partition when it does.
Optionally, synchronizing the log file to the data storage partition includes:
updating the index file of the data storage partition according to the log file, and notifying all data storage partitions to reload the updated index file.
Optionally, splitting the current protocol processing partition includes:
dividing the log file of the current protocol processing partition into a plurality of sub-log files;
loading one sub-log file into each of a plurality of new protocol processing partitions; wherein the plurality of new protocol processing partitions provide service as the split protocol processing partitions.
Optionally, the method further comprises:
recording the newly added target metadata, i.e., target metadata added to the current protocol processing partition after the log file is divided and before the partition is set to the forbidden service state;
synchronizing the newly added target metadata to the plurality of new protocol processing partitions.
Optionally, after synchronizing the newly added target metadata to the plurality of new protocol processing partitions, further comprising:
sending a split-completion message to the plurality of new protocol processing partitions by means of two-phase commit, so that the plurality of new protocol processing partitions begin to provide service as the split protocol processing partitions.
Optionally, the method further comprises:
the current protocol processing partition is set to a forbidden service state.
Optionally, the method further comprises:
the number of copies of the data storage partition is configured according to the access conditions for the metadata.
Optionally, the data storage partition and the protocol processing partition in the data storage system adopt a distributed architecture, and the data storage partition stores metadata in the form of key value pairs.
A data storage system configured with a data storage partition for storing metadata, and a computer program executed in the data storage system, the computer program, when executed by a processor, implementing the data processing method above.
An electronic device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program implementing a data processing method as above when executed by the processor.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements a data processing method as above.
The embodiments of the invention have the following advantages:
in the embodiments of the invention, a protocol processing partition separate from the data storage partition is created in the data storage system: the data storage partition stores the metadata, while the protocol processing partition handles access scheduling for it. The current protocol processing partition is then split according to the access conditions for the metadata, so that access requests are distributed across the split protocol processing partitions for service processing. This separates the protocol processing layer of the metadata from its storage layer and splits access requests by means of the lightweight, rapid splitting capability of the protocol processing partition, allowing the system to cope with high-concurrency scenarios such as burst traffic and hot-spot access and improving the overall scalability of the service.
Drawings
To illustrate the technical solutions of the present invention more clearly, the drawings required in the description are briefly introduced below. The drawings described below are only some embodiments of the invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart illustrating steps of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another data processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating steps of another data processing method according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating steps of another data processing method according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating another data processing method according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention become more readily apparent, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the invention.
According to the embodiments of the invention, based on the storage and access characteristics of metadata in services such as PDS, an architecture is adopted in which the protocol processing layer of the metadata is separated from its storage layer, and access requests are split by means of the lightweight, rapid splitting capability of the protocol processing partition.
For high availability, the rapid splitting of the protocol processing partition keeps the impact window short, while the multi-copy capability of the data storage partition ensures that if one copy goes down, the remaining copies can still serve file access. For high performance, more nodes provide service after the protocol processing partition splits, analogous to the multi-copy capability of the data storage partition. For high scalability, linear expansion of system capacity is achieved through the splitting of protocol processing partitions combined with the multi-copy capability of the data storage partitions.
Referring to fig. 1, a flowchart illustrating steps of a data processing method according to an embodiment of the present invention is shown, where a data storage system may be a data storage system for metadata in a service such as PDS.
To manage the massive metadata of services such as PDS, a partition management mode may be adopted in the data storage system, which is configured with a data storage partition for storing metadata. In one example, the data storage partition may store metadata in the form of key-value pairs (KV), i.e., the data storage partition is a KV Partition; the metadata may include directory-tree-based data.
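As a concrete illustration of directory-tree metadata stored as key-value pairs, the sketch below maps one entry to a KV pair; the key layout, field names, and use of JSON are assumptions for illustration, not the patent's actual encoding.

```python
import json

def make_kv(parent_id, name, attrs):
    """Key = parent directory id + file name; value = serialized attributes."""
    key = f"{parent_id}/{name}"
    return key, json.dumps(attrs, sort_keys=True)

store = {}  # stand-in for one KV Partition

key, value = make_kv(1, "photo.jpg", {"size": 2048, "atime": 1700000000})
store[key] = value
```

Keying each file under its parent directory keeps the entries of one directory adjacent, which suits directory-tree lookups.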
Specifically, the method comprises the following steps:
step 101, creating a protocol processing partition separated from a data storage partition in a data storage system; wherein the protocol processing partition is for processing access scheduling for metadata.
Given the high-concurrency access characteristic of services such as PDS, system performance can be improved by splitting the data storage partition. However, splitting a data storage partition consumes considerable resources and time, making high-frequency splitting impractical; it therefore adapts poorly to high-concurrency scenarios such as burst traffic and hot-spot access, limiting the overall scalability of the service.
Specifically, splitting a primary data storage partition involves splitting the metadata itself: after the split, each child data storage partition first loads the data index of the metadata, and then obtains the latest metadata by replaying the log (redo log) of the parent data storage partition, a process that is expensive in both resources and time.
Based on this, the protocol processing layer of the metadata can be separated from the storage layer, and a protocol processing partition (Protocol Partition) separated from the data storage partition can be created by employing a partition management mode for the protocol processing layer.
The protocol processing partition handles access scheduling for metadata. It is a logical partition and the smallest scheduling unit for user I/O access, and implements access scheduling for metadata in services such as PDS. For example, the protocol processing partition is responsible for the file-creation flow: newly created data is written to the data storage partition, and the data storage partition manages the real metadata.
Since the protocol processing partition caches only recently accessed metadata, only that metadata needs to be migrated during a split. The data volume is generally small, often only at the MB level, so the split can complete quickly.
In an embodiment of the present invention, in order to adapt to the situation that services such as PDS have massive metadata, a data storage partition and a protocol processing partition in a data storage system may adopt a distributed architecture, that is, a plurality of distributed data storage partitions may be set, and a plurality of distributed protocol processing partitions may be set, as shown in fig. 2, where a plurality of protocol processing partitions PP1 to PPn and a plurality of data storage partitions KP1 to KPn are configured in the data storage system.
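A minimal sketch of how an access request might be routed to one of the distributed protocol processing partitions PP1 to PPn of Fig. 2; the hash-based routing and all names here are illustrative assumptions, as the patent does not specify a routing scheme.

```python
import hashlib

def route(key, n_partitions):
    """Deterministically map a metadata key to a protocol partition index."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions

# The same key always lands on the same protocol processing partition,
# so its cached metadata and log records stay together.
idx = route("dir1/file.txt", 8)
```

Deterministic routing matters here because the protocol partition is the minimum scheduling unit for user I/O: sending a key to two different partitions would split its dirty-data log.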
In an embodiment of the present invention, further includes:
In the process of processing an access request aiming at metadata by a current protocol processing partition, writing target metadata in a local cache into a log file, and synchronizing the log file to a data storage partition; wherein the target metadata is metadata updated based on the access request.
After the protocol processing partition is created, the access request of the application is processed by the protocol processing partition, the metadata in the data storage partition is read-only and not modified, and all modifications are completed on the protocol processing partition side.
In the process that the current protocol processing partition processes the access request aiming at the metadata, the recently accessed metadata can be cached in the local cache of the current protocol processing partition, the target metadata in the local cache can be determined, and the target metadata are metadata updated based on the access request, namely dirty data.
In particular, metadata may consist of multiple attributes; if the value of any attribute changes, the metadata becomes dirty data, i.e., target metadata. For example, when an application accesses a file, the value of its last-access-time (atime) attribute is updated, so the metadata of the accessed file becomes dirty data, i.e., target metadata.
After the target metadata is determined, it can be written to a log (Log) file. Through this interaction of synchronizing updated metadata from the protocol processing layer to the data storage layer, the two layers are decoupled and can scale their partitions independently of each other. In addition, because the metadata of recently modified files is stored in the current protocol processing partition's own log file, even if that partition goes down or fails, the metadata can be restored by loading the log and service can continue to be provided.
In an example, if the target metadata cannot be written to the log file, it is determined that the request processing fails, the processing is ended, and the result of the processing failure may be fed back to the application.
After the target metadata is written to the log file, the log file can be synchronized to the data storage partition. Synchronizing the log file carrying the updated metadata has two benefits: first, the latest metadata becomes visible to other protocol processing partitions through the data storage partition; second, once the log file has been synchronized, the current protocol processing partition no longer needs to maintain it, which keeps the partition's private log small and ensures the data volume involved in a split remains controllable.
In one embodiment of the present invention, before writing the target metadata in the local cache to the log file, the method further includes:
the required metadata is read from the data storage partition and stored to the local cache.
For an access request, the metadata to be accessed is determined and the local cache of the current protocol processing partition is searched first. If the required metadata is in the local cache, it can be accessed directly; otherwise it is read from the data storage partition and then loaded into the local cache of the current protocol processing partition.
The protocol processing partition and the data storage partition may be deployed in the same cluster or in different clusters, and may be loaded by different servers. They communicate via RPC (Remote Procedure Call); in particular, the protocol processing partition reads metadata from the data storage partition through RPC.
In one example, if the required metadata cannot be read from the data storage partition, it is determined that the request processing failed, the processing is ended, and the result of the processing failure may be fed back to the application.
In one embodiment of the present invention, before synchronizing the log file to the data storage partition, further comprising:
and judging whether the number of the metadata written in the log file is larger than a preset number threshold value, and executing the synchronization of the log file to the data storage partition under the condition that the number of the metadata is larger than the preset number threshold value.
In a specific implementation, a preset threshold, such as 10 MB, may be set. Whether the amount of metadata written to the log file exceeds the threshold is checked: if not, writing continues; if so, a background operation is triggered to synchronize the log file to the data storage partition.
The above protocol processing partition processing read-write flow is exemplarily described below with reference to fig. 2:
a1. Check whether the accessed metadata is in the local cache; if so, process the access request, otherwise jump to step a5;
a2. Write the dirty data produced while processing the request (i.e., the target metadata) to the Log (i.e., the log file); if the write fails, the request fails, jump to step a4;
a3. Check whether the amount of dirty data in the Log exceeds a certain threshold (i.e., the preset threshold); if so, trigger a background operation to synchronize the Log to the KV Partition;
a4. End processing and return the result to the application;
a5. Read the metadata from the KV Partition via RPC; if successful, update the local cache and jump to step a1; otherwise the request fails, end processing and return the result to the application.
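The steps above can be sketched as follows. All class and field names are illustrative assumptions, and the RPC to the KV Partition is replaced by a plain dictionary lookup.

```python
class ProtocolPartition:
    def __init__(self, kv_store, log_threshold=3):
        self.kv = kv_store        # stand-in for the KV Partition (reached via RPC)
        self.cache = {}           # recently accessed metadata (step a1)
        self.log = []             # dirty-data log file (step a2)
        self.log_threshold = log_threshold
        self.synced = []          # records completed background syncs (step a3)

    def access(self, key, update=None):
        if key not in self.cache:                   # a5: read-through on miss
            if key not in self.kv:
                return None                         # read failed: end (a4)
            self.cache[key] = dict(self.kv[key])
        if update:                                  # request produces dirty data
            self.cache[key].update(update)
            self.log.append((key, dict(update)))    # a2: write dirty data to Log
            if len(self.log) > self.log_threshold:  # a3: threshold exceeded?
                self.synced.append(list(self.log))  # trigger background sync
                self.log.clear()
        return self.cache[key]                      # a4: return result

kv_store = {"dir1/file": {"size": 1}}
pp = ProtocolPartition(kv_store, log_threshold=1)
result = pp.access("dir1/file", {"atime": 2})  # cache miss, then a dirty write
```

Note that the KV store itself is never modified on the request path: all updates live in the cache and the Log until the background sync runs, matching the statement above that metadata in the data storage partition is read-only during request processing.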
In one embodiment of the present invention, synchronizing a log file to a data storage partition includes:
and updating the index files of the data storage partitions according to the log files, and notifying all the data storage partitions to reload the updated index files.
In practical application, the existing index file in the data storage partition can be updated according to the log file carrying the target metadata, i.e., the new entries are merged into the existing index file to obtain the full set of updated files; all data storage partitions are then notified to reload the updated index file, completing the synchronization.
Specifically, the log file may be serialized into a storage file in a specified format, the format used when the protocol processing partition synchronizes target metadata to the data storage partition, for example a Cpt (Checkpoint) file, and then submitted to a specified service, i.e., the service that handles synchronizing log files from the protocol processing partition to the data storage partition, for example the Coordinator service.
The specified service merges the storage file with the currently existing index file (MetaInfo, the file holding the list of metadata index files in the data storage partition) to obtain a new index file, then notifies all data storage partitions to reload the new index file and notifies the current protocol processing partition that the log file has been synchronized. After synchronization completes, the current protocol processing partition may reclaim the log space.
The above log file synchronization process is exemplarily described below with reference to fig. 2:
b1. Serialize the Log into a Cpt and submit it to the Coordinator;
b2. The Coordinator merges the Cpt with the current MetaInfo to obtain a new MetaInfo;
b3. The Coordinator notifies all KV Partitions to reload the new MetaInfo;
b4. The Coordinator notifies the Protocol Partition that Log synchronization is complete;
b5. The Protocol Partition receives the synchronization-completion message, reclaims the Log space, and finishes the synchronization.
Because the synchronization of a Cpt is idempotent, if any of steps b1 to b4 fails, the current synchronization round is simply abandoned and retried after a period of time.
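Steps b1 to b5 can be sketched as a single synchronization round. Coordinator, Cpt, and MetaInfo are the patent's terms, but the data shapes and the function signature are assumptions; because applying a Cpt is a pure merge, re-running a failed round produces the same result, which is what makes the retry-on-failure policy above safe.

```python
def sync_round(log, meta_info, kv_partitions):
    """One b1-b5 round: checkpoint the Log, merge, fan out, reclaim."""
    cpt = dict(log)                      # b1: serialize the Log into a Cpt
    new_meta = {**meta_info, **cpt}      # b2: merge Cpt with the current MetaInfo
    for kp in kv_partitions:             # b3: every KV Partition reloads it
        kp["meta_info"] = new_meta
    log.clear()                          # b4/b5: sync complete, reclaim Log space
    return new_meta

log = [("inode:1", {"atime": 5}), ("inode:2", {"size": 9})]
meta_info = {"inode:1": {"atime": 0}}
kv_partitions = [{"meta_info": meta_info}, {"meta_info": meta_info}]
new_meta = sync_round(log, meta_info, kv_partitions)
```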
Step 102, splitting the current protocol processing partition according to the access condition aiming at the metadata so as to distribute the access request to the split multiple protocol processing partitions for service processing.
In practical application, the access conditions of services such as PDS on the metadata can be monitored, and the current protocol processing partition can then be split accordingly, for example split rapidly under high-concurrency scenarios such as burst traffic and hot-spot access, improving the system's ability to adapt to burst traffic.
After splitting the current protocol processing partition, the access traffic can be effectively split to more instances (i.e., the split protocol processing partition), and the overall capability of the system is improved.
Because an application generally does not access a large number of files within a short time, the amount of data to synchronize when splitting the current protocol processing partition is generally small, so the split can complete quickly, ensuring the protocol processing partition can cope with burst traffic.
In one embodiment of the present invention, splitting a current protocol processing partition includes:
dividing the log file of the current protocol processing partition into a plurality of sub-log files; loading one sub-log file into each of a plurality of new protocol processing partitions; wherein the plurality of new protocol processing partitions provide service as the split protocol processing partitions.
During the split of the current protocol processing partition, its log file can be divided into a plurality of sub-log files. A plurality of new protocol processing partitions can then be selected as the split protocol processing partitions, each new partition loading one of the sub-log files, so that all sub-log files produced by the split are migrated to the new protocol processing partitions.
Specifically, the current protocol processing partition splits its log file into a plurality of sub-log files and then notifies the specified service, such as the Coordinator, that it needs to split. The specified service allocates a plurality of new protocol processing partitions and designates, from the sub-log files, an initial log file for each new protocol processing partition.
After each new protocol processing partition has loaded its sub-log file, it notifies the specified service, which in turn tells the current protocol processing partition which protocol processing partitions it was split into.
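The split step described above can be sketched as follows. Hashing records into sub-logs is an illustrative assumption, since the patent does not specify how the log file is divided.

```python
def split_log(log, n):
    """Cut the parent partition's log into n sub-log files by record key."""
    sub_logs = [[] for _ in range(n)]
    for key, value in log:
        sub_logs[hash(key) % n].append((key, value))
    return sub_logs

parent_log = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
# Each new protocol processing partition loads exactly one sub-log file.
new_partitions = [{"log": sub} for sub in split_log(parent_log, 2)]
```

Whatever the division rule, every record of the parent log must end up in exactly one sub-log, so no recently modified metadata is lost across the split.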
In an embodiment of the present invention, further includes:
recording the newly added target metadata and synchronizing it to the plurality of new protocol processing partitions; the newly added target metadata is target metadata added to the current protocol processing partition after the log file is divided and before the partition is set to the forbidden service state.
That is, after the current protocol processing partition divides the log file and before it is set to the forbidden service state (it has prepared the sub-log files for the new protocol processing partitions but is still providing service), any newly added target metadata, i.e., new dirty data, can be recorded during this period.
After the sub-log files are loaded into the new protocol processing partitions, the current protocol processing partition knows which partitions it was split into, so it can synchronize the newly added target metadata to them. For example, if the current protocol processing partition P0 holds 100 pieces of newly added target metadata, they can be divided evenly in two, with 50 pieces synchronized to each of the new protocol processing partitions P1 and P2.
In one embodiment of the present invention, after synchronizing the newly added target metadata to the plurality of new protocol processing partitions, the method further includes:
Sending a splitting completion message to the plurality of new protocol processing partitions by means of two-phase commit, so that the plurality of new protocol processing partitions begin to serve as the split protocol processing partitions.
After the newly added target metadata is synchronized to the new protocol processing partitions, a splitting completion message can be sent to them by way of two-phase commit, which ensures that every new protocol processing partition learns that splitting is complete; once a new protocol processing partition learns this, it can begin to provide service.
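The two-phase commit of the splitting completion message can be sketched as follows. The class and method names are illustrative assumptions: phase one collects prepare acknowledgements from every new partition, and only then does phase two tell them to start serving.

```python
class NewPartition:
    def __init__(self, name):
        self.name = name
        self.prepared = False
        self.serving = False

    def prepare_split_complete(self):
        # Phase 1: acknowledge that the splitting-completion message was received.
        self.prepared = True
        return True

    def commit_split_complete(self):
        # Phase 2: begin to serve as a split protocol processing partition.
        self.serving = True

def announce_split_complete(partitions):
    # Phase 1: every new partition must acknowledge before anyone commits,
    # which guarantees each one learns that splitting is complete.
    if not all(p.prepare_split_complete() for p in partitions):
        return False
    # Phase 2: all partitions are told to start providing service.
    for p in partitions:
        p.commit_split_complete()
    return True
```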
In an embodiment of the present invention, the method further includes:
the current protocol processing partition is set to a forbidden service state.
After splitting is completed, the current protocol processing partition may be set to the forbidden service state and the specified service notified; the specified service may then reclaim the current protocol processing partition, ending the splitting process.
In an example, during the splitting of the current protocol processing partition, the specified service may periodically check the progress of the split; if a failure or timeout occurs, the specified service may force the split state to be rolled back, so that the current protocol processing partition continues to provide service and the new protocol processing partitions are discarded.
The above-described splitting process is exemplified below in conjunction with fig. 2:
assuming Protocol Partition is split in two, P0 is the partition to be split (i.e., the current protocol processing partition), and P1 and P2 are the sub-partitions after splitting (i.e., the new protocol processing partitions), as follows:
c1, P0 splits the Log (i.e., the log file) into Log1 and Log2 (i.e., the sub-log files) according to the splitting rule, records newly added dirty data (i.e., newly added target metadata) in the meantime, and then notifies the Coordinator that it needs to split;
c2, the Coordinator allocates two new partitions P1 and P2, and designates the initial Logs of P1 and P2 as Log1 and Log2, respectively;
c3, P1 and P2 load Log1 and Log2 respectively, and notify the Coordinator after loading is finished;
c4, the Coordinator notifies P0 that its sub-partitions after splitting are P1 and P2;
c5, P0 synchronizes the newly added dirty data to P1 and P2; when synchronization is complete, a two-phase commit ensures that both P1 and P2 are informed that splitting is complete, and P1 and P2 begin to serve once informed;
c6, P0 is set to the forbidden service state, and the Coordinator is notified that splitting is complete;
c7, the Coordinator reclaims P0, and the splitting process is complete.
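Steps c1 to c7 above can be sketched as follows. The class names and the halving rule are illustrative assumptions, and the dirty-data synchronization and two-phase commit of step c5 are omitted from this sketch.

```python
class Partition:
    def __init__(self, name, log=None):
        self.name = name
        self.log = log or []
        self.serving = True

    def split_log(self):
        # c1: split the Log into Log1 and Log2 (a simple halving rule here).
        half = len(self.log) // 2
        return self.log[:half], self.log[half:]

class Coordinator:
    def split(self, p0):
        log1, log2 = p0.split_log()        # c1: P0 splits its Log
        p1 = Partition("P1", log1)         # c2: allocate P1 and P2 and
        p2 = Partition("P2", log2)         #     assign their initial Logs
        # c3/c4: in this sketch "loading" happens on construction, and P0
        # learns its children via the return value; c5 is omitted.
        p0.serving = False                 # c6: P0 enters the forbidden
                                           #     service state
        return p1, p2                      # c7: the Coordinator reclaims P0
```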
In an embodiment of the present invention, the method further includes:
The number of copies of the data storage partition is configured according to the access conditions for the metadata.
In practice, since the data storage partitions manage read-only metadata, when certain metadata becomes hot-spot data, the number of copies of the data storage partition serving that hot-spot data may be increased.
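As a rough sketch of such a policy, the replica count might scale with read load. The function name, base replica count, and per-replica capacity below are illustrative assumptions, not values from the embodiment.

```python
import math

def target_replica_count(read_qps, base_replicas=3, qps_per_replica=10_000):
    """Keep the base replica count, adding replicas as read load on the
    (read-only) data storage partition grows past per-replica capacity."""
    return max(base_replicas, math.ceil(read_qps / qps_per_replica))
```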
In the embodiment of the invention, a protocol processing partition separated from the data storage partition is created in the data storage system: the data storage partition is used for storing metadata, and the protocol processing partition is used for processing access scheduling for the metadata. The current protocol processing partition is then split according to the access conditions for the metadata, so that access requests are distributed to the plurality of split protocol processing partitions for service processing. This separates the protocol processing layer of the metadata from its storage layer and splits access requests by means of the light, rapid splitting capability of the protocol processing partition, so that high-concurrency access scenarios such as burst traffic and hot-spot access can be handled and the overall scalability of the service is improved.
Compared with the method for managing directory tree metadata based on distributed shared block storage, the method has the following advantages:
1. In the mode based on distributed shared block storage, the protocol layer has no Log capability, so the IO path is longer and latency is higher. In the embodiment of the present invention, the protocol processing partition caches recently accessed data through a temporary Log mechanism, which shortens the IO path, lowers latency, and improves write performance.
2. In the mode based on distributed shared block storage, consistency depends on the capabilities of the shared blocks, the implementation is generally complex, and Failover takes a long time. In the embodiment of the present invention, when a protocol processing partition is reloaded after a Failover, recently accessed metadata can be restored by replaying the Log, so Failover has less impact on access.
3. The mode based on distributed shared block storage cannot provide multiple copies for hot-spot data, so its capability to cope with hot spots and burst traffic is limited.
Referring to fig. 3, a flowchart illustrating steps of another data processing method according to an embodiment of the present invention may specifically include the following steps:
Step 301, creating a protocol processing partition separated from a data storage partition in a data storage system; wherein the protocol processing partition is for processing access scheduling for metadata.
Step 302, in the process of processing an access request for metadata by a current protocol processing partition, writing target metadata in a local cache into a log file, and synchronizing the log file to a data storage partition; wherein the target metadata is metadata updated based on the access request.
After the protocol processing partition is created, access requests from applications are processed by the protocol processing partition; the metadata in the data storage partition is read-only and is not modified, and all modifications are completed on the protocol processing partition side.
During the processing of an access request for metadata by the current protocol processing partition, recently accessed metadata can be cached in the local cache of the current protocol processing partition, and the target metadata in the local cache can be determined; the target metadata is metadata updated based on the access request, namely dirty data.
In particular, metadata may be composed of a plurality of attributes; if the value of any attribute is changed, the metadata becomes dirty data, i.e., target metadata. For example, when an application accesses a file, the value of its last access time (atime) attribute is updated; because the application access changed it, the metadata of the accessed file is regarded as dirty data, i.e., target metadata.
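The dirty-marking rule can be sketched as follows. The `Metadata` class and its attribute names (`atime`, `size`) are illustrative assumptions.

```python
class Metadata:
    """Metadata as a bag of attributes; changing any attribute marks it dirty."""
    def __init__(self, **attrs):
        self.attrs = dict(attrs)
        self.dirty = False      # becomes target metadata once any value changes

    def set(self, name, value):
        if self.attrs.get(name) != value:
            self.attrs[name] = value
            self.dirty = True   # one updated attribute dirties the whole entry

# An application access updates atime, so the file's metadata becomes dirty.
meta = Metadata(atime=100, size=4096)
meta.set("atime", 200)
```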
After the target metadata is determined, it can be written into a Log (log) file. Through this interaction of synchronizing metadata updated at the protocol processing layer to the data storage layer, the protocol processing layer and the data storage layer are separated from each other, so that each layer can scale its partitions independently of the other. In addition, the metadata of recently modified files is stored in the log file unique to the current protocol processing partition, so even if the current protocol processing partition goes down or fails, the metadata can be restored by loading the log, and services can continue to be provided externally.
In an example, if the target metadata cannot be written to the log file, it is determined that the request processing fails, the processing is ended, and the result of the processing failure may be fed back to the application.
After the target metadata is written into the log file, the log file can be synchronized to the data storage partition. Synchronizing the log file carrying the updated metadata to the data storage partition makes the latest metadata visible to other protocol processing partitions through the data storage partition; moreover, once the log file has been synchronized, the current protocol processing partition no longer needs to maintain it, which compresses the amount of log data unique to the current protocol processing partition and keeps the volume of data involved in a split controllable.
Step 303, splitting the current protocol processing partition according to the access condition for the metadata, so as to distribute the access request to the split multiple protocol processing partitions for service processing.
Referring to fig. 4, a flowchart illustrating steps of another data processing method according to an embodiment of the present invention may specifically include the following steps:
step 401, creating a protocol processing partition separated from a data storage partition in a data storage system; wherein the protocol processing partition is for processing access scheduling for metadata.
In step 402, during the processing of the access request for metadata by the current protocol processing partition, the required metadata is read from the data storage partition and stored in the local cache.
For an access request, the metadata to be accessed can be determined, and the local cache of the current protocol processing partition is searched first. If the local cache holds the required metadata, the metadata can be accessed directly; if not, the required metadata can be read from the data storage partition and then updated into the local cache of the current protocol processing partition.
The protocol processing partition and the data storage partition may be deployed in the same cluster but loaded by different servers. They may communicate through RPC (Remote Procedure Call), by which the protocol processing partition reads metadata from the data storage partition.
In one example, if the required metadata cannot be read from the data storage partition, it is determined that the request processing failed, the processing is ended, and the result of the processing failure may be fed back to the application.
Step 403, writing the target metadata in the local cache into a log file; wherein the target metadata is metadata updated based on the access request.
Step 404, determining whether the number of metadata written in the log file is greater than a preset number threshold, and synchronizing the log file to the data storage partition if the number of metadata is greater than the preset number threshold.
In a specific implementation, a preset threshold, such as 10 MB, may be set. Whether the amount of metadata written into the log file exceeds the preset threshold can then be checked: if not, writing continues, and if so, a background operation is triggered to synchronize the log file to the data storage partition.
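The threshold check can be sketched as follows. For simplicity the sketch counts entries rather than bytes, and the class and attribute names are illustrative assumptions.

```python
class ProtocolPartition:
    def __init__(self, threshold=3):
        self.log = []          # dirty metadata awaiting synchronization
        self.threshold = threshold
        self.storage = []      # stands in for the data storage partition

    def write_dirty(self, entry):
        self.log.append(entry)
        # Trigger a background sync once the log exceeds the preset threshold.
        if len(self.log) > self.threshold:
            self.sync_to_storage()

    def sync_to_storage(self):
        self.storage.extend(self.log)
        self.log.clear()       # log space can be reclaimed after the sync
```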
The read-write processing flow of the protocol processing partition is exemplarily described below with reference to fig. 2:
a1, check whether the accessed metadata is in the local cache; if so, process the access request, otherwise jump to step a5;
a2, write the dirty data (i.e., target metadata) produced during request processing into the Log (i.e., log file); if writing fails, the request processing fails, jump to step a4;
a3, check whether the amount of dirty data in the Log exceeds a certain threshold (i.e., the preset threshold); if so, trigger a background operation to synchronize the Log to the KV Partition;
a4, end the processing and return the result to the application;
a5, read the metadata from the KV Partition through RPC; if successful, update the local cache and jump to step a1; otherwise the request processing fails, end the processing, and return the result to the application.
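Steps a1 to a5 can be sketched as follows. The dictionaries standing in for the local cache, the KV Partition, and the Log are illustrative assumptions; the RPC read is simulated as a dictionary lookup, and the background sync of step a3 is omitted.

```python
def handle_request(key, new_value, cache, kv_partition, log):
    # a1: check whether the accessed metadata is in the local cache.
    if key not in cache:
        # a5: on a miss, read from the KV Partition (RPC simulated here);
        # a read failure ends the processing with a failed result.
        if key not in kv_partition:
            return "failed"
        cache[key] = kv_partition[key]
    cache[key] = new_value             # process the access request
    log.append((key, new_value))       # a2: write the dirty data into the Log
    return "ok"                        # a4: return the result to the application
```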
Step 405, splitting the current protocol processing partition according to the access condition for the metadata, so as to distribute the access request to the split multiple protocol processing partitions for service processing.
Referring to fig. 5, a flowchart illustrating steps of another data processing method according to an embodiment of the present invention may specifically include the following steps:
Step 501, creating a protocol processing partition separate from a data storage partition in a data storage system; wherein the protocol processing partition is for processing access scheduling for metadata.
Step 502, writing target metadata in a local cache into a log file in the process of processing an access request for metadata by a current protocol processing partition; wherein the target metadata is metadata updated based on the access request.
Step 503, updating the index files of the data storage partitions according to the log files, and notifying all the data storage partitions to reload the updated index files.
In practical application, the index files existing in the data storage partition can be updated according to the log file carrying the target metadata; that is, the log file is merged into the existing index files to obtain the updated set of files, after which all the data storage partitions can be notified to reload the updated index files, completing the synchronization.
Specifically, the log file may be serialized into a storage file in a specified storage format, where the storage file in the specified storage format is the storage file used when the protocol processing partition synchronizes target metadata to the data storage partition, for example a Cpt (Checkpoint) file. It is then submitted to a specified service, where the specified service is the service that processes synchronization of log files from protocol processing partitions to data storage partitions, such as the Coordinator.
The specified service may merge the storage file in the specified storage format with the currently existing index file (MetaInfo, the file holding the index file list in the data storage partition) to obtain a new index file. The specified service may then notify all data storage partitions to reload the new index file and notify the current protocol processing partition that the log file has been synchronized. After synchronization is complete, the current protocol processing partition may reclaim the log space.
The above log file synchronization process is exemplarily described below with reference to fig. 2:
b1, serialize the Log into a Cpt and submit it to the Coordinator;
b2, the Coordinator merges the Cpt with the current MetaInfo to obtain a new MetaInfo;
b3, the Coordinator notifies all KV Partitions to reload the new MetaInfo;
b4, the Coordinator notifies the Protocol Partition that the Log synchronization is complete;
b5, the Protocol Partition receives the synchronization completion message, reclaims the Log space, and finishes the synchronization.
Because the synchronization of the Cpt is idempotent, if any of steps b1 to b4 fails, the current round of synchronization is simply exited and retried after a period of time.
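Steps b1 to b5 together with the retry behavior can be sketched as follows. The class names are illustrative assumptions, the Cpt is modeled as a plain list, and the sketch retries immediately rather than after a waiting period.

```python
class Coordinator:
    def __init__(self, transient_failures=0):
        self.meta_info = []
        self._failures = transient_failures   # simulated transient errors

    def merge(self, cpt):
        if self._failures > 0:
            self._failures -= 1
            raise ConnectionError("transient failure")
        # b2: idempotent merge, so re-applying the same Cpt after a failed
        # round leaves MetaInfo unchanged.
        for entry in cpt:
            if entry not in self.meta_info:
                self.meta_info.append(entry)

def sync_round(log, coordinator, max_retries=3):
    for _ in range(max_retries):
        try:
            cpt = list(log)            # b1: serialize the Log into a Cpt
            coordinator.merge(cpt)     # b2/b3: merge into MetaInfo and reload
            log.clear()                # b5: reclaim Log space on success
            return True
        except ConnectionError:
            continue                   # idempotence makes the retry safe
    return False
```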
Step 504, splitting the current protocol processing partition according to the access condition for the metadata, so as to distribute the access request to the split multiple protocol processing partitions for service processing.
Referring to fig. 6, a flowchart illustrating steps of another data processing method according to an embodiment of the present invention may specifically include the following steps:
step 601, creating a protocol processing partition separated from a data storage partition in a data storage system; wherein the protocol processing partition is for processing access scheduling for metadata.
Step 602, in the process of processing an access request for metadata by a current protocol processing partition, writing target metadata in a local cache into a log file, and synchronizing the log file to a data storage partition; wherein the target metadata is metadata updated based on the access request.
Step 603, splitting the log file of the current protocol processing partition into a plurality of sub-log files, and loading one of the sub-log files in a plurality of new protocol processing partitions respectively; wherein a plurality of new protocol processing partitions are used to provide services as split protocol processing partitions.
During splitting of the current protocol processing partition, its log file can be split into a plurality of sub-log files; a plurality of new protocol processing partitions can then be selected as the split protocol processing partitions, with one sub-log file loaded in each new protocol processing partition, so that all the sub-log files obtained by splitting are migrated to the new protocol processing partitions.
Specifically, the current protocol processing partition may split the log file into a plurality of sub-log files and then notify the specified service, such as a Coordinator, that it needs to split. The specified service may then allocate a plurality of new protocol processing partitions and designate an initial log file for each new protocol processing partition from among the plurality of sub-log files.
After each new protocol processing partition loads its sub-log file, it may notify the specified service, which in turn may inform the current protocol processing partition which protocol processing partitions it has been split into.
Step 604, recording the newly added target metadata and synchronizing the newly added target metadata to a plurality of new protocol processing partitions; the newly added target metadata are newly added target metadata of the current protocol processing partition after the log file is segmented and before the current protocol processing partition is set to be in a forbidden service state.
After the current protocol processing partition splits the log file and before it is set to the forbidden service state, that is, while the current protocol processing partition has prepared the sub-log files for the new protocol processing partitions but is still providing service, the newly added target metadata, namely the newly added dirty data, can be recorded.
After the sub-log files are loaded in the new protocol processing partitions, the current protocol processing partition knows which protocol processing partitions it has been split into, so the newly added target metadata can be synchronized to them. For example, if 100 entries of newly added target metadata exist in the storage of the current protocol processing partition P0, they may be split evenly, with 50 entries synchronized to each of the new protocol processing partitions P1 and P2.
Step 605, sending a splitting completion message to the plurality of new protocol processing partitions by way of two-phase commit, so that the plurality of new protocol processing partitions begin to serve as the split protocol processing partitions.
After the newly added target metadata is synchronized to the new protocol processing partitions, a splitting completion message can be sent to them by way of two-phase commit, which ensures that every new protocol processing partition learns that splitting is complete; once a new protocol processing partition learns this, it can begin to provide service.
Step 606, the current protocol processing partition is set to a no service state.
After splitting is completed, the current protocol processing partition may be set to the forbidden service state and the specified service notified; the specified service may then reclaim the current protocol processing partition, ending the splitting process.
In an example, during the splitting of the current protocol processing partition, the specified service may periodically check the progress of the split; if a failure or timeout occurs, the specified service may force the split state to be rolled back, so that the current protocol processing partition continues to provide service and the new protocol processing partitions are discarded.
The above-described splitting process is exemplified below in conjunction with fig. 2:
assuming Protocol Partition is split in two, P0 is the partition to be split (i.e., the current protocol processing partition), and P1 and P2 are the sub-partitions after splitting (i.e., the new protocol processing partitions), as follows:
c1, P0 splits the Log (i.e., the log file) into Log1 and Log2 (i.e., the sub-log files) according to the splitting rule, records newly added dirty data (i.e., newly added target metadata) in the meantime, and then notifies the Coordinator that it needs to split;
c2, the Coordinator allocates two new partitions P1 and P2, and designates the initial Logs of P1 and P2 as Log1 and Log2, respectively;
c3, P1 and P2 load Log1 and Log2 respectively, and notify the Coordinator after loading is finished;
c4, the Coordinator notifies P0 that its sub-partitions after splitting are P1 and P2;
c5, P0 synchronizes the newly added dirty data to P1 and P2; when synchronization is complete, a two-phase commit ensures that both P1 and P2 are informed that splitting is complete, and P1 and P2 begin to serve once informed;
c6, P0 is set to the forbidden service state, and the Coordinator is notified that splitting is complete;
c7, the Coordinator reclaims P0, and the splitting process is complete.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
An embodiment of the present invention also provides a data storage system configured with a data storage partition for storing metadata and a computer program for execution in the data storage system, which when executed by a processor implements the data processing method as above.
An embodiment of the present invention also provides an electronic device including a processor, a memory, and a computer program stored on the memory and capable of running on the processor, the computer program implementing the above data processing method when executed by the processor.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the above data processing method.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
It should be noted that, in the embodiments of the present invention, the collection and processing of user-related data are performed on the premise of complying with the data protection regulations and policies of the applicable jurisdiction and with the authorization granted by the owner of the corresponding device.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The data processing method and data storage system provided above have been described in detail. Specific examples are applied herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the ideas of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (14)

1. A data processing method, wherein a data storage system is configured with a data storage partition for storing metadata, the method comprising:
creating, in the data storage system, a protocol processing partition separated from the data storage partition; wherein the protocol processing partition is used for processing access scheduling for metadata;
splitting the current protocol processing partition according to the access condition aiming at the metadata so as to distribute the access request to a plurality of split protocol processing partitions for service processing.
2. The method as recited in claim 1, further comprising:
writing target metadata in a local cache into a log file in the process of processing, by the current protocol processing partition, an access request for metadata, and synchronizing the log file to the data storage partition; wherein the target metadata is metadata updated based on the access request.
3. The method of claim 2, further comprising, prior to said writing the target metadata in the local cache to the log file:
and reading the required metadata from the data storage partition and storing the metadata into a local cache.
4. The method of claim 2, further comprising, prior to said synchronizing said log file to said data storage partition:
and judging whether the number of the metadata written in the log file is larger than a preset number threshold, and executing the synchronization of the log file to the data storage partition under the condition that the number of the metadata is larger than the preset number threshold.
5. The method of claim 2, wherein the synchronizing the log file to the data storage partition comprises:
and updating the index files of the data storage partitions according to the log files, and notifying all the data storage partitions to reload the updated index files.
6. The method of any of claims 2 to 5, wherein splitting the current protocol processing partition comprises:
dividing the log file of the current protocol processing partition into a plurality of sub-log files;
Loading one sub-log file in a plurality of new protocol processing partitions respectively; wherein the plurality of new protocol processing partitions are configured to provide services as split protocol processing partitions.
7. The method as recited in claim 6, further comprising:
recording the newly added target metadata; the newly added target metadata are newly added target metadata of the current protocol processing partition after the log file is segmented and before the current protocol processing partition is set to be in a forbidden service state;
synchronizing the newly added target metadata to the plurality of new protocol processing partitions.
8. The method of claim 7, further comprising, after said synchronizing said newly added target metadata to said plurality of new protocol processing partitions:
sending a split-completion message to the plurality of new protocol processing partitions by way of a two-phase commit, so that the plurality of new protocol processing partitions provide services as the split protocol processing partitions.
9. The method as recited in claim 8, further comprising:
setting the current protocol processing partition to a service-disabled state.
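Claims 7 through 9 describe the handoff: forward metadata that arrived during the split, confirm the new partitions via a two-phase commit, then disable the old partition. A compressed sketch, with every name and the always-yes prepare vote being assumptions for illustration:

```python
class NewPartition:
    """A new protocol processing partition created by the split."""

    def __init__(self):
        self.data = {}
        self.serving = False

    def apply(self, entries):
        self.data.update(entries)   # claim 7: absorb newly added metadata

    def prepare_commit(self):
        return True                 # phase 1 vote; always-yes is a simplification

    def commit_split(self):
        self.serving = True         # phase 2: begin serving as a split partition

class OldPartition:
    def __init__(self):
        self.enabled = True

def finish_split(old, new_partitions, pending_metadata):
    """Sync pending metadata, two-phase-commit the handoff, disable the old partition."""
    for p in new_partitions:
        p.apply(pending_metadata)
    ready = all(p.prepare_commit() for p in new_partitions)  # phase 1: prepare
    if ready:
        for p in new_partitions:
            p.commit_split()                                 # phase 2: commit
        old.enabled = False                                  # claim 9
    return ready
```

The two-phase structure matters here: no new partition starts serving unless every new partition voted ready, so clients never see a half-split keyspace.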
10. The method as recited in claim 1, further comprising:
configuring the number of replicas of the data storage partition according to the access pattern of the metadata.
11. The method of claim 1, wherein the data storage partitions and protocol processing partitions in the data storage system employ a distributed architecture, and wherein the data storage partitions store metadata in the form of key-value pairs.
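Claims 10 and 11 leave the replica policy open; one plausible reading is that hot partitions get more copies. The function below is purely an illustrative assumption, with made-up names and thresholds:

```python
def replicas_for(access_count, base=3, hot_threshold=1000):
    """More replicas for frequently accessed partitions, a minimum of `base`."""
    return base + 2 if access_count > hot_threshold else base
```

Since metadata is stored as key-value pairs across a distributed architecture, raising the replica count for hot partitions spreads read load without over-replicating cold data.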
12. A data storage system configured with a data storage partition for storing metadata and a computer program for execution in the data storage system, which when executed by a processor implements the data processing method of any of claims 1 to 11.
13. An electronic device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, which computer program, when executed by the processor, implements the data processing method according to any one of claims 1 to 11.
14. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the data processing method according to any one of claims 1 to 11.
CN202310232546.9A 2023-03-03 2023-03-03 Data processing method and data storage system Pending CN116400855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310232546.9A CN116400855A (en) 2023-03-03 2023-03-03 Data processing method and data storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310232546.9A CN116400855A (en) 2023-03-03 2023-03-03 Data processing method and data storage system

Publications (1)

Publication Number Publication Date
CN116400855A true CN116400855A (en) 2023-07-07

Family

ID=87018825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310232546.9A Pending CN116400855A (en) 2023-03-03 2023-03-03 Data processing method and data storage system

Country Status (1)

Country Link
CN (1) CN116400855A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407338A (en) * 2023-12-15 2024-01-16 北京壁仞科技开发有限公司 System, method and computing device for data transmission synchronization
CN117407338B (en) * 2023-12-15 2024-03-19 北京壁仞科技开发有限公司 System, method and computing device for data transmission synchronization
CN117596256A (en) * 2024-01-19 2024-02-23 济南浪潮数据技术有限公司 Data synchronization method, device, system, electronic equipment and readable storage medium
CN117596256B (en) * 2024-01-19 2024-03-22 济南浪潮数据技术有限公司 Data synchronization method, device, system, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109074306B (en) Hybrid garbage collection in a distributed storage system
US8959227B2 (en) In-flight block map for a clustered redirect-on-write filesystem
US8458181B2 (en) Distributed free block map for a clustered redirect-on-write file system
CN116400855A (en) Data processing method and data storage system
US10289692B2 (en) Preserving file metadata during atomic save operations
CN110807062B (en) Data synchronization method and device and database host
CN109582686B (en) Method, device, system and application for ensuring consistency of distributed metadata management
CN113268472B (en) Distributed data storage system and method
CN115599747B (en) Metadata synchronization method, system and equipment of distributed storage system
CN113220729B (en) Data storage method and device, electronic equipment and computer readable storage medium
CN110022338B (en) File reading method and system, metadata server and user equipment
CN112632029B (en) Data management method, device and equipment of distributed storage system
CN112052230B (en) Multi-machine room data synchronization method, computing device and storage medium
CN106325768B (en) A kind of two-shipper storage system and method
CN112334891B (en) Centralized storage for search servers
CN113449065A (en) Data deduplication-oriented decentralized storage method and storage device
CN106873902B (en) File storage system, data scheduling method and data node
CN109726211B (en) Distributed time sequence database
CN113190619B (en) Data read-write method, system, equipment and medium for distributed KV database
CN112579550B (en) Metadata information synchronization method and system of distributed file system
CN105323271B (en) Cloud computing system and processing method and device thereof
CN115168367B (en) Data configuration method and system for big data
CN110119389B (en) Writing operation method of virtual machine block equipment, snapshot creation method and device
CN112231150B (en) Method and device for recovering fault database in database cluster
CN111400098B (en) Copy management method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination