CN116185949A - Cache storage method and related equipment - Google Patents

Cache storage method and related equipment

Info

Publication number
CN116185949A
Authority
CN
China
Prior art keywords
area
file
metadata
partition
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211717784.0A
Other languages
Chinese (zh)
Inventor
刘日新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN202211717784.0A priority Critical patent/CN116185949A/en
Publication of CN116185949A publication Critical patent/CN116185949A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/113Details of archiving
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164File meta data generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The embodiments of the application disclose a cache storage method and related equipment for reducing the contention pressure on metadata updates caused by a large volume of cache storage. The cache comprises a metadata area and a data area, and the metadata area comprises a file identification area and a metadata update area. The method comprises the following steps: acquiring the current file identifier of the current file; storing the current changed data of the current file in the data area, and determining, based on the current storage condition of the file identification area, a first block corresponding to the current file in the file identification area, wherein the first block is used for recording the current file identifier; storing file identification update information in the metadata update area, wherein the file identification update information comprises the identifier of the first block and the current file identifier; and, if the metadata update area meets a preset metadata update condition, writing the current file identifier into the first block according to the identifier of the first block.

Description

Cache storage method and related equipment
Technical Field
The embodiment of the application relates to the field of computer storage, in particular to a cache storage method and related equipment.
Background
Cache refers to a type of high-speed memory with faster access than ordinary random access memory. It typically does not use the dynamic random access memory (DRAM) technology used for system main memory, but the more expensive and faster static random access memory (SRAM) technology. Providing a cache is one of the important factors that allow modern computer systems to achieve high performance.
The cache includes a data area for storing data and a metadata area for describing the data stored in the data area. If a large amount of data is to be stored, a correspondingly large amount of metadata must be stored in the metadata area. In the prior art, once the metadata to be written and its write position have been determined, the metadata is written directly to that position.
When a large amount of metadata must be updated, the continuous writing of that metadata degrades the query performance of the cache and creates contention pressure during metadata updates.
Disclosure of Invention
The embodiments of the application provide a cache storage method and related equipment for reducing the contention pressure generated during metadata updates.
A first aspect of an embodiment of the present application provides a cache storage method, where the cache includes a metadata area and a data area, the metadata area includes a file identification area and a metadata update area, and the method includes:
acquiring a current file identifier of a current file;
storing the current data of the current file change in the data area, and determining a first block corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first block is used for recording the current file identification;
storing file identification updating information in the metadata updating area, wherein the file identification updating information comprises the identification of the first block and the current file identification;
and if the metadata update area meets a preset metadata update condition, writing the current file identifier into the first block according to the identifier of the first block.
In a specific implementation, the metadata area further includes a file fragment area and a file block area, and the method further includes:
if no file identifier consistent with the current file identifier exists in the file identification area, determining at least one second block corresponding to the current file in the file fragment area and at least one third block corresponding to the current file in the file block area;
storing at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area; each piece of file fragment update information comprises the identifier of a second block and the association relation between that second block and the current file; each piece of file block update information comprises the identifier of a third block, the association relation between that third block and its corresponding second block, and the offset between the data corresponding to the third block in the data area and the data corresponding to that second block in the data area;
and if the metadata update area meets the preset metadata update condition, writing the association relation between the second block and the current file into the second block according to the identifier of the second block, and writing the association relation between the third block and its corresponding second block into the third block according to the identifier of the third block.
In a specific implementation, each piece of file block update information further includes: a flag indicating whether the first partial data of the current file corresponding to the third block is dirty data, a heat flag indicating the usage frequency of that first partial data, and the association relation between the third block and a fourth block in the data area.
In a specific implementation, each piece of metadata update information further includes an update order, the metadata update information including the file identification update information, the file fragment update information, and the file block update information, and the method further includes:
and if the cache meets the preset abnormal recovery condition, sequentially updating the metadata according to the updating sequence of each piece of metadata updating information in the metadata updating area.
In a specific implementation, the method further includes:
if a file identifier consistent with the current file identifier exists in the file identification area, determining, from the second blocks corresponding to the current file identifier, a second block whose corresponding fourth block in the data area has remaining storage space as the second block corresponding to the current data;
and determining, from the fourth blocks in the data area corresponding to that second block, a target fourth block having remaining storage space and/or an idle fourth block, and storing the current data in the target fourth block and/or the idle fourth block.
In a specific implementation, the data range of the data area described by each third block is any size from 8k to 64k.
In a specific implementation, the metadata update condition includes: the current free space of the metadata update area is less than or equal to a preset free space threshold, or the time elapsed from the last update of the metadata update area to the current time meets a preset update time threshold.
A second aspect of an embodiment of the present application provides a cache, including:
the acquisition unit is used for acquiring the current file identification of the current file;
the determining unit is used for storing the current changed data of the current file in the data area, and determining, based on the current storage condition of the file identification area, a first block corresponding to the current file in the file identification area, wherein the first block is used for recording the current file identifier;
the storage unit is used for storing file identification updating information in the metadata updating area, wherein the file identification updating information comprises the identification of the first block and the current file identification;
and the writing unit is used for writing the current file identifier into the first block according to the identifier of the first block if the metadata updating area meets the preset metadata updating condition.
In a specific implementation, the metadata area further includes a file fragment area and a file block area, and the determining unit is further configured to determine, if no file identifier consistent with the current file identifier exists in the file identification area, at least one second block corresponding to the current file in the file fragment area and at least one third block corresponding to the current file in the file block area;
the storage unit is further configured to store at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area; each piece of file fragment update information comprises the identifier of a second block and the association relation between that second block and the current file; each piece of file block update information comprises the identifier of a third block, the association relation between that third block and its corresponding second block, and the offset between the data corresponding to the third block in the data area and the data corresponding to that second block in the data area;
the writing unit is further configured to, if the metadata update area meets the preset metadata update condition, write the association relation between the second block and the current file into the second block according to the identifier of the second block, and write the association relation between the third block and its corresponding second block into the third block according to the identifier of the third block.
In a specific implementation, each piece of file block update information further includes: a flag indicating whether the first partial data of the current file corresponding to the third block is dirty data, a heat flag indicating the usage frequency of that first partial data, and the association relation between the third block and a fourth block in the data area.
In a specific implementation, each piece of metadata update information further includes an update order, where the metadata update information includes the file identification update information, the file fragment update information, and the file block update information, and the writing unit is further configured to perform the metadata updates sequentially, according to the update order of each piece of metadata update information in the metadata update area, if the cache meets a preset abnormal recovery condition.
In a specific implementation, the determining unit is further configured to determine, if a file identifier consistent with the current file identifier exists in the file identification area, from the second blocks corresponding to the current file identifier, a second block whose corresponding fourth block in the data area has remaining storage space as the second block corresponding to the current data;
the determining unit is further configured to determine, from the fourth blocks in the data area corresponding to that second block, a target fourth block having remaining storage space and/or an idle fourth block, and to store the current data in the target fourth block and/or the idle fourth block.
In a specific implementation, the data range of the data area described by each third block is any size from 8k to 64k.
In a specific implementation, the metadata update condition includes: the current free space of the metadata update area is less than or equal to a preset free space threshold, or the time elapsed from the last update of the metadata update area to the current time meets a preset update time threshold.
A third aspect of an embodiment of the present application provides a cache, including:
a central processing unit, a memory and an input/output interface;
the memory is a short-term memory or a persistent memory;
the central processor is configured to communicate with the memory and to execute instruction operations in the memory to perform the method of the first aspect.
A fourth aspect of the embodiments provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to the first aspect.
A fifth aspect of the embodiments of the present application provides a computer storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the method according to the first aspect.
From the above technical solutions, the embodiments of the present application have the following advantages: after the current file identifier of the current file is obtained, the current data can be stored directly in the data area. Then, after the first block corresponding to the current file in the file identification area is determined, the identifier of the first block and the current file identifier are stored directly in the metadata update area as file identification update information. Finally, when the metadata update area meets the preset metadata update condition, the current file identifier is written into the first block of the file identification area according to the identifier of the first block. Because writing a large amount of metadata creates contention pressure, the metadata update (that is, the file identifier update) is actually performed only when the metadata update area meets the preset metadata update condition, at which point the current file identifier is written into the corresponding first block according to the identifier of the first block. This greatly reduces the contention pressure during metadata updates.
Drawings
FIG. 1 is a schematic flow chart of a cache storage method disclosed in an embodiment of the present application;
FIG. 2 is a diagram illustrating an exemplary structure of a cache according to an embodiment of the present disclosure;
FIG. 3 is another flow chart of a cache storage method disclosed in an embodiment of the present application;
FIG. 4 is a diagram showing an exemplary structure of a metadata update area according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an exemplary structure of a file offset index according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a cache according to an embodiment of the present disclosure;
FIG. 7 is another schematic structural diagram of a cache disclosed in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
To better explain the technical solutions of the embodiments of the present application, some technical concepts used later are explained first.
Cache originally refers to a type of high-speed memory with faster access than ordinary random access memory (RAM). It typically does not use the DRAM technology used for system main memory, but the more expensive and faster SRAM technology. Providing a cache is one of the important factors that allow modern computer systems to achieve high performance.
Hybrid storage is a compromise storage solution. Specifically, critical data is stored on high-performance flash media while other data is stored on lower-cost storage tiers; hybrid storage lets an organization manage data in a unified storage system while still balancing performance and cost.
A software system uses layering to isolate different concerns, so that changes in different requirements can be handled and managed independently. For example, a hybrid storage system composed of storage media with different performance is managed in layers according to the hot/cold separation of the data in the storage system.
A logical block address (LBA) is a common mechanism used on PC data storage devices to indicate where data is located; the device that most commonly uses this mechanism is the hard disk. An LBA may refer to the address of a data block or to the data block that an address points to. In short, an LBA is comparable to an everyday house-number address.
A physical block address (PBA) is, relative to the LBA, comparable to the latitude and longitude used for GPS positioning. For example, the latitude and longitude of a house address may be: east longitude 113°16'40.0621", north latitude 23°07'37.6129".
The embodiments of the application provide a cache storage method and related equipment for reducing the contention pressure during metadata updates.
Referring to fig. 1, an embodiment of the present application provides a cache storage method, which includes the following steps:
101. Obtaining the current file identifier of the current file.
To better explain the technical solution, in the embodiments of the present application each piece of cache data that needs to be written to disk is treated as the current changed data of a corresponding current file, and the cache storage flow of this embodiment is executed on that current data to complete its storage.
It will be appreciated that each piece of cache data is associated with a file in the system; that is, each piece of cache data is the changed data of some file in the system. Each piece of cache data therefore has a corresponding current file identifier. For example, if cache data A is the current changed data of the system file user, then the current file identifier corresponding to cache data A is the file identifier of the system file user.
In practical applications, any preset digest algorithm can be used to generate the file identifier of the system file user, so as to ensure the uniqueness of the file identifier.
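As a minimal sketch of generating a file identifier with a digest (hash) algorithm: the source does not name the algorithm or its inputs, so the use of SHA-256, the `backend` and `path` inputs, and the 32-character truncation below are all assumptions made for illustration.

```python
import hashlib


def make_file_id(backend: str, path: str) -> str:
    """Derive a file identifier from a back-end name and a file path.

    Both inputs and the choice of SHA-256 are illustrative assumptions;
    the source only says that "any preset digest algorithm" may be used.
    """
    digest = hashlib.sha256(f"{backend}:{path}".encode("utf-8")).hexdigest()
    # Truncate to 32 hex characters (128 bits) to keep the identifier compact.
    return digest[:32]


file_id = make_file_id("/dev/sdx", "user")
```

The same inputs always produce the same identifier, so a later write to the same file can be matched against the file identification area. Uniqueness is probabilistic rather than absolute, as with any digest.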
102. Storing the current data of the current file change in a data area, and determining a first block corresponding to the current file in a file identification area based on the current storage condition of the file identification area, wherein the first block is used for recording the current file identification.
After the file identifier of the current file is determined, the file to which the current changed data belongs is known, and the first block corresponding to the current file in the file identification area can be determined based on the current storage condition of the file identification area; the first block is used for recording the current file identifier.
Specifically, the first block is one of a plurality of idle blocks in the file identification area, and may be determined according to the current usage of the file identification area and a preset storage mode (such as breadth-first storage and/or compact storage).
It should be noted that once the current changed data of the current file has been stored in the corresponding region of the data area, the preliminary storage of that data is considered complete. The storage is only truly complete, however, once the corresponding metadata has been stored in the corresponding metadata area (for example, once the file identifier has been stored in the file identification area).
103. Storing file identification update information in the metadata update area, wherein the file identification update information comprises the identifier of the first block and the current file identifier.
After the first block for storing the current file identifier is determined, the current file identifier is not written into the first block immediately. Instead, the identifier of the first block and the current file identifier are stored in the metadata update area as file identification update information, so as to avoid the metadata contention pressure caused by real-time updates.
104. If the metadata update area meets the preset metadata update condition, writing the current file identifier into the first block according to the identifier of the first block.
The preset metadata update conditions include, but are not limited to: the current free space of the metadata update area is less than or equal to a preset free space threshold, or the time elapsed from the last update of the metadata update area to the current time meets a preset update time threshold. That is, the metadata update writes are performed when the free space of the metadata update area is insufficient and/or the metadata update area has not processed any update information for a long time.
Specifically, processing a piece of update information means writing the metadata it carries into the block it designates. For example, the current file identifier in the file identification update information is written into the first block designated by the identifier of the first block in that update information.
In this embodiment, after the current file identifier of the current file is obtained, the current data can be stored directly in the data area. Then, after the first block corresponding to the current file in the file identification area is determined, the identifier of the first block and the current file identifier are stored directly in the metadata update area as file identification update information. Finally, when the metadata update area meets the preset metadata update condition, the current file identifier is written into the first block of the file identification area according to the identifier of the first block. Because writing a large amount of metadata creates contention pressure, the metadata update (that is, the file identifier update) is actually performed only when the metadata update area meets the preset metadata update condition, at which point the current file identifier is written into the corresponding first block according to the identifier of the first block. This greatly reduces the contention pressure during metadata updates.
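Steps 101 to 104 can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the patented implementation: the `Cache` class, its field names, the first-free-block allocation policy, and the count-based `flush_limit` (a stand-in for the preset update condition) are all hypothetical, and only the case of a file not yet recorded in the cache is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    id_blocks: list                                   # file identification area; None = idle block
    data: list = field(default_factory=list)          # data area
    update_area: list = field(default_factory=list)   # staged (block_id, file_id) records
    flush_limit: int = 2                              # hypothetical stand-in for the update condition

    def write(self, file_id: str, payload: bytes) -> int:
        # Step 102: store the changed data, then pick the first block.
        self.data.append(payload)
        reserved = {block_id for block_id, _ in self.update_area}
        free = [i for i, v in enumerate(self.id_blocks)
                if v is None and i not in reserved]
        if not free:
            raise RuntimeError("file identification area is full")
        block_id = free[0]
        # Step 103: stage the update instead of writing the block now.
        self.update_area.append((block_id, file_id))
        # Step 104: apply staged updates only when the condition is met.
        if len(self.update_area) >= self.flush_limit:
            self.flush()
        return block_id

    def flush(self) -> None:
        for block_id, file_id in self.update_area:
            self.id_blocks[block_id] = file_id
        self.update_area.clear()


cache = Cache(id_blocks=[None] * 4)
first = cache.write("file-a", b"x")   # staged only; block 0 is still empty
second = cache.write("file-b", b"y")  # reaches flush_limit, both blocks written
```

The point of the design is that the file identification area is touched in batches rather than once per write, while the data itself is stored immediately.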
Furthermore, to implement multi-level management of metadata, the metadata area in the embodiments of the present application may further include a file fragment area and a file block area, where a block in the file fragment area and a block in the file block area record metadata about a file's data at different granularities. For example, if the size of the system file user is 4G, one corresponding block in the file fragment area may describe the data information of the whole 4G of the system file user (e.g., the PBA of the 4G), and one corresponding block in the file block area may describe the data information of a 4k range within the 4G of the system file user (e.g., the LBA of the 4G). The metadata of the file fragment area and the file block area can be updated in the following manner: if no file identifier consistent with the current file identifier exists in the file identification area, determining at least one second block corresponding to the current file in the file fragment area and at least one third block corresponding to the current file in the file block area; storing at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area, where each piece of file fragment update information comprises the identifier of a second block and the association relation between that second block and the current file, and each piece of file block update information comprises the identifier of a third block, the association relation between that third block and its corresponding second block, and the offset between the data corresponding to the third block in the data area and the data corresponding to that second block in the data area; and, if the metadata update area meets the preset metadata update condition, writing the association relation between the second block and the current file into the second block according to the identifier of the second block, and writing the association relation between the third block and its corresponding second block into the third block according to the identifier of the third block.
Specifically, if no file identifier consistent with the current file identifier exists in the file identification area, the cache data corresponding to the current file is being stored for the first time; that is, the current file is being recorded in the cache for the first time. Accordingly, the file fragment area and the file block area hold no metadata record of the current file. The second block in the file fragment area that should record the information for the current changed data of the current file is therefore determined according to a preset block update rule, and the corresponding third block in the file block area is determined in the same way. The file fragment update information and the file block update information are applied when the metadata update area meets the metadata update condition. The data range described by each second block may be 4G, and the data range described by each third block may be any size from 8k to 64k.
It can be understood that the metadata update area stores multiple pieces of update information. If the metadata update area meets the metadata update condition, the pieces of update information are processed automatically and sequentially, in the order in which they were added to the metadata update area; it is not required that all pending update information be processed each time. In addition, when an abnormality occurs on the disk, abnormal recovery of the metadata update area is triggered (i.e., the abnormal recovery condition is satisfied), and the metadata updates should then be performed sequentially in the order in which each piece of metadata update information was added to the metadata update area (i.e., the corresponding update order).
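Ordered replay during abnormal recovery can be sketched as follows. The `UpdateRecord` layout, the `seq` field as the recorded update order, and the dictionary-of-dictionaries model of the metadata areas are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    seq: int        # update order assigned when the record was staged
    area: str       # target area, e.g. "file_id", "fragment" or "block"
    block_id: int   # identifier of the block to write
    value: object   # metadata to write into that block


def replay(records: list, areas: dict) -> dict:
    """Apply staged records in ascending update order."""
    for rec in sorted(records, key=lambda r: r.seq):
        areas[rec.area][rec.block_id] = rec.value
    return areas


areas = {"file_id": {}, "fragment": {}, "block": {}}
staged = [
    UpdateRecord(2, "file_id", 0, "new"),
    UpdateRecord(1, "file_id", 0, "old"),
    UpdateRecord(3, "fragment", 5, "inode-uuidx"),
]
replay(staged, areas)
```

Replaying in update order matters when two records target the same block: the later record must win, which is why block 0 ends up holding "new" here even though its record appears first in the list.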
Furthermore, building on the file fragment area and file block area design, each piece of file block update information may further include: a flag indicating whether the first partial data of the current file corresponding to the third block is dirty data, a heat flag indicating the usage frequency of that first partial data, and the association relation between the third block and a fourth block in the data area.
In other implementations, if the current file is not stored in the disk for the first time, that is, the file identifier area has a file identifier consistent with the current file identifier, the described data range and the remaining second blocks may be determined from a plurality of second blocks corresponding to the current file in the file partition area, where the remaining second blocks are used for recording metadata of the current data. And recording the current data in a corresponding fourth partition having free space in the data area, i.e., a fourth partition in which any content is not stored temporarily, or a fourth partition in which the space has been used is less than half of the maximum space, which is not particularly limited herein.
The cache storage method according to the embodiments of the present application is described below in some specific scenarios.
First, an SSD is formatted as a hierarchical cache device according to the hierarchical metadata structure shown in fig. 2, and metadata and data are managed in the manner shown in the following table.
[Table not reproduced: original table images BDA0004026960160000101 and BDA0004026960160000111]
Specifically, the metadata design in the embodiments of the present application can be understood by analogy with a file system, which comprises metadata (superblock, inodes, directory entries, indexes, etc.) and data (service data content). Compared with a file system, the present design adds a brick space (corresponding to the back-end storage). For a piece of cached data to be stored on disk accurately and in an indexable way, its record must include the back-end physical disk to which the current data belongs (brick), the current file (inode, also referred to as the file identifier), the fragment to which the current data belongs (shard), the location in the cache device where the current data resides (extent), and the location where the cached data itself is stored (data, i.e., the data area).
Referring to fig. 3, the following steps are performed for a write operation of the hierarchical cache. For example, a file named problem list doc is created and is assigned (by an internal algorithm) a back-end storage location (brick), such as brick-/dev/sdx, and a unique file identification (inode), such as inode-uuidx (equivalent to sangftxt). Next, the file ID (inode-uuidx) associated with the back-end device (brick-/dev/sdx) is created, and the inode data is stored to the cache device (index inode-uuidx). After the inode is created, a shard area needs to be created to store the inode->shard index, which contains the fragment information of the file. If the file is an 8GB file, two pieces of fragment index information (with a fragment granularity of 4G) are generated, namely shard-001 and shard-002, and the index information is written into the SSD cache device. Finally, an extent area, i.e., one piece of data index information (offset=0, len=4KB), is allocated, and the data is written to the corresponding data area. It should be noted that, depending on actual requirements, a given cache update does not have to update the metadata of every metadata area (file identification area, file partition area and/or file block area).
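The fragment arithmetic in the example above can be sketched as follows. This is only an illustration under stated assumptions: the function names shard_names and locate_shard are hypothetical, and the 1-based "shard-NNN" naming simply mirrors the shard-001 and shard-002 labels used in the description.

```python
GIB = 1024 ** 3
SHARD_SIZE = 4 * GIB  # fragment granularity of 4G, as in the description


def shard_names(file_size: int) -> list[str]:
    """Return the fragment index names needed to cover a file of file_size bytes."""
    count = max(1, -(-file_size // SHARD_SIZE))  # ceiling division, at least one shard
    return [f"shard-{i:03d}" for i in range(1, count + 1)]


def locate_shard(offset: int) -> tuple[str, int]:
    """Map a logical file offset to (shard name, offset inside that shard)."""
    index, inner = divmod(offset, SHARD_SIZE)
    return f"shard-{index + 1:03d}", inner
```

With these assumptions, an 8GB file yields two fragment indexes, and a logical offset of 5GB falls 1GB into the second fragment.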
Specifically, each 4KB extent metadata block holds 512 entries, which record the mapping between LBAs and PBAs. Each third partition of the file block area may include the following:
1. the PBA->LBA mapping: identifies the logical offset, within the service file fragment (4GB), that this block records; the LBA is aligned to the minimum block granularity of 8KB, so only 19 bits are needed for the LBA;
2. shard: stores the extent->shard mapping and identifies which shard the extent belongs to; 19 bits are needed because the maximum number of supported files is 2626144;
3. dirty: identifies whether the cache block data is dirty, i.e., whether the data has been flushed back to the back-end storage device (the metadata design allows the cache to serve as a shared read-write cache);
4. bitmap: within the block granularity, data is further stored at a fine granularity (4KB) to improve space utilization; with a maximum block granularity of 64KB, 64KB/4KB = 16 bits are needed;
5. hot: the heat flag, giving the heat of the block as the basis for eviction by the replacement algorithm;
6. reserved: reserved space, 1 bit.
In total, 8 bytes suffice to index from the logical offset of the service file to the cached data location, i.e., both the LBA->PBA mapping and the PBA->LBA mapping.
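As an illustration only, the 8-byte entry described above could be packed as in the following sketch. The field order, the 8-bit width of hot (chosen here so that the widths sum to 64 bits), the little-endian encoding, and the names pack_entry and unpack_entry are all assumptions and are not taken from the application.

```python
import struct

# (name, width in bits); the widths sum to 64. The 8-bit "hot" width is assumed.
FIELDS = [("lba", 19), ("shard", 19), ("dirty", 1),
          ("bitmap", 16), ("hot", 8), ("reserved", 1)]


def pack_entry(**values: int) -> bytes:
    """Pack the fields into one 64-bit word; the first field is most significant."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return struct.pack("<Q", word)


def unpack_entry(raw: bytes) -> dict[str, int]:
    """Inverse of pack_entry: peel fields off starting from the least significant."""
    (word,) = struct.unpack("<Q", raw)
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

At 8 bytes per entry, a 4KB metadata block holds 4096 / 8 = 512 entries, which matches the figure given above; 19 bits for the LBA likewise follow from 4GB / 8KB = 2^19 aligned positions per fragment.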
The inode area, the extent area, and the shard area are stored compactly, and updates to these three areas must be transactional; a dedicated Journal area is therefore used to log updates to them.
Specifically, referring to fig. 4, the entire Journal area is divided into a superblock area (super) and a journal data area (comprising meta and data). The meta part occupies one sector (4k) and contains all the metadata information of one transaction or batch (the block numbers being updated [hierarchical metadata is numbered in 4KB order] and the unique number of the current request); the data part sequentially stores the updated data content. The Journal space is managed as a circular queue: the head pointer is advanced when data is inserted, and the tail pointer is advanced when data is written back (compare WAL). The data in the range [tail, head] is the data to be played back, and each [tail, head] range is written to disk with a single IO. In addition, when the process terminates abnormally and is restarted and reloaded, all metadata blocks are scanned, the largest 'seq_id' is taken as the final valid ID, and the logs within the valid [tail, head] range are played back. Playback is performed asynchronously in the service process and is triggered by thresholds such as the Journal area capacity (25% of total capacity) and a timer (30 min).
The following example illustrates a service-triggered inode update event (the triggering event is an increase in file size). The data is written through the Journal module, which fills in the head index, i.e., an address in the Journal area, together with the data content being written (a 4KB inode update), for a total of 8KB of data; the ID of this write is incremented, and everything is written to disk in a single operation. If power is suddenly lost or the disk is pulled at this moment, then after recovery the metadata must be loaded from the cache device again, starting with the Journal area. If power was lost after the write succeeded, the valid Journal data (validity is determined by the maximum seq_id) is read and the state after the successful write can be recovered; if the write did not succeed, the state before the power loss is restored. After the Journal is loaded successfully, the data in the Journal area must be written into the area corresponding to the inode, which is called playback, and the entire metadata is thereby restored to its original state.
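The circular-queue management of the Journal space can be sketched in memory as follows. This is a minimal illustration rather than the on-disk format: the class name JournalSketch, the slot-based capacity, and the interpretation of the 25% threshold as a fraction of used slots are assumptions made for the example.

```python
class JournalSketch:
    """In-memory model of a journal managed as a circular queue."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.head = 0      # total records inserted so far
        self.tail = 0      # total records already played back
        self.seq_id = 0    # unique, increasing number of the latest request

    def append(self, block_no: int, payload: bytes) -> int:
        """Insert one update record and advance the head pointer."""
        if self.head - self.tail == len(self.slots):
            raise RuntimeError("journal full: play back before appending")
        self.seq_id += 1
        self.slots[self.head % len(self.slots)] = (self.seq_id, block_no, payload)
        self.head += 1
        return self.seq_id

    def needs_playback(self, used_fraction: float = 0.25) -> bool:
        """Capacity threshold check (the fraction of used slots is an assumption)."""
        return (self.head - self.tail) >= used_fraction * len(self.slots)

    def playback(self, metadata: dict) -> None:
        """Apply records in [tail, head) in insertion order, advancing the tail."""
        while self.tail < self.head:
            _, block_no, payload = self.slots[self.tail % len(self.slots)]
            metadata[block_no] = payload
            self.tail += 1
```

Because playback walks the records in insertion order, a later update to the same metadata block overwrites an earlier one, which is the ordering property the description relies on for recovery.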
One cache data index contains the back-end storage to which the data belongs (brick), the file to which it belongs (inode), the fragment to which it belongs (shard), and its SSD location (extent); each piece of cached data is identified by the tuple (ssd_id, brick_id, inode_id, shard_id, extent_id).
In practical application, a cache lookup proceeds as follows. First, the SSD device ID is looked up through the configuration file to find the SSD in-memory structure to which the data belongs (multiple SSD cache devices may exist on the same host). Next, the brick structure is looked up in the SSD index table (the lookup key is the brick ID, the unique identifier of the back-end storage; the index table may be a hash table). The inode structure is then looked up in the brick index table (the inode is the file to which the cached data belongs; the index table may be a hash table). Then, the shard structure is looked up in the inode index (each file may be split into multiple fragments; the index table is a hash table). Finally, in the shard index structure, the extent metadata is located by the logical offset of the service request; the extent metadata stores the physical disk location of the data, so that the cached data can be read (this index may be a red-black tree). The file offset index can be resolved through the LBA-to-PBA index; fig. 5 shows the index relationship among shard, extent, and inode.
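The lookup chain above can be sketched with nested dictionaries standing in for the hash tables and a sorted list with binary search standing in for the red-black tree. The function name lookup, the 0-based shard index, and the (start, length, pba) extent tuple are assumptions made for this illustration.

```python
from bisect import bisect_right

SHARD_SIZE = 4 * 1024 ** 3  # 4G fragment granularity


def lookup(cache: dict, ssd_id: str, brick_id: str, inode_id: str, offset: int):
    """Resolve a logical file offset to a physical cache address, or None on a miss.

    cache[ssd_id][brick_id][inode_id][shard_index] is a list of
    (start, length, pba) tuples sorted by start, where start is the
    logical offset inside the shard.
    """
    shard_index, shard_off = divmod(offset, SHARD_SIZE)
    try:
        extents = cache[ssd_id][brick_id][inode_id][shard_index]
    except KeyError:
        return None  # unknown device, brick, file, or fragment
    # Find the last extent whose start is <= shard_off.
    i = bisect_right(extents, (shard_off, float("inf"), float("inf"))) - 1
    if i < 0:
        return None
    start, length, pba = extents[i]
    if shard_off < start + length:
        return pba + (shard_off - start)
    return None  # offset falls in a hole between cached extents
```

A sorted list gives the same ordered-search behaviour as a balanced tree for reads; a real implementation would need a structure that also supports cheap insertion.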
Referring to fig. 6, an embodiment of the present application provides a cache, where the cache includes a metadata area and a data area, the metadata area includes a file identification area and a metadata update area, and the cache further includes:
an obtaining unit 601, configured to obtain a current file identifier of a current file;
a determining unit 602, configured to store current data of a current file change in a data area, and determine a first partition corresponding to the current file in the file identification area based on a current storage condition of the file identification area, where the first partition is used to record a current file identification;
a storage unit 603, configured to store file identification update information in a metadata update area, where the file identification update information includes an identification of a first partition and a current file identification;
and the writing unit 604 is configured to write the current file identifier into the first partition according to the identifier of the first partition if the metadata update area meets a preset metadata update condition.
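The cooperation of the four units can be sketched as follows. This is a simplified model under stated assumptions: the class name CacheSketch is hypothetical, the "current storage condition" is reduced to picking the first free slot, and the metadata update condition is reduced to a count of pending entries.

```python
class CacheSketch:
    """Toy model: data is stored at once, file-identification metadata is deferred."""

    def __init__(self, id_slots: int, flush_threshold: int):
        self.file_id_area = [None] * id_slots  # one slot per first partition
        self.data_area = []
        self.update_area = []                  # pending (first partition id, file id)
        self.flush_threshold = flush_threshold

    def _pick_first_partition(self) -> int:
        """Choose a free first partition that no pending update has reserved."""
        reserved = {slot for slot, _ in self.update_area}
        for slot, owner in enumerate(self.file_id_area):
            if owner is None and slot not in reserved:
                return slot
        raise RuntimeError("file identification area is full")

    def write(self, file_id: str, data: bytes) -> None:
        self.data_area.append((file_id, data))  # store the current data
        known = file_id in self.file_id_area or any(
            pending == file_id for _, pending in self.update_area)
        if not known:
            slot = self._pick_first_partition()
            self.update_area.append((slot, file_id))  # stage the update information
        if len(self.update_area) >= self.flush_threshold:
            self.apply_updates()

    def apply_updates(self) -> None:
        """Write each staged file identifier into its first partition, in order."""
        for slot, file_id in self.update_area:
            self.file_id_area[slot] = file_id
        self.update_area.clear()
```

The point of the sketch is the deferral: the file identification area is touched only when the staged updates are applied, so several metadata changes can be merged into one pass.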
In a specific implementation manner, the metadata area further includes a file partition area and a file block area, and the determining unit 602 is further configured to determine at least one second partition corresponding to the current file in the file partition area and at least one third partition corresponding to the current file in the file block area if the file identification area does not have a file identification consistent with the current file identification;
the storage unit 603 is further configured to store at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area; each piece of file fragment update information includes an identifier of a second partition and an association relation between the second partition and the current file; each piece of file block update information includes an identifier of a third partition, an association relation between the third partition and the second partition corresponding to the third partition, data corresponding to the third partition in the data area, and an offset, in the data area, of the data corresponding to the second partition corresponding to the third partition;
the writing unit 604 is further configured to, if the metadata update area meets a preset metadata update condition, write the association relation between the second partition and the current file into the second partition according to the identifier of the second partition, and write the association relation between the third partition and the second partition corresponding to the third partition into the third partition according to the identifier of the third partition.
In a specific implementation, each piece of file block update information further includes: a flag indicating whether the first part of data of the current file corresponding to the third partition is dirty data, a heat value indicating the use frequency of the first part of data of the current file corresponding to the third partition, and an association relation between the third partition and a fourth partition in the data area.
In a specific implementation manner, each piece of metadata update information further includes an update order, where the metadata update information includes file identification update information, file fragment update information, and file block update information, and the writing unit 604 is further configured to sequentially perform metadata update according to the update order of each piece of metadata update information in the metadata update area if the cache meets a preset abnormal recovery condition.
In a specific implementation manner, the determining unit 602 is further configured to, if the file identification area has a file identification consistent with the current file identification, determine, from the second partitions corresponding to the current file identification, a second partition whose corresponding fourth partition in the data area has remaining storage space as the second partition corresponding to the current data;
the determining unit 602 is further configured to determine, from the fourth partitions of the data area corresponding to that second partition, a target fourth partition in which remaining storage space exists and/or an idle fourth partition, and store the current data in the target fourth partition and/or the idle fourth partition.
In one specific implementation, each third partition of the data region has a size of any one of 8k to 64 k.
In one specific implementation, the metadata update condition includes: the current free space of the metadata update area is smaller than or equal to a preset free space threshold, or the time length from the last update time of the metadata update area to the current time meets a preset update time threshold.
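The two-part update condition can be written as a single predicate, sketched below. The function name should_update and its parameters are hypothetical, and "meets a preset update time threshold" is interpreted here as the elapsed time being greater than or equal to the threshold.

```python
def should_update(free_bytes: int, free_threshold: int,
                  elapsed_s: float, max_age_s: float) -> bool:
    """True when the metadata update area is nearly full or has gone too long without an update."""
    space_low = free_bytes <= free_threshold
    too_old = elapsed_s >= max_age_s  # assumed meaning of "meets the threshold"
    return space_low or too_old
```

For instance, with a 30-minute timer as in the scenario described earlier, max_age_s would be 1800.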
Fig. 7 is a schematic diagram of a cache structure provided in an embodiment of the present application, where the cache 700 may include one or more central processing units (central processing units, CPU) 701 and a memory 705, where the memory 705 stores one or more application programs or data.
Wherein the memory 705 may be volatile storage or persistent storage. The program stored in the memory 705 may include one or more modules, each of which may include a series of instruction operations in a cache. Still further, the central processor 701 may be configured to communicate with the memory 705 and execute a series of instruction operations in the memory 705 on the cache 700.
The cache 700 may also include one or more power supplies 702, one or more wired or wireless network interfaces 703, one or more input/output interfaces 704, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The central processor 701 may perform the operations performed by the cache in the embodiments shown in fig. 1 to 6, and detailed descriptions thereof are omitted herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
Embodiments of the present application also provide a computer program product containing instructions that, when executed on a computer, cause the computer to perform a cache storage method as described above.

Claims (10)

1. A cache storage method, wherein the cache includes a metadata area and a data area, the metadata area includes a file identification area and a metadata update area, the method comprising:
acquiring a current file identifier of a current file;
storing the current data of the current file change in the data area, and determining a first partition corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first partition is used for recording the current file identification;
storing file identification update information in the metadata update area, wherein the file identification update information comprises the identification of the first partition and the current file identification;
and if the metadata updating area meets the preset metadata updating condition, writing the current file identification into the first partition according to the identification of the first partition.
2. The method of claim 1, wherein the metadata area further comprises a file partition area and a file block area, the method further comprising:
if the file identification area does not have a file identification consistent with the current file identification, determining at least one second partition corresponding to the current file in the file partition area and at least one third partition corresponding to the current file in the file block area;
storing at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area; each piece of file fragment update information comprises an identifier of the second partition and an association relation between the second partition and the current file; each piece of file block update information comprises an identifier of the third partition, an association relation between the third partition and the second partition corresponding to the third partition, data corresponding to the third partition in the data area, and an offset, in the data area, of the data corresponding to the second partition corresponding to the third partition;
and if the metadata update area meets the preset metadata update condition, writing the association relation between the second partition and the current file into the second partition according to the identifier of the second partition, and writing the association relation between the third partition and the second partition corresponding to the third partition into the third partition according to the identifier of the third partition.
3. The method of claim 2, wherein each piece of file block update information further comprises: a flag indicating whether the first part of data of the current file corresponding to the third partition is dirty data, a heat value indicating the use frequency of the first part of data of the current file corresponding to the third partition, and an association relation between the third partition and a fourth partition in the data area.
4. A method according to claim 1 or 3, wherein each piece of metadata update information further comprises an update order, the metadata update information comprising the file identification update information, the file fragment update information, and the file block update information, the method further comprising:
and if the cache meets the preset abnormal recovery condition, sequentially updating the metadata according to the updating sequence of each piece of metadata updating information in the metadata updating area.
5. The method according to claim 1, wherein the method further comprises:
if the file identification area has a file identification consistent with the current file identification, determining, from the second partitions corresponding to the current file identification, a second partition whose corresponding fourth partition in the data area has remaining storage space as the second partition corresponding to the current data;
and determining, from the fourth partitions of the data area corresponding to that second partition, a target fourth partition in which remaining storage space exists and/or an idle fourth partition, and storing the current data in the target fourth partition and/or the idle fourth partition.
6. A method according to any one of claims 2 to 3, wherein the size of each third partition of the data area is any one of 8k to 64 k.
7. The method of claim 1, wherein the metadata update condition comprises: and the current free space of the metadata updating area is smaller than or equal to a preset free space threshold value, or the time length from the last updating moment of the metadata updating area to the current moment meets a preset updating time threshold value.
8. A cache comprising a metadata area and a data area, the metadata area comprising a file identification area and a metadata update area, the cache further comprising:
the acquisition unit is used for acquiring the current file identification of the current file;
the determining unit is used for storing the current data of the current file change in the data area, and determining a first partition corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first partition is used for recording the current file identification;
the storage unit is used for storing file identification update information in the metadata update area, wherein the file identification update information comprises the identification of the first partition and the current file identification;
and the writing unit is used for writing the current file identification into the first partition according to the identification of the first partition if the metadata update area meets the preset metadata update condition.
9. A cache, comprising:
a central processing unit, a memory and an input/output interface;
the memory is a volatile memory or a persistent memory;
the central processor is configured to communicate with the memory and to execute instruction operations in the memory to perform the method of any of claims 1 to 7.
10. A computer storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the method of any of claims 1 to 7.
CN202211717784.0A 2022-12-29 2022-12-29 Cache storage method and related equipment Pending CN116185949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211717784.0A CN116185949A (en) 2022-12-29 2022-12-29 Cache storage method and related equipment

Publications (1)

Publication Number Publication Date
CN116185949A true CN116185949A (en) 2023-05-30

Family

ID=86441523

Country Status (1)

Country Link
CN (1) CN116185949A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination