CN113867627B - Storage system performance optimization method and system - Google Patents

Info

Publication number
CN113867627B
Authority
CN
China
Prior art keywords
data
hash value
physical address
metadata
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110999485.XA
Other languages
Chinese (zh)
Other versions
CN113867627A (en
Inventor
甄凤远
刘志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110999485.XA priority Critical patent/CN113867627B/en
Publication of CN113867627A publication Critical patent/CN113867627A/en
Application granted granted Critical
Publication of CN113867627B publication Critical patent/CN113867627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device
    • G06F3/0676Magnetic disk device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A storage system performance optimization method and system. The method includes: receiving write data and its logical address, and calculating a hash value of the write data; verifying, according to the hash value, whether the write data is new data; in response to the write data being new data, writing the write data to a disk and acquiring the physical address of the write data on the disk; and composing the logical address and the hash value of the write data into metadata and writing the metadata into a corresponding table. With the method and system provided by the invention, the original LP, PL and HP metadata are changed to LH, HP and PL, and the deduplication mechanism is changed from keeping the old data and refusing to write the new data to always writing the new data and deleting the old data according to the value of a reference count. The PL table remains ordered, and PL can grow in a balanced manner through a suitable tree-splitting strategy. Insertion into and querying of PL no longer require additional overflow-page handling, which reduces overhead.

Description

Storage system performance optimization method and system
Technical Field
The invention belongs to the field of computer storage, and particularly relates to a storage system performance optimization method and system.
Background
To improve the overall utilization of a storage system, current all-flash storage supports inline data deduplication: by reducing the storage of duplicate data, more host data can be stored in the same capacity.
To support advanced storage features such as deduplication, most storage systems use a global append-write method, that is, newly written data is always written sequentially on the disk. Supporting deduplication also requires introducing metadata into the all-flash storage, generally comprising LP (a mapping from logical address to physical address), PL (a mapping from physical address to logical address) and HP (a mapping from the hash value of a data block to its physical address). When a data block is issued, a hash fingerprint is first calculated from the data block. If no matching HP entry exists, the data is newly written to a physical address P, the LP, PL and HP metadata are then issued, and finally all three are persisted to disk. If the query finds an existing HP entry for the newly written data, no new data is written and only LP and PL are updated. At high deduplication rates, P in PL is very likely to be discontinuous, because many HP hits cause load imbalance when multiple threads flush to disk, which degrades performance. In addition, at high deduplication rates PL may contain the same P mapped to multiple L values, which a B+ tree handles poorly and which may require extra overflow-page processing.
Thus, a new solution is needed to address this problem.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and a system for optimizing performance of a storage system, where the method includes:
receiving write data and its logical address, and calculating a hash value of the write data;
verifying, according to the hash value, whether the write data is new data;
in response to the write data being new data, writing the write data to a disk and acquiring the physical address of the write data on the disk;
and composing the logical address and the hash value of the write data into metadata and writing the metadata into a corresponding table.
In some embodiments of the present invention, composing the logical address and the hash value of the write data into metadata and writing into a corresponding table includes:
composing the logical address and the hash value of the write data into the LH parameter of the metadata and storing the LH parameter in an LH table;
composing the hash value and the physical address of the write data into the HP parameter of the metadata and storing the HP parameter in an HP table; and
composing the physical address and the logical address of the write data into the PL parameter of the metadata and storing the PL parameter in a PL table.
In some embodiments of the invention, composing the hash value and physical address of the write data into the HP parameter of the metadata and storing the HP parameter into the HP table comprises:
taking the number of times the write data has been written to the disk as a reference count, appending the reference count to the end of the physical address, and combining the result with the hash value to form the HP parameter.
In some embodiments of the invention, verifying whether the written data is new data based on the hash value comprises:
searching the HP table according to the hash value;
in response to the hash value and a physical address corresponding to the hash value existing in the HP table, determining that the write data is old data;
and in response to the hash value and a physical address corresponding to the hash value not existing in the HP table, determining that the write data is new data.
In some embodiments of the invention, the method further comprises:
in response to the write data being old data, writing the write data to the disk and acquiring a new physical address of the write data on the disk; and
acquiring, from the HP table, the reference count at the end of the old physical address of the write data;
incrementing the reference count by one, appending it to the end of the new physical address, forming a new HP parameter together with the hash value of the write data, and storing the new HP parameter in the HP table; and
storing the old physical address in a corresponding garbage collection table, and cleaning up, at preset time intervals, the data on the disk at the physical addresses recorded in the garbage collection table.
In some embodiments of the invention, the method further comprises:
in response to the LH parameter having been stored in the LH table and the HP parameter having been stored in the HP table, returning a write-success signal to the host.
In some embodiments of the invention, the method further comprises:
in response to a data query request initiated by the host, parsing the logical address in the query request;
searching the LH table for the hash value corresponding to the logical address in the data query request; and
searching the HP table for the physical address corresponding to the hash value;
and acquiring the physical address, and returning to the host the data stored at that physical address on the disk.
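The two-step lookup for a query (logical address to hash value, then hash value to physical address) can be illustrated with a minimal sketch. The dictionaries `lh`, `hp` and `disk`, their sample contents, and the function name `read` are hypothetical stand-ins chosen for illustration; the patent does not prescribe these names or a particular table implementation.

```python
# Hypothetical in-memory stand-ins for the LH table, the HP table and the disk.
lh = {"L1": "H1"}          # logical address -> hash value
hp = {"H1": 7}             # hash value -> physical address
disk = {7: b"payload"}     # physical address -> stored data


def read(logical):
    """Resolve a logical address through LH, then HP, then read the disk."""
    h = lh.get(logical)
    if h is None:
        return None        # unknown logical address
    p = hp.get(h)
    if p is None:
        return None        # no physical address recorded for this hash
    return disk.get(p)
```

Calling `read("L1")` follows L1 to H1 to physical address 7 and returns the stored bytes; an unknown logical address yields `None`.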
In some embodiments of the invention, the method further comprises:
taking the number of times the write data has been written to the disk as a reference count;
and forming a reference count parameter HN from the hash value of the write data and the reference count, and storing the reference count parameter HN in a reference count HN table.
In some embodiments of the invention, the method further comprises:
in response to a data deletion request initiated by the host, parsing the logical address in the request;
searching the LH table for the hash value corresponding to the logical address in the data deletion request, and searching the reference count HN table for the reference count corresponding to the hash value;
in response to the reference count not being 0, decrementing the reference count corresponding to the hash value by one;
and in response to the reference count being 0, searching the HP table for the physical address corresponding to the hash value, and deleting the data on the disk at that physical address.
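The deletion flow with a separate HN table can be sketched as follows. This is one possible reading, not the patented implementation: it assumes the two conditions are evaluated in sequence (decrement a nonzero count, then delete the disk data once the count has reached 0). The table names, sample contents and the function name `delete` are hypothetical.

```python
# Hypothetical tables: two logical addresses reference the same data block.
lh = {"L1": "H", "L2": "H"}   # logical address -> hash value
hn = {"H": 2}                 # hash value -> reference count
hp = {"H": "P2"}              # hash value -> physical address
disk = {"P2": b"M"}           # physical address -> stored data


def delete(logical):
    """Drop one reference; free the disk block only when none remain.

    Returns True if the data on disk was actually deleted.
    """
    h = lh.pop(logical)
    if hn[h] != 0:
        hn[h] -= 1            # count is nonzero: subtract one
    if hn[h] == 0:            # assumption: checked after the decrement
        p = hp.pop(h)
        del disk[p]
        del hn[h]
        return True
    return False
```

With the sample tables, deleting L1 only lowers the count to 1 and leaves the block in place; deleting L2 then brings the count to 0 and frees P2.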
Another aspect of the present invention also provides a storage system performance optimization system, including:
a receiving module configured to receive write data and its logical address and to calculate a hash value of the write data;
a verification module configured to verify, according to the hash value, whether the write data is new data;
a processing module configured to, in response to the write data being new data, write the write data to a disk and acquire the physical address of the write data on the disk;
and an execution module configured to compose the logical address and the hash value of the write data into metadata and write the metadata into a corresponding table.
The invention provides a storage system performance optimization method and system that change the original LP, PL and HP metadata to LH, HP and PL, and change the deduplication mechanism from keeping the old data and refusing to write the new data to always writing the new data and deleting the old data according to the value of the reference count. The PL table remains ordered, and PL can grow in a balanced manner through a suitable tree-splitting strategy. Insertion into and querying of PL no longer require extra overflow-page processing, which reduces system overhead. The amount of data flushed to disk increases to some extent; however, RAID writes are organized in stripes, and an overly small stripe would itself cause unnecessary write amplification, so the performance impact of writing the data again each time is not significant.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart illustrating one embodiment of a method for optimizing performance of a storage system in accordance with the present invention;
FIG. 2 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 3 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 4 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 5 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 6 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 7 is a system architecture diagram of a storage system performance optimization system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
In a first aspect of the embodiment of the present invention, an embodiment of a method for optimizing performance of a storage system is provided, and fig. 1 is a flowchart of an embodiment of a method for optimizing performance of a storage system provided by the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
step S101, receiving write data and its logical address, and calculating a hash value of the write data;
step S102, verifying, according to the hash value, whether the write data is new data;
step S103, in response to the write data being new data, writing the write data to a disk and acquiring the physical address of the write data on the disk;
and step S104, composing the logical address and the hash value of the write data into metadata and writing the metadata into a corresponding table.
The invention addresses data management in large storage systems that use solid-state drives as the main storage medium. In the prior art, because a solid-state drive (unlike a mechanical hard disk) maps the logical address of stored data to a physical address, deduplication (in theory, no data is stored twice on the solid-state drive) can be implemented at the storage layer: by binding the logical or physical address of the data to a characteristic value of the data (its hash value), the storage mapping of the data is established and the uniqueness of the data is preserved.
This is made possible by the introduction of metadata, also called intermediary data or relay data. Metadata is not the data itself or the basis of the data, but data that describes data. In the storage field it is a set of key-value pairs that describe the actual location (physical address) of stored data and the identifier or index (logical address) of that data.
Specifically, metadata generally includes three parameters. In LP, L denotes the logical address of the data and P denotes the physical address of the data on disk. The logical address is the identifier given to the data by the operating system or an application on the host. The physical address is the address of a storage unit at the hardware layer of the solid-state drive. LP is a key-value pair with the logical address L as the key and the physical address P as the value. PL is a key-value pair with the physical address P as the key and the logical address L as the value. HP is the key parameter for solving the problem of duplicate data storage: H denotes the hash value of the data, and HP is a key-value pair with the hash value of the data as the key and the physical address of the data as the value.
Therefore, when data is written to the solid-state drive, in addition to storing the data at the corresponding physical address on the disk, the three metadata entries must also be written or inserted. LP, PL and HP are all implemented as B+ tree data structures, so the corresponding metadata can be looked up very quickly.
As described above, in the prior art, when new data is to be stored, a hash is computed over the data to obtain its hash value, and the hash value is looked up in the HP table. If an HP entry whose hash value H equals the hash value of the data to be written exists in the HP table, the data is already stored on the disk. Under the deduplication mechanism, the write is then suppressed, and the P of the HP key-value pair found in the HP table is used as the physical address of the data to be written to form its LP metadata. That is, the LP, PL and HP metadata are constructed for the write data and stored in the corresponding LP, PL and HP tables.
However, in some cases the written data is heavily duplicated. Under the deduplication mechanism described above, the hash values of a large amount of written data hit existing HP entries (key-value pairs) in the HP table, so the physical address P is always that of the old data. When the PL metadata is constructed, the number of logical addresses L keeps growing, a single physical address P corresponds to multiple logical addresses L, and the physical addresses in the PL table are very likely to be discontinuous. As data accumulates, the PL table keeps growing while the HP table changes little because so much data is deduplicated, so lookups in the PL table gradually become slower than lookups in the HP table. This causes load imbalance when multiple threads execute storage tasks: the HP query has completed, but the PL query is still unfinished because the malformed data structure of the PL table has reduced its efficiency. Overall data storage performance ultimately suffers.
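The prior-art behavior described above can be illustrated with a small sketch. Plain Python dictionaries stand in for the B+ tree tables, a list index stands in for the physical address, and SHA-256 is used as the fingerprint; all of these, along with the name `prior_art_write`, are assumptions made for illustration only.

```python
import hashlib

# Hypothetical in-memory stand-ins for the prior-art LP, PL and HP tables.
lp, pl, hp = {}, {}, {}
disk = []  # append-only list; the list index plays the role of the physical address


def prior_art_write(logical, data):
    """Prior-art deduplicating write: a hit in HP suppresses the disk write."""
    h = hashlib.sha256(data).hexdigest()
    if h in hp:
        p = hp[h]              # duplicate: reuse the old physical address
    else:
        disk.append(data)      # new data: sequential write
        p = len(disk) - 1
        hp[h] = p
    lp[logical] = p
    pl.setdefault(p, []).append(logical)   # one P may collect many L values
    return p


# Three duplicate writes followed by one distinct write.
for logical in ("L1", "L2", "L3"):
    prior_art_write(logical, b"same block")
prior_art_write("L4", b"other block")
```

After these writes, `pl[0]` holds three logical addresses while `pl[1]` holds one, which is the one-to-many shape in PL that the text identifies as awkward for a B+ tree.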
Accordingly, to solve the above problems, the present invention makes structural changes to existing metadata, changing LP, PL and HP to LH, HP and PL. The method comprises the following steps:
In step S101, the storage system receives a write request from the host. The request includes the write data and the corresponding logical address, where the logical address is the identifier of the write data issued by the corresponding application or system on the host's operating system. After receiving the write data, the storage system performs a hash operation on it to obtain the hash value of the write data.
In step S102, after the hash value of the write data has been calculated in step S101, the HP table is searched using the hash value to verify whether the write data is new data.
In step S103, if step S102 verifies that the write data is new data, the write data is written to the disk, and the physical address of the write data on the disk is acquired from the corresponding writing tool.
In step S104, the logical address and the hash value of the write data are used as a key-value pair to form the metadata parameter LH, that is, LH: { "L": "H" }, which in a practical embodiment of the storage system typically looks like { "0x184499d5f4 bce": "e10adc3949ba59abbe56e057f20f883e" }, and the metadata parameter LH of the write data is inserted into the corresponding LH table.
In some embodiments of the present invention, composing the logical address and hash value of the write data into metadata and writing into a corresponding table includes:
composing the logical address and the hash value of the write data into the LH parameter of the metadata and storing the LH parameter in an LH table;
composing the hash value and the physical address of the write data into the HP parameter of the metadata and storing the HP parameter in an HP table; and
composing the physical address and the logical address of the write data into the PL parameter of the metadata and storing the PL parameter in a PL table.
In this embodiment, metadata for describing or indexing the write data is constructed from the physical address, the logical address and the hash value of the write data, so as to establish a mapping between the host and the real physical address of the data. Specifically, the logical address and the hash value of the write data form the LH parameter of the metadata, which is stored in the LH table and establishes a mapping from the logical address to the hash value. Because the hash value is calculated from the write data and is unique to it, LH ensures that each logical address identifies its write data uniquely. The hash value and the physical address of the write data form the HP parameter of the metadata, which maps the hash value to the physical address and makes stored data unique: the disk holds no duplicate data, so a single hash value H never corresponds to multiple physical addresses P. The physical address and the logical address of the write data form the PL parameter of the metadata, so that the written physical address P is always updated in step with the logical address of the newly written data.
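A minimal sketch of the new-data write path follows, showing how the three entries LH, HP and PL are built from one write. The dictionaries, the list-backed disk, the use of SHA-256 and the function name `write_new` are hypothetical choices for illustration; the patent leaves the hash function and table implementation open.

```python
import hashlib

lh, hp, pl = {}, {}, {}   # hypothetical in-memory metadata tables
disk = []                  # append-only list standing in for the disk


def write_new(logical, data):
    """Write a block that is known to be new and record LH, HP and PL."""
    h = hashlib.sha256(data).hexdigest()
    disk.append(data)      # sequential write
    p = len(disk) - 1      # list index used as the physical address
    lh[logical] = h        # LH: logical address -> hash value
    hp[h] = p              # HP: hash value -> physical address
    pl[p] = logical        # PL: physical address -> logical address
    return h, p
```

Because every write lands on a fresh physical address, each key P in `pl` maps to exactly one logical address in this sketch.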
In some embodiments of the invention, composing the hash value and physical address of the write data into the HP parameter of the metadata and storing the HP parameter into the HP table comprises:
taking the number of times the write data has been written to the disk as a reference count, appending the reference count to the end of the physical address, and combining the result with the hash value to form the HP parameter.
In this embodiment, the inventive concept of always writing the data so that physical addresses remain continuous raises a further problem. If newly written data is already stored on the disk, the write strategy of the invention still writes it sequentially to the disk (to keep P continuous) and, in the metadata, updates the physical address P recorded for the hash value of the write data in the HP table. Data may then be deleted by mistake.
Specifically, suppose that at some moment the host initiates a write request to the storage system, sending data M with a size of 5 MB and its corresponding logical address L1. The storage system hashes M to obtain the hash value H, searches the HP table for H, finds no entry, stores M at physical address P1 on the disk, and creates the LH, HP and PL metadata for M. Some time later, the storage system receives another write request in which the data is still the 5 MB M but the logical address is L2. Hashing yields H again, and the lookup in the HP table hits the stored HP entry, meaning the data is already stored on the disk. Under the method of the invention, the write is not stopped but carried out, the new physical address P2 is returned, P1 is replaced by P2 in the HP entry, and the PL table is updated to keep the PL table continuous. Suppose the host now issues a delete request for the data at logical address L1 (data is deleted by freeing the space at the physical address corresponding to the logical address, not by supplying the data itself). In the LH table, both L1 and L2 map to the hash value H of M, and looking up H in the HP table yields P2, because P1 has been replaced by P2. If P2 is deleted at this point, the data corresponding to logical address L2 is also invalidated (deleted).
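The mistaken deletion in this scenario can be reproduced with a few lines. The table contents mirror the state after the second write of M; the dictionary layout and the function name `naive_delete` are hypothetical, and the sketch deliberately omits any reference count to show the failure.

```python
# State after M has been written twice (first as L1, then as L2).
lh = {"L1": "H", "L2": "H"}        # both logical addresses map to the hash of M
hp = {"H": "P2"}                   # P1 was replaced by P2 on the second write
disk = {"P1": b"M", "P2": b"M"}    # P1 is stale, P2 is the live copy


def naive_delete(logical):
    """Delete without a reference count: follows LH, then HP, then frees P."""
    h = lh.pop(logical)
    p = hp.pop(h)
    del disk[p]


naive_delete("L1")
# L2 still maps to H, but H no longer resolves to any physical address.
l2_readable = lh["L2"] in hp
```

After the call, `l2_readable` is `False`: deleting L1 removed the only live copy that L2 depended on.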
To solve this problem, in this embodiment, when the HP metadata is constructed for the write data, a reference count for the write data is appended to the end of the physical address P. If the write data is being stored on the disk for the first time (no corresponding HP entry is found in the HP table for H), its reference count is set to 1. Otherwise, the reference count at the end of the physical address in the corresponding HP entry is acquired, incremented, and written back to that HP entry.
In some embodiments of the present invention, the reference count and the physical address are merged by converting both values to strings and concatenating them; when the reference count is used, it is converted back to an integer so that it can be incremented or decremented.
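A minimal sketch of the string-based merge follows. The hexadecimal rendering of the address, the `:` separator and the function names `merge` and `split` are assumptions for illustration; the patent does not specify a string layout.

```python
def merge(physical, count):
    """Join a physical address and its reference count into one string."""
    return f"{physical:#x}:{count}"


def split(value):
    """Recover (physical address, reference count) as integers."""
    p_str, c_str = value.rsplit(":", 1)
    return int(p_str, 16), int(c_str)
```

For example, `merge(0x1A2B, 3)` produces `"0x1a2b:3"`, and `split` turns that string back into the integer pair so the count can be incremented or decremented.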
In some embodiments of the present invention, the reference count is merged with the physical address in the form of a data bitmap, that is, a bitmap that associates the physical address with its reference count.
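The text does not define the bitmap layout. One possible reading, sketched here purely as an assumption, packs the reference count into the low bits of a single integer, with the physical address in the remaining high bits. The 16-bit field width and the names `COUNT_BITS`, `pack` and `unpack` are hypothetical.

```python
COUNT_BITS = 16                       # assumed width of the reference-count field
COUNT_MASK = (1 << COUNT_BITS) - 1


def pack(physical, count):
    """Place the count in the low bits and the address in the high bits."""
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("reference count does not fit in the count field")
    return (physical << COUNT_BITS) | count


def unpack(value):
    """Return (physical address, reference count) from a packed integer."""
    return value >> COUNT_BITS, value & COUNT_MASK
```

Compared with the string form, a packed integer lets the count be updated with integer arithmetic and no parsing, at the cost of a fixed maximum count.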
As shown in fig. 2, in some embodiments of the present invention, verifying whether the written data is new data according to the hash value includes:
step S401, searching the HP table according to the hash value;
step S402, in response to the hash value and a physical address corresponding to the hash value existing in the HP table, determining that the write data is old data;
step S403, in response to the hash value and a physical address corresponding to the hash value not existing in the HP table, determining that the write data is new data.
In this embodiment, to verify whether the written data is new data, the hash value of the written data is looked up in the HP table of the metadata. As described above, in the HP table of the present invention each hash value H corresponds to exactly one physical address P, and that physical address P and its reference count are updated continuously as the same data is written repeatedly. Based on this storage mechanism, the steps for verifying whether the data is new data are as follows:
In step S401, the storage system calls the corresponding query API to search the HP table, passing the hash value of the written data to the API as a parameter. Note that, in the present invention, the HP, LH and PL tables are all stored in memory, and may be implemented as a non-relational database or as an in-memory B+ tree data structure, so lookups in metadata tables such as HP can be answered very quickly. This is because, once the metadata is changed to LH, HP and PL, the key H of the HP table, the key L of the LH table and the key P of the PL table are all continuous. Therefore, in step S401, calling the corresponding API is enough to quickly determine whether the HP table contains an HP key-value pair with the same hash value as the written data.
In step S402, if the result returned by the API contains one HP key-value pair, the written data is regarded as old data, i.e. data that already exists.
In step S403, if the result returned by the API is None or null, the written data is regarded as new data, i.e. the corresponding data is not yet stored on the disk.
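Steps S401 to S403 can be illustrated with a minimal Python sketch. It is not the patented implementation: the HP table is modeled as a plain in-memory dictionary rather than a non-relational database or B+ tree, SHA-256 is assumed as the hash function (the text does not name one), and the names `lookup_hp` and `is_new_data` are hypothetical.

```python
from hashlib import sha256
from typing import Dict, Optional, Tuple

# Hypothetical in-memory HP table: hash value -> (physical address, reference count).
HPTable = Dict[str, Tuple[int, int]]


def lookup_hp(hp_table: HPTable, hash_value: str) -> Optional[Tuple[int, int]]:
    """S401: query the HP table; returns None when the hash value is absent."""
    return hp_table.get(hash_value)


def is_new_data(hp_table: HPTable, data: bytes) -> bool:
    """S402/S403: data is new only if its hash value has no HP entry."""
    hash_value = sha256(data).hexdigest()  # assumed hash function
    return lookup_hp(hp_table, hash_value) is None
```

A caller would branch on `is_new_data(...)` to choose between the new-data and old-data write paths.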
As shown in fig. 3, in some embodiments of the invention, the method further comprises:
step S501, in response to the written data being old data, writing the written data to the disk and obtaining a new physical address of the written data on the disk;
step S502, obtaining from the HP table the reference count at the end of the old physical address of the written data;
step S503, incrementing the reference count, merging it onto the end of the new physical address, forming a new HP parameter together with the hash value of the written data, and storing the new HP parameter in the HP table; and
step S504, storing the old physical address in a corresponding garbage collection table, and cleaning up, at preset time intervals, the data corresponding to the physical addresses in the garbage collection table.
In this embodiment, in step S501, if the written data is verified to be old data, the written data is still stored on the disk, and its physical address on the disk is obtained through the corresponding API.
In step S502, the reference count is separated from the physical address according to the way the two were merged. If the reference count and the physical address were merged as a character string, the reference count is split off with the corresponding string-splitting method and converted to an integer. If the reference count was merged as bitmap data, the reference count following the physical address is obtained with the corresponding bitmap query method.
In step S503, the acquired reference count is incremented by 1 and merged with the new physical address. The merged new physical address and reference count are used as the value of the metadata HP of the written data, the hash value of the written data is used as the key, and the pair is stored in the HP table again.
In step S504, because the physical address in the HP entry has now changed while the data pointed to by the old physical address, which is identical to the written data, still exists on the disk, the data in the physical storage space pointed to by the old physical address is purged to improve space utilization. By placing the old physical address in the corresponding garbage collection table, the storage system can purge garbage data at a fixed interval, for example every 30 seconds.
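The old-data path of steps S501 to S504 can be sketched as follows, assuming the string-merge variant described in step S502. The separator `#`, the dictionary-based disk, and the function names are all hypothetical choices made for illustration; the text does not specify them, and the periodic timer that would call `collect_garbage` (for example every 30 seconds) is left to the caller.

```python
from typing import Dict, List, Tuple

SEP = "#"  # hypothetical separator between physical address and reference count


def split_value(hp_value: str) -> Tuple[str, int]:
    """S502: split 'address#count' and convert the count to an integer."""
    addr, count = hp_value.rsplit(SEP, 1)
    return addr, int(count)


def merge_value(addr: str, count: int) -> str:
    """Merge the reference count onto the end of the physical address."""
    return f"{addr}{SEP}{count}"


def rewrite_old_data(hp_table: Dict[str, str], gc_table: List[str],
                     disk: Dict[str, bytes], hash_value: str,
                     data: bytes, new_addr: str) -> str:
    disk[new_addr] = data                                     # S501: write again
    old_addr, count = split_value(hp_table[hash_value])       # S502
    hp_table[hash_value] = merge_value(new_addr, count + 1)   # S503
    gc_table.append(old_addr)                                 # S504: defer cleanup
    return hp_table[hash_value]


def collect_garbage(gc_table: List[str], disk: Dict[str, bytes]) -> int:
    """Purge every address queued in the garbage collection table."""
    freed = 0
    while gc_table:
        if disk.pop(gc_table.pop(), None) is not None:
            freed += 1
    return freed
```

Deferring the purge to `collect_garbage` keeps the write path short: the host-facing write only appends an address to a list, and the actual space reclamation happens in the background pass.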
In some embodiments of the invention, the method further comprises:
returning a write-success signal to the host in response to the LH parameter having been stored in the LH table and the HP parameter having been stored in the HP table.
In this embodiment, the metadata has three data parameters, namely LH, HP and PL, and the operations on LH, HP and PL are executed by different threads. In practice, because data is written continuously, the three threads may finish processing the same set of data at different times. To increase the response speed of the storage system, the storage system sends a write-success response to the host immediately after LH and HP have been stored in the LH table and the HP table, respectively. This reduces the waiting time of the host and improves storage performance between the storage system and the host.
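The early acknowledgement can be sketched with three Python threads, one per metadata table. This is only an illustration under assumptions: the tables are plain dictionaries, the string `"WRITE_OK"` stands in for the success signal, and `handle_write` is a hypothetical name. The function waits for the LH and HP threads only and hands the still-running PL thread back to the caller.

```python
import threading
from typing import Dict, Tuple


def handle_write(lh: Dict[int, str], hp: Dict[str, str], pl: Dict[str, int],
                 logical: int, hash_value: str,
                 physical: str) -> Tuple[str, threading.Thread]:
    """Store LH, HP and PL in separate threads; acknowledge after LH and HP."""

    def store_lh() -> None:
        lh[logical] = hash_value

    def store_hp() -> None:
        hp[hash_value] = physical

    def store_pl() -> None:
        pl[physical] = logical

    lh_thread = threading.Thread(target=store_lh)
    hp_thread = threading.Thread(target=store_hp)
    pl_thread = threading.Thread(target=store_pl)
    for thread in (lh_thread, hp_thread, pl_thread):
        thread.start()

    # Only LH and HP must be durable before the host is answered.
    lh_thread.join()
    hp_thread.join()
    return "WRITE_OK", pl_thread  # the PL update may still be in flight
```

The host-visible latency is therefore bounded by the slower of the LH and HP updates, not by all three.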
As shown in fig. 4, in some embodiments of the invention, the method further comprises:
step S701, in response to a data query request initiated by the host, parsing the logical address in the query request;
step S702, searching the LH table for the hash value corresponding to the logical address in the data query request;
step S703, searching the HP table for the physical address corresponding to the hash value; and
step S704, obtaining the physical address, and returning to the host the data content stored in the data space on the disk that corresponds to the physical address.
In this embodiment, with the metadata structure changed as described in the present invention, data on the disk is accessed through the following steps:
In step S701, the storage system and the host communicate in real time, and when the host issues a query request, the logical address in the query request is obtained.
In step S702, the corresponding API is called with the logical address in the query request as a parameter to find the hash value corresponding to the logical address in the LH table. If no entry for the logical address and its hash value can be found, error information is returned to the host.
In step S703, if a hash value corresponding to the logical address is found, the physical address corresponding to that hash value is looked up in the HP table.
In step S704, if the corresponding physical address was obtained in step S703, the data in the storage space pointed to by the physical address is returned to the host.
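The two-level lookup of steps S701 to S704 can be sketched as below. The sketch assumes dictionary-backed LH and HP tables, omits the reference count merged into the HP value for brevity, and signals the error case with a Python `LookupError`; the function name `read` and these conventions are illustrative, not taken from the patent.

```python
from typing import Dict


def read(lh: Dict[int, str], hp: Dict[str, str],
         disk: Dict[str, bytes], logical: int) -> bytes:
    """Resolve logical address -> hash value -> physical address -> data."""
    hash_value = lh.get(logical)              # S702: LH lookup
    if hash_value is None:
        raise LookupError(f"no LH entry for logical address {logical}")
    physical = hp.get(hash_value)             # S703: HP lookup
    if physical is None:
        raise LookupError(f"no HP entry for hash value {hash_value}")
    return disk[physical]                     # S704: return the stored data
```

Because two logical addresses holding identical data share one hash value, both resolve through the same HP entry to the same physical block.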
As shown in fig. 5, in some embodiments of the invention, the method further comprises:
step S801, using the number of times the written data has been written to the disk as a reference count;
step S802, forming a reference count parameter HN from the hash value of the written data and the reference count, and storing the reference count parameter HN in a reference count HN table.
In this embodiment, to prevent erroneous deletion when data is deleted and to support statistics on the deduplication rate, the reference count is bound to the hash value and stored in a corresponding data table. Specifically, the number of times the written data has been stored is used as the reference count; the hash value of the written data is used as the key and the corresponding reference count as the value, forming a key-value pair HN; a reference count table (the HN table) is established, and the HN pairs of different written data are written into it.
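A minimal sketch of the HN table follows, assuming it is a dictionary keyed by hash value and that the count starts at 1 on the first write. The name `record_write` is hypothetical.

```python
from typing import Dict


def record_write(hn: Dict[str, int], hash_value: str) -> int:
    """S801/S802: count how many times data with this hash has been written."""
    hn[hash_value] = hn.get(hash_value, 0) + 1
    return hn[hash_value]
```

Keeping the count in its own table means it can be read or updated without parsing the merged physical-address value stored in the HP table.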
As shown in fig. 6, in some embodiments of the invention, the method further comprises:
step S901, in response to a data deletion request initiated by the host, parsing the logical address in the request;
step S902, searching the LH table for the hash value corresponding to the logical address in the data deletion request, and searching the reference count HN table for the reference count value corresponding to the hash value;
step S903, in response to the value of the reference count not being 0, decrementing by one the value of the reference count corresponding to the hash value; and
step S904, in response to the value of the reference count being 0, searching the HP table for the physical address corresponding to the hash value, and deleting the data on the disk corresponding to the physical address.
In this embodiment, in step S901, after the storage system receives a data deletion request from the host, it parses the request to obtain the logical address of the data to be deleted.
In step S902, the corresponding API is called with the logical address to query the key-value pair of logical address and hash value stored in the LH table and obtain the hash value corresponding to the logical address; the value of the reference count corresponding to that hash value is then looked up in the HN table.
In step S903, it is determined whether the value of the reference count corresponding to the hash value satisfies the deletion condition. If the value is not 0, the reference count is decremented by one and the HN table is updated.
In step S904, if the value of the reference count corresponding to the hash value is 0, the data corresponding to the current hash value corresponds to only one logical address, so when the host deletes the data at that logical address, the data can be purged directly.
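The deletion flow of steps S901 to S904 can be sketched as follows. The sketch follows the text literally: a non-zero count is decremented, and a count of 0 triggers physical deletion, which assumes the convention implied by step S904 that 0 means exactly one logical address still refers to the data. The dictionary-backed tables, the return strings, and the name `delete` are hypothetical; cleanup of the LH, HP and HN entries themselves is not described in the text and is therefore not shown.

```python
from typing import Dict


def delete(lh: Dict[int, str], hn: Dict[str, int], hp: Dict[str, str],
           disk: Dict[str, bytes], logical: int) -> str:
    """Decrement the reference count, or purge the data when it is 0."""
    hash_value = lh[logical]              # S902: LH lookup
    count = hn[hash_value]                # S902: HN lookup
    if count != 0:
        hn[hash_value] = count - 1        # S903: other references remain
        return "decremented"
    physical = hp[hash_value]             # S904: resolve the physical address
    disk.pop(physical, None)              # S904: remove the data from disk
    return "deleted"
```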
As shown in fig. 7, another aspect of the present invention further proposes a storage system performance optimization system, including:
a receiving module 1, configured to receive written data and its logical address, and to calculate a hash value of the written data;
a verification module 2, configured to verify whether the written data is new data according to the hash value;
a processing module 3, configured to, in response to the written data being new data, write the written data to a disk and obtain a physical address of the written data on the disk; and
an execution module 4, configured to form metadata from the logical address and the hash value of the written data and write the metadata into a corresponding table.
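The four modules can be pictured as four methods of one class, chained on the new-data write path. This is a sketch under assumptions rather than the claimed system: SHA-256 is an assumed hash function, physical addresses are a simple block counter, the LH table maps logical address to hash value (as used by the query flow in steps S702 and S703), reference counts are omitted, and all names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


class StorageSketch:
    def __init__(self) -> None:
        self.lh: Dict[int, str] = {}      # logical address -> hash value
        self.hp: Dict[str, int] = {}      # hash value -> physical address
        self.pl: Dict[int, int] = {}      # physical address -> logical address
        self.disk: Dict[int, bytes] = {}
        self._next_block = 0

    def receive(self, data: bytes) -> str:
        """Receiving module: compute the hash value of the written data."""
        return hashlib.sha256(data).hexdigest()

    def verify(self, hash_value: str) -> bool:
        """Verification module: True when the data is new."""
        return hash_value not in self.hp

    def process(self, data: bytes) -> int:
        """Processing module: write to disk and return the physical address."""
        physical = self._next_block
        self.disk[physical] = data
        self._next_block += 1
        return physical

    def execute(self, logical: int, hash_value: str, physical: int) -> None:
        """Execution module: store the LH, HP and PL metadata."""
        self.lh[logical] = hash_value
        self.hp[hash_value] = physical
        self.pl[physical] = logical

    def write(self, logical: int, data: bytes) -> Optional[int]:
        hash_value = self.receive(data)
        if not self.verify(hash_value):
            return None  # old data: handled by the separate old-data path
        physical = self.process(data)
        self.execute(logical, hash_value, physical)
        return physical
```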
The invention provides a storage system performance optimization method and system that change the original LP, PL and HP metadata into LH, HP and PL, and change the deduplication mechanism from one that keeps the old data and prohibits writing the new data into one that continuously deletes the old data according to the value of the reference count. The ordering of the PL table is maintained, and the PL table can grow in a balanced manner through a suitable tree-splitting strategy. Insertion into and lookup in the PL table no longer require extra overflow-page processing, which reduces system overhead. The amount of data flushed to disk increases to some extent; however, because RAID writes must be organized into stripes, and a stripe that is too small causes unnecessary write amplification anyway, the performance impact of writing the data again each time is not significant.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (9)

1. A method for optimizing performance of a storage system, comprising:
receiving write-in data and a logic address thereof, and calculating a hash value of the write-in data;
verifying whether the written data is new data according to the hash value;
responding to the written data as new data, writing the written data into a disk and acquiring a physical address of the written data on the disk;
forming metadata by the logic address and the hash value of the written data and writing the metadata into a corresponding table;
the composing the logical address and the hash value of the written data into metadata and writing the metadata into a corresponding table comprises the following steps:
forming the logical address and the physical address of the written data into LH parameters of metadata and storing the LH parameters into an LH table;
forming the hash value and the physical address of the written data into HP parameters of metadata and storing the HP parameters into an HP table; and
and forming the physical address and the logical address of the write data into PL parameters of metadata and storing the PL parameters into a PL table.
2. The method of claim 1, wherein said assembling the hash value and physical address of the write data into the HP parameters of the metadata and storing the HP parameters in the HP table comprises:
merging the number of times the data has been written to the disk, as a reference count, onto the end of the physical address, and forming the HP parameter together with the hash value.
3. The method of claim 2, wherein verifying whether the written data is new data based on the hash value comprises:
searching in an HP table according to the hash value;
in response to the hash value and the physical address corresponding to the hash value existing in the HP table, determining that the written data is old data;
and in response to the hash value and the physical address corresponding to the hash value not existing in the HP table, determining that the written data is new data.
4. A method according to claim 3, further comprising:
responding to the writing data as old data, writing the writing data into a disk and acquiring a new physical address of the writing data on the disk; and
obtaining a reference count of an old physical address end of the write data from the HP table;
adding and merging the reference count value to the end of the new physical address, and simultaneously forming a new HP parameter with the hash value of the written data and storing the new HP parameter into an HP table; and
storing the old physical address into a corresponding garbage collection table, and cleaning corresponding data in the garbage collection table at preset time intervals according to the physical address in the garbage collection table.
5. The method according to claim 4, wherein the method further comprises:
and returning a data writing success signal to the host computer in response to completion of the LH parameter storing in the LH table and the HP parameter storing in the HP table.
6. The method as recited in claim 1, further comprising:
responding to a data query request initiated by a host end, and analyzing a logic address in the query request;
according to the logic address in the data query request, searching a hash value corresponding to the logic address in an LH table; and
searching a physical address corresponding to the hash value in an HP table according to the hash value;
and acquiring the physical address, and returning the data content of the physical address in the data space corresponding to the disk to the host.
7. The method according to claim 1, wherein the method further comprises:
the number of times of writing data into a disk is used as a reference count;
and forming a reference count parameter HN according to the hash value of the written data and the reference count, and storing the reference count parameter HN into a reference count HN table.
8. The method as recited in claim 7, further comprising:
responding to a request of deleting data initiated by a host end, and analyzing a logic address in the request;
according to the logical address in the data deleting request, searching a hash value corresponding to the logical address in an LH table, and searching a reference count value corresponding to the hash value in the reference count HN table;
in response to the value of the reference count being other than 0, subtracting one from the value of the reference count corresponding to the hash value;
and searching a physical address corresponding to the hash value in an HP table according to the hash value in response to the value of the reference count being 0, and deleting data in a disk corresponding to the physical address.
9. A storage system performance optimization system, comprising:
the receiving module is configured to receive the writing data and the logic address thereof and calculate the hash value of the writing data;
the verification module is configured to verify whether the written data is new data according to the hash value;
the processing module is configured to respond to the written data as new data, write the written data into a disk and acquire a physical address of the written data on the disk;
the execution module forms the logical address and the hash value of the written data into metadata and writes the metadata into a corresponding table;
the execution module is further configured to:
forming the logical address and the physical address of the written data into LH parameters of metadata and storing the LH parameters into an LH table;
forming the hash value and the physical address of the written data into HP parameters of metadata and storing the HP parameters into an HP table; and
and forming the physical address and the logical address of the write data into PL parameters of metadata and storing the PL parameters into a PL table.
CN202110999485.XA 2021-08-29 2021-08-29 Storage system performance optimization method and system Active CN113867627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110999485.XA CN113867627B (en) 2021-08-29 2021-08-29 Storage system performance optimization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110999485.XA CN113867627B (en) 2021-08-29 2021-08-29 Storage system performance optimization method and system

Publications (2)

Publication Number Publication Date
CN113867627A CN113867627A (en) 2021-12-31
CN113867627B true CN113867627B (en) 2023-08-22

Family

ID=78988656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110999485.XA Active CN113867627B (en) 2021-08-29 2021-08-29 Storage system performance optimization method and system

Country Status (1)

Country Link
CN (1) CN113867627B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115437579B (en) * 2022-11-04 2023-03-24 苏州浪潮智能科技有限公司 Metadata management method and device, computer equipment and readable storage medium
CN115576956B (en) * 2022-12-07 2023-03-10 苏州浪潮智能科技有限公司 Data processing method, system, equipment and storage medium
CN117271224B (en) * 2023-11-14 2024-02-20 苏州元脑智能科技有限公司 Data repeated storage processing method and device of storage system, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183399A (en) * 2015-09-30 2015-12-23 北京奇艺世纪科技有限公司 Data writing and reading method and device based on elastic block storage
CN106095332A (en) * 2016-06-01 2016-11-09 杭州宏杉科技有限公司 A kind of data heavily delete method and device
CN109189349A (en) * 2018-10-16 2019-01-11 深圳忆联信息***有限公司 A kind of method and its system promoting solid state hard disk copy function
CN109407985A (en) * 2018-10-15 2019-03-01 郑州云海信息技术有限公司 A kind of method and relevant apparatus of data management

Also Published As

Publication number Publication date
CN113867627A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN113867627B (en) Storage system performance optimization method and system
US10620862B2 (en) Efficient recovery of deduplication data for high capacity systems
KR102564170B1 (en) Method and device for storing data object, and computer readable storage medium having a computer program using the same
US20160350302A1 (en) Dynamically splitting a range of a node in a distributed hash table
EP3316150B1 (en) Method and apparatus for file compaction in key-value storage system
CN111159252A (en) Transaction execution method and device, computer equipment and storage medium
CN107766374B (en) Optimization method and system for storage and reading of massive small files
US11461239B2 (en) Method and apparatus for buffering data blocks, computer device, and computer-readable storage medium
CN112597114B (en) OLAP (on-line analytical processing) precomputation engine optimization method and application based on object storage
CN113535670B (en) Virtual resource mirror image storage system and implementation method thereof
US11514010B2 (en) Deduplication-adapted CaseDB for edge computing
WO2020215580A1 (en) Distributed global data deduplication method and device
TW201514734A (en) Database managing method, database managing system, and database tree structure
CN114936188A (en) Data processing method and device, electronic equipment and storage medium
CN113641681B (en) Space self-adaptive mass data query method
KR101806394B1 (en) A data processing method having a structure of the cache index specified to the transaction in a mobile environment dbms
CN113326262B (en) Data processing method, device, equipment and medium based on key value database
US9275091B2 (en) Database management device and database management method
CN114896250B (en) Key value separated key value storage engine index optimization method and device
KR101419428B1 (en) Apparatus for logging and recovering transactions in database installed in a mobile environment and method thereof
CN113867626A (en) Method, system, equipment and storage medium for optimizing performance of storage system
CN107506156B (en) Io optimization method of block device
CN115437836A (en) Metadata processing method and related equipment
CN116450591B (en) Data processing method, device, computer equipment and storage medium
CN112597074B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant