CN117331493A - Method, apparatus, device and computer readable medium for data storage - Google Patents


Info

Publication number
CN117331493A
Authority
CN
China
Prior art keywords
ssd
dynamic
hdd
data
storage space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311184114.1A
Other languages
Chinese (zh)
Inventor
张峥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd filed Critical CCB Finetech Co Ltd
Priority to CN202311184114.1A priority Critical patent/CN117331493A/en
Publication of CN117331493A publication Critical patent/CN117331493A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F 3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0688 Non-volatile semiconductor memory arrays

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a computer readable medium for data storage, relating to the technical field of cloud computing. One embodiment of the method comprises the following steps: caching data in a mechanical hard disk (HDD) into a corresponding fixed solid state disk (SSD); after the fixed SSD has no remaining storage space, allocating the dynamic SSD of the HDD to cache the data in the HDD; and when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, caching the data of the HDD into the dynamic SSDs of the other HDDs. According to this embodiment, the SSD used to store data can be selected flexibly, improving resource utilization.

Description

Method, apparatus, device and computer readable medium for data storage
Technical Field
The present invention relates to the field of cloud computing technologies, and in particular, to a method, an apparatus, a device, and a computer readable medium for data storage.
Background
Distributed storage systems typically combine mechanical hard disk drives (HDD) with solid state disks (SSD), using the respective advantages of the two storage media to provide a storage solution that offers both performance and capacity.
Using an SSD as a cache for hot data can greatly improve the performance of a storage system, particularly its random read/write performance.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problems: in a distributed storage system, an SSD space of fixed size is typically allocated as the cache of an HDD. However, that SSD cache space is not used flexibly, cannot cope with unbalanced disk input/output loads, and leads to low resource utilization.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a computer readable medium for data storage, which can flexibly select the SSD used to store data and improve resource utilization.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method for data storage, comprising:
caching data in a mechanical hard disk (HDD) into a corresponding fixed solid state disk (SSD);
after the fixed SSD has no remaining storage space, allocating the dynamic SSD of the HDD to cache the data in the HDD;
and when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, caching the data of the HDD into the dynamic SSDs of the other HDDs.
The fixed SSD, the dynamic SSD of the HDD, and the dynamic SSDs of the other HDDs belong to a storage pool;
the HDD, the fixed SSD of the HDD, and the dynamic SSD of the HDD are all storage spaces corresponding to one storage node in the storage pool.
The fixed SSD and the dynamic SSD are independent SSD disks or SSD spaces obtained by logical partitioning.
Caching the data of the HDD into the dynamic SSDs of the other HDDs, when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, comprises:
if the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, caching the data of the HDD into the dynamic SSDs of one or more of the other HDDs.
The method further comprises:
setting an initial SSD for the HDD according to the number of IO threads, the IO size, and the IO queue depth;
and adjusting the storage space of the initial SSD according to a setting coefficient, and using the adjusted SSD storage space as the fixed SSD or the dynamic SSD.
After using the adjusted SSD storage space as the fixed SSD or the dynamic SSD, the method comprises:
if the adjusted SSD storage space has not been used after a preset processing period, updating the adjusted SSD storage space to a part of the originally adjusted SSD storage space.
The method further comprises:
determining, according to the service load and the free SSD storage space, the amount of data deleted from the SSD in each SSD reclamation period, so as to reclaim SSD storage space.
The method further comprises:
if the dynamic SSDs of the other HDDs have no remaining storage space, screening out and deleting long-term non-accessed data in the fixed SSD of the HDD and/or long-term non-accessed data in the dynamic SSD.
According to a second aspect of an embodiment of the present invention, there is provided an apparatus for data storage, comprising:
a first storage module, configured to cache data in the mechanical hard disk HDD into the corresponding fixed solid state disk SSD;
a second storage module, configured to allocate the dynamic SSD of the HDD to cache the data in the HDD after the fixed SSD has no remaining storage space;
and a third storage module, configured to cache the data of the HDD into the dynamic SSDs of other HDDs when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of the other HDDs have remaining storage space.
According to a third aspect of an embodiment of the present invention, there is provided an electronic device for data storage, including:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as described above.
According to a fifth aspect of embodiments of the present invention, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as provided by embodiments of the present invention.
One of the above embodiments has the following advantage or beneficial effect: data in the mechanical hard disk HDD is cached into the corresponding fixed solid state disk SSD; after the fixed SSD has no remaining storage space, the dynamic SSD of the HDD is allocated to cache the data in the HDD; and when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, the data of the HDD is cached into the dynamic SSDs of the other HDDs. The search for storage space starts from the SSD corresponding to the HDD and then makes full use of the SSD storage space of other HDDs. Therefore, the SSD used to store data can be selected flexibly, and resource utilization is improved.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic flow diagram of a method of data storage according to an embodiment of the invention;
FIG. 2 is a schematic architecture diagram of an application data store according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a buffer area according to an embodiment of the present invention;
FIG. 4 is a flow diagram of setting SSD based on IO threads, according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the main structure of an apparatus for data storage according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
fig. 7 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, which include various details of the embodiments to facilitate understanding and are to be considered merely exemplary. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness. The acquisition, storage, use, and processing of data in this technical scheme comply with the relevant provisions of national laws and regulations.
Limited by the number of disk slots in a server, a hybrid distributed storage system cannot provide a separate SSD as a cache for every HDD, so SSDs are mainly used in the following ways.
Disk-level caching: several HDDs are configured with one or more SSDs that cache the data of those HDDs. In this way, an HDD has no dedicated SSD cache; multiple HDDs share the SSD cache resources.
Advantages of disk-level caching: the distributed cache is easy to expand and avoids single points of failure; resource utilization is high, since multiple HDDs share SSD cache resources; access is localized, with low latency and high performance. Disadvantages of disk-level caching: resource allocation is difficult, and the cache hit rate of some HDDs easily becomes low; caching is duplicated, with hot data of multiple HDDs cached on one SSD; consistency is hard to guarantee, since the cache data of different HDDs cannot be synchronized.
Pool-level caching: one or more SSDs are configured under a storage pool to cache the data of all HDDs in that pool. In this way, all HDDs in the storage pool share pool-level SSD cache resources.
Advantages of pool-level caching: resource allocation is fairer, since all HDDs share the pool-level SSD cache; management and configuration are simpler; cache resource allocation and consistency problems are avoided. Disadvantages of pool-level caching: there is a single point of failure, since a failure of the pool's SSD cache resources affects all HDDs; access latency is higher, because SSD cache data must be accessed through the storage pool; resource utilization is hard to guarantee, because data of some cold HDDs also occupies cache resources.
Dedicated caching: each HDD is configured with some SSD capacity as its exclusive cache, used only for that HDD's data. This provides a dedicated SSD cache for each HDD and avoids cache contention and resource allocation among multiple HDDs, but overall SSD resource utilization is low.
Advantages of dedicated caching: resource allocation is the fairest, as each HDD has a dedicated SSD cache; the cache hit rate is the highest, because the exclusive cache stores only that HDD's data; consistency is best guaranteed, since the cache data of different HDDs is isolated. Disadvantages of dedicated caching: resource utilization is the lowest, and the cache resources of some HDDs may sit idle and be wasted; scalability is poor, since every HDD needs its own SSD cache, which is hard to implement at large scale; cost is high, since SSD cache resources must be provided for each HDD.
In summary, SSD cache space is not used flexibly, cannot cope with unbalanced disk input/output loads, and resource utilization is low.
To solve the problem that SSD cache space is not used flexibly and cannot cope with unbalanced disk input/output loads, the following technical scheme of the embodiments of the present invention can be adopted.
Referring to fig. 1, fig. 1 is a schematic flow chart of a data storage method according to an embodiment of the present invention: the SSD corresponding to the HDD is queried for storage space first, and the SSDs corresponding to other HDDs are queried next, so that SSD storage space is fully used for caching data. As shown in flow 100 of fig. 1, the method specifically comprises the following steps:
s101, caching data in the mechanical hard disk HDD into a corresponding fixed solid state disk SSD.
The embodiment of the invention may be implemented in, i.e. applied to, a distributed storage system. The execution subject of the steps in fig. 1 may be a storage node.
In an embodiment of the invention, a storage node corresponds to mechanical hard disks (HDD) and solid state disks (SSD). Each HDD has a dedicated SSD cache space, which may be a separate SSD disk or a logically partitioned SSD space. That is, a separate SSD disk may be used as the SSD cache space, or an SSD may be divided into a plurality of logical partitions with the SSD space of one logical partition used as an SSD cache space.
Referring to architecture 200 of fig. 2, fig. 2 is a schematic architecture diagram of an application of data storage according to an embodiment of the present invention. The metadata cluster in fig. 2 stores data by interacting with the storage nodes in the storage pools.
The metadata cluster is responsible for managing routing, volume management, storage device space management, and so on for the entire storage pool cluster. As an example, the metadata cluster may periodically probe the storage pool cluster and update the related information, and the metadata cluster may also be actively notified when storage pool cluster information changes. The metadata cluster must manage at least one storage pool and may manage several storage pools at the same time. As one example, the metadata cluster is deployed in a highly available manner, with at least 3 nodes and an odd total number of nodes.
A storage pool comprises a plurality of storage nodes; each storage node comprises a plurality of HDDs and a plurality of SSDs, with the SSDs serving as caches for the HDDs. An SSD in a storage node may be an independent SSD, or an independent cache space of an HDD obtained by logically partitioning a high-capacity SSD.
In one embodiment of the invention, the fixed SSD, the dynamic SSD of the HDD, and the dynamic SSDs of other HDDs belong to one storage pool. The storage pool includes a plurality of storage nodes, and SSD space is divided and provided to each storage node.
The HDD, the fixed SSD of the HDD, and the dynamic SSD of the HDD are all storage spaces corresponding to one storage node in the storage pool. Each storage node includes corresponding HDDs and SSDs. The SSDs include fixed SSDs and dynamic SSDs. Data of an HDD is stored in the fixed SSD first and then in the dynamic SSD.
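As a concrete illustration of this layered layout, the following is a minimal Python sketch of a storage pool made of storage nodes, each holding HDDs with a fixed and a dynamic SSD cache region; the class and field names are illustrative assumptions, not terms from the patent.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class HddWithCache:
        """One HDD on a storage node together with its SSD cache regions (illustrative)."""
        hdd_id: str
        fixed_ssd_bytes: int     # fixed cache region dedicated to this HDD
        dynamic_ssd_bytes: int   # dynamic cache region; may also serve other HDDs

    @dataclass
    class StorageNode:
        node_id: str
        disks: List[HddWithCache] = field(default_factory=list)

    @dataclass
    class StoragePool:
        """A pool of storage nodes; a metadata cluster manages routing and space for it."""
        nodes: List[StorageNode] = field(default_factory=list)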
Referring to diagram 300 of fig. 3, fig. 3 is a schematic diagram of the cache regions according to an embodiment of the present invention. Each HDD corresponds to an SSD; in fig. 3, the block on top of each HDD is its SSD.
Each SSD provides a fixed cache size. To manage the cached data in the SSD, the space of the whole SSD is divided into three parts: a metadata area, a fixed cache area, and a dynamic cache area. The fixed cache area corresponds to the fixed SSD, and the dynamic cache area corresponds to the dynamic SSD. The dynamic cache area may store data of the corresponding HDD or data of other HDDs, so it can be further divided into a local dynamic cache area and a remote dynamic cache area, corresponding to those two cases.
The metadata area stores metadata information such as the index of the cached data, the space usage of the fixed cache area, and the space usage of the dynamic cache area.
In the embodiment of the invention, the fixed cache area stores cached data of the HDD corresponding to the SSD. The dynamic cache area may store cached data of both the SSD's corresponding HDD and non-corresponding HDDs.
Specifically, the size of the SSD cache space corresponding to each HDD is denoted SSDn_capacity; the sizes of the metadata area, the fixed cache area, and the dynamic cache area are denoted SSDn_meta, SSDn_fixed, and SSDn_dynamic, respectively, where n is the sequence number of the SSD.
SSDn_capacity = SSDn_meta + SSDn_fixed + SSDn_dynamic.
Because SSDn_dynamic can store data of its corresponding HDD as well as data of other HDDs, SSDn_dynamic can be divided into two parts, SSDn_dynamic_local and SSDn_dynamic_remote. SSDn_dynamic_local is the space used to store data of the corresponding HDD; SSDn_dynamic_remote is the space used to store data of other HDDs.
SSDn_dynamic = α × SSDn_dynamic_local + (1 - α) × SSDn_dynamic_remote.
Here α is an adjustment coefficient. In the extreme cases, when α is 0, the dynamic cache area of the SSD is used entirely for storing cache data of other HDDs; when α is 1, the dynamic cache area of the SSD is used entirely for storing cache data of the HDD corresponding to the SSD.
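To make the split concrete, the following minimal Python sketch partitions one SSD's capacity into the three areas and, reading α as the fraction of the dynamic area reserved for the SSD's own HDD (consistent with the extreme cases above), derives the local and remote shares; the class and property names are assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class SsdPartition:
        """Capacity split of one SSD cache, in bytes (illustrative only)."""
        capacity: int   # SSDn_capacity
        meta: int       # SSDn_meta
        fixed: int      # SSDn_fixed
        alpha: float    # adjustment coefficient, 0 <= alpha <= 1

        @property
        def dynamic(self) -> int:
            # SSDn_capacity = SSDn_meta + SSDn_fixed + SSDn_dynamic
            return self.capacity - self.meta - self.fixed

        @property
        def dynamic_local(self) -> int:
            # share of the dynamic area kept for the SSD's own HDD
            return int(self.alpha * self.dynamic)

        @property
        def dynamic_remote(self) -> int:
            # share of the dynamic area lent to other HDDs
            return self.dynamic - self.dynamic_local

    # Example: an 800 GiB SSD with 8 GiB of metadata, a 256 GiB fixed area, alpha = 0.7
    # p = SsdPartition(capacity=800 << 30, meta=8 << 30, fixed=256 << 30, alpha=0.7)
    # p.dynamic, p.dynamic_local, p.dynamic_remote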
In the embodiment of the invention, when an HDD needs to cache data, the data in the HDD can be cached in the corresponding fixed SSD, that is, in the corresponding fixed cache area, which stores the cached data of the corresponding HDD.
S102, after the fixed SSD has no remaining storage space, allocating the dynamic SSD of the HDD to cache the data in the HDD.
After data in the HDD has been stored in the corresponding fixed SSD, the fixed SSD may run out of remaining storage space because its capacity is limited, i.e. the fixed SSD corresponding to the HDD becomes full. The data in the HDD can then be allocated to the dynamic SSD cache of the HDD.
In an embodiment of the invention, having no remaining storage space includes the case where the remaining space is smaller than a remaining-space threshold. That is, when the remaining space is smaller than the threshold, it is determined that there is no remaining storage space.
Specifically, space in SSDn_dynamic, namely SSDn_dynamic_local, is allocated to cache the data in the HDD.
It should be understood that the fixed SSD has a higher priority than the dynamic SSD: data in the HDD is cached in the fixed SSD first, and only then in the dynamic SSD.
S103, when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, caching the data of the HDD in the dynamic SSDs of the other HDDs.
If the dynamic SSD of the HDD has no remaining storage space, the data of the HDD cannot be cached in the dynamic SSD of the HDD. In that case, the dynamic SSDs of other HDDs can be considered.
When the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of the other HDDs have remaining storage space, the data of the HDD is cached in the dynamic SSDs of the other HDDs.
In one embodiment of the invention, when SSDn_dynamic_local/SSDn_dynamic is greater than or equal to a preset threshold, i.e. the dynamic SSD of the HDD has no remaining storage space, the SSDn_dynamic area is about to be exhausted.
In order not to affect the input/output (IO) performance of the device, the data in the HDD is then cached in the SSDn_dynamic area of another SSD, denoted SSDn_dynamic_remote. It should be understood that a dynamic SSD may store both the data of its corresponding HDD and the data of other HDDs.
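A minimal sketch of the exhaustion check above (SSDn_dynamic_local/SSDn_dynamic greater than or equal to a threshold), reading SSDn_dynamic_local here as the portion of the dynamic area already occupied by local data; the default threshold value is an assumption, not taken from the patent.

    def local_dynamic_exhausted(ssd_dynamic_local_used: int,
                                ssd_dynamic_total: int,
                                threshold: float = 0.9) -> bool:
        # The HDD's own dynamic region counts as "no remaining storage space"
        # once local data occupies at least `threshold` of the dynamic area.
        return ssd_dynamic_local_used / ssd_dynamic_total >= threshold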
In one embodiment of the present invention, to increase the speed of storing data, when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, the data of the HDD is cached in the dynamic SSDs of one or more of the other HDDs.
Specifically, when the SSDn_dynamic area of a single SSD is insufficient to store the data of the HDD, the data of the HDD may be cached across multiple SSDn_dynamic areas.
In this way, the data of the HDD is stored not only in its corresponding fixed SSD but also in the dynamic SSDs of other HDDs.
From the perspective of storage nodes, the data of an HDD in one storage node is cached first in the corresponding fixed SSD, then in the corresponding dynamic SSD, and also in the dynamic SSDs of other storage nodes. The data of the HDD is thus cached in different SSDs, and for one HDD the SSD storage space can be adjusted as data is cached.
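Putting steps S101 to S103 together, the following is a minimal sketch of the placement order; the region objects with a free_space() interface are hypothetical stand-ins, and the threshold handling follows the "remaining space below a threshold" rule described above.

    def place_in_cache(data_size, fixed, local_dynamic, remote_dynamics,
                       remaining_threshold=0):
        """Choose SSD cache regions for one HDD's data.

        Order: the HDD's fixed region first, then its own dynamic region,
        then the dynamic regions of other HDDs, possibly split across
        several of them. Returns a list of (region, bytes) placements,
        or None if every region is exhausted and eviction is needed.
        """
        # a region has "remaining storage space" only if, after the write,
        # at least `remaining_threshold` bytes would still be free
        if fixed.free_space() - data_size >= remaining_threshold:
            return [(fixed, data_size)]
        if local_dynamic.free_space() - data_size >= remaining_threshold:
            return [(local_dynamic, data_size)]

        placements, remaining = [], data_size
        for remote in remote_dynamics:
            room = remote.free_space() - remaining_threshold
            if room <= 0:
                continue
            chunk = min(room, remaining)
            placements.append((remote, chunk))
            remaining -= chunk
            if remaining == 0:
                return placements
        return None  # no room anywhere; long-term non-accessed data must be evicted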
Referring to flow 400 of fig. 4, fig. 4 is a schematic flow chart of setting an SSD based on IO threads according to an embodiment of the invention. The flow specifically comprises the following steps:
s401, setting an initial SSD for the HDD according to the number of IO threads, the IO size and the IO queue depth.
The IO size (IO Size) is the unit size of a read/write request initiated by an application program and sent to the storage system through the disk subsystem of the operating system. The IO queue depth is the amount of data written at one time.
In the embodiment of the invention, an initial SSD can be set for the HDD according to the number of IO threads, the IO size, and the IO queue depth. Specifically, the initial SSD size is: number of IO threads × IO size × IO queue depth.
S402, adjusting the storage space of the initial SSD according to a setting coefficient, and using the adjusted SSD storage space as a fixed SSD or a dynamic SSD.
In an embodiment of the present invention, the storage space of the initial SSD is adjusted by a setting coefficient. The aim is to avoid the overhead of subsequent frequent allocation requests: storage space is over-allocated when it is requested, but over-allocating too much would waste space.
As one example, the adjusted SSD storage space is: number of IO threads × IO size × IO queue depth × β, where β is a setting coefficient and 1 ≤ β ≤ 2. The setting coefficient may be chosen according to the business scenario.
The adjusted SSD storage space is then used as a fixed SSD or a dynamic SSD. As one example, if the adjusted SSD stores data of the HDD corresponding to that SSD, it is used as a fixed SSD or a dynamic SSD; if the adjusted SSD stores data of an HDD not corresponding to that SSD, it is used as a dynamic SSD.
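The sizing rule can be written directly; the sketch below uses assumed parameter names and a default β that is only an example within the stated 1 ≤ β ≤ 2 range.

    def initial_ssd_size(io_threads: int, io_size: int, io_queue_depth: int) -> int:
        # initial SSD size = number of IO threads x IO size x IO queue depth
        return io_threads * io_size * io_queue_depth

    def adjusted_ssd_size(io_threads: int, io_size: int, io_queue_depth: int,
                          beta: float = 1.5) -> int:
        # over-allocate by the setting coefficient beta (1 <= beta <= 2) to avoid
        # frequent re-allocation without wasting too much space
        assert 1.0 <= beta <= 2.0
        return int(initial_ssd_size(io_threads, io_size, io_queue_depth) * beta)

    # Example: 8 IO threads, 128 KiB IOs, queue depth 32, beta = 1.5
    # adjusted_ssd_size(8, 128 * 1024, 32)  # 48 MiB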
In addition, unused SSD cache space can be reclaimed according to how the SSD cache space is being used.
After the adjusted SSD storage space has been used as a fixed SSD or a dynamic SSD, if it has not been used after a preset processing period, the adjusted SSD storage space is updated to a part of the originally adjusted SSD storage space.
Specifically, the processing period may be set according to the application scenario. If the adjusted SSD storage space has not been used after the preset processing period, part of it can be reclaimed; as one example, half of the adjusted SSD storage space is reclaimed. If the adjusted SSD storage space received IO requests within the preset processing period, no reclamation is performed.
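A possible reading of this shrink-if-idle rule as a small sketch; the function and its arguments are assumptions, and the halving follows the example above.

    def shrink_if_idle(allocated_bytes: int, io_requests_in_period: int) -> int:
        """Return the new allocation size after one processing period."""
        if io_requests_in_period > 0:
            return allocated_bytes      # space was used in the period; keep it
        return allocated_bytes // 2     # unused; reclaim half, per the example above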
In one embodiment of the present invention, the SSD cache space for storing HDD data has two main parts: the fixed SSD and the dynamic SSD. To increase SSD utilization, SSD space can be reclaimed and reassigned in the following way.
Specifically, the amount of data deleted from the SSD in each SSD reclamation period is determined according to the service load and the free SSD storage space, so as to reclaim SSD storage space.
SSD_Recycle = SSD_Data × 1/(T_recycle + 1)^n, where SSD_Recycle is the amount of data deleted from the SSD in each reclamation period; SSD_Data is the amount of data cached in the SSD; T_recycle is the reclamation period; and n is a reclamation adjustment parameter, n ≥ 1, used to control the amount of data reclaimed each time.
The data counted in SSD_Data is ordered by access frequency, access time, and similar attributes; data with high access frequency and long access time is placed toward the rear. That is, data with low access frequency and a short access period is deleted first.
The parameter n controls the amount of data reclaimed each time; the larger n is, the more data is actively reclaimed. SSD storage space can thus be freed according to the service load.
The service scenario may be divided into the following four scenarios according to the service load and the free SSD storage space:
Scenario 1: the service load pressure is low and the free SSD storage space is not zero.
Scenario 2: the service load pressure is high and the free SSD storage space is not zero.
Scenario 3: the service load pressure is low and the free SSD storage space is zero.
Scenario 4: the service load pressure is high and the free SSD storage space is zero.
As one example, for scenarios 1 and 2, since there is still free SSD storage space, n is set to 1. For scenarios 3 and 4, since the free SSD storage space is zero and subsequent data caching would be affected, n is set to 2.
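Combining the reclamation formula with this scenario-based choice of n, a minimal sketch follows; the garbled expression above is read with (T_recycle + 1) raised to the power n, and the function names are mine, not the patent's.

    def reclaim_amount(ssd_data_bytes: int, t_recycle: float, n: int) -> int:
        # SSD_Recycle = SSD_Data * 1 / (T_recycle + 1) ** n, with n >= 1 as the
        # reclamation adjustment parameter controlling how much is deleted per period
        assert n >= 1
        return int(ssd_data_bytes / (t_recycle + 1) ** n)

    def choose_n(free_ssd_space_bytes: int) -> int:
        # Scenarios 1 and 2 (free space left): n = 1; scenarios 3 and 4 (no free space): n = 2
        return 1 if free_ssd_space_bytes > 0 else 2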
In one embodiment of the invention, SSD storage space is reclaimed by evicting data that has not been accessed for a long time. When IO data needs to be written to the SSD but the SSD has no spare space, long-term non-accessed data needs to be evicted.
As one example, when the fixed SSD of the HDD has no remaining storage space and the dynamic SSDs of the other HDDs have no remaining storage space, long-term non-accessed data in the fixed SSD of the HDD and/or long-term non-accessed data in the dynamic SSD is screened out and deleted.
Specifically, the long-term non-accessed data in the fixed SSD of the HDD is deleted first; if the remaining storage space of the fixed SSD of the HDD is still insufficient, the long-term non-accessed data in the dynamic SSD of the HDD is deleted. If the sum of the remaining storage space of the fixed SSD of the HDD and the dynamic SSD of the HDD is still insufficient, the long-term non-accessed data in the fixed SSDs and dynamic SSDs corresponding to other HDDs needs to be deleted.
For the other HDDs, the long-term non-accessed data in the fixed SSD is deleted first; if the remaining storage space of that fixed SSD is still insufficient, the long-term non-accessed data in the dynamic SSD of that HDD is deleted.
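The eviction order in the two paragraphs above can be sketched as follows; the region objects and their free_space() / evict_cold() methods are a hypothetical interface, not the patent's API.

    def evict_until_fits(needed_bytes: int, own_fixed, own_dynamic, other_hdd_caches) -> bool:
        """Evict long-term non-accessed data until `needed_bytes` of SSD cache are free.

        Order: the HDD's own fixed region, then its own dynamic region, then the
        fixed and dynamic regions of other HDDs. Regions are assumed to expose
        free_space() and evict_cold(max_bytes) (a hypothetical interface).
        """
        ordered = [own_fixed, own_dynamic]
        for other in other_hdd_caches:
            ordered += [other.fixed, other.dynamic]

        for region in ordered:
            deficit = needed_bytes - sum(r.free_space() for r in ordered)
            if deficit <= 0:
                return True
            region.evict_cold(max_bytes=deficit)

        return sum(r.free_space() for r in ordered) >= needed_bytes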
In the above embodiment, data in the mechanical hard disk HDD is cached in the corresponding fixed solid state disk SSD; after the fixed SSD has no remaining storage space, the dynamic SSD of the HDD is allocated to cache the data in the HDD; and when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, the data of the HDD is cached in the dynamic SSDs of the other HDDs. The search for storage space starts from the SSD corresponding to the HDD and then makes full use of the SSD storage space of other HDDs. Therefore, the SSD used to store data can be selected flexibly, and resource utilization is improved.
Referring to fig. 5, fig. 5 is a schematic diagram of the main structure of a data storage apparatus according to an embodiment of the present invention. The data storage apparatus can implement the data storage method described above; as shown in structure 500 of fig. 5, the data storage apparatus specifically includes:
the first storage module 501 is configured to cache data in the mechanical hard disk HDD into a corresponding fixed solid state disk SSD;
a second storage module 502, configured to allocate a dynamic SSD of the HDD to cache data in the HDD after the fixed SSD has no remaining storage space;
and a third storage module 503, configured to cache data of the HDD in the dynamic SSDs of other HDDs when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of the other HDDs have remaining storage space.
In one embodiment of the invention, the fixed SSD, the dynamic SSD of the HDD, and the dynamic SSD of the other HDD belong to a storage pool;
the HDD, the fixed SSD of the HDD and the dynamic SSD of the HDD are all storage spaces corresponding to one storage node in the storage pool.
In one embodiment of the invention, the fixed SSD and the dynamic SSD are separate SSD disks or logically partitioned SSD spaces.
In one embodiment of the present invention, the third storage module 503 is specifically configured to cache the data of the HDD in the dynamic SSDs of one or more of the other HDDs if the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of the other HDDs have remaining storage space.
In one embodiment of the present invention, the third storage module 503 is further configured to set an initial SSD for the HDD according to the number of IO threads, the size of the IO and the depth of the IO queue;
and adjusting the storage space of the initial SSD according to the set coefficient, and taking the adjusted storage space of the SSD as the fixed SSD or the dynamic SSD.
In an embodiment of the present invention, the third storage module 503 is further configured to update the adjusted SSD storage space to a part of the originally adjusted SSD storage space if the adjusted SSD storage space has not been used after a preset processing period.
In one embodiment of the present invention, the third storage module 503 is further configured to determine, according to the service load and the free SSD storage space, the amount of data deleted from the SSD in each SSD reclamation period, so as to reclaim the storage space of the SSD.
In one embodiment of the present invention, the third storage module 503 is further configured to filter and delete long-term non-accessed data of the fixed SSD of the HDD and/or long-term non-accessed data in the dynamic SSD if the dynamic SSD of the other HDD has no remaining storage space.
Fig. 6 illustrates an exemplary system architecture 600 to which the data storage method or the data storage apparatus of embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only), may be installed on the terminal devices 601, 602, 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping websites browsed by users with the terminal devices 601, 602, 603. The background management server may analyze and otherwise process received data such as a product information query request and feed back the processing result (e.g., target push information or product information, by way of example only) to the terminal device.
It should be noted that, the method for storing data provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the device for storing data is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
A computer program product according to an embodiment of the present invention includes a computer program that, when executed by a processor, implements a method for storing data according to an embodiment of the present invention.
Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example described as: a processor comprising a first storage module, a second storage module, and a third storage module. The names of these modules do not, in some cases, limit the modules themselves; for example, the first storage module may also be described as "a module for caching data in the mechanical hard disk HDD into the corresponding fixed solid state disk SSD".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist alone without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to:
cache data in the mechanical hard disk HDD into a corresponding fixed solid state disk SSD;
after the fixed SSD has no remaining storage space, allocate the dynamic SSD of the HDD to cache the data in the HDD;
and when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, cache the data of the HDD into the dynamic SSDs of the other HDDs.
According to the technical scheme of the embodiments of the present invention, data in the mechanical hard disk HDD is cached in the corresponding fixed solid state disk SSD; after the fixed SSD has no remaining storage space, the dynamic SSD of the HDD is allocated to cache the data in the HDD; and when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, the data of the HDD is cached in the dynamic SSDs of the other HDDs. The search for storage space starts from the SSD corresponding to the HDD and then makes full use of the SSD storage space of other HDDs. Therefore, the SSD used to store data can be selected flexibly, and resource utilization is improved.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
In the technical scheme of the present invention, the acquisition, analysis, use, transmission, and storage of relevant user personal information all comply with the requirements of applicable laws and regulations; the information is used for lawful and reasonable purposes, is not shared, leaked, or sold beyond its lawful use, and is subject to the supervision and management of regulatory authorities. Necessary measures should be taken to prevent illegal access to such personal information data, to ensure that personnel with access to it comply with the relevant laws and regulations, and to protect users' personal information. Once such personal information data is no longer needed, the risk should be minimized by limiting or even prohibiting its collection and/or by deleting it.
Where applicable, user privacy is protected by de-identifying the data, including, in some related applications, by removing specific identifiers (e.g., date of birth), controlling the amount or specificity of stored data (e.g., collecting location data at the city level rather than at the specific-address level), controlling how the data is stored, and/or other methods.

Claims (12)

1. A method of data storage, comprising:
caching data in a mechanical hard disk HDD into a corresponding fixed solid state disk SSD;
after the fixed SSD has no remaining storage space, allocating the dynamic SSD of the HDD to cache the data in the HDD;
and when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, caching the data of the HDD into the dynamic SSDs of the other HDDs.
2. The method of data storage of claim 1, wherein the fixed SSD, the dynamic SSD of the HDD, and the dynamic SSD of the other HDD belong to a storage pool;
the HDD, the fixed SSD of the HDD and the dynamic SSD of the HDD are all storage spaces corresponding to one storage node in the storage pool.
3. The method of data storage of claim 1, wherein the fixed SSD and the dynamic SSD are independent SSD disks or logically partitioned SSD spaces.
4. The method of data storage of claim 1, wherein caching the data of the HDD into the dynamic SSDs of the other HDDs when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of the other HDDs have remaining storage space comprises:
if the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of other HDDs have remaining storage space, caching the data of the HDD into the dynamic SSDs of one or more of the other HDDs.
5. The method of data storage of claim 1, wherein the method further comprises:
setting an initial SSD for the HDD according to the number of IO threads, the IO size and the IO queue depth;
and adjusting the storage space of the initial SSD according to a setting coefficient, and using the adjusted SSD storage space as the fixed SSD or the dynamic SSD.
6. The method of data storage according to claim 5, wherein after using the adjusted SSD storage space as the fixed SSD or the dynamic SSD, the method comprises:
if the adjusted SSD storage space has not been used after a preset processing period, updating the adjusted SSD storage space to a part of the originally adjusted SSD storage space.
7. The method of data storage of claim 1, wherein the method further comprises:
determining, according to the service load and the free SSD storage space, the amount of data deleted from the SSD in each SSD reclamation period, so as to reclaim the SSD storage space.
8. The method of data storage of claim 1, wherein the method further comprises:
if the dynamic SSDs of the other HDDs have no remaining storage space, screening out and deleting long-term non-accessed data in the fixed SSD of the HDD and/or long-term non-accessed data in the dynamic SSD.
9. An apparatus for data storage, comprising:
a first storage module, configured to cache data in the mechanical hard disk HDD into a corresponding fixed solid state disk SSD;
a second storage module, configured to allocate the dynamic SSD of the HDD to cache the data in the HDD after the fixed SSD has no remaining storage space;
and a third storage module, configured to cache the data of the HDD into the dynamic SSDs of other HDDs when the dynamic SSD of the HDD has no remaining storage space and the dynamic SSDs of the other HDDs have remaining storage space.
10. An electronic device for data storage, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-8.
CN202311184114.1A 2023-09-13 2023-09-13 Method, apparatus, device and computer readable medium for data storage Pending CN117331493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311184114.1A CN117331493A (en) 2023-09-13 2023-09-13 Method, apparatus, device and computer readable medium for data storage


Publications (1)

Publication Number Publication Date
CN117331493A true CN117331493A (en) 2024-01-02

Family

ID=89282050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311184114.1A Pending CN117331493A (en) 2023-09-13 2023-09-13 Method, apparatus, device and computer readable medium for data storage

Country Status (1)

Country Link
CN (1) CN117331493A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination