WO2017143972A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Publication number
WO2017143972A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
page data
page
type
block
Prior art date
Application number
PCT/CN2017/074290
Other languages
English (en)
Chinese (zh)
Inventor
杨洪章
罗圣美
王志坤
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017143972A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0626 Reducing size or complexity of storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device

Definitions

  • The embodiments of the present invention relate to the field of communications, and in particular to a data processing method and apparatus.
  • Solid State Drives (SSDs) are a new generation of hard drives that apply advanced semiconductor technology to high-capacity storage. Because an SSD has no mechanical parts such as a magnetic head, no head movement is needed to locate data, so a solid state hard disk starts up faster; and because there is no seek time, its read and write speeds are also better than those of a mechanical hard disk.
  • SSDs have further advantages over traditional disks, including high reliability, high shock resistance, low power consumption and low noise. For these reasons, SSDs are gaining popularity in both personal and enterprise applications.
  • SSDs also have some shortcomings, such as erase-before-write, a limited number of erase cycles, and the need for garbage collection.
  • Erase-before-write means that a solid state hard disk supports three operations (read, write and erase) and that a location must be erased before it is written; that is, data cannot be overwritten in place. For example, to modify data that has already been written, the old data is first marked as invalid and the new data is then written to free space.
  • Erase-before-write greatly reduces the write performance of the solid state drive.
  • A limited number of erase cycles means that a solid state hard disk can generally be erased only 100,000 to one million times.
  • For Garbage Collection (GC), the method commonly used in the prior art is the classic Greedy algorithm: the data block containing the most invalid pages is selected for garbage collection, so a data block in which all pages are invalid is reclaimed first. In other words, when the free space in the solid state hard disk is insufficient, the valid pages in the data recovery block are relocated and the invalid pages in that block are erased, which implements garbage collection for the solid state hard disk.
  • However, the SSD does not subdivide the valid page data during this process. After the relocation, the cold dirty page data among the valid page data will still be replaced from the cache, so that page data has to be relocated again: the data that has just been moved to the new location is marked as invalid, and the new data from the cache is written to an updated location in the SSD.
  • A large number of such unnecessary secondary relocations of valid pages during the use of the solid state hard disk greatly increases its overhead and reduces the efficiency of data processing in the solid state hard disk.
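  • As a point of reference, the Greedy selection described above can be sketched in a few lines. This is a minimal illustration only; the `Block` record and its field names are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block counters kept by the drive."""
    block_id: int
    invalid_pages: int  # pages already marked invalid
    valid_pages: int    # pages that must be relocated before the erase


def pick_greedy_victim(blocks):
    """Classic Greedy GC: choose the block holding the most invalid pages."""
    return max(blocks, key=lambda blk: blk.invalid_pages)


blocks = [Block(0, 10, 54), Block(1, 40, 24), Block(2, 64, 0)]
victim = pick_greedy_victim(blocks)
print(victim.block_id)  # block 2: every page is invalid, so it is reclaimed first
```

  • Note that this selection looks only at the state of the SSD; it knows nothing about which valid pages are about to be overwritten from the cache, which is the gap the embodiments address.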
  • The embodiments of the invention provide a data processing method and apparatus, so as to at least solve the problem in the related art that data processing efficiency is low because of secondary data relocation.
  • A data processing method is provided, including: acquiring a reclaim request, where the reclaim request is used to request data recovery of page data in a solid state hard disk; obtaining, in response to the reclaim request, the first type of page data from the valid page data in a cache, where the first type of page data indicates page data that is about to be replaced from the cache to the solid state hard disk; and relocating the first type of page data from the cache to a predetermined relocation location in the solid state hard disk, where the predetermined relocation location is the storage location of the valid page data after the data recovery is performed.
  • Obtaining the first type of page data from the cached valid page data in response to the reclaim request includes: obtaining the access frequency and the modification identifier of the valid page data in the cache; obtaining the page type of the valid page data according to the access frequency and the modification identifier, where the page types of the valid page data include the first type of page data and the second type of page data, and the second type of page data indicates page data that will not be replaced from the cache to the solid state hard disk; and separating the valid page data according to its page type to obtain the first type of page data.
  • The second type of page data includes first page data and second page data. Obtaining the page type of the valid page data according to the access frequency and the modification identifier includes: treating page data whose modification identifier indicates unmodified as the first page data; treating page data whose modification identifier indicates modified and whose access frequency is greater than or equal to a first predetermined threshold as the second page data; and treating page data whose modification identifier indicates modified and whose access frequency is less than the first predetermined threshold as the first type of page data.
  • Before relocating the first type of page data from the cache to the predetermined relocation location in the solid state hard disk, the method further includes: determining a data recovery block of the solid state hard disk according to at least the first type of page data, where the page types of the page data in each data block of the solid state hard disk include unwritten page data, invalid page data and the valid page data, and the data blocks include the data recovery block; and performing the data recovery on the data recovery block.
  • Determining the data recovery block of the solid state hard disk according to at least the first type of page data includes: acquiring the data recovery rate of each data block according to the first type of page data in the cache and the page data in each data block of the solid state hard disk; and determining the data recovery block according to the data recovery rate.
  • Acquiring the data recovery rate of each data block according to the first type of page data in the cache and the page data in each data block of the solid state hard disk includes repeating the following steps until every data block in the solid state hard disk has been traversed: obtaining the block identifier of the current data block; acquiring the first type of page data and the invalid page data identified by the block identifier; and acquiring the data recovery rate r of the current data block as a function of a, b, P and B, where r represents the data recovery rate of the current data block, a represents the number of pages of invalid page data of the current data block in the solid state hard disk, b represents the number of pages of the first type of page data in the cache, P represents the page size, and B represents the block size.
  • Performing the data recovery on the data recovery block includes: relocating the valid page data in the data recovery block to the predetermined relocation location and marking that valid page data as invalid page data; and erasing the invalid page data in the data recovery block.
  • The method further includes: determining the predetermined relocation location according to the size of the unwritten page data in the data blocks of the solid state hard disk other than the data recovery block.
  • A data processing apparatus is also provided, including: a first obtaining unit, configured to acquire a reclaim request, where the reclaim request is used to request data recovery of page data in a solid state hard disk; a second obtaining unit, configured to obtain, in response to the reclaim request, the first type of page data from the valid page data in the cache, where the first type of page data indicates page data that is about to be replaced from the cache to the solid state hard disk; and a relocation unit, configured to relocate the first type of page data from the cache to a predetermined relocation location in the solid state hard disk, where the predetermined relocation location is the storage location of the valid page data after the data recovery is performed.
  • The second obtaining unit includes: a first acquiring module, configured to acquire the access frequency and the modification identifier of the valid page data in the cache; a second acquiring module, configured to obtain the page type of the valid page data according to the access frequency and the modification identifier, where the page types of the valid page data include the first type of page data and the second type of page data, and the second type of page data indicates page data that will not be replaced from the cache to the solid state hard disk; and a separating module, configured to separate the valid page data according to its page type to obtain the first type of page data.
  • The second type of page data includes first page data and second page data. The second acquiring module obtains the page type of the valid page data by: treating page data whose modification identifier indicates unmodified as the first page data; treating page data whose modification identifier indicates modified and whose access frequency is greater than or equal to a first predetermined threshold as the second page data; and treating page data whose modification identifier indicates modified and whose access frequency is less than the first predetermined threshold as the first type of page data.
  • The apparatus further includes: a first determining unit, configured to determine, before the first type of page data is relocated from the cache to the predetermined relocation location in the solid state hard disk, a data recovery block of the solid state hard disk according to at least the first type of page data, where the page types of the page data in each data block of the solid state hard disk include unwritten page data, invalid page data and the valid page data, and the data blocks include the data recovery block; and a recovery unit, configured to perform the data recovery on the data recovery block.
  • The first determining unit includes: a third acquiring module, configured to obtain the data recovery rate of each data block according to the first type of page data in the cache and the page data in each data block of the solid state hard disk; and a determining module, configured to determine the data recovery block according to the data recovery rate.
  • The third acquiring module includes a processing submodule, configured to repeat the following steps until every data block in the solid state hard disk has been traversed: acquiring the block identifier of the current data block; acquiring the first type of page data and the invalid page data identified by the block identifier; and acquiring the data recovery rate r of the current data block as a function of a, b, P and B, where r represents the data recovery rate of the current data block, a represents the number of pages of invalid page data of the current data block in the solid state hard disk, b represents the number of pages of the first type of page data in the cache, P represents the page size, and B represents the block size.
  • The recovery unit includes: a relocation module, configured to relocate the valid page data in the data recovery block to the predetermined relocation location and mark that valid page data as invalid page data; and an erasing module, configured to erase the invalid page data in the data recovery block.
  • The apparatus further includes: a second determining unit, configured to determine, before the first type of page data is relocated from the cache to the predetermined relocation location in the solid state hard disk, the predetermined relocation location according to the size of the unwritten page data in the data blocks of the solid state hard disk other than the data recovery block.
  • A computer storage medium is further provided; the computer storage medium may store execution instructions for executing the data processing method in the foregoing embodiment.
  • With the embodiments of the invention, when data recovery is performed on page data in the solid state hard disk, the first type of page data that is about to be replaced from the cache to the SSD is relocated directly to the predetermined relocation location in the SSD, instead of first being replaced to the SSD and then relocated a second time. This overcomes the problem of low data processing efficiency caused by secondary relocation of data in the related art and improves data processing efficiency. It also reduces the number of data relocations and the extra overhead the SSD incurs during data recovery and cache replacement, which improves the performance of the SSD.
  • FIG. 1 is a flow chart of an optional data processing method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an optional data block according to an embodiment of the present invention.
  • FIG. 3 is a flow chart of another optional data processing method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an optional data processing apparatus according to an embodiment of the present invention.
  • In this embodiment, a data processing method is provided. FIG. 1 is a flow chart of an optional data processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: acquiring a reclaim request, where the reclaim request is used to request data recovery of page data in the solid state hard disk;
  • Step S104: obtaining, in response to the reclaim request, the first type of page data from the valid page data in the cache, where the first type of page data indicates page data that is about to be replaced from the cache to the solid state hard disk;
  • Step S106: relocating the first type of page data from the cache to a predetermined relocation location in the SSD, where the predetermined relocation location is the storage location of the valid page data after data recovery is performed.
  • The foregoing data processing method may be, but is not limited to being, applied to the garbage data recovery process of a solid state hard disk. That is, in this embodiment, when a request for data recovery of page data in the solid state hard disk is obtained, the first type of page data (the valid page data in the cache that is about to be replaced from the cache to the SSD) is relocated directly to the predetermined relocation location in the SSD, without first being replaced to the SSD and then relocated again. This overcomes the problem of low data processing efficiency caused by secondary relocation of data in the related art, improving data processing efficiency and greatly reducing the overhead caused by data relocation in the solid state hard disk.
  • The SSD includes a Flash Translation Layer (FTL). The flash translation layer maps logical addresses to physical addresses through a mapping table, records page types, and detects free space, triggering data recovery when free space is insufficient. For example, when the number of erased blocks in the SSD is less than 20% of the total number of data blocks, a reclaim request is triggered to request data recovery of page data in the SSD.
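  • The trigger condition in this example can be written as a one-line check. The sketch below is illustrative only: the function name and the `threshold_percent` parameter are hypothetical, and the 20% default simply mirrors the example figure given above.

```python
def needs_reclaim(erased_blocks: int, total_blocks: int, threshold_percent: int = 20) -> bool:
    """Return True when erased (free) blocks fall below the threshold share of all blocks.

    Integer arithmetic is used so the comparison is exact at the boundary.
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return erased_blocks * 100 < threshold_percent * total_blocks


print(needs_reclaim(19, 100))  # True: fewer than 20% of blocks are erased
print(needs_reclaim(20, 100))  # False: exactly 20% is not "less than 20%"
```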
  • The page types of the valid page data include the first type of page data and the second type of page data, where the first type of page data is page data that is about to be replaced from the cache to the SSD, and the second type of page data is page data that will not be replaced from the cache to the SSD.
  • Obtaining the first type of page data from the cached valid page data in response to the reclaim request includes: obtaining the access frequency and the modification identifier of the valid page data in the cache; obtaining the page type of the valid page data according to the access frequency and the modification identifier; and separating the valid page data according to its page type to obtain the first type of page data.
  • A Cache Layer, which temporarily caches valid page data, queues the pages in the cache according to the Least Recently Used (LRU) algorithm, forming an LRU queue.
  • The LRU queue may be, but is not limited to being, divided into hot (HOT) pages and cold (COOL) pages according to a predetermined threshold. For example, if the predetermined threshold is 10, the pages in the last 10% of the LRU queue (the tail) are marked as cold (COOL) pages and the pages in the first 90% are marked as hot (HOT) pages.
  • The LRU queue may be, but is not limited to being, ordered according to access frequency.
  • Page data that has been modified in the cache layer is marked as a dirty (DIRTY) page, and unmodified page data is marked as a clean (CLEAN) page.
  • A dirty page among the cold pages is called a cold dirty (COOL DIRTY) page (identified by CD), and a dirty page among the hot pages is called a hot dirty (HOT DIRTY) page (identified by HD).
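  • The hot/cold and clean/dirty marking described above might be sketched as follows. This is a simplified illustration under stated assumptions: the `CachedPage` record, the `classify` function and the label strings are hypothetical names, and the queue is assumed to be ordered from most recently used (head) to least recently used (tail).

```python
from dataclasses import dataclass


@dataclass
class CachedPage:
    """Hypothetical cache entry: a page id plus its modification flag."""
    page_id: int
    dirty: bool


def classify(lru_queue, cold_percent=10):
    """Label each cached page CLEAN, HD (hot dirty) or CD (cold dirty).

    The last `cold_percent` percent of the queue is treated as cold;
    clean pages keep the CLEAN label whether they are hot or cold.
    """
    total = len(lru_queue)
    first_cold = total - (total * cold_percent // 100)
    labels = {}
    for index, page in enumerate(lru_queue):
        if not page.dirty:
            labels[page.page_id] = "CLEAN"
        elif index >= first_cold:
            labels[page.page_id] = "CD"
        else:
            labels[page.page_id] = "HD"
    return labels


# Ten pages, head first; odd-numbered pages have been modified in the cache.
queue = [CachedPage(i, dirty=(i % 2 == 1)) for i in range(10)]
labels = classify(queue)
first_type = [pid for pid, label in labels.items() if label == "CD"]
print(first_type)  # only page 9 sits in the cold tail and is dirty
```

  • The pages labelled CD correspond to the first type of page data; CLEAN and HD pages together correspond to the second type.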
  • There are two copies of each page held in the cache: one copy in the SSD and one copy in the cache. For a clean page, the two copies are identical; for a dirty page, the copy in the cache is the latest page data and the copy in the SSD is the old page data. That is, the cold dirty page in the solid state hard disk and the cold dirty page in the cache are different copies of the same page: the cache stores the new data of the cold dirty page, and the solid state hard disk stores its old data.
  • The cold dirty page storing the old data in the solid state hard disk may be marked as invalid page data, and the new data of the cold dirty page that is about to be replaced from the cache is written directly to the updated location in the SSD (such as the predetermined relocation location after data recovery). This avoids first moving the cold dirty page to the solid state hard disk through cache replacement and then relocating it a second time during data recovery. It thereby overcomes the problem of low data processing efficiency caused by secondary relocation of data in the related art, improving data processing efficiency and greatly reducing the overhead caused by relocation in the solid state drive.
  • Each data block in the solid state hard disk may include, but is not limited to, the following five types of page data: unwritten page data (unwritten pages), invalid page data (invalid pages), and valid page data (valid pages), where the valid page data includes clean page data (clean pages), hot dirty page data (hot dirty pages) and cold dirty page data (cold dirty pages).
  • Unwritten page data is free space in the data block that has been erased or never allocated, and data can be written to it directly.
  • Clean page data means that the page has been written and its data has not been modified in the cache. Hot dirty page data means that the page data has been modified in the cache but, because it is frequently accessed, has not been replaced from the cache.
  • The cache is used to store valid page data, which may include, but is not limited to, the following three types of page data: clean page data, hot dirty page data, and cold dirty page data.
  • The second type of page data may include, but is not limited to, clean page data and hot dirty page data, and the first type of page data may include, but is not limited to, cold dirty page data.
  • Because the clean page data in the cache is consistent with the content stored in the solid state hard disk, in this embodiment clean page data may be, but is not limited to being, treated together with hot dirty page data as the second type of page data, which is not replaced from the cache to the SSD.
  • Before the first type of page data is relocated from the cache to the predetermined relocation location in the SSD, the method further includes:
  • Performing data recovery on the data recovery block may include: relocating the valid page data in the data recovery block to the predetermined relocation location and marking that valid page data as invalid page data; and erasing the invalid page data in the data recovery block.
  • Determining the data recovery block of the solid state hard disk according to at least the first type of page data may be, but is not limited to: acquiring the data recovery rate of each data block in the solid state hard disk according to at least the first type of page data, and determining the data recovery block (for example, determining its block identifier) by comparing the obtained data recovery rates.
  • The manner of determining the data recovery block by comparing the obtained data recovery rates includes at least one of the following:
  • Acquiring the data recovery rate of each data block in the solid state hard disk according to at least the first type of page data may include, but is not limited to, determining the data recovery rate according to the first type of page data and the invalid page data in each data block of the solid state hard disk.
  • Before the first type of page data is relocated from the cache to the predetermined relocation location in the SSD, the method further includes: determining the predetermined relocation location according to the size of the unwritten page data in the data blocks of the SSD other than the data recovery block, so that the valid page data in the data recovery block can be completely relocated to the area corresponding to the unwritten page data in the other data blocks.
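  • One way to picture this step is as a small planning routine that checks whether the unwritten pages in the other blocks can absorb everything that has to move. The sketch below is an assumption-laden illustration: `plan_relocation`, its parameters, and the "largest free area first" ordering are all hypothetical choices, not details given in the text.

```python
def plan_relocation(unwritten_by_block, recovery_block_id, pages_needed):
    """Pick target areas for the pages leaving the data recovery block.

    unwritten_by_block maps block id -> number of unwritten pages.
    Returns a list of (block_id, pages_to_write), or None when the other
    blocks do not hold enough unwritten pages for a complete relocation.
    """
    candidates = sorted(
        ((bid, free) for bid, free in unwritten_by_block.items()
         if bid != recovery_block_id and free > 0),
        key=lambda item: item[1],
        reverse=True,  # assumed policy: fill the largest free areas first
    )
    plan, remaining = [], pages_needed
    for bid, free in candidates:
        if remaining == 0:
            break
        take = min(free, remaining)
        plan.append((bid, take))
        remaining -= take
    return plan if remaining == 0 else None


unwritten = {0: 5, 1: 30, 2: 12, 3: 0}
print(plan_relocation(unwritten, recovery_block_id=0, pages_needed=36))
print(plan_relocation(unwritten, recovery_block_id=0, pages_needed=50))
```

  • In the first call, 36 pages fit into blocks 1 and 2; in the second, only 42 unwritten pages exist outside the recovery block, so no complete plan is possible and `None` is returned.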
  • The method includes:
  • The system triggers a reclaim request.
  • With the embodiments provided herein, the first type of page data (the valid page data that is about to be replaced from the cache to the solid state hard disk) is relocated directly to the predetermined relocation location in the SSD, without first being replaced to the SSD and then relocated again. This overcomes the problem of low data processing efficiency caused by secondary relocation of data in the related art and improves data processing efficiency.
  • It also reduces the number of data relocations and the additional overhead the SSD incurs during data recovery and cache replacement, and improves the performance of the SSD.
  • Obtaining the first type of page data from the cached valid page data in response to the reclaim request includes:
  • S1: Obtaining the access frequency and the modification identifier of the valid page data in the cache;
  • S2: Obtaining the page type of the valid page data according to the access frequency and the modification identifier, where the page types of the valid page data include the first type of page data and the second type of page data, and the second type of page data indicates page data that will not be replaced from the cache to the solid state hard disk;
  • S3: Separating the valid page data according to the page type of the valid page data to obtain the first type of page data.
  • The second type of page data includes first page data and second page data. Obtaining the page type of the valid page data according to the access frequency and the modification identifier includes: treating page data whose modification identifier indicates unmodified as the first page data; treating page data whose modification identifier indicates modified and whose access frequency is greater than or equal to the first predetermined threshold as the second page data; and treating page data whose modification identifier indicates modified and whose access frequency is less than the first predetermined threshold as the first type of page data.
  • The FTL in the SSD detects the free space of the SSD in real time. After it detects that the free space is insufficient and triggers the reclaim request, the system starts to traverse the LRU queue in the cache and marks the page type of the page data in the LRU queue.
  • The cache layer marks the pages in the last 10% of the LRU queue (the tail) as cold pages and the remaining pages as hot pages. The cold pages and the hot pages are then traversed separately: dirty pages among the cold pages are marked as cold dirty pages (CD), dirty pages among the hot pages are marked as hot dirty pages (HD), and the marked page types are reported to the FTL in the SSD.
  • The first type of page data (i.e., the cold dirty pages) can thus be obtained by separation, so that it can be relocated directly to the storage location of the valid page data after data recovery. This avoids the two relocations of cache replacement followed by data recovery and reduces overhead.
  • Before the first type of page data is relocated from the cache to the predetermined relocation location in the SSD, the method further includes:
  • S1: Determining, according to at least the first type of page data, a data recovery block of the solid state hard disk, where the page types of the page data in each data block of the solid state hard disk include unwritten page data, invalid page data and valid page data, and the data blocks include the data recovery block;
  • Determining the data recovery block of the solid state hard disk according to at least the first type of page data includes:
  • The manner of determining the data recovery block by comparing the obtained data recovery rates includes at least one of the following:
  • The mapping table of the cache layer not only stores the page type with which each page is marked, but also stores the preset location information of the page data after it is replaced to the SSD (such as the block identifier of the data block).
  • The FTL may count the number of cold dirty pages marked by the cache as the first type of page data and the number of invalid pages in each data block of the solid state hard disk. That is, it obtains the number of invalid pages and the number of cold dirty pages for each data block, and calculates the data recovery rate of each data block from these two numbers.
  • The data recovery rate is used to accurately locate the data recovery block in the solid state hard disk, which allows accurate and efficient data recovery and thereby safeguards data processing efficiency in the solid state hard disk.
  • obtaining data recovery rates of each data block according to the first type of page data in the cache and the page data in each data block in the solid state hard disk includes:
  • r represents the data recovery rate of the current data block
  • a represents the number of pages of the invalid page data of the current data block in the solid state hard disk
  • b represents the number of pages of the first type of page data in the cache
  • P represents the page size
  • B represents the block size.
  • the unit of the read/write operation in the solid state hard disk is a page, wherein the page size is usually 2 KB, and the access delay is generally 15 µs to 200 µs.
  • the unit of the erase operation is a block, where the block size is usually 128 KB, and erasing a block requires an overhead of about 2 ms.
  • The data recovery rate of each data block in the solid state hard disk is calculated in turn in the above manner, which ensures that the data recovery block is determined accurately and makes data recovery of the solid state hard disk accurate and efficient.
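The formula for the data recovery rate did not survive in this text. As a minimal sketch, assuming the rate is the share of a block's capacity taken up by reclaimable pages, r = (a + b) × P / B, and assuming the block with the highest rate is selected (the selection rule is likewise an assumption, since the list of comparison methods is missing), the calculation could look like this. The function names `recovery_rate` and `pick_recovery_block` are hypothetical.

```python
from typing import Dict, Tuple

PAGE_SIZE = 2 * 1024      # P: typical page size given in the text (2 KB)
BLOCK_SIZE = 128 * 1024   # B: typical block size given in the text (128 KB)


def recovery_rate(invalid_pages: int, cold_dirty_pages: int,
                  page_size: int = PAGE_SIZE,
                  block_size: int = BLOCK_SIZE) -> float:
    """Assumed form r = (a + b) * P / B; not confirmed by the source."""
    return (invalid_pages + cold_dirty_pages) * page_size / block_size


def pick_recovery_block(counts: Dict[int, Tuple[int, int]]) -> int:
    """Return the block id with the highest rate (assumed selection rule).

    counts maps block id -> (a, b), where a is the number of invalid pages
    in the block and b is the number of cold dirty pages that belong to it.
    """
    return max(counts, key=lambda blk: recovery_rate(*counts[blk]))
```

With the typical sizes above a block holds 64 pages, so a block with 16 invalid pages and 16 cold dirty pages would have a rate of 0.5 under this assumed formula.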
  • performing data recovery on the data recovery block includes:
  • The clean pages and hot dirty pages in the data recovery block can be copied directly to the predetermined relocation location, and the location information of that page data in the FTL is updated.
  • the corresponding valid page data in the data recovery block is marked as invalid page data (which can be represented by a stale page).
  • the cold dirty page in the data recovery block is also marked as invalid page data, and the latest data of the cold dirty page in the cache is copied to a predetermined relocation location. Then, the latest data of the cold dirty page in the cache is deleted, and the location information corresponding to the page data in the FTL is modified.
  • The page data in the data recovery block is erased, and the data recovery block is marked as “erased”, which completes data recovery of the solid state hard disk and releases the free space.
  • Performing data recovery of the solid state hard disk in this manner ensures that both the first type of page data and the second type of page data in the valid page data are relocated to the predetermined relocation location in a single pass, which avoids a secondary relocation of the first type of page data and reduces the overhead of the solid state drive.
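The recovery steps above can be sketched as follows. This is an illustrative model under stated assumptions, not the patented implementation: blocks are plain dictionaries keyed by logical page number, the FTL is reduced to a page-to-block map, and the name `recycle_block` and the string constants are hypothetical.

```python
from typing import Dict

CLEAN, HOT_DIRTY, COLD_DIRTY = "clean", "hot_dirty", "cold_dirty"


def recycle_block(victim: int, target: int,
                  blocks: Dict[int, Dict[int, bytes]],
                  page_type: Dict[int, str],
                  cache: Dict[int, bytes],
                  ftl: Dict[int, int]) -> None:
    """Relocate every valid page of `victim` exactly once, then erase it.

    blocks maps block id -> {logical page -> data stored on the SSD};
    page_type holds the cache's mark for each cached logical page;
    cache holds the latest copy of each cached page;
    ftl maps logical page -> block id.
    """
    for lpn, old_data in list(blocks[victim].items()):
        if page_type.get(lpn) == COLD_DIRTY:
            # The SSD copy is stale: write the cached copy instead and drop
            # it from the cache, so no later cache replacement is needed.
            blocks[target][lpn] = cache.pop(lpn)
            del page_type[lpn]
        else:
            # Clean and hot dirty pages: copy the SSD copy as it is.
            blocks[target][lpn] = old_data
        ftl[lpn] = target          # update the logical-to-physical mapping
    blocks[victim].clear()         # erase the block; it is now free space
```

The point of the design is visible in the first branch: the cold dirty page is written to the SSD once, at its final location, rather than once by cache replacement and again by data recovery.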
  • Before the first type of page data is relocated from the cache to the predetermined relocation location in the SSD, the method further includes:
  • For example, the size of the valid page data in the data recovery block is counted, the real-time updated FTL is searched for unwritten data pages in other data blocks of the SSD that can hold valid page data of that size, and the data block found is used as the predetermined relocation location of the valid page data in the data recovery block.
  • The predetermined relocation location is determined according to the size of the unwritten page data in data blocks of the solid state hard disk other than the data recovery block, which ensures that the valid page data in the data recovery block can be relocated completely.
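A minimal sketch of this lookup, assuming the FTL can report the number of unwritten pages per block and assuming a first-fit choice (the text does not say how to choose among several suitable blocks). The function name `find_relocation_block` is hypothetical.

```python
from typing import Dict, Optional


def find_relocation_block(valid_pages_needed: int,
                          unwritten_pages: Dict[int, int],
                          recovery_block: int) -> Optional[int]:
    """Return a block other than `recovery_block` with enough unwritten pages.

    unwritten_pages maps block id -> number of unwritten pages in that block.
    Returns None when no single block can hold all of the valid pages.
    """
    for block_id, free in unwritten_pages.items():
        if block_id != recovery_block and free >= valid_pages_needed:
            return block_id
    return None
```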
  • A data processing apparatus is also provided. It is used to implement the above embodiments and preferred embodiments, and what has already been described is not repeated.
  • As used below, the term “module” may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 4 is a schematic diagram of an optional data processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:
  • the first obtaining unit 402 is configured to acquire a recycling request, where the recycling request is used to request data recovery of page data in the solid state hard disk;
  • the second obtaining unit 404 is configured to obtain, according to the recycling request, the first type of page data from the cached valid page data, wherein the first type of page data indicates page data that is to be replaced from the cache and stored to the solid state hard disk;
  • the relocation unit 406 is configured to relocate the first type of page data from the cache to a predetermined relocation location in the SSD, wherein the predetermined relocation location is a storage location of the valid page data after performing data recovery.
  • The foregoing data processing apparatus may be applied to, but is not limited to, the garbage data recovery process of a solid state hard disk. That is, when garbage page data in the solid state hard disk is recovered, once the recycling request for page data in the solid state hard disk has been acquired, the first type of page data, which is the valid page data in the cache that is about to be replaced from the cache to the SSD, is relocated directly to the predetermined relocation location in the SSD, instead of first being replaced to the SSD and then relocated a second time.
  • This overcomes the low data processing efficiency caused by secondary relocation of data in the related art, improves data processing efficiency, and greatly reduces the overhead of data relocation in the solid state drive.
  • The SSD includes a Flash Translation Layer (FTL), which is used to map logical addresses to physical addresses through a mapping table, to mark page types, and to detect free space and trigger data recovery when free space is insufficient. For example, when the number of erased blocks in the SSD falls below 20% of the total number of data blocks, a recycling request for data recovery of the page data in the SSD is triggered.
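The 20% trigger condition in this example can be sketched as a one-line check. The function name `needs_recovery` and its parameters are hypothetical, and integer arithmetic is used only to avoid floating-point edge cases at the boundary.

```python
def needs_recovery(erased_blocks: int, total_blocks: int,
                   threshold_percent: int = 20) -> bool:
    """True when erased blocks make up less than threshold_percent of all blocks."""
    return erased_blocks * 100 < threshold_percent * total_blocks
```

For an SSD with 100 data blocks, the check fires at 19 erased blocks but not at 20, because the condition in the text is "less than 20%".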
  • The page type of the valid page data includes the first type of page data and the second type of page data, where the first type of page data is page data to be replaced from the cache to the solid state hard disk, and the second type of page data is page data that is not replaced from the cache to the SSD.
  • The second obtaining unit 404 includes: (1) a first obtaining module, configured to acquire the access frequency and modification identifier of the valid page data in the cache; (2) a second obtaining module, configured to obtain the page type of the valid page data according to the access frequency and the modification identifier, where the page type of the valid page data includes the first type of page data and the second type of page data, and the second type of page data indicates page data that is not replaced from the cache and stored to the solid state hard disk; and (3) a separation module, configured to separate the valid page data according to its page type to obtain the first type of page data.
  • A cache layer (Cache Layer) that temporarily caches valid page data queues the pages in the cache according to the Least Recently Used (LRU) algorithm, forming the LRU queue.
  • The LRU queue may be, but is not limited to being, divided into hot (HOT) pages and cold (COOL) pages according to a predetermined threshold. For example, if the predetermined threshold is 10, the 10% of pages at the tail of the LRU queue are marked as cold (COOL) pages, and the 90% of pages ahead of them are marked as hot (HOT) pages.
  • the LRU queues may be, but are not limited to, ordered according to the access frequency.
  • the page data modified in the cache layer is marked as a dirty (DIRTY) page.
  • Unmodified page data is marked as a CLEAN page.
  • A dirty page among the cold pages is called a cold dirty (COOL DIRTY) page (identified by CD), and a dirty page among the hot pages is called a hot dirty (HOT DIRTY) page (identified by HD).
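The marking scheme described above (cold tail of the LRU queue, then dirty or clean) can be sketched as follows. The sketch assumes the queue is ordered from most to least recently used, so the tail is the end of the list; the function name `mark_pages` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple


def mark_pages(lru: List[Tuple[int, bool]],
               cold_percent: int = 10) -> Dict[int, str]:
    """Mark each cached page as CLEAN, HD (hot dirty) or CD (cold dirty).

    lru is ordered from most to least recently used; each item is
    (page id, is_dirty). The last cold_percent of the queue is cold.
    """
    cold_count = len(lru) * cold_percent // 100
    first_cold = len(lru) - cold_count
    marks: Dict[int, str] = {}
    for pos, (page, dirty) in enumerate(lru):
        if not dirty:
            marks[page] = "CLEAN"
        elif pos >= first_cold:
            marks[page] = "CD"
        else:
            marks[page] = "HD"
    return marks
```

Only the pages marked CD form the first type of page data; CLEAN and HD pages stay in the cache.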
  • There are two copies of each piece of page data in the cache: one copy in the SSD and one copy in the cache. If it is a clean page, the two copies are identical; if it is a dirty page, the copy in the cache is the latest page data, and the copy in the SSD is the old page data. That is, the cold dirty page in the solid state hard disk and the cold dirty page in the cache are different copies of the same page: the cache stores the new data of the cold dirty page, and the solid state hard disk stores its old data.
  • The cold dirty page storing the old data in the solid state hard disk may be marked as invalid page data, and the new data of the cold dirty page in the cache is written directly to its updated location in the SSD (for example, the predetermined relocation location after data recovery is performed). This avoids first moving the cold dirty page to the solid state hard disk through cache replacement and then relocating it a second time during data recovery, which overcomes the low data processing efficiency caused by secondary relocation of data in the related art. Besides improving data processing efficiency, it also greatly reduces the overhead of data relocation in the solid state drive.
  • Each data block in the solid state hard disk may include, but is not limited to, the following five types of page data: unwritten page data (which can be represented by an unwritten page), invalid page data (which can be represented by an invalid page), and valid page data (which can be represented by a valid page), where the valid page data includes clean page data (which can be represented by a clean page), hot dirty page data (which can be represented by a hot dirty page), and cold dirty page data (which can be represented by a cold dirty page).
  • Unwritten page data is free space in the data block that has been erased or never allocated and can be written directly.
  • Clean page data means that data has been written to the page and the page data has not been modified in the cache; hot dirty page data means that the page data has been modified in the cache but has not been replaced from the cache because it is accessed frequently.
  • The cache is used to store valid page data, which can include, but is not limited to, the following three types of page data: clean page data, hot dirty page data, and cold dirty page data.
  • the page data of the second type may include, but is not limited to, clean page data and hot dirty page data
  • the first type of page data may include, but is not limited to, cold dirty page data.
  • Because the clean page data in the cache is identical to the content stored in the solid state hard disk, in this embodiment the clean page data can be, but is not limited to being, treated together with the hot dirty page data as the second type of page data, which is not replaced from the cache and stored to the SSD.
  • The foregoing apparatus further includes: (1) a first determining unit, configured to determine, before the first type of page data is relocated from the cache to the predetermined relocation location in the solid state hard disk, a data recovery block of the solid state hard disk according to at least the first type of page data, where the page type of the page data in each data block of the solid state hard disk includes unwritten page data, invalid page data, and valid page data, and the data blocks include the data recovery block; and (2) a recycling unit, configured to perform data recovery on the data recovery block.
  • The recycling unit performs data recovery on the data recovery block by relocating the valid page data in the data recovery block to the predetermined relocation location, marking that valid page data as invalid page data, and erasing the invalid page data in the data recovery block.
  • The first determining unit is configured to determine the data recovery block of the solid state hard disk according to at least the first type of page data by: obtaining the data recovery rate of each data block in the solid state hard disk according to at least the first type of page data, and determining the data recovery block (for example, determining the block identifier of the data recovery block) by comparing the data recovery rates.
  • the manner of determining the data recovery block by comparing the obtained data recovery rates includes at least one of the following:
  • Obtaining the data recovery rate of each data block in the solid state hard disk according to at least the first type of page data may include, but is not limited to, determining the data recovery rate according to the first type of page data and the invalid page data in each data block of the solid state hard disk.
  • Before the first type of page data is relocated from the cache to the predetermined relocation location in the SSD, the predetermined relocation location is further determined according to the size of the unwritten page data in data blocks of the SSD other than the data recovery block, so that the valid page data can be relocated completely to the area corresponding to the unwritten page data in those other data blocks.
  • the data processing apparatus can implement data recovery of the solid state hard disk by the following steps:
  • the system triggers a recycling request.
  • The first type of page data, that is, the valid page data stored in the cache that is to be replaced from the cache to the solid state hard disk, is relocated directly to the predetermined relocation location in the SSD instead of first being replaced to the SSD and then relocated again. This overcomes the low data processing efficiency caused by secondary relocation of data in the related art and improves data processing efficiency.
  • It also reduces the number of data relocations and the additional overhead that the SSD incurs during data recovery and cache replacement, which improves the performance of the SSD.
  • the second obtaining unit includes:
  • the first obtaining module is configured to obtain the access frequency and modification identifier of the valid page data in the cache;
  • the second obtaining module is configured to obtain the page type of the valid page data according to the access frequency and the modification identifier, wherein the page type of the valid page data includes the first type of page data and the second type of page data, and the second type of page data indicates page data that is not replaced from the cache and stored to the solid state hard disk;
  • the separation module is configured to separate the valid page data according to the page type of the valid page data to obtain the first type of page data.
  • The second type of page data may include first page data and second page data, and the second obtaining module obtains the page type of the valid page data as follows: page data whose modification identifier indicates that it is unmodified is taken as the first page data; page data whose modification identifier indicates that it has been modified and whose access frequency is greater than or equal to a first predetermined threshold is taken as the second page data; and page data whose modification identifier indicates that it has been modified and whose access frequency is less than the first predetermined threshold is taken as the first type of page data.
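The three-way rule above maps directly onto a small classifier followed by a separation step. This is a sketch only; the names `classify` and `separate`, the string labels, and the integer access frequency are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def classify(modified: bool, access_frequency: int, threshold: int) -> str:
    """Apply the rule from the text to one cached page."""
    if not modified:
        return "clean"         # first page data (second type)
    if access_frequency >= threshold:
        return "hot_dirty"     # second page data (second type)
    return "cold_dirty"        # first type of page data


def separate(pages: Dict[int, Tuple[bool, int]],
             threshold: int) -> Tuple[List[int], List[int]]:
    """Split page ids into (first type, second type).

    pages maps page id -> (modification identifier, access frequency).
    """
    first_type: List[int] = []
    second_type: List[int] = []
    for page_id, (modified, freq) in pages.items():
        if classify(modified, freq, threshold) == "cold_dirty":
            first_type.append(page_id)
        else:
            second_type.append(page_id)
    return first_type, second_type
```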
  • The FTL in the SSD detects the free space of the SSD in real time. After it detects that free space is insufficient and triggers the recycling request, the system starts to traverse the LRU queue in the cache and marks the page type of the page data in the LRU queue.
  • The cache layer marks the 10% of pages at the tail of the LRU queue as cold pages and the remaining pages as hot pages. The cold pages and hot pages are then traversed separately: dirty pages among the cold pages are marked as cold dirty pages (CD), dirty pages among the hot pages are marked as hot dirty pages (HD), and the marked page types are notified to the FTL in the SSD.
  • The first type of page data (that is, the cold dirty pages) can be obtained by separation, so that it can easily be relocated to the storage location of the valid page data after data recovery, which avoids the two relocations in cache replacement and data recovery and reduces overhead.
  • The first determining unit is configured to determine, before the first type of page data is relocated from the cache to the predetermined relocation location in the solid state hard disk, a data recovery block of the solid state hard disk according to at least the first type of page data, where the page type of the page data in each data block of the solid state hard disk includes unwritten page data, invalid page data, and valid page data, and the data blocks include the data recovery block;
  • The recycling unit is configured to perform data recovery on the data recovery block.
  • the first determining unit includes:
  • the third obtaining module is configured to obtain a data recovery rate of each data block according to the first type of page data in the cache and the page data in each data block in the solid state hard disk;
  • a determination module that is set to determine a data recovery block based on the data recovery rate.
  • the manner of determining the data recovery block by comparing the obtained data recovery rates includes at least one of the following:
  • The mapping table of the cache layer stores not only the page type marked for each piece of page data, but also the preset location of the page data after it is replaced to the SSD (such as the block identifier of the data block).
  • The FTL may count, in units of data blocks, the number of cold dirty pages that the cache has marked as the first type of page data and the number of invalid pages in each data block of the solid state hard disk, and then calculate the data recovery rate of each data block from these two counts.
  • The data recovery rate is used to locate the data recovery block in the solid state hard disk accurately, which makes data recovery accurate and efficient and preserves the data processing efficiency of the solid state hard disk.
  • the third obtaining module includes:
  • r represents the data recovery rate of the current data block
  • a represents the number of pages of the invalid page data of the current data block in the solid state hard disk
  • b represents the number of pages of the first type of page data in the cache
  • P represents the page size
  • B represents the block size.
  • the unit of the read/write operation in the solid state hard disk is a page, wherein the page size is usually 2 KB, and the access delay is generally 15 µs to 200 µs.
  • the unit of the erase operation is a block, where the block size is usually 128 KB, and erasing a block requires an overhead of about 2 ms.
  • The data recovery rate of each data block in the solid state hard disk is calculated in turn in the above manner, which ensures that the data recovery block is determined accurately and makes data recovery of the solid state hard disk accurate and efficient.
  • the recycling unit includes:
  • the relocation module is configured to relocate the valid page data in the data recovery block to a predetermined relocation location, and mark the valid page data as invalid page data;
  • the erase module is set to erase the invalid page data in the data recovery block.
  • the first type of page data is a cold dirty page
  • the second type of page data is a clean page and a hot dirty page.
  • The clean pages and hot dirty pages in the data recovery block are copied directly to the predetermined relocation location, and the location information of that page data in the FTL is updated.
  • the corresponding valid page data in the data recovery block is marked as invalid page data (which can be represented by a stale page).
  • the cold dirty page in the data recovery block is also marked as invalid page data, and the latest data of the cold dirty page in the cache is copied to a predetermined relocation location. Then, the latest data of the cold dirty page in the cache is deleted, and the location information corresponding to the page data in the FTL is modified.
  • The page data in the data recovery block is erased, and the data recovery block is marked as “erased”, which completes data recovery of the solid state hard disk and releases the free space.
  • Performing data recovery of the solid state hard disk in this manner ensures that both the first type of page data and the second type of page data in the valid page data are relocated to the predetermined relocation location in a single pass, which avoids a secondary relocation of the first type of page data and reduces the overhead of the solid state drive.
  • The second determining unit is configured to determine, before the first type of page data is relocated from the cache to the predetermined relocation location in the solid state hard disk, the predetermined relocation location according to the size of the unwritten page data in data blocks of the solid state hard disk other than the data recovery block.
  • For example, the size of the valid page data in the data recovery block is counted, the real-time updated FTL is searched for unwritten data pages in other data blocks of the SSD that can hold valid page data of that size, and the data block found is used as the predetermined relocation location of the valid page data in the data recovery block.
  • The predetermined relocation location is determined according to the size of the unwritten page data in data blocks of the solid state hard disk other than the data recovery block, which ensures that the valid page data in the data recovery block can be relocated completely.
  • Each of the above modules may be implemented by software or hardware.
  • For example, but without limitation, the modules may all be located in the same processor, or they may be located in multiple processors.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • the first type of page data is obtained from the cached valid page data in response to the recycling request, wherein the first type of page data indicates page data that is to be replaced from the cache and stored to the solid state hard disk;
  • the first type of page data is relocated from the cache to a predetermined relocation location in the solid state hard disk, wherein the predetermined relocation location is a storage location of the valid page data after performing data recovery.
  • The foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, and a magnetic memory.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device, and they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one herein; or they may be separately fabricated into individual integrated circuit modules, or a plurality of the modules or steps may be fabricated into a single integrated circuit module.
  • the invention is not limited to any specific combination of hardware and software.
  • The first type of page data, that is, the valid page data that is to be replaced from the cache to the solid state hard disk, is relocated directly to the predetermined relocation location in the SSD instead of first being replaced to the SSD and then relocated again, which overcomes the low data processing efficiency caused by secondary relocation of data in the related art and improves data processing efficiency.
  • The number of data relocations and the additional overhead that the SSD incurs during data recovery and cache replacement are reduced, and the performance of the SSD is improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Embodiments of the present invention provide a data processing method and apparatus. The method includes: acquiring a recycling request, where the recycling request is used to request data recovery of page data in a solid state hard disk (SSD); in response to the recycling request, acquiring a first type of page data from the cached valid page data, where the first type of page data indicates page data that is to be replaced from the cache to the SSD and stored there; and relocating the first type of page data from the cache to a predetermined relocation location in the SSD, where the predetermined relocation location is a storage location for the valid page data after data recovery is performed. The present invention solves the problem of low data processing efficiency caused by secondary relocation of data in the related art, thereby improving data processing efficiency.
PCT/CN2017/074290 2016-02-25 2017-02-21 Data processing method and apparatus WO2017143972A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610103929.6 2016-02-25
CN201610103929.6A CN107122124B (zh) 2016-02-25 2016-02-25 Data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2017143972A1 true WO2017143972A1 (fr) 2017-08-31

Family

ID=59684803

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/074290 WO2017143972A1 (fr) 2016-02-25 2017-02-21 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN107122124B (fr)
WO (1) WO2017143972A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710541A (zh) * 2018-12-06 2019-05-03 天津津航计算技术研究所 Optimization method for Greedy garbage collection in a NAND Flash main control chip
CN109739776A (zh) * 2018-12-06 2019-05-10 天津津航计算技术研究所 Greedy garbage collection *** for a NAND Flash main control chip

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113805805B (zh) * 2021-05-06 2023-10-13 北京奥星贝斯科技有限公司 Method and apparatus for evicting cache memory blocks, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
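CN102279809A (zh) * 2011-08-10 2011-12-14 郏惠忠 Method for redirected writing and garbage collection in a solid state drive
CN102508788A (zh) * 2011-09-28 2012-06-20 成都市华为赛门铁克科技有限公司 SSD and SSD garbage collection method and apparatus
CN104424103A (zh) * 2013-08-21 2015-03-18 光宝科技股份有限公司 Method for managing a cache in a solid state storage device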
US20160041903A1 (en) * 2009-12-11 2016-02-11 Nimble Storage, Inc. Garbage collection based on temperature

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8219776B2 (en) * 2009-09-23 2012-07-10 Lsi Corporation Logical-to-physical address translation for solid state disks
US20120159098A1 (en) * 2010-12-17 2012-06-21 Microsoft Corporation Garbage collection and hotspots relief for a data deduplication chunk store
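CN102841850B (zh) * 2012-06-19 2016-04-20 记忆科技(深圳)有限公司 Method and *** for reducing write amplification of a solid state drive
CN103136121B (zh) * 2013-03-25 2014-04-16 中国人民解放军国防科学技术大学 Cache management method for a solid state disk
CN103455435A (zh) * 2013-08-29 2013-12-18 华为技术有限公司 Data writing method and apparatus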

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160041903A1 (en) * 2009-12-11 2016-02-11 Nimble Storage, Inc. Garbage collection based on temperature
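CN102279809A (zh) * 2011-08-10 2011-12-14 郏惠忠 Method for redirected writing and garbage collection in a solid state drive
CN102508788A (zh) * 2011-09-28 2012-06-20 成都市华为赛门铁克科技有限公司 SSD and SSD garbage collection method and apparatus
CN104424103A (zh) * 2013-08-21 2015-03-18 光宝科技股份有限公司 Method for managing a cache in a solid state storage device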

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
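CN109710541A (zh) * 2018-12-06 2019-05-03 天津津航计算技术研究所 Optimization method for Greedy garbage collection in a NAND Flash main control chip
CN109739776A (zh) * 2018-12-06 2019-05-10 天津津航计算技术研究所 Greedy garbage collection *** for a NAND Flash main control chip
CN109710541B (zh) * 2018-12-06 2023-06-09 天津津航计算技术研究所 Optimization method for Greedy garbage collection in a NAND Flash main control chip
CN109739776B (zh) * 2018-12-06 2023-06-30 天津津航计算技术研究所 Greedy garbage collection *** for a NAND Flash main control chip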

Also Published As

Publication number Publication date
CN107122124A (zh) 2017-09-01
CN107122124B (zh) 2021-06-15

Similar Documents

Publication Publication Date Title
US10838859B2 (en) Recency based victim block selection for garbage collection in a solid state device (SSD)
KR100843543B1 (ko) System including a flash memory device and data recovery method thereof
US8838875B2 (en) Systems, methods and computer program products for operating a data processing system in which a file delete command is sent to an external storage device for invalidating data thereon
US10176190B2 (en) Data integrity and loss resistance in high performance and high capacity storage deduplication
US8745310B2 (en) Storage apparatus, computer system, and method for managing storage apparatus
US9690694B2 (en) Apparatus, system, and method for an address translation layer
US20170139825A1 (en) Method of improving garbage collection efficiency of flash-oriented file systems using a journaling approach
US8719501B2 (en) Apparatus, system, and method for caching data on a solid-state storage device
US8417878B2 (en) Selection of units for garbage collection in flash memory
US9779027B2 (en) Apparatus, system and method for managing a level-two cache of a storage appliance
CN109656486B (zh) Solid state drive configuration method, data storage method, solid state drive, and storage controller
US10877898B2 (en) Method and system for enhancing flash translation layer mapping flexibility for performance and lifespan improvements
US10025669B2 (en) Maintaining data-set coherency in non-volatile memory across power interruptions
CN107391774B (zh) Garbage collection method for a deduplication-based log file ***
US20170060448A1 (en) Systems, solid-state mass storage devices, and methods for host-assisted garbage collection
US10114576B2 (en) Storage device metadata synchronization
US10552335B2 (en) Method and electronic device for a mapping table in a solid-state memory
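CN111880723B (zh) Data storage device and data processing method
CN110674056B (zh) Garbage collection method and apparatus
CN112596667A (zh) Method and *** for organizing NAND blocks and placing data in a solid state drive to facilitate high throughput of random writes
WO2017143972A1 (fr) Data processing method and apparatus
CN115269451B (zh) Flash memory garbage collection method, apparatus, and readable storage medium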
US20140258591A1 (en) Data storage and retrieval in a hybrid drive
JP2007220107A (ja) Apparatus and method for managing mapping information of a non-volatile memory
US11429519B2 (en) System and method for facilitating reduction of latency and mitigation of write amplification in a multi-tenancy storage drive

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17755801

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17755801

Country of ref document: EP

Kind code of ref document: A1