CN111143343B - Efficient data deleting method and system based on source terminal deduplication - Google Patents

Efficient data deleting method and system based on source terminal deduplication

Info

Publication number
CN111143343B
Authority
CN
China
Prior art keywords
container
data
module
library
deduplication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911374951.4A
Other languages
Chinese (zh)
Other versions
CN111143343A (en)
Inventor
周建华
张有成
姚崎
丁红
李海鹏
许萍萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace One System Jiangsu Information Technology Co ltd
Original Assignee
Aerospace One System Jiangsu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace One System Jiangsu Information Technology Co ltd
Priority to CN201911374951.4A
Publication of CN111143343A
Application granted
Publication of CN111143343B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1453 Management of the data involved in backup or backup restore using de-duplication of the data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an efficient data deletion method based on source-side deduplication. During backup, the data stream of the source end is segmented into data blocks, and a fingerprint is calculated for each block and compared against the existing fingerprints; if a fingerprint does not yet exist, the block is a new block, so the corresponding data block is transmitted to a container on the server end for storage and that container is marked 1. Once a container is full, it is written into a data file and a new container is created. When a backup set expires it is cleaned automatically, and its guid object records are cleaned with it. During idle time outside the normal service window, preset loop deletion logic cleans the data blocks and fingerprints of every container marked 0, where a mark of 0 indicates that the data blocks and fingerprints in the container are no longer referenced and can be cleaned. The advantages are that the application uses marking, so the statistical logic is simpler, the cleaning logic is not affected by the size of the deduplication library, and the method is more efficient.

Description

Efficient data deleting method and system based on source terminal deduplication
Technical Field
The application relates to a method and a system for efficiently deleting data based on source-side deduplication, and belongs to the technical field of data protection.
Background
Source-side deduplication is widely used in data protection products because it reduces both transmission bandwidth and storage space. For convenience of explanation, it is agreed here that deduplicated source-end data is stored in a deduplication library, which comprises a deduplication fingerprint library and a deduplication database. The index information of the data blocks is stored in the deduplication fingerprint library, and the data blocks themselves are stored in the deduplication database. Data that has undergone source-side deduplication has the following characteristic: each data block stored in the deduplication database is unique within the whole library, and most data blocks are shared by several data sources; it is precisely this characteristic that reduces the storage space. The characteristic is beneficial for storage space but makes deletion considerably more complex, because data in the deduplication database cannot be cleaned as conveniently as ordinary data. The first existing approach records a reference count for each data block: the count is increased for each repeated data block during backup and decreased for each contained data block during deletion, and when the count reaches 0 the data block can be cleaned and the storage space it occupies released.
As the deduplication library grows, this approach increasingly degrades backup and deletion performance. The other existing approach is centralized cleaning, which runs at a specific point in time and marks all data files that are still in use. Because a data file is much coarser-grained than a data block, the statistics are gathered quickly; the unused data files and their fingerprints are then deleted to release space. The drawback of this approach is its coarse granularity, so the space-release effect is poor.
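For illustration only, the first prior-art approach (per-block reference counting) can be sketched as follows. The class and method names are hypothetical and are not taken from the application; the sketch merely shows why every backup and every deletion has to touch one counter per data block, which is the cost that grows with the deduplication library.

```python
from collections import Counter


class RefCountStore:
    """Hypothetical per-block reference counting (prior-art style)."""

    def __init__(self):
        self.refs = Counter()  # fingerprint -> number of backups referencing it
        self.blocks = {}       # fingerprint -> block bytes

    def backup(self, chunks):
        """chunks: iterable of (fingerprint, data). Returns the fingerprints used."""
        used = []
        for fp, data in chunks:
            if fp not in self.blocks:
                self.blocks[fp] = data  # new block: store it once
            self.refs[fp] += 1          # every reference updates a counter
            used.append(fp)
        return used

    def delete_backup(self, fingerprints):
        """Decrement the counter of every block the deleted backup contained."""
        for fp in fingerprints:
            self.refs[fp] -= 1
            if self.refs[fp] == 0:      # no reference left: release the block
                del self.refs[fp]
                del self.blocks[fp]
```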
Disclosure of Invention
The application aims to overcome the following defect of existing source-side deduplication technology: because deduplicated data is unique, deletion logic is complex and inefficient, and space cannot be released quickly. To this end it provides an efficient data deletion method and system based on source-side deduplication.
In order to solve the above technical problem, the application provides an efficient data deletion method based on source-side deduplication, in which, during backup, the data stream of the source end is segmented into data blocks, fingerprints are calculated and compared, and if a fingerprint does not yet exist, indicating a new block, the corresponding data block is transmitted to a container on the server end for storage and that container is marked 1; once the container is full it is written into a data file and a new container is created, wherein a container comprises a plurality of data blocks, the deduplication library comprises a plurality of fixed-size data files, and each data file comprises a plurality of containers;
when a backup set expires it is cleaned automatically, and its guid object records are cleaned with it;
during idle time outside the normal service window, preset loop deletion logic cleans the data blocks and fingerprints of every container marked 0, wherein a mark of 0 indicates that the data blocks and fingerprints in the container are not referenced and can be cleaned.
Further, the container is fixed in size.
Further, the marking process of each container is as follows:
determining a backup set, wherein the backup set comprises an object library and a deduplication library, the object library stores object files, each object file stores the object record and index data of an object, the deduplication library stores data files, and the data files store the information of each data block contained in the objects;
acquiring the referenced object file, reading the index data in the object file according to the unique identifier of the object, finding the corresponding container according to the fingerprints in the index data, and marking the corresponding container record 1.
Further, the loop deletion logic is as follows:
S1, during backup, for each referenced data block, marking the container in which the data block is located as 1, and marking the corresponding object record as 1 to indicate that it has been checked;
S2, traversing the object records, finding the objects marked 0 (a mark of 0 indicating that the object record has not yet been checked), locating the containers that store their data blocks in the deduplication library according to the index information recorded in the object file, and marking the container corresponding to each fingerprint as 1;
S3, traversing the container records in the deduplication library, cleaning the data blocks and fingerprints in each container marked 0, and then marking the container state as 2 to indicate that the container has been cleaned and can be reused;
S4, resetting to 0 the container records marked 1 in the deduplication library, and setting the marks of all object records in the object library to 0;
S5, collecting all containers marked 2 in the deduplication library, and preferentially reusing the collected containers when new data needs to be stored;
S6, executing steps S1 to S5 cyclically with a set period.
An efficient data deletion system based on source-side deduplication comprises a container determining module, a backup set cleaning module and a deleting module;
the container determining module is used for, during backup, segmenting the data stream of the source end into data blocks, calculating and comparing fingerprints, and, if a fingerprint does not yet exist, indicating a new block, transmitting the corresponding data block to a container on the server end for storage and marking that container as 1, the container being written into a data file once it is full and a new container then being created, wherein a container comprises a plurality of data blocks, the deduplication library comprises a plurality of fixed-size data files, and each data file comprises a plurality of containers;
the backup set cleaning module is used for automatically cleaning a backup set after it expires and deleting its guid object records at the same time;
the deleting module is used for cleaning, during idle time outside the normal service window and by means of preset loop deletion logic, the data blocks and fingerprints of every container marked 0, wherein a mark of 0 indicates that the data blocks and fingerprints in the container are not referenced and can be cleaned.
Further, the size of the container determined by the container determining module is fixed.
Further, the container determining module comprises a backup set determining module and a container marking module;
the backup set determining module is used for determining a backup set, wherein the backup set comprises an object library and a deduplication library, the object library stores object files, each object file stores the object record and index data of an object, the deduplication library stores data files, and the data files store the information of each data block contained in the objects;
the container marking module is used for acquiring the referenced object file, reading the index data in the object file according to the unique identifier of the object, finding the corresponding container according to the fingerprints in the index data, and marking the corresponding container record 1.
Further, the cleaning module comprises a backup module, a first traversing module, a second traversing module, an initializing module, a collecting module and a circulating module;
the backup module is used for, during backup, marking as 1 the container in which each referenced data block is located, and marking the corresponding object record as 1 to indicate that it has been checked;
the first traversing module is used for traversing the object records, finding the objects marked 0 (a mark of 0 indicating that the object record has not yet been checked), locating the containers that store their data blocks in the deduplication library according to the index information recorded in the object file, and marking the container corresponding to each fingerprint as 1;
the second traversing module is used for traversing the container records in the deduplication library, cleaning the data blocks and fingerprints in each container marked 0, and then marking the container state as 2 to indicate that the container has been cleaned and can be reused;
the initializing module is used for resetting to 0 the container records marked 1 in the deduplication library, and setting the marks of all object records in the object library to 0;
the collecting module is used for collecting all containers marked 2 in the deduplication library, the collected containers being preferentially reused when new data needs to be stored;
the circulating module is used for cyclically executing, with a set period, the processes of the backup module, the first traversing module, the second traversing module, the initializing module and the collecting module.
The application has the beneficial effects that:
The application can run cleaning periodically in the background without affecting normal backup and recovery services, and the cleaned space can be reused, which in effect releases space. The application uses marking, so the statistical logic is simpler, the cleaning logic is not affected by the size of the deduplication library, and the method is more efficient.
Drawings
FIG. 1 is a schematic flow diagram of marking the containers that are still referenced;
FIG. 2 is a schematic flow diagram of cleaning containers.
Detailed Description
The application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and are not intended to limit the scope of the present application.
The deletion logic of the application uses container marks to distinguish the containers whose data blocks and fingerprints can be cleaned from those still in use, so the key is how to mark the containers in the deduplication library quickly. Deleting the fingerprints themselves is simple: it is only necessary to read each container to be deleted, parse the fingerprints stored in it, and delete the corresponding records from the fingerprint library. If the marking were done from scratch at every cleanup, every fingerprint in the index files of all objects in the object library would have to be looked up once to determine which containers are still in use, which takes a long time overall. The application therefore moves this step forward into the backup stage: fingerprints are compared during backup anyway, so it is only necessary to mark the container that is hit at that moment, which has little impact on the backup logic. Because the validity of a backup set is limited in time, only a small number of objects need to be looked up at cleanup to determine which containers can be cleaned.
The containers of the application have a relatively small granularity. A container is a logical concept for managing a collection of data blocks; containers are of a fixed size (every container is the same size), and a data file may contain multiple containers. Because the containers are of a fixed size, a cleaned container can be reused to store new data. In addition, container-level cleaning can be performed during the idle time of normal business and hardly affects it.
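As a rough sketch of why a fixed container size makes reuse straightforward, the following shows an allocator that hands out cleaned containers before creating new ones, and a helper that maps a container id to a position inside a fixed-size data file. The constants `CONTAINER_SIZE` and `CONTAINERS_PER_FILE`, and all names, are assumptions made for this example; the application only states that the sizes are fixed.

```python
CONTAINER_SIZE = 4 * 1024 * 1024  # assumed value; the text only says "fixed"
CONTAINERS_PER_FILE = 64          # assumed value


class ContainerAllocator:
    """Hypothetical allocator that prefers cleaned containers over new ones."""

    def __init__(self):
        self.next_id = 0
        self.free_ids = []  # ids of containers that were cleaned (state 2)

    def release(self, container_id):
        """Record a cleaned container as available for reuse."""
        self.free_ids.append(container_id)

    def allocate(self):
        """Reuse a cleaned container if one exists, otherwise create a new id."""
        if self.free_ids:
            return self.free_ids.pop()
        container_id = self.next_id
        self.next_id += 1
        return container_id


def locate(container_id):
    """Map a container id to (data file index, byte offset in that file)."""
    file_index, slot = divmod(container_id, CONTAINERS_PER_FILE)
    return file_index, slot * CONTAINER_SIZE
```

Because every container occupies the same number of bytes, a reused container fits exactly into the slot of the one it replaces, so no compaction of the data file is needed in this sketch.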
The deduplication library comprises a plurality of fixed-size data files, each data file comprises a plurality of fixed-size containers, and each container comprises a plurality of data blocks. During backup, the data stream of the source end is segmented into data blocks, fingerprints are calculated and compared, and if a fingerprint does not yet exist, indicating a new block, the corresponding data block is transmitted to a container on the server end for storage and that container is marked 1. Once the container is full, it is written into a data file and a new container is created.
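The backup path described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the application's implementation: fixed-size chunking, SHA-256 fingerprints, the tiny `BLOCK_SIZE` and `CONTAINER_CAPACITY` values, and the in-memory `DedupLibrary` class are all choices made for the example.

```python
import hashlib

BLOCK_SIZE = 4          # assumed: tiny fixed-size chunks, for illustration only
CONTAINER_CAPACITY = 2  # assumed: number of blocks that fill one container


class DedupLibrary:
    """Hypothetical in-memory stand-in for the server-side deduplication library."""

    def __init__(self):
        self.fingerprints = {}  # fingerprint -> container id
        self.sealed = {}        # container id -> blocks (stand-in for the data file)
        self.marks = {}         # container id -> 0 or 1
        self.open_id = 0
        self.open_blocks = []

    def store(self, fp, block):
        """Store a new block in the open container and mark that container 1."""
        self.open_blocks.append((fp, block))
        self.fingerprints[fp] = self.open_id
        self.marks[self.open_id] = 1
        if len(self.open_blocks) >= CONTAINER_CAPACITY:
            # Container is full: write it out and start a new one.
            self.sealed[self.open_id] = self.open_blocks
            self.open_id += 1
            self.open_blocks = []


def backup(stream, lib):
    """Chunk, fingerprint and deduplicate a byte stream.

    Returns (index, sent): the fingerprint list of the object and the
    number of bytes that actually had to be transmitted.
    """
    index, sent = [], 0
    for i in range(0, len(stream), BLOCK_SIZE):
        block = stream[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        if fp in lib.fingerprints:
            # Fingerprint hit: only mark the container that holds the block.
            lib.marks[lib.fingerprints[fp]] = 1
        else:
            lib.store(fp, block)  # new block: transmit and store it
            sent += len(block)
        index.append(fp)
    return index, sent
```

Marking on a fingerprint hit is the only extra work added to the backup path, which is consistent with the statement that marking has little impact on the backup logic.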
The deduplication library groups a batch of fingerprints per container in order to exploit the locality of the data. The principle of locality states that if a data block is used, its neighboring data blocks are also likely to be used. Using the container as the update unit of the cache therefore effectively improves the cache hit rate. Deletion can exploit the same principle: if a data block needs to be cleaned, its neighboring data blocks will very likely need to be cleaned as well.
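A minimal sketch of the container as the cache update unit is given below. The class `FingerprintCache` and its two lookup tables are hypothetical; the point is only that one slow lookup loads the fingerprints of the whole container, so the neighbors of a block are then served from the cache.

```python
class FingerprintCache:
    """Hypothetical fingerprint cache that is filled one container at a time."""

    def __init__(self, fingerprint_db, container_index):
        self.fingerprint_db = fingerprint_db    # fingerprint -> container id (slow store)
        self.container_index = container_index  # container id -> fingerprints it holds
        self.cached = {}                        # fingerprint -> container id
        self.disk_lookups = 0                   # counts accesses to the slow store

    def lookup(self, fp):
        """Return the container id for fp, or None if the fingerprint is unknown."""
        if fp in self.cached:
            return self.cached[fp]
        self.disk_lookups += 1
        container_id = self.fingerprint_db.get(fp)
        if container_id is not None:
            # Locality: load every fingerprint of the same container at once.
            for neighbour in self.container_index[container_id]:
                self.cached[neighbour] = container_id
        return container_id
```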
A backup set is stored in the background in the form of objects and mainly comprises two parts: the object library, whose object files store the object records and the index data of the objects, and the deduplication library, whose data files store the information of each data block contained in the objects. Reading a backup set first accesses the object library and then finds the corresponding data blocks in the deduplication library according to the index information recorded in the object file.
As shown in FIG. 1, a referenced object file is acquired, the index data of the object in the object file is read according to its guid (unique object identifier), the corresponding container is found according to the fingerprints in the index data, and the corresponding container record is marked 1.
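The marking flow of FIG. 1 can be approximated as below. The data layout is an assumption for illustration: object files are modeled as a dictionary from guid to the list of fingerprints in the object's index data, and the fingerprint library as a dictionary from fingerprint to container id.

```python
def mark_referenced_containers(object_files, fingerprint_to_container, container_marks):
    """Mark with 1 every container that a referenced object still points to.

    object_files: guid -> list of fingerprints (the object's index data)
    fingerprint_to_container: fingerprint -> container id
    container_marks: container id -> mark, updated in place and returned
    """
    for guid, index in object_files.items():
        for fp in index:
            container_id = fingerprint_to_container.get(fp)
            if container_id is not None:
                container_marks[container_id] = 1
    return container_marks
```

Containers whose mark is still 0 after this pass hold no fingerprint that any remaining object refers to.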
The method specifically comprises the following steps:
1. During backup, for each referenced data block, the container in which the data block is located is marked 1, indicating that the container holds a data block that is in use. The corresponding object record is also marked 1, indicating that it has been checked.
2. The object records are traversed to find the objects marked 0; the containers that store their data blocks are located in the deduplication library according to the index information recorded in the object file, and the container corresponding to each fingerprint is marked 1.
3. The container records in the deduplication library are traversed. A container marked 0 holds no referenced fingerprint and can be cleaned: the fingerprints in the container are cleaned, and the container state is then marked 2, indicating that the container has been cleaned and can be reused.
4. The flag bits are initialized: container records marked 1 in the deduplication library are reset to 0, and the marks of all object records in the object library are set to 0.
5. All containers marked 2 in the deduplication library are collected, and the collected containers are preferentially reused when new data needs to be stored.
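Steps 2 to 5 above can be sketched as a single function, with step 1 assumed to have already happened during backup (objects and containers touched by a backup arrive with mark 1). The dataclasses, the state constants and the dictionary-based libraries are all hypothetical simplifications, not structures named in the application.

```python
from dataclasses import dataclass

UNREFERENCED, REFERENCED, CLEANED = 0, 1, 2


@dataclass
class Container:
    fingerprints: list
    mark: int = UNREFERENCED


@dataclass
class ObjectRecord:
    index: list    # fingerprints of the data blocks the object consists of
    mark: int = 0  # 1 means the object was already checked during backup


def cleanup_cycle(objects, containers, fingerprint_db):
    """Run one cleaning cycle and return the ids of reusable containers."""
    # Step 2: objects not checked during backup still protect their containers.
    for obj in objects.values():
        if obj.mark == 0:
            for fp in obj.index:
                container_id = fingerprint_db.get(fp)
                if container_id is not None:
                    containers[container_id].mark = REFERENCED
    # Step 3: clean every container that nothing references.
    for container in containers.values():
        if container.mark == UNREFERENCED:
            for fp in container.fingerprints:
                fingerprint_db.pop(fp, None)
            container.fingerprints = []
            container.mark = CLEANED
    # Step 4: reset the flags for the next cycle.
    for container in containers.values():
        if container.mark == REFERENCED:
            container.mark = UNREFERENCED
    for obj in objects.values():
        obj.mark = 0
    # Step 5: collect the cleaned containers for reuse.
    return [cid for cid, c in containers.items() if c.mark == CLEANED]
```

Note that the cost of the sweep in this sketch depends on the number of container records and of unchecked objects, not on the number of data blocks, which matches the claimed advantage of marking at container granularity.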
As shown in FIG. 2, cleaning during idle time outside the normal service window proceeds as follows:
1. Start collecting the ids of the containers to be cleaned, judging from each container's mark whether it needs cleaning.
2. If containers need cleaning, judge whether the service is idle; if not, wait 1 s and judge again.
3. Once the service is idle, clean the write cache and close its second-level cache, then judge again whether the service is idle; if not, wait 1 s and judge again.
4. Once the service is idle, take out one container id and check the container state again; if the container still needs cleaning, clean it and clear its mark.
5. Judge again whether the service is idle; if not, wait 1 s and judge again. If idle, reopen the second-level cache in the write cache, initialize the bloom filter, and end.
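The idle-gated part of the FIG. 2 flow can be sketched as follows. The function and its callback parameters (`needs_cleaning`, `is_idle`, `clean`) are hypothetical, and the second-level cache and bloom filter handling is deliberately left out because the text does not describe it in enough detail. What the sketch does show is the 1 s wait while the service is busy and the re-check of the container state immediately before cleaning.

```python
import time


def clean_when_idle(candidate_ids, needs_cleaning, is_idle, clean,
                    wait_seconds=1.0, sleep=time.sleep):
    """Clean candidate containers one at a time, only while the service is idle.

    needs_cleaning(cid) -> bool : re-checks the container mark
    is_idle() -> bool           : reports whether normal business is idle
    clean(cid)                  : cleans the container and clears its mark
    Returns the ids that were actually cleaned.
    """
    def wait_for_idle():
        while not is_idle():
            sleep(wait_seconds)  # business is active: wait and judge again

    cleaned = []
    for container_id in candidate_ids:
        wait_for_idle()
        # Check again: a backup may have referenced the container meanwhile.
        if needs_cleaning(container_id):
            clean(container_id)
            cleaned.append(container_id)
    return cleaned
```

Passing `sleep` as a parameter is a design choice for this example only; it lets the waiting behavior be exercised without real delays.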
This process is executed cyclically in the background with a set period. A backup set in a data protection product has a life cycle: it is cleaned automatically when the life cycle expires and can also be cleaned manually. As objects in the object library are replaced, the cleaning logic therefore cleans the containers that have expired and are no longer used, and the reclaimed containers store new data, so that space is recycled and storage space is in effect reduced.
The application also provides an efficient data deletion system based on source-side deduplication, which comprises a container determining module, a backup set cleaning module and a deleting module;
the container determining module is used for, during backup, segmenting the data stream of the source end into data blocks, calculating and comparing fingerprints, and, if a fingerprint does not yet exist, indicating a new block, transmitting the corresponding data block to a container on the server end for storage and marking that container as 1, the container being written into a data file once it is full and a new container then being created, wherein a container comprises a plurality of data blocks, the deduplication library comprises a plurality of fixed-size data files, and each data file comprises a plurality of containers;
the backup set cleaning module is used for automatically cleaning a backup set after it expires and deleting its guid object records at the same time;
the deleting module is used for cleaning, during idle time outside the normal service window and by means of preset loop deletion logic, the data blocks and fingerprints of every container marked 0, wherein a mark of 0 indicates that the data blocks and fingerprints in the container are not referenced and can be cleaned.
The size of the container determined by the container determining module is fixed.
The container determining module comprises a backup set determining module and a container marking module;
the backup set determining module is used for determining a backup set, wherein the backup set comprises an object library and a deduplication library, the object library stores object files, each object file stores the object record and index data of an object, the deduplication library stores data files, and the data files store the information of each data block contained in the objects;
the container marking module is used for acquiring the referenced object file, reading the index data in the object file according to the unique identifier of the object, finding the corresponding container according to the fingerprints in the index data, and marking the corresponding container record 1.
The cleaning module comprises a backup module, a first traversing module, a second traversing module, an initializing module, a collecting module and a circulating module;
the backup module is used for, during backup, marking as 1 the container in which each referenced data block is located, and marking the corresponding object record as 1 to indicate that it has been checked;
the first traversing module is used for traversing the object records, finding the objects marked 0 (a mark of 0 indicating that the object record has not yet been checked), locating the containers that store their data blocks in the deduplication library according to the index information recorded in the object file, and marking the container corresponding to each fingerprint as 1;
the second traversing module is used for traversing the container records in the deduplication library, cleaning the data blocks and fingerprints in each container marked 0, and then marking the container state as 2 to indicate that the container has been cleaned and can be reused;
the initializing module is used for resetting to 0 the container records marked 1 in the deduplication library, and setting the marks of all object records in the object library to 0;
the collecting module is used for collecting all containers marked 2 in the deduplication library, the collected containers being preferentially reused when new data needs to be stored;
the circulating module is used for cyclically executing, with a set period, the processes of the backup module, the first traversing module, the second traversing module, the initializing module and the collecting module.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present application, and such modifications and variations should also be regarded as being within the scope of the application.

Claims (4)

1. A method for efficiently deleting data based on source end deduplication is characterized in that,
in the backup process, segmenting a data stream of a source end into data blocks, calculating fingerprints, comparing the fingerprints, if the fingerprints are not provided with a new block, transmitting the corresponding data blocks into a container of a server end for storage, marking the corresponding container as 1, writing the container into a data file after the container is full, and creating a new container, wherein the container comprises a plurality of data blocks, a deduplication library comprises a plurality of data files with fixed sizes, and each data file comprises a plurality of containers;
the backup set is automatically cleaned due to expiration, and the guid object record is cleaned;
the method comprises the steps that data blocks and fingerprints of a container marked as 0 are cleaned by utilizing preset loop deletion logic in idle time outside a normal service window period, wherein the container marked as 0 indicates that the data blocks and fingerprints thereof in the container are not referenced, and cleaning is performed;
the process of marking each of the containers is as follows:
determining a backup set, wherein the backup set comprises an object library and a deduplication library, the object library stores object files, the object files store object records and index data of the objects, the deduplication library stores data files, and the data files store information of each data block contained in the objects;
acquiring a referenced object file, reading index data in the object file according to a unique identifier of the object, finding a corresponding container according to fingerprints in the index data, and marking a corresponding container record with a mark 1;
the loop deletion logic is as follows:
s1, in the backup process, marking a container where a corresponding data block is located as 1 for the referenced data block, and marking a corresponding object record as 1 to indicate that the data block is checked;
s2, traversing object records, finding out objects marked as 0, finding out the position of a container stored by a corresponding data block in a deduplication library according to index information of records in an object file, marking the container corresponding to a fingerprint as 1, and marking the object record as 0 to indicate that the object record is not inspected yet;
S3, traversing the container records in the deduplication library, cleaning the data blocks and their fingerprints in each container marked as 0, and then marking the container state as 2 to indicate that the container has been cleaned and can be reused;
S4, resetting to 0 the container records marked as 1 in the deduplication library, and resetting the marks of all object records in the object library to 0;
S5, collecting all containers marked as 2 in the deduplication library, and preferentially reusing the collected containers when new data needs to be stored;
S6, executing steps S1-S5 cyclically at a set period.
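The marking and cleaning steps S2 to S5 above amount to a mark-and-sweep pass over the container records. The following Python is a minimal sketch for illustration only, not the patented implementation. The names Container, ObjectRecord, run_cycle and fp_index are hypothetical, the libraries are modelled as in-memory dictionaries, and expired objects are assumed to have been removed from `objects` before the pass runs. The marking done during backup (S1) is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field

# Container marks as used in the claim: 0 = unreferenced, 1 = referenced,
# 2 = cleaned and available for reuse.
UNREFERENCED, REFERENCED, RECLAIMED = 0, 1, 2


@dataclass
class Container:
    mark: int = UNREFERENCED
    blocks: dict = field(default_factory=dict)  # fingerprint -> data block


@dataclass
class ObjectRecord:
    guid: str
    fingerprints: list  # index data: fingerprints of the object's blocks
    mark: int = 0       # 0 = not yet checked, 1 = checked during backup


def run_cycle(objects, containers, fp_index):
    """Run one pass of S2-S5 and return the ids of reusable containers.

    objects:    guid -> ObjectRecord for every object still referenced
    containers: container id -> Container
    fp_index:   fingerprint -> container id
    """
    # S2: for every object not yet checked, mark the containers it uses.
    for record in objects.values():
        if record.mark == 0:
            for fp in record.fingerprints:
                containers[fp_index[fp]].mark = REFERENCED

    # S3: clean containers that no object references.
    for container in containers.values():
        if container.mark == UNREFERENCED:
            for fp in container.blocks:
                fp_index.pop(fp, None)   # drop the fingerprints
            container.blocks.clear()     # drop the data blocks
            container.mark = RECLAIMED

    # S4: reset marks so the next cycle starts from a clean state.
    for container in containers.values():
        if container.mark == REFERENCED:
            container.mark = UNREFERENCED
    for record in objects.values():
        record.mark = 0

    # S5: collect the cleaned containers for reuse.
    return [cid for cid, c in containers.items() if c.mark == RECLAIMED]
```

In this sketch a container that was already reclaimed keeps state 2 across cycles until it is reused, so repeating the pass (S6) does not clean it a second time.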
2. The efficient source deduplication-based data deletion method of claim 1, wherein the container is fixed in size.
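The backup path recited in claim 1, with the fixed-size containers of claim 2, can be illustrated by the following Python sketch. It is a simplified illustration under stated assumptions, not the claimed implementation: DedupStore, CHUNK_SIZE and BLOCKS_PER_CONTAINER are hypothetical names, fixed-length segmentation and SHA-256 fingerprints are assumptions (the claims do not specify either), and the source end and server end are collapsed into one process, so "transmitting" a new block is just a dictionary insert.

```python
import hashlib

CHUNK_SIZE = 4            # bytes per data block (tiny, for illustration)
BLOCKS_PER_CONTAINER = 2  # fixed container capacity (assumed unit: blocks)


class DedupStore:
    def __init__(self):
        self.fp_index = {}        # fingerprint -> container id
        self.data_file = []       # full containers, in the order written
        self.open_container = {}  # fingerprint -> block, still being filled
        self.marks = {}           # container id -> mark (1 = referenced)

    def backup(self, stream: bytes):
        """Store a stream and return its index data (ordered fingerprints)."""
        index = []
        for off in range(0, len(stream), CHUNK_SIZE):
            block = stream[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            index.append(fp)
            if fp in self.fp_index:
                # Fingerprint already exists: reference it, store nothing.
                self.marks[self.fp_index[fp]] = 1
                continue
            # New block: place it in the open container and mark it as 1.
            cid = len(self.data_file)  # id the open container will receive
            self.open_container[fp] = block
            self.fp_index[fp] = cid
            self.marks[cid] = 1
            if len(self.open_container) == BLOCKS_PER_CONTAINER:
                # Container is full: write it out and start a new one.
                self.data_file.append(self.open_container)
                self.open_container = {}
        return index
```

For example, backing up `b"aaaabbbbaaaacccc"` stores three distinct blocks: the first two fill container 0, the repeated `aaaa` block is only referenced, and `cccc` goes into the still-open container 1.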
3. An efficient data deletion system based on source deduplication, characterized by comprising a container determining module, a backup set cleaning module and a deleting module;
the container determining module is used for, in the backup process, segmenting the data stream of the source end into data blocks, calculating a fingerprint for each data block, and comparing the fingerprints; if a fingerprint does not already exist, the data block is a new block, the corresponding data block is transmitted to a container at the server end for storage, and the corresponding container is marked as 1; after a container is full, it is written into a data file and a new container is created, wherein a container comprises a plurality of data blocks, the deduplication library comprises a plurality of fixed-size data files, and each data file comprises a plurality of containers;
the backup set cleaning module is used for automatically cleaning a backup set after the backup set expires, and at the same time deleting its guid object records;
the deleting module is used for cleaning the data blocks and fingerprints of containers marked as 0 by means of preset loop deletion logic during idle time outside the normal service window, wherein a container marked as 0 indicates that the data blocks in the container and their fingerprints are not referenced and can be cleaned;
the container determining module comprises a backup set determining module and a container marking module;
the backup set determining module is used for determining a backup set, the backup set comprises an object library and a deduplication library, the object library stores object files, the object files store object records and index data of objects, the deduplication library stores data files, and the data files store information of each data block contained in the objects;
the container marking module is used for acquiring a referenced object file, reading the index data in the object file according to the unique identifier of the object, finding the corresponding container according to the fingerprints in the index data, and marking the corresponding container record as 1;
the cleaning module comprises a backup module, a first traversing module, a second traversing module, an initializing module, a collecting module and a circulating module;
the backup module is used for, in the backup process, marking as 1 the container in which each referenced data block is located, and marking the corresponding object record as 1 to indicate that it has been checked;
the first traversing module is used for traversing the object records to find the objects marked as 0, wherein an object record marked as 0 has not yet been checked; locating, according to the index information recorded in the object file, the containers in the deduplication library that store the corresponding data blocks; and marking the container corresponding to each fingerprint as 1;
the second traversing module is used for traversing the container records in the deduplication library, cleaning the data blocks and their fingerprints in each container marked as 0, and then marking the container state as 2 to indicate that the container has been cleaned and can be reused;
the initializing module is used for resetting to 0 the container records marked as 1 in the deduplication library, and resetting the marks of all object records in the object library to 0;
the collecting module is used for collecting all containers marked as 2 in the deduplication library, the collected containers being preferentially reused when new data needs to be stored;
the circulating module is used for cyclically executing, at a set period, the processes of the backup module, the first traversing module, the second traversing module, the initializing module and the collecting module.
4. The source deduplication-based data efficient deletion system of claim 3, wherein the size of the container determined by the container determination module is fixed.
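One way to picture the collecting module's preference for reclaimed containers is a free list that is consulted before any new container is created. The Python below is a hypothetical sketch, not taken from the patent: ContainerAllocator, collect and allocate are illustrative names, and container ids are assumed to be simple list indices. With fixed-size containers, as in claim 4, any reclaimed container can take the place of a new one, which is what makes this kind of reuse straightforward.

```python
from collections import deque


class ContainerAllocator:
    def __init__(self):
        self.marks = []      # index = container id; value = mark 0, 1 or 2
        self.free = deque()  # ids of cleaned containers awaiting reuse

    def collect(self):
        """Gather every container whose state is 2 (cleaned, reusable)."""
        self.free = deque(cid for cid, m in enumerate(self.marks) if m == 2)

    def allocate(self):
        """Return a container id for new data, reusing cleaned ones first."""
        if self.free:
            cid = self.free.popleft()  # prefer a reclaimed container
        else:
            cid = len(self.marks)      # otherwise create a new container
            self.marks.append(0)
        self.marks[cid] = 1            # it now holds referenced data
        return cid
```

New containers are created only once the free list is empty, so space released by the deleting module is consumed before the deduplication library grows.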
CN201911374951.4A 2019-12-27 2019-12-27 Efficient data deleting method and system based on source terminal deduplication Active CN111143343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911374951.4A CN111143343B (en) 2019-12-27 2019-12-27 Efficient data deleting method and system based on source terminal deduplication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911374951.4A CN111143343B (en) 2019-12-27 2019-12-27 Efficient data deleting method and system based on source terminal deduplication

Publications (2)

Publication Number Publication Date
CN111143343A CN111143343A (en) 2020-05-12
CN111143343B CN111143343B (en) 2023-12-15

Family

ID=70520911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911374951.4A Active CN111143343B (en) 2019-12-27 2019-12-27 Efficient data deleting method and system based on source terminal deduplication

Country Status (1)

Country Link
CN (1) CN111143343B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112637153B (en) * 2020-12-14 2024-02-20 航天壹进制(江苏)信息科技有限公司 Method and system for storing encryption and deduplication
CN113312002A (en) * 2021-06-11 2021-08-27 北京百度网讯科技有限公司 Data processing method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663086A (en) * 2012-04-09 2012-09-12 华中科技大学 Method for retrieving data block indexes
CN102982180A (en) * 2012-12-18 2013-03-20 华为技术有限公司 Method and device for storing data
CN107301019A (en) * 2017-06-22 2017-10-27 重庆大学 A garbage collection method combining a reference time graph and a container position table

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10621143B2 (en) * 2015-02-06 2020-04-14 Ashish Govind Khurange Methods and systems of a dedupe file-system garbage collection

Also Published As

Publication number Publication date
CN111143343A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
US10360182B2 (en) Recovering data lost in data de-duplication system
US8224875B1 (en) Systems and methods for removing unreferenced data segments from deduplicated data systems
US8108446B1 (en) Methods and systems for managing deduplicated data using unilateral referencing
TWI557575B (en) Virtual machine snapshot backup method and system based on multi - level row weight
CN102541757B (en) Write cache method, cache synchronization method and device
CN106502587B (en) Hard disk data management method and hard disk control device
CN103955530B (en) Data reconstruction and optimization method of on-line repeating data deletion system
CN111143343B (en) Efficient data deleting method and system based on source terminal deduplication
CN102999433B (en) Redundant data deletion method and system of virtual disks
CN104615504B (en) A kind of method and device for realizing data protection
CN102982180A (en) Method and device for storing data
CN107665219B (en) Log management method and device
CN104050057B (en) Historical sensed data duplicate removal fragment eliminating method and system
CN105787037A (en) Repeated data deleting method and device
CN111125298A (en) Method, equipment and storage medium for reconstructing NTFS file directory tree
CN111367926A (en) Data processing method and device for distributed system
CN115221131A (en) High-speed data reading and writing method and device for time sequence database
CN105068941A (en) Cache page replacing method and cache page replacing device
CN106844236A (en) The date storage method and device of terminal device
CN109189739B (en) Cache space recovery method and device
CN104978241B (en) A kind of data reconstruction method and device of COW type file systems
US7080206B2 (en) System and method for adaptively loading input data into a multi-dimensional clustering table
CN109658985B (en) Redundancy removal optimization method and system for gene reference sequence
CN103699681A (en) Data rollback processing method and data rollback processing device
CN109271353B (en) Method and system for selectively rewriting self-reference block in data deduplication process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Building 1, 6th Floor, Changfeng Building, No.14 Xinghuo Road, Research and Innovation Park, Jiangbei New District, Nanjing City, Jiangsu Province, 210000

Applicant after: Aerospace One System (Jiangsu) Information Technology Co.,Ltd.

Address before: 210014 Building C, Building 3, No. 5 Baixia High-tech Park, No. 5 Yongzhi Road, Qinhuai District, Nanjing City, Jiangsu Province

Applicant before: NANJING UNARY INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant