CN102385554B - Method for optimizing duplicated data deletion system - Google Patents
- Publication number
- CN102385554B (application CN201110335112.9A; also published as CN102385554A)
- Authority
- CN
- China
- Prior art keywords
- block
- node
- light load
- queue
- hash fingerprint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for optimizing a data deduplication system. The method comprises the following steps: acquiring a block of the current data; performing a hash calculation on the block to obtain its hash fingerprint; determining whether the hash fingerprint of the block is in a hash fingerprint database; if it is, determining whether the reference count of the block in the hash fingerprint database is larger than a threshold value and whether its copy count is smaller than the threshold value; if the reference count is larger than the threshold value and the copy count is smaller than the threshold value, determining whether information about the block exists in an index table; if it does, calling a node allocation process to select a light-load node; storing the block in the light-load node; updating the information about the block in the index table; and adding 1 to the reference count of the block in the hash fingerprint database. With this method, storage space can be allocated dynamically according to the current load and energy consumption state of each storage node of the data deduplication system, the workload of the storage nodes is balanced, and system performance is improved.
Description
Technical field
The present invention relates to the field of data storage, and in particular to a method for optimizing a data deduplication system.
Background art
A data deduplication system uses content-based addressing to eliminate duplicate data and improve space utilization. It splits a file into a number of small data blocks according to a given chunking strategy, detects identical blocks based on some feature of each block, and stores only the blocks that are not duplicates, thereby removing redundant data and saving storage space. In practice, deduplication is often combined with data compression to further reduce the storage space occupied by the data blocks.
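The basic idea described above can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-256 as the fingerprint; the source specifies neither, and the names `CHUNK_SIZE`, `split_into_chunks`, `fingerprint` and `deduplicate` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; the source does not fix a chunking strategy


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Return the hex SHA-256 digest of a chunk (assumed fingerprint function)."""
    return hashlib.sha256(chunk).hexdigest()


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store only chunks not seen before; return the fingerprints that describe the file."""
    recipe = []
    for chunk in split_into_chunks(data):
        fp = fingerprint(chunk)
        if fp not in store:  # identical content yields an identical fingerprint
            store[fp] = chunk
        recipe.append(fp)
    return recipe


original = b"A" * 8192 + b"B" * 4096
store: dict[str, bytes] = {}
recipe = deduplicate(original, store)
print(len(recipe), len(store))  # 3 chunks referenced, 2 chunks actually stored
```

The file can be rebuilt by concatenating `store[fp]` for each fingerprint in `recipe`, which is why only the unique blocks need to be kept.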
Existing research on data deduplication focuses mainly on finding redundant information and improving efficiency, while storage-node management and space allocation rely on simple allocation policies. The load and energy consumption state of the storage nodes are not given enough consideration when storage space is allocated, so such systems cannot adapt to the various states a storage node may be in, which hinders load balancing and performance improvement for the system as a whole.
Summary of the invention
The object of the present invention is to provide a method for optimizing a data deduplication system. The method can allocate storage space dynamically according to the current load and energy consumption state of each storage node of the data deduplication system, balance the workload of the storage nodes, and improve system performance.
The present invention is implemented by the following scheme:
A method for optimizing a data deduplication system comprises the following steps: obtaining a block of the current data; performing a hash calculation on the block to obtain the hash fingerprint of the block; determining whether the hash fingerprint of the block is present in the hash fingerprint database; if it is, determining whether the reference count of the block in the hash fingerprint database is greater than a threshold and whether its copy count is less than a threshold; if the reference count is greater than the threshold and the copy count is less than the threshold, determining whether the information of the block is present in the index table; if it is, calling the node allocation process to select a light-load node; storing the block in the light-load node; updating the information of the block in the index table; and adding 1 to the reference count of the block in the hash fingerprint database.
The optimization method of the present invention further comprises the step: if the hash fingerprint of the block is not present in the hash fingerprint database, calling the node allocation process to select a light-load node, storing the block in the light-load node, and adding the information of the block to the hash fingerprint database.
The optimization method of the present invention further comprises the step: if the reference count of the block in the hash fingerprint database is not greater than the threshold and the copy count is not less than the threshold, proceeding to the step of adding 1 to the reference count of the block in the hash fingerprint database.
The optimization method of the present invention further comprises the step: if the information of the block is not present in the index table, adding the information of the block to the index table and proceeding to the step of calling the node allocation process to select a light-load node.
The step of calling the node allocation process to select a light-load node comprises: determining whether the current idle-load queue is empty; if the current idle-load queue is empty, determining whether the current light-load queue is empty; if the current light-load queue is empty, determining whether the current heavy-load queue is empty; if the current heavy-load queue is empty, determining whether the current sleep queue is empty; if the current sleep queue is not empty, waking a storage node that is in the sleep state; and sending a notice that the system has no available light-load node.
The step of calling the node allocation process to select a light-load node further comprises: if the current idle-load queue is not empty, allocating storage space on the first storage node of the current idle-load queue.
The step of calling the node allocation process to select a light-load node further comprises: if the current light-load queue is not empty, allocating storage space on the first storage node of the current light-load queue.
The step of calling the node allocation process to select a light-load node further comprises: if the current heavy-load queue is empty, proceeding to the step of sending the notice that the system has no available light-load node.
The step of calling the node allocation process to select a light-load node further comprises: if the current sleep queue is empty, proceeding to the step of sending the notice that the system has no available light-load node.
The information of a block comprises the hash fingerprint, the reference count and the address of the block.
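The per-block record named above could be sketched as a small data class. This is only an illustration: the source lists the three fields but not their types, so the `(node_id, offset)` address layout, the `BlockInfo` name, and deriving the copy count from the number of stored addresses are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BlockInfo:
    """Hypothetical record for one block: fingerprint, reference count, addresses."""
    fingerprint: str
    ref_count: int = 0
    # One (node_id, offset) pair per stored copy; this layout is an assumption.
    addresses: list[tuple[str, int]] = field(default_factory=list)

    @property
    def copy_count(self) -> int:
        """Number of stored copies, derived here from the address list (assumption)."""
        return len(self.addresses)


info = BlockInfo(fingerprint="ab12")
info.addresses.append(("node-1", 0))  # the block is stored once, on node-1
info.ref_count += 1                   # one file references it so far
print(info.ref_count, info.copy_count)
```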
The optimization method of the present invention has the following advantages:
(1) Energy awareness: the invention obtains the current load and energy consumption state of each storage node and uses this information to allocate storage space to data blocks flexibly;
(2) Load balancing: the invention assigns data blocks to relatively lightly loaded storage nodes, which spreads data storage across the nodes and balances the load of each storage node. It also introduces a replication policy that creates copies of data blocks referenced by multiple files, which improves the availability and reliability of the data deduplication system.
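One way to picture how load and power state could feed the allocation is to group nodes into the four queues the method uses. The sketch below is hypothetical throughout: the source does not say how load is measured or where the boundaries lie, so the `load` fraction, the `LIGHT_LOAD_MAX` cut-off, and the names `StorageNode` and `build_queues` are assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StorageNode:
    node_id: str
    load: float           # fraction of capacity in use, 0.0 to 1.0 (hypothetical metric)
    asleep: bool = False  # True when the node is in a low-power sleep state


LIGHT_LOAD_MAX = 0.6  # hypothetical cut-off between light and heavy load


def build_queues(nodes: list[StorageNode]) -> dict[str, deque]:
    """Group node ids into idle, light, heavy and sleep queues, least loaded first."""
    queues = {name: deque() for name in ("idle", "light", "heavy", "sleep")}
    for node in sorted(nodes, key=lambda n: n.load):
        if node.asleep:
            name = "sleep"
        elif node.load == 0.0:
            name = "idle"
        elif node.load <= LIGHT_LOAD_MAX:
            name = "light"
        else:
            name = "heavy"
        queues[name].append(node.node_id)
    return queues


queues = build_queues([
    StorageNode("n1", 0.0),
    StorageNode("n2", 0.3),
    StorageNode("n3", 0.9),
    StorageNode("n4", 0.1),
    StorageNode("n5", 0.0, asleep=True),
])
print({name: list(q) for name, q in queues.items()})
```

Sorting by load before filling the queues means the first node of each queue is the least loaded one in its class, which matches the method's habit of allocating on the first node of a queue.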
Brief description of the drawings
Fig. 1 is a flowchart of the method for optimizing a data deduplication system according to the present invention.
Fig. 2 is a detailed flowchart of step (6) of the optimization method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the method for optimizing a data deduplication system according to the present invention comprises the following steps:
(1) Obtain a block of the current data;
(2) Perform a hash calculation on the block to obtain the hash fingerprint of the block;
(3) Determine whether the hash fingerprint of the block is present in the hash fingerprint database; if it is, go to step (4); otherwise go to step (10);
(4) Determine whether the reference count of the block in the hash fingerprint database is greater than a threshold and whether its copy count is less than a threshold; if the reference count is greater than the threshold and the copy count is less than the threshold, go to step (5); otherwise go to step (9);
(5) Determine whether the information of the block is present in the index table; if it is, go to step (6); otherwise go to step (11). In this embodiment, the information of a block comprises the hash fingerprint, the reference count and the address of the block;
(6) Call the node allocation process to select a light-load node;
(7) Store the block in the selected light-load node;
(8) Update the information of the corresponding block in the index table;
(9) Add 1 to the reference count of the block;
(10) Call the node allocation process to select a light-load node, store the block in the selected light-load node, add the information of the block to the hash fingerprint database, then go to step (9);
(11) Add the information of the block to the index table, then return to step (6).
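Steps (1) to (11) can be traced in a short sketch. It is not the patented implementation: SHA-256, the dictionary layouts, the separate `REF_THRESHOLD` and `COPY_THRESHOLD` values, and the `select_node` callback standing in for step (6) are all assumptions made for illustration.

```python
import hashlib
from itertools import cycle

REF_THRESHOLD = 2   # assumed value; the source only says "threshold"
COPY_THRESHOLD = 2  # assumed value; the source only says "threshold"


def process_block(block, fingerprint_db, index_table, nodes, select_node):
    """Handle one block following steps (2) to (11); returns its fingerprint."""
    fp = hashlib.sha256(block).hexdigest()            # step (2)
    entry = fingerprint_db.get(fp)                    # step (3)
    if entry is None:                                 # step (10): first time this block is seen
        nodes[select_node()][fp] = block
        entry = fingerprint_db[fp] = {"ref_count": 0, "copy_count": 1}
    elif (entry["ref_count"] > REF_THRESHOLD
          and entry["copy_count"] < COPY_THRESHOLD):  # step (4): popular block, few copies
        if fp not in index_table:                     # steps (5) and (11)
            index_table[fp] = []
        node_id = select_node()                       # step (6)
        nodes[node_id][fp] = block                    # step (7): store an extra copy
        index_table[fp].append(node_id)               # step (8)
        entry["copy_count"] += 1
    entry["ref_count"] += 1                           # step (9)
    return fp


# Usage: two nodes picked in turn by a stand-in for the node allocation process.
nodes = {"n1": {}, "n2": {}}
picker = cycle(["n1", "n2"])
fingerprint_db, index_table = {}, {}
for _ in range(5):
    fp = process_block(b"same content", fingerprint_db, index_table,
                       nodes, lambda: next(picker))
print(fingerprint_db[fp], index_table[fp])
```

In this run the block is stored once on `n1`, referenced without further writes until its reference count exceeds the threshold, and then copied once to `n2`; later references only increment the count.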
As shown in Fig. 2, step (6) above further comprises the following sub-steps:
(61) Determine whether the current idle-load queue is empty; if it is, go to step (62); otherwise go to step (66);
(62) Determine whether the current light-load queue is empty; if it is, go to step (63); otherwise go to step (67);
(63) Determine whether the current heavy-load queue is empty; if it is, go to step (64); otherwise go to step (65);
(64) Determine whether the current sleep queue is empty; if it is, go to step (65); otherwise go to step (68);
(65) Send a notice that the system has no available light-load node;
(66) Select the first storage node of the current idle-load queue and allocate storage space on it;
(67) Select the first storage node of the current light-load queue and allocate storage space on it;
(68) Wake a storage node that is in the sleep state, then return to step (65).
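Sub-steps (61) to (68) amount to a priority check over four queues, sketched below. The queues hold node ids in `collections.deque` objects; returning `None` stands in for the "no available light-load node" notice, and moving a woken node into the idle-load queue is an assumption, since the source does not say which queue a woken node joins.

```python
from collections import deque
from typing import Optional


def select_light_load_node(idle: deque, light: deque,
                           heavy: deque, sleep: deque) -> Optional[str]:
    """Return the node to allocate on, or None when no light-load node is available."""
    if idle:                          # (61) -> (66): first node of the idle-load queue
        return idle[0]
    if light:                         # (62) -> (67): first node of the light-load queue
        return light[0]
    if not heavy and sleep:           # (63) -> (64) -> (68): wake a sleeping node
        idle.append(sleep.popleft())  # assumed: the woken node joins the idle-load queue
    return None                       # (65): notify that no light-load node is available


# A light-load node is available: it is selected directly.
first = select_light_load_node(deque(), deque(["n2"]), deque(["n3"]), deque())

# Only a sleeping node exists: the first call wakes it and reports no node,
# and a retry then finds it in the idle-load queue.
idle, sleep = deque(), deque(["n9"])
before = select_light_load_node(idle, deque(), deque(), sleep)
after = select_light_load_node(idle, deque(), deque(), sleep)
print(first, before, after)
```

Note that, as in the flowchart, a sleeping node is woken only when the heavy-load queue is empty; when heavy-load nodes exist, the function just reports that no light-load node is available.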
The present invention is not limited to the embodiment described above. Persons skilled in the art may, based on the content disclosed herein, implement the invention in various other embodiments. Therefore, any design that adopts the structure and concept of the present invention with only simple changes or modifications falls within the scope of protection of the present invention.
Claims (10)
1. A method for optimizing a data deduplication system, comprising the following steps:
Obtaining a block of the current data;
Performing a hash calculation on the block to obtain the hash fingerprint of the block;
Determining whether the hash fingerprint of the block is present in a hash fingerprint database;
If the hash fingerprint of the block is present in the hash fingerprint database, determining whether the reference count of the block in the hash fingerprint database is greater than a threshold and whether the copy count is less than a threshold;
If the reference count of the block in the hash fingerprint database is greater than the threshold and the copy count is less than the threshold, determining whether the information of the block is present in an index table;
If the information of the block is present in the index table, calling a node allocation process to select a light-load node;
Storing the block in the light-load node;
Updating the information of the block in the index table;
Adding 1 to the reference count of the block in the hash fingerprint database.
2. The optimization method according to claim 1, characterized in that it further comprises the step: if the hash fingerprint of the block is not present in the hash fingerprint database, calling the node allocation process to select a light-load node, storing the block in the light-load node, and adding the information of the block to the hash fingerprint database.
3. The optimization method according to claim 1, characterized in that it further comprises the step: if the reference count of the block in the hash fingerprint database is not greater than the threshold and the copy count is not less than the threshold, proceeding to the step of adding 1 to the reference count of the block in the hash fingerprint database.
4. The optimization method according to claim 1, characterized in that it further comprises the step: if the information of the block is not present in the index table, adding the information of the block to the index table and proceeding to the step of calling the node allocation process to select a light-load node.
5. The optimization method according to claim 1, characterized in that the step of calling the node allocation process to select a light-load node comprises:
Determining whether the current idle-load queue is empty;
If the current idle-load queue is empty, determining whether the current light-load queue is empty;
If the current light-load queue is empty, determining whether the current heavy-load queue is empty;
If the current heavy-load queue is empty, determining whether the current sleep queue is empty;
If the current sleep queue is not empty, waking a storage node that is in the sleep state;
Sending a notice that the system has no available light-load node.
6. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current idle-load queue is not empty, allocating storage space on the first storage node of the current idle-load queue.
7. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current light-load queue is not empty, allocating storage space on the first storage node of the current light-load queue.
8. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current heavy-load queue is empty, proceeding to the step of sending the notice that the system has no available light-load node.
9. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current sleep queue is empty, proceeding to the step of sending the notice that the system has no available light-load node.
10. The optimization method according to claim 1, characterized in that the information of the block comprises the hash fingerprint, the reference count and the address of the block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110335112.9A CN102385554B (en) | 2011-10-28 | 2011-10-28 | Method for optimizing duplicated data deletion system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110335112.9A CN102385554B (en) | 2011-10-28 | 2011-10-28 | Method for optimizing duplicated data deletion system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102385554A (en) | 2012-03-21 |
CN102385554B (en) | 2014-01-15 |
Family
ID=45824983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110335112.9A Active CN102385554B (en) | 2011-10-28 | 2011-10-28 | Method for optimizing duplicated data deletion system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102385554B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722452B (en) * | 2012-05-29 | 2015-02-18 | 南京大学 | Memory redundancy eliminating method |
CN102999605A (en) * | 2012-11-21 | 2013-03-27 | 重庆大学 | Method and device for optimizing data placement to reduce data fragments |
CN103873506A (en) * | 2012-12-12 | 2014-06-18 | 鸿富锦精密工业(深圳)有限公司 | Data block duplication removing system in storage cluster and method thereof |
CN103049508B (en) * | 2012-12-13 | 2017-08-11 | 华为技术有限公司 | A kind of data processing method and device |
CN103870514B (en) * | 2012-12-18 | 2018-03-09 | 华为技术有限公司 | Data de-duplication method and device |
CN104063374A (en) * | 2013-03-18 | 2014-09-24 | 阿里巴巴集团控股有限公司 | Data deduplication method and equipment |
CN104679746A (en) * | 2013-11-26 | 2015-06-03 | 南京中兴新软件有限责任公司 | Recovery method and device of removed repeated data |
CN104902010A (en) * | 2015-04-30 | 2015-09-09 | 浙江工商大学 | Cloud storage method and system for file |
CN105407142B (en) * | 2015-10-22 | 2018-06-26 | 华中科技大学 | Real time picture sharing method and system under a kind of natural calamity environment |
EP3321792B1 (en) * | 2016-09-28 | 2020-07-29 | Huawei Technologies Co., Ltd. | Method for deleting duplicated data in storage system, storage system and controller |
CN110019052A (en) * | 2017-07-26 | 2019-07-16 | 先智云端数据股份有限公司 | The method and stocking system of distributed data de-duplication |
CN107579960A (en) * | 2017-08-22 | 2018-01-12 | 深圳市盛路物联通讯技术有限公司 | A kind of data filtering method and device |
CN111881065B (en) * | 2020-07-30 | 2022-07-05 | 北京浪潮数据技术有限公司 | Physical address processing method, device, equipment and medium for data deduplication operation |
CN113836094B (en) * | 2021-11-30 | 2022-03-01 | 成都同步新创科技股份有限公司 | File life cycle management method and system for distributed video storage |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101814045A (en) * | 2010-04-22 | 2010-08-25 | 华中科技大学 | Data organization method for backup services |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9275067B2 (en) * | 2009-03-16 | 2016-03-01 | International Business Machines Corporation | Apparatus and method to sequentially deduplicate data |
Non-Patent Citations (6)
Title |
---|
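Design and Implementation of a Backup *** Based on Data Deduplication; Cai Shengxin; China Master's Theses Full-text Database; 2010-11-19; full text * |
Yang Tianming. Research on Data Deduplication Technology in Network Backup. China Doctoral Dissertations Full-text Database. 2011,
Wang Shupeng. Development and Application of Data Deduplication Technology. ZTE Technology Journal. 2010, Vol. 16 (No. 5),
Research on Data Deduplication Technology in Network Backup; Yang Tianming; China Doctoral Dissertations Full-text Database; 2011-05-18; full text * |
Cai Shengxin. Design and Implementation of a Backup *** Based on Data Deduplication. China Master's Theses Full-text Database. 2010,
Development and Application of Data Deduplication Technology; Wang Shupeng; ZTE Technology Journal; 2010-10-31; Vol. 16 (No. 5); full text * |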
Also Published As
Publication number | Publication date |
---|---|
CN102385554A (en) | 2012-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102385554B (en) | Method for optimizing duplicated data deletion system | |
CN104023088B (en) | Storage server selection method applied to distributed file system | |
AU2012389110B2 (en) | Data processing method and apparatus in cluster system | |
CN101866359B (en) | Small file storage and visit method in avicade file system | |
CN103412884B (en) | The management method of embedded database under a kind of isomery storage medium | |
CN109154917A (en) | Storage system and solid state hard disk | |
CN103279502B (en) | A kind of framework and method with the data de-duplication file system be combined with parallel file system | |
CN103077197A (en) | Data storing method and device | |
CN101986649B (en) | Shared data center used in telecommunication industry billing system | |
CN103970852A (en) | Data de-duplication method of backup server | |
CN105868004B (en) | Scheduling method and scheduling device of service system based on cloud computing | |
CN102411639A (en) | Multi-copy storage management method and system of metadata | |
CN102521419A (en) | Hierarchical storage realization method and system | |
CN103488685A (en) | Fragmented-file storage method based on distributed storage system | |
CN112380005A (en) | Data center energy consumption management method and system | |
CN104035925A (en) | Data storage method and device and storage system | |
Ranjana et al. | A survey on power aware virtual machine placement strategies in a cloud data center | |
CN101827120A (en) | Cluster storage method and system | |
CN115617762A (en) | File storage method and equipment | |
CN108363719B (en) | Configurable transparent compression method in distributed file system | |
CN101938516A (en) | User-oriented dynamic storage resource distribution method | |
CN106980618B (en) | File storage method and system based on MongoDB distributed cluster architecture | |
CN103020077A (en) | Method for managing memory of real-time database of power system | |
CN201804331U (en) | Date deduplication system based on co-processor | |
CN104516821B (en) | Storage management method and memory management unit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |