CN102385554B - Method for optimizing duplicated data deletion system - Google Patents



Publication number
CN102385554B
Authority
CN
China
Prior art keywords
chunk
node
light load
queue
hash fingerprint
Legal status
Active
Application number
CN201110335112.9A
Other languages
Chinese (zh)
Other versions
CN102385554A (en)
Inventor
黄建忠
曹强
万胜刚
谢平
韩帅军
谢长生
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
2011-10-28
Filing date
2011-10-28
Publication date
2014-01-15
Application filed by Huazhong University of Science and Technology
Priority to CN201110335112.9A
Publication of CN102385554A (2012-03-21)
Application granted
Publication of CN102385554B (2014-01-15)
Legal status: Active



Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for optimizing a data deduplication system. The method comprises the following steps: obtaining a chunk of the current data; computing a hash over the chunk to obtain its hash fingerprint; determining whether the chunk's hash fingerprint is in a hash fingerprint database; if it is, determining whether the chunk's reference count in the hash fingerprint database is greater than a threshold and its number of copies is less than the threshold; if both conditions hold, determining whether the chunk's information exists in an index table; if it does, calling a node allocation process to select a lightly loaded node; storing the chunk on that node; updating the chunk's information in the index table; and incrementing the chunk's reference count in the hash fingerprint database by 1. The method dynamically allocates storage space according to the current load of each storage node and the energy consumption state of the deduplication system, balances the workload across storage nodes, and improves system performance.

Description

Optimization method for a data deduplication system
Technical field
The present invention relates to the field of data storage, and specifically to an optimization method for a data deduplication system.
Background
A data deduplication system addresses data by content, which lets it eliminate duplicate data and improve space utilization. It divides a file into small data chunks according to a given chunking strategy, detects identical chunks based on some feature of each chunk, and stores only the chunks that are not duplicates, thereby removing redundant data and saving storage space. In practice, deduplication is often combined with data compression to further reduce the space the chunks occupy.
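As a minimal illustration of content-based deduplication in general (not the patented method), the Python sketch below assumes fixed-size chunking and SHA-256 fingerprints; the patent names neither a chunking strategy nor a hash function, and `deduplicate` and `restore` are illustrative names. Only the first occurrence of each fingerprint is stored, and the ordered list of fingerprints (the "recipe") is enough to rebuild the original data.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Returns (store, recipe): store maps fingerprint -> chunk bytes,
    recipe lists the fingerprints needed to rebuild the input in order.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()  # content-based address of the chunk
        if fp not in store:                     # only unique chunks consume space
            store[fp] = chunk
        recipe.append(fp)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original data from the stored chunks."""
    return b"".join(store[fp] for fp in recipe)


data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = deduplicate(data)
print(len(recipe), len(store))  # prints: 4 2
assert restore(store, recipe) == data
```

Four chunks are referenced but only two are stored, because three of them share the same content.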
Existing research on data deduplication focuses mainly on mining redundant information and improving efficiency, and adopts only simple allocation strategies for managing storage nodes and allocating space. The load and power-consumption state of storage nodes are not adequately considered during storage allocation, so such systems cannot adapt to the various states a storage node may be in, which also hinders load balancing and performance improvement across the whole system.
Summary of the invention
The object of the present invention is to provide an optimization method for a data deduplication system that dynamically allocates storage space according to the current load and power-consumption state of each storage node in the system, balances the workload of the storage nodes, and improves system performance.
The present invention is achieved by the following scheme:
An optimization method for a data deduplication system comprises the following steps: obtaining a chunk of the current data; computing a hash over the chunk to obtain its hash fingerprint; determining whether the chunk's hash fingerprint exists in the hash fingerprint database; if it does, determining whether the chunk's reference count in the hash fingerprint database is greater than a threshold and whether its number of copies is less than the threshold; if the reference count is greater than the threshold and the number of copies is less than the threshold, determining whether the chunk's information exists in the index table; if it does, calling the node allocation process to select a lightly loaded node; storing the chunk on the lightly loaded node; updating the chunk's information in the index table; and incrementing the chunk's reference count in the hash fingerprint database by 1.
The optimization method of the present invention further comprises the step of: if the chunk's hash fingerprint does not exist in the hash fingerprint database, calling the node allocation process to select a lightly loaded node, storing the chunk on that node, and adding the chunk's information to the hash fingerprint database.
The optimization method of the present invention further comprises the step of: if the chunk's reference count in the hash fingerprint database is not greater than the threshold or its number of copies is not less than the threshold, proceeding directly to the step of incrementing the chunk's reference count in the hash fingerprint database by 1.
The optimization method of the present invention further comprises the step of: if the chunk's information does not exist in the index table, adding the chunk's information to the index table and then proceeding to the step of calling the node allocation process to select a lightly loaded node.
The step of calling the node allocation process to select a lightly loaded node comprises: determining whether the current idle queue is empty; if it is, determining whether the current lightly loaded queue is empty; if it is, determining whether the current heavily loaded queue is empty; if it is, determining whether the current sleep queue is empty; if the sleep queue is not empty, waking up a storage node in the dormant state; and issuing a notification that the system has no available lightly loaded node.
The step of calling the node allocation process to select a lightly loaded node further comprises: if the current idle queue is not empty, allocating storage space on the first storage node in the current idle queue.
The step of calling the node allocation process to select a lightly loaded node further comprises: if the current lightly loaded queue is not empty, allocating storage space on the first storage node in the current lightly loaded queue.
The step of calling the node allocation process to select a lightly loaded node further comprises: if the current heavily loaded queue is empty, proceeding to the step of issuing the notification that the system has no available lightly loaded node.
The step of calling the node allocation process to select a lightly loaded node further comprises: if the current sleep queue is empty, proceeding to the step of issuing the notification that the system has no available lightly loaded node.
The information of a chunk comprises its hash fingerprint, reference count, and address.
The optimization method of the present invention has the following advantages:
(1) Energy-consumption awareness: by obtaining the current load and power-consumption state of each storage node, the invention allocates storage space to data chunks flexibly;
(2) Load balancing: by designating relatively lightly loaded storage nodes to store data chunks, the invention decentralizes the data storage process and balances the load across storage nodes. It also introduces a replication policy that creates copies of chunks referenced by multiple files, which improves the availability and reliability of the deduplication system.
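The advantages above depend on the system knowing each node's load and power state and sorting nodes into the idle, lightly loaded, heavily loaded, and sleep queues used in Fig. 2. The patent does not specify how that classification is done. The following Python sketch assumes each node reports a utilization value between 0 and 1 plus a sleep flag, and uses hypothetical cut-offs `IDLE_MAX` and `LIGHT_MAX`; none of these names or values come from the source.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    node_id: str
    load: float    # hypothetical utilization metric in [0.0, 1.0]
    asleep: bool   # power state reported by the node


# Hypothetical cut-offs; the patent gives no numeric values.
IDLE_MAX = 0.10
LIGHT_MAX = 0.60


def classify(nodes: list[NodeStatus]) -> dict[str, list[str]]:
    """Place each node id into one of the four queues, least-loaded first."""
    queues: dict[str, list[str]] = {"idle": [], "light": [], "heavy": [], "sleep": []}
    # Sorting by load puts the least-loaded node at the head of each queue,
    # so "the first node of a queue" is also its least-loaded member.
    for n in sorted(nodes, key=lambda s: s.load):
        if n.asleep:
            queues["sleep"].append(n.node_id)
        elif n.load <= IDLE_MAX:
            queues["idle"].append(n.node_id)
        elif n.load <= LIGHT_MAX:
            queues["light"].append(n.node_id)
        else:
            queues["heavy"].append(n.node_id)
    return queues


q = classify([
    NodeStatus("n1", 0.75, False),
    NodeStatus("n2", 0.05, False),
    NodeStatus("n3", 0.30, False),
    NodeStatus("n4", 0.00, True),
    NodeStatus("n5", 0.20, False),
])
print(q)  # {'idle': ['n2'], 'light': ['n5', 'n3'], 'heavy': ['n1'], 'sleep': ['n4']}
```

Ordering each queue by load is a design choice of this sketch: it makes "allocate on the first node" pick the least-loaded candidate, which matches the load-balancing goal.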
Brief description of the drawings
Fig. 1 is a flowchart of the optimization method for a data deduplication system according to the present invention.
Fig. 2 is a detailed flowchart of step (6) of the optimization method of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the optimization method for a data deduplication system of the present invention comprises the following steps:
(1) Obtain a chunk of the current data;
(2) Compute a hash over the chunk to obtain its hash fingerprint;
(3) Determine whether the chunk's hash fingerprint exists in the hash fingerprint database; if it does, go to step (4), otherwise go to step (10);
(4) Determine whether the chunk's reference count in the hash fingerprint database is greater than the threshold and whether its number of copies is less than the threshold; if both hold, go to step (5), otherwise go to step (9);
(5) Determine whether the chunk's information exists in the index table; if it does, go to step (6), otherwise go to step (11). In this embodiment, the information of a chunk comprises its hash fingerprint, reference count, and address;
(6) Call the node allocation process to select a lightly loaded node;
(7) Store the chunk on the selected lightly loaded node;
(8) Update the information of the corresponding chunk in the index table;
(9) Increment the chunk's reference count by 1;
(10) Call the node allocation process to select a lightly loaded node, store the chunk on the selected node, add the chunk's information to the hash fingerprint database, and then go to step (9);
(11) Add the chunk's information to the index table, then return to step (6).
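The flow of Fig. 1 can be sketched in Python as a single write path. This is a minimal, non-authoritative illustration under several assumptions that the patent does not state: SHA-256 is the fingerprint function; the hash fingerprint database and index table are in-memory dicts; the index table is reduced to a map from fingerprint to the node ids of extra copies, so "number of copies" is the length of that list; node allocation is passed in as a callable that returns a node id or `None`; and a new chunk that cannot be placed raises an error while a copy that cannot be placed is simply skipped. The source compares both counts against "a threshold"; the sketch takes separate hypothetical `ref_threshold` and `copy_threshold` parameters, which can be set to the same value. `DedupWriter`, `ChunkInfo` and all other names are illustrative.

```python
import hashlib
import itertools
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ChunkInfo:
    # Fields named in the description: fingerprint, reference count, address.
    fingerprint: str
    ref_count: int = 0
    addresses: list[str] = field(default_factory=list)  # node ids holding the chunk


class DedupWriter:
    def __init__(self, allocate: Callable[[], Optional[str]],
                 ref_threshold: int, copy_threshold: int):
        self.allocate = allocate                  # stands in for the Fig. 2 process
        self.ref_threshold = ref_threshold
        self.copy_threshold = copy_threshold
        self.fingerprint_db: dict[str, ChunkInfo] = {}
        # Assumption: the index table maps a fingerprint to its copy addresses.
        self.index_table: dict[str, list[str]] = {}
        self.nodes: dict[str, dict[str, bytes]] = {}  # simulated storage nodes

    def _store(self, fp: str, chunk: bytes) -> Optional[str]:
        node = self.allocate()                    # steps (6)/(10): pick a lightly loaded node
        if node is not None:
            self.nodes.setdefault(node, {})[fp] = chunk  # step (7)
        return node

    def write_chunk(self, chunk: bytes) -> None:  # step (1): chunk supplied by the caller
        fp = hashlib.sha256(chunk).hexdigest()    # step (2)
        info = self.fingerprint_db.get(fp)        # step (3)
        if info is None:                          # step (10): first occurrence
            node = self._store(fp, chunk)
            if node is None:
                raise RuntimeError("no lightly loaded node available")
            info = ChunkInfo(fp, addresses=[node])
            self.fingerprint_db[fp] = info
        else:
            copies = len(self.index_table.get(fp, []))
            if info.ref_count > self.ref_threshold and copies < self.copy_threshold:  # step (4)
                replicas = self.index_table.setdefault(fp, [])  # steps (5)/(11)
                node = self._store(fp, chunk)                   # steps (6)-(7)
                if node is not None:
                    replicas.append(node)                       # step (8)
        info.ref_count += 1                                     # step (9)


# Usage: a round-robin allocator stands in for the node allocation process.
ring = itertools.cycle(["n1", "n2", "n3"])
w = DedupWriter(allocate=lambda: next(ring), ref_threshold=2, copy_threshold=1)
for _ in range(5):
    w.write_chunk(b"hello")
fp = hashlib.sha256(b"hello").hexdigest()
print(w.fingerprint_db[fp].ref_count, w.index_table[fp])  # 5 ['n2']
```

With `ref_threshold=2` and `copy_threshold=1`, the fourth write of the same chunk is the first whose reference count (3) exceeds the threshold, so it places one extra copy on `n2`; the fifth write only increments the count because the copy limit has been reached.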
As shown in Fig. 2, step (6) above further comprises the following sub-steps:
(61) Determine whether the current idle queue is empty; if it is, go to step (62), otherwise go to step (66);
(62) Determine whether the current lightly loaded queue is empty; if it is, go to step (63), otherwise go to step (67);
(63) Determine whether the current heavily loaded queue is empty; if it is, go to step (64), otherwise go to step (65);
(64) Determine whether the current sleep queue is empty; if it is, go to step (65), otherwise go to step (68);
(65) Issue a notification that the system has no available lightly loaded node;
(66) Allocate storage space on the first storage node in the current idle queue;
(67) Allocate storage space on the first storage node in the current lightly loaded queue;
(68) Wake up a storage node in the dormant state, and return to step (65).
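Steps (61) to (68) translate into a short selection routine. The Python sketch below follows the branches as written, including the detail that waking a dormant node in step (68) still ends the current call with the notification of step (65). Assumptions not in the source: queues hold node ids, "allocating space on the first node" is modeled as returning the head of the queue without removing it, the notification is recorded in a list, and a woken node joins the idle queue so that a later call can select it. `NodeQueues`, `select_node` and `wake` are hypothetical names.

```python
from collections import deque
from typing import Iterable, Optional

NO_NODE_NOTICE = "system has no available lightly loaded node"


class NodeQueues:
    """Hypothetical holder for the idle, lightly loaded, heavily loaded and sleep queues."""

    def __init__(self, idle: Iterable[str] = (), light: Iterable[str] = (),
                 heavy: Iterable[str] = (), sleep: Iterable[str] = ()):
        self.idle = deque(idle)
        self.light = deque(light)
        self.heavy = deque(heavy)
        self.sleep = deque(sleep)
        self.notices: list[str] = []  # stands in for "issue a notification"

    def wake(self, node: str) -> None:
        # Assumption: a woken node becomes idle and can serve later requests.
        self.idle.append(node)

    def select_node(self) -> Optional[str]:
        if self.idle:                        # (61) not empty -> (66)
            return self.idle[0]              # (66) first node of the idle queue
        if self.light:                       # (62) not empty -> (67)
            return self.light[0]             # (67) first node of the lightly loaded queue
        if not self.heavy and self.sleep:    # (63) empty -> (64) not empty -> (68)
            self.wake(self.sleep.popleft())  # (68) wake a dormant node
        self.notices.append(NO_NODE_NOTICE)  # (65) reached on every remaining path
        return None


busy = NodeQueues(heavy=["n1"], sleep=["n4"])
print(busy.select_node(), list(busy.sleep))  # None ['n4']

cold = NodeQueues(sleep=["n4"])
print(cold.select_node())  # None: n4 is woken, but this call still reports no node
print(cold.select_node())  # n4
```

Returning `None` rather than raising leaves the caller in control of what to do when no node is available, for example retrying once the woken node has come online.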
The present invention is not limited to the above embodiment. Based on the content disclosed herein, persons skilled in the art may implement the invention in various other embodiments; therefore, any simple change or modification of design that adopts the structure and concept of the present invention falls within the scope of protection of the invention.

Claims (10)

1. An optimization method for a data deduplication system, comprising the following steps:
Obtaining a chunk of current data;
Computing a hash over said chunk to obtain the hash fingerprint of said chunk;
Determining whether the hash fingerprint of said chunk exists in a hash fingerprint database;
If the hash fingerprint of said chunk exists in said hash fingerprint database, determining whether the reference count of said chunk in said hash fingerprint database is greater than a threshold and whether its number of copies is less than the threshold;
If the reference count of said chunk in said hash fingerprint database is greater than said threshold and its number of copies is less than said threshold, determining whether the information of said chunk exists in an index table;
If the information of said chunk exists in said index table, calling a node allocation process to select a lightly loaded node;
Storing said chunk on said lightly loaded node;
Updating the information of said chunk in said index table;
Incrementing the reference count of said chunk in said hash fingerprint database by 1.
2. The optimization method according to claim 1, characterized in that it further comprises the step of: if the hash fingerprint of said chunk does not exist in said hash fingerprint database, calling the node allocation process to select a lightly loaded node, storing said chunk on said lightly loaded node, and adding the information of said chunk to said hash fingerprint database.
3. The optimization method according to claim 1, characterized in that it further comprises the step of: if the reference count of said chunk in said hash fingerprint database is not greater than the threshold or its number of copies is not less than said threshold, proceeding to said step of incrementing the reference count of said chunk in said hash fingerprint database by 1.
4. The optimization method according to claim 1, characterized in that it further comprises the step of: if the information of said chunk does not exist in said index table, adding the information of said chunk to said index table and proceeding to said step of calling the node allocation process to select a lightly loaded node.
5. The optimization method according to claim 1, characterized in that said step of calling the node allocation process to select a lightly loaded node comprises:
Determining whether the current idle queue is empty;
If said current idle queue is empty, determining whether the current lightly loaded queue is empty;
If said current lightly loaded queue is empty, determining whether the current heavily loaded queue is empty;
If said current heavily loaded queue is empty, determining whether the current sleep queue is empty;
If said current sleep queue is not empty, waking up a storage node in the dormant state;
Issuing a notification that the system has no available lightly loaded node.
6. The optimization method according to claim 5, characterized in that said step of calling the node allocation process to select a lightly loaded node further comprises: if said current idle queue is not empty, allocating storage space on the first storage node of said current idle queue.
7. The optimization method according to claim 5, characterized in that said step of calling the node allocation process to select a lightly loaded node further comprises: if said current lightly loaded queue is not empty, allocating storage space on the first storage node in said current lightly loaded queue.
8. The optimization method according to claim 5, characterized in that said step of calling the node allocation process to select a lightly loaded node further comprises: if said current heavily loaded queue is empty, proceeding to said step of issuing the notification that the system has no available lightly loaded node.
9. The optimization method according to claim 5, characterized in that said step of calling the node allocation process to select a lightly loaded node further comprises: if said current sleep queue is empty, proceeding to said step of issuing the notification that the system has no available lightly loaded node.
10. The optimization method according to claim 1, characterized in that the information of said chunk comprises the hash fingerprint, reference count, and address of said chunk.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110335112.9A CN102385554B (en) 2011-10-28 2011-10-28 Method for optimizing duplicated data deletion system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110335112.9A CN102385554B (en) 2011-10-28 2011-10-28 Method for optimizing duplicated data deletion system

Publications (2)

Publication Number Publication Date
CN102385554A (en) 2012-03-21
CN102385554B (en) 2014-01-15

Family

ID=45824983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110335112.9A Active CN102385554B (en) 2011-10-28 2011-10-28 Method for optimizing duplicated data deletion system

Country Status (1)

Country Link
CN (1) CN102385554B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722452B (en) * 2012-05-29 2015-02-18 南京大学 Memory redundancy eliminating method
CN102999605A (en) * 2012-11-21 2013-03-27 重庆大学 Method and device for optimizing data placement to reduce data fragments
CN103873506A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 Data block duplication removing system in storage cluster and method thereof
CN103049508B (en) * 2012-12-13 2017-08-11 华为技术有限公司 Data processing method and device
CN103870514B (en) * 2012-12-18 2018-03-09 华为技术有限公司 Data de-duplication method and device
CN104063374A (en) * 2013-03-18 2014-09-24 阿里巴巴集团控股有限公司 Data deduplication method and equipment
CN104679746A (en) * 2013-11-26 2015-06-03 南京中兴新软件有限责任公司 Recovery method and device of removed repeated data
CN104902010A (en) * 2015-04-30 2015-09-09 浙江工商大学 Cloud storage method and system for file
CN105407142B (en) * 2015-10-22 2018-06-26 华中科技大学 Real-time picture sharing method and system in a natural disaster environment
EP3321792B1 (en) * 2016-09-28 2020-07-29 Huawei Technologies Co., Ltd. Method for deleting duplicated data in storage system, storage system and controller
CN110019052A (en) * 2017-07-26 2019-07-16 先智云端数据股份有限公司 The method and stocking system of distributed data de-duplication
CN107579960A (en) * 2017-08-22 2018-01-12 深圳市盛路物联通讯技术有限公司 Data filtering method and device
CN111881065B (en) * 2020-07-30 2022-07-05 北京浪潮数据技术有限公司 Physical address processing method, device, equipment and medium for data deduplication operation
CN113836094B (en) * 2021-11-30 2022-03-01 成都同步新创科技股份有限公司 File life cycle management method and system for distributed video storage

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9275067B2 (en) * 2009-03-16 2016-03-01 International Business Machines Corporation Apparatus and method to sequentially deduplicate data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
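Cai Shengxin (蔡盛鑫), "Design and implementation of a backup system based on data deduplication" (一种基于重复数据删除的备份系统设计与实现), China Master's Theses Full-text Database, 2010-11-19, full text *
Yang Tianming (杨天明), "Research on data deduplication technology in network backup" (网络备份中重复数据删除技术研究), China Doctoral Dissertations Full-text Database, 2011-05-18, full text *
Wang Shupeng (王树鹏), "Development and application of data deduplication technology" (重复数据删除技术的发展及应用), ZTE Technology Journal (中兴通讯技术), Vol. 16, No. 5, 2010-10-31, full text *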

Also Published As

Publication number Publication date
CN102385554A (en) 2012-03-21

Similar Documents

Publication Publication Date Title
CN102385554B (en) Method for optimizing duplicated data deletion system
CN104023088B (en) Storage server selection method applied to distributed file system
AU2012389110B2 (en) Data processing method and apparatus in cluster system
CN101866359B (en) Small file storage and visit method in avicade file system
CN103412884B (en) Management method for an embedded database on heterogeneous storage media
CN109154917A (en) Storage system and solid state hard disk
CN103279502B (en) Architecture and method for a data deduplication file system combined with a parallel file system
CN103077197A (en) Data storing method and device
CN101986649B (en) Shared data center used in telecommunication industry billing system
CN103970852A (en) Data de-duplication method of backup server
CN105868004B (en) Scheduling method and scheduling device of service system based on cloud computing
CN102411639A (en) Multi-copy storage management method and system of metadata
CN102521419A (en) Hierarchical storage realization method and system
CN103488685A (en) Fragmented-file storage method based on distributed storage system
CN112380005A (en) Data center energy consumption management method and system
CN104035925A (en) Data storage method and device and storage system
Ranjana et al. A survey on power aware virtual machine placement strategies in a cloud data center
CN101827120A (en) Cluster storage method and system
CN115617762A (en) File storage method and equipment
CN108363719B (en) Configurable transparent compression method in distributed file system
CN101938516A (en) User-oriented dynamic storage resource distribution method
CN106980618B (en) File storage method and system based on MongoDB distributed cluster architecture
CN103020077A (en) Method for managing memory of real-time database of power system
CN201804331U (en) Data deduplication system based on co-processor
CN104516821B (en) Storage management method and memory management unit

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant