CN102385554B

CN102385554B - Method for optimizing duplicated data deletion system

Info

Publication number: CN102385554B
Application number: CN201110335112.9A
Authority: CN
Inventors: 黄建忠; 曹强; 万胜刚; 谢平; 韩帅军; 谢长生
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2011-10-28
Filing date: 2011-10-28
Publication date: 2014-01-15
Anticipated expiration: 2031-10-28
Also published as: CN102385554A

Abstract

The invention relates to a method for optimizing a data deduplication system. The method comprises the following steps: acquiring a block of the current data; performing a hash calculation on the block to obtain its hash fingerprint; determining whether the hash fingerprint of the block is in a hash fingerprint database; if the hash fingerprint of the block is in the hash fingerprint database, determining whether the reference count of the block in the hash fingerprint database is larger than a threshold value and its copy count is smaller than the threshold value; if the reference count of the block in the hash fingerprint database is larger than the threshold value and the copy count is smaller than the threshold value, determining whether information about the block exists in an index table; if the information about the block exists in the index table, calling a node allocation process to select a light-load node; storing the block in the light-load node; updating the information about the block in the index table; and incrementing by 1 the reference count of the block in the hash fingerprint database. With this method, storage space can be allocated dynamically according to the current load and power consumption state of each storage node in the data deduplication system, the workload of the storage nodes is balanced, and system performance is improved.

Description

Method for optimizing a data deduplication system

Technical field

The present invention relates to the field of data storage, and in particular to a method for optimizing a data deduplication system.

Background art

A data deduplication system uses content-based addressing to eliminate duplicate data and improve space utilization. It divides a file into a number of small data blocks according to a given data partitioning strategy, performs duplicate detection based on some feature of each data block, and stores only the blocks that are not duplicates, thereby removing redundant data and saving storage space. In practice, deduplication is often combined with data compression to further reduce the storage space occupied by the data blocks.

Existing research on data deduplication concentrates on mining redundant information and improving efficiency, while the management of storage nodes and the allocation of space rely on simple allocation strategies. The load and power consumption state of the storage nodes are not adequately considered when storage space is allocated, so the system cannot adapt to the various states of the storage nodes, which hinders load balancing and performance improvement across the whole system.
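The basic deduplication idea described above can be illustrated with a minimal sketch. The fixed-size partitioning, the tiny block size, the choice of SHA-256, and the names `split_into_blocks`, `fingerprint`, `deduplicate`, and `restore` are all assumptions made for illustration; the text does not prescribe a partitioning strategy or a hash function.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to trace (assumption)


def split_into_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Fixed-size partitioning, used here only as a stand-in partitioning strategy."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Content-derived identifier of a block; SHA-256 is an assumed choice."""
    return hashlib.sha256(block).hexdigest()


def deduplicate(data: bytes):
    """Return (store of unique blocks, ordered list of fingerprints for the file)."""
    store = {}
    recipe = []
    for block in split_into_blocks(data):
        fp = fingerprint(block)
        store.setdefault(fp, block)  # keep only blocks not seen before
        recipe.append(fp)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the fingerprint list."""
    return b"".join(store[fp] for fp in recipe)
```

For the input `b"abcdabcdxyz!abcd"` this sketch splits the data into four blocks but keeps only two of them, since `abcd` occurs three times.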

Summary of the invention

The object of the present invention is to provide a method for optimizing a data deduplication system. The method dynamically allocates storage space according to the current load and power consumption state of each storage node in the data deduplication system, balances the workload of the storage nodes, and improves system performance.

The present invention is achieved by the following technical solution:

A method for optimizing a data deduplication system comprises the following steps: acquiring a block of the current data; performing a hash calculation on the block to obtain the hash fingerprint of the block; determining whether the hash fingerprint of the block is present in a hash fingerprint database; if the hash fingerprint of the block is present in the hash fingerprint database, determining whether the reference count of the block in the hash fingerprint database is greater than a threshold value and whether its copy count is less than the threshold value; if the reference count of the block in the hash fingerprint database is greater than the threshold value and the copy count is less than the threshold value, determining whether the information about the block is present in an index table; if the information about the block is present in the index table, calling a node allocation process to select a light-load node; storing the block in the light-load node; updating the information about the block in the index table; and incrementing by 1 the reference count of the block in the hash fingerprint database.

The optimization method of the present invention further comprises the step of: if the hash fingerprint of the block is not present in the hash fingerprint database, calling the node allocation process to select a light-load node, storing the block in the light-load node, and adding the information about the block to the hash fingerprint database.

The optimization method of the present invention further comprises the step of: if the reference count of the block in the hash fingerprint database is not greater than the threshold value and the copy count is not less than the threshold value, proceeding to the step of incrementing by 1 the reference count of the block in the hash fingerprint database.

The optimization method of the present invention further comprises the step of: if the information about the block is not present in the index table, adding the information about the block to the index table, and proceeding to the step of calling the node allocation process to select a light-load node.

The step of calling the node allocation process to select a light-load node comprises: determining whether the current idle queue is empty; if the current idle queue is empty, determining whether the current light-load queue is empty; if the current light-load queue is empty, determining whether the current heavy-load queue is empty; if the current heavy-load queue is empty, determining whether the current sleep queue is empty; if the current sleep queue is not empty, waking up the storage node in the sleep state; and sending a notification that the system has no available light-load node.

The step of calling the node allocation process to select a light-load node further comprises: if the current idle queue is not empty, allocating storage space on the first storage node of the current idle queue.

The step of calling the node allocation process to select a light-load node further comprises: if the current light-load queue is not empty, allocating storage space on the first storage node of the current light-load queue.

The step of calling the node allocation process to select a light-load node further comprises: if the current heavy-load queue is empty, proceeding to the step of sending the notification that the system has no available light-load node.

The step of calling the node allocation process to select a light-load node further comprises: if the current sleep queue is empty, proceeding to the step of sending the notification that the system has no available light-load node.

The information about the block comprises the hash fingerprint, the reference count, and the address of the block.

The optimization method of the present invention has the following advantages:

(1) Energy awareness: the present invention obtains the current load and power consumption state of each storage node and uses this information to allocate storage space to data blocks flexibly.

(2) Load balancing: the present invention designates relatively lightly loaded storage nodes for storing data blocks, which spreads out the data storage process and balances the load of the storage nodes. It also introduces a replication policy that creates copies of data blocks referenced by multiple files, improving the availability and reliability of the data deduplication system.

Brief description of the drawings

Fig. 1 is a flowchart of the method for optimizing a data deduplication system according to the present invention.

Fig. 2 is a detailed flowchart of step (6) of the optimization method of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings.

As shown in Fig. 1, the method for optimizing a data deduplication system according to the present invention comprises the following steps:

(1) acquire a block of the current data;

(2) perform a hash calculation on the block to obtain the hash fingerprint of the block;

(3) determine whether the hash fingerprint of the block is present in the hash fingerprint database; if it is, go to step (4); otherwise go to step (10);

(4) determine whether the reference count of the block in the hash fingerprint database is greater than the threshold value and whether its copy count is less than the threshold value; if the reference count is greater than the threshold value and the copy count is less than the threshold value, go to step (5); otherwise go to step (9);

(5) determine whether the information about the block is present in the index table; if it is, go to step (6); otherwise go to step (11); in the present embodiment, the information about the block comprises the hash fingerprint, the reference count, and the address of the block;

(6) call the node allocation process to select a light-load node;

(7) store the block in the selected light-load node;

(8) update the information about the block in the index table;

(9) increment the reference count of the block by 1;

(10) call the node allocation process to select a light-load node, store the block in the selected light-load node, add the information about the block to the hash fingerprint database, and then go to step (9);

(11) add the information about the block to the index table, and then return to step (6).
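Steps (1) to (11) can be sketched roughly as follows. This is only an illustrative reading of the flow, not the patented implementation: the class `DedupStore`, the record `BlockInfo`, the `select_node` callback (standing in for the node allocation process), the use of SHA-256, the two separate threshold parameters, and the way the index table is modeled as a map from fingerprint to copy addresses are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlockInfo:
    """Per-block record: reference count and one address per stored copy (assumed layout)."""
    ref_count: int = 0
    addresses: list = field(default_factory=list)


class DedupStore:
    def __init__(self, select_node, ref_threshold=2, copy_threshold=2):
        self.select_node = select_node      # hypothetical stand-in for the node allocation process
        self.ref_threshold = ref_threshold  # assumed threshold for the reference count
        self.copy_threshold = copy_threshold  # assumed threshold for the copy count
        self.fingerprints = {}  # hash fingerprint database: fingerprint -> BlockInfo
        self.index = {}         # index table: fingerprint -> addresses of copies
        self.nodes = {}         # simulated storage nodes: node name -> {fingerprint: block}

    def _store(self, fp, block):
        node = self.select_node()                    # step (6) or (10): pick a light-load node
        self.nodes.setdefault(node, {})[fp] = block  # step (7): store the block there
        return node

    def put(self, block: bytes) -> str:
        fp = hashlib.sha256(block).hexdigest()       # step (2)
        info = self.fingerprints.get(fp)             # step (3)
        if info is None:
            # step (10): new block, store it and register it in the fingerprint database
            node = self._store(fp, block)
            info = self.fingerprints[fp] = BlockInfo(addresses=[node])
        elif (info.ref_count > self.ref_threshold
              and len(info.addresses) < self.copy_threshold):  # step (4)
            # steps (5) and (11): make sure the block has an index table entry
            self.index.setdefault(fp, list(info.addresses))
            node = self._store(fp, block)            # steps (6) and (7): create another copy
            info.addresses.append(node)
            self.index[fp] = list(info.addresses)    # step (8): update the index table
        info.ref_count += 1                          # step (9)
        return fp
```

In this sketch, a block that is written repeatedly is stored once at first; once its reference count exceeds `ref_threshold` and it has fewer than `copy_threshold` copies, the next write places an extra copy on whichever node `select_node` returns. The sketch does not handle the case in which no light-load node is available.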

As shown in Fig. 2, step (6) above further comprises the following sub-steps:

(61) determine whether the current idle queue is empty; if it is empty, go to step (62); otherwise go to step (66);

(62) determine whether the current light-load queue is empty; if it is empty, go to step (63); otherwise go to step (67);

(63) determine whether the current heavy-load queue is empty; if it is empty, go to step (64); otherwise go to step (65);

(64) determine whether the current sleep queue is empty; if it is empty, go to step (65); otherwise go to step (68);

(65) send a notification that the system has no available light-load node;

(66) select the first storage node of the current idle queue and allocate storage space on it;

(67) select the first storage node of the current light-load queue and allocate storage space on it;

(68) wake up the storage node in the sleep state, and return to step (65).
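Sub-steps (61) to (68) amount to a priority check over four queues, which might be sketched as below. The function name `select_light_load_node`, the exception `NoLightLoadNode` (standing in for the notification of step (65)), the `wake` callback, and the choice to wake only the first sleeping node are assumptions; the text does not say how many nodes are woken or which queue a woken node joins.

```python
from collections import deque


class NoLightLoadNode(Exception):
    """Stands in for the 'no available light-load node' notification of step (65)."""


def select_light_load_node(idle, light, heavy, sleeping, wake):
    """Pick a storage node following sub-steps (61) to (68).

    idle, light, heavy, sleeping: deques of node identifiers.
    wake: hypothetical callback invoked with the node that is woken up.
    """
    if idle:                        # (61) idle queue not empty -> (66)
        return idle[0]
    if light:                       # (62) light-load queue not empty -> (67)
        return light[0]
    if not heavy and sleeping:      # (63) heavy-load queue empty, (64) sleep queue not empty
        wake(sleeping.popleft())    # (68) wake a sleeping node (first one, by assumption)
    # (65) reached from (63), (64) and (68): report that no light-load node is available
    raise NoLightLoadNode("no available light-load node")
```

Because step (68) returns to step (65), the call that wakes a node still reports that no light-load node is available. In this sketch a caller could pass `idle.append` as `wake` and retry, in which case the second call would return the woken node; that retry behavior is an assumption rather than something the text specifies.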

The present invention is not limited to the embodiment described above. Persons skilled in the art can implement the present invention in various other embodiments based on the content disclosed herein; therefore, any design that adopts the structure and concept of the present invention with simple variations or modifications falls within the scope of protection of the present invention.

Claims

1. A method for optimizing a data deduplication system, comprising the following steps:

acquiring a block of the current data;

performing a hash calculation on the block to obtain the hash fingerprint of the block;

determining whether the hash fingerprint of the block is present in a hash fingerprint database;

if the hash fingerprint of the block is present in the hash fingerprint database, determining whether the reference count of the block in the hash fingerprint database is greater than a threshold value and whether its copy count is less than the threshold value;

if the reference count of the block in the hash fingerprint database is greater than the threshold value and the copy count is less than the threshold value, determining whether the information about the block is present in an index table;

if the information about the block is present in the index table, calling a node allocation process to select a light-load node;

storing the block in the light-load node;

updating the information about the block in the index table;

incrementing by 1 the reference count of the block in the hash fingerprint database.

2. The optimization method according to claim 1, characterized in that it further comprises the step of: if the hash fingerprint of the block is not present in the hash fingerprint database, calling the node allocation process to select a light-load node, storing the block in the light-load node, and adding the information about the block to the hash fingerprint database.

3. The optimization method according to claim 1, characterized in that it further comprises the step of: if the reference count of the block in the hash fingerprint database is not greater than the threshold value and the copy count is not less than the threshold value, proceeding to the step of incrementing by 1 the reference count of the block in the hash fingerprint database.

4. The optimization method according to claim 1, characterized in that it further comprises the step of: if the information about the block is not present in the index table, adding the information about the block to the index table, and proceeding to the step of calling the node allocation process to select a light-load node.

5. The optimization method according to claim 1, characterized in that the step of calling the node allocation process to select a light-load node comprises:

determining whether the current idle queue is empty;

if the current idle queue is empty, determining whether the current light-load queue is empty;

if the current light-load queue is empty, determining whether the current heavy-load queue is empty;

if the current heavy-load queue is empty, determining whether the current sleep queue is empty;

if the current sleep queue is not empty, waking up the storage node in the sleep state;

sending a notification that the system has no available light-load node.

6. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current idle queue is not empty, allocating storage space on the first storage node of the current idle queue.

7. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current light-load queue is not empty, allocating storage space on the first storage node of the current light-load queue.

8. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current heavy-load queue is empty, proceeding to the step of sending the notification that the system has no available light-load node.

9. The optimization method according to claim 5, characterized in that the step of calling the node allocation process to select a light-load node further comprises: if the current sleep queue is empty, proceeding to the step of sending the notification that the system has no available light-load node.

10. The optimization method according to claim 1, characterized in that the information about the block comprises the hash fingerprint, the reference count, and the address of the block.