CN101477487B

CN101477487B - Multiple incremental files backup and recovery method

Info

Publication number: CN101477487B
Application number: CN2009100459543A
Authority: CN
Inventors: 张漳; 顾夏申; 邹恒明
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2009-01-22
Filing date: 2009-01-22
Publication date: 2010-09-29
Anticipated expiration: 2029-01-22
Also published as: CN101477487A

Abstract

A backup method and a recovery method for multiple incremental files, belonging to the field of computer technology. The backup method comprises the following steps: scanning the historical backups and deciding whether to back up and in which mode; opening the index files and content files of all incremental backups, together with the full backup file, for reading; building an index network from the index files of the multiple incremental backups; reading the content of each block into memory, block by block from beginning to end, according to the index network; comparing each block, as it is read, with the corresponding position of the new source file, so as to generate the index file of the new incremental backup block by block and to generate the content file of the new incremental backup from the unmatched parts; and releasing the memory occupied by a block before the content of the next block is read. In the recovery method, the preliminary work comprises only scanning the historical backups, and each block is written to the recovery file as it is read; the other steps are the same as in the backup method. The invention speeds up backup and recovery, saves computation time, and reduces backup space.

Description

Backup method and recovery method for multiple incremental files

Technical field

The present invention relates to a file backup method and a file recovery method in the field of computer technology, and specifically to a backup method and a recovery method for multiple incremental files.

Background technology

In today's information-based, networked society, computers play an extremely important role in work and daily life. More and more enterprises, businesses, government bodies and individuals obtain and process information by computer, and keep most of their important information in computers in the form of data files. If a disaster strikes this important data, an enterprise may be forced to halt operations, and data loss may even drive an enterprise into bankruptcy. People have therefore begun to pay attention to how to keep data intact, and data backup is the most important solution.

A search of the prior-art literature found the Chinese patent with application number 200610001299.8, entitled "Data reconstruction method". That patent requires complete files to be exchanged between the local and remote sides, which generates a large amount of network traffic, lengthens data release time, and introduces transmission security risks. More importantly, repeatedly storing different versions of the same file (although most of the content of these versions is identical) wastes a large amount of storage space. This shows the need for an "incremental backup" technique, in which only the differences between versions of the same file are backed up, thereby reducing network overhead and storage overhead.

The search also found the Chinese patent with application number 200610116303.5, entitled "Method of file matching in computer network data backup". At each backup, that patent generates one incremental backup and transmits it to the remote side, where it is synthesized. Although generating a single incremental backup is relatively easy, for multiple increments this technique has the following three shortcomings:

(1) Logically, an incremental backup is the difference between "the file as of the latest backup" and "the current file", so both files must be available before the difference can be computed. But because incremental backup does not store the complete source file of each version, "the file as of the latest backup" does not exist directly and must first be recovered. As a result, every backup must first perform a recovery, which is a waste of time;

(2) To guarantee recovery speed, every backup must be synthesized, which means that the backup of each version, once synthesized, contains all historical incremental backup information. This still wastes storage space (and the synthesis itself also wastes computation time). In the extreme case, the wasted storage space is as large as if a complete file were backed up every time. In other words, this is not true multiple-increment backup but a series of single increments;

(3) Without synthesis, recovery is extremely slow: as many intermediate files must be generated as there are incremental backup versions in the history, so recovery becomes slower and slower as the number of backed-up versions grows.

Summary of the invention

The object of the present invention is to overcome the above shortcomings of the prior art by providing a backup method and a recovery method for multiple incremental files that support multiple increments. No recovery is needed before an incremental backup, which greatly speeds up backup; no synthesis is needed after an incremental backup, which saves computation time and greatly reduces backup space; and no intermediate file is generated during recovery, which greatly speeds up recovery.

The present invention is achieved through the following technical solutions:

The backup method for multiple incremental files according to the present invention comprises the following steps:

Step 1, preliminary work: scan the historical backups and decide whether to back up and in which mode;

Step 2, open the index files and content files of all incremental backups, together with the full backup file that was copied directly at the first backup, for reading;

Step 3, build the index network from the index files of the multiple incremental backups;

Step 4, according to the index network, read the content of each block of the most recent backup into memory, block by block from beginning to end;

Step 5, each time a block is read, compare it with the corresponding position of the latest version of the source file, generate the index file of the new incremental backup block by block, and generate the content file of the new incremental backup from the unmatched parts; then release the memory occupied by the block before reading the content of the next block;

Step 6, finishing work: release the memory of the index network, release the buffer memory, and close all files.

The recovery method for multiple incremental files according to the present invention comprises the following steps:

Step 1, preliminary work: scan the historical backups;

Step 2, open the index files and content files of all incremental backups, together with the full backup file, for reading;

Step 5, each time a block is read, write it to the recovery file, which is the final recovery result; release the memory of the block before reading the next block;

Scanning the historical backups in step 1 and deciding whether to back up and in which mode comprises the following steps:

1. Scan for all historical backups of the source file (that is, enumerate all files under the directory where backups are stored), find the initial full backup of the source file and all subsequent incremental backups, and store all the incremental backups in a linked list;

2. Take the most recent backup (which may be a full backup or an incremental backup) and compare the modification date of that backup file with that of the source file to be backed up. If the date of the most recent backup file is newer, abandon the backup; otherwise go to step 3;

3. If the file has never been backed up before, perform a full backup this time, that is, copy the file directly.
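The decision made by this scan can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `BackupMode` and `decide_backup_mode` are hypothetical, the scan result is assumed to be a list of modification times ordered from oldest to newest, and the "no history" case is checked first so that the function never has to compare against a missing backup.

```python
from enum import Enum
from typing import List


class BackupMode(Enum):
    SKIP = "skip"                # most recent backup is newer than the source
    FULL = "full"                # no history: copy the file directly
    INCREMENTAL = "incremental"  # history exists and the source may have changed


def decide_backup_mode(source_mtime: float, backup_mtimes: List[float]) -> BackupMode:
    """backup_mtimes: modification times of the full backup and of every
    incremental backup found by the scan, oldest first (assumed ordering)."""
    if not backup_mtimes:
        return BackupMode.FULL
    if backup_mtimes[-1] > source_mtime:
        return BackupMode.SKIP
    return BackupMode.INCREMENTAL


print(decide_backup_mode(200.0, []))
print(decide_backup_mode(200.0, [100.0, 150.0]))
print(decide_backup_mode(200.0, [100.0, 250.0]))
```

In a real tool the timestamps would come from the file system (for example `os.path.getmtime`); plain numbers are used here to keep the sketch independent of any directory layout.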

Building the index network in step 3 from the index files of the multiple incremental backups comprises the following specific steps:

1. Construct the first index chain: read the information of each block in the initial index file block by block and build a linked list in memory in order, in which each matched block points to its storage position in the initial full backup file and each unmatched block points to its storage position in the incremental content file corresponding to this index file. While the list is being built, compute the position of each block: the start position of a block is the sum of the sizes of all blocks before it, obtained by accumulating those sizes;

2. Construct the second through the last index chains: read the remaining index files in order from oldest to newest and, for each index file, build a linked list from its block information. Each matched block points to its storage position in the previous version's complete file, each unmatched block points to its storage position in the incremental content file corresponding to this index, and the position of each block in this version's complete file is computed (by accumulating the sizes of all preceding blocks);

3. Connect adjacent chains: for every pair of adjacent chains, call the earlier one the old chain and the later one the new chain. Each matched block of the new chain points to its corresponding dependency block in the old chain. Specifically, take the storage position in the previous version's complete file that the matched block points to (obtained in step 2) and search the old chain for the block whose position range covers that position. The position range of a block is [start position, start position + block length] in the complete file of its own version, and the block that covers the position is the corresponding dependency block. This forms the index network.
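The three sub-steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Block` fields, the tuple layout of an index entry, and the function names are assumptions made for the example; a Python list stands in for the linked list; and the second index file is invented data. The first index uses the block layout of the Fig. 1 example with zero-based positions counted in items.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Entry = Tuple[bool, int, int]  # (matched, storage position, length), hypothetical layout


@dataclass
class Block:
    matched: bool                  # True: content is found in the previous version
    store_pos: int                 # offset in the previous version, or in this delta's content file
    length: int
    pos: int = 0                   # start offset inside this block's own version
    dep: Optional["Block"] = None  # dependency block in the older chain


def build_chain(entries: List[Entry]) -> List[Block]:
    """Sub-steps 1 and 2: one chain per index file, positions by running sum."""
    chain: List[Block] = []
    offset = 0
    for matched, store_pos, length in entries:
        chain.append(Block(matched, store_pos, length, pos=offset))
        offset += length
    return chain


def link_chains(old: List[Block], new: List[Block]) -> None:
    """Sub-step 3: point each matched block at the old block covering its position."""
    for blk in new:
        if not blk.matched:
            continue
        for cand in old:
            # Half-open range used here so that exactly one block covers a position.
            if cand.pos <= blk.store_pos < cand.pos + cand.length:
                blk.dep = cand
                break


def build_index_network(index_files: List[List[Entry]]) -> List[List[Block]]:
    chains = [build_chain(entries) for entries in index_files]
    for old, new in zip(chains, chains[1:]):
        link_chains(old, new)
    return chains


# First index: the Fig. 1 example. Second index: an invented later version.
index1 = [(True, 0, 1), (True, 3, 3), (False, 0, 3), (True, 6, 1), (False, 3, 2), (True, 8, 1)]
index2 = [(True, 0, 3), (False, 0, 1), (True, 5, 3)]
chains = build_index_network([index1, index2])
print([b.pos for b in chains[0]])
```

Matched blocks of the first chain keep `dep` unset because they refer to the full backup file rather than to another chain. The linear search in `link_chains` is chosen for readability; since positions are sorted, a binary search would serve equally well.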

Reading block content into memory block by block from beginning to end according to the index network, as described in step 4, comprises the following specific steps:

1. To recover a block, first check its type. If it is an unmatched block, read the corresponding position directly from the corresponding incremental content file; if it is a matched block, go to step 2;

2. Find the dependency block of this block through the index network and read it. If the dependency block is an unmatched block, read it from the incremental content file corresponding to the dependency block. If the dependency block is a matched block, continue tracing back to the dependency block of the dependency block until an unmatched block, or a matched block in the oldest index chain, is reached. This is a recursive process;

3. If the length read from the dependency block is less than this block needs, find the block after the dependency block in the index chain and continue reading until the total length read meets this block's requirement. In this way the content of any block can be read.
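The recursive read can be sketched as below. This is an illustration under assumptions, not the patented code: instead of following stored dependency pointers it resolves a byte range against the older chain on each call, which has the same effect (trace back through matched blocks, and continue into the following block when one dependency block is too short). The names `Block` and `read_range`, the in-memory `bytes` objects standing in for files, and the second version's data are all hypothetical; the first version reuses the Fig. 1 numbers, one byte per number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    matched: bool   # True: content is found in the previous version
    store_pos: int  # offset in the previous version (matched) or in the delta content (unmatched)
    length: int
    pos: int        # start offset inside this block's own version


def read_range(full: bytes, chains: List[List[Block]], deltas: List[bytes],
               version: int, pos: int, length: int) -> bytes:
    """Return `length` bytes starting at `pos` of the given version.
    Version 0 is the full backup; version k uses chains[k-1] and deltas[k-1]."""
    if version == 0:
        return full[pos:pos + length]
    out = bytearray()
    for blk in chains[version - 1]:
        lo = max(pos, blk.pos)
        hi = min(pos + length, blk.pos + blk.length)
        if lo >= hi:
            continue  # this block does not overlap the requested range
        skip = lo - blk.pos
        if blk.matched:
            # Trace back into the previous version (the recursive step).
            out += read_range(full, chains, deltas, version - 1,
                              blk.store_pos + skip, hi - lo)
        else:
            start = blk.store_pos + skip
            out += deltas[version - 1][start:start + (hi - lo)]
    return bytes(out)


full = bytes([10, 20, 30, 40, 50, 60, 70, 80, 90])
chain1 = [Block(True, 0, 1, 0), Block(True, 3, 3, 1), Block(False, 0, 3, 4),
          Block(True, 6, 1, 7), Block(False, 3, 2, 8), Block(True, 8, 1, 10)]
delta1 = bytes([61, 62, 63, 85, 86])
chain2 = [Block(True, 0, 3, 0), Block(False, 0, 1, 3), Block(True, 5, 3, 4)]
delta2 = bytes([99])

chains, deltas = [chain1, chain2], [delta1, delta2]
version1 = read_range(full, chains, deltas, 1, 0, 11)
version2 = read_range(full, chains, deltas, 2, 0, 7)
print(list(version1))
print(list(version2))
```

The last block of `chain2` needs three bytes starting inside an unmatched block of the first increment; that block supplies only two of them, so the read continues into the next block, which is itself matched and is resolved from the full backup. No intermediate file for version 1 is written at any point.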

Generating the index file of the new incremental backup block by block in step 5, and generating the content file of the new incremental backup from the unmatched parts, proceeds as follows. Given a backup of the old file, an incremental backup representing the change from the old file to the new file is formed. The contents of the new and old files are compared, and wherever added, deleted or modified content is found, it is taken as a block boundary. When the comparison is finished, both the new file and the old file have been divided into blocks. Every unmodified block is a matched block; every added, deleted or modified block is an unmatched block. A new index file and a new incremental content file are created. The index file records the information of each block of the new file: position, length, and whether it is matched. For the position, a matched block records its position in the old file, and an unmatched block records its position in the incremental content file. The incremental content file records the content of each unmatched block.

Writing each block to the recovery file as it is read, as described in step 5, proceeds as follows. During recovery, the index file is read and, according to the description of each block, the block's content is read and written to the recovery file: a matched block is read from the old file, and an unmatched block is read from the incremental content file. This recovers the new file. However, no complete old file exists, so the content of the old file is read block by block through the index network.

The preliminary work of the recovery method for multiple incremental files comprises only scanning the historical backups, and each block is written to the recovery file as it is read; the other steps are the same as in the backup method.

Compared with prior art, the present invention has the following advantages:

(1) No recovery is needed before an incremental backup, which greatly speeds up backup. An incremental backup requires comparing the complete source file as of the previous backup with the new source file, but with multiple increments the previous backup is an incremental backup (an incremental content file plus an index file) rather than a complete source file. The prior art must first recover a complete source file for the comparison. The present invention instead builds an index network from the multiple incremental index files; with the index network, any content of the complete source file can be read without recovering the complete source file, so the step of recovering the source file is avoided. If the prior art performs synthesis, backup costs the time of one recovery; if it does not perform synthesis, the number of recoveries needed equals the number of incremental backups;

(2) No synthesis is needed after an incremental backup, which saves computation time and greatly reduces backup space. The prior art requires synthesis, and a synthesis operation takes at least the time of recovering one source file; the present invention avoids synthesis and spends no time on it. The size of a synthesized file is approximately the sum of all historical incremental files. Assuming the content file of each backup has approximately the same size n and that k backups are made in total, the prior art needs about k(k+1)n/2 of storage space; the present invention avoids synthesis and needs only kn in total, which is 2/(k+1) of the space needed by the prior art;

(3) No intermediate file is generated during recovery, which greatly speeds up recovery. Recovery in the prior art must be based on one complete old source file and one incremental backup. For multiple increments, say n increments, the prior art must first recover the 1st intermediate file from the oldest source file and the 1st incremental backup, then recover the 2nd intermediate file from the 1st intermediate file and the 2nd incremental backup, and so on, until the (n-1)th intermediate file is obtained; finally the final source file is recovered from the (n-1)th intermediate file and the nth incremental backup. The present invention uses the index network to locate and read content directly in the incremental backup of any version (or in the oldest source file backup), and so avoids generating any intermediate file. The prior art takes the time of recovering n files, whereas the present invention takes the time of recovering only 1 file.
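The storage figures in advantage (2) follow from a simple sum: if the i-th synthesized backup carries all i increments made so far, the total is n + 2n + ... + kn = k(k+1)n/2, against kn when each backup stores only its own increment. The short check below illustrates this under the same simplifying assumption of equal increment size; the function names and the sample values of k and n are chosen only for the example.

```python
from fractions import Fraction


def synthesized_storage(k: int, n: int) -> int:
    # With synthesis, backup i holds all i increments so far: n + 2n + ... + kn.
    return sum(i * n for i in range(1, k + 1))


def multi_increment_storage(k: int, n: int) -> int:
    # Without synthesis, each of the k backups stores only its own increment.
    return k * n


k, n = 9, 100
ratio = Fraction(multi_increment_storage(k, n), synthesized_storage(k, n))
print(synthesized_storage(k, n), multi_increment_storage(k, n), ratio)
```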

Description of drawings

Fig. 1 is a schematic diagram of the working principle of single incremental backup in an embodiment of the present invention;

Fig. 2 is a flowchart of multiple-increment backup and recovery according to the present invention;

Fig. 3 is a schematic diagram of the working principle of multiple-increment backup according to the present invention;

Fig. 4 is a schematic diagram of the working principle of multiple-increment recovery according to the present invention.

Embodiment

An embodiment of the present invention is described in detail below with reference to the accompanying drawings. This embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and a concrete operating procedure are given, but the scope of protection of the present invention is not limited to the following embodiment.

The principle of single incremental backup, shown in Fig. 1, is explained first.

After the file to be backed up is modified, the old file becomes the new file. Suppose a backup of the old file exists and an incremental backup representing the change from the old file to the new file is now to be obtained. The contents of the new and old files are compared, and wherever added, deleted or modified content is found, it is taken as a block boundary. When the comparison is finished, both the new file and the old file have been divided into blocks. Every unmodified block is a matched block; every added, deleted or modified block is an unmatched block. A new index file and a new incremental content file are created. The index file records the information of each block of the new file (position, length, and whether it is matched; for the position, a matched block records its position in the old file and an unmatched block records its position in the incremental content file), and the incremental content file records the content of each unmatched block. During recovery, the index file is read and, according to the description of each block, the block's content is read and written to the recovery file (a matched block is read from the old file; an unmatched block is read from the incremental content file). This recovers the new file.

Take the concrete data in Fig. 1 as an example. The content of the old file is "10 20 30 40 50 60 70 80 90" and the content of the new file is "10 40 50 60 61 62 63 70 85 86 90". Comparing the contents of the two files and dividing them at the boundaries of the changes (additions, deletions, modifications) splits the old file into the blocks "10", "20 30", "40 50 60", "70", "80", "90" and the new file into the blocks "10", "40 50 60", "61 62 63", "70", "85 86", "90". The four blocks "10", "40 50 60", "70" and "90" are identical in the new and old files, so they are matched blocks and need not be written to the incremental content file. The two blocks "61 62 63" and "85 86" exist only in the new file and not in the old file, so they are unmatched blocks and must be written to the incremental content file. The remaining blocks of the old file no longer exist in the new file; they are deleted blocks and require no processing. Whether a block is matched or unmatched, its information must be written to the index file. This information comprises: whether the block is matched; its position (for a matched block, the position in the old file; for an unmatched block, the position in the incremental content file); and its length (for example, the length of block "10" is 1 and the length of block "61 62 63" is 3). This completes the backup. During recovery, each entry of the index file is read. The entry indicates which file holds the block's content (the old file for a matched block; the incremental content file for an unmatched block), at which position in that file the content starts (given by the "position" information), and how long the block is (given by the "length" information). Using this information, the content of each block is read and written to a file in order, and the recovery succeeds.
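The Fig. 1 example can be reproduced with a short Python sketch. The text does not say which comparison algorithm is used, so the standard library's `difflib.SequenceMatcher` is swapped in here purely for illustration; the function names, the tuple layout of an index entry, and the use of zero-based positions counted in items are likewise assumptions of the example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Entry = Tuple[bool, int, int]  # (matched, position, length), hypothetical layout


def make_increment(old: List[int], new: List[int]) -> Tuple[List[Entry], List[int]]:
    """Return (index, content) describing how to rebuild `new` from `old`."""
    index: List[Entry] = []
    content: List[int] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            index.append((True, i1, i2 - i1))             # position in the old file
        elif j2 > j1:                                      # "insert" or "replace"
            index.append((False, len(content), j2 - j1))  # position in the content file
            content.extend(new[j1:j2])
        # "delete": blocks that exist only in the old file need no record
    return index, content


def restore(old: List[int], index: List[Entry], content: List[int]) -> List[int]:
    out: List[int] = []
    for matched, pos, length in index:
        source = old if matched else content
        out.extend(source[pos:pos + length])
    return out


old = [10, 20, 30, 40, 50, 60, 70, 80, 90]
new = [10, 40, 50, 60, 61, 62, 63, 70, 85, 86, 90]
index, content = make_increment(old, new)
print(index)
print(content)
print(restore(old, index, content) == new)
```

Only the two unmatched blocks end up in `content`; the four matched blocks are stored as references into the old file, which is what keeps the increment small.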

As shown in Fig. 2, this embodiment covers both the backup method and the recovery method for multiple incremental files. The backup method for multiple incremental files comprises the following steps:

Step 1, scan the historical backups and decide whether to back up and in which mode;

Step 5, each time a block is read, compare it with the corresponding position of the new source file, generate the index file of the new incremental backup block by block, and generate the content file of the new incremental backup from the unmatched parts; then release the memory occupied by the block before reading the content of the next block;

Step 6, release the memory of the index network, release the buffer memory, and close all files.

The recovery method for multiple incremental files comprises the following steps:

Step 1, scan the historical backups;

Scanning the historical backups and deciding whether to back up and in which mode comprises the following steps:

As shown in Fig. 3 and Fig. 4, in both the backup method and the recovery method of this embodiment, building the index network from the index files of the multiple incremental backups comprises the following steps:

As shown in Fig. 3 and Fig. 4, in both the backup method and the recovery method of this embodiment, reading block content into memory block by block from beginning to end according to the index network comprises the following specific steps:

3. If the length read from the dependency block is less than this block needs, find the block after the dependency block in the index chain and continue reading until the total length read meets this block's requirement. In this way the content of any block can be read.

In step 5 of the backup method, generating the index file of the new incremental backup block by block, and generating the content file of the new incremental backup from the unmatched parts, proceeds as follows. Given a backup of the old file, an incremental backup representing the change from the old file to the new file is formed. The contents of the new and old files are compared, and wherever added, deleted or modified content is found, it is taken as a block boundary. When the comparison is finished, both the new file and the old file have been divided into blocks. Every unmodified block is a matched block; every added, deleted or modified block is an unmatched block. A new index file and a new incremental content file are created. The index file records the information of each block of the new file: position, length, and whether it is matched. For the position, a matched block records its position in the old file, and an unmatched block records its position in the incremental content file. The incremental content file records the content of each unmatched block.

As shown in Fig. 4, in step 5 of the recovery method, writing each block to the recovery file as it is read proceeds as follows. During recovery, the index file is read and, according to the description of each block, the block's content is read and written to the recovery file: a matched block is read from the old file, and an unmatched block is read from the incremental content file. This recovers the new file. However, no complete old file exists, so the content of the old file is read block by block through the index network.

This embodiment avoids recovery before an incremental backup, which greatly speeds up backup; avoids synthesis after an incremental backup, which saves computation time and greatly reduces backup space; and avoids generating any intermediate file during recovery, which greatly speeds up recovery.

Claims

1. A backup method for multiple incremental files, characterized in that it comprises the following steps:

Step 3, build the index network from the plurality of index files obtained from the multiple incremental backups, comprising the following steps:

1. Construct the first index file chain: read the information of each block in the initial index file block by block and build a linked list in memory in order, in which each matched block points to its storage position in the initial full backup file and each unmatched block points to its storage position in the content file of the incremental backup corresponding to this index file; while the list is being built, compute the position of each block, the start position of a block being the sum of the sizes of all blocks before it, obtained by accumulating those sizes;

2. Construct the second through the last index file chains: read the remaining index files in order from oldest to newest and, for each index file, build a linked list from its block information; each matched block points to its storage position in the previous source file, each unmatched block points to its storage position in the content file of the incremental backup corresponding to this index, and the position of each block in the source file of this version is computed;

3. Connect adjacent chains: for every pair of adjacent chains, call the earlier one the old chain and the later one the new chain; each matched block of the new chain points to its corresponding dependency block in the old chain; specifically, take the storage position in the previous source file that the matched block points to (obtained in step 2) and search the old chain for the block whose position range covers that position, the position range of a block being [start position, start position + block length] in the source file of its own version; the block that covers the position is the corresponding dependency block, and the index network is thereby formed;

Step 4, according to the index network, read the content of each block of the most recent backup into memory, block by block from beginning to end; each time a block is read, compare it with the corresponding position of the latest version of the source file, generate the index file of the new incremental backup block by block, and generate the content file of the new incremental backup from the unmatched parts; then release the memory occupied by the block before reading the content of the next block;

Step 5, release the memory of the index network, release the buffer memory, and close all files.

2. The backup method for multiple incremental files according to claim 1, characterized in that, in step 1:

1. Scan for all historical backups of the source file, find the initial full backup file of the source file and all subsequent incremental backups, and store all the incremental backups in a linked list;

2. Take the most recent backup and compare the modification date of that backup file with that of the source file to be backed up; if the date of the most recent backup file is newer, abandon the backup; otherwise go to step 3.

3. The backup method for multiple incremental files according to claim 1, characterized in that reading block content into memory block by block from beginning to end according to the index network in step 4 comprises the following steps:

1. To recover a block, first check its type; if it is an unmatched block, read the corresponding position directly from the content file of the corresponding incremental backup; if it is a matched block, go to step 2;

2. Find the dependency block of this block through the index network and read it; if the dependency block is an unmatched block, read it from the content file of the incremental backup corresponding to the dependency block; if the dependency block is a matched block, continue tracing back to the dependency block of the dependency block until an unmatched block, or a matched block in the oldest index file chain, is reached;

3. If the length read from the dependency block is less than this block needs, find the block after the dependency block in the index file chain and continue reading until the total length read meets this block's requirement.

4. The backup method for multiple incremental files according to claim 1, characterized in that generating the index file of the new incremental backup block by block in step 4, and generating the content file of the new incremental backup from the unmatched parts, proceeds as follows: given a backup of the source file, an incremental backup representing the change from the source file to the new file is formed; the contents of the new file and the source file are compared, and wherever added, deleted or modified content is found, it is taken as a block boundary; when the comparison is finished, both the new file and the source file have been divided into blocks; every unmodified block is a matched block, and every added, deleted or modified block is an unmatched block; a new index file and a new content file of the incremental backup are created, the index file recording the information of each block of the new file, the information comprising position, length, and whether the block is matched; for the position, a matched block records its position in the source file and an unmatched block records its position in the content file of the incremental backup; the content file of the incremental backup records the content of each unmatched block.

5. A recovery method for multiple incremental files, characterized in that it comprises the following steps:

Step 1, scan the historical backups;

Step 4, according to the index network, read the content of each block of the most recent backup into memory, block by block from beginning to end; each time a block is read, write it to the recovery file, which is the final recovery result; release the memory of the block before reading the next block;

6. The recovery method for multiple incremental files according to claim 5, characterized in that reading block content into memory block by block from beginning to end according to the index network in step 4 comprises the following steps:

7. The recovery method for multiple incremental files according to claim 5, characterized in that, in step 4, writing each block to the recovery file as it is read means: during recovery, the index file is read and, according to the description of each block, the block's content is read and written to the recovery file; a matched block is read from the source file, and an unmatched block is read from the content file of the incremental backup, whereby the new file is recovered.