CN109408290A

CN109408290A - Method, apparatus and storage medium for recovering fragmented files based on InnoDB

Info

Publication number: CN109408290A
Application number: CN201811225169.1A
Authority: CN
Inventors: 梁德荣; 田庆宜; 黄建邦; 沈长达; 吴少华; 张学君
Original assignee: Xiamen Meiya Pico Information Co Ltd
Current assignee: Xiamen Meiya Pico Information Co Ltd
Priority date: 2018-10-19
Filing date: 2018-10-19
Publication date: 2019-03-01
Anticipated expiration: 2038-10-19
Also published as: CN109408290B

Abstract

The present invention provides a method, apparatus and storage medium for recovering fragmented files based on InnoDB. The method comprises: starting from an initial position, reading n bytes of data as one data page of an InnoDB data file; reading the first 4 bytes of the data page as checksum CheckSum1, calculating the checksum CheckSum2 of the data page, and judging whether CheckSum1 equals CheckSum2; if not, setting Offset=Offset+m and reading data again; if so, performing recovery; reading the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merging data pages according to the FileId, and sorting them within their file in ascending order of PageNo. Because the invention works on the page structure of InnoDB data files, it can recover data from an entire disk or image without relying on file-system file records; if a file is partially damaged, its undamaged parts can be extracted; and if fragments of multiple data files are present, the fragments can be traced back to their source files, regrouped and sorted.

Description

Method, apparatus and storage medium for recovering fragmented files based on InnoDB

Technical field

The present invention relates to the technical field of data processing, and in particular to a method, apparatus and storage medium for recovering fragmented files based on InnoDB.

Background art

InnoDB is widely used as the default storage engine of the MySQL database. In the database recovery and electronic forensics industries, the need to recover MySQL databases is especially pressing. When MySQL database files are lost through deliberate deletion, virus damage, bad disk tracks and the like, recovering the file data accurately and completely is an important and urgent problem.

Many tools for recovering deleted files are currently on the market. They all rely either on file-system file records or on file signatures. Recovery based on file-system file records has the following shortcomings: 1. a file record cannot be recovered once a new file record has overwritten it; 2. a quick format of the disk empties the file records, which then cannot be recovered; 3. bad tracks in the file record area of the disk make the file records unreadable and unrecoverable. Recovery based on file signatures has the following shortcomings: 1. a file whose data is not contiguous on disk cannot be recovered; 2. a file whose header and signature have been overwritten cannot be recovered.

Summary of the invention

In view of the above defects in the prior art, the present invention proposes the following technical solution.

A method for recovering fragmented files based on InnoDB, the method comprising:

A read step: starting from the initial position Offset=0, reading n bytes of data as one data page of an InnoDB data file;

A matching step: reading the first 4 bytes of the data page as checksum CheckSum1, calculating the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages, and judging whether CheckSum1 equals CheckSum2; if not, setting Offset=Offset+m and executing the read step again; if so, executing the recovery step;

A recovery step: reading the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merging data pages according to the FileId, sorting them within their file in ascending order of PageNo, then setting Offset=Offset+n and executing the read step again; wherein m is a data offset unit and n is the size of one data page.

Further, the fragment files are ibdata and/or ibd fragment files.

Further, calculating the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages comprises: taking a 22-byte segment starting at byte offset 4 of the data page and applying the fold-XOR calculation to obtain the value sum1; taking a segment of n-46 bytes starting at byte offset 38 of the data page and applying the fold-XOR calculation to obtain the value sum2; the checksum of the data page is then CheckSum2=sum1+sum2.

Further, a two-integer XOR algorithm is defined, with its operator written **: given two 4-byte integers a and b, the algorithm is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

That is, a XOR b XOR RANDOM_MASK is shifted left by 8 bits and a is added; the result is then XORed with RANDOM_MASK2 and b is added.

The fold-XOR calculation is as follows: the fold value fold is initialized to 0 and the segment is traversed in byte order; with the traversal yielding the data set N {N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the two-integer XOR algorithm and the return value is written back to fold, i.e. fold = fold ** Ni, where 1 <= i <= m, RANDOM_MASK=1653893711 and RANDOM_MASK2=1463735687.

Further, traversing the segment in byte order to form the data set N {N1, N2, N3, ..., Nm} comprises: starting from the beginning of the segment, reading every four bytes as one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form one integer, which serves as Nm.

The invention also proposes an apparatus for recovering fragmented files based on InnoDB, the apparatus comprising:

A reading unit, configured to read n bytes of data, starting from the initial position Offset=0, as one data page of an InnoDB data file;

A matching unit, configured to read the first 4 bytes of the data page as checksum CheckSum1, calculate the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages, and judge whether CheckSum1 equals CheckSum2; if not, set Offset=Offset+m and perform the operation of the reading unit again; if so, perform the operation of the recovery unit;

A recovery unit, configured to read the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merge data pages according to the FileId, sort them within their file in ascending order of PageNo, then set Offset=Offset+n and perform the operation of the reading unit again; wherein m is a data offset unit and n is the size of one data page.

Further, the fragment files are ibdata and/or ibd fragment files.

Further, a two-integer XOR algorithm is defined, with its operator written **: a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

The invention also proposes a computer-readable storage medium storing computer program code which, when executed by a computer, performs any of the methods described above.

The technical effects of the invention are as follows: because the invention works on the page structure of InnoDB data files, it can recover data one data page at a time; it can recover data files from storage media such as an entire disk or an image without relying on file-system file records; if a file is partially damaged (for example encrypted by a virus or partially overwritten), its undamaged parts can be extracted; if the storage medium contains fragments of multiple data files, the fragments can be traced back to their source files and regrouped according to the file identifier FileId; and even if the fragments are non-contiguous and scattered out of order on the disk, they can be sorted and reassembled according to the page number PageNo.

Brief description of the drawings

Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of the InnoDB data file structure according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an InnoDB data page according to an embodiment of the present invention.

Fig. 3 is a flow chart of a method for recovering fragmented files based on InnoDB according to an embodiment of the present invention.

Fig. 4 is a structural diagram of an apparatus for recovering fragmented files based on InnoDB according to an embodiment of the present invention.

Detailed description of the embodiments

The application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.

It should be noted that, where no conflict arises, the embodiments of the application and the features in them may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.

InnoDB is designed for maximum performance when handling huge volumes of data. The InnoDB storage engine is fully integrated with the MySQL server and maintains its own buffer pool for caching data and indexes in main memory. InnoDB stores its tables and indexes in a tablespace, which may comprise several files (or raw disk partitions). Technically, InnoDB is a complete database system placed behind MySQL; it sets up its own dedicated buffer pool in main memory for caching data and indexes.

As shown in Fig. 1, an InnoDB data file consists of a series of data pages, ordered by page number in ascending order starting from 0.

As shown in Fig. 2, a data page of the InnoDB storage engine is 16384 bytes in size, in which bytes 0 to 3 store the checksum, bytes 4 to 7 store the page number PageNo, and bytes 34 to 37 store the file ID, i.e. the file identifier FileId.
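The field layout above can be illustrated with a short Python sketch. It is only an illustration: the names `PageHeader` and `parse_page_header` are hypothetical, and the fields are assumed to be unsigned 32-bit big-endian integers, which the text does not state.

```python
import struct
from typing import NamedTuple

PAGE_SIZE = 16384  # page size given in the text


class PageHeader(NamedTuple):
    checksum: int  # bytes 0 to 3
    page_no: int   # bytes 4 to 7 (PageNo)
    file_id: int   # bytes 34 to 37 (FileId)


def parse_page_header(page: bytes) -> PageHeader:
    """Extract the three header fields named in the text from one page."""
    if len(page) < 38:
        raise ValueError("need at least 38 bytes to read the header fields")
    # Assumption: each field is an unsigned 32-bit big-endian integer.
    checksum, page_no = struct.unpack_from(">II", page, 0)
    (file_id,) = struct.unpack_from(">I", page, 34)
    return PageHeader(checksum, page_no, file_id)


# Build a synthetic page (not real InnoDB data) to exercise the parser.
demo = bytearray(PAGE_SIZE)
struct.pack_into(">II", demo, 0, 0xDEADBEEF, 7)
struct.pack_into(">I", demo, 34, 42)
header = parse_page_header(bytes(demo))
```

Only these three fields are needed by the recovery method; the remaining header bytes are left uninterpreted in this sketch.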

With the above introduction, the file structure and the data page structure of the InnoDB storage engine are understood; they are the basis for data recovery.

The principle of data recovery in the present invention is: 1) the checksum of a data block is calculated with the folding checksum algorithm; if the calculated checksum equals the checksum stored in the page header, the data block is a page of an InnoDB data file; 2) within one database instance, every data file has a unique file ID, i.e. file identifier, and every data page records the file ID of the file to which it belongs, so data pages can be merged into files by this property; 3) every data page records its own page number, and pages are ordered by ascending page number within their file, so data pages can be sorted within a file by this property.

Fig. 3 shows a method for recovering fragmented files based on InnoDB according to the present invention, the method comprising:

Read step S101: starting from the initial position Offset=0, reading n bytes of data as one data page of an InnoDB data file.

Matching step S102: reading the first 4 bytes of the data page as checksum CheckSum1, calculating the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages, and judging whether CheckSum1 equals CheckSum2; if not, setting Offset=Offset+m and executing read step S101 again; if so, executing recovery step S103. A focus of the invention is whether the calculated checksum of the data page is consistent with the checksum that was read; this is the key to recovering files and an important inventive point of the invention.

Recovery step S103: reading the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merging data pages according to the FileId, sorting them within their file in ascending order of PageNo, then setting Offset=Offset+n and executing read step S101 again; wherein m is a data offset unit and n is the size of one data page. Depending on the read strategy, m may be 1 byte, one sector, one cluster, and so on; n is generally 16384 bytes but may of course take other values. The recovery operation continues until all data has been read.
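The control flow of steps S101 to S103 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: `scan_pages` and its `is_valid` parameter are hypothetical names, the data is held in memory rather than read from a disk or image, and the validator used in the demonstration is a toy stand-in for the CheckSum1 == CheckSum2 test.

```python
from typing import Callable, Iterator, Tuple


def scan_pages(data: bytes, n: int, m: int,
               is_valid: Callable[[bytes], bool]) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, page) for every n-byte block accepted by is_valid."""
    offset = 0
    while offset + n <= len(data):
        page = data[offset:offset + n]   # read step: take n bytes at Offset
        if is_valid(page):               # matching step: checksum comparison
            yield offset, page           # recovery step would process the page here
            offset += n                  # Offset = Offset + n
        else:
            offset += m                  # Offset = Offset + m


# Toy demonstration: 8-byte "pages" are valid when they start with b"PG".
data = b"xx" + b"PGaaaaaa" + b"yyyy" + b"PGbbbbbb"
found = [off for off, _ in scan_pages(data, n=8, m=2,
                                      is_valid=lambda p: p.startswith(b"PG"))]
```

A smaller m finds pages at more alignments at the cost of more checksum computations; a sector-sized or cluster-sized m assumes pages begin on those boundaries.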

In matching step S102, the first 4 bytes of the data page can be read as checksum CheckSum1 because the specific structure of the data page is known; see Fig. 2 and the corresponding description above.

In InnoDB, data files take the form of ibdata or ibd files; accordingly, the fragment files recovered by the present invention are ibdata or ibd fragment files, and both kinds can of course be recovered together.

Another important inventive point of the invention is the calculation of the data page checksum, which is an important step in realizing the invention. Specifically, calculating the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages comprises: taking a 22-byte segment starting at byte offset 4 of the data page and applying the fold-XOR calculation to obtain the value sum1; taking a segment of n-46 bytes starting at byte offset 38 of the data page and applying the fold-XOR calculation to obtain the value sum2; the checksum of the data page is then CheckSum2=sum1+sum2.
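The two byte ranges that enter CheckSum2 can be made concrete with a small Python sketch. It takes the fold-XOR calculation as a parameter so that it stays independent of how that calculation is implemented. The function names are hypothetical, and two details are assumptions not stated in the text: the sum is wrapped to 32 bits so that it can be compared with a 4-byte stored value, and the stored value is read as big-endian.

```python
from typing import Callable


def page_checksum(page: bytes, fold: Callable[[bytes], int]) -> int:
    """CheckSum2 = sum1 + sum2 over the two segments described in the text."""
    n = len(page)
    sum1 = fold(page[4:26])       # 22 bytes starting at byte offset 4
    sum2 = fold(page[38:n - 8])   # n - 46 bytes starting at byte offset 38
    return (sum1 + sum2) & 0xFFFFFFFF  # assumption: wrap to 32 bits


def matches_stored_checksum(page: bytes, fold: Callable[[bytes], int]) -> bool:
    """Compare CheckSum1 (first 4 bytes) with the computed CheckSum2."""
    stored = int.from_bytes(page[0:4], "big")  # assumption: big-endian
    return stored == page_checksum(page, fold)


# Toy demonstration on a 64-byte block, using sum() as a stand-in fold.
block = bytearray(64)
block[4] = 1    # inside the first segment
block[40] = 2   # inside the second segment
block[30] = 5   # between the segments, ignored
block[60] = 9   # in the last 8 bytes, ignored
toy_checksum = page_checksum(bytes(block), sum)
block[0:4] = toy_checksum.to_bytes(4, "big")
toy_match = matches_stored_checksum(bytes(block), sum)
```

The slices show that bytes 0 to 3, bytes 26 to 37 and the final 8 bytes of the page do not contribute to the checksum.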

To calculate the checksum of a data page, the invention also defines a two-integer XOR algorithm, which is likewise an important inventive point. Specifically, the operator of the algorithm is written **: given two 4-byte integers a and b, the algorithm is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

Based on this algorithm, the invention proposes the fold-XOR calculation, which is also one of its important inventive points and key to realizing it. The calculation is as follows: the fold value fold is initialized to 0 and the segment is traversed in byte order; with the traversal yielding the data set N {N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the two-integer XOR algorithm and the return value is written back to fold, i.e. fold = fold ** Ni, where 1 <= i <= m, i and m are integers, RANDOM_MASK=1653893711 and RANDOM_MASK2=1463735687.

In one embodiment, traversing the segment in byte order to form the data set N {N1, N2, N3, ..., Nm} comprises: starting from the beginning of the segment, reading every four bytes as one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form one integer, which serves as Nm.
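The fold-XOR calculation as described in this embodiment can be sketched in Python. The function names are hypothetical. Two details are assumptions the text leaves open: each group of bytes is interpreted as a big-endian integer, and every intermediate result is truncated to 32 bits (the expression uses only XOR, left shift and addition, so truncating at each step gives the same low 32 bits as truncating once at the end).

```python
from typing import List

RANDOM_MASK = 1653893711
RANDOM_MASK2 = 1463735687
U32 = 0xFFFFFFFF  # assumption: results are kept to 4 bytes


def fold_pair(a: int, b: int) -> int:
    """The a ** b operation from the text, truncated to 32 bits."""
    return (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b) & U32


def to_ints(segment: bytes) -> List[int]:
    """Split a segment into N1..Nm: 4 bytes each, shorter tail as the last value."""
    # Assumption: bytes are combined in big-endian order.
    return [int.from_bytes(segment[i:i + 4], "big")
            for i in range(0, len(segment), 4)]


def fold_segment(segment: bytes) -> int:
    """Start with fold = 0 and apply fold = fold ** Ni for each Ni in order."""
    fold = 0
    for value in to_ints(segment):
        fold = fold_pair(fold, value)
    return fold


# Example: a 7-byte segment yields one full integer and one 3-byte tail.
parts = to_ints(b"abcdefg")
digest = fold_segment(b"abcdefg")
```

A function like `fold_segment` would be applied separately to the two page segments to obtain sum1 and sum2.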

In one embodiment, the manner of merging data pages according to the FileId is shown in the following formula,

Wherein f denotes the information of all fragment pages in a single ibdata/ibd file, PageCount denotes the number of pages, and p_i = {PageCheckSum_i, PageNo_i, FileId_i, Offset_i}, where i is an integer greater than 0 and less than or equal to n; PageCheckSum_i denotes the checksum of the data page, PageNo_i denotes the page number of the data page, FileId_i denotes the ID of the file to which the data page belongs, and Offset_i denotes the position of the data page on the disk. That is, the recovered data pages can be merged according to FileId_i and then sorted according to PageNo_i, yielding the recovered file.
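The merge-then-sort step can be illustrated with a brief Python sketch. The record type mirrors the four components of p_i named above; the names `PageInfo` and `group_and_sort` are hypothetical, and the sample records are made up for the demonstration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class PageInfo(NamedTuple):
    checksum: int  # PageCheckSum_i
    page_no: int   # PageNo_i
    file_id: int   # FileId_i
    offset: int    # Offset_i, position of the page on the disk


def group_and_sort(pages: List[PageInfo]) -> Dict[int, List[PageInfo]]:
    """Merge pages by FileId, then sort each file's pages by ascending PageNo."""
    files: Dict[int, List[PageInfo]] = defaultdict(list)
    for page in pages:
        files[page.file_id].append(page)
    return {fid: sorted(group, key=lambda p: p.page_no)
            for fid, group in files.items()}


# Made-up records: pages of file 7 are scattered and out of order on disk.
recovered = [
    PageInfo(0, 2, 7, 49152),
    PageInfo(0, 0, 7, 16384),
    PageInfo(0, 1, 9, 0),
    PageInfo(0, 1, 7, 98304),
]
files = group_and_sort(recovered)
```

Keeping Offset in each record means the page contents can be fetched from the disk later, in sorted order, instead of being held in memory during the scan.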

Fig. 4 shows an apparatus for recovering fragmented files based on InnoDB according to the present invention, the apparatus comprising:

A reading unit 401, configured to read n bytes of data, starting from the initial position Offset=0, as one data page of an InnoDB data file.

A matching unit 402, configured to read the first 4 bytes of the data page as checksum CheckSum1, calculate the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages, and judge whether CheckSum1 equals CheckSum2; if not, set Offset=Offset+m and perform the operation of the reading unit 401 again; if so, perform the operation of the recovery unit 403. A focus of the invention is whether the calculated checksum of the data page is consistent with the checksum that was read; this is the key to recovering files and an important inventive point of the invention.

A recovery unit 403, configured to read the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merge data pages according to the FileId, sort them within their file in ascending order of PageNo, then set Offset=Offset+n and perform the operation of the reading unit 401 again; wherein m is a data offset unit and n is the size of one data page. Depending on the read strategy, m may be 1 byte, one sector, one cluster, and so on; n is generally 16384 bytes but may of course take other values. The recovery operation continues until all data has been read.

In the operation of the matching unit 402, the first 4 bytes of the data page can be read as checksum CheckSum1 because the specific structure of the data page is known; see Fig. 2 and the corresponding description above.

To calculate the checksum of a data page, the invention also defines a two-integer XOR algorithm, which is likewise an important inventive point. Specifically, the operator of the algorithm is written **: given two 4-byte integers a and b, the algorithm is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

The method of the invention was also verified, as follows:

(1) A 3 GB VHD image was created with the disk management tool of the Windows system, then mounted and formatted.

(2) An 8.65 MB InnoDB data file, TEST.ibd, was copied into the mounted partition (the copy was paused partway through and other data was written to the disk, so that the file would be non-contiguous on disk).

Signature-based recovery of the ibd file from the disk recovered only part of the data; recovery based on file records could not recover the file; recovery from the image using the method of the present invention obtained the ibd file that had been written, which demonstrates the technical effect of the invention.

The technical effects of the invention are as follows: because the invention works on the page structure of InnoDB data files, it can recover data one data page at a time; it can recover data files from storage media such as an entire disk or an image without relying on file-system file records; if a file is partially damaged (for example encrypted by a virus or partially overwritten), its undamaged parts can be extracted; if the storage medium contains fragments of multiple data files, the fragments can be traced back to their source files and regrouped according to the file identifier FileId; and even if the fragments are non-contiguous and scattered out of order on the disk, they can be sorted and reassembled according to the page number PageNo.

The method of the invention is particularly suitable for mobile terminal devices. The mobile terminal device may be a smartphone, tablet computer, laptop, desktop computer, PDA or the like; it may of course also be any other portable electronic device with data processing capability.

For convenience of description, the above apparatus is described in terms of separate units divided by function. When the application is implemented, the functions of the units may of course be realized in one or more pieces of software and/or hardware.

From the above description of the embodiments, those skilled in the art will clearly understand that the application can be implemented by software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments of the application or in certain parts of them.

Finally, it should be noted that the above embodiments merely illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that the invention may still be modified or equivalently replaced; any modification or partial replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.

Claims

1. A method for recovering fragmented files based on InnoDB, characterized in that the method comprises:

A read step: starting from the initial position Offset=0, reading n bytes of data as one data page of an InnoDB data file;

A matching step: reading the first 4 bytes of the data page as checksum CheckSum1, calculating the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages, and judging whether CheckSum1 equals CheckSum2; if not, setting Offset=Offset+m and executing the read step again; if so, executing the recovery step;

A recovery step: reading the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merging data pages according to the FileId, sorting them within their file in ascending order of PageNo, then setting Offset=Offset+n and executing the read step again;

Wherein m is a data offset unit and n is the size of one data page.

2. The method according to claim 1, characterized in that the fragment files are ibdata and/or ibd fragment files.

3. The method according to claim 2, characterized in that calculating the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages comprises: taking a 22-byte segment starting at byte offset 4 of the data page and applying the fold-XOR calculation to obtain the value sum1; taking a segment of n-46 bytes starting at byte offset 38 of the data page and applying the fold-XOR calculation to obtain the value sum2; the checksum of the data page is then CheckSum2=sum1+sum2.

4. The method according to claim 3, characterized in that

A two-integer XOR algorithm is defined, with its operator written **: given two 4-byte integers a and b, the algorithm is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

The fold-XOR calculation is as follows: the fold value fold is initialized to 0 and the segment is traversed in byte order; with the traversal yielding the data set N {N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the two-integer XOR algorithm and the return value is written back to fold, i.e. fold = fold ** Ni, where 1 <= i <= m, RANDOM_MASK=1653893711 and RANDOM_MASK2=1463735687.

5. The method according to claim 4, characterized in that traversing the segment in byte order to form the data set N {N1, N2, N3, ..., Nm} comprises: starting from the beginning of the segment, reading every four bytes as one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form one integer, which serves as Nm.

6. An apparatus for recovering fragmented files based on InnoDB, characterized in that the apparatus comprises:

A reading unit, configured to read n bytes of data, starting from the initial position Offset=0, as one data page of an InnoDB data file;

A matching unit, configured to read the first 4 bytes of the data page as checksum CheckSum1, calculate the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages, and judge whether CheckSum1 equals CheckSum2; if not, set Offset=Offset+m and perform the operation of the reading unit again; if so, perform the operation of the recovery unit;

A recovery unit, configured to read the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merge data pages according to the FileId, sort them within their file in ascending order of PageNo, then set Offset=Offset+n and perform the operation of the reading unit again;

Wherein m is a data offset unit and n is the size of one data page.

7. The apparatus according to claim 6, characterized in that the fragment files are ibdata and/or ibd fragment files.

8. The apparatus according to claim 7, characterized in that calculating the checksum CheckSum2 of the data page with the folding checksum algorithm for data pages comprises: taking a 22-byte segment starting at byte offset 4 of the data page and applying the fold-XOR calculation to obtain the value sum1; taking a segment of n-46 bytes starting at byte offset 38 of the data page and applying the fold-XOR calculation to obtain the value sum2; the checksum of the data page is then CheckSum2=sum1+sum2.

9. The apparatus according to claim 8, characterized in that

A two-integer XOR algorithm is defined, with its operator written **: given two 4-byte integers a and b, the algorithm is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

10. The apparatus according to claim 9, characterized in that traversing the segment in byte order to form the data set N {N1, N2, N3, ..., Nm} comprises: starting from the beginning of the segment, reading every four bytes as one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form one integer, which serves as Nm.

11. A computer-readable storage medium, characterized in that computer program code is stored on the storage medium, and when the computer program code is executed by a computer, the method according to any one of claims 1-5 is performed.