CN113377778B

CN113377778B - Method and device for comparing differences of database tables

Info

Publication number: CN113377778B
Application number: CN202110747712.XA
Authority: CN
Inventors: 王岚; 高志会; 陈勇铨; 胡军擎
Original assignee: Shanghai Information2 Software Inc
Current assignee: Shanghai Information2 Software Inc
Priority date: 2021-07-01
Filing date: 2021-07-01
Publication date: 2022-04-22
Anticipated expiration: 2041-07-01
Also published as: CN113377778A

Abstract

The invention discloses a difference comparison method and device for database tables. The method comprises the following steps: step S1, receiving the MD5-hashed database table data of the source end and the backup end, comparing the data rows, caching in a cache set the source-end and backup-end rows that have not yet been matched or have failed to match, and recording the number of such unmatched rows; step S2, after each batch of data has been compared, deciding from the recorded row counts whether a data caching operation (writing the cached data out to a file) is required, and if so, obtaining the data to be written out from the cache set; step S3, creating a file for storing the difference data, writing the obtained data into the corresponding file, and deleting the corresponding recorded row counts and the corresponding data from the cache set; step S4, after the source end and the backup end have both finished sending data, reading the files that store the difference data and performing the ordinary table comparison operation.

Description

Method and device for comparing differences of database tables

Technical Field

The invention relates to the technical field of computer data backup disaster recovery, in particular to a difference comparison method and device for a database table with large data volume.

Background

At present, database tables often need to be compared during disaster recovery backup of computer data. For example, before a backup is performed, the differences between the current database tables at the backup source end and the backup destination end are queried to decide whether a backup is needed, which requires comparing the tables. After a backup, the tables at the source end and the destination end are compared again to determine whether the data is complete and accurate. Difference comparison of database tables is therefore very important in disaster recovery backup of computer data.

At present, database table comparison in disaster recovery backup generally works as follows: each row of data in the source database and each row of data in the backup database is hashed with MD5, and the MD5 values are then compared; if the MD5 values are the same, the rows are considered the same.

Specifically, each row must be processed before its MD5 is computed. Every column of the row is converted, that is, the column values of the row are stored in one contiguous byte space; the conversion results of all columns are then combined, and the MD5 operation is performed on the combined bytes. The source-end data is processed at the source end and is already in MD5 form when it is sent to the backup end. After the source-end rows arrive, the backup end reads its own database data, computes the MD5 of each row, and compares it with the data sent by the source end. If the amount of difference data is very large, or if reading is much slower at one end (and it cannot be known in advance which end), the comparison falls behind and a large amount of not-yet-compared data accumulates. Because this data accumulates in memory, it may exhaust the memory and crash the process.
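The row-digest step described in the background can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the prior art or in the invention: the function name `row_md5`, the length-prefixed UTF-8 column encoding and the NULL marker are all hypothetical; the text only says that the columns of a row are placed in a contiguous byte space and then hashed with MD5.

```python
import hashlib
import struct


def row_md5(columns):
    """Return the 16-byte MD5 digest of one table row.

    Assumed encoding (not from the source): every column is rendered as
    UTF-8 text and length-prefixed, so ("ab", "c") and ("a", "bc") yield
    different byte strings.
    """
    buf = bytearray()
    for value in columns:
        if value is None:
            # Hypothetical NULL marker: length field 0xFFFFFFFF, no payload.
            buf += struct.pack(">I", 0xFFFFFFFF)
            continue
        data = str(value).encode("utf-8")
        buf += struct.pack(">I", len(data))  # 4-byte big-endian length
        buf += data                          # column bytes, stored contiguously
    return hashlib.md5(bytes(buf)).digest()
```

Two rows are then treated as identical when their digests are equal, so only 16 bytes per row need to travel from the source end to the backup end.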

Disclosure of Invention

In order to overcome the defects of the prior art, the present invention provides a method and an apparatus for comparing differences of database tables, which temporarily store part of the data in a file when the amount of data held in memory during table comparison becomes too large, and which stop the table comparison and return the current result when the difference data is too large, thereby preventing memory exhaustion.

In order to achieve the above object, the present invention provides a method for comparing differences of database tables, comprising the following steps:

step S1, receiving the MD5-hashed database table data of the source end and the backup end, comparing the data rows, caching in a cache set, according to the comparison result, the source-end and backup-end database table data that has not been matched or has failed to match, and recording the numbers of source-end and backup-end rows that have not been matched or have failed to match;

step S2, after each batch of data has been compared, deciding from the recorded numbers of unmatched source-end and backup-end rows whether a data caching operation is required, and, when it is required, obtaining the data to be written out from the cache set so that it can be stored in a file for storing difference data;

step S3, performing the file storage operation on the obtained data: creating a file for storing the difference data, writing the data into the corresponding file, and deleting the corresponding recorded row counts and the corresponding data from the cache set;

step S4, after the source-end data and the backup-end data have both been sent, reading the files that store the difference data and performing the ordinary table comparison operation.

Preferably, in step S1, the numbers of source-end and backup-end database table rows that have not been matched or have failed to match are recorded in a data row count array, which is a two-dimensional integer array.

Preferably, in step S1, the source-end and backup-end database table data that has not been matched or has failed to match is cached in a cache set that is a two-dimensional hash array.

Preferably, the two-dimensional hash array for caching the source-end and backup-end database table data that has not been matched or has failed to match is:

m_hash[source end or backup end of the current data][key value]

wherein the key value is obtained by applying a preset key function to the MD5 of the corresponding row data.

Preferably, in step S2, the recorded number of unmatched source-end rows and the recorded number of unmatched backup-end rows are added, and if the sum is greater than a preset threshold, it is determined that the caching operation is required.

Preferably, before the cached data is saved, the recorded source-end row count and backup-end row count are compared to determine which end has more rows, and the data of the end with more rows is saved.

Preferably, in step S2, when the data caching operation is performed, a key is determined, and the data to be saved is obtained from the cache set by increasing the key by 256 on each traversal and is stored in the same file.

Preferably, in step S3, a file is created whose name is formed from the determined key and the objn of the current table, the obtained data is written into the corresponding file, and the flag indicating whether cached data has been stored is set to true.

Preferably, in step S4, before the files are read, the unmatched data sent from the source end and the backup end that remains in the cache set is stored in the corresponding files; the files are then read one at a time, and the ordinary table comparison operation is performed on the data of each file as it is read.

In order to achieve the above object, the present invention further provides a database table difference comparing apparatus, including:

a data row comparison processing unit, configured to receive the MD5-hashed database table data of the source end and the backup end, compare the data rows, cache in a cache set, according to the comparison result, the source-end and backup-end database table data that has not been matched or has failed to match, and record the numbers of source-end and backup-end rows that have not been matched or have failed to match;

a data caching unit, configured to decide, after each batch of data has been compared, from the recorded numbers of unmatched source-end and backup-end rows whether a data caching operation is required, and, when it is required, to obtain the data to be written out from the cache set so that it can be stored in a file;

a file storage unit, configured to perform the file storage operation on the obtained data: creating a file for storing the difference data, writing the data into the corresponding file, and deleting the corresponding recorded row counts and the corresponding data from the cache set;

and a file reading and comparing unit, configured to read the files that store the difference data and perform the ordinary table comparison operation after the source-end data and the backup-end data have both been sent.

Compared with the prior art, the difference comparison method and device for database tables of the present invention temporarily store part of the data in a file when the amount of data held in memory during table comparison becomes too large, and stop the table comparison and return the current result when the difference data is too large, thereby preventing the memory from being exhausted.

Drawings

FIG. 1 is a flowchart illustrating the steps of a method for comparing differences in database tables according to the present invention;

FIG. 2 is a diagram of a system architecture of a database table difference comparison apparatus according to the present invention;

Detailed Description

The embodiments of the present invention are described below through specific examples in conjunction with the accompanying drawings; other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. The invention is capable of other and different embodiments, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.

FIG. 1 is a flowchart illustrating the steps of a method for comparing differences in database tables according to the present invention. As shown in FIG. 1, the difference comparison method for database tables of the present invention includes the following steps:

Step S1, receiving the MD5-hashed database table data of the source end and the backup end, comparing the data rows, caching in a cache set, according to the comparison result, the source-end and backup-end database table data that has not been matched or has failed to match, and recording the numbers of source-end and backup-end rows that have not been matched or have failed to match.

In the present invention, the table comparison is performed at the backup end. First, a data row count array is initialized to record the numbers of source-end and backup-end database table rows that have not been matched or have failed to match. The data row count array is a two-dimensional integer array with one entry for the source end and one for the backup end. Both the source-end data and the backup-end data are MD5-hashed. Whether the row received at a given moment is source-end data or backup-end data, if no corresponding data matches it, that is, the MD5 values are not the same, the row count of the corresponding end is increased by 1: if a source-end row is not matched, the source-end value in the data row count array is increased by 1, and if a backup-end row is not matched, the backup-end value in the data row count array is increased by 1.

During database table comparison, in addition to recording the row counts of the source end and the backup end, the source-end and backup-end database table data that has not been matched or has failed to match must be cached. Specifically, the cache set is a two-dimensional hash array, that is, the unmatched data is stored in the two-dimensional hash array. Because there are 256 possible values for the row-data MD5, in the specific embodiment of the present invention a preset key function is applied to the MD5 of the row data to obtain a key value, and the data that has not been matched or has failed to match during table comparison is stored under the corresponding key in the array, for example in the following two-dimensional hash array:

m_hash[source end or backup end of the current data][key value]

For example, if the key value obtained by the key function from the MD5 of a certain row is 254, the data of that row is stored in m_hash[0][254], and if the key value obtained from the MD5 of a certain row is 762, the data of that row is stored in m_hash[0][762]. Because the key is computed from the MD5 of the row, the slot chosen for each stored row is effectively random, and distinct valid rows do not have the same MD5, so the data is distributed evenly across the array.
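As a rough illustration of how a row's MD5 could be mapped to a slot such as m_hash[0][254] or m_hash[0][762], consider the sketch below. The patent does not disclose the key function or the size of the key space, so `key_of`, `NUM_KEYS` and `cache_row` are hypothetical names and choices made purely for the example.

```python
NUM_KEYS = 1024          # assumed size of the key space (not given in the source)
SOURCE, BACKUP = 0, 1    # first index of m_hash: 0 = source end, 1 = backup end


def key_of(digest: bytes) -> int:
    """Hypothetical key function: derive a slot index from an MD5 digest."""
    return int.from_bytes(digest[:2], "big") % NUM_KEYS


# m_hash[side][key] holds the digests cached for that end under that key.
m_hash = [[[] for _ in range(NUM_KEYS)] for _ in (SOURCE, BACKUP)]


def cache_row(side: int, digest: bytes) -> int:
    """Store an unmatched row digest in its slot and return the key used."""
    key = key_of(digest)
    m_hash[side][key].append(digest)
    return key
```

With this particular key function, a digest beginning with the bytes 0x00 0xFE lands in slot 254 and one beginning with 0x02 0xFA lands in slot 762, which mirrors the two examples in the text.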

That is, assume that index 0 of the two-dimensional hash array corresponds to the source end and index 1 to the backup end. Data sent by the source end is compared with the data in m_hash[1][x]; if the match fails, the source-end value in the two-dimensional integer array of row counts is increased by 1 and the unmatched source-end data is stored in m_hash[0][x]. Conversely, data sent by the backup end is compared with the data in m_hash[0][x]; if the match fails, the backup-end value in the two-dimensional integer array of row counts is increased by 1 and the unmatched backup-end data is stored in m_hash[1][x].
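The matching rule in the preceding paragraph can be sketched as a small class. This is an illustration only: the class name `DiffCache`, the use of dictionaries of lists for the two sides, and in particular the decision to remove a matched row from the opposite side and decrement that side's count are assumptions; the text describes only what happens when a match fails.

```python
from collections import defaultdict

SOURCE, BACKUP = 0, 1


class DiffCache:
    """Cache of unmatched row digests for the source end and the backup end."""

    def __init__(self, key_of):
        self.key_of = key_of                               # caller-supplied key function
        self.m_hash = [defaultdict(list), defaultdict(list)]
        self.counts = [0, 0]                               # unmatched rows per end

    def receive(self, side, digest):
        """Process one incoming row digest; return True if it matched."""
        other = 1 - side
        key = self.key_of(digest)
        bucket = self.m_hash[other].get(key)
        if bucket and digest in bucket:
            # Assumption: a matched row is dropped from the opposite side's
            # cache and that side's counter is decremented.
            bucket.remove(digest)
            self.counts[other] -= 1
            return True
        # No counterpart yet: cache the row under its own side and count it.
        self.m_hash[side][key].append(digest)
        self.counts[side] += 1
        return False
```

A row that arrives before its counterpart is therefore cached as "not yet matched" and is cleared as soon as the other end sends the same digest; rows that never find a counterpart remain as genuine differences.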

Step S2, after each batch of data has been compared, deciding from the recorded numbers of unmatched source-end and backup-end rows whether a data caching operation is required, and, when it is required, obtaining the data to be written out from the cache set so that it can be stored in a file.

Specifically, after a batch of data has been compared, the recorded number of unmatched source-end rows and the recorded number of unmatched backup-end rows are added. If the sum is greater than a preset threshold (the threshold is a fixed value, for example the maximum number of rows that may be cached in memory), a caching operation is required and the data stored in the two-dimensional hash array must be saved to a file. If the sum does not exceed the preset threshold, comparison continues with the next batch of data.

Preferably, before the cached data is saved, the recorded source-end row count and backup-end row count are compared to determine which end has more rows, and the data of the end with more rows is saved.
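The threshold test and the choice of which end to write out can be expressed in a few lines. The function names and the tie-breaking rule below are assumptions made for illustration; the text specifies only the sum-against-threshold test and the preference for the end with more cached rows.

```python
SOURCE, BACKUP = 0, 1


def needs_spill(counts, max_cached_rows):
    """True when the cached source-end plus backup-end rows exceed the limit."""
    return counts[SOURCE] + counts[BACKUP] > max_cached_rows


def side_to_spill(counts):
    """Pick the end with more cached rows (ties go to the source end,
    an arbitrary choice made for this sketch)."""
    return SOURCE if counts[SOURCE] >= counts[BACKUP] else BACKUP
```

For example, with counts of [700, 400] and a limit of 1000 rows, the check succeeds and the source end, holding the larger share, is the one written out.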

When the number of rows that failed to match is too large, data is obtained from the cache set and stored in a file. Specifically, a key is first determined (for example, the key to be stored this time is determined using the macro definition ((idx) & (256-1))). If the key determined for this file storage is 11, the data is taken from m_hash[x][11], m_hash[x][11+256], m_hash[x][11+256*2], m_hash[x][11+256*3], and so on, for file storage; that is, the data to be written out is obtained from the cache set by increasing the key (which the key function derives from the MD5 of the row) by 256 on each traversal.
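The traversal by steps of 256 can be sketched as follows. The mask expression mirrors the macro quoted in the text; everything else (the names `FILE_COUNT`, `file_key` and `take_rows_for_file`, and the representation of one side of the cache as a list of slots) is an assumption for illustration.

```python
FILE_COUNT = 256  # keys that differ by a multiple of 256 share one file


def file_key(idx):
    """Same arithmetic as the macro ((idx) & (256-1)) quoted in the text."""
    return idx & (FILE_COUNT - 1)


def take_rows_for_file(side_slots, fkey):
    """Remove and return the rows cached in slots fkey, fkey+256, fkey+512, ...

    side_slots is one side of the cache, i.e. m_hash[x], modelled here as a
    list whose elements are lists of cached rows.
    """
    rows = []
    for slot in range(fkey, len(side_slots), FILE_COUNT):
        rows.extend(side_slots[slot])
        side_slots[slot] = []  # the data leaves the in-memory cache
    return rows
```

Because equal MD5 values always produce the same key, a source-end row and its backup-end counterpart always end up under the same file key, so each file can later be compared on its own.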

Step S3, performing the file storage operation on the obtained data: creating a file for storing the difference data, writing the data into the corresponding file, and deleting the corresponding recorded row counts and the corresponding data from the cache set.

Specifically, the file storage operation starts once the data to be stored has been obtained. The name of the file that stores the difference data is formed from the previously determined key and the objn of the table; for example, if the key is 254 and the objn of the table is 123456, a file named 254_123456 is created. The data to be stored is then written into that file in the corresponding directory, and the file storage flag, that is, the flag indicating whether cached data has been stored, is set to true.
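A possible shape for the file storage operation is sketched below. Only the key_objn naming pattern comes from the text; the function name `spill_to_file`, the append mode, and the 17-byte record layout (one byte for the end, followed by the 16-byte MD5) are assumptions chosen so that source-end and backup-end records can be told apart when the file is read back.

```python
import os


def spill_to_file(directory, fkey, objn, side, digests):
    """Append cached row digests to the difference file named '<key>_<objn>'.

    Assumed record layout: 1 byte for the end (0 = source, 1 = backup)
    followed by the 16-byte MD5 digest of the row.
    """
    path = os.path.join(directory, f"{fkey}_{objn}")
    # Append mode, so repeated spills for the same key share one file.
    with open(path, "ab") as f:
        for digest in digests:
            f.write(bytes([side]) + digest)
    return path
```

After a successful write, the caller would clear the corresponding row counts and set its file storage flag to true, as the step describes.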

Step S4, after the source-end data and the backup-end data have both been sent, reading the files that store the difference data and performing the ordinary table comparison operation.

Specifically, after the source-end data and the backup-end data have both been sent, it is determined whether the file storage flag is true, and if it is true, the files are read (that is, the file reading operation takes place only when the flag for the obj_key file is true). This is necessary because part of the cached data is genuine difference data found by comparison, while another part is data whose counterpart had not yet been sent; the comparison therefore has to be performed after both the source end and the backup end have finished sending the table data.

Specifically, before the files are read, the existing data (the data sent from the source end and the backup end that remained unmatched after comparison and is still in the in-memory cache set) is stored in the files. The files are then read one at a time (each being an obj_key file, which stores the row data that was not successfully matched between the source end and the backup end): the data in the file corresponding to one key is compared, and the file corresponding to the next key is compared only after that comparison has finished. This is feasible because the probability that the file corresponding to a single key fills the memory is low. The content of the file for a key contains both data sent by the source end and data sent by the backup end; because each stored record carries a type, the source-end and backup-end data can be distinguished, and the ordinary table comparison operation is then performed on the data. Because the data is read from files, there is no problem of data delay. While the file corresponding to a certain key is being compared, if its difference data exceeds the preset threshold, the current comparison result is returned as an error and the remaining keys are not compared, which prevents the memory from filling up. So much difference data makes continuing the comparison pointless, so the table comparison ends and the table comparison operation has to be performed again.
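The per-file comparison with its early exit can be sketched as follows. It assumes a hypothetical record layout of one type byte (0 for the source end, 1 for the backup end) followed by a 16-byte MD5; the names `compare_file` and `TooManyDifferences` are likewise illustrative and do not come from the source.

```python
from collections import Counter

SOURCE, BACKUP = 0, 1
RECORD_SIZE = 17  # assumed: 1 type byte + 16-byte MD5


class TooManyDifferences(Exception):
    """Raised when one key file holds more differences than the threshold."""


def compare_file(path, max_diff_rows):
    """Compare the source-end and backup-end records stored in one key file.

    Returns two Counters: digests present only at the source end and
    digests present only at the backup end.
    """
    source_rows, backup_rows = Counter(), Counter()
    with open(path, "rb") as f:
        while True:
            rec = f.read(RECORD_SIZE)
            if len(rec) < RECORD_SIZE:
                break
            side, digest = rec[0], rec[1:]
            if side == SOURCE:
                source_rows[digest] += 1
            else:
                backup_rows[digest] += 1
    # Counter subtraction keeps only the rows left without a counterpart.
    only_source = source_rows - backup_rows
    only_backup = backup_rows - source_rows
    diff = sum(only_source.values()) + sum(only_backup.values())
    if diff > max_diff_rows:
        # Stop here; the caller skips the remaining key files.
        raise TooManyDifferences(f"{diff} differing rows in {path}")
    return only_source, only_backup
```

A driver would call `compare_file` once per key file and stop at the first `TooManyDifferences`, so that at most one file's worth of rows is held in memory at a time.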

FIG. 2 is a system architecture diagram of a database table difference comparison apparatus according to the present invention. As shown in FIG. 2, the difference comparison apparatus for database tables of the present invention includes:

The data row comparison processing unit 201 is configured to receive the MD5-hashed database table data of the source end and the backup end, compare the data rows, cache in the cache set, according to the comparison result, the source-end and backup-end database table data that has not been matched or has failed to match, and record the numbers of source-end and backup-end rows that have not been matched or have failed to match.

During database table comparison, in addition to recording the row counts of the source end and the backup end, the source-end and backup-end database table data that has not been matched or has failed to match must be cached. Specifically, the cache set is a two-dimensional hash array, that is, the unmatched data is stored in the two-dimensional hash array. Because there are 256 possible values for the row-data MD5, in the specific embodiment of the present invention a preset key function is applied to the MD5 of the row data to obtain a key value, and the data that has not been matched or has failed to match during table comparison is stored under the corresponding key in the array, for example in the following two-dimensional hash array:

m_hash[source end or backup end of the current data][key value]

The data caching unit 202 is configured to decide, after each batch of data has been compared, from the recorded numbers of unmatched source-end and backup-end rows whether a data caching operation is required, and, when it is required, to obtain the data to be written out from the cache set so that it can be stored in a file.

Specifically, after a batch of data has been compared, the recorded number of unmatched source-end rows and the recorded number of unmatched backup-end rows are added. If the sum is greater than a preset threshold (the threshold is a fixed value, for example the maximum number of rows that may be cached in memory), a caching operation is required and the data stored in the two-dimensional hash array must be saved to a file. If the sum does not exceed the preset threshold, comparison continues with the next batch of data.

When the number of rows that failed to match is too large, data is obtained from the cache set and stored in a file. Specifically, the key of the file is determined (for example, using the macro definition ((idx) & (256-1))). If the key determined this time is 11, the data is taken from m_hash[x][11], m_hash[x][11+256], m_hash[x][11+256*2], m_hash[x][11+256*3], and so on, for file storage; that is, the data to be written out is obtained from the cache set by increasing the key by 256 on each traversal.

The file storage unit 203 is configured to perform the file storage operation on the obtained data: creating a file for storing the difference data, writing the data into the corresponding file, and deleting the corresponding recorded row counts and the corresponding data from the cache set.

Specifically, the file storage operation starts once the data to be stored has been obtained. The name of the file that stores the difference data is formed from the previously determined key and the objn of the table; for example, if the key is 254 and the objn of the table is 123456, a file named 254_123456 is created. The data to be stored is then written into that file in the corresponding directory, and the flag indicating whether cached data has been stored, that is, the file storage flag, is set to true.

The file reading and comparing unit 204 is configured to read the files that store the difference data and perform the ordinary table comparison operation after the source-end data and the backup-end data have both been sent.

Specifically, after the source-end data and the backup-end data have both been sent, it is determined whether the current file storage flag is true, and if it is true, the files are read (that is, the file reading operation takes place only when the flag for the obj_key file is true). This is necessary because part of the cached data is genuine difference data found by comparison, while another part is data whose counterpart had not yet been transferred; the comparison therefore has to be performed after both the source end and the backup end have finished sending the table data.

Specifically, before the files are read, the existing data (the data sent from the source end and the backup end that remained unmatched after comparison and is still in the in-memory cache set) is stored in the files. The files are then read one at a time (each being an obj_key file, which stores the row data that was not successfully matched between the source end and the backup end): the data in the file corresponding to one key is compared, and the file corresponding to the next key is compared only after that comparison has finished. This is feasible because the probability that the file corresponding to a single key fills the memory is low. The content of the file for a key contains both data sent by the source end and data sent by the backup end; because each stored record carries a type, the source-end and backup-end data can be distinguished, and the ordinary table comparison operation is then performed on the data. Because the data is read from files, there is no problem of data delay. While the file corresponding to a certain key is being compared, if its difference data exceeds the preset threshold, the current comparison result is returned as an error and the remaining keys are not compared, which prevents the memory from filling up. So much difference data makes continuing the comparison pointless, so the table comparison ends and the table comparison operation has to be performed again.

Examples

In this embodiment of the present invention, the difference comparison of the database table is performed in the node program at the backup end. When data rows are compared in the node program at the backup end, a two-dimensional integer array is set up in that program to record the numbers of source-end and backup-end rows that have not been matched or have failed to match. Whether the transmitted row is source-end data or backup-end data, if no corresponding data matches it, that is, the MD5 values are not the same, the row count of the end the data came from is increased by 1.

When the database tables are compared, in addition to recording the row counts of the source end and the backup end, the unmatched data sent by the source end and by the backup end is stored in a two-dimensional hash array. Because there are 256 possible values for each row-data MD5, a key function is applied to the MD5 of the row data to obtain a key value, and the data that fails to match during table comparison is stored under the corresponding key in the array. Because the key is computed from the MD5 of the row, the slot chosen for each stored row is effectively random, and distinct valid rows do not have the same MD5, so the data is stored evenly.

After a batch of data has been compared, the numbers of cached rows of the source end and the backup end are added. If the sum is greater than a certain limit (the limit is a fixed value, namely the maximum number of rows that may be cached in memory), the data caching operation is performed; otherwise, comparison continues with the next batch of data.

Before the cached data is stored, the number of cached source-end rows is compared with the number of cached backup-end rows to determine which end has more rows, and the data of the end with more rows is stored.

When the number of rows that failed to match is too large, data is obtained from the cache set and stored in a file. Specifically, a key is first determined, and the data to be written out is then obtained from the cache set by starting from that key and increasing it by 256 on each traversal.

The file storage operation starts once the data to be stored has been obtained. The file name is formed from the previously obtained key and the objn of the table, and the cached data is then written to that file in the corresponding directory. The flag indicating whether cached data has been stored is set to true.

Because the corresponding row counts are deleted from memory when the data is stored in the file, if the cached data later exceeds the limit again, it is stored again following the same steps.

After the source-end data and the backup-end data have both been sent, it is determined whether the current file storage flag is true, and if it is true, the files are read.

Before the files are read, the existing data (the data sent from the source end and the backup end that remained unmatched after comparison and is still in the in-memory cache set) is stored in the files. The files are then read one at a time (each being an obj_key file, which stores the row data that was not successfully matched between the source end and the backup end): the data in the file corresponding to one key is compared, and the file corresponding to the next key is compared only after that comparison has finished. This is feasible because the probability that the file corresponding to a single key fills the memory is low. The content of the file for a key contains both data sent by the source end and data sent by the backup end; because each stored record carries a type, the source-end and backup-end data can be distinguished, and the ordinary table comparison operation is then performed on the data. Because the data is read from files, there is no problem of data delay. While the file corresponding to a certain key is being compared, if its difference data exceeds the preset threshold, the current comparison result is returned as an error and the remaining keys are not compared, which prevents the memory from filling up. So much difference data makes continuing the comparison pointless, so the table comparison ends and the table comparison operation has to be performed again.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims

1. A difference comparison method of database tables comprises the following steps:

2. The difference comparison method for database tables according to claim 1, wherein in step S1 the numbers of source-end and backup-end database table rows that have not been matched or have failed to match are recorded in a data row count array, the data row count array being a two-dimensional integer array.

3. The difference comparison method for database tables according to claim 2, wherein in step S1 the source-end and backup-end database table data that has not been matched or has failed to match is cached in a cache set that is a two-dimensional hash array.

4. The difference comparison method for database tables according to claim 3, wherein the two-dimensional hash array for caching the source-end and backup-end database table data that has not been matched or has failed to match is:

m_hash[source end or backup end of the current data][key value]

5. The difference comparison method for database tables according to claim 4, wherein in step S2 the recorded number of unmatched source-end rows and the recorded number of unmatched backup-end rows are added, and if the sum is greater than a preset threshold, it is determined that the caching operation is required.

6. The difference comparison method for database tables according to claim 5, wherein before the cached data is stored, the recorded source-end row count is compared with the recorded backup-end row count to determine which end has more rows, and the data of the end with more rows is stored.

7. The difference comparison method for database tables according to claim 6, wherein in step S2, when the data caching operation is performed, a key is determined, and the data to be stored is obtained from the cache set by increasing the key by 256 on each traversal and is stored in the same file.

8. The difference comparison method for database tables according to claim 7, wherein in step S3 a file is created whose name is formed from the determined key and the objn of the current table, the obtained data is written into the corresponding file, and the flag indicating whether cached data has been stored is set to true.

9. The difference comparison method for database tables according to claim 7, wherein in step S4, before the files are read, the unmatched data sent from the source end and the backup end that remains in the cache set is stored in the corresponding files; the files are then read one at a time, and the ordinary table comparison operation is performed on the data of each file as it is read.

10. A difference comparison apparatus for database tables, comprising: