CN113377778B - Method and device for comparing differences of database tables - Google Patents


Info

Publication number
CN113377778B
Authority
CN
China
Prior art keywords
data
file
backup
matched
source
Prior art date
Legal status
Active
Application number
CN202110747712.XA
Other languages
Chinese (zh)
Other versions
CN113377778A (en)
Inventor
王岚
高志会
陈勇铨
胡军擎
Current Assignee
Shanghai Information2 Software Inc
Original Assignee
Shanghai Information2 Software Inc
Priority date
Filing date
Publication date
Application filed by Shanghai Information2 Software Inc
Priority to CN202110747712.XA
Publication of CN113377778A
Application granted
Publication of CN113377778B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Storage Device Security (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for comparing differences between database tables. The method comprises the following steps: step S1, receiving the MD5-hashed database table data of the source end and the backup end, comparing the data rows, caching, according to the comparison result, the rows of the source-end and backup-end tables that have not been matched in a cache set, and recording the number of unmatched rows; step S2, after a batch of data has been compared, determining from the recorded row counts whether a data caching operation is needed and, if so, acquiring the data to be cached from the cache set; step S3, creating a file for storing the difference data, writing the acquired data into the corresponding file, and deleting the corresponding recorded row count and the corresponding data from the cache set; step S4, after the source end and the backup end have finished sending data, reading the files storing the difference data and performing an ordinary table comparison operation.

Description

Method and device for comparing differences of database tables
Technical Field
The invention relates to the technical field of computer data backup and disaster recovery, and in particular to a method and device for comparing differences between database tables with large data volumes.
Background
In computer data backup and disaster recovery, database tables frequently need to be compared. For example, before a backup is performed, the current tables at the backup source end and the backup destination end are compared to determine whether a backup is needed; after the backup, the tables at the two ends are compared again to confirm that the data are complete and accurate. Comparing the differences between database tables is therefore an important part of the disaster recovery backup process.
In current disaster recovery backup, database tables are generally compared as follows: each row in the source database and each row in the backup database is hashed with MD5, and the MD5 values are compared; if two values are the same, the two rows are considered identical.
Specifically, before a row is hashed, its data are processed column by column: each column of the row is converted and written into a contiguous byte space, the conversion results of all columns are combined, and MD5 is computed over the combined bytes. The source end processes its own rows, so the data it sends to the backup end are already MD5 values. When source-end rows arrive, the backup end reads its own database rows, hashes them with MD5, and compares them with the data sent by the source end. If the amount of difference data is especially large, or if reading at one end (which end cannot be known in advance) is especially slow, the comparison slows down and a large amount of not-yet-compared data accumulates. Because this data accumulates in memory, the memory may be exhausted and the process may crash.
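The per-row digest described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the column serialization (`str(value)` encoded as UTF-8, with a 4-byte length prefix and `-1` for NULL) and the name `row_digest` are hypothetical, since the text only says that the columns of a row are placed in a contiguous byte space before MD5 is applied. The length prefix is a choice made here so that adjacent columns cannot run together.

```python
import hashlib
import struct
from typing import Iterable, Optional


def row_digest(columns: Iterable[Optional[object]]) -> str:
    """Return the MD5 hex digest of one row (sketch; the serialization is assumed)."""
    buf = bytearray()  # contiguous byte space holding every column of the row
    for value in columns:
        if value is None:
            buf += struct.pack(">i", -1)  # length -1 marks SQL NULL (assumption)
            continue
        data = str(value).encode("utf-8")
        # Length prefix keeps ("ab", "c") and ("a", "bc") from producing the same bytes.
        buf += struct.pack(">i", len(data)) + data
    # MD5 is a digest, not encryption: equal digests are taken to mean equal rows.
    return hashlib.md5(bytes(buf)).hexdigest()


source_row = (1, "alice", None)
backup_row = (1, "alice", None)
print(row_digest(source_row) == row_digest(backup_row))  # True
```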
Disclosure of Invention
To overcome these defects of the prior art, the present invention provides a method and device for comparing differences between database tables. When the amount of data held in memory during a table comparison becomes too large, part of the data is temporarily stored in files; when the difference data become too large, the table comparison is stopped and the current result is returned. This avoids exhausting the memory.
To achieve the above object, the present invention provides a method for comparing differences between database tables, comprising the following steps:
step S1, receiving the MD5-hashed database table data of the source end and the backup end, comparing the data rows, caching, according to the comparison result, the rows of the source-end and backup-end tables that have not been matched in a cache set, and recording the number of unmatched rows at each end;
step S2, after a batch of data has been compared, determining from the recorded numbers of unmatched source-end and backup-end rows whether a data caching operation is needed and, if so, acquiring the data to be cached from the cache set so that they can be stored in a file for the difference data;
step S3, performing a file storage operation on the acquired data: creating a file for storing the difference data, writing the data into the corresponding file, and deleting the corresponding recorded row count and the corresponding data from the cache set;
step S4, after both the source end and the backup end have finished sending data, reading the files storing the difference data and performing an ordinary table comparison operation.
Preferably, in step S1, the numbers of unmatched rows of the source-end and backup-end tables are recorded in a data row array, which is a two-dimensional integer array.
Preferably, in step S1, the unmatched rows of the source-end and backup-end tables are cached in a cache set implemented as a two-dimensional hash array.
Preferably, the two-dimensional hash array that caches the unmatched rows of the source-end and backup-end tables is:
m_hash[current data end (source or backup)][key value]
where the key value is obtained by applying a preset key function to the MD5 value of the corresponding row.
Preferably, in step S2, the numbers of unmatched source-end and backup-end rows are added, and if the sum is greater than a preset threshold, it is determined that a caching operation is required.
Preferably, before the cached data are saved, the recorded source-end and backup-end row counts are compared to determine which end has more rows, and the data of that end are stored.
Preferably, in step S2, when the data caching operation is performed, a key is determined, and the data to be stored are obtained from the cache set by adding 256 to the key at each traversal step; all of these data are stored in the same file.
Preferably, in step S3, a file is created whose name is formed from the determined key and the objn of the current table, the acquired data are written into that file, and the flag indicating whether the cache has been stored is set to true.
Preferably, in step S4, before the files are read, the remaining unmatched data sent from the source end and the backup end that are still in the cache set are stored in the corresponding files; the files are then read, and an ordinary table comparison operation is performed on the data of each file as it is read.
To achieve the above object, the present invention further provides a device for comparing differences between database tables, comprising:
a data row comparison processing unit, configured to receive the MD5-hashed database table data of the source end and the backup end, compare the data rows, cache the unmatched rows of the source-end and backup-end tables in a cache set according to the comparison result, and record the number of unmatched rows at each end;
a data caching unit, configured to determine, after a batch of data has been compared, whether a data caching operation is needed according to the recorded numbers of unmatched source-end and backup-end rows, and, when it is needed, to acquire the data to be cached from the cache set so that they can be stored in a file;
a file storage unit, configured to perform a file storage operation on the acquired data: create a file for storing the difference data, write the data into the corresponding file, and delete the corresponding recorded row count and the corresponding data from the cache set;
and a file reading and comparing unit, configured to read the files storing the difference data and perform an ordinary table comparison operation after both the source end and the backup end have finished sending data.
Compared with the prior art, the method and device of the present invention temporarily store part of the data in files when too much data is held in memory during a table comparison, and stop the comparison and return the current result when the difference data become too large, thereby preventing the memory from being exhausted.
Drawings
FIG. 1 is a flowchart illustrating the steps of a method for comparing differences in database tables according to the present invention;
FIG. 2 is a system architecture diagram of a device for comparing differences between database tables according to the present invention.
Detailed Description
Embodiments of the present invention are described below with reference to the accompanying drawings, from which those skilled in the art can readily understand other advantages and effects of the invention. The invention may also be implemented or applied through other, different embodiments, and the details in this specification may be modified or changed in various respects without departing from the spirit and scope of the present invention.
FIG. 1 is a flowchart illustrating the steps of a method for comparing differences between database tables according to the present invention. As shown in FIG. 1, the method includes the following steps:
Step S1: receive the MD5-hashed database table data of the source end and the backup end, compare the data rows, cache the unmatched rows of the source-end and backup-end tables in a cache set according to the comparison result, and record the number of unmatched rows at each end.
In the present invention, the table comparison runs at the backup end. First, a data row array is initialized to record the numbers of unmatched rows of the source-end and backup-end tables. This array is a two-dimensional integer array with one entry for the source end and one for the backup end. Both the source-end and backup-end data are MD5-hashed, so whenever an incoming row, whether it comes from the source end or the backup end, has no matching counterpart (that is, no row with the same MD5 value), the entry for that end is incremented by 1: an unmatched source-end row increments the source-end entry, and an unmatched backup-end row increments the backup-end entry.
During the comparison, besides recording the row counts of the source end and the backup end, the unmatched rows of the two tables are cached. Specifically, the cache set is a two-dimensional hash array in which the unmatched rows are stored. Because the row MD5 values are spread over 256 groups, in this embodiment a preset key function is applied to the MD5 value of each row to obtain a key value, and each row that fails to match during the comparison is stored under its key in the array, which has the form:
m_hash[current data end (source or backup)][key value]
For example, if the key function yields the key value 254 for a row's MD5 value, the row is stored in m_hash[0][254]; if it yields 762, the row is stored in m_hash[0][762]. Because the key is computed from the row's MD5 value, keys are effectively random and distinct rows rarely share one, so the data are distributed evenly across the array.
In other words, assume that index 0 of the two-dimensional hash array corresponds to the source end and index 1 to the backup end. Data sent by the source end are compared with the data in m_hash[1][x]; if matching fails, the source-end entry of the two-dimensional integer array of row counts is incremented by 1 and the unmatched source-end row is stored in m_hash[0][x]. Conversely, data sent by the backup end are compared with m_hash[0][x]; if matching fails, the backup-end entry is incremented by 1 and the unmatched backup-end row is stored in m_hash[1][x].
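A minimal sketch of the step S1 bookkeeping, assuming Python and hypothetical names (`RowMatcher`, `key_of`, `NUM_KEYS`). The real key function is not disclosed; here it simply reduces the first 8 hex digits of the MD5 value modulo an assumed slot count of 1024. Decrementing the other end's counter when a cached row finds its partner is also an assumption, made so that the counters track rows that are still unmatched.

```python
import hashlib
from typing import Dict, List

SOURCE, BACKUP = 0, 1   # first index of m_hash: 0 = source end, 1 = backup end
NUM_KEYS = 1024         # hypothetical number of key slots (a multiple of 256)


def key_of(md5_hex: str) -> int:
    """Hypothetical key function: map a row's MD5 hex digest to a slot index."""
    return int(md5_hex[:8], 16) % NUM_KEYS


class RowMatcher:
    def __init__(self) -> None:
        # m_hash[end][key] -> {md5: [cached rows of that end still waiting]}
        self.m_hash: List[List[Dict[str, List[tuple]]]] = [
            [{} for _ in range(NUM_KEYS)] for _ in range(2)
        ]
        # unmatched[end]: the two-entry integer array of unmatched row counts
        self.unmatched = [0, 0]

    def feed(self, side: int, md5_hex: str, row: tuple) -> bool:
        """Compare one incoming row with the other end; return True on a match."""
        key = key_of(md5_hex)
        other = 1 - side
        waiting = self.m_hash[other][key].get(md5_hex)
        if waiting:
            waiting.pop()                      # pair with one cached row
            if not waiting:
                del self.m_hash[other][key][md5_hex]
            self.unmatched[other] -= 1         # assumption: partner is no longer unmatched
            return True
        # No partner yet: cache under this end's own index and count it.
        self.m_hash[side][key].setdefault(md5_hex, []).append(row)
        self.unmatched[side] += 1
        return False


matcher = RowMatcher()
digest = hashlib.md5(b"row-1").hexdigest()
print(matcher.feed(SOURCE, digest, ("row-1",)), matcher.unmatched)  # False [1, 0]
print(matcher.feed(BACKUP, digest, ("row-1",)), matcher.unmatched)  # True [0, 0]
```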
Step S2: after a batch of data has been compared, determine from the recorded numbers of unmatched source-end and backup-end rows whether a data caching operation is needed and, if so, acquire the data to be cached from the cache set so that they can be stored in a file.
Specifically, after a batch of data has been compared, the recorded numbers of unmatched source-end and backup-end rows are added. If the sum is greater than a preset threshold (a fixed value, for example the maximum number of rows that may be cached in memory), a caching operation is needed and the data held in the two-dimensional hash array must be saved to a file; otherwise, comparison continues with the next batch.
Preferably, before the cached data are stored, the recorded source-end and backup-end row counts are compared to determine which end has more rows, and the data of that end are stored.
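The threshold test and the choice of which end to store can be sketched as a single function. The function name, the threshold value in the example calls, and the tie-breaking rule (source end on a tie) are assumptions; the text specifies only that the two counts are summed, compared with a fixed threshold, and that the end with more rows is stored.

```python
from typing import Optional, Sequence

SOURCE, BACKUP = 0, 1


def side_to_spill(unmatched: Sequence[int], threshold: int) -> Optional[int]:
    """Return the end whose cached rows should go to files, or None to keep going.

    A spill is needed only when the total number of unmatched rows exceeds the
    threshold; the end with more unmatched rows is chosen (source end on a tie,
    an arbitrary rule chosen for this sketch).
    """
    if unmatched[SOURCE] + unmatched[BACKUP] <= threshold:
        return None  # continue comparing the next batch in memory
    return SOURCE if unmatched[SOURCE] >= unmatched[BACKUP] else BACKUP


print(side_to_spill([300, 200], threshold=1000))   # None
print(side_to_spill([900, 400], threshold=1000))   # 0
print(side_to_spill([100, 1200], threshold=1000))  # 1
```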
When too many rows have failed to match, data are taken from the cache set and stored in a file. Specifically, a key is first determined for this storage pass (for example, with the macro ((idx) & (256-1))). If the key is 11, the data to be stored are taken from m_hash[x][11], m_hash[x][11+256], m_hash[x][11+256*2], m_hash[x][11+256*3], and so on; that is, starting from the key (which the key function derives from the row's MD5 value), the traversal of the cache set advances by 256 at each step.
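The key-stride traversal can be illustrated as follows, assuming the hash array of one end holds a multiple of 256 slots (1024 here, a hypothetical size) and that each slot is a dictionary from MD5 value to cached rows. `file_key` mirrors the macro `((idx) & (256-1))` quoted above; `slots_for_file_key` and `collect_for_file` are hypothetical names.

```python
from typing import Dict, List

NUM_KEYS = 1024     # hypothetical number of slots in one end's hash array
FILE_BUCKETS = 256  # number of distinct file keys


def file_key(idx: int) -> int:
    """Python equivalent of the macro ((idx) & (256-1))."""
    return idx & (FILE_BUCKETS - 1)


def slots_for_file_key(key: int, num_keys: int = NUM_KEYS) -> List[int]:
    """Slots visited for one file key: key, key+256, key+256*2, ..."""
    return list(range(key, num_keys, FILE_BUCKETS))


def collect_for_file(m_hash_side: List[Dict[str, List[tuple]]], key: int) -> List[tuple]:
    """Gather every cached row of one end whose slot maps to the given file key."""
    rows: List[tuple] = []
    for slot in slots_for_file_key(key, len(m_hash_side)):
        for cached in m_hash_side[slot].values():
            rows.extend(cached)
    return rows


print(file_key(11 + 256 * 3))   # 11
print(slots_for_file_key(11))   # [11, 267, 523, 779]
```

Because every slot visited satisfies `slot & 255 == key`, all rows gathered in one pass belong in the same file, which is what allows one file per key.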
Step S3: perform a file storage operation on the acquired data: create a file for storing the difference data, write the data into the corresponding file, and delete the corresponding recorded row count and the corresponding data from the cache set.
Specifically, once the data to be stored have been acquired, the file storage operation begins. The name of the file storing the difference data is formed from the previously determined key and the objn of the table; for example, with key 254 and a table objn of 123456 (assumed), a file named 254_123456 is created. The data are then written into this file in the corresponding directory, and the file storage flag, which indicates whether cached data have been stored in files, is set to true.
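A sketch of step S3, assuming the cached rows are written as JSON lines (the on-disk format is not specified) and that each record keeps its end so that source-end and backup-end rows can be told apart when the file is read back. `DiffSpiller` and its methods are hypothetical names; the `<key>_<objn>` file name and the true/false storage flag follow the text. The file is opened in append mode because several storage passes may target the same key.

```python
import json
import os
import tempfile
from typing import Dict, List

FILE_BUCKETS = 256


class DiffSpiller:
    """Hypothetical helper that moves one end's cached rows for a key into a file."""

    def __init__(self, directory: str, objn: int) -> None:
        self.directory = directory
        self.objn = objn          # object number of the table being compared
        self.file_saved = False   # flag: has cached data been stored in files?

    def path_for(self, key: int) -> str:
        # "<key>_<objn>", e.g. key 254 and objn 123456 give "254_123456".
        return os.path.join(self.directory, f"{key}_{self.objn}")

    def spill(self, side: int, m_hash_side: List[Dict[str, List[list]]],
              unmatched: List[int], key: int) -> int:
        """Append rows from slots key, key+256, ... to the key's file; return the count."""
        written = 0
        with open(self.path_for(key), "a", encoding="utf-8") as f:
            for slot in range(key, len(m_hash_side), FILE_BUCKETS):
                for md5_hex, rows in m_hash_side[slot].items():
                    for row in rows:
                        # Keep the end so source and backup rows can be told apart later.
                        f.write(json.dumps({"side": side, "md5": md5_hex, "row": row}) + "\n")
                        written += 1
                m_hash_side[slot].clear()   # delete the cached data
        unmatched[side] -= written          # delete the recorded row count
        self.file_saved = True
        return written


with tempfile.TemporaryDirectory() as d:
    spiller = DiffSpiller(d, 123456)
    cache: List[Dict[str, List[list]]] = [{} for _ in range(512)]
    cache[254]["abc"] = [["r1"]]
    cache[510]["def"] = [["r2"]]   # 510 = 254 + 256, same file key
    counts = [2, 0]
    print(spiller.spill(0, cache, counts, 254), counts)  # 2 [0, 0]
    print(os.path.basename(spiller.path_for(254)))       # 254_123456
```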
Step S4: after both the source end and the backup end have finished sending data, read the files storing the difference data and perform an ordinary table comparison operation.
Specifically, after both ends have finished sending data, it is checked whether the file storage flag is true; if it is, obj_key files exist and are read. This is necessary because some of the cached data are genuine differences found by the comparison, while others are rows whose counterparts had not yet been sent, so the comparison can only be completed after both the source end and the backup end have sent all of the table data.
Before the files are read, the data still held in memory (rows sent from the source end and the backup end that remain unmatched in the cache set) are stored in the files. The files are then read one at a time: each obj_key file holds the rows for one key that have not been matched between the source end and the backup end, and the data in the file for one key are compared completely before the file for the next key is compared. This works because the data in the file for a single key are unlikely to fill the memory. Specifically, the content of a key file is read; it contains data sent by both the source end and the backup end, and since each stored record carries its type, source-end and backup-end data can be distinguished. An ordinary table comparison is then performed on these data. Because the data are read from files, there is no delay waiting for either end to send data. If, while the file for a given key is being compared, its difference data exceed the preset threshold, the current comparison result is treated as erroneous and the remaining keys are not compared. This keeps the memory from filling up; and since that much difference data means further comparison is pointless, the table comparison ends and the current result is returned.
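Step S4's per-key comparison with early termination might look like the following sketch. It assumes a JSON-lines file layout with `side`, `md5` and `row` fields (a hypothetical format) and treats rows with equal MD5 values on opposite ends as matched pairs; `compare_key_file` and `compare_files` are illustrative names, and applying the threshold to each key file follows the description above.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Iterable, List, Tuple


def compare_key_file(path: str) -> List[dict]:
    """Ordinary table comparison on one key file; return records left unmatched."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    source_left = Counter(r["md5"] for r in records if r["side"] == 0)
    backup_left = Counter(r["md5"] for r in records if r["side"] == 1)
    common = source_left & backup_left   # MD5 values present on both ends (paired)
    source_left -= common
    backup_left -= common
    diffs = []
    for r in records:
        pool = source_left if r["side"] == 0 else backup_left
        if pool[r["md5"]] > 0:           # this record has no partner on the other end
            pool[r["md5"]] -= 1
            diffs.append(r)
    return diffs


def compare_files(paths: Iterable[str], threshold: int) -> Tuple[bool, List[dict]]:
    """Compare key files one at a time; stop early if one key has too many differences."""
    all_diffs: List[dict] = []
    for path in paths:
        diffs = compare_key_file(path)
        all_diffs.extend(diffs)
        if len(diffs) > threshold:
            return False, all_diffs   # too many differences: end and return current result
    return True, all_diffs


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "11_123456")
    with open(p, "w", encoding="utf-8") as f:
        for rec in ({"side": 0, "md5": "aa", "row": [1]},
                    {"side": 1, "md5": "aa", "row": [1]},
                    {"side": 0, "md5": "bb", "row": [2]}):
            f.write(json.dumps(rec) + "\n")
    ok, diffs = compare_files([p], threshold=10)
    print(ok, [r["row"] for r in diffs])  # True [[2]]
```

Only one key file is held in memory at a time, which is the property the description relies on to keep memory bounded.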
FIG. 2 is a system architecture diagram of a device for comparing differences between database tables according to the present invention. As shown in FIG. 2, the device includes:
A data row comparison processing unit 201, configured to receive the MD5-hashed database table data of the source end and the backup end, compare the data rows, cache the unmatched rows of the source-end and backup-end tables in the cache set according to the comparison result, and record the number of unmatched rows at each end.
In the present invention, the table comparison runs at the backup end. First, a data row array is initialized to record the numbers of unmatched rows of the source-end and backup-end tables. This array is a two-dimensional integer array with one entry for the source end and one for the backup end. Both the source-end and backup-end data are MD5-hashed, so whenever an incoming row, whether it comes from the source end or the backup end, has no matching counterpart (that is, no row with the same MD5 value), the entry for that end is incremented by 1: an unmatched source-end row increments the source-end entry, and an unmatched backup-end row increments the backup-end entry.
During the comparison, besides recording the row counts of the source end and the backup end, the unmatched rows of the two tables must be cached. Specifically, the cache set is a two-dimensional hash array in which the unmatched rows are stored. Because the row MD5 values are spread over 256 groups, in this embodiment a preset key function is applied to the MD5 value of each row to obtain a key value, and each row that fails to match during the comparison is stored under its key in the array, which has the form:
m_hash[current data end (source or backup)][key value]
For example, if the key function yields the key value 254 for a row's MD5 value, the row is stored in m_hash[0][254]; if it yields 762, the row is stored in m_hash[0][762]. Because the key is computed from the row's MD5 value, keys are effectively random and distinct rows rarely share one, so the data are distributed evenly across the array.
In other words, assume that index 0 of the two-dimensional hash array corresponds to the source end and index 1 to the backup end. Data sent by the source end are compared with the data in m_hash[1][x]; if matching fails, the source-end entry of the two-dimensional integer array of row counts is incremented by 1 and the unmatched source-end row is stored in m_hash[0][x]. Conversely, data sent by the backup end are compared with m_hash[0][x]; if matching fails, the backup-end entry is incremented by 1 and the unmatched backup-end row is stored in m_hash[1][x].
A data caching unit 202, configured to determine, after a batch of data has been compared, whether a data caching operation is needed according to the recorded numbers of unmatched source-end and backup-end rows, and, when it is needed, to acquire the data to be cached from the cache set so that they can be stored in a file.
Specifically, after a batch of data has been compared, the numbers of unmatched source-end and backup-end rows are added. If the sum is greater than a preset threshold (a fixed value, for example the maximum number of rows that may be cached in memory), a caching operation is needed and the data held in the two-dimensional hash array must be saved to a file; otherwise, comparison continues with the next batch.
Preferably, before the cached data are stored, the recorded source-end and backup-end row counts are compared to determine which end has more rows, and the data of that end are stored.
When too many rows have failed to match, data are taken from the cache set and stored in a file. Specifically, the key of the file is determined (for example, with the macro ((idx) & (256-1))). If the key is 11 this time, the data to be stored are taken from m_hash[x][11], m_hash[x][11+256], m_hash[x][11+256*2], m_hash[x][11+256*3], and so on; that is, the traversal of the cache set starts at the key and advances by 256 at each step.
A file storage unit 203, configured to perform a file storage operation on the acquired data: create a file for storing the difference data, write the data into the corresponding file, and delete the corresponding recorded row count and the corresponding data from the cache set.
Specifically, once the data to be stored have been acquired, the file storage operation begins. The name of the file storing the difference data is formed from the previously determined key and the objn of the table; for example, with key 254 and a table objn of 123456 (assumed), a file named 254_123456 is created. The data are then written into this file in the corresponding directory, and the flag indicating whether the cache has been stored, that is, the file storage flag, is set to true.
A file reading and comparing unit 204, configured to read the files storing the difference data after both the source end and the backup end have finished sending data, and to perform an ordinary table comparison operation.
Specifically, after both ends have finished sending data, it is checked whether the current file storage flag is true; if it is, obj_key files exist and a file reading operation is performed. This is necessary because some of the cached data are genuine differences found by the comparison, while others are rows whose counterparts had not yet been transferred, so the comparison can only be completed after both the source end and the backup end have sent all of the table data.
Before the files are read, the data still held in memory (rows sent from the source end and the backup end that remain unmatched in the cache set) are stored in the files. The files are then read one at a time: each obj_key file holds the rows for one key that have not been matched between the source end and the backup end, and the data in the file for one key are compared completely before the file for the next key is compared. This works because the data in the file for a single key are unlikely to fill the memory. Specifically, the content of a key file is read; it contains data sent by both the source end and the backup end, and since each stored record carries its type, source-end and backup-end data can be distinguished. An ordinary table comparison is then performed on these data. Because the data are read from files, there is no delay waiting for either end to send data. If, while the file for a given key is being compared, its difference data exceed the preset threshold, the current comparison result is treated as erroneous and the remaining keys are not compared. This keeps the memory from filling up; and since that much difference data means further comparison is pointless, the table comparison ends and the current result is returned.
Examples
In this embodiment of the present invention, the difference comparison of the database tables is performed in the node program at the standby end. When data rows are compared in that node program, a two-dimensional integer array is set up to record the number of unmatched data rows at the source end and at the standby end. When a transmitted row, whether source data or standby data, has no matching counterpart (that is, no row with the same MD5), the unmatched row count for the end that sent it is increased by 1.
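A minimal sketch of this matching and counting step follows. It assumes rows arrive as raw bytes, that the second dimension of the two-dimensional integer array is the hash key, and a hypothetical key function taking the first byte of the MD5 digest; the text fixes none of these choices, and `RowMatcher` is an illustrative name.

```python
import hashlib
from typing import Dict, List

SOURCE, STANDBY = 0, 1
NUM_KEYS = 256


def row_key(digest: bytes) -> int:
    # Hypothetical key function: first byte of the MD5 digest (0..255).
    return digest[0]


class RowMatcher:
    """Counts rows from each end that have no counterpart with the same MD5."""

    def __init__(self) -> None:
        # cache[end][key]: digest -> number of unmatched rows with that digest
        self.cache: List[List[Dict[bytes, int]]] = [
            [{} for _ in range(NUM_KEYS)] for _ in range(2)
        ]
        # counts[end][key]: the two-dimensional integer array of unmatched rows
        self.counts: List[List[int]] = [[0] * NUM_KEYS for _ in range(2)]

    def add_row(self, end: int, row: bytes) -> None:
        digest = hashlib.md5(row).digest()
        key = row_key(digest)
        other = 1 - end
        bucket = self.cache[other][key]
        if bucket.get(digest, 0) > 0:
            # Same MD5 already waiting on the other end: the pair matches.
            bucket[digest] -= 1
            if bucket[digest] == 0:
                del bucket[digest]
            self.counts[other][key] -= 1
        else:
            # No counterpart yet: cache the row and add 1 to this end's count.
            mine = self.cache[end][key]
            mine[digest] = mine.get(digest, 0) + 1
            self.counts[end][key] += 1

    def unmatched(self, end: int) -> int:
        return sum(self.counts[end])
```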
During table comparison, the unmatched row counts of the source end and the standby end are recorded, and the unmatched data sent by both ends is stored in a two-dimensional hash array. A key function is applied to the MD5 of each row to obtain a key value, of which there are 256 possible values, and the rows left unmatched during table comparison are stored in the corresponding array slot. Because the key is computed from the row's MD5, the slot each row lands in is effectively random, so the data is distributed evenly across the slots.
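The sketch below illustrates the even spread across `m_hash[end][key]`. The key function used here (the first byte of the MD5 digest, giving 256 values) is a hypothetical stand-in, since the actual key function is not disclosed.

```python
import hashlib
from typing import List

NUM_KEYS = 256


def row_key(row: bytes) -> int:
    # Hypothetical key function: first byte of the row's MD5 digest.
    return hashlib.md5(row).digest()[0]


def build_m_hash(rows_by_end: List[List[bytes]]) -> List[List[List[bytes]]]:
    """Place each end's rows into m_hash[end][key]."""
    m_hash: List[List[List[bytes]]] = [[[] for _ in range(NUM_KEYS)] for _ in rows_by_end]
    for end, rows in enumerate(rows_by_end):
        for row in rows:
            m_hash[end][row_key(row)].append(row)
    return m_hash


rows = [f"row-{i}".encode() for i in range(25600)]
m_hash = build_m_hash([rows, []])
sizes = [len(slot) for slot in m_hash[0]]
print(min(sizes), max(sizes))  # both should be near 100 rows per slot
```

With 25,600 distinct rows, each of the 256 slots should receive roughly 100 rows, which is the even distribution the paragraph relies on.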
After a batch of data has been compared, the numbers of cached rows at the source end and the standby end are added together. If the sum exceeds a limit (a fixed value equal to the maximum number of rows that may be cached in memory), the data caching operation is performed; otherwise, comparison continues with the next batch of data.
Before the cached data is stored, the number of cached source-end rows is compared with the number of cached backup-end rows to determine which end has more rows, and the data of that end is stored.
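The batch-end check and the choice of which end to store can be sketched as below, assuming `counts[end][key]` holds unmatched row counts with index 0 for the source end and 1 for the backup end. The tie-breaking rule is an assumption; the text does not say which end is stored when the counts are equal.

```python
from typing import List


def total_cached(counts: List[List[int]]) -> int:
    """Unmatched rows cached for the source (index 0) plus the backup (index 1) end."""
    return sum(sum(per_key) for per_key in counts)


def should_spill(counts: List[List[int]], limit: int) -> bool:
    # Spill only when the combined cache exceeds the fixed in-memory row limit;
    # otherwise the caller moves on to the next batch.
    return total_cached(counts) > limit


def end_to_spill(counts: List[List[int]]) -> int:
    """Pick the end with more cached rows (ties go to the source, an assumption)."""
    return 0 if sum(counts[0]) >= sum(counts[1]) else 1
```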
When the number of rows that failed to match becomes too large, data is taken from the cache set and stored in a file. Specifically, a key is first determined, and the cache set is then traversed from that key, with the key increased by 256 at each step, to obtain the data to be stored.
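The traversal rule admits more than one reading. One reading, sketched here purely as an assumption, is that each end's hash array has more than 256 slots (1024 in the example) and that the file for a key gathers slots key, key + 256, key + 512, and so on.

```python
from typing import List

STEP = 256  # key increment per traversal step, as stated in the text


def collect_for_key(m_hash_end: List[List[str]], key: int) -> List[str]:
    """Gather the cached rows of one end from slots key, key+256, key+512, ..."""
    rows: List[str] = []
    for slot in range(key, len(m_hash_end), STEP):
        rows.extend(m_hash_end[slot])
    return rows
```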
Once the data to be stored has been acquired, the file storage operation starts: a file name is built from the previously determined key and the obj n of the table, and the cached data is written to that file in the corresponding directory. The flag indicating whether cached data has been stored to file is then set to true.
Because the stored rows and their row counts are deleted from memory once they are written to the file, whenever the cache exceeds the limit again later, the data is stored again following the same steps.
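The storage step, including setting the flag and freeing memory, might look like the following sketch. The file name format `{table_obj}_{key}`, the one-line-per-row text format with a leading end tag, and the parameter `table_obj` (standing in for the table's obj n) are hypothetical, as is the step-256 slot traversal carried over from one reading of the traversal rule.

```python
import os
from typing import List

STEP = 256


class SpillState:
    def __init__(self) -> None:
        self.file_stored = False  # flag: has any cached data been written to file?


def spill_key(
    m_hash: List[List[List[str]]],
    counts: List[List[int]],
    end: int,
    key: int,
    table_obj: str,
    directory: str,
    state: SpillState,
) -> int:
    """Append one end's rows for `key` to the table's key file and free them from memory."""
    # Hypothetical naming scheme combining the table identifier and the key.
    path = os.path.join(directory, f"{table_obj}_{key}")
    written = 0
    with open(path, "a", encoding="ascii") as f:
        for slot in range(key, len(m_hash[end]), STEP):
            rows = m_hash[end][slot]
            # Tag each row with its end so the reader can tell source from standby.
            f.writelines(f"{end} {digest}\n" for digest in rows)
            written += len(rows)
            counts[end][slot] -= len(rows)  # drop the recorded row count
            rows.clear()                    # drop the cached rows
    state.file_stored = True
    return written
```

Opening the file in append mode lets a later spill for the same key add to the rows already on disk instead of overwriting them.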
After both the source-end data and the backup-end data have been sent, it is determined whether the current file storage flag is true; if it is, the files are read.
Before the files are read, the existing data (the data sent from the source and standby ends that remained unmatched after comparison in the memory cache set) is stored to the files. The files are then read one at a time, each being an obj_key file holding the rows that were not successfully matched between the source end and the standby end: the data in the file for one key is compared, and only when that comparison is finished is the file for the next key compared. This is done because the file for a single key is unlikely to fill memory. Specifically, the content of the key file is read; it holds data sent by both the source end and the standby end, and since every stored row carries its type, rows from the two ends can be told apart, after which the ordinary table comparison operation is performed on them. Because the files are read only after all data has been sent, rows that were merely delayed in transit are not reported as differences. If, while the file for a certain key is being compared, its difference data exceeds a preset threshold, the current comparison result is treated as erroneous and the remaining keys are not compared, which avoids filling memory; with that many differences the result is meaningless, so the table comparison ends and the table comparison operation must be run again.
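The flush-then-read order of this example can be sketched as follows, assuming 256 slots per end, a hypothetical `{table_obj}_{key}` file naming, and a text format in which each line is an end tag (0 for source, 1 for standby) followed by the row's digest; these formats are illustrative, not taken from the text.

```python
import os
from typing import Iterator, List, Tuple


def flush_remaining(m_hash: List[List[List[str]]], table_obj: str, directory: str) -> None:
    """Append every still-cached unmatched row to its key file before reading."""
    for end, slots in enumerate(m_hash):
        for key, digests in enumerate(slots):
            if not digests:
                continue
            path = os.path.join(directory, f"{table_obj}_{key}")
            with open(path, "a", encoding="ascii") as f:
                f.writelines(f"{end} {d}\n" for d in digests)
            digests.clear()


def iter_key_files(
    table_obj: str, directory: str, num_keys: int = 256
) -> Iterator[Tuple[int, List[Tuple[int, str]]]]:
    """Yield (key, [(end, digest), ...]) for one key file at a time."""
    for key in range(num_keys):
        path = os.path.join(directory, f"{table_obj}_{key}")
        if not os.path.exists(path):
            continue
        with open(path, encoding="ascii") as f:
            records = [(int(end), digest) for end, digest in (line.split() for line in f)]
        yield key, records
```

Each yielded record list can then be compared on its own, so only one key's rows are in memory at a time.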
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims (10)

1. A method for comparing differences between database tables, comprising the following steps:
step S1: receiving the encrypted database table data of the source end and the backup end, comparing the data rows, caching, according to the comparison result, the unmatched data of the source-end and backup-end database tables in a cache set, and recording the number of unmatched data rows of the source-end and backup-end database tables;
step S2: after a batch of data has been compared, determining, according to the recorded numbers of unmatched data rows at the source end and the backup end, whether to perform a data caching operation, and, when the data caching operation is required, acquiring the data to be cached from the cache set so as to store it in a file for storing difference data;
step S3: performing a file storage operation on the acquired data to be cached, namely creating a file for storing the difference data, writing the data into the corresponding file, and deleting the corresponding recorded row count and the corresponding data from the cache set;
step S4: after both the source data and the backup data have been sent, reading the files storing the difference data and executing the ordinary table comparison operation.
2. The method for comparing differences between database tables according to claim 1, wherein in step S1, the numbers of unmatched data rows of the source-end and backup-end database tables are recorded in a data row array, the data row array being a two-dimensional integer array.
3. The method for comparing differences between database tables according to claim 2, wherein in step S1, a cache set in the form of a two-dimensional hash array is used to cache the unmatched data of the source-end and backup-end database tables.
4. The method for comparing differences between database tables according to claim 3, wherein the two-dimensional hash array for caching the unmatched data of the source-end and backup-end database tables is:
m_hash[source end or standby end of the current data][key value]
wherein the key value is obtained by processing the MD5 of the corresponding row data with a preset key function.
5. The method for comparing differences between database tables according to claim 4, wherein in step S2, the recorded numbers of unmatched data rows at the source end and the backup end are added, and if the sum is greater than a predetermined threshold, it is determined that the caching operation is required.
6. The method for comparing differences between database tables according to claim 5, wherein, before the cached data is stored, the number of rows recorded for the source end is compared with the number of rows recorded for the backup end to determine which end has more rows, and the data of that end is stored.
7. The method for comparing differences between database tables according to claim 6, wherein in step S2, when the data caching operation is performed, a key is determined, and the data to be stored is obtained from the cache set by a traversal in which the key is increased by 256 at each step, and is stored in the same file.
8. The method for comparing differences between database tables according to claim 7, wherein in step S3, a file is created with the determined key and the obj n of the current table as the file name, the acquired data is written into the corresponding file, and the flag indicating whether the cache has been stored is set to true.
9. The method for comparing differences between database tables according to claim 7, wherein in step S4, before the files are read, the remaining unmatched data sent from the source and backup ends in the cache set is stored in the corresponding files; the files are then read one at a time, and the ordinary table comparison operation is performed on the data in each.
10. An apparatus for comparing differences between database tables, comprising:
a data row comparison processing unit, configured to receive the encrypted database table data of the source end and the backup end, compare the data rows, cache, according to the comparison result, the unmatched data of the source-end and backup-end database tables in a cache set, and record the number of unmatched data rows of the source-end and backup-end database tables;
a data caching unit, configured to determine, after a batch of data has been compared, whether to perform a data caching operation according to the numbers of unmatched data rows at the source end and the standby end, and, when the data caching operation is required, to acquire the data to be cached from the cache set so as to store it in a file;
a file storage unit, configured to perform a file storage operation on the acquired data to be cached, create a file for storing the difference data, write the data into the corresponding file, and delete the corresponding recorded row count and the corresponding data from the cache set;
and a file reading and comparison unit, configured to read the files storing the difference data and execute the ordinary table comparison operation after both the source-end data and the backup-end data have been sent.
CN202110747712.XA 2021-07-01 2021-07-01 Method and device for comparing differences of database tables Active CN113377778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110747712.XA CN113377778B (en) 2021-07-01 2021-07-01 Method and device for comparing differences of database tables

Publications (2)

Publication Number Publication Date
CN113377778A CN113377778A (en) 2021-09-10
CN113377778B true CN113377778B (en) 2022-04-22

Family

ID=77580782

Country Status (1)

Country Link
CN (1) CN113377778B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant