CN107798007B - Distributed database data verification method, device and related device - Google Patents

Distributed database data verification method, device and related device

Publication number
CN107798007B
Authority
CN
China
Prior art keywords
data
changed
database
specified
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610794307.2A
Other languages
Chinese (zh)
Other versions
CN107798007A (en)
Inventor
郭龙波
丁岩
徐宜良
张宗禹
林周凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinzhuan Xinke Co Ltd
Original Assignee
Jinzhuan Xinke Co Ltd
Application filed by Jinzhuan Xinke Co Ltd
Priority to CN201610794307.2A
Publication of CN107798007A
Application granted
Publication of CN107798007B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device and a related device for checking online data of a distributed database. The consistency of data to be changed before and after the change is determined by comparing whether the check values of specified row data in the data to be changed are consistent before and after import, which effectively solves the problem in the prior art that a distributed database cannot determine the consistency of data before and after a change.

Description

Distributed database data verification method, device and related device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for checking distributed database data, and a related apparatus.
Background
With the wide application of database technology and the continuous accumulation of online service data, and especially with the rapid development of internet services, data volumes keep growing and the performance of a single database has become a bottleneck for online services. A distributed database can provide high-performance, large-capacity and high-concurrency database services, so distributed databases have been rapidly adopted in a variety of online service scenarios.
However, when an existing distributed database performs data migration or data initialization, the consistency of the data before and after the change cannot be determined, which limits the application range of the distributed database.
Disclosure of Invention
The invention provides a method, a device and a related device for checking data of a distributed database, which are used for solving the problem that the distributed database in the prior art cannot determine the consistency of the data before and after the data is changed.
In one aspect, the invention provides a method for checking data of a distributed database, which comprises the following steps:
exporting data to be changed into a data description text, and calculating a check value of specified row data in the data to be changed according to the exported data description text;
splitting the data to be changed by row, and importing the split data to be changed into the corresponding database nodes;
after the data import is completed, calculating the check value of the specified row data in the imported data to be changed, comparing whether the check values of the specified row data in the data to be changed are consistent before and after the import, and if so, determining that the data to be changed is consistent before and after the change.
Further, calculating the check value of the specified row data in the data to be changed specifically includes:
calculating the check value of a specified single row of data in the data to be changed, or calculating the sum of the check values of one or more specified groups of N consecutive rows of data in the data to be changed.
Further, when the specified row data is a single row, calculating the check value of the specified row data in the imported data to be changed specifically includes: calculating the check value of the specified single row of data in the imported data to be changed; and comparing whether the check values of the specified row data in the data to be changed are consistent specifically includes: comparing the check values of the specified single row of data in the data to be changed before and after the import;
when the specified row data is one or more groups of N consecutive rows, calculating the check value of the specified row data in the imported data to be changed specifically includes: calculating the sum of the check values of the specified one or more groups of N consecutive rows of data in the imported data to be changed; and comparing whether the check values of the specified row data in the data to be changed are consistent specifically includes: comparing the sums of the check values of the specified one or more groups of N consecutive rows of data in the data to be changed before and after the import.
Further, after splitting the data to be changed by row and before importing the split data to be changed into the corresponding database nodes, the method further includes:
acquiring, according to a distributed distribution rule, the database node in which each piece of split data to be changed is to be stored.
Further, importing the split data to be changed into the corresponding database nodes specifically includes:
writing the split data to be changed into the file cache of the corresponding database node, notifying database cluster management of the number of completed files and the file name list, and triggering, through database cluster management, a database agent to download the data to be changed stored in the file cache to the database node;
wherein the database agents are in one-to-one correspondence with the database nodes.
Further, the data to be changed comprises data to be initialized, data to be migrated and data to be re-distributed.
In another aspect, the present invention provides an apparatus for checking data in a distributed database, including:
a first calculation unit, configured to export data to be changed into a data description text, and calculate a check value of specified row data in the data to be changed according to the exported data description text;
an importing unit, configured to split the data to be changed by row, and import the split data to be changed into the corresponding database nodes;
a second calculation unit, configured to calculate the check value of the specified row data in the imported data to be changed after the data import is completed;
and a comparison unit, configured to compare whether the check values of the specified row data in the data to be changed are consistent before and after the import, and if so, determine that the data to be changed is consistent before and after the change.
Further, the first calculation unit is further configured to calculate the check value of a specified single row of data in the data to be changed, or calculate the sum of the check values of one or more specified groups of N consecutive rows of data in the data to be changed.
Further, the second calculation unit is further configured to: when the specified row data is a single row, calculate the check value of the specified single row of data in the imported data to be changed; and when the specified row data is one or more groups of N consecutive rows, calculate the sum of the check values of the specified one or more groups of N consecutive rows of data in the imported data to be changed;
the comparison unit is further configured to: when the specified row data is a single row, compare the check values of the specified single row of data in the data to be changed before and after the import; and when the specified row data is one or more groups of N consecutive rows, compare the sums of the check values of the specified one or more groups of N consecutive rows of data in the data to be changed before and after the import.
Further, the importing unit further includes:
a splitting module, configured to split the data to be changed by row;
an acquisition module, configured to acquire, according to a distributed distribution rule, the database node in which each piece of split data to be changed is to be stored;
and an importing module, configured to import the split data to be changed into the corresponding database nodes.
Further, the importing unit further includes:
a splitting module, configured to split the data to be changed by row;
and an importing module, configured to write the split data to be changed into the file cache of the corresponding database node, notify database cluster management of the number of completed files and the file name list, and trigger, through database cluster management, a database agent to download the data to be changed stored in the file cache to the database node, wherein the database agents are in one-to-one correspondence with the database nodes.
Further, the data to be changed comprises data to be initialized, data to be migrated and data to be re-distributed.
In a further aspect, the invention provides a database cluster server provided with any one of the above devices for checking distributed database data.
The invention has the following beneficial effects:
By comparing whether the check values of the specified row data in the data to be changed are consistent before and after import, the invention determines the consistency of the data to be changed before and after the change, and effectively solves the problem in the prior art that a distributed database cannot determine the consistency of data before and after a change.
Drawings
FIG. 1 is a flow chart of a method for distributed database data verification according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for distributed database data verification according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an apparatus for checking data of a distributed database according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the architecture of a system for online data migration according to an embodiment of the present invention.
Detailed Description
In order to solve the problem that the data consistency before and after the data change cannot be determined in the distributed database in the prior art, the invention provides a method, a device and a related device for verifying the data of the distributed database. The present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Method embodiment
An embodiment of the invention provides a method for checking distributed database data. The execution subject of the method is a database cluster server. Referring to FIG. 1, the method comprises the following steps:
S101, exporting data to be changed into a data description text, and calculating a check value of specified row data in the data to be changed according to the exported data description text;
S102, splitting the data to be changed by row, and importing the split data to be changed into the corresponding database nodes;
S103, after the data import is completed, calculating the check value of the specified row data in the imported data to be changed;
S104, comparing whether the check values of the specified row data in the data to be changed are consistent before and after the import, and if so, determining that the data to be changed is consistent before and after the change.
That is, by comparing whether the check values of the specified row data in the data to be changed are consistent before and after import, the invention determines the consistency of the data to be changed before and after the change, and effectively solves the problem in the prior art that a distributed database cannot determine the consistency of data before and after a change.
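Steps S101 to S104 can be illustrated with a minimal in-memory sketch. Everything named here is an assumption made for illustration: the helper names (`row_check_value`, `node_for`, `migrate_and_verify`), the use of a character-code sum as the check value, the hashing of the first comma-separated field as the distribution rule, and the use of a dictionary in place of real database nodes.

```python
from collections import defaultdict


def row_check_value(row: str) -> int:
    """Assumed check value: the sum of the character codes of one row."""
    return sum(ord(ch) for ch in row)


def node_for(row: str, node_count: int) -> int:
    """Hypothetical distribution rule: hash the first comma-separated field."""
    key = row.split(",", 1)[0]
    return sum(ord(ch) for ch in key) % node_count


def migrate_and_verify(rows, node_count, selected):
    wanted = set(selected)  # indices of the specified rows

    # S101: check value of the specified rows, taken from the exported text.
    before = sum(row_check_value(rows[i]) for i in wanted)

    # S102: split by row and "import" every row into its node (a dict here).
    nodes = defaultdict(list)
    for index, row in enumerate(rows):
        nodes[node_for(row, node_count)].append((index, row))

    # S103: recompute the check value from what the nodes now hold.
    after = sum(
        row_check_value(row)
        for stored in nodes.values()
        for index, row in stored
        if index in wanted
    )

    # S104: equal check values are taken to mean consistent data.
    return before == after


exported = ["1,alice,30", "2,bob,25", "3,carol,41", "4,dave,19"]
print(migrate_and_verify(exported, node_count=3, selected=[1, 3]))  # True
```

In a real deployment the second computation would run against the database nodes rather than against the same Python objects, which is where a mismatch could actually surface.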
In a specific implementation, step S101 of the embodiment of the present invention specifically includes: calculating the check value of a specified single row of data in the data to be changed, or calculating the sum of the check values of one or more specified groups of N consecutive rows of data in the data to be changed.
That is, the invention can check whether the data to be changed is consistent before and after the import either by simple sampling, calculating the check value of a specified single row of data in the data to be changed, or by sampling over a larger range, calculating the sum of the check values of one or more specified groups of N consecutive rows of data in the data to be changed.
It should be noted that the scheme of calculating the sum of the check values of one or more specified groups of N consecutive rows also covers checking all rows of the data to be changed.
Specifically, when the specified row data is a single row, calculating the check value of the specified row data in the imported data to be changed specifically includes: calculating the check value of the specified single row of data in the imported data to be changed; and comparing whether the check values of the specified row data in the data to be changed are consistent specifically includes: comparing the check values of the specified single row of data in the data to be changed before and after the import.
For example, if all even rows are specified to be checked, the invention calculates the check values of the data of all even rows of the data to be changed before and after the import, and compares them to determine whether the data to be changed is consistent before and after the import.
When the specified row data is one or more groups of N consecutive rows, calculating the check value of the specified row data in the imported data to be changed specifically includes: calculating the sum of the check values of the specified one or more groups of N consecutive rows of data in the imported data to be changed; and comparing whether the check values of the specified row data in the data to be changed are consistent specifically includes: comparing the sums of the check values of the specified one or more groups of N consecutive rows of data in the data to be changed before and after the import.
For example, if all rows are specified to be checked, the invention calculates the check values of the data of all rows of the data to be changed before and after the import, and compares them to determine whether the data to be changed is consistent before and after the import.
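The two modes, a single specified row versus the sum over N consecutive rows, might look as follows. This is only a sketch: the function names are hypothetical, and the check value is assumed to be the sum of the row's UTF-8 byte values.

```python
def row_check_value(row: str) -> int:
    """Assumed check value: the sum of the UTF-8 byte values of one row."""
    return sum(row.encode("utf-8"))


def single_row_check(rows, index):
    """Check value of one specified row (simple sampling)."""
    return row_check_value(rows[index])


def consecutive_rows_check(rows, start, n):
    """Sum of the check values of n consecutive rows beginning at start."""
    return sum(row_check_value(row) for row in rows[start:start + n])


rows = ["ab", "c", "de"]
print(single_row_check(rows, 1))            # 99
print(consecutive_rows_check(rows, 0, 3))   # 495, i.e. all rows
```

Choosing `start=0` and `n=len(rows)` gives the full check; smaller windows give a sampled check at lower cost.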
In a specific implementation, the rows of the data to be changed in step S101 and the check values corresponding to those rows are stored in a preset check table. Alternatively, the number of rows of the data to be changed and the sum of the check values of those rows may be stored in the check table, so that the data consistency check can be performed after the data is imported.
That is, without affecting the operation of existing services, the invention imports the data into the corresponding nodes according to a preset data distribution rule, and ensures strong consistency between the data imported into the database and the original data by two means, checking all rows and checking sampled rows; the number of rows checked by sampling is configurable.
When the data to be changed is exported into the data description text, each row of data is located and assigned to a database node according to the data location. After the data is imported, the check value of the imported data is obtained for the rows recorded in the preset check table, which may be sampled rows (for example the even rows, i.e. a sampling check) or preset rows (which can be set arbitrarily, or can be all rows, i.e. a full check). This value is compared with the check value recorded in the check table, and if the two are consistent, the data is considered consistent before and after the import.
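A check table of this kind can be sketched as a mapping from row number to check value. The layout below is an assumption, as are the names `build_check_table` and `verify_against_table`; the patent does not specify how the table is stored.

```python
def row_check_value(row: str) -> int:
    """Assumed check value: the sum of the character codes of one row."""
    return sum(ord(ch) for ch in row)


def build_check_table(rows, should_check):
    """Record {row number: check value} for every row selected by should_check."""
    return {
        number: row_check_value(row)
        for number, row in enumerate(rows, start=1)
        if should_check(number)
    }


def verify_against_table(check_table, imported_rows):
    """imported_rows maps row number to the row text read back after import."""
    return all(
        number in imported_rows
        and row_check_value(imported_rows[number]) == expected
        for number, expected in check_table.items()
    )


source = ["1,a", "2,b", "3,c", "4,d"]

# Sampling check: even rows only. Passing lambda n: True gives a full check.
table = build_check_table(source, lambda n: n % 2 == 0)
print(table)  # {2: 192, 4: 196}

imported = dict(enumerate(source, start=1))
print(verify_against_table(table, imported))  # True
```

Because the selection is a plain predicate, the sampling density can be changed through configuration without touching the comparison logic.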
The data to be changed in the embodiment of the invention includes data to be initialized, data to be migrated and data to be re-distributed. That is, the invention can verify the consistency of data before and after a data change such as data initialization, data migration or data re-distribution. The whole check process does not need to lock the current database; data can be located independently by row, data distribution and data checking can be performed independently, and the I/O of a database server is occupied only when single-node data is imported, so the impact on online services is small.
The embodiment of the invention specifically includes the following:
Before data migration or data initialization, the data to be migrated needs to be exported into a data description text. Both cross-database migration and migration within the current distributed database are supported: in the cross-database case, the database tables to be migrated are exported into a text description file according to the syntax of the original database; in the current distributed database, the distributed data can be exported into a text file through the LoadServer.
The embodiment of the invention calculates the check value of each row of the data to be changed that needs to be verified, or calculates the number of rows to be verified and the sum of the check values of those rows, and stores the result in a preset check table for the subsequent consistency check.
In a specific implementation, the text is read into memory according to the text description rule, and the ASCII value of the current row of data (namely, the data check value) is calculated and stored in memory. When consecutive rows of data need to be verified, the ASCII values of the rows are added to obtain the sum of the ASCII values of those rows.
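Reading the exported text row by row and accumulating the ASCII values could be sketched as below. The sketch assumes that the "ASCII value" of a row means the sum of its byte values, that rows are newline-delimited, and that the function name `accumulate_check` is free to choose; none of this is fixed by the source.

```python
import io


def accumulate_check(lines, selected=None):
    """Return (row count, sum of byte values of the selected rows).

    lines is any iterable of text lines, for example an open file.
    selected is a set of 1-based row numbers, or None to check every row.
    """
    row_count = 0
    total = 0
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        row_count += 1
        if selected is None or number in selected:
            total += sum(row.encode("utf-8"))
    return row_count, total


exported_text = io.StringIO("1,a\n2,b\n3,c\n")
print(accumulate_check(exported_text))  # (3, 576)
```

A plain sum of byte values does not depend on character order, so `"ab"` and `"ba"` yield the same value; the sketch keeps the sum because that is what the text describes.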
After splitting the data to be changed and before importing the split data to be changed into the corresponding database nodes, the embodiment of the invention further includes:
acquiring, according to a distributed distribution rule, the database node in which each piece of split data to be changed is to be stored.
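The source does not say what the distributed distribution rule is. As one plausible illustration, the sketch below assumes a hash-modulo rule on a key column; `node_for_row`, `split_by_node`, the comma delimiter and the choice of CRC32 are all assumptions.

```python
import zlib


def node_for_row(row, node_count, key_column=0, delimiter=","):
    """Assumed rule: CRC32 of the key column, modulo the number of nodes."""
    key = row.split(delimiter)[key_column]
    return zlib.crc32(key.encode("utf-8")) % node_count


def split_by_node(rows, node_count):
    """Group rows by the database node that should store them."""
    groups = {node: [] for node in range(node_count)}
    for row in rows:
        groups[node_for_row(row, node_count)].append(row)
    return groups


rows = ["1001,alice", "1002,bob", "1003,carol", "1004,dave"]
for node, assigned in split_by_node(rows, node_count=3).items():
    print(node, assigned)
```

`zlib.crc32` is used instead of the built-in `hash` because string hashing in Python varies between processes, whereas the node assignment must be reproducible when the data is later located for checking.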
In the embodiment of the invention, importing the split data to be changed into the corresponding database nodes specifically includes:
writing the split data to be changed into the file cache of the corresponding database node, notifying database cluster management of the number of completed files and the file name list, and triggering, through database cluster management, a database agent to download the data to be changed stored in the file cache to the database node;
wherein the database agents are in one-to-one correspondence with the database nodes.
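The file-cache write in this import step might be sketched as a per-node buffer that is flushed to a new file once it holds a configured number of rows, while the list of completed file names is kept for the later notification. The class name `NodeFileCache`, the file naming scheme and the `max_rows` parameter are hypothetical.

```python
import os
import tempfile


class NodeFileCache:
    """Buffers rows for one database node and rolls over to a new file."""

    def __init__(self, directory, node_id, max_rows):
        self.directory = directory
        self.node_id = node_id
        self.max_rows = max_rows
        self.buffer = []
        self.completed = []  # names of finished files, in order

    def write_row(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self):
        """Write buffered rows to a new file and record its name."""
        if not self.buffer:
            return
        name = os.path.join(
            self.directory,
            f"node{self.node_id}_{len(self.completed):04d}.txt",
        )
        with open(name, "w", encoding="utf-8") as handle:
            handle.write("\n".join(self.buffer) + "\n")
        self.completed.append(name)
        self.buffer = []


with tempfile.TemporaryDirectory() as directory:
    cache = NodeFileCache(directory, node_id=0, max_rows=2)
    for row in ["1,a", "2,b", "3,c", "4,d", "5,e"]:
        cache.write_row(row)
    cache.flush()  # write the final, partly filled buffer
    print(len(cache.completed))  # 3
```

The `completed` list corresponds to the file count and file name list that would be reported to database cluster management before the agents download the files.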
In a specific implementation, the current row of data is written into the file cache of the corresponding database node; if the cache is full, or the number of rows reaches the number of rows per node file required by the configuration file, the data is written to a file and a new file to be written is generated;
after a certain number of files have been generated, the database cluster server notifies the database agent, according to the number of completed files and the file name list, to download the corresponding files to the database server and import them into the corresponding database;
after the data import is completed, the database cluster server initiates verification and sends a verification request to the database agent through database cluster management, to acquire the check value, counted by the database agent, of the data rows stored in the current database node, the database agent being the one corresponding to the database node that stores the rows of the data to be changed; or, according to the rows of the data to be changed, the server sends a verification request to the database agent through database cluster management, to obtain the number of data rows of the current database node counted by the database agent and the sum of the corresponding check values. The database agents are in one-to-one correspondence with the database nodes.
Specifically, after the data import is completed, the database cluster server initiates a data verification process and distributes a data verification request to the database agents (DBAgents) of all database nodes of the table currently being checked, so that each DBAgent counts the number of rows of the current table and the ASCII value of the data of the table. After receiving the results fed back by all nodes, the server compares the numbers of data rows and the data check values; if the numbers of rows are the same and the sampled check values are the same, the data consistency check passes and the data migration is reported as successful.
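The aggregation performed by the database cluster server can be sketched as follows. Each agent is modelled as a function over the rows held by its node; the names `agent_report` and `verify_import` and the byte-sum check value are assumptions, and a real DBAgent would compute its report from the database rather than from a list.

```python
def agent_report(node_rows):
    """What one agent might return: (row count, sum of row check values)."""
    check_sum = sum(sum(row.encode("utf-8")) for row in node_rows)
    return len(node_rows), check_sum


def verify_import(expected_rows, expected_sum, node_tables):
    """Combine the reports of all nodes and compare them with the source side."""
    reports = [agent_report(rows) for rows in node_tables]
    total_rows = sum(count for count, _ in reports)
    total_sum = sum(check for _, check in reports)
    return total_rows == expected_rows and total_sum == expected_sum


source = ["1,a", "2,b", "3,c"]
expected_rows, expected_sum = agent_report(source)  # values known before import

# The same rows after being distributed over two nodes.
print(verify_import(expected_rows, expected_sum, [["1,a", "3,c"], ["2,b"]]))  # True
# One row altered during import.
print(verify_import(expected_rows, expected_sum, [["1,a", "3,c"], ["2,x"]]))  # False
```

Comparing the row count alongside the check sum catches lost or duplicated rows that a sampled check value alone might miss.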
FIG. 2 is a flow chart of another method for checking distributed database data according to an embodiment of the present invention. The method is explained in detail below with reference to FIG. 2:
S201, start;
S202, data export;
that is, the data to be changed is exported into a data description text, and the ASCII value of the current row of data, or the sum of the ASCII values of the rows, is calculated;
S203, data import and verification data generation;
specifically: writing the data to be changed into the file cache of the corresponding database node, notifying database cluster management of the number of completed files and the file name list, triggering, through database cluster management, a database agent to download the data to be changed stored in the file cache to the database node, and, after the data is imported, sending a verification request to the database agent through database cluster management to acquire the check value, counted by the database agent, of the data rows stored in the current database node;
S204, data check;
comparing whether the check values (or the sums of the check values) of the rows of the data to be changed that need to be verified are consistent before and after the import, and if so, determining that the data to be changed is consistent before and after the change;
S205, end.
The method of the invention is explained in further detail below by means of a specific example, which comprises:
Stage one, data file generation:
before data migration or data initialization, the data to be migrated needs to be exported into a data description text. Both cross-database migration and migration within the current distributed database are supported: in the cross-database case, the database tables to be migrated are exported into a text description file according to the syntax of the original database; in the current distributed database, the distributed data can be exported into a text file through the database cluster server.
Stage two, data migration:
reading the text into memory according to the text description rule, calculating the ASCII value of the current row of data and storing it in memory;
acquiring, according to the distributed distribution rule, the database node in which the current data is to be stored;
writing the current row of data into the file cache of the corresponding database node, and, if the cache is full or the number of rows reaches the number of rows per node file required by the configuration file, writing the data to a file and generating a new file to be written;
after a certain number of files have been generated, the database cluster server notifies the DBAgent to download the corresponding files to the database server and import them into the corresponding database;
repeating the above steps until all the data has been imported into the distributed database.
Stage three, data consistency check:
after receiving all the data import completion requests, the database cluster server initiates a data verification process;
the database cluster server distributes the data verification request to the DBAgents of all database nodes of the current table, so that each DBAgent counts the number of rows of the current table and the ASCII value of the data of the current table;
after receiving the results fed back by the nodes, the server compares the numbers of data rows and the data check values; if the numbers of rows are the same and the sampled check values are the same, the data consistency check passes and the data migration is reported as successful.
The invention is described in detail below using the example of migrating a DB2 database to a MariaDB distributed cluster database:
data export: exporting the data to an external file using a method provided by DB2;
check table generation: generating a check table (supporting full check and sampling check) according to the configured number of check rows and the number of rows in the file;
file splitting: reading the data file row by row, calculating the node to which the current row belongs according to the distribution rule, and determining whether the current row needs to be checked; if so, calculating the ASCII value of the current row and accumulating it into the check result, generating an SQL statement for locating the current row in the database, and writing the SQL statement into the check SQL file of the current group; repeating this in turn until the file has been read completely, and counting the number of rows in the current file;
data import: importing the split data files into the corresponding node databases through the database agent DBAgent;
data verification: after all data has been imported, the database cluster server initiates a data verification process and compares whether the row count and check value sum of the current file are consistent with the row count sum and check value sum of the data in the imported database; if so, the data is consistent before and after migration and the data migration is complete; if not, the migration needs to be carried out again.
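The file-splitting step of this example can be sketched as a single pass over the exported rows. Several details are assumed for illustration only: the table name `t_order`, the integer key column `id` held in the first comma-separated field, a CRC32 hash-modulo distribution rule, sampling every `check_every`-th row, and the shape of the generated locating statement.

```python
import zlib


def split_file(rows, node_count, check_every, table="t_order", key_column="id"):
    """One pass: assign nodes, accumulate the check sum, emit locating SQL."""
    node_rows = {node: [] for node in range(node_count)}
    check_sql = {node: [] for node in range(node_count)}
    check_sum = 0
    row_count = 0

    for row in rows:
        row_count += 1
        key = row.split(",", 1)[0]
        node = zlib.crc32(key.encode("utf-8")) % node_count
        node_rows[node].append(row)

        # Sampled rows contribute to the check sum and get a locating query.
        if row_count % check_every == 0:
            check_sum += sum(row.encode("utf-8"))
            check_sql[node].append(
                f"SELECT * FROM {table} WHERE {key_column} = {int(key)};"
            )

    return node_rows, check_sql, check_sum, row_count


rows = ["1,a", "2,b", "3,c", "4,d"]
node_rows, check_sql, check_sum, row_count = split_file(
    rows, node_count=2, check_every=2
)
print(row_count, check_sum)  # 4 388
```

Setting `check_every=1` turns the sampled check into a full check, at the cost of one locating statement per row.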
The invention is described in detail below using the example of data backup and restore based on a MariaDB distributed cluster:
acquiring full data: exporting the distributed database data into a text file using the import/export tool of the distributed database;
check table generation: generating a check table (supporting full check and sampling check) according to the configured number of check rows and the number of rows in the file;
original file splitting: reading the data file row by row, calculating the node to which the current row belongs according to the distribution rule, and determining whether the current row needs to be checked; if so, calculating the ASCII value of the current row and accumulating it into the check result, generating an SQL statement for locating the current row in the database, and writing the SQL statement into the check SQL file of the current group; repeating this in turn until the file has been read completely, and counting the number of rows in the current file;
data recovery: importing the split data files into the corresponding node databases through the database agent DBAgent;
data verification: after all data has been imported, the database cluster server initiates a data verification process and compares whether the row count and check value sum of the current file are consistent with the row count sum and check value sum of the data in the new nodes; if so, the data is consistent before and after backup and recovery, and the full data recovery is complete; if not, the data recovery process needs to be carried out again.
Compared with existing distributed database technology in the industry, the invention has the following beneficial effects:
1. Good performance. The preparation of the basic verification data is completed during the data migration itself, so no separate verification data preparation pass is needed, which greatly shortens the duration of the data migration;
2. No interference with online services. The invention does not need to add a virtual column to the original table being checked or to lock the table, so the impact on online services is very small;
3. Flexible verification modes. Both sampled-data and full-data verification are supported, and the completion time of the current data migration task can be shortened by assigning appropriate verification levels to different check tables;
4. Cross-database support. The method supports data verification for migration across databases: the entry point of data migration is a data description text file, every database supports exporting its data into a text description file, and a distributed database can export its data into a text file through the database cluster server.
Device embodiment
An embodiment of the invention provides a device for verifying distributed database data. Referring to Fig. 3, the device comprises: a first calculation unit, configured to export the data to be changed into a data description text and calculate a check value of specified row data in the data to be changed according to the exported data description text; an importing unit, configured to split the data to be changed by row and import the split data to be changed into the corresponding database nodes; a second calculation unit, configured to calculate the check value of the specified row data in the data to be changed after the import; and a comparison unit, configured to compare whether the check values of the specified row data in the data to be changed are consistent before and after the import and, if so, determine that the data to be changed is consistent before and after the change.
That is, the invention determines the consistency of the data to be changed before and after the change by comparing whether the check values of the specified row data in the data to be changed are consistent before and after the import, which effectively solves the prior-art problem that a distributed database cannot determine the consistency of data before and after a change.
Further, the first calculating unit of the embodiment of the present invention is further configured to calculate a check value of a certain row of data specified in the data to be changed, or calculate a sum of check values of one or more continuous N rows of data specified in the data to be changed.
That is, the invention can compare whether the data to be changed is identical before and after the import either by simple sampling, calculating the check value of one specified row of the data to be changed, or by sampling over a larger range, calculating the sum of the check values of one or more continuous N rows specified in the data to be changed.
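The two sampling modes can be illustrated with a short sketch. It assumes, as one possible reading, that the check value of a row is the sum of its character codes, and that each continuous run is given as a hypothetical `(start, n)` pair; neither detail is specified in the text.

```python
def row_check_value(row):
    # Assumption: the check value of a row is the sum of its character codes.
    return sum(ord(ch) for ch in row)


def single_row_check(rows, index):
    # Simple sampling: the check value of one specified row.
    return row_check_value(rows[index])


def range_check_sum(rows, ranges):
    # Wider sampling: the sum of check values over one or more runs
    # of n continuous rows, each run given as (start, n).
    total = 0
    for start, n in ranges:
        total += sum(row_check_value(row) for row in rows[start:start + n])
    return total
```

The same two functions would be applied to the rows before and after the import, and the results compared.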
Further, the second calculating unit in the embodiment of the invention is further configured to: when the specified row data is a single row, calculate the check value of that specified row in the data to be changed after the import; and when the specified row data is one or more continuous N rows, calculate the sum of the check values of the one or more continuous N rows specified in the data to be changed after the import;
the comparison unit is further configured to: when the specified row data is a single row, compare the check value of that specified row in the data to be changed before and after the import; and when the specified row data is one or more continuous N rows, compare the sum of the check values of the one or more continuous N rows specified in the data to be changed before and after the import.
It should be noted that, in the embodiment of the invention, the data to be changed includes data to be initialized, data to be migrated, and data to be redistributed. That is, the invention can verify the consistency of data before and after data changes such as data initialization, data migration, and data redistribution. Because the whole verification process does not need to lock the current database, each row of data can be located independently, data distribution and data verification can be performed independently, and database server I/O is occupied only while data is imported into a single node, so the impact on online business is small.
Further, the importing unit further includes: the splitting module splits the data to be changed according to the row; the acquisition module acquires database nodes in which the split data to be changed are respectively stored according to a distributed distribution rule; and the importing module imports the split data to be changed to a corresponding database node.
That is, the invention imports the data into the corresponding nodes according to the preset data distribution rule without affecting the operation of existing services, and ensures strong consistency between the data imported into the database and the original data by two means, checking all rows and checking sampled rows; the number of rows checked by sampling is configurable.
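As a hedged illustration of the splitting and acquisition modules, the sketch below routes each row to a node. The text does not state the distribution rule, so a hash of the first comma-separated field modulo the node count is assumed here purely for illustration; `node_for_row` and `split_by_node` are hypothetical names.

```python
import zlib
from collections import defaultdict


def node_for_row(row, node_count):
    # Assumed distribution rule: CRC32 of the key field modulo node count.
    key = row.split(",", 1)[0]
    return zlib.crc32(key.encode("utf-8")) % node_count


def split_by_node(rows, node_count):
    # Group rows by the node that the distribution rule assigns them to.
    buckets = defaultdict(list)
    for row in rows:
        buckets[node_for_row(row, node_count)].append(row)
    return dict(buckets)
```

A stable hash such as CRC32 is used rather than Python's built-in `hash`, so that the same key maps to the same node across processes.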
Further, the importing unit further includes: a splitting module, which splits the data to be changed by row; and an importing module, which writes the split data to be changed into the file cache of the corresponding database node, notifies database cluster management of the number of completed files and the file name list, and, through database cluster management, triggers the database agent to download the data to be changed stored in the file cache to the database node, where the database agents correspond one-to-one with the database nodes.
In a specific implementation, the current row is written into the file cache of the corresponding database node; if the cache is full, or the configured number of rows per file for the current node has been reached, the data is written to the file and a new file to be written is created;
after a certain number of files have been generated, the database cluster server notifies the database agent, according to the number of completed files and the file name list, to download the corresponding files to the database server, where they are imported into the corresponding database.
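The buffering and notification behaviour described above might look like the following sketch. It is an assumption-laden illustration: files are kept in an in-memory dictionary instead of on disk, the file naming scheme is invented, and `notify` is a hypothetical callback standing in for the message to database cluster management.

```python
class NodeFileWriter:
    """Buffers rows for one database node and announces completed files."""

    def __init__(self, node, rows_per_file, files_per_notice, notify):
        self.node = node
        self.rows_per_file = rows_per_file
        self.files_per_notice = files_per_notice
        self.notify = notify      # hypothetical callback: (node, count, names)
        self.buffer = []          # rows of the file currently being written
        self.completed = []       # finished file names not yet announced
        self.files = {}           # file name -> rows; stands in for disk files
        self.seq = 0

    def write_row(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.rows_per_file:
            self._flush()

    def _flush(self):
        if not self.buffer:
            return
        name = f"node{self.node}_{self.seq:04d}.txt"  # invented naming scheme
        self.seq += 1
        self.files[name] = self.buffer
        self.buffer = []
        self.completed.append(name)
        if len(self.completed) >= self.files_per_notice:
            self._announce()

    def _announce(self):
        if self.completed:
            self.notify(self.node, len(self.completed), list(self.completed))
            self.completed.clear()

    def close(self):
        # Write out the partial last file and announce whatever remains.
        self._flush()
        self._announce()
```

Batching the notifications keeps the number of messages to cluster management small while still letting the agents start downloading before the whole export has been split.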
In a specific implementation, the second calculation unit in the embodiment of the invention either sends a verification request to the database agent through database cluster management to obtain the data check value, counted by the database agent, of the data rows stored in the current database node, where the database agent is the one corresponding to the database node that stores the specified rows of the data to be changed; or sends a verification request to the database agent through database cluster management according to the rows of the data to be changed, and obtains the total number of data rows in the current database node and the corresponding data check value, as counted by the database agent. In both cases the database agents correspond one-to-one with the database nodes.
Fig. 4 is a schematic diagram of an online data migration system according to an embodiment of the invention. As shown in Fig. 4, after the data has been imported, the comparison unit initiates the data verification process and distributes a data verification request to the database agents (DBAgents) of all database nodes holding the table being verified, so that each database agent counts the number of rows of the table and the ASCII value of the table's data on its node. After receiving the results fed back by each node, the comparison unit compares the row counts and the data check values; if the row counts are the same and the sampled check values are the same, the data consistency verification passes and the data migration is successful.
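The final comparison can be sketched as below. The `AgentReport` structure and the `verify` function are hypothetical; the sketch only assumes that each database agent reports a row count and a check value sum for its node, and that the totals are compared with the values computed before the import.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    node: int        # database node the agent is responsible for
    row_count: int   # rows of the verified table counted on that node
    check_sum: int   # sum of check values of the checked rows on that node


def verify(expected_rows, expected_sum, reports):
    # Aggregate the per-node feedback and compare with the pre-import values.
    total_rows = sum(r.row_count for r in reports)
    total_sum = sum(r.check_sum for r in reports)
    return total_rows == expected_rows and total_sum == expected_sum
```

If `verify` returns false, the description above implies that the import would be carried out again.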
The relevant content of the device of the present invention can be understood by referring to the embodiment part of the method, and will not be described in detail herein.
Server embodiment
The embodiment of the invention provides a database cluster server, which comprises any one of the distributed database data verification devices in the device embodiment.
The relevant content in the embodiments of the present invention may be understood by referring to the device embodiment and the method embodiment, and will not be described herein.
The invention can at least achieve the following beneficial effects:
The invention can accurately determine the consistency of the data to be changed before and after the change, either by comparing whether the check values of the rows to be verified are consistent before and after the import, or by comparing whether the sum of the check values of the rows to be verified before the import is consistent with the sum of the check values of the corresponding rows after the import, thereby effectively solving the prior-art problem that a distributed database cannot determine data consistency before and after a change.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, and accordingly the scope of the invention is not limited to the embodiments described above.

Claims (9)

1. A method for data verification of a distributed database, comprising:
exporting data to be changed into a data description text, and calculating a check value of specified row data in the data to be changed according to the exported data description text; wherein the check value is an ASCII value;
splitting the data to be changed according to rows, and importing the split data to be changed into corresponding database nodes;
after the data is imported, calculating the check value of the specified row data in the data to be changed after the import, comparing whether the check values of the specified row data in the data to be changed are consistent before and after the import, and if so, determining that the data to be changed is consistent before and after the change;
the calculating the check value of the specified row data in the data to be changed specifically includes:
calculating the check value of a certain row of data appointed in the data to be changed, or calculating the sum of the check values of one or more continuous N rows of data appointed in the data to be changed;
importing the split data to be changed to a corresponding database node, wherein the method specifically comprises the following steps of:
writing the split data to be changed into a file cache of the corresponding database node, notifying database cluster management of the number of completed files and the file name list, and triggering, through the database cluster management, a database agent to download the data to be changed stored in the file cache to the database node;
wherein the database agents are respectively in one-to-one correspondence with the database nodes.
2. The method of claim 1, wherein,
when the specified row data is a single row, the calculating the check value of the specified row data in the data to be changed after the import specifically includes: calculating the check value of the specified row in the data to be changed after the import; and the comparing whether the check values of the specified row data in the data to be changed are consistent specifically includes: comparing the check value of the specified row in the data to be changed before and after the import;
when the specified row data is one or more continuous N rows, the calculating the check value of the specified row data in the data to be changed after the import specifically includes: calculating the sum of the check values of the one or more continuous N rows specified in the data to be changed after the import; and the comparing whether the check values of the specified row data in the data to be changed are consistent specifically includes: comparing the sum of the check values of the one or more continuous N rows specified in the data to be changed before and after the import.
3. The method according to any one of claims 1-2, wherein after splitting the data to be changed according to rows and before importing the split data to be changed to a corresponding database node, further comprising:
and acquiring database nodes in which the split data to be changed are respectively stored according to a distributed distribution rule.
4. The method according to any one of claims 1-2, wherein,
the data to be changed comprises data to be initialized, data to be migrated and data to be re-distributed.
5. An apparatus for verifying data in a distributed database, comprising:
the first calculation unit is used for exporting data to be changed into a data description text, and calculating a check value of specified row data in the data to be changed according to the exported data description text; wherein the check value is an ASCII value;
an importing unit, configured to split the data to be changed according to a row, and import the split data to be changed to a corresponding database node;
the second calculation unit is used for calculating the check value of the specified row data in the data to be changed after the data is imported;
the comparison unit is used for comparing whether the check values of the specified row data in the data to be changed are consistent before and after the data to be changed are imported, and if so, determining that the data to be changed are consistent before and after the data to be changed are changed;
the first calculating unit is further configured to calculate a check value of a certain row of data specified in the data to be changed, or calculate a sum of check values of one or more continuous N rows of data specified in the data to be changed;
the importing unit further includes:
the splitting module is used for splitting the data to be changed according to the rows;
and the importing module is used for writing the split data to be changed into the file cache of the corresponding database node, notifying database cluster management of the number of completed files and the file name list, and triggering, through the database cluster management, the database agent to download the data to be changed stored in the file cache to the database node, wherein the database agents correspond one-to-one with the database nodes.
6. The apparatus of claim 5, wherein,
the second calculating unit is further configured to: when the specified row data is a single row, calculate the check value of the specified row in the data to be changed after the import; and when the specified row data is one or more continuous N rows, calculate the sum of the check values of the one or more continuous N rows specified in the data to be changed after the import;
the comparison unit is further configured to: when the specified row data is a single row, compare the check value of the specified row in the data to be changed before and after the import; and when the specified row data is one or more continuous N rows, compare the sum of the check values of the one or more continuous N rows specified in the data to be changed before and after the import.
7. The apparatus according to any one of claims 5-6, wherein the importing unit further comprises:
the splitting module is used for splitting the data to be changed according to the rows;
the acquisition module is used for acquiring database nodes in which the split data to be changed are respectively stored according to a distributed distribution rule;
and the importing module is used for importing the split data to be changed to a corresponding database node.
8. The apparatus according to any one of claims 5 to 6, wherein,
the data to be changed comprises data to be initialized, data to be migrated and data to be re-distributed.
9. A database cluster server comprising the apparatus for distributed database data verification as claimed in any one of claims 5 to 8.
CN201610794307.2A 2016-08-31 2016-08-31 Distributed database data verification method, device and related device Active CN107798007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610794307.2A CN107798007B (en) 2016-08-31 2016-08-31 Distributed database data verification method, device and related device

Publications (2)

Publication Number Publication Date
CN107798007A CN107798007A (en) 2018-03-13
CN107798007B true CN107798007B (en) 2024-03-19

Family

ID=61530069

Country Status (1)

Country Link
CN (1) CN107798007B (en)

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20180426

Address after: 518057 five floor, block A, ZTE communication tower, Nanshan District science and Technology Park, Shenzhen, Guangdong.

Applicant after: ZTE Corp.

Address before: 210000 68 Bauhinia Road, Yuhuatai District, Nanjing, Jiangsu

Applicant before: Nanjing Zhongxing New Software Co.,Ltd.

SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220104

Address after: 100176 602, floor 6, building 6, courtyard 10, KEGU 1st Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing (Yizhuang group, high-end industrial area of Beijing Pilot Free Trade Zone)

Applicant after: Jinzhuan Xinke Co.,Ltd.

Address before: 518057 five floor, block A, ZTE communication tower, Nanshan District science and Technology Park, Shenzhen, Guangdong.

Applicant before: ZTE Corp.

GR01 Patent grant