CN113297235B

CN113297235B - Data consistency checking method and device for distributed database cluster

Info

Publication number: CN113297235B
Application number: CN202011211408.5A
Authority: CN
Inventors: 郭超
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-11-03
Filing date: 2020-11-03
Publication date: 2023-12-19
Anticipated expiration: 2040-11-03
Also published as: CN113297235A

Abstract

The specification provides a data consistency checking method and device for a distributed database cluster, wherein the data consistency checking method for the distributed database cluster comprises the following steps: under the condition that the completion of data migration from the first database cluster to the second database cluster is detected, constructing a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to a first hash tree construction rule; constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree; and determining that the data of the first database cluster and the second database cluster are consistent under the condition that the hash values of the root nodes of the hash tree of the first database cluster and the hash tree of the second database cluster are consistent.

Description

Data consistency checking method and device for distributed database cluster

Technical Field

The present disclosure relates to the field of distributed database technologies, and in particular, to a method and an apparatus for checking data consistency of a distributed database cluster.

Background

With the rapid development of computer technology, the demand for data storage and management keeps growing, and distributed databases have emerged in response. A distributed database is a logically unified database cluster formed by connecting a plurality of physically dispersed data storage nodes through a high-speed computer network. The basic idea is to distribute the data of an originally centralized database across a plurality of network-connected data storage nodes, so as to obtain larger storage capacity and support a higher volume of concurrent access. In a distributed database, the data volume of a database cluster is huge, and data may need to be migrated between database clusters. After data is migrated from an old database cluster to a new database cluster, it must be determined whether the data of the new database cluster is consistent with the data of the old database cluster. Because the data volume is huge, comparing the data item by item is very time-consuming, so a full comparison of the data of the two database clusters is impractical.

In the prior art, after data migration from the old database cluster to the new database cluster is completed, the data of the old database cluster may be sampled at a certain interval, the data of the new database cluster sampled according to the same rule, and the sampled data compared one by one; if the sampled data are consistent, the full data of the new and old database clusters are directly determined to be consistent. However, because only part of the full data is extracted for comparison, the data that was not sampled may still be inconsistent with some probability, so the accuracy of the consistency check is low. A faster and more accurate method is therefore needed for checking the data consistency of distributed database clusters.

Disclosure of Invention

In view of this, the embodiments of the present disclosure provide a method for checking data consistency of a distributed database cluster. The present specification also relates to a data consistency check device for a distributed database cluster, a computing device, and a computer readable storage medium, which solve the technical drawbacks of the prior art.

According to a first aspect of embodiments of the present specification, there is provided a data consistency check method of a distributed database cluster, the method comprising:

in the case that the completion of data migration from the first database cluster to the second database cluster is detected, constructing a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to a first hash tree construction rule, wherein the first hash tree construction rule comprises: constructing a node hash tree according to hash values of data stored by data nodes included in the database cluster;

constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree;

comparing hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the data of the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the root nodes of the second database cluster hash tree are consistent.

Optionally, constructing a node hash tree according to the hash value of the data stored by the data node included in the database cluster includes:

acquiring data stored by a data node included in the database cluster;

dividing the data according to a first length to obtain at least two data segments;

and constructing the node hash tree by taking hash values of the data of the at least two data segments as leaf nodes.

Optionally, constructing the node hash tree with hash values of the data of the at least two data segments as leaf nodes includes:

for each data segment in the at least two data segments, calculating a hash value of the data of each data segment through a first hash algorithm;

taking the calculated at least two hash values as leaf nodes of an ith layer, and carrying out first combination operation on the hash values of the leaf nodes of the ith layer to obtain leaf nodes of an (i+1) th layer, wherein i is greater than or equal to 1;

determining whether the number of leaf nodes of the (i+1)th layer is equal to 1; if so, determining the leaf node of the (i+1)th layer as the root node; if not, incrementing i by 1 and returning to the operation step of performing the first combination operation on the hash values of the leaf nodes of the ith layer to obtain the leaf nodes of the (i+1)th layer;

and constructing the node hash tree according to the obtained leaf nodes and the root nodes of each layer, and storing the corresponding relation between the identifications of the root nodes and the data nodes.

Optionally, constructing the database cluster hash tree according to the hash value of the root node of the node hash tree includes:

taking the hash value of the root node of the node hash tree as a leaf node of a k-th layer;

performing a second combination operation on the hash value of the leaf node of the k layer to obtain a leaf node of the k+1th layer, wherein k is greater than or equal to 1;

determining whether the number of leaf nodes of the (k+1)th layer is equal to 1; if so, determining the leaf node of the (k+1)th layer as the root node; if not, incrementing k by 1 and returning to the operation step of performing the second combination operation on the hash values of the leaf nodes of the kth layer to obtain the leaf nodes of the (k+1)th layer;

and constructing the database cluster hash tree according to the obtained leaf nodes and the root nodes of each layer.
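As a non-limiting illustration of the layer-by-layer loop above, the following Python sketch builds a database cluster hash tree from the root hash values of the node hash trees. The function name `build_cluster_hash_tree`, the use of integers for hash values, and the choice of pairwise exclusive OR as the second combination operation are assumptions made for the example; carrying a lone last node up unchanged is also an assumption.

```python
def build_cluster_hash_tree(node_roots: list[int]) -> list[list[int]]:
    """Return every layer; layers[0] is layer k = 1, layers[-1] holds the root."""
    if not node_roots:
        raise ValueError("at least one node hash tree root is required")
    layers = [list(node_roots)]
    while len(layers[-1]) > 1:                   # stop when a layer has one node
        cur = layers[-1]
        nxt = []
        for j in range(0, len(cur), 2):
            if j + 1 < len(cur):
                nxt.append(cur[j] ^ cur[j + 1])  # assumed second combination operation
            else:
                nxt.append(cur[j])               # assumption: odd node carried up
        layers.append(nxt)
    return layers


layers = build_cluster_hash_tree([0x1, 0x2, 0x4, 0x8])
print(layers)  # [[1, 2, 4, 8], [3, 12], [15]]
```

Keeping every layer, rather than only the root, is what later allows a mismatch to be traced from the root back to individual data nodes.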

Optionally, the method further comprises:

under the condition that hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree are inconsistent, comparing hash values of leaf nodes included in the first database cluster hash tree and the second database cluster hash tree from the root nodes to leaf node directions in sequence, and determining at least two first target leaf nodes;

and determining at least two corresponding target data nodes according to the at least two first target leaf nodes.

Optionally, after determining the corresponding target data node according to the first target leaf node, the method further includes:

acquiring a first target node hash tree corresponding to a target data node in the first database cluster, and acquiring a second target node hash tree corresponding to the target data node in the second database cluster;

sequentially from root nodes of the first target node hash tree and the second target node hash tree to leaf node directions, comparing hash values of leaf nodes included in the first target node hash tree and the second target node hash tree, and determining a second target leaf node;

and determining a corresponding data segment according to the second target leaf node.

Optionally, the first database cluster hash tree and the second database cluster hash tree each include p layers of leaf nodes, where p is greater than or equal to 1;

comparing hash values of leaf nodes included in the first database cluster hash tree and the second database cluster hash tree to determine a first target leaf node, including:

comparing hash values of leaf nodes of a p-th layer in the first database cluster hash tree and the second database cluster hash tree;

determining leaf nodes with different hash values in leaf nodes of a p-th layer in the first database cluster hash tree and the second database cluster hash tree as a third target leaf node;

decrementing p by 1 and determining whether p is equal to 0; if so, determining the third target leaf node as the first target leaf node; if not, taking the leaf nodes subordinate to the third target leaf node as the leaf nodes of the p-th layer, and returning to the operation step of comparing the hash values of the leaf nodes of the p-th layer in the first database cluster hash tree and the second database cluster hash tree.
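The root-to-leaf comparison above may be sketched as follows. This is a minimal Python illustration, not the claimed implementation: the names `xor_tree` and `find_differing_leaves`, the integer hash values, the pairwise exclusive OR combination, and the list `node_ids` that maps bottom-layer positions to data node identifiers are all assumptions made for the example. Both trees are assumed to have the same shape.

```python
def xor_tree(leaves: list[int]) -> list[list[int]]:
    """Build all layers bottom-up; a lone last node is carried up (assumption)."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        cur = layers[-1]
        layers.append([cur[j] ^ cur[j + 1] if j + 1 < len(cur) else cur[j]
                       for j in range(0, len(cur), 2)])
    return layers


def find_differing_leaves(tree_a: list[list[int]], tree_b: list[list[int]]) -> list[int]:
    """Walk from the root toward the bottom layer, following only mismatches."""
    if [len(layer) for layer in tree_a] != [len(layer) for layer in tree_b]:
        raise ValueError("both trees must have the same structure")
    suspects = [0] if tree_a[-1][0] != tree_b[-1][0] else []
    for depth in range(len(tree_a) - 2, -1, -1):
        layer_a, layer_b = tree_a[depth], tree_b[depth]
        # children of node j in the layer above sit at positions 2j and 2j + 1
        children = [c for j in suspects for c in (2 * j, 2 * j + 1) if c < len(layer_a)]
        suspects = [c for c in children if layer_a[c] != layer_b[c]]
    return suspects


node_ids = ["node-1", "node-2", "node-3", "node-4"]   # hypothetical identifiers
old = xor_tree([1, 2, 4, 8])
new = xor_tree([1, 2, 5, 8])                          # third root hash differs
print([node_ids[j] for j in find_differing_leaves(old, new)])  # ['node-3']
```

Only subtrees whose hash values differ are visited, so the number of comparisons grows with the tree height rather than with the number of data nodes when few nodes are inconsistent.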

Optionally, obtaining data stored by a data node included in the database cluster includes:

determining a migration mode of data migration from the first database cluster to the second database cluster;

under the condition that the migration mode is continuous write migration, determining a data migration completion time point, and acquiring data stored by a data node included in the database cluster before the data migration completion time point;

and under the condition that the migration mode is write-down migration, acquiring the full data stored by the data nodes included in the database cluster.
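The selection of data by migration mode may be sketched as follows. This is an illustrative Python example under stated assumptions: the `Row` record, its `written_at` timestamp, the `MigrationMode` enumeration and the function `rows_to_check` are hypothetical names, and the second mode is interpreted here, as an assumption, as one in which writes to the source cluster are halted during migration so that the full data set is stable.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class MigrationMode(Enum):
    CONTINUOUS_WRITE = "continuous_write"   # writes continue during migration
    WRITES_HALTED = "writes_halted"         # assumed meaning of the second mode


@dataclass
class Row:
    key: str
    value: str
    written_at: datetime                     # hypothetical per-row write time


def rows_to_check(rows: list[Row], mode: MigrationMode,
                  completed_at: Optional[datetime] = None) -> list[Row]:
    """Pick the rows of one data node that take part in hash tree construction."""
    if mode is MigrationMode.CONTINUOUS_WRITE:
        if completed_at is None:
            raise ValueError("a migration completion time point is required")
        # only data stored before the completion time point is compared
        return [r for r in rows if r.written_at < completed_at]
    return list(rows)                        # full data of the data node


cutoff = datetime(2020, 11, 3, 12, 0)
rows = [Row("k1", "v1", datetime(2020, 11, 3, 11, 0)),
        Row("k2", "v2", datetime(2020, 11, 3, 13, 0))]
print([r.key for r in rows_to_check(rows, MigrationMode.CONTINUOUS_WRITE, cutoff)])  # ['k1']
```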

Optionally, performing a first combination operation on the hash value of the leaf node of the i layer to obtain a leaf node of the i+1 layer, including:

dividing the leaf nodes of the ith layer according to a first grouping rule to obtain a plurality of first leaf node combinations;

and aiming at each first leaf node combination, carrying out target operation on the leaf nodes included in the first leaf node combination to obtain the leaf nodes of the (i+1) th layer.

According to a second aspect of embodiments of the present specification, there is provided a data consistency check apparatus for a distributed database cluster, the apparatus comprising:

a first construction module configured to construct a first node hash tree of a first database cluster and a second node hash tree of a second database cluster according to a first hash tree construction rule, in case that data migration from the first database cluster to the second database cluster is detected to be completed, wherein the first hash tree construction rule includes: constructing a node hash tree according to hash values of data stored by data nodes included in the database cluster;

a second construction module configured to construct a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree;

a first determining module configured to compare hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determine that data of the first database cluster and the second database cluster are consistent in a case that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

According to a third aspect of embodiments of the present specification, there is provided a computing device comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions to implement the method of:

According to a fourth aspect of embodiments of the present description, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of a data consistency check method for a distributed database cluster.

The present disclosure provides a method for checking data consistency of a distributed database cluster, where when it is detected that data migration from a first database cluster to a second database cluster is completed, a first node hash tree of the first database cluster and a second node hash tree of the second database cluster may be first constructed according to a first hash tree construction rule, where the first hash tree construction rule includes: constructing a node hash tree according to hash values of data stored by data nodes included in the database cluster; and then constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree; and comparing hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the data of the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the root nodes of the second database cluster hash tree are consistent. 
In this case, a node hash tree is first constructed from the actual full data of each data node, and the hash trees of the two database clusters are then constructed from the node hash trees of the data nodes they include. This ensures that the full data of each database cluster participates in the construction of its cluster hash tree, so that the root node of the database cluster hash tree represents the full data of that cluster. It is then only necessary to compare the root nodes of the two database cluster hash trees to determine quickly and accurately whether the full data of the two database clusters are consistent. Sampling of the full data is avoided, which eliminates the possibility that unsampled data is inconsistent, and the accuracy of checking the data consistency of the two database clusters is greatly improved.

Drawings

FIG. 1 is a flow chart of a method for data consistency check of a distributed database cluster according to one embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a construction process of a first node hash tree according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram illustrating a construction process of a second node hash tree according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram illustrating a construction process of a first database cluster hash tree according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram illustrating a construction process of a second database cluster hash tree according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram illustrating a comparison process between a first database cluster hash tree and a second database cluster hash tree according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram illustrating a comparison process between a first node hash tree and a second node hash tree according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of a data consistency checking device for distributed database clusters according to an embodiment of the present disclosure;

FIG. 9 is a block diagram of a computing device according to an embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. However, this specification can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; therefore, this specification is not limited by the specific implementations disclosed below.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.

First, terms related to one or more embodiments of the present specification will be explained.

Hash value: the output of a hash function (also called a hash algorithm), which is a method of creating a small digital "fingerprint" from any kind of data. A hash function compresses a message or data into a digest of small size and fixed format; it scrambles the data and produces a fingerprint called a hash value (also hash code or hash), which is typically represented by a short string of seemingly random letters and digits.
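As a minimal illustration of the fingerprint idea, the following Python sketch uses the standard `hashlib` module with MD5, which this specification later names as one example of a hash algorithm. The function name `fingerprint` and the sample inputs are assumptions made for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the MD5 digest of `data` as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


short = fingerprint(b"row-1")
long_ = fingerprint(b"row-1" * 10_000)
print(len(short), len(long_))            # 32 32: the digest length is fixed
print(short == fingerprint(b"row-1"))    # True: same input, same hash value
```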

Hash table: a data structure that is accessed directly according to a key value. That is, it accesses a record by mapping the key value to a location in the table, which speeds up lookup. This mapping function is called a hash function, and the array in which the records are stored is called a hash table. Given a table M and a function f(key), if substituting any given key value into the function yields the address in the table of the record containing that key, then the table M is called a hash table and the function f(key) is a hash function.

merkle-tree: a binary hash tree of a certain height, that is, a tree that stores hash values. As in a hash list, the data is divided into small data segments at the bottom layer and the hash value of each segment is calculated. Two adjacent hash values are then merged into one character string and the hash value of that string is calculated, so that each pair of hash values yields one sub hash value. If the total number of hash values in the bottom layer is odd, a single hash value is left over at the end; the hash operation is performed directly on that single hash value, which also yields a sub hash value. Proceeding upward in the same way produces a new layer with fewer hash values each time, and eventually an inverted tree is formed in which a single root hash value remains at the position of the tree root. This root hash value is called the root node of the hash tree.
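The construction described in this definition may be sketched in Python as follows. The names `h` and `merkle_root` and the choice of MD5 are assumptions made for illustration; the pairing, the concatenation of adjacent hash values, and the handling of a leftover single hash value follow the definition above.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper; MD5 is used only as an example algorithm."""
    return hashlib.md5(data).digest()


def merkle_root(segments: list[bytes]) -> bytes:
    """Return the root hash value of the hash tree built over `segments`."""
    if not segments:
        raise ValueError("at least one data segment is required")
    level = [h(s) for s in segments]          # bottom layer: one hash per segment
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level), 2):
            pair = level[j:j + 2]
            # two adjacent hashes are concatenated and hashed;
            # a leftover single hash is hashed on its own
            nxt.append(h(b"".join(pair)))
        level = nxt
    return level[0]


print(merkle_root([b"a", b"b", b"c"]).hex())
```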

Database cluster: refers to a collection of nodes made up of some data nodes in a distributed database. Wherein, the data node refers to a server or a process for storing some partial data in the database cluster.

Data migration: the transfer of data among a plurality of nodes or clusters, for example, transferring all data included in each data node of the old database cluster to the data nodes of the new database cluster. The implementation of data migration can be divided into 3 phases: preparation before data migration, implementation of data migration, and verification after data migration. Verification after data migration refers to checking the data consistency of the new and old database clusters: after the data migration is completed, the data of the two clusters must be compared to determine whether it was migrated correctly and whether it is identical. The result of this check is an important basis for deciding whether the new database cluster can be formally put into service.

In the present specification, a data consistency check method of a distributed database cluster is provided, and the present specification relates to a data consistency check apparatus of a distributed database cluster, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.

Fig. 1 shows a flowchart of a data consistency check method for a distributed database cluster according to an embodiment of the present disclosure, which specifically includes the following steps:

Step 102: in the case that the completion of data migration from the first database cluster to the second database cluster is detected, constructing a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to a first hash tree construction rule, wherein the first hash tree construction rule comprises: constructing a node hash tree according to hash values of data stored by data nodes included in the database cluster.

In practical applications, after data is migrated from a first database cluster to a second database cluster, it is necessary to determine whether the data of the second database cluster is consistent with the data of the first database cluster. Because the data volume is huge, comparing the data item by item is very time-consuming, so a full comparison of the data of the two database clusters is impractical. If only part of the full data is extracted for comparison, the data that was not extracted may still be inconsistent with some probability, so the accuracy of checking the data consistency of the two database clusters is low.

In order to quickly and accurately check whether data of two database clusters are consistent, the present specification provides a data consistency check method of a distributed database cluster, and when it is detected that data migration from a first database cluster to a second database cluster is completed, a first node hash tree of the first database cluster and a second node hash tree of the second database cluster may be first constructed according to a first hash tree construction rule, where the first hash tree construction rule includes: constructing a node hash tree according to hash values of data stored by data nodes included in the database cluster; and then constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree; and comparing hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the data of the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the root nodes of the second database cluster hash tree are consistent. The root nodes of the database cluster hash tree can represent the total data contained in the database cluster, and whether the total data contained in the two database cluster hash trees are consistent can be rapidly and accurately determined only by comparing whether the root nodes of the two database cluster hash trees are consistent, so that the sampling of the total data is avoided, the possibility of inconsistent non-sampled data is further avoided, and the accuracy of checking the data consistency of the two database clusters is greatly improved.
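The overall flow described above may be sketched end to end as follows. This is a simplified Python illustration under stated assumptions: each cluster is modelled as a dictionary from a hypothetical data node identifier to that node's data serialized as bytes, MD5 is used as the hash algorithm, both combination operations are taken to be pairwise exclusive OR, and a lone last node is carried up unchanged. None of the function names come from the specification.

```python
import hashlib


def _reduce(hashes: list[bytes]) -> bytes:
    """Combine hashes two by two, layer by layer, until one root remains."""
    level = list(hashes)
    while len(level) > 1:
        level = [bytes(a ^ b for a, b in zip(level[j], level[j + 1]))
                 if j + 1 < len(level) else level[j]      # assumption: carry up
                 for j in range(0, len(level), 2)]
    return level[0]


def node_root(data: bytes, first_length: int = 128) -> bytes:
    """Root of the node hash tree of one data node."""
    segments = [data[i:i + first_length]
                for i in range(0, len(data), first_length)] or [b""]
    return _reduce([hashlib.md5(s).digest() for s in segments])


def cluster_root(cluster: dict[str, bytes]) -> bytes:
    """Root of the database cluster hash tree; nodes are taken in sorted order."""
    return _reduce([node_root(cluster[name]) for name in sorted(cluster)])


def is_consistent(first: dict[str, bytes], second: dict[str, bytes]) -> bool:
    return cluster_root(first) == cluster_root(second)


old = {"node-1": b"alpha" * 100, "node-2": b"beta" * 100}
good = dict(old)
bad = {"node-1": old["node-1"], "node-2": b"beta" * 99 + b"betX"}
print(is_consistent(old, good))  # True
print(is_consistent(old, bad))   # False
```

Both clusters must use the same segment length, hash algorithm and node ordering, otherwise equal data would produce different roots.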

It should be noted that the database cluster includes at least two databases, each database including at least one data node, each data node being configured to store data. Migration of the entire amount of data in one database cluster to another database cluster often occurs in the database clusters, and after the data migration is completed, it is required to check whether the data in the two database clusters are consistent, and in the case that the data are consistent, the new database cluster can be formally started later.

Specifically, the first database cluster refers to an old database cluster storing data to be migrated, the second database cluster refers to a new database cluster receiving the data to be migrated, that is, data stored in the first database cluster is migrated to the second database cluster, and when all data in the first database cluster are migrated to the second database cluster, it is determined that data migration from the first database cluster to the second database cluster is completed.

In an alternative implementation manner of this embodiment, in order to ensure that the first node hash tree constructed for the data node included in the first database cluster and the second node hash tree constructed for the data node included in the second database cluster have the same tree structure, the node hash tree should be constructed by adopting the same hash tree construction rule (i.e., the first hash tree construction rule). In specific implementation, the first node hash tree may be constructed according to the hash value of the data stored in the data node included in the first database cluster, and the second node hash tree may be constructed according to the hash value of the data stored in the data node included in the second database cluster.

The specific implementation process of constructing the first node hash tree according to the hash value of the data stored by the data node included in the first database cluster may be as follows:

acquiring first data stored by a data node included in a first database cluster;

dividing the first data according to the first length to obtain at least two first data segments;

and constructing a first node hash tree by taking hash values of data of at least two first data segments as leaf nodes.

Specifically, the data node may be any data node in the first database cluster, and since the first database cluster includes at least two data nodes, for each data node in the at least two data nodes, a corresponding first node hash tree is constructed according to the above operation steps. That is, the first database cluster includes several data nodes, and several first node hash trees are constructed.

In addition, the data nodes may store data in the form of a table, and for any data node, the data in the table in the data node may be obtained, and then the data in the table is divided according to a first length, so as to obtain at least two first data segments, where the first length may be preset, and the length refers to the length of the divided data, for example, the first length may be 128.
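The division by the first length may be sketched as follows. In this Python example the table data is assumed to be already serialized to bytes and the first length is assumed to be counted in bytes; the specification does not fix the unit, and the function name `split_segments` is hypothetical.

```python
def split_segments(data: bytes, first_length: int = 128) -> list[bytes]:
    """Cut `data` into consecutive pieces of at most `first_length` bytes."""
    if first_length <= 0:
        raise ValueError("first_length must be positive")
    return [data[i:i + first_length] for i in range(0, len(data), first_length)]


table_bytes = b"x" * 300                       # stand-in for serialized table data
segments = split_segments(table_bytes, 128)
print([len(s) for s in segments])              # [128, 128, 44]
```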

The specific implementation process of constructing the second node hash tree according to the hash value of the data stored by the data node included in the second database cluster may be as follows:

acquiring second data stored by a data node included in a second database cluster;

dividing the second data according to the first length to obtain at least two second data segments;

and constructing a second node hash tree by taking hash values of the data of at least two second data segments as leaf nodes.

Specifically, the data node may be any data node in the second database cluster, and since the second database cluster includes at least two data nodes, for each data node in the at least two data nodes, a corresponding second node hash tree is constructed according to the above operation steps. That is, the second database cluster includes several data nodes, and several second node hash trees are constructed.

It should be noted that, the data stored in the data nodes included in the first database cluster and the second database cluster are all divided by adopting the first length, so that the same division mode of the data stored in the data nodes included in the first database cluster and the second database cluster is ensured.

In an optional implementation manner of this embodiment, the hash tree of the first node is constructed by using hash values of data of at least two first data segments as leaf nodes, and the specific implementation process may be as follows:

for each first data segment in at least two first data segments, calculating a hash value of the data of each first data segment through a first hash algorithm;

taking the at least two hash values obtained through calculation as leaf nodes of the ith layer, and carrying out first combination operation on the hash values of the leaf nodes of the ith layer to obtain leaf nodes of the (i+1) th layer, wherein i is greater than or equal to 1;

determining whether the number of leaf nodes of the (i+1)th layer is equal to 1; if so, determining the leaf node of the (i+1)th layer as the root node; if not, incrementing i by 1 and returning to the operation step of performing the first combination operation on the hash values of the leaf nodes of the ith layer to obtain the leaf nodes of the (i+1)th layer;

and constructing a first node hash tree according to the obtained leaf nodes and root nodes of each layer, and storing the corresponding relation between the identifiers of the root nodes and the data nodes.

Specifically, the first hash algorithm is a preset hash algorithm used to calculate the hash value of the data of a first data segment; for example, the first hash algorithm may be MD5 (Message-Digest Algorithm 5). The first combination operation is a rule for combining the hash values of the leaf nodes included in the i-th layer and performing an operation on the combined values, and the rule can be set in advance.

In order to facilitate the subsequent positioning of the data node with inconsistent data, the data node and the first node hash tree need to be associated, so that in the present specification, the identifiers of the root node of the first node hash tree and the corresponding data node are stored.

Performing the first combination operation on the hash values of the leaf nodes of the i-th layer to obtain the leaf nodes of the (i+1)-th layer may be implemented as follows:

dividing leaf nodes of an ith layer according to a first grouping rule to obtain a plurality of first leaf node combinations;

and aiming at each first leaf node combination, carrying out target operation on the leaf nodes included in the first leaf node combination to obtain the leaf node of the (i+1) th layer.

Specifically, the first grouping rule refers to a rule for combining hash values of leaf nodes included in the i-th layer, for example, the hash values of the leaf nodes included in the i-th layer are combined in pairs. The target operation refers to a rule for operating hash values of the combined leaf nodes, such as exclusive or operation, addition operation or subtraction operation on every two leaf nodes.
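The construction steps above can be illustrated with a minimal sketch. It assumes MD5 as the first hash algorithm, pairwise grouping as the first grouping rule and bytewise exclusive OR as the target operation, as in the examples of this specification. The function names, the in-memory byte string standing in for the data of a data node, and the handling of an unpaired last node (carried up unchanged) are assumptions made only for illustration.

```python
import hashlib


def split_segments(data: bytes, first_length: int) -> list[bytes]:
    """Divide the data into consecutive segments of first_length bytes."""
    return [data[i:i + first_length] for i in range(0, len(data), first_length)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Target operation assumed here: bytewise exclusive OR of two hash values."""
    return bytes(x ^ y for x, y in zip(a, b))


def combine_layer(layer: list[bytes]) -> list[bytes]:
    """First combination operation: pair nodes in order and XOR each pair.

    Assumption: an unpaired last node is carried up unchanged.
    """
    next_layer = []
    for k in range(0, len(layer), 2):
        pair = layer[k:k + 2]
        next_layer.append(xor_bytes(*pair) if len(pair) == 2 else pair[0])
    return next_layer


def build_node_hash_tree(data: bytes, first_length: int) -> list[list[bytes]]:
    """Return the layers bottom-up: layers[0] holds the segment hashes,
    layers[-1] holds the single root node."""
    layers = [[hashlib.md5(seg).digest() for seg in split_segments(data, first_length)]]
    while len(layers[-1]) > 1:
        layers.append(combine_layer(layers[-1]))
    return layers


# Hypothetical stand-in for the first data of one data node: 8 segments of 8 bytes.
first_data = bytes(range(64))
tree = build_node_hash_tree(first_data, first_length=8)
layer_sizes = [len(layer) for layer in tree]
root = tree[-1][0]
```

With 8 segments the sketch produces layers of 8, 4, 2 and 1 nodes, which matches the four-layer tree described for data node 1.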

For example, fig. 2 is a schematic diagram of the construction process of the first node hash tree, where the first database cluster includes data node 1, data node 2, data node 3 and data node 4. As shown in fig. 2, for data node 1, the first data it stores is obtained and divided into 8 data segments, and 8 hash values A1-A8 are calculated from the 8 data segments. A1-A8 are taken as the leaf nodes of the 1st layer, the leaf nodes of the 1st layer are combined in pairs in sequence (A1 with A2, A3 with A4, A5 with A6, and A7 with A8), and an exclusive-OR operation is performed on the hash values of the two leaf nodes in each combination to obtain the leaf nodes of the 2nd layer. Because the number of leaf nodes of the 2nd layer is 4 rather than 1, the leaf nodes of the 2nd layer are again combined in pairs in sequence, and an exclusive-OR operation is performed on the hash values of the two leaf nodes in each combination to obtain the leaf nodes of the 3rd layer. Because the number of leaf nodes of the 3rd layer is 2 rather than 1, the leaf nodes of the 3rd layer are combined in pairs in sequence, and an exclusive-OR operation is performed on the hash values of the two leaf nodes in the combination to obtain the leaf node X1 of the 4th layer. Because the number of leaf nodes of the 4th layer is 1, the leaf node X1 of the 4th layer is determined to be the root node, and the leaf nodes of the 1st to 3rd layers together with the root node X1 of the 4th layer form the first node hash tree corresponding to data node 1.
For data node 2, data node 3 and data node 4 included in the first database cluster, the same method is used to construct the first node hash trees with root nodes X2, X3 and X4, respectively (the first node hash trees corresponding to data node 2, data node 3 and data node 4), as shown in fig. 2, and a correspondence table between the root nodes and the identifiers of the data nodes is stored, as shown in Table 1 below.

Table 1 Correspondence between root nodes and data node identifiers

Root node	Identifier of data node
X1	Data node 1
X2	Data node 2
X3	Data node 3
X4	Data node 4

Correspondingly, constructing the second node hash tree by taking the hash values of the data of the at least two second data segments as leaf nodes may be implemented as follows:

for each second data segment in the at least two second data segments, calculating a hash value of the data of each second data segment through a first hash algorithm;

taking the at least two hash values obtained through calculation as leaf nodes of the j th layer, and carrying out first combination operation on the hash values of the leaf nodes of the j th layer to obtain leaf nodes of the j+1 th layer, wherein j is greater than or equal to 1;

determining whether the number of leaf nodes of the (j+1)-th layer is equal to 1; if so, determining the leaf node of the (j+1)-th layer as the root node; if not, incrementing j by 1 and returning to the operation step of performing the first combination operation on the hash values of the leaf nodes of the j-th layer to obtain the leaf nodes of the (j+1)-th layer;

and constructing a second node hash tree according to the obtained leaf nodes and root nodes of each layer, and storing the corresponding relation between the identifiers of the root nodes and the data nodes.

Performing the first combination operation on the hash values of the leaf nodes of the j-th layer to obtain the leaf nodes of the (j+1)-th layer may be implemented as follows:

dividing leaf nodes of a j-th layer according to a first grouping rule to obtain a plurality of second leaf node combinations;

and aiming at each second leaf node combination, carrying out target operation on the leaf nodes included in the second leaf node combination to obtain the leaf nodes of the j+1th layer.

It should be noted that, the construction of the second node hash tree adopts the same hash tree construction rule as the construction of the first node hash tree, and the specific construction process is similar, so that the structures of the constructed first node hash tree and the constructed second node hash tree are identical.

For example, fig. 3 is a schematic diagram of the construction process of the second node hash tree, where the second database cluster includes data node 5, data node 6, data node 7 and data node 8. As shown in fig. 3, for data node 5, the stored second data is acquired and divided into 8 data segments, 8 hash values B1-B8 are calculated from the 8 data segments, and B1-B8 are taken as the leaf nodes of the 1st layer; a second node hash tree with root node Y1 (the second node hash tree corresponding to data node 5) is then constructed by the same method as in the previous example. The second node hash trees with root nodes Y2, Y3 and Y4 (corresponding to data node 6, data node 7 and data node 8, respectively), as shown in fig. 3, are constructed by the same method, and the correspondence table between the root nodes and the identifiers of the data nodes is then updated, as shown in Table 2 below.

Table 2 Updated correspondence between root nodes and data node identifiers

Root node	Identifier of data node
X1	Data node 1
X2	Data node 2
X3	Data node 3
X4	Data node 4
Y1	Data node 5
Y2	Data node 6
Y3	Data node 7
Y4	Data node 8

It is worth noting that, in the present specification, node hash trees can be constructed synchronously for the data nodes included in the first database cluster and the second database cluster. Because the data nodes compute concurrently, the node hash trees are constructed more efficiently, which in turn improves the efficiency of constructing the subsequent cluster-level hash trees, so that the data consistency of the two database clusters can be checked conveniently, quickly and accurately by comparing the database cluster hash trees.
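As a rough illustration of the concurrent construction described above, the following sketch computes the root of each node hash tree in a thread pool and then stores the correspondence between root nodes and data node identifiers. The in-memory dictionary standing in for the data nodes, the segment length and the function name `node_root` are assumptions; in an actual deployment each data node would presumably compute its own tree locally rather than in threads of a single process.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def node_root(data: bytes, first_length: int = 8) -> bytes:
    """Root hash of one node hash tree (MD5 leaves, pairwise XOR per layer).

    Assumes non-empty data; an unpaired last node is carried up unchanged.
    """
    layer = [hashlib.md5(data[i:i + first_length]).digest()
             for i in range(0, len(data), first_length)]
    while len(layer) > 1:
        layer = [bytes(x ^ y for x, y in zip(layer[k], layer[k + 1]))
                 if k + 1 < len(layer) else layer[k]
                 for k in range(0, len(layer), 2)]
    return layer[0]


# Hypothetical in-memory stand-ins for the data held by each data node.
cluster_data = {
    "data node 1": bytes(range(0, 64)),
    "data node 2": bytes(range(64, 128)),
    "data node 3": bytes(range(128, 192)),
    "data node 4": bytes(range(192, 256)),
}

# Build the node hash trees concurrently; map() preserves the input order.
with ThreadPoolExecutor() as pool:
    roots = list(pool.map(node_root, cluster_data.values()))

# Correspondence between root hashes and data node identifiers (cf. Table 1).
root_to_node = dict(zip(roots, cluster_data))
```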

Further, the migration mode of the database cluster may affect how data is acquired during the data consistency check. Accordingly, acquiring the first data stored in the data nodes included in the first database cluster may be implemented as follows:

Under the condition that the migration mode is continuous write migration, determining a data migration completion time point, and acquiring first data stored by a data node included in a first database cluster before the data migration completion time point;

and under the condition that the migration mode is write-down migration, acquiring the total first data stored by the data nodes included in the first database cluster.

Specifically, the migration mode indicates whether the database cluster continues to receive newly written data after the data migration is completed: continuous write migration means that newly written data continues to be received immediately after the data migration is completed, and write-down migration means that the cluster stops receiving newly written data after the data migration is completed. That is, in the case of continuous write migration, the data consistency check only needs to cover the data that existed before the data migration was completed, and data written afterwards does not need to be checked. Therefore, in the present specification, when the migration mode is continuous write migration, the data migration completion time point is determined first, and then the first data stored before that time point by the data nodes included in the first database cluster is acquired; in other words, the first data is screened by the data migration completion time point. In the case of write-down migration, the data consistency check needs to cover the full first data stored by the data nodes included in the first database cluster, so the full first data can be acquired directly without screening by a time point.

Correspondingly, the specific implementation process of obtaining the second data stored by the data node included in the second database cluster may be as follows:

under the condition that the migration mode is continuous write migration, determining a data migration completion time point, and acquiring second data stored by a data node included in a second database cluster before the data migration completion time point;

and under the condition that the migration mode is write-down migration, acquiring the full second data stored by the data nodes included in the second database cluster.
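The selection of data by migration mode might be sketched as follows. The `Record` structure, its `written_at` timestamp field and the function name `select_data` are hypothetical; the specification only states that data stored before the data migration completion time point is acquired in the continuous write case and that the full data is acquired otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MigrationMode(Enum):
    CONTINUOUS_WRITE = "continuous write migration"
    WRITE_DOWN = "write-down migration"


@dataclass(frozen=True)
class Record:
    key: str
    value: bytes
    written_at: float  # hypothetical write timestamp of the record


def select_data(records: list[Record], mode: MigrationMode,
                migration_done_at: Optional[float] = None) -> list[Record]:
    """Return the records that should take part in the consistency check."""
    if mode is MigrationMode.CONTINUOUS_WRITE:
        if migration_done_at is None:
            raise ValueError("a migration completion time point is required")
        # Only data stored before the migration completion time point is checked.
        return [r for r in records if r.written_at < migration_done_at]
    # Write-down migration: the full data is checked, with no time screening.
    return list(records)


records = [Record("a", b"1", 10.0), Record("b", b"2", 20.0), Record("c", b"3", 30.0)]
before_cutoff = select_data(records, MigrationMode.CONTINUOUS_WRITE, migration_done_at=25.0)
full = select_data(records, MigrationMode.WRITE_DOWN)
```

The same selection would be applied to both database clusters with the same time point, so that both trees are built over the same logical data set.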

Step 104: constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree.

Specifically, on the basis of constructing a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to the first hash tree construction rule, further, constructing the first database cluster hash tree and the second database cluster hash tree according to the second hash tree construction rule.

In an optional implementation of this embodiment, in order to ensure that the tree structures of the first database cluster hash tree constructed for the first database cluster and the second database cluster hash tree constructed for the second database cluster are identical, the database cluster hash trees should be constructed using the same hash tree construction rule (i.e., the second hash tree construction rule). Because step 102 constructs a first node hash tree for each data node included in the first database cluster and a second node hash tree for each data node included in the second database cluster, the first database cluster hash tree can be constructed according to the hash values of the root nodes of the first node hash trees, and the second database cluster hash tree can be constructed according to the hash values of the root nodes of the second node hash trees.

Constructing the first database cluster hash tree according to the hash values of the root nodes of the first node hash trees may be implemented as follows:

taking the hash value of the root node of the first node hash tree as a leaf node of a k-th layer;

Performing a second combination operation on the hash value of the leaf node of the k layer to obtain the leaf node of the k+1th layer, wherein k is greater than or equal to 1;

determining whether the number of leaf nodes of the (k+1)-th layer is equal to 1; if so, determining the leaf node of the (k+1)-th layer as the root node; if not, incrementing k by 1 and returning to the operation step of performing the second combination operation on the hash values of the leaf nodes of the k-th layer to obtain the leaf nodes of the (k+1)-th layer;

and constructing and obtaining a first database cluster hash tree according to the obtained leaf nodes and root nodes of each layer.

Specifically, the second combination operation refers to a rule that combines hash values of leaf nodes included in the k-th layer and performs operation after combination, where the rule may be set in advance. It should be noted that, the first combining operation is used to construct the node hash tree, and the second combining operation is used to construct the database cluster hash tree, so the second combining operation may be the same as or different from the first combining operation. In the specific implementation process, the second combination operation is performed on the hash value of the leaf node of the kth layer, so that the specific process of obtaining the leaf node of the k+1 layer is similar to the specific process of performing the first combination operation on the hash value of the leaf node of the ith layer to obtain the leaf node of the i+1 layer, which is not described in detail herein.

Continuing the above example, fig. 4 is a schematic diagram of the construction process of the first database cluster hash tree. As shown in fig. 4, the root node X1 of the first node hash tree corresponding to data node 1, the root node X2 of the first node hash tree corresponding to data node 2, the root node X3 of the first node hash tree corresponding to data node 3, and the root node X4 of the first node hash tree corresponding to data node 4 are taken as the leaf nodes of the 1st layer. The leaf nodes of the 1st layer are combined in pairs in sequence (X1 with X2, and X3 with X4), and an exclusive-OR operation is performed on the hash values of the two leaf nodes in each combination to obtain the leaf nodes of the 2nd layer. Because the number of leaf nodes of the 2nd layer is 2 rather than 1, the leaf nodes of the 2nd layer are combined in pairs in sequence, and an exclusive-OR operation is performed on the hash values of the two leaf nodes in the combination to obtain the leaf node G1 of the 3rd layer. Because the number of leaf nodes of the 3rd layer is 1, the leaf node G1 of the 3rd layer is determined to be the root node, and the leaf nodes of the 1st and 2nd layers together with the root node G1 of the 3rd layer form the first database cluster hash tree.
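The fig. 4 example can be mirrored by a short sketch that takes four node-tree root hashes as the 1st-layer leaf nodes and folds them pairwise by exclusive OR. The values used for X1 to X4 below are arbitrary stand-ins (MD5 digests of the labels), not values from the specification, and the function names are assumptions.

```python
import hashlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equal-length hash values."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_cluster_hash_tree(node_roots: list[bytes]) -> list[list[bytes]]:
    """Second hash tree construction rule, sketched with pairwise XOR.

    Returns the layers bottom-up; an unpaired last node is carried up
    unchanged (an assumption, since the example uses an even count).
    """
    layers = [list(node_roots)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([xor_bytes(prev[k], prev[k + 1]) if k + 1 < len(prev) else prev[k]
                       for k in range(0, len(prev), 2)])
    return layers


# Stand-in root hashes for the node hash trees of data nodes 1 to 4.
x1, x2, x3, x4 = (hashlib.md5(label.encode()).digest() for label in ("X1", "X2", "X3", "X4"))

cluster_tree = build_cluster_hash_tree([x1, x2, x3, x4])
g1 = cluster_tree[-1][0]  # root node of the first database cluster hash tree
```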

Correspondingly, constructing the second database cluster hash tree according to the hash values of the root nodes of the second node hash trees may be implemented as follows:

Taking the hash value of the root node of the second node hash tree as a leaf node of the m-th layer;

performing a second combination operation on the hash value of the leaf node of the m-th layer to obtain the leaf node of the m+1th layer, wherein m is greater than or equal to 1;

determining whether the number of leaf nodes of the (m+1)-th layer is equal to 1; if so, determining the leaf node of the (m+1)-th layer as the root node; if not, incrementing m by 1 and returning to the operation step of performing the second combination operation on the hash values of the leaf nodes of the m-th layer to obtain the leaf nodes of the (m+1)-th layer;

and constructing a hash tree of the second database cluster according to the obtained leaf nodes and root nodes of each layer.

It should be noted that, the process of constructing the first database cluster hash tree and the process of constructing the second database cluster hash tree adopt the same hash tree construction rule (i.e., the second hash tree construction rule), so as to ensure that the tree structures of the constructed first database cluster hash tree and the constructed second database cluster hash tree are the same.

Continuing the above example, fig. 5 is a schematic diagram of the construction process of the second database cluster hash tree. As shown in fig. 5, the root node Y1 of the second node hash tree corresponding to data node 5, the root node Y2 of the second node hash tree corresponding to data node 6, the root node Y3 of the second node hash tree corresponding to data node 7, and the root node Y4 of the second node hash tree corresponding to data node 8 are taken as the leaf nodes of the 1st layer, and the second database cluster hash tree with root node G2 is constructed from these leaf nodes by the same method as in the above example.

Step 106: comparing hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the data of the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the root nodes of the second database cluster hash tree are consistent.

Specifically, on the basis of constructing the first database cluster hash tree and the second database cluster hash tree according to the second hash tree construction rule, further, hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree are compared, and under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent, data of the first database cluster and data of the second database cluster are determined to be consistent.

It is worth noting that, in the present specification, a node hash tree is first constructed from the actual full data of each data node, and the hash trees of the two database clusters are then constructed from the node hash trees of the data nodes they include. This ensures that all of the data included in a database cluster participates in constructing its database cluster hash tree; that is, the root node of the database cluster hash tree represents the full data of the database cluster. It is then only necessary to compare whether the root nodes of the two database cluster hash trees are consistent in order to determine quickly and accurately whether the full data of the two database clusters is consistent. Because the full data is not sampled, the risk that unsampled data is inconsistent is avoided, and the accuracy of the data consistency check of the two database clusters is greatly improved.
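The root comparison of step 106 can be sketched end to end as follows, assuming MD5 segment hashes and pairwise exclusive OR for both the first and the second combination operations. The byte strings standing in for the data of each data node, the segment length of 4 bytes and all function names are assumptions made for illustration.

```python
import hashlib


def fold(layer: list[bytes]) -> bytes:
    """Reduce a non-empty layer to a single root by XORing nodes pairwise, layer by layer."""
    while len(layer) > 1:
        layer = [bytes(a ^ b for a, b in zip(layer[k], layer[k + 1]))
                 if k + 1 < len(layer) else layer[k]
                 for k in range(0, len(layer), 2)]
    return layer[0]


def node_root(data: bytes, first_length: int) -> bytes:
    """Root of the node hash tree built over fixed-length segments of the data."""
    return fold([hashlib.md5(data[i:i + first_length]).digest()
                 for i in range(0, len(data), first_length)])


def cluster_root(nodes: list[bytes], first_length: int) -> bytes:
    """Root of the database cluster hash tree built over the node-tree roots."""
    return fold([node_root(data, first_length) for data in nodes])


def clusters_consistent(first: list[bytes], second: list[bytes],
                        first_length: int = 4) -> bool:
    """Compare only the two cluster-level root hashes."""
    return cluster_root(first, first_length) == cluster_root(second, first_length)


# Hypothetical data of four data nodes in the first database cluster.
first_cluster = [bytes(range(start, start + 32)) for start in (0, 32, 64, 96)]
migrated_ok = list(first_cluster)
migrated_bad = list(first_cluster)
# Corrupt a single byte on the third data node of the second cluster.
migrated_bad[2] = migrated_bad[2][:5] + b"\xff" + migrated_bad[2][6:]

ok = clusters_consistent(first_cluster, migrated_ok)
bad = clusters_consistent(first_cluster, migrated_bad)
```

Because every segment hash feeds into its node root and every node root feeds into the cluster root, a change in any single segment changes the cluster root in this sketch.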

Further, if hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree are inconsistent, it is indicated that inconsistent data exists in the first database cluster and the second database cluster, and in the present specification, the data node with inconsistent data may be further located, so that a specific data segment in the data node with inconsistent data is located, so as to determine which data segment or data segments have a problem, and the specific implementation process may be as follows:

under the condition that hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree are inconsistent, sequentially comparing hash values of leaf nodes included in the first database cluster hash tree and the second database cluster hash tree from the root nodes to leaf node directions, and determining at least two first target leaf nodes;

determining at least two corresponding target data nodes according to the at least two first target leaf nodes;

acquiring a first target node hash tree corresponding to a target data node in a first database cluster, and acquiring a second target node hash tree corresponding to the target data node in a second database cluster;

sequentially comparing, from the root nodes toward the leaf nodes, hash values of leaf nodes included in the first target node hash tree and the second target node hash tree, and determining a second target leaf node;

and determining the corresponding data segment according to the second target leaf node.

Specifically, the first target leaf nodes are the bottom-layer (layer 1) leaf nodes whose hash values differ between the first database cluster hash tree and the second database cluster hash tree. There are therefore at least two first target leaf nodes, and their number is even: half of them are leaf nodes in the first database cluster hash tree, and the other half are leaf nodes in the second database cluster hash tree.

In addition, since the bottom leaf node of the database cluster hash tree is the root node of the node hash tree, after the first target leaf node (that is, the root node of the node hash tree) is determined, at least two corresponding target data nodes (half of the corresponding target data nodes are data nodes of the first database cluster and the other half of the corresponding target data nodes are data nodes of the second database cluster) can be determined according to the corresponding relation between the identifiers of the previously stored root node and the data nodes, so that the first node hash tree and the second node hash tree corresponding to the target data nodes are obtained, and then the bottom leaf node (namely, the second target leaf node) with inconsistent hash values in the target node hash tree can be located by comparing the two target node hash trees. Therefore, after the second target leaf node is determined, the data segment in the corresponding data node can be determined according to the second target leaf node, namely, the data segment in the data node which is specifically positioned to be inconsistent in data is realized.

Assume that the first database cluster hash tree and the second database cluster hash tree each include p layers of leaf nodes, where p is greater than or equal to 1. Comparing the hash values of the leaf nodes included in the first database cluster hash tree and the second database cluster hash tree to determine the first target leaf nodes may be implemented as follows:

determining leaf nodes with different hash values in leaf nodes of a p-th layer in the first database cluster hash tree and the second database cluster hash tree as third target leaf nodes;

decrementing p by 1 and determining whether p is equal to 0; if so, determining the third target leaf nodes as the first target leaf nodes; if not, taking the leaf nodes subordinate to the third target leaf nodes as the leaf nodes of the p-th layer, and returning to the operation step of comparing the hash values of the leaf nodes of the p-th layer in the first database cluster hash tree and the second database cluster hash tree.

It should be noted that the specific implementation process of comparing the hash values of the leaf nodes included in the first target node hash tree and the second target node hash tree to determine the second target leaf node is similar to that of comparing the hash values of the leaf nodes included in the first database cluster hash tree and the second database cluster hash tree to determine the at least two first target leaf nodes, and is not described in detail herein. In addition, determining the data nodes with inconsistent data requires comparing, one by one, the leaf nodes included in the hash trees of the different database clusters, so the complexity of determining the data nodes with inconsistent data is O(log(n)), where n is the number of data nodes in the first database cluster (or the second database cluster). Likewise, determining the data segments in the data nodes with inconsistent data requires comparing, one by one, the leaf nodes included in the hash trees of the different nodes, so the complexity of determining the data segments in the data nodes with inconsistent data is O(log(f)), where f is the number of data segments obtained by dividing the data in a data node.
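The top-down comparison described above might be sketched as follows. The two trees are assumed to have identical structure and are represented as lists of layers from the bottom layer up to the root. The tree representation, the function names and the stand-in leaf hashes are assumptions. Only the children of differing nodes are visited at each layer, which for a single inconsistency corresponds to the logarithmic complexity mentioned above.

```python
import hashlib


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Build layers bottom-up by XORing nodes pairwise (last odd node carried up)."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([bytes(a ^ b for a, b in zip(prev[k], prev[k + 1]))
                       if k + 1 < len(prev) else prev[k]
                       for k in range(0, len(prev), 2)])
    return layers


def find_target_leaves(tree_a: list[list[bytes]], tree_b: list[list[bytes]]) -> list[int]:
    """Return the 0-based positions of bottom-layer leaf nodes whose hashes differ.

    Descends from the root toward the leaves, following only differing nodes.
    """
    if tree_a[-1] == tree_b[-1]:
        return []  # roots agree, nothing to locate
    candidates = [0]  # position of the (differing) root in the top layer
    # level is a 0-based list index: level 0 is the 1st (bottom) layer.
    for level in range(len(tree_a) - 2, -1, -1):
        layer_a, layer_b = tree_a[level], tree_b[level]
        children = [c for parent in candidates
                    for c in (2 * parent, 2 * parent + 1) if c < len(layer_a)]
        candidates = [c for c in children if layer_a[c] != layer_b[c]]
    return candidates


# Stand-in leaf hashes; the 5th leaf differs between the two trees.
leaves_a = [hashlib.md5(bytes([i])).digest() for i in range(8)]
leaves_b = list(leaves_a)
leaves_b[4] = hashlib.md5(b"changed").digest()

differing = find_target_leaves(build_tree(leaves_a), build_tree(leaves_b))
unchanged = find_target_leaves(build_tree(leaves_a), build_tree(leaves_a))
```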

Continuing the above example, fig. 6 is a schematic diagram of the comparison process of the first database cluster hash tree and the second database cluster hash tree. As shown in fig. 6, the first database cluster hash tree and the second database cluster hash tree each include 2 layers of leaf nodes. When the hash value of the root node G1 of the first database cluster hash tree is inconsistent with that of the root node G2 of the second database cluster hash tree, the comparison proceeds downward, and the hash values of the leaf nodes of the 2nd layer in the two trees are compared. Assume that the leaf nodes of the 2nd layer whose hash values differ are the left leaf nodes; the left leaf nodes of the 2nd layer in the first database cluster hash tree and the second database cluster hash tree are then determined to be third target leaf nodes. Because 2 minus 1 is 1, which is not equal to 0, the leaf nodes subordinate to the left leaf nodes of the 2nd layer in the two trees are taken as the leaf nodes of the 1st layer. The hash values of these leaf nodes of the 1st layer in the two trees are then compared. Assume that the leaf nodes of the 1st layer whose hash values differ are the left leaf nodes X1 and Y1; the leaf nodes X1 and Y1 are then determined to be third target leaf nodes. Because 1 minus 1 is 0, the leaf nodes X1 and Y1 of the 1st layer (i.e., the bottom layer) in the first database cluster hash tree and the second database cluster hash tree are determined to be the first target leaf nodes.

According to Table 2 above, it is determined that the target data nodes corresponding to the leaf nodes X1 and Y1 are data node 1 and data node 5, and then the first node hash tree corresponding to data node 1 (the node hash tree with root node X1) and the second node hash tree corresponding to data node 5 (the node hash tree with root node Y1) are obtained.

FIG. 7 is a schematic diagram of the comparison process of the first node hash tree and the second node hash tree. As shown in FIG. 7, the first node hash tree corresponding to data node 1 and the second node hash tree corresponding to data node 5 each include 3 layers of leaf nodes, and it has already been determined that the hash values of the root node X1 of the first node hash tree and the root node Y1 of the second node hash tree are inconsistent. Proceeding downward, the hash values of the leaf nodes of the 3rd layer in the first node hash tree and the second node hash tree are compared. Assume that the leaf nodes of the 3rd layer whose hash values differ are the right leaf nodes; the right leaf nodes of the 3rd layer in the two trees are then determined to be fourth target leaf nodes. Because 3 minus 1 is 2, which is not equal to 0, the leaf nodes subordinate to the right leaf nodes of the 3rd layer in the two trees are taken as the leaf nodes of the 2nd layer. The hash values of these leaf nodes of the 2nd layer are then compared. Assume that the leaf nodes of the 2nd layer whose hash values differ are the left leaf nodes; the left leaf nodes of the 2nd layer in the two trees are then determined to be fourth target leaf nodes. Because 2 minus 1 is 1, which is not equal to 0, the leaf nodes subordinate to the left leaf nodes of the 2nd layer in the two trees are taken as the leaf nodes of the 1st layer. The hash values of these leaf nodes of the 1st layer are then compared. Assume that the leaf nodes of the 1st layer whose hash values differ are the left leaf nodes A5 and B5; the leaf nodes A5 and B5 are then determined to be fourth target leaf nodes. Because 1 minus 1 is 0, the leaf nodes A5 and B5 of the 1st layer (i.e., the bottom layer) in the first node hash tree and the second node hash tree are determined to be the second target leaf nodes.

Since the leaf nodes A5 and B5 correspond to the 5th data segment in data node 1 and data node 5, respectively, the inconsistent data can be accurately located in the 5th data segment of data node 1 and data node 5.
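The two-stage location walked through in the example (first the data node, then the data segment) can be reproduced with a small sketch. The data below is fabricated solely so that the 5th segment of data node 5 differs from that of data node 1; the segment length, the dictionaries standing in for the clusters and all function names are assumptions.

```python
import hashlib

FIRST_LENGTH = 4  # assumed segment length, giving 8 segments per data node


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Layers bottom-up, combining nodes pairwise by XOR."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([bytes(a ^ b for a, b in zip(prev[k], prev[k + 1]))
                       if k + 1 < len(prev) else prev[k]
                       for k in range(0, len(prev), 2)])
    return layers


def diff_leaves(tree_a, tree_b) -> list[int]:
    """0-based positions of differing bottom-layer nodes, found top-down."""
    if tree_a[-1] == tree_b[-1]:
        return []
    candidates = [0]
    for level in range(len(tree_a) - 2, -1, -1):
        children = [c for parent in candidates
                    for c in (2 * parent, 2 * parent + 1) if c < len(tree_a[level])]
        candidates = [c for c in children if tree_a[level][c] != tree_b[level][c]]
    return candidates


def node_tree(data: bytes) -> list[list[bytes]]:
    """Node hash tree over MD5 hashes of fixed-length segments."""
    return build_tree([hashlib.md5(data[i:i + FIRST_LENGTH]).digest()
                       for i in range(0, len(data), FIRST_LENGTH)])


first_cluster = {f"data node {n}": bytes(range(32 * (n - 1), 32 * n)) for n in (1, 2, 3, 4)}
second_cluster = {f"data node {n + 4}": data
                  for n, data in zip((1, 2, 3, 4), first_cluster.values())}
corrupted = bytearray(second_cluster["data node 5"])
corrupted[17] ^= 0xFF  # byte 17 lies in the 5th segment (bytes 16 to 19)
second_cluster["data node 5"] = bytes(corrupted)

first_trees = {name: node_tree(d) for name, d in first_cluster.items()}
second_trees = {name: node_tree(d) for name, d in second_cluster.items()}
first_names, second_names = list(first_trees), list(second_trees)
cluster_a = build_tree([t[-1][0] for t in first_trees.values()])
cluster_b = build_tree([t[-1][0] for t in second_trees.values()])

located = []
for pos in diff_leaves(cluster_a, cluster_b):            # stage 1: data nodes
    a, b = first_names[pos], second_names[pos]
    for seg in diff_leaves(first_trees[a], second_trees[b]):  # stage 2: segments
        located.append((a, b, seg + 1))                  # 1-based segment number
```

In this sketch `located` ends up holding the pair of data node 1 and data node 5 together with segment number 5, mirroring the A5/B5 result of the example.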

It is worth noting that the data nodes with inconsistent data can be determined by comparing the database cluster hash trees layer by layer, and the data segments with inconsistent data in those data nodes can then be determined by comparing the node hash trees layer by layer. In this way, the data nodes with inconsistent data and the data segments within them can be located quickly, which facilitates subsequent rapid correction of the data in the inconsistent data segments.

It should be noted that the above method for quickly locating data nodes and the data segments within them requires that the topologies of the first database cluster and the second database cluster be the same, that the same construction rule be used when constructing node hash trees for different nodes, and that the same construction rule be used when constructing database cluster hash trees for different database clusters. That is, the finally constructed node hash trees should share the same tree structure, as should the database cluster hash trees; only then can leaf nodes be compared one by one to locate the data nodes with inconsistent data and the inconsistent data segments within them.

In addition, consider the case where the topologies of the first database cluster and the second database cluster are inconsistent. The data ranges of the two database clusters are the same, that is, the range from the minimum to the maximum is the same for any database cluster (with the MD5 hash algorithm, the range is 0 to 2^128). Therefore, if the first database cluster and the second database cluster segment their data according to the same rule, the resulting data segments should be consistent, the hash values calculated for those data segments should be consistent, the hash values of the root nodes of the node hash trees constructed from those hash values should be consistent, and the root nodes of the constructed database cluster hash trees should be consistent.

That is, when the topologies of the first database cluster and the second database cluster are inconsistent, the method can still be used to determine whether the data of the database clusters is consistent, by constructing node hash trees and database cluster hash trees and comparing the hash values of the root nodes of the database cluster hash trees. However, because the topologies are inconsistent, the tree structures of the constructed node hash trees and of the database cluster hash trees may differ, so leaf nodes cannot be matched one to one and therefore cannot be compared one by one. As a result, neither the data nodes with inconsistent data nor the inconsistent data segments within them can be located precisely; only an approximate range of the inconsistent data can be given, and the data nodes with inconsistent data can then be determined by other means within that approximate range.
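The premise of the two preceding paragraphs, namely that segmenting a shared key space by the same rule yields the same segments regardless of how records are spread over data nodes, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: records are key/value strings, a record's position is the MD5 hash of its key, the key space is cut into equal slices, and each segment is hashed over the pooled records of all nodes. The specification does not state these details, and `segment_of` and `segment_hashes` are hypothetical names.

```python
import hashlib

KEY_SPACE_BITS = 128  # MD5 digests are 128 bits, so positions fall in [0, 2**128)


def segment_of(key: str, n_segments: int) -> int:
    """Map a key to one of n_segments equal slices of the MD5 key space."""
    position = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
    return (position * n_segments) >> KEY_SPACE_BITS


def segment_hashes(nodes: list, n_segments: int) -> list:
    """Hash each key-space segment over the records of all nodes, in sorted key order."""
    buckets = [[] for _ in range(n_segments)]
    for node in nodes:                      # each node is a dict of key -> value
        for key, value in node.items():
            buckets[segment_of(key, n_segments)].append((key, value))
    return [hashlib.md5(repr(sorted(bucket)).encode()).hexdigest()
            for bucket in buckets]


# The same 100 records laid out on three data nodes and on two data nodes.
records = {f"user-{i}": f"value-{i}" for i in range(100)}
keys = sorted(records)
three_nodes = [{k: records[k] for k in keys[i::3]} for i in range(3)]
two_nodes = [{k: records[k] for k in keys[i::2]} for i in range(2)]
same = segment_hashes(three_nodes, 8) == segment_hashes(two_nodes, 8)
```

Under these assumptions the segment hashes depend only on the records and the segmentation rule, so trees built over them would have matching roots even though the two layouts use different numbers of data nodes. The sketch does not address how a differing segment would be mapped back to a data node in each topology.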

The present disclosure provides a method for checking data consistency of a distributed database cluster, where when it is detected that data migration from a first database cluster to a second database cluster is completed, a first node hash tree of the first database cluster and a second node hash tree of the second database cluster may be first constructed according to a first hash tree construction rule, where the first hash tree construction rule includes: constructing a node hash tree according to hash values of data stored by data nodes included in the database cluster; and then constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree; and comparing hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the data of the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the root nodes of the second database cluster hash tree are consistent. 
In this case, node hash trees are first constructed from the real full data of each data node, and the hash trees of the two database clusters are then constructed from the node hash trees of the data nodes that each database cluster includes. This ensures that all of the data in a database cluster participates in constructing its hash tree, so that the root node of the database cluster hash tree represents the full data of the database cluster. It is then only necessary to compare whether the root nodes of the two database cluster hash trees are consistent in order to determine quickly and accurately whether the full data of the two database clusters is consistent. Sampling of the full data is avoided, which in turn avoids the possibility that inconsistencies in unsampled data go undetected, and the accuracy of checking data consistency between the two database clusters is greatly improved. Further, the leaf nodes of the hash trees can be compared one by one to finally locate the data nodes with inconsistent data and the inconsistent data segments within them.
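The overall flow summarized above can be sketched in Python as follows. This is a hedged illustration rather than the claimed implementation: it assumes each data node's data has already been split into a non-empty list of byte segments, that both clusters list corresponding data nodes in the same order, and that hashes are combined pairwise with MD5. The function names are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def root_hash(leaf_hashes: list) -> str:
    """Combine a non-empty list of hashes pairwise, layer by layer, into one root hash."""
    layer = list(leaf_hashes)
    while len(layer) > 1:
        layer = [md5_hex("".join(layer[i:i + 2]).encode())
                 for i in range(0, len(layer), 2)]
    return layer[0]


def node_root(segments: list) -> str:
    # First construction rule: leaves are hashes of the data node's data segments.
    return root_hash([md5_hex(s) for s in segments])


def cluster_root(cluster: list) -> str:
    # Second construction rule: leaves are the root hashes of the node hash trees.
    return root_hash([node_root(segments) for segments in cluster])


def clusters_consistent(first_cluster: list, second_cluster: list) -> bool:
    """Compare only the root hashes of the two database cluster hash trees."""
    return cluster_root(first_cluster) == cluster_root(second_cluster)
```

A call such as `clusters_consistent(old_cluster, new_cluster)` touches every segment while building the trees, but the final decision is a single comparison of two root hashes.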

Corresponding to the method embodiment, the present disclosure further provides an embodiment of a data consistency checking device for a distributed database cluster, and FIG. 8 shows a schematic structural diagram of a data consistency checking device for a distributed database cluster according to an embodiment of the present disclosure. As shown in FIG. 8, the apparatus includes:

a first construction module 802 configured to construct a first node hash tree of a first database cluster and a second node hash tree of a second database cluster according to a first hash tree construction rule, in case that a completion of data migration from the first database cluster to the second database cluster is detected, wherein the first hash tree construction rule comprises: constructing a node hash tree according to hash values of data stored by data nodes included in the database cluster;

a second construction module 804 configured to construct the first database cluster hash tree and the second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule includes: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree;

a first determining module 806 configured to compare hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determine that data of the first database cluster and the second database cluster are consistent in a case where hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

In an optional implementation of this embodiment, the first construction module 802 is further configured to:

acquiring data stored by a data node included in the database cluster;

In an optional implementation of this embodiment, the second construction module 804 is further configured to:

In an optional implementation manner of this embodiment, the apparatus further includes:

a first comparison module configured to compare hash values of leaf nodes included in the first database cluster hash tree and the second database cluster hash tree sequentially from the root node to a leaf node direction and determine at least two first target leaf nodes when hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree are inconsistent;

a second determining module configured to determine at least two corresponding target data nodes according to the at least two first target leaf nodes.

an acquisition module configured to acquire a first target node hash tree corresponding to a target data node in the first database cluster and acquire a second target node hash tree corresponding to the target data node in the second database cluster;

a second comparison module configured to compare hash values of leaf nodes included in the first target node hash tree and the second target node hash tree sequentially from the root nodes of the first target node hash tree and the second target node hash tree toward the leaf nodes, and determine a second target leaf node;

and a third determining module configured to determine a corresponding data segment according to the second target leaf node.

In an optional implementation manner of this embodiment, the first database cluster hash tree and the second database cluster hash tree each include p-layer leaf nodes, where p is greater than or equal to 1; the first comparison module is further configured to:

The data consistency checking device for a distributed database cluster provided in the present specification first constructs node hash trees from the real full data of each data node, and then constructs the hash trees of the two database clusters from the node hash trees of the data nodes that each database cluster includes. This ensures that all of the data in a database cluster participates in constructing its hash tree, so that the root node of the database cluster hash tree represents the full data of the database cluster. It is then only necessary to compare whether the root nodes of the two database cluster hash trees are consistent in order to determine quickly and accurately whether the full data of the two database clusters is consistent. Sampling of the full data is avoided, which in turn avoids the possibility that inconsistencies in unsampled data go undetected, and the accuracy of checking data consistency between the two database clusters is greatly improved.

The foregoing is a schematic solution of the data consistency checking device for a distributed database cluster according to the present embodiment. It should be noted that the technical solution of the data consistency checking device and the technical solution of the data consistency checking method for a distributed database cluster belong to the same concept; for details of the device that are not described in detail, refer to the description of the technical solution of the data consistency checking method for a distributed database cluster.

FIG. 9 illustrates a block diagram of a computing device 900 provided in accordance with an embodiment of the present specification. The components of computing device 900 include, but are not limited to, memory 910 and processor 920. The processor 920 is connected to the memory 910 through a bus 930, and a database cluster 950 is used to store data.

Computing device 900 also includes an access device 940, which enables computing device 900 to communicate via one or more networks 960. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 940 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 900 and other components not shown in FIG. 9 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 9 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 900 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 900 may also be a mobile or stationary server.

Wherein the processor 920 is configured to execute the following computer-executable instructions:

The foregoing is a schematic solution of the computing device of this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data consistency checking method for a distributed database cluster belong to the same concept; for details of the computing device that are not described in detail, refer to the description of the technical solution of the data consistency checking method for a distributed database cluster.

An embodiment of the present disclosure also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, are configured to implement the method of:

The above is a schematic solution of the computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the data consistency checking method for a distributed database cluster belong to the same concept; for details of the storage medium that are not described in detail, refer to the description of the technical solution of the data consistency checking method for a distributed database cluster.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions, but those skilled in the art should understand that the present specification is not limited by the order of actions described, because some steps may be performed in another order or simultaneously in accordance with the present specification. Further, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the present specification.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, to thereby enable others skilled in the art to best understand and utilize the disclosure. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims

1. A method of data consistency verification for a distributed database cluster, the method comprising:

2. The method of claim 1, constructing a node hash tree from hash values of data stored by data nodes included in a database cluster, comprising:

acquiring data stored by a data node included in the database cluster;

3. The method of claim 2, constructing the node hash tree with hash values of the data of the at least two data segments as leaf nodes, comprising:

4. The method of claim 1, constructing a database cluster hash tree from hash values of root nodes of the node hash tree, comprising:

5. The method of claim 1, the method further comprising:

6. The method of claim 5, further comprising, after determining the corresponding target data node from the first target leaf node:

7. The method of claim 5, the first database cluster hash tree and the second database cluster hash tree each comprising p levels of leaf nodes, wherein the p is equal to or greater than 1;

8. The method of claim 2, obtaining data stored by a data node included in the database cluster, comprising:

9. A method according to claim 3, wherein the first combination operation is performed on the hash value of the leaf node of the i-th layer to obtain the leaf node of the i+1-th layer, including:

10. A data consistency check device for a distributed database cluster, the device comprising:

11. A computing device, comprising:

a memory and a processor;

12. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data consistency check method of a distributed database cluster as claimed in any of claims 1 to 9.