CN113297235A - Data consistency checking method and device for distributed database cluster

Info

Publication number: CN113297235A
Application number: CN202011211408.5A
Authority: CN
Inventors: 郭超
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-11-03
Filing date: 2020-11-03
Publication date: 2021-08-24
Anticipated expiration: 2040-11-03
Also published as: CN113297235B

Abstract

The present specification provides a method and an apparatus for checking data consistency of a distributed database cluster, wherein the method for checking data consistency of the distributed database cluster comprises: under the condition that the data migration from the first database cluster to the second database cluster is detected to be completed, a first node hash tree of the first database cluster and a second node hash tree of the second database cluster are constructed according to the first hash tree construction rule; constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree; and determining that the data of the first database cluster and the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

Description

Data consistency checking method and device for distributed database cluster

Technical Field

The present disclosure relates to the field of distributed database technologies, and in particular, to a method and an apparatus for checking data consistency of a distributed database cluster.

Background

With the rapid development of computer technology, the demand for data storage and management keeps growing, which has given rise to distributed databases. A distributed database uses a high-speed computer network to connect a plurality of physically dispersed data storage nodes into a logically unified database cluster. The basic idea is to spread the data of an originally centralized database across a plurality of data storage nodes connected through the network, so as to obtain a larger storage capacity and a higher volume of concurrent access. In a distributed database, the data volume of a database cluster is huge, and data may need to be migrated between database clusters. After data is migrated from an old database cluster to a new database cluster, it is necessary to confirm whether the data of the new database cluster is consistent with the data of the old database cluster. Because the data volume is huge, comparing the data item by item is very time-consuming, so comparing the full data to determine whether the data of the two database clusters is consistent is impractical.

In the prior art, after data migration from an old database cluster to a new database cluster is completed, the data of the old database cluster can be sampled at certain intervals, the data of the new database cluster can be sampled according to the same rule, and the sampled data can be compared one by one; if the sampled data is consistent, the full data of the old and new database clusters is directly determined to be consistent. However, when only part of the full data is extracted and compared, there is a certain probability that data which was not extracted is inconsistent, so the accuracy of checking the data consistency of the two database clusters is low. A faster and more accurate method for checking the data consistency of distributed database clusters is therefore required.
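
The weakness of interval sampling can be illustrated with a small sketch. The function name `sampled_equal`, the row lists, and the sampling interval of 10 are hypothetical choices made for illustration; they are not taken from the prior art described above.

```python
def sampled_equal(old_rows, new_rows, step=10):
    """Compare only every `step`-th row of two identically ordered row lists."""
    return old_rows[::step] == new_rows[::step]

old_rows = [f"row-{n}" for n in range(100)]
new_rows = list(old_rows)
new_rows[7] = "corrupted"  # index 7 is not a multiple of 10, so it is never sampled

sample_says_consistent = sampled_equal(old_rows, new_rows)
full_says_consistent = old_rows == new_rows
print(sample_says_consistent, full_says_consistent)
```

In this sketch the sampled comparison reports the two row lists as consistent even though a full comparison does not, which is the accuracy problem the method sets out to avoid.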

Disclosure of Invention

In view of this, the embodiments of the present specification provide a method for checking data consistency of a distributed database cluster. The present specification also relates to a data consistency checking apparatus for a distributed database cluster, a computing device, and a computer-readable storage medium, so as to solve the technical defects in the prior art.

According to a first aspect of embodiments of the present specification, there is provided a data consistency checking method for a distributed database cluster, the method including:

in a case where completion of data migration from a first database cluster to a second database cluster is detected, constructing a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to a first hash tree construction rule, wherein the first hash tree construction rule comprises: constructing a node hash tree according to hash values of data stored in data nodes included in the database cluster;

constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree;

and comparing the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

Optionally, constructing a node hash tree according to hash values of data stored in the data nodes included in the database cluster includes:

acquiring data stored by data nodes included in the database cluster;

dividing the data according to a first length to obtain at least two data segments;

and constructing the node hash tree by taking the hash values of the data of the at least two data segments as leaf nodes.

Optionally, the constructing the node hash tree by using the hash values of the data of the at least two data segments as leaf nodes includes:

calculating a hash value of data of each data segment of the at least two data segments by a first hash algorithm;

taking at least two calculated hash values as leaf nodes of an ith layer, and performing first combination operation on the hash values of the leaf nodes of the ith layer to obtain leaf nodes of an (i + 1) th layer, wherein i is greater than or equal to 1;

determining whether the number of the leaf nodes of the (i + 1) th layer is equal to 1, if so, determining the leaf nodes of the (i + 1) th layer as root nodes; if not, increasing the i by 1, and returning to the operation step of executing the first combination operation on the hash value of the leaf node of the ith layer to obtain the leaf node of the (i + 1) th layer;

and constructing the node hash tree according to the leaf nodes and the root nodes of each layer, and storing the corresponding relation between the root nodes and the data node identification.
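
The layer-by-layer loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: MD5 is assumed as the first hash algorithm, bitwise XOR of paired digests is assumed as the first combination operation, an unpaired trailing node is assumed to be carried up unchanged, and the names `xor_hex`, `build_node_hash_tree`, and `root_to_node` are hypothetical.

```python
import hashlib

def xor_hex(a: str, b: str) -> str:
    """Assumed combination operation: bitwise XOR of two equal-length hex digests."""
    return format(int(a, 16) ^ int(b, 16), f"0{len(a)}x")

def build_node_hash_tree(segments):
    """Return all layers of a node hash tree, from layer 1 (segment hashes) to the root."""
    layer = [hashlib.md5(seg).hexdigest() for seg in segments]  # layer i = 1
    layers = [layer]
    while len(layer) > 1:  # stop once layer i + 1 holds a single node (the root)
        nxt = []
        for j in range(0, len(layer), 2):
            pair = layer[j:j + 2]
            # Assumption: an unpaired trailing node is carried up unchanged.
            nxt.append(xor_hex(*pair) if len(pair) == 2 else pair[0])
        layers.append(nxt)
        layer = nxt
    return layers

# Store the correspondence between the root node and the data node identifier.
root_to_node = {}
layers = build_node_hash_tree([b"seg-a", b"seg-b", b"seg-c", b"seg-d"])
root_to_node[layers[-1][0]] = "data-node-1"
```

With four segments the sketch produces three layers of sizes 4, 2, and 1, and the single node of the last layer is the root whose hash is recorded against the data node identifier.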

Optionally, constructing a database cluster hash tree according to the hash value of the root node of the node hash tree includes:

taking the hash value of the root node of the node hash tree as a leaf node of a kth layer;

performing a second combination operation on the hash values of the leaf nodes of the kth layer to obtain leaf nodes of a (k + 1)th layer, wherein k is greater than or equal to 1;

determining whether the number of the leaf nodes of the (k + 1)th layer is equal to 1, and if so, determining the leaf node of the (k + 1)th layer as the root node; if not, increasing k by 1, and returning to the operation step of performing the second combination operation on the hash values of the leaf nodes of the kth layer to obtain the leaf nodes of the (k + 1)th layer;

and constructing the database cluster hash tree according to the leaf nodes and the root nodes of each layer.
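
The cluster-level construction and the final root comparison can be sketched in the same style. The second combination operation is assumed here to be the same pairwise XOR used in the specification's later example, the node root hashes are stand-in values, and the names `build_cluster_hash_tree`, `old_roots`, and `new_roots` are hypothetical.

```python
import hashlib

def xor_hex(a: str, b: str) -> str:
    """Assumed second combination operation: XOR of two equal-length hex digests."""
    return format(int(a, 16) ^ int(b, 16), f"0{len(a)}x")

def build_cluster_hash_tree(node_roots):
    """Layer k = 1 holds one leaf per data node; combine upward until one root remains."""
    layers = [list(node_roots)]
    while len(layers[-1]) > 1:
        cur = layers[-1]
        layers.append([
            xor_hex(cur[j], cur[j + 1]) if j + 1 < len(cur) else cur[j]
            for j in range(0, len(cur), 2)
        ])
    return layers

# Stand-in root hashes of four node hash trees in each cluster.
old_roots = [hashlib.md5(s).hexdigest() for s in (b"n1", b"n2", b"n3", b"n4")]
new_roots = list(old_roots)

old_tree = build_cluster_hash_tree(old_roots)
new_tree = build_cluster_hash_tree(new_roots)
consistent = old_tree[-1][0] == new_tree[-1][0]  # compare only the two root hashes
```

Only the two root hashes are compared, so the cost of the final check does not depend on how many data nodes the clusters contain.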

Optionally, the method further includes:

in a case where the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are not consistent, proceeding layer by layer from the root nodes toward the leaf nodes, comparing the hash values of the leaf nodes included in the first database cluster hash tree and the second database cluster hash tree, and determining at least two first target leaf nodes;

and determining at least two corresponding target data nodes according to the at least two first target leaf nodes.

Optionally, after determining the corresponding target data node according to the first target leaf node, the method further includes:

acquiring a first target node hash tree corresponding to a target data node in the first database cluster, and acquiring a second target node hash tree corresponding to a target data node in the second database cluster;

proceeding layer by layer from the root nodes of the first target node hash tree and the second target node hash tree toward the leaf nodes, comparing the hash values of the leaf nodes included in the first target node hash tree and the second target node hash tree, and determining a second target leaf node;

and determining a corresponding data segment according to the second target leaf node.

Optionally, the first database cluster hash tree and the second database cluster hash tree both include p layers of leaf nodes, where p is greater than or equal to 1;

comparing hash values of leaf nodes included in the first database cluster hash tree and the second database cluster hash tree, and determining a first target leaf node, including:

comparing hash values of leaf nodes of a p-th layer in the first database cluster hash tree and the second database cluster hash tree;

determining leaf nodes with different hash values in leaf nodes of the p-th layer in the first database cluster hash tree and the second database cluster hash tree as third target leaf nodes;

decrementing p by 1 and judging whether p is equal to 0; if so, determining the third target leaf node as the first target leaf node; if not, taking the leaf nodes under the third target leaf node as the leaf nodes of the p-th layer, and returning to the operation step of comparing the hash values of the leaf nodes of the p-th layer in the first database cluster hash tree and the second database cluster hash tree.
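
The top-down comparison can be sketched as a descent that only follows nodes whose hashes differ. In this sketch each tree is a list of layers with index 0 as the bottom layer and the last index as the root, nodes are combined in adjacent pairs, and the combination is an MD5 over the concatenated pair; all of these, along with the names `build` and `find_mismatched_leaves`, are assumptions made for illustration.

```python
import hashlib

def h(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()

def build(leaves):
    """Hypothetical builder: hash the concatenation of each adjacent pair."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        cur = layers[-1]
        layers.append([h("".join(cur[j:j + 2])) for j in range(0, len(cur), 2)])
    return layers

def find_mismatched_leaves(tree_a, tree_b):
    """Return bottom-layer indexes whose hashes differ, visiting only differing branches."""
    suspects = [0]  # start at the root
    for depth in range(len(tree_a) - 1, -1, -1):
        differing = [j for j in suspects if tree_a[depth][j] != tree_b[depth][j]]
        if depth == 0:
            return differing
        below = len(tree_a[depth - 1])
        # Children of node j in the layer below are 2j and 2j + 1.
        suspects = [c for j in differing for c in (2 * j, 2 * j + 1) if c < below]
    return []

leaves_a = [h(x) for x in "abcdefgh"]
leaves_b = list(leaves_a)
leaves_b[5] = h("changed")
tree_a, tree_b = build(leaves_a), build(leaves_b)
mismatched = find_mismatched_leaves(tree_a, tree_b)
```

Because subtrees with equal hashes are never entered, the number of comparisons grows with the tree height and the number of differing branches rather than with the total number of leaves.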

Optionally, the obtaining data stored by the data node included in the database cluster includes:

determining a migration mode for data migration from the first database cluster to the second database cluster;

determining a data migration completion time point under the condition that the migration mode is non-stop migration, and acquiring data stored by data nodes included in the database cluster before the data migration completion time point;

and under the condition that the migration mode is stop-write migration, acquiring the full data stored by the data nodes included in the database cluster.
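
The choice of input data by migration mode can be sketched as below. The row representation as `(written_at, payload)` tuples, the mode strings, and the function name `select_rows` are hypothetical, and "before the completion time point" is assumed to mean strictly earlier.

```python
from datetime import datetime

def select_rows(rows, migration_mode, completed_at=None):
    """Pick the rows that take part in hash tree construction."""
    if migration_mode == "stop-write":
        # Writes were paused during migration, so the full data is used.
        return list(rows)
    if migration_mode == "non-stop":
        # Writes continued, so only data written before the completion point is used.
        return [row for row in rows if row[0] < completed_at]
    raise ValueError(f"unknown migration mode: {migration_mode!r}")

completed_at = datetime(2020, 11, 3, 12, 0)
rows = [
    (datetime(2020, 11, 3, 11, 0), "written before completion"),
    (datetime(2020, 11, 3, 13, 0), "written after completion"),
]
stop_write_rows = select_rows(rows, "stop-write")
non_stop_rows = select_rows(rows, "non-stop", completed_at)
```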

Optionally, performing a first combination operation on the hash value of the leaf node on the ith layer to obtain a leaf node on the (i + 1) th layer, including:

dividing the leaf nodes of the ith layer according to a first grouping rule to obtain a plurality of first leaf node combinations;

and, for each first leaf node combination, performing a target operation on the leaf nodes included in the first leaf node combination to obtain the leaf nodes of the (i + 1)th layer.

According to a second aspect of embodiments of the present specification, there is provided a data consistency checking apparatus for a distributed database cluster, the apparatus including:

a first building module configured to build a first node hash tree of a first database cluster and a second node hash tree of a second database cluster according to a first hash tree building rule in case that completion of data migration from the first database cluster to the second database cluster is detected, wherein the first hash tree building rule includes: constructing a node hash tree according to hash values of data stored in data nodes included in the database cluster;

a second construction module configured to construct a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule includes: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree;

a first determination module configured to compare hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determine that data of the first database cluster and the second database cluster are consistent if the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

According to a third aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of the data consistency checking method for the distributed database cluster.

According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data consistency checking method of the distributed database cluster.

The present specification provides a method for checking data consistency of a distributed database cluster, which may first construct a first node hash tree of a first database cluster and a second node hash tree of a second database cluster according to a first hash tree construction rule when detecting that data migration from the first database cluster to the second database cluster is completed, where the first hash tree construction rule includes: constructing a node hash tree according to hash values of data stored in data nodes included in the database cluster; and then constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree; and then comparing the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent. 
In this way, the node hash tree is constructed from the actual full data of each data node, and the hash trees of the two database clusters are then constructed from the node hash trees of the data nodes that the two database clusters include. This guarantees that the full data of each database cluster takes part in the construction of its database cluster hash tree, so the root node of the database cluster hash tree can represent the full data of the database cluster. Whether the full data of the two database clusters is consistent can therefore be determined quickly and accurately simply by comparing the root nodes of the two database cluster hash trees. Sampling of the full data is avoided, the possibility that unsampled data is inconsistent is eliminated, and the accuracy of checking the data consistency of the two database clusters is greatly improved.

Drawings

FIG. 1 is a flowchart of a data consistency checking method for a distributed database cluster according to an embodiment of the present specification;

FIG. 2 is a schematic diagram of a process for constructing a first node hash tree according to an embodiment of the present specification;

FIG. 3 is a schematic diagram of a process for constructing a second node hash tree according to an embodiment of the present specification;

FIG. 4 is a schematic diagram of a process for constructing a first database cluster hash tree according to an embodiment of the present specification;

FIG. 5 is a schematic diagram of a process for constructing a second database cluster hash tree according to an embodiment of the present specification;

FIG. 6 is a schematic diagram illustrating a comparison process between a first database cluster hash tree and a second database cluster hash tree according to an embodiment of the present specification;

FIG. 7 is a schematic diagram illustrating a comparison process between a first node hash tree and a second node hash tree according to an embodiment of the present specification;

FIG. 8 is a schematic structural diagram of a data consistency checking apparatus for a distributed database cluster according to an embodiment of the present specification;

FIG. 9 is a block diagram of a computing device according to an embodiment of the present specification.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

First, the terms involved in one or more embodiments of the present specification are explained.

Hash value: the output of a hash function (also called a hashing algorithm), which is a method of creating a small digital "fingerprint" from any kind of data. A hash function compresses a message or data into a digest, which reduces the amount of data and fixes its format. The function scrambles the data and produces a fingerprint called a hash value (also hash sum or hash), which is typically represented as a short string of seemingly random letters and digits.
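
As a brief illustration of a hash value, the sketch below uses Python's standard `hashlib` module with MD5, the algorithm the specification later names as an example. The input strings are arbitrary.

```python
import hashlib

fingerprint = hashlib.md5(b"some record stored on a data node").hexdigest()
same_again = hashlib.md5(b"some record stored on a data node").hexdigest()
slightly_different = hashlib.md5(b"some record stored on a data node!").hexdigest()

# The digest has a fixed length regardless of input size: 32 hex characters for MD5.
print(len(fingerprint), fingerprint == same_again, fingerprint == slightly_different)
```

Identical inputs always give the same fingerprint, while even a one-character change gives a different one, which is what makes hash values usable for consistency checks.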

Hash table: a data structure that is accessed directly according to a key. It accesses a record by mapping the key to a location in the table, which speeds up lookup. The mapping function is called a hash function, and the array that stores the records is called a hash table. Given a table M and a function f(key), if substituting any given key into the function yields the address in the table of the record containing that key, then the table M is called a hash table and the function f(key) is called a hash function.
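
A toy hash table makes the roles of the table M and the function f(key) concrete. The class name `ToyHashTable`, the byte-sum hash function, and the use of per-slot lists to handle collisions are illustrative assumptions; practical hash tables use stronger hash functions.

```python
class ToyHashTable:
    def __init__(self, size=8):
        self.slots = [[] for _ in range(size)]  # the table M

    def _f(self, key: str) -> int:
        """The hash function f(key): maps a key to a slot address in the table."""
        return sum(key.encode()) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._f(key)]
        for idx, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[idx] = (key, value)  # overwrite an existing record
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._f(key)]:
            if existing == key:
                return value
        raise KeyError(key)

table = ToyHashTable()
table.put("node-1", "root-hash-1")
table.put("node-2", "root-hash-2")
```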

Merkle tree: a binary hash tree, that is, a tree used for storing hash values. At the bottom level, as with a hash list, the data is divided into small data segments and a corresponding hash value is calculated for each segment. Moving up, two adjacent hash values are concatenated into a string and the hash value of that string is calculated, so that every two hash values are paired to produce one "sub-hash value". If the total number of hash values in a layer is odd, a single unpaired hash value is left at the end, and this single hash value is hashed directly to obtain its sub-hash value. Proceeding upward in the same way yields a smaller number of hash values at each new level, so an inverted tree is eventually formed, and the single hash value left at the top of the tree is called the root node of the hash tree.
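
The definition can be sketched as a short function that concatenates adjacent hashes, hashes the result, and hashes a trailing unpaired value directly. MD5 and the function name `merkle_root` are assumptions made for illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()

def merkle_root(segments):
    """Compute the root hash of a Merkle tree over a list of byte segments."""
    if not segments:
        raise ValueError("at least one data segment is required")
    level = [hashlib.md5(seg).hexdigest() for seg in segments]  # bottom level
    while len(level) > 1:
        # Adjacent pairs are concatenated and hashed; with an odd count the last
        # slice holds a single hash, which is therefore hashed directly.
        level = [md5_hex("".join(level[j:j + 2])) for j in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"segment-1", b"segment-2", b"segment-3"])
```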

Database cluster: a set of nodes composed of some of the data nodes in a distributed database. A data node is a server or a process that stores part of the data in the database cluster.

Data migration: the transfer of data among a plurality of nodes or clusters, for example, migrating all the data included in each data node of an old database cluster to the data nodes included in a new database cluster. The implementation of data migration can be divided into three phases: preparation before data migration, implementation of data migration, and verification after data migration. Verification after data migration refers to checking the data consistency of the new and old database clusters: after data migration between the old database cluster and the new database cluster is completed, the data of the two clusters needs to be compared to determine whether it was migrated correctly and whether it is the same. The result of this check is an important basis for deciding whether the new database cluster can be formally put into use.

In the present specification, a data consistency checking method for a distributed database cluster is provided, and the present specification relates to a data consistency checking apparatus for a distributed database cluster, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.

Fig. 1 is a flowchart illustrating a data consistency checking method for a distributed database cluster according to an embodiment of the present specification, which specifically includes the following steps:

Step 102: in a case where completion of data migration from the first database cluster to the second database cluster is detected, a first node hash tree of the first database cluster and a second node hash tree of the second database cluster are constructed according to a first hash tree construction rule, wherein the first hash tree construction rule comprises: constructing a node hash tree according to the hash values of the data stored in the data nodes included in the database cluster.

In practical applications, after data is migrated from a first database cluster to a second database cluster, it is necessary to determine whether the data of the second database cluster is consistent with the data of the first database cluster. Because the data volume is huge, comparing the data item by item is time-consuming, so comparing the full data to determine whether the data of the two database clusters is consistent is impractical. If instead only part of the full data is extracted for comparison, there is a certain probability that the data which was not extracted is inconsistent, so the accuracy of checking the data consistency of the two database clusters is low.

In order to quickly and accurately check whether data of two database clusters are consistent, the present specification provides a data consistency check method for a distributed database cluster, where when it is detected that data migration from a first database cluster to a second database cluster is completed, a first node hash tree of the first database cluster and a second node hash tree of the second database cluster may be first constructed according to a first hash tree construction rule, where the first hash tree construction rule includes: constructing a node hash tree according to hash values of data stored in data nodes included in the database cluster; and then constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree; and then comparing the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent. The root node of the database cluster hash tree can represent the full data included by the database cluster, and the full data included by the two database cluster hash trees can be quickly and accurately determined to be consistent only by comparing whether the root nodes of the two database cluster hash trees are consistent or not, so that the full data is prevented from being sampled, the possibility of inconsistency of the data which are not sampled is avoided, and the accuracy rate of checking the data consistency of the two database clusters is greatly improved.

It should be noted that the database cluster includes at least two databases, each database includes at least one data node, and each data node is used for storing data. After the data migration is completed, whether the data in the two database clusters is consistent needs to be checked, and if the data is consistent, the new database cluster can subsequently be formally put into use.

Specifically, the first database cluster refers to an old database cluster storing data to be migrated, and the second database cluster refers to a new database cluster receiving the data to be migrated, that is, the data stored in the first database cluster is migrated to the second database cluster, and when all the data in the first database cluster is migrated to the second database cluster, it is determined that the data migration from the first database cluster to the second database cluster is completed.

In an optional implementation manner of this embodiment, in order to ensure that the tree structures of the first node hash tree constructed for the data nodes included in the first database cluster and the second node hash tree constructed for the data nodes included in the second database cluster are the same, the same hash tree construction rule (i.e., the first hash tree construction rule) should be adopted to construct the node hash tree. In specific implementation, the first node hash tree may be constructed according to hash values of data stored by data nodes included in the first database cluster, and the second node hash tree may be constructed according to hash values of data stored by data nodes included in the second database cluster.

The first node hash tree is constructed according to the hash value of the data stored in the data node included in the first database cluster, and the specific implementation process may be as follows:

acquiring first data stored by a data node included in a first database cluster;

dividing the first data according to a first length to obtain at least two first data segments;

and constructing a first node hash tree by taking the hash values of the data of the at least two first data segments as leaf nodes.

Specifically, the data node may be any data node in the first database cluster. Since the first database cluster includes at least two data nodes, a corresponding first node hash tree should be constructed for each of the at least two data nodes according to the above operation steps. That is, as many first node hash trees are constructed as the first database cluster has data nodes.

In addition, the data node may store data in the form of a table, and for any data node, the data in the table in the data node may be acquired, and then the data in the table may be divided according to a first length, which may be set in advance, to obtain at least two first data segments, where the first length refers to a length of the divided data, and for example, the first length may be 128.
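
Dividing data by the first length can be sketched as simple slicing. The specification does not state the unit of the first length, so this sketch assumes the table data has been serialized to bytes and that the length of 128 is counted in bytes; the function name `split_by_length` is hypothetical.

```python
def split_by_length(data: bytes, first_length: int = 128):
    """Divide serialized data into consecutive segments of at most `first_length` bytes."""
    if first_length <= 0:
        raise ValueError("first_length must be positive")
    return [data[i:i + first_length] for i in range(0, len(data), first_length)]

table_bytes = bytes(range(256)) + b"tail of the table data" * 2  # 300 bytes of sample data
segments = split_by_length(table_bytes)
print([len(segment) for segment in segments])
```

Applying the same function with the same first length to both clusters keeps the segment boundaries aligned, so corresponding leaf nodes of the two node hash trees cover the same portion of the data.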

The second node hash tree is constructed according to the hash value of the data stored in the data node included in the second database cluster, and the specific implementation process may be as follows:

acquiring second data stored by data nodes included in a second database cluster;

dividing the second data according to the first length to obtain at least two second data segments;

and constructing a second node hash tree by taking the hash values of the data of the at least two second data segments as leaf nodes.

Specifically, the data node may be any data node in the second database cluster. Since the second database cluster includes at least two data nodes, a corresponding second node hash tree should be constructed for each of the at least two data nodes according to the above operation steps. That is, as many second node hash trees are constructed as the second database cluster has data nodes.

It should be noted that, data stored by data nodes included in the first database cluster and the second database cluster are divided by the first length, so that the data stored by the data nodes included in the first database cluster and the second database cluster are divided in the same manner.

In an optional implementation manner of this embodiment, the hash values of the data of the at least two first data segments are used as leaf nodes to construct the first node hash tree, and a specific implementation process may be as follows:

for each first data segment of the at least two first data segments, calculating a hash value of the data of the first data segment by a first hash algorithm;

taking at least two hash values obtained by calculation as leaf nodes of the ith layer, and carrying out first combination operation on the hash values of the leaf nodes of the ith layer to obtain leaf nodes of the (i + 1) th layer, wherein i is greater than or equal to 1;

determining whether the number of leaf nodes of the (i + 1) th layer is equal to 1, if so, determining the leaf nodes of the (i + 1) th layer as root nodes; if not, increasing the value i by 1, and returning to the operation step of executing the first combination operation on the hash value of the leaf node of the ith layer to obtain the leaf node of the (i + 1) th layer;

and constructing a first node hash tree according to the obtained leaf nodes and root nodes of each layer, and storing the corresponding relation between the root nodes and the data node identification.

Specifically, the first hash algorithm is a preset hash algorithm for calculating the hash value of the data of a first data segment; for example, the first hash algorithm may be the MD5 (Message-Digest Algorithm 5) algorithm. The first combination operation refers to a rule for combining the hash values of the leaf nodes included in the ith layer and performing an operation on the combined values, and the rule can be set in advance.

It should be noted that, in order to facilitate subsequently locating data nodes with inconsistent data, the data nodes need to be associated with the first node hash trees, so in this specification the correspondence between the root node of each first node hash tree and the identifier of the corresponding data node is stored.

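The hash value of the leaf node on the ith layer is subjected to the first combination operation to obtain the leaf node on the (i + 1) th layer, and the specific implementation process can be as follows:

dividing the leaf nodes of the ith layer according to a first grouping rule to obtain a plurality of first leaf node combinations;

and aiming at each first leaf node combination, performing target operation on leaf nodes included in the first leaf node combination to obtain leaf nodes of the (i + 1) th layer.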

Specifically, the first grouping rule is a rule for combining hash values of leaf nodes included in the ith layer, for example, two hash values of leaf nodes included in the ith layer are combined. The target operation refers to a rule for performing an operation on the hash values of the combined leaf nodes, such as performing an exclusive or operation, or an addition operation, or a subtraction operation on every two leaf nodes.
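The construction steps above can be sketched as follows, assuming MD5 as the first hash algorithm, pairwise grouping in order as the first grouping rule, and exclusive or as the target operation (the options the specification names as examples). The helper names and the choice to carry an unpaired last node up unchanged are illustrative assumptions, not part of the specification.

```python
import hashlib


def md5_int(segment: bytes) -> int:
    """Hash one data segment with MD5 and return the digest as an integer."""
    return int.from_bytes(hashlib.md5(segment).digest(), "big")


def combine_layer(layer: list[int]) -> list[int]:
    """Group nodes pairwise in order and XOR each pair.

    Assumption: an unpaired last node is carried up to the next layer unchanged.
    """
    next_layer = []
    for pos in range(0, len(layer), 2):
        pair = layer[pos:pos + 2]
        next_layer.append(pair[0] ^ pair[1] if len(pair) == 2 else pair[0])
    return next_layer


def build_hash_tree(leaves: list[int]) -> list[list[int]]:
    """Return every layer of the tree: layer 1 (leaves) first, root layer last."""
    if not leaves:
        raise ValueError("at least one leaf hash is required")
    layers = [leaves]
    while len(layers[-1]) > 1:  # repeat until a layer holds exactly one node
        layers.append(combine_layer(layers[-1]))
    return layers


# Eight placeholder segments, as in a node whose data is cut into 8 pieces.
segments = [bytes([n]) * 4 for n in range(8)]
tree = build_hash_tree([md5_int(s) for s in segments])
root = tree[-1][0]
```

With 8 leaves the layers hold 8, 4, 2, and 1 nodes, and the single node of the last layer is the root.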

For example, fig. 2 is a schematic diagram of a construction process of a first node hash tree, where the first database cluster includes a data node 1, a data node 2, a data node 3, and a data node 4. As shown in fig. 2, the first data stored in the data node 1 is acquired and divided into 8 data segments, and 8 hash values A1-A8 are obtained by calculation on the 8 data segments. A1-A8 are used as the leaf nodes of layer 1; the leaf nodes of layer 1 are combined in pairs in sequence (A1 with A2, A3 with A4, A5 with A6, and A7 with A8), and an exclusive-or operation is performed on the hash values of the two leaf nodes in each combination to obtain the leaf nodes of layer 2. Because the number of leaf nodes of layer 2 is 4 rather than 1, the leaf nodes of layer 2 are combined in pairs in sequence, and the exclusive-or operation is performed on the hash values of the two leaf nodes in each combination to obtain the leaf nodes of layer 3. Because the number of leaf nodes of layer 3 is 2 rather than 1, the leaf nodes of layer 3 are combined in pairs, and the exclusive-or operation is performed on the hash values of the two leaf nodes in the combination to obtain the leaf node X1 of layer 4. Because the number of leaf nodes of layer 4 is 1, the leaf node X1 of layer 4 is determined as the root node, and the leaf nodes of layers 1 to 3 together with the root node X1 of layer 4 form the first node hash tree corresponding to the data node 1. For the data node 2, the data node 3, and the data node 4 included in the first database cluster, the same method is used to construct the first node hash trees whose root nodes are X2, X3, and X4, respectively, as shown in fig. 2, and a correspondence table between the root nodes and the identifiers of the data nodes is stored as shown in table 1 below.

TABLE 1 Correspondence table between root nodes and data node identifiers

Root node	Identification of data node
X1	Data node 1
X2	Data node 2
X3	Data node 3
X4	Data node 4

Correspondingly, the hash values of the data of at least two second data segments are used as leaf nodes to construct a second node hash tree, and the specific implementation process can be as follows:

calculating a hash value of data of each second data segment by a first hash algorithm aiming at each second data segment of at least two second data segments;

taking at least two hash values obtained by calculation as leaf nodes of a jth layer, and performing first combination operation on the hash values of the leaf nodes of the jth layer to obtain leaf nodes of a (j + 1) th layer, wherein j is greater than or equal to 1;

determining whether the number of the leaf nodes of the j +1 th layer is equal to 1, if so, determining the leaf nodes of the j +1 th layer as root nodes; if not, increasing j by 1, returning to execute the operation step of performing the first combination operation on the hash value of the leaf node of the jth layer to obtain the leaf node of the jth +1 layer;

and constructing a second node hash tree according to the obtained leaf nodes and root nodes of each layer, and storing the corresponding relation between the root nodes and the data node identification.

The hash value of the leaf node on the jth layer is subjected to a first combination operation to obtain the leaf node on the jth +1 layer, and the specific implementation process can be as follows:

dividing the leaf nodes of the jth layer according to a first grouping rule to obtain a plurality of second leaf node combinations;

and aiming at each second leaf node combination, performing target operation on leaf nodes included in the second leaf node combination to obtain leaf nodes of the j +1 th layer.

It should be noted that the second node hash tree is constructed by using the same hash tree construction rule as that for constructing the first node hash tree, and the specific construction process is similar, so that the first node hash tree and the second node hash tree which are constructed are ensured to have the same structure.

For example, fig. 3 is a schematic diagram of a construction process of a second node hash tree, where a second database cluster includes a data node 5, a data node 6, a data node 7, and a data node 8, and as shown in fig. 3, second data stored in the data node 5 is obtained, the second data is divided into 8 data segments, 8 hash values such as B1-B8 are obtained by calculating the 8 data segments, B1-B8 are used as leaf nodes of a layer 1, a second node hash tree (a second node hash tree corresponding to the data node 5) whose root node is Y1 is constructed by the same method as in the above example, and second node hash trees (a second node hash tree corresponding to the data node 6, a second node hash tree corresponding to the data node 7, and a second node hash tree corresponding to the data node 8) whose root nodes are Y2, Y3, and Y4 are constructed by the same method as shown in fig. 2, then, the correspondence table between the identifiers of the root node and the data node shown in table 1 is updated, so as to obtain the updated correspondence table between the identifiers of the root node and the data node shown in table 2 below.

TABLE 2 Updated correspondence table between root nodes and data node identifiers

Root node	Identification of data node
X1	Data node 1
X2	Data node 2
X3	Data node 3
X4	Data node 4
Y1	Data node 5
Y2	Data node 6
Y3	Data node 7
Y4	Data node 8

It is worth mentioning that, in this specification, a node hash tree may be synchronously constructed for each data node included in the first database cluster and the second database cluster, and each data node is concurrently calculated to construct a hash tree at a subsequent database cluster level, so that the efficiency of constructing the node hash tree is high, and further the efficiency of subsequently constructing the database cluster hash tree is improved, thereby facilitating subsequent rapid and accurate comparison of the database cluster hash trees to check the data consistency of the two database clusters.
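One way to picture the concurrent construction, together with the stored correspondence between root nodes and data node identifiers, is the sketch below. The use of a thread pool, the node names, and the segment contents are illustrative assumptions; the specification only states that the node hash trees may be computed concurrently. MD5 and pairwise exclusive or are the example choices named earlier.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def node_root(segments: list[bytes]) -> int:
    """Compute the root hash of one node hash tree (MD5 leaves, pairwise XOR)."""
    layer = [int.from_bytes(hashlib.md5(s).digest(), "big") for s in segments]
    while len(layer) > 1:
        layer = [layer[p] ^ layer[p + 1] if p + 1 < len(layer) else layer[p]
                 for p in range(0, len(layer), 2)]
    return layer[0]


# Hypothetical cluster contents: node identifier -> data segments of that node.
cluster = {
    "data node 1": [b"a1", b"a2"],
    "data node 2": [b"b1", b"b2"],
    "data node 3": [b"c1", b"c2"],
    "data node 4": [b"d1", b"d2"],
}

# Each node's tree is independent of the others, so the roots can be computed in parallel.
with ThreadPoolExecutor() as pool:
    roots = dict(zip(cluster, pool.map(node_root, cluster.values())))

# Correspondence table in the spirit of tables 1 and 2: root hash -> node identifier.
root_to_node = {root: node_id for node_id, root in roots.items()}
```

In a real deployment each data node would more plausibly compute its own tree locally and report only the root, but that arrangement is outside what this sketch shows.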

Further, because writes may continue during and after migration, the migration manner of the database cluster affects how data is acquired in the data consistency check process. Accordingly, the first data stored by the data nodes included in the first database cluster is acquired, and the specific implementation process may be as follows:

under the condition that the migration mode is non-stop migration, determining a data migration completion time point, and acquiring first data stored by data nodes included in a first database cluster before the data migration completion time point;

and under the condition that the migration mode is stop-write migration, acquiring the total first data stored by the data nodes included in the first database cluster.

Specifically, the migration manner indicates whether the database cluster immediately receives other newly written data after the data migration is completed: non-stop migration is a manner in which newly written data is received immediately after the data migration is completed, and stop-write migration is a manner in which receiving newly written data is stopped after the data migration is completed. That is to say, if the migration is non-stop migration, the data that needs the data consistency check is the data written before the data migration completion time point, and data newly written afterwards does not need to be checked. Therefore, in the case that the migration manner is non-stop migration, the data migration completion time point is determined first, and then the first data stored before that time point by the data nodes included in the first database cluster is acquired; that is, the first data is screened by the data migration completion time point. If the migration is stop-write migration, the data that needs the data consistency check is the full amount of first data stored by the data nodes included in the first database cluster, so in this case the full amount of first data may be acquired directly without screening by the time point.

Correspondingly, second data stored by the data node included in the second database cluster is obtained, and the specific implementation process may be as follows:

determining a data migration completion time point under the condition that the migration mode is non-stop migration, and acquiring second data stored by data nodes included in a second database cluster before the data migration completion time point;

and under the condition that the migration mode is stop-write migration, acquiring the total second data stored by the data nodes included in the second database cluster.
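The selection of data to check under the two migration manners can be sketched as below. The `Record` type, its `written_at` timestamp field, and the strict "earlier than" comparison are hypothetical; the specification does not say how the write time of stored data is tracked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: str
    value: bytes
    written_at: float  # hypothetical write timestamp, in seconds


def select_data_to_check(records: list[Record], stop_write: bool,
                         migration_done_at: Optional[float] = None) -> list[Record]:
    """Pick the records of one data node that take part in the consistency check."""
    if stop_write:
        # Stop-write migration: nothing new arrives, so the full data is checked.
        return list(records)
    if migration_done_at is None:
        raise ValueError("non-stop migration needs the migration completion time point")
    # Non-stop migration: only data written before the completion time point is checked.
    return [r for r in records if r.written_at < migration_done_at]


records = [Record("k1", b"v1", 10.0), Record("k2", b"v2", 20.0), Record("k3", b"v3", 30.0)]
checked_non_stop = select_data_to_check(records, stop_write=False, migration_done_at=25.0)
checked_stop_write = select_data_to_check(records, stop_write=True)
```

The same selection would be applied to the data nodes of both clusters so that the two sides hash the same snapshot of the data.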

Step 104: constructing a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, wherein the second hash tree construction rule comprises: and constructing the database cluster hash tree according to the hash value of the root node of the node hash tree.

Specifically, on the basis of constructing a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to the first hash tree construction rule, further, the first database cluster hash tree and the second database cluster hash tree are constructed according to the second hash tree construction rule.

In an optional implementation manner of this embodiment, in order to ensure that the tree structures of the first database cluster hash tree constructed for the first database cluster and the second database cluster hash tree constructed for the second database cluster are the same, the same hash tree construction rule (i.e., the second hash tree construction rule) should be adopted to construct both database cluster hash trees. Through the above step 102, a plurality of first node hash trees can be constructed for the data nodes included in the first database cluster, and a plurality of second node hash trees can be constructed for the data nodes included in the second database cluster. Therefore, when the database cluster hash trees are specifically constructed, the first database cluster hash tree can be constructed according to the hash value of the root node of each first node hash tree, and the second database cluster hash tree can be constructed according to the hash value of the root node of each second node hash tree.

The first database cluster hash tree is constructed according to the hash value of the root node of each first node hash tree, and the specific implementation process can be as follows:

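taking the hash value of the root node of the first node hash tree as a leaf node of a kth layer;

performing second combination operation on the hash value of the leaf node of the kth layer to obtain a leaf node of a (k + 1) th layer, wherein k is greater than or equal to 1;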

determining whether the number of leaf nodes of the (k + 1) th layer is equal to 1, if so, determining the leaf nodes of the (k + 1) th layer as root nodes; if not, increasing k by 1, returning to execute the operation step of performing second combination operation on the hash value of the leaf node of the kth layer to obtain the leaf node of the kth +1 layer;

and constructing and obtaining a first database cluster hash tree according to the obtained leaf nodes and root nodes of each layer.

Specifically, the second combination operation refers to a rule for combining hash values of leaf nodes included in the kth layer and performing operation after the combination, and the rule may be set in advance. It should be noted that the first combining operation is used to construct the node hash tree, and the second combining operation is used to construct the database cluster hash tree, so the second combining operation and the first combining operation may be the same or different. In a specific implementation process, the second combination operation is performed on the hash value of the leaf node on the kth layer, and a specific process of obtaining the leaf node on the kth +1 layer is similar to the above-mentioned specific process of obtaining the leaf node on the i +1 layer by performing the first combination operation on the hash value of the leaf node on the ith layer, and this description is not repeated here.

Following the above example, fig. 4 is a schematic diagram of a construction process of the first database cluster hash tree. As shown in fig. 4, the root node X1 of the first node hash tree corresponding to the data node 1, the root node X2 of the first node hash tree corresponding to the data node 2, the root node X3 of the first node hash tree corresponding to the data node 3, and the root node X4 of the first node hash tree corresponding to the data node 4 are taken as the leaf nodes of layer 1; the leaf nodes of layer 1 are combined in pairs in sequence (X1 with X2, and X3 with X4), and an exclusive-or operation is performed on the hash values of the two leaf nodes in each combination to obtain the leaf nodes of layer 2. Because the number of leaf nodes of layer 2 is 2 rather than 1, the leaf nodes of layer 2 are combined in pairs, and the exclusive-or operation is performed on the hash values of the two leaf nodes in the combination to obtain the leaf node G1 of layer 3. Because the number of leaf nodes of layer 3 is 1, the leaf node G1 of layer 3 is determined as the root node, and the leaf nodes of layers 1 to 2 together with the root node G1 of layer 3 form the first database cluster hash tree.
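The cluster-level construction for four node roots can be sketched as follows. The small integers standing in for X1 to X4 are placeholders chosen so the exclusive-or results are easy to read; real values would be the node hash tree roots. Pairwise exclusive or is assumed as the second combination operation, as in the example.

```python
def build_layers(leaves: list[int]) -> list[list[int]]:
    """Build all layers bottom-up by XOR-ing neighbouring pairs (odd node carried up)."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([prev[p] ^ prev[p + 1] if p + 1 < len(prev) else prev[p]
                       for p in range(0, len(prev), 2)])
    return layers


# Placeholder root hashes of the four node hash trees (stand-ins for X1..X4).
x1, x2, x3, x4 = 0b0001, 0b0010, 0b0100, 0b1000

cluster_tree = build_layers([x1, x2, x3, x4])
g1 = cluster_tree[-1][0]  # root of the first database cluster hash tree
```

Here layer 1 is `[X1, X2, X3, X4]`, layer 2 is `[X1 ^ X2, X3 ^ X4]`, and layer 3 holds the single root `G1`.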

Correspondingly, the second database cluster hash tree is constructed according to the hash value of the root node of each second node hash tree, and the specific implementation process can be as follows:

taking the hash value of the root node of the second node hash tree as a leaf node of the mth layer;

performing second combination operation on the hash value of the leaf node of the mth layer to obtain a leaf node of an m +1 th layer, wherein m is greater than or equal to 1;

determining whether the number of the leaf nodes of the (m + 1) th layer is equal to 1, if so, determining the leaf nodes of the (m + 1) th layer as root nodes; if not, increasing m by 1, returning to execute the operation step of performing the second combination operation on the hash value of the leaf node of the mth layer to obtain the leaf node of the (m + 1) th layer;

and constructing a second database cluster hash tree according to the obtained leaf nodes and root nodes of each layer.

It should be noted that the same hash tree construction rule (i.e., the second hash tree construction rule) is used in the process of constructing the first database cluster hash tree and the process of constructing the second database cluster hash tree, so that the tree structures of the first database cluster hash tree and the second database cluster hash tree which are constructed are the same.

Following the above example, fig. 5 is a schematic diagram of a construction process of the second database cluster hash tree. As shown in fig. 5, the root node Y1 of the second node hash tree corresponding to the data node 5, the root node Y2 of the second node hash tree corresponding to the data node 6, the root node Y3 of the second node hash tree corresponding to the data node 7, and the root node Y4 of the second node hash tree corresponding to the data node 8 are used as the leaf nodes of layer 1, and the same method as in the above example is applied to the leaf nodes of layer 1 to construct the second database cluster hash tree whose root node is G2.

Step 106: and comparing the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree, and determining that the data of the first database cluster and the second database cluster are consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

Specifically, on the basis of constructing the first database cluster hash tree and the second database cluster hash tree according to the second hash tree construction rule, the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are compared, and the data of the first database cluster and the data of the second database cluster are determined to be consistent under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

It is worth noting that, in this specification, a node hash tree is first constructed based on the real full data of each data node, and then the hash trees of the two database clusters are constructed according to the node hash trees of the data nodes included in the respective database clusters. This ensures that the full data included in a database cluster participates in the construction of its database cluster hash tree, that is, the root node of the database cluster hash tree can represent the full data included in the database cluster. Whether the full data of the two database clusters is consistent can then be determined quickly and accurately merely by comparing whether the root nodes of the two database cluster hash trees are consistent. Because the full data does not need to be sampled, the possibility that inconsistencies in unsampled data go undetected is avoided, and the accuracy of checking the data consistency of the two database clusters is greatly improved.

Further, if the hash values of the root nodes of the hash tree of the first database cluster and the hash tree of the second database cluster are inconsistent, it is indicated that inconsistent data exists in the first database cluster and the second database cluster, and the inconsistent data nodes may be further located in this specification, so as to locate specific data segments in the data nodes with inconsistent data, so as to determine which data segment or data segments have a problem, and the specific implementation process may be as follows:

under the condition that the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are not consistent, comparing the hash values of the leaf nodes included in the first database cluster hash tree and the second database cluster hash tree from the root nodes to the leaf nodes in sequence, and determining at least two first target leaf nodes;

determining at least two corresponding target data nodes according to the at least two first target leaf nodes;

acquiring a first target node hash tree corresponding to a target data node in a first database cluster, and acquiring a second target node hash tree corresponding to a target data node in a second database cluster;

sequentially from root nodes of the first target node hash tree and the second target node hash tree to leaf node directions, comparing hash values of the leaf nodes included in the first target node hash tree and the second target node hash tree, and determining a second target leaf node;

and determining the corresponding data segment according to the second target leaf node.

Specifically, the first target leaf node is a bottom layer (layer 1) leaf node included in the first database cluster hash tree and the second database cluster hash tree, and the hash values of the leaf nodes are different, so that the number of the first target leaf node is at least two, half of the first target leaf node is a leaf node in the first database cluster hash tree, and the other half of the first target leaf node is a leaf node in the second database cluster hash tree.

In addition, because the bottom leaf node of the database cluster hash tree is the root node of the node hash tree, after a first target leaf node (namely the root node of the node hash tree) is determined, at least two corresponding target data nodes (half of which is the data node of the first database cluster and the other half of which is the data node of the second database cluster) can be determined according to the corresponding relation between the previously stored root node and the identification of the data node, so as to obtain the first node hash tree and the second node hash tree corresponding to the target data node, then the bottom (layer 1) leaf node (namely the second target leaf node) with inconsistent hash values in the target node hash tree can be positioned by comparing the two target node hash trees, and the bottom leaf node of the node hash tree is obtained by calculation after the data of the data nodes are segmented, each of the bottom leaf nodes corresponds to a data segment. Therefore, after the second target leaf node is determined, the data segment in the corresponding data node can be determined according to the second target leaf node, that is, the data segment specifically positioned to the data node with inconsistent data is realized.

The first database cluster hash tree and the second database cluster hash tree are assumed to comprise p layers of leaf nodes, and p is greater than or equal to 1; comparing the hash values of the leaf nodes included in the first database cluster hash tree and the second database cluster hash tree to determine a first target leaf node, wherein the specific implementation process may be as follows:

comparing the hash values of the leaf nodes of the p-th layer in the first database cluster hash tree and the second database cluster hash tree, and determining the leaf nodes with different hash values as third target leaf nodes;

decrementing p by 1 and judging whether p is equal to 0; if so, determining the third target leaf nodes as the first target leaf nodes; if not, taking the leaf nodes under the third target leaf nodes as the leaf nodes of the p-th layer, and returning to execute the operation step of comparing the hash values of the leaf nodes of the p-th layer in the first database cluster hash tree and the second database cluster hash tree.

It should be noted that the specific implementation process of comparing the hash values of the leaf nodes included in the first target node hash tree and the second target node hash tree to determine the second target leaf node is similar to the above process of comparing the hash values of the leaf nodes included in the first database cluster hash tree and the second database cluster hash tree to determine the at least two first target leaf nodes, and is not repeated here. In addition, the data nodes with inconsistent data are determined by comparing the leaf nodes included in the hash trees of the different database clusters layer by layer, so the complexity of determining the data nodes with inconsistent data is O(log(n)), where n is the number of data nodes in the first database cluster (or the second database cluster). Likewise, the leaf nodes included in the hash trees of the different nodes are compared layer by layer to determine the data segments in the data nodes with inconsistent data, so the complexity of determining those data segments is O(log(f)), where f is the number of data segments obtained by segmenting the data in a data node.
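The top-down comparison and its logarithmic cost can be illustrated with the sketch below. It assumes both trees were built with the same pairwise exclusive-or rule and therefore have the same shape; the function names, the placeholder leaf values, and the comparison counter are additions made for illustration only.

```python
def build_layers(leaves) -> list[list[int]]:
    """Build all layers bottom-up by XOR-ing neighbouring pairs (odd node carried up)."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([prev[p] ^ prev[p + 1] if p + 1 < len(prev) else prev[p]
                       for p in range(0, len(prev), 2)])
    return layers


def find_mismatched_leaves(first, second):
    """Walk from the root towards the leaves, following only nodes whose hashes differ.

    Returns the positions of differing bottom-layer leaves and the number of
    hash comparisons that were needed.
    """
    comparisons = 1
    if first[-1] == second[-1]:
        return [], comparisons
    suspects = [0]  # position of the root in the top layer
    for depth in range(len(first) - 2, -1, -1):
        children = [c for q in suspects for c in (2 * q, 2 * q + 1)
                    if c < len(first[depth])]
        comparisons += len(children)
        suspects = [c for c in children if first[depth][c] != second[depth][c]]
    return suspects, comparisons


# 1024 placeholder leaf hashes; exactly one leaf differs in the second tree.
original = list(range(1, 1025))
changed = list(original)
changed[700] ^= 0xFF
positions, comparisons = find_mismatched_leaves(build_layers(original),
                                                build_layers(changed))
```

With a single differing leaf among 1024, the walk compares the root plus two children on each of the 10 lower layers, 21 comparisons in total, instead of comparing all 1024 leaves.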

Following the above example, fig. 6 is a schematic diagram of a comparison process between the first database cluster hash tree and the second database cluster hash tree. As shown in fig. 6, the first database cluster hash tree and the second database cluster hash tree each include 2 layers of leaf nodes. In the case where the hash values of the root node G1 of the first database cluster hash tree and the root node G2 of the second database cluster hash tree are not consistent, the hash values of the leaf nodes of layer 2 in the two database cluster hash trees are compared first. Assuming that the leaf nodes of layer 2 with different hash values are the leaf nodes on the left side, the left leaf nodes of layer 2 in the two database cluster hash trees are determined as third target leaf nodes; since 2 decremented by 1 is 1, which is not equal to 0, the leaf nodes under the left leaf nodes of layer 2 are used as the leaf nodes of layer 1. Then the hash values of these leaf nodes of layer 1 in the two database cluster hash trees are compared. Assuming that the leaf nodes of layer 1 with different hash values are the left leaf nodes X1 and Y1, the leaf nodes X1 and Y1 are determined as third target leaf nodes; since 1 decremented by 1 is 0, the leaf nodes X1 and Y1 of layer 1 (i.e., the bottom layer) are determined as the first target leaf nodes.

According to the above table 2, the target data nodes corresponding to the leaf nodes X1 and Y1 are determined to be the data node 1 and the data node 5, and then the first node hash tree corresponding to the data node 1 (the node hash tree whose root node is X1) and the second node hash tree corresponding to the data node 5 (the node hash tree whose root node is Y1) are obtained.

Fig. 7 is a schematic diagram of a comparison process of the first node hash tree and the second node hash tree. As shown in fig. 7, the first node hash tree corresponding to the data node 1 and the second node hash tree corresponding to the data node 5 both include 3 layers of leaf nodes, and it has already been determined that the hash values of the root node X1 of the first node hash tree and the root node Y1 of the second node hash tree are not consistent. The hash values of the leaf nodes of layer 3 in the two node hash trees are compared first. Assuming that the leaf nodes of layer 3 with different hash values are the leaf nodes on the right, the right leaf nodes of layer 3 in the two node hash trees are determined as fourth target leaf nodes; since 3 decremented by 1 is 2, which is not equal to 0, the leaf nodes under the right leaf nodes of layer 3 are used as the leaf nodes of layer 2. Then the hash values of these leaf nodes of layer 2 in the two node hash trees are compared. Assuming that the leaf nodes of layer 2 with different hash values are the leaf nodes on the left, the left leaf nodes of layer 2 are determined as fourth target leaf nodes; since 2 decremented by 1 is 1, which is not equal to 0, the leaf nodes under the left leaf nodes of layer 2 are used as the leaf nodes of layer 1. Thereafter, the hash values of these leaf nodes of layer 1 in the two node hash trees are compared. Assuming that the leaf nodes of layer 1 with different hash values are the left leaf nodes A5 and B5, the leaf nodes A5 and B5 are determined as fourth target leaf nodes; since 1 decremented by 1 is 0, the leaf nodes A5 and B5 of layer 1 (i.e., the bottom layer) are determined as the second target leaf nodes.

Since the leaf nodes A5 and B5 correspond to the 5 th data segments in the data node 1 and the data node 5, respectively, the data inconsistency in the 5 th data segments in the data node 1 and the data node 5 can be accurately positioned finally.
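The two-stage location just described (first the data node, then the data segment) can be sketched end to end as follows. The sketch assumes MD5 leaf hashes, pairwise exclusive or at both levels, and identical topology in both clusters; the cluster contents and all function names are hypothetical, and positions are zero-based, so position 4 is the 5th data segment.

```python
import hashlib


def md5_int(data: bytes) -> int:
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def build_layers(leaves) -> list[list[int]]:
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([prev[p] ^ prev[p + 1] if p + 1 < len(prev) else prev[p]
                       for p in range(0, len(prev), 2)])
    return layers


def differing_leaves(first, second) -> list[int]:
    """Positions of bottom-layer leaves that differ, found by descending from the root."""
    if first[-1] == second[-1]:
        return []
    suspects = [0]
    for depth in range(len(first) - 2, -1, -1):
        children = [c for q in suspects for c in (2 * q, 2 * q + 1)
                    if c < len(first[depth])]
        suspects = [c for c in children if first[depth][c] != second[depth][c]]
    return suspects


def locate(first_cluster, second_cluster):
    """Return (node position, segment position) pairs whose data differs.

    Each cluster is a list of nodes; each node is a list of data segments.
    """
    first_trees = [build_layers([md5_int(s) for s in node]) for node in first_cluster]
    second_trees = [build_layers([md5_int(s) for s in node]) for node in second_cluster]
    first_top = build_layers([tree[-1][0] for tree in first_trees])
    second_top = build_layers([tree[-1][0] for tree in second_trees])
    return [(n, s)
            for n in differing_leaves(first_top, second_top)
            for s in differing_leaves(first_trees[n], second_trees[n])]


# Four nodes with eight segments each; corrupt the 5th segment of the first node pair.
first = [[f"n{n}-s{s}".encode() for s in range(8)] for n in range(4)]
second = [list(node) for node in first]
second[0][4] = b"corrupted"
mismatches = locate(first, second)
```

Here `mismatches` is `[(0, 4)]`, mirroring the walkthrough in which the inconsistency is traced to the 5th data segment of the data node 1 and the data node 5.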

It is worth noting that, in this specification, the data nodes with inconsistent data can be determined by comparing the database cluster hash trees layer by layer, and the data segments with inconsistent data in those data nodes can then be determined by further comparing the node hash trees layer by layer. The data nodes with inconsistent data and the data segments within them can thus be located quickly, so that the data in the inconsistent data segments can be corrected quickly afterwards.

It should be noted that the above method for quickly locating the data nodes with inconsistent data and the data segments within them requires that the topological structures of the first database cluster and the second database cluster be the same, that the construction rules for constructing the node hash trees of different nodes be the same, and that the construction rules for constructing the database cluster hash trees of different database clusters also be the same. That is, the tree structures of the finally constructed node hash trees must be the same, and the tree structures of the database cluster hash trees must be the same; only then can the leaf nodes be compared one by one to finally locate the data nodes with inconsistent data and the data segments within them.

In addition, for the case that the topological structures of the first database cluster and the second database cluster are not the same, the data ranges of the two database clusters are nevertheless the same, that is, the range from the data minimum to the data maximum is the same for either database cluster (for example, with the MD5 hash algorithm the range is 0 to 2^128). If the first database cluster and the second database cluster cut the data according to the same rule, the obtained data segments should be consistent, the hash values calculated for those data segments should be consistent, the hash values of the root nodes of the node hash trees constructed from them should be consistent, and the root nodes of the constructed database cluster hash trees should therefore be consistent as well.
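One reading of why the cluster-level root can still match under different topologies is sketched below. It relies on an assumption that goes beyond the text: the combination operation at both levels is exclusive or, which is associative and commutative, so the cluster root equals the exclusive or of all segment hashes no matter how the same segments are distributed over nodes. The segment contents and function names are hypothetical.

```python
import hashlib


def md5_int(data: bytes) -> int:
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def xor_root(values) -> int:
    """Root of a tree built by XOR-ing neighbouring pairs (odd node carried up)."""
    layer = list(values)
    while len(layer) > 1:
        layer = [layer[p] ^ layer[p + 1] if p + 1 < len(layer) else layer[p]
                 for p in range(0, len(layer), 2)]
    return layer[0]


def cluster_root(nodes) -> int:
    """nodes is a list of per-node segment lists; returns the cluster hash tree root."""
    return xor_root([xor_root([md5_int(s) for s in segments]) for segments in nodes])


segments = [f"segment-{n}".encode() for n in range(8)]

# The same eight segments spread over four nodes, two nodes, and three nodes.
four_nodes = [segments[0:2], segments[2:4], segments[4:6], segments[6:8]]
two_nodes = [segments[0:4], segments[4:8]]
three_nodes = [segments[0:3], segments[3:5], segments[5:8]]

roots_match = cluster_root(four_nodes) == cluster_root(two_nodes) == cluster_root(three_nodes)

# Changing one segment changes the cluster root even though the topology differs.
altered = [segments[0:4], segments[4:7] + [b"changed"]]
detects_change = cluster_root(altered) != cluster_root(four_nodes)
```

The per-node roots and inner layers differ between the three layouts, which is why only the cluster-level verdict carries over and the node-by-node comparison does not.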

That is to say, even when the topologies of the first database cluster and the second database cluster differ, the above method can still determine whether the data of the two clusters is consistent: construct the node hash trees and the database cluster hash trees, then compare the hash values of the root nodes of the database cluster hash trees. However, because the topologies differ, the structures of the constructed node hash trees, and of the database cluster hash trees, may differ. The leaf nodes then cannot be put in one-to-one correspondence and cannot be compared one by one, so the inconsistent data nodes, and in turn the inconsistent data segments within them, cannot be located precisely. Only a rough range of the inconsistency can be given, and the inconsistent data nodes must then be determined within that range by other means.

The present specification provides a method for checking data consistency of a distributed database cluster. When it is detected that data migration from a first database cluster to a second database cluster is complete, the method first constructs a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to a first hash tree construction rule, where the first hash tree construction rule includes: constructing a node hash tree according to hash values of data stored in data nodes included in the database cluster. The method then constructs a first database cluster hash tree and a second database cluster hash tree according to a second hash tree construction rule, where the second hash tree construction rule includes: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree. Finally, the method compares the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree, and determines that the data of the first database cluster and the second database cluster are consistent if those root hash values are consistent.
In this case, each node hash tree is constructed from the actual full data of its data node, and the hash tree of each database cluster is then constructed from the node hash trees of the data nodes that the cluster includes. This guarantees that all of the data in a database cluster participates in constructing its database cluster hash tree, so the root node of that tree represents the cluster's full data. Whether the full data of the two database clusters is consistent can therefore be determined quickly and accurately simply by comparing the root nodes of the two database cluster hash trees. No sampling of the full data is needed, which rules out the possibility of undetected inconsistency in unsampled data and greatly improves the accuracy of checking the data consistency of the two database clusters. Furthermore, the inconsistent data nodes and the inconsistent data segments within them can finally be located by comparing the leaf nodes of the hash trees one by one.
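The overall check summarized above can be sketched as follows. This is an illustrative sketch only, assuming MD5 as the hash and a pairwise concatenate-and-hash combining operation; `merkle_root`, `node_root`, `cluster_root` and `clusters_consistent` are hypothetical names, and each data node is modeled simply as a list of data segments given as bytes.

```python
import hashlib


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def merkle_root(hashes):
    """Combine hashes pairwise, layer by layer, until a single root remains."""
    if not hashes:
        raise ValueError("at least one hash is required")
    layer = list(hashes)
    while len(layer) > 1:
        # Assumption: an unpaired last hash is hashed on its own.
        layer = [md5(b"".join(layer[i:i + 2])) for i in range(0, len(layer), 2)]
    return layer[0]


def node_root(segments):
    """First rule: a node hash tree built from the hashes of the node's data."""
    return merkle_root([md5(segment) for segment in segments])


def cluster_root(cluster):
    """Second rule: a cluster hash tree built from the node hash tree roots."""
    return merkle_root([node_root(node) for node in cluster])


def clusters_consistent(first_cluster, second_cluster):
    """The clusters are treated as consistent when their root hashes match."""
    return cluster_root(first_cluster) == cluster_root(second_cluster)
```

Because every data segment feeds into its node root, and every node root feeds into the cluster root, a change to any single segment changes the cluster root (barring a hash collision), which is why comparing only the two root hashes suffices for the consistency decision.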

Corresponding to the above method embodiment, the present specification further provides an embodiment of a data consistency checking apparatus for a distributed database cluster. Fig. 8 shows a schematic structural diagram of the data consistency checking apparatus for a distributed database cluster provided in an embodiment of the present specification. As shown in Fig. 8, the apparatus includes:

a first building module 802 configured to, upon detecting completion of data migration from a first database cluster to a second database cluster, build a first node hash tree of the first database cluster and a second node hash tree of the second database cluster according to a first hash tree building rule, wherein the first hash tree building rule includes: constructing a node hash tree according to hash values of data stored in data nodes included in the database cluster;

a second building module 804 configured to build the first database cluster hash tree and the second database cluster hash tree according to a second hash tree building rule, wherein the second hash tree building rule includes: constructing a database cluster hash tree according to the hash value of the root node of the node hash tree;

a first determining module 806 configured to compare hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree, and determine that data of the first database cluster and the second database cluster are consistent if the hash values of the root nodes of the first database cluster hash tree and the second database cluster hash tree are consistent.

In an optional implementation manner of this embodiment, the first building module 802 is further configured to:

acquiring data stored by data nodes included in the database cluster;

In an optional implementation manner of this embodiment, the second building module 804 is further configured to:

In an optional implementation manner of this embodiment, the apparatus further includes:

a first comparison module configured to, when hash values of root nodes of the first database cluster hash tree and the second database cluster hash tree are not consistent, compare hash values of leaf nodes included in the first database cluster hash tree and the second database cluster hash tree in a direction from the root node to the leaf nodes in sequence, and determine at least two first target leaf nodes;

a second determining module configured to determine corresponding at least two target data nodes according to the at least two first target leaf nodes.

an obtaining module configured to obtain a first target node hash tree corresponding to a target data node in the first database cluster, and obtain a second target node hash tree corresponding to a target data node in the second database cluster;

a second comparison module configured to compare hash values of leaf nodes included in the first target node hash tree and the second target node hash tree from root nodes of the first target node hash tree and the second target node hash tree to leaf nodes in sequence, and determine a second target leaf node;

a third determination module configured to determine a corresponding data segment according to the second target leaf node.

In an optional implementation manner of this embodiment, the first database cluster hash tree and the second database cluster hash tree both include p layers of leaf nodes, where p is greater than or equal to 1; the first comparison module is further configured to:

The specification provides a data consistency checking apparatus for a distributed database cluster. The apparatus first constructs each node hash tree from the actual full data of its data node, and then constructs the hash tree of each of the two database clusters from the node hash trees of the data nodes that the cluster includes. This guarantees that all of the data in a database cluster participates in constructing its database cluster hash tree, so the root node of that tree represents the cluster's full data. Whether the full data of the two database clusters is consistent can therefore be determined quickly and accurately simply by comparing the root nodes of the two database cluster hash trees. No sampling of the full data is needed, which rules out the possibility of undetected inconsistency in unsampled data and greatly improves the accuracy of checking the data consistency of the two database clusters.

The foregoing is an exemplary scheme of the data consistency checking apparatus for a distributed database cluster according to this embodiment. It should be noted that the technical solution of the data consistency checking apparatus and the technical solution of the data consistency checking method for a distributed database cluster belong to the same concept. For details of the apparatus that are not described here, refer to the description of the technical solution of the data consistency checking method for a distributed database cluster.

Fig. 9 illustrates a block diagram of a computing device 900 provided in accordance with an embodiment of the present description. Components of the computing device 900 include, but are not limited to, a memory 910 and a processor 920. The processor 920 is coupled to the memory 910 via a bus 930 and the database cluster 950 is configured to store data.

Computing device 900 also includes access device 940, access device 940 enabling computing device 900 to communicate via one or more networks 960. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 940 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 900, as well as other components not shown in FIG. 9, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 9 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.

Computing device 900 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 900 may also be a mobile or stationary server.

Wherein, the processor 920 is configured to execute the following computer-executable instructions:

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data consistency checking method for the distributed database cluster belong to the same concept. For details of the computing device that are not described here, refer to the description of the technical solution of the data consistency checking method for the distributed database cluster.

An embodiment of the present specification also provides a computer readable storage medium storing computer instructions that, when executed by a processor, are configured to implement the method of:

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the data consistency checking method for the distributed database cluster belong to the same concept. For details of the storage medium that are not described here, refer to the description of the technical solution of the data consistency checking method for the distributed database cluster.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present disclosure. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for this description.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, to thereby enable others skilled in the art to best understand the specification and its practical application. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method of data consistency verification for a distributed database cluster, the method comprising:

2. The method of claim 1, constructing a node hash tree from hash values of data stored by data nodes comprised by the database cluster, comprising:

acquiring data stored by data nodes included in the database cluster;

3. The method of claim 2, constructing the node hash tree with hash values of the data of the at least two data segments as leaf nodes, comprising:

4. The method of claim 1, constructing a database cluster hash tree from hash values of root nodes of the node hash tree, comprising:

5. The method of claim 1, further comprising:

6. The method of claim 5, after determining the corresponding target data node from the first target leaf node, further comprising:

7. The method of claim 5, the first database cluster hash tree and the second database cluster hash tree each comprising p tiers of leaf nodes, wherein p is greater than or equal to 1;

8. The method of claim 2, obtaining data stored by a data node comprised by the database cluster, comprising:

9. The method of claim 3, wherein performing a first combining operation on the hash values of the leaf nodes of the ith layer to obtain leaf nodes of an (i + 1) th layer comprises:

10. An apparatus for data consistency verification for a distributed database cluster, the apparatus comprising:

11. A computing device, comprising:

a memory and a processor;

12. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the method of data consistency checking for a distributed database cluster according to any one of claims 1 to 9.