CN115129779A

CN115129779A - Database synchronization method, device and readable medium

Info

Publication number: CN115129779A
Application number: CN202110312862.8A
Authority: CN
Inventors: 祝百万
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Cloud Computing Beijing Co Ltd
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2022-09-30

Abstract

The application discloses a database synchronization method, a database synchronization device, and a readable medium, and relates to the field of databases. The method comprises: receiving a write command sent by a programmable gateway, wherein the programmable gateway is used for sending the write command to a database main instance and a replica instance; generating a digest value corresponding to the write command; writing the digest value into a binary log file corresponding to the write command; and synchronizing the binary log file to the replica instance, wherein the replica instance is used for executing the write command corresponding to the digest value after matching the digest value with the received write command. After the programmable gateway sends the write command to both the main instance and the replica instance, the main instance sends only the digest value corresponding to the write command to the replica instance. Because the digest length of the digest value is smaller than the command length of the write command, this avoids the large data interaction volume and data transmission pressure caused by transmitting the complete write command between the main instance and the replica instance.

Description

Database synchronization method, device and readable medium

Technical Field

Embodiments of the present application relate to the field of databases, and in particular to a database synchronization method, a database synchronization device, and a readable medium.

Background

In the field of databases, synchronizing the data in a database from a host to a standby machine is a main approach to disaster recovery. The database data on the standby machine is kept consistent with the database data on the host, so that when the host data is damaged or the host crashes, work can continue directly using the database on the standby machine. The host and the standby machine may be implemented as physical devices or as cloud virtual devices.

Typically, a client modifies data in the database on the host through database transactions. After the host receives a plurality of database transactions submitted by the client, it writes the database transactions into a binary log (binlog) file and sends the binary log file to the standby machine. The standby machine obtains the corresponding database transactions by parsing the binary log file and executes them, ensuring that the database of the standby machine is consistent with the database of the host.

However, when the host is equipped with a plurality of standby machines in the above manner, the CPU of the host must perform a large amount of computation to transmit the binlog file to each standby machine, which places heavy data processing pressure on the host and reduces the efficiency of disaster recovery.

Disclosure of Invention

The embodiment of the application provides a database synchronization method, a database synchronization device and a readable medium, which can improve the efficiency of data interaction between a main database and a standby database and reduce the data pressure of data transmission between the main database and the standby database. The technical scheme is as follows:

In one aspect, a database synchronization method is provided, applied to a database main instance, the method including:

receiving a write command sent by a programmable gateway, wherein the programmable gateway is used for sending the write command to the database main instance and the replica instance;

generating a digest value corresponding to the write command, wherein the digest length of the digest value is smaller than the command length of the write command;

writing the digest value into a binary log file corresponding to the write command;

and synchronizing the binary log file to the replica instance, wherein the replica instance is used for executing the write command corresponding to the digest value after matching the digest value with the received write command.
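The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `MainInstance`, its methods, and the choice to represent the binary log as a list of raw MD5 digests are all hypothetical. MD5 is used because the description names it later; note that a 16-byte digest is only shorter than the command when the command exceeds 16 bytes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MainInstance:
    """Hypothetical main instance that logs digests instead of full commands."""
    executed: list = field(default_factory=list)  # commands applied locally
    binlog: list = field(default_factory=list)    # digest entries awaiting sync

    def receive_write(self, command: bytes) -> bytes:
        # Step 1: the write command arrives from the programmable gateway.
        self.executed.append(command)
        # Step 2: generate a digest value for the write command.
        digest = hashlib.md5(command).digest()
        # Step 3: record the digest (not the command) in the binary log.
        self.binlog.append(digest)
        return digest

    def sync_binlog(self) -> bytes:
        # Step 4: the payload sent to a replica is the concatenated digests.
        payload = b"".join(self.binlog)
        self.binlog.clear()
        return payload


main = MainInstance()
cmd = b"INSERT INTO orders (id, note) VALUES (1, '" + b"x" * 200 + b"')"
main.receive_write(cmd)
payload = main.sync_binlog()
print(len(cmd), len(payload))  # command size vs. bytes actually synchronized
```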

In another aspect, a database synchronization method is provided, and is applied to a replica instance, where the method includes:

receiving a write command sent by a programmable gateway, wherein the programmable gateway is used for sending the write command to a database main instance and the replica instance;

writing the write command into a replay file, wherein the replay file is used for controlling the execution of the write command;

receiving a binary log file synchronized by the database main instance, wherein the binary log file comprises a digest value, the digest value is a digest generated by the database main instance based on a received write command, and the digest length of the digest value is smaller than the command length of the write command;

and after the digest value is matched with the write command in the replay file, executing the write command corresponding to the digest value.
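The replica-side steps above can be sketched in the same spirit. The class `ReplicaInstance`, the in-memory stand-in for the replay file, and the decision to return unmatched digests to the caller are assumptions made for illustration; the text does not specify how a digest with no matching cached command is handled.

```python
import hashlib
from collections import defaultdict, deque


class ReplicaInstance:
    """Hypothetical replica that defers writes until the main instance confirms them."""

    def __init__(self):
        # Stand-in for the replay file: digest -> queue of cached commands.
        self.replay = defaultdict(deque)
        self.executed = []  # commands applied to the replica's data

    def receive_write(self, command: bytes) -> None:
        # Cache the command from the gateway; do not execute it yet.
        self.replay[hashlib.md5(command).digest()].append(command)

    def apply_binlog(self, digests) -> list:
        # Execute cached commands in the order the main instance logged them.
        unmatched = []
        for digest in digests:
            pending = self.replay.get(digest)
            if pending:
                self.executed.append(pending.popleft())
            else:
                unmatched.append(digest)  # no cached command for this digest
        return unmatched


replica = ReplicaInstance()
first, second = b"INSERT INTO t VALUES (1)", b"DELETE FROM t WHERE id = 1"
replica.receive_write(second)  # commands may arrive in a different order
replica.receive_write(first)
binlog = [hashlib.md5(first).digest(), hashlib.md5(second).digest()]
unmatched = replica.apply_binlog(binlog)
```

In this sketch the execution order follows the digests in the binary log rather than the arrival order at the replica, which is one way to read "the replay file is used for controlling the execution of the write command".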

In another aspect, an apparatus for synchronizing a database is provided, the apparatus including:

the receiving module is used for receiving a write command sent by a programmable gateway, and the programmable gateway is used for sending the write command to the database main instance and the replica instance;

the generating module is used for generating a digest value corresponding to the write command, and the digest length of the digest value is smaller than the command length of the write command;

the writing module is used for writing the digest value into a binary log file corresponding to the write command;

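and the synchronization module is used for synchronizing the binary log file to the replica instance, and the replica instance is used for executing the write command corresponding to the digest value after matching the digest value with the received write command.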

In another aspect, an apparatus for synchronizing a database is provided, the apparatus including:

the receiving module is used for receiving a write command sent by a programmable gateway, and the programmable gateway is used for sending the write command to a database main instance and the replica instance;

the writing module is used for writing the write command into a replay file, and the replay file is used for controlling the execution of the write command;

the receiving module is further configured to receive a binary log file synchronized by the database main instance, where the binary log file includes a digest value, the digest value is a digest generated by the database main instance based on a received write command, and the digest length of the digest value is smaller than the command length of the write command;

and the execution module is used for matching the digest value with the write command in the replay file and then executing the write command corresponding to the digest value.

In another aspect, a computer device is provided, and the computer device includes a processor and a memory, where the memory stores at least one program, and the at least one program is loaded by the processor and executed to implement the database synchronization method according to any of the embodiments of the present application.

In another aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method of synchronization of databases as described in any of the embodiments of the present application.

In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the database synchronization method described in any of the above embodiments.

The beneficial effects of the technical solutions provided in the embodiments of the present application include at least:

After the write command is sent to the main instance and the replica instance through the programmable gateway, the main instance sends the digest value corresponding to the write command to the replica instance to instruct the replica instance to execute the corresponding write command, thereby synchronizing the main instance and the replica instance.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic diagram of an implementation environment of a database synchronization method according to an exemplary embodiment of the present application;

FIG. 2 is a flow chart of a method for synchronizing databases provided by an exemplary embodiment of the present application;

FIG. 3 is a schematic diagram of the operating principle of a programmable gateway provided based on the embodiment shown in FIG. 2;

FIG. 4 is a schematic diagram of the operating principle of a programmable gateway provided based on the embodiment shown in FIG. 2;

FIG. 5 is a flow chart of a method for synchronizing databases provided in another exemplary embodiment of the present application;

FIG. 6 is a schematic diagram of an asynchronous replication approach provided based on the embodiment shown in FIG. 5;

FIG. 7 is a schematic diagram of a semi-synchronous manner provided based on the embodiment shown in FIG. 5;

FIG. 8 is a schematic diagram of a multi-threading approach provided based on the embodiment shown in FIG. 5;

FIG. 9 is a schematic diagram of a group replication approach provided based on the embodiment shown in FIG. 5;

FIG. 10 is a flow chart of a method for synchronizing databases provided by another exemplary embodiment of the present application;

FIG. 11 is a flow chart of a method for synchronizing databases provided by another exemplary embodiment of the present application;

FIG. 12 is a schematic diagram of a database distribution structure provided based on the embodiment shown in FIG. 11;

FIG. 13 is a schematic diagram of an operation principle of a programmable gateway provided based on the embodiment shown in FIG. 11;

FIG. 14 is a schematic structural diagram of a data write request provided based on the embodiment shown in FIG. 11;

FIG. 15 is a schematic structural diagram of a MySQL message provided based on the embodiment shown in FIG. 11;

FIG. 16 is a block diagram of a database synchronization apparatus according to an exemplary embodiment of the present application;

FIG. 17 is a block diagram of a database synchronization apparatus according to another exemplary embodiment of the present application;

FIG. 18 is a block diagram of a computer device provided in an exemplary embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

First, the following terms referred to in the embodiments of the present application are explained:

Cloud technology: a management technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like applied under the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites, and portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may have its own identification mark that needs to be transmitted to a background system for logic processing; data at different levels are processed separately, and all kinds of industry data require strong system background support, which can only be realized through cloud computing.

Database: in short, a database can be regarded as an electronic file cabinet, that is, a place for storing electronic files, in which a user can add, query, update, and delete data. A "database" is a collection of data that is stored together in a certain manner, can be shared by multiple users, has as little redundancy as possible, and is independent of applications.

A Database Management System (DBMS) is a computer software system designed for managing a database, and generally has basic functions such as storage, retrieval, security assurance, and backup. Database management systems may be classified according to the database model they support, such as relational or XML (Extensible Markup Language); according to the type of computer supported, such as server cluster or mobile phone; according to the query language used, such as Structured Query Language (SQL) or XQuery; according to their performance emphasis, such as maximum size or maximum operating speed; or according to other classification schemes. Regardless of the classification used, some DBMSs span categories, for example by supporting multiple query languages simultaneously.

Embodiments of the present application relate to a main database (also referred to as a main instance) and a backup database (also referred to as a replica instance).

Main instance: the database instance mainly used in practical applications, that is, the master node among the database instances, which undertakes read-write tasks. A read-write task created by the client is completed on the main instance first; for a write task, the main instance also needs to synchronize the write task to the slave instance. The main instance may be implemented on a physical host or on a cloud virtual host, which is not limited in the embodiments of the present application.

Replica instance (replica): an instance having a data synchronization relationship with the main instance, wherein the replica instance includes at least one of a slave instance, a read-only instance, and the like. The slave instance is an instance attached to the main instance that serves as the standby machine of the main instance; a data synchronization relationship is maintained between the slave instance and the main instance, and when the main instance fails, service automatically switches to the slave instance, which then provides read-write service to the client. The read-only instance is an instance that, once attached to the main instance, can provide read-only service to the client but does not participate in failover when the main instance fails.

Optionally, the replica instance may be implemented on a physical standby machine or on a cloud virtual standby machine, which is not limited in the embodiments of the present application.

Programmable gateway: a relay device between a client and a database instance, used for forwarding write commands between the client and the database instance. In some embodiments, a read command sent by the client is transmitted directly to the main instance, which feeds back the data read result, while a write command sent by the client is forwarded to the main instance by the programmable gateway, and the main instance executes the write operation. Alternatively, both read and write commands sent by the client are forwarded by the programmable gateway; that is, the programmable gateway judges the operation type of each command sent by the client and determines the target address for forwarding the command based on the judged operation type.

In the related art, the forwarding rule of a gateway device is relatively fixed: usually, when the gateway receives an instruction, it determines a target address from the instruction and forwards the instruction to that target address. In the embodiments of the present application, because the gateway device forwards commands between the client and the master and slave instances, different database service types call for different forwarding behavior, such as forwarding a command only to the main instance, or forwarding a command to the main instance and the replica instance simultaneously. A programmable gateway is therefore needed, with command forwarding rules set at its application layer, so that commands are forwarded to the main instance and the replica instance based on the forwarding rules.
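The two forwarding situations named above (main instance only, or main instance and replica instances together) can be sketched as a rule keyed on the operation type. This is an illustrative assumption about what such a rule might look like in ordinary application code, not the gateway's actual rule: the verb list, the function name `forward_targets`, and the addresses are hypothetical.

```python
from typing import List

# Illustrative, non-exhaustive set of SQL verbs treated as write operations.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


def forward_targets(sql: str, main: str, replicas: List[str]) -> List[str]:
    """Return the addresses a command should be forwarded to."""
    words = sql.split(None, 1)
    verb = words[0].upper() if words else ""
    if verb in WRITE_VERBS:
        # Write commands go to the main instance and every replica instance.
        return [main, *replicas]
    # Everything else (for example SELECT) goes to the main instance only.
    return [main]


targets_write = forward_targets("insert into t values (1)", "10.0.0.1", ["10.0.0.2", "10.0.0.3"])
targets_read = forward_targets("SELECT * FROM t", "10.0.0.1", ["10.0.0.2", "10.0.0.3"])
```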

Database transaction: a database transaction is a logical unit of work in database operation. Generally, one database transaction performs one operation on a database, such as an add operation, a modify operation, or a delete operation. Optionally, a database transaction is one Structured Query Language (SQL) statement or a group of SQL statements.

Binary log file: refers to a file that stores database transactions in binary form.

First, FIG. 1 is a schematic diagram of an implementation environment of a database synchronization method according to an exemplary embodiment of the present application. As shown in FIG. 1, the implementation environment of the database synchronization method includes: a client 110, a programmable gateway 120, a host 130, and a standby machine 140.

The client 110 is configured to initiate read-write requests for data; when the client 110 initiates a read-write request, the request is sent to the programmable gateway 120. Illustratively, a data display interface is displayed in the client 110, in which a user can add, delete, modify, and query data. Taking an adding operation as an example, after the user selects a data adding control in the data display interface and enters the data to be added to the database, the client 110 sends a data write request to the programmable gateway 120.

The programmable gateway 120 is used for command forwarding between the client 110 and the host 130 and the standby 140. Illustratively, when the programmable gateway 120 receives a data write request sent by the client 110, it determines a target address corresponding to an instance that needs to perform a data write operation, and forwards a write command to the target address. In some embodiments, for write commands, programmable gateway 120 forwards the write commands to the master instance and the replica instance simultaneously.

The host 130 includes a main instance, that is, the master node that undertakes read-write tasks. When receiving a write command sent by the programmable gateway 120, the host 130 executes the write command and records it in the binlog file: the host performs digest extraction on the write command to obtain the digest value corresponding to the write command and writes the digest value into the binlog file. Since the digest length of the digest value is smaller than the command length of the write command itself, the amount of data transferred between the main instance and the replica instance is reduced.

The host 130 synchronizes the binlog file to the standby machine 140. The standby machine 140 includes a replica instance, which is a slave instance or a read-only instance. Because the programmable gateway 120 sends each write command to both the main instance and the replica instance at the same time, the replica instance writes the write command to a replay file when receiving it. When the replica instance receives the binlog file synchronized by the main instance, it acquires the digest value corresponding to the write command from the binlog file, matches the digest value against the write commands stored in the replay file, and executes the write command in the replay file that matches the digest value.

This ensures that the same write command is executed in both the main instance and the replica instance, keeping the two synchronized.

It should be noted that the client 110, the programmable gateway 120, the host 130 and the standby machine 140 are connected to each other through a communication network, which may be a wired network or a wireless network, and the communication network includes a network in the form of a local area network, a wide area network, and the like, which is not limited in this embodiment of the present invention.

With reference to the above terms and implementation environment, FIG. 2 is a flowchart of a database synchronization method provided in an exemplary embodiment of the present application. The method is described by taking its application to the main instance in the host shown in FIG. 1 as an example. The database synchronization method includes:

Step 201, receiving a write command sent by a programmable gateway.

The programmable gateway is used for sending write commands to the database main instance and the replica instance. The programmable gateway is a gateway device for forwarding commands between the client and the database.

In the related art, the forwarding rule of a gateway device is relatively fixed: usually, when the gateway receives an instruction, it determines a target address from the instruction and forwards the instruction to that target address. In the embodiments of the present application, because the gateway device forwards commands between the client and the master and slave instances, different database service types call for different forwarding behavior, such as forwarding a command only to the main instance, or forwarding a command to the main instance and the replica instance simultaneously. A programmable gateway is therefore needed, with command forwarding rules set at its application layer, so that commands are forwarded to the main instance and the replica instance based on the forwarding rules.

In some embodiments, after receiving a data write request sent by a client, the programmable gateway matches the data write request against the command forwarding rules, determines the instances that need to receive the write command based on the matching result, and forwards the write command to the main instance and the replica instance that need to receive it.

The programmable gateway provided in the embodiments of the present application is introduced below with reference to FIG. 3. After the command forwarding rule 300 is written, the command forwarding rule 300 is input into a compiler 310 in the programmable gateway, and the compiler 310 compiles the command forwarding rule 300 to obtain a hardware logic control file 321 and an interface control file 322. The interface control file 322 is interpreted by the control plane 331, which performs the corresponding rule control; the control plane 331 is implemented as a Central Processing Unit (CPU) in the programmable gateway. Based on the control plane 331's interpretation of the interface control file 322, the runtime module 332 carries out the rule control specified by the interface control file 322. Execution of the interface control file 322 reaches the driver layer 333 and, together with the hardware logic control file 321, finally controls the hardware 334 in the programmable gateway, for example, controlling an input/output interface in the programmable gateway to send commands.

In some embodiments, the command forwarding rule 300 is written in the P4 language, and the rule control process of the interface control file 322 is shown in FIG. 4. Taking a write operation as an example, the programmable parser 400 parses a received data write request; after the request passes through the pre-switching stage 410 and the post-switching stage 420, the rule matching result corresponding to the data write request is fed back to the pre-switching stage 410 and the post-switching stage 420, and the rule matching result is processed through the control plane 430.

The pre-switching stage 410 and the post-switching stage 420 both include matching logic and action logic. The matching logic includes matching processes based on a mix of Static Random-Access Memory (SRAM) and Ternary Content Addressable Memory (TCAM) lookup tables, counters, meters, and generic hash tables; the action logic includes the execution, by Arithmetic Logic Units (ALUs), of standard Boolean and arithmetic operations, header modification operations, hash operations, and the like. That is, the pre-switching stage mainly matches the data write request against the forwarding rules, and the post-switching stage mainly forwards the write command to each instance.

Illustratively, the command forwarding rule written in the P4 language defines an ingress process (corresponding to the pre-switching stage) and an egress process (corresponding to the post-switching stage). Schematically, the ingress process is expressed as a match-action table with the following fields:

key represents the match field; ipv4.dstAddr: lpm indicates that the match field is the IP destination address in the header of the data write request and that the match mode is longest prefix match (lpm); the match mode can also be implemented as exact match (exact) or ternary match (ternary). actions represents the set of action types available on a match; ipv4_forward represents a forwarding action, that is, an egress action that needs to be defined; drop represents a drop action, that is, discarding the received data request; NoAction represents a null action, that is, no further processing is performed; size = 1024 represents the number of match entries the table can hold.
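To make the longest-prefix-match behavior of such a table concrete, the following is a small software model of the key, actions, and size fields described above. It is a sketch in ordinary Python rather than P4, and it says nothing about how the gateway hardware performs the lookup; the class name `LpmTable`, the example prefixes, the port argument, and the use of NoAction as the default on a miss are assumptions for illustration.

```python
import ipaddress


class LpmTable:
    """Toy longest-prefix-match table mapping IPv4 prefixes to (action, args)."""

    def __init__(self, size=1024, default_action=("NoAction", ())):
        self.size = size                      # maximum number of entries
        self.default_action = default_action  # assumed behavior on a miss
        self.entries = []                     # (network, action, args) tuples

    def add_entry(self, prefix, action, args=()):
        if len(self.entries) >= self.size:
            raise OverflowError("table is full")
        self.entries.append((ipaddress.ip_network(prefix), action, args))

    def lookup(self, dst_addr):
        addr = ipaddress.ip_address(dst_addr)
        matches = [e for e in self.entries if addr in e[0]]
        if not matches:
            return self.default_action
        # Among all matching prefixes, the longest (most specific) one wins.
        _, action, args = max(matches, key=lambda e: e[0].prefixlen)
        return (action, args)


table = LpmTable()
table.add_entry("10.0.0.0/8", "drop")
table.add_entry("10.0.1.0/24", "ipv4_forward", ("port1",))
specific = table.lookup("10.0.1.7")    # matches both prefixes; /24 wins
general = table.lookup("10.2.3.4")     # matches only the /8 prefix
miss = table.lookup("192.168.0.1")     # matches nothing
```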

After the programmable gateway matches the data request sent by the client to its forwarding rule through the above process, it forwards the command based on that forwarding rule.

Illustratively, after the client sends the data write request to the programmable gateway, the programmable gateway matches the data write request through the above process and determines that the target instances corresponding to the data write request include main instance a, replica instance b, and replica instance c; it then sends the write command to main instance a, replica instance b, and replica instance c.

In some embodiments, the write command is not limited to a command for adding data to the database; it also includes commands for deleting data from and modifying data in the database. The specific service type of the write command is not limited in the embodiments of the present application.

Step 202, generating a digest value corresponding to the write command.

The digest length of the digest value is smaller than the command length of the write command. In some embodiments, when receiving a write command, the main instance executes the write command first and then writes it into the binlog file; or the main instance writes the write command into the binlog file first and then executes it; or the main instance executes the write command and writes it into the binlog file at the same time.

In some embodiments, when generating the digest value corresponding to the write command, the main instance first determines the command length of the write command and performs digest extraction on the write command based on the command length.

Illustratively, in response to the command length being less than the length threshold, digest extraction is performed on the write command to obtain the digest value; in response to the command length being greater than the length threshold, a command segment is intercepted from the write command, and digest extraction is performed on the command segment to obtain the digest value. A command length equal to the length threshold may be assigned to either case.

Intercepting the command segment from the write command includes at least one of the following manners:

First, intercepting a segment with a preset length from a preset position as the command segment;

Second, intercepting a segment with a first preset length from a first preset position as a first command segment; intercepting a segment with a second preset length from a second preset position as a second command segment; and combining the first command segment and the second command segment to obtain the command segment corresponding to the write command.

Illustratively, a first command segment with a first preset length is intercepted from the head position of the write command backward, a second command segment with a second preset length is intercepted from the tail position of the write command forward, and the first command segment and the second command segment are combined to obtain a command segment of the write command. Wherein the first preset length and the second preset length are equal or unequal.

In some embodiments, the combination manner of the first command segment and the second command segment includes an addition manner or a splicing manner, that is, the first command segment and the second command segment are added to obtain a command segment of the write command; or splicing the first command segment and the second command segment to obtain the command segment of the write command.

Illustratively, taking a length threshold of 1 KB as an example, in order to improve data transmission efficiency, if the command length of the write command is less than (or equal to) 1 KB, digest extraction is performed directly on the write command to obtain the digest value; if the command length of the write command is greater than (or equal to) 1 KB, the first 256 bytes and the last 256 bytes of the write command are added together to obtain a command segment, and the digest value is extracted from the command segment.
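The 1 KB example above can be sketched as follows. Two choices here are assumptions rather than details fixed by the text: a command of exactly 1 KB is digested whole, and "adding" the two 256-byte segments is read as splicing (concatenation). The names `LENGTH_THRESHOLD`, `SEGMENT_LENGTH`, and `command_digest` are hypothetical.

```python
import hashlib

LENGTH_THRESHOLD = 1024  # 1 KB, as in the example above
SEGMENT_LENGTH = 256     # bytes taken from the head and from the tail


def command_digest(command: bytes) -> bytes:
    """Return a 16-byte MD5 digest of the command or of its head+tail segment."""
    if len(command) <= LENGTH_THRESHOLD:
        # Short command: digest the whole command.
        return hashlib.md5(command).digest()
    # Long command: splice the first and last 256 bytes into one segment.
    segment = command[:SEGMENT_LENGTH] + command[-SEGMENT_LENGTH:]
    return hashlib.md5(segment).digest()


short_cmd = b"UPDATE t SET v = 1 WHERE id = 7"
long_cmd = b"h" * 256 + b"m" * 1000 + b"t" * 256
short_digest = command_digest(short_cmd)
long_digest = command_digest(long_cmd)
```

With this interception manner, only the head and tail of a long command contribute to the digest, so hashing cost stays bounded regardless of command length; two long commands that differ only in their middle bytes would yield the same digest.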

In some embodiments, extracting the digest value from the write command means performing digest calculation on the write command through the MD5 Message-Digest Algorithm; similarly, extracting the digest value from the command segment means performing digest calculation on the command segment through the MD5 algorithm.

It should be noted that the above interception manners are described by taking the interception of one segment and of two segments as examples; in the embodiments of the present application, three, four, or more segments may also be intercepted, which is not limited in the embodiments of the present application.

The MD5 algorithm is a cryptographic hash function that generates a 128-bit (16-byte) hash value, which is used to determine that the write command received and executed by the main instance is consistent with the write command received and executed by the replica instance. Since the digest length of the hash value generated by the MD5 algorithm is smaller than the command length described above, transmitting the hash value reduces the amount of data exchanged between the main instance and the replica instance.
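The digest generation described above can be sketched as follows. This is a minimal illustration, assuming the 1 KB threshold and 256-byte segments from the example, and assuming the two segments are combined by splicing (one of the two combination manners mentioned); the function and constant names are hypothetical.

```python
import hashlib

LENGTH_THRESHOLD = 1024  # 1 KB, the example threshold from the text
SEGMENT_LENGTH = 256     # example head and tail segment length from the text


def command_digest(command: bytes) -> bytes:
    """Return a 16-byte MD5 digest of the command or of its head+tail segment."""
    if len(command) < LENGTH_THRESHOLD:
        # Short command: digest the whole command.
        data = command
    else:
        # Long command: splice the first and last 256 bytes into one segment.
        data = command[:SEGMENT_LENGTH] + command[-SEGMENT_LENGTH:]
    return hashlib.md5(data).digest()
```

Whatever the command length, the value that has to travel between instances is 16 bytes long.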

Step 203, writing the digest value into the binary log file corresponding to the write command.

The main instance writes the digest value into a binlog file, which is a binary file that is synchronized between the main instance and the replica instance.

Optionally, the binary log file includes a command padding bit, and the digest value is converted into a binary format and then written into the command padding bit, so as to obtain the binlog file.

In some embodiments, the binlog file further includes, in addition to the digest value corresponding to the write command, a transaction identifier, a command length, and the like, which is not limited in this embodiment of the present application.

Step 204, synchronizing the binary log file to the replica instance.

In some embodiments, the master instance synchronizes the binlog file to the replica instance through the programmable gateway; alternatively, the primary instance synchronizes the binlog file directly to the replica instance through other routing devices.

The replica instance is used for matching the digest value with the received write command and then executing the write command corresponding to the digest value.

When forwarding the write command, the programmable gateway forwards it to the main instance and the replica instance simultaneously. On receiving the write command, the replica instance caches it and does not execute it directly. The binlog file that the main instance subsequently sends to the replica instance includes the digest value corresponding to the write command executed by the main instance; that is, the binlog file indicates that the main instance has determined to execute the write command, so the replica instance needs to execute the write command in step with the main instance. After acquiring the digest value from the binlog file, the replica instance matches the digest value against the write commands currently waiting to be executed, determines the write command corresponding to the digest value among them, and executes it, thereby ensuring synchronization between the main instance and the replica instance.

To sum up, according to the database synchronization method provided by the embodiment of the present application, after the write command is sent to the main instance and the copy instance through the programmable gateway, the main instance sends the digest value corresponding to the write command to the copy instance to instruct the copy instance to execute the corresponding write command, so that synchronization between the main instance and the copy instance is achieved.

In some embodiments, the binlog file also includes a transaction identifier and a command length. Fig. 5 is a flowchart of a database synchronization method according to another exemplary embodiment of the present application. Taking the method being applied to a main instance in a host as an example, as shown in Fig. 5, the method includes:

Step 501, receiving a write command sent by a programmable gateway.

The programmable gateway is used for sending write commands to the database master instance and the replica instance. The programmable gateway is a gateway device for forwarding commands between the client and the database.

In some embodiments, after receiving a data write request sent by a client, the programmable gateway matches the command forwarding rule based on the data write request, determines an instance that needs to receive a write command based on a matching result, and forwards the write command to a main instance and a copy instance that need to receive the write command.

Step 502, determining the command length of the write command.

The command length indicates the length of the write command as expressed. Illustratively, when the write command occupies 1 KB, that is, 1024 bytes, the command length is 1024 bytes. The byte count 1024 is converted into binary and written into the binary log (binlog) file.

It should be noted that, in this embodiment, the command length is expressed uniformly as a number of bytes by way of example. In some embodiments, the command length may also be expressed as a number of bits, a number of kilobytes, and so on; the manner of expressing the command length is not limited in the embodiments of the present application.

Step 503, generating a digest value corresponding to the write command.

Wherein the digest length of the digest value is smaller than the command length of the write command.

Illustratively, in response to the command length being less than the length threshold, digest extraction is performed on the write command to obtain the digest value; in response to the command length reaching (i.e., being greater than or equal to) the length threshold, a command segment is intercepted from the write command, and digest extraction is performed on the command segment to obtain the digest value.

Step 504, determining the transaction identifier corresponding to the write command.

The transaction identification is determined based on a generation order of the database transactions to which the write commands correspond. Database transactions are logical work units in database operation, and generally, one database transaction can perform one operation on a database, such as: add operations, modify operations, delete operations, etc., optionally one database transaction is one or a set of SQL statements.

Starting from MySQL 5.6, the Global Transaction Identifier (GTID) was introduced for master-slave synchronization; the GTID is the transaction identifier referred to in the embodiments of the present application. With the GTID, the uniqueness of a database transaction can be ensured no matter on which node the transaction is generated. The GTID is composed of two parts, source_id and sequence_id, where source_id identifies the environment in which the database node is located. The GTID is generated by the main node, so its uniqueness and monotonic growth can be guaranteed.

After the GTID is generated on the main instance, it is recorded in the binlog file, and the binlog file is synchronized to the server (i.e., the standby machine) corresponding to the replica instance and stored in the relay log file. The replica instance reads the database transaction corresponding to the GTID and determines whether the database transaction has been executed; if it has not, the replica instance executes the database transaction and writes the data into the database engine.
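The GTID handling described above can be sketched as follows. This is a simplified illustration, assuming a GTID is rendered as the text `source_id:sequence_id` and that the replica tracks executed GTIDs in an in-memory set; the class names and the `node-a` identifier are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GtidGenerator:
    """Issues GTIDs of the form source_id:sequence_id on the main instance."""
    source_id: str
    next_sequence: int = 1

    def next_gtid(self) -> str:
        gtid = f"{self.source_id}:{self.next_sequence}"
        self.next_sequence += 1  # sequence only grows, so GTIDs stay unique
        return gtid


@dataclass
class Replica:
    executed: set = field(default_factory=set)
    applied: list = field(default_factory=list)

    def apply(self, gtid: str, statement: str) -> bool:
        """Execute the transaction unless its GTID was already executed."""
        if gtid in self.executed:
            return False
        self.applied.append(statement)  # stands in for writing to the engine
        self.executed.add(gtid)
        return True
```

Replaying the same GTID twice leaves the replica unchanged, which is the property the executed-or-not check relies on.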

In the related art, on the basis of the GTID, synchronization between a main instance and a replica instance is usually achieved by an asynchronous replication mode, a semi-synchronous mode, a multi-thread mode, or a group replication mode. These four modes are briefly introduced below:

1. Asynchronous replication mode

Illustratively, referring to FIG. 6, in the asynchronous replication mode a database transaction is written on the master instance 610 and a reply packet is returned directly to the client, without waiting for the data in the binlog file to be synchronized to the slave instance 620 and the slave instance 630. When the client receives the reply packet, it does not know whether the slave instances have synchronized the database transaction. If the master instance fails at this point and the data has not yet been synchronized to a slave instance, the database transaction may be lost.

2. Semi-synchronous mode

Referring to FIG. 7, in the semi-synchronous mode, after the binlog file on the master instance 710 is successfully written, the binlog file needs to be synchronized to the slave instance 720 and the slave instance 730, and a reply is returned to the client only after a reply packet from a slave instance is received, so that the data written by the database transaction is guaranteed to be stored on two machines.

3. Multi-thread mode

Referring to FIG. 8, schematically, in the multi-thread mode, non-conflicting database transactions are transmitted in parallel without waiting for each other. As shown in FIG. 8, transaction 811 and transaction 812 are transmitted using thread 801; transaction 821 and transaction 822 are transmitted using thread 802; transaction 831 and transaction 832 are transmitted using thread 803.

4. Group replication mode

Referring to FIG. 9, in the group replication mode the master instance 910 sends the SQL statement to be executed to multiple slave instances at the same time, instead of sending the binlog file. As shown in FIG. 9, the master instance 910 sends SQL statements to both the slave instance 920 and the slave instance 930 simultaneously.

However, in modes 1 and 2 above, with the GTID in place, the consistency and reliability of data synchronization are ensured, but the data synchronization speed is slow, which may increase the master-slave delay.

In modes 3 and 4 above, the transmission speed between the master and the slave is increased, but the amount of data is unchanged. When the CPU of the master or the slave is busy, additional CPU is consumed, which raises the CPU utilization and slows the response when a complex SQL statement is executed.

It should be noted that the above steps 503 and 504 are two parallel steps: step 503 may be executed first, step 504 may be executed first, or step 503 and step 504 may be executed simultaneously.

Step 505, writing the digest value, the command length and the transaction identifier into the binary log file.

The master instance writes the digest value, the command length, and the transaction identification into a binlog file, which is a binary file that synchronizes between the master instance and the replica instance.
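A minimal sketch of how the three fields might be serialized into one binary record. The source does not specify a binlog layout, so the layout below (16-byte digest, 4-byte command length, 2-byte GTID length, then the GTID text, all big-endian) and the function names are purely illustrative assumptions.

```python
import struct

# Hypothetical layout: 16-byte digest, 4-byte command length,
# 2-byte GTID length, followed by the GTID as UTF-8 text (big-endian).
HEADER = struct.Struct(">16sIH")


def pack_record(digest: bytes, command_length: int, gtid: str) -> bytes:
    """Serialize digest, command length and GTID into one binary record."""
    gtid_bytes = gtid.encode("utf-8")
    return HEADER.pack(digest, command_length, len(gtid_bytes)) + gtid_bytes


def unpack_record(record: bytes):
    """Recover (digest, command_length, gtid) from a packed record."""
    digest, command_length, gtid_len = HEADER.unpack_from(record)
    gtid = record[HEADER.size:HEADER.size + gtid_len].decode("utf-8")
    return digest, command_length, gtid
```

Under this assumed layout, a record for a 1024-byte write command with the hypothetical GTID `node-a:7` occupies 30 bytes, which illustrates why sending the record is cheaper than sending the command itself.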

In some embodiments, when the main instance executes the write operation corresponding to the write command, the write operation further corresponds to a self-increment identifier, which indicates the operation order of the write operation. In some embodiments, the write operation itself includes at least two operation steps. For example, the write operation is implemented as a data addition operation, where the data to be added is one complete record containing 3 pieces of sub-data: number data 003, name data "small a", and attribute data "woman". The 3 pieces of sub-data are added in the order of number data, name data, and attribute data, so adding the number data 003 corresponds to self-increment identifier 1, adding the name data "small a" corresponds to self-increment identifier 2, and adding the attribute data "woman" corresponds to self-increment identifier 3. The self-increment identifier is described here as increasing by one each time; it may also increase in other user-defined ways, such as increasing by two each time or by any other value each time, which is not limited in the embodiments of the present application.

The correspondence between the write operation and the self-increment identifier is written into the binary log file. In some embodiments, the correspondence between the sub-operations of the write operation and the self-increment identifiers is written into the binlog file.
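The self-increment identifiers can be illustrated with a short sketch. The helper names are hypothetical, and the `step` parameter models the user-defined increment mentioned above; how a replica actually replays sub-operations is an assumption here.

```python
from itertools import count


def tag_sub_operations(sub_operations, start=1, step=1):
    """Pair each sub-operation with a self-increment identifier."""
    counter = count(start, step)
    return [(next(counter), op) for op in sub_operations]


def replay_in_order(tagged):
    """Return sub-operations sorted by identifier, as a replica might replay them."""
    return [op for _, op in sorted(tagged, key=lambda pair: pair[0])]
```

Even if the tagged pairs arrive out of order, sorting by identifier restores the order used on the main instance.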

Step 506, synchronizing the binary log file to the replica instance.

In some embodiments, after the digest value is successfully matched with the write command, the command length of the write command needs to be compared with the command length in the binlog file, and when the comparison result indicates that the command lengths are consistent, the write command corresponding to the digest value is executed.
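The sketch below shows one situation in which the length comparison is useful. The rationale is an illustrative assumption, not something the source states: when the digest is taken over a head-plus-tail segment, two long commands that share the same first and last 256 bytes produce the same digest, and the recorded command length can tell them apart. The function names are hypothetical.

```python
import hashlib


def fragment_digest(command: bytes, threshold: int = 1024, seg: int = 256) -> bytes:
    """MD5 of the whole command, or of its head+tail segment when it is long."""
    data = command if len(command) < threshold else command[:seg] + command[-seg:]
    return hashlib.md5(data).digest()


def find_command(pending, digest: bytes, command_length: int):
    """Return the first cached command whose digest and length both match."""
    for command in pending:
        if fragment_digest(command) == digest and len(command) == command_length:
            return command
    return None
```

A length check cannot distinguish two commands of equal length that differ only in the middle, so this sketch narrows the ambiguity rather than removing it.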

In some embodiments, when receiving the write command sent by the programmable gateway, the replica instance writes the write command into the relay log (relaylog) file; when receiving the binlog file sent by the main instance, it matches the digest value in the binlog file against the write commands in the relaylog file and executes the write command that matches the digest value.

The binlog file also includes a transaction identifier. When the replica instance writes the write command into the relaylog file, an identification bit is set as a control; the identification bit controls whether the write command is executed. When the digest value in the binlog file matches a write command in the relaylog file, the transaction identifier corresponding to the digest value is written into the identification bit corresponding to that write command in the relaylog file, so that the write command is executed based on the identification bit.

To sum up, according to the database synchronization method provided in the embodiment of the present application, after the write command is sent to the main instance and the copy instance through the programmable gateway, the main instance sends the digest value corresponding to the write command to the copy instance to instruct the copy instance to execute the corresponding write command, so as to implement synchronization between the main instance and the copy instance.

According to the method provided by this embodiment, the GTID is filled into the binlog file, so that after the digest value is successfully matched with the write command, the GTID is used to control the execution of the write command in the replica instance, which avoids the low synchronization accuracy caused by disorder in the execution order of multiple write commands.

According to the method provided by this embodiment, the self-increment identifier is filled into the binlog file, so that after the digest value is successfully matched with the write command, the replica instance executes the write operations in the write command sequentially based on the self-increment identifier. This avoids the low synchronization accuracy that arises when a write command includes multiple write operations and their execution order on the replica instance is inconsistent with that on the main instance.

With reference to the above description, the database synchronization method on the replica instance side is described below. Fig. 10 is a flowchart of a database synchronization method according to another exemplary embodiment of the present application. Taking the method being applied to a replica instance in a standby machine as an example, as shown in Fig. 10, the method includes:

Step 1001, receiving a write command sent by the programmable gateway.

Step 1002, writing the write command to a replay file.

The replay file is used to control the execution of the write command. That is, when receiving the write command, the replica instance does not execute it but stores it in the replay file, and executes it only after receiving the indication in the binlog file from the main instance.

In some embodiments, the write command is written into a command fill bit of the replay file. An identification bit in the replay file is set to a null value; the identification bit controls whether the write command is executed. When the identification bit holds the null value, the write command waits in a loop and is not executed for the time being; when the identification bit holds a GTID value, the write command is executed.

Step 1003, receiving a binary log file synchronized by the database main instance, wherein the binary log file comprises a digest value.

The digest value is generated by the database main instance based on the received write command, and the digest length of the digest value is smaller than the command length of the write command.

The replica instance obtains the digest value from the binlog file.

Step 1004, after matching the digest value with the write command in the replay file, executing the write command corresponding to the digest value.

In some embodiments, the replica instance extracts reference digest values of the write commands in the replay file and matches the digest value extracted from the binlog file against the reference digest values; in response to a target digest value among the reference digest values being consistent with the digest value extracted from the binlog file, the replica instance executes the write command corresponding to the target digest value. The reference digest values are extracted in the same manner as the digest value in the binlog file.

Optionally, the binlog file also includes the transaction identifier (GTID) corresponding to the write command. In response to a target digest value among the reference digest values being consistent with the digest value extracted from the binlog file, the GTID is filled into the identification bit; in response to the loop reaching the write command in the replay file and the GTID being present in the identification bit corresponding to the write command, the write command is executed.

It should be noted that if, when the binlog file of the main instance is synchronized to the replica instance, no record with a null GTID and a consistent MD5 value can be found in the relaylog file, the subsequent logic is executed according to the master-slave semi-synchronous or asynchronous configuration.
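The replica-side flow of steps 1001 to 1004 can be sketched as follows. This is a simplified model under stated assumptions: the identification bit is modeled as an optional GTID field (`None` standing for the null value), the digest is a plain MD5 of the whole command, and a `False` return stands for the fallback case in which no matching record is found. All class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RelayEntry:
    command: bytes
    gtid: Optional[str] = None  # identification bit; None is the null value
    executed: bool = False


class ReplicaRelayLog:
    def __init__(self):
        self.entries: List[RelayEntry] = []
        self.executed_commands: List[bytes] = []

    def on_write_command(self, command: bytes) -> None:
        # Cache the command; the null identification bit blocks execution.
        self.entries.append(RelayEntry(command))

    def on_binlog_digest(self, digest: bytes, gtid: str) -> bool:
        # Fill the GTID into the first null-flag entry whose digest matches.
        for entry in self.entries:
            if entry.gtid is None and hashlib.md5(entry.command).digest() == digest:
                entry.gtid = gtid
                return True
        return False  # no match: caller falls back to semi-sync or async handling

    def run_once(self) -> None:
        # One pass of the loop: execute entries whose flag holds a GTID.
        for entry in self.entries:
            if entry.gtid is not None and not entry.executed:
                self.executed_commands.append(entry.command)
                entry.executed = True
```

Separating the fill step from the execution loop mirrors the description: a cached command stays idle across any number of loop passes until the binlog supplies its GTID.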

In view of the above, the database synchronization method provided in the embodiments of the present application is described below from a system perspective. Referring schematically to Fig. 11, which shows a flowchart of a database synchronization method provided in an exemplary embodiment of the present application, the method includes:

At step 1101, the programmable gateway receives a data write request sent by a client.

When the client has a data writing requirement, it sends a data write request to the programmable gateway to request that data content be written into the database.

At step 1102, the programmable gateway determines a primary instance and a replica instance to receive a write command based on the data write request.

Illustratively, as shown in Fig. 12, the databases are distributed in different racks and include different types of instances, such as a master instance 1210, a slave instance 1220, and a read-only instance 1230. Meanwhile, other service types that do not belong to the database may exist in the same rack, such as virtual machines, object stores, and the like.

In the P4 layer of the programmable gateway, a packet corresponding to the database needs to be identified, and data replication and forwarding are not performed. As an illustrative example, referring to Fig. 13, control information is exchanged between the control layer 1310 and the data layer 1320, and the matching of data write requests takes place in the processing of the data layer 1320: the data write request is analyzed through a match stage and an action stage, and finally the write command corresponding to the data write request is sent to the main instance and the replica instance obtained through matching. Specifically, in the protocol matching layer, the TCP protocol is identified first, a field specified by the database is identified in the TCP body, the operation type accessed by the data write request is determined, and the forwarding of the write command is completed. As shown in Fig. 14, the data write request includes a type field 1410, which indicates the data operation type corresponding to the current data write request, such as read, write, delete, etc.

The programmable gateway determines the list of Internet Protocol (IP) addresses of the database machines and the database communication port when the database machines are put on the rack. Configuration information for the IP address set and the database communication port, together with the matching and forwarding rules, is written into the P4 program. When a client request is identified as matching the IP and port of a certain instance, the command type of the data write request is distinguished in combination with the database kernel. Illustratively, a MySQL message is divided into a message header and a message body. As shown in Fig. 15, the MySQL message 1500 includes a message header 1510 and a message body 1520, where the message header 1510 is composed of a message length 1511 and a sequence number 1512, and the message body 1520 includes the message data.

The message length 1511 is used to mark the actual data length value of the current data write request message, and is in bytes. Sequence number 1512 is used to ensure that the message sequence is correct during a complete request/response interaction, and the sequence number value starts from 0 each time the client initiates a request. The message body 1520 stores the content of the data write request and the data of the response, and the length of the message body 1520 is determined by the message length 1511 in the message header 1510.
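The header layout above can be illustrated with a small parser. The field widths used here (a 3-byte little-endian length followed by a 1-byte sequence number) come from the MySQL client/server protocol rather than from this text, and the function name is hypothetical.

```python
def parse_mysql_packet(packet: bytes):
    """Split a MySQL packet into (payload_length, sequence_number, payload)."""
    if len(packet) < 4:
        raise ValueError("packet shorter than the 4-byte header")
    # Message length: 3 bytes, little-endian, counts body bytes only.
    payload_length = int.from_bytes(packet[:3], "little")
    # Sequence number: 1 byte, starts from 0 for each new client request.
    sequence_number = packet[3]
    payload = packet[4:4 + payload_length]
    if len(payload) != payload_length:
        raise ValueError("truncated packet body")
    return payload_length, sequence_number, payload
```

A gateway that inspects the first bytes of the body needs only these four header bytes to know where the body starts and ends.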

The programmable gateway first needs to identify whether the received request is a login authentication command, since all commands can be executed only after authorization is obtained. If a login command is identified, it is not forwarded to the replica instance and is handed to the host where the main instance is located for processing.

In some embodiments, in the database login stage, after login on the main instance succeeds, the main instance needs to broadcast the session information of the successful authentication to all replica instances, to ensure that authentication succeeds when a replica instance subsequently receives a read request.
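The forwarding decision for login and write requests can be sketched as a simple rule. This is an illustration in Python rather than a P4 program; the request types, the function name, and the handling of read requests (sent to the main instance here) are assumptions not specified in the text.

```python
from enum import Enum


class RequestType(Enum):
    LOGIN = "login"
    READ = "read"
    WRITE = "write"


def forwarding_targets(request_type: RequestType, main: str, replicas: list) -> list:
    """Return the instances a request is forwarded to."""
    if request_type is RequestType.LOGIN:
        # Login authentication is handled by the main instance only.
        return [main]
    if request_type is RequestType.WRITE:
        # Write commands are duplicated to the main instance and every replica.
        return [main] + list(replicas)
    # Read routing is not specified in the text; this default is an assumption.
    return [main]
```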

At step 1103, the programmable gateway sends a write command to the main instance.

At step 1104, the programmable gateway sends a write command to the replica instance.

In step 1105, the copy instance writes the write command to the replay file.

At step 1106, the master instance generates a digest value corresponding to the write command.

In step 1107, the primary instance writes the digest value into the binary log file corresponding to the write command.

In some embodiments, the binlog file includes, in addition to the digest value corresponding to the write command, a transaction identifier, a command length, and the like, which is not limited in this application.

At step 1108, the master instance synchronizes the binary log file to the replica instance.

In step 1109, the replica instance executes the write command corresponding to the digest value after matching the digest value with the write command in the replay file.
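Steps 1103 to 1109 can be tied together in a small in-memory simulation. It is a sketch under simplifying assumptions: one replica, a plain MD5 over the whole command, and direct method calls in place of network transfer. The class and function names are hypothetical.

```python
import hashlib


class MainInstance:
    def __init__(self):
        self.binlog = []  # holds digests only, never full commands
        self.data = []

    def receive(self, command: bytes) -> bytes:
        self.data.append(command)               # execute the write
        digest = hashlib.md5(command).digest()  # step 1106
        self.binlog.append(digest)              # step 1107
        return digest


class ReplicaInstance:
    def __init__(self):
        self.replay = []  # cached, not yet executed (step 1105)
        self.data = []

    def receive(self, command: bytes) -> None:
        self.replay.append(command)

    def sync(self, digest: bytes) -> None:
        # Step 1109: execute the cached command whose digest matches.
        for command in self.replay:
            if hashlib.md5(command).digest() == digest:
                self.replay.remove(command)
                self.data.append(command)
                return


def gateway_write(command: bytes, main: MainInstance, replica: ReplicaInstance):
    digest = main.receive(command)  # steps 1103, 1106, 1107
    replica.receive(command)        # steps 1104, 1105
    replica.sync(digest)            # steps 1108, 1109
    return len(digest), len(command)
```

The returned pair compares what the main instance sends to the replica (16 bytes) with the size of the command the gateway delivered to both.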

Fig. 16 is a block diagram of a database synchronization apparatus according to an exemplary embodiment of the present application, where as shown in fig. 16, the apparatus includes:

a receiving module 1610, configured to receive a write command sent by a programmable gateway, where the programmable gateway is configured to send the write command to the database master instance and the replica instance;

a generating module 1620, configured to generate a digest value corresponding to the write command, where a digest length of the digest value is smaller than a command length of the write command;

a writing module 1630, configured to write the digest value into a binary log file corresponding to the write command;

a synchronization module 1640, configured to synchronize the binary log file to the copy instance, where the copy instance is configured to execute the write command corresponding to the digest value after matching the digest value with the received write command.

In an alternative embodiment, the generating module 1620 is further configured to determine the command length of the write command;

the generating module 1620 is further configured to perform digest extraction on the write command to obtain the digest value in response to the command length being smaller than a length threshold;

the generating module 1620, further configured to intercept a command fragment from the write command in response to the command length reaching the length threshold; and carrying out abstract extraction on the command segment to obtain the abstract value.

In an optional embodiment, the generating module 1620 is further configured to intercept a first command fragment of a first preset length from a head position of the write command backward; intercepting a second command segment of a second preset length from the tail position of the write command; and combining the first command segment and the second command segment to obtain the command segment of the write command.

In an optional embodiment, the generating module 1620 is further configured to add the first command fragment and the second command fragment to obtain the command fragment of the write command;

or,

the generating module 1620 is further configured to splice the first command segment and the second command segment to obtain the command segment of the write command.

In an alternative embodiment, the write module 1630 is further configured to determine the command length of the write command; determining a transaction identifier corresponding to the write command, wherein the transaction identifier is determined based on a generation order of database transactions corresponding to the write command; and writing the abstract value, the command length and the transaction identifier into the binary log file.

In an optional embodiment, the writing module 1630 is further configured to execute a write operation corresponding to the write command; and in response to the existence of a self-increment identification in the execution process of the write operation, writing the corresponding relation between the write operation and the self-increment identification into the binary log file, wherein the self-increment identification is used for representing the operation sequence of the write operation.

Fig. 17 is a block diagram of a database synchronization apparatus according to an exemplary embodiment of the present application, where as shown in fig. 17, the apparatus includes:

a receiving module 1710, configured to receive a write command sent by a programmable gateway, where the programmable gateway is configured to send the write command to a database master instance and the copy instance;

a write module 1720, configured to write the write command to a replay file, where the replay file is used for controlling execution of the write command;

the receiving module 1710 is further configured to receive a binary log file synchronized with the database master instance, where the binary log file includes a digest value, the digest value is a digest generated by the database master instance based on a received write command, and a digest length of the digest value is smaller than a command length of the write command;

the executing module 1730 is configured to match the digest value with a write command in the replay file, and then execute the write command corresponding to the digest value.

In an alternative embodiment, the execution module 1730 is further configured to extract a reference digest value of a write command in the replay file; match the digest value with the reference digest value; and in response to a target digest value among the reference digest values being consistent with the digest value, execute the write command corresponding to the target digest value.

In an alternative embodiment, the reference digest value is extracted in a manner consistent with the manner in which the digest value is extracted.

In an optional embodiment, the binary log file further includes a transaction identifier corresponding to the write command;

the write module 1720, further configured to write the write command in a command fill bit of the replay file; and to set an identification bit of the replay file to a null value, wherein the identification bit is used for controlling the execution condition of the write command: when the identification bit holds the null value, the write command waits in a loop, and when the identification bit holds a value, the write command is executed.

In an optional embodiment, the write module 1720 is further configured to stuff the transaction identifier to the identification bit in response to the target digest value being consistent with the digest value;

the write module 1720 is further configured to execute the write command in response to a cycle to the write command in the replay file and the transaction identifier being present in the identifier bit corresponding to the write command.

To sum up, according to the synchronization apparatus for a database provided in the embodiment of the present application, after sending a write command to a main instance and a replica instance through a programmable gateway, the main instance sends a digest value corresponding to the write command to the replica instance to instruct the replica instance to execute the corresponding write command, so as to implement synchronization between the main instance and the replica instance.

It should be noted that the database synchronization apparatus provided in the foregoing embodiments is illustrated only by the division into the functional modules described above. In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the database synchronization apparatus embodiments and the database synchronization method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.

Fig. 10 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application, where the computer device may be implemented as the server described above. Specifically, the method comprises the following steps:

The computer device 1800 includes a Central Processing Unit (CPU) 1801, a system Memory 1804 including a Random Access Memory (RAM) 1802 and a Read Only Memory (ROM) 1803, and a system bus 1805 that couples the system Memory 1804 and the Central Processing Unit 1801. The computer device 1800 also includes a mass storage device 1806 for storing an operating system 1813, application programs 1814, and other program modules 1815.

The mass storage device 1806 is connected to the central processing unit 1801 through a mass storage controller (not shown) connected to the system bus 1805. The mass storage device 1806 and its associated computer-readable media provide non-volatile storage for the computer device 1800. That is, the mass storage device 1806 may include a computer-readable medium (not shown) such as a hard disk or Compact disk Read Only Memory (CD-ROM) drive.

Without loss of generality, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 1804 and the mass storage device 1806 described above may be collectively referred to as memory.

According to various embodiments of the application, the computer device 1800 may also operate as a remote computer connected to a network, such as the Internet. That is, the computer device 1800 may be connected to the network 1812 through the network interface unit 1811 that is coupled to the system bus 1805, or the network interface unit 1811 may be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.

The embodiment of the present application further provides a computer device, where the computer device includes a memory and a processor, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded by the processor and implements the database synchronization method described in the foregoing embodiment.

Embodiments of the present application further provide a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the database synchronization method as described in the foregoing embodiments.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the database synchronization method described in any of the above embodiments.

Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A synchronization method for a database, applied to a database master instance, the method comprising:

2. The method of claim 1, wherein generating the digest value corresponding to the write command comprises:

determining the command length of the write command;

in response to the command length being smaller than a length threshold, performing digest extraction on the write command to obtain the digest value;

intercepting a command fragment from the write command in response to the command length reaching the length threshold; and performing digest extraction on the command fragment to obtain the digest value.

3. The method of claim 2, wherein intercepting a command fragment from the write command comprises:

intercepting a first command fragment of a first preset length starting from the head position of the write command;

intercepting a second command fragment of a second preset length from the tail position of the write command;

and combining the first command fragment and the second command fragment to obtain the command fragment of the write command.

4. The method of claim 3, wherein combining the first command fragment and the second command fragment to obtain the command fragment of the write command comprises:

adding the first command fragment and the second command fragment to obtain the command fragment of the write command;

or,

splicing the first command fragment and the second command fragment to obtain the command fragment of the write command.
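
As an illustration only, the digest generation of claims 2 to 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the hash function (SHA-256), the threshold of 1024 bytes, the two preset lengths of 256 bytes, and the names `command_fragment` and `digest_value` are all hypothetical, and the "splicing" variant of claim 4 is used to combine the two fragments.

```python
import hashlib

# Hypothetical parameters; the claims do not fix concrete values.
LENGTH_THRESHOLD = 1024   # commands at or above this length are truncated
HEAD_LEN = 256            # "first preset length"
TAIL_LEN = 256            # "second preset length"


def command_fragment(cmd: bytes, head_len: int = HEAD_LEN,
                     tail_len: int = TAIL_LEN) -> bytes:
    """Splice a head fragment and a tail fragment of the write command."""
    head = cmd[:head_len]                 # from the head position
    tail = cmd[len(cmd) - tail_len:]      # from the tail position
    return head + tail


def digest_value(cmd: bytes, threshold: int = LENGTH_THRESHOLD) -> str:
    """Return a hex digest of the whole command or of its fragment."""
    if len(cmd) < threshold:
        data = cmd                        # short command: digest everything
    else:
        data = command_fragment(cmd)      # long command: digest head + tail
    # SHA-256 is an assumption; any digest function could be substituted.
    return hashlib.sha256(data).hexdigest()
```

In this sketch, two long commands that differ only in the bytes between the head and tail fragments produce the same digest value, because those bytes never reach the hash function.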

5. The method according to any one of claims 1 to 4, wherein the writing the digest value into the binary log file corresponding to the write command includes:

determining the command length of the write command;

determining a transaction identifier corresponding to the write command, wherein the transaction identifier is determined based on a generation order of database transactions corresponding to the write command;

and writing the digest value, the command length and the transaction identifier into the binary log file.
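
A binary log record carrying the three fields of claim 5 might look like the following. This is a hedged sketch: the record layout (32-byte SHA-256 digest, 4-byte unsigned command length, 8-byte unsigned transaction identifier, big-endian) and the names `RECORD`, `encode_record` and `decode_record` are assumptions made for illustration and are not taken from the application or from any real binary log format.

```python
import hashlib
import struct

# Hypothetical fixed layout: digest (32 bytes), command length (uint32),
# transaction identifier (uint64). ">" means big-endian with no padding.
RECORD = struct.Struct(">32sIQ")


def encode_record(cmd: bytes, txn_id: int) -> bytes:
    """Build one record from a write command and its transaction identifier."""
    digest = hashlib.sha256(cmd).digest()   # digest function is an assumption
    return RECORD.pack(digest, len(cmd), txn_id)


def decode_record(buf: bytes):
    """Return (digest, command_length, txn_id) from one encoded record."""
    digest, cmd_len, txn_id = RECORD.unpack(buf)
    return digest, cmd_len, txn_id
```

Under this layout every record occupies 44 bytes regardless of how long the write command is.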

6. The method of claim 5, further comprising, after receiving the write command sent by the programmable gateway:

executing the write operation corresponding to the write command;

and in response to an auto-increment identifier being present during execution of the write operation, writing the correspondence between the write operation and the auto-increment identifier into the binary log file, wherein the auto-increment identifier is used for representing the operation sequence of the write operation.

7. A database synchronization method applied to a replica instance, the method comprising:

8. The method of claim 7, wherein executing the write command corresponding to the digest value after matching the digest value with a write command in the replay file comprises:

extracting a reference digest value of a write command in the replay file;

matching the digest value with the reference digest value;

and in response to a target digest value among the reference digest values being consistent with the digest value, executing the write command corresponding to the target digest value.

9. The method of claim 8, wherein

the reference digest value is extracted in the same manner as the digest value.

10. The method of claim 8, wherein the binary log file further comprises a transaction identifier corresponding to the write command;

the writing the write command into a replay file includes:

writing the write command in a command fill bit of the replay file;

setting an identification bit of the replay file to a null value, wherein the identification bit is used for controlling the execution of the write command: when the identification bit is null, the write command waits in a loop, and when the identification bit holds a value, the write command is executed.

11. The method of claim 10, wherein the executing the write command corresponding to the target digest value in response to the presence of the target digest value that is consistent with the digest value in the reference digest value comprises:

in response to the target digest value being consistent with the digest value, populating the identification bit with the transaction identifier;

and executing the write command in response to the write command being reached while looping over the replay file and the transaction identifier being present in the identification bit corresponding to the write command.
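
The replica-side flow of claims 8 to 11 can be sketched roughly as below. It is an illustrative sketch only: the classes `ReplayEntry` and `ReplayFile`, the `done` flag, the use of SHA-256 for the reference digest value, and the `execute` callback are hypothetical names and choices, and the sketch does not address ordering between transactions or the length-threshold truncation of claim 2.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional


def reference_digest(cmd: bytes) -> str:
    # Must use the same extraction mode as the main instance (claim 9);
    # SHA-256 over the whole command is an assumption of this sketch.
    return hashlib.sha256(cmd).hexdigest()


@dataclass
class ReplayEntry:
    command: bytes                 # "command fill bit"
    txn_id: Optional[int] = None   # "identification bit", null until matched
    done: bool = False             # hypothetical flag: already executed


class ReplayFile:
    def __init__(self) -> None:
        self.entries: List[ReplayEntry] = []

    def add_command(self, command: bytes) -> None:
        """Store a write command received from the gateway, id bit null."""
        self.entries.append(ReplayEntry(command))

    def on_binlog_digest(self, digest: str, txn_id: int) -> bool:
        """Match a digest from the binary log; fill the id bit on a hit."""
        for entry in self.entries:
            if entry.txn_id is None and reference_digest(entry.command) == digest:
                entry.txn_id = txn_id
                return True
        return False

    def run_once(self, execute: Callable[[bytes], None]) -> int:
        """One pass of the loop: execute entries whose id bit is filled."""
        ran = 0
        for entry in self.entries:
            if entry.txn_id is not None and not entry.done:
                execute(entry.command)
                entry.done = True
                ran += 1
        return ran
```

A possible usage: call `add_command` for each write command arriving from the gateway, call `on_binlog_digest` for each digest value read from the synchronized binary log file, and call `run_once` repeatedly; entries whose identification bit is still null are skipped and picked up on a later pass.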

12. An apparatus for synchronizing databases, the apparatus comprising:

13. An apparatus for synchronizing databases, the apparatus comprising:

14. A computer device comprising a processor and a memory, said memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by said processor to implement a method of synchronization of a database according to any one of claims 1 to 11.

15. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of synchronization of a database according to any one of claims 1 to 11.