CN112307118A - Method for guaranteeing data consistency based on log analysis synchronization and synchronization system - Google Patents

Method for guaranteeing data consistency based on log analysis synchronization and synchronization system Download PDF

Info

Publication number
CN112307118A
CN112307118A CN202011065219.1A CN202011065219A CN112307118A CN 112307118 A CN112307118 A CN 112307118A CN 202011065219 A CN202011065219 A CN 202011065219A CN 112307118 A CN112307118 A CN 112307118A
Authority
CN
China
Prior art keywords
transaction
commit
sub
merged
identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011065219.1A
Other languages
Chinese (zh)
Other versions
CN112307118B (en
Inventor
孙峰
付铨
彭青松
刘启春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dameng Database Co Ltd
Original Assignee
Wuhan Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dameng Database Co Ltd filed Critical Wuhan Dameng Database Co Ltd
Priority to CN202011065219.1A priority Critical patent/CN112307118B/en
Publication of CN112307118A publication Critical patent/CN112307118A/en
Application granted granted Critical
Publication of CN112307118B publication Critical patent/CN112307118B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/1734Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a synchronization system for guaranteeing data consistency based on log analysis synchronization, wherein the method comprises the following steps: loading the table identifier and the corresponding log serial number in the submitted transaction table into a filter container; acquiring a transaction identifier to which the DML operation belongs and a table identifier related to the DML operation, and performing classification management on the DML operation by taking the transaction identifier and the table identifier as a combined key so as to divide one transaction into at least one sub-transaction; acquiring a commit transaction identifier and a commit log serial number of a commit operation; acquiring target sub-transactions with the same transaction identifiers and submitted transaction identifiers, and sequentially extracting table identifiers of the target sub-transactions; and filtering the target sub-transaction based on the commit log sequence number, the table identifier of the target sub-transaction and the filtering container so as to ensure the consistency of data synchronization. In the invention, only one row of information is registered in the submitted transaction table by each table, thus greatly reducing the data scale in the submitted transaction table and preventing access hot spots from being generated.

Description

Method for guaranteeing data consistency based on log analysis synchronization and synchronization system
Technical Field
The invention belongs to the technical field of synchronization, and particularly relates to a method and a system for guaranteeing data consistency based on log analysis synchronization.
Background
The real-time synchronization of database data is a technical scheme for improving the availability of an information system and ensuring the continuity of services. Through real-time synchronization of data, the service data of the target database and the source database are kept consistent in real time, and when the source database fails and is interrupted in service, the application system can be quickly switched to the target database, so that the requirement of service continuity is met.
The database data real-time copying technology based on log analysis has the characteristics of small influence on the performance and data mode of a source database, support of a heterogeneous operating system and a database platform, high data copying performance and the like, and is widely applied to the fields of emergency disaster recovery, multi-service centers, heterogeneous resource integration, data migration and the like. The technology captures an online log or an archived log of a source database through a log capture process of a source end, analyzes that INSERT (insertion), UPDATE (UPDATE) and DELETE (deletion) operations of the database are converted into message packets with an internal specific format, sends the message packets to a destination end of a replication system through a TCP/IP (Transmission Control Protocol/Internet Protocol, abbreviated as TCP/IP) network, unpacks the message packets after the destination end receives the message packets, restores transaction information of the source end into corresponding SQL (Structured Query Language, abbreviated as SQL) statements, and performs real-time replication on a target database through a local database interface to achieve database data synchronization.
In order to ensure the data consistency between the source database and the target database, the database data replication technology based on log analysis usually takes the transaction of the source database as the minimum replication unit, and performs real-time data replication strictly according to the transaction sequence of the source database, thereby ensuring the integrity and consistency of the transactions between the target database and the source database, and ensuring that the target database conforms to the transaction logic of the source database. Therefore, in the database data replication technology based on log analysis, the technical key is how to ensure the transaction consistency of a source database and a target database, and particularly after a replication system fails, the target database can be correctly recovered according to the requirements of transaction integrity and consistency.
At present, in order to ensure the integrity and consistency of the replication transactions of the target database and the source database, the following methods are generally adopted: a commit transaction table is created in the target database to record transaction information for which synchronization has been completed. When the replication system fails and recovers, the continuously transmitted to-be-executed transaction needs to use the transaction information recorded in the submitted transaction table to filter out the transaction which is synchronized before the failure, so that repeated execution of the transaction is avoided, and the consistency of the transaction after failure recovery is ensured.
However, the above scheme for registering synchronized transactions based on the commit transaction table needs to register all transaction information that has been synchronized, which may slow down the synchronization performance in a specific application scenario, for example, a synchronization scenario in which there are a large number of small transactions (a small transaction refers to a very small number of operations in a single transaction, and each transaction may have only one or two operations), in which, in each synchronization of one transaction in this scenario, one piece of transaction information is recorded in the transaction commit table, a large number of small transactions insert a large amount of transaction information into the commit transaction table during synchronization, and a large amount of transaction information in the transaction commit table needs to be cleared by way of a checkpoint in the synchronization process, and centralized access to the commit transaction table may cause a hotspot of the database to form a performance bottleneck, thereby affecting the data synchronization performance.
In view of this, overcoming the deficiencies of the prior art products is an urgent problem to be solved in the art.
Disclosure of Invention
In view of the above drawbacks or needs for improvement in the prior art, the present invention provides a method and a system for guaranteeing data consistency based on log parsing synchronization, which aims to organize information in a commit transaction table based on table identifiers and commit log sequence numbers of transactions, and only one row of information is registered in the commit transaction table by each table during data synchronization operation, thereby greatly reducing the data size in the commit transaction table, effectively reducing the number of accesses to the commit transaction table, and preventing access hotspots from being generated.
To achieve the above object, according to an aspect of the present invention, there is provided a method for guaranteeing data consistency based on log parsing synchronization, the method including:
loading a table identifier and a corresponding log serial number in a submitted transaction table into a filtering container, wherein the submitted transaction table takes the table identifier as a main key and takes the log serial number as an additional column;
acquiring a transaction identifier to which a DML operation belongs and a table identifier related to the DML operation, and performing classification management on the DML operation by taking the transaction identifier and the table identifier as a combined key so as to divide one transaction into at least one sub-transaction;
acquiring a commit transaction identifier and a commit log serial number of a commit operation;
in the sub-transactions subjected to classified management, acquiring target sub-transactions with transaction identifications identical to submitted transaction identifications, and sequentially extracting table identifications of the target sub-transactions;
and filtering the target sub-transaction based on the commit log serial number, the table identifier of the target sub-transaction and the filtering container so as to ensure the consistency of data synchronization.
Preferably, a to-be-executed transaction linked list is arranged in the destination data synchronization system, and the to-be-executed transaction linked list is used for storing the to-be-executed and warehoused transactions;
filtering the target sub-transaction based on the commit log sequence number, the table identifier of the target sub-transaction, and the filter container to ensure consistency of data synchronization comprises:
determining whether a table identification of the target sub-transaction exists in the filter container;
if the filtering log sequence number exists in the filtering container, the filtering log sequence number corresponding to the table identifier of the target sub-transaction is obtained;
judging whether the submitted log serial number is larger than the filtering log serial number or not;
if so, adding the target sub-transaction into the to-be-executed transaction linked list;
if not, discarding the target sub-transaction.
Preferably, filtering the target sub-transaction based on the commit log sequence number, the table identifier of the target sub-transaction, and the filter container to ensure consistency of data synchronization further comprises:
if the transaction list does not exist in the filter container, registering the list identification of the target sub-transaction in the submitted transaction list, and setting the corresponding log serial number to be 0;
and loading the table identification of the target sub-transaction and the log sequence number 0 into the filter container.
Preferably, a transaction linked list to be merged is arranged in the destination data synchronization system, and the transaction linked list to be merged is used for storing the transactions to be merged;
the method further comprises the following steps:
taking out a sub-transaction to be merged from the transaction linked list to be executed, and adding the sub-transaction to be merged to the transaction linked list to be merged;
according to the table identifier of the sub-transaction to be merged, taking out the sub-transaction to be merged with the same table identifier from the linked list to be executed to obtain a plurality of sub-transactions to be merged,
adding the sub-transactions to be merged into the transaction linked list to be merged from first to last according to the submission sequence corresponding to the sub-transactions to be merged;
merging all operations except the commit operation;
and after the transaction submission table is updated, submitting the transaction in the transaction linked list to be merged.
Preferably, the submitting the transaction in the transaction linked list to be merged after the updating of the transaction submission table is completed includes:
acquiring a commit log serial number of the last sub-transaction to be merged in the transaction linked list to be merged;
updating the log serial number of the corresponding additional column in the submitted transaction table to be the submitted log serial number of the last sub-transaction to be merged by taking the table identifier of the sub-transaction to be merged as a main key;
a commit operation is performed.
Preferably, the method further comprises:
and when the fault recovery is carried out, loading the filtering container through the updated transaction submission table so as to update the filtering log serial number.
Preferably, the method further comprises:
and after the sub-transaction to be merged is added to the transaction linked list to be merged, deleting the sub-transaction to be merged from the transaction linked list to be executed.
Preferably, before loading the table identifier in the commit transaction table and the corresponding log sequence number into the filter container, the method further comprises:
the target end data synchronization system judges whether a submitted transaction table exists in a target end database;
if the submitted transaction table exists, the step of loading the table identifier and the corresponding log serial number in the submitted transaction table into the filtering container is executed;
if not, the table identifier is taken as a primary key, and the log sequence number is taken as an additional column to create a commit transaction table.
Preferably, after loading the table identifier in the commit transaction table and the corresponding log sequence number into the filter container, the method includes:
receiving a log record from a source end, and analyzing the log record to obtain corresponding operation;
judging the type of operation;
if the operation is a DML operation, executing the step of acquiring the transaction identifier to which the DML operation belongs and the table identifier related to the DML operation, and performing classification management on the DML operation by taking the transaction identifier and the table identifier as a combined key so as to divide one transaction into at least one sub-transaction;
and if the operation is a commit operation, executing the step of acquiring the commit transaction identifier and the commit log serial number of the commit operation.
To achieve the above object, according to another aspect of the present invention, there is provided a synchronization system including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor and programmed to perform the method of securing transaction consistency of the present invention.
Generally, compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention provides a method and a synchronization system for guaranteeing data consistency based on log analysis synchronization, wherein the method comprises the following steps: loading a table identifier and a corresponding log serial number in a submitted transaction table into a filtering container, wherein the submitted transaction table takes the table identifier as a main key and takes the log serial number as an additional column; acquiring a transaction identifier to which a DML operation belongs and a table identifier related to the DML operation, and performing classification management on the DML operation by taking the transaction identifier and the table identifier as a combined key so as to divide one transaction into at least one sub-transaction; acquiring a commit transaction identifier and a commit log serial number of a commit operation; in the sub-transactions subjected to classified management, acquiring target sub-transactions with transaction identifications identical to submitted transaction identifications, and sequentially extracting table identifications of the target sub-transactions; and filtering the target sub-transaction based on the commit log serial number, the table identifier of the target sub-transaction and the filtering container so as to ensure the consistency of data synchronization. In the invention, the information in the submitted transaction table is organized based on the table identification and the serial number of the submitted log of the transaction, and only one line of information is registered in the submitted transaction table by each table in the data synchronous operation process, so that the data scale in the submitted transaction table is greatly reduced, the access times of the submitted transaction table can be effectively reduced, and the access hot spot is prevented from being generated.
Drawings
Fig. 1 is a schematic flowchart of a method for guaranteeing data consistency based on log parsing synchronization according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of step 105 of FIG. 1 according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a synchronization system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
the submitted transaction table is used for registering transaction information, mainly to prevent the inconsistent synchronous data after the data synchronization is recovered from a fault (such as a synchronous program exception, a database service exception, an operating system exception or a hardware fault), for example, when the synchronization service finishes the transaction submission at the destination database, the synchronization service abnormally crashes under the condition that the submission command is successfully sent to the database service but the execution result returned by the database service is not received, since the synchronization service and the database are two independent entities, although the synchronization service is abnormal, the database may still successfully submit the submission command sent by the synchronization service, in this case, after the synchronization service is recovered, it is not possible to directly obtain from the database whether the execution of the commit command sent before the last failure was successful, but it is only possible to determine whether the last commit was successful by querying whether the last executed transaction information is registered in the commit transaction table through the transaction information.
At present, the synchronization service organizes records in a commit transaction table by taking transactions as units, but the method is not suitable for all application scenarios, and an access hot spot for committing the transaction table can be caused under the synchronization scenario in which large-scale small transactions exist.
To solve the foregoing problem, this embodiment provides a method for guaranteeing data consistency based on log parsing synchronization, and referring to fig. 1, the method includes the following steps:
step 101: and loading the table identifier and the corresponding log serial number in the commit transaction table into a filter container, wherein the commit transaction table takes the table identifier as a primary key and takes the log serial number as an additional column.
Wherein, the table identification refers to an ID of a certain table.
And the log sequence number corresponding to each table identifier is used as a filtering log sequence number, and the transactions related to the table identifier are filtered to judge whether the transactions are synchronized before the fault.
And the log serial number is a log serial number corresponding to the submitting operation. In actual use, a transaction includes multiple DML operations (e.g., an insert operation, a delete operation, and an update operation), where the table involved in each operation may be different, each transaction corresponds to a commit operation, and the commit log sequence number corresponding to the commit operation is used to update the log sequence number of the append column.
Under the actual application scene, the target end data synchronization system judges whether a submitted transaction table exists in a target end database; if a commit transaction table exists, executing step 101; if not, the table identifier is taken as a primary key, and the log sequence number is taken as an additional column to create a commit transaction table.
Wherein, the table building sentence is: CREATE TABLE TX (TABLE _ ID INT PRIMARYKEY, LSN NUMBER).
Step 102: the method comprises the steps of obtaining a transaction identifier to which a DML operation belongs and a table identifier related to the DML operation, and carrying out classification management on the DML operation by taking the transaction identifier and the table identifier as a combined key so as to divide one transaction into at least one sub-transaction.
In this embodiment, synchronization systems are deployed in the source database and the destination database, the source data synchronization system reads logs from the source database, and the destination data synchronization system is responsible for applying synchronization operations sent by the source to the destination database.
The method comprises the steps that a target end data synchronization system receives log records from a source end, analyzes the log records to obtain corresponding operations, judges the types of the operations, obtains transaction identifiers to which the DML operations belong and table identifiers related to the DML operations if the operations are the DML operations, and classifies and manages the DML operations by taking the transaction identifiers and the table identifiers as combination keys so as to divide one transaction into at least one sub-transaction.
For example, transaction a (transaction ID1) includes operation 1, operation 2, operation 3, operation 4, and operation 5, where the table referred to by operation 1 and operation 2 is identified as ID1, the table referred to by operation 3 and operation 4 is identified as ID2, and the table referred to by operation 5 is identified as ID3, then the combination keys (ID 1) of operation 1 and operation 2 are the same, operation 1 and operation 2 are divided into one sub-transaction B, the combination keys (ID1, ID2) of operation 3 and operation 4 are the same, operation 3 and operation 4 are divided into another sub-transaction C, the combination key (ID1, ID3) of operation 5, and operation 5 are divided into another sub-transaction D.
And if the operation is a commit operation, acquiring a commit transaction identifier and a commit log sequence number of the commit operation.
Step 103: and acquiring a commit transaction identifier and a commit log sequence number of the commit operation.
Step 104: and in the sub-transactions subjected to classified management, acquiring target sub-transactions with the same transaction identifiers and submitted transaction identifiers, and sequentially extracting the table identifiers of the target sub-transactions.
For example, the commit transaction of the commit operation is identified as ID1, indicating that transaction A was committed at the source. Then, in the sub-transactions subjected to classification management, a target sub-transaction with the same transaction identifier as the submitted transaction identifier ID1 is obtained according to the combination key of the sub-transaction, and then a sub-transaction B, a sub-transaction C and a sub-transaction D are obtained, and the table identifiers of the target sub-transaction are sequentially extracted to obtain table identifiers ID1, ID2 and ID 3.
Step 105: and filtering the target sub-transaction based on the commit log serial number, the table identifier of the target sub-transaction and the filtering container so as to ensure the consistency of data synchronization.
Specifically, when filtering is performed through the filtering container, a filtering log serial number is determined according to the table identifier of the target sub-transaction, the submitted log serial number is compared with the filtering log serial number to determine whether the target sub-transaction is synchronized before the fault, if so, the target sub-transaction is discarded, if not, the corresponding filtering log serial number is updated in the submitted transaction table and the filtering container according to the submitted log serial number of the target sub-transaction, and then, the synchronization operation of the target sub-transaction is performed.
In the invention, the information in the submitted transaction table is organized based on the table identification and the serial number of the submitted log of the transaction, and only one line of information is registered in the submitted transaction table by each table in the data synchronous operation process, thereby greatly reducing the data scale in the submitted transaction table; and when the transaction operation is executed, the operation in each transaction is classified according to the table, one large transaction is divided into at least one small transaction (sub-transaction), the operation of the same table in a plurality of small transactions is combined into one transaction to be executed, and N small transactions are combined into one large transaction to be executed, so that the access times of the submitted transaction table can be effectively reduced, and the access hot spot is prevented from being generated.
In addition, when the operation between the tables is put in storage in parallel, the transaction information of each table is updated to maintain and submit different rows in the transaction table, and the conflict between the put-in threads can be effectively prevented.
In an actual application scenario, in order to improve synchronization efficiency and avoid frequent access of a commit transaction table, in a preferred embodiment, different sub-transactions for the same table are merged and then a synchronization operation is performed, and a specific implementation process is as follows:
the target end data synchronization system is provided with a to-be-executed transaction linked list used for storing the to-be-executed warehousing transactions, and the target end data synchronization system is provided with a to-be-merged transaction linked list used for storing the to-be-merged transactions. In the to-be-executed transaction linked list and the to-be-merged transaction linked list, the transaction order is stored according to the commit order of the transactions (that is, according to the size order of the commit operation LSN).
With reference to fig. 2, in step 105, the method specifically includes the following steps:
step 1051: determining whether a table identification of the target sub-transaction exists in the filter container.
If present in the filtration vessel, perform step 1052; if not, then step 1056 is performed.
Step 1052: and if the filtering log sequence number exists in the filtering container, acquiring a filtering log sequence number corresponding to the table identifier of the target sub-transaction.
Step 1053: and judging whether the submitted log serial number is larger than the filtering log serial number.
If so, go to step 1054; if not, go to step 1055.
Step 1054: and if so, adding the target sub-transaction into the to-be-executed transaction linked list.
In this embodiment, because a combination key of the transaction ID and the table ID is used to manage the transaction, the transaction related to multiple table operations in the source database is broken into multiple small transactions, and when the transaction commit operation is received, all the broken small transactions need to be found and added to the to-be-executed transaction linked list to wait for execution.
Step 1055: if not, discarding the target sub-transaction.
Step 1056: if the target sub-transaction does not exist in the filtering container, registering the table identifier of the target sub-transaction in the submitted transaction table, setting the corresponding log sequence number to be 0, and loading the table identifier of the target sub-transaction and the log sequence number 0 into the filtering container.
In this embodiment, before adding a small transaction to the to-be-executed transaction linked list, it is necessary to compare the commit LSN of the transaction with the LSN in the corresponding filtering container, and if the table ID corresponding to the transaction cannot be found in the container, it indicates that the table is not synchronized, besides the need to perform supplementary registration in the filtering container, it is also necessary to insert a row of information of the table in the commit transaction table, and the value of the LSN column is set to 0. Because the commit LSN in the log is incremented, when the current transaction commit LSN is greater than the filter LSN, the transaction is a new transaction and needs to be executed.
Step 1057: and taking out a sub-transaction to be merged from the transaction linked list to be executed, and adding the sub-transaction to be merged to the transaction linked list to be merged.
Step 1058: and taking out the sub-transactions to be merged with the same table identifier from the linked list to be executed according to the table identifier of the sub-transactions to be merged to obtain a plurality of sub-transactions to be merged.
In this embodiment, after the sub-transaction to be merged is added to the transaction linked list to be merged, the sub-transaction to be merged is deleted from the transaction linked list to be executed.
The execution thread responsible for the transaction warehousing extracts a sub-transaction to be merged from the transaction to be executed linked list and adds the sub-transaction to the transaction to be merged into the transaction to be merged linked list, then selects the transactions with the same table ID from the transaction to be executed linked list according to the table ID of the transaction, adds the transactions into the transaction to be merged, and moves the selected transaction out of the transaction to be executed linked list, so that the action can be continued when the transaction which cannot be merged last time is selected again.
In this embodiment, when selecting the number of merged transactions, the selection may be determined according to the set size value of the merged transactions, and if the number of operations after merging is set to be N, the operands of the merged transactions may be counted when selecting the transactions, and when the operands reach or exceed N, the selection is stopped. When N is set too small, the effect of merging operations is affected, and when N is set too large, the parallelism between threads is affected, and the value of N needs to be determined according to the synchronization environment.
Step 1059: and adding the sub-transactions to be merged into the transaction linked list to be merged from first to last according to the submission sequence corresponding to the sub-transactions to be merged.
Step 1060: all operations except the commit operation are merged.
The execution thread performs the merge of all operations except the commit operation after selecting the merged transaction.
Step 1061: and after the transaction submission table is updated, submitting the transaction in the transaction linked list to be merged.
Specifically, a commit log serial number of the last sub-transaction to be merged in the transaction linked list to be merged is obtained; updating the log serial number of the corresponding additional column in the submitted transaction table to be the submitted log serial number of the last sub-transaction to be merged by taking the table identifier of the sub-transaction to be merged as a main key; and executing the commit operation to complete the synchronization of the corresponding transaction.
The ordering of the transactions in the chain table to be merged is arranged according to the size of the submitted LSN of each transaction, and the submitted LSN of the last transaction is the largest, so that the LSN is used for updating the submitted transaction table, all the transactions less than or equal to the LSN are completely synchronized when the corresponding table is synchronized, and the LSN can be used for filtering the synchronized transactions on the corresponding table when the fault is recovered.
In addition, N transaction information in the chain table to be merged uniformly uses the last transaction information to be registered in the submitted transaction table, so that the access pressure of the submitted transaction table can be effectively reduced, and the record scale in the submitted transaction table is reduced. By adopting the optimization scheme, the scale of the record number of the submitted transaction table is only related to the number of the synchronous tables, so that the data volume of the table is constant after a period of synchronization regardless of the number of the synchronous transactions in the whole synchronization process.
In actual use, when fault recovery is performed, the filtering container is loaded through the updated transaction commit table to update the filtering log serial number, so that transactions are filtered, and the consistency of data synchronization is guaranteed.
In this embodiment, the transactions having a plurality of table operations are scattered according to the table IDs, then the transactions having the same table ID are selected according to the table IDs to be merged, N small transactions are merged into one large transaction to be executed, and after the execution is completed, the transaction information in the commit transaction table is updated and then the commit operation is executed. When the synchronous fault is recovered, each table identifies the synchronized transaction according to the corresponding information in the submitted transaction table, and the consistency of data during recovery is ensured.
The basic steps of the above embodiment can be explained as follows:
firstly, LSN in the log stream of the database has the characteristic of continuous increment, after the transactions are classified according to the table ID, a large transaction is divided into at least one small transaction, then the small transactions passing through a filter container are added into a transaction chain table to be executed, and when synchronization is carried out, N small transactions with the same table ID are synthesized into a large transaction to be executed, so that the transaction table can be registered and the transaction LSN of the last transaction can be used as a boundary to clearly distinguish the synchronized transaction from the unsynchronized transaction.
Secondly, the adoption of an UPDATE (UPDATE) mode to operate the commit transaction table has more performance advantages compared with the existing Insertion (INSERT) mode. Because the number of transactions is ever increasing during synchronization, registering synchronized transactions in an intrusive manner causes the commit transaction table to continually expand, and periodic checkpoints are necessary to empty the commit transaction table to maintain its data size. In addition, frequent insertions and deletions have a certain effect on the IO of the database. By adopting the scheme updating mode, the data scale in the submitted transaction table is stabilized after the operation is stable (the type of the table can be basically constant), the synchronous transaction is submitted only by updating in the fixed rows, the cost is much lower than that of the inserted scheme, and the synchronization performance is indirectly improved.
Example 2:
to facilitate an understanding of the aspects of the foregoing embodiments, the following description is made.
The above scheme is exemplified as follows: both the source and destination databases now have table T1(IDVARCHAR), and the source application has three transactions, table T1, to operate as follows:
TRX1:INSERT INTO T1(ID)VALUES('TRX1_T1_1');
TRX2:INSERT INTO T1(ID)VALUES('TRX2_T1_10');
TRX3:INSERT INTO T1(ID)VALUES('TRX3_T1_30');
TRX1:COMMIT;
TRX2:COMMIT;
TRX3:COMMIT;
the order of the above operations will form the following situation in the numbering table after the log receiving thread receives it:
Figure BDA0002713562070000131
Figure BDA0002713562070000141
the transaction synchronization recovery process is as follows:
the target end synchronization system is started, before starting to receive the log, whether a submitted transaction table exists on a target end database is judged, and the submitted transaction table does not exist when the log is received for the first time, so that the submitted transaction table is created:
CREATE TABLE TX(TABLE_ID INT PRIMARY KEY,LSN NUMBER)
after creation is complete, a filter container, currently an empty container, is constructed from the commit transaction table.
Then, the receiving source side operation is started, and the INSERT operation aiming at the T1 table in three transactions is received in sequence, and the operation is classified and managed by using the transaction ID and the table ID.
Receiving a commit operation of transaction TRX1 (LSN 4), assuming that the ID of T1 table is 1, acquiring a filtered LSN with table ID1 from the container, and since it is an empty container, acquiring none, using table ID as KEY, inserting filtered LSN as 0 into the container, and registering the filtered LSN of the table in the commit transaction table, performing:
INSERT INTO TX(TABLE_ID,LSN)VALUES(1,0);
after the above operation is completed, the transaction of the TRX1 is found and added to the to-be-executed transaction linked list.
Receiving the commit operation (LSN ═ 5) of the transaction TRX2, taking out the filtered LSN (LSN ═ 0) of table T1 from the filter container, comparing the LSNs, finding out the transaction of TRX2 because the commit operation LSN is greater than the filtered LSN, and adding it to the to-be-executed transaction linked list.
Receiving the commit operation (LSN ═ 6) of the transaction TRX3, taking out the filtered LSN (LSN ═ 0) of table T1 from the filter container, comparing the LSNs, finding out the transaction of TRX3 because the commit operation LSN is greater than the filtered LSN, and adding it to the to-be-executed transaction linked list.
When data is put in a warehouse, a merged transaction is selected from a transaction chain table to be executed, two transactions of TRX1 and TRX2 are supposed to be selected for merged execution, after the execution is completed, a commit LSN (LSN is 5) of the last transaction TRX2 is used for updating a filter LSN of a T1 table in a committed transaction table, and the execution is as follows:
UPDATE TX SET LSN=5WHERE TABLE_ID=1;
a commit is performed, assuming that the program failed after this commit is completed.
The next step is to restart the synchronization system to restore data synchronization.
And starting a destination end synchronization system, judging whether a submitted transaction table exists before starting receiving, loading data in the submitted transaction table to construct a filter container, wherein the LSN of the T1 table in the container is 5 at present.
Starting to receive source-side operation, receiving INSERT operation aiming at T1 table in three transactions in turn again, and classifying and managing the operation by using transaction ID and table ID.
Receiving a commit operation (LSN is 4) of the transaction TRX1, acquiring a filtering LSN (LSN is 5) of the table T1 from a filtering container, comparing the LSN, finding out the transaction of the TRX1 as the commit operation LSN is smaller than the filtering LSN, directly releasing the transaction without executing the transaction, and executing the transaction before the failure;
receiving the commit operation (LSN-5) of transaction TRX2, fetching the filtered LSN (LSN-5) of table T1 from the filter container, comparing the LSNs, finding the transaction TRX2 as the commit operation LSN equals the filtered LSN, releasing it directly without execution as it was executed before the failure.
Receiving the commit operation (LSN ═ 6) of the transaction TRX3, taking out the filtered LSN (LSN ═ 5) of table T1 from the filter container, comparing the LSNs, finding out the transaction of TRX3 because the commit operation LSN is greater than the filtered LSN, and adding it to the to-be-executed transaction linked list.
When data is put in a warehouse, a merged transaction is selected from a queue to be executed, only a TRX3 transaction exists in a transaction chain table to be executed, after the execution is completed, a filter LSN of a T1 table in a transaction submitting table is updated by using a submitting LSN (LSN is 6) of a transaction TRX3, and the following steps are executed:
UPDATE TX SET LSN=6WHERE TABLE_ID=1;
the commit is performed.
It can be seen from the foregoing process that, during fault recovery, a source end has a part of repeated operations to be sent to a destination end, where it is necessary to filter synchronized transactions depending on synchronized transaction information registered in a commit transaction table.
Example 3:
referring to fig. 3, fig. 3 is a schematic structural diagram of a synchronization system according to an embodiment of the present invention. The synchronization system of the present embodiment includes one or more processors 31 and a memory 32. In fig. 3, one processor 31 is taken as an example.
The processor 31 and the memory 32 may be connected by a bus or other means, such as the bus connection in fig. 3.
The memory 32, which is a non-volatile computer-readable storage medium based on a method of ensuring transaction consistency, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, the methods of the above-described embodiments, and corresponding program instructions. The processor 31 implements the methods of the foregoing embodiments by executing non-volatile software programs, instructions, and modules stored in the memory 32 to thereby execute various functional applications and data processing.
The memory 32 may include, among other things, high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, and these remote memories may be connected to the processor 31 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that, for the information interaction, execution process and other contents between the modules and units in the apparatus and system, the specific contents may refer to the description in the embodiment of the method of the present invention because the same concept is used as the embodiment of the processing method of the present invention, and are not described herein again.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for guaranteeing data consistency based on log parsing synchronization is characterized by comprising the following steps:
loading a table identifier and a corresponding log serial number in a submitted transaction table into a filtering container, wherein the submitted transaction table takes the table identifier as a main key and takes the log serial number as an additional column;
acquiring a transaction identifier to which a DML operation belongs and a table identifier related to the DML operation, and performing classification management on the DML operation by taking the transaction identifier and the table identifier as a combined key so as to divide one transaction into at least one sub-transaction;
acquiring a commit transaction identifier and a commit log serial number of a commit operation;
in the sub-transactions subjected to classified management, acquiring target sub-transactions with transaction identifications identical to submitted transaction identifications, and sequentially extracting table identifications of the target sub-transactions;
and filtering the target sub-transaction based on the commit log serial number, the table identifier of the target sub-transaction and the filtering container so as to ensure the consistency of data synchronization.
2. The method according to claim 1, wherein a to-be-executed transaction linked list is provided in the destination data synchronization system, and the to-be-executed transaction linked list is used for storing the to-be-executed and warehoused transactions;
filtering the target sub-transaction based on the commit log sequence number, the table identifier of the target sub-transaction, and the filter container to ensure consistency of data synchronization comprises:
determining whether a table identification of the target sub-transaction exists in the filter container;
if the filtering log sequence number exists in the filtering container, the filtering log sequence number corresponding to the table identifier of the target sub-transaction is obtained;
judging whether the submitted log serial number is larger than the filtering log serial number or not;
if so, adding the target sub-transaction into the to-be-executed transaction linked list;
if not, discarding the target sub-transaction.
3. The method of claim 2, wherein filtering the target sub-transaction based on the commit log sequence number, the table identification of the target sub-transaction, and the filter container to ensure consistency of data synchronization further comprises:
if the transaction list does not exist in the filter container, registering the list identification of the target sub-transaction in the submitted transaction list, and setting the corresponding log serial number to be 0;
and loading the table identification of the target sub-transaction and the log sequence number 0 into the filter container.
4. The method according to claim 2, wherein a transaction linked list to be merged is provided in the destination data synchronization system, and the transaction linked list to be merged is used for storing the transactions to be merged;
the method further comprises the following steps:
taking out a sub-transaction to be merged from the transaction linked list to be executed, and adding the sub-transaction to be merged to the transaction linked list to be merged;
according to the table identifier of the sub-transaction to be merged, taking out the sub-transaction to be merged with the same table identifier from the linked list to be executed so as to obtain a plurality of sub-transactions to be merged;
adding the sub-transactions to be merged into the transaction linked list to be merged from first to last according to the submission sequence corresponding to the sub-transactions to be merged;
merging all operations except the commit operation;
and after the transaction submission table is updated, submitting the transaction in the transaction linked list to be merged.
5. The method of claim 4, wherein committing the transaction in the to-be-merged transaction linked list after completing the update of the transaction commit table comprises:
acquiring a commit log serial number of the last sub-transaction to be merged in the transaction linked list to be merged;
updating the log serial number of the corresponding additional column in the submitted transaction table to be the submitted log serial number of the last sub-transaction to be merged by taking the table identifier of the sub-transaction to be merged as a main key;
a commit operation is performed.
6. The method of claim 4, further comprising:
and when the fault recovery is carried out, loading the filtering container through the updated transaction submission table so as to update the filtering log serial number.
7. The method of claim 4, further comprising:
and after the sub-transaction to be merged is added to the transaction linked list to be merged, deleting the sub-transaction to be merged from the transaction linked list to be executed.
8. The method of claim 1, wherein loading the table identification and corresponding log sequence number in the commit transaction table into the filter container comprises:
the target end data synchronization system judges whether a submitted transaction table exists in a target end database;
if the submitted transaction table exists, the step of loading the table identifier and the corresponding log serial number in the submitted transaction table into the filtering container is executed;
if not, the table identifier is taken as a primary key, and the log sequence number is taken as an additional column to create a commit transaction table.
9. The method of claim 1, wherein loading the table identifier and corresponding log sequence number in the commit transaction table into the filter container comprises:
receiving a log record from a source end, and analyzing the log record to obtain corresponding operation;
judging the type of operation;
if the operation is a DML operation, executing the step of acquiring the transaction identifier to which the DML operation belongs and the table identifier related to the DML operation, and performing classification management on the DML operation by taking the transaction identifier and the table identifier as a combined key so as to divide one transaction into at least one sub-transaction;
and if the operation is a commit operation, executing the step of acquiring the commit transaction identifier and the commit log serial number of the commit operation.
10. A synchronization system, characterized in that the synchronization system comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor and programmed to perform the method of securing transactional consistency as claimed in any one of claims 1 to 9.
CN202011065219.1A 2020-09-30 2020-09-30 Method for guaranteeing data consistency based on log analysis synchronization and synchronization system Active CN112307118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011065219.1A CN112307118B (en) 2020-09-30 2020-09-30 Method for guaranteeing data consistency based on log analysis synchronization and synchronization system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011065219.1A CN112307118B (en) 2020-09-30 2020-09-30 Method for guaranteeing data consistency based on log analysis synchronization and synchronization system

Publications (2)

Publication Number Publication Date
CN112307118A true CN112307118A (en) 2021-02-02
CN112307118B CN112307118B (en) 2024-03-22

Family

ID=74488721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011065219.1A Active CN112307118B (en) 2020-09-30 2020-09-30 Method for guaranteeing data consistency based on log analysis synchronization and synchronization system

Country Status (1)

Country Link
CN (1) CN112307118B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5278982A (en) * 1991-12-23 1994-01-11 International Business Machines Corporation Log archive filtering method for transaction-consistent forward recovery from catastrophic media failures
CN1784677A (en) * 2004-03-31 2006-06-07 微软公司 System and method for a consistency check of a database backup
CN101719149A (en) * 2009-12-03 2010-06-02 联动优势科技有限公司 Data synchronization method and device
US20120030172A1 (en) * 2010-07-27 2012-02-02 Oracle International Corporation Mysql database heterogeneous log based replication
CN103345470A (en) * 2013-05-31 2013-10-09 深圳市沃信科技有限公司 Database disaster tolerance method, database disaster tolerance system and server
US20150278281A1 (en) * 2014-03-28 2015-10-01 Futurewei Technologies, Inc. Efficient Methods and Systems for Consistent Read in Record-Based Multi-Version Concurrency Control
CN105574187A (en) * 2015-12-23 2016-05-11 武汉达梦数据库有限公司 Duplication transaction consistency guaranteeing method and system for heterogeneous databases
US20170351585A1 (en) * 2016-06-03 2017-12-07 International Business Machines Corporation Transaction consistency query support for replicated data from recovery log to external data stores
CN109189608A (en) * 2018-08-13 2019-01-11 武汉达梦数据库有限公司 A kind of method guaranteeing duplication transaction consistency and corresponding reproducing unit
CN109189852A (en) * 2018-08-01 2019-01-11 武汉达梦数据库有限公司 A kind of method that data are synchronous and the device synchronous for data
CN109241185A (en) * 2018-08-27 2019-01-18 武汉达梦数据库有限公司 A kind of method and data synchronization unit that data are synchronous
CN110222115A (en) * 2019-04-30 2019-09-10 武汉达梦数据库有限公司 The database synchronization method and equipment loaded based on table initialisation packet
CN110427422A (en) * 2019-05-23 2019-11-08 武汉达梦数据库有限公司 Data consistency verification method, equipment and storage medium when data synchronous abnormality
CN111694893A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Partial rollback analysis method based on log analysis and data synchronization system
CN111694798A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Data synchronization method and data synchronization system based on log analysis
CN111694799A (en) * 2020-04-30 2020-09-22 武汉达梦数据库有限公司 Parallel log analysis method and data synchronization system based on log analysis synchronization

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5278982A (en) * 1991-12-23 1994-01-11 International Business Machines Corporation Log archive filtering method for transaction-consistent forward recovery from catastrophic media failures
CN1784677A (en) * 2004-03-31 2006-06-07 微软公司 System and method for a consistency check of a database backup
CN101719149A (en) * 2009-12-03 2010-06-02 联动优势科技有限公司 Data synchronization method and device
US20120030172A1 (en) * 2010-07-27 2012-02-02 Oracle International Corporation Mysql database heterogeneous log based replication
CN103345470A (en) * 2013-05-31 2013-10-09 深圳市沃信科技有限公司 Database disaster tolerance method, database disaster tolerance system and server
US20150278281A1 (en) * 2014-03-28 2015-10-01 Futurewei Technologies, Inc. Efficient Methods and Systems for Consistent Read in Record-Based Multi-Version Concurrency Control
CN106462586A (en) * 2014-03-28 2017-02-22 华为技术有限公司 Efficient methods and systems for consistent read in record-based multi-version concurrency control
CN105574187A (en) * 2015-12-23 2016-05-11 武汉达梦数据库有限公司 Duplication transaction consistency guaranteeing method and system for heterogeneous databases
US20170351585A1 (en) * 2016-06-03 2017-12-07 International Business Machines Corporation Transaction consistency query support for replicated data from recovery log to external data stores
CN109189852A (en) * 2018-08-01 2019-01-11 武汉达梦数据库有限公司 A kind of method that data are synchronous and the device synchronous for data
CN109189608A (en) * 2018-08-13 2019-01-11 武汉达梦数据库有限公司 A kind of method guaranteeing duplication transaction consistency and corresponding reproducing unit
CN110262929A (en) * 2018-08-13 2019-09-20 武汉达梦数据库有限公司 A kind of method guaranteeing duplication transaction consistency and corresponding reproducing unit
CN109241185A (en) * 2018-08-27 2019-01-18 武汉达梦数据库有限公司 A kind of method and data synchronization unit that data are synchronous
CN110222115A (en) * 2019-04-30 2019-09-10 武汉达梦数据库有限公司 The database synchronization method and equipment loaded based on table initialisation packet
CN110427422A (en) * 2019-05-23 2019-11-08 武汉达梦数据库有限公司 Data consistency verification method, equipment and storage medium when data synchronous abnormality
CN111694893A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Partial rollback analysis method based on log analysis and data synchronization system
CN111694798A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Data synchronization method and data synchronization system based on log analysis
CN111694799A (en) * 2020-04-30 2020-09-22 武汉达梦数据库有限公司 Parallel log analysis method and data synchronization system based on log analysis synchronization

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
余安东 等: "数据副本一致性的算法研究与实现", 《计算机应用研究》, vol. 37, no. 1, pages 63 - 65 *
李荣幸 等: "移动数据库中保持数据一致性的一种方法", 《桂林工学院学报》, no. 01, pages 127 - 130 *
钟容: "基于事务日志的数据同步模型研究", 《科技创新与应用》, no. 10, pages 17 - 18 *
韩承双 等: "信息***事务一致性研究与实现", 《河南科技大学学报(自然科学版)》, no. 03, pages 28 - 31 *

Also Published As

Publication number Publication date
CN112307118B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
EP3968175B1 (en) Data replication method and apparatus, and computer device and storage medium
US11429641B2 (en) Copying data changes to a target database
CN109241185B (en) Data synchronization method and data synchronization device
CN110262929B (en) Method for ensuring consistency of copying affairs and corresponding copying device
CN112559473B (en) Priority-based two-way synchronization method and system
WO2021184679A1 (en) Continuous data protection system and method
CN111241094B (en) Database deleted column synchronization method and device based on log analysis
CN112559626B (en) Synchronous method and synchronous system of DDL operation based on log analysis
CN111221907B (en) Database added column synchronization method and device based on log analysis
WO2022134876A1 (en) Data synchronization method and apparatus, and electronic device and storage medium
CN112035463B (en) Bidirectional synchronization method and synchronization device of heterogeneous database based on log analysis
CN111694800A (en) Method for improving data synchronization performance and data synchronization system
CN111694893B (en) Partial rollback analysis method and data synchronization system based on log analysis
CN111694798B (en) Data synchronization method and data synchronization system based on log analysis
CN114968966A (en) Distributed metadata remote asynchronous replication method, device and equipment
CN111858504B (en) Operation merging execution method based on log analysis synchronization and data synchronization system
CN111930828B (en) Data synchronization method and data synchronization system based on log analysis
CN112800060A (en) Data processing method and device, computer readable storage medium and electronic equipment
CN112307118B (en) Method for guaranteeing data consistency based on log analysis synchronization and synchronization system
CN111858503A (en) Parallel execution method and data synchronization system based on log analysis synchronization
CN115422286A (en) Data synchronization method and device for distributed database
CN112035464B (en) Data synchronization filtering method and synchronization device based on log analysis
CN114138833A (en) Method and system for data synchronization of relational database and cache database
CN111930693B (en) Transaction merging execution method and device based on log analysis synchronization
CN117763052B (en) Data synchronization method and system for charging multi-center memory database

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan dream database Co.,Ltd.

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Sun Feng

Inventor after: Peng Qingsong

Inventor after: Liu Qichun

Inventor before: Sun Feng

Inventor before: Fu Quan

Inventor before: Peng Qingsong

Inventor before: Liu Qichun

GR01 Patent grant
GR01 Patent grant