WO2024051454A1 - Method and apparatus for processing transaction logs - Google Patents

Method and apparatus for processing transaction logs

Info

Publication number
WO2024051454A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
log
identifier
data
fragment
Prior art date
Application number
PCT/CN2023/113247
Other languages
English (en)
French (fr)
Inventor
田伟
刘浩
韩富晟
Original Assignee
北京奥星贝斯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奥星贝斯科技有限公司
Publication of WO2024051454A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • One or more embodiments of this specification relate to the field of computer technology, and in particular, to a method and device for processing transaction logs.
  • A database records each insert, update, delete and other operation in its log. Log-based CDC (change data capture) technology can therefore obtain the complete data change history of the database by parsing the log, thereby achieving data synchronization.
  • The SCN (System Change Number) is allocated at a single point and is guaranteed to increase monotonically, so the historical commit records of transactions can be obtained simply by following the SCN sequence.
  • Each log stream has its own independent LSN (log sequence number) and supports distributed transaction writing.
  • Different log streams may be distributed across multiple machine nodes.
  • The log writing order across different log streams is random, and there is no global ordering among multiple log streams. How to process the transaction logs in the multiple log streams of a distributed transaction, so as to obtain the transaction data of that transaction, is therefore an urgent problem to be solved.
  • One or more embodiments of this specification provide a method and device for processing transaction logs, so that the transaction data of a transaction can be obtained from the multiple log streams of a distributed transaction.
  • Embodiments of this specification provide a method for processing transaction logs, where a transaction is a database operation sequence that accesses and/or operates on data, and all logs corresponding to one transaction are written to at least two log streams in the distributed database.
  • The method includes: obtaining at least two log streams from the distributed database, where each log stream carries a transaction identifier; and, based on the transaction identifier carried in each log stream, determining whether all of the log streams to which the logs of a transaction were written have been obtained and, if so, using all of those log streams for data assembly to obtain the transaction data corresponding to the transaction.
  • The method further includes, for each acquired log stream: obtaining each log corresponding to the same transaction from the current log stream, and aggregating the obtained logs into a transaction fragment corresponding to that transaction, where the transaction fragment carries the identifier of the current log stream and the transaction identifier of the transaction. Accordingly, determining whether all log streams to which the logs of a transaction were written have been obtained includes: making that determination based on the log stream identifier and transaction identifier obtained from each transaction fragment. Accordingly, using all the log streams for data assembly includes: assembling all transaction fragments carrying the same transaction identifier.
  • The method further includes: each transaction identifier corresponds to a log stream list, which contains the identifiers of all log streams to which the logs of the transaction with that transaction identifier were written. Accordingly, determining, based on the log stream identifier and transaction identifier obtained from each transaction fragment, whether all log streams to which the logs of a transaction were written have been obtained includes: for each transaction fragment, obtaining the log stream identifier and transaction identifier from the current transaction fragment, and marking the obtained log stream identifier as arrived in the log stream list corresponding to the obtained transaction identifier; and, for each log stream list, if the identifiers of all log streams in the list are marked as arrived, determining that all log streams to which the logs of the corresponding transaction were written have been obtained.
  • When each transaction fragment is aggregated, the method further includes: sending the aggregated transaction fragment to one of more than one assembly queue. Before determining, based on the log stream identifier and transaction identifier obtained from each transaction fragment, whether all log streams to which the logs of a transaction were written have been obtained, the method further includes: using at least one thread to obtain the transaction fragments from the more than one assembly queue.
  • Each log written to a log stream carries a preparation version number. After obtaining the transaction data corresponding to the transaction, the method further includes: determining the order among the transactions according to the preparation version number carried in each log, and outputting the transaction data of each transaction in that order.
  • Each time the transaction data of a transaction is assembled, the delivery version number of the transaction data is determined from the preparation version numbers carried in the logs corresponding to the transaction, the determined delivery version number is carried in the transaction data, and the transaction data is then sent to the sequencing queue. Outputting the transaction data of each transaction includes: using the delivery version number in each transaction data and the sequencing queue to output the transaction data of each transaction, where the output order conforms to the order of the transactions.
  • The method further includes: each time a transaction fragment is aggregated, determining the version number corresponding to the transaction fragment according to the preparation version numbers carried in the logs used to aggregate it, carrying that version number in the transaction fragment, and sending the transaction fragment to one of more than one assembly queue; and sending a global heartbeat value to each assembly queue every preset period, where the global heartbeat value equals the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within that preset period. Accordingly, using the delivery version number in each transaction data and the sequencing queue to output the transaction data of each transaction includes: obtaining the global heartbeat value from each of the assembly queues; if the same global heartbeat value is obtained from all assembly queues, using it to update the delivery check value corresponding to the sequencing queue; and outputting the transaction data in the sequencing queue whose delivery version number is less than or equal to the delivery check value.
  • The version number carried in a transaction fragment is either one of the preparation version numbers carried in the logs used to aggregate the transaction fragment, or the delivery version number of the transaction to which the transaction fragment belongs.
  • The delivery version number is equal to the maximum of the preparation version numbers carried in the logs corresponding to the transaction.
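  • The version-number rules above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class `SequencingQueue`, its method names, the use of a heap to order entries, and the sample version numbers are all hypothetical. Only the three rules come from the text: the delivery version number is the maximum preparation version number, the global heartbeat value is the minimum fragment version number in a period, and transaction data is output once its delivery version number is less than or equal to the delivery check value.

```python
import heapq
from typing import Iterable, List, Tuple


def delivery_version(prepare_versions: Iterable[int]) -> int:
    """Delivery version number = maximum preparation version number of the transaction's logs."""
    return max(prepare_versions)


def global_heartbeat(fragment_versions: Iterable[int]) -> int:
    """Global heartbeat value = minimum version number among fragments sent in one period."""
    return min(fragment_versions)


class SequencingQueue:
    """Hypothetical sequencing queue; a heap keeps entries ordered by delivery version number."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []
        self.check_value = float("-inf")  # assumed initial delivery check value

    def push(self, tx_id: str, prepare_versions: Iterable[int]) -> None:
        heapq.heappush(self._heap, (delivery_version(prepare_versions), tx_id))

    def on_heartbeats(self, heartbeats: List[int]) -> List[str]:
        # Update the check value only if every assembly queue reported the same heartbeat.
        if heartbeats and len(set(heartbeats)) == 1:
            self.check_value = heartbeats[0]
        released = []
        while self._heap and self._heap[0][0] <= self.check_value:
            released.append(heapq.heappop(self._heap)[1])
        return released


seq = SequencingQueue()
seq.push("tx1", [10, 12, 11])                  # delivery version 12
seq.push("tx3", [20, 21])                      # delivery version 21
hb = global_heartbeat([14, 18])                # 14
first = seq.on_heartbeats([hb, hb, hb])        # same value from all three queues
seq.push("tx2", [15, 16])                      # tx2 is assembled late, delivery version 16
second = seq.on_heartbeats([21, 21, 20])       # values differ, check value unchanged
third = seq.on_heartbeats([21, 21, 21])
```

  • In this sketch tx3 is held back until the check value reaches 21, so the late-arriving tx2 is still output before it.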
  • Embodiments of this specification provide a device for processing transaction logs, where a transaction is a database operation sequence that accesses and/or operates on data, and all logs corresponding to one transaction are written to at least two log streams in the distributed database.
  • The device includes: a log stream acquisition unit configured to acquire at least two log streams from the distributed database, where each log stream carries a transaction identifier; and a transaction assembly unit configured to determine, according to the transaction identifier carried in each log stream, whether all log streams to which the logs of a transaction were written have been obtained and, if so, to use all of those log streams for data assembly to obtain the transaction data corresponding to the transaction.
  • Embodiments of this specification provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed in a computer, the computer is caused to perform the method described above.
  • Embodiments of this specification provide a computing device including a memory and a processor. The memory stores executable code, and when the processor executes the executable code, the method described above is implemented.
  • One or more embodiments of this specification, alone or in combination, have at least the following advantages: 1) After obtaining the log streams from the distributed database, the embodiments determine, according to the transaction identifier carried in each log stream, whether all log streams to which the logs of a transaction were written have been obtained and, if so, use all of those log streams for assembly to obtain the transaction data corresponding to the transaction. The embodiments can thus obtain the transaction data of a transaction from the multiple log streams of a distributed transaction.
  • 2) The embodiments set the global heartbeat value and the delivery check value of the sequencing queue based on the preparation version number carried in each log, and use the relationship between the delivery version number of the transaction data in the sequencing queue and the delivery check value of the sequencing queue to control the output order of the transaction data, thereby ensuring that the transaction data is output in the correct order.
  • Figure 1 shows an exemplary system architecture diagram to which embodiments of this specification can be applied.
  • Figure 2 is a flow chart of a method for processing transaction logs provided by an embodiment of this specification.
  • Figure 3 is a schematic diagram of generating transaction fragments provided by an embodiment of this specification.
  • Figure 4 is an assembly processing flow chart provided by an embodiment of this specification.
  • Figure 5 is an example diagram of assembling transaction data provided by an embodiment of this specification.
  • Figure 6 is a sequencing diagram of transaction data provided by an embodiment of this specification.
  • Figure 7 is a schematic diagram of generating a global heartbeat value provided by an embodiment of this specification.
  • Figure 8 is a structural diagram of a device for processing transaction logs provided by an embodiment of this specification.
  • Figure 1 illustrates an exemplary system architecture in which embodiments of the present specification may be applied.
  • The system mainly includes a distributed database and a data synchronization device.
  • A write-ahead logging (WAL) mechanism is usually used, that is, the log of a transaction is first persisted into a log stream.
  • Distributed transactions support multi-machine writing and can persist their logs to the corresponding log streams. The same log stream can be synchronized across different machine nodes.
  • A log stream (LogStream) is the basic unit of log reading and writing and records the logs of database transactions.
  • A transaction is a sequence of database operations that access and/or manipulate data.
  • It is a unit of program execution that accesses and possibly updates data items in a database, and consists of all operations performed between the beginning and the end of the transaction. These operations must all complete successfully; otherwise, all changes made by the operations are undone.
  • For example, a transfer transaction can consist of increasing the balance of one account and decreasing the balance of another account.
  • All logs corresponding to a transaction are usually written to at least two log streams in the distributed database.
  • The data synchronization device can pull log streams from the distributed database, process the transaction logs using the procedure provided in the embodiments of this specification, and assemble the transaction data of each transaction. Data synchronization is then performed based on the dependencies or ordering among the transaction data.
  • The distributed database can use multiple machine nodes as instances running the distributed database software.
  • The data synchronization device and the distributed database can interact over a network, which can include various connection types, such as wired or wireless communication links or fiber optic cables.
  • The data synchronization device can be a single server, a server group composed of multiple servers, or a cloud server.
  • A cloud server, also known as a cloud computing server or cloud host, is a host product in the cloud computing service system that addresses the drawbacks of difficult management and weak service scalability in traditional physical host and virtual private server (VPS, Virtual Private Server) services. The data synchronization device can also be a computer terminal with strong computing power.
  • Figure 2 is a flow chart of a method for processing transaction logs provided by an embodiment of this specification. It can be understood that this method can be performed by the data synchronization device in the system shown in Figure 1.
  • The method includes: Step 202: Obtain more than one log stream from the distributed database, where each log stream carries a transaction identifier.
  • Step 204: Based on the transaction identifier carried in each log stream, determine whether all log streams to which the logs of a transaction were written have been obtained. If so, use all of those log streams for data assembly to obtain the transaction data corresponding to the transaction.
  • Step 202, that is, "obtaining more than one log stream from the distributed database", is described in detail below with reference to an embodiment.
  • Each log stream may have multiple copies.
  • The gray parts represent primary copies, and the non-gray parts represent backup copies.
  • Each copy is distributed on a different machine node.
  • The machine nodes can ensure data consistency among the copies of a log stream through algorithms such as Paxos.
  • The data synchronization device can obtain the log distribution table of the distributed database in advance; the table records how each log stream is distributed.
  • The data synchronization device can obtain each log stream from the database based on the log distribution table, such as the log streams P1 to P8 in Figure 3.
  • For example, each log stream can be obtained from the corresponding machine node through an RPC (Remote Procedure Call) process.
  • By default, a log stream can be obtained from the primary copy; if the primary copy is unavailable, it can be obtained from a backup copy.
  • A log stream can also be obtained from one of the primary and backup copies based on a preset load balancing policy.
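  • The default copy selection described above (prefer the primary copy, fall back to a backup copy) can be sketched as follows. This is only an illustration: the `Replica` type, its fields, the node names and the layout of the log distribution table are assumptions, and no load balancing policy is modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    node: str               # machine node hosting this copy (hypothetical field)
    is_primary: bool
    available: bool = True


def pick_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Prefer an available primary copy; otherwise use any available backup copy."""
    for replica in replicas:
        if replica.is_primary and replica.available:
            return replica
    for replica in replicas:
        if not replica.is_primary and replica.available:
            return replica
    return None


# Hypothetical log distribution table: log stream identifier -> copies of that stream.
log_distribution_table: Dict[str, List[Replica]] = {
    "P1": [Replica("node-a", True, available=False), Replica("node-b", False)],
    "P2": [Replica("node-b", True), Replica("node-c", False)],
}

chosen = {
    stream_id: pick_replica(copies).node
    for stream_id, copies in log_distribution_table.items()
}
```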
  • Step 204, that is, "according to the transaction identifier carried in each log stream, determining whether all log streams to which the logs of a transaction were written have been obtained and, if so, using all of those log streams for data assembly to obtain the transaction data corresponding to the transaction", is described in detail below with reference to an embodiment.
  • A transaction usually corresponds to multiple logs, and these logs are usually written to multiple log streams.
  • For example, transaction 1 corresponds to 5 logs, which are written to log stream 1, log stream 2 and log stream 3. A log stream can therefore include one or more logs of the same transaction.
  • A transaction fragment is the aggregation of all logs corresponding to the same transaction in the same log stream.
  • Each transaction fragment has a transaction identifier and a log stream identifier; that is, it carries the transaction identifier of the transaction to which its source logs belong and the identifier of the log stream from which it originates.
  • Step 203 may further be included: obtain all logs corresponding to the same transaction from each log stream, and aggregate all logs of the same transaction obtained from the same log stream into one processing unit corresponding to the transaction, called a transaction fragment.
  • The multiple transaction fragments of the same transaction obtained from multiple log streams can then be assembled, that is, all transaction fragments carrying the same transaction identifier are assembled, and the transaction data of that transaction is output.
  • The processing of step 203 is described with an example.
  • Log stream 1 contains log 1 and log 2 of transaction 1. Log 1 and log 2 of transaction 1 are obtained from log stream 1 and aggregated into one transaction fragment of transaction 1, recorded as transaction fragment 1, which carries the identifier of transaction 1 and the identifier of log stream 1.
  • Log stream 2 contains log 3 of transaction 1. Log 3 of transaction 1 is obtained from log stream 2 and treated as another transaction fragment of transaction 1, recorded as transaction fragment 2, which carries the identifier of transaction 1 and the identifier of log stream 2.
  • Log stream 3 contains log 4 and log 5 of transaction 1. Log 4 and log 5 of transaction 1 are obtained from log stream 3 and aggregated into another transaction fragment of transaction 1, recorded as transaction fragment 3, which carries the identifier of transaction 1 and the identifier of log stream 3.
  • Subsequently, the three transaction fragments are assembled to obtain the transaction data of transaction 1.
  • Aggregating all logs corresponding to the same transaction in the same log stream may include arranging the logs in order, or merging the logs in order.
  • For example, a log stream includes log S1: decrease account a by A yuan; and log S2: increase account a by B yuan.
  • The transaction fragment aggregated from log S1 and log S2 can then include two operations: decreasing account a by A yuan and increasing account a by B yuan.
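  • The aggregation in step 203 can be sketched in Python as grouping logs by transaction identifier and log stream identifier, then arranging each group in order. The types `LogRecord` and `TxFragment`, the `seq` field used for ordering, and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class LogRecord:
    tx_id: str       # transaction identifier carried by the log
    stream_id: str   # identifier of the log stream the log was read from
    seq: int         # position within the log stream (hypothetical ordering field)
    op: str          # the logged operation


@dataclass
class TxFragment:
    tx_id: str
    stream_id: str
    ops: List[str] = field(default_factory=list)


def aggregate_fragments(logs: Iterable[LogRecord]) -> Dict[Tuple[str, str], TxFragment]:
    """Aggregate the logs of one transaction in one log stream into one fragment."""
    grouped: Dict[Tuple[str, str], List[LogRecord]] = defaultdict(list)
    for log in logs:
        grouped[(log.tx_id, log.stream_id)].append(log)
    fragments = {}
    for (tx_id, stream_id), group in grouped.items():
        group.sort(key=lambda record: record.seq)  # arrange the logs in order
        fragments[(tx_id, stream_id)] = TxFragment(tx_id, stream_id, [r.op for r in group])
    return fragments


# Transaction 1 with five logs spread over three log streams, as in the example above.
logs = [
    LogRecord("tx1", "P1", 2, "log 2"),
    LogRecord("tx1", "P1", 1, "log 1"),
    LogRecord("tx1", "P2", 1, "log 3"),
    LogRecord("tx1", "P3", 1, "log 4"),
    LogRecord("tx1", "P3", 2, "log 5"),
]
fragments = aggregate_fragments(logs)
```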
  • Each log stream carries a transaction identifier and a log stream identifier.
  • The identifier of a log stream may be, for example, the LSN (log sequence number) of the log stream.
  • The log data of a transaction may be divided among multiple log streams and written to different machine nodes. Therefore, to distinguish and identify the transaction to which each log stream belongs, the transaction identifier is carried in the log stream.
  • The multiple log streams of a transaction are distinguished by their different log stream identifiers.
  • For example, assume that three distributed transactions occur in the distributed database, denoted tx1, tx2 and tx3.
  • The log data of transaction tx1 is written to log streams P1 to P3, which carry the transaction identifier tx1 and their respective log stream identifiers P1 to P3.
  • The log data of transaction tx2 is written to log streams P4 to P6, which carry the transaction identifier tx2 and their respective log stream identifiers P4 to P6.
  • The log data of transaction tx3 is written to log streams P7 to P8, which carry the transaction identifier tx3 and their respective log stream identifiers P7 to P8.
  • When determining whether all log streams to which the logs of a transaction were written have been obtained, step 2041 is specifically executed: based on the log stream identifier and transaction identifier obtained from each transaction fragment, determine whether all log streams to which the logs of a transaction were written have been obtained.
  • Step 202 is usually executed using multiple threads, that is, multiple threads each obtain log streams from the distributed database. As a result, the log streams are acquired at different times: some log streams arrive first (are obtained earlier) and some arrive later (are obtained later). A log stream list can therefore be used to implement step 2041.
  • A log stream list is recorded for each transaction identifier. The log stream list contains the identifiers of all log streams to which the logs of the transaction with that transaction identifier were written.
  • The data synchronization device can obtain the log stream list corresponding to each transaction identifier from the distributed database in advance. For example, if all logs of transaction tx1 are written to three log streams, the log stream list corresponding to the transaction identifier tx1 includes the identifiers of the three log streams: P1, P2 and P3.
  • The distributed database can also carry, in each log stream, the log stream list corresponding to the transaction identifier of the transaction to which the log stream belongs. For example, all logs of transaction tx1 are written to three log streams, P1, P2 and P3. The log streams P1, P2 and P3 then each contain the log stream list corresponding to tx1, which contains P1, P2 and P3 and indicates which log streams tx1 corresponds to.
  • After the obtained logs are aggregated into a transaction fragment corresponding to the transaction, each transaction fragment also carries the log stream list of the log stream from which it comes.
  • Step 2041 may include: Step 2041A: for each transaction fragment, obtain the log stream identifier and transaction identifier from the current transaction fragment and, in the log stream list corresponding to the obtained transaction identifier, mark the obtained log stream identifier as arrived. Step 2041B: for each log stream list, if the identifiers of all log streams in the current log stream list are marked as arrived, determine that all log streams to which the logs of a transaction were written have been obtained, where the transaction identifier of that transaction corresponds to the current log stream list.
  • For example, in step 2041A, for transaction fragment 1, the log stream identifier P1 and the transaction identifier tx1 are obtained from transaction fragment 1. P1 is then marked as arrived in the log stream list corresponding to tx1 (the list includes P1, P2 and P3).
  • When each transaction fragment is aggregated, the method further includes: sending the aggregated transaction fragment to one of more than one assembly queue. Before determining, based on the log stream identifier and transaction identifier obtained from each transaction fragment, whether all log streams to which the logs of a transaction were written have been obtained, the method further includes: using at least one thread to obtain each transaction fragment from the more than one assembly queue.
  • Embodiments of this specification provide another implementable method, namely using multiple assembly queues.
  • Each transaction fragment obtained in the above step is stored in one of the multiple assembly queues.
  • The transaction fragments obtained in the above steps can be sent to the multiple assembly queues in a round-robin manner.
  • For example, transaction fragments tx1_P1 (meaning that the transaction identifier and log stream identifier of the fragment are tx1 and P1 respectively; similar notation is used below), tx1_P2 and tx1_P3 are located in assembly queues 1 to 3 respectively; tx2_P4, tx2_P5 and tx2_P6 are located in queue 3, queue 2 and queue 2 respectively; and tx3_P7 and tx3_P8 are located in queue 3 and queue 1 respectively.
  • The assembly queue is a first-in-first-out queue, and each assembly queue can be bound to an assembly thread that performs the subsequent assembly of transaction fragments.
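  • Round-robin distribution to first-in-first-out assembly queues can be sketched with the Python standard library as below. The function name `dispatch_round_robin` and the fragment labels are hypothetical, and the resulting placement depends on arrival order, so it need not match the placement in the example above.

```python
import itertools
import queue
from typing import Iterable, List


def dispatch_round_robin(fragments: Iterable[str], num_queues: int) -> List[queue.Queue]:
    """Send each fragment to the next assembly queue in turn (round-robin)."""
    queues: List[queue.Queue] = [queue.Queue() for _ in range(num_queues)]  # FIFO queues
    targets = itertools.cycle(queues)
    for fragment in fragments:
        next(targets).put(fragment)
    return queues


# Fragments in the (assumed) order in which they finish aggregation.
fragment_names = ["tx1_P1", "tx1_P2", "tx1_P3", "tx2_P4", "tx2_P5"]
assembly_queues = dispatch_round_robin(fragment_names, 3)
sizes = [q.qsize() for q in assembly_queues]
head_of_first_queue = assembly_queues[0].get_nowait()
```

  • In a real deployment each queue would be drained by its own assembly thread; the sketch only shows the distribution step.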
  • The difficulty in processing transaction fragments in step 204 is knowing which transaction fragments belong to the same transaction, and whether all transaction fragments belonging to the same transaction have been obtained so that assembly can begin.
  • A log stream list and a log stream identifier are used to assemble the transaction fragments of the same transaction.
  • Each assembly thread can execute the assembly process shown in Figure 4.
  • The assembly process can include the following steps: Step 402: Get a transaction fragment from the assembly queue.
  • Step 404: Obtain the log stream list and log stream identifier from the current transaction fragment.
  • The log stream list and log stream identifier obtained from the current transaction fragment are called the current log stream list and the current log stream identifier.
  • Step 406: Determine whether the current log stream list carried by the current transaction fragment is already maintained in the transaction manager. If not, perform step 408; if so, perform step 410.
  • Multiple threads are used to process the transaction fragments in the assembly queues in parallel.
  • A transaction manager is set up in the embodiments of this specification, and the transaction manager uniformly maintains the information of the obtained transaction fragments. That is, the "context" of each transaction fragment obtained from an assembly queue is recorded in the transaction manager and can be queried from it.
  • Other methods can also be used to uniformly maintain the log stream list corresponding to each transaction identifier. This embodiment uses the transaction manager only as an example.
  • Step 408: Provide the current log stream list to the transaction manager for maintenance.
  • The transaction manager maintains a log stream list for each transaction.
  • The log stream list of each transaction corresponds to the transaction identifier and includes the identifiers of all log streams corresponding to that transaction identifier.
  • Step 410: Mark the current log stream identifier as arrived in the current log stream list maintained by the transaction manager.
  • Step 412: Determine whether the identifiers of all log streams in the current log stream list are marked as arrived. If so, proceed to step 414; otherwise, return to step 402.
  • Step 414: Among the obtained transaction fragments, assemble those that carry the transaction identifier corresponding to the current log stream list, and obtain the transaction data corresponding to that transaction identifier.
  • To make the flow easier to follow, take Figure 5 as an example. First, the transaction fragment tx1_P1 is obtained from assembly queue 1, and the log stream list of tx1 and the log stream identifier P1 are obtained from that fragment. Since the log stream list of tx1 is not yet stored in the transaction manager, it is provided to the transaction manager for storage, and the identifier P1 is marked as arrived in the log stream list of tx1 maintained by the transaction manager.
  • After tx1_P2 is obtained from assembly queue 2, the log stream list of tx1 and the log stream identifier P2 are obtained from that fragment. Since the log stream list of tx1 is already stored in the transaction manager, the identifier P2 is marked as arrived in the log stream list of tx1 maintained by the transaction manager.
  • The transaction fragment tx2_P4 is obtained from assembly queue 3, and the log stream list of tx2 and the log stream identifier P4 are obtained from that fragment. Since the log stream list of tx2 is not yet stored in the transaction manager, it is provided to the transaction manager for storage, and the identifier P4 is marked as arrived in the log stream list of tx2 maintained by the transaction manager.
  • After tx2_P5 is obtained from assembly queue 2, the log stream list of tx2 and the log stream identifier P5 are obtained from that fragment. Since the log stream list of tx2 is already stored in the transaction manager, the identifier P5 is marked as arrived in the log stream list of tx2 maintained by the transaction manager.
  • After tx1_P3 is obtained from assembly queue 3, the log stream list of tx1 and the log stream identifier P3 are obtained from that fragment. Since the log stream list of tx1 is already stored in the transaction manager, the identifier P3 is marked as arrived in the log stream list of tx1 maintained by the transaction manager.
  • At this point the identifiers of all log streams in the log stream list of tx1 are marked as arrived, so the received transaction fragments carrying tx1 can be assembled to obtain the transaction data corresponding to tx1.
  • The other transaction fragments are processed in the same way, so that the transaction fragments corresponding to each transaction identifier are assembled separately.
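The arrival bookkeeping walked through above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `Fragment`, `TxContext` and `TransactionManager`, the `payload` field, and the use of a single lock are all assumptions introduced here. The step numbers in the comments refer to the assembly flow described in the text.

```python
import threading
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    """A transaction fragment: the logs of one transaction in one log stream."""
    tx_id: str
    stream_id: str
    stream_list: tuple  # identifiers of every log stream the transaction wrote to
    payload: str = ""   # hypothetical stand-in for the aggregated log content


@dataclass
class TxContext:
    expected: frozenset
    arrived: set = field(default_factory=set)
    fragments: list = field(default_factory=list)


class TransactionManager:
    """Tracks, per transaction identifier, which log streams have arrived."""

    def __init__(self):
        self._lock = threading.Lock()  # several assembly threads may call in
        self._contexts = {}

    def on_fragment(self, frag):
        with self._lock:
            # Steps 406/408: store the log stream list the first time it is seen.
            ctx = self._contexts.get(frag.tx_id)
            if ctx is None:
                ctx = TxContext(expected=frozenset(frag.stream_list))
                self._contexts[frag.tx_id] = ctx
            # Step 410: mark this fragment's log stream as arrived.
            ctx.arrived.add(frag.stream_id)
            ctx.fragments.append(frag)
            # Step 412: some log stream is still missing, so keep waiting.
            if not ctx.expected.issubset(ctx.arrived):
                return None
            # Step 414: assemble every fragment that carries this transaction id.
            del self._contexts[frag.tx_id]
            ordered = sorted(ctx.fragments, key=lambda f: f.stream_id)
            return frag.tx_id, [f.payload for f in ordered]


def make(tx_id, stream_id, stream_list):
    return Fragment(tx_id, stream_id, stream_list, payload=f"{tx_id}_{stream_id}")


TX1 = ("P1", "P2", "P3")
TX2 = ("P4", "P5", "P6")
manager = TransactionManager()
arrivals = [
    make("tx1", "P1", TX1), make("tx1", "P2", TX1),
    make("tx2", "P4", TX2), make("tx2", "P5", TX2),
    make("tx1", "P3", TX1),
]
results = [manager.on_fragment(f) for f in arrivals]
```

Replaying the five arrivals from the example, only the last call (tx1_P3) returns assembled data for tx1; tx2 is still waiting for P6.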
  • However, because the assembly of distributed transaction data is executed concurrently, the transaction data of tx1 may be sent to the sequencing queue first, followed by the transaction data of tx3, while the transaction data of tx2, because of concurrent execution or other reasons, has not yet been sent to the sequencing queue. If the transaction data of tx2 is not sent to the sequencing queue for a long time, the transaction data of tx3 may be output right after the transaction data of tx1, and the correctness and completeness of the output order cannot be guaranteed. In some cases, however, the transaction data of different transactions depend on one another, so the assembled transaction data needs to be further sequenced.
  • In one embodiment of this specification, each log written to a log stream carries a preparation version number. After the transaction data corresponding to each transaction is obtained, sequencing can further be performed on the transaction data, as shown in Figure 2.
  • The sequencing may include step 206: determine the order of the transactions according to the preparation version numbers carried in the logs, and output the transaction data of each transaction in that order.
  • In one embodiment of this specification, assembling all transaction fragments carrying the same transaction identifier further includes: each time the transaction data of a transaction is assembled, determine the delivery version number of the transaction data from the preparation version numbers carried in the logs corresponding to that transaction, carry the determined delivery version number in the transaction data, and then send the transaction data to the sequencing queue. Accordingly, outputting the transaction data of each transaction in order in step 206 includes step 2061: output the transaction data of each transaction using the delivery version number in each piece of transaction data and the sequencing queue, where the output order conforms to the order of the transactions.
  • In one embodiment of this specification, the method further includes: each time a transaction fragment is aggregated, determine the version number corresponding to the transaction fragment from the preparation version numbers carried in the logs used to aggregate it; carry that version number in the transaction fragment, and send the transaction fragment to one of the one or more assembly queues; and, once every preset period, send a global heartbeat value to every assembly queue, where the global heartbeat value equals the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within that period. Accordingly, step 2061 includes: obtain global heartbeat values from all assembly queues; if the same global heartbeat value has been obtained from every assembly queue, use it to update the delivery check value corresponding to the sequencing queue; and output the transaction data in the sequencing queue whose delivery version number is less than or equal to the delivery check value.
  • The sequencing queue can be implemented as a min-heap (small top heap); the transaction data inside the sequencing queue is arranged in ascending order of delivery version number (commit version).
  • In the embodiments of this specification, a new parameter, the global heartbeat value, can be introduced to sequence the transaction data. To make the sequencing process easier to follow, several concepts involved are explained first.
  • Preparation version number (prepare version): In distributed databases such as OceanBase, distributed transactions rely on the 2PC (Two-Phase Commit) protocol to ensure atomicity. Each participant in a transaction generates a prepare version during negotiation, so every log carries prepare version information. Accordingly, a transaction fragment generated by aggregating logs in a log stream can also have a version number, which is determined from the prepare versions carried in the logs used to aggregate that fragment. Prepare versions usually increase with the order in which transactions occur.
  • As one implementable manner, the version number corresponding to a transaction fragment may be one of the preparation version numbers carried in the logs used to aggregate that fragment.
  • As another implementable manner, the version number corresponding to a transaction fragment may be the delivery version number of the transaction to which the fragment belongs.
  • Delivery version number (commit version): In a distributed transaction scenario, the participants obtain the commit version of a transaction through negotiation. The commit version is usually the maximum of the prepare versions of the logs of the transaction. Accordingly, in the embodiments of this specification, once the transaction fragments of one transaction are assembled, the commit version of the transaction data can be obtained.
  • Global heartbeat value (GH, Globalheartbeat): a global value sent to every assembly queue once every preset period, generated from the version numbers carried by the transaction fragments obtained. As one implementable manner, the Globalheartbeat value is the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within the current preset period.
  • Delivery check value (commit checkpoint): a parameter maintained for the sequencing queue, used to decide whether transaction data in the sequencing queue can be output. The commit checkpoint is determined from Globalheartbeat: once all assembly threads have obtained a Globalheartbeat, the commit checkpoint is updated to that Globalheartbeat value. The sequencing queue may output all transaction data whose commit version is less than or equal to the commit checkpoint.
  • As one implementable manner, Globalheartbeat can be determined periodically and sent to the assembly queues.
  • As another implementable manner, Globalheartbeat can be determined and sent to the assembly queues after each round of polling the assembly queues to send transaction fragments. Other sending occasions can also be used and are not enumerated here.
  • As mentioned in earlier embodiments, log streams can be obtained by multiple processes, each of which aggregates a log stream into transaction fragments. As shown in Figure 7, each process (dispatch_progress) has a corresponding log stream version number.
  • As one implementable manner, after a dispatch_progress obtains a transaction fragment, it updates its log stream version number according to the version number of that fragment: if the version number of the fragment is less than the log stream version number, the log stream version number is updated to the version number of the fragment; otherwise the log stream version number is left unchanged.
  • When it is time to determine and send Globalheartbeat, for example when the sending period elapses, the minimum of the current log stream version numbers of all dispatch_progress instances is used as Globalheartbeat and sent to every assembly queue.
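The heartbeat generation just described can be sketched as follows. This is a minimal Python sketch under stated assumptions: the class and function names, the numeric version values, the use of `math.inf` as "nothing seen yet", and resetting each progress at the start of a new period are illustrative choices that the source does not specify.

```python
import math
from collections import deque


class DispatchProgress:
    """Per-log-stream progress that remembers the smallest fragment version seen."""

    def __init__(self):
        self.stream_version = math.inf  # no fragment recorded in this period yet

    def on_fragment(self, fragment_version):
        # Update only when the fragment's version is smaller, as the text describes.
        if fragment_version < self.stream_version:
            self.stream_version = fragment_version


def send_global_heartbeat(progresses, assembly_queues):
    """Send min(log stream version numbers) to every assembly queue."""
    versions = [p.stream_version for p in progresses if p.stream_version != math.inf]
    if not versions:
        return None  # nothing was dispatched in this period
    heartbeat = min(versions)
    for queue in assembly_queues:
        queue.append(("GH", heartbeat))
    for p in progresses:
        p.stream_version = math.inf  # assumption: each period starts afresh
    return heartbeat


# Hypothetical versions for the first period: tx1_P1=10, tx1_P2=12, tx2_P4=13.
progresses = {name: DispatchProgress() for name in ("P1", "P2", "P4")}
queues = [deque() for _ in range(3)]
progresses["P1"].on_fragment(10)
progresses["P2"].on_fragment(12)
progresses["P4"].on_fragment(13)
gh1 = send_global_heartbeat(list(progresses.values()), queues)
```

With these invented numbers the heartbeat equals the version of tx1_P1, and every assembly queue receives the same marker, which is what later lets the sequencing side detect that all queues have passed the same point.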
  • Taking Figure 6 as an example, GH1 is the minimum of the version numbers of tx1_P1, tx1_P2 and tx2_P4, that is, the version number of tx1_P1.
  • GH2 is the minimum of the version numbers of tx2_P5 and tx1_P3, that is, the version number of tx1_P3.
  • GH3 is the minimum of the version numbers of tx3_P8, tx2_P6 and tx3_P7, that is, the version number of tx2_P6.
  • When every assembly thread has obtained GH1, the commit checkpoint is set to the value of GH1. At this point the transaction data of tx1 and tx2 has not yet been assembled, and the sequencing queue is empty.
  • When every assembly thread has obtained GH2, the commit checkpoint is set to the value of GH2, that is, the version number of tx1_P3. At this point the transaction data of tx1 has been assembled and placed in the sequencing queue; its commit version is the maximum of all prepare versions in tx1_P1, tx1_P2 and tx1_P3. This means that the current commit checkpoint is smaller than the commit version of tx1, and the sequencing queue outputs nothing.
  • When every assembly thread has obtained GH3, the commit checkpoint is set to the value of GH3, that is, the version number of tx2_P6. At this point the transaction data of tx2 and tx3 has been assembled and placed in the sequencing queue. The commit version of the transaction data of tx2 is the maximum of all prepare versions in tx2_P4, tx2_P5 and tx2_P6, and the commit version of the transaction data of tx3 is the maximum of all prepare versions in tx3_P7 and tx3_P8. The current commit checkpoint is therefore greater than the commit version of tx1 and equal to the commit version of tx2, so the transaction data of tx1 and tx2 is output in turn, while the transaction data of tx3 is not output.
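The interplay of the sequencing queue, the commit version and the commit checkpoint can be sketched with Python's `heapq`. This is a hedged sketch rather than the actual implementation: `SequencingQueue`, its method names, keying heartbeats by value, and all numeric versions are assumptions chosen only so that the replay matches the GH1 to GH3 narrative above.

```python
import heapq


class SequencingQueue:
    """Min-heap of assembled transactions, gated by a commit checkpoint."""

    def __init__(self, num_assembly_queues):
        self._heap = []          # entries: (commit_version, insertion_order, tx_id)
        self._order = 0
        self._num_queues = num_assembly_queues
        self._seen = {}          # heartbeat value -> assembly queues that delivered it
        self.commit_checkpoint = None

    def push(self, tx_id, prepare_versions):
        # The commit version is the maximum prepare version of the transaction's logs.
        commit_version = max(prepare_versions)
        heapq.heappush(self._heap, (commit_version, self._order, tx_id))
        self._order += 1
        return commit_version

    def on_heartbeat(self, queue_id, heartbeat):
        # Simplification: assumes a given heartbeat value is sent only once.
        seen = self._seen.setdefault(heartbeat, set())
        seen.add(queue_id)
        if len(seen) < self._num_queues:
            return []            # not every assembly queue has delivered it yet
        del self._seen[heartbeat]
        self.commit_checkpoint = heartbeat
        released = []
        while self._heap and self._heap[0][0] <= self.commit_checkpoint:
            released.append(heapq.heappop(self._heap)[2])
        return released


sequencer = SequencingQueue(num_assembly_queues=3)
emitted = []


def heartbeat_from_all(value):
    for queue_id in (1, 2, 3):
        emitted.extend(sequencer.on_heartbeat(queue_id, value))


heartbeat_from_all(10)                    # GH1: nothing assembled yet
sequencer.push("tx1", [10, 12, 11])       # commit version 12
heartbeat_from_all(11)                    # GH2 < 12: tx1 is held back
sequencer.push("tx3", [16, 17])           # assembled before tx2, commit version 17
sequencer.push("tx2", [13, 14, 15])       # commit version 15
heartbeat_from_all(15)                    # GH3: releases tx1, then tx2
```

Even though tx3 reaches the queue before tx2 in this replay, the heap ordering plus the checkpoint releases tx1 and tx2 in order and keeps tx3 until a later heartbeat covers its commit version.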
  • Figure 8 shows a structural diagram of a device for processing transaction logs according to an embodiment of this specification. As shown in Figure 8, the device 800 may include a log stream acquisition unit 801 and a transaction assembly unit 804, and may further include a fragment acquisition unit 802, a fragment delivery unit 803, a transaction sequencing unit 805 and a heartbeat generation unit 806.
  • The main functions of the component units are as follows. The log stream acquisition unit 801 is configured to obtain at least two log streams from the distributed database, where each log stream carries a transaction identifier.
  • The transaction assembly unit 804 is configured to determine, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, to perform data assembly using all of those log streams to obtain the transaction data corresponding to the transaction.
  • As one implementable manner, the fragment acquisition unit 802 is configured to perform, for each obtained log stream: obtain from the current log stream the logs corresponding to the same transaction; and aggregate the obtained logs into one transaction fragment corresponding to the transaction, where the transaction fragment carries the identifier of the current log stream and the transaction identifier of the transaction.
  • Accordingly, when determining whether all log streams to which all logs corresponding to a transaction are written have been obtained, the transaction assembly unit 804 can make the determination according to the log stream identifiers and transaction identifiers obtained from the transaction fragments.
  • Accordingly, when performing data assembly using all of those log streams, the transaction assembly unit 804 can assemble all transaction fragments carrying the same transaction identifier.
  • As one implementable manner, each transaction identifier corresponds to one log stream list, which includes the identifiers of all log streams to which all logs of the transaction with that transaction identifier are written.
  • When making the determination according to the log stream identifiers and transaction identifiers obtained from the transaction fragments, the transaction assembly unit 804 can perform, for each transaction fragment: obtain the log stream identifier and the transaction identifier from the current transaction fragment, and mark the obtained log stream identifier as arrived in the log stream list corresponding to the obtained transaction identifier; and, for each log stream list, if the identifiers of all log streams in the current log stream list are marked as arrived, determine that all log streams to which all logs corresponding to a transaction are written have been obtained, where the transaction identifier of that transaction corresponds to the current log stream list.
  • As one implementable manner, the fragment delivery unit 803 is configured to send each transaction fragment aggregated by the fragment acquisition unit 802 to one of the one or more assembly queues.
  • The transaction assembly unit 804 is configured to use at least one thread to obtain the transaction fragments from the one or more assembly queues.
  • As one implementable manner, each log written to a log stream carries a preparation version number.
  • The transaction sequencing unit 805 is configured to determine the order of the transactions according to the preparation version numbers carried in the logs, and to output the transaction data of each transaction in that order.
  • As one implementable manner, the transaction assembly unit 804 is configured to, each time the transaction data of a transaction is assembled, determine the delivery version number of the transaction data from the preparation version numbers carried in the logs corresponding to the transaction, carry the determined delivery version number in the transaction data, and then send the transaction data to the sequencing queue.
  • Accordingly, the transaction sequencing unit 805 is configured to output the transaction data of each transaction using the delivery version number in each piece of transaction data and the sequencing queue, where the output order conforms to the order of the transactions.
  • As one implementable manner, each time the fragment acquisition unit 802 aggregates a transaction fragment, it determines the version number corresponding to the transaction fragment from the preparation version numbers carried in the logs used to aggregate it.
  • The heartbeat generation unit 806 is configured to send a global heartbeat value to every assembly queue once every preset period, where the global heartbeat value equals the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within that period.
  • Accordingly, the transaction sequencing unit 805 is specifically configured to: obtain global heartbeat values from all assembly queues; if the same global heartbeat value has been obtained from every assembly queue, use it to update the delivery check value corresponding to the sequencing queue; and output the transaction data in the sequencing queue whose delivery version number is less than or equal to the delivery check value.
  • As one implementable manner, the version number carried in a transaction fragment is one of the preparation version numbers carried in the logs used to aggregate the transaction fragment.
  • As another implementable manner, the version number carried in a transaction fragment is the delivery version number of the transaction to which the fragment belongs, where the delivery version number equals the maximum of the preparation version numbers carried in the logs corresponding to the transaction.
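  • Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof.
  • When implemented in software, the functions may be stored on a computer-readable medium, or transmitted as one or more instructions or code on a computer-readable medium.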
  • Embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the steps of the method described in any one of the foregoing method embodiments are implemented.
  • Embodiments of this specification also provide an electronic device, comprising: one or more processors; and a memory associated with the one or more processors, the memory being used to store program instructions which, when read and executed by the one or more processors, perform the steps of the method described in any one of the foregoing method embodiments.
  • Embodiments of this specification also provide a computer program product, including a computer program that, when executed by a processor, implements the steps of the method described in any one of the foregoing method embodiments.
  • the memory can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes to the existing technology, can be embodied in the form of a computer program product. The computer program product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this specification.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

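A method and apparatus for processing transaction logs. All logs corresponding to one transaction are written to at least two log streams in a distributed database. The method includes: obtaining at least two log streams from the distributed database, where each log stream carries a transaction identifier; and determining, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, performing data assembly using all of those log streams to obtain the transaction data corresponding to the transaction. This enables assembly across the multiple log streams of a distributed transaction.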

Description

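Method and Apparatus for Processing Transaction Logs
Technical Field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and apparatus for processing transaction logs.
Background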
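A database records every operation such as insert, update and delete in its logs. Log-based CDC (change data capture) technology can therefore obtain the complete change history of a database by parsing the logs, and thereby synchronize data. A traditional stand-alone database, such as Oracle or MySQL, has only one global log stream, so the commit history of transactions can be restored by fetching and parsing that log stream sequentially, and the transaction commit order is equivalent to the order in which transaction data is recorded in the log. In systems that support multi-point writes, such as Oracle RAC, the log streams are distributed over different machine nodes, but the SCN (System Change Number) is allocated at a single point and is guaranteed to increase monotonically, so the commit history of transactions can be obtained from the order of the SCNs.
However, in a distributed database that uses multiple log streams, such as OceanBase, each log stream has its own independent LSN (log sequence number), and distributed transaction writes are supported. A distributed transaction write may spread different log streams over multiple machine nodes. When several transactions execute concurrently, the order in which logs are written to the different log streams is random, and the log streams share no global ordering. How to process transaction logs across the multiple log streams of a distributed transaction, so as to obtain the transaction data of a transaction, is therefore a problem that urgently needs to be solved.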
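Summary
One or more embodiments of this specification provide a method and apparatus for processing transaction logs, so as to obtain the transaction data of a transaction across the multiple log streams of a distributed transaction.
According to a first aspect, embodiments of this specification provide a method for processing transaction logs, where a transaction is a sequence of database operations that access and/or manipulate data, and all logs corresponding to one transaction are written to at least two log streams in a distributed database. The method includes: obtaining at least two log streams from the distributed database, where each log stream carries a transaction identifier; and determining, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, performing data assembly using all of those log streams to obtain the transaction data corresponding to the transaction.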
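According to one implementable manner in the embodiments of this application, the method further includes: for each obtained log stream, obtaining from the current log stream the logs corresponding to the same transaction, and aggregating the obtained logs into one transaction fragment corresponding to the transaction, where the transaction fragment carries the identifier of the current log stream and the transaction identifier of the transaction. Accordingly, determining whether all log streams to which all logs corresponding to a transaction are written have been obtained includes: making the determination according to the log stream identifiers and transaction identifiers obtained from the transaction fragments. Accordingly, performing data assembly using all of those log streams includes: assembling all transaction fragments that carry the same transaction identifier.
According to one implementable manner in the embodiments of this application, each transaction identifier corresponds to one log stream list, which includes the identifiers of all log streams to which all logs of the transaction with that transaction identifier are written. Accordingly, determining, according to the log stream identifiers and transaction identifiers obtained from the transaction fragments, whether all log streams to which all logs corresponding to a transaction are written have been obtained includes: for each transaction fragment, obtaining the log stream identifier and the transaction identifier from the current transaction fragment, and marking the obtained log stream identifier as arrived in the log stream list corresponding to the obtained transaction identifier; and, for each log stream list, if the identifiers of all log streams in the current log stream list are marked as arrived, determining that all log streams to which all logs corresponding to a transaction are written have been obtained, where the transaction identifier of that transaction corresponds to the current log stream list.
According to one implementable manner in the embodiments of this application, each time a transaction fragment is aggregated, the method further includes: sending the aggregated transaction fragment to one of one or more assembly queues. Before determining, according to the log stream identifiers and transaction identifiers obtained from the transaction fragments, whether all log streams to which all logs corresponding to a transaction are written have been obtained, the method further includes: using at least one thread to obtain the transaction fragments from the one or more assembly queues.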
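According to one implementable manner in the embodiments of this application, each log written to a log stream carries a preparation version number. After the transaction data corresponding to a transaction is obtained, the method further includes: determining the order of the transactions according to the preparation version numbers carried in the logs, and outputting the transaction data of each transaction in that order.
According to one implementable manner in the embodiments of this application, assembling all transaction fragments that carry the same transaction identifier further includes: each time the transaction data of a transaction is assembled, determining the delivery version number of the transaction data from the preparation version numbers carried in the logs corresponding to the transaction, carrying the determined delivery version number in the transaction data, and then sending the transaction data to a sequencing queue. Outputting the transaction data of each transaction in that order includes: outputting the transaction data of each transaction using the delivery version number in each piece of transaction data and the sequencing queue, where the output order conforms to the order of the transactions.
According to one implementable manner in the embodiments of this application, the method further includes: each time a transaction fragment is aggregated, determining the version number corresponding to the transaction fragment from the preparation version numbers carried in the logs used to aggregate it; carrying that version number in the transaction fragment, and sending the transaction fragment to one of one or more assembly queues; and, once every preset period, sending a global heartbeat value to every assembly queue, where the global heartbeat value equals the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within that period. Accordingly, outputting the transaction data of each transaction using the delivery version number in each piece of transaction data and the sequencing queue includes: obtaining global heartbeat values from all assembly queues; if the same global heartbeat value has been obtained from every assembly queue, using it to update the delivery check value corresponding to the sequencing queue; and outputting the transaction data in the sequencing queue whose delivery version number is less than or equal to the delivery check value.
According to one implementable manner in the embodiments of this application, the version number carried in a transaction fragment is: one of the preparation version numbers carried in the logs used to aggregate the transaction fragment; or the delivery version number of the transaction to which the transaction fragment belongs, where the delivery version number equals the maximum of the preparation version numbers carried in the logs corresponding to the transaction.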
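According to a second aspect, embodiments of this specification provide an apparatus for processing transaction logs, where a transaction is a sequence of database operations that access and/or manipulate data, and all logs corresponding to one transaction are written to at least two log streams in a distributed database. The apparatus includes: a log stream acquisition unit, configured to obtain at least two log streams from the distributed database, where each log stream carries a transaction identifier; and a transaction assembly unit, configured to determine, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, to perform data assembly using all of those log streams to obtain the transaction data corresponding to the transaction.
According to a third aspect, embodiments of this specification provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described above.
According to a fourth aspect, embodiments of this specification provide a computing device including a memory and a processor, where the memory stores executable code and the processor implements the method described above when executing the executable code.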
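As can be seen from the above technical solutions, the combination of one or more embodiments of this specification has at least the following advantages: 1) After log streams are obtained from the distributed database, the embodiments of this specification determine, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, assemble using all of those log streams to obtain the transaction data corresponding to the transaction. The embodiments can thus obtain the transaction data of a transaction across the multiple log streams of a distributed transaction.
2) The embodiments of this specification set the global heartbeat value and the delivery check value of the sequencing queue according to the preparation version numbers carried in the logs, and use the relationship between the delivery version numbers of the transaction data in the sequencing queue and the delivery check value of the sequencing queue to control the output order of transaction data, thereby guaranteeing that the output order is correct.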
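Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.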
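Figure 1 shows an exemplary system architecture to which embodiments of this specification can be applied;
Figure 2 is a flowchart of the method for processing transaction logs provided by an embodiment of this specification;
Figure 3 is a schematic diagram of generating transaction fragments provided by an embodiment of this specification;
Figure 4 is a flowchart of an assembly process provided by an embodiment of this specification;
Figure 5 is an example diagram of assembling transaction data provided by an embodiment of this specification;
Figure 6 is an example diagram of sequencing transaction data provided by an embodiment of this specification;
Figure 7 is a schematic diagram of generating the global heartbeat value provided by an embodiment of this specification;
Figure 8 is a structural diagram of the apparatus for processing transaction logs provided by an embodiment of this specification.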
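Detailed Description
The solutions provided in this specification are described below with reference to the drawings.
The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
Figure 1 shows an exemplary system architecture to which embodiments of this specification can be applied. The system mainly includes a distributed database and a data synchronization device.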
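To guarantee the atomicity and durability of data operations in a database system, a write-ahead logging (WAL) mechanism is usually adopted: the logs of a transaction are persisted to a log stream first. Distributed transactions support writes on multiple machines and can persist their logs to the corresponding log streams separately; the same log stream can be synchronized across different machine nodes. A log stream (LogStream) is the basic unit of log reading and writing, and records the logs of database transactions.
A transaction (Transaction) is a sequence of database operations that access and/or manipulate data. In computing terms, it is a unit of program execution that accesses and possibly updates data items in a database, consisting of all operations executed between the start and the end of the transaction. These operations must all complete successfully; otherwise all changes made by each operation are undone. For example, a transfer transaction may consist of increasing the balance of one account and decreasing the balance of another account.
In a distributed database, all logs corresponding to one transaction are usually written to at least two log streams in the distributed database.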
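The data synchronization device can pull log streams from the distributed database, process them with the method for processing transaction logs provided in the embodiments of this specification, and assemble the transaction data of each transaction. Further, it performs data synchronization based on the dependencies or order among the transaction data.
The distributed database may use multiple machine nodes as instances that run the distributed database software. The data synchronization device and the distributed database can interact over a network, which may include various connection types, such as wired links, wireless communication links or fiber-optic cables.
The data synchronization device may be a single server, a server group composed of multiple servers, or a cloud server. A cloud server, also called a cloud computing server or cloud host, is a host product in the cloud computing service system that addresses the difficult management and weak service scalability of traditional physical hosts and virtual private server (VPS, Virtual Private Server) services. The device may also be a computer terminal with strong computing capability.
It should be understood that the numbers of distributed databases, machine nodes and data synchronization devices in Figure 1 are merely illustrative. There may be any number of distributed databases, machine nodes and data synchronization devices, as required by the implementation.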
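Figure 2 is a flowchart of the method for processing transaction logs provided by an embodiment of this specification. It can be understood that the method can be executed by the data synchronization device in the system shown in Figure 1. Referring to Figure 2, the method includes: Step 202: obtain more than one log stream from the distributed database, where each log stream carries a transaction identifier.
Step 204: determine, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, perform data assembly using all of those log streams to obtain the transaction data corresponding to the transaction.
As can be seen from the technical content provided by the above embodiment, after log streams are obtained from the distributed database, it is determined, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, all of those log streams are assembled to obtain the transaction data corresponding to the transaction. The embodiments of this specification can thus assemble the multiple log streams of a distributed transaction and obtain the transaction data of every transaction.
The steps shown in Figure 2 are described below.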
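First, step 202, "obtain more than one log stream from the distributed database", is described in detail with reference to embodiments.
In a distributed database such as OceanBase, there are generally multiple log streams. Taking Figure 3 as an example, there are eight log streams, P1 to P8, and each log stream may have multiple replicas. In Figure 3, the shaded parts denote primary replicas and the unshaded parts denote standby replicas; each replica resides on a different machine node. The machine nodes in the distributed database can keep the replicas of a log stream consistent through, for example, the Paxos algorithm.
In the embodiments of this specification, the data synchronization device can obtain in advance the log distribution table of the distributed database, which records how the log streams are distributed. The data synchronization device can obtain the log streams from the database according to this table, for example the log streams P1 to P8 in Figure 3.
To speed up obtaining log streams, RPC (Remote Procedure Call) can be used to obtain multiple log streams from the distributed database concurrently, for example with eight processes that obtain the log streams P1 to P8 respectively. Based on the log distribution table, each process obtains its log stream from the corresponding machine node. By default, the log stream can be obtained from the primary replica first, and from a standby replica if the primary replica is unavailable. Alternatively, the log stream can be obtained from one of the primary and standby replicas according to a preset load-balancing policy.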
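The primary-first fetch with standby fallback described above might look like the following Python sketch. Everything concrete in it is an assumption for illustration: the shape of the distribution table, the `rpc_fetch` callback, the use of `ConnectionError` to signal an unavailable replica, and the use of threads (via `concurrent.futures`) where the text speaks of processes.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_stream(stream_id, replicas, rpc_fetch):
    """Fetch one log stream, trying the primary replica before the standbys."""
    # sorted() is stable, so standbys keep their listed order after the primary.
    ordered = sorted(replicas, key=lambda replica: replica[1] != "primary")
    last_error = None
    for node, _role in ordered:
        try:
            return rpc_fetch(node, stream_id)
        except ConnectionError as exc:  # replica unavailable: try the next one
            last_error = exc
    raise RuntimeError(f"no replica available for {stream_id}") from last_error


def fetch_all(distribution, rpc_fetch, workers=8):
    """Fetch every log stream listed in the distribution table concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            stream_id: pool.submit(fetch_stream, stream_id, replicas, rpc_fetch)
            for stream_id, replicas in distribution.items()
        }
        return {stream_id: future.result() for stream_id, future in futures.items()}


# Hypothetical distribution table, and a fake RPC in which node "n1" is down.
distribution = {
    "P1": [("n2", "standby"), ("n1", "primary")],
    "P2": [("n2", "primary"), ("n3", "standby")],
}


def fake_rpc(node, stream_id):
    if node == "n1":
        raise ConnectionError("n1 unreachable")
    return f"{stream_id}@{node}"


streams = fetch_all(distribution, fake_rpc)
```

Here P1 falls back to its standby on n2 because the primary on n1 is unreachable, while P2 is served by its primary. A load-balancing policy would replace the sort key.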
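Next, step 204, "determine, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, perform data assembly using all of those log streams to obtain the transaction data corresponding to the transaction", is described in detail with reference to embodiments.
A transaction usually corresponds to multiple logs, which are usually written to multiple log streams. For example, transaction 1 corresponds to five logs, which are written to log stream 1, log stream 2 and log stream 3. One log stream can therefore contain one or more logs of the same transaction. On this basis, one embodiment of this specification introduces the concept of a transaction fragment: the aggregation of all logs in one log stream that correspond to the same transaction. Each transaction fragment has a transaction identifier and a log stream identifier; that is, it carries the transaction identifier of the transaction to which its source logs belong and the identifier of the log stream from which it originates. Thus, between step 202 and step 204, a step 203 may further be included: obtain from each log stream all logs corresponding to the same transaction, and aggregate all logs obtained from the same log stream that correspond to the same transaction into one processing unit for that transaction, called a transaction fragment. Accordingly, in the subsequent step 204, the multiple transaction fragments of the same transaction obtained from multiple log streams can be assembled; that is, all transaction fragments carrying the same transaction identifier are assembled into the transaction data of that transaction.
The processing of step 203 is illustrated with an example. Log 1 and log 2 of transaction 1 are written to log stream 1; log 1 and log 2 are obtained from log stream 1 and aggregated into one transaction fragment of transaction 1, denoted transaction fragment 1, which carries the identifier of transaction 1 and the identifier of log stream 1. Log 3 of transaction 1 is written to log stream 2; log 3 is obtained from log stream 2 and taken as another transaction fragment of transaction 1, denoted transaction fragment 2, which carries the identifier of transaction 1 and the identifier of log stream 2. Log 4 and log 5 of transaction 1 are written to log stream 3; log 4 and log 5 are obtained from log stream 3 and aggregated into a further transaction fragment of transaction 1, denoted transaction fragment 3, which carries the identifier of transaction 1 and the identifier of log stream 3. These three transaction fragments are subsequently assembled to obtain the transaction data of transaction 1.
The process of aggregating all logs in one log stream that correspond to the same transaction may include: arranging the logs in order, or merging the logs in order. For example, a log stream includes log S1: decrease account a by A yuan; and log S2: increase account a by B yuan. The transaction fragment aggregated from log S1 and log S2 can then include the two operations of decreasing account a by A yuan and increasing account a by B yuan.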
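Step 203 can be sketched as a simple grouping pass over one log stream. The function name, the `(tx_id, operation)` log shape and the dictionary layout of a fragment are assumptions made for this illustration; the sketch implements the "arrange the logs in order" variant of aggregation, not merging.

```python
def aggregate_fragments(stream_id, logs):
    """Group the logs of one log stream into one fragment per transaction.

    `logs` is an iterable of (tx_id, operation) pairs in log order.
    """
    grouped = {}  # dicts preserve insertion order, so operations stay in log order
    for tx_id, operation in logs:
        grouped.setdefault(tx_id, []).append(operation)
    return [
        {"tx_id": tx_id, "stream_id": stream_id, "operations": operations}
        for tx_id, operations in grouped.items()
    ]


# Hypothetical log stream P1 with logs of two interleaved transactions.
stream_1 = [
    ("tx1", "decrease account a by A"),
    ("tx2", "increase account c by C"),
    ("tx1", "increase account a by B"),
]
fragments = aggregate_fragments("P1", stream_1)
```

Each resulting fragment carries both the transaction identifier and the log stream identifier, which is exactly what the later assembly step needs in order to mark arrivals.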
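For the log streams in a distributed database, each log stream carries a transaction identifier and a log stream identifier so that the log streams can be distinguished. The log stream identifier may be, for example, the LSN (log sequence number) of the log stream. The log data of one transaction may be split across multiple log streams written to different machine nodes, so a transaction identifier is carried in each log stream to distinguish and identify the transaction to which the log stream belongs. In addition, the multiple log streams of one transaction are distinguished by different log stream identifiers.
As an example, suppose three distributed transactions occur in the distributed database, denoted tx1, tx2 and tx3. The log data of transaction tx1 is written to the log streams P1 to P3, which carry the transaction identifier tx1 and their respective log stream identifiers P1 to P3. The log data of transaction tx2 is written to the log streams P4 to P6, which carry the transaction identifier tx2 and their respective log stream identifiers P4 to P6. The log data of transaction tx3 is written to the log streams P7 and P8, which carry the transaction identifier tx3 and their respective log stream identifiers P7 and P8.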
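After the transaction fragments are obtained in step 203, determining in step 204 whether all log streams to which all logs corresponding to a transaction are written have been obtained is specifically done by step 2041: make the determination according to the log stream identifiers and transaction identifiers obtained from the transaction fragments.
In actual implementations, step 202 is usually executed by multiple threads; that is, multiple threads obtain the at least two log streams from the distributed database. As a result, the log streams are obtained at different times: some arrive (are obtained) earlier and some arrive later. A log stream list can be used to implement step 2041. To distinguish the log streams of the same transaction, the distributed database records the log stream list corresponding to each transaction identifier; the list includes the identifiers of all log streams to which all logs of the transaction with that transaction identifier are written.
As one implementable manner, the data synchronization device can obtain in advance, from the distributed database, the log stream list corresponding to each transaction identifier. For example, if all logs of transaction tx1 are written to three log streams, the log stream list corresponding to the transaction identifier tx1 includes the identifiers of those three log streams: P1, P2 and P3.
As another implementable manner, the distributed database can carry, in each log stream, the log stream list corresponding to the transaction identifier of the transaction to which the log stream belongs. For example, if all logs of transaction tx1 are written to the three log streams P1, P2 and P3, then each of the log streams P1, P2 and P3 contains the log stream list of tx1, which contains P1, P2 and P3 and indicates which log streams tx1 corresponds to. After the obtained logs are aggregated into a transaction fragment of the transaction, each transaction fragment also carries the log stream list of the log stream from which it originates.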
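Thus, step 2041 may include: Step 2041A: for each transaction fragment, obtain the log stream identifier and the transaction identifier from the current transaction fragment, and mark the obtained log stream identifier as arrived in the log stream list corresponding to the obtained transaction identifier. Step 2041B: for each log stream list, if the identifiers of all log streams in the current log stream list are marked as arrived, determine that all log streams to which all logs corresponding to a transaction are written have been obtained, where the transaction identifier of that transaction corresponds to the current log stream list.
For example, in step 2041A, for transaction fragment 1, the log stream identifier P1 and the transaction identifier tx1 are obtained from transaction fragment 1, so P1 is marked as arrived in the log stream list corresponding to tx1 (which includes P1, P2 and P3). After step 2041A has been performed in this way for every transaction fragment, if in some log stream list, for example the one corresponding to tx1, P1, P2 and P3 are all marked as arrived, then all log streams to which all logs of transaction tx1 are written have been obtained. This means that all transaction fragments of transaction tx1 have been obtained from all of those log streams, that is, all logs of transaction tx1 have been obtained, and the transaction data of transaction tx1 can be assembled.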
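In one embodiment of this specification, each time a transaction fragment is aggregated, the method further includes: sending the aggregated transaction fragment to one of one or more assembly queues. Before determining, according to the log stream identifiers and transaction identifiers obtained from the transaction fragments, whether all log streams to which all logs corresponding to a transaction are written have been obtained, the method further includes: using at least one thread to obtain the transaction fragments from the one or more assembly queues.
As one implementable manner, the data synchronization device may have a single assembly queue, and all transaction fragments obtained in the above steps are sent to that assembly queue for subsequent assembly. The performance of this approach is, however, relatively poor.
To optimize assembly performance and support concurrent processing, the embodiments of this specification provide another implementable manner, namely using multiple assembly queues. The transaction fragments obtained in the above steps are stored in these assembly queues. As shown in Figure 3, suppose there are three assembly queues; the transaction fragments generated from the log streams can be sent to the assembly queues in a round-robin manner. For example, the transaction fragments tx1_P1 (meaning that the transaction identifier and log stream identifier of the fragment are tx1 and P1 respectively; similar notation is used below), tx1_P2 and tx1_P3 are located in assembly queues 1 to 3 respectively; tx2_P4, tx2_P5 and tx2_P6 are located in queue 3, queue 2 and queue 2 respectively; and tx3_P7 and tx3_P8 are located in queue 3 and queue 1 respectively.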
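Round-robin delivery to several FIFO assembly queues can be sketched as below. `RoundRobinDispatcher` and the arrival order of the fragments are assumptions made for illustration; with multiple concurrent dispatchers the actual placement can differ, so this sketch does not try to reproduce the exact placement shown in Figure 3.

```python
from collections import deque
from itertools import cycle


class RoundRobinDispatcher:
    """Sends each new transaction fragment to the next assembly queue in turn."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]  # FIFO assembly queues
        self._next_queue = cycle(self.queues)

    def dispatch(self, fragment):
        next(self._next_queue).append(fragment)


dispatcher = RoundRobinDispatcher(num_queues=3)
for name in ["tx1_P1", "tx1_P2", "tx2_P4", "tx2_P5", "tx1_P3"]:
    dispatcher.dispatch(name)
queue_contents = [list(queue) for queue in dispatcher.queues]
```

Because fragments of one transaction end up in different queues, the assembly threads cannot decide completeness locally, which is why a shared structure such as the transaction manager is needed.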
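An assembly queue is a first-in, first-out queue, and each assembly queue can be bound to an assembly thread that performs the subsequent assembly of transaction fragments.
The difficulty in assembling transaction fragments in step 204 lies in knowing which transaction fragments belong to the same transaction, and whether all transaction fragments of a transaction have been obtained so that assembly can begin. To resolve this difficulty, the embodiments of this application use the log stream list and the log stream identifier to assemble the transaction fragments of the same transaction.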
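As one implementable manner, each assembly thread can execute the assembly process shown in Figure 4, which may include the following steps: Step 402: obtain a transaction fragment from the assembly queue.
Step 404: obtain the log stream list and the log stream identifier from the current transaction fragment. For ease of description, they are called the current log stream list and the current log stream identifier.
Step 406: determine whether the transaction manager already maintains the current log stream list carried by the current transaction fragment. If not, perform step 408; if so, perform step 410.
To improve processing efficiency, the embodiments of this specification process the transaction fragments in the assembly queues in parallel with multiple threads. To know whether the transaction fragments obtained by each thread can already be assembled, that is, whether all transaction fragments of a transaction have been obtained, the embodiments set up a transaction manager that centrally maintains information about the transaction fragments obtained so far. In other words, the "context" of every transaction fragment obtained from an assembly queue is recorded in the transaction manager and can be queried from it. Other ways of centrally maintaining the log stream list of each transaction identifier can of course be used; this embodiment uses the transaction manager only as an example.
Step 408: provide the current log stream list to the transaction manager for maintenance.
The transaction manager maintains the log stream list of each transaction; each list corresponds to a transaction identifier and contains the identifiers of all log streams corresponding to that transaction identifier.
Step 410: in the current log stream list maintained by the transaction manager, mark the current log stream identifier as arrived.
Step 412: determine whether the identifiers of all log streams in the current log stream list are marked as arrived. If so, perform step 414; otherwise, return to step 402.
Step 414: among the transaction fragments obtained so far, assemble those that carry the transaction identifier corresponding to the current log stream list, to obtain the transaction data corresponding to that transaction identifier.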
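To make the flow easier to follow, Figure 5 is taken as an example. First, the transaction fragment tx1_P1 is obtained from assembly queue 1, and the log stream list of tx1 and the log stream identifier P1 are obtained from that fragment. Since the log stream list of tx1 is not yet stored in the transaction manager, it is provided to the transaction manager for storage, and the identifier P1 is marked as arrived in the log stream list of tx1 maintained by the transaction manager. After tx1_P2 is obtained from assembly queue 2, the log stream list of tx1 and the log stream identifier P2 are obtained from that fragment. Since the log stream list of tx1 is already stored in the transaction manager, the identifier P2 is marked as arrived in the log stream list of tx1 maintained by the transaction manager. The transaction fragment tx2_P4 is obtained from assembly queue 3, and the log stream list of tx2 and the log stream identifier P4 are obtained from that fragment. Since the log stream list of tx2 is not yet stored in the transaction manager, it is provided to the transaction manager for storage, and the identifier P4 is marked as arrived in the log stream list of tx2 maintained by the transaction manager. After tx2_P5 is obtained from assembly queue 2, the log stream list of tx2 and the log stream identifier P5 are obtained from that fragment. Since the log stream list of tx2 is already stored in the transaction manager, the identifier P5 is marked as arrived in the log stream list of tx2 maintained by the transaction manager. After tx1_P3 is obtained from assembly queue 3, the log stream list of tx1 and the log stream identifier P3 are obtained from that fragment. Since the log stream list of tx1 is already stored in the transaction manager, the identifier P3 is marked as arrived in the log stream list of tx1 maintained by the transaction manager. At this point the identifiers of all log streams in the log stream list of tx1 are marked as arrived, so the received transaction fragments carrying tx1 can be assembled to obtain the transaction data corresponding to tx1. The other transaction fragments are processed in the same way, so that the transaction fragments corresponding to each transaction identifier are assembled separately.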
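However, because the assembly of distributed transaction data is executed concurrently, the transaction data may be obtained in the order shown in Figure 5. That is, the transaction data of tx1 may be sent to the sequencing queue first, followed by the transaction data of tx3, while the transaction data of tx2, because of concurrent execution or other reasons, has not yet been sent to the sequencing queue. If the transaction data of tx2 is not sent to the sequencing queue for a long time, the transaction data of tx3 may be output right after the transaction data of tx1, and the correctness and completeness of the output order cannot be guaranteed. In some cases, however, the transaction data of different transactions depend on one another. For example, if tx2 depends on tx1 and tx3 depends on tx2, then tx1 must be output first, then tx2, and finally tx3. The assembled transaction data therefore needs to be further sequenced, that is, the ordering of the transaction data is determined to ensure its correctness.
In one embodiment of this specification, each log written to a log stream carries a preparation version number. After the transaction data corresponding to each transaction is obtained, sequencing can further be performed on the transaction data, as shown in Figure 2. The sequencing may include step 206: determine the order of the transactions according to the preparation version numbers carried in the logs, and output the transaction data of each transaction in that order.
In one embodiment of this specification, assembling all transaction fragments carrying the same transaction identifier further includes: each time the transaction data of a transaction is assembled, determine the delivery version number of the transaction data from the preparation version numbers carried in the logs corresponding to that transaction, carry the determined delivery version number in the transaction data, and then send the transaction data to the sequencing queue. Accordingly, outputting the transaction data of each transaction in order in step 206 includes step 2061: output the transaction data of each transaction using the delivery version number in each piece of transaction data and the sequencing queue, where the output order conforms to the order of the transactions.
In one embodiment of this specification, the method further includes: each time a transaction fragment is aggregated, determine the version number corresponding to the transaction fragment from the preparation version numbers carried in the logs used to aggregate it; carry that version number in the transaction fragment, and send the transaction fragment to one of the one or more assembly queues; and, once every preset period, send a global heartbeat value to every assembly queue, where the global heartbeat value equals the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within that period. Accordingly, step 2061 includes: obtain global heartbeat values from all assembly queues; if the same global heartbeat value has been obtained from every assembly queue, use it to update the delivery check value corresponding to the sequencing queue; and output the transaction data in the sequencing queue whose delivery version number is less than or equal to the delivery check value.
The sequencing involved in step 206 is described in detail below.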
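The sequencing queue can be implemented as a min-heap (small top heap); the transaction data inside the sequencing queue is arranged in ascending order of delivery version number (commit version).
In the embodiments of this specification, a new parameter, the global heartbeat value, can be introduced to sequence the transaction data. To make the sequencing process easier to follow, several concepts involved are explained first. Preparation version number (prepare version): In distributed databases such as OceanBase, distributed transactions rely on the 2PC (Two-Phase Commit) protocol to ensure atomicity. Each participant in a transaction generates a prepare version during negotiation, so every log carries prepare version information. Accordingly, in the embodiments of this specification, a transaction fragment generated by aggregating logs in a log stream can also have a version number, which is determined from the prepare versions carried in the logs used to aggregate that fragment. Prepare versions usually increase with the order in which transactions occur.
As one implementable manner, the version number corresponding to a transaction fragment may be one of the preparation version numbers carried in the logs used to aggregate that fragment.
As another implementable manner, the version number corresponding to a transaction fragment may be the delivery version number of the transaction to which the fragment belongs.
Delivery version number (commit version): In a distributed transaction scenario, the participants obtain the commit version of a transaction through negotiation. The commit version is usually the maximum of the prepare versions of the logs of the transaction. Accordingly, in the embodiments of this specification, once the transaction fragments of one transaction are assembled, the commit version of the transaction data can be obtained.
Global heartbeat value (GH, Globalheartbeat): a global value sent to every assembly queue once every preset period, generated from the version numbers carried by the transaction fragments obtained. As one implementable manner, the Globalheartbeat value is the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within the current preset period.
Subsequent embodiments describe in detail how the global heartbeat value is generated and used.
Delivery check value (commit checkpoint): a parameter maintained for the sequencing queue, used to decide whether transaction data in the sequencing queue can be output. The commit checkpoint is determined from Globalheartbeat: once all assembly threads have obtained a Globalheartbeat, the commit checkpoint is updated to that Globalheartbeat value. The sequencing queue may output all transaction data whose commit version is less than or equal to the commit checkpoint.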
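The sending mechanism of Globalheartbeat is described first.
As one implementable manner, Globalheartbeat can be determined periodically and sent to the assembly queues.
As another implementable manner, Globalheartbeat can be determined and sent to the assembly queues after each round of polling the assembly queues to send transaction fragments. Other sending occasions can also be used and are not enumerated here.
As mentioned in earlier embodiments, log streams can be obtained by multiple processes, each of which aggregates a log stream into transaction fragments. As shown in Figure 7, each process (dispatch_progress) has a corresponding log stream version number.
As one implementable manner, after a dispatch_progress obtains a transaction fragment, it updates its log stream version number according to the version number of that fragment: if the version number of the fragment is less than the log stream version number, the log stream version number is updated to the version number of the fragment; otherwise the log stream version number is left unchanged.
When it is time to determine and send Globalheartbeat, for example when the sending period elapses, the minimum of the current log stream version numbers of all dispatch_progress instances is used as Globalheartbeat and sent to every assembly queue.
Taking Figure 6 as an example, GH1 is the minimum of the version numbers of tx1_P1, tx1_P2 and tx2_P4, that is, the version number of tx1_P1. GH2 is the minimum of the version numbers of tx2_P5 and tx1_P3, that is, the version number of tx1_P3. GH3 is the minimum of the version numbers of tx3_P8, tx2_P6 and tx3_P7, that is, the version number of tx2_P6.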
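When every assembly thread has obtained GH1, the commit checkpoint is set to the value of GH1. At this point the transaction data of tx1 and tx2 has not yet been assembled, and the sequencing queue is empty.
When every assembly thread has obtained GH2, the commit checkpoint is set to the value of GH2, that is, the version number of tx1_P3. At this point the transaction data of tx1 has been assembled and placed in the sequencing queue; its commit version is the maximum of all prepare versions in tx1_P1, tx1_P2 and tx1_P3. This means that the current commit checkpoint is smaller than the commit version of tx1, and the sequencing queue outputs nothing.
When every assembly thread has obtained GH3, the commit checkpoint is set to the value of GH3, that is, the version number of tx2_P6. At this point the transaction data of tx2 and tx3 has been assembled and placed in the sequencing queue. The commit version of the transaction data of tx2 is the maximum of all prepare versions in tx2_P4, tx2_P5 and tx2_P6, and the commit version of the transaction data of tx3 is the maximum of all prepare versions in tx3_P7 and tx3_P8. The current commit checkpoint is therefore greater than the commit version of tx1 and equal to the commit version of tx2, so the transaction data of tx1 and tx2 is output in turn, while the transaction data of tx3 is not output. The correct output order of the transaction data is thus guaranteed.
The subsequent process proceeds by analogy. The delivery check value combined with GH thus acts like a "barrier" that only lets transaction data out in the correct order. On the one hand, this guarantees that transaction data is fully collected and output in the correct order; on the other hand, it guarantees that the output order of transaction data matches the commit order.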
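The method provided by the embodiments of this specification has been described in detail above; the apparatus provided by the embodiments of this specification is described in detail below.
Figure 8 shows a structural diagram of an apparatus for processing transaction logs according to an embodiment of this specification. As shown in Figure 8, the apparatus 800 may include a log stream acquisition unit 801 and a transaction assembly unit 804, and may further include a fragment acquisition unit 802, a fragment delivery unit 803, a transaction sequencing unit 805 and a heartbeat generation unit 806. The main functions of the component units are as follows. The log stream acquisition unit 801 is configured to obtain at least two log streams from the distributed database, where each log stream carries a transaction identifier.
The transaction assembly unit 804 is configured to determine, according to the transaction identifiers carried in the log streams, whether all log streams to which all logs corresponding to a transaction are written have been obtained, and if so, to perform data assembly using all of those log streams to obtain the transaction data corresponding to the transaction.
As one implementable manner, the fragment acquisition unit 802 is configured to perform, for each obtained log stream: obtain from the current log stream the logs corresponding to the same transaction; and aggregate the obtained logs into one transaction fragment corresponding to the transaction, where the transaction fragment carries the identifier of the current log stream and the transaction identifier of the transaction.
Accordingly, when determining whether all log streams to which all logs corresponding to a transaction are written have been obtained, the transaction assembly unit 804 can make the determination according to the log stream identifiers and transaction identifiers obtained from the transaction fragments.
Accordingly, when performing data assembly using all of those log streams, the transaction assembly unit 804 can assemble all transaction fragments carrying the same transaction identifier.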
作为其中一种可实现的方式,每一个事务标识对应一个日志流列表,该日志流列表中包括:具有该事务标识的事务对应的所有日志被写入的所有日志流的标识。
事务组装单元804在依据从各个事务分片中获取的日志流的标识以及事务标识确 定是否已经获取一个事务对应的所有日志被写入的所有日志流,时,可以执行:针对每一个事务分片,均执行:从当前事务分片中获取日志流的标识以及事务标识;在所获取的事务标识对应的日志流列表中,将所获取的该日志流的标识标记为到达;以及针对每一个日志流列表,如果当前日志流列表中所有的日志流的标识均被标记为到达,则确定已经获取了一个事务对应的所有日志被写入的所有日志流;其中,该一个事务的事务标识对应当前日志流列表。
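In one possible implementation, the fragment dispatch unit 803 is configured to send each transaction fragment, as soon as the fragment acquisition unit 802 has aggregated it, to one of one or more assembly queues.
The transaction assembly unit 804 is configured to use at least one thread to obtain the transaction fragments from the one or more assembly queues.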
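A minimal sketch of dispatching fragments to assembly queues and draining them with one thread per queue is given below. The routing rule (hashing the transaction identifier so that all fragments of a transaction reach the same queue), the `None` sentinel and all names are assumptions made for the example; the text only requires that each fragment be sent to one of the queues.

```python
import queue
import threading
import zlib
from typing import Dict, List, Tuple

Fragment = Tuple[str, str]  # (transaction identifier, log stream identifier)

NUM_QUEUES = 2
assembly_queues: List[queue.Queue] = [queue.Queue() for _ in range(NUM_QUEUES)]


def dispatch(fragment: Fragment) -> int:
    """Send a fragment to one assembly queue and return that queue's index."""
    # Assumption: route by a stable hash of the transaction identifier.
    index = zlib.crc32(fragment[0].encode("utf-8")) % NUM_QUEUES
    assembly_queues[index].put(fragment)
    return index


def assembly_worker(q: queue.Queue, collected: Dict[str, List[str]], lock: threading.Lock) -> None:
    """Drain one assembly queue until the None sentinel is seen."""
    while True:
        fragment = q.get()
        if fragment is None:
            break
        tx_id, log_stream_id = fragment
        with lock:
            collected.setdefault(tx_id, []).append(log_stream_id)


incoming = [("tx1", "ls1"), ("tx2", "ls1"), ("tx1", "ls2"), ("tx2", "ls3"), ("tx1", "ls3")]
routes = [dispatch(f) for f in incoming]
for q in assembly_queues:
    q.put(None)  # sentinel: no more fragments in this sketch

collected: Dict[str, List[str]] = {}
lock = threading.Lock()
workers = [threading.Thread(target=assembly_worker, args=(q, collected, lock)) for q in assembly_queues]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Routing by transaction identifier keeps the per-transaction bookkeeping local to one worker; other routing schemes would need shared state across threads.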
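In one possible implementation, each log written into a log stream carries a prepare version number.
The transaction sequencing unit 805 is configured to determine the order of the transactions according to the prepare version numbers carried in the logs, and to output the transaction data of the transactions one by one in that order.
In one possible implementation, the transaction assembly unit 804 is configured to, each time the transaction data of a transaction has been assembled, determine the commit version number of that transaction data from the prepare version numbers carried in the logs of the transaction, carry the determined commit version number in the transaction data, and then send the transaction data to the sequencing queue.
Correspondingly, the transaction sequencing unit 805 is configured to output the transaction data of the transactions using the commit version number in each piece of transaction data together with the sequencing queue, where the output order conforms to the order of the transactions.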
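In one possible implementation, each time the fragment acquisition unit 802 aggregates a transaction fragment, it determines the version number of that fragment according to the prepare version numbers carried in the logs used to aggregate it.
The heartbeat generation unit 806 is configured to send one global heartbeat value to every assembly queue at each preset period, where the global heartbeat value equals the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within that preset period.
Correspondingly, the transaction sequencing unit 805 is specifically configured to: obtain global heartbeat values from all assembly queues; if the same global heartbeat value has been obtained from every assembly queue, update the commit checkpoint of the sequencing queue with that global heartbeat value; and output the transaction data in the sequencing queue whose commit version number is less than or equal to the commit checkpoint.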
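The condition that the same global heartbeat value must be obtained from every assembly queue before the checkpoint advances can be sketched as below. `HeartbeatAligner` and the queue names are hypothetical; the sketch keeps only the latest heartbeat seen per queue, which is a simplifying assumption.

```python
from typing import Dict, Iterable, Optional


class HeartbeatAligner:
    """Advances the commit checkpoint only when all assembly queues report the same heartbeat."""

    def __init__(self, queue_ids: Iterable[str]) -> None:
        self._latest: Dict[str, Optional[int]] = {q: None for q in queue_ids}
        self.commit_checkpoint: Optional[int] = None

    def on_heartbeat(self, queue_id: str, value: int) -> bool:
        """Record a heartbeat from one queue; return True if the checkpoint was updated."""
        self._latest[queue_id] = value
        seen = set(self._latest.values())
        if len(seen) == 1 and None not in seen:
            self.commit_checkpoint = value
            return True
        return False


aligner = HeartbeatAligner(["q1", "q2"])
steps = [
    aligner.on_heartbeat("q1", 13),  # q2 has not reported yet
    aligner.on_heartbeat("q2", 13),  # both queues agree on 13
    aligner.on_heartbeat("q1", 18),  # q2 still at 13, checkpoint unchanged
]
checkpoint_before_q2 = aligner.commit_checkpoint
steps.append(aligner.on_heartbeat("q2", 18))  # both queues agree on 18
```

Waiting for agreement across queues ensures that every fragment dispatched before the heartbeat has already been consumed by its assembly thread when the checkpoint moves.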
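In one possible implementation, the version number carried in a transaction fragment is one of the prepare version numbers carried in the logs used to aggregate that fragment.
In another possible implementation, the version number carried in a transaction fragment is the commit version number of the transaction to which the fragment belongs, where the commit version number equals the maximum of the prepare version numbers carried in the logs of that transaction.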
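The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly, and the relevant parts of the method embodiment may be consulted.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The embodiments of this specification further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any one of the foregoing method embodiments.
The embodiments further provide an electronic device, including: one or more processors; and a memory associated with the one or more processors, the memory being used to store program instructions which, when read and executed by the one or more processors, perform the steps of the method of any one of the foregoing method embodiments.
The embodiments of this specification further provide a computer program product including a computer program which, when executed by a processor, implements the steps of the method of any one of the foregoing method embodiments.
The memory may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
From the description of the above implementations, those skilled in the art will clearly understand that the embodiments of this specification may be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes to the prior art, may be embodied in the form of a computer program product. The computer program product may be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this specification or in certain parts of the embodiments.
The specific implementations above further describe the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are merely specific implementations of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.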

Claims (10)

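  1. A method for processing transaction logs, characterized in that a transaction is a sequence of database operations that access and/or manipulate data; all logs corresponding to one transaction are written into at least two log streams in a distributed database; and the method comprises:
    obtaining at least two log streams from the distributed database, wherein each log stream carries a transaction identifier;
    determining, according to the transaction identifiers carried in the log streams, whether all log streams into which all logs corresponding to a transaction were written have been obtained, and if so, performing data assembly using all of those log streams to obtain the transaction data corresponding to the transaction.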
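  2. The method according to claim 1, wherein the method further comprises performing, for each obtained log stream:
    obtaining, from the current log stream, the logs corresponding to the same transaction; and
    aggregating the obtained logs into one transaction fragment corresponding to the transaction, wherein the transaction fragment carries the identifier of the current log stream and the transaction identifier of the transaction;
    correspondingly, the determining whether all log streams into which all logs corresponding to a transaction were written have been obtained comprises:
    determining, according to the log stream identifiers and transaction identifiers obtained from the transaction fragments, whether all log streams into which all logs corresponding to a transaction were written have been obtained;
    correspondingly, the performing data assembly using all of those log streams comprises: assembling all transaction fragments that carry the same transaction identifier.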
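  3. The method according to claim 2, wherein each transaction identifier corresponds to one log stream list, and the log stream list includes the identifiers of all log streams into which all logs corresponding to the transaction having that transaction identifier were written;
    correspondingly, the determining, according to the log stream identifiers and transaction identifiers obtained from the transaction fragments, whether all log streams into which all logs corresponding to a transaction were written have been obtained comprises:
    performing, for each transaction fragment: obtaining the log stream identifier and the transaction identifier from the current transaction fragment; and, in the log stream list corresponding to the obtained transaction identifier, marking the obtained log stream identifier as arrived; and
    for each log stream list, if all log stream identifiers in the current log stream list have been marked as arrived, determining that all log streams into which all logs corresponding to a transaction were written have been obtained, wherein the transaction identifier of that transaction corresponds to the current log stream list.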
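  4. The method according to claim 2, wherein, each time a transaction fragment is aggregated, the method further comprises: sending the aggregated transaction fragment to one of one or more assembly queues;
    before the determining, according to the log stream identifiers and transaction identifiers obtained from the transaction fragments, whether all log streams into which all logs corresponding to a transaction were written have been obtained, the method further comprises: using at least one thread to obtain the transaction fragments from the one or more assembly queues.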
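  5. The method according to any one of claims 2 to 4, characterized in that each log written into a log stream carries a prepare version number;
    after the transaction data corresponding to a transaction is obtained, the method further comprises: determining the order of the transactions according to the prepare version numbers carried in the logs, and outputting the transaction data of the transactions one by one in that order.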
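  6. The method according to claim 5, wherein the assembling all transaction fragments that carry the same transaction identifier further comprises:
    each time the transaction data of a transaction is assembled, determining the commit version number of the transaction data using the prepare version numbers carried in the logs corresponding to the transaction, carrying the determined commit version number in the transaction data, and then sending the transaction data to a sequencing queue;
    the outputting the transaction data of the transactions one by one in that order comprises:
    outputting the transaction data of the transactions using the commit version number in each piece of transaction data and the sequencing queue, wherein the output order conforms to the order of the transactions.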
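  7. The method according to claim 6, characterized in that the method further comprises:
    each time a transaction fragment is aggregated, determining the version number corresponding to the transaction fragment according to the prepare version numbers carried in the logs used to aggregate the transaction fragment;
    carrying the version number in the transaction fragment, and sending the transaction fragment to one of one or more assembly queues; and
    sending one global heartbeat value to every assembly queue at each preset period, wherein the global heartbeat value equals the minimum of the version numbers carried in the transaction fragments sent to the assembly queues within that preset period;
    correspondingly, the outputting the transaction data of the transactions using the commit version number in each piece of transaction data and the sequencing queue comprises:
    obtaining global heartbeat values from all assembly queues,
    if the same global heartbeat value has been obtained from every assembly queue, updating the commit checkpoint corresponding to the sequencing queue with that global heartbeat value; and outputting the transaction data in the sequencing queue whose commit version number is less than or equal to the commit checkpoint.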
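  8. The method according to claim 7, characterized in that the version number carried in a transaction fragment is:
    one of the prepare version numbers carried in the logs used to aggregate the transaction fragment; or
    the commit version number of the transaction to which the transaction fragment belongs, the commit version number being equal to the maximum of the prepare version numbers carried in the logs corresponding to the transaction.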
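  9. An apparatus for processing transaction logs, characterized in that a transaction is a sequence of database operations that access and/or manipulate data; all logs corresponding to one transaction are written into at least two log streams in a distributed database; and the apparatus comprises:
    a log stream acquisition unit, configured to obtain at least two log streams from the distributed database, wherein each log stream carries a transaction identifier;
    a transaction assembly unit, configured to determine, according to the transaction identifiers carried in the log streams, whether all log streams into which all logs corresponding to a transaction were written have been obtained, and if so, to perform data assembly using all of those log streams to obtain the transaction data corresponding to the transaction.
  10. A computing device, comprising a memory and a processor, characterized in that the memory stores executable code, and the processor, when executing the executable code, implements the method according to any one of claims 1 to 8.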
PCT/CN2023/113247 2022-09-06 2023-08-16 Method and apparatus for processing transaction logs WO2024051454A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211082758.5 2022-09-06
CN202211082758.5A CN115185787B (zh) 2022-09-06 2022-09-06 Method and apparatus for processing transaction logs

Publications (1)

Publication Number Publication Date
WO2024051454A1 (zh) 2024-03-14

Family

ID=83522667

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113247 WO2024051454A1 (zh) 2022-09-06 2023-08-16 Method and apparatus for processing transaction logs

Country Status (2)

Country Link
CN (1) CN115185787B (zh)
WO (1) WO2024051454A1 (zh)

Also Published As

Publication number Publication date
CN115185787B (zh) 2022-12-30
CN115185787A (zh) 2022-10-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23862151

Country of ref document: EP

Kind code of ref document: A1