CN116089537A

CN116089537A - Incremental data synchronization method, system, computer and storage medium

Info

Publication number: CN116089537A
Application number: CN202310363286.9A
Authority: CN
Inventors: 李磊; 程光剑; 刘锦豪; 杨献祥; 徐杰
Original assignee: Jiangxi Intelligent Industry Technology Innovation Research Institute
Current assignee: Jiangxi Intelligent Industry Technology Innovation Research Institute
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-05-09
Anticipated expiration: 2043-04-07
Also published as: CN116089537B

Abstract

The invention provides an incremental data synchronization method, system, computer and storage medium. The method comprises: processing the synchronization data with a preset blocking strategy and synchronizing the resulting blocks of synchronization data into kafka; processing a target table with a preset statement to generate a plurality of partitions, and synchronizing the blocks of synchronization data in the kafka into the partitions in one-to-one correspondence; judging whether data is being written to the source table at the moment the synchronization task request is acquired; and, if no data is being written at that moment, judging with a preset second checking strategy whether the synchronization data has been successfully synchronized into the partitions. Through the time partition and blocking strategy, the method reduces the data loss in incremental data synchronization that is caused by the absence of an auto-increment primary key in a time series database.

Description

Incremental data synchronization method, system, computer and storage medium

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to an incremental data synchronization method, an incremental data synchronization system, a computer and a storage medium.

Background

A time series database is used to process series of data that change over time and are indexed by time, and is widely applied in scenarios such as the Internet of Things, the industrial Internet of Things, and basic operation and maintenance systems. Taking the industrial Internet of Things as an example, time series databases are currently applied to equipment monitoring and alarming and to the analysis of historical operating conditions. The time series data mainly originates from data collected by real-time monitoring devices on various types of industrial equipment, and its typical characteristics are: it is generated at high frequency, it depends heavily on the acquisition time, and there are many measuring points carrying a large amount of information. By analyzing the historical time series data of industrial equipment, the running state of every piece of equipment in a factory can be tracked, so that changes in the running state can be measured: past changes are analyzed, present changes are monitored, and future changes are predicted, thereby guiding and optimizing industrial production. Time series data analysis is applied to specific scenarios in the industrial field, for example: analyzing faults to identify the main equipment failures; analyzing productivity to see how configuration can be optimized to improve production efficiency; analyzing energy consumption to see how production cost can be reduced; and analyzing potential safety hazards to reduce fault duration. Historical time series data is therefore of great value to the efficiency and benefit of industrial production.

Although time series databases have numerous advantages, the clustered versions of some time series databases (e.g., influxDB) are expensive and not open source. As the amount of data ingested by a single-node time series database keeps growing, the query performance of the database degrades, so migrating the historical and incremental data in the time series database to another high-performance storage platform (such as distributed storage) is an option worth considering. How to design a reliable and stable method for synchronizing the data in a time series database to other data platforms is therefore a problem to be solved. Existing incremental data synchronization requires an auto-increment field in the source table, but most time series databases use the time field together with tag columns as the primary key, so the traditional method of synchronizing incrementally by an auto-increment field is not suitable for time series databases; moreover, the time field values of time series data are not unique, so data may be lost during synchronization.

Disclosure of Invention

In order to solve the above technical problem, the invention provides an incremental data synchronization method, system, computer and storage medium, which address the technical problem that the traditional incremental data synchronization method based on an auto-increment field is not suitable for incremental data synchronization of a time series database.

In a first aspect, the present invention provides the following technical solution: an incremental data synchronization method, including:

acquiring a synchronous task request of a time sequence database, and calculating synchronous data in the synchronous task request;

processing the synchronous data by using a preset blocking strategy, connecting kafka, and synchronizing the processed synchronous data into the kafka;

processing a target table with a preset statement to generate a plurality of partitions, and synchronizing the blocks of synchronization data in the kafka into the partitions in one-to-one correspondence;

judging whether data is being written to the source table at the moment the synchronization task request is acquired, and if no data is being written to the source table at that moment, judging whether the synchronization data is successfully synchronized into the partitions by using a preset second checking strategy;

wherein, under the second checking strategy, the synchronization of the synchronization data is successful if the data amounts of the processed blocks of synchronization data, the data amounts of the corresponding blocks of data in the kafka and the data amounts of the data in the corresponding partitions are equal in one-to-one correspondence;

and constructing a state database, and recording the table name of the source table, the processed synchronous data of a plurality of blocks, the name of the kafka, the stored data quantity of the kafka, the table name of the target table, the data quantity of each partition of the target table, a synchronous result mark and a timestamp corresponding to the last block of synchronous data into the state database.
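A minimal sketch of such a state database, assuming SQLite as the store. The table name `sync_state`, the column names and the sample rows are hypothetical illustrations and are not specified by the method; only the list of recorded items follows the step above.

```python
import sqlite3

# Hypothetical schema: one row per block of one synchronization task.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sync_state (
    source_table   TEXT    NOT NULL,  -- table name of the source table
    block_index    INTEGER NOT NULL,  -- i, position of the block in the task
    block_rows     INTEGER NOT NULL,  -- data amount of the processed block
    kafka_topic    TEXT    NOT NULL,  -- name of the kafka topic used
    kafka_rows     INTEGER NOT NULL,  -- data amount stored in kafka
    target_table   TEXT    NOT NULL,  -- table name of the target table
    partition_rows INTEGER NOT NULL,  -- data amount of the target partition
    sync_ok        INTEGER NOT NULL,  -- synchronization result flag (1 or 0)
    last_block_ts  TEXT    NOT NULL   -- timestamp of the task's last block
)
"""


def record_state(conn, rows):
    """Insert one state row per block and commit the transaction."""
    conn.executemany(
        "INSERT INTO sync_state VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_state(conn, [
    ("sensor", 1, 120, "topic1", 120, "sensor_dst", 120, 1, "2023-02-23 11:00:00"),
    ("sensor", 2, 80, "topic2", 80, "sensor_dst", 80, 1, "2023-02-23 11:00:00"),
])
```

Keeping one row per block, rather than one row per task, is a design choice of this sketch: it lets a later run look up both the per-block data amounts and the timestamp of the last block.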

Compared with the prior art, the beneficial effects of the present application are as follows: whereas existing incremental synchronization can only be carried out through an auto-increment primary key, the present method uses a time partition and blocking strategy and does not need to create an auto-increment primary key, which reduces data loss and improves the accuracy of synchronization; and the second checking strategy compares the data block by block, which improves the accuracy of verification.

Further, after the step of judging whether data is being written to the source table at the moment the synchronization task request is acquired, the method further includes:

if data is being written to the source table at the moment the synchronization task request is acquired, judging whether the synchronization data is successfully synchronized by using a preset first verification strategy;

if $s(i) = z(i) = m(i)$ for every $i \in \{1, \dots, k-1\}$ and $z(k) = m(k)$, the synchronization of the synchronization data is successful;

if $s(i) = z(i) \neq m(i)$ for some $i \in \{1, \dots, k-1\}$, or $z(k) \neq m(k)$, the synchronization is unsuccessful, and the synchronization data in the kafka that failed to synchronize is re-synchronized into the partition;

if $s(i) \neq z(i)$ for some $i \in \{1, \dots, k-1\}$, the synchronization is unsuccessful, and the synchronization data that failed to synchronize into the kafka is re-synchronized into the kafka and then re-synchronized into the corresponding partition;

where k represents the number of data blocks in the synchronization, s(i) represents the data amount of the i-th block after the source table is divided into blocks, z(i) represents the data amount of the i-th partition of kafka, and m(i) represents the data amount of the i-th partition of the target table.

Further, after the step of judging whether the synchronization data is successfully synchronized into the partitions by using the preset second checking strategy, the method further includes:

if $s(i) = z(i) \neq m(i)$ for some block $i$, re-synchronizing into the partition the synchronization data in the kafka that failed to synchronize;

if $s(i) \neq z(i)$ for some block $i$, re-synchronizing into the kafka the synchronization data that failed to synchronize into the kafka, and then re-synchronizing it into the corresponding partition.

Further, the step of synchronizing the processed pieces of the synchronization data into the kafka includes:

processing the kafka to generate a plurality of topics, and synchronizing the processed synchronous data of the plurality of blocks into the corresponding topics.

Further, the method further comprises:

re-executing, at every preset time interval, the step of acquiring a synchronization task request of the time series database and calculating the synchronization data in the synchronization task request.

Further, after the step of re-executing, at every preset time interval, the step of acquiring a synchronization task request of the time series database and calculating the synchronization data in the synchronization task request, the method further includes:

acquiring the time period corresponding to the last block of synchronization data in the previous synchronization task request, and taking that time period as the starting time period of the current synchronization task;

and overwriting, with the first block of synchronization data of the current synchronization task, the last block of synchronization data written to the partition by the previous synchronization task.
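The restart-and-overwrite step can be sketched as follows, assuming hourly blocks and an in-memory dictionary standing in for the partitioned target table. The function names `block_start`, `next_task_start` and `write_partition` and the sample rows are hypothetical.

```python
from datetime import datetime


def block_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hourly block (assumed granularity)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def next_task_start(last_block_ts: datetime) -> datetime:
    """The next task starts again at the time period of the previous task's last block."""
    return block_start(last_block_ts)


def write_partition(target: dict, pt: str, rows: list) -> None:
    """Overwrite the partition instead of appending, so re-read rows are not duplicated."""
    target[pt] = list(rows)


# The previous task ended at 11:42:07, part-way through the 11:00 block.
target = {"2023-02-23-11": ["r1", "r2"]}
start = next_task_start(datetime(2023, 2, 23, 11, 42, 7))

# The current task re-reads the whole 11:00 block and replaces the stale partition.
write_partition(target, start.strftime("%Y-%m-%d-%H"), ["r1", "r2", "r3"])
```

Overwriting rather than appending is what makes the re-read of the last block safe: rows that were already present in the partition are replaced, not duplicated.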

In a second aspect, the present invention provides a system for synchronizing incremental data, including:

the computing module is used for acquiring a synchronous task request of the time sequence database and computing synchronous data in the synchronous task request;

the partitioning module is used for processing the synchronous data by utilizing a preset partitioning strategy, connecting kafka and synchronizing the processed synchronous data into the kafka;

the processing module is used for processing the target table by using a preset statement to generate a plurality of partitions, and synchronizing a plurality of pieces of synchronous data in the kafka into the plurality of partitions in a one-to-one correspondence manner;

the judging module is used for judging whether data is being written to the source table at the moment the synchronization task request is acquired, and, if no data is being written to the source table at that moment, judging whether the synchronization data is successfully synchronized into the partitions by using a preset second checking strategy;

the corresponding module is used for indicating that the synchronization of the synchronization data is successful if the data amounts of the processed blocks of synchronization data, the data amounts of the corresponding blocks of data in the kafka and the data amounts of the data in the corresponding partitions are equal in one-to-one correspondence;

the construction module is used for constructing a state database, and recording the table name of the source table, the processed synchronous data of a plurality of blocks, the name of the kafka, the data volume stored by the kafka, the table name of the target table, the data volume of each partition of the target table, a synchronous result mark and a timestamp corresponding to the last block of synchronous data into the state database.

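Further, the judging module includes:

a first resynchronization unit for, if $s(i) = z(i) \neq m(i)$ for some block $i$, re-synchronizing into the partition the synchronization data in the kafka that failed to synchronize;

a second resynchronization unit for, if $s(i) \neq z(i)$ for some block $i$, re-synchronizing into the kafka the synchronization data that failed to synchronize into the kafka, and then re-synchronizing it into the corresponding partition.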

Further, the system further comprises:

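the verification module is used for judging whether the synchronization data is successfully synchronized by using a preset first verification strategy if data is being written to the source table at the moment the synchronization task request is acquired;

if $s(i) = z(i) = m(i)$ for every $i \in \{1, \dots, k-1\}$ and $z(k) = m(k)$, the synchronization of the synchronization data is successful;

if $s(i) = z(i) \neq m(i)$ for some $i \in \{1, \dots, k-1\}$, or $z(k) \neq m(k)$, the synchronization is unsuccessful;

if $s(i) \neq z(i)$ for some $i \in \{1, \dots, k-1\}$, the synchronization is unsuccessful.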

Further, the blocking module includes:

and the processing unit is used for processing the kafka to generate a plurality of topics and synchronizing the synchronous data of the processed multiple blocks into the corresponding topics.

Further, the system further comprises:

and the execution module is used for re-executing, at every preset time interval, the step of acquiring a synchronization task request of the time series database and calculating the synchronization data in the synchronization task request.

Further, the system further comprises:

the covering module is used for acquiring the time period corresponding to the last block of synchronization data in the previous synchronization task request, taking that time period as the starting time period of the current synchronization task, and overwriting, with the first block of synchronization data of the current synchronization task, the last block of synchronization data written to the partition by the previous synchronization task.

In a third aspect, the present invention provides a computer, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the incremental data synchronization method as described above when executing the computer program.

In a fourth aspect, the present invention provides a storage medium having a computer program stored thereon, which when executed by a processor implements an incremental data synchronization method as described above.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for synchronizing incremental data according to a first embodiment of the present invention;

FIG. 2 is a diagram illustrating incremental data synchronization according to a first embodiment of the present invention;

FIG. 3 is a flowchart of a method for synchronizing incremental data according to a second embodiment of the present invention;

FIG. 4 is a diagram illustrating incremental data synchronization according to a second embodiment of the present invention;

FIG. 5 is a block diagram of an incremental data synchronization system according to a third embodiment of the present invention;

FIG. 6 is a schematic diagram of the hardware structure of a computer according to a fourth embodiment of the present invention.

Embodiments of the present invention will be further described below with reference to the accompanying drawings.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended to illustrate embodiments of the invention and should not be construed as limiting the invention.

In the description of the embodiments of the present invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate description of the embodiments of the present invention and simplify description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present invention, the meaning of "plurality" is two or more, unless explicitly defined otherwise.

In the embodiments of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured" and the like are to be construed broadly and include, for example, either permanently connected, removably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the embodiments of the present invention will be understood by those of ordinary skill in the art according to specific circumstances.

Example 1

In a first embodiment of the present invention, as shown in fig. 1 and 2, an incremental data synchronization method includes steps S101 to S105:

S101, acquiring a synchronous task request of a time sequence database, and calculating synchronous data in the synchronous task request;

When a synchronization task request is received, the time series database to be synchronized is first acquired; the time series database contains a plurality of source tables, and the synchronization data required for each source table within a time period is calculated. It should be noted that the synchronization data is the data (incremental data) of the source table between the end time of the last synchronization and the current time of the current synchronization task. The source tables belong to a time series database, which is prone to performance bottlenecks because data is collected at high frequency; migrating the historical data in the time series database to another data platform is therefore an effective strategy for preserving database performance. The source table information includes table names, fields and the like, and the target table information includes the database connection, tables, fields and the like. The target table can be generated automatically from the table structure of the source time series database. Specifically, a configuration mapping table between time series database field types and target table field types is maintained in advance, as shown in Table 1; when the field names and field types of the source table are acquired, the target field types are matched automatically from the configuration table, and an SQL table creation statement is used to generate a partitioned target table, in which the partition field is set to a preset time partition.

Table 1 configuration mapping table

If a synchronization task has already been completed before the current synchronization task request is acquired, then, when the synchronization data is calculated, the end time of the last completed synchronization is taken as the start time of the current synchronization; that is, the data in the time series database between the end time of the last synchronization and the current time is the data to be synchronized this time. For example, if a synchronization task request is now initiated for the third time, the data from the end time of the second synchronization to the current time is the incremental synchronization data of the third synchronization task.
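The calculation of the synchronization window can be sketched as follows. The function name `sync_window` and the `earliest` fallback used for the very first task are assumptions of this sketch; the text above only specifies the case in which a previous synchronization exists.

```python
from datetime import datetime
from typing import Optional, Tuple


def sync_window(last_sync_end: Optional[datetime],
                now: datetime,
                earliest: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) time range whose data must be synchronized.

    last_sync_end: end time of the last completed synchronization, or None
                   if no synchronization has run yet (assumed convention).
    earliest:      start of the data to use for the first task (assumption).
    """
    start = last_sync_end if last_sync_end is not None else earliest
    if start > now:
        raise ValueError("last synchronization ends in the future")
    return start, now


# Third task: the second task ended at 10:00, the request arrives at 12:00.
window = sync_window(datetime(2023, 2, 23, 10), datetime(2023, 2, 23, 12),
                     earliest=datetime(2023, 1, 1))
```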

S102, processing the synchronous data by using a preset partitioning strategy, and connecting kafka, and synchronizing the processed synchronous data in a plurality of blocks into the kafka;

The data in a time series database accumulates in large quantities over time. Reading a large amount of data at once affects the time series database service, and synchronizing all of the retrieved data to the target table at once may cause the synchronization to fail because of the data volume. A time series database partitions data by time, and essentially every time series query can use time as a filter condition, which speeds up data queries.

The preset blocking strategy uses timestamps of different granularities in the time series database as the blocking condition of a synchronization task, and writes the different blocks of data into kafka (a distributed message queue). The kafka is divided into a plurality of topics, and each block of synchronization data corresponds to one topic (a topic is a category attribute used in kafka to classify messages).

In a specific implementation, the synchronization data is first divided into blocks by time period according to the preset blocking strategy. For example, with a granularity of h (hours), the data to be synchronized in the time series database (the data of one time period) is split by hour into h1, h2, ..., hk, where k is the number of blocks and hk is the k-th block of data. The blocks are then synchronized into the corresponding topics of kafka, that is, h1 -> topic1, h2 -> topic2, ..., hk -> topick, where topick is the topic that receives the k-th block hk. See Table 2:

Table 2 Time series data table

It should be noted that, according to the preset blocking strategy, the data in the table can be divided into two blocks. The first and second records fall into the first block h1, which covers the time range 2023-02-23 10:00:00 to 2023-02-23 10:59:59, and the corresponding partition in the target table may be set to pt=2023-02-23-10. The third and fourth records fall into the second block h2, which covers the time range 2023-02-23 11:00:00 to 2023-02-23 11:59:59, and the corresponding partition in the target table may be set to pt=2023-02-23-11. Here pt refers to the partition field in the target table, and t is 1, 2, and so on.
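The hourly blocking and the block-to-topic assignment can be sketched as follows. The four sample records are made-up values chosen only to reproduce the two-block split described above, and the helper names are hypothetical; a real deployment would send each block to kafka with a producer instead of keeping it in a dictionary.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(ts: datetime) -> str:
    """Hourly partition value in the pt=YYYY-MM-DD-HH form used in the text."""
    return ts.strftime("%Y-%m-%d-%H")


def split_into_blocks(records):
    """Group (timestamp, value) records into hourly blocks, ordered by time."""
    blocks = defaultdict(list)
    for ts, value in records:
        blocks[partition_key(ts)].append((ts, value))
    # Zero-padded keys sort chronologically as plain strings.
    return dict(sorted(blocks.items()))


def assign_topics(blocks, prefix="topic"):
    """Map the i-th block to the i-th topic: h1 -> topic1, h2 -> topic2, ..."""
    return {pt: f"{prefix}{i}" for i, pt in enumerate(blocks, start=1)}


# Made-up sample data: two records in the 10:00 hour, two in the 11:00 hour.
records = [
    (datetime(2023, 2, 23, 10, 5), 1.0),
    (datetime(2023, 2, 23, 10, 40), 1.2),
    (datetime(2023, 2, 23, 11, 10), 0.9),
    (datetime(2023, 2, 23, 11, 55), 1.1),
]
blocks = split_into_blocks(records)
topics = assign_topics(blocks)
```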

In addition, several read operations can be started simultaneously, depending on the available system resources, so that the blocks are processed concurrently; that is, several blocks of data are synchronized into the target table at a time.
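One possible way to run the block reads concurrently is a thread pool, sketched below. `read_block` is a hypothetical stand-in for a time-range query against the source table and returns fabricated placeholder rows; `max_workers` plays the role of the system resource limit.

```python
from concurrent.futures import ThreadPoolExecutor


def read_block(pt):
    """Hypothetical reader: stands in for querying one hourly block of the source table."""
    return pt, [f"{pt}-row{i}" for i in range(3)]


def read_blocks_concurrently(partition_keys, max_workers=4):
    """Read several blocks in parallel; results keep the order of partition_keys."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        return dict(pool.map(read_block, partition_keys))


result = read_blocks_concurrently(["2023-02-23-10", "2023-02-23-11"])
```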

S103, generating a plurality of partitions by using a preset statement processing target table, and synchronizing a plurality of pieces of synchronous data in the kafka into the plurality of partitions in a one-to-one correspondence manner.

The preset statement is an SQL table creation statement. The partitions of the target table are likewise divided by time period. It will be appreciated that the target table contains a plurality of data tables, and that generating the partitions divides the data tables in the target table among the partitions.

In a specific implementation, the target table is processed with an SQL (Structured Query Language) statement to generate a plurality of partitions; each topic in the kafka is then connected to its corresponding partition of the target table, and the data of the different topics in the kafka is written into the different partitions of the target table until the data of the last partition has been synchronized.
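The generation of the table creation and partition statements can be sketched as follows. The sketch assumes Hive-style SQL for the target platform, which the text does not name, and the type mapping `TYPE_MAP` is a hypothetical stand-in for the configuration mapping table; the table and field names are also illustrative.

```python
# Hypothetical mapping from source field types to target column types.
TYPE_MAP = {
    "timestamp": "TIMESTAMP",
    "float": "DOUBLE",
    "integer": "BIGINT",
    "string": "STRING",
    "boolean": "BOOLEAN",
}


def build_create_table(target, source_fields, partition_field="pt"):
    """Build a Hive-style CREATE TABLE statement with a time partition field."""
    cols = ",\n  ".join(f"{name} {TYPE_MAP[ftype]}" for name, ftype in source_fields)
    return (
        f"CREATE TABLE IF NOT EXISTS {target} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({partition_field} STRING)"
    )


def build_add_partition(target, pt_value, partition_field="pt"):
    """Build the statement that adds one time partition to the target table."""
    return (
        f"ALTER TABLE {target} ADD IF NOT EXISTS "
        f"PARTITION ({partition_field}='{pt_value}')"
    )


ddl = build_create_table("sensor_dst", [("ts", "timestamp"), ("temperature", "float")])
add_pt = build_add_partition("sensor_dst", "2023-02-23-10")
```

The statements are only built as strings here; executing them requires a connection to the target platform, and identifiers coming from untrusted input would need validation first.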

S104, judging whether data is being written to the source table at the moment the synchronization task request is acquired, and if no data is being written to the source table at that moment, judging whether the synchronization data is successfully synchronized into the partitions by using a preset second checking strategy;

S105, under the second checking strategy, the synchronization of the synchronization data is successful if the data amounts of the processed blocks of synchronization data, the data amounts of the corresponding blocks of data in the kafka and the data amounts of the data in the corresponding partitions are equal in one-to-one correspondence.

The second checking strategy distinguishes three cases:

1. If $s(i) = z(i) = m(i)$ for every $i \in \{1, \dots, k\}$, the synchronization of the synchronization data is successful. It is worth noting that the data amount of each processed block of synchronization data is compared with the data amount in the corresponding topic and with the data amount in the corresponding partition; when the three are equal for every block, the synchronization data has been synchronized successfully. For example, when the synchronization data to be synchronized is divided into 10 blocks, the data amounts of the 10 blocks are compared one by one with the data amounts in the corresponding 10 topics and in the corresponding 10 partitions; when all of them are equal, the synchronization is successful, and the data in all topics in kafka can then be deleted.

2. If $s(i) = z(i) \neq m(i)$ for some block $i$, the synchronization of the synchronization data into the kafka succeeded, but the synchronization from the kafka into the partition failed, and the synchronization data in the kafka that failed to synchronize is re-synchronized into the partition. That is, the data amount of each processed block is compared with the data amount of the corresponding topic and of the corresponding partition; when the data amount in a topic is not equal to the data amount in the corresponding partition, but is equal to the data amount of the corresponding block, the synchronization of that topic to its partition has failed. In that case only the data in the topic needs to be re-synchronized to the corresponding partition, after which the check is repeated; the block does not need to be synchronized to the topic again, which shortens the workflow and improves efficiency. For example, when the synchronization data is divided into 10 blocks and the data amount of the fifth block equals that of the fifth topic, but the data amount of the fifth topic does not equal that of the fifth partition, the synchronization of the fifth topic to the fifth partition has failed, so only the data in the fifth topic needs to be re-synchronized to the fifth partition.

3. If $s(i) \neq z(i)$ for some block $i$, the synchronization of the processed synchronization data into the kafka failed; the synchronization data that failed to synchronize into the kafka is re-synchronized into the kafka and then re-synchronized into the corresponding partition. It should be noted that when the data amount in a topic is equal to the data amount in the corresponding partition but not equal to the data amount of the corresponding block, the topic was synchronized to the partition successfully, but the block was synchronized to the topic incorrectly, so the data in the partition is also incorrect. The block therefore needs to be re-synchronized to the corresponding topic, the data in the topic then synchronized to the corresponding partition, and the three data amounts checked again for consistency. For example, when the synchronization data is divided into 10 blocks and the data amount of the fifth block is not equal to that of the fifth topic, the synchronization of the fifth block to the fifth topic has failed; the fifth block is synchronized to the fifth topic again, the data in the fifth topic is re-synchronized to the fifth partition, and the data amounts of the fifth block, the fifth topic and the fifth partition are then compared again.

In the second checking strategy, the data amount of each block of synchronization data, the data amount in the corresponding topic and the data amount in the corresponding partition are compared one by one. In the first case, every block of synchronization data has been successfully synchronized to the corresponding partition of the target table. In the second case, a block was synchronized to its topic successfully, but the synchronization from the topic to the corresponding partition failed; the data in that topic needs to be re-synchronized to the partition, after which the three-way check is repeated. In the third case, the synchronization of a block to its topic failed, so the block needs to be re-synchronized to the topic and then re-synchronized to the corresponding partition.
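The three cases of the second checking strategy can be sketched as a per-block classification, with s, z and m holding the data amounts of the blocks, the topics and the partitions. The names `Action`, `check_block` and `second_check` are hypothetical. The sketch also assumes that a block whose three amounts all differ is handled like the third case, which the text does not state explicitly.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    RESYNC_PARTITION = "resync topic -> partition"
    RESYNC_KAFKA = "resync block -> topic -> partition"


def check_block(s_i: int, z_i: int, m_i: int) -> Action:
    """Classify one block from its block, topic and partition data amounts."""
    if s_i != z_i:
        # Case 3: the block did not reach the topic correctly.
        return Action.RESYNC_KAFKA
    if z_i != m_i:
        # Case 2: the topic is correct, but the partition is not.
        return Action.RESYNC_PARTITION
    # Case 1: all three amounts agree.
    return Action.OK


def second_check(s, z, m):
    """Apply the three-way comparison to every block, in order."""
    if not (len(s) == len(z) == len(m)):
        raise ValueError("s, z and m must describe the same number of blocks")
    return [check_block(a, b, c) for a, b, c in zip(s, z, m)]


actions = second_check([10, 20, 30], [10, 20, 29], [10, 19, 29])
success = all(a is Action.OK for a in actions)
```

Only the blocks that are not `Action.OK` need to be re-synchronized, after which the same check can be run again on those blocks.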

In summary, whereas existing incremental synchronization can only be performed through an auto-increment primary key, the present method uses a time partition and blocking strategy and does not need to create an auto-increment primary key, which reduces data loss, improves the accuracy of synchronization, and is better suited to time series databases. The second checking strategy, which compares the data block by block, improves both the accuracy of data transmission and the accuracy of verification.

Example 2

As shown in fig. 3 and 4, a second embodiment of the present invention provides an incremental data synchronization method, which differs from the incremental data synchronization method of the first embodiment in that it comprises steps S301 to S308:

S301, acquiring a synchronization task request of a time series database, and calculating the synchronization data in the synchronization task request.

S302, processing the synchronous data by using a preset partitioning strategy, and connecting kafka, and synchronizing the processed synchronous data in a plurality of blocks into the kafka;

wherein, before the synchronization data is synchronized into the kafka, the kafka is processed to generate a plurality of topics, and then the processed synchronization data of a plurality of blocks are synchronized into the corresponding topics.

S303, generating a plurality of partitions by using a preset statement processing target table, connecting the kafka, and synchronizing a plurality of blocks of synchronous data in the kafka into the plurality of partitions in a one-to-one correspondence manner.

S304, judging whether data is being written to the source table at the time the synchronous task request is acquired, and if so, judging whether the synchronous data is successfully synchronized into the partition by utilizing a preset first verification strategy;

It should be noted that multiple devices may be continuously writing data into the source table at the same time. The blocking policy takes the current time as the end time when dividing the data into blocks, yet data may still be written into the source table at that moment. It is therefore necessary to determine whether data is being continuously written into the source table: if data is still being written at the current time while the data of a time period is being fetched, selecting the second checking strategy would lead to an erroneous judgment.

In the implementation, firstly, whether data is continuously written into a source table is judged, and if so, whether the synchronous data is successfully synchronized into the partition is judged by utilizing a preset first verification strategy.

The first verification policy is used for

1. If it is

The synchronization of the synchronous data is successful. It is worth noting that the source-side data amount of the last block of synchronous data does not need to be compared, whereas that of the remaining blocks does. For the last block, only the data amount in the corresponding topic is compared with the data amount in the corresponding partition: if they are the same, the synchronization is successful; if not, the data in the topic is synchronized to the corresponding partition again. The remaining blocks are judged according to the first verification strategy. This is because data is continuously written at the current time: if the data amount of the last block of synchronous data in the source table (the block corresponding to the current time) were calculated again for comparison, it would be larger than the amount calculated previously, resulting in an error in judgment. For example, when there are 10 blocks of synchronous data to be synchronized, i.e. k is 10, the current time corresponds to the 10th block of synchronous data; because data is continuously written into the source table at the current time, only the data amount in the 10th topic needs to be compared with the data amount in the 10th partition.

2. If it is

The synchronization is unsuccessful: the synchronous data was successfully synchronized into the kafka, but synchronizing the synchronous data in the kafka into the partition failed, so the synchronous data in the kafka that failed to synchronize is re-synchronized into the partition.

3. If it is

The synchronization is unsuccessful: the processed synchronous data failed to synchronize into the kafka, so the synchronous data that failed is re-synchronized into the kafka and then re-synchronized into the corresponding partition.
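The special handling of the last block can be sketched as follows. This is an illustration under the assumption that the three conditions are equality tests on row counts, as the surrounding prose suggests; the function name `first_check` and the action labels are hypothetical.

```python
def first_check(source_counts, topic_counts, partition_counts):
    """Check k blocks while the source table is still being written.

    Blocks 1..k-1 are compared three ways (source, topic, partition).
    For block k only topic and partition are compared, because the source
    count for the current time period may have grown since it was read.
    """
    k = len(source_counts)
    actions = []
    for i in range(k):
        s, t, p = source_counts[i], topic_counts[i], partition_counts[i]
        is_last = i == k - 1
        if not is_last and s != t:
            actions.append("resync_topic")      # source -> topic -> partition
        elif t != p:
            actions.append("resync_partition")  # topic -> partition
        else:
            actions.append("ok")
    return actions


# The source count of the last block has grown from 30 to 35 since it was read;
# the check still passes because only topic and partition are compared for it.
growing_source = first_check([10, 20, 35], [10, 20, 30], [10, 20, 30])

# A failure in block 2 (topic incomplete) and in block 3 (partition incomplete).
with_failures = first_check([10, 20, 35], [10, 19, 30], [10, 19, 29])
```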

Most of the time sequence databases use time+tags (time is a time field of data, tags is a tag column field) as a primary key, and performing incremental data synchronization only through the time field may result in partial data loss, for example:

2023-02-23 11:28:45 serverA us_west 0.67

2023-02-23 11:28:45 serverB us_west 0.68

2023-02-23 11:28:45 serverC us_west 0.69

...

At 2023-02-23 11:28:45, different devices write data to the time sequence database. If the time node alone is used as the data screening condition, data written later with the same timestamp may not be successfully synchronized to the target table. Therefore, each time an incremental data synchronization task is started, the data of the first k-1 blocks is appended to the first k-1 partitions of the target table, and the data of the kth partition is updated by overwriting, so as to ensure the consistency of the data.

During data verification, the data amount of the last block of synchronous data in the source table is not checked; only the data amount of the last topic in the kafka and the data amount of the last partition are compared, and if they are the same, the data synchronization is considered successful. This is because new data may have been generated by the time the source table is queried for the last block after the synchronization task finishes (several writers may be writing to the source table at the same time), which would make the checked data amounts differ. Moreover, the data of the last block is synchronized again at the next incremental synchronization and overwrites the original data, so all three check values are judged only for the first k-1 blocks. In addition, once the data synchronization is judged to be successful, the topic data in the kafka is deleted to free the data space for the next synchronization task. Finally, the timestamp of the last piece of data of this incremental synchronization is stored for the next incremental synchronization. Data synchronization is thus confirmed by checking, at the predefined time granularity, the data in the original time sequence database, the topic data in the kafka, and the partition data in the target table.
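The data-loss scenario and the overwrite remedy can be illustrated with the three sample rows above. This is a simplified in-memory sketch, not the patent's implementation: the target table is modelled as a dictionary keyed by time period, a one-minute period is assumed, and the helper names are hypothetical.

```python
def rows_after(source, last_ts):
    """Time-only incremental filter: rows strictly newer than last_ts."""
    return [r for r in source if r[0] > last_ts]


def rows_in_period(source, start, end):
    """All rows whose timestamp falls in the half-open period [start, end)."""
    return [r for r in source if start <= r[0] < end]


def overwrite_partition(target, key, rows):
    """Replace the whole partition for one time period."""
    target[key] = list(rows)


# First synchronization: only serverA has written at 11:28:45 so far.
source = [("2023-02-23 11:28:45", "serverA", "us_west", 0.67)]
target = {}
period = ("2023-02-23 11:28:00", "2023-02-23 11:29:00")  # assumed granularity
overwrite_partition(target, period, rows_in_period(source, *period))
last_ts = max(r[0] for r in source)

# serverB and serverC then write rows carrying the same timestamp.
source.append(("2023-02-23 11:28:45", "serverB", "us_west", 0.68))
source.append(("2023-02-23 11:28:45", "serverC", "us_west", 0.69))

# Filtering on time alone misses both late rows.
missed_by_time_filter = rows_after(source, last_ts)

# Re-reading the last period and overwriting its partition recovers them.
overwrite_partition(target, period, rows_in_period(source, *period))
```

Overwriting rather than appending also avoids duplicating the serverA row that was already synchronized.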

S305, if no data is being written to the source table at the time the synchronous task request is acquired, judging whether the synchronous data is successfully synchronized by utilizing a preset second checking strategy.

S306, constructing a state database, and recording the table name of the source table, the processed synchronous data of a plurality of blocks, the name of the kafka, the data volume stored by the kafka, the table name of the target table, the data volume of each partition of the target table, a synchronous result mark and a timestamp corresponding to the last block of synchronous data into the state database;

When a synchronous task request is acquired, the plurality of data tables of the source table are first acquired, and the synchronous data corresponding to each data table of the source table is calculated. Here the synchronous data represents the data increment that needs to be synchronized for each data table of the source table; the target table likewise has a plurality of data tables.

In a specific implementation, the table name of each table of the source table, the corresponding data amount after the synchronization data of each table is divided, the name of the topic in the kafka, the data amount stored by each topic, the table name of each table of the target table, the data amount of each partition of the table of the target table, the corresponding synchronization result flag, and the last timestamp corresponding to each partition are recorded into a state database, as shown in table 3.

TABLE 3 synchronous task result State Table
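As the contents of table 3 are not reproduced in this text, the following is only a sketch of what such a state table could look like, using SQLite for illustration. The column names, the table name `sync_state`, and the helper functions are assumptions derived from the fields listed in the paragraph above, not the patent's actual schema.

```python
import sqlite3


def create_state_db(conn):
    """Create a hypothetical state table with one row per synchronized block."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sync_state (
               source_table    TEXT,
               block_index     INTEGER,
               source_count    INTEGER,
               topic_name      TEXT,
               topic_count     INTEGER,
               target_table    TEXT,
               partition_name  TEXT,
               partition_count INTEGER,
               sync_ok         INTEGER,  -- synchronization result flag (1 = success)
               last_timestamp  TEXT      -- last data timestamp in the block
           )"""
    )


def record_block(conn, row):
    """Insert one block's result; row must match the column order above."""
    conn.execute("INSERT INTO sync_state VALUES (?,?,?,?,?,?,?,?,?,?)", row)


def last_synced_timestamp(conn, source_table):
    """Return the newest recorded timestamp for a source table, or None."""
    cur = conn.execute(
        "SELECT MAX(last_timestamp) FROM sync_state WHERE source_table = ?",
        (source_table,),
    )
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
create_state_db(conn)
record_block(conn, ("cpu", 1, 10, "cpu_1", 10, "cpu_t", "p1", 10, 1, "2023-02-23 11:27:59"))
record_block(conn, ("cpu", 2, 3, "cpu_2", 3, "cpu_t", "p2", 3, 1, "2023-02-23 11:28:45"))
newest = last_synced_timestamp(conn, "cpu")
```

Storing timestamps in a sortable text format keeps `MAX` meaningful; a production design might prefer a native timestamp type.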

S307, re-executing the synchronous task request of the acquired time sequence database every preset time interval, and calculating synchronous data in the synchronous task request;

It should be noted that, according to the preset time, the synchronization of the data increment in the source table may be repeated.

S308, acquiring the time period corresponding to the last block of synchronous data in the previous synchronous task request, taking that time period as the starting time period of the current synchronous task, and overwriting the last block of synchronous data of the previous task in the partition with the first block of synchronous data of the current synchronous task;

The timestamp corresponding to the last block of synchronous data recorded in the state database helps determine the start time of the next synchronous task. Since the synchronous data is divided by time period, the last block of synchronous data is the data of the last time period.

In a specific implementation, the timestamp corresponding to the last block of synchronous data of the previous synchronous task is first obtained from the state database. The time period corresponding to that timestamp is taken as the starting time period of the current synchronous task, i.e. the first time period of the current task, and the data corresponding to this first time period can be understood as the first block of synchronous data of the current task. The last block of data of the previous synchronous task is then replaced by the first block of synchronous data of the current task. It should be noted that if data was still being written while the last block of the previous task was synchronized, that block is incomplete; unless it is overwritten by the current synchronous task, the last block remains in error and the synchronized data amount is wrong. Overwriting the last block of data therefore improves the accuracy of data synchronization.
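How the stored timestamp determines the blocks of the next task can be sketched as below. The fixed period length, the alignment of periods to the Unix epoch, and the function names are assumptions made for illustration; the patent's own blocking policy is defined elsewhere in the description.

```python
from datetime import datetime, timedelta


def floor_to_period(ts: datetime, period: timedelta) -> datetime:
    """Round ts down to the start of the period that contains it."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // period) * period


def plan_blocks(last_ts: datetime, now: datetime, period: timedelta):
    """Return (start, end) time blocks for the next synchronization task.

    The first block is the period containing last_ts, so it is read again
    and can overwrite the last partition written by the previous task.
    """
    start = floor_to_period(last_ts, period)
    blocks = []
    while start <= now:
        blocks.append((start, start + period))
        start += period
    return blocks


blocks = plan_blocks(
    last_ts=datetime(2023, 2, 23, 11, 28, 45),
    now=datetime(2023, 2, 23, 11, 30, 10),
    period=timedelta(minutes=1),
)
```

With these inputs the plan covers 11:28, 11:29, and 11:30; the 11:28 block repeats the last period of the previous task and is the one to overwrite.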

Compared with the first embodiment, the second embodiment of the present invention has the following advantages: a state database is provided to record the state of the intermediate synchronization process; overwriting the last block of data improves the accuracy of data synchronization; and the first checking strategy or the second checking strategy can be selected by judging in advance whether data is being continuously written into the source table. When the second checking strategy is selected, no data was being written at the last moment, so the last block of synchronous data does not need to be overwritten, which removes the overwriting step and improves working efficiency.

Example III

As shown in fig. 5, in a third embodiment of the present invention, there is provided an incremental data synchronization system including:

the calculation module 10 is configured to obtain a synchronous task request of the time sequence database, and calculate synchronous data in the synchronous task request.

And a blocking module 20, configured to process the synchronization data by using a preset blocking policy, and connect kafka, and synchronize the processed blocks of the synchronization data to the kafka.

And a processing module 30, configured to generate a plurality of partitions by using a preset statement processing target table, and connect the kafka, and synchronize a plurality of pieces of the synchronization data in the kafka into a plurality of the partitions in a one-to-one correspondence manner.

The judging module 40 is configured to judge whether data is being written to the source table at the time the synchronous task request is acquired, and if not, judge whether the synchronous data is successfully synchronized to the partition by using a preset second checking policy;

a corresponding module 50, configured to indicate that the synchronization of the synchronization data is successful if the data amount of the processed plurality of pieces of synchronization data and the data amount of the plurality of pieces of data in the kafka are equal to the data amounts of the data in the plurality of partitions in a one-to-one correspondence manner;

a construction module 60, configured to construct a state database, and record the table name of the source table, the processed synchronized data of the plurality of blocks, the name of the kafka, the data amount stored by the kafka, the table name of the target table, the data amount of each partition of the target table, the synchronization result flag, and the timestamp corresponding to the last block of the synchronized data into the state database.

In some alternative embodiments, the determining module 40 includes:

a first resynchronization unit for, if unsuccessful

A second resynchronization unit for, if unsuccessful

In some alternative embodiments, the system further comprises:

if it is

The synchronization of the synchronous data is successful;

if it is

Unsuccessful;

if it is

Unsuccessful;

In some alternative embodiments, the blocking module 20 includes:

In some alternative embodiments, the system further comprises:

The incremental data synchronization system provided in the embodiment of the present invention has the same implementation principle and technical effects as those of the foregoing method embodiment, and for brevity, reference may be made to corresponding contents in the foregoing method embodiment where the system embodiment is not mentioned.

Example IV

As shown in fig. 6, in a fourth embodiment of the present invention, a computer according to the present invention includes a memory 202, a processor 201, and a computer program stored in the memory 202 and executable on the processor 201, where the processor 201 implements the incremental data synchronization method as described above when executing the computer program.

In particular, the processor 201 may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.

Memory 202 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 202 may comprise a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of the foregoing. Memory 202 may include removable or non-removable (or fixed) media, where appropriate. The memory 202 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 202 is a non-volatile memory. In a particular embodiment, the memory 202 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Electrically Alterable Read-Only Memory (EAROM), or a flash memory (FLASH), or a combination of two or more of these. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), Extended Data Out Dynamic Random-Access Memory (EDODRAM), Synchronous Dynamic Random-Access Memory (SDRAM), or the like, as appropriate.

Memory 202 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 201.

The processor 201 implements the incremental data synchronization method described above by reading and executing computer program instructions stored in the memory 202.

In some of these embodiments, the computer may also include a communication interface 203 and a bus 200. As shown in fig. 6, the processor 201, the memory 202, and the communication interface 203 are connected to each other through the bus 200 and complete communication with each other.

The communication interface 203 is configured to enable communication between modules, apparatuses, units, and/or devices in embodiments of the present application. The communication interface 203 may also enable data communication with other components, such as external equipment, image/data acquisition equipment, databases, external storage, and image/data processing workstations.

Bus 200 includes hardware, software, or both, coupling components of a computer to each other. Bus 200 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 200 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 200 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.

Example five

In a fifth embodiment of the present invention, in combination with the above incremental data synchronization method, the embodiment of the present invention provides a storage medium, where a computer program is stored on the storage medium, and the computer program implements the above incremental data synchronization method when executed by a processor.

Those of skill in the art will appreciate that the logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that may be considered to implement the logic functions, may be embodied in any computer readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.

The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims

1. A method of incremental data synchronization, comprising:

2. The incremental data synchronization method of claim 1, wherein after the step of determining whether data is being written to the source table at the time the synchronization task request is acquired, the method further comprises:

If it is

The synchronization of the synchronous data is successful;

if it is

Unsuccessful;

if it is

Unsuccessful;

3. The incremental data synchronization method of claim 1, wherein after the step of determining whether the synchronization data was successfully synchronized into the partition using a preset second checking strategy, the method further comprises:

if not successful, and

re-synchronizing the synchronization data in the kafka that failed to synchronize into the partition;

If not successful, and

4. The incremental data synchronization method of claim 1 wherein the step of synchronizing the processed pieces of the synchronization data into the kafka comprises:

5. The incremental data synchronization method of claim 1 wherein the method further comprises:

6. The incremental data synchronization method of claim 5 wherein, after the step of re-executing the step of obtaining the synchronization task request of the time series database and calculating the synchronization data in the synchronization task request at each preset time interval, the method further comprises:

7. An incremental data synchronization system, comprising:

8. A computer comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the incremental data synchronization method of any one of claims 1 to 6 when the computer program is executed by the processor.

9. A storage medium having stored thereon a computer program which, when executed by a processor, implements the incremental data synchronization method of any one of claims 1 to 6.