WO2017050177A1 - Data synchronization method and device (一种数据同步方法和装置) - Google Patents

Data synchronization method and device (一种数据同步方法和装置)

Info

Publication number
WO2017050177A1
WO2017050177A1 · PCT/CN2016/099055 · CN2016099055W
Authority
WO
WIPO (PCT)
Prior art keywords
task, processing, job, machine, tasks
Prior art date
Application number
PCT/CN2016/099055
Other languages
English (en)
French (fr)
Inventor
罗海伟
邓小勇
陈守元
刘鹏
张轩丞
Original Assignee
阿里巴巴集团控股有限公司
罗海伟
邓小勇
陈守元
刘鹏
张轩丞
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 罗海伟, 邓小勇, 陈守元, 刘鹏, 张轩丞 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2017050177A1 publication Critical patent/WO2017050177A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Definitions

  • the present application relates to the field of data synchronization technologies, and in particular, to a data synchronization method and a data synchronization device.
  • Data synchronization is the process of establishing data consistency between data sources and data destinations in a production system.
  • According to the synchronization object, data synchronization can be divided into database synchronization, file synchronization, and so on; according to the synchronization strategy, it can be divided into offline synchronization and real-time synchronization. Offline synchronization extracts snapshots of the data in a storage system and is often used in scenarios such as BI (Business Intelligence) analysis and data migration.
  • Existing solutions often implement the above offline synchronization through an ETL (Extract, Transform, Load) tool. The corresponding data synchronization process may specifically include: extracting online data to an offline analysis platform such as ODPS (Open Data Processing Service) or Hadoop, performing big-data analysis and computation on that platform, and, after the computation is completed, loading the results into an online database for use by foreground business systems.
  • Embodiments of the present application have been made in order to provide a data synchronization method and a corresponding data synchronization apparatus that overcome the above problems or at least partially solve them, avoiding problems such as saturated network card traffic on the machines running the above tasks and thereby improving the stability of the production system.
  • A data synchronization method, including: dividing a job into a plurality of tasks, wherein each task has a corresponding processing unit; monitoring the traffic speed of a task during its processing; and, when the traffic speed of the task meets a preset condition, performing dormancy processing on the processing unit of the task.
  • Optionally, the traffic speed of the task meeting the preset condition specifically means that the byte traffic speed of the task exceeds a first threshold and/or the record traffic speed of the task exceeds a second threshold.
  • Optionally, the step of performing dormancy processing on the processing unit of the task includes: determining a sleep duration for the processing unit of the task according to the traffic speed of the task, the monitoring period, and the upper limit of the traffic speed; and controlling the processing unit of the task to enter a sleep state maintained for the sleep duration.
  • Optionally, the method further includes: creating a resource group for a user, the resource group including at least one machine; and processing the user's job by using the resources of the machines in the resource group, wherein the dormancy processing of the processing unit of the task is performed by the machines in the resource group.
  • Optionally, the step of processing the user's job by the machines in the resource group includes: when the remaining resources of the machines in the user's resource group exceed the resources required by the user's job, processing the user's job by the machines in the resource group.
  • Optionally, the resources of a machine include: slot resources obtained by abstracting the physical resources of the machine.
  • Optionally, the method further includes: when the database has an available concurrent connection, processing the task by using the concurrent connection; and releasing the concurrent connection occupied by the task after the task is completed.
  • Optionally, the processing unit comprises a read unit and a write unit, and the step of processing the task by using the concurrent connection includes: performing out-of-order processing on the plurality of read units and the plurality of write units corresponding to the job, and selecting a read unit and a write unit from the out-of-order processing results to form a corresponding task.
  • Optionally, the method further comprises: performing the processing of the task when the current time is within the time window of the database corresponding to the task.
  • Optionally, the step of dividing the job into multiple tasks comprises: dividing a column interval into a plurality of sub-intervals according to the minimum and maximum values of a column of the data table corresponding to the job; or determining split points of a file according to the size of the file corresponding to the job and the number of chunks, and splitting the file according to the split points.
  • The present application also discloses a data synchronization device, including:
  • a segmentation module configured to divide a job into a plurality of tasks, wherein each task has a corresponding processing unit;
  • a monitoring module configured to monitor the traffic speed of a task during the processing of the task; and
  • a dormancy module configured to perform dormancy processing on the processing unit of the task when the traffic speed of the task meets a preset condition.
  • Optionally, the traffic speed of the task meeting the preset condition specifically means that the byte traffic speed of the task exceeds a first threshold and/or the record traffic speed of the task exceeds a second threshold.
  • Optionally, the dormancy module includes:
  • a determining sub-module configured to determine a sleep duration for the processing unit of the task according to the traffic speed of the task, the monitoring period, and the upper limit of the traffic speed; and
  • a control sub-module configured to control the processing unit of the task to enter a sleep state maintained for the sleep duration.
  • Optionally, the device further comprises: a job processing module configured to process the user's job by using the resources of the machines in a resource group created for the user, wherein the dormancy processing of the processing unit of the task is performed by the machines in the resource group.
  • Optionally, the job processing module includes: a condition processing sub-module configured to process the user's job by the machines in the resource group when the remaining resources of the machines in the user's resource group exceed the resources required by the user's job.
  • Optionally, the resources of a machine include: slot resources obtained by abstracting the physical resources of the machine.
  • Optionally, the device further comprises:
  • a first task processing module configured to process the task by using the concurrent connection when the database has an available concurrent connection; and
  • a release module configured to release the concurrent connection occupied by the task after the task processing is completed.
  • Optionally, the processing unit comprises a read unit and a write unit, and the first task processing module includes:
  • an out-of-order processing sub-module configured to perform out-of-order processing on the plurality of read units and the plurality of write units corresponding to the job; and
  • a combination sub-module configured to select a read unit and a write unit from the out-of-order processing results to form a corresponding task.
  • Optionally, the device further comprises: a second task processing module configured to perform the processing of the task when the current time is within the time window of the database corresponding to the task.
  • Optionally, the segmentation module comprises:
  • a first segmentation sub-module configured to divide a column interval into a plurality of sub-intervals according to the minimum and maximum values of a column of the data table corresponding to the job; and
  • a second segmentation sub-module configured to determine split points of a file according to the size of the file corresponding to the job and the number of chunks, and to split the file according to the split points.
  • The embodiment of the present application divides a job into multiple tasks, each of which may have a corresponding processing unit; the traffic speed of each task is monitored during processing, and when the traffic speed of a task meets the preset condition, dormancy processing is performed on the processing unit of that task. Because the processing unit is dormant, it pauses data synchronization for the task, thereby reducing the network card traffic of the machine running the task, avoiding problems such as saturated network card traffic on that machine, and thus improving the stability of the production system.
  • Moreover, the embodiment of the present application can separately monitor the traffic speed of each task and perform differentiated flow control on different tasks according to the monitoring results, so that the flow control of different tasks is mutually independent. For example, among the multiple tasks of the same job, the traffic speed of some tasks may exceed the threshold, so their processing units can be made dormant to relieve the pressure on the network card traffic and memory resources of the machines running them, while the processing units of tasks whose traffic speed does not exceed the threshold are not made dormant. The embodiment of the present application can therefore improve the rationality of the flow control.
  • FIG. 1 is a schematic structural diagram of a data synchronization system of the present application;
  • FIG. 2 is a flow chart of the steps of Embodiment 1 of a data synchronization method according to the present application;
  • FIG. 3 is a flow chart of the steps of Embodiment 2 of a data synchronization method according to the present application;
  • FIG. 4 is a flow chart of the steps of Embodiment 3 of a data synchronization method according to the present application;
  • FIG. 5 is a flow chart of the steps of Embodiment 4 of a data synchronization method according to the present application;
  • FIG. 6 is a flow chart of the steps of Embodiment 5 of a data synchronization method according to the present application;
  • FIG. 7 is a schematic structural diagram of a data synchronization system of the present application;
  • FIG. 8 is a schematic diagram of a state machine for data synchronization of the present application;
  • FIG. 9 is a structural block diagram of Embodiment 1 of a data synchronization apparatus according to the present application;
  • FIG. 10 is a structural block diagram of Embodiment 2 of a data synchronization apparatus according to the present application;
  • FIG. 11 is a structural block diagram of Embodiment 3 of a data synchronization apparatus according to the present application;
  • FIG. 12 is a structural block diagram of Embodiment 4 of a data synchronization apparatus according to the present application.
  • The embodiments of the present application can be applied to data synchronization between an arbitrary data source and destination, whether the data sources are heterogeneous or homogeneous, and control the traffic speed during the processing of tasks to prevent the machines running those tasks from suffering problems such as saturated network card traffic and memory overflow, thereby improving the stability of the production system.
  • Referring to FIG. 1, a schematic diagram of a data synchronization system of the present application is shown, which may include: a synchronization center 101 and a synchronization engine 102, where the synchronization engine 102 may specifically include a scheduling unit 121 and a processing unit 122.
  • The synchronization center 101 is configured to receive a job submitted by the user and submit that job to the scheduling unit 121.
  • The scheduling unit 121 is configured to divide the job into multiple tasks (Task) and to schedule the execution of those tasks.
  • The processing unit 122 may be configured to process the foregoing tasks.
  • the processing unit 122 may specifically include: a read unit 1221, a channel 1222, and a write unit 1223.
  • The read unit 1221 may be configured to load data from a data source and store it into the channel 1222, which serves as a buffer; the write unit 1223 may be used to read data from the channel 1222 and write the read data to the destination. The channel 1222 thus functions as a read/write buffer, and the size of its memory space can be controlled.
  • For example, a first control parameter transport.channel.capacity can be used to control the number of data records that the channel 1222 can hold, and its value may be 512 or the like; a second control parameter transport.channel.byteCapacity can be used to control the total size in bytes of the data records that the channel 1222 can hold, for example 8 MB of memory space.
  • To ensure the safety of concurrent access, the read and write operations of the channel 1222 need to be locked; and when putting data into or taking data out of the channel 1222, the data may be accessed in batches to avoid the frequent locking caused by accessing each record individually, thereby improving data synchronization performance.
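The batched, lock-protected channel described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name, method names, and default limits are assumptions modeled on the two control parameters transport.channel.capacity (e.g. 512 records) and transport.channel.byteCapacity (e.g. 8 MB).

```python
import threading

class Channel:
    """A bounded read/write buffer between a read unit and a write unit,
    limited both by record count and by total byte size."""

    def __init__(self, capacity=512, byte_capacity=8 * 1024 * 1024):
        self.capacity = capacity            # max number of records held
        self.byte_capacity = byte_capacity  # max total bytes held
        self.records = []
        self.byte_size = 0
        self.cond = threading.Condition()

    def push_batch(self, batch):
        """Put a batch of records; the lock is taken once per batch,
        not once per record."""
        size = sum(len(r) for r in batch)
        with self.cond:
            while (len(self.records) + len(batch) > self.capacity
                   or self.byte_size + size > self.byte_capacity):
                self.cond.wait()            # block until space frees up
            self.records.extend(batch)
            self.byte_size += size
            self.cond.notify_all()

    def pull_batch(self, max_records):
        """Take up to max_records records, blocking while the channel
        is empty."""
        with self.cond:
            while not self.records:
                self.cond.wait()
            batch = self.records[:max_records]
            del self.records[:max_records]
            self.byte_size -= sum(len(r) for r in batch)
            self.cond.notify_all()
            return batch
```

Because push_batch and pull_batch take the lock once per batch, the per-record locking overhead that the text warns about is avoided.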
  • Referring to FIG. 2, a flow chart of the steps of the first embodiment of the data synchronization method of the present application is shown, which may specifically include the following steps:
  • Step 201 Dividing a job into a plurality of tasks; wherein the tasks have corresponding processing units;
  • The job may be the view of the synchronization work presented to the user. In a specific implementation, the job may be divided into multiple tasks according to segmentation logic, and each task may have a corresponding processing unit.
  • In this way, the traffic speed of each task can be monitored separately; since the differences between the processing of different tasks are taken into account, differentiated flow control can be applied according to the traffic speed of each task.
  • For example, assume the job is divided into five tasks: task 1, task 2, task 3, task 4, and task 5, wherein the traffic speeds of task 1 and task 3 meet the preset condition; in that case, dormancy processing can be performed on the processing units of task 1 and task 3, while the processing units of the remaining tasks continue to run.
  • In a specific implementation, the number of tasks may be equal to the ratio of the job speed to the channel speed. The job speed and the channel speed may be specified by the user or determined according to experience; the embodiment of the present application does not limit the job speed, the channel speed, or the specific number of tasks.
  • In addition, the embodiment of the present application can implement concurrent processing of multiple tasks by using the multiple processing threads corresponding to the multiple processing units, which can improve the execution speed of the job as well as the processing efficiency of the job and the efficiency of data synchronization.
  • Technical solution 1 can be applied to application scenarios such as an RDBMS (Relational Database Management System): it divides a column interval into multiple sub-intervals to form the SQL (Structured Query Language) WHERE clause of each task's read SQL; specifically, it may divide the column interval into a plurality of sub-intervals according to the minimum and maximum values of a column of the data table corresponding to the job.
  • For example, suppose the demo table of the relational database MySQL has two columns, id and name, wherein id has a value range of [1, 100] and the number of tasks to be split is 5; the id interval can then be divided into five sub-intervals, such as [1, 20], [21, 40], [41, 60], [61, 80], and [81, 100], each corresponding to the WHERE clause of one task's read SQL.
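The interval splitting in the demo-table example can be sketched as follows; the function names are hypothetical, and the column name "id" is taken from the example above.

```python
def split_column_interval(min_id, max_id, task_count):
    """Split the integer interval [min_id, max_id] into task_count
    contiguous sub-intervals of roughly equal size."""
    total = max_id - min_id + 1
    base, extra = divmod(total, task_count)
    intervals, lo = [], min_id
    for i in range(task_count):
        hi = lo + base - 1 + (1 if i < extra else 0)
        intervals.append((lo, hi))
        lo = hi + 1
    return intervals

def where_clauses(column, intervals):
    """Render one WHERE clause per task's read SQL."""
    return ["{0} >= {1} AND {0} <= {2}".format(column, lo, hi)
            for lo, hi in intervals]
```

For the demo table, split_column_interval(1, 100, 5) yields the five sub-intervals [1, 20] through [81, 100], one per task.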
  • the data of each interval may be concurrently processed by the multi-processing threads corresponding to the plurality of processing units.
  • tasks corresponding to multiple intervals may be grouped into one task group for management.
  • In a specific implementation, a process may be used to manage the tasks of a task group, and a corresponding processing thread is established in the process for each task; the processing thread may be used to perform the operations corresponding to the processing unit. That is, establishing a corresponding processing thread for each task may include: associating one read unit, one write unit, and one channel into one processing thread; generally, each task may have one corresponding processing thread.
  • Technical solution 2 can be applied to application scenarios such as ODPS, local files, and FTP (File Transfer Protocol): it determines the split points of the file corresponding to the job according to the size of the file and the number of chunks, and splits the file according to the split points.
  • For example, the data of a data table of ODPS may be abstracted into a file; the split points of the file are determined according to the size of the file corresponding to the job and the number of chunks, and the file is divided into multiple chunks according to those split points, so that multiple processing threads can perform the reading, buffering, and writing of each chunk.
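The split-point computation of technical solution 2 can be sketched as a simple byte-offset calculation (the function name is illustrative; a real reader would typically also align each split point to a record boundary):

```python
def split_points(file_size, chunk_count):
    """Divide a file of file_size bytes into chunk_count chunks of
    roughly equal size, returned as (start, end) byte ranges."""
    base, extra = divmod(file_size, chunk_count)
    points, offset = [], 0
    for i in range(chunk_count):
        length = base + (1 if i < extra else 0)
        points.append((offset, offset + length))
        offset += length
    return points
```

Each returned range can then be handed to one task, whose processing thread reads only that byte range of the file.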
  • Technical solution 1 and technical solution 2 for dividing a job into a plurality of tasks are described above. It can be understood that a person skilled in the art may adopt either of the above technical solutions according to actual application requirements, or adopt other technical solutions for dividing the job into multiple tasks.
  • For example, for a job in an application scenario of OSS (Object Storage Service), the job may be divided at file granularity. Assuming there are multiple objects in a bucket, an object-name prefix can be configured to indicate the scope of the objects to be pulled; if there are 10 objects to be synchronized, the job corresponding to the bucket can be divided into 10 tasks to achieve concurrent processing of the 10 objects.
  • For another example, a job in an application scenario of OTS (Open Table Service) may be divided according to the data storage partitions of the OTS data.
  • Step 202 Monitor, during the processing of the task, a traffic speed of the task.
  • the traffic of the task may be monitored by reporting, and the current traffic speed of the task may be calculated according to the monitored traffic, where the current traffic speed may be the traffic flowing through the channel per unit time.
  • The process of monitoring the traffic of the task by reporting may be as follows: the task is processed by the processing thread corresponding to the processing unit to implement the data synchronization of the task; the processing thread reports the traffic speed of the task to the corresponding process; and the process reports the traffic speeds of the multiple tasks of the job to the synchronization engine, so that the synchronization engine obtains the traffic speeds of all tasks of all jobs. It can be understood that monitoring the traffic speed of the task by reporting is only an optional embodiment.
  • In fact, those skilled in the art may adopt any technical solution for monitoring the traffic speed of the task according to actual application requirements; the embodiment of the present application does not limit the specific technical solution for monitoring the traffic speed of the task.
  • The traffic speed of the task may specifically include: the byte traffic speed of the task (bytes per second) and/or the record traffic speed of the task (records per second); the byte traffic speed may be expressed as the number of bytes flowing through the channel per unit time, and the record traffic speed may be used to indicate the number of data records read per unit time. It can be understood that the embodiment of the present application does not limit the specific measurement of the traffic speed of the task.
  • Step 203 Perform a sleep process on the processing unit of the task when the traffic speed of the task meets a preset condition.
  • Performing dormancy processing on the processing unit of the task causes the processing unit to suspend data synchronization for the task, thereby reducing the network card traffic of the machine running the task and saving the memory resources of that machine, so as to avoid saturating the network card traffic of the machine running the task and thus improve the stability of the production system. Moreover, by performing dormancy processing on the processing unit of the task, the traffic speed of the task can be kept within the upper limit of the traffic speed, so the stability of the traffic speed can also be improved.
  • In a specific implementation, the process of performing the dormancy processing on the processing unit of the task may include: performing sleep processing on the processing thread corresponding to the processing unit, so that the processing thread suspends its scheduled execution on the CPU (Central Processing Unit).
  • In an optional embodiment of the present application, the traffic speed of the task meeting the preset condition may specifically include: the byte traffic speed of the task exceeds a first threshold, and/or the record traffic speed of the task exceeds a second threshold.
  • In a specific implementation, the first threshold and the second threshold may be specified by the user, or may be determined by the synchronization engine according to empirical values; this embodiment does not limit the specific first threshold and second threshold.
  • the step of performing a dormancy process on the processing unit of the task may specifically include:
  • Step S11 Determine, according to the traffic speed of the task, the monitoring period, and the upper limit of the traffic speed, the sleep duration of the processing unit of the task;
  • Step S12 Control the processing unit of the task to enter a sleep state maintained for the sleep duration.
  • The monitoring period can be used to indicate the period at which the traffic speed of the task is monitored; it can be expressed as the period at which the process reports the traffic speed of the task to the synchronization engine, that is, the interval between two reports. For example, the monitoring period can be 20 milliseconds. It can be understood that the embodiment of the present application does not limit the specific monitoring period.
  • In an example, the formula for determining the sleep duration of the processing unit of the task may be: sleep duration = (traffic speed × monitoring period) / traffic speed upper limit − monitoring period. It can be understood that the embodiment of the present application does not limit the specific way of determining the sleep duration.
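The sleep-duration formula above can be written directly as a small helper (the function name is illustrative; the units only need to be consistent, e.g. bytes per second for the speeds and seconds for the period):

```python
def sleep_duration(speed, period, speed_limit):
    """Sleep duration per the formula in the text:
    (traffic speed * monitoring period) / speed upper limit - monitoring period.
    The result is clamped at zero when the speed is within the limit."""
    return max(0.0, speed * period / speed_limit - period)
```

For example, a task measured at twice its upper limit over a 1-second period sleeps for 1 second, which brings its average speed over the combined interval back down to the limit.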
  • The embodiment of the present application divides a job into multiple tasks, each of which may have a corresponding processing unit, monitors the traffic speed of each task during its processing, and performs dormancy processing on the processing unit of a task when its traffic speed meets the preset condition. Because the processing unit is dormant, it pauses data synchronization for the task, thereby reducing the network card traffic of the machine running the task, saving the memory resources of that machine, and avoiding problems such as saturated network card traffic and memory overflow on the machine, thus improving the stability of the production system.
  • Moreover, the embodiment of the present application can separately monitor the traffic speed of each task and perform differentiated flow control on different tasks according to the monitoring results, so that the flow control of different tasks is mutually independent. For example, among the multiple tasks of the same job, the traffic speed of some tasks may exceed the threshold, so their processing units can be made dormant to relieve the pressure on the network card traffic and memory resources of the machines running them, while the processing units of tasks whose traffic speed does not exceed the threshold are not made dormant. The embodiment of the present application can therefore improve the rationality of the flow control.
  • Referring to FIG. 3, a flow chart of the steps of the second embodiment of the data synchronization method of the present application is shown; this embodiment may be applied to the synchronization process of the synchronization engine, and may specifically include the following steps:
  • Step 301 Dividing a job into a plurality of tasks, wherein the tasks have corresponding processing units;
  • Step 302 Obtain the current report content, where the report content is the current traffic of multiple tasks of the job reported by the processing process of the job to the synchronization process;
  • Step 303 Determine whether the interval between the current time and the last reporting time reaches the monitoring period; if so, execute step 304; otherwise, return to step 302;
  • Step 304 Determine whether the format of the current traffic is a byte or a record.
  • Step 305 When the format of the current traffic is byte, calculate the current byte traffic speed according to the current reporting content, the last reporting content, and the time interval between the two reporting contents.
  • Step 306 Determine whether the current byte traffic speed exceeds the first threshold; if so, execute step 307; otherwise, return to step 302;
  • Step 307 Determine the sleep duration of the processing unit of the task according to the current byte traffic speed of the task, the time interval between the two reports, and the first threshold;
  • Step 308 When the format of the current traffic is record, calculate the current recorded traffic speed according to the current report content, the last report content, and the time interval between the two report contents;
  • Step 309 Determine whether the current recorded traffic speed exceeds the second threshold; if so, execute step 310; otherwise, return to step 302;
  • Step 310 Determine the sleep duration of the processing unit of the task according to the current recorded traffic speed of the task, the time interval between the two reports, and the second threshold;
  • Step 311 Control the processing unit of the task to enter a sleep state maintained for the sleep duration.
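The byte-format branch of the loop above (steps 303 through 307) can be condensed into one decision function. This is a sketch with illustrative names; the record-format branch (steps 308 through 310) is identical with the second threshold in place of the first.

```python
def flow_control_step(prev_bytes, cur_bytes, prev_time, cur_time,
                      monitoring_period, byte_limit):
    """Return the sleep duration for one monitoring step,
    or 0.0 when no dormancy is needed."""
    interval = cur_time - prev_time
    if interval < monitoring_period:          # step 303: period not reached
        return 0.0
    speed = (cur_bytes - prev_bytes) / interval   # step 305: current speed
    if speed <= byte_limit:                   # step 306: within threshold
        return 0.0
    # step 307: same formula as before, with the report interval as period
    return speed * interval / byte_limit - interval
```

A task that transferred 2000 bytes in a 1-second interval against a 1000 bytes-per-second limit would be put to sleep for 1 second; one that stayed under the limit, or whose report arrived before a full monitoring period elapsed, is left running.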
  • Referring to FIG. 4, a flow chart of the steps of the third embodiment of the data synchronization method of the present application is shown, which may specifically include the following steps:
  • Step 401 Create a resource group for a user, where the resource group may specifically include at least one machine.
  • Step 402 Process the user's job by using the resources of the machines in the resource group, where the step of processing the user's job by using the machines in the resource group may specifically include:
  • Step 421 The machine in the resource group is used to divide the job into multiple tasks; wherein the task has a corresponding processing unit;
  • Step 422 During the processing of the task, use a machine in the resource group to monitor a traffic speed of the task.
  • Step 423 When the traffic speed of the task meets a preset condition, use the machines in the resource group to perform dormancy processing on the processing unit of the task.
  • On the basis of monitoring the traffic speed of tasks and performing dormancy processing, this embodiment also creates a resource group for each user and processes the user's jobs by using the resources of the machines in that resource group, so that resources are isolated between users and interference between the jobs of different users is avoided. Since cloud computing usually requires resource isolation between multiple users (two unrelated users or tenants), and one user's occupation of resources should not affect another user's use of resources, this embodiment is applicable to cloud computing scenarios.
  • In the embodiment of the present application, the resources of the machine may specifically include: slot resources obtained by abstracting the physical resources of the machine.
  • Each machine has its own physical resources, which can include CPU, disk, memory, network card, and so on, and these physical resources can be abstracted to obtain slot resources; generally, the stronger the performance of the machine, the more slot resources it has, and the weaker the performance, the fewer slot resources.
  • In a specific implementation, a resource group may be created for each user, and the slot resources of the machines corresponding to the resource group may be set; since a user's jobs run in the corresponding resource group, jobs in different resource groups do not interfere with each other, so interference between the jobs of different users can be avoided.
  • It can be understood that processing the user's job by using the resources of the machines in the resource group may further include other operations, such as allocating machine resources to the tasks and processing the tasks with those resources; the embodiment of the present application does not limit the specific process of processing the user's job by using the resources of the machines in the resource group.
  • Optionally, the step of processing the user's job by the machines in the resource group may include: when the remaining resources of the machines in the user's resource group exceed the resources required by the user's job, processing the user's job by the machines in the resource group. Meanwhile, when the remaining resources of the machines in the user's resource group do not exceed the resources required by the user's job, the user's job may be queued until the remaining resources of the machines in the resource group exceed the resources required by the job. The above processing can avoid memory overflow on the machines running the tasks, thereby improving the stability of the production system.
  • In an example, the expected slot resources can be calculated according to the user-specified expected number of channels (ChannelNumber), and the calculation result is used as the resources required by the user's job; the expected number of channels can be used to indicate the number of tasks that may be executed concurrently at the same time. It can be understood that the embodiment of the present application does not limit the specific way of determining the resources required by the user's job.
  • the process of processing the user's job by using the machines in the resource group may be: allocating corresponding slot resources to each task, processing the task by using those slot resources, and releasing the corresponding slot resources after the task is completed, so as to save the slot resources of the machines running the job.
  • for example, the slot resources of a machine can be set to 160, and the slot resources allocated to each task can be 2. It can be understood that these values of the machine's slot resources and the per-task slot resources are only examples; in practice, those skilled in the art can determine the slot resources of the machine and the specific values of the slot resources allocated to each task according to actual application requirements. It can also be understood that the actual number of tasks obtained by splitting may exceed the expected number of channels, ChannelNumber; in this case, the excess tasks may be queued.
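The slot bookkeeping in the example above can be sketched as follows. This is a minimal illustration under the stated numbers (160 slots per machine, 2 slots per task); `schedule_tasks` is a hypothetical helper name, not part of the patented system.

```python
from collections import deque

def schedule_tasks(task_ids, machine_slots=160, slots_per_task=2):
    """Admit as many tasks as the machine's slot resources allow;
    the excess tasks wait in a queue until slots are released."""
    max_concurrent = machine_slots // slots_per_task
    running = list(task_ids[:max_concurrent])
    queued = deque(task_ids[max_concurrent:])
    return running, queued

# 100 tasks against 160 slots at 2 slots each: 80 run, 20 wait.
running, queued = schedule_tasks(list(range(100)))
```

When a running task finishes and releases its 2 slots, the head of the queue would be admitted next.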
  • Referring to FIG. 5, a flow chart of the steps of Embodiment 4 of the data synchronization method of the present application is shown, which may specifically include the following steps:
  • Step 501 Dividing a job into a plurality of tasks; wherein the tasks have corresponding processing units;
  • Step 502 Monitor, during the processing of the task, a traffic speed of the task.
  • Step 503 Perform a sleep process on the processing unit of the task when the traffic speed of the task meets a preset condition
  • Step 504 When the database has available concurrent connections, use the concurrent connection to process the task.
  • Step 505 After the task processing is completed, release the concurrent connection occupied by the job.
  • the number of concurrent connections each database can provide is limited. Take a data source database as an example (the destination database is similar, and the descriptions may refer to each other): during data extraction, each data source database can provide only a limited number of concurrent connections, and the same database is often being snapshotted by multiple synchronization jobs at once (possibly over multiple tables of the same database), so simply increasing the number of concurrent connections may cause the database to be overloaded.
  • this embodiment describes the processing procedure of a task. Specifically, when the database has an available concurrent connection, the task can be processed by using that connection; when the database has no available concurrent connection, the job can wait to be processed instead of the number of concurrent connections being increased. This avoids the problem of overloading the database by raising the connection count, and can improve the stability of the production system.
  • each database may be configured with a preset number of concurrent connections that can be supported.
  • during processing of a task over a concurrent connection, the current number of concurrent connections is decremented; for example, after one task occupies a connection, the current number can equal the preset number of concurrent connections minus 1.
  • the current number of concurrent connections is greater than 0, it indicates that there are available concurrent connections.
  • the current number of concurrent connections is less than or equal to 0, it means that there is no available concurrent connection.
  • the occupied concurrent connection can be released. In this case, the current number of concurrent connections can be increased by 1, so that the released concurrent connection can support the new task.
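The connection counting just described (decrement on occupy, increment on release, available only while the counter is positive) behaves like a counting semaphore. A minimal sketch, assuming a lock-guarded counter; `ConnectionGate` is an illustrative name.

```python
import threading

class ConnectionGate:
    """Cap the concurrent connections to one database at a preset number."""
    def __init__(self, preset):
        self._count = preset          # current number of available connections
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._count > 0:
                self._count -= 1      # a task occupies one connection
                return True
            return False              # no connection available: the task waits

    def release(self):
        with self._lock:
            self._count += 1          # task finished: connection usable again
```

A caller that receives `False` would queue (rather than open a new connection), matching the text's point that waiting is preferred over raising the connection count.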
  • the processing unit may specifically include: a reading unit and a writing unit, where the step of processing the task by using the concurrent connection may specifically include:
  • Step S21 performing out-of-order processing on the plurality of read units and the plurality of write units corresponding to the job respectively;
  • Step S22 selecting a read unit and a write unit from the out-of-order processing result to form a corresponding Task.
  • this optional embodiment performs out-of-order processing on the plurality of reading units and the plurality of writing units corresponding to the job, so that the concurrent connections can be prevented from all falling on the same database, reducing the pressure on the database.
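Steps S21 and S22 above can be sketched as independent shuffles of the reader list and the writer list followed by pairwise combination, so that consecutive tasks are unlikely to all hit the same sub-database. A sketch; `shuffle_pair_tasks` and the fixed seed are illustrative.

```python
import random

def shuffle_pair_tasks(readers, writers, seed=None):
    """Shuffle readers and writers independently, then pair them into
    tasks (step S21: out-of-order processing; step S22: combination)."""
    rng = random.Random(seed)
    readers = list(readers)
    writers = list(writers)
    rng.shuffle(readers)
    rng.shuffle(writers)
    return list(zip(readers, writers))
```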
  • This alternative embodiment is applicable to application scenarios such as reading of a sub-database.
  • the database can be divided into sub-libraries.
  • database sharding (sub-databases) refers to splitting massive data from storage management in one database into storage management across multiple databases;
  • table sharding (sub-tables) refers to splitting massive data from storage management in one data table into storage management across multiple data tables.
  • the database can be divided according to geographical regions (such as Beijing and Jiangsu have different sub-libraries), production time and other factors.
  • the sharded tables can be represented as one logical table at the application layer; one logical table can represent, for example, 1024 physical tables, and the 1024 physical tables are located as sub-tables under different sub-databases.
  • a user's job targets a logical table; for example, the job may synchronize the contents of 1024 physical tables to the destination, in which case the job is divided into 1024 tasks.
  • suppose the slot resources allow 30 tasks to be processed concurrently; without shuffling, the concurrent connections corresponding to those 30 tasks may all target the same sub-database. The 1024 tasks can therefore be processed out of order, corresponding new tasks can be obtained by combining the out-of-order results, and the new tasks can then be processed concurrently.
  • Referring to FIG. 6, a flow chart of the steps of Embodiment 5 of the data synchronization method of the present application is shown, which may specifically include the following steps:
  • Step 601 Dividing a job into a plurality of tasks; wherein the tasks have corresponding processing units;
  • Step 602 Perform processing of the task when the current time is in a time window of the task corresponding database.
  • Step 603 Monitor the traffic speed of the task during the processing of the task.
  • Step 604 Perform a sleep process on the processing unit of the task when the traffic speed of the task meets a preset condition.
  • this embodiment may configure a time window for a database; when the current time is within the time window of the job's corresponding database, the job is processed, and when the current time is not within that time window, processing waits for the time window to arrive, so data synchronization can be prevented from affecting the performance of the online database.
  • the time window may indicate the low-peak period of the service corresponding to the database, and may include a start time begin_time and an end time end_time. Before data synchronization, when the job can be scheduled, it is determined whether the current time is between the start time and the end time: if so, the job is processed; if the start time has not yet been reached, the job waits to enter the time window; if the end time has already been exceeded, the job fails.
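The begin_time/end_time decision above can be sketched as a three-way check (process, wait, or fail). A sketch assuming a window that does not cross midnight; the function name is illustrative.

```python
from datetime import datetime, time

def window_decision(now, begin_time, end_time):
    """Decide what to do with a job relative to the database's time window."""
    t = now.time()
    if begin_time <= t <= end_time:
        return "process"   # inside the low-peak window: run the job
    if t < begin_time:
        return "wait"      # window not reached yet: wait for it
    return "fail"          # end time already exceeded: the job fails
```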
  • a database in a production environment generally has primary/standby redundancy: the primary database replicates its data to the standby database, and data extraction generally reads the standby database. The specific solution can be: check primary/standby replication, and if the standby data has not reached the specified position, do not perform data synchronization; databases such as Oracle and MySQL have mechanisms for checking this status. Polling can be used to determine whether the standby data has reached the specified position.
  • Referring to FIG. 7, a schematic structural diagram of a data synchronization system of the present application is shown, which may specifically include: a synchronization center 701, a resource management module 702, and a synchronization engine 703, where the synchronization engine 703 may specifically include: a scheduling module 731 and a processing unit 732;
  • the synchronization center 701 is configured to receive a job submitted by the user, and submit the job to the resource management module 702;
  • the resource management module 702 is configured to uniformly coordinate and manage the resources of the machines and to allocate corresponding resources to the synchronization engine 703 for processing the above jobs; each machine may provide corresponding slot resources, multiple machines form a cluster or resource group, and the corresponding synchronization jobs run in the specified resource groups without interfering with each other;
  • the scheduling module 731 is configured to divide a job into multiple tasks, and perform scheduling execution on multiple tasks;
  • the processing unit 732 can be configured to process the foregoing job.
  • the processing unit 732 may specifically include: a reading unit 7321, a channel 7322, and a writing unit 7323, where the reading unit 7321 can be used to load data from the data source and store it in the channel 7322 serving as a buffer, and the writing unit 7323 can be used to read data from the channel 7322 and write the read data to the destination.
  • Referring to FIG. 8, a schematic diagram of a state machine for data synchronization of the present application is shown.
  • the state machine may be applied to a CDP (Cloud Data Pipeline) or a synchronization center 701.
  • the corresponding data synchronization process may specifically include:
  • Step S1 The user submits a new job to the synchronization center 701 or the cloud channel, and the job is in the SUBMITTED state, and if the submission fails, the FAILED state is entered;
  • Step S2 The synchronization center 701 or the cloud channel performs preliminary flow control on the job in the SUBMITTED state. The preliminary flow control may specifically include at least one of the following determination schemes, such as checking the standby database (an online database generally provides primary/standby redundancy; synchronization extracts from and reads the standby database while loading and writing go to the primary database, so it is necessary to determine whether the standby database's data is complete, i.e., whether it includes all the data to be read). When the determinations pass, the synchronization center 701 or the cloud channel submits the job to the resource management module (Alisa) 702, and the job enters the READY state; when a judgment result output by the above determination schemes does not satisfy the requirement, the job enters the FAILED state;
  • Step S3 in the READY state, the job waits for the resources of the machine (disk, network card, CPU, memory).
  • when the resources are sufficient, the resource management module 702 submits the job to the synchronization engine 703; if the submission succeeds, the job enters the RUNNING state, and if the submission fails, the job enters the FAILED state; when the machine's resources do not meet the demand, the job can be queued for processing;
  • each job can occupy a certain amount of machine resources, and can allocate resources of the machine to the job according to the priority of the job;
  • Step S4 When the job is in the RUNNING state, the resource management module 702 finds a synchronization process of a machine running the synchronization engine 703 in a resource group, and the synchronization process can occupy corresponding resources;
  • the function of the scheduling module 731 and the processing unit 732 can be completed on the synchronization process. Specifically, the data can be loaded from the data source, read into the memory, and the data is written to the destination.
  • the synchronization process can also perform flow control on the job in the RUNNING state; for example, when the job is divided into five tasks and the upper limit of the byte traffic speed of each task is 1 MBps, the synchronization process can perform flow control on the tasks according to this constraint;
  • the synchronization process can also report the transient status of the running job to the synchronization center 701 or the cloud channel; the reporting interval may be 10 seconds, and the reported transient status may include: the total number of records read, the total number of bytes read, the task progress percentage, the number of dirty records read, the number of dirty bytes read, the task reading speed, and so on, so that the user obtains the above transient status by querying the synchronization center 701 or the cloud channel;
  • Step S5 when the user thinks that the job does not need to continue to run, the job in the READY, RUNNING state may be stopped;
  • when the job is in the READY state, the synchronization process may not yet have started, so the corresponding stop process may be: modifying the related job information in the synchronization center 701 or the cloud channel, and releasing the job's request to the resource management module 702;
  • stopping a job is not an instantaneous, one-shot process: it is necessary to periodically send the kill signal and poll the job status until the job status moves from the KILLING state to the KILLED state.
  • the synchronization process in the embodiment of the present application may be run in a stand-alone environment or a distributed environment, and the embodiment of the present application does not limit the specific running environment of the synchronization process.
  • the embodiments of the present application divide a job into a plurality of tasks, each of which may have a corresponding processing unit, so that the traffic speed of a task can be monitored during its processing and, when the traffic speed of the task meets a preset condition, sleep processing can be performed on the task's processing unit; because the processing unit is put to sleep, it suspends data synchronization for the task, which reduces the NIC traffic of the machines running the tasks, avoids problems such as saturated NIC traffic on those machines, and thereby improves the stability of the production system.
  • the embodiments of the present application can monitor the traffic speed of each task separately and apply differentiated flow control to different tasks according to the monitoring result, so that the flow control of different tasks is mutually independent; for example, among multiple tasks of the same job, tasks whose traffic speed exceeds the threshold can have their processing units put to sleep to relieve the pressure on NIC traffic, memory, and other resources of the machines running them, while tasks whose traffic speed does not exceed the threshold need not have their processing units put to sleep; the embodiments of the present application can therefore improve the rationality of flow control;
  • the embodiments of the present application create a resource group for a user and process the user's jobs by using the resources of the machines in the resource group, which achieves resource isolation between users and avoids interference between the jobs of different users; since cloud computing usually requires resource isolation between multiple users (two unrelated users or tenants), where one user's occupation of resources should not affect another user's use of resources, this embodiment can be applied to cloud computing scenarios;
  • when the database has an available concurrent connection, the job can be processed by using that connection; when the database has no available concurrent connection, the job can wait instead of the number of concurrent connections being increased, which avoids the problem of overloading the database by raising the connection count and can improve the stability of the production system;
  • the embodiments of the present application may configure a time window for a database; when the current time is within the time window of the job's corresponding database, the job is processed, and when the current time is not within that time window, processing waits for the time window to arrive, thus avoiding the impact of data synchronization on the online database;
  • the embodiment of the present application can be applied to data synchronization between any two data sources, and therefore has good versatility.
  • Referring to FIG. 9, a structural block diagram of Embodiment 1 of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
  • a segmentation module 901 configured to divide a job into a plurality of tasks; wherein the tasks have corresponding processing units;
  • the monitoring module 902 is configured to monitor a traffic speed of the task during processing of the task.
  • the hibernation module 903 is configured to perform a dormancy process on the processing unit of the task when the traffic speed of the task meets a preset condition.
  • the traffic speed of the task meeting the preset condition may specifically include: the byte traffic speed of the task exceeding a first threshold, and/or the record traffic speed of the task exceeding a second threshold.
  • the dormant module 903 may specifically include:
  • Determining a sub-module configured to determine, according to the traffic speed, the monitoring period, and the upper limit of the traffic speed of the task, a sleep duration of the processing unit of the task;
  • control submodule configured to control the processing unit of the task to enter a sleep state in which the maintenance time is the sleep duration.
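The patent does not give the exact formula the determining sub-module uses, but one consistent reading of "traffic speed, monitoring period, and traffic speed upper limit" is to sleep just long enough that the average speed over the period plus the sleep falls back to the cap. A sketch under that assumption; the function name is illustrative.

```python
def sleep_duration(speed, period_s, speed_cap):
    """Sleep so the average over (period + sleep) drops to the cap:
    speed * period = speed_cap * (period + sleep)."""
    if speed <= speed_cap:
        return 0.0                    # under the cap: no sleep needed
    return speed * period_s / speed_cap - period_s
```

For example, moving data at twice the cap for a 10-second period requires a 10-second sleep to bring the average back down to the cap.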
  • the segmentation module 901 may specifically include:
  • a first splitting sub-module, configured to divide a column interval into a plurality of sub-intervals according to the minimum and maximum column values of the data table corresponding to the job; or
  • a second splitting sub-module, configured to determine the split points of a file according to the size of the file corresponding to the job and the number of splits, and to split the file according to the split points.
  • Referring to FIG. 10, a structural block diagram of Embodiment 2 of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
  • a creating module 1001 configured to create a resource group for a user; wherein the resource group includes at least one machine;
  • the job processing module 1002 is configured to process the user's job by using the resources of the machines in the resource group, where the job processing module 1002 may specifically include:
  • the splitting sub-module 1021, configured to use the machines in the resource group to divide the job into multiple tasks, wherein the tasks have corresponding processing units;
  • a monitoring sub-module 1022 configured to monitor, by the machine in the resource group, a traffic speed of the task during processing of the task;
  • the sleep sub-module 1023 is configured to perform a sleep process on the processing unit of the task by using a machine in the resource group when the traffic speed of the task meets a preset condition.
  • the job processing module 1002 may specifically include:
  • the condition processing submodule is configured to process, by the machine in the resource group, the user's job when the remaining resources of the machine in the user corresponding resource group exceed the resources required by the user's job.
  • the resources of the machine may specifically include: a slot resource obtained according to the physical resource abstraction of the machine.
  • Referring to FIG. 11, a structural block diagram of Embodiment 3 of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
  • a segmentation module 1101 configured to divide a job into a plurality of tasks; wherein the tasks have corresponding processing units;
  • the first task processing module 1102 is configured to process the task by using the concurrent connection when there is a concurrent connection available in the database;
  • the monitoring module 1103 is configured to monitor a traffic speed of the task during processing of the task;
  • the hibernation module 1104 is configured to perform a dormancy process on the processing unit of the task when the traffic speed of the task meets a preset condition;
  • the release module 1105 is configured to release the concurrent connection occupied by the task after the task processing is completed.
  • the processing unit may specifically include: a reading unit and a writing unit;
  • the first task processing module 1102 may specifically include:
  • An out-of-order processing sub-module configured to perform out-of-order processing on the plurality of read units and the plurality of write units corresponding to the job respectively;
  • the combination sub-module is configured to select a read unit and a write unit from the out-of-order processing result to form a corresponding task.
  • Referring to FIG. 12, a structural block diagram of Embodiment 4 of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
  • a segmentation module 1201 configured to divide a job into a plurality of tasks; wherein the tasks have corresponding processing units;
  • the second task processing sub-module 1202 is configured to perform processing of the task when the current time is within a time window of the task corresponding database;
  • the monitoring module 1203 is configured to monitor a traffic speed of the task during processing of the task;
  • the hibernation module 1204 is configured to perform a dormancy process on the processing unit of the task when the traffic speed of the task meets a preset condition.
  • the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.
  • embodiments of the embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory. (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape storage or other magnetic storage devices or any other non-transportable media can be used to store information that can be accessed by a computing device.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

A data synchronization method and apparatus, the method specifically comprising: dividing a job into a plurality of tasks, wherein the tasks have corresponding processing units (201); monitoring, during processing of a task, the traffic speed of the task (202); and performing sleep processing on the processing unit of the task when the traffic speed of the task meets a preset condition (203). The method and apparatus can prevent problems such as saturated NIC traffic on the machines running the tasks, thereby improving the stability of the production system.

Description

A Data Synchronization Method and Apparatus
Technical Field
The present application relates to the technical field of data synchronization, and in particular to a data synchronization method and a data synchronization apparatus.
Background
Data synchronization is the process of establishing data consistency between the data source and the data destination in a production system. By the type of the synchronized data source, data synchronization can be divided into database synchronization, file synchronization, and so on; by synchronization strategy, it can be divided into offline synchronization and real-time synchronization, where offline synchronization can extract a snapshot of the data in a storage system and is often applied in scenarios such as BI (Business Intelligence) analysis and data migration.
Existing solutions often implement the above offline synchronization with synchronization ETL (Extract, Transform, Load) tools. The corresponding data synchronization flow may specifically include: extracting online data to an offline analysis platform such as ODPS (Open Data Processing Service) or Hadoop, performing big-data analysis and computation on the offline analysis platform, and, after the computation is completed, loading the computation results into the online database for use by front-end business.
However, existing synchronization ETL tools such as Sqoop do not take flow control during data synchronization into account; as a result, when multiple jobs run on the same machine, that machine will suffer problems such as saturated NIC traffic, which affects the stability of the production system.
Summary of the Invention
In view of the above problems, the embodiments of the present application are proposed to provide a data synchronization method and a corresponding data synchronization apparatus that overcome the above problems or at least partially solve them, and that can prevent problems such as saturated NIC traffic on the machines running the tasks, thereby improving the stability of the production system.
To solve the above problems, the present application discloses a data synchronization method, comprising:
dividing a job into a plurality of tasks, wherein the tasks have corresponding processing units;
monitoring, during processing of a task, the traffic speed of the task; and
performing sleep processing on the processing unit of the task when the traffic speed of the task meets a preset condition.
Preferably, the traffic speed of the task meeting the preset condition comprises: the byte traffic speed of the task exceeding a first threshold, and/or the record traffic speed of the task exceeding a second threshold.
Preferably, the step of performing sleep processing on the processing unit of the task comprises:
determining a sleep duration for the processing unit of the task according to the traffic speed of the task, the monitoring period, and the traffic speed upper limit; and
controlling the processing unit of the task to enter a sleep state maintained for the sleep duration.
Preferably, the method further comprises:
creating a resource group for a user, wherein the resource group comprises at least one machine; and
processing the user's job by using the resources of the machines in the resource group, wherein the step of processing the user's job by using the machines in the resource group comprises:
dividing the job into a plurality of tasks by using the machines in the resource group, wherein the tasks have corresponding processing units;
monitoring, during processing of a task, the traffic speed of the task by using the machines in the resource group; and
performing sleep processing on the processing unit of the task by using the machines in the resource group when the traffic speed of the task meets a preset condition.
Preferably, the step of processing the user's job by using the machines in the resource group comprises:
processing the user's job by using the machines in the resource group when the remaining resources of the machines in the user's corresponding resource group exceed the resources required by the user's job.
Preferably, the resources of the machine comprise: slot resources obtained by abstracting the physical resources of the machine.
Preferably, the method further comprises:
processing the task by using an available concurrent connection when the database has one; and
releasing the concurrent connection occupied by the task after the task processing is completed.
Preferably, the processing unit comprises: a reading unit and a writing unit; and
the step of processing the task by using the concurrent connection comprises:
shuffling the plurality of reading units and the plurality of writing units corresponding to the job, respectively; and
selecting one reading unit and one writing unit from the shuffled results to form a corresponding task.
Preferably, the method further comprises: performing processing of a task when the current time is within the time window of the task's corresponding database.
Preferably, the step of dividing the job into a plurality of tasks comprises:
dividing a column interval into a plurality of sub-intervals according to the minimum and maximum column values of the data table corresponding to the job; or
determining split points of a file according to the size of the file corresponding to the job and the number of splits, and splitting the file according to the split points.
In another aspect, the present application also discloses a data synchronization apparatus, comprising:
a splitting module, configured to divide a job into a plurality of tasks, wherein the tasks have corresponding processing units;
a monitoring module, configured to monitor, during processing of a task, the traffic speed of the task; and
a sleep module, configured to perform sleep processing on the processing unit of the task when the traffic speed of the task meets a preset condition.
Preferably, the traffic speed of the task meeting the preset condition comprises: the byte traffic speed of the task exceeding a first threshold, and/or the record traffic speed of the task exceeding a second threshold.
Preferably, the sleep module comprises:
a determining sub-module, configured to determine a sleep duration for the processing unit of the task according to the traffic speed of the task, the monitoring period, and the traffic speed upper limit; and
a control sub-module, configured to control the processing unit of the task to enter a sleep state maintained for the sleep duration.
Preferably, the apparatus further comprises:
a creating module, configured to create a resource group for a user, wherein the resource group comprises at least one machine; and
a job processing module, configured to process the user's job by using the resources of the machines in the resource group, wherein the process of processing the user's job by using the machines in the resource group comprises:
dividing the job into a plurality of tasks by using the machines in the resource group, wherein the tasks have corresponding processing units;
monitoring, during processing of a task, the traffic speed of the task by using the machines in the resource group; and
performing sleep processing on the processing unit of the task by using the machines in the resource group when the traffic speed of the task meets a preset condition.
Preferably, the job processing module comprises:
a condition processing sub-module, configured to process the user's job by using the machines in the resource group when the remaining resources of the machines in the user's corresponding resource group exceed the resources required by the user's job.
Preferably, the resources of the machine comprise: slot resources obtained by abstracting the physical resources of the machine.
Preferably, the apparatus further comprises:
a first task processing module, configured to process the task by using an available concurrent connection when the database has one; and
a release module, configured to release the concurrent connection occupied by the task after the task processing is completed.
Preferably, the processing unit comprises: a reading unit and a writing unit; and
the first task processing module comprises:
a shuffling sub-module, configured to shuffle the plurality of reading units and the plurality of writing units corresponding to the job, respectively; and
a combining sub-module, configured to select one reading unit and one writing unit from the shuffled results to form a corresponding task.
Preferably, the apparatus further comprises:
a second task processing module, configured to perform processing of a task when the current time is within the time window of the task's corresponding database.
Preferably, the splitting module comprises:
a first splitting sub-module, configured to divide a column interval into a plurality of sub-intervals according to the minimum and maximum column values of the data table corresponding to the job; or
a second splitting sub-module, configured to determine split points of a file according to the size of the file corresponding to the job and the number of splits, and to split the file according to the split points.
The embodiments of the present application have the following advantages:
The embodiments of the present application divide a job into a plurality of tasks, each of which may have a corresponding processing unit, so that the traffic speed of each task can be monitored during processing, and sleep processing can be performed on the processing unit of a task when its traffic speed meets a preset condition. Because the processing unit is put to sleep, it suspends data synchronization for the task, which reduces the NIC traffic of the machines running the tasks, prevents problems such as saturated NIC traffic on those machines, and thereby improves the stability of the production system.
Moreover, the embodiments of the present application can monitor the traffic speed of each task separately and apply differentiated flow control to different tasks according to the monitoring result, so that the flow control of different tasks is mutually independent. For example, among multiple tasks of the same job, tasks whose traffic speed exceeds the threshold can have their processing units put to sleep to relieve the pressure on NIC traffic, memory, and other resources of the machines running them, while tasks whose traffic speed does not exceed the threshold need not have their processing units put to sleep. The embodiments of the present application can therefore improve the rationality of flow control.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a data synchronization system of the present application;
FIG. 2 is a flow chart of the steps of Embodiment 1 of a data synchronization method of the present application;
FIG. 3 is a flow chart of the steps of Embodiment 2 of a data synchronization method of the present application;
FIG. 4 is a flow chart of the steps of Embodiment 3 of a data synchronization method of the present application;
FIG. 5 is a flow chart of the steps of Embodiment 4 of a data synchronization method of the present application;
FIG. 6 is a flow chart of the steps of Embodiment 5 of a data synchronization method of the present application;
FIG. 7 is a schematic structural diagram of a data synchronization system of the present application;
FIG. 8 is a schematic diagram of a state machine for data synchronization of the present application;
FIG. 9 is a structural block diagram of Embodiment 1 of a data synchronization apparatus of the present application;
FIG. 10 is a structural block diagram of Embodiment 2 of a data synchronization apparatus of the present application;
FIG. 11 is a structural block diagram of Embodiment 3 of a data synchronization apparatus of the present application; and
FIG. 12 is a structural block diagram of Embodiment 4 of a data synchronization apparatus of the present application.
Detailed Description
To make the above objectives, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
The embodiments of the present application can be applied to data synchronization between any data source and destination, whether heterogeneous or homogeneous, and control the traffic speed during task processing to prevent problems such as saturated NIC traffic and memory overflow on the machines running the tasks, thereby improving the stability of the production system.
参照图1,示出了本申请的一种数据同步***的结构示意图,其具体可以包括:同步中心101和同步引擎102;其中,同步引擎102具体可以包括:调度单元121和处理单元122;
其中,同步中心101用于接收用户提交的作业,并将上述作业提交到调度单元121;
调度模块121用于将作业切分为多个任务(Task),并对多个任务进行调度执行;
处理单元122可用于对上述作业进行处理,上述处理单元122具体可以包括:读单元(Reader)1221、通道(Channel)1222和写单元(Writer)1223,其中,读单元1221可用于从数据源头加载数据,并存入作为缓冲区的通道1222,写单元1223可用于从通道1222读取数据,并将读取的数据写入目的端。
需要说明的是,通道1222作为读写的中转缓冲区,其内存空间的大小可被控制。例如,第一控制参数transport.channel.capacity可用于控制通道1222能够放入数据记录的条数,其值可以为512等数值;又如,第二控制参数transport.channel.byteCapacity可用于控制通道1222能够甭管放入数据记录的字节大小,例如,其可以为8MB大小的堆栈空间。
另外,在本申请的一种可选实施例中,为了保障多线程处理的安全性,需要对通道1222的读写操作进行加锁;并且,在向通道1222中存取数据时,可以对多个数据批量存取,以避免每个数据进行存取所导致的频繁加锁问题,从而能够提高数据同步性能。
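The locked, batched channel access described above can be sketched as a bounded buffer whose put/get operate on whole batches under a single lock acquisition. A sketch; the class and method names are illustrative, and the default capacity of 512 mirrors the transport.channel.capacity example.

```python
import threading
from collections import deque

class Channel:
    """Bounded buffer between a reading unit and a writing unit; batched
    access amortizes locking over many records instead of one lock per record."""
    def __init__(self, capacity=512):
        self._buf = deque()
        self._capacity = capacity
        self._lock = threading.Lock()

    def put_batch(self, records):
        with self._lock:
            accepted = records[:self._capacity - len(self._buf)]
            self._buf.extend(accepted)
            return len(accepted)    # records that did not fit must be retried

    def get_batch(self, n):
        with self._lock:
            return [self._buf.popleft()
                    for _ in range(min(n, len(self._buf)))]
```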
Method Embodiment 1
Referring to FIG. 2, a flow chart of the steps of Embodiment 1 of a data synchronization method of the present application is shown, which may specifically include the following steps:
Step 201: Divide a job into a plurality of tasks, wherein the tasks have corresponding processing units.
In the embodiments of the present application, a job may be the view presented to the user. During the processing of a job, the job may be divided into multiple tasks according to splitting logic, where each task may have a corresponding processing unit. In this way, the traffic speed of each task can be monitored separately during processing; since the differences between the processing of different tasks can be taken into account, differentiated flow control can be applied to different tasks according to each task's traffic speed.
For example, in Application Example 1 of the present application, a job is divided into 5 tasks: Task 1, Task 2, Task 3, Task 4, and Task 5, where the traffic speeds of Task 1 and Task 3 meet the preset condition while those of Task 2, Task 4, and Task 5 do not; flow control may therefore be applied only to Task 1 and Task 3, and not to Task 2, Task 4, and Task 5.
In an optional embodiment of the present application, the number of tasks may equal the ratio of the job speed to the channel speed, where the job speed and the channel speed may be specified by the user or determined empirically; the embodiments of the present application do not limit the job speed, the channel speed, or the specific number of tasks.
It should be noted that although the embodiments of the present application control the traffic speed of tasks, multiple tasks can be processed concurrently through the processing threads corresponding to multiple processing units, so the execution speed of the job is not affected; moreover, the processing efficiency of the job and the efficiency of data synchronization can be improved.
The embodiments of the present application may provide the following technical solutions for dividing a job into a plurality of tasks:
Technical Solution 1
Technical Solution 1 is applicable to application scenarios such as an RDBMS (Relational Database Management System). It divides a column interval into multiple sub-intervals and forms SQL (Structured Query Language) WHERE clauses to obtain the read SQL of each task; specifically, it may divide the column interval into multiple sub-intervals according to the minimum and maximum column values of the data table corresponding to the job.
In Application Example 2 of the present application, suppose the demo table of the relational database MySQL has two fields, id and name, where the value range of id is [1, 100]. If the number of tasks to split into is 5, the resulting intervals may be: [1<=id<20], [20<=id<40], [40<=id<60], [60<=id<80], [80<=id<100], [id is null]. It can be understood that 5 here is only an example of the number of tasks and is not to be understood as a limitation of the embodiments of the present application on the number of tasks.
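The interval splitting of Application Example 2 can be sketched as follows. The boundary rounding here is illustrative (the example's exact boundaries depend on how the range is rounded), and the trailing NULL clause covers rows the numeric intervals miss; `split_column_range` is a hypothetical helper name.

```python
def split_column_range(min_id, max_id, num_tasks):
    """Build one WHERE clause per sub-interval of [min_id, max_id],
    plus a clause for NULL ids."""
    step = (max_id - min_id) // num_tasks
    bounds = [min_id + i * step for i in range(num_tasks)] + [max_id]
    clauses = [f"{bounds[i]} <= id AND id < {bounds[i + 1]}"
               for i in range(num_tasks)]
    clauses.append("id IS NULL")
    return clauses
```

Each clause becomes the WHERE condition of one task's read SQL, so the tasks can be executed concurrently over disjoint slices of the table.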
对于上述应用示例2,在区间切分完成后,可以通过多个处理单元对应的多处理线程并发处理各个区间的数据。可选地,为了便于管理,可以将多个区间对应的任务组成一个任务组进行管理。可选地,为了减少资源消耗,可以采用一个进程对一个任务组的任务进行管理,并在该进程下针对每个任务建立对应的处理线程,上述处理线程可用于执行上述处理单元对应的操作,也即,上述针对每个任务建立对应的处理线程的过程具体可以包括:将1个读单元、1个写单元和一个Channel关联起来,放入一个处理线程执行,通常,每个任务可以具有对应的一个处理线程。
技术方案2、
技术方案2可以适用于ODPS、本地文件、FTP(文件传输协议,File Transfer Protocol)等类似的应用场景,其可以根据所述作业对应文件的大小和切分份数,确定所述文件的切分点位,并依据所述切分点位对所述文件进行切分。
在实际应用中,可以将ODPS的一个数据表的数据抽象为一个文件,根据所述作业对应文件的大小和切分份数,确定所述文件的切分点位,并依据所述切分点将文件切分为多个分块,从而可以利用多处理线程进行各个分块的读取、缓冲和写入等处理。
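上述根据文件大小和切分份数确定切分点位的过程,可以示意性地表达为如下函数(函数名与返回形式均为示例假设):

```python
def file_split_points(file_size, n):
    """根据文件字节大小与切分份数,计算各分块的(起始偏移, 长度),示意实现。
    余数均摊到前 rem 个分块,使各分块大小尽量均匀且完整覆盖文件。"""
    base, rem = divmod(file_size, n)
    points, offset = [], 0
    for i in range(n):
        length = base + (1 if i < rem else 0)
        points.append((offset, length))
        offset += length
    return points
```

得到切分点位后,各分块即可交由多处理线程并发进行读取、缓冲和写入等处理。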
以上对将作业切分为多个任务的技术方案1和技术方案2进行了介绍,可以理解,本领域技术人员可以根据实际应用需求,采用上述技术方案中的任一,或者,还可以采用将作业切分为多个任务的其他技术方案。
在本申请的一种可选实施例中,在OSS(对象存储服务,Object Storage Service)的应用场景下,还可以根据文件粒度对作业进行切分,假设一个bucket(桶)中有多个object(对象),则可以配置一个object名称前缀表示需要拖取的object范围,如果有10个object需要同步,则可以将一个bucket对应的作业切分为10个Task,以实现10个object的并发处理。
在本申请的另一种可选实施例中,在OTS(开放结构化数据服务,Open Table Service)的应用场景下,可以根据OTS数据的数据存储分区对作业进行切分等。
可以理解,本领域技术人员可以根据数据源头的特性,采用相匹配的将作业切分为多个任务的技术方案,本申请实施例对于将作业切分为多个任务的具体技术方案不加以限制。
步骤202、在所述任务的处理过程中,监控所述任务的流量速度;
在本申请的一种可选实施例中,可以通过汇报方式监控任务的流量,并根据所监控的流量计算任务的当前流量速度,其中,当前流量速度可以为单位时间内流过通道的流量。
在本申请的另一种可选实施例中,上述通过汇报方式监控任务的流量的过程可以为:采用处理单元对应的处理线程进行任务的处理,以实现任务的数据同步,并且,处理单元对应的处理线程可以向对应的进程汇报任务的流量速度,进程可以向同步引擎汇报作业的多个任务的流量速度,以使同步引擎得到所有作业的所有任务的流量速度。可以理解,上述通过汇报方式监控任务的流量速度只是作为可选实施例,实际上,本领域技术人员可以根据实际应用需求,采用监控任务的流量速度的任意技术方案,本申请实施例对于通过汇报方式监控任务的流量速度的具体技术方案不加以限制。
在实际应用中,任务的流量速度具体可以包括:任务的字节(Byte)流量速度和/或任务的记录流量速度(Record Per Second);其中,任务的字节流量速度可用单位时间内流过通道的字节数来表示,任务的记录流量速度可用单位时间内读取的数据记录的条数来表示,可以理解,本申请实施例对于任务的流量速度的具体衡量方式不加以限制。
步骤203、在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理。
本申请实施例中,对所述任务的处理单元进行休眠处理,可使得任务的处理单元暂停对于任务的数据同步,因此可以降低运行上述任务的机器的网卡流量,节省运行上述任务的机器的内存资源,避免运行上述任务的机器出现网卡流量打满问题,从而提高生产系统的稳定性;并且,对所述任务的处理单元进行休眠处理,还可以将任务的流量速度控制在流量速度上限内,因此可以提高流量速度的稳定性。
在实际应用中,对上述任务的处理单元进行休眠处理的过程具体可以包括:对上述处理单元对应的处理线程进行休眠处理,以使上述处理线程暂停CPU(中央处理器)的调度执行。
在本申请的一种可选实施例中,所述任务的流量速度符合预置条件具体可以包括:所述任务的字节流量速度超出第一阈值,和/或,所述任务的记录流量速度超出第二阈值。其中,上述第一阈值和第二阈值可由用户指定,也可由同步引擎根据经验值确定,本申请实施例对于具体的第一阈值和第二阈值不加以限制。
在本申请的另一种可选实施例中,所述对所述任务的处理单元进行休眠处理的步骤,具体可以包括:
步骤S11、依据所述任务的流量速度、监控周期和流量速度上限,确定所述任务的处理单元的休眠时长;
步骤S12、控制所述任务的处理单元进入维持时间为所述休眠时长的休眠状态。
其中,监控周期可用于表示监控任务的流量速度的周期,其可以通过进程向同步引擎汇报任务的流量速度的周期、或者两次汇报之间的间隔来表示,例如,监控周期可以为20毫秒等,可以理解,本申请实施例对于具体的监控周期不加以限制。
在本申请的一种应用示例中,确定所述任务的处理单元的休眠时长的公式可以为:(流量速度*监控周期)/流量速度上限-监控周期,可以理解,本申请实施例对于休眠时长的具体确定方法不加以限制。
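上述确定休眠时长的公式可以示意性地实现为如下函数(当流量速度未超出上限时计算结果不为正,示例中返回0表示无需休眠,属于示例性处理):

```python
def sleep_duration(speed, period, speed_limit):
    """依据流量速度、监控周期和流量速度上限计算处理单元的休眠时长,
    对应正文公式:(流量速度*监控周期)/流量速度上限 - 监控周期。"""
    return max(0.0, speed * period / speed_limit - period)
```

例如,流量速度为上限的2倍、监控周期为1秒时,休眠时长为1秒,休眠之后该周期内的平均速度恰被拉回到上限附近。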
综上,本申请实施例将作业切分为多个任务,其中的每个任务都可以具有对应的处理单元,由此在任务的处理过程中,可以对任务的流量速度进行监控,并在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理;由于对所述任务的处理单元进行休眠处理,可使得任务的处理单元暂停对于任务的数据同步,因此可以降低运行上述任务的机器的网卡流量,节省运行上述任务的机器的内存资源,避免运行上述任务的机器出现网卡流量打满、内存溢出等问题,从而提高生产系统的稳定性。
并且,本申请实施例可以对每个任务的流量速度进行单独监控,并依据监控结果对不同任务进行有区分的流量控制,使得不同任务的流量控制相互独立,例如,对于同一作业的多个任务,有的任务的流量速度超出了阈值,故可以对其处理单元进行休眠处理,以缓解运行这些任务的机器的网卡流量、内存资源等压力,而有的任务的流量速度未超出阈值,则可以不对其处理单元进行休眠处理,因此本申请实施例可以提高流量控制的合理性。
方法实施例二
参照图3,示出了本申请的一种数据同步方法实施例二的步骤流程图,可以应用于同步引擎的同步进程,具体可以包括如下步骤:
步骤301、将作业切分为多个任务;其中,所述任务具有对应的处理单元;
步骤302、获取本次的汇报内容;其中,所述汇报内容为作业的处理进程向同步进程汇报的、作业的多个任务的当前流量;
步骤303、判断当前时间距离上次汇报时间的间隔是否达到监控周期,若是,则执行步骤304,否则,返回执行步骤302;
步骤304、判断所述当前流量的格式为字节还是记录;
步骤305、在当前流量的格式为字节时,根据本次的汇报内容、上次的汇报内容、及该两次汇报内容之间的时间间隔,计算当前字节流量速度;
步骤306、判断当前字节流量速度是否超出第一阈值,若是,则执行步骤307,否则,返回执行步骤302;
步骤307、依据所述任务的当前字节流量速度、该两次汇报内容之间的时间间隔和第一阈值,确定所述任务的处理单元的休眠时长;
步骤308、在当前流量的格式为记录时,根据本次的汇报内容、上次的汇报内容、及该两次汇报内容之间的时间间隔,计算当前记录流量速度;
步骤309、判断当前记录流量速度是否超出第二阈值,若是,则执行步骤310,否则,返回执行步骤302;
步骤310、依据所述任务的当前记录流量速度、该两次汇报内容之间的时间间隔和第二阈值,确定所述任务的处理单元的休眠时长;
步骤311、控制所述任务的处理单元进入维持时间为所述休眠时长的休眠状态。
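步骤302至步骤311中依据两次汇报内容计算当前流量速度、并在超出阈值时确定休眠时长的逻辑,可以示意性地归纳为如下Python函数(以字节流量为例,记录流量同理;函数名与参数均为示例假设):

```python
def check_and_throttle(prev_bytes, curr_bytes, interval, byte_limit):
    """prev_bytes/curr_bytes 为上次与本次汇报内容中的累计字节数,
    interval 为两次汇报内容之间的时间间隔(秒)。
    当前字节流量速度超出第一阈值 byte_limit 时返回应休眠的时长,否则返回0。"""
    speed = (curr_bytes - prev_bytes) / interval   # 计算当前字节流量速度
    if speed > byte_limit:
        # 对应正文公式:(流量速度*时间间隔)/阈值 - 时间间隔
        return speed * interval / byte_limit - interval
    return 0.0
```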
方法实施例三
参照图4,示出了本申请的一种数据同步方法实施例三的步骤流程图,具体可以包括如下步骤:
步骤401、针对用户创建资源组;其中,所述资源组具体可以包括至少一台机器;
步骤402、利用所述资源组中机器的资源对所述用户的作业进行处理,其中,所述利用所述资源组中机器对所述用户的作业进行处理的步骤402,具体可以包括:
步骤421、利用所述资源组中机器将作业切分为多个任务;其中,所述任务具有对应的处理单元;
步骤422、在所述任务的处理过程中,利用所述资源组中机器监控所述任务的流量速度;
步骤423、在所述任务的流量速度符合预置条件时,利用所述资源组中机器对所述任务的处理单元进行休眠处理。
相对于方法实施例一和方法实施例二,本实施例还可以针对用户创建资源组,并利用所述资源组中机器的资源对所述用户的作业进行处理,因此可以实现用户之间同步资源的隔离,从而可以避免不同用户的作业之间的相互影响。由于云计算通常要求多个用户(两个无关使用者或租户)之间资源隔离,一个用户对资源的占用不应该影响另外一个用户对资源的使用,因此,本实施例可以适用于云计算的场景。
本申请实施例中,所述机器的资源具体可以包括:依据所述机器的物理资源抽象得到的槽位资源。每台机器都有其对应的物理资源,具体可以包括CPU、磁盘、内存、网卡等,则可以对上述物理资源进行抽象,得到槽位资源(SlotNumber),通常,机器的性能越强则槽位资源越多,机器的性能越差则槽位资源越少。
在本申请实施例的数据同步过程中,可以针对每个用户创建一个资源组并设定该资源组对应机器的槽位资源;由于用户的作业运行在对应的资源组中,资源组之间的作业互不干扰,因此可以避免不同用户的作业之间的相互影响。
需要说明的是,除了步骤421-步骤423外,利用所述资源组中机器的资源对所述用户的作业进行处理还可以包括其它操作,如为任务分配机器的资源的操作,利用机器的资源对任务进行处理的操作等,本申请实施例对于利用所述资源组中机器的资源对所述用户的作业进行处理的具体过程不加以限制。
由于每个作业或任务都将占用一定的槽位资源,故在本申请的一种可选实施例中,所述利用所述资源组中机器对所述用户的作业进行处理的步骤,具体可以包括:在所述用户对应资源组中机器的剩余资源超出所述用户的作业所需的资源时,利用所述资源组中机器对所述用户的作业进行处理。同时,在所述用户对应资源组中机器的剩余资源未超出所述用户的作业所需的资源时,可以对所述用户的作业进行排队处理,直至所述用户对应资源组中机器的剩余资源超出所述用户的作业所需的资源,上述处理可以避免运行上述任务的机器出现内存溢出问题,从而能够提高生产系统的稳定性。在本申请的一种应用示例中,可以根据用户指定的预期切分数目ChannelNumber,计算预期消耗的槽位资源,并将计算结果作为用户的作业所需的资源,这里,预期切分数目可用于表示同时可能并发执行的任务数目。可以理解,本申请实施例对于用户的作业所需的资源的具体确定方式不加以限制。
在本申请的另一种可选实施例中,上述利用所述资源组中机器对所述用户的作业进行处理的过程可以为,针对每个任务分配对应的槽位资源,利用上述槽位资源进行任务的处理,并在任务完成后,释放对应的槽位资源,以节省运行作业的机器的槽位资源。在本申请的一种应用示例中,对于一台96GB内存、千兆网卡、32Core机器,可以设定其槽位资源为160,针对每个任务分配的槽位资源可以为2,可以理解,上述机器的槽位资源和为每个任务分配的槽位资源均作为示例,实际上,本领域技术人员可以根据实际应用需求确定机器的槽位资源和为每个任务分配的槽位资源的具体数值。可以理解,实际切分得到的任务数目可能超出预期切分数目ChannelNumber,此种情况下,可以对超出的任务进行排队处理。
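上述依据剩余槽位资源决定作业立即处理或排队处理、任务完成后释放槽位的逻辑,可以用如下示意性的代码表达(每个任务占用2个槽位沿用正文示例,类名与方法名均为示例假设):

```python
class ResourceGroup:
    """示意性的资源组:按槽位资源(SlotNumber)对作业做准入控制。"""

    def __init__(self, total_slots):
        self.total_slots = total_slots  # 例如一台96GB内存、千兆网卡、32Core机器可设为160
        self.used = 0
        self.queue = []                 # 剩余资源不足时作业在此排队

    def submit(self, job_id, channel_number, slots_per_task=2):
        """依据预期切分数目 channel_number 估算所需槽位,足够则运行,否则排队。"""
        need = channel_number * slots_per_task
        if self.total_slots - self.used >= need:
            self.used += need
            return "RUNNING"
        self.queue.append(job_id)
        return "QUEUED"

    def release(self, channel_number, slots_per_task=2):
        """作业的任务完成后释放对应的槽位资源,供排队的作业使用。"""
        self.used -= channel_number * slots_per_task
```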
方法实施例四
参照图5,示出了本申请的一种数据同步方法实施例四的步骤流程图,具体可以包括如下步骤:
步骤501、将作业切分为多个任务;其中,所述任务具有对应的处理单元;
步骤502、在所述任务的处理过程中,监控所述任务的流量速度;
步骤503、在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理;
步骤504、当数据库存在可用的并发连接时,利用所述并发连接对所述任务进行处理;
步骤505、在所述任务处理完成后,释放所述任务所占用的并发连接。
在数据同步过程中,每个数据库提供的数据库连接数是有限的。以数据源头数据库(目的端数据库类似,相互参考即可)为例,在数据抽取时,每个数据源头数据库能够提供的并发连接数是有限的,而往往同一个数据库会被多个同步作业抓取快照数据(可能是同一个库的多个表),一味的增加并发连接数可能引发数据库负载过重。
相对于方法实施例一,本实施例描述了任务的处理过程,具体地,当数据库存在可用的并发连接时,可以利用所述并发连接对所述作业进行处理,由于当数据库不存在可用的并发连接时,可以对作业进行等待处理,而不是增加并发连接数,因此能够避免一味增加并发连接数所引发的数据库负载过重的问题,从而能够提高生产系统的稳定性。
在实际应用中,可以针对每个数据库配置一个能够支持的预置并发连接数,通常,一个任务占用一个并发连接,则在利用所述并发连接对所述任务进行处理的过程中,当前并发连接数可以等于预置并发连接数减1,在当前并发连接数大于0时,表示存在可用的并发连接,在当前并发连接数小于等于0时,表示不存在可用的并发连接。在任务处理完成后,可以将占用的并发连接释放,此种情况下当前并发连接数可以加1,从而使释放的并发连接能够支持新的任务。
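上述针对每个数据库配置预置并发连接数、任务占用与释放并发连接的过程,可以示意性地实现为(类名与方法名为示例假设):

```python
import threading

class ConnectionQuota:
    """示意性的并发连接配额:一个任务占用一个连接,完成后释放。"""

    def __init__(self, preset):
        self.available = preset      # 针对该数据库配置的预置并发连接数
        self.lock = threading.Lock()

    def acquire(self):
        """存在可用连接时占用并返回True;否则返回False,由任务等待后重试。"""
        with self.lock:
            if self.available > 0:
                self.available -= 1
                return True
            return False

    def release(self):
        """任务处理完成后释放连接,使释放的连接能够支持新的任务。"""
        with self.lock:
            self.available += 1
```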
在本申请的一种可选实施例中,所述处理单元具体可以包括:读单元和写单元,则所述利用所述并发连接对所述任务进行处理的步骤,具体可以包括:
步骤S21、分别对所述作业对应的多个读单元和多个写单元进行乱序处理;
步骤S22、从乱序处理结果中选择一个读单元和一个写单元,组成相应的任务。
本可选实施例对所述作业对应的多个任务的读单元、多个通道和多个写单元等处理单元进行乱序处理,能够避免并发连接落到同一个数据库上,降低数据库的压力。
在本申请的一种应用示例中,假设作业被切分为5个任务:任务1、任务2、任务3、任务4和任务5,假设任务1、任务2、任务3、任务4和任务5对应的并发连接分别为并发连接1、并发连接2、并发连接3、并发连接4和并发连接5,则可以将任务1、任务2、任务3、任务4和任务5对应的多个读单元放入一个数组并乱序,多个写单元放入一个数组并乱序,假设读单元的乱序结果为:任务1的读单元1、任务5的读单元5、任务4的读单元4、任务2的读单元2和任务3的读单元3,写单元的乱序结果为:任务5的写单元5、任务3的写单元3、任务2的写单元2、任务1的写单元1和任务4的写单元4,则根据乱序结果组成的新任务分别为,新任务1:任务1的读单元1和任务5的写单元5,新任务2:任务5的读单元5和任务3的写单元3,新任务3:任务4的读单元4和任务2的写单元2,新任务4:任务2的读单元2和任务1的写单元1,新任务5:任务3的读单元3和任务4的写单元4,由于新任务1占用的是并发连接1和并发连接5,新任务2占用的是并发连接5和并发连接3,因此,能够避免并发连接落到同一个并发连接对应的数据库上,降低数据库的压力。
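上述分别将多个读单元、多个写单元放入数组乱序、再按位置重新配对组成新任务的过程,可以示意性地写为(函数名为示例假设,seed参数仅为使演示可复现):

```python
import random

def shuffle_and_pair(readers, writers, seed=None):
    """分别打乱读单元与写单元列表后按位置配对,返回新任务的(读单元, 写单元)列表,
    使配对后的并发连接尽量分散到不同的数据库上(示意实现)。"""
    rnd = random.Random(seed)
    r, w = list(readers), list(writers)
    rnd.shuffle(r)  # 读单元数组乱序
    rnd.shuffle(w)  # 写单元数组乱序
    return list(zip(r, w))
```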
本可选实施例适用于分库分表的读取等应用场景。例如,在数据业务量非常大导致超出单一数据库的承载能力的情况下,可以对数据库进行分库分表,分库是指将海量数据由一个数据库存储管理拆分为多个数据库存储管理,分表是指将海量数据由一个数据表存储管理拆分为多个数据表存储管理,具体地,可以按照地理区域(如北京和江苏具有不同的分库)、生产时间等因素进行数据库的分库分表;上述分库分表在应用层可以表现为一个逻辑表,一个逻辑表可以代表1024个物理表,该1024个物理表位于不同分库下的多个分表。
通常,用户的作业针对的是一个逻辑表,例如,该作业可用于将1024个物理表的内容同步至目的端,假设该作业被切分为1024个任务,槽位资源允许并发处理的任务数量为30,则为了避免并发处理的30个任务对应的并发连接落在相同的分库上,可以对该1024个任务进行乱序处理,依据乱序处理结果组合得到对应的新任务,并重新并发处理新任务。
方法实施例五
参照图6,示出了本申请的一种数据同步方法实施例五的步骤流程图,具体可以包括如下步骤:
步骤601、将作业切分为多个任务;其中,所述任务具有对应的处理单元;
步骤602、在当前时间处于任务对应数据库的时间窗口内时,进行所述任务的处理;
步骤603、在所述任务的处理过程中,监控所述任务的流量速度;
步骤604、在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理。
由于数据同步对数据库的性能消耗较大,故有些数据库有严格的性能要求,只有业务低峰期才能进行数据拖取,以避免数据同步影响线上数据库的性能,保障线上业务的稳定。
相对于方法实施例一,本实施例可以针对数据库配置时间窗口,并在当前时间处于所述作业对应数据库的时间窗口内时,处理上述作业,在当前时间不处于所述作业对应数据库的时间窗口内时,等待时间窗口的到来,因此能够避免数据同步影响线上数据库的性能。
在实际应用中,时间窗口可用于表示数据库对应业务的低峰时期,其可以包括:开始时间begin_time和结束时间end_time,在数据同步前,可以在作业调度时,判断当前时间是否在开始时间和结束时间之间,若是则处理上述作业;若否,且未到达开始时间则进行等待,等待进入时间窗口内;若已经超过结束时间,则作业失败。
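上述依据时间窗口决定执行、等待或失败的判断,可以示意性地写为(示例假设时间窗口不跨越零点;函数名与返回值均为示例约定):

```python
from datetime import time

def window_action(now, begin_time, end_time):
    """判断当前时间与同步时间窗口 [begin_time, end_time] 的关系:
    窗口内返回RUN(处理作业),未到窗口返回WAIT(等待窗口到来),
    已过窗口返回FAIL(作业失败)。"""
    if begin_time <= now <= end_time:
        return "RUN"
    if now < begin_time:
        return "WAIT"
    return "FAIL"
```

例如业务低峰窗口为凌晨1点至5点时,可以window_action(当前时间, time(1, 0), time(5, 0))的结果决定作业的处理方式。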
需要说明的是,生产环境中数据库一般为主备冗余复制,主库将数据定期复制到备库中,数据抽取一般读取备库,同步时需要保障备库有预期读取的数据,具体方案可以为:检查主备复制情况,如果备库数据没有达到指定位点,则不进行数据同步,Oracle、Mysql等数据库都有相关机制检查该状态。可通过轮询方式确定备库数据是否达到指定位点。
为使本领域技术人员更好地理解本申请实施例,参照图7,示出了本申请的一种数据同步系统的结构示意图,其具体可以包括:同步中心701、资源管控模块702和同步引擎703;其中,同步引擎703具体可以包括:调度模块731和处理单元732;
其中,同步中心701用于接收用户提交的作业,并将上述作业提交到资源管控模块702;
资源管控模块702,用于统一协调管理机器的资源,并针对上述作业为处理上述作业的上述同步引擎703分配相应的资源;每个机器可提供相应的槽位资源,多个机器构成一个集群或资源组,对应的同步作业运行在指定的资源组中,互不干扰;
调度模块731,用于将一个作业切分为多个任务,并对多个任务进行调度执行;
处理单元732可用于对上述作业进行处理,上述处理单元732具体可以包括:读单元(Reader)7321、通道7322和写单元7323,其中,读单元7321可用于从数据源头加载数据,并存入作为缓冲区的通道7322,写单元7323可用于从通道7322读取数据,并将读取的数据写入目的端。
参照图8,示出了本申请的一种数据同步的状态机的示意图,该状态机可应用到CDP(云道,Cloud Data Pipeline)或同步中心701,相应的数据同步过程具体可以包括:
步骤S1、用户提交一个新的作业到同步中心701或者云道,作业处于提交(SUBMITTED)状态,如果提交失败进入失败(FAILED)状态;
步骤S2、同步中心701或云道对处于SUBMITTED状态的作业进行初步的流量控制,上述初步的流量控制具体可以包括如下判断方案中的至少一种:
定期轮询每一个提交的作业,判断是否处于同步窗口内(只有在同步窗口内才可以同步,同步抽取不能影响数据库线上服务);
读取主库还是备库(一般线上数据库提供主备冗余,同步抽取读取备库,加载写入主库,读取时需要确定备库数据是否完备,是否包括完整待读取的数据);
当前数据库的并发连接数目(限制数据源连接数量,保护数据源)是否充足;
在上述判断方案输出的判断结果满足任务需求时,同步中心701或云道提交作业到资源管控模块(Alisa)702,则作业进入准备(READY)状态;在上述判断方案输出的判断结果不满足任务需求时,作业进入失败(FAILED)状态;
步骤S3、在READY状态,作业等待机器的资源(磁盘、网卡、CPU、内存),在机器的资源满足需求时,资源管控模块702将作业提交至同步引擎703,如果提交成功,则作业进入运行(RUNNING)状态,如果提交失败则作业进入失败(FAILED)状态;在机器的资源不满足需求时,可以对作业进行等待处理;
其中,每个作业可以占用一定的机器的资源,可以按照作业的优先级为作业分配机器的资源;
步骤S4、在作业处于RUNNING状态时,资源管控模块702在一个资源组中寻找到一台机器运行同步引擎703的同步进程,该同步进程可以占用相应的资源;
其中,该同步进程上可以完成调度模块731和处理单元732的功能,具体地,可以从数据源头加载数据,读取到内存,将数据写入到目的端的过程;
并且,该同步进程还可以对处于RUNNING状态的作业进行流量控制,例如,在将作业切分为5个任务时,假设每个任务的字节流量速度的上限为1MBPS,则该同步进程可以按照此约束进行任务的流量控制;
另外,该同步进程还可以将处于运行状态的作业的瞬时状态汇报给同步中心701或者云道,汇报间隔可以为10秒,汇报的瞬时状态具体可以包括:已读取记录总数、已读取字节总数、任务进度百分比、已读脏记录数、已读脏数据字节数、任务读取速度情况等,以使用户通过访问同步中心701或者云道查询得到上述瞬时状态;
在该同步进程处理完作业的任务且作业正常退出时,作业进入成功(SUCCESSED)状态,否则作业进入失败状态;
步骤S5、在用户认为作业不需要继续运行时,可以将处于READY、RUNNING状态的作业停止;
其中,在作业处于READY状态时还没有同步进程存在,故相应的停止过程可以为:修改同步中心701或云道的相关作业信息,释放资源管控模块702的作业请求;
在作业处于RUNNING状态时存在同步进程,此时还需要将同步进程杀死;
需要说明的是,将一个作业停止不是一个瞬时完整的过程,需要定期发送Kill信号并轮询作业状态直到作业状态从KILLING(停止中)状态到达KILLED(杀死)状态。
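图8所示状态机的合法状态迁移,可以示意性地归纳为如下映射(迁移集合依据正文步骤S1至步骤S5整理,状态名沿用正文,属于示例性归纳):

```python
# 作业状态机的合法迁移表(示意):
TRANSITIONS = {
    "SUBMITTED": {"READY", "FAILED"},               # 步骤S2:初步流量控制通过/失败
    "READY": {"RUNNING", "FAILED", "KILLING"},      # 步骤S3:资源满足则运行,也可被用户停止
    "RUNNING": {"SUCCESSED", "FAILED", "KILLING"},  # 步骤S4:正常退出/失败/被用户停止
    "KILLING": {"KILLED"},                          # 步骤S5:轮询作业状态直到到达KILLED
}

def can_transit(src, dst):
    """判断从状态 src 到状态 dst 的迁移是否合法。"""
    return dst in TRANSITIONS.get(src, set())
```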
另外,需要说明的是,本申请实施例的同步进程可以运行在单机环境或者分布式环境下,本申请实施例对于同步进程的具体运行环境不加以限制。
综上,本申请实施例具有如下优点:
第一,本申请实施例将作业切分为多个任务,其中的每个任务都可以具有对应的处理单元,由此在任务的处理过程中,可以对任务的流量速度进行监控,并在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理;由于对所述任务的处理单元进行休眠处理,可使得任务的处理单元暂停对于任务的数据同步,因此可以降低运行上述任务的机器的网卡流量,避免运行上述任务的机器出现网卡流量打满等问题,从而提高生产系统的稳定性。
第二,本申请实施例可以对每个任务的流量速度进行单独监控,并依据监控结果对不同任务进行有区分的流量控制,使得不同任务的流量控制相互独立,例如,对于同一作业的多个任务,有的任务的流量速度超出了阈值,故可以对其处理单元进行休眠处理,以缓解运行这些任务的机器的网卡流量、内存资源等压力,而有的任务的流量速度未超出阈值,则可以不对其处理单元进行休眠处理,因此本申请实施例可以提高流量控制的合理性;
第三,本申请实施例针对用户创建资源组,并利用所述资源组中机器的资源对所述用户的作业进行处理,因此可以实现用户之间同步资源的隔离,从而可以避免不同用户的作业之间的相互影响。由于云计算通常要求多个用户(两个无关使用者或租户)之间资源隔离,一个用户对资源的占用不应该影响另外一个用户对资源的使用,因此,本实施例可以适用于云计算的场景;
第四,本申请实施例在数据库存在可用的并发连接时,可以利用所述并发连接对所述作业进行处理,由于当数据库不存在可用的并发连接时,可以对作业进行等待处理,而不是增加并发连接数,因此能够避免一味增加并发连接数所引发的数据库负载过重的问题,从而能够提高生产系统的稳定性;
第五,本申请实施例可以针对数据库配置时间窗口,并在当前时间处于所述作业对应数据库的时间窗口内时,处理上述作业,在当前时间不处于所述作业对应数据库的时间窗口内时,等待时间窗口的到来,因此能够避免数据同步影响线上数据库的性能;
第六,本申请实施例可以适用于任意两个数据源之间的数据同步,因此具有较好的通用性。
需要说明的是,对于方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本申请实施例所必须的。
装置实施例一
参照图9,示出了本申请的一种数据同步装置实施例一的结构框图,具体可以包括如下模块:
切分模块901,用于将作业切分为多个任务;其中,所述任务具有对应的处理单元;
监控模块902,用于在所述任务的处理过程中,监控所述任务的流量速度;及
休眠模块903,用于在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理。
在本申请的一种可选实施例中,所述任务的流量速度符合预置条件具体可以包括:所述任务的字节流量速度超出第一阈值,和/或,所述任务的记录流量速度超出第二阈值。
在本申请的另一种可选实施例中,所述休眠模块903,具体可以包括:
确定子模块,用于依据所述任务的流量速度、监控周期和流量速度上限,确定所述任务的处理单元的休眠时长;
控制子模块,用于控制所述任务的处理单元进入维持时间为所述休眠时长的休眠状态。
在本申请的再一种可选实施例中,所述切分模块901,具体可以包括:
第一切分子模块,用于根据所述作业对应数据表的列最小值和最大值,将列区间划分多个子区间;或者
第二切分子模块,用于根据所述作业对应文件的大小和切分份数,确定所述文件的切分点位,并依据所述切分点位对所述文件进行切分。
装置实施例二
参照图10,示出了本申请的一种数据同步装置实施例二的结构框图,具体可以包括如下模块:
创建模块1001,用于针对用户创建资源组;其中,所述资源组包括至少一台机器;
作业处理模块1002,用于利用所述资源组中机器的资源对所述用户的作业进行处理,其中,所述作业处理模块1002,具体可以包括:
切分子模块1021,用于利用所述资源组中机器将作业切分为多个任务;其中,所述任务具有对应的处理单元;
监控子模块1022,用于在所述任务的处理过程中,利用所述资源组中机器监控所述任务的流量速度;及
休眠子模块1023,用于在所述任务的流量速度符合预置条件时,利用所述资源组中机器对所述任务的处理单元进行休眠处理。
在本申请的一种可选实施例中,所述作业处理模块1002,具体可以包括:
条件处理子模块,用于在所述用户对应资源组中机器的剩余资源超出所述用户的作业所需的资源时,利用所述资源组中机器对所述用户的作业进行处理。
在本申请的另一种可选实施例中,所述机器的资源具体可以包括:依据所述机器的物理资源抽象得到的槽位资源。
装置实施例三
参照图11,示出了本申请的一种数据同步装置实施例三的结构框图,具体可以包括如下模块:
切分模块1101,用于将作业切分为多个任务;其中,所述任务具有对应的处理单元;
第一任务处理模块1102,用于当数据库存在可用的并发连接时,利用所述并发连接对所述任务进行处理;
监控模块1103,用于在所述任务的处理过程中,监控所述任务的流量速度;
休眠模块1104,用于在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理;及
释放模块1105,用于在所述任务处理完成后,释放所述任务所占用的并发连接。
在本申请的一种可选实施例中,所述处理单元具体可以包括:读单元和写单元;
则所述第一任务处理模块1102,具体可以包括:
乱序处理子模块,用于分别对所述作业对应的多个读单元和多个写单元进行乱序处理;及
组合子模块,用于从乱序处理结果中选择一个读单元和一个写单元,组成相应的任务。
装置实施例四
参照图12,示出了本申请的一种数据同步装置实施例四的结构框图,具体可以包括如下模块:
切分模块1201,用于将作业切分为多个任务;其中,所述任务具有对应的处理单元;
第二任务处理子模块1202,用于在当前时间处于任务对应数据库的时间窗口内时,进行所述任务的处理;
监控模块1203,用于在所述任务的处理过程中,监控所述任务的流量速度;
休眠模块1204,用于在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理。
对于装置实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。
本领域内的技术人员应明白,本申请的实施例可提供为方法、装置或计算机程序产品。因此,本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
在一个典型的配置中,所述计算机设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括非持续性的电脑可读媒体(transitory media),如调制的数据信号和载波。
本申请实施例是参照根据本申请实施例的方法、终端设备(***)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上,使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程终端设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。
以上对本申请所提供的一种数据同步方法和一种数据同步装置,进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (20)

  1. 一种数据同步方法,其特征在于,包括:
    将作业切分为多个任务;其中,所述任务具有对应的处理单元;
    在所述任务的处理过程中,监控所述任务的流量速度;
    在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理。
  2. 根据权利要求1所述的方法,其特征在于,所述任务的流量速度符合预置条件包括:所述任务的字节流量速度超出第一阈值,和/或,所述任务的记录流量速度超出第二阈值。
  3. 根据权利要求1所述的方法,其特征在于,所述对所述任务的处理单元进行休眠处理的步骤,包括:
    依据所述任务的流量速度、监控周期和流量速度上限,确定所述任务的处理单元的休眠时长;
    控制所述任务的处理单元进入维持时间为所述休眠时长的休眠状态。
  4. 根据权利要求1至3中任一所述的方法,其特征在于,所述方法还包括:
    针对用户创建资源组;其中,所述资源组包括至少一台机器;
    利用所述资源组中机器的资源对所述用户的作业进行处理,其中,所述利用所述资源组中机器对所述用户的作业进行处理的步骤,包括:
    利用所述资源组中机器将作业切分为多个任务;其中,所述任务具有对应的处理单元;
    在所述任务的处理过程中,利用所述资源组中机器监控所述任务的流量速度;
    在所述任务的流量速度符合预置条件时,利用所述资源组中机器对所述任务的处理单元进行休眠处理。
  5. 根据权利要求4所述的方法,其特征在于,所述利用所述资源组中机器对所述用户的作业进行处理的步骤,包括:
    在所述用户对应资源组中机器的剩余资源超出所述用户的作业所需的资源时,利用所述资源组中机器对所述用户的作业进行处理。
  6. 根据权利要求4所述的方法,其特征在于,所述机器的资源包括:依据所述机器的物理资源抽象得到的槽位资源。
  7. 根据权利要求1至3中任一所述的方法,其特征在于,所述方法还包括:
    当数据库存在可用的并发连接时,利用所述并发连接对所述任务进行处理;
    在所述任务处理完成后,释放所述任务所占用的并发连接。
  8. 根据权利要求7所述的方法,其特征在于,所述处理单元包括:读单元和写单元;
    所述利用所述并发连接对所述任务进行处理的步骤,包括:
    分别对所述作业对应的多个读单元和多个写单元进行乱序处理;
    从乱序处理结果中选择一个读单元和一个写单元,组成相应的任务。
  9. 根据权利要求1至3中任一所述的方法,其特征在于,所述方法还包括:在当前时间处于任务对应数据库的时间窗口内时,进行所述任务的处理。
  10. 根据权利要求1至3中任一所述的方法,其特征在于,所述将作业切分为多个任务的步骤,包括:
    根据所述作业对应数据表的列最小值和最大值,将列区间划分多个子区间;或者
    根据所述作业对应文件的大小和切分份数,确定所述文件的切分点位,并依据所述切分点位对所述文件进行切分。
  11. 一种数据同步装置,其特征在于,包括:
    切分模块,用于将作业切分为多个任务;其中,所述任务具有对应的处理单元;
    监控模块,用于在所述任务的处理过程中,监控所述任务的流量速度;及
    休眠模块,用于在所述任务的流量速度符合预置条件时,对所述任务的处理单元进行休眠处理。
  12. 根据权利要求11所述的装置,其特征在于,所述任务的流量速度符合预置条件包括:所述任务的字节流量速度超出第一阈值,和/或,所述任务的记录流量速度超出第二阈值。
  13. 根据权利要求11所述的装置,其特征在于,所述休眠模块,包括:
    确定子模块,用于依据所述任务的流量速度、监控周期和流量速度上限,确定所述任务的处理单元的休眠时长;
    控制子模块,用于控制所述任务的处理单元进入维持时间为所述休眠时长的休眠状态。
  14. 根据权利要求11至13中任一所述的装置,其特征在于,所述装置还包括:
    创建模块,用于针对用户创建资源组;其中,所述资源组包括至少一台机器;
    作业处理模块,用于利用所述资源组中机器的资源对所述用户的作业进行处理,其中,所述利用所述资源组中机器对所述用户的作业进行处理的过程,包括:
    利用所述资源组中机器将作业切分为多个任务;其中,所述任务具有对应的处理单元;
    在所述任务的处理过程中,利用所述资源组中机器监控所述任务的流量速度;
    在所述任务的流量速度符合预置条件时,利用所述资源组中机器对所述任务的处理单元进行休眠处理。
  15. 根据权利要求14所述的装置,其特征在于,所述作业处理模块,包括:
    条件处理子模块,用于在所述用户对应资源组中机器的剩余资源超出所述用户的作业所需的资源时,利用所述资源组中机器对所述用户的作业进行处理。
  16. 根据权利要求14所述的装置,其特征在于,所述机器的资源包括:依据所述机器的物理资源抽象得到的槽位资源。
  17. 根据权利要求11至13中任一所述的装置,其特征在于,所述装置还包括:
    第一任务处理模块,用于当数据库存在可用的并发连接时,利用所述并发连接对所述任务进行处理;
    释放模块,用于在所述任务处理完成后,释放所述任务所占用的并发连接。
  18. 根据权利要求17所述的装置,其特征在于,所述处理单元包括:读单元和写单元;
    所述第一任务处理模块,包括:
    乱序处理子模块,用于分别对所述作业对应的多个读单元和多个写单元进行乱序处理;
    组合子模块,用于从乱序处理结果中选择一个读单元和一个写单元,组成相应的任务。
  19. 根据权利要求11至13中任一所述的装置,其特征在于,所述装置还包括:
    第二任务处理子模块,用于在当前时间处于任务对应数据库的时间窗口内时,进行所述任务的处理。
  20. 根据权利要求11至13中任一所述的装置,其特征在于,所述切分模块,包括:
    第一切分子模块,用于根据所述作业对应数据表的列最小值和最大值,将列区间划分多个子区间;或者
    第二切分子模块,用于根据所述作业对应文件的大小和切分份数,确定所述文件的切分点位,并依据所述切分点位对所述文件进行切分。
PCT/CN2016/099055 2015-09-25 2016-09-14 一种数据同步方法和装置 WO2017050177A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510624904.6 2015-09-25
CN201510624904.6A CN106557492A (zh) 2015-09-25 2015-09-25 一种数据同步方法和装置

Publications (1)

Publication Number Publication Date
WO2017050177A1 true WO2017050177A1 (zh) 2017-03-30

Family

ID=58385567

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/099055 WO2017050177A1 (zh) 2015-09-25 2016-09-14 一种数据同步方法和装置

Country Status (2)

Country Link
CN (1) CN106557492A (zh)
WO (1) WO2017050177A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109646784A (zh) * 2018-12-21 2019-04-19 华东计算技术研究所(中国电子科技集团公司第三十二研究所) 基于沉浸式VR的失眠障碍心理治疗系统和方法
CN111797158B (zh) * 2019-04-08 2024-04-05 北京沃东天骏信息技术有限公司 数据同步系统、方法和计算机可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101056264A (zh) * 2007-04-25 2007-10-17 华为技术有限公司 流量控制的方法和业务处理系统
CN102298580A (zh) * 2010-06-22 2011-12-28 Sap股份公司 使用异步缓冲器的多核查询处理
US20120054770A1 (en) * 2010-08-31 2012-03-01 International Business Machines Corporation High throughput computing in a hybrid computing environment
CN103810041A (zh) * 2014-02-13 2014-05-21 北京大学 一种支持动态伸缩的并行计算的方法
CN103955491A (zh) * 2014-04-15 2014-07-30 南威软件股份有限公司 一种定时数据增量同步的方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102098731B (zh) * 2011-01-25 2014-06-25 无锡泛联物联网科技股份有限公司 无线传感网中的基于跳数的流量自适应休眠调度方法
CN102662633A (zh) * 2012-03-16 2012-09-12 深圳第七大道科技有限公司 一种Flash任务的多线程处理方法和系统
CN102790698B (zh) * 2012-08-14 2014-08-13 南京邮电大学 一种基于节能树的大规模计算集群任务调度方法
CN103257892B (zh) * 2013-05-27 2016-03-23 北京世纪瑞尔技术股份有限公司 一种基于宏组合的多任务调度方法及系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101056264A (zh) * 2007-04-25 2007-10-17 华为技术有限公司 流量控制的方法和业务处理系统
CN102298580A (zh) * 2010-06-22 2011-12-28 Sap股份公司 使用异步缓冲器的多核查询处理
US20120054770A1 (en) * 2010-08-31 2012-03-01 International Business Machines Corporation High throughput computing in a hybrid computing environment
CN103810041A (zh) * 2014-02-13 2014-05-21 北京大学 一种支持动态伸缩的并行计算的方法
CN103955491A (zh) * 2014-04-15 2014-07-30 南威软件股份有限公司 一种定时数据增量同步的方法

Also Published As

Publication number Publication date
CN106557492A (zh) 2017-04-05

Similar Documents

Publication Publication Date Title
US11888599B2 (en) Scalable leadership election in a multi-processing computing environment
US10402220B2 (en) System and method for supporting a scalable thread pool in a distributed data grid
US11816063B2 (en) Automatic archiving of data store log data
US9027028B2 (en) Controlling the use of computing resources in a database as a service
EP2954424B1 (en) Method, device, and system for peer-to-peer data replication and method, device, and system for master node switching
CN104735110B (zh) 元数据管理方法和***
WO2019001017A1 (zh) 集群间数据迁移方法、***、服务器及计算机存储介质
EP3039844B1 (en) System and method for supporting partition level journaling for synchronizing data in a distributed data grid
CN104965850A (zh) 一种基于开源技术的数据库高可用实现方法
US9984139B1 (en) Publish session framework for datastore operation records
WO2019109854A1 (zh) 分布式数据库数据处理方法、装置、存储介质及电子装置
WO2017028690A1 (zh) 一种基于etl的文件处理方法及***
TW201702908A (zh) 資料庫彈性調度方法以及裝置
WO2022126863A1 (zh) 一种基于读写分离及自动伸缩的云编排***及方法
CN116302574B (zh) 一种基于MapReduce的并发处理方法
TW201804346A (zh) 針對資料庫的資料修改請求處理方法和裝置
WO2017181430A1 (zh) 分布式***的数据库复制方法及装置
WO2017050177A1 (zh) 一种数据同步方法和装置
CN107566341B (zh) 一种基于联邦分布式文件存储***的数据持久化存储方法及***
US20210042322A1 (en) System and method of time-based snapshot synchronization
Yuan et al. Research of scheduling strategy based on fault tolerance in Hadoop platform
KR101681651B1 (ko) 데이터베이스 관리 시스템 및 방법
KR101542605B1 (ko) 온톨로지 매칭의 시멘틱 이질성 병렬 처리 장치 및 처리 방법
US20240223510A1 (en) Scalable leadership election in a multi-processing computing environment
WO2020207078A1 (zh) 数据处理方法、装置和分布式数据库***

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16848059

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16848059

Country of ref document: EP

Kind code of ref document: A1