CN111324610A - Data synchronization method and device - Google Patents


Info

Publication number
CN111324610A
Authority
CN
China
Prior art keywords
target
data
database
information
synchronization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010101450.5A
Other languages
Chinese (zh)
Inventor
邓静茹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Rongyimai Information Technology Co ltd
Original Assignee
Shenzhen Rongyimai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Rongyimai Information Technology Co ltd filed Critical Shenzhen Rongyimai Information Technology Co ltd
Priority to CN202010101450.5A priority Critical patent/CN111324610A/en
Publication of CN111324610A publication Critical patent/CN111324610A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application belongs to the technical field of computers and provides a data synchronization method comprising the following steps: acquiring a data synchronization instruction; determining field information and field types corresponding to the target data based on the database address information, the database permission information and the table name information; establishing a target table in a distributed database based on the field information, the field type, the target synchronization address information and a preset distributed database tool; determining a synchronization script corresponding to the target data based on the data synchronization instruction; and synchronizing the target data to the target table in the distributed database based on the synchronization script. Because the synchronization script is generated automatically instead of being written by hand, the method improves data synchronization efficiency and reduces the possibility of errors.

Description

Data synchronization method and device
Technical Field
The present application belongs to the field of computer technologies, and in particular, to a method and an apparatus for data synchronization.
Background
The first link in big data development is to import massive data into a platform for data analysis or algorithm modeling, and in the prior art, corresponding scripts are manually written for data stored in different types of databases, so that data synchronization is realized. However, the existing method for importing large-batch data is inefficient and is prone to errors during synchronization.
Disclosure of Invention
The embodiment of the application provides a data synchronization method and device, and can solve the problems that the existing mass data import mode is low in efficiency and errors are easy to occur in synchronization.
In a first aspect, an embodiment of the present application provides a data synchronization method, including:
acquiring a data synchronization instruction; the data synchronization instruction comprises database address information and database permission information of a relational database to which target data to be synchronized belong, table name information of a table corresponding to the target data in the relational database, and target synchronization address information;
determining field information and field types corresponding to the target data based on the database address information, the database permission information and the table name information;
establishing a target table in a distributed database based on the field information, the field type, the target synchronous address information and a preset distributed database tool; the distributed database is a target database for carrying out data synchronization on the target data; the target table is used for storing the target data;
determining a synchronization script corresponding to the target data based on the data synchronization instruction;
synchronizing the target data to a target table in the distributed database based on the synchronization script.
Further, the establishing a target table in a distributed database based on the field information, the field type and a preset distributed database tool includes:
determining metadata based on the field information, the field type and a preset distributed database tool; the metadata includes table information of the target table;
a target table is established in a distributed database based on the metadata.
Further, the determining a synchronization script corresponding to the target data based on the data synchronization instruction includes:
and generating an Apache Spark submission script corresponding to the target data based on a preset generation strategy and the data synchronization instruction.
Further, the data synchronization instruction further comprises a database type of the relational database;
the synchronizing the target data to a target table in the distributed database based on the synchronization script comprises:
determining driving information of a relational database according to the database type;
determining an executable task file based on the synchronization script and the driving information;
and performing task scheduling based on the executable task file, and synchronizing the target data to a target table in the distributed database.
Further, after the task scheduling based on the executable task file and the synchronization of the target data to the target table in the distributed database, the method further includes:
and when a scheduling failure of the executable task file is detected, performing task scheduling again based on the executable task file.
In a second aspect, an embodiment of the present application provides an apparatus for data synchronization, including:
the acquisition unit is used for acquiring a data synchronization instruction; the data synchronization instruction comprises database address information and database permission information of a relational database to which target data to be synchronized belong, table name information of a table corresponding to the target data in the relational database, and target synchronization address information;
the first determining unit is used for determining field information and field types corresponding to the target data based on the database address information, the database permission information and the table name information;
the establishing unit is used for establishing a target table in a distributed database based on the field information, the field type, the target synchronous address information and a preset distributed database tool; the distributed database is a target database for carrying out data synchronization on the target data; the target table is used for storing the target data;
a second determining unit, configured to determine, based on the data synchronization instruction, a synchronization script corresponding to the target data;
a synchronization unit to synchronize the target data to a target table in the distributed database based on the synchronization script.
Further, the establishing unit is specifically configured to:
determining metadata based on the field information, the field type and a preset distributed database tool; the metadata includes table information of the target table;
a target table is established in a distributed database based on the metadata.
Further, the second determining unit is specifically configured to:
and generating an Apache Spark submission script corresponding to the target data based on a preset generation strategy and the data synchronization instruction.
Further, the data synchronization instruction further comprises a database type of the relational database;
the synchronization unit is specifically configured to:
determining driving information of a relational database according to the database type;
determining an executable task file based on the synchronization script and the driving information;
and performing task scheduling based on the executable task file, and synchronizing the target data to a target table in the distributed database.
Further, the apparatus for data synchronization further includes:
and the execution unit is used for carrying out task scheduling again based on the executable task file when the executable task file scheduling failure is detected.
In a third aspect, an embodiment of the present application provides a device for data synchronization, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for data synchronization according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is executed by a processor to implement the method for data synchronization according to the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method for data synchronization according to the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
In the embodiments of the application, a data synchronization instruction is acquired; field information and field types corresponding to the target data are determined based on the database address information, the database permission information and the table name information; a target table is established in a distributed database based on the field information, the field type, the target synchronization address information and a preset distributed database tool; a synchronization script corresponding to the target data is determined based on the data synchronization instruction; and the target data is synchronized to the target table in the distributed database based on the synchronization script. Because the synchronization script is generated automatically instead of being written by hand, the method improves data synchronization efficiency and reduces the possibility of errors.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without inventive effort.
FIG. 1 is a schematic flowchart of a method for data synchronization provided in a first embodiment of the present application;
FIG. 2 is a schematic flowchart of a refinement of S103 in a method for data synchronization provided in a first embodiment of the present application;
FIG. 3 is a schematic flowchart of a refinement of S105 in a method for data synchronization provided in a first embodiment of the present application;
FIG. 4 is a schematic diagram of an apparatus for data synchronization provided in a second embodiment of the present application;
FIG. 5 is a schematic diagram of a device for data synchronization according to a third embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a data synchronization method according to a first embodiment of the present application. The method in this embodiment is executed by a device with a data synchronization function, which may be a computer, a server, or the like. The method of data synchronization shown in FIG. 1 may include:
S101: acquiring a data synchronization instruction; the data synchronization instruction comprises database address information and database permission information of a relational database to which target data to be synchronized belong, table name information of a table corresponding to the target data in the relational database, and target synchronization address information.
The first link in big data development is to import massive data into a platform for data analysis or algorithm modeling, but the prior art cannot synchronize data automatically in batches. This embodiment therefore achieves efficient, automatic data synchronization by constructing the data synchronization script automatically. In this embodiment, the device acquires a data synchronization instruction: the user may input the relevant data on a front-end page and then click a "data synchronization" virtual button to trigger the device to generate the instruction. The data synchronization instruction comprises database address information and database permission information of the relational database to which the target data to be synchronized belongs, table name information of the table corresponding to the target data in the relational database, and target synchronization address information. The database address information is the IP address of the database. The database permission information identifies the user to which a table in the database belongs, that is, a user who can access or modify the table. The target synchronization address information is the address to which the target data is to be synchronized, and may take the form of a mapping table.
A relational database is a database that uses the relational model to organize data. It stores data in rows and columns; a series of rows and columns is called a table, and a set of tables constitutes the database. Data in the database is retrieved by a query, which is executable code that selects certain areas of the database. The relational model can be understood simply as a two-dimensional table model, and a relational database is a data organization composed of two-dimensional tables and the relations between them.
That is, the user may input, on the front-end page, the database address information and database permission information of the relational database to which the target data to be synchronized belongs, the table name information of the table corresponding to the target data in the relational database, and the target synchronization address information; the device then generates the data synchronization instruction from this input.
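As an illustration only, the content of the data synchronization instruction described above can be sketched as a small Python data structure. The class name, field names and the validation rule are assumptions made for this sketch; the application does not specify a concrete format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncInstruction:
    """Hypothetical container for the items a data synchronization instruction carries."""
    db_address: str      # database address information (for example an IP address)
    db_user: str         # database permission information (user owning the table)
    table_name: str      # table holding the target data in the relational database
    target_address: str  # target synchronization address information


def build_instruction(form: dict) -> SyncInstruction:
    """Build an instruction from front-end form input, rejecting blank items."""
    required = ("db_address", "db_user", "table_name", "target_address")
    missing = [key for key in required if not str(form.get(key, "")).strip()]
    if missing:
        raise ValueError("missing items: " + ", ".join(missing))
    return SyncInstruction(**{key: str(form[key]).strip() for key in required})


instruction = build_instruction({
    "db_address": "10.0.0.5",
    "db_user": "etl_reader",
    "table_name": "orders",
    "target_address": "ods.orders",
})
```

Making the structure immutable reflects that the later steps only read the instruction; that is a design choice of this sketch, not a requirement of the method.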
S102: and determining field information and field types corresponding to the target data based on the database address information, the database permission information and the table name information.
The device locates the database and its content based on the database address information and the database permission information, determines the table corresponding to the target data based on the table name information, and reads the field information and the field type corresponding to the target data from that table. In a database, the "columns" of a table are in most cases called "fields", and each field contains a certain kind of information. For example, in a directory database, "name" and "contact" are attributes common to all rows in the table, so these columns are referred to as the "name" field and the "contact" field. Field types generally include text, int, tinyint, datetime, varchar and char.
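A minimal sketch of reading field information and field types from a table is given below. It uses Python's built-in sqlite3 module as a stand-in for the relational database so that the example is runnable; the application does not name a specific database, and the function name read_fields is hypothetical.

```python
import sqlite3
from typing import List, Tuple


def read_fields(conn: sqlite3.Connection, table_name: str) -> List[Tuple[str, str]]:
    """Return (field name, declared field type) pairs for one table."""
    # PRAGMA statements cannot take bound parameters, so validate the name first.
    if not table_name.isidentifier():
        raise ValueError("unsafe table name: %r" % table_name)
    rows = conn.execute("PRAGMA table_info(%s)" % table_name).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2].lower()) for row in rows]


# In-memory database standing in for the source relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (name VARCHAR(64), contact TEXT, age INT)")
fields = read_fields(conn, "directory")
```

With another database product, the same information would typically come from that product's own catalog or schema views rather than from a PRAGMA statement.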
S103: establishing a target table in a distributed database based on the field information, the field type, the target synchronous address information and a preset distributed database tool; the distributed database is a target database for carrying out data synchronization on the target data; the target table is used for storing the target data.
In this embodiment, the preset distributed database tool may be the data warehouse tool Hive. Hive is a data warehouse tool based on Hadoop that is used for data extraction, transformation and loading; it provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. Hive can map structured data files to database tables, provide an SQL query function, and convert SQL statements into MapReduce tasks for execution. Its advantage is a low learning cost: rapid MapReduce statistics can be produced through SQL-like statements, so there is no need to develop a dedicated MapReduce application. Hive is therefore well suited to statistical analysis over a data warehouse.
The device establishes a target table in a distributed database based on field information, field types, target synchronous address information and a preset distributed database tool, wherein the distributed database is a target database for performing data synchronization on target data, and the target table is used for storing the target data.
A distributed database system typically uses smaller computer systems, each of which may be placed at a separate location. Each computer may hold a complete or partial copy of the DBMS together with its own local database. Many such computers at different locations are interconnected via a network to form a complete large database that is logically centralized and physically distributed.
Further, in order to establish the target table accurately and thus synchronize data more accurately, S103 may include S1031 to S1032. As shown in FIG. 2, S1031 to S1032 are as follows:
S1031: determining metadata based on the field information, the field type and a preset distributed database tool; the metadata includes table information of the target table.
Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes data properties and supports functions such as indicating storage locations, recording history, searching for resources and keeping file records. The device determines metadata based on the field information, the field type and the preset distributed database tool, wherein the metadata comprises table information of the target table. In general, the metadata in Hive comprises the table name, the columns and partitions of the table and their attributes, the attributes of the table (for example, whether it is an external table), the directory where the table data is located, and the like. Specifically, the device converts the field information and the field type into Hive types to generate basic database information, namely metadata such as DBS and TBLS. The metadata acts like a translator for Hadoop, mediating between Java code and SQL, and ultimately enables interaction between Hadoop distributed storage and relational data.
S1032: a target table is established in a distributed database based on the metadata.
From the metadata, the device obtains the table name, the columns and partitions of the table and their attributes, the attributes of the table, the directory where the table data is located and other information, and establishes the target table in the distributed database accordingly.
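The conversion of field types into Hive types and the creation of the target table can be sketched as follows. The type mapping, the function names, the Parquet storage format and the choice of an external table are assumptions made for illustration; the application only states that field types are converted into Hive types and that the target table is created from the metadata.

```python
import re
from typing import List, Tuple

# Hypothetical mapping from relational field types to Hive column types.
HIVE_TYPES = {
    "text": "STRING",
    "varchar": "STRING",
    "char": "STRING",
    "int": "INT",
    "tinyint": "TINYINT",
    "datetime": "TIMESTAMP",
}


def to_hive_type(field_type: str) -> str:
    """Map a type such as 'varchar(64)' to a Hive type, falling back to STRING."""
    base = re.sub(r"\(.*\)", "", field_type).strip().lower()
    return HIVE_TYPES.get(base, "STRING")


def build_create_table(table: str, fields: List[Tuple[str, str]], location: str) -> str:
    """Generate a Hive DDL statement for the target table."""
    columns = ",\n".join(
        "  `%s` %s" % (name, to_hive_type(ftype)) for name, ftype in fields
    )
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS `%s` (\n%s\n)\n"
        "STORED AS PARQUET\nLOCATION '%s'" % (table, columns, location)
    )


ddl = build_create_table(
    "directory",
    [("name", "varchar(64)"), ("created", "datetime")],
    "/warehouse/directory",
)
```

Falling back to STRING for unknown types keeps the sketch permissive; a stricter implementation might instead reject types it cannot map.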
S104: and determining a synchronization script corresponding to the target data based on the data synchronization instruction.
The data synchronization instruction comprises the database address information and database permission information of the relational database to which the target data to be synchronized belongs, the table name information of the table corresponding to the target data in the relational database, and the target synchronization address information. The device assembles these items to determine the synchronization script corresponding to the target data. The synchronization script is used to synchronize the target data.
Further, to improve the efficiency of data synchronization, S104 may include: and generating an Apache Spark submission script corresponding to the target data based on a preset generation strategy and the data synchronization instruction.
The device pre-stores a preset generation strategy, which is used to generate the Apache Spark submission script corresponding to the target data. The device generates the script based on the items included in the data synchronization instruction: the database address information and database permission information of the relational database to which the target data to be synchronized belongs, the table name information of the table corresponding to the target data in the relational database, and the target synchronization address information. Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing. Spark provides in-memory distributed datasets which, in addition to supporting interactive queries, can also optimize iterative workloads. Spark submission is an improved submission mode: by a conservative estimate it can be 10 times faster than the earlier Sqoop submission, because the Spark framework computes in memory and thereby greatly improves data processing efficiency. The current ETL command is generated with default memory and CPU sizes, whereas the Spark command allows the memory size and the number of CPUs to be configured flexibly, so that reasonable resources can be allocated to each task.
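One possible form of the preset generation strategy is a template that assembles a spark-submit command line from the items in the instruction. In the sketch below, the main class com.example.sync.JdbcToHive, the jar name sync-job.jar, the order of the application arguments and the default resource values are all hypothetical; only the spark-submit options --master, --executor-memory, --executor-cores and --class are standard.

```python
import shlex
from typing import Dict


def build_spark_submit(instruction: Dict[str, str],
                       executor_memory: str = "2g",
                       executor_cores: int = 2) -> str:
    """Assemble a spark-submit command line from a synchronization instruction."""
    args = [
        "spark-submit",
        "--master", "yarn",
        "--executor-memory", executor_memory,      # memory is configurable per task
        "--executor-cores", str(executor_cores),   # so is the number of CPU cores
        "--class", "com.example.sync.JdbcToHive",  # hypothetical main class
        "sync-job.jar",                            # hypothetical application jar
        # Everything after the jar is passed to the (hypothetical) application.
        instruction["db_address"],
        instruction["db_user"],
        instruction["table_name"],
        instruction["target_address"],
    ]
    # Quote each argument so that values containing spaces survive the shell.
    return " ".join(shlex.quote(arg) for arg in args)


script = build_spark_submit(
    {
        "db_address": "jdbc:mysql://10.0.0.5:3306/crm",
        "db_user": "etl_reader",
        "table_name": "orders",
        "target_address": "ods.orders",
    },
    executor_memory="4g",
)
```

Exposing memory and cores as parameters mirrors the point made above that resources can be chosen per task instead of relying on fixed defaults.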
S105: synchronizing the target data to a target table in the distributed database based on the synchronization script.
The synchronization script is used to synchronize the target data to the target table in the distributed database; the device runs the synchronization script to perform this synchronization.
Further, in order to synchronize the data more accurately, S105 may include S1051 to S1053. As shown in FIG. 3, S1051 to S1053 are as follows:
S1051: determining the driving information of the relational database according to the database type.
In this embodiment, the data synchronization instruction further includes the database type of the relational database. The device pre-stores a correspondence between preset database types and preset driving information, and uses it to determine the driving information corresponding to the database type, that is, the driving information of the relational database. The driving information is used to drive the relational database, for example to make it output data.
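The pre-stored correspondence can be sketched as a lookup table. This sketch assumes that the driving information is a JDBC driver class name, which the application does not state explicitly; the dictionary contents and the function name are likewise illustrative.

```python
# Assumed correspondence between database types and JDBC driver class names.
DRIVERS = {
    "mysql": "com.mysql.cj.jdbc.Driver",
    "postgresql": "org.postgresql.Driver",
    "oracle": "oracle.jdbc.OracleDriver",
}


def driver_for(db_type: str) -> str:
    """Look up the driving information for a database type (case-insensitive)."""
    key = db_type.strip().lower()
    if key not in DRIVERS:
        raise KeyError("no driving information stored for database type %r" % db_type)
    return DRIVERS[key]


mysql_driver = driver_for("MySQL")
```

Failing loudly on an unknown database type is a choice of this sketch; it prevents a task file from being generated with missing driving information.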
S1052: and determining an executable task file based on the synchronous script and the driving information.
The device determines an executable task file based on the synchronization script and the driving information. Specifically, the device can generate a job file executable by Azkaban. Azkaban is a batch workflow task scheduler open-sourced by LinkedIn and is used to run a set of jobs and processes in a particular order within a workflow. Azkaban defines a key-value (KV) file format for establishing dependencies between tasks and provides an easy-to-use web user interface for maintaining and tracking workflows.
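A job file of this kind is a plain list of key=value lines, so generating it can be sketched in a few lines. The sketch assumes an Azkaban command-type job; passing the driving information through an environment variable named JDBC_DRIVER, and the default retry values, are hypothetical choices not taken from the application.

```python
def build_job_file(sync_script: str, driver_class: str,
                   retries: int = 3, backoff_ms: int = 60000) -> str:
    """Render the content of a command-type Azkaban .job file."""
    lines = [
        "type=command",
        "command=%s" % sync_script,             # the generated synchronization script
        "env.JDBC_DRIVER=%s" % driver_class,    # hypothetical variable name
        "retries=%d" % retries,                 # automatic retries on failure
        "retry.backoff=%d" % backoff_ms,        # milliseconds between retries
    ]
    return "\n".join(lines) + "\n"


job = build_job_file("sh sync_orders.sh", "com.mysql.cj.jdbc.Driver")
```

The returned text would be written to a file with the .job extension and uploaded to the scheduler as part of a project archive.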
S1053: and performing task scheduling based on the executable task file, and synchronizing the target data to a target table in the distributed database.
The device performs task scheduling based on the executable task file. Azkaban is developed in Java and packages Jetty as a lightweight web server. It is controlled through a web UI, in which the dependencies among different tasks can be seen at a glance, and it can be deployed in single-node or two-machine mode. If the task scheduling succeeds, the target data is synchronized into the target table in the distributed database.
Further, if the task scheduling fails, the method may further include, after S1053: when a scheduling failure of the executable task file is detected, performing task scheduling again based on the executable task file. Each task can be rescheduled multiple times, which can effectively avoid task failures caused by network problems.
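The rescheduling behaviour can be sketched as a bounded retry loop. The function name, the callable submit (standing in for whatever schedules the executable task file) and the limit of three attempts are assumptions; the application only states that a task can be rescheduled multiple times.

```python
from typing import Callable


def schedule_with_retry(submit: Callable[[], bool], max_attempts: int = 3) -> int:
    """Schedule a task, rescheduling on failure; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if submit():  # submit() returns True when scheduling succeeded
            return attempt
    raise RuntimeError("task scheduling failed after %d attempts" % max_attempts)


# Simulated scheduler that fails twice (for example because of a network fault)
# and then succeeds.
outcomes = iter([False, False, True])
attempts_used = schedule_with_retry(lambda: next(outcomes))
```

Bounding the number of attempts keeps a persistent fault, such as wrong credentials, from being retried indefinitely.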
In the embodiments of the application, a data synchronization instruction is acquired; field information and field types corresponding to the target data are determined based on the database address information, the database permission information and the table name information; a target table is established in a distributed database based on the field information, the field type, the target synchronization address information and a preset distributed database tool; a synchronization script corresponding to the target data is determined based on the data synchronization instruction; and the target data is synchronized to the target table in the distributed database based on the synchronization script. Because the synchronization script is generated automatically instead of being written by hand, the method improves data synchronization efficiency and reduces the possibility of errors.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Referring to fig. 4, fig. 4 is a schematic diagram of a data synchronization apparatus according to a second embodiment of the present application. The units included are used to perform the steps in the embodiments corresponding to fig. 1-3. Please refer to the related description of the embodiments corresponding to fig. 1 to fig. 3. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 4, the apparatus 4 for data synchronization includes:
an obtaining unit 410, configured to obtain a data synchronization instruction; the data synchronization instruction comprises database address information and database permission information of a relational database to which target data to be synchronized belong, table name information of a table corresponding to the target data in the relational database, and target synchronization address information;
a first determining unit 420, configured to determine, based on the database address information, the database permission information, and the table name information, field information and a field type corresponding to the target data;
the establishing unit 430 is configured to establish a target table in a distributed database based on the field information, the field type, the target synchronization address information, and a preset distributed database tool; the distributed database is a target database for carrying out data synchronization on the target data; the target table is used for storing the target data;
a second determining unit 440, configured to determine, based on the data synchronization instruction, a synchronization script corresponding to the target data;
a synchronization unit 450, configured to synchronize the target data to a target table in the distributed database based on the synchronization script.
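As a rough illustration of the obtaining unit and the first determining unit, the Python sketch below models the data synchronization instruction as a small record and reads field names and field types from a source table. The class name `SyncInstruction`, its attribute names, and the function `read_fields` are hypothetical and do not appear in the application. SQLite stands in for the relational database only because it ships with Python; a real source database would be queried through its own catalog.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SyncInstruction:
    """Hypothetical record of the items the instruction is said to carry."""
    db_address: str                # database address information
    db_user: str                   # database permission information (account)
    db_password: str               # database permission information (secret)
    table_name: str                # table that holds the target data
    target_address: str            # target synchronization address information
    db_type: Optional[str] = None  # optional database type of the source


def read_fields(conn: sqlite3.Connection,
                instruction: SyncInstruction) -> List[Tuple[str, str]]:
    """Return (field name, field type) pairs for the instruction's table."""
    table = instruction.table_name
    # Table names cannot be bound as SQL parameters, so validate first.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]
```

The returned pairs are the input that the establishing unit would need in order to create a matching target table.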
Further, the establishing unit 430 is specifically configured to:
determining metadata based on the field information, the field type and a preset distributed database tool; the metadata includes table information of the target table;
a target table is established in a distributed database based on the metadata.
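The two steps of the establishing unit can be sketched as follows, under stated assumptions: the metadata is modeled as a table name plus typed columns, and the target table is expressed as a Hive-style `CREATE TABLE` statement. The application does not name the distributed database tool in this passage, so the type mapping `TYPE_MAP`, the class `TableMetadata`, and both function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed mapping from source column types to Hive-style types.
TYPE_MAP: Dict[str, str] = {
    "INTEGER": "BIGINT",
    "INT": "INT",
    "REAL": "DOUBLE",
    "TEXT": "STRING",
    "VARCHAR": "STRING",
    "DATETIME": "TIMESTAMP",
}


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical table information for the target table."""
    table: str
    columns: List[Tuple[str, str]]


def build_metadata(table: str, fields: List[Tuple[str, str]]) -> TableMetadata:
    """Derive target-table metadata from source field names and types."""
    columns = []
    for name, source_type in fields:
        # Drop a length suffix such as VARCHAR(64) before the lookup.
        base = source_type.split("(")[0].strip().upper()
        # Unknown source types fall back to STRING in this sketch.
        columns.append((name, TYPE_MAP.get(base, "STRING")))
    return TableMetadata(table=table, columns=columns)


def render_create_table(metadata: TableMetadata) -> str:
    """Render a CREATE TABLE statement from the metadata."""
    cols = ",\n".join(f"  `{name}` {kind}" for name, kind in metadata.columns)
    return f"CREATE TABLE IF NOT EXISTS `{metadata.table}` (\n{cols}\n)"
```

Separating metadata construction from statement rendering mirrors the two-step wording above: the metadata can be inspected or stored before any table is created.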
Further, the second determining unit 440 is specifically configured to:
and generating an Apache Spark submission script corresponding to the target data based on a preset generation strategy and the data synchronization instruction.
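A minimal sketch of generating such a submission script is shown below. The options `--master`, `--deploy-mode` and `--class` are standard `spark-submit` options; everything else is an assumption made for illustration, including the jar name `sync-job.jar`, the main class `com.example.SyncJob`, the application arguments that follow the jar, and the dictionary keys of the instruction. The fixed argument template stands in for the "preset generation strategy", which this passage does not detail.

```python
import shlex
from typing import Mapping


def render_spark_submit(instruction: Mapping[str, str],
                        app_jar: str = "sync-job.jar",
                        main_class: str = "com.example.SyncJob") -> str:
    """Render a shell script that submits a (hypothetical) Spark sync job."""
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
        app_jar,
        # Arguments after the jar are passed to the application itself;
        # their names are invented for this sketch.
        "--source-url", instruction["db_address"],
        "--source-table", instruction["table_name"],
        "--target", instruction["target_address"],
    ]
    # Quote every argument so that spaces or shell metacharacters are safe.
    command = " ".join(shlex.quote(arg) for arg in args)
    return "#!/bin/sh\n" + command + "\n"
```

Credentials are deliberately left out of the command line in this sketch, since command-line arguments are often visible to other users on the same host.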
Further, the data synchronization instruction further comprises a database type of the relational database;
the synchronization unit 450 is specifically configured to:
determining driver information of the relational database according to the database type;
determining an executable task file based on the synchronization script and the driver information;
and performing task scheduling based on the executable task file, and synchronizing the target data to a target table in the distributed database.
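The first two steps above can be sketched as follows, reading the driver-related information as a JDBC driver class chosen by database type. The driver class names are the ones published for MySQL Connector/J 8, PostgreSQL JDBC and Oracle JDBC; the file layout, the environment variable `JDBC_DRIVER`, and the function names are hypothetical, since the passage does not specify the format of the executable task file.

```python
import stat
from pathlib import Path

# JDBC driver classes keyed by a lower-case database type.
JDBC_DRIVERS = {
    "mysql": "com.mysql.cj.jdbc.Driver",
    "postgresql": "org.postgresql.Driver",
    "oracle": "oracle.jdbc.OracleDriver",
}


def driver_for(db_type: str) -> str:
    """Look up the driver class for a database type."""
    try:
        return JDBC_DRIVERS[db_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported database type: {db_type!r}") from None


def write_task_file(path: Path, sync_command: str, db_type: str) -> Path:
    """Combine the sync command and driver information into one task file."""
    driver = driver_for(db_type)
    body = (
        "#!/bin/sh\n"
        f"export JDBC_DRIVER={driver}\n"  # hypothetical variable name
        f"{sync_command.strip()}\n"
    )
    path.write_text(body, encoding="utf-8")
    # Add the owner-executable bit so a scheduler can run the file directly.
    path.chmod(path.stat().st_mode | stat.S_IXUSR)
    return path
```

A scheduler would then only need the path of the task file, not the individual pieces of the instruction.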
Further, the apparatus 4 for data synchronization further includes:
an execution unit, configured to perform task scheduling again based on the executable task file when a scheduling failure of the executable task file is detected.
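The rescheduling behavior can be sketched as a bounded retry loop. The attempt limit, the delay between attempts, and the use of a process exit code to detect failure are all assumptions of this sketch; the application only states that scheduling is performed again when a failure is detected.

```python
import subprocess
import time
from typing import Callable, Sequence


def schedule_with_retry(command: Sequence[str],
                        max_attempts: int = 3,
                        delay_seconds: float = 0.0,
                        runner: Callable = subprocess.run) -> int:
    """Run the task file, rerunning it on failure.

    Returns the number of the attempt that succeeded and raises
    RuntimeError when every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = runner(command)
        # A zero exit code is treated as a successful schedule.
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"task failed after {max_attempts} attempts")
```

Passing the runner as a parameter keeps the loop testable without starting real processes; in normal use the default `subprocess.run` executes the task file.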
Fig. 5 is a schematic diagram of a device for data synchronization according to a third embodiment of the present application. As shown in fig. 5, the device 5 for data synchronization of this embodiment includes: a processor 50, a memory 51 and a computer program 52, such as a data synchronization program, stored in said memory 51 and executable on said processor 50. The processor 50 executes the computer program 52 to implement the steps in the above-mentioned embodiments of the method for data synchronization, such as steps 101 to 105 shown in fig. 1. Alternatively, the processor 50, when executing the computer program 52, implements the functions of the modules/units in the above-mentioned apparatus embodiments, such as the functions of the units 410 to 450 shown in fig. 4.
Illustratively, the computer program 52 may be partitioned into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 52 in the data synchronization device 5. For example, the computer program 52 may be divided into an acquisition unit, a first determination unit, an establishment unit, a second determination unit, and a synchronization unit, and the specific functions of each unit are as follows:
the acquisition unit is used for acquiring a data synchronization instruction; the data synchronization instruction comprises database address information and database permission information of a relational database to which target data to be synchronized belong, table name information of a table corresponding to the target data in the relational database, and target synchronization address information;
the first determining unit is used for determining field information and field types corresponding to the target data based on the database address information, the database permission information and the table name information;
the establishing unit is used for establishing a target table in a distributed database based on the field information, the field type, the target synchronization address information and a preset distributed database tool; the distributed database is a target database for carrying out data synchronization on the target data; the target table is used for storing the target data;
a second determining unit, configured to determine, based on the data synchronization instruction, a synchronization script corresponding to the target data;
a synchronization unit to synchronize the target data to a target table in the distributed database based on the synchronization script.
The data synchronization device may include, but is not limited to, a processor 50 and a memory 51. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the data synchronization device 5 and does not constitute a limitation of it. The device may include more or fewer components than shown, may combine some components, or may use different components; for example, the data synchronization device may also include input/output devices, network access devices, buses, etc.
The processor 50 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the data synchronization device 5, such as a hard disk or a memory of the data synchronization device 5. The memory 51 may also be an external storage device of the data synchronization device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, or the like, provided on the data synchronization device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the data synchronization device 5. The memory 51 is used for storing the computer program and other programs and data required by the data synchronization device. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides a network device, where the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which implements the steps of the method embodiments described above when executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer memory, Read-Only Memory (ROM), Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. The apparatus/network device embodiments described above are merely illustrative. For example, the division into modules or units is only a logical division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of data synchronization, comprising:
acquiring a data synchronization instruction; the data synchronization instruction comprises database address information and database permission information of a relational database to which target data to be synchronized belong, table name information of a table corresponding to the target data in the relational database, and target synchronization address information;
determining field information and field types corresponding to the target data based on the database address information, the database permission information and the table name information;
establishing a target table in a distributed database based on the field information, the field type, the target synchronization address information and a preset distributed database tool; the distributed database is a target database for carrying out data synchronization on the target data; the target table is used for storing the target data;
determining a synchronization script corresponding to the target data based on the data synchronization instruction;
synchronizing the target data to a target table in the distributed database based on the synchronization script.
2. The method of data synchronization of claim 1, wherein the establishing a target table in a distributed database based on the field information, the field type, and a preset distributed database tool comprises:
determining metadata based on the field information, the field type and a preset distributed database tool; the metadata includes table information of the target table;
a target table is established in a distributed database based on the metadata.
3. The method for data synchronization according to claim 1, wherein the determining a synchronization script corresponding to the target data based on the data synchronization instruction comprises:
and generating an Apache Spark submission script corresponding to the target data based on a preset generation strategy and the data synchronization instruction.
4. The method of data synchronization of claim 1, wherein the data synchronization instruction further comprises a database type of the relational database;
the synchronizing the target data to a target table in the distributed database based on the synchronization script comprises:
determining driver information of the relational database according to the database type;
determining an executable task file based on the synchronization script and the driver information;
and performing task scheduling based on the executable task file, and synchronizing the target data to a target table in the distributed database.
5. The method of data synchronization of claim 4, wherein after the performing task scheduling based on the executable task file and synchronizing the target data to a target table in the distributed database, the method further comprises:
performing task scheduling again based on the executable task file when a scheduling failure of the executable task file is detected.
6. An apparatus for data synchronization, comprising:
the acquisition unit is used for acquiring a data synchronization instruction; the data synchronization instruction comprises database address information and database permission information of a relational database to which target data to be synchronized belong, table name information of a table corresponding to the target data in the relational database, and target synchronization address information;
the first determining unit is used for determining field information and field types corresponding to the target data based on the database address information, the database permission information and the table name information;
the establishing unit is used for establishing a target table in a distributed database based on the field information, the field type, the target synchronization address information and a preset distributed database tool; the distributed database is a target database for carrying out data synchronization on the target data; the target table is used for storing the target data;
a second determining unit, configured to determine, based on the data synchronization instruction, a synchronization script corresponding to the target data;
a synchronization unit to synchronize the target data to a target table in the distributed database based on the synchronization script.
7. The apparatus for data synchronization according to claim 6, wherein the establishing unit is specifically configured to:
determining metadata based on the field information, the field type and a preset distributed database tool; the metadata includes table information of the target table;
a target table is established in a distributed database based on the metadata.
8. The apparatus for data synchronization according to claim 6, wherein the second determining unit is specifically configured to:
and generating an Apache Spark submission script corresponding to the target data based on a preset generation strategy and the data synchronization instruction.
9. An apparatus for data synchronization comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010101450.5A CN111324610A (en) 2020-02-19 2020-02-19 Data synchronization method and device

Publications (1)

Publication Number Publication Date
CN111324610A true CN111324610A (en) 2020-06-23

Family

ID=71167355

Country Status (1)

Country Link
CN (1) CN111324610A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967316A (en) * 2017-11-22 2018-04-27 平安科技(深圳)有限公司 A kind of method of data synchronization, equipment and computer-readable recording medium
CN110109897A (en) * 2019-04-15 2019-08-09 深圳壹账通智能科技有限公司 Database script generation method, device, computer equipment and storage medium
CN110543476A (en) * 2019-07-03 2019-12-06 威富通科技有限公司 Synchronization method and device of database table structure and server
CN110708335A (en) * 2019-10-29 2020-01-17 深圳市融壹买信息科技有限公司 Access authentication method and device and terminal equipment

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069261B (en) * 2020-09-09 2023-07-07 携程计算机技术(上海)有限公司 Data synchronization method, system, equipment and storage medium based on distributed system
CN112069261A (en) * 2020-09-09 2020-12-11 携程计算机技术(上海)有限公司 Data synchronization method, system, equipment and storage medium based on distributed system
CN112364049A (en) * 2020-11-10 2021-02-12 中国平安人寿保险股份有限公司 Data synchronization script generation method, system, terminal and storage medium
CN112364049B (en) * 2020-11-10 2024-05-17 中国平安人寿保险股份有限公司 Data synchronization script generation method, system, terminal and storage medium
CN112507020A (en) * 2020-11-20 2021-03-16 平安普惠企业管理有限公司 Data synchronization method and device, computer equipment and storage medium
CN112612783A (en) * 2020-12-22 2021-04-06 航天信息股份有限公司 Method for realizing cross-platform data sharing
CN112732242A (en) * 2021-01-12 2021-04-30 中国邮政储蓄银行股份有限公司 Wide table processing script generation method and device
CN113297326A (en) * 2021-05-21 2021-08-24 中国邮政储蓄银行股份有限公司 Data processing method and device, computer readable storage medium and processor
CN113204598A (en) * 2021-05-28 2021-08-03 平安科技(深圳)有限公司 Data synchronization method, system and storage medium
CN113204598B (en) * 2021-05-28 2023-05-09 平安科技(深圳)有限公司 Data synchronization method, system and storage medium
CN113254534A (en) * 2021-06-04 2021-08-13 四川省明厚天信息技术股份有限公司 Data synchronization method and device and computer storage medium
CN113486116A (en) * 2021-07-07 2021-10-08 建信金融科技有限责任公司 Data synchronization method and device, electronic equipment and computer readable medium
CN113672683B (en) * 2021-08-19 2024-03-29 上海沄熹科技有限公司 Spark SQL-based distributed database metadata synchronization device and method
CN115391459A (en) * 2022-08-24 2022-11-25 南京领行科技股份有限公司 Data synchronization method and device, electronic equipment and computer readable storage medium
CN116881244A (en) * 2023-06-05 2023-10-13 北京捷泰云际信息技术有限公司 Real-time processing method and device for space data based on column storage database
CN116881244B (en) * 2023-06-05 2024-03-26 易智瑞信息技术有限公司 Real-time processing method and device for space data based on column storage database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200623
