CN111737355B - Heterogeneous data source synchronization method and system based on MongoDB metadata management - Google Patents


Publication number
CN111737355B
CN111737355B CN202010616235.9A CN202010616235A CN111737355B CN 111737355 B CN111737355 B CN 111737355B CN 202010616235 A CN202010616235 A CN 202010616235A CN 111737355 B CN111737355 B CN 111737355B
Authority
CN
China
Prior art keywords
metadata
data
mongodb
information
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010616235.9A
Other languages
Chinese (zh)
Other versions
CN111737355A (en)
Inventor
达星宇
吴明杰
李晓峰
吴智良
李奇
江魁栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Yuefei Finance Cloud Technology Co ltd
Original Assignee
Guangdong Yuefei Finance Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-06-29
Filing date
2020-06-29
Publication date
2023-06-23
Application filed by Guangdong Yuefei Finance Cloud Technology Co ltd
Priority to CN202010616235.9A
Publication of CN111737355A
Application granted
Publication of CN111737355B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a heterogeneous data source synchronization method based on MongoDB metadata management, comprising the following steps: S1, scanning the structure of the tables to be synchronized at the MySQL source end and initializing the metadata of the source tables; S2, building a universal database metadata model in the MongoDB database from the scan result; S3, monitoring the binlog of the MySQL database, parsing the log, classifying the data operations, and processing each class accordingly; S4, fetching the data change messages to be consumed from a distributed message queue; S5, retrieving the latest version of the related metadata from MongoDB according to the data change messages; S6, converting the messages into DML change data for Kudu in real time, according to the MongoDB metadata and the message queue data, and applying the update to Kudu. The invention achieves real-time data synchronization between the relational database MySQL and the distributed columnar database Kudu, supports both DML and DDL changes, and maintains the metadata structure of different historical versions.

Description

Heterogeneous data source synchronization method and system based on MongoDB metadata management
Technical Field
The invention relates to the field of information technology, and in particular to a heterogeneous data source synchronization method and system based on MongoDB metadata management.
Background
The production databases of all financial business systems currently run on MySQL. To support large-scale data analysis and querying, business data must be synchronized in real time to Kudu, a columnar storage database that offers stronger query performance and supports horizontal scaling, high availability, and distributed deployment. However, the mature data synchronization solutions available today target synchronization between traditional relational databases, for example the Canal framework for MySQL and the OGG (Oracle GoldenGate) component for Oracle. Synchronization between a traditional relational database and a distributed database on the Hadoop platform supports only incremental data synchronization and works only when the metadata of the source end and the target end are fully consistent. It supports neither DDL changes nor the update and delete operations of DML. The constraints on synchronization are therefore severe, and not all data change operations of the business system database can be covered.
For the above reasons, there is a need to design a method that can realize real-time synchronization of data between the relational database MySQL and the distributed columnar database Kudu.
Disclosure of Invention
To address the shortcomings of the prior art, the invention aims to provide a heterogeneous data source synchronization method and system based on MongoDB metadata management that synchronizes data between the relational database MySQL and the distributed columnar database Kudu in real time and manages and maintains the metadata structure of different historical versions.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A heterogeneous data source synchronization method based on MongoDB metadata management, characterized by comprising the following steps:
S1, starting a metadata change monitoring service, which initializes the metadata of the source tables by scanning the structure of the tables to be synchronized at the MySQL source end;
S2, based on the scan result of step S1, the metadata change monitoring service builds a universal database metadata model in the MongoDB database;
S3, monitoring the binlog of the MySQL database through a "change data capture" service, parsing the log, classifying each operation as a data change operation or a data structure change operation, and processing it according to its class;
S4, fetching, through a data processing unit, the data change messages to be consumed from a distributed message queue;
S5, according to the data change messages fetched in step S4, the data processing unit retrieves the latest version of the related metadata from MongoDB;
S6, the data processing unit converts the messages into DML change data for Kudu in real time, according to the MongoDB metadata and the message queue data, and applies the update to Kudu.
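As a rough illustration of steps S1 and S2, the following sketch builds a versioned, database-neutral metadata document of the kind that could be stored in MongoDB. The input rows imitate the result of querying MySQL's information_schema.COLUMNS view. The document layout, the generic type names, and the names GENERIC_TYPES and build_metadata are assumptions made for this example; the patent does not specify them.

```python
from typing import Dict, List

# Hypothetical mapping from MySQL column types to database-neutral type names.
GENERIC_TYPES: Dict[str, str] = {
    "int": "int32",
    "bigint": "int64",
    "varchar": "string",
    "decimal": "decimal",
    "datetime": "timestamp",
}


def build_metadata(database: str, table: str, columns: List[dict], version: int = 1) -> dict:
    """Build one versioned metadata document for a scanned source table."""
    fields = []
    for col in columns:
        fields.append({
            "name": col["COLUMN_NAME"],
            # Unknown source types fall back to string in this sketch.
            "type": GENERIC_TYPES.get(col["DATA_TYPE"], "string"),
            "nullable": col["IS_NULLABLE"] == "YES",
            "primary_key": col["COLUMN_KEY"] == "PRI",
        })
    return {"database": database, "table": table, "version": version, "fields": fields}


# Sample rows shaped like a scan of information_schema.COLUMNS (made-up data).
scanned = [
    {"COLUMN_NAME": "id", "DATA_TYPE": "bigint", "IS_NULLABLE": "NO", "COLUMN_KEY": "PRI"},
    {"COLUMN_NAME": "amount", "DATA_TYPE": "decimal", "IS_NULLABLE": "YES", "COLUMN_KEY": ""},
]
doc = build_metadata("finance", "orders", scanned)
print(doc["version"], [f["name"] for f in doc["fields"]])
```

Keeping an explicit version number in every document is what later allows the target side to be compared against the latest metadata.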
As a preferred scheme: in step S3, for a data change operation, the changed data is serialized according to the metadata structure stored in MongoDB and written into the distributed message queue; for a metadata change operation, only general metadata change operations are retained, and the change information is updated into the latest version of the MongoDB metadata model.
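The routing described in this preferred scheme can be sketched as follows. The event dictionaries stand in for parsed binlog entries, queue.Queue stands in for the distributed message queue, and a Python list of documents stands in for the MongoDB metadata collection. All of these, as well as the event type names and the choice of ADD_COLUMN as the only "general" structure change, are assumptions for illustration.

```python
import copy
import json
import queue

DML_TYPES = {"INSERT", "UPDATE", "DELETE"}


def handle_event(event: dict, versions: list, mq: "queue.Queue[str]") -> str:
    """Route one parsed change event; return "DML", "DDL", or "IGNORED"."""
    latest = versions[-1]
    if event["type"] in DML_TYPES:
        # Keep only columns known to the latest metadata version, then serialize.
        known = {f["name"] for f in latest["fields"]}
        row = {k: v for k, v in event["row"].items() if k in known}
        mq.put(json.dumps({"table": latest["table"], "op": event["type"],
                           "version": latest["version"], "row": row}))
        return "DML"
    if event["type"] == "ADD_COLUMN":
        # A general structure change yields a new metadata version; history is kept.
        new = copy.deepcopy(latest)
        new["version"] += 1
        new["fields"].append({"name": event["column"], "type": event["column_type"]})
        versions.append(new)
        return "DDL"
    return "IGNORED"  # structure changes outside the general subset are dropped


versions = [{"table": "orders", "version": 1, "fields": [{"name": "id", "type": "int64"}]}]
mq = queue.Queue()
handle_event({"type": "ADD_COLUMN", "column": "amount", "column_type": "decimal"}, versions, mq)
handle_event({"type": "INSERT", "row": {"id": 1, "amount": "9.50"}}, versions, mq)
message = json.loads(mq.get())
print(message["version"], message["row"])
```

Appending a copy instead of mutating the latest document keeps every earlier metadata version available, which matches the stated goal of maintaining historical versions.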
As a preferred scheme: in step S5, if the metadata does not exist in Kudu, a Kudu metadata structure is created according to the latest MongoDB version; if the metadata exists in Kudu, the data processing unit compares the metadata versions in MongoDB and Kudu, and if the Kudu metadata version is lower than the latest version retained in MongoDB, the data processing unit updates the Kudu metadata structure so that it stays consistent with the latest metadata version in MongoDB.
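The version check in this preferred scheme reduces to a small decision function. The sketch below assumes that metadata versions are integers and that the version currently applied in Kudu is available to the data processing unit (None when the table has no metadata in Kudu yet). How that version is tracked is not stated in the text, and the function name is hypothetical.

```python
from typing import Optional


def plan_kudu_schema_action(mongo_latest_version: int, kudu_version: Optional[int]) -> str:
    """Return the schema action for a target table: "create", "alter", or "none"."""
    if kudu_version is None:
        # No metadata in Kudu yet: create the structure from the latest version.
        return "create"
    if kudu_version < mongo_latest_version:
        # Kudu lags behind MongoDB: bring its structure up to the latest version.
        return "alter"
    return "none"


for kudu_version in (None, 1, 3):
    print(kudu_version, plan_kudu_schema_action(3, kudu_version))
```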
A system for heterogeneous data source synchronization based on MongoDB metadata management, comprising:
a MySQL database;
a Kudu database;
a MongoDB database;
a metadata change monitoring module, used for initializing the metadata of the source tables by scanning the structure of the tables to be synchronized at the MySQL source end, and for building a universal database metadata model in the MongoDB database based on the scan result;
a change data capture module, used for monitoring the binlog of the MySQL database, parsing the log, classifying each operation as a data change operation (DML) or a data structure change operation (DDL), and processing it according to its class;
a distributed message queue, into which the data change messages are written;
a data processing module, used for fetching the data change messages to be consumed from the distributed message queue and retrieving the latest version of the related metadata from MongoDB according to the fetched messages.
As a preferred scheme: for a data change operation, the change data capture module serializes the changed data according to the metadata structure stored in MongoDB and writes it into the distributed message queue; for a metadata change operation, only general metadata change operations are retained, and the change information is updated into the latest version of the MongoDB metadata model.
As a preferred scheme: when the data processing module retrieves the latest metadata version from MongoDB, if the metadata does not exist in Kudu, a Kudu metadata structure is created according to the latest MongoDB version; if the metadata exists in Kudu, the data processing unit compares the metadata versions in MongoDB and Kudu, and if the Kudu metadata version is lower than the latest version retained in MongoDB, the data processing unit updates the Kudu metadata structure so that it stays consistent with the latest metadata version in MongoDB; the data processing module is also used for converting the messages into DML change data for Kudu in real time, according to the MongoDB metadata and the message queue data, and for applying the update to Kudu.
Compared with the prior art, the invention has the following advantages: the method mainly solves the problem of real-time data synchronization between the relational database MySQL and the distributed columnar database Kudu, including support for both DML and DDL changes and the management and maintenance of the metadata structure of different historical versions.
Drawings
FIG. 1 is a flow chart of the method in the first embodiment;
FIG. 2 is a schematic diagram of the system in the second embodiment.
Detailed Description
Embodiment one:
Referring to FIG. 1, a method for heterogeneous data source synchronization based on MongoDB metadata management includes the following steps:
S1, starting a metadata change monitoring service, which initializes the metadata of the source tables by scanning the structure of the tables to be synchronized at the MySQL source end.
S2, based on the scan result of step S1, the metadata change monitoring service builds a universal database metadata model (Metadata) in the MongoDB database.
S3, monitoring the binlog of the MySQL database through a "change data capture" service. After the log is parsed, each operation is classified as a data change operation (DML) or a data structure change operation (DDL).
a. For a data change operation, the changed data is serialized according to the metadata structure stored in MongoDB and written into the distributed message queue.
b. For a metadata change operation, only general metadata change operations are retained, and the change information is updated into the latest version of the MongoDB metadata model.
S4, fetching, through a data processing unit, the data change messages to be consumed from the distributed message queue.
S5, according to the data change messages fetched in step S4, the data processing unit retrieves the latest version of the related metadata from MongoDB:
I. If the metadata does not exist in Kudu, a Kudu metadata structure is created according to the latest MongoDB version.
II. If the metadata exists in Kudu, the data processing unit compares the metadata versions in MongoDB and Kudu. If the Kudu metadata version is lower than the latest version retained in MongoDB, the data processing unit updates the Kudu metadata structure so that it stays consistent with the latest metadata version in MongoDB.
S6, the data processing unit converts the messages into DML change data for Kudu in real time, according to the MongoDB metadata and the message queue data, and applies the update to Kudu.
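Step S6 can be pictured as translating each queued message into a typed write operation for the target table. The sketch below is one possible reading: the message format, the CASTS table, and the function name are hypothetical, and mapping both inserts and updates to an upsert is a design choice of this example, not something the text prescribes. The result is a plain dictionary; handing it to an actual Kudu client is left out.

```python
import json
from decimal import Decimal

# Hypothetical converters from serialized values to typed values, keyed by generic type.
CASTS = {"int64": int, "int32": int, "string": str, "decimal": Decimal}


def to_kudu_operation(message: str, metadata: dict) -> dict:
    """Translate one serialized change message into a target-side write operation."""
    change = json.loads(message)
    types = {f["name"]: f["type"] for f in metadata["fields"]}
    keys = [f["name"] for f in metadata["fields"] if f.get("primary_key")]
    # Cast each known column to its declared type; unknown columns are skipped.
    row = {name: (None if value is None else CASTS[types[name]](value))
           for name, value in change["row"].items() if name in types}
    if change["op"] == "DELETE":
        # A delete only needs the primary key columns.
        return {"action": "delete", "row": {k: row[k] for k in keys}}
    # Inserts and updates both become upserts, so replaying a message is harmless.
    return {"action": "upsert", "row": row}


metadata = {"table": "orders", "version": 2, "fields": [
    {"name": "id", "type": "int64", "primary_key": True},
    {"name": "amount", "type": "decimal", "primary_key": False},
]}
update = to_kudu_operation(json.dumps({"op": "UPDATE", "row": {"id": "7", "amount": "12.30"}}), metadata)
delete = to_kudu_operation(json.dumps({"op": "DELETE", "row": {"id": 7, "amount": None}}), metadata)
print(update["action"], delete["action"])
```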
Embodiment two:
Referring to FIG. 2, a system for heterogeneous data source synchronization based on MongoDB metadata management comprises:
a MySQL database;
a Kudu database;
a MongoDB database;
a metadata change monitoring module, used for initializing the metadata of the source tables by scanning the structure of the tables to be synchronized at the MySQL source end, and for building a universal database metadata model in the MongoDB database based on the scan result;
a change data capture module, used for monitoring the binlog of the MySQL database, parsing the log, and classifying each operation as a data change operation (DML) or a data structure change operation (DDL); for a data change operation, the changed data is serialized according to the metadata structure stored in MongoDB and written into the distributed message queue; for a metadata change operation, only general metadata change operations are retained, and the change information is updated into the latest version of the MongoDB metadata model;
a distributed message queue, into which the data change messages are written;
a data processing module, used for fetching the data change messages to be consumed from the distributed message queue and retrieving the latest version of the related metadata from MongoDB according to the fetched messages; if the metadata does not exist in Kudu, a Kudu metadata structure is created according to the latest MongoDB version; if the metadata exists in Kudu, the data processing unit compares the metadata versions in MongoDB and Kudu, and if the Kudu metadata version is lower than the latest version retained in MongoDB, the data processing unit updates the Kudu metadata structure so that it stays consistent with the latest metadata version in MongoDB; the data processing module is also used for converting the messages into DML change data for Kudu in real time, according to the MongoDB metadata and the message queue data, and for applying the update to Kudu.
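When the data processing module finds that the Kudu structure lags behind MongoDB, it needs to know which structural changes separate the two versions. One way to derive them, sketched here under the assumption that each metadata version is a document with a list of named, typed fields, is a simple field-by-field comparison. The function name and the tuple format are hypothetical, and changes to the type of an existing column are not handled.

```python
def diff_fields(old: dict, new: dict) -> list:
    """List the column changes that turn metadata version `old` into `new`."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    changes = []
    # Columns present only in the newer version must be added to the target.
    for name, field in new_fields.items():
        if name not in old_fields:
            changes.append(("add_column", name, field["type"]))
    # Columns present only in the older version must be dropped from the target.
    for name in old_fields:
        if name not in new_fields:
            changes.append(("drop_column", name, None))
    return changes


v1 = {"version": 1, "fields": [{"name": "id", "type": "int64"},
                               {"name": "note", "type": "string"}]}
v2 = {"version": 2, "fields": [{"name": "id", "type": "int64"},
                               {"name": "amount", "type": "decimal"}]}
print(diff_fields(v1, v2))
```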
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims (6)

1. A heterogeneous data source synchronization method based on MongoDB metadata management, characterized by comprising the following steps:
S1, starting a metadata change monitoring service, which initializes the metadata of the source tables by scanning the structure of the tables to be synchronized at the MySQL source end;
S2, based on the scan result of step S1, the metadata change monitoring service builds a universal database metadata model in the MongoDB database;
S3, monitoring the binlog of the MySQL database through a "change data capture" service, parsing the log, classifying each operation as a data change operation or a data structure change operation, and processing it according to its class;
S4, fetching, through a data processing unit, the data change messages to be consumed from a distributed message queue;
S5, according to the data change messages fetched in step S4, the data processing unit retrieves the latest version of the related metadata from MongoDB;
S6, the data processing unit converts the messages into DML change data for Kudu in real time, according to the MongoDB metadata and the message queue data, and applies the update to Kudu.
2. The method for heterogeneous data source synchronization based on MongoDB metadata management according to claim 1, wherein: in step S3, for a data change operation, the changed data is serialized according to the metadata structure stored in MongoDB and written into the distributed message queue; for a metadata change operation, only general metadata change operations are retained, and the change information is updated into the latest version of the MongoDB metadata model.
3. The method for heterogeneous data source synchronization based on MongoDB metadata management according to claim 1, wherein: in step S5, if the metadata does not exist in Kudu, a Kudu metadata structure is created according to the latest MongoDB version; if the metadata exists in Kudu, the data processing unit compares the metadata versions in MongoDB and Kudu, and if the Kudu metadata version is lower than the latest version retained in MongoDB, the data processing unit updates the Kudu metadata structure so that it stays consistent with the latest metadata version in MongoDB.
4. A system for heterogeneous data source synchronization based on MongoDB metadata management, comprising:
a MySQL database;
a Kudu database;
a MongoDB database;
a metadata change monitoring module, used for initializing the metadata of the source tables by scanning the structure of the tables to be synchronized at the MySQL source end, and for building a universal database metadata model in the MongoDB database based on the scan result;
a change data capture module, used for monitoring the binlog of the MySQL database, parsing the log, classifying each operation as a data change operation (DML) or a data structure change operation (DDL), and processing it according to its class;
a distributed message queue, into which the data change messages are written;
a data processing module, used for fetching the data change messages to be consumed from the distributed message queue and retrieving the latest version of the related metadata from MongoDB according to the fetched messages.
5. The system for heterogeneous data source synchronization based on MongoDB metadata management according to claim 4, wherein: for a data change operation, the change data capture module serializes the changed data according to the metadata structure stored in MongoDB and writes it into the distributed message queue; for a metadata change operation, only general metadata change operations are retained, and the change information is updated into the latest version of the MongoDB metadata model.
6. The system for heterogeneous data source synchronization based on MongoDB metadata management according to claim 4, wherein: when the data processing module retrieves the latest metadata version from MongoDB, if the metadata does not exist in Kudu, a Kudu metadata structure is created according to the latest MongoDB version; if the metadata exists in Kudu, the data processing unit compares the metadata versions in MongoDB and Kudu, and if the Kudu metadata version is lower than the latest version retained in MongoDB, the data processing unit updates the Kudu metadata structure so that it stays consistent with the latest metadata version in MongoDB; the data processing module is also used for converting the messages into DML change data for Kudu in real time, according to the MongoDB metadata and the message queue data, and for applying the update to Kudu.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010616235.9A CN111737355B (en) 2020-06-29 2020-06-29 Heterogeneous data source synchronization method and system based on MongoDB metadata management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010616235.9A CN111737355B (en) 2020-06-29 2020-06-29 Heterogeneous data source synchronization method and system based on MongoDB metadata management

Publications (2)

Publication Number Publication Date
CN111737355A 2020-10-02
CN111737355B 2023-06-23

Family

ID=72653889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010616235.9A Active CN111737355B (en) 2020-06-29 2020-06-29 Heterogeneous data source synchronization method and system based on MongoDB metadata management

Country Status (1)

Country Link
CN (1) CN111737355B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559475B (en) * 2020-12-11 2023-01-10 上海哔哩哔哩科技有限公司 Data real-time capturing and transmitting method and system
CN113051347B (en) * 2021-03-25 2024-03-29 未鲲(上海)科技服务有限公司 Method, system, equipment and storage medium for synchronizing data between heterogeneous databases
CN113342578A (en) * 2021-06-28 2021-09-03 上海万向区块链股份公司 Method and system for realizing MySQL data free recovery
CN114048193A (en) * 2022-01-12 2022-02-15 树根互联股份有限公司 Data management and control method, device and computer readable storage medium
CN115794827B (en) * 2022-11-29 2023-07-21 广发银行股份有限公司 Data table structure management system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701181A (en) * 2016-01-06 2016-06-22 中电科华云信息技术有限公司 Dynamic heterogeneous metadata acquisition method and system
CN107783975A (en) * 2016-08-24 2018-03-09 北京京东尚科信息技术有限公司 The method and apparatus of distributed data base synchronization process
US10459647B1 (en) * 2017-03-02 2019-10-29 Amazon Technologies, Inc. Multiple storage class representation in versioned storage

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156165A (en) * 2015-04-16 2016-11-23 阿里巴巴集团控股有限公司 Method of data synchronization between heterogeneous data source and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
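Incremental synchronous update mechanism for databases in heterogeneous environments; Wang Yubiao; Rao Xiru; He Pan; Computer Engineering and Design (03); full text *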

Also Published As

Publication number Publication date
CN111737355A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111737355B (en) Heterogeneous data source synchronization method and system based on MongoDB metadata management
CN112445863B (en) Data real-time synchronization method and system
CN110222236A (en) The generation of XML message template and update method and its system
CN110175213A (en) A kind of oracle database synchronization system and method based on SCN mode
CN111563102A (en) Cache updating method, server, system and storage medium
WO2018178641A1 (en) Data replication system
CN102291416A (en) Two-way synchronizing method and system of client-side and server-side
CN111125260A (en) Data synchronization method and system based on SQL Server
CN109558452B (en) Synchronization method for query table building operation
CN102752372A (en) File based database synchronization method
CN104506625A (en) Method for improving reliability of metadata nodes of cloud databases
CN108062314B (en) Dynamic sub-table data processing method and device
CN109947801A (en) Database in phase system, method and device
CN115098567B (en) Low-code platform data transmission method based on BI platform
CN111274257A (en) Real-time synchronization method and system based on data
CN111752920A (en) Method, system, and storage medium for managing metadata
CN116975159B (en) Incremental data synchronization processing method
CN104252505A (en) Method and device for synchronizing database instance in database management platform
CN116089545B (en) Method for collecting storage medium change data into data warehouse
CN111400321A (en) Method for automatically recycling high water level based on ORACLE database
CN116450660A (en) Method and device for processing primary key conflict in data synchronization
CN105630997A (en) Data parallel processing method, device and equipment
CN114925042A (en) Method for constructing metadata relation based on graphic database
CN110515955B (en) Data storage and query method and system, electronic equipment and storage medium
CN109344192B (en) Optimized CIMISS database system and adaptation method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant