CN112269823A - Method and system for realizing synchronization of PostgreSQL incremental data - Google Patents

Method and system for realizing synchronization of PostgreSQL incremental data Download PDF

Info

Publication number
CN112269823A
CN112269823A CN202011190462.6A CN202011190462A CN112269823A CN 112269823 A CN112269823 A CN 112269823A CN 202011190462 A CN202011190462 A CN 202011190462A CN 112269823 A CN112269823 A CN 112269823A
Authority
CN
China
Prior art keywords
sql
postgresql
incremental data
wal
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011190462.6A
Other languages
Chinese (zh)
Inventor
朱传举
齐光鹏
李朝铭
方建勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd filed Critical Inspur Cloud Information Technology Co Ltd
Priority to CN202011190462.6A priority Critical patent/CN112269823A/en
Publication of CN112269823A publication Critical patent/CN112269823A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/252Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for realizing synchronization of PostgreSQL incremental data, belonging to the technical field of data synchronization.

Description

Method and system for realizing synchronization of PostgreSQL incremental data
Technical Field
The invention relates to a data synchronization technology, in particular to a method and a system for realizing postgreSQL incremental data synchronization based on WAL logs.
Background
The postgreSQL is an object-relational database management system (ORDBMS), not only supports various functions of a relational database, but also has the characteristics of class, inheritance and other object databases, simultaneously provides writing and installation of user-defined plug-ins, is an opening source database with the most powerful functions, the most abundant characteristics and the most complex structure at present, and is not provided with any business database in some characteristic settings. As PostgreSQL becomes more and more powerful and users become more and more, PostgreSQL becomes one of the databases that we need to pay more attention to.
The WAL mechanism, originally implemented in version 7.1, is used to mitigate the effects of data being difficult to recover due to server crashes, and is extremely important for database failure recovery, while also making it possible to obtain PostgreSQL operation records by parsing WAL logs, the WAL segment is by default a 16MB file, which is internally divided into 8192 byte pages, the first page is the header data defined by the structure xloglongpageheader data, and the headers of all other pages have page information defined by the structure xlogpageheader data. After the page header, XLOG records are written to each page in descending order from the beginning.
In the stored WAL log, each XLOG record is each record for operating PostgreSQL, the records are stored in the WAL log in sequence, each XLOG record contains LSN of the operation record, when a database is abnormally recovered, only a WAL log file where a last checkpoint point corresponds to the LSN is needed to be found and is sequentially analyzed, the abnormal state recovery of the database can be completed, the incremental data synchronization is also the same, the WAL log file is sequentially analyzed by obtaining the time or the LSN when the incremental data starts, the incremental data of the PostgreSQL can be obtained, and the incremental data synchronization to other databases is completed, so the WAL is an important flashback log.
Along with the appearance of the WAL (write Ahead Log) log after the 7.1 version of the PostgreSQL database, the method provides possibility for the technical implementation of fault recovery, incremental data synchronization and the like of the PostgreSQL database, wherein the WAL is a pre-written transaction log, and history RECORDs of changes and operations of the database are stored in the WAL log in the form of XLOG RECORD before the actual writing into the database.
The current PostgreSQL and other types of databases realize data synchronization, the realization mode of full-volume incremental acquisition is not much, and the PostgreSQL does not provide a log analysis tool and is used for analyzing and generating corresponding redo and undo SQL.
Disclosure of Invention
In order to solve the technical problem, the invention provides a method for realizing synchronization of postgreSQL incremental data based on a WAL log.
The invention provides an interface for external access analysis of the log through a WAL log analysis tool, an external user can realize WAL log analysis by using the function just like calling the SQL of PostgreSQL, the incremental data acquisition of a source end database is completed, the incremental data acquisition reaches a target end by means of CMSP transmission, and the aim of synchronizing the incremental data source end and the target end data is realized through a conversion and storage module.
The technical scheme of the invention is as follows:
a method for realizing synchronization of postgreSQL incremental data includes analyzing WAL log generated by production library to be SQL corresponding to operation, filtering through table name or user name to obtain incremental data, transmitting said incremental data to destination end in real time and efficiently by means of CMSP transmission data, converting obtained incremental data to be operation SQL of destination end database and realizing synchronization of postgreSQL with other common type databases and incremental data.
The method comprises the following steps:
1) acquiring incremental data of a PostgreSQL database through WAL log analysis;
2) filtering incremental data needing to be collected by setting a collected table name or a collected user name;
3) and the destination end converts the SQL into corresponding SQL in real time according to the type of the destination database and writes the SQL into the database.
Further, in the above-mentioned case,
and calling a wal parse tool to generate a corresponding redo log.
When the tool is used, the log analysis tool is registered in extension of PostgreSQL and provided for a user interface, and then the analysis of incremental data can be completed.
The log analysis tool outputs pg dml change information into an sql format, which is an extension of pg and supports a pg9.4+ version.
The log analysis tool extracts the information of each changed record column, can generate corresponding sql, or converts the sql into binary data according to a custom format protocol for the consumption of a client program of postgresql. A database dml change record is obtained.
The invention also provides a system for realizing the synchronization of PostgreSQL incremental data, which is characterized by comprising a log analysis tool wal parse tool, an incremental acquisition module, a conversion module of a destination end and a warehousing module,
wherein the content of the first and second substances,
a log analysis tool WAL part tool is registered in a production library of PostgreSQL or a backup library of the WAL log of the production library, automatically analyzed into SQL operated by DML according to the generated WAL log, and sent to an increment acquisition module in real time;
the incremental acquisition module is used for setting acquired filtering conditions according to the table name or user name information through parameter setting, acquiring the incremental SQL sent by the log analysis tool in real time and sending the incremental data to a destination end by means of CMSP;
the conversion module of the destination end converts the incremental data sent by the CMSP into the incremental SQL of the corresponding database in real time according to the database type of the destination end and various parameter settings, such as field type mapping, coding format and other information;
and the warehousing module is used for warehousing the converted incremental SQL.
The invention carries out efficient transmission of messages by means of CMSP, thereby greatly improving the transmission efficiency of data; the system verifies performance indexes such as processing efficiency, processing accuracy and the like of the system by means of CMSP transmission.
Drawings
Fig. 1 is a schematic diagram of the operation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.
The CMSP is a high-performance message middleware in cloud and big data era, provides high-performance and high-reliability message queue service externally in a cloud service and cluster mode, and supports a big data acquisition, transmission, aggregation and exchange, big data real-time processing and micro-service processing architecture.
The design of the invention is based on that PostgreSQL database provides writing and installation of external plug-ins, and sets WAL log analysis tool, registers the tool in extension of PostgreSQL, and provides a convenient interface for users like SQL of database, thus completing analysis of incremental data, and combining CMSP transmission, realizing real-time and efficient transmission of the incremental data in PostgreSQL to the target end database and synchronizing function.
The invention realizes the analysis of the DML operation record, generates SQL containing all column field types, and sends the SQL to the CMSP acquisition module in real time.
The acquisition module is combined with the efficient transmission characteristic of the CMSP to send the acquired data to the destination terminal, the destination terminal starts corresponding conversion according to the type of the database, and the warehousing module writes the incremental data into the destination database in real time.
The working steps are as follows:
1) according to the invention, the PostgreSQL database incremental data acquisition is realized through WAL log analysis;
2) filtering incremental data needing to be collected by setting a collected table name or a collected user name;
3) the target end converts the SQL into a corresponding SQL in real time according to the type of the target database and the type of the target end database, and the storage module is responsible for writing the SQL into the database;
technical characteristics of wall part tool log analysis tool
(1) A test _ decoding plugin encoding instance is provided in the source code following the postgresql 9.4+ version, for example, the plugin can resolve the dml data change information into
Figure BDA0002752631630000051
(2) The wal parse tool can output pg dml change information into an sql format, is an extension of pg, and supports pg9.4+ version.
(3) The main working principle of the tool is as follows: the information of each changed record column is extracted, the corresponding sql can be generated, and the information can also be converted into binary data according to a custom format protocol for consumption by a postgresql client program. A database dml change record is obtained.
(4) The tool is similar to the wal2json plug-in for the open source community, wal2json parsing dml change records into json format, while the wal parse tool of the present invention parses them into sql format.
As shown in fig. 1:
the operation process of the invention is as follows:
1) and (3) registering the WAL log analysis tool in a production library of PostgreSQL or a backup library of the WAL log of the production library, automatically analyzing the WAL parse tool into SQL operated by the DML according to the generated WAL log, and sending the SQL to the increment acquisition module in real time.
2) And the increment acquisition module can set acquired filtering conditions according to information such as table names or user names through parameter setting, acquire the increment SQL sent by the wal false tool in real time and send the increment data to a destination end by means of CMSP.
3) The conversion module of the destination end converts the incremental data sent by the CMSP into the incremental SQL of the corresponding database in real time according to the database type of the destination end and various parameter settings, such as field type mapping, coding format and other information,
4) and the warehousing module rapidly warehouses the converted incremental SQL.
It is also the basis of the present invention to validate WAL the procedure that the log parsing tool WAL parse tool operates DML as SQL.
Service database demodb, service table _ student field id int, name varchar (30), and address varchar (30).
(1) Insertion operation
Figure BDA0002752631630000061
(2) Update operations
Figure BDA0002752631630000062
Figure BDA0002752631630000071
(3) Delete operation
Figure BDA0002752631630000072
According to the verification, when the business table Record is inserted, updated or deleted, the WAL parse tool can start incremental data analysis in real time according to Record recorded in the WAL log, and convert the incremental data analysis into corresponding SQL, and the SQL can contain field types of all columns in the table _ student table. And according to the conversion and storage module developed based on CMSP transmission, the synchronization of the incremental data of the source table and the incremental data of the target table is realized.
Limitation of conditions
(1) The wal part tool mainly resolves SQL of DML operation, and the DDL type SQL is incompletely resolved.
(2) If the analysis is executed in the non-production library, the production library is required to generate a matched data dictionary for the analysis of the non-production library.
(3) Only wal files that are consistent with the data dictionary timeline can be parsed.
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A method for realizing synchronization of PostgreSQL incremental data is characterized in that,
the method comprises the following steps:
1) acquiring incremental data of a PostgreSQL database through WAL log analysis;
2) filtering incremental data needing to be collected by setting a collected table name or a collected user name;
3) and the destination end converts the SQL into corresponding SQL in real time according to the type of the destination database and writes the SQL into the database.
2. The method of claim 1,
and resolving the log of the WAL generated by the production library into SQL corresponding to the operation.
3. The method of claim 2,
and filtering through a table name or a user name to obtain incremental data, transmitting the incremental data to a destination end by means of CMSP transmission data, converting the incremental data into operation SQL of a database of the destination end, and realizing the function of synchronizing the PostgreSQL with other databases of common types and the incremental data.
4. The method according to claim 2 or 3,
and calling a wal parse tool to generate a corresponding redo log.
5. The method of claim 2,
the log analysis tool is registered in extension of PostgreSQL and provided for a user interface, and then the incremental data can be analyzed.
6. The method of claim 5,
the log analysis tool outputs pg dml change information into an sql format, which is an extension of pg and supports a pg9.4+ version.
7. The method of claim 6,
the log analysis tool extracts the information of each changed record column, can generate corresponding sql, or converts the sql into binary data according to a custom format protocol for the consumption of a client program of postgresql, and obtains the change record of the database dml.
8. A system for implementing PostgreSQL incremental data synchronization,
comprises a log analysis tool wal parse tool, an increment acquisition module, a conversion module of a destination end and a warehousing module,
wherein the content of the first and second substances,
a log analysis tool WAL part tool is registered in a production library of PostgreSQL or a backup library of the WAL log of the production library, automatically analyzed into SQL operated by DML according to the generated WAL log, and sent to an increment acquisition module in real time;
the incremental acquisition module is used for setting acquired filtering conditions according to the table name or user name information through parameter setting, acquiring the incremental SQL sent by the log analysis tool in real time and sending the incremental data to a destination end by means of CMSP;
the conversion module of the destination end converts the incremental data sent by the CMSP into the incremental SQL of the corresponding database in real time according to the type of the database of the destination end and various parameter settings;
and the warehousing module is used for warehousing the converted incremental SQL.
CN202011190462.6A 2020-10-30 2020-10-30 Method and system for realizing synchronization of PostgreSQL incremental data Pending CN112269823A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011190462.6A CN112269823A (en) 2020-10-30 2020-10-30 Method and system for realizing synchronization of PostgreSQL incremental data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011190462.6A CN112269823A (en) 2020-10-30 2020-10-30 Method and system for realizing synchronization of PostgreSQL incremental data

Publications (1)

Publication Number Publication Date
CN112269823A true CN112269823A (en) 2021-01-26

Family

ID=74345691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011190462.6A Pending CN112269823A (en) 2020-10-30 2020-10-30 Method and system for realizing synchronization of PostgreSQL incremental data

Country Status (1)

Country Link
CN (1) CN112269823A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783927A (en) * 2021-01-27 2021-05-11 浪潮云信息技术股份公司 Database query method and system
CN112883118A (en) * 2021-03-31 2021-06-01 浪潮云信息技术股份公司 Method and system for synchronously acquiring incremental data based on sql
CN116226093A (en) * 2023-04-25 2023-06-06 北京庚顿数据科技有限公司 Real-time database system based on dual-activity high-availability architecture
CN116361076A (en) * 2023-06-01 2023-06-30 杭州费尔斯通科技有限公司 Domestic database backup method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709043A (en) * 2016-12-30 2017-05-24 江苏瑞中数据股份有限公司 Data synchronous loading method based on database log
CN108228756A (en) * 2017-12-21 2018-06-29 江苏瑞中数据股份有限公司 Data based on the PG databases of daily record analytic technique to Hadoop platform synchronize clone method
CN110727548A (en) * 2019-09-29 2020-01-24 上海英方软件股份有限公司 Continuous data protection method and device based on database DML synchronization
CN110879813A (en) * 2019-11-20 2020-03-13 浪潮软件股份有限公司 Binary log analysis-based MySQL database increment synchronization implementation method
CN111209344A (en) * 2020-02-07 2020-05-29 浪潮软件股份有限公司 Data synchronization method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709043A (en) * 2016-12-30 2017-05-24 江苏瑞中数据股份有限公司 Data synchronous loading method based on database log
CN108228756A (en) * 2017-12-21 2018-06-29 江苏瑞中数据股份有限公司 Data based on the PG databases of daily record analytic technique to Hadoop platform synchronize clone method
CN110727548A (en) * 2019-09-29 2020-01-24 上海英方软件股份有限公司 Continuous data protection method and device based on database DML synchronization
CN110879813A (en) * 2019-11-20 2020-03-13 浪潮软件股份有限公司 Binary log analysis-based MySQL database increment synchronization implementation method
CN111209344A (en) * 2020-02-07 2020-05-29 浪潮软件股份有限公司 Data synchronization method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783927A (en) * 2021-01-27 2021-05-11 浪潮云信息技术股份公司 Database query method and system
CN112783927B (en) * 2021-01-27 2023-03-17 浪潮云信息技术股份公司 Database query method and system
CN112883118A (en) * 2021-03-31 2021-06-01 浪潮云信息技术股份公司 Method and system for synchronously acquiring incremental data based on sql
CN116226093A (en) * 2023-04-25 2023-06-06 北京庚顿数据科技有限公司 Real-time database system based on dual-activity high-availability architecture
CN116226093B (en) * 2023-04-25 2023-08-08 北京庚顿数据科技有限公司 Real-time database system based on dual-activity high-availability architecture
CN116361076A (en) * 2023-06-01 2023-06-30 杭州费尔斯通科技有限公司 Domestic database backup method
CN116361076B (en) * 2023-06-01 2023-11-14 杭州费尔斯通科技有限公司 Domestic database backup method

Similar Documents

Publication Publication Date Title
CN112269823A (en) Method and system for realizing synchronization of PostgreSQL incremental data
KR101917806B1 (en) Synchronization Error Detection AND Replication Method of Database Replication System Using SQL Packet Analysis
CN109739867B (en) Industrial metadata management method and system
CN112685433B (en) Metadata updating method and device, electronic equipment and computer-readable storage medium
CN111125260A (en) Data synchronization method and system based on SQL Server
CN109753502B (en) Data acquisition method based on NiFi
CN113468170B (en) System for automatically realizing real-time synchronization of data
CN105786998A (en) Database middleware system and method for processing data through database middleware system
CN112231407B (en) DDL synchronization method, device, equipment and medium of PostgreSQL database
CN111708673A (en) Log data compression method, device, equipment and storage medium
CN112988770B (en) Method, device, electronic equipment and storage medium for updating serial number
CN113312376B (en) Method and terminal for real-time processing and analysis of Nginx logs
CN114691704A (en) Metadata synchronization method based on MySQL binlog
CN110851409A (en) Log compression and decompression method, device and storage medium
CN109902070B (en) WiFi log data-oriented analysis storage search method
CN111400303B (en) Intelligent contract data extraction and synchronization method and system
CN109145155A (en) High-concurrency warehousing processing method for mass remote sensing image metadata
CN108228756A (en) Data based on the PG databases of daily record analytic technique to Hadoop platform synchronize clone method
CN109992476B (en) Log analysis method, server and storage medium
CN110222121A (en) A kind of SQL Server database increment synchronization realization method and system based on CDC mode
CN111125045B (en) Lightweight ETL processing platform
CN112883118A (en) Method and system for synchronously acquiring incremental data based on sql
CN111770145A (en) One-way network bilateral data synchronization system and method based on log analysis
CN116303427A (en) Data processing method and device, electronic equipment and storage medium
CN113342890A (en) Method and system for realizing cross-security isolation gatekeeper database real-time synchronization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210126

RJ01 Rejection of invention patent application after publication