CN112256485A - Data backup method, device, medium and computing equipment - Google Patents

Publication number
CN112256485A
Authority
CN
China
Prior art keywords
log file
database
data
disk
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011199633.1A
Other languages
Chinese (zh)
Other versions
CN112256485B (en)
Inventor
余利华
温正湖
蒋鸿翔
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202011199633.1A
Publication of CN112256485A
Application granted
Publication of CN112256485B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The invention provides a data backup method. The method is applied to any one of a plurality of databases included in a distributed system. In response to a restart of the database, the method first determines a log file supply device. It then sends an acquisition request to the log file supply device according to the first archive log file targeted by the latest disk-flushing operation performed by the database, so as to acquire a second archive log file in which the transaction information for the data to be backed up is recorded. Finally, it performs a disk-flushing operation for the second archive log file so as to store the data to be backed up in the disk of the database. The archive log and the committed data in the database are stored in the disk under an update rule other than the real-time update rule. In this way, the integrity of transaction data can be ensured while the high processing performance of the relational database is maintained. The invention also provides a data backup device, a medium and a computing device.

Description

Data backup method, device, medium and computing equipment
Technical Field
Embodiments of the present invention relate to the technical field of databases, and in particular to a data backup method, a data backup device, a medium and a computing device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
A relational database (e.g., MySQL) used as an On-Line Transaction Processing (OLTP) database generally adopts a "double one" setting to avoid data loss caused by server downtime after a transaction is committed. The "double one" setting means the following: a disk-flushing operation is performed on the archive log (e.g., Binlog) file at each transaction group commit, so that the archive log of the transactions is stored to disk in real time; meanwhile, after each transaction is committed in the engine, the redo log file corresponding to that transaction is stored to disk in real time.
When the "double one" setting is adopted, real-time disk flushing reduces the processing performance of the relational database and lowers the execution efficiency of normal transactions.
Disclosure of Invention
Therefore, the prior art cannot achieve both data integrity and database processing efficiency at the same time, which is a troublesome limitation.
An improved data backup method is therefore highly needed that, even when the archive log and the redo log are not stored in real time in order to preserve the processing performance of the database, still ensures data integrity and avoids data loss.
In this context, embodiments of the present invention are expected to provide a method that, when a database in a distributed system is restarted, accurately acquires the data to be backed up that is not stored locally in the restarted database, so as to ensure the integrity of the data.
In a first aspect of embodiments of the present invention, a data backup method is provided, which is applied to any one of a plurality of databases included in a distributed system. The method comprises: in response to a restart of the database, determining a log file supply device; sending an acquisition request to the log file supply device according to a first archive log file targeted by the latest disk-flushing operation performed by the database, so as to acquire a second archive log file in which transaction information for data to be backed up is recorded; and performing a disk-flushing operation for the second archive log file so as to store the data to be backed up in the disk of the database, wherein the archive log and the committed data in the database are stored in the disk under an update rule other than the real-time update rule.
In an embodiment of the present invention, sending the acquisition request to the log file supply device includes: sending the acquisition request to the log file supply device according to an offset position pre-stored in the database, wherein the offset position indicates the position, in the first archive log file, of the transaction information targeted by the latest disk-flushing operation.
In another embodiment of the present invention, sending the acquisition request to the log file supply device according to the offset position pre-stored in the database includes: truncating the first archive log file at the offset position to obtain a log segment, wherein the transaction information in the log segment is the transaction information on which the disk-flushing operation has been performed; initializing a preset parameter according to the log segment, wherein the preset parameter records the global identifiers of the transaction information on which the disk-flushing operation has been performed; and sending the acquisition request to the log file supply device according to the initialized preset parameter.
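The three steps of this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `LogEvent` record, the `"uuid:n"` identifier format, and the shape of the request dictionary are all hypothetical, and the archive log file is modeled as an in-memory list rather than a real Binlog file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    gtid: str     # global transaction identifier (hypothetical format)
    end_pos: int  # byte position at which this transaction ends in the file


def truncate_log(events, offset):
    """Keep only the transactions that end at or before the stored offset."""
    return [e for e in events if e.end_pos <= offset]


def build_acquisition_request(events, offset):
    """Initialize the 'preset parameter' (the set of already-flushed
    identifiers) from the log segment and wrap it in a request."""
    segment = truncate_log(events, offset)
    executed = {e.gtid for e in segment}
    return {"executed_gtids": sorted(executed)}


# Hypothetical first archive log file: three transactions, offset stored at 260.
first_log = [LogEvent("uuid:1", 120), LogEvent("uuid:2", 260), LogEvent("uuid:3", 410)]
request = build_acquisition_request(first_log, offset=260)
```

In this sketch the third transaction lies beyond the stored offset, so it is treated as not yet flushed and is left out of the identifier set carried by the request.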
In another embodiment of the present invention, the performing the disk-flushing operation on the second archive log file includes: a disk-flushing operation is performed for the second archive log file based on the idempotent mode.
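One way to read "idempotent mode" is that replaying the same log events more than once leaves the stored data unchanged. The sketch below illustrates that property only; the event tuple layout and the dictionary standing in for a table are assumptions made for the example, not details given in the source.

```python
def apply_event(table, event):
    """Apply one row event so that re-applying it is harmless."""
    op, key, value = event
    if op in ("insert", "update"):
        table[key] = value    # overwrite instead of failing on a duplicate key
    elif op == "delete":
        table.pop(key, None)  # ignore a row that is already gone
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(table, events):
    """Apply every event in order and return the resulting table."""
    for event in events:
        apply_event(table, event)
    return table


events = [("insert", 1, "a"), ("update", 1, "b"), ("insert", 2, "c"), ("delete", 2, None)]
once = replay({}, events)
twice = replay(replay({}, events), events)
```

Because each operation tolerates rows that already exist or are already deleted, a log segment that overlaps with previously applied transactions does not corrupt the result.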
In a further embodiment of the present invention, the data stored in the disk of the database is stored in the form of tables, and the data backup method further includes, while the database is backing up data: generating a new archive log file in response to completion of an update operation on the structure of a table stored in the disk.
In another embodiment of the present invention, the data backup method further includes: updating the offset position while the database is backing up data. Updating the offset position includes: acquiring a first operation time of the latest disk-flushing operation for the archive log file and the disk-flushing position, in a third archive log file, of the transaction information targeted by that operation; acquiring a second operation time of the latest disk-flushing operation for the redo log file; and updating the offset position according to the first operation time, the second operation time and the disk-flushing position.
In yet another embodiment of the present invention, updating the offset position according to the first operation time, the second operation time and the disk-flushing position includes: when the first operation time is earlier than the second operation time, updating the offset position to the starting position of the third archive log file; and when the first operation time is not earlier than the second operation time, updating the offset position to the disk-flushing position.
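The comparison rule above reduces to a small function. The sketch below is illustrative only: the function name, the use of plain numbers for the two operation times, and the `third_log_start` parameter (defaulting to 0) are assumptions introduced for the example.

```python
def update_offset(first_op_time, second_op_time, flush_position, third_log_start=0):
    """Return the new offset position.

    first_op_time   -- time of the latest disk flush of the archive log file
    second_op_time  -- time of the latest disk flush of the redo log file
    flush_position  -- position of the flushed transaction information
                       in the third archive log file
    third_log_start -- starting position of the third archive log file
                       (hypothetical parameter)
    """
    if first_op_time < second_op_time:
        return third_log_start
    return flush_position
```

For instance, `update_offset(10.0, 12.0, 500)` falls back to the start of the file, while `update_offset(12.0, 10.0, 500)` keeps the disk-flushing position.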
In a further embodiment of the present invention, before sending the acquisition request to the log file supply device, the data backup method further includes: determining whether the database contains a fourth archive log file on which the disk-flushing operation has not been performed; and when it is determined that the fourth archive log file exists, performing the disk-flushing operation for the fourth archive log file. When the fourth archive log file exists, the acquisition request is sent after the disk-flushing operation has been performed on the transaction information in the fourth archive log file; when it is determined that the fourth archive log file does not exist, the acquisition request is sent according to the first archive log file targeted by the latest disk-flushing operation performed by the database before the downtime.
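The ordering described in this embodiment (flush any leftover local archive log file first, then send the acquisition request) can be sketched as below. The function, the callback signatures, and the file names are hypothetical, and the sketch records actions in a list instead of touching real files or a network.

```python
def restart_precheck(local_logs, flush, send_request):
    """local_logs: list of (file_name, already_flushed) pairs in write order."""
    pending = [name for name, flushed in local_logs if not flushed]
    for name in pending:
        flush(name)   # flush leftover ("fourth") archive log files first
    send_request()    # only then ask the supply device for missing logs
    return pending


actions = []
restart_precheck(
    [("binlog.000007", True), ("binlog.000008", False)],
    flush=lambda name: actions.append(("flush", name)),
    send_request=lambda: actions.append(("request",)),
)
```

When no unflushed file exists, the loop body never runs and the request is sent immediately, which corresponds to the second branch of the embodiment.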
In still another embodiment of the present invention, determining the log file supply device in response to the restart of the database includes: in response to the restart of the database, determining the master database among the plurality of databases; when the master database is a database other than the restarted database, determining the master database as the log file supply device; and when the master database is the restarted database itself, determining a log database as the log file supply device, wherein the log database stores all the archive log files generated by the distributed system.
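The selection rule for the log file supply device is a two-way choice. A minimal sketch, assuming databases are identified by plain strings (the names below are hypothetical):

```python
def choose_supply_device(restarted_db, master_db, log_db):
    """Pick the node that will supply the missing archive log files."""
    if master_db != restarted_db:
        return master_db  # another node is master: fetch from the master
    return log_db         # the restarted node is itself master: use the log database


supplier_for_slave = choose_supply_device("db-2", master_db="db-1", log_db="log-node")
supplier_for_master = choose_supply_device("db-1", master_db="db-1", log_db="log-node")
```

The second case matters because a restarted master cannot fetch its own missing logs from itself, so a separate store holding all archive log files is consulted instead.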
In a second aspect of the embodiments of the present invention, there is provided a data backup apparatus, which is applied to any one of a plurality of databases included in a distributed system. The apparatus includes: a device determining module, configured to determine a log file supply device in response to a restart of the database; a request sending module, configured to send an acquisition request to the log file supply device according to a first archive log file targeted by the latest disk-flushing operation performed by the database, so as to acquire a second archive log file in which transaction information for data to be backed up is recorded; and an operation execution module, configured to perform a disk-flushing operation for the second archive log file so as to store the data to be backed up in the disk of the database.
In an embodiment of the present invention, the request sending module is configured to send the acquisition request to the log file supply device according to an offset position pre-stored in the database, where the offset position indicates the position, in the first archive log file, of the transaction information targeted by the latest disk-flushing operation.
In another embodiment of the present invention, the request sending module includes: a log truncation submodule, configured to truncate the first archive log file at the offset position to obtain a log segment, where the transaction information in the log segment is the transaction information on which the disk-flushing operation has been performed; a parameter initialization submodule, configured to initialize a preset parameter according to the log segment, where the preset parameter records the global identifiers of the transaction information on which the disk-flushing operation has been performed; and a request sending submodule, configured to send the acquisition request to the log file supply device according to the initialized preset parameter.
In another embodiment of the present invention, the operation execution module is configured to perform the disk-flushing operation for the second archive log file in an idempotent mode.
In a further embodiment of the present invention, the data stored in the disk of the database is stored in the form of tables, and the data backup apparatus further includes a log generation module, configured to, while the database is backing up data through the operation execution module, generate a new archive log file in response to completion of an update operation on the structure of a table stored in the disk.
In a further embodiment of the present invention, the data backup apparatus further includes an offset position updating module, configured to update the offset position while the database is backing up data through the operation execution module. The offset position updating module includes: a first obtaining submodule, configured to obtain a first operation time of the latest disk-flushing operation for the archive log file and the disk-flushing position, in a third archive log file, of the transaction information targeted by that operation; a second obtaining submodule, configured to obtain a second operation time of the latest disk-flushing operation for the redo log file; and a position updating submodule, configured to update the offset position according to the first operation time, the second operation time and the disk-flushing position.
In a further embodiment of the present invention, the position updating submodule is configured to update the offset position to the starting position of the third archive log file when the first operation time is earlier than the second operation time, and to update the offset position to the disk-flushing position when the first operation time is not earlier than the second operation time.
In a further embodiment of the present invention, the data backup apparatus further includes a file determining module, configured to determine, before the request sending module sends the acquisition request to the log file supply device, whether the database contains a fourth archive log file on which the disk-flushing operation has not been performed. The operation execution module is further configured to perform the disk-flushing operation for the fourth archive log file when the file determining module determines that the fourth archive log file exists. When the fourth archive log file exists, the request sending module sends the acquisition request after the disk-flushing operation has been performed on the transaction information in the fourth archive log file; when it is determined that the fourth archive log file does not exist, the request sending module sends the acquisition request according to the first archive log file targeted by the latest disk-flushing operation performed by the database before the downtime.
In a further embodiment of the present invention, the device determining module includes: a master database determining submodule, configured to determine the master database among the plurality of databases in response to the restart of the database; and a supply device determining submodule, configured to determine the master database as the log file supply device when the master database is a database other than the restarted database, and to determine a log database as the log file supply device when the master database is the restarted database itself, where the log database stores all the archive log files generated by the distributed system.
In a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, implement the data backup method described above.
In a fourth aspect of embodiments of the present invention, there is provided a computing device comprising: one or more memories storing executable instructions; and one or more processors that execute the executable instructions to implement the data backup method described above.
According to the data backup method and device provided by the embodiments of the invention, when a database is restarted, the data to be backed up that has not yet been backed up locally is accurately acquired from the log file supply device according to the first archive log file targeted by the latest disk-flushing operation. The data that has not been backed up can thus be located accurately after the database goes down, and data loss can be avoided even when the data was not flushed to disk in time. Therefore, the embodiments of the invention can achieve both database processing efficiency and data integrity even though the disk-flushing operation is not performed on the archive log in real time.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a diagram schematically illustrating an application scenario of a data backup method, apparatus, medium, and computing device according to an embodiment of the present invention;
FIG. 2 schematically shows a flow chart of a data backup method according to an embodiment of the present invention;
FIG. 3 schematically shows a flow chart of determining a log file supply device according to an embodiment of the invention;
FIG. 4 schematically illustrates the principle of sending an acquisition request to the log file supply device according to an offset position, according to an embodiment of the present invention;
FIG. 5 schematically illustrates the principle of updating the offset position according to an embodiment of the invention;
FIG. 6 schematically shows a flow diagram of a data backup method according to another embodiment of the invention;
FIG. 7 is a block diagram schematically showing the structure of a data backup apparatus according to an embodiment of the present invention;
FIG. 8 schematically shows a schematic view of a program product adapted to perform a data backup according to an embodiment of the present invention; and
FIG. 9 schematically illustrates a block diagram of a computing device adapted to perform a data backup in accordance with an embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, a method, a device, a medium and a computing device for data backup are provided.
In this context, it is to be understood that the terminology involved and its explanations are as follows.
DDL, an abbreviation for Data Definition Language, refers to operations that create, modify and delete objects such as tables, indexes and users in a database system, using commands such as create, alter and drop.
DML, an abbreviation for Data Manipulation Language, refers to operations that create, modify and delete records/rows in a table in a database system, using commands such as insert, update and delete.
ACID refers to the four properties that a Database Management System (DBMS) must possess to ensure that a transaction is correct and reliable while data is being written or updated: Atomicity (also called indivisibility), Consistency, Isolation (also called independence), and Durability.
OLTP, an abbreviation for Online Transaction Processing, refers to processing routine operational data in real time as online transactions by means of an information system, a computer network and a database, in contrast to the large batch jobs of earlier conventional database systems.
Group commit means committing together a plurality of transactions that are executed simultaneously or nearly simultaneously; when the transactions are committed, their logs are all persisted through a single disk-flushing operation.
redo log: the redo log file records the changes made by transaction operations, and the recorded value is the value after the data is modified. For example, when the database loses power, the InnoDB storage engine can use the redo log to restore the data to its state at the moment before the power loss, so as to ensure the integrity of the data.
Binlog: the archive log of MySQL, which records all operations that change the MySQL database.
Moreover, any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Summary of The Invention
The MySQL database is currently the most popular open-source database and is widely used worldwide. For the foreseeable future, the deployment scale of MySQL is expected to keep expanding. However, MySQL is subject to the transaction ACID constraint. Specifically, MySQL is used as an OLTP database, and to ensure that data is not lost after a transaction is committed, the "double one" setting of MySQL parameters is required. The MySQL "double one" setting means that the MySQL Server layer parameter sync_binlog is set to 1 and the engine layer parameter innodb_flush_log_at_trx_commit is also set to 1. Setting sync_binlog to 1 means that each time a transaction group is committed in MySQL, the Binlog file of the transactions in the group must be flushed to disk through a sync operation, which guarantees that the Binlog file is not lost when the server goes down. Setting the engine layer parameter to 1 means that each time a transaction is committed in the InnoDB engine (one of the database engines of MySQL, which supports ACID transactions), the redo log file corresponding to that transaction must be flushed to disk, so that the transaction is not lost when the server goes down. With the "double one" setting, a MySQL high-availability instance obtains the maximum data persistence guarantee.
The inventors found that if sync_binlog and innodb_flush_log_at_trx_commit are set to values other than 1, the Binlog file and the redo log file are still written to their storage files when a transaction is committed; only the disk-flushing operation is not performed. That is, the newly written portion of each file is not persisted but is held in the cache of the Linux kernel. In this case, if MySQL exits abnormally (for example, it crashes because of a bug, is killed by the system because it runs out of memory, or is killed manually), the transaction data is not lost, because the log has already been written into the cache. However, if the Linux server goes down (for example, because of a power failure or a Linux kernel bug), the cache is released and the transaction data is lost.
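The difference between writing a log record and flushing it to disk can be illustrated with ordinary file I/O. This is a generic sketch using Python's standard `os.fsync`, not MySQL's own code; the function name, file name and record contents are hypothetical.

```python
import os
import tempfile


def append_record(path, record, sync):
    """Append a record; optionally force it from the kernel cache to the device."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()                 # user-space buffer -> kernel page cache
        if sync:
            os.fsync(f.fileno())  # kernel page cache -> storage device


demo_path = os.path.join(tempfile.mkdtemp(), "binlog.demo")
append_record(demo_path, b"trx-1\n", sync=False)  # survives a process crash only
append_record(demo_path, b"trx-2\n", sync=True)   # also survives an OS crash or power loss
```

Both records are readable by any process afterwards, because reads are served from the kernel cache. The difference only shows when the operating system itself goes down before the cached data reaches the device.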
For a given value of innodb_flush_method, the parameter that controls how the InnoDB data files and redo log files are opened and flushed, the value of innodb_flush_log_at_trx_commit affects the throughput of MySQL. In particular, when innodb_flush_method is O_DIRECT or O_DIRECT_NO_FSYNC, the throughput of MySQL with innodb_flush_log_at_trx_commit set to 0 or 2 is significantly higher than with it set to 1. Here, setting innodb_flush_log_at_trx_commit to 0 means that a Linux background thread performs the disk-flushing operation on the redo log file, and setting it to 2 means that the disk-flushing operation is performed on the redo log file once every innodb_flush_log_at_timeout seconds. In addition, the value of sync_binlog also affects the throughput of MySQL: when sync_binlog is set to a value other than 1, the throughput is significantly higher than when it is set to 1. Setting sync_binlog to 0 means that a Linux background thread automatically performs the disk-flushing operation on the Binlog file, and setting sync_binlog to n, a positive integer other than 1, means that the Binlog file is flushed to disk once after every n transaction group commits.
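The effect of sync_binlog on the number of explicit disk flushes, as described above, can be made concrete with simple arithmetic. The function below is a hypothetical model written for this illustration; it counts only the flushes triggered by group commits and says nothing about actual throughput.

```python
def count_binlog_flushes(group_commits, sync_binlog):
    """Number of explicit Binlog flushes triggered by the given group commits."""
    if sync_binlog < 0:
        raise ValueError("sync_binlog must be non-negative")
    if sync_binlog == 0:
        return 0  # flushing is left to the operating system
    return group_commits // sync_binlog  # one flush per n group commits
```

With 1000 group commits, a setting of 1 triggers 1000 flushes while a setting of 100 triggers 10, which is why larger values reduce the flushing overhead at the cost of a larger window of unflushed transactions.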
The inventors also found that when a database in a distributed system goes down (that is, the server where the database is located goes down), the complete transaction data is generally still stored in the disks of the other databases that have not gone down. Therefore, after the database that went down is restarted, it can obtain the transaction data by interacting with a database that stores the complete transaction data and back that data up, thereby avoiding data loss. The starting point for obtaining the transaction data can be determined from the leading segment of the Binlog file targeted by the latest disk-flushing operation before the downtime, which avoids replaying already backed-up data again when the transaction data stored in the other databases is backed up.
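Seen from the side of the database that holds the complete transaction data, avoiding repeated playback amounts to skipping every transaction the restarted database already has. The sketch below assumes transactions carry a global identifier and that the restarted database reports the identifiers it has already flushed; the names and data layout are hypothetical.

```python
def transactions_to_send(supplier_log, executed_gtids):
    """Return, in log order, only the transactions the requester is missing."""
    executed = set(executed_gtids)
    return [(gtid, payload) for gtid, payload in supplier_log if gtid not in executed]


supplier_log = [("uuid:1", "INSERT a"), ("uuid:2", "UPDATE a"), ("uuid:3", "DELETE a")]
missing = transactions_to_send(supplier_log, ["uuid:1", "uuid:2"])
```

Only the third transaction is returned here, so the restarted database replays nothing it has already stored.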
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scene overview
Referring first to fig. 1, fig. 1 schematically illustrates an application scenario diagram of a data backup method, apparatus, medium, and computing device according to an embodiment of the present invention.
As shown in fig. 1, the application scenario 100 includes a distributed system composed of a plurality of data servers 101, 102, 103, and the plurality of data servers 101, 102, 103 can interact with each other through a network 104. The network may be a local area network, a wide area network, a mobile internet, etc.
In one embodiment, each data server is provided with a database management system, which may be MySQL, SQL Server, or the like. In an embodiment, the database management system is configured to execute transactions; specifically, it may create, modify and delete objects such as tables, indexes and users in the storage space of the data server, or create, modify and delete records/rows in the tables in that storage space.
In one embodiment, the plurality of data servers in the distributed system may include, for example, a master database elected according to a preset election rule; the data servers other than the master database are slave databases that serve as standby databases. After the master database executes a transaction (i.e., creates, modifies or deletes stored data), it generates a Binlog file and broadcasts the file to the slave databases. Each slave database can play back the transaction (i.e., perform the corresponding create, modify and delete operations on its locally stored data) according to the received Binlog file, so that the data stored by each slave database stays consistent with the data stored by the master database. To ensure that data is recoverable, each data server also generates a redo log file while executing or playing back a transaction. The disk-flushing operation can be performed on the generated redo log file and the received Binlog file according to preset rules so as to store them in the disk, so that after the data server crashes unexpectedly and is restarted, data can be recovered from the logs stored in the disk.
According to the embodiment of the invention, if the Binlog file or the redo log file is not flushed to disk in real time, then when the master database or a slave database goes down because of a power failure or a Linux kernel bug, the data server that went down may lose the transaction data that has not been flushed. However, each data server that has not gone down stores the Binlog files of all the transactions. Therefore, after the data server is restarted, it can acquire the Binlog file from a data server that has not gone down and continue to play back the transactions, which avoids data loss.
In an embodiment, to avoid data loss when all the data servers go down, the application scenario 100 may further include, for example, a log persistence node 105 as shown in fig. 1, which is used only for storing the received Binlog files. Accordingly, after executing a transaction and generating the Binlog file, the master database may also broadcast the Binlog file to the log persistence node 105 over the network 104. Thus, when all the data servers have gone down, any one of them can, after being restarted, obtain from the log persistence node 105 the Binlog file that was not stored in its disk.
It is understood that the data backup method provided by the present invention can be executed by any data server in the distributed system. Accordingly, the data backup device provided by the invention can be arranged in any data server.
Exemplary method
In the following, a data backup method according to an exemplary embodiment of the present invention is described with reference to fig. 2 to 6 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
Fig. 2 schematically shows a flow chart of a data backup method according to an embodiment of the present invention.
As shown in fig. 2, the data backup method of this embodiment may include operations S210 to S230. The data backup method can be applied to any database in a plurality of databases included in a distributed system. The database here may be the data server described in fig. 1, or the like.
According to the embodiment of the invention, in order to ensure the efficiency of database transaction processing, the archive log and the submitted data in any database in the embodiment are stored in the disk by adopting other update rules except the real-time update rule.
Illustratively, the values of the parameter innodb_flush_log_at_trx_commit and the parameter sync_binlog described above are both set to values other than 1, so that the disk-flushing operation on the archive log and the submitted data follows a non-real-time update rule.
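The condition above can be sketched as a small check. This is only an illustrative sketch: the function name and the dictionary layout are assumptions introduced here, and only the two parameter names come from the text.

```python
def uses_non_realtime_flush(settings):
    """Return True when both parameters differ from 1 (the real-time rule)."""
    # Hypothetical helper: `settings` maps parameter names to integer values.
    return (settings["innodb_flush_log_at_trx_commit"] != 1
            and settings["sync_binlog"] != 1)


relaxed = {"innodb_flush_log_at_trx_commit": 2, "sync_binlog": 0}
strict = {"innodb_flush_log_at_trx_commit": 1, "sync_binlog": 1}
print(uses_non_realtime_flush(relaxed), uses_non_realtime_flush(strict))
```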
In operation S210, in response to a reboot of any one of the databases, a log file providing apparatus is determined.
According to the embodiment of the invention, the log file providing device can be any one of the databases in the distributed system that is not down. Alternatively, the log file providing device may be any database or server that stores the full set of archive logs. The log file providing device may be determined by the distributed system, for example.
In operation S220, an acquisition request is sent to the log file providing apparatus according to the first archive log file for which the latest disk-flushing operation performed by any one of the databases is directed, so as to acquire a second archive log file in which transaction information for data to be backed up is recorded.
According to an embodiment of the present invention, the archive log file is specifically the aforementioned Binlog file. After any one of the databases is restarted, in order to ensure that the locally persisted data of the database is consistent with the data stored in the other databases in the distributed system, the data that had not been persisted when the database went down needs to be backed up locally to the database. The data that was not persisted necessarily includes the transaction data recorded in the archive log files generated after the first archive log file, i.e., the file targeted by the latest disk-flushing operation executed before the downtime. Accordingly, operation S220 may transmit an identification of the first archive log file to the log file providing device, so that the log file providing device transmits the second archive log file according to the identification. The identification may be information such as a timestamp.
According to the embodiment of the invention, after the acquisition request is sent to the log file supplying device, the acquired second archived log file may include not only the archived log file generated after the first archived log file, but also the first archived log file, so as to avoid data loss caused by transaction data in the first archived log file not being persisted when the device is down.
According to the embodiment of the present invention, the execution of operation S220 is premised on all the archive log files received by any one of the databases before the first archive log file having been persisted through a disk-flushing operation. This ensures that, after the database performs the playback operation according to the obtained second archive log file, all of its data is consistent with all the data in the databases that did not go down. Analysis at the source code level shows that the working mechanism of the database satisfies this premise: when the Binlog file currently being flushed is closed, or when a new Binlog file is created and opened, the database forces a disk-flushing operation on the redo log files corresponding to all the transactions committed by the InnoDB engine, and this operation is not affected by the innodb_flush_log_at_trx_commit parameter. Moreover, when the database closes the Binlog file currently being flushed, it ensures that all the Binlog files in the database are stored on the disk, regardless of the sync_binlog parameter.
For example, operation S220 may first query the first segment of content, Previous_gtids, of the first archive log file of the database that went down. This segment records the transaction identifiers of all Binlog files generated before the first archive log file was generated. The transaction identifiers in this first segment are then used as the value of the gtid_purged parameter so as to initialize gtid_executed. Here, the gtid_purged parameter records the set of Binlog transactions that have been purged, and gtid_executed is the set of identifiers of the transactions executed by the MySQL database on the current instance, which in practice contains all the transactions recorded in the Binlog. Any of the databases then sends the initialized gtid_executed to the log file providing device. Based on the initialized gtid_executed, the log file providing device may take, as the second archive log file, the Binlog file that records the transactions executed after the transactions recorded in gtid_executed, and may transmit that Binlog file to the database.
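How the log file providing device might work out which transactions the restarted database still lacks can be sketched as a set difference. The representation below (a dictionary from server UUID to a set of transaction sequence numbers) and the function name are assumptions made for illustration; they are not MySQL's internal GTID set format.

```python
def missing_transactions(provider_executed, requester_executed):
    """Return, per server UUID, the transaction numbers the requester lacks."""
    missing = {}
    for uuid, seqnos in provider_executed.items():
        # Transactions known to the provider but absent from the requester.
        gap = seqnos - requester_executed.get(uuid, set())
        if gap:
            missing[uuid] = gap
    return missing


# Hypothetical example: the provider has executed 1..5, the requester only 1..3.
provider = {"uuid-a": {1, 2, 3, 4, 5}}
requester = {"uuid-a": {1, 2, 3}}
print(missing_transactions(provider, requester))
```

The transactions in the returned gap are the ones whose Binlog content would be sent back as the second archive log file.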
In operation S230, a disk-flushing operation is performed on the second archive log file to store the data to be backed up in the disk of any one of the databases.
According to an embodiment of the present invention, a data playback operation may be performed first according to the second archive log file. And after the data playback operation is executed, writing the second archive log file into a disk of any database.
According to the embodiment of the invention, the disk-flushing operation can be executed for the second archive log file in an idempotent mode, so that an error in a single data operation recorded in the Binlog file does not abort the playback of the remaining data operations recorded in that Binlog file.
Illustratively, the idempotent mode may be implemented, for example, by setting the MySQL replication parameter slave_exec_mode to the optional value IDEMPOTENT. In the idempotent mode, errors such as a missing record or a duplicate primary key encountered while the database replays the transactions in the Binlog are ignored: the erroneous Binlog event is skipped, and the other Binlog events recorded in the Binlog file continue to be executed. It should be noted that the idempotent mode is implemented here with the value IDEMPOTENT of the replication parameter slave_exec_mode rather than with the parameter slave_skip_errors, because the latter, on encountering an error such as a missing record or a duplicate primary key, skips the whole erroneous transaction, i.e., skips the whole Binlog file.
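The behaviour described above (skip the failing event, keep replaying the rest) can be illustrated with a toy in-memory table. This sketch follows the description in the text only; it is not MySQL's implementation of slave_exec_mode, and the event tuple format and function name are hypothetical.

```python
def replay_idempotent(table, events):
    """Apply row events to `table` (a dict keyed by primary key).

    Duplicate-key and missing-row errors are counted and skipped
    instead of aborting the replay.
    """
    skipped = 0
    for op, key, value in events:
        if op == "insert":
            if key in table:          # duplicate primary key: skip event
                skipped += 1
                continue
            table[key] = value
        elif op == "update":
            if key not in table:      # record does not exist: skip event
                skipped += 1
                continue
            table[key] = value
        elif op == "delete":
            if key not in table:      # record does not exist: skip event
                skipped += 1
                continue
            del table[key]
        else:
            raise ValueError(f"unsupported event type: {op}")
    return skipped


table = {1: "a"}
events = [("insert", 1, "a"), ("insert", 2, "b"),
          ("delete", 3, None), ("update", 2, "c")]
print(replay_idempotent(table, events), table)
```

Replaying the same events twice leaves the table unchanged the second time, which is the property that makes repeated playback after a restart safe.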
According to an embodiment of the present invention, the data stored on the disk of any database is stored in table form. While any one of the databases backs up data, or operates on data to generate an archive log file, the database may generate a new archive log file in response to the completion of an update operation on the structure of a table stored on the disk, so as to facilitate the implementation of the idempotent mode. This is because the idempotent mode of the replication parameter slave_exec_mode cannot ignore errors of DDL statements. For example, for a table creation operation, if the table to be created already exists, an error is still reported when the transaction is played back in the idempotent mode. By generating a new archive log file in this way, any database can ensure that none of the events before the last log event recorded in each archive log file is an update operation on the structure of a table stored on the disk, thereby ensuring that the disk-flushing operation on the whole archive log file executes smoothly.
Illustratively, the distributed system can adopt a pluggable storage engine architecture, in which the underlying storage engine is responsible for transaction processing and data persistence, the upper layer is responsible for parsing and executing business SQL, and the Binlog file replication mechanism is used for incremental data synchronization and for guaranteeing high service availability. For example, the distributed system relies on the MySQL Group Replication plug-in (MGR) to achieve data synchronization. When a DDL operation is committed on the master library of the MGR instance, or when a slave library finishes replaying a DDL operation, a Binlog rotate may be executed in the code path of the group commit to generate a new archive log file.
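The rotation rule can be sketched as follows: a new log file is started immediately after every DDL event, so a DDL event can only ever be the last event of a file. The event representation and the function name are assumptions made for this illustration.

```python
def split_into_log_files(events):
    """Group events into log files, starting a new file after each DDL event."""
    files, current = [], []
    for kind, statement in events:
        current.append((kind, statement))
        if kind == "DDL":
            # Rotate: close the current file so the DDL is its last event.
            files.append(current)
            current = []
    if current:
        files.append(current)
    return files


events = [
    ("DML", "INSERT INTO t VALUES (1)"),
    ("DDL", "CREATE TABLE u (id INT PRIMARY KEY)"),
    ("DML", "INSERT INTO u VALUES (1)"),
    ("DML", "UPDATE u SET id = 2 WHERE id = 1"),
]
for log_file in split_into_log_files(events):
    print([kind for kind, _ in log_file])
```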
For example, after any of the databases obtains the second archive log file, the value of the replication parameter slave_exec_mode may be set to IDEMPOTENT, and the value of the parameter gtid_executed in the log file providing device may be obtained as gtid_set. The following statement is then run: START SLAVE SQL_THREAD UNTIL SQL_AFTER_GTIDS = gtid_set, so as to replay the transactions recorded in the second archive log file and store the Binlog file to disk. Through WAIT_FOR_EXECUTED_GTID_SET(gtid_set), the database can wait until the playback of the transactions recorded in the Binlog file is complete and the storage operation on the Binlog file has stopped. Finally, the value of the replication parameter slave_exec_mode is restored to the default value STRICT.
In summary, when data is stored on the disk under an update rule other than the real-time update rule, the data backup method of the embodiment of the present invention can automatically obtain, after any server goes down and restarts, the complete archive log files that were not backed up, so that the integrity of the data is ensured while the database processes transactions with high efficiency.
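The sequence of statements in this example can be laid out in order as below. This is a sketch under assumptions: the helper name is hypothetical, the UNTIL clause is read here as SQL_AFTER_GTIDS, and the GTID set is interpolated into the statement text directly, which is acceptable only for a trusted value.

```python
def build_replay_statements(gtid_set):
    """Return the MySQL statements for one idempotent replay pass, in order."""
    # Sketch only: gtid_set is interpolated directly, so it must be trusted.
    return [
        # 1. Tolerate missing-row and duplicate-key errors during replay.
        "SET GLOBAL slave_exec_mode = 'IDEMPOTENT'",
        # 2. Replay until the provider's executed transactions are applied.
        f"START SLAVE SQL_THREAD UNTIL SQL_AFTER_GTIDS = '{gtid_set}'",
        # 3. Block until those transactions have been executed locally.
        f"SELECT WAIT_FOR_EXECUTED_GTID_SET('{gtid_set}')",
        # 4. Restore the default execution mode.
        "SET GLOBAL slave_exec_mode = 'STRICT'",
    ]


# Hypothetical GTID set value used only for demonstration.
for statement in build_replay_statements("3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5"):
    print(statement + ";")
```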
Fig. 3 schematically shows a flow chart of determining a log file providing device according to an embodiment of the invention.
According to an embodiment of the present invention, in order to ensure the integrity of the acquired second archival log file, as shown in fig. 3, the aforementioned operation of determining the log file provision apparatus may include, for example, operations S311 to S314.
In operation S311, a master database among the plurality of databases is determined in response to a reboot of any one of the databases.
According to the embodiment of the invention, if the distributed system relies on the MGR to realize data synchronization, then after the database where the MGR is located goes down and is restarted, the database restarts a mysqld (background service program) process to perform fault recovery. After the fault recovery is completed, the database can clear the information of the master library set before the downtime by executing RESET MASTER, and can then execute a CHANGE MASTER TO operation to determine information such as the IP address and the Binlog file position of the master library with which it needs to synchronize.
In operation S312, it is determined whether the master database is another database than any one of the above databases.
According to an embodiment of the present invention, after determining the master database, any of the databases may compare, for example, the IP address of the master database with its own IP address. If the two are consistent, the main database is determined to be any one of the databases, otherwise, the main database is determined to be other databases except any one of the databases.
In case that it is determined that the master database is a database other than any one of the above, operation S313 is performed to determine that the master database is the log file providing apparatus.
In case that it is determined that the master database is any one of the above databases, operation S314 is performed to determine the log database as the log file providing apparatus.
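Operations S312 to S314 amount to a single comparison, which can be sketched as follows. The function name and the use of plain address strings are assumptions made for illustration.

```python
def choose_log_file_provider(own_address, master_address, log_database_address):
    """Pick the log file providing device for a restarted database."""
    if master_address != own_address:
        # S313: another database is the master, so it supplies the logs.
        return master_address
    # S314: the restarted database is itself the master; use the log database.
    return log_database_address


# Hypothetical addresses used only for demonstration.
print(choose_log_file_provider("10.0.0.2", "10.0.0.1", "10.0.0.9"))
print(choose_log_file_provider("10.0.0.1", "10.0.0.1", "10.0.0.9"))
```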
According to an embodiment of the present invention, the data stored in the log database is all the archive log files generated by the distributed system. The log database is a Binlog node, implemented by replacing the underlying MySQL storage engine, InnoDB, with the BLACKHOLE engine. By dropping business data processing and saving only the Binlog file, the Binlog node improves the SQL processing performance of MySQL, while the retained Binlog file serves as an incremental data source for the other databases. For example, the log database may act as a relay node, or as an Arbiter node of the MGR. The Arbiter node comes into play when all databases are down: when any one of the databases is restarted, the Arbiter node acts as a voting node and the restarted database is elected as the master database. Accordingly, after the master library processes a transaction to generate the Binlog file, the Binlog file can be broadcast to the log database so as to ensure that the log database stores the full set of Binlog files.
In summary, the embodiment of the present invention can ensure that submitted transaction data is not lost when all databases in the distributed system are down by introducing the log database, thereby improving the reliability of the distributed system.
Fig. 4 schematically shows a schematic diagram of a principle of sending a fetch request to a log file provision apparatus according to an offset position according to an embodiment of the present invention.
According to an embodiment of the present invention, the operation S220 may also send an acquisition request to the log file providing device according to an offset location pre-stored in any database, for example. Wherein the offset location indicates a location in the first archive log file of the transaction information for which the most recent disk-flushing operation is directed.
Illustratively, the offset position may be set, for example, as follows: the starting location of the first archive log file is used as a checkpoint, and only the first two Binlog events in the first archive log file are retained: Format_desc and Previous_gtids. The remaining information is truncated, and the parameter gtid_executed is initialized, so as to ensure that the transactions corresponding to the transaction identifiers in the newly initialized gtid_executed have all been persisted through a disk-flushing operation. Accordingly, the offset position is the position between these first two Binlog events and the first piece of transaction information in the first archive log file.
According to the embodiment of the invention, when a large amount of transaction information is recorded in the first archive log file and any one of the databases goes down after playing back only part of it, determining the offset position by retaining only the first two Binlog events of the first archive log file means that, after the database is restarted, the transaction information already played back may need to be played back again, which wastes computing resources.
In this regard, the embodiment of the present invention may continuously advance the offset position while the transaction information in the archive log file is being played back. To make this possible, the embodiment may add a variable that can be dynamically updated and supports persistence. The value of the variable is the offset position, and it is updated and persisted each time any one of the databases finishes playing back a piece of transaction information. When sending an acquisition request to the log file providing device, as shown in fig. 4, the first archive log file 410 may be truncated at the offset position to obtain the log segment 420. The transaction information in the log segment is the transaction information on which the disk-flushing operation has been performed. Then, a preset parameter, which records the global identifications of the transaction information on which the disk-flushing operation has been performed, is initialized according to the log segment to obtain an initialized preset parameter 430. Finally, an acquisition request 440 is generated according to the initialized preset parameter 430 and sent to the log file providing device. The preset parameter may be the aforementioned gtid_executed.
Illustratively, the variable that can be dynamically updated and supports persistence may be, for example, a parameter last_binlog_persistence_offset supported by MySQL 8.0, whose value needs to be reset to 0 each time a new archive log file is generated. This variable may be persisted, for example, by executing a command of the following form: SET PERSIST last_binlog_persistence_offset = offset1. Accordingly, when the first archive log file 410 is truncated, the value of the variable may be read first, and the first archive log file may be truncated according to the read value to obtain the log segment. Correspondingly, in this embodiment the preset parameter may specifically be initialized according to the identifications of the transaction information recorded in the log segment.
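The flow of fig. 4 (truncate at the offset, initialize the executed set, build the request) can be sketched as below. The data layout is an assumption made for illustration: each transaction in the first archive log file is represented by its end position and its global identification, and the request is a plain dictionary.

```python
def build_acquisition_request(previous_gtids, transactions, offset):
    """Build the request sent to the log file providing device.

    previous_gtids: identifications recorded at the head of the file.
    transactions:   list of (end_position, gtid) pairs in file order.
    offset:         persisted offset position within the file.
    """
    # Log segment: transactions that end at or before the offset,
    # i.e. those already covered by a disk-flushing operation.
    segment = [gtid for end, gtid in transactions if end <= offset]
    # Initialized preset parameter (plays the role of gtid_executed).
    executed = set(previous_gtids) | set(segment)
    return {"gtid_executed": sorted(executed)}


# Hypothetical identifications and positions used only for demonstration.
request = build_acquisition_request(
    previous_gtids=["u:1", "u:2"],
    transactions=[(200, "u:3"), (350, "u:4"), (500, "u:5")],
    offset=350,
)
print(request)
```

With an offset of 0, only the identifications from the head of the file are reported, which corresponds to the checkpoint-at-file-start case described earlier.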
Fig. 5 schematically shows a principle view of updating an offset position according to an embodiment of the present invention.
According to the embodiment of the invention, the offset position can be continuously advanced and updated according to the disk-flushing operations on the archive log file (Binlog file) and on the redo log file, so that the second archive log file obtained after any database restarts is accurate and repeated playback of transaction information is reduced. Accordingly, the data backup method described in the foregoing embodiment further includes an operation of updating the offset position while any database is backing up data.
According to the embodiment of the present invention, as shown in fig. 5, the embodiment can record, in real time, the operation time of the last playback operation performed on the transaction information recorded in the archive log file and the end position of the transaction information targeted by that playback operation. This yields a first operation time 501 (i.e., the operation time of the latest disk-flushing operation for the archive log file) and the disk-flushing position, within the archive log file to which it belongs (the third archive log file), of the transaction information targeted by the last playback operation. Meanwhile, the operation time of the last disk flush of the redo log file is recorded in real time, which yields a second operation time 502 (i.e., the operation time of the latest disk-flushing operation for the redo log file). Accordingly, when updating the offset position, the first operation time of the latest disk-flushing operation for the archive log file and the disk-flushing position of the targeted transaction information in the third archive log file may be obtained first, together with the second operation time of the latest disk-flushing operation for the redo log file. The offset position is then updated according to the first operation time, the second operation time, and the disk-flushing position.
Illustratively, the first operating time and the second operating time may be recorded according to a system time of either database.
For example, as shown in fig. 5, when the offset position is updated according to the first operation time, the second operation time, and the disk-flushing position, operation S541 may be performed first to determine whether the first operation time is earlier than the second operation time. The redo log file is generated while the transaction information is being executed, and it is stored on the disk earlier than the archive log file. Therefore, when the first operation time is earlier than the second operation time, the redo log of the transaction information targeted by the latest disk-flushing operation has been stored on the disk, but the corresponding archive log has not. In this case, the embodiment performs operation S542 to update the offset position to the starting position of the third archive log file, so that the transaction information targeted by the latest disk-flushing operation can be replayed and its archive log stored on the disk, which avoids loss of the archive log. When the first operation time is not earlier than the second operation time, the embodiment performs operation S543 to update the offset position to the disk-flushing position, so as to reduce repeated playback of the transaction information as much as possible.
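The decision in operations S541 to S543 can be written as one comparison. In this sketch the times are plain numbers taken from the system clock, and the function and parameter names are assumptions made for illustration.

```python
def update_offset(first_time, second_time, flush_position, file_start=0):
    """Return the new offset position within the third archive log file.

    first_time:     time of the latest flush of the archive (Binlog) file.
    second_time:    time of the latest flush of the redo log file.
    flush_position: end position of the transaction information last flushed.
    file_start:     starting position of the third archive log file.
    """
    if first_time < second_time:
        # S542: the archive log lags behind the redo log, so replay
        # from the start of the file to avoid losing archive log content.
        return file_start
    # S543: the archive log is at least as recent; resume from the flush point.
    return flush_position


print(update_offset(first_time=100.0, second_time=105.0, flush_position=4096))
print(update_offset(first_time=105.0, second_time=100.0, flush_position=4096))
```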
Fig. 6 schematically shows a flowchart of a data backup method according to another embodiment of the present invention.
According to the embodiment of the invention, when any one of the databases was a slave library before the downtime, it may have gone down after receiving archive log files broadcast by the master library but before playing them back. In this case, since the first archive log file was generated earlier than the latest archive log file received before the downtime, if the second archive log file is acquired according to the first archive log file as defined above, the database will acquire again the latest archive log file that it had already received before the downtime. In order to avoid repeatedly acquiring archive log files in this way, as shown in fig. 6, the data backup method of this embodiment includes operations S650 to S660 in addition to operations S210 to S230. Operation S650 is performed between operation S210 and operation S220.
In operation S650, it is determined whether a fourth archive log file, on which a disk-flushing operation is not performed, is included in any of the databases.
According to an embodiment of the present invention, any of the databases may determine whether there is a fourth archive log file on which a disk-flushing operation has not been performed according to whether a binary log exists in the local relay log file. This is because, after receiving an archive log file broadcast by the master library, the slave library copies the archive log into the local relay log file, and reads the archive log from the relay log when replaying the transaction information in the archive log file. The relay log file is a local file in which the slave library records the binary log of the master library.
In a case where it is determined that the fourth archive log file exists, operation S660 is performed to execute a disk-flushing operation on the fourth archive log file. Operation S220 is performed after the disk-flushing operation has been executed, through operation S660, on all the transaction information in the fourth archive log file. At this time, the first archive log file in operation S220 is the fourth archive log file targeted by operation S660.
Upon determining that the fourth archive log file does not exist, operation S220 is directly performed. At this time, the obtaining request sent in operation S220 is sent according to the first archive log file for the latest disk-flushing operation executed before downtime of any database.
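The branch introduced by operations S650 and S660 can be sketched as a small planning function. The step names, the function name, and the use of file names as strings are assumptions made for illustration.

```python
def plan_recovery(pending_relay_file, last_flushed_file):
    """Return the ordered recovery steps after operation S210.

    pending_relay_file: archive log file found in the relay log that has
                        not been flushed yet, or None if there is none.
    last_flushed_file:  file targeted by the latest flush before downtime.
    """
    steps = []
    if pending_relay_file is not None:
        # S660: flush the fourth archive log file first; it then plays
        # the role of the first archive log file in S220.
        steps.append(("flush", pending_relay_file))
        first_file = pending_relay_file
    else:
        first_file = last_flushed_file
    # S220: request the logs that follow the first archive log file.
    steps.append(("request_after", first_file))
    return steps


# Hypothetical file names used only for demonstration.
print(plan_recovery("binlog.000008", "binlog.000007"))
print(plan_recovery(None, "binlog.000007"))
```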
In summary, the data backup method of the embodiment of the present invention can relax the data persistence constraint applied when a single database commits transactions, and can recover committed transaction data lost in a downtime according to the archive log files provided by other databases. The transaction processing capacity of the MySQL instance can therefore be effectively improved while the risk of losing committed transaction data is eliminated. In this way, a higher-performance relational database system based on distributed failure recovery can be realized.
Exemplary devices
Having introduced the data backup method according to an exemplary embodiment of the present invention, the structure of the data backup apparatus according to an exemplary embodiment of the present invention will be described in detail with reference to fig. 7.
Fig. 7 schematically shows a block diagram of a data backup apparatus according to an embodiment of the present invention.
As shown in fig. 7, the data backup apparatus 700 of this embodiment includes a device determination module 710, a request transmission module 720, and an operation execution module 730. The data backup apparatus 700 may be applied to any one of a plurality of databases included in a distributed system.
The device determination module 710 is configured to determine a log file serving device in response to a reboot of any of the databases. In an embodiment, the device determining module 710 may be configured to perform operation S210 described in fig. 2, for example, and is not described herein again.
The request sending module 720 is configured to send an obtaining request to the log file providing device according to the first archive log file for which the latest disk-flushing operation executed by any one of the databases is targeted, so as to obtain a second archive log file in which transaction information for the data to be backed up is recorded. In an embodiment, the request sending module 720 may be configured to perform operation S220 described in fig. 2, for example, and is not described herein again.
The operation executing module 730 is configured to execute a disk-flushing operation on the second archive log file, so as to store the data to be backed up in a disk of any one of the databases. In an embodiment, the operation executing module 730 may be configured to perform operation S230 described in fig. 2, for example, and is not described herein again.
According to an embodiment of the present invention, the request sending module 720 is configured to send the obtaining request to the log file providing device according to an offset location pre-stored in any one of the databases, where the offset location indicates a location of the transaction information for the latest disk flushing operation in the first archive log file.
According to an embodiment of the present invention, the request sending module 720 includes a log intercepting submodule, a parameter initializing submodule, and a request sending submodule. The log intercepting submodule is used for intercepting (truncating) the first archive log file according to the offset position to obtain a log segment, where the transaction information in the log segment is the transaction information on which the disk-flushing operation has been executed. The parameter initializing submodule is used for initializing a preset parameter according to the log segment, where the preset parameter records the global identifications of the transaction information on which the disk-flushing operation has been executed. The request sending submodule is used for sending an acquisition request to the log file providing device according to the initialized preset parameter.
According to an embodiment of the present invention, the operation executing module 730 is configured to execute the disk-flushing operation for the second archive log file based on the idempotent mode.
According to an embodiment of the present invention, the data stored in the disk of any database is stored in a table format, and the data backup apparatus 700 further includes a log generation module, configured to, in a process that any database backs up data through the operation execution module: in response to completing the update operation to the structure of the table stored in the disk, a new archive log file is generated.
According to an embodiment of the present invention, the data backup apparatus 700 further includes an offset updating module, configured to update an offset during the process of backing up data by any one of the databases through the operation executing module. The offset location update module includes a first acquisition submodule, a second acquisition submodule, and a location update submodule. The first obtaining submodule is used for obtaining the first operation time of the latest disk refreshing operation aiming at the filing log file and the disk refreshing position of the aimed transaction information in the affiliated third filing log file. The second obtaining submodule is used for obtaining a second operation time of the latest disk refreshing operation aiming at the redo log file. And the position updating submodule is used for updating the offset position according to the first operation time, the second operation time and the brush disc position.
According to an embodiment of the present invention, the location updating sub-module is configured to update the offset location to be a starting location of the third filing log file when the first operation time is earlier than the second operation time; and under the condition that the first operation time is not earlier than the second operation time, updating the offset position to be the brushing disk position.
According to an embodiment of the present invention, the data backup apparatus 700 further includes a file determining module, configured to determine whether a fourth archived log file that is not executed with a disk-flushing operation is included in any one of the databases before the request sending module sends the obtaining request to the log file providing device. The operation executing module 730 is further configured to, when the file determining module determines that the fourth archive log file exists, execute a disk-flushing operation on the fourth archive log file, and when the file determining module determines that the fourth archive log file exists, the request sending module 720 sends the acquisition request after all the transaction information in the fourth archive log file is executed with the disk-flushing operation. After determining that the fourth archived log file does not exist, the request sending module 720 sends an acquisition request according to the first archived log file for the latest disk flushing operation executed by any one of the databases before downtime. In an embodiment, the file determining module and the operation executing module 730 may be configured to execute operations S650 to S660 described in fig. 6, for example, and are not described herein again.
According to an embodiment of the present invention, the device determining module 710 includes a master database determining submodule and a supply device determining submodule. The master database determining submodule is configured to determine the master database among the plurality of databases in response to a restart of any one of the databases. The supply device determining submodule is configured to determine the master database as the log file supply device when the master database is a database other than the restarted database, and to determine the log database as the log file supply device when the master database is the restarted database itself. The log database stores all archive log files generated by the distributed system. In an embodiment, the master database determining submodule and the supply device determining submodule may be configured to perform operations S311 to S312 described in fig. 3, respectively, which are not described again here.
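The selection of the log file supply device described above reduces to a single comparison, which may be sketched as follows. The function name and the string identifiers for the databases are hypothetical and serve only to illustrate the rule.

```python
def choose_log_supplier(restarted_db: str, master_db: str, log_db: str) -> str:
    """Pick the log file supply device for a restarted database."""
    if master_db != restarted_db:
        # The master is another database: it supplies the archive log files.
        return master_db
    # The restarted database is itself the master: use the log database,
    # which stores all archive log files generated by the distributed system.
    return log_db
```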
Exemplary Medium
Having described the data backup apparatus according to the exemplary embodiment of the present invention, a program product according to an exemplary embodiment of the present invention, which is adapted to perform data backup, will be described with reference to fig. 8.
Fig. 8 schematically shows a schematic view of a program product adapted to perform a data backup according to an embodiment of the present invention.
In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product including program code. When the program product runs on a computing device, the program code causes the computing device to perform the steps of the data backup method according to the various exemplary embodiments of the present invention described in the "Exemplary method" section above. For example, the computing device may perform the operations shown in fig. 2. Operation S210: in response to a restart of any one of the databases, determine a log file supply device. Operation S220: according to the first archive log file targeted by the most recent disk-flushing operation performed by the database, send an acquisition request to the log file supply device to acquire a second archive log file in which transaction information for the data to be backed up is recorded. Operation S230: perform a disk-flushing operation on the second archive log file, so as to store the data to be backed up in the disk of the database.
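By way of illustration only, operations S220 and S230 may be modeled in memory as follows. Everything in this sketch is an assumption made for readability: the `Entry` tuple shape, the `Node` class, and the `fetch_missing` function are hypothetical, the "disk" is a plain dictionary, and skipping already-applied global transaction identifiers is merely one simple way to make replay idempotent, not necessarily the way used in the embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical record shape: (global transaction identifier, key, value).
Entry = Tuple[int, str, str]


@dataclass
class Node:
    """Toy model of a restarted database: a 'disk' plus applied identifiers."""
    disk: Dict[str, str] = field(default_factory=dict)
    applied: Set[int] = field(default_factory=set)

    def flush(self, entries: List[Entry]) -> None:
        """Apply log entries to disk; entries already applied are skipped."""
        for gtid, key, value in entries:
            if gtid in self.applied:
                continue  # replaying the same entry has no further effect
            self.disk[key] = value
            self.applied.add(gtid)


def fetch_missing(supplier_log: List[Entry], applied: Set[int]) -> List[Entry]:
    """Model of the acquisition request: return entries not yet applied."""
    return [entry for entry in supplier_log if entry[0] not in applied]
```

In this model, a node that has applied only transaction 1 would request the remaining entries from the supply device and flush them; flushing the same entries a second time leaves the disk unchanged.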
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 8, a program product 80 for data backup according to an embodiment of the present invention is depicted, which may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device such as a personal computer. However, the program product of the present invention is not limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary computing device
Having described the methods, media, and apparatus of exemplary embodiments of the present invention, a computing device suitable for performing a data backup method of exemplary embodiments of the present invention is described next with reference to fig. 9.
FIG. 9 schematically illustrates a block diagram of a computing device adapted to perform a data backup in accordance with an embodiment of the present invention.
An embodiment of the present invention also provides a computing device. As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
In some possible embodiments, a computing device according to the present invention may include at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the data backup method according to the various exemplary embodiments of the present invention described in the "Exemplary method" section above. For example, the processing unit may perform the operations shown in fig. 2. Operation S210: in response to a restart of any one of the databases, determine a log file supply device. Operation S220: according to the first archive log file targeted by the most recent disk-flushing operation performed by the database, send an acquisition request to the log file supply device to acquire a second archive log file in which transaction information for the data to be backed up is recorded. Operation S230: perform a disk-flushing operation on the second archive log file, so as to store the data to be backed up in the disk of the database.
A computing device 90 for backing up data according to this embodiment of the invention is described below with reference to fig. 9. The computing device 90 shown in FIG. 9 is only one example and should not be taken to limit the scope of use and functionality of embodiments of the present invention.
As shown in fig. 9, computing device 90 is embodied in the form of a general purpose computing device. Components of computing device 90 may include, but are not limited to: at least one processing unit 901, at least one storage unit 902, and a bus 903 connecting the various system components (including the storage unit 902 and the processing unit 901).
Bus 903 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The storage unit 902 may include readable media in the form of volatile memory, such as a random access memory (RAM) 9021 and/or a cache memory 9022, and may further include a read-only memory (ROM) 9023.
Storage unit 902 may also include a program/utility 9025 having a set (at least one) of program modules 9024, such program modules 9024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 90 may also communicate with one or more external devices 904 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with computing device 90, and/or with any devices (e.g., router, modem, etc.) that enable computing device 90 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 905. Moreover, computing device 90 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 906. As shown, network adapter 906 communicates with the other modules of computing device 90 via bus 903. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 90, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several units/modules or sub-units/modules of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. The division into aspects is for convenience of presentation only and does not mean that features in those aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (12)

1. A data backup method, applied to any one database of a plurality of databases included in a distributed system, the method comprising:
in response to a restart of the any one database, determining a log file supply device;
sending, according to a first archive log file targeted by the most recent disk-flushing operation performed by the any one database, an acquisition request to the log file supply device, so as to acquire a second archive log file in which transaction information for data to be backed up is recorded; and
performing a disk-flushing operation on the second archive log file, so as to store the data to be backed up in a disk of the any one database,
wherein the archive log and committed data in the any one database are stored in the disk according to an update rule other than a real-time update rule.
2. The method of claim 1, wherein sending the acquisition request to the log file supply device comprises:
sending the acquisition request to the log file supply device according to an offset position pre-stored in the any one database,
wherein the offset position indicates the position, in the first archive log file, of the transaction information targeted by the most recent disk-flushing operation.
3. The method of claim 2, wherein sending the acquisition request to the log file supply device according to the offset position pre-stored in the any one database comprises:
truncating the first archive log file at the offset position to obtain a log segment, wherein the transaction information in the log segment is the transaction information on which a disk-flushing operation has already been performed;
initializing a preset parameter according to the log segment, wherein the preset parameter records global identifiers of the transaction information on which a disk-flushing operation has already been performed; and
sending the acquisition request to the log file supply device according to the initialized preset parameter.
4. The method of any one of claims 1-3, wherein performing the disk-flushing operation on the second archive log file comprises:
performing the disk-flushing operation on the second archive log file in an idempotent mode.
5. The method of claim 4, wherein the data stored in the disk of the any one database is stored in the form of tables, and the method further comprises, while the any one database backs up data:
generating a new archive log file in response to completion of an update operation on the structure of a table stored in the disk.
6. The method of claim 2, further comprising updating the offset position while the any one database backs up data, wherein updating the offset position comprises:
acquiring a first operation time of the most recent disk-flushing operation performed on an archive log file, and a disk-flushing position, within a third archive log file to which it belongs, of the transaction information targeted by that operation;
acquiring a second operation time of the most recent disk-flushing operation performed on a redo log file; and
updating the offset position according to the first operation time, the second operation time, and the disk-flushing position.
7. The method of claim 6, wherein updating the offset position according to the first operation time, the second operation time, and the disk-flushing position comprises:
updating the offset position to the starting position of the third archive log file if the first operation time is earlier than the second operation time; and
updating the offset position to the disk-flushing position if the first operation time is not earlier than the second operation time.
8. The method of claim 1, further comprising, before sending the acquisition request to the log file supply device:
determining whether the any one database contains a fourth archive log file on which no disk-flushing operation has been performed; and
performing a disk-flushing operation on the fourth archive log file if it is determined that the fourth archive log file exists,
wherein, if it is determined that the fourth archive log file exists, the acquisition request is sent after the disk-flushing operation has been performed on the transaction information in the fourth archive log file; and if it is determined that the fourth archive log file does not exist, the acquisition request is sent according to the first archive log file targeted by the most recent disk-flushing operation performed by the any one database before downtime.
9. The method of claim 1, wherein determining the log file supply device in response to the restart of the any one database comprises:
in response to the restart of the any one database, determining a master database among the plurality of databases;
determining the master database as the log file supply device if the master database is a database other than the any one database; and
determining a log database as the log file supply device if the master database is the any one database,
wherein the log database stores all archive log files generated by the distributed system.
10. A data backup apparatus, applied to any one database of a plurality of databases included in a distributed system, the apparatus comprising:
a device determining module, configured to determine a log file supply device in response to a restart of the any one database;
a request sending module, configured to send, according to a first archive log file targeted by the most recent disk-flushing operation performed by the any one database, an acquisition request to the log file supply device, so as to acquire a second archive log file in which transaction information for data to be backed up is recorded; and
an operation executing module, configured to perform a disk-flushing operation on the second archive log file, so as to store the data to be backed up in a disk of the any one database.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, implement a method according to any one of claims 1 to 9.
12. A computing device, comprising:
one or more memories storing executable instructions; and
one or more processors executing the executable instructions to implement the method of any one of claims 1-9.
CN202011199633.1A 2020-10-30 2020-10-30 Data backup method, device, medium and computing equipment Active CN112256485B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011199633.1A CN112256485B (en) 2020-10-30 2020-10-30 Data backup method, device, medium and computing equipment

Publications (2)

Publication Number Publication Date
CN112256485A true CN112256485A (en) 2021-01-22
CN112256485B CN112256485B (en) 2023-08-04

Family

ID=74267238

Country Status (1)

Country Link
CN (1) CN112256485B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7260590B1 (en) * 2000-12-06 2007-08-21 Cisco Technology, Inc. Streamed database archival process with background synchronization
CN101460930A (en) * 2005-05-06 2009-06-17 微软公司 Maintenance of link level consistency between database and file system
CN105955843A (en) * 2016-04-21 2016-09-21 久盈世纪(北京)科技有限公司 Method and device used for database recovery
US20160321142A1 (en) * 2015-04-28 2016-11-03 International Business Machines Corporation Database recovery and index rebuilds
CN106407356A (en) * 2016-09-07 2017-02-15 网易(杭州)网络有限公司 Data backup method and device
CN109542682A (en) * 2018-11-16 2019-03-29 上海达梦数据库有限公司 A kind of data back up method, device, equipment and storage medium
CN110209735A (en) * 2019-05-05 2019-09-06 深圳市腾讯计算机系统有限公司 Database backup method, database backup device, computing equipment and storage medium
CN110249321A (en) * 2017-09-29 2019-09-17 甲骨文国际公司 For the system and method that capture change data use from distributed data source for heterogeneous target
CN110704242A (en) * 2019-09-24 2020-01-17 上海爱数信息技术股份有限公司 Continuous data protection system and method based on Oracle log capture
CN111177161A (en) * 2019-11-07 2020-05-19 腾讯科技(深圳)有限公司 Data processing method and device, computing equipment and storage medium
CN111611107A (en) * 2020-05-21 2020-09-01 云和恩墨(北京)信息技术有限公司 Method and device for acquiring database logs
CN111694806A (en) * 2020-06-05 2020-09-22 上海达梦数据库有限公司 Transaction log caching method, device, equipment and storage medium
CN111813607A (en) * 2020-09-08 2020-10-23 北京优炫软件股份有限公司 Database cluster recovery log processing system based on memory fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
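Wang Chao (王超); Zhu Yongzhi (祝永志): "Redo log recovery method for Oracle databases in non-archive mode" (Oracle数据库非归档模式重做日志恢复方法), Microcomputer & Its Applications (微型机与应用), no. 10, pages 83-85 *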

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112791415A (en) * 2021-01-26 2021-05-14 广州心源互动科技有限公司 Game data storage method and device
CN114647624A (en) * 2022-05-11 2022-06-21 成都云祺科技有限公司 Method, system and storage medium for capturing database consistent point in block-level CDP
CN114647624B (en) * 2022-05-11 2022-08-02 成都云祺科技有限公司 Method, system and storage medium for capturing database consistent point in block-level CDP
CN115202588A (en) * 2022-09-14 2022-10-18 云和恩墨(北京)信息技术有限公司 Data storage method and device and data recovery method and device
CN115202588B (en) * 2022-09-14 2022-12-27 本原数据(北京)信息技术有限公司 Data storage method and device and data recovery method and device

Also Published As

Publication number Publication date
CN112256485B (en) 2023-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant