CN116756116A - Distributed database mode changing method based on delay migration - Google Patents

Distributed database mode changing method based on delay migration Download PDF

Info

Publication number
CN116756116A
CN116756116A CN202310624507.3A CN202310624507A CN116756116A CN 116756116 A CN116756116 A CN 116756116A CN 202310624507 A CN202310624507 A CN 202310624507A CN 116756116 A CN116756116 A CN 116756116A
Authority
CN
China
Prior art keywords
data
new
old
distributed database
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310624507.3A
Other languages
Chinese (zh)
Inventor
李辉
曾智林
孔涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310624507.3A priority Critical patent/CN116756116A/en
Publication of CN116756116A publication Critical patent/CN116756116A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2379Updates performed during online database operations; commit processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for changing a mode of a distributed database based on delay, which mainly solves the problem that the current online mode of the distributed database is changed only by allowing a new mode to provide service after all data of a source table are synchronized. The implementation scheme is as follows: modifying SQL engines of the distributed database through two stages, namely, mainly processing a mode change request in a first stage and initializing a new mode; the second stage mainly processes the request of the user on the new mode, in this stage, the data related to the request of the user is firstly migrated from the old mode to the new mode, and then the new mode is used for processing the request of the user, so as to provide the related service. The system can immediately use the new mode only by simple initialization after receiving the mode change request, avoids long-time waiting for data migration, can perform special optimization processing aiming at the mode change type, improves the migration efficiency, effectively reduces the time delay of user inquiry, and can be used for online upgrading and updating of application programs.

Description

Distributed database mode changing method based on delay migration
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a distributed database mode changing method which can be used for online upgrading and updating of application programs.
Background
With the increasing demands for high availability, high expansibility and high reliability of data, more and more applications select a NewSQL type of distributed database as a back-end database instance. In the update iteration of these applications, the requirements of the front-end application are constantly changing, and as a formalized language describing the database structure, i.e. the database schema, is also constantly changing. Database schema changes, also known as schema migration, are typically implemented by means of the data definition language DDL. To avoid the impact of lock table, offline schema changes on traffic, conventional mainstream relational database management systems RDBMS use online schema change techniques or external migration tools to conduct schema change operations.
The online mode change techniques proposed in European patent EP3213229A1 and EP2418591A1, and the open source tools pt-osc and gh-osc provided by gitsub, are all capable of online mode change, and their general flow is similar: firstly, a system receives DDL statement with changed modes, a new mode and a table or materialized view corresponding to the new mode are registered after query processing, the table is also called a shadow table, and the new mode does not provide service at the moment; the system then copies the source table in the background, i.e., migrates the data on the old schema to the shadow table. Since the source table can still provide service when not offline, in order to ensure the consistency of the data, when the system is processing the request for changing the source table data, the modified data needs to be synchronized to the shadow table, and the synchronization process is realized by means of a trigger or log propagation mode bound on the source table. After all data copying and migration are completed, the system locks the source table and renames the shadow table to complete the switching of the new mode and the old mode. However, the techniques or tools described above only allow the new schema to be serviced after all the data in the source tables has been synchronized, and if the amount of data in the tables for the old schema is large, the schema change takes more time. This requires waiting for the code update of the front-end application instance to be deployed after the back-end database schema change is complete, which is detrimental to online upgrades or updates of the application.
NewSQL is a modern relational distributed database that integrates the features of non-relational databases NoSQL and RDBMS, and aims to provide the same expansion performance under NoSQL systems for online transaction OLTP read-write loads, while providing atomicity, consistency, isolation, and persistence guarantees for transactions. The concept of ***, spandex, is a representative and precursor in NewSQL, inspiring a distributed NewSQL database such as CockroachDB, tiDB, kaiwuDB. They are characterized by: 1. the database is partitioned into disjoint data sets, i.e., partitions or slices, to support horizontal expansion using a distributed, shared-nothing, storage-compute-split, fully new architecture. 2. Based on the distributed consistency and consensus algorithm, the log is synchronously redone among the multiple copies, thereby ensuring that the data is consistent among the multiple copies and realizing truly high availability and high reliability. 3. The multi-version concurrency control MVCC protocol or the combination scheme of the two-phase lock 2PL protocol and MVCC is mostly used for concurrency control. However, the online schema modification technology on the distributed NewSQL database still keeps track of the concept of the traditional mainstream RDBMS scheme, and meanwhile, mostly references the Google F1 paper to introduce transition states to ensure availability and consistency of cluster data, so that the shortcomings of the online schema modification technology are also consistent with the traditional mainstream RDBMS scheme.
In order to overcome the drawbacks of the prior art, researchers propose a technical scheme of combining mode change and delay migration to perform online mode change, so as to avoid waiting for data migration in mode change, and enable new modes to be immediately used after registration.
The article Non-blocking Lazy Schema Changes in Multi-Version Database Management Systems proposes a multi-stage delay mode migration method for mode change operation of an added column, and when the mode is changed, a new logical mode is added to the system, so that no data migration is needed. The data of the same table may be stored in a plurality of different schema physical tables, and the system processes queries on the new schema by interpreting the data in accordance with the latest schema, and moves the data from the old schema to the new schema only if necessary. However, the method is only used for changing the mode of adding columns, has limited application range, and the characteristics of maintaining a plurality of physical tables also increase the complexity and the cost of the database system, and meanwhile, the method is only discussed in a Terrier of a single database system and cannot be adapted to a NEWSQL type distributed database system.
The paper BullFrog Online Schema Evolution via Lazy Evaluation uses a similar idea to change modes by delaying migration data, except that the system only needs to maintain two mode versions, and services can be provided immediately after new mode registration. When the system receives the inquiry on the new mode, the related data related to the old mode is migrated to the new mode, and then the inquiry is processed on the new mode. This approach, although supporting a wider variety of mode change operations, solves the problem of backward compatibility of partial mode changes. However, the method can only be developed on a single database system PostgreSQL, and no discussion and research are performed on a newSQL type distributed database, but the work on mode change on the distributed database system is focused on parallel acceleration and multi-version data exception handling of DDL, so that the research on how to utilize a delayed migration strategy to change the mode of the distributed database has important significance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a distributed database mode changing method based on delay migration, so that the online mode is changed on a newSQL distributed database, long waiting time for data migration is avoided, and a new mode can be rapidly provided with service.
The technical scheme of the invention is mainly that the SQL engine of the distributed NewSQL database is modified through two stages, namely, the first stage mainly processes the mode change request to perform some initialization operations; the second stage mainly processes the user request on the new mode, in the process, the data related to the user request is firstly migrated from the old mode to the new mode, and then the new mode is used for processing the user request. The implementation steps comprise the following steps:
(1) A user firstly submits a mode change request and describes the structure of a table in a new mode and related data;
(2) After the distributed database system receives the request, creating an empty new table for the new mode according to the mode change type;
(3) Creating a predicate filter on the old table, associating a background migration thread for the new table, completing initialization, and starting to send an add-delete-check SQL request by a user through the created new table;
(4) The distributed database system receives the SQL request on the new table and judges the type of the SQL request:
if the request is an INSERT statement, then execute (5),
if the request is a SELECT/UPDATE/DELETE statement, then execution (6),
the rest requests are not processed, and error information is returned;
(5) The distributed database system directly executes INSERT logic on the new table and returns the successful execution result to the user;
(6) The distributed database system uses predicate filters to locate data on the old table required by the request statement;
(7) The distributed database system examines the positioning data and determines if the data has been migrated to the new table in its entirety:
if all migration is performed, executing the step (8);
otherwise, executing (9);
(8) The distributed database system executes the SELECT/UPDATE/DELETE logic on the new table and returns the successful execution result to the user;
(9) And (7) migrating the data which is not migrated yet, and returning to the step (7).
Compared with the prior art, the invention has the following advantages:
first, the present invention delays and splits the whole migration process by migrating related data as needed before processing a request of a new mode, so that the system can immediately use the new mode by simply initializing after receiving a mode change request, long waiting time for data migration is avoided, and support can be provided for upgrading or updating an application program which needs to be directly deployed into a front-end code instance and a back-end database instance.
Secondly, the invention creates a special empty table consistent with the data distribution of the old table for the mode change of the old table main key contained in the new table by carrying out special optimization processing aiming at the type of the mode change, thereby reducing the number of nodes for executing distributed plan routing and distribution during data migration, improving migration efficiency and effectively reducing the time delay of user inquiry.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a general flow chart of an implementation of the present invention;
FIG. 2 is a first stage sub-flowchart of the present invention;
FIG. 3 is a schematic diagram of a mode change request in the first stage of embodiment 1 of the present invention;
FIG. 4 is a diagram of the data distribution of an old user table used in the present invention;
FIG. 5 is a second stage sub-flowchart of the present invention;
FIG. 6 is a diagram illustrating data migration on a special empty table in the second stage of embodiment 1 of the present invention;
FIG. 7 is a diagram illustrating a mode change request in the first stage of embodiment 2 of the present invention;
fig. 8 is a schematic diagram of data migration on a normal empty table in the second stage of embodiment 2 of the present invention.
FIG. 9 is a cluster diagram of a NewSQL database used in the present invention;
Detailed Description
Embodiments of the invention are described in further detail below with reference to the attached drawing figures:
the invention provides a distributed database mode changing method based on delay migration, which mainly modifies an SQL engine of a distributed NewSQL database. It is divided into two phases: the first stage is to process a mode change request, namely, perform some initialization operations without involving specific data migration, and obtain a new mode; the second stage is to process the user request in the new mode, i.e. the data related to the user request is migrated from the old mode to the new mode, and then the new mode is used to process the user request.
The present invention presents two embodiments that use a Key-Value (KV) storage engine-based, shared-architecture-free distributed NewSQL database, but are not limited to such databases.
Embodiment 1, a distributed database schema change is performed based on a schema change request created for a new table structure containing an old table primary key.
Referring to fig. 1, the implementation steps of this example are as follows:
step 1, a mode change request is processed in a first stage.
Referring to fig. 2, the implementation steps of this stage are as follows:
1.1 A user first submits a mode change request, namely a DDL statement, to the distributed database system, which describes the structure of the table in the new mode and the related data;
in this example, the old user table is changed to a new user_rights table, the primary key of the new user_rights table is consistent with the old user table, and the new user_rights table is an id column, and the DDL statement corresponding to the mode change is CREATE TABLE user _ rights AS SELECT id, and the rights FROM user is shown in fig. 3.
1.2 After the distributed database system receives the request, creating an empty new table for the new schema for the type of schema modification:
if the new table structure to be created does not contain the primary key of the old table, then a common empty table is created according to the default policy of the current distributed database system, as in embodiment 2;
if the new table structure to be created contains the primary key of the old table, on the basis of creating the common empty table, creating a special empty table consistent with the data distribution of the old table according to the metadata information of the old table, namely executing 1.3);
in this example consider a schema change from an old user table to a new user_rights table, the old table user used is a data distribution over a three-node database cluster, as shown in FIG. 4. The table contains 3 fragments range1, range2 and range3, each fragment having three copies, wherein the gray area is the copy holding the lease, and the copies are strongly synchronized by means of the distributed consensus protocol Raft. For convenience, the data in the cluster only identifies the Value of the primary Key id column, which represents the Key-Value pair already listed in the right half of the figure, the Key being encoded by the table number, primary Key and timestamp ts, the Value being encoded by the non-primary Key data, the Key form of the old user table data being/51/id/ts, the Value form being/name/age/rights assuming the old user table number is 51. In this example, because the new user_rights table contains the primary key column id in the old user table, a special empty table needs to be created.
1.3 The implementation steps of creating the special empty table are as follows:
1.3.1 Searching metadata information of the old table by the distributed database system, obtaining the table number of the old table and the column and data type of the main key of the old table, mapping the table number and the column and data type to the corresponding column and data type in the new table, and recording the table number and the column and the data type as mapping columns;
1.3.2 Key information of the new table is constructed: the Key of the new table data is formed by splicing two parts of a prefix and a main body, wherein the prefix is an old table number plus a mapping column, and the main body is a Key formed by the new table number and a main Key column of the main body, namely the Key of the old table is used as the prefix when the data of the new table is stored;
1.3.3 Writing the constructed Key information into metadata information of a new table, and taking the metadata information as a basis for explaining data when accessing a bottom Key Value Key-Value storage engine;
1.3.4 Additional flag bits are added in the new table metadata information to ensure that Key prefix adding and deleting operations can be performed when the distributed database system performs subsequent read-write operations on the new table.
In this example, since the old user table number is 51, the mapping column is id column, the new user_rights table number is 52, and the new table primary Key is also id column, the Key form of the new user_rights table data is/51/id/52/id/ts, the prefix and the body of the Key are divided by symbol, and the Value form is/rights.
1.4 Creating a predicate filter on the old table based on whether the distributed database system supports view resolution techniques:
if so, the distributed database system establishes a view consistent with the new table structure on the old table according to the structure of the new table in the mode change request, and the view serves as a predicate filter;
if not, the open source tool LessQL of query and rewrite on the existing github is used, and the link address ishttps://github.com/morris/lessqlLessQL middleware is introduced as a predicate filter.
In this example, because the distributed database system supports view resolution techniques, a view consistent with the structure of the new user_rights table is created on the old user table as a predicate filter, and the DDL statement for creating the view is CREATE VIEW user_rights_ view AS SELECT id, rights FROM users;
1.5 Associated with a background migration thread for the old table):
1.5.1 The distributed database system obtains the data total amount of the old table, divides the data total amount into tiny data blocks according to a data main key or a line id, and generates a plurality of range SELECT sentences on the basis to form a new table inquiry script which is enough to traverse the data of the old table;
1.5.2 A new thread is created and a timer is set to run the script periodically to simulate a user's query request for a new table to ensure that all old table data will eventually be migrated.
In this example, the old user table has five pieces of data, and three scope SELECT sentences can be generated according to the main key id <10, 11< id <20, id >20 to form a new table query script.
And 2, deploying and using a new mode.
The user migrates the relevant application traffic on the old mode to the new mode, using the new mode to serve it.
Step 3, the second stage processes the user's request on the new mode.
Referring to fig. 5, the implementation steps of this stage are as follows:
3.1 The user uses the application service to send the request on the new mode, the distributed database system receives the SQL request on the new table, and judges the type:
if the request is an INSERT statement, then 3.2) is performed),
if the request is a SELECT/UPDATE/DELETE statement, then 3.3) is executed,
the rest requests return error information;
in this example, the gateway node3 of the distributed database system receives a SELECT statement request on a new mode: SELECT id, lights FROM user_lights while id <10;
3.2 The distributed database system directly executes INSERT logic on the new table and returns the successful execution result to the user;
3.3 A distributed database system locates data on the old table required for the request statement using a predicate filter:
3.3.1 Judging whether the distributed database system supports view resolution technology:
if so, the distributed database system firstly rewrites the SQL request of the user for the new table into a request for the upper view of the old table, then inputs the request into the query optimizer to obtain an execution plan, obtains a filtering predicate or expression on the old table according to the execution plan, and constructs a SELECT statement on the old table by using the filtering predicate or expression;
if not, the distributed database system uses LessQL middleware to convert the SQL request of the user for the new table into the request for the old table, then inputs the request to a query optimizer to obtain an execution plan, obtains a filtering predicate or expression on the old table according to the execution plan, and constructs a SELECT statement on the old table by using the filtering predicate or expression;
in this example, the distributed database system will SELECT the SQL request of the user for the new table with the right FROM user_rights WHERE id <10; the rewrite is a request SELECT id for the view on the old table, lights from user_lights_view WHERE id <10; inputting the result to a query optimizer to obtain an execution plan, obtaining a filtering predicate id <10 on an old user table according to the execution plan, and constructing a SELECT statement on the old user table as a SELECT id rights from user WHERE id <10 by using the filtering predicate;
3.3.2 3.3.1) taking the old table SELECT statement constructed in 3.3.1) as the sub-query of the INSERT statement used in the process of migrating data, executing the sub-query part of the INSERT statement by the distributed database system, and acquiring the executed result from the storage engine, namely the data on the old table required by the new table SQL request statement.
In this example, SELECT statement SELECT id of the constructed old user table, rights from user WHERE id <10 is taken as sub-query of INSERT statement used in migrating data: INSERT INTO user_rights (SELECT id, rights from user WHERE id < 10); the distributed database system executes the sub-query part of the INSERT statement, so that the related data in the old user table is judged to be on the split Range1, and among the three copies of the split Range1, the copy on the node1 holds leases, namely the leaser node is the node1, and therefore, the system routes to the node1 to acquire the related data, as shown by solid arrows in fig. 6.
3.4 The distributed database system examines each piece of positioning data to determine if the data has migrated to a new table:
if the time stamp is a preset migrated flag bit, the data is migrated;
if the time stamp is a preset migration flag bit, the data is migrated;
the remaining time stamps indicate that the piece of data has not been migrated;
3.5 Judging whether all positioning data have been migrated:
if each piece of positioning data has been migrated, then performing 3.6);
otherwise, execute 3.7);
in this example, the preset migrating flag bit is ts_migrating1, the preset migrated flag bit is ts_migrated1, and the distributed database system checks two pieces of data with id 1 and id 2, and the timestamp is ts, which indicates that both pieces of data have not been migrated yet.
3.6 The distributed database system executing SELECT/UPDATE/DELETE logic on the new table and returning the result of successful execution to the user;
3.7 Migration of data that has not been migrated and then return to 3.4), as follows:
3.7.1 A distributed database system executing the parent query portion of the INSERT statement used by the migration data constructed in 3.3.2):
for data being migrated and data that has been migrated, the system skips migration;
performing 3.7.2) on data that has not been migrated;
3.7.2 System migration, inserting data in the old table into the new table:
when migration starts, the system modifies the time stamp of the data in the old table into a preset migrated flag bit;
when the migration is completed, the system modifies the timestamp of the data in the old table to a preset migrating flag bit and returns to 3.4).
In this example, the INSERT statement used in migrating data is INSERT INTERT user_rights (SELECT id, rights from user WHERE id < 10); the distributed database system executes the father inquiry part of the statement to perform data migration, because a special empty table is created in 1.3), and the Key in the data of the old user table is used as the prefix of the data Key of the new user_rights table, the data distribution of the new table and the old table is consistent, and the data with similar keys are on the same piece. The distributed database system modifies the time stamps of the data in the two old user tables with the ids of 1 and 2 into a preset migrating flag bit ts_migrating1, and according to the coding form of the Key of the new user_rights table, the two data with the ids of 1 and 2 are also positioned on the slice range1, so that the distributed database system inserts the data on a leaser node1 of range 1; finally, the time stamps of the data in the two old user tables with id of 1 and 2 are modified into a preset migrated flag bit ts_migrated1, as shown by a dotted arrow in fig. 6, and finally, the step 3.4 is returned.
Embodiment 2, a distributed database schema change is performed based on a schema change request that the created new table structure does not contain the old table primary key.
The implementation flow of this example is the same as that of embodiment 1, and the difference is that the user submits a request for mode change to the distributed database system, that is, in this example, the old user table is changed to the new user_new table, the main key of the new user_new table is the name column, and the DDL statement corresponding to the mode change is CREATE TABLE user _ new AS SELECT name, and the message FROM user is shown in fig. 7.
For this request, the main different steps of this example and embodiment 1 are described as follows:
in step 1.2), in this example, since the new user_new table does not contain the primary key in the old user table, a normal empty table needs to be created.
The data distribution of the old table user used in this example over a three-node database cluster is the same as in example 1, as shown in fig. 4. The node of the distributed database system receiving the mode change request is gateway node3, a common empty table is created on node3 according to the default strategy of the current distributed database system, the table number is 52, the Key form of the new user_new table data is/52/name/ts, and the Value form is/age.
In step 3.1), the gateway node3 of the distributed database system in this example receives a SELECT statement request on the new schema: SELECT name, age FROM user_new WHERE age <20;
in steps 3.3) to 3.7), in this example, the system locates the relevant data in the old table on the fragment Range1 through the predicate filter, and at this time, the node of the renter of Range1 is node1, so that the relevant data is routed to node1, and after judging that the relevant data has not migrated, the data is migrated to node3 where the new user_new table is located, as shown in fig. 8.
The common empty table created in this example increases the number of nodes performing distributed planned routing and distribution at the time of data migration, compared to the special empty table in embodiment 1.
Interpretation of the terms
Slicing: the method is characterized in that the newSQL database divides the globally increasing ordered Key-Value data into blocks with a certain unit size in a range partition mode. The system is a basic unit of routing, storage and copying, and has the capability of automatic splitting and merging.
And (5) copy: refers to multiple copies of a slice. The NewSQL database would replicate each shard and store each shard copy onto a different node to provide high availability.
Lease: refers to authorizers granting leases and commitments to the distributed environment for a period of time. The node where the copy of the lease in the newSQL database is located can provide the consistency read-write of the fragment KV.
Distributed consensus protocol: refers to a protocol used by the NewSQL database to ensure data consistency between nodes. Taking the Raft distributed consistency protocol in paper In search of an understandable consensus algorithm as an example, all copies of a slice form a Raft group, and strong consistency synchronization is performed by means of the Raft protocol, so that high availability and consistency reading and writing are ensured. One copy is a long-stored leader, coordinates all read-write operations to the Raft group, and the other copy is a follower.
Leaseholder node: the method refers to a node where a copy of the lease is located in the NewSQL database, and the leaser node can provide consistent reading and writing of the fragment KV. A copy of the lease is held, typically also a leader in the Raft group.
Gateway node: refers to the node responsible for resolving SQL requests, acting as coordinator for transactions, and routing KV operations to the correct renter node, the gateway node being the concept opposite to the renter node.
A three-node NewSQL database cluster has a structure shown in FIG. 9, and three fragments r1, r2 and r3 are total. Each slice contains three copies, the consistency of data is ensured by using a Raft protocol between the copies, and consistency reading and writing is provided by a renter node.

Claims (8)

1. A method for modifying a distributed database schema based on delayed migration, comprising the steps of:
(1) A user firstly submits a mode change request and describes the structure of a table in a new mode and related data;
(2) After the distributed database system receives the request, creating an empty new table for the new mode according to the mode change type;
(3) Creating a predicate filter on the old table, associating a background migration thread for the new table, completing initialization, and starting to send an add-delete-check SQL request by a user through the created new table;
(4) The distributed database system receives the SQL request on the new table and judges the type of the SQL request:
if the request is an INSERT statement, then execute (5),
if the request is a SELECT/UPDATE/DELETE statement, then execution (6),
the rest requests are not processed, and error information is returned;
(5) The distributed database system directly executes INSERT logic on the new table and returns the successful execution result to the user;
(6) The distributed database system uses predicate filters to locate data on the old table required by the request statement;
(7) The distributed database system examines each piece of location data and determines whether the data has been migrated to the new table in its entirety:
if all migration is performed, executing the step (8);
otherwise, executing (9);
(8) The distributed database system executes the SELECT/UPDATE/DELETE logic on the new table and returns the successful execution result to the user;
(9) And (7) migrating the data which is not migrated yet, and returning to the step (7).
2. The method according to claim 1, characterized in that: creating an empty new table according to whether the new table structure to be created by the mode change contains the primary key of the old table or not:
if not, creating a common empty table according to the default strategy of the current distributed database system;
if the data is included, on the basis of creating the common empty table, a special empty table consistent with the data distribution of the old table is created according to the metadata information of the old table.
3. The method according to claim 2, characterized in that: the method is characterized in that a special empty table consistent with the data distribution of the old table is created according to the metadata information of the old table, and the method is realized as follows:
firstly, searching metadata information of an old table by a distributed database system, acquiring the number of the old table and the column and data type of a main key of the old table, mapping the old table to the corresponding column and data type in a new table, and recording the new table as a mapping column;
next, key information of the new table is constructed: the Key of the new table data is formed by splicing two parts of a prefix and a main body, wherein the prefix is an old table number plus a mapping column, and the main body is a Key formed by the new table number and a main Key column of the main body, namely the Key of the old table is used as the prefix when the data of the new table is stored;
then, writing the constructed Key information into metadata information of a new table, and taking the metadata information as a basis for explaining data when accessing a bottom Key Value Key-Value storage engine;
and finally, adding an extra flag bit into the new table metadata information to ensure that Key prefix adding and deleting operations can be performed when the distributed database system performs subsequent read-write operations on the new table.
4. The method according to claim 1, characterized in that: the creating of a predicate filter on the old table in (3) is based on whether the distributed database system supports view resolution technology:
if so, the distributed database system establishes a view consistent with the new table structure on the old table according to the structure of the new table in the mode change request, and the view serves as a predicate filter;
if not, the LessQL middleware is introduced as a predicate filter by means of an open source tool LessQL of query rewrite on the existing gitsub.
5. The method according to claim 1, characterized in that: and (3) associating a background migration thread for the old table, wherein the implementation is as follows:
firstly, acquiring the total data amount of an old table by a distributed database system, dividing the data amount into small data blocks according to a data main key or a row id, and generating a plurality of range SELECT sentences on the basis to form a new table query script which is enough to traverse the data of the old table;
then, a new thread is created and a timer is set, and the script is run periodically to simulate a user's query request for a new table to ensure that all old table data will eventually be migrated.
6. The method according to claim 1, characterized in that: the distributed database system in the step (6) uses a predicate filter to locate the data on the old table required by the request statement, and the implementation is as follows:
(6a) Judging whether the distributed database system supports view resolution technology or not:
if so, the distributed database system firstly rewrites the SQL request of the user for the new table into a request for the upper view of the old table, then inputs the request into the query optimizer to obtain an execution plan, obtains a filtering predicate or expression on the old table according to the execution plan, and constructs a SELECT statement on the old table by using the filtering predicate or expression;
if not, the distributed database system uses LessQL middleware to convert the SQL request of the user for the new table into the request for the old table, then inputs the request to a query optimizer to obtain an execution plan, obtains a filtering predicate or expression on the old table according to the execution plan, and constructs a SELECT statement on the old table by using the filtering predicate or expression;
(6b) And (3) taking the old table SELECT statement constructed in the step (6 a) as a sub-query of an INSERT statement used for migrating data, executing the sub-query part of the INSERT statement by the distributed database system, and acquiring an executed result, namely data on the old table required by the new table SQL request statement, from a storage engine.
7. The method according to claim 1, characterized in that: the distributed database system in (7) checks each piece of positioning data by checking the time stamp of the positioning data to determine whether it has been all migrated to the new table:
if the time stamp is a preset migrated flag bit, the data is migrated;
if the time stamp is a preset migration flag bit, the data is migrated;
the remaining time stamps indicate that the piece of data has not been migrated.
8. The method according to claim 1, characterized in that: and (3) migrating the data which is not migrated yet, wherein the method is realized as follows:
9a) The distributed database system executes the parent query portion of the INSERT statement used by the migration data constructed in 6 (b):
for data being migrated and data that has been migrated, the system skips the migration,
performing 9 b) on data that has not been migrated;
9b) The system migrates and inserts the data in the old table into the new table:
when migration starts, the system modifies the time stamp of the data in the old table into a preset migrating flag bit;
when the migration is completed, the system modifies the time stamp of the data in the old table into a preset migrated flag bit.
CN202310624507.3A 2023-05-30 2023-05-30 Distributed database mode changing method based on delay migration Pending CN116756116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310624507.3A CN116756116A (en) 2023-05-30 2023-05-30 Distributed database mode changing method based on delay migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310624507.3A CN116756116A (en) 2023-05-30 2023-05-30 Distributed database mode changing method based on delay migration

Publications (1)

Publication Number Publication Date
CN116756116A true CN116756116A (en) 2023-09-15

Family

ID=87956350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310624507.3A Pending CN116756116A (en) 2023-05-30 2023-05-30 Distributed database mode changing method based on delay migration

Country Status (1)

Country Link
CN (1) CN116756116A (en)

Similar Documents

Publication Publication Date Title
JP2583010B2 (en) Method of maintaining consistency between local index table and global index table in multi-tier index structure
US7577658B2 (en) Hierarchical locking in B-tree indexes
KR101069350B1 (en) Architecture for partition computation and propagation of changes in data replication
US8768977B2 (en) Data management using writeable snapshots in multi-versioned distributed B-trees
US7836037B2 (en) Selection of rows and values from indexes with updates
US9922075B2 (en) Scalable distributed transaction processing system
AU2006284498B2 (en) Database fragment cloning and management
US9501550B2 (en) OLAP query processing method oriented to database and HADOOP hybrid platform
US5625815A (en) Relational database system and method with high data availability during table data restructuring
US8504523B2 (en) Database management system
US6161109A (en) Accumulating changes in a database management system by copying the data object to the image copy if the data object identifier of the data object is greater than the image identifier of the image copy
Rae et al. Online, asynchronous schema change in F1
US9922086B1 (en) Consistent query of local indexes
Graefe et al. Transactional support for adaptive indexing
US7080072B1 (en) Row hash match scan in a partitioned database system
US11714794B2 (en) Method and apparatus for reading data maintained in a tree data structure
US6944633B1 (en) Performing a join in a partitioned database system
CN111460000B (en) Backtracking data query method and system based on relational database
Agrawal et al. Survey on Mongodb: an open-source document database
Kvet et al. Master index access as a data tuple and block locator
CN116756116A (en) Distributed database mode changing method based on delay migration
CN115687343A (en) Database system and database column changing method
US8706769B1 (en) Processing insert with normalize statements
Sabharwal et al. Cloud spanner explained
Querzoni THE CAPTHEOREM

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination