CN117591523A - Data processing method and device based on shared storage architecture and computing equipment - Google Patents

Data processing method and device based on shared storage architecture and computing equipment

Info

Publication number
CN117591523A
CN117591523A
Authority
CN
China
Prior art keywords
data
data page
page
log information
node
Prior art date
Legal status
Pending
Application number
CN202311366597.7A
Other languages
Chinese (zh)
Inventor
邢颖
温正湖
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202311366597.7A
Publication of CN117591523A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 - Updating
    • G06F 16/2365 - Ensuring data consistency and integrity
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2455 - Query execution
    • G06F 16/24552 - Database cache management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the disclosure provide a data processing method, a data processing apparatus and a computing device based on a shared storage architecture. The method is applied to a slave node and comprises: receiving first log information sent by a master node, wherein the first log information is obtained by the master node segmenting second log information; the second log information records first data and a first data page identifier, the first data being the data changed when the master node updates an original data page into a first data page; the first log information records the first data page identifier and first position information of the first data; if it is determined according to the first log information that a second data page exists in a first cache pool, adding a first identifier to the second data page; and, in response to a data query instruction, determining target data according to the first cache pool. In this way, the amount of data transmitted to the slave node and the storage space occupied on the slave node are reduced.

Description

Data processing method and device based on shared storage architecture and computing equipment
Technical Field
Embodiments of the present disclosure relate to the field of data processing, and more particularly, to a data processing method, apparatus and computing device based on a shared storage architecture.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Currently, in the field of data processing, a commonly used database architecture includes a plurality of devices and a shared storage device that all of these devices can access. Among the devices, a device that can perform read-write operations on the data in the shared storage device may be referred to as a master device, while devices that can only perform read operations are referred to as slave devices.
In the related art, when the master device performs a data update operation, the updated data is first written into the memory of the master device and is only later persisted to the shared storage device; in other words, the master device does not write updates into the shared storage in real time when performing a data update.
Therefore, during the stage in which the master device has written the updated data into its memory but not yet into the shared storage device, how to ensure that the slave devices can still obtain the master device's updated data, and thus keep the data synchronized between devices, is a problem to be solved.
Disclosure of Invention
The disclosure provides a data processing method, a data processing apparatus and a computing device based on a shared storage architecture, so that a slave node can accurately query the corresponding data.
In a first aspect of embodiments of the present disclosure, there is provided a data processing method based on a shared storage architecture, the method being applied to a slave node, the method comprising:
receiving first log information sent by a master node; the first log information is obtained by the master node performing segmentation processing on the second log information; the second log information is used for recording first data and a first data page identifier; the first data is the data changed when the master node updates the original data page into the first data page; the first data page identifier is an identifier of the first data page; the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
if it is determined according to the first log information that a second data page exists in the first cache pool, adding a first identifier to the second data page; the first identifier characterizes that the data in the first data page and the data in the second data page are not identical; the second data page and the first data page have the same data page identifier; the first cache pool is used for storing the data pages cached by the slave node;
and responding to a data query instruction, determining target data according to the first cache pool.
In one example, in response to a data query instruction, determining target data from the first cache pool includes:
responding to a data query instruction, and if the first cache pool comprises a third data page and the third data page does not have a second identifier, determining that the data in the third data page is target data; the third data page and the target data page requested to be queried by the data query instruction have the same data page identification; the second identifier characterizes that the data in the third data page is different from the data in the target data page generated by the master node update.
In one example, in response to a data query instruction, determining target data from the first cache pool includes:
responding to a data query instruction, and if the first cache pool comprises a fourth data page and the fourth data page is provided with a third identifier, determining third log information corresponding to the fourth data page; the fourth data page and the target data page requested to be queried by the data query instruction have the same data page identification; the third identifier characterizes that the data in the fourth data page is different from the data in the target data page generated by updating the master node; the third log information is used for recording a second data page identifier and second position information of the fourth data page; the second location information is the location of second data in the shared storage node; the second data is the data changed by the main node to update the fourth data page to the target data page;
Acquiring the second data according to the second position information; and updating the data in the fourth data page according to the second data to obtain target data.
In one example, in response to a data query instruction, determining target data from the first cache pool includes:
responding to a data query instruction, and if the first cache pool does not comprise a fifth data page, determining fourth log information, wherein the fourth log information is used for recording a third data page identifier and third position information of a target data page; the third position information is the position of the third data in the shared storage node; the third data is the data changed when the master node updates a sixth data page to the target data page; the sixth data page and the target data page have the same data page identifier; the target data page is the data page requested to be queried by the data query instruction;
acquiring the third data and the sixth data page in the shared storage node according to the third data page identifier and the third position information;
and updating the data in the sixth data page according to the third data to obtain target data.
In one example, further comprising:
responding to a log playback request, and if the first cache pool comprises a seventh data page with a fourth identifier, acquiring fifth log information corresponding to the seventh data page; wherein the fourth identifier characterizes that the data in the seventh data page is different from the data in the eighth data page generated by the master node update; the eighth data page and the seventh data page have the same data page identification; the fifth log information is used for recording a fourth data page identifier and fourth position information of the seventh data page; the fourth location information is the location of fourth data in the shared storage node; the fourth data is the data changed by the master node updating the seventh data page to the eighth data page;
acquiring the fourth data in the shared storage node according to the fourth position information;
and updating the data in the seventh data page according to the fourth data to obtain an updated seventh data page.
In one example, further comprising:
receiving notification information sent by the master node; the notification information is used for indicating that a ninth data page with a first log sequence number has been transferred to the shared storage node by the master node;
According to the notification information, deleting the sixth log information received by the slave node; the sixth log information is used for recording a data page identifier and fifth position information of the ninth data page; the fifth location information is the location of fifth data in the shared storage node; and the fifth data is data in a data page changed when the master node updates and generates the ninth data page, and the second log sequence number of the sixth log information is less than or equal to the first log sequence number.
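For illustration only, this pruning step can be sketched in Python as follows, under the assumption that the slave keeps its received log information keyed by log sequence number (the structure and names are assumptions, not the patent's):

    # Hypothetical slave-side bookkeeping: log sequence number -> received log information.
    received_logs = {150: {"page_id": (1, 9)}, 180: {"page_id": (1, 9)}, 210: {"page_id": (1, 9)}}

    def on_flush_notification(first_log_sequence_number: int) -> None:
        # The master reports that pages covered by records up to this sequence number
        # are now persisted in shared storage, so older entries can be discarded.
        for lsn in [l for l in received_logs if l <= first_log_sequence_number]:
            del received_logs[lsn]

    on_flush_notification(200)
    assert sorted(received_logs) == [210]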
In one example, the shared storage node includes a first storage area and at least one second storage area therein; the first storage area is used for storing first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page with a data page identifier corresponding to the second storage area; and the sixth data is the data which is updated correspondingly when the master node updates and generates the tenth data page.
In one example, the first location information includes: the data amount and the initial position information of the first data; the initial position information is used for indicating an initial storage position of the first data in the shared storage node; the data amount of the first data is used for indicating the size of the storage space occupied by the first data.
In one example, the method further comprises:
using a first data page identifier in the first log information as a key; determining indication information of the first log information based on a hash algorithm and the first data page identifier; the indication information is used for indicating the storage space of the first log information;
and storing the first log information into a storage space indicated by the indication information.
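This keyed storage can be pictured with the following Python sketch, assuming a fixed number of hash buckets; the hash function and bucket count are illustrative assumptions only:

    BUCKETS = 1024                       # assumed bucket count, not specified by the patent

    log_store: list[list[dict]] = [[] for _ in range(BUCKETS)]

    def bucket_of(page_id: tuple) -> int:
        # The first data page identifier is the key; the hash result indicates
        # which storage slot holds the first log information.
        return hash(page_id) % BUCKETS

    def store_first_log(page_id: tuple, offset: int, length: int) -> None:
        log_store[bucket_of(page_id)].append(
            {"page_id": page_id, "offset": offset, "length": length})

    def lookup_first_logs(page_id: tuple) -> list[dict]:
        # A lookup for a given page only needs to scan a single bucket.
        return [e for e in log_store[bucket_of(page_id)] if e["page_id"] == page_id]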
In a second aspect of the embodiments of the present disclosure, there is provided a data processing method based on a shared storage architecture, the method being applied to a master node, the method comprising:
responding to a data updating request, and updating data in at least one original data page to obtain a first data page after corresponding updating of the original data page; and generating at least one second log information; the second log information is used for recording first data and a first data page identifier; the first data is the data changed when the master node updates the original data page into the first data page; the first data page identifier is an identifier of the first data page;
performing segmentation processing on the second log information to obtain first log information; wherein the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
The first log information is sent to a slave node.
In one example, further comprising:
caching the first data page into a second cache pool corresponding to the master node; the second cache pool is used for caching data pages generated by the master node;
and writing the first log information corresponding to the second log information and the first data included in the second log information into the second cache pool.
In one example, the second cache pool includes a first cache region and at least one second cache region; the first buffer area is used for storing first log information corresponding to each second log information; the second buffer area is used for storing seventh data corresponding to an eleventh data page; the eleventh data page is a data page with a data page identifier corresponding to the second buffer area; and the seventh data is the data which is updated correspondingly when the master node updates and generates the eleventh data page.
In one example, further comprising:
and based on a plurality of first threads, writing the data stored in the first cache region and the second cache region in the second cache pool into the shared storage node in parallel.
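One way to picture this parallel write-back is the following Python sketch, which uses a thread pool as a stand-in for the plurality of first threads; the region names and flush routine are illustrative assumptions:

    from concurrent.futures import ThreadPoolExecutor

    def flush_region(region_name: str, entries: list) -> str:
        # Placeholder for writing one cache region's contents to the shared storage node.
        return f"{region_name}: wrote {len(entries)} entries"

    second_cache_pool = {
        "first_cache_region": ["log info 1", "log info 2"],
        "second_cache_region_0": ["page data A"],
        "second_cache_region_1": ["page data B", "page data C"],
    }

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(flush_region, name, entries)
                   for name, entries in second_cache_pool.items()]
        results = [f.result() for f in futures]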
In one example, the shared storage node includes a first storage area and a second storage area therein; the first storage area is used for storing first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page with a data page identifier corresponding to the second storage area; and the sixth data is the data which is updated correspondingly when the master node updates and generates the tenth data page.
In one example, the first storage area corresponds to the first cache area; the second cache areas are in one-to-one correspondence with the second storage areas.
In one example, the method further comprises:
determining first writing information and second writing information; the first writing information is used for indicating a first serial number corresponding to the first log information in the first cache region that has been written into the shared storage node; the first serial number is the global log serial number of the data update request that triggered the master node to generate fourth log information; the fourth log information is second log information corresponding to the first log information; the second writing information is used for indicating a second serial number corresponding to the seventh data in the second buffer area that has been written into the shared storage node; the second serial number is the global log serial number of the data update request that triggered the master node to generate fifth log information; the fifth log information includes the seventh data;
determining a global persistence log sequence number according to the first writing information and the second writing information; the global persistence log sequence number characterizes that all sixth log information corresponding to the first request is transferred to the shared storage node; the first request is a data update request with a second global log sequence number; the second global log sequence number is less than or equal to the global persistence log sequence number; the sixth log information is used for recording data page identification and adjusted data of the data page triggered to be updated by the first request;
Determining a data page to be written in a second cache pool according to the global persistence log serial number; the third log serial number corresponding to the data page to be written is smaller than the global persistence log serial number;
and updating the data page to be written into the shared storage node.
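One plausible reading of the above, sketched in Python, is that the global persistence log sequence number is the minimum of the per-region write positions, and that a dirty page may be flushed only once every log record describing it has been persisted (a write-ahead rule). The names and the use of min() are assumptions for illustration:

    def global_persistence_lsn(first_write_lsn: int, second_write_lsns: list) -> int:
        # All log information for requests whose global LSN does not exceed this value
        # has been written to the shared storage node.
        return min([first_write_lsn] + list(second_write_lsns))

    def pages_safe_to_flush(cache_pool: dict, persisted_lsn: int) -> list:
        # A data page to be written must have a log sequence number smaller than
        # the global persistence log sequence number.
        return [pid for pid, page in cache_pool.items() if page["lsn"] < persisted_lsn]

    cache = {"P1": {"lsn": 150}, "P2": {"lsn": 240}}
    persisted = global_persistence_lsn(200, [220, 260])     # -> 200
    assert pages_safe_to_flush(cache, persisted) == ["P1"]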
In one example, the data update request has a first global log sequence number; the shared storage node further comprises: a third storage area and a fourth storage area; the third storage area comprises N first subareas; the fourth storage area comprises N second subareas; the first subareas are in one-to-one correspondence with the second subareas; n is a positive integer;
the method further comprises the steps of:
determining at least one fourth location information and at least one fifth location information corresponding to the data update request; the fourth location information is the location of the first log information in the second cache pool; the fifth position information is the position of the first data in the second cache pool;
writing the first global log sequence number into a first sub-region; and writing the fourth position information and the fifth position information corresponding to the data update request into the second sub-region corresponding to the first sub-region.
In a third aspect of the disclosed embodiments, there is provided a data processing apparatus based on a shared storage architecture, the apparatus being applied to a slave node, the apparatus comprising:
a first receiving unit, configured to receive first log information sent by a master node; the first log information is obtained by the master node performing segmentation processing on the second log information; the second log information is used for recording first data and a first data page identifier; the first data is the data changed when the master node updates the original data page into the first data page; the first data page identifier is an identifier of the first data page; the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
the first determining unit is used for adding a first identifier to the second data page if the second data page exists in the first cache pool according to the first log information; the first identifier characterizes that the data in the first data page and the data in the second data page are not identical; the second data page and the first data page have the same data page identification; the first cache pool is used for storing the data pages cached by the slave nodes;
And the second determining unit is used for responding to the data query instruction and determining target data according to the first cache pool.
In one example, the second determining unit is specifically configured to: responding to a data query instruction, and if the first cache pool comprises a third data page and the third data page does not have a second identifier, determining that the data in the third data page is target data; the third data page and the target data page requested to be queried by the data query instruction have the same data page identification; the second identifier characterizes that the data in the third data page is different from the data in the target data page generated by the master node update.
In one example, the second determining unit is specifically configured to: responding to a data query instruction, and if the first cache pool comprises a fourth data page and the fourth data page is provided with a third identifier, determining third log information corresponding to the fourth data page; the fourth data page and the target data page requested to be queried by the data query instruction have the same data page identification; the third identifier characterizes that the data in the fourth data page is different from the data in the target data page generated by updating the master node; the third log information is used for recording a second data page identifier and second position information of the fourth data page; the second location information is the location of second data in the shared storage node; the second data is the data changed by the main node to update the fourth data page to the target data page;
Acquiring the second data according to the second position information; and updating the data in the fourth data page according to the second data to obtain target data.
In one example, the second determining unit is specifically configured to: responding to a data query instruction, and if the first cache pool does not comprise a fifth data page, determining fourth log information, wherein the fourth log information is used for recording a third data page identifier and third position information of a target data page; the third position information is the position of the third data in the shared storage node; the third data is the data changed when the master node updates a sixth data page to the target data page; the sixth data page and the target data page have the same data page identifier; the target data page is the data page requested to be queried by the data query instruction;
acquiring the third data and the sixth data page in the shared storage node according to the third data page identifier and the third position information;
and updating the data in the sixth data page according to the third data to obtain target data.
In one example, further comprising:
the first acquisition unit is used for responding to a log playback request, and if the first cache pool comprises a seventh data page with a fourth identifier, fifth log information corresponding to the seventh data page is acquired; wherein the fourth identifier characterizes that the data in the seventh data page is different from the data in the eighth data page generated by the master node update; the eighth data page and the seventh data page have the same data page identification; the fifth log information is used for recording a fourth data page identifier and fourth position information of the seventh data page; the fourth location information is the location of fourth data in the shared storage node; the fourth data is the data changed by the master node updating the seventh data page to the eighth data page;
a second obtaining unit, configured to obtain the fourth data in the shared storage node according to the fourth location information;
and the first updating unit is used for updating the data in the seventh data page according to the fourth data to obtain an updated seventh data page.
In one example, further comprising:
The second receiving unit is used for receiving the notification information sent by the master node; the notification information is used for indicating that a ninth data page with a first log sequence number has been transferred to the shared storage node by the master node;
a deletion unit, configured to delete the sixth log information received from the slave node according to the notification information; the sixth log information is used for recording a data page identifier and fifth position information of the ninth data page; the fifth location information is the location of fifth data in the shared storage node; and the fifth data is data in a data page changed when the master node updates and generates the ninth data page, and the second log sequence number of the sixth log information is less than or equal to the first log sequence number.
In one example, the shared storage node includes a first storage area and at least one second storage area therein; the first storage area is used for storing first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page with a data page identifier corresponding to the second storage area; and the sixth data is the data which is updated correspondingly when the master node updates and generates the tenth data page.
In one example, the first location information includes: the data amount and the initial position information of the first data; the initial position information is used for indicating an initial storage position of the first data in the shared storage node; the data amount of the first data is used for indicating the size of the storage space occupied by the first data.
In one example, the apparatus further comprises:
a third determining unit configured to use a first data page identifier in the first log information as a key;
a fourth determining unit configured to determine indication information of the first log information based on a hash algorithm and the first data page identifier; the indication information is used for indicating the storage space of the first log information;
and the storage unit is used for storing the first log information into a storage space indicated by the indication information.
In a fourth aspect of embodiments of the present disclosure, there is provided a data processing apparatus based on a shared storage architecture, the apparatus being applied to a master node, the apparatus comprising:
the second updating unit is used for responding to the data updating request, updating the data in at least one original data page and obtaining a first data page after corresponding updating of the original data page; and generating at least one second log information; the second log information is used for recording first data and a first data page identifier; the first data is the data changed when the master node updates the original data page into the first data page; the first data page identifier is an identifier of the first data page;
The segmentation unit is used for carrying out segmentation processing on the second log information to obtain first log information; wherein the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
and the sending unit is used for sending the first log information to the slave node.
In one example, further comprising:
the first caching unit is used for caching the first data page into a second caching pool corresponding to the master node; the second cache pool is used for caching data pages generated by the master node;
and the second caching unit is used for writing the first log information corresponding to the second log information and the first data included in the second log information into the second caching pool.
In one example, the second buffer pool includes a first buffer area and at least one second buffer area; the first buffer area is used for storing first log information corresponding to each second log information; the second buffer area is used for storing seventh data corresponding to an eleventh data page; the eleventh data page is a data page with a data page identifier corresponding to the second buffer area; and the seventh data is the data which is updated correspondingly when the master node updates and generates the eleventh data page.
In one example, further comprising:
and the first writing unit is used for writing the data stored in the first cache region and the second cache region in the second cache pool into the shared storage node in parallel based on a plurality of first threads.
In one example, the shared storage node includes a first storage area and a second storage area therein; the first storage area is used for storing first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page with a data page identifier corresponding to the second storage area; and the sixth data is the data which is updated correspondingly when the master node updates and generates the tenth data page.
In one example, the first storage area corresponds to the first cache area; the second cache areas are in one-to-one correspondence with the second storage areas.
In one example, the data update request has a first global log sequence number; the apparatus further comprises:
a fifth determining unit configured to determine the first writing information and the second writing information; the first writing information is used for indicating a first serial number corresponding to first log information written in the shared storage node in the first cache region; the first serial number is a global log serial number which is possessed by a data update request triggering the master node to generate fourth log information; the fourth log information is second log information corresponding to the first log information; the second writing information is used for indicating a second serial number corresponding to the seventh data written in the shared storage node in the second buffer area; the second serial number is a global log serial number which is used for triggering the master node to generate a data update request of fifth log information; the fifth log information includes the seventh data;
A sixth determining unit, configured to determine a global persistence log sequence number according to the first writing information and the second writing information; the global persistence log sequence number characterizes that all sixth log information corresponding to the first request is transferred to the shared storage node; the first request is a data update request with a second global log sequence number; the second global log sequence number is less than or equal to the global persistence log sequence number; the sixth log information is used for recording data page identification and adjusted data of the data page triggered to be updated by the first request;
a seventh determining unit, configured to determine, according to the global persistence log sequence number, a data page to be written in a second cache pool; the third log serial number corresponding to the data page to be written is smaller than the global persistence log serial number;
and the second writing unit is used for updating the data page to be written into the shared storage node.
In one example, the data update request has a first global log sequence number; the shared storage node further comprises: a third storage area and a fourth storage area; the third storage area comprises N first subareas; the fourth storage area comprises N second subareas; the first subareas are in one-to-one correspondence with the second subareas; n is a positive integer;
The apparatus further comprises:
an eighth determining unit, configured to determine at least one fourth location information and at least one fifth location information corresponding to the data update request; the fourth location information is the location of the first log information in the second cache pool; the fifth position information is the position of the first data in the second cache pool;
a third writing unit, configured to write the first global log sequence number into a first sub-area;
and a fourth writing unit, configured to write the data update request into a second sub-area corresponding to the first sub-area, where the fourth location information corresponds to the data update request and the fifth location information corresponds to the data update request.
In a fifth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium comprising: the computer readable storage medium has stored therein computer executable instructions which, when executed by a processor, implement the method of any of the first or second aspects.
In a sixth aspect of embodiments of the present disclosure, there is provided a computing device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the computing device to perform the method of any one of the first or second aspects.
According to the data processing method, apparatus and computing device based on a shared storage architecture provided by the embodiments of the disclosure, the method is applied to the slave node; the accuracy of the data acquired by the slave node can be ensured, the amount of data transmitted between the master node and the slave node is reduced, and the amount of storage space occupied on the slave node is reduced.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a master-slave node log replication framework provided by the present disclosure;
FIG. 2 is a schematic diagram of data synchronization according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data change provided in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a database architecture provided by the present disclosure;
FIG. 5 is a flow chart of a data processing method based on a shared storage architecture according to an embodiment of the disclosure;
FIG. 6 is a flow chart of a second data processing method based on a shared storage architecture according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram of a slave node data query provided in an embodiment of the present disclosure;
FIG. 8 is a flow chart of a third data processing method based on a shared storage architecture according to an embodiment of the disclosure;
FIG. 9 is a schematic diagram of yet another slave node data query provided by an embodiment of the present disclosure;
FIG. 10 is a flowchart of a fourth data processing method based on a shared storage architecture according to an embodiment of the disclosure;
FIG. 11 is a flowchart of a fifth data processing method based on a shared storage architecture according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a scenario of a data store provided by an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a shared storage node architecture according to an embodiment of the present disclosure;
FIG. 14 is a schematic diagram of a data processing flow provided in an embodiment of the disclosure;
FIG. 15 is a schematic diagram of a program product provided by an embodiment of the present disclosure;
FIG. 16 is a schematic diagram of the structure of a data processing apparatus based on a shared storage architecture according to an embodiment of the disclosure;
FIG. 17 is a schematic diagram of a data processing apparatus based on a shared storage architecture according to an embodiment of the present disclosure;
fig. 18 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the disclosure, a data processing method, a data processing device and a computing device based on a shared storage architecture are provided.
Herein, it is to be understood that the terms involved include:
Shared-Storage architecture: an architecture in which multiple computing nodes jointly access the same storage node (shared storage); it makes it easy to scale out computing nodes and expand storage capacity.
Buffer Pool (memory pool): the in-memory buffer of InnoDB, the storage engine of the MySQL database; it serves as both a write buffer and a read buffer and underpins the transaction system.
Redo log: the mechanism by which a relational database achieves durability of transaction data. For performance reasons, when a transaction commits, the new data produced by the transaction is not flushed to storage immediately; instead, the Redo log is persisted first and the new data is then asynchronously flushed to storage. The Redo log records every modification made by the transaction.
LSN: log sequence number. A Redo log record is typically identified by its LSN, which increases monotonically. Each data page also carries an LSN, which identifies the Redo log record of the last modification a transaction made to the page, so the freshness of a page can be judged by its LSN. Once the data pages covered by the Redo records before a certain LSN have been flushed to storage, the space occupied by the Redo log before that LSN can be reclaimed.
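For readers unfamiliar with these terms, the following minimal Python sketch illustrates how an LSN ties a Redo record to the data page it modifies. The class and field names are illustrative assumptions only and do not reflect InnoDB's actual implementation.

    from dataclasses import dataclass

    @dataclass
    class RedoRecord:
        lsn: int             # monotonically increasing log sequence number
        page_id: tuple       # (space_id, page_number) identifying the data page
        body: bytes          # the changed bytes recorded for the modification

    @dataclass
    class DataPage:
        page_id: tuple
        data: bytearray
        lsn: int = 0         # LSN of the last Redo record applied to this page

    def apply_redo(page: DataPage, rec: RedoRecord) -> None:
        # Replaying a Redo record brings the page up to the record's LSN.
        assert rec.page_id == page.page_id
        page.data[:len(rec.body)] = rec.body   # simplified: overwrite from offset 0
        page.lsn = rec.lsn

    def is_fresh(page: DataPage, latest_lsn_for_page: int) -> bool:
        # A page is up to date when its LSN is not older than the newest Redo
        # record that modified it; otherwise it still needs playback.
        return page.lsn >= latest_lsn_for_page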
Furthermore, any number of elements in the figures is for illustration and not limitation, and any naming is used for distinction only and not for any limiting sense.
In addition, the data involved in the disclosure may be data authorized by the user or fully authorized by all parties, and the collection, transmission and use of such data comply with the requirements of the relevant national laws and regulations. The embodiments of the disclosure may be combined with each other.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
In the related art, a database architecture corresponding to the shared storage architecture is generally deployed with a master node, a plurality of slave nodes and a shared storage node. The shared storage node serves as a common data storage space: both the master node and the slave nodes can access the shared storage node to read data, but only the master node can modify the data in the shared storage node.
Fig. 1 is a schematic diagram of the master-slave node log replication framework provided in the present disclosure. As shown in fig. 1, in the related art, when the master node needs to modify, add or delete data in the shared storage node, the master node writes the Redo log, through its log persistence thread, into the area of the shared storage node used for storing logs. The Redo log records the data changes made by the operation. The master node then notifies the slave node, through its message-sending thread, that a new Redo log exists in the shared storage node. The slave node learns of the new Redo log through its message-receiving thread, which notifies the log playback thread to read and replay the Redo log from shared storage (the replayed content may include operations such as transaction state synchronization, data definition language operations, data manipulation language operations, and users and permissions). The slave node then informs the master node, by message, that the Redo log playback is complete, and the master node receives this message through its message-receiving thread. In addition, in fig. 1, the master node and the slave node each have their own cache, and the shared storage node also contains a data area for storing data pages.
After log playback operations of the master-slave nodes are performed in the manner shown in fig. 1, it can be ensured that the slave nodes can read accurate data.
For example, fig. 2 is a schematic diagram of data synchronization provided in the present disclosure. As shown in fig. 2, after the master node updates the data page P11, the log sequence number recorded in the data page changes from 100 to 200 (shown as data page P12 in the figure), and the corresponding Redo log that records the content of the data change also has log sequence number 200; the log sequence number recorded in a data page is the log sequence number of the Redo log that records the data change made to that page. The master node then stores the log with log sequence number 200 in the shared storage node. At the stage where the master node has stored the log in the shared storage node but has not yet changed the data of data page P11 in the shared storage node, if the slave node needs to read the data in data page P12, the slave node extracts the log and data page P11 from the shared storage node and updates the data in data page P11 according to the obtained log (i.e., playback in the figure), thereby obtaining the updated data page P12'.
However, in the example corresponding to fig. 2, if the data page generated by the master node's update is not synchronized to the shared storage node for a long time, and during this stage the slave node deletes the latest data page P12' obtained by the previous playback because its cache space is limited, then when the slave node needs to read the data content of data page P12 again, the data page P11 it acquires from shared storage is an out-of-date data page, because the latest data page P12' has already been deleted from the slave node's cache space. Specifically, fig. 3 is a schematic diagram of a data change provided in an embodiment of the disclosure. The data page P1 should be understood as the data page whose identifier is P1.
As shown in fig. 3, at time T1 the master node writes a log with log sequence number 200 and updates the log sequence number recorded in data page P11 from 100 to 200 (updating from 100 to 200 is only an example; in practice it suffices that the log sequence number generated by each update is an increasing value), and the master node then writes the log with log sequence number 200 into the shared storage node. At time T1, the log sequence number recorded on the data page P11' cached in the slave node is 100, where data page P11 and data page P11' have the same data page identifier. At time T2, the master node notifies the slave node that a new Redo log has been generated (i.e., a log recording the data change of data page P1), so the slave node knows that a new log exists. At time T3, when the slave node needs to read data page P12, the slave node reads the log with log sequence number 200 from the shared storage node and performs log playback (i.e., updates the data in data page P11') by combining the cached data page P11' with the obtained log, thereby obtaining data page P12' (i.e., the data page P12 with log sequence number 200 that the master node generated by its update). At time T4, the slave node deletes the replayed data page P12' from its cache space because the cache space is insufficient, while the master node has not yet flushed the data page P12 generated by its update onto the shared storage node. At time T5, the slave node again initiates a read of data page P12; because the data page P12' previously replayed by the slave node has been evicted from its cache space, the slave node reads data page P11 from the shared storage node, i.e., it reads the contents of a "past" data page. In practical applications, when a data page has been generated by the master node's update but has not yet been flushed to the shared storage node, such a page may be referred to as a "dirty page".
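The stale-read problem described above can be condensed into the following Python sketch, which uses a hypothetical slave cache and shared storage (all names are illustrative, not the patent's data structures); it only shows why evicting the replayed page P12' before the master flushes P12 leads to reading the old page P11.

    # Hypothetical illustration of the T1-T5 timeline.
    shared_storage = {"P1": {"lsn": 100, "data": "old"}}   # master has not flushed P12 yet
    slave_cache = {}

    def t3_replay():
        # T3: the slave reads the lsn-200 Redo log from shared storage and replays it.
        slave_cache["P1"] = {"lsn": 200, "data": "new"}     # this is P12'

    def t4_evict():
        # T4: cache pressure evicts the replayed page while P12 is still not flushed.
        slave_cache.pop("P1", None)

    def t5_read() -> dict:
        # T5: a read that does not replay again falls back to shared storage.
        return slave_cache.get("P1", shared_storage["P1"])

    t3_replay()
    t4_evict()
    assert t5_read()["lsn"] == 100      # the slave observes the "past" data page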
In the related art, in order to avoid the problem that the data acquired by the slave node is inaccurate (for example, the acquired data is old data that has not been updated) because the master node has not yet written the updated data to the shared storage node, one possible implementation is to restrict the way in which the slave node deletes data pages cached in its cache space. That is, a cached updated data page in the slave node may be deleted only after the master node has written the data of that data page into the shared storage node. For example, after the master node has updated the data corresponding to data page A with log sequence number 100 into the shared storage node, a data page B whose log sequence number is less than or equal to 100 may be deleted from the slave node's cache space; however, a data page C whose log sequence number has not yet been flushed is not allowed to be deleted from the cache space. Here data page A, data page B and data page C have the same data page identifier, and an unflushed log sequence number is a log sequence number greater than 100. However, this approach easily causes the slave node to cache many data pages, which occupies a large amount of cache space.
In another possible implementation in the related art, the slave node needs to cache the logs whose log sequence numbers have not yet been flushed, where an unflushed log sequence number is greater than the log sequence number of the data pages the master node has already written to the shared storage node. A log cached in the slave node includes a log type, meta information (including a Space ID and a Page Number, which together form the data page identifier), and body information (the changed data recorded by the log). However, this approach of caching logs also easily causes the slave node to occupy a large amount of cache space.
In this scheme, in order to avoid the above technical problems, when the master node synchronizes a log to the slave node, the master node segments the content contained in the log: it extracts only the first data page identifier from the log and determines the position information of the first data contained in the log, and uses the first data page identifier and the position information as the log synchronized to the slave node. The slave node then caches the received content in its corresponding cache space (i.e., memory pool). This reduces the storage space occupied by the logs the slave node needs to cache while still ensuring that the slave node can obtain accurate data.
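As a rough Python sketch (with hypothetical field names rather than the patent's actual log format), the segmentation step can be pictured as the master keeping the full log record for shared storage while forwarding only the page identifier and the location of the changed data to the slave:

    from dataclasses import dataclass

    @dataclass
    class SecondLogInfo:          # full log record kept by the master
        page_id: tuple            # first data page identifier, e.g. (space_id, page_number)
        first_data: bytes         # the changed bytes themselves

    @dataclass
    class FirstLogInfo:           # what is actually synchronized to the slave
        page_id: tuple
        offset: int               # where first_data will sit in the shared storage node
        length: int               # how much space first_data occupies

    def segment(record: SecondLogInfo, log_tail_offset: int) -> FirstLogInfo:
        # The master appends first_data after the end position of the previously
        # written log, so (offset, length) is enough for the slave to find it later.
        return FirstLogInfo(page_id=record.page_id,
                            offset=log_tail_offset,
                            length=len(record.first_data))

Because the slave caches only these small (identifier, offset, length) triples instead of the changed data itself, its memory pool footprint stays small.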
Having described the basic principles of the present disclosure, various non-limiting embodiments of the present disclosure are specifically described below.
Application scene overview
Referring first to fig. 4, which is a schematic diagram of a database architecture provided in the present disclosure. As shown in fig. 4, the figure includes one master node and two slave nodes. The master node can access the shared storage node at the far end of the network with read-write access, while the slave nodes can access the shared storage node at the far end of the network with read-only access. In this database architecture, the write capability can be expanded by increasing the processing capability of the master node, and the read capability can be improved by deploying more slave nodes.
Exemplary method
A data processing method based on a shared storage architecture according to an exemplary embodiment of the present disclosure is described below with reference to the following examples in conjunction with the application scenario of fig. 4. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any applicable scenario.
Fig. 5 is a flow chart of a data processing method based on a shared storage architecture according to an embodiment of the disclosure. As shown in fig. 5, the method comprises the following steps:
S501, receiving first log information sent by a master node; the first log information is obtained by the master node segmenting second log information; the second log information is used for recording first data and a first data page identifier; the first data is the data changed when the master node updates an original data page into a first data page; the first data page identifier is the identifier of the first data page; the first log information is used for recording the first data page identifier and first position information of the first data; the first position information characterizes the position of the first data in the shared storage node.
The execution body in this embodiment may be a slave node in the shared storage architecture, where the slave node may be a computing device, and specifically, the computing device may be a server (such as a cloud server, or a local server), or may be a computer, or may be a terminal device, or may be a processor, or may be a chip, or the like, which is not limited in this embodiment.
In this embodiment, when the master node needs to change data in the shared storage node, it first modifies the original data page to be modified in the memory pool corresponding to the master node, thereby obtaining the modified first data page. The master node correspondingly generates second log information, which records the data changed when the master node modifies the original data page into the first data page (i.e., the first data) as well as the first data page identifier corresponding to the first data page. The master node then further determines, based on the generated second log information, the position information (i.e., the first position information) of the first data of the second log information in the shared storage node.
It should be noted that the master node may determine the position at which the first data contained in the second log information will be stored in the shared storage node according to the end position, in the shared storage node, of the log information generated by the previous data modification operation and according to the second log information generated this time. The manner of determining the first position information described here is merely illustrative and is not specifically limited.
Then, the master node sends the first data page identifier included in the second log information together with the determined first position information of the first data to the slave node as the first log information.
It should be noted that the first log information may be sent directly from the master node to the slave node, or forwarded to the slave node by another device, or the master node may store the first log information on some device and notify the slave node to obtain the first log information from that device; this embodiment does not limit the manner of delivery.
S502, if it is determined according to the first log information that a second data page exists in the first cache pool, adding a first identifier to the second data page; the first identifier characterizes that the data in the first data page and the data in the second data page are different; the second data page and the first data page have the same data page identifier; the first cache pool is used for storing the data pages cached by the slave node.
For example, when the master node performs a data change operation, it generally modifies the data page in its own cache first, and then asynchronously modifies the data page stored in the shared storage node. After the slave node receives the first log information generated by the master node, it may search, according to the received first log information, whether a data page having the first data page identifier (i.e., a second data page) exists in the first cache pool corresponding to the slave node. If such a second data page exists in the first cache pool, it indicates that the master node has performed a modification operation on the data page with the first data page identifier, and the data in the second data page in the first cache pool is therefore old data. The first identifier may thus be added to the second data page in the first cache pool to indicate that the second data page has not been updated, that is, it is not synchronized with the first data page generated by the master node's update, and the data in the two data pages are different.
It should be noted that, the data included in the data page having the same data page identifier may be understood as the data corresponding to the same data object. For example, each of the first data page and the second data page has a first data page identifier, and when the data stored in the first data page is the data corresponding to the field B in the data table a, the data stored in the second data page is also the data corresponding to the field B in the data table a, except that the data in the first data page is the latest data after updating, and the data in the second data page is the data at any time before updating the first data page.
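Continuing the illustrative sketch above (again, all names are assumptions), step S502 on the slave node side might look like this:

```python
class SlaveCachePool:
    """Hypothetical first cache pool on the slave node."""
    def __init__(self):
        self.pages = {}          # page_id -> {"data": bytes, "stale": bool}
        self.pending_logs = {}   # page_id -> list of FirstLogInfo not yet replayed

    def on_first_log_info(self, log):
        # Remember the log so its location information can drive later replay.
        self.pending_logs.setdefault(log.page_id, []).append(log)
        # S502: if a second data page with the same identifier is cached,
        # add the first identifier (modelled here as a simple "stale" flag).
        page = self.pages.get(log.page_id)
        if page is not None:
            page["stale"] = True   # data differs from the first data page on the master
```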
S503, responding to a data query instruction, and determining target data according to the first cache pool.
In this embodiment, after the slave node receives a data query instruction, the target data indicated by the data query instruction may be further determined according to the first cache pool corresponding to the slave node. When the slave node determines the target data according to the first cache pool, if a cached data page in the first cache pool is hit during the query, whether that data page carries the first identifier may be further checked to determine whether its data is in the latest state, so as to ensure that the acquired data is the latest data. If the data is not the latest data, the slave node may look up the corresponding data in the shared storage node based on the position information of the first data indicated in the first log information it received, and update the data page in the first cache pool accordingly to obtain the target data.
It may be appreciated that, in this embodiment, in order to ensure that the slave node can accurately acquire data, the master node feeds the first log information back to the slave node, so that the slave node can determine, according to the first log information, whether the data in its cached data pages is the latest data. Meanwhile, the first log information also indicates where the changed data was stored in the shared storage when the master node changed the data page, so that the slave node can later obtain the changed data based on the first log information and thus acquire accurate data. Compared with the related-art approach in which the master node directly transmits the second log information (i.e., the log recording the first data and the first data page identifier) to the slave node, in this embodiment the first log information transmitted to the slave node records only the first data page identifier and the first position information corresponding to the first data, which effectively reduces the occupation of the slave node's storage space and helps reduce the amount of data transmitted.
Fig. 6 is a flowchart of another data processing method based on a shared storage architecture according to an embodiment of the disclosure. As shown in fig. 6, the method comprises the steps of:
S601, receiving first log information sent by a master node; the first log information is obtained by the segmentation processing of the second log information by the master node; the second log information is used for recording the first data and the first data page identification; the first data is changed data which is made by updating the original data page into the first data page by the master node; the first data page identifier is an identifier of the first data page; the first log information is used for recording a first data page identifier and first position information of first data; the first location information characterizes a location of the first data in the shared storage node.
The execution body in this embodiment may be a slave node in the shared storage architecture, where the slave node may be a computing device, and specifically, the computing device may be a server (such as a cloud server, or a local server), or may be a computer, or may be a terminal device, or may be a processor, or may be a chip, or the like, which is not limited in this embodiment.
S602, if it is determined according to the first log information that a second data page exists in the first cache pool, adding a first identifier to the second data page; the first identifier characterizes that the data in the first data page and the data in the second data page are different; the second data page and the first data page have the same data page identifier; the first cache pool is used for storing data pages cached by the slave node.
For example, the specific principles of step S601 and step S602 may be referred to above in steps S501-S502, and will not be described herein.
S603, responding to a data query instruction, and if the first cache pool comprises a third data page and the third data page does not have a second identifier, determining the data in the third data page as target data; the third data page and the target data page requested to be queried by the data query instruction have the same data page identifier; the second identifier characterizes that the data in the third data page is different from the data in the data page generated by the master node's update.
Illustratively, in this embodiment, the data query instruction received by the slave node includes the data page identifier indicating the target data page to be queried. After the slave node receives the data query instruction, it searches the first cache pool corresponding to the slave node for a data page having the same data page identifier as the target data page.
If the slave node determines that the first cache pool includes a data page having the same data page identifier as the target data page, it further determines whether the data page searched in the first cache pool has the second identifier.
When the found data page does not carry the second identifier, this indicates that the data in the current third data page is already the latest data as updated by the master node. That is, the data in the target data page generated after the master node's update is the same as the data in the third data page, and the slave node may directly use the data contained in the third data page found in the first cache pool as the target data to be queried.
It can be understood that in this embodiment, when the slave node performs data query, the slave node may first find whether a data page having the same data page identifier as the data page to be queried exists in the first cache pool corresponding to the slave node, which can improve data acquisition efficiency compared with a method of directly accessing the shared storage node to perform data query. In the data query process, whether the data in the data page is the latest data can be determined according to the identification of the data page in the first cache pool, so that the accuracy of the data query is improved.
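A minimal sketch of this branch, reusing the hypothetical SlaveCachePool defined earlier:

```python
def query_fresh_hit(cache, page_id):
    """S602/S603: if the third data page is cached and carries no stale marker,
    its data can be returned directly as the target data."""
    page = cache.pages.get(page_id)
    if page is not None and not page["stale"]:
        return page["data"]   # already identical to the master's latest version
    return None               # fall through to the replay / shared-storage paths below
```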
S604, responding to a data query instruction, and if the first cache pool comprises a fourth data page and the fourth data page is provided with a third identifier, determining third log information corresponding to the fourth data page; the fourth data page and the target data page requested to be queried by the data query command have the same data page identification; the third identifier characterizes that the data in the fourth data page is different from the data in the target data page generated by updating the master node; the third log information is used for recording a second data page identifier and second position information of the fourth data page; the second location information is the location of the second data in the shared storage node; the second data is changed data which is changed by the master node updating the fourth data page to the target data page.
Illustratively, in this embodiment, the data query instruction received by the slave node includes the data page identifier of the target data page to be queried. After the slave node receives the data query instruction, it first searches the first cache pool corresponding to the slave node for a fourth data page having the same data page identifier as the target data page.
When the fourth data page carries the third identifier, this indicates that the master node has updated the data page to be queried and that the data in the current fourth data page is not the latest data. That is, the data in the target data page generated after the master node's update is different from the data in the fourth data page. It should be noted that the third identifier may be determined by the slave node according to log information sent by the master node, following the same principle as steps S501 and S502 in fig. 5, which will not be described here again.
Further, when the slave node determines that the data in the fourth data page currently acquired is not the latest data, the slave node may search the log information sent by the master node for the third log information corresponding to the fourth data page. Specifically, when determining the third log information, the second data page identifier corresponding to the fourth data page may be matched with the data page identifier included in the log information cached by the slave node, and further the log information having the second data page identifier may be used as the third log information. The third log information is similar to the first log information fed back to the slave node by the master node, and is sent to the slave node after the generated log information is subjected to segmentation processing when the master node changes the data page. Specifically, it is assumed that a log generated when the master node updates the fourth data page to generate the target data page is initial log information, where the initial log information includes: second data, second data page identification; the splitting process is performed on the initial log information, which can be understood as extracting the second data page identifier in the initial log information, and determining the position information of the second data in the shared storage node to obtain the third log information. The second data page identifier and the second location information are specifically recorded in the third log information. Wherein the second location information may be understood as the location of the data (i.e., the second data) corresponding to the change made in the shared storage node when the master node updates the fourth data page to generate the target data page.
It should be noted that, in practical applications, if the master node updates and generates the target data page in the following two stages, namely a first stage in which the master node updates the fourth data page to an intermediate data page and a second stage in which the master node updates the intermediate data page to the target data page, then the slave node may acquire the log information corresponding to each of the two stages. One piece of log information records the position information of the data changed when the master node updated the fourth data page to the intermediate data page, the other records the position information of the data changed when the master node updated the intermediate data page to the target data page, and both pieces of log information record the second data page identifier corresponding to the fourth data page.
S605, acquiring second data according to the second position information; and updating the data in the fourth data page according to the second data to obtain target data.
For example, after the slave node acquires the second location information, the corresponding second data may be acquired in the shared storage node based on the location indicated by the second location information. Then, the data in the fourth data page held by the slave node is modified and updated according to the second data, and the data obtained after this update may be used as the target data.
It may be appreciated that in this embodiment, when the data page acquired by the slave node in the first cache pool during the data query carries the third identifier, the data in the fourth data page may be updated based on the log information previously sent by the master node and received by the slave node, so as to ensure the accuracy of the queried target data.
In one example, after the slave node updates the data in the fourth data page, the third identifier originally set on the fourth data page may be further deleted to characterize the data in the data page as the latest data.
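Building on the earlier sketches, the stale-hit branch (S604/S605) could be illustrated as follows; apply_change assumes a simple <2-byte offset><new bytes> change format chosen only for the sketch, not specified by this embodiment:

```python
def apply_change(page_image: bytes, changed_data: bytes) -> bytes:
    """Stand-in for page replay: assumes the changed data is serialized as a 2-byte
    in-page offset followed by the new bytes (illustrative assumption only)."""
    offset = int.from_bytes(changed_data[:2], "big")
    new_bytes = changed_data[2:]
    return page_image[:offset] + new_bytes + page_image[offset + len(new_bytes):]

def query_stale_hit(cache, shared_storage, page_id):
    """S604/S605: the cached fourth data page is stale, so fetch the changed data
    (second data) from the shared storage node at the recorded location and replay it."""
    page = cache.pages.get(page_id)
    if page is None or not page["stale"]:
        return None
    for log in cache.pending_logs.pop(page_id, []):                  # third log information
        change = shared_storage.read(log.start_offset, log.length)   # second data
        page["data"] = apply_change(page["data"], change)
    page["stale"] = False     # the third identifier can now be removed
    return page["data"]
```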
S606, responding to a data query instruction, and if the first cache pool does not comprise a fifth data page, determining fourth log information, wherein the fourth log information is used for recording a third data page identifier and third position information of the target data page; the third location information is the location of the third data in the shared storage node; the third data is changed data which is made by the master node updating the sixth data page to the target data page; the sixth data page and the target data page have the same data page identifier; the target data page is the data page for which the data query instruction requests a query.
Illustratively, in this embodiment, the data query instruction received by the slave node includes the data page identifier of the target data page to be queried. After the slave node receives the data query instruction, it first searches the first cache pool corresponding to the slave node for a fifth data page having the same data page identifier as the target data page. Further, if the slave node determines that no such fifth data page exists in its first cache pool, the slave node may match, among the log information it has stored, the fourth log information that carries the data page identifier corresponding to the target data page. It should be noted that the fourth log information is sent by the master node to the slave node and records the location in the shared storage node (i.e., the third location information) of the data (i.e., the third data) changed when the sixth data page having that data page identifier was updated to the target data page.
S607, acquiring third data and a sixth data page in the shared storage node according to the third data page identifier and the third position information.
Illustratively, after the slave node has determined the third location information, it may further search the shared storage node, according to the third data page identifier corresponding to the target data page, for the sixth data page having the third data page identifier. The slave node may then also look up the corresponding third data in the shared storage node according to the third location information.
And S608, updating the data in the sixth data page according to the third data to obtain target data.
For example, when the slave node obtains the third data and the sixth data page from the shared storage node, the data in the sixth data page may be further updated according to the third data, and then the data included in the data page obtained after the update is used as the target data.
It may be understood that in this embodiment, when no data page with the data page identifier of the data page to be queried exists in the first cache pool corresponding to the slave node, the slave node may look up the corresponding sixth data page in the shared storage node; specifically, it may first determine, according to the data page identifier of the target data page, the fourth log information among the log information stored at the slave node, and then obtain the corresponding third data according to the fourth log information, so as to update the data in the sixth data page, thereby ensuring the accuracy of the obtained target data. It should be noted that the slave node's update of the sixth data page only occurs in the first cache pool, and after the slave node updates the sixth data page, the updated data page may be stored in the first cache pool, so that in subsequent data lookups the target data can preferentially be found in the first cache pool without frequently accessing the shared storage node, which is beneficial to improving data query efficiency. Furthermore, under the shared storage architecture, the slave node has no authority to change the sixth data page in the shared storage node.
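The cache-miss branch (S606-S608) could be sketched as follows, reusing the helpers from the earlier sketches (shared_storage.read_page and shared_storage.read are assumed accessors, not APIs named by this embodiment):

```python
def query_cache_miss(cache, shared_storage, page_id):
    """S606-S608: no cached page, so read the sixth data page from the shared storage
    node, replay the changed data (third data) recorded in the fourth log information
    on top of it, and keep the rebuilt page in the first cache pool for later queries."""
    data = shared_storage.read_page(page_id)                        # sixth data page
    for log in cache.pending_logs.pop(page_id, []):                 # fourth log information
        change = shared_storage.read(log.start_offset, log.length)  # third data
        data = apply_change(data, change)
    cache.pages[page_id] = {"data": data, "stale": False}           # only the slave's own cache changes
    return data
```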
Fig. 7 is a schematic diagram of a slave node data query according to an embodiment of the present disclosure. As shown in fig. 7, when a user needs to look up data, the slave node first determines whether a data page having the same data page identifier as the target data page to be queried (hereinafter referred to as the target data page identifier) exists in its first cache pool. In the scenario where a data page with the target data page identifier exists in the first cache pool (hereinafter referred to as a hit data page), the slave node further needs to determine whether the data in the hit data page is the latest (which may be judged by the identifier attached to the data page as in the above embodiments), that is, whether the hit data page has expired. If the hit data page has expired, or no hit data page exists in the first cache pool, the slave node matches, among its cached log information, the log information corresponding to the data page identifier currently being queried, and based on the position information in that log information, the log reading module in fig. 7 looks up in the shared storage node the data that was changed for the target data page identifier during the data update. In the hit scenario, the data page can then be replayed based on the hit data page and the retrieved changed data to obtain the data that finally needs to be queried. If no data page is hit, the data page with the corresponding data page identifier is further looked up among the data pages stored in the shared storage node, and the changed data found via log reading is replayed on that data page to obtain the data that finally needs to be queried.
Fig. 8 is a flowchart of a third data processing method based on a shared storage architecture according to an embodiment of the disclosure. As shown in fig. 8, the method includes the steps of:
S801, receiving first log information sent by a master node; the first log information is obtained by the segmentation processing of the second log information by the master node; the second log information is used for recording the first data and the first data page identification; the first data is changed data which is made by updating the original data page into the first data page by the master node; the first data page identifier is an identifier of the first data page; the first log information is used for recording a first data page identifier and first position information of first data; the first location information characterizes a location of the first data in the shared storage node.
The execution body in this embodiment may be a slave node in the shared storage architecture, where the slave node may be a computing device, and specifically, the computing device may be a server (such as a cloud server, or a local server), or may be a computer, or may be a terminal device, or may be a processor, or may be a chip, or the like, which is not limited in this embodiment.
In one example, the first location information includes: the data amount of the first data and start position information; the start position information is used for indicating the starting storage position of the first data in the shared storage node; the data amount of the first data is used to indicate the amount of storage space occupied by the first data. Specifically, in this embodiment, the first location information in the first log information generated by the master node may specifically include the data amount of the first data and the start position information corresponding to the first data. The data amount of the first data may be used to characterize the size of the storage space that the first data needs to occupy when stored in the shared storage node. The start position information may be used to characterize the starting position of the first data when stored in the shared storage node. By setting the first location information in this way, it is ensured that the slave node can subsequently acquire the complete first data according to the first location information.
S802, if it is determined according to the first log information that a second data page exists in the first cache pool, adding a first identifier to the second data page; the first identifier characterizes that the data in the first data page and the data in the second data page are different; the second data page and the first data page have the same data page identifier; the first cache pool is used for storing data pages cached by the slave node.
S803, responding to the data query instruction, and determining target data according to the first cache pool.
For example, the specific principle of steps S801 to S803 may be referred to as steps S501 to S503, and will not be described herein.
S804, responding to a log playback request, and if the first cache pool comprises a seventh data page with a fourth identifier, acquiring fifth log information corresponding to the seventh data page; the fourth identifier characterizes that the data in the seventh data page is different from the data in the eighth data page generated by updating the master node; the eighth data page and the seventh data page have the same data page identification; the fifth log information is used for recording a fourth data page identifier and fourth position information of the seventh data page; the fourth location information is the location of fourth data in the shared storage node; the fourth data is the data changed by the master node to update the eighth data page with the seventh data page.
In this embodiment, log playback may be understood as an operation that a slave node performs data update in a data page according to log information sent by a master node, and specifically may update a data page storing old data in a cache corresponding to the slave node.
When the slave node receives log information sent by the master node, this indicates that a data page update has been performed on the master node side. On receiving such log information, the slave node does not immediately perform a log playback operation based on it. One possible log playback approach is that of steps S604 and S605, where the corresponding data page is replayed when the slave node needs to read it, so that unnecessary log playback operations and consumption of the slave node's processing resources can be reduced. In another possible implementation, i.e., step S804 in this embodiment, the data pages in the first cache pool of the slave node may also be updated when the slave node receives a log playback request.
In practical applications, the log playback request may be triggered by the slave node periodically, so that the slave node can directly acquire data pages containing the latest data from its first cache pool, improving data query efficiency. For example, when the slave node determines that a preset log playback time has been reached, it may wake up or create one or more corresponding log playback threads to perform the log playback processing.
Specifically, when the slave node receives the log playback request, it first determines whether a seventh data page carrying the fourth identifier exists in the first cache pool corresponding to the slave node.
Wherein the fourth identifier is used for indicating that the master node side updates the seventh data page to generate an eighth data page, that is, the data in the seventh data page in the slave node is old data which is not updated, and the data in the seventh data page and the data in the eighth data page are different. Wherein the seventh data page and the eighth data page have the same data page identification.
When the slave node determines that a seventh data page with the fourth identifier exists in its cache pool, it may further determine, among the log information stored at the slave node, the fifth log information associated with the seventh data page. The fifth log information records the location (i.e., the fourth location information) in the shared storage node of the data (i.e., the fourth data) changed when the master node updated the seventh data page to the eighth data page. In addition, the fourth data page identifier corresponding to the seventh data page is also recorded in the fifth log information, so that the slave node can quickly match, among multiple pieces of log information, the log information corresponding to the seventh data page.
S805, fourth data is acquired from the shared storage node according to the fourth position information.
Illustratively, in this embodiment, after the slave node acquires the fourth location information described above, it may acquire the fourth data at the location indicated by the fourth location information in the shared storage node.
And S806, updating the data in the seventh data page according to the fourth data to obtain an updated seventh data page.
In this embodiment, when the slave node has acquired the fourth data, it performs the data update on the seventh data page in its cache pool, so that the data in that cached data page becomes the same as the data in the data page, having the same data page identifier, that was most recently updated by the master node.
It can be understood that in this embodiment the slave node does not perform the log playback operation immediately after receiving log information sent by the master node, but performs it after receiving a log playback request. Compared with a scheme in which log playback is performed only when a data page is read from the slave node's cache, in this embodiment the data pages in the first cache pool are also replayed periodically when the log playback request is triggered periodically; when the slave node then needs to read a data page, the probability that a log playback operation is still required on the data page in the first cache pool is reduced, thereby improving the slave node's data query efficiency. In addition, compared with a scheme in which playback is performed in real time after the slave node receives the log information, the periodic playback provided in this embodiment places lower real-time response requirements on the slave node device and can avoid playback failures caused by the slave node device being unable to respond in time when a large amount of log information is received.
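A minimal sketch of the timer-driven playback (S804-S806), reusing the hypothetical SlaveCachePool and apply_change from the earlier sketches; the interval and threading model are illustrative assumptions:

```python
import threading
import time

def start_periodic_replay(cache, shared_storage, interval_s=5.0):
    """Walk the first cache pool on a timer, find pages carrying the stale marker,
    fetch the changed data (fourth data) from the shared storage node and replay it."""
    def replay_once():
        for page_id, page in list(cache.pages.items()):
            if not page["stale"]:
                continue
            for log in cache.pending_logs.pop(page_id, []):   # fifth log information
                change = shared_storage.read(log.start_offset, log.length)
                page["data"] = apply_change(page["data"], change)
            page["stale"] = False

    def loop():
        while True:
            replay_once()
            time.sleep(interval_s)

    threading.Thread(target=loop, daemon=True).start()
```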
S807, receiving notification information sent by the master device; the notification information is used to indicate that the ninth data page having the first log sequence number has been restored by the master device to the shared storage node.
Illustratively, the data page in this embodiment has a log sequence number. The corresponding log information when the data page is generated by corresponding searching and updating can be obtained according to the log serial number corresponding to the data page.
When the master node restores the ninth data page that it updated and generated to the shared storage node, notification information may be sent to the slave node to inform the slave node of the first log sequence number of the restored ninth data page.
S808, deleting the sixth log information stored at the slave node according to the notification information; the sixth log information is used for recording a data page identifier and fifth position information of the ninth data page; the fifth location information is the location of fifth data in the shared storage node; the fifth data is the data in the data page changed when the master node updated and generated the ninth data page; the second log sequence number of the sixth log information is smaller than or equal to the first log sequence number.
For example, in practical applications, since log sequence numbers increase monotonically, after the slave node determines the first log sequence number from the notification information, the ninth data page corresponding to the first log sequence number has already been stored in the shared storage node; the corresponding log information recording the content of the update that produced the ninth data page (i.e., the data page identifier of the ninth data page and the fifth location information of the data changed when updating to the ninth data page) therefore no longer needs to be stored at the slave node, avoiding the occupation of additional storage space.
In one example, when the slave node determines the sixth log information to be deleted, it may first determine, among the log information stored at the slave node, the log information having the same data page identifier as the ninth data page in the notification information, and use the determined log information as candidate log information. Then, according to the first log sequence number in the notification information, the candidate log information whose second log sequence number is smaller than or equal to the first log sequence number is taken as the sixth log information.
It can be appreciated that in this embodiment, the master node may send notification information to the slave node, so that the slave node may determine, from the log information stored in the slave node, log information that need not to be stored continuously, thereby being beneficial to reducing the storage space at the slave node.
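As an illustration of S807/S808 (the `lsn` field on each cached log record is an assumed addition to the earlier sketch, since this embodiment only states that log information is associated with log sequence numbers):

```python
def on_persist_notification(cache, page_id, persisted_lsn):
    """The master reports that the data page with log sequence number persisted_lsn has
    been flushed to the shared storage node; any cached log entries for that page whose
    sequence number is less than or equal to persisted_lsn can be dropped."""
    logs = cache.pending_logs.get(page_id, [])
    cache.pending_logs[page_id] = [log for log in logs if log.lsn > persisted_lsn]
```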
In one possible implementation, the shared storage node includes a first storage area and at least one second storage area; the first storage area is used for storing the first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page having a data page identifier corresponding to the second storage area; the sixth data is the data changed when the master node updated and generated the tenth data page.
Typically, when the master node needs to change data in the shared storage node, it generally modifies the data in its own cache first, stores the log information recording the modified content in the shared storage node, and then synchronizes the data modified in its cache to the shared storage node. In this embodiment, when the master node stores log information in the shared storage node, the log information is stored in partitions. When the master node generates log information recording the modified data content, the log information is typically further subjected to a splitting process, for example splitting the second log information into the first log information and the first data as in step S501.
Therefore, in the shared storage node, the log information that the master node needs to store can be stored in partitions. In this embodiment, the shared storage node includes a first storage area and at least one second storage area. The first log information generated by the master node may be stored in the first storage area; the split log information transmitted by the master node to the slave node each time may likewise be stored in the first storage area. Further, each second storage area in the shared storage node has a data page identifier corresponding to it and is used for storing the data update content of the data page having that data page identifier. For example, when the master node updates and generates the tenth data page and the data changed in doing so is the sixth data, the master node may store the sixth data into the second storage area corresponding to the data page identifier of the tenth data page. Likewise, the first data in the second log information may be stored in the second storage area corresponding to the first data page identifier. In practical applications, the second storage areas may also correspond one-to-one with the data page identifiers, which is not specifically limited in this embodiment.
It can be understood that in this embodiment the logs and data obtained after the master node's splitting process are stored in partitions, so that when data is subsequently looked up, it can be searched for in the second storage area corresponding to the data page according to the data page identifier of the data page to be found, thereby improving data lookup efficiency.
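An in-memory stand-in for this partitioned layout might look as follows (purely illustrative; the real shared storage node is a disk-backed store):

```python
class SharedStorageLayout:
    """One first storage area for the split log records, and one second storage
    area per data page identifier for that page's changed data."""
    def __init__(self):
        self.log_area = []            # first storage area
        self.change_areas = {}        # page_id -> second storage area (list of changes)

    def append_log(self, first_log_info):
        self.log_area.append(first_log_info)

    def append_change(self, page_id, changed_bytes):
        self.change_areas.setdefault(page_id, []).append(changed_bytes)
```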
In a possible implementation, on the basis of the method provided in any one of the foregoing embodiments, after the slave node receives the first log information sent by the master node, the first log information is further stored; specifically, the log information may be stored as follows: taking the first data page identifier in the first log information as a key; determining indication information for the first log information based on a hash algorithm and the first data page identifier, the indication information being used for indicating a storage space for the first log information; and storing the first log information in the storage space indicated by the indication information.
Illustratively, in this embodiment, when the slave node stores the first log information, the first log information may be stored in a hash storage manner. Specifically, the first data page identifier included in the first log information may be used as a key word during hash storage, the first location information corresponding to the first data in the first log information is used as a value corresponding to the key word, then hash processing is performed on the first data page identifier according to a hash algorithm, and further indication information is obtained, where the indication information may be used to indicate a storage space of the first log information, and then the first log information is stored in the storage space corresponding to the indication information.
It can be understood that, in this embodiment, when the slave node stores the first log information sent by the master node, a hash-based key-value storage manner may be used, so that the corresponding location information can subsequently be matched according to the data page identifier, which helps improve log matching efficiency at the slave node.
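A sketch of such a key-value log store (Python's built-in dict hashing stands in for the hash algorithm mentioned in the text; all names are assumptions):

```python
class SlaveLogIndex:
    """Key: first data page identifier. Value: list of (start offset, length) pairs,
    i.e. the first position information carried by the received first log information."""
    def __init__(self):
        self._buckets = {}

    def put(self, first_log_info):
        self._buckets.setdefault(first_log_info.page_id, []).append(
            (first_log_info.start_offset, first_log_info.length))

    def lookup(self, page_id):
        return self._buckets.get(page_id, [])
```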
For example, fig. 9 is a schematic diagram of yet another slave node data query provided by an embodiment of the present disclosure. As shown in fig. 9, the slave node may find in the shared storage node a data page P1 having the data page identifier of the data page currently being looked up. If the master node made multiple changes on the basis of the data page P1 while updating to generate the target data page that finally needs to be queried, the data corresponding to each change (data set 1, data set 2 and data set 3 in the figure) is acquired from the second storage area corresponding to that data page identifier (i.e., partition 1 in the figure). Then, starting from the data page P1, the data page is updated with each set of changed data in turn, so as to obtain the target data page P2 that finally needs to be queried, which is fed back to the user. When acquiring these data sets, the slave node may determine their positions from the position information contained in the cached log information that corresponds to the data page identifier of the target data page.
Fig. 10 is a flowchart of a fourth data processing method based on a shared storage architecture according to an embodiment of the disclosure. As shown in fig. 10, the method includes the steps of:
S1001, responding to a data updating request, and updating data in at least one original data page to obtain a first data page after corresponding updating of the original data page; and generating at least one second log information; the second log information is used for recording the first data and the first data page identification; the first data is changed data which is made by updating the original data page into the first data page by the master node; the first data page identity is an identity of the first data page.
The execution body in this embodiment may be a master node in the shared storage architecture, where the master node may be a computing device, specifically, the computing device may be a server (such as a cloud server, or a local server), or may be a computer, or may be a terminal device, or may be a processor, or may be a chip, or the like, and this embodiment is not limited.
Specifically, the data update request in this embodiment may be used to instruct the master node to modify one original data page, or may be used to instruct to modify a plurality of data pages, which is not limited in this embodiment.
S1002, performing segmentation processing on the second log information to obtain first log information; the first log information is used for recording a first data page identifier and first position information of first data; the first location information characterizes a location of the first data in the shared storage node.
S1003, transmitting the first log information to the slave node.
It should be noted that the technical principles of steps S1001 to S1003 in this embodiment may be described with reference to the embodiment in fig. 5, which is not described herein.
Fig. 11 is a flowchart of a fifth data processing method based on a shared storage architecture according to an embodiment of the disclosure. As shown in fig. 11, the method includes the steps of:
S1101, responding to a data updating request, and updating data in at least one original data page to obtain a first data page after corresponding updating of the original data page; and generating at least one second log information; the second log information is used for recording the first data and the first data page identification; the first data is changed data which is made by updating the original data page into the first data page by the master node; the first data page identity is an identity of the first data page.
S1102, performing segmentation processing on the second log information to obtain first log information; the first log information is used for recording a first data page identifier and first position information of first data; the first location information characterizes a location of the first data in the shared storage node; and transmits the first log information to the slave node.
For example, the specific principles of step S1101 and step S1102 may be referred to the description in the embodiment of fig. 5, and are not repeated herein.
S1103, caching the first data page into a second cache pool corresponding to the master node; the second cache pool is used for caching the data pages generated by the master node; and writing the first log information corresponding to the second log information and the first data included in the second log information into the second cache pool.
In an exemplary embodiment, after the master node updates and generates the first data page, the first data page is not directly updated to the shared storage node, but is first cached to the second cache pool corresponding to the master node, so as to avoid a phenomenon of processing resource waste caused when the master node frequently accesses the shared storage node. In addition, in order to accurately record the change from the original data page to the first data page, the master node writes the first data generated by the master node and the first log information into the second cache pool in time.
It should be noted that, step S1103 may be performed after the step of "sending the first log information to the slave node" in step S1102, or may be performed simultaneously, or may be performed before the step of "sending the first log information to the slave node", which is not particularly limited in this embodiment.
In one example, the second cache pool comprises a first cache region and at least one second cache region; the first cache region is used for storing the first log information corresponding to each piece of second log information; the second cache region is used for storing seventh data corresponding to the eleventh data page; the eleventh data page is a data page with a data page identifier corresponding to the second cache region; the seventh data is the data changed when the master node updated and generated the eleventh data page.
Illustratively, in this embodiment, the master node also stores the log information and the update data in partitions when caching them in its second cache pool. Specifically, the second cache pool includes a first cache region and at least one second cache region. When the master node changes original data pages according to a data update request, the at least one piece of first log information it generates is stored in the first cache region; that is, the first cache region records, for each data change, the position in the shared storage node of the data that was changed, together with the data page identifier of the changed data page. Each second cache region has a corresponding data page identifier: when the master node needs to store the changed data of a data page, that data can be stored in the second cache region corresponding to the data page. In practical applications, the second cache regions may correspond one-to-one with the data page identifiers, so that a given second cache region stores only data changed for the data page having its corresponding data page identifier.
It can be understood that, by the partition storage mode, the master node can conveniently perform corresponding data query.
S1104, based on a plurality of first threads, the data stored in the first cache area and the second cache area in the second cache pool are written into the shared storage node in parallel.
In this embodiment, when the second buffer pool includes a first buffer area and at least one second buffer area, and the master node needs to transfer the data in the first buffer area and the second buffer area to the shared storage node, the master node may simultaneously create a plurality of first threads, and write the data correspondingly stored in the first buffer area and the at least one second buffer area into the shared storage node in parallel, so as to improve the writing efficiency of writing the data into the shared storage node.
In one example, when the master node writes the data in the cache regions into the shared storage node based on the first threads, and the sector size of the shared storage node's disk is 4K (i.e., 4096 bytes), the data can be written to the disk of the shared storage node in a 4K-aligned manner. Since one data page can generally store 16K of data, a first thread may extract 16K of data at a time and write it to the disk 4K-aligned, which improves the subsequent read and write performance of the disk.
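A minimal sketch of such 4K-aligned writes (the padding strategy and the assumption that each write starts on a sector boundary are illustrative choices, not requirements stated by this embodiment):

```python
SECTOR = 4096            # 4K disk sector, as in the example above
PAGE_SIZE = 4 * SECTOR   # 16K of data handled by a first thread per round

def write_4k_aligned(fileobj, offset, buf):
    """Write buf starting at a sector-aligned offset, 16K at a time, padding the tail
    so that every write starts and ends on a 4K boundary."""
    assert offset % SECTOR == 0, "writes are assumed to start on a sector boundary"
    for pos in range(0, len(buf), PAGE_SIZE):
        chunk = buf[pos:pos + PAGE_SIZE]
        if len(chunk) % SECTOR:
            chunk += b"\x00" * (SECTOR - len(chunk) % SECTOR)   # pad to 4K
        fileobj.seek(offset + pos)
        fileobj.write(chunk)
```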
In one example, a shared storage node includes a first storage area and a second storage area; the first storage area is used for storing first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page having a data page identification corresponding to the second storage area; the sixth data is the data updated by the master node when the tenth data page is generated.
Illustratively, in this embodiment, the shared storage node may also have a first storage area and a second storage area divided therein. The details in this example may refer to the description about the shared storage node partition in the embodiment shown in fig. 8, and will not be described herein.
In one example, the first storage area corresponds to the first cache area; the second buffer areas are in one-to-one correspondence with the second storage areas.
Illustratively, in this embodiment, the number of the second storage areas in the shared storage node is at least one. And when the second buffer pool corresponding to the master node is divided into a first buffer area and at least one second buffer area, the first storage area in the shared storage node and the first buffer area of the master node correspond to each other, that is, the data buffered in the first buffer area by the master node can be written into the first storage area in the shared storage node subsequently. Similarly, the data buffered in the second buffer area by the master node may be subsequently written into the second storage area in the shared storage node, which corresponds to the second buffer area one by one.
S1105, determining first writing information and second writing information; the first writing information is used for indicating the first sequence number corresponding to the first log information in the first cache region that has been written into the shared storage node; the first sequence number is the global log sequence number of the data update request that triggered the master node to generate fourth log information, the fourth log information being the second log information corresponding to that first log information; the second writing information is used for indicating the second sequence number corresponding to the seventh data in the second cache region that has been written into the shared storage node; the second sequence number is the global log sequence number of the data update request that triggered the master node to generate fifth log information, the fifth log information including the seventh data; the data update request has a first global log sequence number.
Illustratively, before the master node persists a data page it has generated by updating, it first needs to ensure that the log information corresponding to that data page has been written into the shared storage node. In this embodiment, when the master node writes the log information it generated into the shared storage node via the plurality of first threads, and then wishes to persistently store an updated data page in the shared storage node, it further needs to determine whether the writing of the corresponding log information into the shared storage node has been completed (i.e., the content of step S1105 above).
Specifically, when the above determination is performed, it is first necessary to determine the serial numbers of the logs that have been written into the shared storage node in the first buffer area and the second buffer area, respectively.
The first cache region stores the first log information generated after each splitting process by the master node, and each piece of first log information has a corresponding first sequence number. The first sequence number of a piece of first log information is determined by the data update request that triggered the whole chain of operations through which it was produced: the request triggers the data update, the update produces the second log information, and splitting the second log information produces the first log information. When a data update request requires the data of multiple data pages to be changed, multiple pieces of first log information are generated accordingly, and the first sequence numbers of these pieces of first log information are all the global log sequence number corresponding to that data update request.
Similarly, the data stored in the second cache region is the data modified by the master node when updating data pages under the trigger of a data update request. Therefore, when determining the second sequence number corresponding to the seventh data in the second cache region that has been written into the shared storage node, the global log sequence number of the data update request that triggered the master node to generate the fifth log information recording the seventh data may be used as the second writing information corresponding to that second cache region.
S1106, determining a global persistence log sequence number according to the first writing information and the second writing information; the global persistence log sequence number characterizes that all sixth log information corresponding to the first request is transferred to the shared storage node; the first request is a data update request having a second global log sequence number; the second global log sequence number is smaller than or equal to the global persistence log sequence number; the sixth log information is used to record the data page identification and the adjusted data of the data page for which the first request triggers an update.
For example, since different data update requests have different global log sequence numbers, after the master node determines the first writing information of the first cache region and the second writing information of each second cache region, it can determine, from the sequence numbers indicated by this writing information, for which data update requests all of the content generated and stored in the first cache region and the second cache regions has been fully persisted in the shared storage node. That is, it needs to determine for which data update requests the updated data pages they triggered can now be written to the shared storage node to complete persistent storage.
The global persistent log sequence number characterizes that all contents generated and stored in the first cache area and the second cache area corresponding to a data update request (i.e., a first request) with a second global log sequence number are subjected to persistent storage, wherein the second global log sequence number is smaller than or equal to the global persistent log sequence number. The sixth log information may be understood as log information generated by the master node under the triggering of the first request, and specifically may include a data page identifier of the modified data page indicated by the first request, modified data made during the modification, and location information of the modified data.
For example, if the first writing information indicates that the first sequence number corresponding to the data already persisted is 10, and the second writing information indicates that the second sequence number corresponding to the data already persisted is 11, then it can be determined from the first writing information that the content generated and stored in the first cache region under data update requests with a global log sequence number less than or equal to 10 has been persisted, and from the second writing information that the content generated and stored in the second cache region under data update requests with a global log sequence number less than or equal to 11 has been persisted. By comparing the two, it can be determined that, for data update requests with a global log sequence number less than or equal to 10 (10 can thus be regarded as the global persistence log sequence number), all of the content generated and stored in the first cache region and the second cache region has been persisted, and the corresponding data pages can then be persisted as well.
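In other words, the global persistence log sequence number can be taken as the minimum of the per-region persisted sequence numbers, as in this small sketch (function and parameter names are illustrative):

```python
def global_persistence_lsn(first_region_lsn, second_region_lsns):
    """Each cache region reports the global log sequence number up to which its content
    has been flushed; the global persistence log sequence number is the minimum, i.e.
    every data update request at or below it has all of its log information durably
    stored in the shared storage node."""
    return min([first_region_lsn] + list(second_region_lsns))

# Worked example matching the text: first cache region flushed up to 10,
# a second cache region flushed up to 11 -> data pages of requests <= 10 may be flushed.
assert global_persistence_lsn(10, [11]) == 10
```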
S1107, determining a data page to be written in a second cache pool according to the global persistence log serial number; the third log serial number corresponding to the data page to be written is smaller than the global persistence log serial number.
Illustratively, after the master node determines the global persistence log sequence number, it searches the second cache pool for data pages to be written whose third log sequence number is smaller than the global persistence log sequence number, where a data page to be written is a page that the master node generated by updating under a data update request and cached in the second cache pool. It should be noted that the third log sequence number corresponding to a data page to be written is the global log sequence number of the data update request that triggered the master node to update and generate that data page.
S1108, updating the data page to be written into the shared storage node.
The master node may, for example, after determining the data page to be written, further update the data page to be written into the shared storage node. It should be noted that, after the master node updates the data page to be written into the shared storage node, notification information may be sent to the slave node, so as to inform that the data page to be written has completed the data page persistent storage, so that the slave node may delete the log information cached by the slave node. Here, the operation of the slave node performing log deletion according to the notification information may be referred to the description at steps S807-S808, and will not be described here.
It can be understood that in this embodiment, when the master node writes log information into the shared storage node in parallel, determining the first writing information of the first cache region and the second writing information of the second cache regions in the master node's second cache pool makes it possible to determine for which data update requests all of the log information they triggered has been transferred to the shared storage node. Only after all log information generated under a data update request has been transferred to the shared storage node does the master node transfer the updated data pages generated by that request to the shared storage node. That is, the master node asynchronously writes updated data pages into the shared storage node only after ensuring that all the corresponding log information has been written, so that if the data pages in the second cache pool are lost due to a device failure, the master node can recover them from the log information stored in the shared storage node.
In one possible implementation, the data update request has a first global log sequence number; the shared storage node further comprises: a third storage area and a fourth storage area; the third storage area comprises N first subareas; the fourth storage area comprises N second subareas; the first subareas are in one-to-one correspondence with the second subareas; n is a positive integer;
On the basis of the embodiment, the method further comprises the following steps:
determining at least one fourth location information and at least one fifth location information corresponding to the data update request; the fourth position information is the position of the first log information in the second cache pool; the fifth position information is the position of the first data in the second cache pool; writing a first global log sequence number into a first sub-region; and writing the fourth position information and the fifth position information corresponding to the data updating request into the second subarea corresponding to the first subarea.
In this embodiment, when the master node writes the first log information and the first data generated under the trigger of the data update request into the corresponding cache areas of the second cache pool, it further determines the location of the first log information in the second cache pool (i.e. the fourth location information) and the location of the first data in the second cache pool (i.e. the fifth location information), and stores them correspondingly into the shared storage node. It should be noted that, since a data update request may trigger modification of multiple data pages, multiple pieces of fourth location information and fifth location information may be generated.
Specifically, to store the above information, a third storage area and a fourth storage area may be provided in the shared storage node, where the third storage area may be divided into N first sub-areas and the fourth storage area may be divided into N second sub-areas.
When the master node stores the first global log sequence number corresponding to the data update request and correspondingly determines the fourth location information and the fifth location information, the first global log sequence number may be stored in a first sub-area in the third storage area. And simultaneously, storing fourth position information and fifth position information corresponding to the first global log sequence number into a second subarea, wherein the second subarea is a second subarea corresponding to the first subarea for storing the first global log sequence number.
In this embodiment, the log sequence number corresponding to the data update request is stored in correspondence with the fourth location information and the fifth location information of the first log information generated under the trigger of that request. Thus, after a subsequent failure of the master node is repaired, the meaning of the content stored in the second cache pool of the master node can be determined from the location information recorded in the shared storage node, and it can be determined where the content of the log generated by the next data update request should start to be stored.
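The pairing between a first sub-area and its second sub-area might be recorded as sketched below; the layout and names are assumptions made only for illustration.

```python
def record_request(shared_storage, slot, first_global_lsn, fourth_locations, fifth_locations):
    """Write one request's LSN and its location information into paired sub-areas."""
    # slot selects the i-th first sub-area and, via the one-to-one mapping,
    # the i-th second sub-area.
    shared_storage.setdefault("third_area", {})[slot] = first_global_lsn
    shared_storage.setdefault("fourth_area", {})[slot] = {
        "log_locations": fourth_locations,   # positions of first log information in the second cache pool
        "data_locations": fifth_locations,   # positions of first data in the second cache pool
    }
```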
For example, fig. 12 is a schematic view of a scenario of data storage according to an embodiment of the present disclosure. As shown in fig. 12, in the figure, log information 1 includes data (i.e., data segment 1 in the figure) of which the master node makes a change to data page P2 under the trigger of data update request 1 and data (i.e., data segment 2 in the figure) of which the master node makes a change to data page P1 under the trigger of data update request 1. Included in the log information 2 are data of changes made to the data page P1 by the master node (i.e., data segment 3 in the figure) triggered by the data update request 2, data of changes made to the data page P2 (i.e., data segment 4 in the figure) triggered by the data update request 2, and data of changes made to the data page P3 (i.e., data segment 5 in the figure) triggered by the data update request 2. Wherein, the time of the master node responding to the data update request 1 is before the time corresponding to the master node responding to the data update request 2. Wherein the data page P1 can be understood as a data page with a data page identification a; the data page P2 can be understood as a data page with a data page identification B; the data page P3 can be understood as a data page with a data page identification C.
In addition, the master node may set a plurality of second buffer areas in its second cache pool, the second buffer areas corresponding one-to-one with data page identifiers. As shown in fig. 12, assume that the data page identifier corresponding to the second buffer area 1 is data page identifier A, the data page identifier corresponding to the second buffer area 2 is data page identifier B, and the data page identifier corresponding to the second buffer area 3 is data page identifier C. The master node then correspondingly stores data segment 2 and data segment 3 (both changes to data page P1) into the second buffer area 1, stores data segment 1 and data segment 4 (both changes to data page P2) into the second buffer area 2, and stores data segment 5 into the second buffer area 3. With this partitioned storage, the master node can subsequently set multiple threads to write the contents of the multiple cache areas into the shared storage node in parallel, improving writing efficiency.
The second cache pool of the master node further includes a first buffer area, which is mainly used to store the log information (for example, the first log information in the above embodiments) obtained by the master node after performing the segmentation processing on the log information initially generated under the trigger of a data update request. Taking the first log information as an example, the data page identifier contained in it may specifically be the tablespace identifier and the page number of the page, and the association between a data page identifier and a second buffer area can be established by performing modulo processing on the page number in the data page identifier. Further, the first location information in the first log information may include only the data amount corresponding to the first data. In this example, the first buffer area stores log information 3 to log information 7, where log information 3 records the location information of data segment 1 in the shared storage node and the data page identifier of data page P2, and log information 4 records the location information of data segment 2 in the shared storage node and the data page identifier of data page P1. The remaining log information 5-7 is similar and is not repeated here.
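A minimal sketch of the modulo-based association between a data page identifier and a second buffer area follows; the tuple layout of the identifier and the parameter names are assumptions.

```python
def pick_second_buffer(page_id, num_second_buffers):
    """Choose the second buffer area for a data segment by its page identifier."""
    tablespace_id, page_no = page_id
    # Segments that modify the same page always land in the same buffer area,
    # so they can later be flushed by the same writer thread.
    return page_no % num_second_buffers

# Example: with 3 second buffer areas, page (tablespace 0, page_no 7) maps to area index 1.
assert pick_second_buffer((0, 7), 3) == 1
```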
In the figure, the square-shaded portions in the first buffer area and the second buffer areas represent data in each cache area that has already been persisted to the shared storage node, while the diagonally shaded portions represent data that is still only cached. The first writing information and the second writing information corresponding to each second buffer area are then determined according to the persisted content in each cache area. By determining the first writing information and the second writing information, it can be determined in time for which data update requests received by the master node all of the triggered log information has been written into the shared storage node, which in turn helps determine which data pages in the master node can be written into the shared storage node.
In addition, when writing the data in each buffer area into the shared storage node, the master node may adopt a multi-threaded parallel writing mode to improve the writing speed. As shown in fig. 12, the shared storage node includes a first storage area and second storage areas corresponding one-to-one with the second buffer areas.
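A minimal sketch of writing the cache areas in parallel with a thread pool; write_area and the list of areas are assumed stand-ins for the real writer threads and buffers.

```python
from concurrent.futures import ThreadPoolExecutor

def flush_all_areas(areas, write_area):
    """Flush every cache area to the shared storage node in parallel."""
    # One task per cache area; each area maps to its own storage area on the
    # shared storage node, so the parallel writes do not conflict.
    with ThreadPoolExecutor(max_workers=len(areas)) as pool:
        futures = [pool.submit(write_area, idx, data) for idx, data in enumerate(areas)]
        for f in futures:
            f.result()   # re-raise any write error
```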
In addition, fig. 13 is a schematic diagram of an architecture of a shared storage node according to an embodiment of the disclosure. As shown in fig. 13, the shared storage node includes a third storage area and a fourth storage area. The third storage area includes a plurality of first sub-areas, each used to store the first global log sequence number corresponding to a data update request received by the master node. The fourth storage area includes a plurality of second sub-areas, which correspond one-to-one with the first sub-areas (as indicated by the dashed arrows).
When the master node stores data in the manner shown for the first buffer area and the second buffer areas in fig. 12, each second sub-area further includes a plurality of partitions, the number of which equals the sum of the number of first buffer areas and the number of second buffer areas. For example, with 1 first buffer area and 3 second buffer areas as shown in fig. 12, each second sub-area contains 4 partitions. The first partition of a second sub-area stores the location of the data that the data update request corresponding to that second sub-area generated and stored in the first buffer area; the second partition stores the location of the data that the same request generated and stored in the first second buffer area; and so on for the remaining partitions. If a data update request stores no data in a given cache area, the partition corresponding to that cache area may be left unfilled or written with a preset character, so that the absence of data in that cache area is indicated. The data update request corresponding to a second sub-area is the data update request corresponding to the first global log sequence number stored in the first sub-area paired with that second sub-area. The fourth location information and the fifth location information in the above embodiments are thus recorded per cache area through this partitioning of the second sub-areas.
In this embodiment, when the master node records the location information of the log information generated under the trigger of a data update request, and the second cache pool stores that information in the partitioned manner shown in fig. 12, the second sub-areas in the shared storage node are likewise divided into partitions, so that the locations in the second cache pool of the data generated by the request and stored in each cache area are recorded partition by partition. This helps subsequently determine quickly, from the correspondence between the first sub-areas and the second sub-areas and the location information stored in each partition, where the content of the log generated by a data update request is located in the second cache pool. For example, in fig. 13, the first first sub-area in the third storage area corresponds to the first second sub-area in the fourth storage area, and the second first sub-area in the third storage area corresponds to the second second sub-area in the fourth storage area. When the space occupied by each first sub-area is fixed, the space occupied by each second sub-area is fixed, and the space occupied by each partition within a second sub-area is also fixed, then if it is determined that the global log sequence number corresponding to a data update request is stored in the 3rd first sub-area of the third storage area, the per-cache-area locations of the log content generated by that request and stored in the second cache pool can be read directly from the 3rd second sub-area of the fourth storage area.
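Assuming fixed sizes for the sub-areas and partitions (the concrete sizes below are illustrative assumptions only), the offset arithmetic described above might look like this:

```python
FIRST_SUB_SIZE = 8             # one 8-byte global LSN per first sub-area
PARTITIONS_PER_SECOND_SUB = 4  # 1 first buffer area + 3 second buffer areas
PARTITION_SIZE = 16            # fixed room for one location record
SECOND_SUB_SIZE = PARTITIONS_PER_SECOND_SUB * PARTITION_SIZE

def first_sub_offset(index):
    """Byte offset of the index-th first sub-area inside the third storage area."""
    return index * FIRST_SUB_SIZE

def partition_offset(index, partition):
    """Byte offset of one partition of the index-th second sub-area inside the fourth storage area."""
    # The i-th first sub-area corresponds to the i-th second sub-area, so the
    # same index addresses both; partition selects the cache-area slot.
    return index * SECOND_SUB_SIZE + partition * PARTITION_SIZE

# Example: the 3rd first sub-area (index 2) pairs with the 3rd second sub-area,
# whose partition 1 starts at byte 2*64 + 1*16 = 144.
assert partition_offset(2, 1) == 144
```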
Fig. 14 is a schematic diagram of a data processing flow provided in an embodiment of the disclosure. As shown in fig. 14, after the master node performs a data page update (for example, when data in a data page is modified, deleted or added), second log information is generated correspondingly, which records the first data changed by the update and the data page identifier of the updated data page. The master node then performs segmentation processing on the second log information, that is, extracts the storage location of the first data and the data page identifier from the second log information as the first log information. The master node stores the generated first log information and the first data, partitioned, into the first buffer area and at least one second buffer area of the second cache pool corresponding to the master node, that is, it stores the content of the log generated by its update in a partitioned manner. The master node may then store the contents of the first buffer area and the second buffer areas in the shared storage node; referring to fig. 12, the log information generated under the trigger of one data update request is divided into a plurality of data segments and then stored in the corresponding cache areas. When writing the cache area contents into the shared storage node, the contents of the cache areas in the second cache pool are written into the shared storage node in parallel through a plurality of threads; compared with the log storage mode in the related art that does not use partitioned storage, the mode in this embodiment can improve data writing efficiency. In addition, the master node may send the first log information stored in the first buffer area to the slave node through a log sending thread, and a log receiving thread in the slave node receives it. That is, the first log information sent by the master node to the slave node does not include the first data; compared with directly sending the second log information that includes the first data, the mode in this embodiment saves data transmission volume between the master node and the slave node and reduces the occupation of the slave node's storage space.
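A minimal sketch of the segmentation step follows, with hypothetical record layouts: the second log information is split into slim first log information (page identifier plus location information) and the raw first data.

```python
from dataclasses import dataclass

@dataclass
class SecondLogInfo:
    page_id: tuple      # (tablespace id, page number)
    first_data: bytes   # changed data produced by the update

@dataclass
class FirstLogInfo:
    page_id: tuple
    data_length: int    # first location information: amount of data occupied

def split_log(entry: SecondLogInfo):
    """Segment one second log record into first log information plus first data."""
    # Only the slim FirstLogInfo is sent to the slave node, which cuts the
    # master-slave transfer volume; the bulky first data stays on the master
    # and is written to the shared storage node.
    return FirstLogInfo(entry.page_id, len(entry.first_data)), entry.first_data
```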
After the master node obtains the first log information and the first data generated by segmentation and writes them, partitioned, into its second cache pool, it further determines the writing information corresponding to each cache area contained in the second cache pool, that is, determines which data in each cache area has already been persistently written into the shared storage node, so as to determine the global persistence log sequence number. Then, according to the global persistence log sequence number, it determines the data pages that can currently be written into the shared storage node and writes them into the shared storage node. It can be understood that, on the master node side, the segmentation processing of the log information reduces the data transmission volume between the master node and the slave node; after writing the first data and the first log information into the second cache pool in partitions, the master node determines, from the global persistence log sequence number, the data pages that can be written into the shared storage node, so that updated data pages are written into the shared storage node asynchronously only after the corresponding log information has been written. If data stored in the master node's cache pool is subsequently lost, the lost data pages can be regenerated by replaying the log information in the shared storage node.
On the slave node side, after the first log information is received, the log receiving thread in the slave node sends it to the playback coordination thread in the slave node. Based on the received first log information, the playback coordination thread searches the slave node's first cache pool for a data page with the data page identifier carried in the first log information; if such a page exists, it marks the page with a first identifier to indicate that the data in the page needs to be updated, i.e. that the data in the page is old. The playback coordination thread also stores the received first log information in the corresponding storage area. Later, when a user thread in the slave node receives a data query request, it can determine the target data to be queried according to the data pages cached in the slave node's first cache pool, the log information sent by the master node, and the data in the shared storage node; the specific query process may refer to the embodiments shown in fig. 6 and fig. 7. That is, in this embodiment, the user thread in the slave node executes the data query operation under the trigger of the data query request and determines the target data to be queried. Because the slave node performs the query with the help of the log information sent by the master node, the situation in which the data page finally obtained under the trigger of a data query request still contains un-updated data, i.e. the slave node obtaining a stale data page, can be avoided. In one possible implementation, the slave node performs the log playback operation only when, after receiving a data query request, it determines that the data page initially found in its cache pool is a stale data page; this ensures that the data in the finally acquired data page is up to date while avoiding invalid log playback operations on the slave node side, where an invalid log playback operation is the playback of a data page that the slave node does not need to query.
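A minimal sketch of this slave-side behaviour under assumed helper names (read_page, read_data and apply_change are placeholders, not patent terms): pages are merely marked stale when log information arrives, and replay happens only when a query actually touches a stale page.

```python
def on_first_log_info(first_cache_pool, log_index, info):
    """Record incoming first log information and mark any cached copy of the page stale."""
    log_index.setdefault(info.page_id, []).append(info)
    if info.page_id in first_cache_pool:
        first_cache_pool[info.page_id]["stale"] = True   # the "first identifier"

def apply_change(old_page, change):
    # Placeholder for the real page-update (replay) logic.
    return change

def query(first_cache_pool, log_index, shared_storage, page_id):
    """Return up-to-date page data, replaying logs only if the page is stale."""
    page = first_cache_pool.get(page_id)
    if page is None:
        page = {"data": shared_storage.read_page(page_id), "stale": True}
        first_cache_pool[page_id] = page
    if page["stale"]:
        # Replay is deferred to query time, so log information for pages that
        # nobody reads is never replayed (no invalid playback operations).
        for info in log_index.pop(page_id, []):
            change = shared_storage.read_data(info)   # fetch first data by its recorded location
            page["data"] = apply_change(page["data"], change)
        page["stale"] = False
    return page["data"]
```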
In addition, the slave node may be provided with a background playback thread. The background playback thread may periodically perform playback processing on the data page in the first cache pool, where the data page is identified as old data, so as to update the data in the data page in the first cache pool to the latest data. Furthermore, by the above processing method of periodic playback, the probability that log playback operation is still required in the data query process of the slave node can be reduced, and the probability that the data in the data page acquired from the first cache pool by the slave node is the latest data can be improved, so that the data query efficiency can be improved. The specific implementation principle may refer to the description at step S804 in the above embodiment, which is not repeated here.
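A minimal sketch of such a background playback loop is given below; the polling interval and the replay_page helper are assumptions made for illustration.

```python
import threading
import time

def background_playback(first_cache_pool, replay_page, stop_event, interval_s=1.0):
    """Periodically replay stale pages in the first cache pool until stopped."""
    while not stop_event.is_set():
        for page_id, page in list(first_cache_pool.items()):
            if page.get("stale"):
                replay_page(page_id)   # brings the page's data up to date
        time.sleep(interval_s)

# Typical use (sketch): run it as a daemon thread on the slave node.
# stop = threading.Event()
# threading.Thread(target=background_playback, args=(pool, replay_page, stop), daemon=True).start()
```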
Exemplary Medium
Having described the method of the exemplary embodiments of the present disclosure, next, a storage medium of the exemplary embodiments of the present disclosure will be described with reference to fig. 15.
Referring to fig. 15, a storage medium 150 stores a program product for implementing the above-described method according to an embodiment of the present disclosure. The program product may employ a portable compact disc read-only memory (CD-ROM) and include computer-executable instructions for causing a computing device to perform the data processing method based on the shared storage architecture provided by the present disclosure. However, the program product of the present disclosure is not limited thereto.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave in which the computer-executable instructions are carried. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium.
Computer-executable instructions for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-executable instructions may be executed entirely on the user computing device, partly on the user device, partly on the remote computing device, or entirely on the remote computing device or server. In the context of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary apparatus
Having described the media of the exemplary embodiments of the present disclosure, a data processing device 1600 based on a shared storage architecture according to the exemplary embodiments of the present disclosure will be described with reference to fig. 16, so as to implement the method in any of the foregoing method embodiments, and the implementation principle and technical effect are similar, and will not be repeated herein.
Fig. 16 is a schematic device structure of a data processing device based on a shared memory architecture according to an embodiment of the disclosure, where the device is applied to a slave node, and the device includes:
a first receiving unit 1601, configured to receive first log information sent by a master node; the first log information is obtained by the segmentation processing of the second log information by the master node; the second log information is used for recording the first data and the first data page identification; the first data is changed data which is made by updating the original data page into the first data page by the master node; the first data page identifier is an identifier of the first data page; the first log information is used for recording a first data page identifier and first position information of first data; the first location information characterizes a location of the first data in the shared storage node.
A first determining unit 1602, configured to add a first identifier to a second data page if it is determined that the second data page exists in the first cache pool according to the first log information; the first identifier characterizes that the data in the first data page and the data in the second data page are different; the second data page and the first data page have the same data page identification; the first cache pool is used for storing data pages cached from the nodes.
A second determining unit 1603, configured to determine target data according to the first buffer pool in response to the data query instruction.
In one example, the second determining unit is specifically configured to: responding to the data query instruction, and if the first cache pool comprises a third data page and the third data page does not have the second identifier, determining the data in the third data page as target data; the third data page and the target data page requested to be queried by the data query command have the same data page identification; the second identifier characterizes that the data in the third data page is not identical to the data in the target data page generated by the master node update.
In one example, the second determining unit is specifically configured to: responding to the data query instruction, and if the first cache pool comprises a fourth data page and the fourth data page is provided with a third identifier, determining third log information corresponding to the fourth data page; the fourth data page and the target data page requested to be queried by the data query command have the same data page identification; the third identifier characterizes that the data in the fourth data page is different from the data in the target data page generated by updating the master node; the third log information is used for recording a second data page identifier and second position information of the fourth data page; the second location information is the location of the second data in the shared storage node; the second data is changed data which is made by the main node updating the fourth data page into the target data page;
Acquiring second data according to the second position information; and updating the data in the fourth data page according to the second data to obtain target data.
In one example, the second determining unit is specifically configured to: responding to the data query instruction, and if the first cache pool does not comprise the fifth data page, determining fourth log information, wherein the fourth log information is used for recording a third data page identifier and third position information of the target data page; the third position information is the location of the third data in the shared storage node; the third data is changed data which is made by the master node updating the sixth data page to the target data page; the sixth data page and the target data page have the same data page identifier; the target data page is the data page requested to be queried by the data query instruction;
and acquiring third data and a sixth data page in the shared storage node according to the third data page identifier and the third position information.
And updating the data in the sixth data page according to the third data to obtain target data.
In one example, the apparatus further comprises:
the first acquisition unit is used for responding to the log playback request, and if the first cache pool comprises a seventh data page with a fourth identifier, fifth log information corresponding to the seventh data page is acquired; the fourth identifier characterizes that the data in the seventh data page is different from the data in the eighth data page generated by updating the master node; the eighth data page and the seventh data page have the same data page identification; the fifth log information is used for recording a fourth data page identifier and fourth position information of the seventh data page; the fourth location information is the location of fourth data in the shared storage node; the fourth data is the data changed by the master node to update the eighth data page with the seventh data page.
And the second acquisition unit is used for acquiring fourth data in the shared storage node according to the fourth position information.
And the first updating unit is used for updating the data in the seventh data page according to the fourth data to obtain an updated seventh data page.
In one example, the apparatus further comprises:
the second receiving unit is used for receiving the notification information sent by the main equipment; the notification information is used to indicate that the ninth data page having the first log sequence number has been restored by the master device to the shared storage node.
A deleting unit, configured to delete the sixth log information received by the slave node according to the notification information; the sixth log information is used for recording a data page identifier and fifth position information of the ninth data page; the fifth position information is the location of fifth data in the shared storage node; the fifth data is the data in the data page changed when the master node updates and generates the ninth data page, and the second log sequence number of the sixth log information is smaller than or equal to the first log sequence number.
In one example, a shared storage node includes a first storage area and at least one second storage area; the first storage area is used for storing the first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page having a data page identification corresponding to the second storage area; the sixth data is the data updated by the master node when the tenth data page is generated.
In one example, the first location information includes: data amount and start position information of the first data; the initial position information is used for indicating an initial storage position of the first data in the shared storage node; the data amount of the first data is used to indicate the amount of storage space occupied by the first data.
In one example, the apparatus further comprises:
and a third determination unit configured to use the first data page identifier in the first log information as a key.
A fourth determining unit configured to determine indication information of the first log information based on the hash algorithm and the first data page identifier; the indication information is used for indicating a storage space of the first log information.
And a storage unit for storing the first log information in a storage space indicated by the indication information.
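As a minimal sketch of the hash-based placement performed by the units above, the first data page identifier can serve as the hash key that selects the storage slot for incoming first log information; the number of slots and the use of Python's built-in hash are assumptions.

```python
def log_slot(page_id, num_slots):
    """Map a first data page identifier to a storage slot for its first log information."""
    # hash() is process-local in Python; a real system would use a stable hash
    # so that the mapping survives restarts.
    return hash(page_id) % num_slots

def store_first_log(slots, page_id, info):
    """Append first log information to the slot indicated by its page identifier."""
    slots[log_slot(page_id, len(slots))].append(info)

# Example: 8 slots, each holding a list of first log information entries.
slots = [[] for _ in range(8)]
store_first_log(slots, (0, 7), {"page_id": (0, 7), "data_length": 64})
```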
Fig. 17 is a schematic structural diagram of yet another data processing apparatus based on a shared storage architecture according to an embodiment of the present disclosure, where an apparatus 1700 is applied to a master node, and the apparatus includes:
a second updating unit 1701, configured to respond to a data update request, and update data in at least one original data page to obtain a first data page after the original data page is updated correspondingly; and generating at least one second log information; the second log information is used for recording the first data and the first data page identification; the first data is changed data which is made by updating the original data page into the first data page by the master node; the first data page identity is an identity of the first data page.
The segmentation unit 1702 is configured to perform segmentation processing on the second log information to obtain first log information; the first log information is used for recording a first data page identifier and first position information of first data; the first location information characterizes a location of the first data in the shared storage node.
A transmitting unit 1703 for transmitting the first log information to the slave node.
In one example, the apparatus further comprises:
the first caching unit is used for caching the first data page into a second caching pool corresponding to the master node; the second cache pool is used for caching data pages generated by the master node.
The second buffer unit is used for writing the first log information corresponding to the second log information and the first data included in the second log information into the second buffer pool.
In one example, the second buffer pool includes a first buffer region and at least one second buffer region; the first buffer area is used for storing first log information corresponding to each second log information; the second buffer area is used for storing seventh data corresponding to the eleventh data page; the eleventh data page is a data page with a data page identifier corresponding to the second buffer area; the seventh data is updated data corresponding to the update when the eleventh data page is generated by the master node.
In one example, further comprising:
and the first writing unit is used for writing the data stored in the first cache area and the second cache area in the second cache pool into the shared storage node in parallel based on the plurality of first threads.
In one example, a shared storage node includes a first storage area and a second storage area; the first storage area is used for storing first log information generated by the master node; the second storage area is used for storing sixth data corresponding to the tenth data page; the tenth data page is a data page having a data page identification corresponding to the second storage area; the sixth data is the data updated by the master node when the tenth data page is generated.
In one example, the first storage area corresponds to the first cache area; the second buffer areas are in one-to-one correspondence with the second storage areas.
In one example, the data update request has a first global log sequence number; the apparatus further comprises:
a fifth determining unit configured to determine the first writing information and the second writing information; the first writing information is used for indicating a first serial number corresponding to first log information written into the shared storage node in the first cache region; the first serial number is a global log serial number which is used for triggering the master node to generate a data update request of fourth log information; the fourth log information is second log information corresponding to the first log information; the second writing information is used for indicating a second serial number corresponding to the seventh data written in the shared storage node in the second cache area; the second serial number is a global log serial number which triggers the master node to generate a data update request of fifth log information; the fifth log information includes seventh data.
A sixth determining unit, configured to determine a global persistence log sequence number according to the first writing information and the second writing information; the global persistence log sequence number characterizes that all sixth log information corresponding to the first request is transferred to the shared storage node; the first request is a data update request having a second global log sequence number; the second global log sequence number is smaller than or equal to the global persistence log sequence number; the sixth log information is used to record the data page identification and the adjusted data of the data page for which the first request triggers an update.
A seventh determining unit, configured to determine, according to the global persistent log sequence number, a data page to be written in the second cache pool; the third log serial number corresponding to the data page to be written is smaller than the global persistence log serial number.
And the second writing unit is used for updating the data page to be written into the shared storage node.
In one example, the data update request has a first global log sequence number; the shared storage node further comprises: a third storage area and a fourth storage area; the third storage area comprises N first subareas; the fourth storage area comprises N second subareas; the first subareas are in one-to-one correspondence with the second subareas; n is a positive integer.
The apparatus further comprises:
an eighth determining unit, configured to determine at least one fourth location information and at least one fifth location information corresponding to the data update request; the fourth position information is the position of the first log information in the second cache pool; the fifth location information is a location of the first data in the second buffer pool.
And the third writing unit is used for writing the first global log sequence number into the first subarea.
And the fourth writing unit is used for writing the fourth location information and the fifth location information corresponding to the data update request into the second sub-area corresponding to the first sub-area.
Exemplary computing device
Having described the methods, media, and apparatus of exemplary embodiments of the present disclosure, a computing device of exemplary embodiments of the present disclosure is next described with reference to fig. 18.
The computing device 180 shown in fig. 18 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 18, the computing device 180 is in the form of a general purpose computing device. Components of computing device 180 may include, but are not limited to: at least one processing unit 1801, at least one memory unit 1802, and a bus 1803 that connects the various system components, including the processing unit 1801 and the memory unit 1802. The at least one memory unit 1802 stores computer-executable instructions, and the at least one processing unit 1801 executes those instructions to implement the methods described above.
The bus 1803 includes a data bus, a control bus, and an address bus.
The storage unit 1802 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 18021 and/or cache memory 18022, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 18023.
The storage unit 1802 may also include a program/utility 18025 having a set (at least one) of program modules 18024, such program modules 18024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The computing device 180 may also communicate with one or more external devices 1804 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 1805. Moreover, the computing device 180 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through the network adapter 1806. As shown in fig. 18, the network adapter 1806 communicates with other modules of the computing device 180 over bus 1803. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 180, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the data processing apparatus based on a shared storage architecture are mentioned in the above detailed description, such a division is only exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in these aspects cannot be combined to advantage; this division is made for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A data processing method based on a shared storage architecture, the method being applied to a slave node, the method comprising:
receiving first log information sent by a master node; the first log information is obtained by the master node performing segmentation processing on the second log information; the second log information is used for recording first data and a first data page identifier; the first data is changed data which is made by the main node to update the original data page into the first data page; the first data page identifier is an identifier of the first data page; the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
if the second data page exists in the first cache pool according to the first log information, adding a first identification for the second data page; the first identifier characterizes that the data in the first data page and the data in the second data page are not identical; the second data page and the first data page have the same data page identification; the first cache pool is used for storing the data pages cached by the slave nodes;
And responding to a data query instruction, and determining target data according to the first cache pool.
2. The method of claim 1, wherein determining target data according to the first cache pool in response to a data query instruction comprises:
responding to a data query instruction, and if the first cache pool comprises a third data page and the third data page does not have a second identifier, determining that the data in the third data page is target data; the third data page and the target data page requested to be queried by the data query instruction have the same data page identification; the second identifier characterizes that the data in the third data page is different from the data in the target data page generated by the master node update.
3. The method of claim 1, wherein determining target data according to the first cache pool in response to a data query instruction comprises:
responding to a data query instruction, and if the first cache pool comprises a fourth data page and the fourth data page is provided with a third identifier, determining third log information corresponding to the fourth data page; the fourth data page and the target data page requested to be queried by the data query instruction have the same data page identification; the third identifier characterizes that the data in the fourth data page is different from the data in the target data page generated by updating the master node; the third log information is used for recording a second data page identifier and second position information of the fourth data page; the second location information is the location of second data in the shared storage node; the second data is the data changed by the main node to update the fourth data page to the target data page;
Acquiring the second data according to the second position information; and updating the data in the fourth data page according to the second data to obtain target data.
4. The method of claim 1, wherein determining target data according to the first cache pool in response to a data query instruction comprises:
responding to a data query instruction, and if the first cache pool does not comprise a fifth data page, determining fourth log information, wherein the fourth log information is used for recording a third data page identifier and third position information of a target data page; the third position information is the location of third data in the shared storage node; the third data is changed data which is made by the master node to update a sixth data page to the target data page; the sixth data page and the target data page have the same data page identification; the target data page is the data page requested to be queried by the data query instruction;
acquiring the third data and the sixth data page in the shared storage node according to the third data page identifier and the third position information;
and updating the data in the sixth data page according to the third data to obtain target data.
5. The method of claim 1, further comprising:
responding to a log playback request, and if the first cache pool comprises a seventh data page with a fourth identifier, acquiring fifth log information corresponding to the seventh data page; wherein the fourth identifier characterizes that the data in the seventh data page is different from the data in the eighth data page generated by the master node update; the eighth data page and the seventh data page have the same data page identification; the fifth log information is used for recording a fourth data page identifier and fourth position information of the seventh data page; the fourth location information is the location of fourth data in the shared storage node; the fourth data is the data changed by the master node updating the seventh data page to the eighth data page;
acquiring the fourth data in the shared storage node according to the fourth position information;
and updating the data in the seventh data page according to the fourth data to obtain an updated seventh data page.
6. The method of claim 1, further comprising:
receiving notification information sent by a main device; the notification information is used for indicating that a ninth data page with a first log sequence number is transferred to the shared storage node by the master device;
According to the notification information, deleting the sixth log information received by the slave node; the sixth log information is used for recording a data page identifier and fifth position information of the ninth data page; the fifth location information is the location of fifth data in the shared storage node; and the fifth data is data in a data page changed when the master node updates and generates the ninth data page, and the second log sequence number of the sixth log information is less than or equal to the first log sequence number.
7. A data processing method based on a shared storage architecture, the method being applied to a master node, the method comprising:
responding to a data updating request, and updating data in at least one original data page to obtain a first data page after corresponding updating of the original data page; and generating at least one second log information; the second log information is used for recording first data and a first data page identifier; the first data is changed data which is made by the main node to update the original data page into the first data page; the first data page identifier is an identifier of the first data page;
Performing segmentation processing on the second log information to obtain first log information; wherein the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
the first log information is sent to a slave node.
8. A data processing apparatus based on a shared storage architecture, the apparatus being applied to a slave node, the apparatus comprising:
a first receiving unit, configured to receive first log information sent by a master node; the first log information is obtained by the master node performing segmentation processing on the second log information; the second log information is used for recording first data and a first data page identifier; the first data is changed data which is made by the main node to update the original data page into the first data page; the first data page identifier is an identifier of the first data page; the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
The first determining unit is used for adding a first identifier to the second data page if the second data page exists in the first cache pool according to the first log information; the first identifier characterizes that the data in the first data page and the data in the second data page are not identical; the second data page and the first data page have the same data page identification; the first cache pool is used for storing the data pages cached by the slave nodes;
and the second determining unit is used for responding to the data query instruction and determining target data according to the first cache pool.
9. A data processing apparatus based on a shared storage architecture, the apparatus being applied to a master node, the apparatus comprising:
the second updating unit is used for responding to the data updating request, updating the data in at least one original data page and obtaining a first data page after corresponding updating of the original data page; and generating at least one second log information; the second log information is used for recording first data and a first data page identifier; the first data is changed data which is made by the main node to update the original data page into the first data page; the first data page identifier is an identifier of the first data page;
The segmentation unit is used for carrying out segmentation processing on the second log information to obtain first log information; wherein the first log information is used for recording the first data page identifier and first position information of the first data; the first location information characterizes a location of the first data in a shared storage node;
and the sending unit is used for sending the first log information to the slave node.
10. A computing device, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause the computing device to perform the method of any one of claims 1 to 6.
CN202311366597.7A 2023-10-19 2023-10-19 Data processing method and device based on shared storage architecture and computing equipment Pending CN117591523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311366597.7A CN117591523A (en) 2023-10-19 2023-10-19 Data processing method and device based on shared storage architecture and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311366597.7A CN117591523A (en) 2023-10-19 2023-10-19 Data processing method and device based on shared storage architecture and computing equipment

Publications (1)

Publication Number Publication Date
CN117591523A true CN117591523A (en) 2024-02-23

Family

ID=89919086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311366597.7A Pending CN117591523A (en) 2023-10-19 2023-10-19 Data processing method and device based on shared storage architecture and computing equipment

Country Status (1)

Country Link
CN (1) CN117591523A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination