CN115599790A - Data storage system, data processing method, electronic device and storage medium - Google Patents


Info

Publication number
CN115599790A
Authority
CN
China
Prior art keywords
data processing
request
data
storage module
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211407850.4A
Other languages
Chinese (zh)
Other versions
CN115599790B (en)
Inventor
刘汪根
谢玉波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transwarp Technology Shanghai Co Ltd
Priority to CN202211407850.4A
Publication of CN115599790A
Application granted
Publication of CN115599790B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage system, a data processing method, an electronic device and a storage medium. The data storage system comprises a request scheduling service module, a row storage module and a column storage module. The request scheduling service module acquires a data processing request of a client and distributes the data processing request to at least one of the row storage module and the column storage module according to attribute information of the data processing request. The row storage module comprises a row data processing engine, the column storage module comprises a column data processing engine, and the row storage module and the column storage module execute the data processing corresponding to the data processing request distributed by the request scheduling service module. According to the embodiments of the invention, the request scheduling service module distributes data processing requests to the row storage module and the column storage module according to the attribute information of each request, so that different data processing requests are handled by different processing engines. Different types of data processing requests are thereby processed efficiently, and the user experience is improved.

Description

Data storage system, data processing method, electronic device and storage medium
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a data storage system, a data processing method, an electronic device, and a storage medium.
Background
In a database system, the data storage manner and the access efficiency are important factors affecting performance. A Hybrid Transaction and Analytical Processing (HTAP) database is a distributed database that supports both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) workloads. An HTAP system can analyze data generated by transactions in real time and combines the advantages of OLTP and OLAP systems.
Currently, OLTP and OLAP systems are connected mainly by Extract-Transform-Load (ETL) technology: the OLTP system imports data into the OLAP system through an ETL process. Because data moves between the OLTP system and the OLAP system in a completely asynchronous manner, processing takes a long time and consistency cannot be guaranteed. Therefore, how to provide a data storage system that ensures efficient data processing is a problem to be solved.
Disclosure of Invention
The invention provides a data storage system, a data processing method, an electronic device and a storage medium, which enable a database to process different types of data efficiently and improve the user experience.
According to an aspect of the present invention, there is provided a data storage system, wherein the data storage system comprises:
the system comprises a request scheduling service module, a row storage module and a column storage module;
the request scheduling service module acquires a data processing request of a client and distributes the data processing request to at least one of the row storage module and the column storage module according to attribute information of the data processing request;
the row storage module comprises a row data processing engine, the column storage module comprises a column data processing engine, and the row storage module and the column storage module execute data processing corresponding to the data processing request distributed by the request scheduling service module.
According to another aspect of the present invention, there is provided a data processing method, applied to a data storage system, the method including:
acquiring a data processing request of a client;
distributing, by a request scheduling service module, the data processing request to at least one of a row storage module and a column storage module according to attribute information of the data processing request;
and executing data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module.
According to another aspect of the present invention, there is provided a data processing apparatus, for use in a data storage system, the apparatus comprising:
the request acquisition module is used for acquiring a data processing request of a client;
the request distribution module is used for distributing, through the request scheduling service module, the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request;
and the data processing module is used for executing data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a data processing method according to any of the embodiments of the invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement a data processing method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme of the embodiment of the invention, the data processing request of the client is acquired by the request scheduling service module and is distributed to the row storage module or the column storage module according to the attribute information of the data processing request, and the row storage module or the column storage module executes the data processing corresponding to the data processing request distributed by the request scheduling service module, so that the data processing request is efficiently processed, the database processing performance is improved, and the use experience of a user is enhanced.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a data storage system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a data storage system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data storage system according to a second embodiment of the present invention;
FIG. 4 is a flowchart of a data processing method according to a third embodiment of the present invention;
FIG. 5 is a flowchart of a data processing method according to a fourth embodiment of the present invention;
FIG. 6 is a flowchart of a data processing method according to a fifth embodiment of the present invention;
FIG. 7 is a flowchart of a data processing method according to a fifth embodiment of the present invention;
FIG. 8 is a flowchart of a data processing method according to a fifth embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device implementing a data processing method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a schematic structural diagram of a data storage system according to an embodiment of the present invention. This embodiment is applicable to the case of storing data in a database. As shown in fig. 1, the system includes: a request scheduling service module 10, a row storage module 20 and a column storage module 30.
The request scheduling service module 10 obtains a data processing request of a client and distributes the data processing request to at least one of the row storage module 20 and the column storage module 30 according to the attribute information of the data processing request.
The row storage module 20 includes a row data processing engine, the column storage module 30 includes a column data processing engine, and the row storage module 20 and the column storage module 30 execute data processing corresponding to the data processing request allocated by the request scheduling service module 10.
In this embodiment of the present invention, the request scheduling service module 10 may allocate the data processing request according to attribute information of the data processing request. The attribute information may include a service type and a request type: the service type may include a transaction-type processing service and an analysis-type processing service, and the request type may include a data modification request and a data analysis request. According to this attribute information, the request scheduling service module 10 may allocate the data processing request to the row storage module 20 or the column storage module 30. The row storage module 20 includes a row data processing engine, which may be an engine that processes data on a row basis and may parse and execute the data processing corresponding to the data processing request allocated by the request scheduling service module 10. The column storage module 30 includes a column data processing engine, which may be an engine that processes data on a column basis and may likewise parse and execute the data processing corresponding to the data processing request allocated by the request scheduling service module 10.
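The structure described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the class names, the string labels for the service type and request type, the payload layout, and the deliberately simple rule (modification requests go to the row module, analysis requests go to the column module) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProcessingRequest:
    service_type: str   # assumed labels: "transaction" or "analysis"
    request_type: str   # assumed labels: "modify" or "analyze"
    payload: dict       # assumed payload layout, hypothetical


class RowStorageModule:
    """Stand-in for row storage module 20: whole records are kept together."""

    def __init__(self):
        self.rows = []

    def process(self, request):
        # Row engine: append one complete record and report the row count.
        self.rows.append(dict(request.payload))
        return len(self.rows)


class ColumnStorageModule:
    """Stand-in for column storage module 30: each column is kept separately."""

    def __init__(self, columns):
        self.columns = {name: list(values) for name, values in columns.items()}

    def process(self, request):
        # Column engine: read only the single column named in the request.
        return sum(self.columns[request.payload["sum"]])


class RequestSchedulingService:
    """Stand-in for request scheduling service module 10."""

    def __init__(self, row_module, column_module):
        self.row_module = row_module
        self.column_module = column_module

    def dispatch(self, request):
        # Assumed rule for this sketch: route on the request type only.
        if request.request_type == "modify":
            return self.row_module.process(request)
        return self.column_module.process(request)


row_module = RowStorageModule()
column_module = ColumnStorageModule({"amount": [10, 20, 30]})
service = RequestSchedulingService(row_module, column_module)
```

Here `service.dispatch(...)` hands a modification request to the row engine and an analysis request to the column engine; a fuller rule would also consult the service type.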
In an embodiment, there is at least one row storage module 20, and each row storage module 20 corresponds to a service type and a request type in the attribute information of the data processing request.
In an embodiment, there may be multiple row storage modules 20, different row storage modules 20 may handle different data processing requests, and each row storage module 20 may correspond to a service type and a request type in the attribute information of the data processing request. The attribute information, which may include a service type and a request type, may be carried in the data processing request and extracted by parsing the request. In an embodiment, a correspondence may be set between each row storage module 20 and a service type and request type, for example by assigning the same identifier to the row storage module 20 and to the attribute information.
In one embodiment, the row storage modules 20 serve as backups for one another.
In an embodiment, since there may be multiple row storage modules 20, the row storage modules 20 may back each other up: the log of one row storage module 20 may be synchronized to the other row storage modules 20.
Fig. 2 is a schematic structural diagram of a data storage system according to an embodiment of the present invention, and as shown in fig. 2, the system further includes a consistency management module 40.
The consistency management module 40 is used for synchronizing the storage data of the row storage module 20 and the column storage module 30.
In the embodiment of the present invention, after a row storage module 20 receives a data processing request including data modification information, the row storage module 20 may perform the data processing corresponding to the data processing request. The other row storage modules 20 and the column storage module 30 may then synchronize the stored data. The synchronization may include synchronizing log information via a consistency protocol, and the stored data may be synchronized to the other row storage modules 20 or the column storage module 30 according to the log information.
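Log-based synchronization of this kind can be sketched as follows. The sketch only replays an ordered log on follower stores; a real consistency protocol would also have to handle agreement on log order and node failures, which are omitted here. The names `RowStore`, `ColumnStore` and `ConsistencyManager` and the `(operation, key, record)` log entry format are assumptions made for the example.

```python
class RowStore:
    """Hypothetical row storage module: primary key -> complete record."""

    def __init__(self):
        self.records = {}

    def apply(self, entry):
        op, key, record = entry
        if op == "put":
            self.records[key] = dict(record)
        elif op == "delete":
            self.records.pop(key, None)


class ColumnStore:
    """Hypothetical column storage module: column name -> {primary key: value}."""

    def __init__(self):
        self.columns = {}

    def apply(self, entry):
        op, key, record = entry
        if op == "put":
            for name, value in record.items():
                self.columns.setdefault(name, {})[key] = value
        elif op == "delete":
            for values in self.columns.values():
                values.pop(key, None)


class ConsistencyManager:
    """Replays an ordered modification log on every follower store."""

    def __init__(self, followers):
        self.log = []
        self.followers = list(followers)
        # Number of log entries each follower has already applied.
        self.applied = [0] * len(self.followers)

    def append(self, entry):
        self.log.append(entry)

    def synchronize(self):
        for index, follower in enumerate(self.followers):
            for entry in self.log[self.applied[index]:]:
                follower.apply(entry)
            self.applied[index] = len(self.log)


primary, backup, columnar = RowStore(), RowStore(), ColumnStore()
manager = ConsistencyManager([backup, columnar])
for entry in [
    ("put", 1, {"name": "a", "amount": 10}),
    ("put", 2, {"name": "b", "amount": 20}),
    ("delete", 1, None),
]:
    primary.apply(entry)   # the receiving row module executes the modification
    manager.append(entry)  # and records it in the shared log
manager.synchronize()      # followers catch up by replaying the log
```

Because each follower remembers how far it has read, calling `synchronize()` again applies only entries appended since the previous call.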
In the embodiment of the invention, the data processing request of the client is acquired by the request scheduling service module and is distributed to the row storage module or the column storage module according to the attribute information of the data processing request, the row storage module or the column storage module executes the data processing corresponding to the data processing request distributed by the request scheduling service module, and the request scheduling service module realizes that different types of data processing requests are respectively distributed to the row storage module and the column storage module, so that the processing performance is improved, and the use experience of a user is enhanced.
Example two
Fig. 3 is a schematic structural diagram of a data storage system according to a second embodiment of the present invention. This embodiment further describes the data storage system on the basis of the foregoing embodiment, taking a database cluster query scheduling service module 11 as the request scheduling service module 10 and taking an OLTP request and an OLAP request as the data processing requests. As shown in fig. 3, the system includes: the database cluster query scheduling service module 11, the row storage module 20, the column storage module 30, and the consistency management module 40.
The database cluster query scheduling service module 11 may obtain an OLTP request and an OLAP request of a client, and allocate the OLTP request to the row storage module and allocate the OLAP request to the column storage module.
The row storage module 20 includes a row optimizer, a row executor and a row store, the column storage module 30 includes a column optimizer, a column executor and a column store, and the row storage module 20 and the column storage module 30 perform data processing corresponding to the OLTP request and the OLAP request distributed by the database cluster query scheduling service module 11.
The consistency management module 40 is used for synchronizing the stored data of the row storage module 20 and the column storage module 30.
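One way to picture the composition of each storage module in this embodiment is the sketch below, in which a module bundles an optimizer, an executor and a store. The plan format, the request dictionaries and all function names are hypothetical; the source does not specify how the optimizers or executors work.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StorageModule:
    """An optimizer, an executor and a store bundled together (assumed shape)."""
    optimizer: Callable[[dict], dict]
    executor: Callable[[dict, Any], Any]
    store: Any

    def handle(self, request):
        plan = self.optimizer(request)          # turn the request into a plan
        return self.executor(plan, self.store)  # run the plan against the store


def row_optimizer(request):
    # Assumed OLTP plan: a point lookup by primary key.
    return {"op": "lookup", "key": request["key"]}


def row_executor(plan, store):
    return store.get(plan["key"])


def column_optimizer(request):
    # Assumed OLAP plan: aggregate a single column.
    return {"op": "aggregate", "column": request["column"]}


def column_executor(plan, store):
    return sum(store[plan["column"]])


def schedule(request, row_module, column_module):
    # OLTP requests go to the row module, OLAP requests to the column module.
    module = row_module if request["kind"] == "OLTP" else column_module
    return module.handle(request)


row_module = StorageModule(row_optimizer, row_executor, {1: {"id": 1, "amount": 10}})
column_module = StorageModule(column_optimizer, column_executor, {"amount": [10, 20, 30]})
```

With these definitions, `schedule({"kind": "OLTP", "key": 1}, row_module, column_module)` is answered from the row store, while an OLAP request naming a column is answered from the column store.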
Example three
Fig. 4 is a flowchart of a data processing method according to a third embodiment of the present invention, where this embodiment is applicable to a case where data is processed in a database, and the method may be executed by a data processing apparatus, and the data processing apparatus may be implemented in the form of hardware and/or software. As shown in fig. 4, the method includes:
S110, acquiring a data processing request of the client.
The client can send a data processing request to the request scheduling service module for processing, so that data modification and data query services are realized. The data processing request sent by the client is a request to be processed and may be of various types, for example a data modification request or a data analysis request.
In the embodiment of the present invention, the request scheduling service module may receive the data processing request from the client, and the client may send different data processing requests according to its needs. For example, when the client needs to modify data in the database, it may send a data modification request to the request scheduling service module; when the client needs to query data in the database, it may send a data analysis request to the request scheduling service module.
S120, distributing, by the request scheduling service module, the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request.
The attribute information of the data processing request may refer to data information carried by the data processing request, and the data processing request may carry information such as a service type and a request type.
In the embodiment of the present invention, the request scheduling service module may allocate the data processing request to the row storage module and/or the column storage module according to the attribute information of the data processing request, so that the corresponding request is executed. The attribute information may include a service type, a request type, and the like. For example, the data processing request may be pre-allocated according to the service type and then allocated according to the request type; or pre-allocated according to the request type and then allocated according to the service type; or allocated according to the service type and the request type at the same time. In an embodiment, when data processing requests are allocated according to the request type, the request scheduling service module may send the data processing request to the row storage module if the request type carried by the data processing request is a data modification request, and to the row storage module or the column storage module if the request type is a data analysis request. In an embodiment, when the request type is a data analysis request, the data processing request may be further allocated according to the service type: when the service type is a transaction-type processing service, the data processing request may be allocated to the row storage module; when the service type is an analysis-type processing service, the data processing request may be allocated to the column storage module.
S130, executing the data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module.
In this embodiment of the present invention, after the request scheduling service module allocates the data processing request to the row storage module or the column storage module, the row storage module or the column storage module may perform data processing corresponding to the data processing request. The row storage module or the column storage module can analyze the data processing request, read the content of the data processing request and perform data processing according to the content of the data processing request. In one embodiment, the row storage module may perform modification data processing and analysis data processing, and the column storage module may perform analysis data processing.
In the embodiment of the invention, the request scheduling service module acquires the data processing request of the client and distributes it to the row storage module or the column storage module according to the attribute information of the data processing request, and the row storage module or the column storage module executes the data processing corresponding to the distributed data processing request. Because the request scheduling service module distributes data processing requests to the row storage module and the column storage module according to the attribute information of each request, different query processing engines handle different service requirements, the data processing performance is improved, and the user experience is enhanced.
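The allocation rule of step S120, in the variant that pre-allocates by request type and then decides by service type, can be written as a small pure function. This is a sketch under assumptions: the enum and constant names are invented for the example, and only the mapping itself is taken from the description above.

```python
from enum import Enum


class RequestType(Enum):
    MODIFICATION = "data modification request"
    ANALYSIS = "data analysis request"


class ServiceType(Enum):
    TRANSACTIONAL = "transaction-type processing service"
    ANALYTICAL = "analysis-type processing service"


ROW = "row storage module"
COLUMN = "column storage module"


def allocate(request_type, service_type):
    """Return the module that should receive a request with these attributes."""
    # Stage 1: pre-allocate by request type; modifications always go to the row module.
    if request_type is RequestType.MODIFICATION:
        return ROW
    # Stage 2: for analysis requests, decide by service type.
    if service_type is ServiceType.TRANSACTIONAL:
        return ROW
    return COLUMN
```

For example, `allocate(RequestType.ANALYSIS, ServiceType.ANALYTICAL)` selects the column storage module, while any modification request selects the row storage module regardless of service type.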
Example four
Fig. 5 is a flowchart of a data processing method according to a fourth embodiment of the present invention, and this embodiment is a further refinement of the data processing method based on the foregoing embodiments. As shown in fig. 5, the method includes:
S2010, acquiring a data processing request of the client.
S2020, routing information stored in association with the service type and the request type of the data processing request is searched in the request scheduling service module.
The service type may refer to type division performed according to different transactions to be processed, and the service type may include a transaction type processing service and an analysis type processing service; the request type may refer to a type of data processing request, and may include a data modification request, a data analysis request, and the like. The routing information may refer to route information for the information packet to be sent to the destination address.
In the embodiment of the present invention, the request scheduling service module may store multiple pieces of routing information, and the routing information stored in association with the service type and the request type of the data processing request may be looked up in the request scheduling service module. In practice, the routing information may be stored in the form of a table or an information base. The routing information matching the field for the service type of the data processing request may be extracted first, and then the routing information matching the field for the request type may be extracted from it, so that the routing information stored in association with the service type and the request type is found; alternatively, the routing information matching the fields for both the service type and the request type may be extracted at the same time.
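A table-form routing lookup of this kind might look like the following sketch, which shows both the two-step lookup (service type first, then request type) and the joint lookup. The table contents, the string labels and the destination names such as `row-module-1` are hypothetical.

```python
# Hypothetical routing table: (service type, request type) -> destination address.
ROUTING_TABLE = {
    ("transaction", "modification"): "row-module-1",
    ("transaction", "simple-analysis"): "row-module-2",
    ("analysis", "complex-analysis"): "column-module-1",
}


def find_route(table, service_type, request_type):
    """Joint lookup: match the service type and the request type at the same time."""
    return table.get((service_type, request_type))


def find_route_two_step(table, service_type, request_type):
    """Two-step lookup: narrow by service type first, then by request type."""
    by_service = {
        req: destination
        for (svc, req), destination in table.items()
        if svc == service_type
    }
    return by_service.get(request_type)
```

Both functions return the same destination for the same attributes and return `None` when no routing information is stored for the combination.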
S2030, when the request type is a data modification request, sending, by the request scheduling service module, the data processing request to the row storage module according to the routing information.
The row storage module may refer to a module that stores data by row: data is stored in row-based logical storage units, and the data of one row is stored contiguously in the storage medium.
In the embodiment of the present invention, different request types correspond to different routing information in the request scheduling service module, and when the request type is a data modification request, the destination address in the routing information may be the row storage module. In that case, the routing information stored in the request scheduling service module can be queried, and the data processing request can be sent to the row storage module according to the routing information.
S2040, when the request type is a simple data analysis request, sending, by the request scheduling service module, the data processing request to the row storage module according to the routing information.
The simple data analysis request may be a request for data analysis based on a simple query statement. As described above, the row storage module may be a module based on a row-oriented database, in which data is stored in row-based logical storage units and the data of one row is stored contiguously in the storage medium.
In the embodiment of the present invention, the request scheduling service module may parse the type of the data analysis request, which may be a simple data analysis request or a complex data analysis request. When the request type is a simple data analysis request, the request scheduling service module may query the corresponding routing information it stores according to the request type and send the data processing request to the row storage module according to the routing information.
S2050, when the request type is a complex data analysis request, sending, by the request scheduling service module, the data processing request to the column storage module according to the routing information.
The complex data analysis request may be a request for data analysis based on a complex query statement. The column storage module may be a module based on a column-wise stored database, and data in the column storage module is stored in column-based logical storage units.
In the embodiment of the present invention, when the request type is a complex data analysis request, the destination address in the routing information stored in the request scheduling service module for complex data analysis requests may be queried; this destination address may be the column storage module. The data processing request is then sent to the column storage module according to the routing information stored by the request scheduling service module.
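Steps S2030 to S2050 can be sketched as a classification followed by a lookup. The source does not define where a simple query ends and a complex one begins, so the heuristic below (joins, grouping, set operations or aggregate functions make a query complex) is purely an assumption for illustration, as are the function names and the use of SQL text as the request content.

```python
import re

# Assumed markers of a complex query; the source does not define this boundary.
COMPLEX_MARKERS = re.compile(
    r"\b(JOIN|GROUP\s+BY|HAVING|UNION)\b|\b(SUM|AVG|COUNT|MIN|MAX)\s*\(",
    re.IGNORECASE,
)
MODIFY_PREFIXES = ("INSERT", "UPDATE", "DELETE")

# Destination per request type, following steps S2030, S2040 and S2050.
DESTINATIONS = {
    "data modification request": "row storage module",         # S2030
    "simple data analysis request": "row storage module",      # S2040
    "complex data analysis request": "column storage module",  # S2050
}


def classify(statement):
    """Guess the request type from the statement text (hypothetical heuristic)."""
    text = statement.strip()
    if text.upper().startswith(MODIFY_PREFIXES):
        return "data modification request"
    if COMPLEX_MARKERS.search(text):
        return "complex data analysis request"
    return "simple data analysis request"


def route(statement):
    """Return the destination module for a statement."""
    return DESTINATIONS[classify(statement)]
```

Under this heuristic a point query such as `SELECT name FROM users WHERE id = 7` is routed to the row storage module, while an aggregation with `GROUP BY` is routed to the column storage module.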
S2060, parsing the data modification command of the data processing request in the row storage module.
The data modification command may refer to a statement for modifying data in the database. It may be of multiple types, including, but not limited to, a data write command, a data modification command, and a data delete command.
In the embodiment of the present invention, when the data processing request includes a data modification request, the row storage module may receive the data processing request from the request scheduling service module, read it, and parse out its data modification commands. The data modification request may include one or more data modification commands, each of which may be at least one of a data write command, a data modification command, and a data delete command.
S2070, executing the data modification command by the row data processing engine of the row storage module, where the data modification command at least includes a data write command, a data modification command, and a data delete command.
In an embodiment of the present invention, the row data processing engine may execute the data modification command parsed from the data processing request. The data modification command may carry several pieces of information, such as the location of the data to be modified, the modified content, and the modified format, and the row data processing engine may modify data according to this information. In one embodiment, the data modification command includes at least a data write command, a data modification command, and a data delete command. When the command is a data write command, the data is written to the corresponding position in the corresponding format and with the corresponding content; when the command is a data modification command, the new data is written over the data to be modified; when the command is a data delete command, blank data may be written over the data to be deleted.
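The three command kinds can be illustrated with a toy engine that keeps whole rows in a dictionary. This is a sketch under assumptions: `ModificationCommand`, `RowEngine`, and the field names are hypothetical, and "blank data" is modeled as `None`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ModificationCommand:
    """Hypothetical parsed command: kind is 'write', 'modify', or 'delete'."""
    kind: str
    row_id: int
    content: Optional[Dict[str, Any]] = None


@dataclass
class RowEngine:
    """Toy row data processing engine; each row is stored whole, keyed by id."""
    rows: Dict[int, Optional[Dict[str, Any]]] = field(default_factory=dict)

    def execute(self, cmd: ModificationCommand) -> None:
        if cmd.kind == "write":
            # Write the content at the position named by the command.
            self.rows[cmd.row_id] = dict(cmd.content or {})
        elif cmd.kind == "modify":
            # Overlay the new values on the existing row.
            if self.rows.get(cmd.row_id) is None:
                raise KeyError(f"row {cmd.row_id} does not exist")
            self.rows[cmd.row_id].update(cmd.content or {})
        elif cmd.kind == "delete":
            # Overlay blank data (modeled as None) on the deleted row.
            if cmd.row_id not in self.rows:
                raise KeyError(f"row {cmd.row_id} does not exist")
            self.rows[cmd.row_id] = None
        else:
            raise ValueError(f"unknown command kind: {cmd.kind}")
```

A write followed by a modify leaves the untouched fields in place, and a delete leaves a blank slot rather than removing the key, which mirrors the overlay wording of the text.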
S2080, synchronizing, by the consistency management module, the log information of the data modification command to other row storage modules and/or column storage modules.
The log information may be an event record generated when the row storage module operates, and each log line may record the date, time, user, action, and a description of other related operations. In one embodiment, the log may be of multiple types, including a redo log, a write-ahead log (WAL), a bin log, and so on.
In the embodiment of the present invention, after the row storage module executes the data modification command, log information of the data modification command may be generated. The consistency management module may synchronize this log information to other row storage modules and/or column storage modules, which may then synchronize their stored data according to the log information. Other row storage modules may copy and replay the log contents directly, since the data in the log is in row storage format; column storage modules may convert the row-format data in the log into column storage format before storing it.
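The difference between the two kinds of replica can be sketched as follows. This is an illustration under assumptions, not a consensus protocol: the `LogEntry` tuple layout, the class names, and `synchronize` are hypothetical, and acknowledgement, ordering guarantees, and failure handling are left out.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical row-format log entry: (operation, row_id, row_values).
LogEntry = Tuple[str, int, Dict[str, Any]]


class RowReplica:
    """Replays row-format log entries as they are."""

    def __init__(self) -> None:
        self.rows: Dict[int, Dict[str, Any]] = {}

    def apply(self, entry: LogEntry) -> None:
        op, row_id, values = entry
        if op == "delete":
            self.rows.pop(row_id, None)
        else:  # "write" or "modify"
            self.rows.setdefault(row_id, {}).update(values)


class ColumnReplica:
    """Converts each row-format entry into per-column storage."""

    def __init__(self) -> None:
        # column name -> {row_id: value}
        self.columns: Dict[str, Dict[int, Any]] = {}

    def apply(self, entry: LogEntry) -> None:
        op, row_id, values = entry
        if op == "delete":
            for column in self.columns.values():
                column.pop(row_id, None)
        else:
            for name, value in values.items():
                self.columns.setdefault(name, {})[row_id] = value


def synchronize(log: List[LogEntry], replicas: list) -> None:
    """Stand-in for the consistency management module: replay in log order."""
    for entry in log:
        for replica in replicas:
            replica.apply(entry)
```

Both replicas consume the same log, so they end up holding the same records in different layouts.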
S2090, feeding back the execution result of the data processing request to the client.
In the embodiment of the present invention, after the other row storage modules and/or the column storage modules have synchronized their stored data according to the log information, the execution result of the data processing request may be fed back to the client. When the data processing request is a data write command, a write-success result is fed back to the client; when it is a data modification command, a modification-success result may be fed back; when it is a data delete command, a deletion-success result may be fed back. The execution result may be returned to the client along the original path, using the path information in the routing information.
S2100, parsing the data analysis command of the data processing request in the row storage module or the column storage module.
Wherein the data analysis command may include a statement to query data in the database.
In the embodiment of the present invention, when the data processing request includes a data analysis request, the row storage module or the column storage module may receive the data analysis request from the request scheduling service module and may parse the data analysis command of the data processing request. Parsing may include extracting the data locations, data information, and the like carried in the data analysis command.
S2110, accessing, by the row data processing engine of the row storage module, the row storage data corresponding to the data analysis command, or accessing, by the column data processing engine of the column storage module, the column storage data corresponding to the data analysis command, to generate a data analysis result.
In the embodiment of the present invention, the row data processing engine may be an engine that processes data by row, and the column data processing engine may be an engine that processes data by column. Either engine may parse and execute the data processing corresponding to the data processing request allocated by the request scheduling service module, and may run according to the execution plan of the data analysis command to generate a data analysis result. Using the data positions, data information, and the like parsed from the data analysis command, the row data processing engine or the column data processing engine may look up the corresponding row storage data or column storage data and generate the data analysis result from what it finds.
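The following sketch contrasts the two access patterns on the same three records held in both layouts. The sample data and the function names `row_engine_lookup` and `column_engine_sum` are invented for illustration; real engines would work from an execution plan rather than fixed functions.

```python
from typing import Any, Dict, List

# The same three records in the two layouts (sample data is invented).
ROW_STORE: List[Dict[str, Any]] = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 25},
    {"id": 3, "region": "east", "amount": 5},
]
COLUMN_STORE: Dict[str, List[Any]] = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "amount": [10, 25, 5],
}


def row_engine_lookup(record_id: int) -> Dict[str, Any]:
    """Simple query: fetch one whole record; a row layout keeps it together."""
    for row in ROW_STORE:
        if row["id"] == record_id:
            return row
    raise KeyError(record_id)


def column_engine_sum(column: str, where_column: str, equals: Any) -> int:
    """Complex query: aggregate one column, reading only the two columns used."""
    values = COLUMN_STORE[column]
    filters = COLUMN_STORE[where_column]
    return sum(v for v, f in zip(values, filters) if f == equals)
```

The lookup returns every field of one record in a single access, while the aggregate never touches the `id` column, which is the usual motivation for sending complex analysis to a column layout.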
S2120, feeding back the data analysis result of the data processing request to the client.
In the embodiment of the present invention, after the row storage module or the column storage module generates the data analysis result, the result of the data analysis request may be fed back to the client. The data analysis result may be sent to the client using the path information in the routing information, or a new path may be established and the data analysis result sent to the client over it.
According to the embodiment of the invention, the data processing request of the client is acquired, data modification requests and simple data analysis requests are sent to the row storage module, and complex data analysis requests are sent to the column storage module. The row storage module parses and executes the data modification commands and the simple data analysis commands, the column storage module parses and executes the complex data analysis commands, and the execution results are each fed back to the client, which improves data processing efficiency. Through the consistency protocol, the data is stored in the storage layer in both row storage and column storage form, which guarantees the consistency of the underlying data.
EXAMPLE five
Fig. 6 is a flowchart of a data processing method according to a fifth embodiment of the present invention. On the basis of the foregoing embodiment, the present embodiment further details the data processing method, using a cluster query scheduling service as the request scheduling service module, an OLTP engine as the row data processing engine, an OLAP engine as the column data processing engine, and a data modification request as the request type. As shown in fig. 6, the method includes:
Step 11, the client sends a data modification request to the cluster query scheduling service.
Step 12, the cluster query scheduling service routes the request, according to the service type, to the Leader OLTP engine of a consistency group for execution.
Step 13, the OLTP engine parses and executes the data processing corresponding to the data modification request.
Step 14, the data processing corresponding to the data modification request is written to the row storage engine.
Step 15, the data processing is synchronized through the log to the other row storage engines and the column storage engines by the consistency protocol.
Step 16, after the other OLTP engines and the OLAP engine finish synchronizing the data processing, they feed back a write-success execution result to the OLTP engine.
Step 17, after receiving the feedback of the other OLTP engines and the OLAP engine, the OLTP engine feeds back a write-success execution result to the cluster query scheduling service.
Step 18, after receiving the feedback of the OLTP engine, the cluster query scheduling service feeds back the write-success execution result to the client.
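The order of acknowledgements in steps 11 to 18 can be sketched as below. This is a simplified illustration under assumptions: the classes `Follower`, `LeaderOltpEngine`, and `ClusterQueryScheduler` are hypothetical, log entries are plain strings, and the leader here waits for every follower, which follows the wording of steps 16 and 17 but is not necessarily how a given consistency protocol counts acknowledgements.

```python
from typing import List


class Follower:
    """Hypothetical follower engine (another OLTP engine or an OLAP engine)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []

    def replicate(self, entry: str) -> bool:
        # Steps 15-16: receive the log entry, replay it, then acknowledge.
        self.log.append(entry)
        return True


class LeaderOltpEngine:
    """Hypothetical leader of the consistency group."""

    def __init__(self, followers: List[Follower]) -> None:
        self.followers = followers
        self.log: List[str] = []

    def handle_modification(self, statement: str) -> bool:
        # Steps 13-14: parse, execute, and write to the local row store
        # (represented here only by appending to the leader's log).
        self.log.append(statement)
        # Steps 15-17: report success only after the followers acknowledge.
        acks = [follower.replicate(statement) for follower in self.followers]
        return all(acks)


class ClusterQueryScheduler:
    """Hypothetical stand-in for the cluster query scheduling service."""

    def __init__(self, leader: LeaderOltpEngine) -> None:
        self.leader = leader

    def submit(self, statement: str) -> str:
        # Steps 12 and 18: route to the leader, relay the result to the client.
        ok = self.leader.handle_modification(statement)
        return "write succeeded" if ok else "write failed"
```

The client sees a success result only after the statement is present in the leader's log and in every follower's log.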
Fig. 7 is a flowchart of a data processing method according to the fifth embodiment of the present invention, in which a cluster query scheduling service is used as the request scheduling service module, an OLTP engine is used as the row data processing engine, and the request type is a simple data analysis request. As shown in fig. 7, the method includes:
Step 21, the client sends a simple data analysis request to the cluster query scheduling service.
Step 22, the cluster query scheduling service routes the request to the OLTP engine according to the service type for execution.
Step 23, the OLTP engine parses and executes the data processing corresponding to the data analysis request.
Step 24, the OLTP engine accesses the row storage engine data of the current instance and queries the corresponding data.
Step 25, after querying the corresponding data, the OLTP engine feeds back a query result to the cluster query scheduling service.
Step 26, after receiving the feedback of the OLTP engine, the cluster query scheduling service feeds back the query result to the client.
Fig. 8 is a flowchart of a data processing method according to the fifth embodiment of the present invention. On the basis of the foregoing embodiment, the present embodiment further details the data processing method, using a cluster query scheduling service as the request scheduling service module, an OLAP engine as the column data processing engine, and a complex data analysis request as the request type. As shown in fig. 8, the method includes:
Step 31, the client sends a complex data analysis request to the cluster query scheduling service.
Step 32, the cluster query scheduling service routes the request to the OLAP engine according to the service type for execution.
Step 33, the OLAP engine parses and executes the data processing corresponding to the data analysis request.
Step 34, the OLAP engine accesses the column storage engine data of the current instance and queries the corresponding data.
Step 35, after querying the corresponding data, the OLAP engine feeds back a query result to the cluster query scheduling service.
Step 36, after receiving the feedback of the OLAP engine, the cluster query scheduling service feeds back the query result to the client.
In the embodiment of the invention, row storage data and column storage data are synchronized among all nodes of a database cluster through a consistency protocol: the data log of the Leader node (such as a redo log, WAL, or bin log) is written to the other nodes through the consistency protocol, and each node replays the copied log to synchronize its data. This addresses the problems of the Change Data Capture (CDC) mechanism in the prior art, namely that the data link is long and the timeliness of log data writes cannot be guaranteed.
EXAMPLE six
Fig. 9 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the present invention. As shown in fig. 9, the apparatus includes: a request acquisition module 51, a request allocation module 52 and a data processing module 53.
The request acquisition module 51 is configured to obtain a data processing request of a client.
The request allocation module 52 is configured to allocate, through the request scheduling service module, the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request.
The data processing module 53 is configured to execute, based on at least one of the row storage module and the column storage module, the data processing corresponding to the data processing request allocated by the request scheduling service module.
In the embodiment of the invention, the request acquisition module acquires the data processing request of the client, the request allocation module distributes the data processing request, through the request scheduling service module, to the row storage module or the column storage module according to the attribute information of the data processing request, and the data processing module executes, based on the row storage module or the column storage module, the data processing corresponding to the allocated data processing request. Handling different service requirements with different query processing engines improves data processing performance and the user experience.
In one embodiment, the request allocation module 52 includes:
An information searching unit, configured to search the request scheduling service module for the routing information stored in association with the service type and the request type of the data processing request.
A first information sending unit, configured to send, through the request scheduling service module, the data processing request to the row storage module according to the routing information when the request type is a data modification request.
A second information sending unit, configured to send, through the request scheduling service module, the data processing request to the column storage module according to the routing information when the request type is a data analysis request.
In one embodiment, when the data processing request includes a data modification request, the data processing module 53 includes:
A first command processing unit, configured to parse the data modification command of the data processing request in the row storage module.
A command execution unit, configured to execute the data modification command by the row data processing engine of the row storage module, wherein the data modification command at least comprises a data write command, a data modification command and a data delete command.
A command synchronization unit, configured to synchronize, by the consistency management module, the log information of the data modification command to other row storage modules and/or column storage modules.
A first result feedback unit, configured to feed back the execution result of the data processing request to the client.
In one embodiment, when the data processing request includes a data analysis request, the data processing module 53 includes:
A second command processing unit, configured to parse the data analysis command of the data processing request in the row storage module or the column storage module.
A result generating unit, configured to access, by the row data processing engine of the row storage module, the row storage data corresponding to the data analysis command, or to access, by the column data processing engine of the column storage module, the column storage data corresponding to the data analysis command, so as to generate a data analysis result.
A second result feedback unit, configured to feed back the data analysis result of the data processing request to the client.
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
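How modules 51, 52 and 53 might be composed can be sketched as three small classes wired together by one function. All names here are hypothetical, module 52 is modeled as the component that allocates requests, and the storage modules are reduced to callables passed in by the caller; this is an illustration of the division of responsibilities, not the apparatus itself.

```python
from typing import Callable, Dict


class RequestAcquisitionModule:
    """Module 51: obtains the client's data processing request."""

    def acquire(self, raw: Dict[str, str]) -> Dict[str, str]:
        if "request_type" not in raw or "statement" not in raw:
            raise ValueError("request needs 'request_type' and 'statement'")
        return raw


class RequestAllocationModule:
    """Module 52: picks a storage module from the request's attribute info."""

    def allocate(self, request: Dict[str, str]) -> str:
        if request["request_type"] == "complex_analysis":
            return "column"
        return "row"


class DataProcessingModule:
    """Module 53: runs the request on the storage module it was allocated to."""

    def __init__(self, handlers: Dict[str, Callable[[str], str]]) -> None:
        self.handlers = handlers

    def process(self, target: str, request: Dict[str, str]) -> str:
        return self.handlers[target](request["statement"])


def handle(raw: Dict[str, str],
           acquisition: RequestAcquisitionModule,
           allocation: RequestAllocationModule,
           processing: DataProcessingModule) -> str:
    """Pass one request through acquisition, allocation, and processing."""
    request = acquisition.acquire(raw)
    target = allocation.allocate(request)
    return processing.process(target, request)
```

Keeping allocation separate from processing means the routing rule can change without touching either storage module.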
EXAMPLE seven
Fig. 10 is a schematic structural diagram of an electronic device 10 implementing a data processing method according to an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 10, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 may also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a data processing method.
In some embodiments, a data processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of a data processing method as described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform a data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A data storage system, the data storage system comprising:
the system comprises a request scheduling service module, a row storage module and a column storage module;
the request scheduling service module acquires a data processing request of a client and distributes the data processing request to at least one of the row storage module and the column storage module according to attribute information of the data processing request;
the row storage module comprises a row data processing engine, the column storage module comprises a column data processing engine, and the row storage module and the column storage module execute data processing corresponding to the data processing request distributed by the request scheduling service module.
2. The system of claim 1, wherein the number of the row storage modules is at least one, and each of the row storage modules corresponds to a service type and a request type in the attribute information of the data processing request.
3. The system of claim 2, wherein the row storage modules are backed up with each other.
4. The system of claim 1, further comprising a consistency management module configured to synchronize the storage data of the row storage module and the column storage module.
5. A data processing method, applied to a data storage system, the method comprising:
acquiring a data processing request of a client;
distributing the data processing request to at least one of a row storage module and a column storage module according to the attribute information of the data processing request by a request scheduling service module;
and executing data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module.
6. The method of claim 5, wherein the distributing, by the request scheduling service module, the data processing request to at least one of a row storage module and a column storage module according to the attribute information of the data processing request comprises:
searching routing information stored in association with the service type and the request type of the data processing request in the request scheduling service module;
under the condition that the request type is a data modification request, sending, by the request scheduling service module, the data processing request to the row storage module according to the routing information;
under the condition that the request type is a simple data analysis request, sending, by the request scheduling service module, the data processing request to the row storage module according to the routing information;
and under the condition that the request type is a complex data analysis request, sending, by the request scheduling service module, the data processing request to the column storage module according to the routing information.
7. The method according to claim 5, wherein the data processing request comprises a data modification request, and accordingly, the executing, based on at least one of the row storage module and the column storage module, the data processing corresponding to the data processing request distributed by the request scheduling service module comprises:
parsing a data modification command of the data processing request at the row storage module;
executing the data modification command by a row data processing engine of the row storage module, wherein the data modification command at least comprises a data write command, a data modification command and a data delete command;
according to a consistency management module, synchronizing log information of the data modification command to other row storage modules and/or the column storage modules;
and feeding back the execution result of the data processing request to the client.
8. The method according to claim 5, wherein the data processing request comprises a data analysis request, and accordingly, the executing, based on at least one of the row storage module and the column storage module, the data processing corresponding to the data processing request distributed by the request scheduling service module comprises:
parsing a data analysis command of the data processing request in the row storage module or the column storage module;
accessing, by a row data processing engine of the row storage module, row storage data corresponding to the data analysis command, or accessing, by a column data processing engine of the column storage module, column storage data corresponding to the data analysis command, to generate a data analysis result;
and feeding back a data analysis result of the data processing request to the client.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 5-8.
10. A computer-readable storage medium, characterized in that it stores computer instructions for causing a processor to implement the data processing method of any of claims 5-8 when executed.
CN202211407850.4A 2022-11-10 2022-11-10 Data storage system, data processing method, electronic equipment and storage medium Active CN115599790B (en)

Publications (2)

Publication Number Publication Date
CN115599790A (en) 2023-01-13
CN115599790B (en) 2024-03-15

Family

ID=84852889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211407850.4A Active CN115599790B (en) 2022-11-10 2022-11-10 Data storage system, data processing method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115599790B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130166554A1 (en) * 2011-12-22 2013-06-27 Sap Ag Hybrid Database Table Stored as Both Row and Column Store
KR20140098529A (en) * 2013-01-31 2014-08-08 한국전자통신연구원 Apparatus and method for effective simultaneous supporting both olap and oltp which have different data access patterns
CN104715039A (en) * 2015-03-23 2015-06-17 星环信息科技(上海)有限公司 Column-based storage and research method and equipment based on hard disk and internal storage
US20160098425A1 (en) * 2014-10-07 2016-04-07 Frank Brunswig Heterogeneous database processing archetypes for hybrid system
CN106777027A (en) * 2016-12-08 2017-05-31 北京国电通网络技术有限公司 MPP ranks blended data storage device and storage, querying method
CN108616581A (en) * 2018-04-11 2018-10-02 深圳纳实大数据技术有限公司 Data-storage system and method based on OLAP/OLTP mixing applications
CN110019251A (en) * 2019-03-22 2019-07-16 深圳市腾讯计算机系统有限公司 A kind of data processing system, method and apparatus
CN110222072A (en) * 2019-06-06 2019-09-10 江苏满运软件科技有限公司 Data Query Platform, method, equipment and storage medium
WO2020160265A1 (en) * 2019-02-02 2020-08-06 Alibaba Group Holding Limited Data storage apparatus, translation apparatus, and database access method
CN111858759A (en) * 2020-07-08 2020-10-30 平凯星辰(北京)科技有限公司 HTAP database based on consensus algorithm
US20210081389A1 (en) * 2019-09-13 2021-03-18 Oracle International Corporation Technique of efficiently, comprehensively and autonomously support native json datatype in rdbms for both oltp & olap
CN114356971A (en) * 2021-12-02 2022-04-15 阿里巴巴(中国)有限公司 Data processing method, device and system
WO2022126839A1 (en) * 2020-12-15 2022-06-23 跬云(上海)信息科技有限公司 Cloud computing-based adaptive storage hierarchy system and method

Also Published As

Publication number Publication date
CN115599790B (en) 2024-03-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant